Getting started

This guide will help you get started with the Mindee Python SDK to easily extract data from your documents.

The Python SDK supports the invoice, passport, and receipt OCR APIs, as well as custom APIs built with the API Builder.

You can view the source code on GitHub, and the package on PyPI.

Prerequisites

  • Download and install Python. This library is officially supported on Python 3.7 to 3.10.
  • Download and install the pip package manager.
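To confirm that your interpreter falls within the supported range, a quick check along these lines may help. This is only a sketch: the helper name is_supported is hypothetical, and the bounds are copied from the prerequisite above.

```python
import sys

# Supported range taken from the prerequisite above (Python 3.7 to 3.10).
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 10)


def is_supported(version=None):
    """Return True if (major, minor) falls within the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    # Report the running interpreter's version and whether it is in range.
    print(sys.version_info[:2], "supported:", is_supported())
```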

Installation

To quickly get started with the Python SDK anywhere, the preferred installation method is via pip.

pip install mindee

Development Installation

If you'll be modifying the source code, you'll need to install the development requirements to get started.

  1. First, clone the repo.
git clone git@github.com:mindee/mindee-api-python.git
  2. Then navigate to the cloned directory and install all development requirements.
cd mindee-api-python
pip install -e ".[dev,test]"

Updating Version

It is important to always check the version of the Mindee SDK you are using, as new and updated features won’t work on old versions.

To check the version of your SDK:

pip show mindee

To upgrade to the latest version of the SDK:

pip install mindee --upgrade

To install a specific version:

pip install mindee==<your_version>
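If you would rather check the installed version from within Python, for example in a startup log, the standard library can read it from the package metadata. The sketch below assumes Python 3.8 or later, where importlib.metadata is available (on 3.7 the importlib_metadata backport would be needed); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package="mindee"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version, or None when mindee is not installed.
print(installed_version())
```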

The Client

The client centralizes document configurations in a single object.

Documents are added to the Client using a config_xxx method.

Since each config_xxx method returns the current Client object, you can simply chain all the calls together.

You only need to specify the API keys for the document endpoints you'll be using.

from mindee import Client

mindee_client = (
    Client()
    .config_receipt("receipt-api-key")
    .config_invoice("invoice-api-key")
    .config_financial_doc("receipt-api-key", "invoice-api-key")
    .config_passport("passport-api-key")
    .config_custom_doc(
        document_type="pokemon-card",
        singular_name="card",
        plural_name="cards",
        account_name="pikachu",
        api_key="pokemon-card-api-key",
    )
)
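Chaining works because each configuration method returns the object it was called on. The toy class below illustrates that pattern only; ChainableConfig and its api_keys attribute are hypothetical and are not the Mindee Client implementation.

```python
class ChainableConfig:
    """Toy stand-in for a client; NOT the Mindee Client implementation."""

    def __init__(self):
        self.api_keys = {}

    def config_receipt(self, api_key):
        self.api_keys["receipt"] = api_key
        return self  # returning self is what enables chaining

    def config_invoice(self, api_key):
        self.api_keys["invoice"] = api_key
        return self


# Each call returns the same object, so the calls can be strung together.
config = (
    ChainableConfig()
    .config_receipt("receipt-api-key")
    .config_invoice("invoice-api-key")
)
print(config.api_keys)
```

Because every call returns the same object, the order of the config calls does not matter in this sketch, and any of them can be left out.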

Environment Variables

We strongly recommend using environment variables for the API keys, especially in production.

For Off-The-Shelf APIs, here are the keys you can set:

  • MINDEE_RECEIPT_API_KEY
  • MINDEE_INVOICE_API_KEY
  • MINDEE_PASSPORT_API_KEY

Custom documents can be set as well, in keeping with the example above:

  • MINDEE_PIKACHU_POKEMON_CARD_API_KEY
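The custom-document variable name above appears to combine the account name and the document type, upper-cased, with hyphens turned into underscores. The sketch below encodes that pattern as inferred from the single example; it is an assumption, not a documented rule, and both helper names are hypothetical.

```python
import os


def custom_key_env_name(account_name, document_type):
    """Build the variable name for a custom document's API key (assumed pattern)."""
    parts = f"{account_name}_{document_type}".replace("-", "_")
    return f"MINDEE_{parts.upper()}_API_KEY"


def read_api_key(env_name, environ=os.environ):
    """Return the key stored under env_name, or None when it is unset or empty."""
    return environ.get(env_name) or None


name = custom_key_env_name("pikachu", "pokemon-card")
print(name, "is set:", read_api_key(name) is not None)
```

A check like this at startup can surface a missing key before the first document is parsed.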

Document Parsing

To parse a document, first load it with one of the client's doc_from_xxx methods, then call the parse method on the object that is returned. The document (API) type must be specified when calling parse. The parsed data is available as an attribute of the response object, named after the document type (for example, api_response.invoice).

You can load documents in the following ways:

Path

This requires an absolute path, as a string.

api_response = mindee_client.doc_from_path("/path/to/the/invoice").parse("invoice")

# Print a summary of the parsed data
print(api_response.invoice)
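If you only have a relative path, or one starting with ~, it can be converted to an absolute path string with the standard library before loading the document. This is a minimal sketch; the helper name to_absolute is hypothetical.

```python
import os


def to_absolute(path):
    """Return an absolute path string for a relative or absolute input."""
    # expanduser handles a leading "~"; abspath anchors relative paths
    # to the current working directory and normalizes the result.
    return os.path.abspath(os.path.expanduser(path))


print(to_absolute("invoice.pdf"))
```

The resulting string can then be passed to doc_from_path as in the example above.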

File Object

A standard Python file object that has a path. It must be opened in binary mode.

with open("/path/to/the/receipt", "rb") as fo:
    api_response = mindee_client.doc_from_file(fo, "receipt").parse("receipt")

# Print a summary of the parsed data
print(api_response.receipt)

Base64

Requires a base64-encoded string.

Note: The original filename of the encoded file is required when calling the method.

b64_string = "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLD...."
api_response = mindee_client.doc_from_b64string(b64_string, "receipt.jpg").parse("receipt")

# Print a summary of the parsed data
print(api_response.receipt)
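If your file is not already encoded, the standard library's base64 module can produce the string. The sketch below is illustrative; the helper name encode_base64 is hypothetical.

```python
import base64


def encode_base64(data: bytes) -> str:
    """Encode raw bytes as a base64 text string."""
    return base64.b64encode(data).decode("ascii")


# The first four bytes of a typical JPEG file encode to "/9j/4A==",
# which is why base64-encoded JPEGs usually start with "/9j/".
print(encode_base64(b"\xff\xd8\xff\xe0"))
```

In practice you would read the file in binary mode and pass its contents to the helper, keeping the original filename to hand to doc_from_b64string.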

Bytes

Requires raw bytes.

Note: The original filename is required when calling the method.

raw_bytes = b"%PDF-1.3\n%\xbf\xf7\xa2\xfe\n1 0 ob..."
api_response = mindee_client.doc_from_bytes(raw_bytes, "invoice.pdf").parse("invoice")

# Print a summary of the parsed data
print(api_response.invoice)

Loading from bytes is useful when using FastAPI UploadFile objects.

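from fastapi import FastAPI, UploadFile

app = FastAPI()

# mindee_client is the Client configured earlier in this guide
@app.post("/invoice")
async def upload(upload: UploadFile):
    invoice_data = mindee_client.doc_from_bytes(
        upload.file.read(),
        filename=upload.filename
    ).parse(
        "invoice"
    )
    # Return the printable summary of the parsed invoice
    return {"summary": str(invoice_data.invoice)}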

 
