Getting started

This guide will help you get started with the Mindee Python client library to easily extract data from your documents.

The Python client library supports the invoice, passport, and receipt OCR APIs, as well as custom APIs built with the API Builder.

You can view the source code on GitHub, and the package on PyPI.

Prerequisites

  • Download and install Python. This library is officially supported on Python 3.7 to 3.10.
  • Download and install the pip package manager.

Installation

The preferred way to install the Python client library is via pip:

pip install mindee

Development Installation

If you'll be modifying the source code, you'll need to install the development requirements to get started.

  1. First, clone the repository.
git clone git@github.com:mindee/mindee-api-python.git
  2. Then navigate to the cloned directory and install all development requirements.
cd mindee-api-python
pip install -e ".[dev,test]"

Updating Version

It is important to always check the version of the Mindee client library you are using, as new and updated features won’t work on old versions.

To check which version of the client library is installed:

pip show mindee

To upgrade to the latest version of the client library:

pip install mindee --upgrade

To install a specific version:

pip install mindee==<your_version>
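If you prefer to check the installed version from inside Python (for example, in a startup check), the standard library can do it. This is a minimal sketch: the helper name installed_version is hypothetical, and importlib.metadata requires Python 3.8 or later, so on Python 3.7 stick with pip show.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("mindee")
    print(found if found else "mindee is not installed")
```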

Usage

To start using Mindee's APIs, all you need to do is create a Client.

Let's take a deep dive into how this works.

The Client

The client centralizes document configurations in a single object. Documents are added to the Client using a config_xxx method. Since each config_xxx method returns the current Client object, you can chain all the calls together.
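Chaining works because every configuration method returns the object it was called on. The sketch below illustrates that pattern with a hypothetical ChainableClient class; it is not the Mindee Client and only mimics the shape of its config_xxx methods.

```python
from typing import Dict


class ChainableClient:
    """Hypothetical stand-in used only to illustrate method chaining.

    This is NOT the Mindee Client implementation.
    """

    def __init__(self) -> None:
        self.api_keys: Dict[str, str] = {}

    def config_receipt(self, api_key: str) -> "ChainableClient":
        self.api_keys["receipt"] = api_key
        return self  # returning self is what allows the next call to be chained

    def config_invoice(self, api_key: str) -> "ChainableClient":
        self.api_keys["invoice"] = api_key
        return self


client = ChainableClient().config_receipt("receipt-api-key").config_invoice("invoice-api-key")
print(sorted(client.api_keys))  # ['invoice', 'receipt']
```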

The Client requires your API keys. You can either pass them directly to the config_xxx methods, as in the examples in this section, or set them through environment variables. You only need to specify the API keys for the document endpoints you'll be using.
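The "explicit key or environment variable" behavior can be pictured as a simple fallback. The following is a minimal sketch of that logic under the assumption that an explicitly passed key takes precedence; resolve_api_key is a hypothetical helper, not part of the Mindee library.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str], env_var: str) -> str:
    """Hypothetical helper: prefer an explicit key, else read it from `env_var`."""
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key given and {env_var} is not set")
    return key


# Example: falls back to MINDEE_RECEIPT_API_KEY when no key is passed
# key = resolve_api_key(None, "MINDEE_RECEIPT_API_KEY")
```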

There are three ways to add documents to the client using the config_xxx methods.

Single Document

You can have a separate client for each document. If you have only a single document type you're working with, this is the easiest way to get started.

from mindee import Client
receipt_client = Client().config_receipt("receipt-api-key")
invoice_client = Client().config_invoice("invoice-api-key")
passport_client = Client().config_passport("passport-api-key")
pokemon_client = Client().config_custom_doc(
    document_type="pokemon-card",
    singular_name="card",
    plural_name="cards",
    account_name="pikachu",
    api_key="pokemon-card-api-key"
)

Multiple Documents

You can configure all of your documents in the same client. If you're working with multiple document types, this is the easiest way to get started.

from mindee import Client

mindee_client = (
    Client()
    .config_receipt("receipt-api-key")
    .config_invoice("invoice-api-key")
    .config_financial_doc("receipt-api-key", "invoice-api-key")
    .config_passport("passport-api-key")
    .config_custom_doc(
        document_type="pokemon-card",
        singular_name="card",
        plural_name="cards",
        account_name="pikachu",
        api_key="pokemon-card-api-key",
    )
)

Mix and Match

You can also mix and match. This approach is useful if you have groups of documents that need to be handled in different ways.

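from mindee import Client
# Receipts and invoices share the financial document flow; passports get their own client
financial_client = Client().config_financial_doc("receipt-api-key", "invoice-api-key")
passport_client = Client().config_passport("passport-api-key")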

Note: financial_doc is a combined data flow for invoices and receipts, so you'll need API keys for both the receipt and invoice endpoints.

Environment Variables

API keys should be set as environment variables, especially for any production deployment. The environment variables can also be used for basic logging at various levels.

For off-the-shelf APIs, here are the environment variables for the API keys you can set:

export MINDEE_INVOICE_API_KEY="invoice-api-key"
export MINDEE_RECEIPT_API_KEY="receipt-api-key"
export MINDEE_PASSPORT_API_KEY="passport-api-key"

For custom documents, you can also set the API keys as environment variables. For the example above, this would be:

export MINDEE_PIKACHU_POKEMON_CARD_API_KEY="pokemon-card-api-key"
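The custom document variable name appears to follow the pattern MINDEE_<ACCOUNT_NAME>_<DOCUMENT_TYPE>_API_KEY, upper-cased with hyphens turned into underscores. The sketch below reproduces that pattern; it is inferred from the single example above, and the hypothetical helper custom_doc_env_var does not cover any other normalization rules the library may apply.

```python
def custom_doc_env_var(account_name: str, document_type: str) -> str:
    """Hypothetical helper: build the API key variable name for a custom document.

    The naming pattern is inferred from the pokemon-card example only.
    """

    def normalize(part: str) -> str:
        # Upper-case and turn hyphens into underscores: "pokemon-card" -> "POKEMON_CARD"
        return part.replace("-", "_").upper()

    return f"MINDEE_{normalize(account_name)}_{normalize(document_type)}_API_KEY"


print(custom_doc_env_var("pikachu", "pokemon-card"))
# MINDEE_PIKACHU_POKEMON_CARD_API_KEY
```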

Document Parsing

To parse a document, first load it with one of the client's doc_from_xxx methods, then call the parse method on the result. The document type (for example "invoice" or "receipt") must be passed to parse. The parse method sends the document to the API and returns a response object; the parsed data is available as an attribute of that object named after the document type, such as api_response.invoice.

You can load your document from any of the following sources:

Path

This requires an absolute path, as a string.

api_response = mindee_client.doc_from_path("/path/to/the/invoice").parse("invoice")

# Print a summary of the parsed data
print(api_response.invoice)
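If your code works with relative paths, you can convert them to an absolute path string before calling doc_from_path. This is a minimal sketch using pathlib; the helper name absolute_path_str is hypothetical.

```python
from pathlib import Path


def absolute_path_str(path: str) -> str:
    """Hypothetical helper: turn a possibly relative path into an absolute path string."""
    # expanduser() handles "~"; resolve() makes the path absolute
    return str(Path(path).expanduser().resolve())


print(absolute_path_str("invoices/invoice.pdf"))
```

It would then be used as mindee_client.doc_from_path(absolute_path_str("invoices/invoice.pdf")).parse("invoice").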

File Object

A standard Python file object that has a path on disk. It must be opened in binary mode.

with open("/path/to/the/receipt", "rb") as fo:
    api_response = mindee_client.doc_from_file(fo, "receipt").parse("receipt")

# Print a summary of the parsed data
print(api_response.receipt)
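To catch a text-mode or in-memory file object before it reaches the client, you can check it up front. This is a minimal sketch of such a guard based only on the two requirements stated above (binary mode, backed by a path); check_file_object is a hypothetical helper and not part of the Mindee library.

```python
import io


def check_file_object(fo) -> None:
    """Hypothetical guard: raise if `fo` is not a binary file object with a path."""
    # Files opened with "rb" are buffered binary streams; text-mode files are not
    if not isinstance(fo, (io.RawIOBase, io.BufferedIOBase)):
        raise TypeError("Open the file in binary mode, e.g. open(path, 'rb')")
    # A file opened from disk exposes its path as a string in `.name`
    if not isinstance(getattr(fo, "name", None), str):
        raise ValueError("The file object must be backed by a path on disk")
```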

Base64

Requires a base64-encoded string.

Note: The original filename of the encoded file is required when calling the method.

b64_string = "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLD...."
api_response = mindee_client.doc_from_b64string(b64_string, "receipt.jpg").parse("receipt")

# Print a summary of the parsed data
print(api_response.receipt)
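If the document starts out as a file, the base64 string and the original filename can both be produced with the standard library. This is a minimal sketch; the helper name file_to_b64 is hypothetical.

```python
import base64
from pathlib import Path
from typing import Tuple


def file_to_b64(path: str) -> Tuple[str, str]:
    """Hypothetical helper: return (base64 string, original filename) for a file."""
    file_path = Path(path)
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return encoded, file_path.name
```

The result can then be passed along as b64_string, filename = file_to_b64("/path/to/the/receipt.jpg") followed by mindee_client.doc_from_b64string(b64_string, filename).parse("receipt").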

Bytes

Requires raw bytes.

Note: The original filename is required when calling the method.

raw_bytes = b"%PDF-1.3\n%\xbf\xf7\xa2\xfe\n1 0 ob..."
api_response = mindee_client.doc_from_bytes(raw_bytes, "invoice.pdf").parse("invoice")

# Print a summary of the parsed data
print(api_response.invoice)
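When the bytes come from a file you control, read them together with the filename so both arguments stay consistent. This is a minimal sketch; the helper name file_to_bytes is hypothetical.

```python
from pathlib import Path
from typing import Tuple


def file_to_bytes(path: str) -> Tuple[bytes, str]:
    """Hypothetical helper: return the file's raw bytes and its original filename."""
    file_path = Path(path)
    return file_path.read_bytes(), file_path.name
```

For example, raw_bytes, filename = file_to_bytes("/path/to/the/invoice.pdf") and then mindee_client.doc_from_bytes(raw_bytes, filename).parse("invoice").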

Loading from bytes is useful when using FastAPI UploadFile objects.

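from fastapi import FastAPI, UploadFile
from mindee import Client

app = FastAPI()
mindee_client = Client().config_invoice("invoice-api-key")

@app.post("/invoice")
async def upload(upload: UploadFile):
    # Pass the uploaded file's raw bytes and original filename to Mindee
    api_response = mindee_client.doc_from_bytes(
        await upload.read(),
        filename=upload.filename
    ).parse("invoice")
    return {"invoice": str(api_response.invoice)}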
