Custom OCR Python

The Python OCR SDK supports custom-built APIs created with the API Builder. If your document isn't covered by one of Mindee's Off-the-Shelf APIs, you can create your own API using the API Builder.

The following examples use our own custom W9 API,
created with the API Builder.

from mindee import Client, documents

# Init a new client and add your custom endpoint (document)
mindee_client = Client(api_key="my-api-key").add_endpoint(
    account_name="john",
    endpoint_name="wnine",
    # version="1.2",  # optional, see the "Adding the Endpoint" section below
)

# Load a file from disk and parse it.
# The endpoint name must be specified since it can't be determined from the class.
result = mindee_client.doc_from_path(
    "/path/to/the/w9.jpg"
).parse(documents.TypeCustomV1, endpoint_name="wnine")

# Print a brief summary of the parsed data
print(result.document)

Adding the Endpoint

Below are the arguments for adding a custom endpoint using the add_endpoint method.

endpoint_name: The API name of the endpoint, as shown on its Settings page.

account_name: Your organization's or user's name in the API Builder.

version: If set, locks the version of the model to use; you will then need to update your code every time a new model is trained in order to use it.
Locking the version is probably not needed for development, but it is essential for production use.
If not set, the latest version of the model is used.
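
As a minimal sketch of the optional version argument, the arguments can be assembled once and reused. The helper name endpoint_kwargs is hypothetical and not part of the SDK; it only builds a dictionary and leaves out version unless a value is given:

```python
from typing import Dict, Optional


def endpoint_kwargs(
    account_name: str,
    endpoint_name: str,
    version: Optional[str] = None,
) -> Dict[str, str]:
    """Build keyword arguments for add_endpoint (hypothetical helper)."""
    kwargs = {"account_name": account_name, "endpoint_name": endpoint_name}
    # Only include version when pinning a model; omitting it selects the latest.
    if version is not None:
        kwargs["version"] = version
    return kwargs


# Development: follow the latest trained model.
dev_kwargs = endpoint_kwargs("john", "wnine")

# Production: lock the model version.
prod_kwargs = endpoint_kwargs("john", "wnine", version="1.2")
```

The resulting dictionary could then be unpacked into the call shown earlier, for example Client(api_key="my-api-key").add_endpoint(**prod_kwargs).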

Parsing Documents

To parse a custom document, call the parse method; it returns an object containing the prediction results for the file that was sent.
The endpoint_name must be specified when calling the parse method for a custom endpoint.

result = mindee_client.doc_from_path("/path/to/the/w9.jpg").parse(
    documents.TypeCustomV1, endpoint_name="wnine"
)

print(result.document)

📘 Info

If your custom document has the same name as one of the off-the-shelf API documents,
you must specify your account name when calling the parse method:

from mindee import Client, documents

mindee_client = Client(api_key="johndoe-receipt-api-key").add_endpoint(
    endpoint_name="receipt",
    account_name="JohnDoe",
)

result = mindee_client.doc_from_path("/path/to/receipt.jpg").parse(
    documents.TypeCustomV1,
    endpoint_name="receipt",
    account_name="JohnDoe",
)

Document Fields

All the fields defined in the API Builder when creating your custom document are available.

In custom documents, each field will hold an array of all the words in the document which are related to that field.
Each word is an object that has the text content, geometry information, and confidence score.

Value fields can be accessed via the fields attribute.

Classification fields can be accessed via the classifications attribute.

📘 Info

Document-level and page-level objects both work in the same way.

Fields Attribute

The fields attribute is a dictionary with the following structure:

  • key: the API name of the field, as a str
  • value: a ListField object which has a values attribute, containing a list of all values found for the field.

Individual field values can be accessed by using the field's API name. The examples below use the address field.

# raw data, list of each word object
print(result.document.fields["address"].values)

# list of all values
print(result.document.fields["address"].contents_list)

# default string representation
print(str(result.document.fields["address"]))

# custom string representation
print(result.document.fields["address"].contents_string(separator="_"))

To iterate over all the fields:

for name, info in result.document.fields.items():
    print(name)
    print(info.values)
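
Because each word carries a confidence score, low-confidence words can be dropped before the text of a field is joined. The following is a minimal sketch that uses stand-in classes rather than the SDK's own objects, so it runs without an API key. The word attribute names content and confidence are assumptions, and confident_text is a hypothetical helper; the sample data is made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """Stand-in for one word object; attribute names are assumptions."""

    content: str
    confidence: float


@dataclass
class FieldStandIn:
    """Stand-in for a ListField; only the values attribute is modelled."""

    values: List[Word]


def confident_text(
    fields: Dict[str, FieldStandIn], threshold: float = 0.5
) -> Dict[str, str]:
    """Join the words of each field, skipping words below the threshold."""
    return {
        name: " ".join(
            word.content for word in field.values if word.confidence >= threshold
        )
        for name, field in fields.items()
    }


# Made-up sample data standing in for result.document.fields
sample_fields = {
    "address": FieldStandIn(
        [Word("42", 0.99), Word("Main", 0.95), Word("St", 0.30)]
    ),
    "name": FieldStandIn([Word("Jane", 0.90), Word("Doe", 0.85)]),
}

summary = confident_text(sample_fields, threshold=0.5)
print(summary)
```

With a real response, the same function would be called on result.document.fields, provided the word objects expose the assumed attributes.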

Classifications Attribute

The classifications attribute is a dictionary with the following structure:

  • key: the API name of the field, as a str
  • value: a ClassificationField object which has a value attribute, containing a string representation of the detected classification.

# the detected classification, as a string
print(result.document.classifications["doc_type"].value)

To iterate over all the classifications:

for name, info in result.document.classifications.items():
    print(name)
    print(info.value)
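
Since each classification holds a single string, the whole dictionary can be flattened into plain name-to-label pairs, which is convenient for logging or routing documents. This is a minimal sketch with a stand-in class that models only the value attribute described above; classification_labels is a hypothetical helper and the sample label is made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ClassificationStandIn:
    """Stand-in for a ClassificationField; only value is modelled."""

    value: str


def classification_labels(
    classifications: Dict[str, ClassificationStandIn],
) -> Dict[str, str]:
    """Map each classification's API name to its detected label."""
    return {name: info.value for name, info in classifications.items()}


# Made-up sample data standing in for result.document.classifications
sample_classifications = {"doc_type": ClassificationStandIn("w9")}

labels = classification_labels(sample_classifications)
print(labels)
```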

Questions?

Join our Slack