from mindee import Client

mindee_client = Client().config_custom_doc(
    document_type="wsnine",
    singular_name="w9",
    plural_name="w9s",
    account_name="john",
    api_key="w9-form-api-key",  # optional, can be set in environment
    # version="1.2",  # optional, see configuring client section below
)

# Load a file from disk and parse it
w9_data = mindee_client.doc_from_path(CUSTOM_API_FILE).parse("wsnine")

# Print a brief summary of the parsed data
print(w9_data.w9)
Below is the specification for custom endpoint configuration.
document_type: The API name shown on the Settings page.
singular_name: The name of the attribute used to retrieve a single document from the API response.
plural_name: The name of the attribute used to retrieve multiple documents from the API response.
account_name: Your organization's username in the API Builder.
api_key: Your API key for the endpoint. This can be set with an environment variable.
version: If set, locks the version of the model to use. If not set, the latest version of the model is used.
If the version is set, you must update your code every time a new model is trained. This is probably not needed for development but is essential for production use.
API keys should be set as environment variables, especially for any production deployment. The environment variables can also be used for basic logging at various levels.
The format is
<document_type> is converted to uppercase, and any
- is replaced with
For the example above, our environment variable will be:
The client calls the parse method when parsing your custom document, which will return an object that you can send to the API. The document type must be specified when calling the parse method.
w9_data = mindee_client.doc_from_path("/path/to/receipt.jpg").parse("wsnine")
print(w9_data.w9)
If your custom document has the same name as an off-the-shelf API document, you must specify your account name when calling the parse method:
from mindee import Client

mindee_client = Client().config_custom_doc(
    document_type="receipt",
    singular_name="receipt",
    plural_name="receipts",
    account_name="JohnDoe",
    api_key="johndoe-receipt-api-key",
)

receipt_data = mindee_client.doc_from_path("/path/to/receipt.jpg").parse("receipt", "JohnDoe")
The custom document object JSON data structure consists of:
We used a data model that may be different from yours. To adapt this to your own custom API, update the config_custom_doc call with your own parameters.
For document-level prediction, we construct the document class by combining the pages of a single document. The method used for creating a single document object from multiple pages relies on field confidence scores.
We iterate over each page and, for each field, keep the value that has the highest probability.
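The selection rule described above can be illustrated with a minimal sketch. This is not the client's internal code: the `merge_pages` function and the page layout (a dict mapping each field name to a `content` and a `confidence`) are assumptions made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for one page's predictions:
# field name -> {"content": ..., "confidence": ...}
PagePrediction = Dict[str, Dict[str, Any]]


def merge_pages(pages: List[PagePrediction]) -> PagePrediction:
    """For each field, keep the candidate with the highest confidence across pages."""
    merged: PagePrediction = {}
    for page in pages:
        for field_name, candidate in page.items():
            best = merged.get(field_name)
            # On a tie, the candidate from the earlier page is kept.
            if best is None or candidate["confidence"] > best["confidence"]:
                merged[field_name] = candidate
    return merged


# Made-up sample data for two pages of the same document.
pages = [
    {
        "name": {"content": "Jane Doe", "confidence": 0.62},
        "city": {"content": "Austin", "confidence": 0.91},
    },
    {
        "name": {"content": "Jane A. Doe", "confidence": 0.88},
        "city": {"content": "Austn", "confidence": 0.40},
    },
]
document = merge_pages(pages)
print(document["name"]["content"], "/", document["city"]["content"])
```

In this sample, the name is taken from the second page and the city from the first, because those candidates carry the higher confidence for their field.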
For page-level prediction on multi-page PDFs, we construct each document object from a single page of the PDF.
for w9 in w9_data.w9s:
    print(w9)
The http_response attribute contains the full Mindee API HTTP response object in JSON format.
import json

# Use json.dumps to display the full HTTP response as properly formatted JSON.
print(json.dumps(w9_data.http_response, indent=4, sort_keys=True))
You can extract additional fields from your custom document. To do so, you'll need to specify the fields to be extracted from your document based on your data model.
The following is the list of fields we want to extract, based on our own example data model; yours will be different. The object names of the fields are the same as the field names in your data model.
..
"features_name": [
    "name",
    "street_address",
    "city",
    "state",
    "zip_code",
    "social_security_number"
]
}
}
}
The information for each field is an array, as there is no post-processing of your documents. To access the information for a specific page, we can use the HTTP response.
The taxpayer's city.
for city in w9_data.w9.city["values"]:
    print(city["content"])
The taxpayer's name.
for name in w9_data.w9.name["values"]:
    print(name["content"])
The taxpayer's social security number.
for social_security_number in w9_data.w9.social_security_number["values"]:
    print(social_security_number["content"])
The taxpayer's state.
for state in w9_data.w9.state["values"]:
    print(state["content"])
The taxpayer's street address.
for street_address in w9_data.w9.street_address["values"]:
    print(street_address["content"])
The taxpayer's zip code.
for zip_code in w9_data.w9.zip_code["values"]:
    print(zip_code["content"])
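Since every field follows the same shape (a "values" list whose entries carry a "content" key), the per-field loops above can be folded into one helper. The sketch below is only an illustration: `field_contents` and `summarize` are hypothetical names, not part of the Mindee client, and the sample data is made up and stored in a plain dict rather than a parsed document object.

```python
from typing import Any, Dict, List


def field_contents(field: Dict[str, Any]) -> List[str]:
    """Return the "content" of every entry in a field's "values" list."""
    return [str(value["content"]) for value in field.get("values", [])]


def summarize(document: Dict[str, Dict[str, Any]], field_names: List[str]) -> Dict[str, str]:
    """Join each field's contents into one space-separated string."""
    return {
        name: " ".join(field_contents(document.get(name, {})))
        for name in field_names
    }


# Made-up sample mirroring the w9 fields above; real data would come from the parsed document.
sample_w9 = {
    "name": {"values": [{"content": "Jane"}, {"content": "Doe"}]},
    "city": {"values": [{"content": "Austin"}]},
    "zip_code": {"values": []},
}
summary = summarize(sample_w9, ["name", "city", "zip_code"])
print(summary)
```

With a parsed document, a single field such as `w9_data.w9.city` could be passed to `field_contents` directly, assuming it has the same "values" and "content" layout used in the loops above.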