The Mindee API Builder is our production-tested deep learning OCR algorithm. It lets you create and train an API that extracts exactly the data and fields you need from any type of document, such as transport tickets, bank statements, or tax returns.
By training the API on your specific documents, you obtain a consistent neural network model that accurately extracts data from all of your documents with minimal human involvement.
The Mindee API Builder can extract data from handwritten and printed text in photos, images, and multi-page PDFs. The API Builder algorithm finds the right boxes in a document by using the surrounding context of each candidate box to score it and decide whether it is part of a field you want to extract.
The API Builder algorithm works well, but it needs to be trained on a lot of data. Given a candidate, the algorithm tries to infer whether it is the one you are looking for, using a few parameters:
- The candidate's position in the image
- The size of the candidate
- The surrounding context words:
  - The content of each context word
  - The relative distance from each context word to the candidate
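To make the parameters listed above concrete, here is a minimal Python sketch that gathers them for one candidate box. It is an illustration only, not the API Builder's actual implementation: the `Box` class, the `candidate_features` function, the normalized coordinates, and the choice of keeping the `k` nearest words as context are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Box:
    """A word detected on the page (hypothetical type for this sketch).

    Coordinates are assumed to be normalized to the range [0, 1].
    """
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def candidate_features(candidate, words, k=3):
    """Collect position, size, and context features for one candidate."""
    cx, cy = candidate.center
    others = [word for word in words if word is not candidate]
    # Assumption: the context is the k words whose centers are closest.
    nearest = sorted(
        others,
        key=lambda word: hypot(word.center[0] - cx, word.center[1] - cy),
    )[:k]
    return {
        "position": (cx, cy),
        "size": (candidate.w, candidate.h),
        "context": [
            {
                "content": word.text,
                # Offset from the candidate's center to the context word's center.
                "offset": (word.center[0] - cx, word.center[1] - cy),
            }
            for word in nearest
        ],
    }
```

A trained model would take features like these as input and output a score for each candidate. The scoring model itself is not shown here, since the source does not describe it.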
Below is a quick overview of how to create a custom document data extraction API using the Mindee API Builder:
- Set up your custom API
- Train a deep learning model that can parse and extract the data of your choice from your documents
- Create an API key
- Integrate your custom Mindee REST API into your application code using the language of your choice
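As a rough idea of what the last step can look like, the Python sketch below builds (but does not send) an HTTP request that uploads a document to a custom API. Treat the details as assumptions to verify against your API's documentation page: the URL pattern, the `Token` authorization scheme, and the `document` form field name are not stated on this page, and `build_predict_request` with its `account` and `endpoint` parameters is a hypothetical helper written for this example.

```python
import urllib.request
import uuid


def build_predict_request(api_key, account, endpoint, filename, file_bytes):
    """Build (but do not send) a multipart POST for a custom API."""
    # Assumed URL pattern; confirm it on your API's documentation page.
    url = f"https://api.mindee.net/v1/products/{account}/{endpoint}/v1/predict"
    boundary = uuid.uuid4().hex
    # Hand-built multipart/form-data body with a single file part.
    # Assumption: the file is sent in a form field named "document".
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        (
            'Content-Disposition: form-data; name="document"; '
            f'filename="{filename}"'
        ).encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {
        # Assumption: the API key is passed with the "Token" scheme.
        "Authorization": f"Token {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON response body. Only the standard library is used here to keep the sketch self-contained; an HTTP client library or an official client, if one is available for your language, would likely be more convenient in practice.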
For more in-depth information, see How to build your document parsing API.