Invoice OCR

Automatically extract data from unstructured invoices

Mindee’s invoice OCR API uses deep learning to parse invoices automatically and accurately, within seconds, in your applications.

In a few seconds, the API extracts a set of data from your PDFs or photos of invoices:

Total amount including taxes
Total amount excluding taxes
Invoice number
Invoice date
Due date
Supplier name
Supplier identification number (SIRET, EIN, VAT number...)
Tax details
Locale & currency
Payment details (IBAN, SWIFT/BIC, account number...)

API Prerequisites

  1. You’ll need a free Mindee account. Sign up and confirm your email to log in.
  2. An invoice. Use a recently received invoice, or do a Google Image search for an invoice and download a few to test with.

Set up the API

Log into your Mindee account and access your Invoice API environment by clicking the Invoice API card:

Clicking this card takes you to the dashboard page, where you can see your API usage at a glance (none yet, but that will change). The left navigation has links to “Documentation”, “API Keys”, and “Live Interface”. The Documentation tab has all of the technical details you’ll need to build with the Invoice API.

Rather than try out the demo, we want to build with the API, so click on API Keys to create an API key.

Click on the Create a new API key button and name your API key:

Now, we are ready to make an API call. You can find sample code for the most popular languages in the "Documentation" tab:

# cURL (the -F option sets the multipart content type and boundary itself)
curl -X POST \
  https://api.mindee.net/v1/products/mindee/invoices/v2/predict \
  -H 'Authorization: Token my-api-key-here' \
  -F document=@/path/to/your/file.png

# Python 3; requires the requests package
import requests

url = "https://api.mindee.net/v1/products/mindee/invoices/v2/predict"

with open("/path/to/my/file", "rb") as myfile:
    files = {"document": myfile}
    headers = {"Authorization": "Token my-api-key-here"}
    response = requests.post(url, files=files, headers=headers)
    print(response.text)

// Node.js (works for Node > v10); requires the axios and form-data packages
const axios = require('axios');
const fs = require("fs");
const FormData = require('form-data')

async function makeRequest() {
    let data = new FormData()
    data.append('document', fs.createReadStream('./file.jpg'))
    const config = {
        method: 'POST',
        url: 'https://api.mindee.net/v1/products/mindee/invoices/v2/predict',
        headers: { 
          'Authorization':'Token my-api-key-here',
          ...data.getHeaders()
           },
        data
    }

    try {
      let response = await axios(config)
      console.log(response.data);
    } catch (error) {
      console.log(error)
    }

}

makeRequest()

# Ruby (tested with Ruby 2.5); uses only the standard library
require 'uri'
require 'net/http'

url = URI("https://api.mindee.net/v1/products/mindee/invoices/v2/predict")
file = "/path/to/your/file.png"

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
request = Net::HTTP::Post.new(url)
request["Authorization"] = 'Token my-api-key-here'
request.set_form([['document', File.open(file)]], 'multipart/form-data')

response = http.request(request)
puts response.read_body

<!-- Browser: HTML form with JavaScript -->
<form onsubmit="mindeeSubmit(event)" >
  <input type="file" id="my-file-input" name="file" />
  <input type="submit" />
</form>

<script type="text/javascript">
const mindeeSubmit = (evt) => {
  evt.preventDefault()
  let myFileInput = document.getElementById('my-file-input');
  let myFile = myFileInput.files[0]
  if (!myFile) { return }
  let data = new FormData();
  data.append("document", myFile, myFile.name);

  let xhr = new XMLHttpRequest();

  xhr.addEventListener("readystatechange", function () {
    if (this.readyState === 4) {
      console.log(this.responseText);
    }
  });

  xhr.open("POST", "https://api.mindee.net/v1/products/mindee/invoices/v2/predict");
  xhr.setRequestHeader("Authorization", "Token my-api-key-here");
  xhr.send(data);
}
</script>

Replace my-api-key-here with your new API key, and /path/to/your/file.png with the path to your invoice.

For our example, we'll use this photo of a fictitious invoice:

Paste the cURL sample into your terminal, hit enter, and about a second later, you will receive a JSON response with the invoice details. Since the response is quite verbose, we will walk through the fields section by section.

API Response

Here is the full JSON response you get when you call the API:

{
  "api_request": {
    "error": {},
    "resources": [
      "document"
    ],
    "status": "success",
    "status_code": 201,
    "url": "http://api.mindee.net/v1/products/mindee/invoices/v2/predict"
  },
  "document": {
    "annotations": {
      "labels": []
    },
    "id": "cbf37732-9570-4c77-a81f-82cb023aba7b",
    "inference": {
      "finished_at": "2021-05-26T12:18:50+00:00",
      "pages": [
        {
          "id": 0,
          "prediction": {
            "company_registration": [],
            "date": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.84,
                  0.305
                ],
                [
                  0.932,
                  0.305
                ],
                [
                  0.932,
                  0.318
                ],
                [
                  0.84,
                  0.318
                ]
              ],
              "value": "2018-09-25"
            },
            "document_type": {
              "value": "INVOICE"
            },
            "due_date": {
              "confidence": 0.86,
              "polygon": [
                [
                  0.841,
                  0.323
                ],
                [
                  0.941,
                  0.323
                ],
                [
                  0.941,
                  0.338
                ],
                [
                  0.841,
                  0.338
                ]
              ],
              "raw": "Upon receipt",
              "value": "2018-09-25"
            },
            "invoice_number": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.841,
                  0.264
                ],
                [
                  0.864,
                  0.264
                ],
                [
                  0.864,
                  0.279
                ],
                [
                  0.841,
                  0.279
                ]
              ],
              "value": "14"
            },
            "locale": {
              "confidence": 0.94,
              "currency": "CAD",
              "language": "en"
            },
            "orientation": {
              "confidence": 0.99,
              "degrees": 0
            },
            "payment_details": [],
            "supplier": {
              "confidence": 0.11,
              "polygon": [
                [
                  0.165,
                  0.089
                ],
                [
                  0.385,
                  0.089
                ],
                [
                  0.385,
                  0.145
                ],
                [
                  0.165,
                  0.145
                ]
              ],
              "value": "DESIGNS TURNPIKE CO"
            },
            "taxes": [
              {
                "confidence": 0.76,
                "polygon": [
                  [
                    0.784,
                    0.744
                  ],
                  [
                    0.965,
                    0.744
                  ],
                  [
                    0.965,
                    0.758
                  ],
                  [
                    0.784,
                    0.758
                  ]
                ],
                "rate": 8.0,
                "value": 193.2
              }
            ],
            "total_excl": {
              "confidence": 0.0,
              "polygon": [],
              "value": null
            },
            "total_incl": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.886,
                  0.839
                ],
                [
                  0.971,
                  0.839
                ],
                [
                  0.971,
                  0.858
                ],
                [
                  0.886,
                  0.858
                ]
              ],
              "value": 2608.2
            }
          }
        }
      ],
      "prediction": {
        "company_registration": [],
        "date": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.84,
              0.305
            ],
            [
              0.932,
              0.305
            ],
            [
              0.932,
              0.318
            ],
            [
              0.84,
              0.318
            ]
          ],
          "value": "2018-09-25"
        },
        "document_type": {
          "value": "INVOICE"
        },
        "due_date": {
          "confidence": 0.86,
          "page_id": 0,
          "polygon": [
            [
              0.841,
              0.323
            ],
            [
              0.941,
              0.323
            ],
            [
              0.941,
              0.338
            ],
            [
              0.841,
              0.338
            ]
          ],
          "raw": "Upon receipt",
          "value": "2018-09-25"
        },
        "invoice_number": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.841,
              0.264
            ],
            [
              0.864,
              0.264
            ],
            [
              0.864,
              0.279
            ],
            [
              0.841,
              0.279
            ]
          ],
          "value": "14"
        },
        "locale": {
          "confidence": 0.94,
          "currency": "CAD",
          "language": "en"
        },
        "payment_details": [],
        "supplier": {
          "confidence": 0.11,
          "page_id": 0,
          "polygon": [
            [
              0.165,
              0.089
            ],
            [
              0.385,
              0.089
            ],
            [
              0.385,
              0.145
            ],
            [
              0.165,
              0.145
            ]
          ],
          "value": "DESIGNS TURNPIKE CO"
        },
        "taxes": [
          {
            "confidence": 0.76,
            "page_id": 0,
            "polygon": [
              [
                0.784,
                0.744
              ],
              [
                0.965,
                0.744
              ],
              [
                0.965,
                0.758
              ],
              [
                0.784,
                0.758
              ]
            ],
            "rate": 8.0,
            "value": 193.2
          }
        ],
        "total_excl": {
          "confidence": 0.0,
          "page_id": null,
          "polygon": [],
          "value": null
        },
        "total_incl": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.886,
              0.839
            ],
            [
              0.971,
              0.839
            ],
            [
              0.971,
              0.858
            ],
            [
              0.886,
              0.858
            ]
          ],
          "value": 2608.2
        }
      },
      "processing_time": 1.114,
      "product": {
        "features": [
          "locale",
          "invoice_number",
          "date",
          "due_date",
          "total_incl",
          "total_excl",
          "taxes",
          "document_type",
          "payment_details",
          "company_registration",
          "supplier",
          "orientation"
        ],
        "name": "Mindee-Demo/invoices",
        "type": "standard",
        "version": "2.0"
      },
      "started_at": "2021-05-26T12:18:49+00:00"
    },
    "n_pages": 1,
    "name": "sample_invoice.jpg",
    "ocr": {}
  }
}

Extracted fields

Under the api_request key of the JSON response, you can find some metadata about the request.
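Before reading the extracted data, it can be worth checking this metadata first. The following is a minimal sketch, assuming the response has already been parsed into a Python dict with the shape shown in the sample above. The helper name `ensure_success` is hypothetical and not part of any Mindee library, and the failing status in a real error response may look different from what is assumed here.

```python
def ensure_success(response: dict) -> dict:
    """Return the `document` part of a parsed response, or raise if the call failed.

    Assumes the keys seen in the sample response:
    api_request.status, api_request.status_code and api_request.error.
    """
    request_info = response.get("api_request", {})
    if request_info.get("status") != "success":
        raise RuntimeError(
            f"API call failed with status code {request_info.get('status_code')}: "
            f"{request_info.get('error')}"
        )
    return response["document"]


# Trimmed-down version of the sample response
sample = {
    "api_request": {"error": {}, "status": "success", "status_code": 201},
    "document": {"name": "sample_invoice.jpg", "n_pages": 1},
}

document = ensure_success(sample)
print(document["name"])  # sample_invoice.jpg
```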

What is probably most important to you is the extracted data. Under the document key, you can find a structure like this:

{
  "document": {
    "annotations": {
      "labels": []
    },
    "id": "cbf37732-9570-4c77-a81f-82cb023aba7b",
    "inference": {
      "finished_at": "2021-05-26T12:18:50+00:00",
      "pages": [
        {
          "id": 0,
          "prediction": {
            "company_registration": [],
            "date": {},
            "document_type": {},
            "due_date": {},
            "invoice_number": {},
            "locale": {},
            "orientation": {},
            "payment_details": [],
            "supplier": {},
            "taxes": [],
            "total_excl": {},
            "total_incl": {}
          }
        }
      ],
      "prediction": {
        "company_registration": [],
        "date": {},
        "document_type": {},
        "due_date": {},
        "invoice_number": {},
        "locale": {},
        "payment_details": [],
        "supplier": {},
        "taxes": [],
        "total_excl": {},
        "total_incl": {}
      },
      "processing_time": 1.114,
      "product": {
        "features": [
          "locale",
          "invoice_number",
          "date",
          "due_date",
          "total_incl",
          "total_excl",
          "taxes",
          "document_type",
          "payment_details",
          "company_registration",
          "supplier",
          "orientation"
        ],
        "name": "Mindee-Demo/invoices",
        "type": "standard",
        "version": "2.0"
      },
      "started_at": "2021-05-26T12:18:49+00:00"
    },
    "n_pages": 1,
    "name": "sample_invoice.jpg",
    "ocr": {}
  }
}

The extracted data appears in two different places in the response.

Document-level prediction: document > inference > prediction is the document-level prediction. It contains the fields extracted at the document level, meaning that for multi-page PDFs, we reconstruct a single invoice object using all the pages.

Page-level prediction: document > inference > pages[] is an array in which each element's prediction key holds the data extracted from one page. For images, this array has only one element, but for PDFs, it has one element per page.
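As an illustration, the two levels can be read from a parsed response as sketched below. This assumes the response is already a Python dict shaped like the sample above; `split_predictions` is a hypothetical helper name, and the sample data is trimmed to two fields.

```python
def split_predictions(response: dict):
    """Return (document_level_prediction, {page_id: page_level_prediction})."""
    inference = response["document"]["inference"]
    document_prediction = inference["prediction"]
    # One entry per page; an image yields a single page with id 0
    page_predictions = {page["id"]: page["prediction"] for page in inference["pages"]}
    return document_prediction, page_predictions


# Trimmed-down version of the sample response
sample = {
    "document": {
        "inference": {
            "pages": [
                {
                    "id": 0,
                    "prediction": {
                        "invoice_number": {"confidence": 0.99, "value": "14"},
                        "orientation": {"confidence": 0.99, "degrees": 0},
                    },
                }
            ],
            "prediction": {
                "invoice_number": {"confidence": 0.99, "page_id": 0, "value": "14"},
            },
        }
    }
}

doc_level, page_level = split_predictions(sample)
print(doc_level["invoice_number"]["value"])   # 14
print(page_level[0]["orientation"]["degrees"])  # 0
```

In the sample response, document-level fields carry a page_id pointing back to the page they were found on, while orientation appears only at the page level.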

Predicted fields contain a confidence score (the confidence key), as well as a polygon when the information is located in the image.
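These two values can be used to flag fields for manual review and to locate a field on the image. The sketch below is only an illustration: the helper names and the 0.5 threshold are hypothetical, and the pixel conversion assumes that polygon points are (x, y) pairs expressed as fractions of the image width and height, which is consistent with the 0 to 1 values in the sample but is not stated in this page.

```python
def low_confidence_fields(prediction: dict, threshold: float = 0.5) -> list:
    """Return the names of fields whose confidence is below `threshold`.

    Handles both single fields (dicts) and list fields such as taxes.
    Fields without a confidence key (e.g. document_type) are skipped.
    """
    flagged = []
    for name, field in prediction.items():
        items = field if isinstance(field, list) else [field]
        for item in items:
            confidence = item.get("confidence")
            if confidence is not None and confidence < threshold:
                flagged.append(name)
                break
    return flagged


def polygon_to_pixels(polygon: list, width: int, height: int) -> list:
    """Convert relative (x, y) points to pixel coordinates (assumed convention)."""
    return [(round(x * width), round(y * height)) for x, y in polygon]


# Trimmed-down prediction taken from the sample response
prediction = {
    "invoice_number": {
        "confidence": 0.99,
        "polygon": [[0.841, 0.264], [0.864, 0.264], [0.864, 0.279], [0.841, 0.279]],
        "value": "14",
    },
    "supplier": {"confidence": 0.11, "polygon": [], "value": "DESIGNS TURNPIKE CO"},
    "document_type": {"value": "INVOICE"},
    "taxes": [{"confidence": 0.76, "polygon": [], "rate": 8.0, "value": 193.2}],
    "total_excl": {"confidence": 0.0, "polygon": [], "value": None},
}

flagged = low_confidence_fields(prediction)
print(flagged)  # ['supplier', 'total_excl']

# Hypothetical image size of 1000 x 2000 pixels
corners = polygon_to_pixels(prediction["invoice_number"]["polygon"], 1000, 2000)
print(corners)  # [(841, 528), (864, 528), (864, 558), (841, 558)]
```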

invoice_number

Invoice number as written in the invoice.

{
  "invoice_number": {
    "confidence": 0.99,
    "page_id": 0,
    "polygon": [
      [
        0.841,
        0.264
      ],
      [
        0.864,
        0.264
      ],
      [
        0.864,
        0.279
      ],
      [
        0.841,
        0.279
      ]
    ],
    "value": "14"
  }
}

date

ISO formatted invoicing date.

{
  "date": {
    "confidence": 0.99,
    "page_id": 0,
    "polygon": [
      [
        0.84,
        0.305
      ],
      [
        0.932,
        0.305
      ],
      [
        0.932,
        0.318
      ],
      [
        0.84,
        0.318
      ]
    ],
    "value": "2018-09-25"
  }
}

due_date

ISO formatted invoice due date.

{
  "due_date": {
    "confidence": 0.86,
    "page_id": 0,
    "polygon": [
      [
        0.841,
        0.323
      ],
      [
        0.941,
        0.323
      ],
      [
        0.941,
        0.338
      ],
      [
        0.841,
        0.338
      ]
    ],
    "raw": "Upon receipt",
    "value": "2018-09-25"
  }
}
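Because date and due_date are ISO formatted, they can be parsed with the standard library. The sketch below computes the number of days between the two; `payment_term_days` is a hypothetical helper name. In the sample, the raw key holds "Upon receipt" and the due date value equals the invoice date, so the result is 0.

```python
from datetime import date


def payment_term_days(prediction: dict):
    """Return the number of days between the invoice date and the due date.

    Returns None when either date was not extracted (value is None).
    """
    invoice_date = prediction["date"]["value"]
    due_date = prediction["due_date"]["value"]
    if invoice_date is None or due_date is None:
        return None
    return (date.fromisoformat(due_date) - date.fromisoformat(invoice_date)).days


# Values taken from the sample response
prediction = {
    "date": {"confidence": 0.99, "value": "2018-09-25"},
    "due_date": {"confidence": 0.86, "raw": "Upon receipt", "value": "2018-09-25"},
}

days = payment_term_days(prediction)
print(days)  # 0
```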

total_incl

Total amount including taxes.

{
  "total_incl": {
    "confidence": 0.99,
    "page_id": 0,
    "polygon": [
      [
        0.886,
        0.839
      ],
      [
        0.971,
        0.839
      ],
      [
        0.971,
        0.858
      ],
      [
        0.886,
        0.858
      ]
    ],
    "value": 2608.2
  }
}

total_excl

Total amount excluding taxes.

{
  "total_excl": {
    "confidence": 0.4,
    "page_id": 0,
    "polygon": [
      [
        0.886,
        0.839
      ],
      [
        0.971,
        0.839
      ],
      [
        0.971,
        0.858
      ],
      [
        0.886,
        0.858
      ]
    ],
    "value": 2608.2
  }
}

taxes

List of taxes detected in the invoice. Each tax item includes:

value: tax item amount in the invoice currency
rate: tax rate associated with the amount, as a percentage

{
  "taxes": [
    {
      "confidence": 0.76,
      "page_id": 0,
      "polygon": [
        [
          0.784,
          0.744
        ],
        [
          0.965,
          0.744
        ],
        [
          0.965,
          0.758
        ],
        [
          0.784,
          0.758
        ]
      ],
      "rate": 8.0,
      "value": 193.2
    }
  ]
}
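The tax items can also be used for a consistency check. In the full sample response, total_excl is null, total_incl is 2608.2, and the single tax item is 193.2 at a rate of 8.0, so the amount excluding taxes can be derived as 2608.2 - 193.2 = 2415.0, and 8% of 2415.0 is indeed 193.2. The sketch below shows this arithmetic; the helper names and the 0.01 tolerance are hypothetical, and `Decimal` is used to avoid floating-point rounding noise.

```python
from decimal import Decimal


def derive_total_excl(prediction: dict):
    """Derive the amount excluding taxes as total_incl minus the sum of tax values."""
    total_incl = prediction["total_incl"]["value"]
    if total_incl is None:
        return None
    tax_sum = sum(
        (Decimal(str(tax["value"])) for tax in prediction["taxes"] if tax["value"] is not None),
        Decimal("0"),
    )
    return Decimal(str(total_incl)) - tax_sum


def tax_matches_rate(tax: dict, base: Decimal, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check that base * rate / 100 matches the extracted tax value within a tolerance."""
    expected = base * Decimal(str(tax["rate"])) / Decimal("100")
    return abs(expected - Decimal(str(tax["value"]))) <= tolerance


# Values taken from the full sample response
prediction = {
    "taxes": [{"confidence": 0.76, "rate": 8.0, "value": 193.2}],
    "total_excl": {"confidence": 0.0, "value": None},
    "total_incl": {"confidence": 0.99, "value": 2608.2},
}

base = derive_total_excl(prediction)
print(base)  # 2415.0
print(tax_matches_rate(prediction["taxes"][0], base))  # True
```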

payment_details

List of supplier's payment details. Supports IBAN, BIC and routing numbers.

📘

Why a list?

Some invoices list several sets of payment details. Our Invoice OCR extracts all of them.

Each item contains different fields, set to null or filled with the right value depending on the invoice:

account_number
iban
routing_number
swift (the SWIFT/BIC code)

{
  "payment_details": [
    {
      "account_number": "XXXX",
      "confidence": 0.95,
      "iban": "XXXX",
      "page_id": 0,
      "polygon": [
        [ 0.075, 0.539 ],
        [ 0.312, 0.539 ],
        [ 0.312, 0.564 ],
        [ 0.075, 0.564 ]
      ],
      "routing_number": "XXX",
      "swift": "XXX"
    }
  ]
}
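Since each item may have some of these fields set to null, a consumer will typically keep only the ones that are filled. The sketch below is one way to do that, using the key names from the JSON sample above; `payment_identifiers` is a hypothetical helper name, and the second list entry is made-up data added only to show the null handling.

```python
PAYMENT_KEYS = ("account_number", "iban", "routing_number", "swift")


def payment_identifiers(payment_details: list) -> list:
    """Return, for each payment detail item, only the identifiers that are not null."""
    return [
        {key: item[key] for key in PAYMENT_KEYS if item.get(key) is not None}
        for item in payment_details
    ]


payment_details = [
    # Item from the sample above
    {
        "account_number": "XXXX",
        "confidence": 0.95,
        "iban": "XXXX",
        "page_id": 0,
        "routing_number": "XXX",
        "swift": "XXX",
    },
    # Hypothetical item with only an IBAN and a SWIFT code
    {
        "account_number": None,
        "confidence": 0.9,
        "iban": "XXXX",
        "page_id": 0,
        "routing_number": None,
        "swift": "XXX",
    },
]

identifiers = payment_identifiers(payment_details)
print(identifiers[1])  # {'iban': 'XXXX', 'swift': 'XXX'}
```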

company_registration

List of company identifiers. Each item contains:

value: the company registration number value
type: the identifier type, either generic (VAT NUMBER, TAX ID, COMPANY REGISTRATION NUMBER) or country-specific, such as TIN (United States), GST/HST (Canada), SIREN/SIRET (France), UEN (Singapore), STNR (Germany), KVK (Netherlands), CIF (Spain), NIF (Portugal), CVR (Denmark), CF (Italy), DIC (Czech Republic), RFC (Mexico), GSTIN (India), and others

📘

Why a list?

The API extracts all the supplier identifiers in the invoice, along with the corresponding type.

{
  "company_registration": [
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [
        [ 0.515, 0.962 ],
        [ 0.59, 0.962 ],
        [ 0.59, 0.973 ],
        [ 0.515, 0.973 ]
      ],
      "type": "SIRET",
      "value": "XXX81125600010"
    },
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [
        [ 0.658, 0.963 ],
        [ 0.729, 0.963 ],
        [ 0.729, 0.973 ],
        [ 0.658, 0.973 ]
      ],
      "type": "VAT NUMBER",
      "value": "FR44837811XXX"
    }
  ]
}
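When an application needs one specific identifier, such as the VAT number, the list can be indexed by type. The following is a minimal sketch using the sample above; `registrations_by_type` is a hypothetical helper name.

```python
def registrations_by_type(company_registration: list) -> dict:
    """Group the extracted registration numbers by their type."""
    grouped = {}
    for item in company_registration:
        grouped.setdefault(item["type"], []).append(item["value"])
    return grouped


# Items from the sample above (polygons omitted)
company_registration = [
    {"confidence": 0.99, "page_id": 0, "type": "SIRET", "value": "XXX81125600010"},
    {"confidence": 0.99, "page_id": 0, "type": "VAT NUMBER", "value": "FR44837811XXX"},
]

grouped = registrations_by_type(company_registration)
print(grouped["VAT NUMBER"])  # ['FR44837811XXX']
print(grouped.get("TIN", []))  # []
```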

supplier

Supplier name as written in the invoice.

{
  "supplier": {
    "confidence": 0.11,
    "page_id": 0,
    "polygon": [
      [
        0.165,
        0.089
      ],
      [
        0.385,
        0.089
      ],
      [
        0.385,
        0.145
      ],
      [
        0.165,
        0.145
      ]
    ],
    "value": "DESIGNS TURNPIKE CO"
  }
}

locale

Currency and language of the invoice.

{
  "locale": {
    "confidence": 0.94,
    "currency": "CAD",
    "language": "en"
  }
}
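The currency code can be combined with the extracted amounts for display. The sketch below is only an illustration with a hypothetical helper name and a fixed two-decimal format; it does not apply any locale-specific number formatting.

```python
def format_total(prediction: dict):
    """Return the total including taxes followed by the currency code, or None."""
    value = prediction["total_incl"]["value"]
    currency = prediction["locale"]["currency"]
    if value is None:
        return None
    amount = f"{value:,.2f}"
    return f"{amount} {currency}" if currency else amount


# Values taken from the sample response
prediction = {
    "locale": {"confidence": 0.94, "currency": "CAD", "language": "en"},
    "total_incl": {"confidence": 0.99, "value": 2608.2},
}

label = format_total(prediction)
print(label)  # 2,608.20 CAD
```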
