Invoice OCR

Automatically extract data from unstructured invoices

Mindee’s Invoice OCR API uses deep learning to parse invoices in your applications automatically and accurately, within seconds.

The API takes a few seconds to extract data from PDFs or photos of invoices. It extracts fields such as:

  • Due date
  • Invoice date
  • Invoice number
  • Locale & currency
  • Payment details (IBAN, SWIFT/BIC, account number, etc.)
  • Supplier identification number (SIRET, EIN, VAT number, etc.)
  • Supplier name
  • Tax details
  • Total amount including taxes

API Prerequisites

  1. You’ll need a free Mindee account. Sign up and confirm your email to log in.
  2. An invoice. Use a recently received invoice, or do a Google Image search for an invoice and download a few to test with.

Below is a sample of an invoice we will be using for this example.

Set up the API

  1. Log in to your Mindee account and open your Invoice API dashboard by clicking the Invoice API card.
  2. You'll land on the dashboard page, where you can see your API usage at a glance (you have none right now, but that will change). The left navigation has links to “Documentation”, “API Keys” and “Live Interface”. The Documentation tab has all of the technical details you’ll need to integrate the Invoice API. Rather than try out the Live Interface, we will make an API call manually.

Click on API Keys to create an API key.

  3. Click the “Create a new API key” button and name your API key.
  4. Now we are ready to make an API call. You can find sample code for the most popular languages in the "Documentation" tab:

cURL

curl -X POST \
  https://api.mindee.net/v1/products/mindee/invoices/v2/predict \
  -H 'Authorization: Token my-api-key-here' \
  -H 'content-type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW' \
  -F document=@/path/to/your/file.png

Python

import requests

url = "https://api.mindee.net/v1/products/mindee/invoices/v2/predict"

with open("/path/to/my/file", "rb") as myfile:
    files = {"document": myfile}
    headers = {"Authorization": "Token my-api-key-here"}
    response = requests.post(url, files=files, headers=headers)
    print(response.text)

Node.js

// works for Node.js > v10
const axios = require('axios');
const fs = require("fs");
const FormData = require('form-data')

async function makeRequest() {
    let data = new FormData()
    data.append('document', fs.createReadStream('./file.jpg'))
    const config = {
        method: 'POST',
        url: 'https://api.mindee.net/v1/products/mindee/invoices/v2/predict',
        headers: { 
          'Authorization':'Token my-api-key-here',
          ...data.getHeaders()
           },
        data
    }

    try {
      let response = await axios(config)
      console.log(response.data);
    } catch (error) {
      console.log(error)
    }

}

makeRequest()

Ruby

# tested with Ruby 2.5
require 'uri'
require 'net/http'
require 'net/https'
require 'mime/types'

url = URI("https://api.mindee.net/v1/products/mindee/invoices/v2/predict")
file = "/path/to/your/file.png"

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
request = Net::HTTP::Post.new(url)
request["Authorization"] = 'Token my-api-key-here'
request.set_form([['document', File.open(file)]], 'multipart/form-data')

response = http.request(request)
puts response.read_body

HTML and JavaScript

<form onsubmit="mindeeSubmit(event)" >
  <input type="file" id="my-file-input" name="file" />
  <input type="submit" />
</form>

<script type="text/javascript">
const mindeeSubmit = (evt) => {
  evt.preventDefault()
  let myFileInput = document.getElementById('my-file-input');
  let myFile = myFileInput.files[0]
  if (!myFile) { return }
  let data = new FormData();
  data.append("document", myFile, myFile.name);

  let xhr = new XMLHttpRequest();

  xhr.addEventListener("readystatechange", function () {
    if (this.readyState === 4) {
      console.log(this.responseText);
    }
  });

  xhr.open("POST", "https://api.mindee.net/v1/products/mindee/invoices/v2/predict");
  xhr.setRequestHeader("Authorization", "Token my-api-key-here");
  xhr.send(data);
}
</script>

  5. Replace my-api-key-here with your new API key, and /path/to/your/file.png with the path to your invoice.

  6. Paste the cURL sample into your terminal and press Enter. About a second later, you will receive a JSON response with the invoice details. Since the response is quite verbose, we will walk through the fields section by section.

API Response

Here is the full JSON response you get when you call the API:

{
  "api_request": {
    "error": {},
    "resources": [
      "document"
    ],
    "status": "success",
    "status_code": 201,
    "url": "http://api.mindee.net/v1/products/mindee/invoices/v2/predict"
  },
  "document": {
    "annotations": {
      "labels": []
    },
    "id": "cbf37732-9570-4c77-a81f-82cb023aba7b",
    "inference": {
      "finished_at": "2021-05-26T12:18:50+00:00",
      "pages": [
        {
          "id": 0,
          "prediction": {
            "company_registration": [],
            "date": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.84,
                  0.305
                ],
                [
                  0.932,
                  0.305
                ],
                [
                  0.932,
                  0.318
                ],
                [
                  0.84,
                  0.318
                ]
              ],
              "value": "2018-09-25"
            },
            "document_type": {
              "value": "INVOICE"
            },
            "due_date": {
              "confidence": 0.86,
              "polygon": [
                [
                  0.841,
                  0.323
                ],
                [
                  0.941,
                  0.323
                ],
                [
                  0.941,
                  0.338
                ],
                [
                  0.841,
                  0.338
                ]
              ],
              "raw": "Upon receipt",
              "value": "2018-09-25"
            },
            "invoice_number": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.841,
                  0.264
                ],
                [
                  0.864,
                  0.264
                ],
                [
                  0.864,
                  0.279
                ],
                [
                  0.841,
                  0.279
                ]
              ],
              "value": "14"
            },
            "locale": {
              "confidence": 0.94,
              "currency": "CAD",
              "language": "en"
            },
            "orientation": {
              "confidence": 0.99,
              "degrees": 0
            },
            "payment_details": [],
            "supplier": {
              "confidence": 0.11,
              "polygon": [
                [
                  0.165,
                  0.089
                ],
                [
                  0.385,
                  0.089
                ],
                [
                  0.385,
                  0.145
                ],
                [
                  0.165,
                  0.145
                ]
              ],
              "value": "DESIGNS TURNPIKE CO"
            },
            "taxes": [
              {
                "confidence": 0.76,
                "polygon": [
                  [
                    0.784,
                    0.744
                  ],
                  [
                    0.965,
                    0.744
                  ],
                  [
                    0.965,
                    0.758
                  ],
                  [
                    0.784,
                    0.758
                  ]
                ],
                "rate": 8.0,
                "value": 193.2
              }
            ],
            "total_excl": {
              "confidence": 0.0,
              "polygon": [],
              "value": null
            },
            "total_incl": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.886,
                  0.839
                ],
                [
                  0.971,
                  0.839
                ],
                [
                  0.971,
                  0.858
                ],
                [
                  0.886,
                  0.858
                ]
              ],
              "value": 2608.2
            }
          }
        }
      ],
      "prediction": {
        "company_registration": [],
        "date": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.84,
              0.305
            ],
            [
              0.932,
              0.305
            ],
            [
              0.932,
              0.318
            ],
            [
              0.84,
              0.318
            ]
          ],
          "value": "2018-09-25"
        },
        "document_type": {
          "value": "INVOICE"
        },
        "due_date": {
          "confidence": 0.86,
          "page_id": 0,
          "polygon": [
            [
              0.841,
              0.323
            ],
            [
              0.941,
              0.323
            ],
            [
              0.941,
              0.338
            ],
            [
              0.841,
              0.338
            ]
          ],
          "raw": "Upon receipt",
          "value": "2018-09-25"
        },
        "invoice_number": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.841,
              0.264
            ],
            [
              0.864,
              0.264
            ],
            [
              0.864,
              0.279
            ],
            [
              0.841,
              0.279
            ]
          ],
          "value": "14"
        },
        "locale": {
          "confidence": 0.94,
          "currency": "CAD",
          "language": "en"
        },
        "payment_details": [],
        "supplier": {
          "confidence": 0.11,
          "page_id": 0,
          "polygon": [
            [
              0.165,
              0.089
            ],
            [
              0.385,
              0.089
            ],
            [
              0.385,
              0.145
            ],
            [
              0.165,
              0.145
            ]
          ],
          "value": "DESIGNS TURNPIKE CO"
        },
        "taxes": [
          {
            "confidence": 0.76,
            "page_id": 0,
            "polygon": [
              [
                0.784,
                0.744
              ],
              [
                0.965,
                0.744
              ],
              [
                0.965,
                0.758
              ],
              [
                0.784,
                0.758
              ]
            ],
            "rate": 8.0,
            "value": 193.2
          }
        ],
        "total_excl": {
          "confidence": 0.0,
          "page_id": null,
          "polygon": [],
          "value": null
        },
        "total_incl": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.886,
              0.839
            ],
            [
              0.971,
              0.839
            ],
            [
              0.971,
              0.858
            ],
            [
              0.886,
              0.858
            ]
          ],
          "value": 2608.2
        }
      },
      "processing_time": 1.114,
      "product": {
        "features": [
          "locale",
          "invoice_number",
          "date",
          "due_date",
          "total_incl",
          "total_excl",
          "taxes",
          "document_type",
          "payment_details",
          "company_registration",
          "supplier",
          "orientation"
        ],
        "name": "Mindee-Demo/invoices",
        "type": "standard",
        "version": "2.0"
      },
      "started_at": "2021-05-26T12:18:49+00:00"
    },
    "n_pages": 1,
    "name": "sample_invoice.jpg",
    "ocr": {}
  }
}

Extracted fields

Under the api_request key of the JSON response, you can find some metadata about the request.
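
Before reading the extracted fields, it can be worth checking this metadata. The following Python sketch assumes the response body has already been decoded into a dictionary (for example with response.json() in the Python sample). The helper name ensure_success is hypothetical and not part of any Mindee library, and the shape of a failed response is an assumption, since only a successful one is shown here.

```python
def ensure_success(response: dict) -> dict:
    """Return the parsed response unchanged, or raise if the API reported a failure."""
    meta = response.get("api_request", {})
    # The sample response reports status "success" together with an empty "error" object.
    # Assumption: a failed request has a different status and/or a non-empty "error".
    if meta.get("status") != "success" or meta.get("error"):
        raise RuntimeError(
            f"Request failed with status code {meta.get('status_code')}: {meta.get('error')}"
        )
    return response


# Metadata copied from the sample response; the document body is left empty here.
sample = {
    "api_request": {
        "error": {},
        "resources": ["document"],
        "status": "success",
        "status_code": 201,
        "url": "http://api.mindee.net/v1/products/mindee/invoices/v2/predict",
    },
    "document": {},
}

ensure_success(sample)
print("request succeeded")
```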

What is probably most important to you is the extracted data. Under the document key, you can find a structure like this:

{
  "document": {
    "annotations": {
      "labels": []
    },
    "id": "cbf37732-9570-4c77-a81f-82cb023aba7b",
    "inference": {
      "finished_at": "2021-05-26T12:18:50+00:00",
      "pages": [
        {
          "id": 0,
          "prediction": {
            "company_registration": [],
            "date": {},
            "document_type": {},
            "due_date": {},
            "invoice_number": {},
            "locale": {},
            "orientation": {},
            "payment_details": [],
            "supplier": {},
            "taxes": [],
            "total_excl": {},
            "total_incl": {}
          }
        }
      ],
      "prediction": {
        "company_registration": [],
        "date": {},
        "document_type": {},
        "due_date": {},
        "invoice_number": {},
        "locale": {},
        "payment_details": [],
        "supplier": {},
        "taxes": [],
        "total_excl": {},
        "total_incl": {}
      },
      "processing_time": 1.114,
      "product": {
        "features": [
          "locale",
          "invoice_number",
          "date",
          "due_date",
          "total_incl",
          "total_excl",
          "taxes",
          "document_type",
          "payment_details",
          "company_registration",
          "supplier",
          "orientation"
        ],
        "name": "Mindee-Demo/invoices",
        "type": "standard",
        "version": "2.0"
      },
      "started_at": "2021-05-26T12:18:49+00:00"
    },
    "n_pages": 1,
    "name": "sample_invoice.jpg",
    "ocr": {}
  }
}

The extracted data appears in two places in the response.

  • Document-level prediction: document > inference > prediction contains the fields extracted at the document level. For multi-page PDFs, a single invoice object is reconstructed from all the pages.

  • Page-level prediction: document > inference > pages[] is an array in which each element's prediction key contains the data extracted from one page. For images, this array has only one element; for PDFs, it has one element per page.

Each predicted field contains a confidence score (the confidence key) as well as a polygon when the information is located in the image.
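
As a rough illustration of the two levels and of the confidence key, the Python sketch below reads both predictions from a decoded response and keeps a value only when its confidence reaches a threshold. The function names and the 0.5 threshold are hypothetical choices for this example, not values recommended by the API.

```python
from typing import Any, Optional


def document_prediction(response: dict) -> dict:
    """Fields merged across all pages: document > inference > prediction."""
    return response["document"]["inference"]["prediction"]


def page_predictions(response: dict) -> list:
    """One prediction dictionary per page: document > inference > pages[] > prediction."""
    return [page["prediction"] for page in response["document"]["inference"]["pages"]]


def confident_value(field: dict, threshold: float = 0.5) -> Optional[Any]:
    """Return field["value"] only when its confidence reaches the (arbitrary) threshold."""
    if field.get("confidence", 0.0) >= threshold:
        return field.get("value")
    return None


# Trimmed-down copy of the sample response (polygons and most fields omitted).
response = {
    "document": {
        "inference": {
            "pages": [
                {
                    "id": 0,
                    "prediction": {
                        "supplier": {"confidence": 0.11, "value": "DESIGNS TURNPIKE CO"}
                    },
                }
            ],
            "prediction": {
                "invoice_number": {"confidence": 0.99, "page_id": 0, "value": "14"},
                "supplier": {"confidence": 0.11, "page_id": 0, "value": "DESIGNS TURNPIKE CO"},
            },
        }
    }
}

fields = document_prediction(response)
print(confident_value(fields["invoice_number"]))  # 14
print(confident_value(fields["supplier"]))        # None (confidence 0.11 is below 0.5)
print(len(page_predictions(response)))            # 1
```

A low-confidence field such as the supplier name in this sample is a natural candidate for manual review rather than automatic acceptance.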

invoice_number

{
  "invoice_number": {
    "confidence": 0.99,
    "page_id": 0,
    "polygon": [
      [
        0.841,
        0.264
      ],
      [
        0.864,
        0.264
      ],
      [
        0.864,
        0.279
      ],
      [
        0.841,
        0.279
      ]
    ],
    "value": "14"
  }
}
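
The polygon can be used to highlight the field on the original image. The Python sketch below assumes the coordinates are [x, y] pairs relative to the image width and height (all sample values lie between 0 and 1); this is an assumption, not something stated on this page. The function name and the 1000 x 1000 image size are hypothetical.

```python
def polygon_to_pixel_box(polygon, image_width, image_height):
    """Convert a polygon of relative [x, y] points into a pixel bounding box.

    Returns (left, top, right, bottom), or None when the polygon is empty,
    which is the case for fields that were not located in the image.
    """
    if not polygon:
        return None
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (
        round(min(xs) * image_width),
        round(min(ys) * image_height),
        round(max(xs) * image_width),
        round(max(ys) * image_height),
    )


# Polygon of the invoice_number field from the sample response.
invoice_number_polygon = [[0.841, 0.264], [0.864, 0.264], [0.864, 0.279], [0.841, 0.279]]

# Hypothetical image size of 1000 x 1000 pixels.
print(polygon_to_pixel_box(invoice_number_polygon, 1000, 1000))  # (841, 264, 864, 279)
print(polygon_to_pixel_box([], 1000, 1000))                      # None
```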

date

Invoice date in ISO 8601 format (YYYY-MM-DD).

{
  "date": {
    "confidence": 0.99,
    "page_id": 0,
    "polygon": [
      [
        0.84,
        0.305
      ],
      [
        0.932,
        0.305
      ],
      [
        0.932,
        0.318
      ],
      [
        0.84,
        0.318
      ]
    ],
    "value": "2018-09-25"
  }
}

due_date

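Invoice due date in ISO 8601 format (YYYY-MM-DD). In this sample, the raw key holds the text found on the invoice ("Upon receipt").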

{
  "due_date": {
    "confidence": 0.86,
    "page_id": 0,
    "polygon": [
      [
        0.841,
        0.323
      ],
      [
        0.941,
        0.323
      ],
      [
        0.941,
        0.338
      ],
      [
        0.841,
        0.338
      ]
    ],
    "raw": "Upon receipt",
    "value": "2018-09-25"
  }
}
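
Because both dates come back as ISO strings, they can be parsed with the standard library. The Python sketch below computes the number of days between the invoice date and the due date; the function name payment_term_days is hypothetical.

```python
from datetime import date


def payment_term_days(prediction: dict):
    """Days between the invoice date and the due date, or None if either is missing."""
    issued = prediction.get("date", {}).get("value")
    due = prediction.get("due_date", {}).get("value")
    if not issued or not due:
        return None
    return (date.fromisoformat(due) - date.fromisoformat(issued)).days


# Values taken from the sample response (polygons omitted).
prediction = {
    "date": {"confidence": 0.99, "page_id": 0, "value": "2018-09-25"},
    "due_date": {"confidence": 0.86, "page_id": 0, "raw": "Upon receipt", "value": "2018-09-25"},
}

print(payment_term_days(prediction))  # 0, the sample invoice is due upon receipt
```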

total_incl

Total amount including taxes.

{
  "total_incl": {
    "confidence": 0.99,
    "page_id": 0,
    "polygon": [
      [
        0.886,
        0.839
      ],
      [
        0.971,
        0.839
      ],
      [
        0.971,
        0.858
      ],
      [
        0.886,
        0.858
      ]
    ],
    "value": 2608.2
  }
}

total_excl

Total amount excluding taxes. No tax-exclusive total was detected on the sample invoice, so the field comes back empty:

{
  "total_excl": {
    "confidence": 0.0,
    "page_id": null,
    "polygon": [],
    "value": null
  }
}

taxes

List of taxes detected in the invoice. Each tax item includes:

  • value: tax amount in the invoice currency
  • rate: tax rate associated with the amount, as a percentage

{
  "taxes": [
    {
      "confidence": 0.76,
      "page_id": 0,
      "polygon": [
        [
          0.784,
          0.744
        ],
        [
          0.965,
          0.744
        ],
        [
          0.965,
          0.758
        ],
        [
          0.784,
          0.758
        ]
      ],
      "rate": 8.0,
      "value": 193.2
    }
  ]
}
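
The tax items can be combined with total_incl to sanity-check the amounts. The Python sketch below is only an illustration: it assumes that rate is a percentage applied to the tax-exclusive amount, and the helper names are hypothetical. Decimal is used instead of float to avoid rounding noise when subtracting amounts.

```python
from decimal import Decimal


def to_decimal(number) -> Decimal:
    """Convert a JSON number to Decimal via its string form to avoid binary float noise."""
    return Decimal(str(number))


def total_taxes(taxes: list) -> Decimal:
    """Sum the value of every tax item that has one."""
    return sum(
        (to_decimal(tax["value"]) for tax in taxes if tax.get("value") is not None),
        Decimal("0"),
    )


def implied_total_excl(prediction: dict):
    """Derive the tax-exclusive total as total_incl minus all taxes, or None if unknown."""
    total_incl = prediction.get("total_incl", {}).get("value")
    if total_incl is None:
        return None
    return to_decimal(total_incl) - total_taxes(prediction.get("taxes", []))


# Values taken from the sample response (polygons omitted).
prediction = {
    "taxes": [{"confidence": 0.76, "page_id": 0, "rate": 8.0, "value": 193.2}],
    "total_incl": {"confidence": 0.99, "page_id": 0, "value": 2608.2},
}

net = implied_total_excl(prediction)
print(net)  # 2415.0

# Consistency check under the stated assumption: net amount * rate % equals the tax amount.
tax = prediction["taxes"][0]
print(net * to_decimal(tax["rate"]) / 100 == to_decimal(tax["value"]))  # True
```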

payment_details

List of the supplier's payment details. Supports account numbers, IBANs, SWIFT/BIC codes and routing numbers.

📘

Why a list?

Some invoices list several sets of payment details. The Invoice OCR extracts all of them.

Each item contains the following fields, each set to null or filled with the extracted value depending on the invoice:

  • account_number
  • iban
  • routing_number
  • swift

{
  "payment_details": [
    {
      "account_number": "XXXX",
      "confidence": 0.95,
      "iban": "XXXX",
      "page_id": 0,
      "polygon": [
        [ 0.075, 0.539 ],
        [ 0.312, 0.539 ],
        [ 0.312, 0.564 ],
        [ 0.075, 0.564 ]
      ],
      "routing_number": "XXX",
      "swift": "XXX"
    }
  ]
}
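
Since each item may have some of its fields set to null, a consumer will usually keep only the identifiers that were actually found. The Python sketch below shows one way to do that, using the key names from the JSON sample above; the function name and the placeholder values are hypothetical.

```python
# Identifier keys as they appear in the JSON sample above.
PAYMENT_KEYS = ("account_number", "iban", "routing_number", "swift")


def filled_payment_details(payment_details: list) -> list:
    """Keep, for each item, only the payment identifiers that are not null."""
    result = []
    for item in payment_details:
        filled = {key: item[key] for key in PAYMENT_KEYS if item.get(key) is not None}
        if filled:  # skip items in which nothing was extracted
            result.append(filled)
    return result


# Placeholder values in the style of the sample; None stands for a JSON null.
payment_details = [
    {
        "account_number": None,
        "confidence": 0.95,
        "iban": "XXXX",
        "page_id": 0,
        "routing_number": None,
        "swift": "XXX",
    }
]

print(filled_payment_details(payment_details))  # [{'iban': 'XXXX', 'swift': 'XXX'}]
print(filled_payment_details([]))               # [] (no payment details on the invoice)
```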

company_registration

List of company identifiers. Each item contains:

  • value: The company registration number value.
  • type: The type of identifier. It can be generic (VAT NUMBER, TAX ID, COMPANY REGISTRATION NUMBER) or country-specific: TIN (United States), GST/HST (Canada), SIREN/SIRET (France), UEN (Singapore), STNR (Germany), KVK (Netherlands), CIF (Spain), NIF (Portugal), CVR (Denmark), CF (Italy), DIC (Czech Republic), RFC (Mexico), GSTIN (India), etc.

📘

Why a list?

The API extracts all the supplier identifiers in the invoice, along with their corresponding types.

{
  "company_registration": [
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [
        [ 0.515, 0.962 ],
        [ 0.59, 0.962 ],
        [ 0.59, 0.973 ],
        [ 0.515, 0.973 ]
      ],
      "type": "SIRET",
      "value": "XXX81125600010"
    },
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [
        [ 0.658, 0.963 ],
        [ 0.729, 0.963 ],
        [ 0.729, 0.973 ],
        [ 0.658, 0.973 ]
      ],
      "type": "VAT NUMBER",
      "value": "FR44837811XXX"
    }
  ]
}
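
To look up an identifier by its type, the list can be regrouped into a dictionary. The Python sketch below is a minimal illustration with a hypothetical function name; it keeps a list per type in case an invoice contains several identifiers of the same type.

```python
from collections import defaultdict


def registrations_by_type(company_registration: list) -> dict:
    """Group the extracted registration numbers by their type."""
    grouped = defaultdict(list)
    for item in company_registration:
        grouped[item["type"]].append(item["value"])
    return dict(grouped)


# Items from the sample above (polygons omitted).
company_registration = [
    {"confidence": 0.99, "page_id": 0, "type": "SIRET", "value": "XXX81125600010"},
    {"confidence": 0.99, "page_id": 0, "type": "VAT NUMBER", "value": "FR44837811XXX"},
]

by_type = registrations_by_type(company_registration)
print(by_type["VAT NUMBER"])  # ['FR44837811XXX']
print(by_type.get("TIN"))     # None (no identifier of that type on this invoice)
```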

supplier

Supplier name as written in the invoice.

{
  "supplier": {
    "confidence": 0.11,
    "page_id": 0,
    "polygon": [
      [
        0.165,
        0.089
      ],
      [
        0.385,
        0.089
      ],
      [
        0.385,
        0.145
      ],
      [
        0.165,
        0.145
      ]
    ],
    "value": "DESIGNS TURNPIKE CO"
  }
}

locale

Currency and language of the invoice.

{
  "locale": {
    "confidence": 0.94,
    "currency": "CAD",
    "language": "en"
  }
}
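
The currency from locale is typically combined with the amounts extracted elsewhere in the prediction. The Python sketch below formats total_incl together with the currency code; the function name and the output format are hypothetical choices for this example.

```python
def format_total(prediction: dict):
    """Format total_incl with the detected currency code, or return None if no total was found."""
    total = prediction.get("total_incl", {}).get("value")
    if total is None:
        return None
    amount = f"{total:,.2f}"
    currency = prediction.get("locale", {}).get("currency")
    # Fall back to the bare amount when no currency was detected.
    return f"{amount} {currency}" if currency else amount


# Values taken from the sample response (polygon omitted).
prediction = {
    "locale": {"confidence": 0.94, "currency": "CAD", "language": "en"},
    "total_incl": {"confidence": 0.99, "page_id": 0, "value": 2608.2},
}

print(format_total(prediction))  # 2,608.20 CAD
```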
