Invoice OCR
Automatically extract data from unstructured invoices
Mindee's Invoice OCR API uses deep learning to extract essential data from your invoices within seconds, reducing manual data entry. You can integrate it directly into your applications.
Mindee's Invoice OCR API intelligently extracts a wide range of crucial data points, including:
Core Invoice Details:
- Invoice Identification:
  - Invoice Number
  - Purchase Order Number
  - Reference Numbers
- Dates:
  - Invoice Date
  - Due Date
  - Payment Date
- Financial Breakdown:
  - Line Items (with details like description, quantity, unit price, total amount, ...)
  - Total Tax Amount
  - Breakdown of Taxes
  - Total Amount (including taxes)
  - Net Amount (excluding taxes)
- Supplier Information:
  - Supplier Name and Address
  - Supplier Company Registrations
- Customer Information:
  - Customer Name and Address
  - Billing Address
  - Shipping Address
  - Customer Company Registrations
- Payment Details:
  - Payment Details (e.g., IBAN, SWIFT/BIC Code, Account Number)
- Additional Information:
  - Locale, Country, Currency, Document Type, ... and much more!
Global Reach with Enhanced Support
Mindee's Invoice OCR API is built with a global perspective. While our intelligent algorithms can process invoices from any country, we offer officially enhanced support for the following regions. This means invoices originating from these countries benefit from optimized accuracy, especially for country-specific data like detailed tax information.
Our Officially Supported Regions:
North America
- Canada and United States
Europe
- Austria, Belgium, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Iceland, Ireland, Italy, Latvia, Luxembourg, Monaco, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland and United Kingdom
Key Takeaway
You can still use the Mindee Invoice OCR API to process invoices from countries not listed above. While the level of specific support might vary, our core extraction capabilities remain robust, and you may still obtain valuable country-specific data like tax details.
Set up the API
Create an API key
To begin using the Mindee Invoice OCR API, your first step is to create your API key.
- To test the Mindee Invoice OCR API, you will need an invoice document. You have several options for obtaining one:
  - Use a Recent Invoice: You can use an invoice you have recently received.
  - Download a Sample Invoice: Search online for freely available invoice samples that you can download for testing purposes.
  - Use the Provided Sample Invoice: For your convenience, a sample invoice is included below.
Invoice Sample
- To access the Invoice API, navigate to the APIs Store and click on the Invoice card.
- Sample code in popular languages and for command-line usage can be found in the API Reference section, accessible via the left navigation menu under Documentation.
- You can switch from synchronous to asynchronous to retrieve the sample code adapted to your use case.
from mindee import Client, PredictResponse, product
# Init a new client
mindee_client = Client(api_key="my-api-key-here")
# Load a file from disk
input_doc = mindee_client.source_from_path("/path/to/the/file.ext")
# Parse the file with the Invoice V4 product
result: PredictResponse = mindee_client.parse(product.InvoiceV4, input_doc)
# Print a summary of the API result
print(result.document)
# Print the document-level summary
# print(result.document.inference.prediction)
const mindee = require("mindee");
// for TS or modules:
// import * as mindee from "mindee";
// Init a new client
const mindeeClient = new mindee.Client({ apiKey: "my-api-key-here" });
// Load a file from disk
const inputSource = mindeeClient.docFromPath("/path/to/the/file.ext");
// Parse the file
const apiResponse = mindeeClient.parse(
  mindee.product.InvoiceV4,
  inputSource
);
// Handle the response Promise
apiResponse.then((resp) => {
  // print a string summary
  console.log(resp.document.toString());
});
using Mindee;
using Mindee.Input;
using Mindee.Product.Invoice;
string apiKey = "my-api-key-here";
string filePath = "/path/to/the/file.ext";
// Construct a new client
MindeeClient mindeeClient = new MindeeClient(apiKey);
// Load an input source as a path string
// Other input types can be used, as mentioned in the docs
var inputSource = new LocalInputSource(filePath);
// Call the API and parse the input
var response = await mindeeClient
.ParseAsync<InvoiceV4>(inputSource);
// Print a summary of all the predictions
System.Console.WriteLine(response.Document.ToString());
// Print only the document-level predictions
// System.Console.WriteLine(response.Document.Inference.Prediction.ToString());
#
# Install the Ruby client library by running:
# gem install mindee
#
require 'mindee'
# Init a new client
mindee_client = Mindee::Client.new(api_key: 'my-api-key')
# Load a file from disk
input_source = mindee_client.source_from_path('/path/to/the/file.ext')
# Parse the file
result = mindee_client.parse(
  input_source,
  Mindee::Product::Invoice::InvoiceV4,
  enqueue: true
)
# Print a full summary of the parsed data in RST format
puts result.document
# Print the document-level parsed data
# puts result.document.inference.prediction
import com.mindee.MindeeClient;
import com.mindee.input.LocalInputSource;
import com.mindee.parsing.common.PredictResponse;
import com.mindee.product.invoice.InvoiceV4;
import java.io.File;
import java.io.IOException;
public class SimpleMindeeClient {
  public static void main(String[] args) throws IOException {
    String apiKey = "my-api-key-here";
    String filePath = "/path/to/the/file.ext";
    // Init a new client
    MindeeClient mindeeClient = new MindeeClient(apiKey);
    // Load a file from disk
    LocalInputSource inputSource = new LocalInputSource(filePath);
    // Parse the file
    PredictResponse<InvoiceV4> response = mindeeClient.parse(
        InvoiceV4.class,
        inputSource
    );
    // Print a summary of the response
    System.out.println(response.toString());
    // Print a summary of the predictions
    // System.out.println(response.getDocument().toString());
    // Print the document-level predictions
    // System.out.println(response.getDocument().getInference().getPrediction().toString());
    // Print the page-level predictions
    // response.getDocument().getInference().getPages().forEach(
    //     page -> System.out.println(page.toString())
    // );
  }
}
<form onsubmit="mindeeSubmit(event)" >
  <input type="file" id="my-file-input" name="file" />
  <input type="submit" />
</form>
<script type="text/javascript">
  const mindeeSubmit = (evt) => {
    evt.preventDefault()
    let myFileInput = document.getElementById('my-file-input');
    let myFile = myFileInput.files[0]
    if (!myFile) { return }
    let data = new FormData();
    data.append("document", myFile, myFile.name);
    let xhr = new XMLHttpRequest();
    xhr.addEventListener("readystatechange", function () {
      if (this.readyState === 4) {
        console.log(this.responseText);
      }
    });
    xhr.open("POST", "https://api.mindee.net/v1/products/mindee/invoices/v4/predict");
    xhr.setRequestHeader("Authorization", "Token my-api-key-here");
    xhr.send(data);
  }
</script>
curl -X POST \
  https://api.mindee.net/v1/products/mindee/invoices/v4/predict \
  -H 'Authorization: Token my-api-key-here' \
  -H 'content-type: multipart/form-data' \
  -F document=@/path/to/your/file.png
<?php
use Mindee\Client;
use Mindee\Product\Invoice\InvoiceV4;
// Init a new client
$mindeeClient = new Client("my-api-key-here");
// Load a file from disk
$inputSource = $mindeeClient->sourceFromPath("/path/to/the/file.ext");
// Parse the file
$apiResponse = $mindeeClient->parse(InvoiceV4::class, $inputSource);
echo $apiResponse->document;
- Replace my-api-key-here with your new API key, or use the "select an API key" feature to fill it in automatically.
- Copy and paste the sample code of your choice into your application, code environment, or terminal.
- Replace the placeholder file path (/path/to/the/file.ext, or /path/to/your/file.png in the cURL sample) with the path to your invoice.
Remember to replace the placeholder API key with your own.
- Run your code. You will receive a JSON response with the invoice details.
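As a minimal sketch of the key-replacement step, the key can be read from an environment variable and turned into the Authorization header used in the cURL sample above. The variable name MINDEE_API_KEY and the helper auth_headers are choices made for this example, not something this page documents.

```python
import os
from typing import Mapping


def auth_headers(environ: Mapping[str, str] = os.environ,
                 var_name: str = "MINDEE_API_KEY") -> dict[str, str]:
    """Build the Authorization header from an environment variable.

    var_name is a hypothetical name chosen for this example.
    """
    api_key = environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    # Same scheme as the cURL sample: "Authorization: Token <key>"
    return {"Authorization": f"Token {api_key}"}


# Demonstration with an explicit mapping instead of the real environment.
print(auth_headers({"MINDEE_API_KEY": "my-api-key-here"}))
```

Passing the mapping explicitly keeps the helper easy to test without touching the real environment.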
API Response
Below is an example of the complete JSON response returned after calling the API. As the response is quite detailed, we'll break down each part step by step in the following sections.
{
"api_request": {
"error": {},
"resources": [
"document"
],
"status": "success",
"status_code": 201,
"url": "http://api.mindee.net/v1/products/mindee/invoices/v4/predict"
},
"document": {
"id": "ecdbe7bd-1037-47a5-87a8-b90d49475a1f",
"name": "sample_invoce.jpeg",
"n_pages": 1,
"is_rotation_applied": true,
"inference": {
"started_at": "2021-05-06T16:37:28",
"finished_at": "2021-05-06T16:37:29",
"processing_time": 1.125,
"pages": [
{
"id": 0,
"orientation": {"value": 0},
"prediction": { .. },
"extras": {}
}
],
"prediction": { .. },
"extras": {}
}
}
}
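Before reading any prediction, it can help to check the api_request block shown in the example response. The sketch below assumes the JSON has already been parsed into a Python dict; ensure_success is a hypothetical helper name, and the sample dicts are hand-written stand-ins rather than real API output.

```python
from typing import Any


def ensure_success(response: dict[str, Any]) -> dict[str, Any]:
    """Return the "document" object, or raise if the request did not succeed."""
    request = response.get("api_request", {})
    if request.get("status") != "success":
        raise RuntimeError(
            f"Request failed (status_code={request.get('status_code')}): "
            f"{request.get('error')}"
        )
    return response["document"]


# Hand-written stand-in mirroring the shape of the example response.
ok_response = {
    "api_request": {"error": {}, "status": "success", "status_code": 201},
    "document": {"id": "ecdbe7bd-1037-47a5-87a8-b90d49475a1f", "n_pages": 1},
}
print(ensure_success(ok_response)["n_pages"])
```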
The extracted invoice data (predictions) can be accessed in two locations:
- Document-level predictions (document > inference > prediction): Contains the consolidated invoice data across all pages. For multi-page PDF invoices, this prediction combines data extracted from every page into a single, unified invoice object.
- Page-level predictions (document > inference > pages[ ] > prediction): Contains predictions specific to each individual page. For image invoices, you will have only one element in this array. For multi-page PDFs, each page will have a corresponding element with its own extracted data.
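The two locations can be reached with plain dictionary access once the JSON response is parsed. This is a small sketch under that assumption; the helper names and the sample data are invented for illustration.

```python
from typing import Any


def document_prediction(response: dict[str, Any]) -> dict[str, Any]:
    """Consolidated prediction: document > inference > prediction."""
    return response["document"]["inference"]["prediction"]


def page_predictions(response: dict[str, Any]) -> list[dict[str, Any]]:
    """One prediction per page: document > inference > pages[] > prediction."""
    pages = response["document"]["inference"]["pages"]
    return [page["prediction"] for page in pages]


# Hand-written stand-in for a two-page response (not real API output).
sample_response = {
    "document": {
        "inference": {
            "pages": [
                {"id": 0, "prediction": {"invoice_number": {"value": "INV-001"}}},
                {"id": 1, "prediction": {"invoice_number": {"value": None}}},
            ],
            "prediction": {"invoice_number": {"value": "INV-001"}},
        }
    }
}

print(document_prediction(sample_response)["invoice_number"]["value"])
print(len(page_predictions(sample_response)))
```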
Most extracted fields contain the following properties:
- value: The extracted information as text or numeric data.
- confidence: A score (between 0 and 1) indicating the reliability of the extracted information.
- polygon: Coordinates indicating the exact position of the extracted data within the document image.
- page_id: Identifier of the page from which the data was extracted (particularly useful for multi-page documents).
Please note:
- These fields are optional and might occasionally be empty or null, depending on the document's content or extraction reliability.
- The structure described here applies to most standard fields, but it may vary slightly for more complex data objects.
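Because any of these properties may be empty or null, client code usually needs a defensive accessor. The sketch below shows one way to do that; trusted_value is a hypothetical helper, and the 0.8 threshold is an arbitrary value picked for the example, not a recommended setting.

```python
from typing import Any, Optional


def trusted_value(field: Optional[dict[str, Any]],
                  min_confidence: float = 0.8) -> Any:
    """Return field["value"] when it exists and is confident enough, else None."""
    if not field:
        return None
    value = field.get("value")
    confidence = field.get("confidence")
    if value is None or confidence is None:
        return None
    return value if confidence >= min_confidence else None


# Hand-written example field (not real API output).
invoice_number = {"value": "INV-001", "confidence": 0.99, "page_id": 0}
print(trusted_value(invoice_number))
```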
Invoice Data Fields Summary
Field Name | Description |
---|---|
billing_address | Details about the billing address of the customer, including street, city, postal code, etc. |
customer_address | Details about the customer's address, including street number, street name, city, state, and postal code. |
customer_company_registrations | An array to hold information about the customer's company registrations (e.g., VAT number). |
customer_id | The identifier of the customer. |
customer_name | The name of the customer. |
date | The date of the invoice. |
document_type | The type of the document (e.g., "INVOICE"). |
document_type_extended | A more detailed or extended classification of the document type. |
due_date | The date when the invoice is due for payment. |
invoice_number | The unique identifier for the invoice. |
line_items | An array of objects, where each object represents a line item on the invoice, including details like description, quantity, unit price, and total amount. |
locale | Information about the language, country, and currency of the invoice (e.g., "en-US" for English, United States, USD). |
orientation | Information about the orientation of the document, such as the number of degrees it might be rotated. |
payment_date | The date when the invoice was paid. |
po_number | The purchase order number associated with the invoice. |
reference_numbers | An array of reference numbers associated with the invoice. |
shipping_address | Details about the shipping address of the customer, including street number, street name, city, state, and postal code. |
supplier_address | Details about the supplier's address, including street number, street name, city, state, and postal code. |
supplier_company_registrations | An array to hold information about the supplier's company registrations (e.g., VAT number). |
supplier_email | The email address of the supplier. |
supplier_name | The name of the supplier. |
supplier_payment_details | An array to hold payment details for the supplier (e.g., bank account information). |
supplier_phone_number | The phone number of the supplier. |
supplier_website | The website address of the supplier. |
taxes | An array of tax objects, each containing information about a specific tax applied to the invoice, including base amount, rate, and value. |
total_amount | The total amount of the invoice, including taxes. |
total_net | The total amount of the invoice, excluding taxes. |
total_tax | The total tax amount on the invoice. |
Detailed Field Information
This section provides more specific details about certain key fields in the API response.
Address Components
The API provides a structured breakdown of address information. For fields like customer_address, shipping_address, billing_address, and supplier_address, the following components are typically included:
- address_complement: (String or null) Additional information about the address, such as an apartment number or suite.
- city: (String or null) The city of the address.
- country: (String or null) The country of the address.
- po_box: (String or null) The post office box number.
- postal_code: (String or null) The postal code or zip code of the address.
- state: (String or null) The state, province, or region of the address.
- street_name: (String or null) The name of the street.
- street_number: (String or null) The number of the building on the street.
- value: (String or null) The full, concatenated address as it appears on the document.
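As an illustration of how these components might be consumed, the sketch below prefers the full value string and falls back to assembling the parts when it is missing. The helper name format_address and the ordering of the parts are assumptions made for the example; real address conventions vary by country.

```python
from typing import Any


def format_address(address: dict[str, Any]) -> str:
    """Return the full address, rebuilding it from components if needed."""
    if address.get("value"):
        return address["value"]
    # Ordering below (number before street, postal code before city) is an assumption.
    street = " ".join(
        p for p in (address.get("street_number"), address.get("street_name")) if p
    )
    locality = " ".join(
        p for p in (address.get("postal_code"), address.get("city")) if p
    )
    parts = [
        street,
        address.get("address_complement"),
        address.get("po_box"),
        locality,
        address.get("state"),
        address.get("country"),
    ]
    return ", ".join(p for p in parts if p)


# Hand-written example (not real API output).
sample_address = {
    "value": None,
    "street_number": "12",
    "street_name": "Main St",
    "postal_code": "12345",
    "city": "Springfield",
    "state": "IL",
    "country": "USA",
}
print(format_address(sample_address))
```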
Document Type Fields
The API identifies the type of the processed document using two key fields:
- document_type: (String) This field provides a high-level classification of the document. It can be one of the following values:
  - INVOICE: A request for payment for goods or services.
  - CREDIT NOTE: A document that reduces the amount a buyer owes to a seller.
- document_type_extended: (String) This field offers a more granular classification of the document type, providing additional context. Possible values include:
  - CREDIT NOTE: Reduces the amount a buyer owes.
  - INVOICE: Requests payment for goods or services.
  - PAYSLIP: Details employee earnings and deductions.
  - PURCHASE ORDER: Buyer's official request to purchase.
  - QUOTE: Seller's estimated cost for goods or services.
  - RECEIPT: Acknowledges payment.
  - STATEMENT: Summary of financial transactions over a period.
  - OTHER FINANCIAL: Miscellaneous financial documents.
  - OTHER: Documents not fitting other financial categories.
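One place the document type matters is bookkeeping: since a credit note reduces the amount owed, a consumer may want to treat its total as negative. The sketch below assumes that document_type and total_amount each expose their content under a value key; signed_total is a hypothetical helper, not part of the API.

```python
from typing import Any, Optional


def signed_total(prediction: dict[str, Any]) -> Optional[float]:
    """Return total_amount, negated when the document is a credit note."""
    total = (prediction.get("total_amount") or {}).get("value")
    if total is None:
        return None
    doc_type = (prediction.get("document_type") or {}).get("value")
    return -abs(total) if doc_type == "CREDIT NOTE" else total


# Hand-written examples (not real API output).
credit_note = {"document_type": {"value": "CREDIT NOTE"}, "total_amount": {"value": 120.0}}
invoice = {"document_type": {"value": "INVOICE"}, "total_amount": {"value": 120.0}}
print(signed_total(credit_note), signed_total(invoice))
```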
Payment and Due Date
The payment_date and due_date fields contain an additional boolean field called is_computed.
- is_computed: true: This indicates that the date was not explicitly found on the document but was automatically calculated by the API based on other information. For example, if an invoice states "Payment is requested within one week," the API might compute the due date based on the invoice date.
- is_computed: false: This indicates that the date was directly extracted from a date explicitly present on the document.
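A consumer might want to flag computed dates for manual review, since they were inferred rather than read from the page. The following sketch lists which of the two date fields were computed; computed_dates is a hypothetical helper, and the sample prediction is hand-written.

```python
from typing import Any


def computed_dates(prediction: dict[str, Any]) -> list[str]:
    """Return the names of date fields whose value was computed by the API."""
    names = []
    for name in ("payment_date", "due_date"):
        field = prediction.get(name) or {}
        if field.get("value") is not None and field.get("is_computed"):
            names.append(name)
    return names


# Hand-written example (not real API output).
sample_prediction = {
    "payment_date": {"value": None, "is_computed": False},
    "due_date": {"value": "2021-05-13", "is_computed": True},
}
print(computed_dates(sample_prediction))
```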
Company Registrations
The customer_company_registrations and supplier_company_registrations fields are arrays designed to capture various official registration numbers associated with the respective entities. Each element in these arrays is an object containing:
- value: (String) The actual company registration number as extracted from the document.
- type: (String) A standardized code indicating the type of the registration number. The following values are possible:
Type | Description |
---|---|
VAT | Value Added Tax Identification Number |
SIRET | Système d'Identification du Répertoire des ÉTablissements (France) |
SIREN | Système d'Identification du Répertoire des ENtreprises (France) |
NIF | Número de Identificación Fiscal (Spain, Portugal, etc.) |
CF | Codice Fiscale (Italy) |
UID | Unternehmens-Identifikationsnummer (Switzerland, Austria) |
STNR | Steuernummer (Germany, Austria) |
HRA_HRB | Handelsregister Abteilung A/B (Germany) |
TIN | Taxpayer Identification Number (USA, Canada, etc.) |
RFC | Registro Federal de Contribuyentes (Mexico) |
BTW | Belasting Toegevoegde Waarde (Netherlands) |
ABN | Australian Business Number |
UEN | Unique Entity Number (Singapore) |
CVR | Centralt Virksomhedsregister (Denmark) |
ORGNR | Organisasjonsnummer (Norway, Sweden) |
INN | Идентификационный номер налогоплательщика (Russia) |
DPH | Daň z přidané hodnoty (Czech Republic, Slovakia) |
NIP | Numer Identyfikacji Podatkowej (Poland) |
GSTIN | Goods and Services Tax Identification Number (India) |
CRN | Company Registration Number (United Kingdom, etc.) |
KVK | Kamer van Koophandel (Netherlands) |
DIC | Daňové identifikační číslo (Czech Republic) |
TAX_ID | Generic Tax Identification Number |
CIF | Código de Identificación Fiscal (Spain) |
GST_HST_CA | Goods and Services Tax / Harmonized Sales Tax (Canada) |
COC | Chamber of Commerce number (various countries) |
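Since an entity can carry several registrations, grouping them by type makes lookups such as "the supplier's VAT number" straightforward. This is a sketch only; registrations_by_type is a hypothetical helper, the UNKNOWN bucket is an invented fallback, and the sample numbers are made up.

```python
from collections import defaultdict
from typing import Any, Optional


def registrations_by_type(
    registrations: Optional[list[dict[str, Any]]],
) -> dict[str, list[str]]:
    """Group registration numbers by their type code, skipping empty values."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for registration in registrations or []:
        value = registration.get("value")
        if value:
            # "UNKNOWN" is a fallback label invented for this example.
            grouped[registration.get("type") or "UNKNOWN"].append(value)
    return dict(grouped)


# Hand-written example with made-up numbers (not real API output).
supplier_registrations = [
    {"type": "VAT", "value": "FR00123456789"},
    {"type": "SIRET", "value": "12345678900012"},
    {"type": "VAT", "value": None},
]
print(registrations_by_type(supplier_registrations))
```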
Orientation
The orientation field provides information about the orientation of the document:
- degrees: (Number or null) The number of degrees the document might be rotated. Possible values are 0, 90, or 270.
Note:
Currently, the API does not support the processing of documents with a 180° rotation.
Support for 180° rotations is planned for an upcoming release.
Line Items
The line_items field is an array of objects, where each object represents a single item listed on the invoice. Each line item object contains the following information:
- description: (String or null) A textual description of the product or service provided in this line item.
- product_code: (String or null) A code or identifier assigned to the product or service.
- quantity: (Number or null) The number of units of the product or service.
- tax_amount: (Number or null) The total amount of tax applied to this specific line item.
- tax_rate: (Number or null) The tax rate applied to this line item. Please note that this value is expressed as a percentage (e.g., 20 represents 20%).
- total_amount: (Number or null) The total amount for this line item, usually including taxes.
- unit_measure: (String or null) The unit of measurement for the quantity (e.g., "kg", "hours", "each").
- unit_price: (Number or null) The price of a single unit of the product or service.
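Because every line item property can be null, totals computed from line items should account for gaps. The sketch below sums total_amount across items and reports how many items had no amount; sum_line_items is a hypothetical helper, and it assumes each property is stored directly on the line item object as described above.

```python
from typing import Any


def sum_line_items(line_items: list[dict[str, Any]]) -> tuple[float, int]:
    """Return (sum of total_amount, number of items with no total_amount)."""
    total = 0.0
    missing = 0
    for item in line_items:
        amount = item.get("total_amount")
        if amount is None:
            missing += 1
        else:
            total += amount
    # Round to cents to hide floating-point noise in the sum.
    return round(total, 2), missing


# Hand-written example (not real API output).
sample_items = [
    {"description": "Widget", "quantity": 2, "unit_price": 5.25, "total_amount": 10.5},
    {"description": "Shipping", "total_amount": None},
    {"description": "Gadget", "quantity": 1, "unit_price": 4.25, "total_amount": 4.25},
]
print(sum_line_items(sample_items))
```

A non-zero missing count is a hint that the sum should not be compared directly against the invoice total.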
Taxes
The taxes field is an array of tax objects, where each object provides details about a specific tax identified on the invoice. Each tax object includes the following information:
- rate: (Number or null) The tax rate applied. This value is typically expressed as a percentage (e.g., 5 represents 5%).
- base: (Number or null) The taxable amount to which the tax rate is applied.
- value: (Number or null) The calculated amount of the tax.
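The three tax properties are related by value ≈ base × rate / 100 when the rate is a percentage, which gives a simple sanity check on extracted data. The sketch below applies that check; tax_is_consistent is a hypothetical helper, and the one-cent tolerance is an arbitrary choice for the example.

```python
import math
from typing import Any, Optional


def tax_is_consistent(tax: dict[str, Any], tolerance: float = 0.01) -> Optional[bool]:
    """Check that value matches base * rate / 100; None if any part is missing."""
    rate, base, value = tax.get("rate"), tax.get("base"), tax.get("value")
    if rate is None or base is None or value is None:
        return None
    return math.isclose(base * rate / 100, value, abs_tol=tolerance)


# Hand-written examples (not real API output).
print(tax_is_consistent({"rate": 20, "base": 100.0, "value": 20.0}))
print(tax_is_consistent({"rate": 5, "base": 200.0, "value": 12.0}))
```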
Supplier Payment Details
The supplier_payment_details field is an array that contains information about how the supplier accepts payments. Each element in this array is typically an object that can include the following fields:
- account_number: (String or null) The supplier's bank account number.
- iban: (String or null) The International Bank Account Number.
- swift: (String or null) The SWIFT Code or Bank Identifier Code (BIC).
- routing_number: (String or null) A bank code used in some countries (e.g., ABA routing number in the USA).
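As the array may hold several entries and any field may be null, picking out a usable IBAN takes a small loop. The sketch below returns the first IBAN found, with spaces removed and letters upper-cased; first_iban is a hypothetical helper, and the sample IBAN is only an example string.

```python
from typing import Any, Optional


def first_iban(payment_details: Optional[list[dict[str, Any]]]) -> Optional[str]:
    """Return the first non-empty IBAN, normalized, or None if there is none."""
    for detail in payment_details or []:
        iban = detail.get("iban")
        if iban:
            return iban.replace(" ", "").upper()
    return None


# Hand-written example (not real API output).
sample_details = [
    {"account_number": "0112345678", "iban": None, "swift": None},
    {"iban": "fr76 3000 6000 0112 3456 7890 189", "swift": "AGRIFRPP"},
]
print(first_iban(sample_details))
```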
Currencies
The currency sub-field within locale will contain one of the following ISO 4217 currency codes:
AUD, CAD, CHF, EUR, GBP, NOK, PLN, SGD, USD, HUF, CZK, DKK, SEK, BGN, HRK, RON, XPF, COP, BRL, GNF, AED, NZD, HKD, JPY, CNY, Other.
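Since the list ends with the catch-all Other, downstream code may want to separate recognized codes from everything else. The sketch below does so with a fixed set copied from the list above; known_currency is a hypothetical helper, and it assumes locale is an object with a currency key holding the code as a string.

```python
from typing import Any, Optional

# Codes copied from the list above, excluding the catch-all "Other".
KNOWN_CURRENCIES = frozenset({
    "AUD", "CAD", "CHF", "EUR", "GBP", "NOK", "PLN", "SGD", "USD", "HUF",
    "CZK", "DKK", "SEK", "BGN", "HRK", "RON", "XPF", "COP", "BRL", "GNF",
    "AED", "NZD", "HKD", "JPY", "CNY",
})


def known_currency(locale: Optional[dict[str, Any]]) -> Optional[str]:
    """Return the currency code if it is a listed code, else None."""
    code = (locale or {}).get("currency")
    return code if code in KNOWN_CURRENCIES else None


# Hand-written examples (not real API output).
print(known_currency({"currency": "EUR"}))
print(known_currency({"currency": "Other"}))
```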
Questions?
Join our Slack