Financial Documents OCR

Automatically extract data from unstructured financial documents

Overview

The Financial Document API is ideal for applications where you might receive both invoices and receipts and want a single integration point. The core functionality of this API involves a two-step process:

  1. Document Classification: Upon receiving a document, the API first analyzes it to determine whether it is an invoice or a receipt.
  2. Intelligent Routing: Based on the classification, the document is then automatically routed to the appropriate underlying API for detailed information extraction.

This approach simplifies your integration by eliminating the need to pre-classify documents on your end and choose which specific API to call.
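The classify-then-route flow described above can be pictured with a small, purely conceptual sketch. Everything in it is hypothetical: `classify`, `extract_invoice`, `extract_receipt`, and `process` are illustrative stand-ins and not part of the actual API, and the keyword check is only a toy placeholder for the real classification step.

```python
from typing import Callable, Dict

Extraction = Dict[str, object]


def extract_invoice(document: bytes) -> Extraction:
    # Hypothetical stand-in for the dedicated Invoice API.
    return {"document_type": "INVOICE"}


def extract_receipt(document: bytes) -> Extraction:
    # Hypothetical stand-in for the dedicated Receipt API.
    return {"document_type": "RECEIPT"}


# Step 2 (routing): map each classification label to its extractor.
EXTRACTORS: Dict[str, Callable[[bytes], Extraction]] = {
    "INVOICE": extract_invoice,
    "RECEIPT": extract_receipt,
}


def classify(document: bytes) -> str:
    # Step 1 (classification): toy rule used only for illustration.
    # The real service uses its own classifier, not a keyword match.
    return "INVOICE" if b"invoice" in document.lower() else "RECEIPT"


def process(document: bytes) -> Extraction:
    # Classify first, then hand the document to the matching extractor.
    label = classify(document)
    return EXTRACTORS[label](document)
```

From the caller's point of view only `process` exists, which is the simplification the single integration point is meant to provide.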


Underlying APIs

The Financial Document API leverages the capabilities of our dedicated Invoice and Receipt APIs. Once a document is classified, it is processed by one of these APIs, each of which has its own documentation.

Understanding the specific functionalities and responses of these underlying APIs can be helpful when working with the Financial Document API.


Financial Documents Data Fields Summary

| Field Name | Description |
| --- | --- |
| billing_address | The address used for billing the customer. This is typically found on invoices. |
| category | The main classification of the transaction or document (e.g., food or parking for receipts, miscellaneous for invoices). |
| customer_address | The physical address of the customer. This is more commonly found on invoices. |
| customer_company_registrations | An array containing registration details of the customer, such as VAT number or other tax identifiers. This is more common on invoices. |
| customer_id | A unique identifier assigned to the customer by the supplier. This is more common on invoices. |
| customer_name | The name of the customer. This is more commonly found on invoices but might appear on some receipts (e.g., for loyalty programs). |
| date | The date the document was issued (invoice) or the transaction occurred (receipt). |
| document_number | A general identifier for the document, which could be the invoice number or the receipt number. |
| document_type | The broad type of the document (e.g., "INVOICE", "RECEIPT"). |
| document_type_extended | A more specific classification of the document type (e.g., "INVOICE", "CREDIT CARD RECEIPT", "PAYSLIP" ...). |
| due_date | The date by which payment is expected for an invoice. For receipts, this field will contain the transaction date. |
| invoice_number | The unique identifier assigned to the invoice by the supplier. This field will typically be null for receipts. |
| line_items | An array containing details of each item or service listed on the document, including description, quantity, unit price, and total amount. |
| locale | Information about the language, country, and currency of the document (e.g., "en-US" for English, United States, USD). |
| orientation | The detected orientation of the document. |
| payment_date | The date when the payment was made. This might be present on both invoices and receipts, depending on the context. |
| po_number | The purchase order number associated with the invoice, if available. This field will likely be null for receipts. |
| receipt_number | The unique identifier printed on the receipt. This field will typically be null for invoices. |
| reference_numbers | An array containing any additional reference numbers present on the document. |
| shipping_address | The address where the purchased goods are to be shipped. This is typically found on invoices. |
| subcategory | A more granular classification of the transaction or document (e.g., taxi under transport for receipts). |
| supplier_address | The physical address of the supplier or merchant. |
| supplier_company_registrations | An array containing registration details of the supplier, such as VAT number or other tax identifiers. |
| supplier_email | The email address of the supplier or merchant. |
| supplier_name | The name of the supplier or merchant. |
| supplier_payment_details | An array containing details about how the supplier should be paid, such as bank account information. This is typically found on invoices. |
| supplier_phone_number | The phone number of the supplier or merchant. |
| supplier_website | The website address of the supplier or merchant. |
| taxes | An array containing details of each tax applied to the document, including the tax code, rate, and amount. |
| time | The time of the transaction, typically found on receipts. This field might be absent or null for invoices. |
| tip | The amount of tip or gratuity paid, typically found on receipts. This field will likely be absent or null for invoices. |
| total_amount | The final amount payable or paid, including all taxes and tips. |
| total_net | The total amount before the application of taxes. |
| total_tax | The total amount of tax applied to the document. |
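Because several fields are populated only for one document type, client code usually needs a fallback when reading them. The sketch below picks the most specific identifier available. It assumes, for illustration only, that the extracted fields have already been flattened into a plain dictionary keyed by the field names in the summary above; the actual response layout may nest values differently, and `document_identifier` is a hypothetical helper, not part of the API.

```python
from typing import Any, Dict, Optional


def document_identifier(prediction: Dict[str, Any]) -> Optional[str]:
    """Return the most specific identifier found on the document.

    Assumes `prediction` is a flat dict keyed by the summary field names
    (an illustrative simplification of the real response).
    """
    if prediction.get("document_type") == "INVOICE":
        # invoice_number is typically null for receipts.
        specific = prediction.get("invoice_number")
    else:
        # receipt_number is typically null for invoices.
        specific = prediction.get("receipt_number")
    # Fall back to the general identifier when the specific one is missing.
    return specific or prediction.get("document_number")
```

For example, `document_identifier({"document_type": "RECEIPT", "receipt_number": None, "document_number": "R-7"})` falls back to the general `document_number`.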

Field-by-Field Explanation

This section provides details about specific fields within the Financial Document API response and highlights how their behavior might differ based on whether the processed document was classified as an invoice or a receipt.

Category & Subcategory

The category field in the API response classifies the transaction or document, and the subcategory field refines that classification (e.g., taxi under transport). However, the interpretation of category depends on the document type:

  • Invoice: For documents classified as invoices, the category field will always return the value miscellaneous. This is because our primary invoice processing focuses on extracting key financial details rather than granular categorization.
  • Receipt: For documents classified as receipts, the category field reflects the output of our dedicated receipt categorization model. This model analyzes the receipt content to determine the specific category (e.g., toll, food, parking, transport, accommodation, gasoline, telecom, miscellaneous).
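Since invoices always carry the category miscellaneous, a consumer may prefer to treat the category as informative only for receipts. The following is a minimal sketch of that idea, assuming the extracted fields are available as a flat dictionary keyed by field name (an illustrative simplification). The helper name `expense_category` and the `category/subcategory` label format are hypothetical choices.

```python
from typing import Any, Dict, Optional


def expense_category(prediction: Dict[str, Any]) -> Optional[str]:
    """Return a category label only when it carries real information.

    Assumes `prediction` is a flat dict keyed by field name
    (hypothetical layout used for illustration).
    """
    if prediction.get("document_type") != "RECEIPT":
        # Invoices always report "miscellaneous", so expose no category.
        return None
    category = prediction.get("category")
    if not category:
        return None
    subcategory = prediction.get("subcategory")
    # Append the finer-grained subcategory when one was detected.
    return f"{category}/{subcategory}" if subcategory else category
```

With this helper, a taxi receipt yields `transport/taxi`, while any invoice yields `None` rather than a misleading `miscellaneous`.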

Due Date

The due_date field represents the expected payment date for a financial document. However, its source and meaning differ slightly between invoices and receipts:

  • Invoice: For invoices, the due_date field contains the due date explicitly stated on the invoice document, giving you the payment deadline as printed.
  • Receipt: Receipts generally do not have a concept of a formal "due date." In the context of our Financial Document API, when a receipt is processed, the date of the transaction (the date printed on the receipt) is copied into the due_date field. This is done to maintain consistency across the API response structure. Therefore, for receipts, the due_date essentially represents the transaction date, not a payment deadline.
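In practice this means due_date should be read as a deadline only when the document is an invoice. A minimal sketch of such a guard follows. It assumes a flat dictionary keyed by field name and dates encoded as ISO 8601 strings (`YYYY-MM-DD`); both are assumptions made for illustration, and `payment_deadline` is a hypothetical helper rather than part of the API.

```python
from datetime import date
from typing import Any, Dict, Optional


def payment_deadline(prediction: Dict[str, Any]) -> Optional[date]:
    """Return a payment deadline for invoices, and None otherwise.

    Assumes a flat dict and ISO 8601 date strings (illustrative only).
    """
    if prediction.get("document_type") != "INVOICE":
        # On receipts, due_date is a copy of the transaction date,
        # so it must not be treated as a deadline.
        return None
    raw = prediction.get("due_date")
    return date.fromisoformat(raw) if raw else None
```

A receipt with `due_date` set to `"2024-03-01"` therefore returns `None`, while an invoice with the same value returns `date(2024, 3, 1)`.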