Invoice OCR Ruby

The Ruby OCR SDK supports the invoice API for extracting data from invoices.

Using the sample invoice below, we will illustrate how to extract the data we want with the OCR SDK.

sample invoice

Quick Start

require 'mindee'

# Init a new client, specifying an API key
mindee_client = Mindee::Client.new(api_key: 'my-api-key')

# Send the file
result = mindee_client.doc_from_path('/path/to/the/file.ext').parse(Mindee::Prediction::InvoiceV4)

# Print a summary of the document prediction in RST format
puts result.inference.prediction

Output:

:Locale: en; en; CAD;
:Document type: INVOICE
:Invoice number: 14
:Reference numbers: AD29094
:Invoice date: 2018-09-25
:Invoice due date: 2018-09-25
:Supplier name: TURNPIKE DESIGNS CO.
:Supplier address: 156 University Ave, Toronto ON, Canada M5H 2H7
:Supplier company registrations:
:Supplier payment details:
:Customer name: JIRO DOI
:Customer address: 1954 Bloor Street West Toronto, ON, M6P 3K9 Canada
:Customer company registrations:
:Taxes: 193.20 8.00%
:Total net: 2415.00
:Total taxes: 193.20
:Total amount: 2608.20

:Line Items:
====================== ======== ========= ========== ================== ====================================
Code                   QTY      Price     Amount     Tax (Rate)         Description
====================== ======== ========= ========== ================== ====================================
                       1.00     65.00     65.00                         Platinum web hosting package Down...
                       3.00     2100.00   2100.00                       2 page website design Includes ba...
                       1.00     250.00    250.00                        Mobile designs Includes responsiv...
====================== ======== ========= ========== ================== ====================================

Info: Line item descriptions are truncated here for display purposes only.
The full text is available in the details.

Fields

Each prediction object contains a set of different fields.
Each Field object contains at a minimum the following attributes:

  • value (String or Float depending on the field type): corresponds to the field value. Can be nil if no value was extracted.
  • confidence (Float): the confidence score of the field prediction.
  • bounding_box (Array< Array< Float > >): contains exactly 4 relative vertex coordinates (points) of a right rectangle containing the field in the document.
  • polygon (Array< Array< Float > >): contains the relative vertex coordinates (points) of a polygon containing the field in the image.
  • reconstructed (Boolean): true if the field was reconstructed or computed using other fields.
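
As a rough illustration of how these attributes can be used, the sketch below converts relative coordinates to pixels and keeps a value only when its confidence is high enough. `SketchField`, `to_pixels`, `confident_value`, the 0.9 threshold, and the 1000x2000 image size are all assumptions made for this example; they are not part of the SDK, and a real field object returned by the SDK would take the place of the Struct.

```ruby
# Stand-in for an SDK field object (hypothetical); only the attributes
# described above are modelled.
SketchField = Struct.new(:value, :confidence, :bounding_box, keyword_init: true)

# Convert relative [x, y] vertices (0.0..1.0) into integer pixel coordinates.
def to_pixels(points, image_width, image_height)
  points.map { |x, y| [(x * image_width).round, (y * image_height).round] }
end

# Return the field value only when its confidence reaches the threshold.
def confident_value(field, threshold: 0.9)
  return nil if field.value.nil? || field.confidence < threshold

  field.value
end

# Hypothetical confidence and coordinates, chosen for illustration only.
field = SketchField.new(
  value: 'JIRO DOI',
  confidence: 0.95,
  bounding_box: [[0.25, 0.5], [0.5, 0.5], [0.5, 0.75], [0.25, 0.75]]
)

p to_pixels(field.bounding_box, 1000, 2000)
p confident_value(field)
p confident_value(field, threshold: 0.99)
```

Because the coordinates are relative, the same conversion works for any rendering size of the page image.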

Attributes

Depending on the field type, the Invoice object may expose additional attributes.

Using the above sample, the following are the basic fields that can be extracted:

Customer Information

customer_name (Field): Customer's name

puts result.inference.prediction.customer_name.value

customer_address (Field): Customer's postal address

puts result.inference.prediction.customer_address.value

customer_company_registrations (Array< CompanyRegistration >): List of the customer's company registration numbers.

result.inference.prediction.customer_company_registrations.each do |registration|
  puts registration.value
  puts registration.type
end

Dates

Date fields:

  • contain the date_object attribute, which is a standard Ruby date object
  • have a value attribute which is the ISO 8601 representation of the date.

The following date fields are available:

date: Date the invoice was issued

puts result.inference.prediction.date.value

due_date: Payment due date of the invoice.

puts result.inference.prediction.due_date.value
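
Since `value` is an ISO 8601 string, date arithmetic is straightforward. The sketch below computes the number of days between the issue date and the due date from two such strings; `payment_term_days` is a hypothetical helper, not an SDK method. With a real result, the `date_object` attribute should give the same Date without any parsing.

```ruby
require 'date'

# Days between the issue date and the due date, given the ISO 8601 strings
# found in each field's `value`. Returns nil when either date is missing.
def payment_term_days(issue_value, due_value)
  return nil if issue_value.nil? || due_value.nil?

  (Date.iso8601(due_value) - Date.iso8601(issue_value)).to_i
end

# The sample invoice has the same issue and due date.
puts payment_term_days('2018-09-25', '2018-09-25')
# A hypothetical invoice due one month later.
puts payment_term_days('2018-09-25', '2018-10-25')
```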

Locale

locale (Locale): Locale information.

  • locale.language (String): Language code in ISO 639-1 format as seen on the document.
puts result.inference.prediction.locale.language
  • locale.currency (String): Currency code in ISO 4217 format as seen on the document.
puts result.inference.prediction.locale.currency
  • locale.country (String): Country code in ISO 3166-1 alpha-2 format as seen on the document.
puts result.inference.prediction.locale.country

Supplier Information

supplier_name: Supplier name as written in the invoice (logo or supplier info).

puts result.inference.prediction.supplier_name.value

supplier_address: Supplier address as written in the invoice.

puts result.inference.prediction.supplier_address.value

supplier_payment_details (Array< PaymentDetails >): List of the supplier's payment details found on the invoice.
Each object in the list contains extra attributes:

  • iban (String)
# Show the IBAN of the first payment
puts result.inference.prediction.supplier_payment_details[0].iban
  • swift (String)
# Show the SWIFT of the first payment
puts result.inference.prediction.supplier_payment_details[0].swift
  • routing_number (String)
# Show the routing number of the first payment
puts result.inference.prediction.supplier_payment_details[0].routing_number
  • account_number (String)
# Show the account number of the first payment
puts result.inference.prediction.supplier_payment_details[0].account_number
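
Note that the list can be empty, as it is for the sample invoice above, in which case indexing with `[0]` returns nil and calling `.iban` on it raises a NoMethodError. A minimal sketch of a safer access pattern follows; `SketchPaymentDetails` and `first_iban` are hypothetical names used only for this example, and the IBAN string is a placeholder.

```ruby
# Stand-in for one entry of supplier_payment_details (hypothetical, not the SDK class).
SketchPaymentDetails = Struct.new(:iban, :swift, :routing_number, :account_number,
                                  keyword_init: true)

# Return the IBAN of the first payment detail, or nil when the list is empty.
def first_iban(payment_details)
  payment_details.first&.iban
end

# Empty list, as in the sample invoice: no exception, just nil.
p first_iban([])

# Placeholder IBAN, for illustration only.
p first_iban([SketchPaymentDetails.new(iban: 'XX00PLACEHOLDER')])
```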

supplier_company_registrations (Array< CompanyRegistration >):
List of detected supplier's company registration numbers.
Each object in the list contains an extra attribute:

  • type (String): Type of company registration number among predefined categories.
result.inference.prediction.supplier_company_registrations.each do |registration|
  puts registration.value
  puts registration.type
end

Taxes

taxes (Array< TaxField >): Contains tax fields as seen on the invoice.

  • value (Float): The tax amount.
# Show the amount of the first tax
puts result.inference.prediction.taxes[0].value
  • code (String): The tax code (HST, GST, etc. for Canada; City Tax, State Tax, etc. for the US).
# Show the code of the first tax
puts result.inference.prediction.taxes[0].code
  • rate (Float): The tax rate.
# Show the rate of the first tax
puts result.inference.prediction.taxes[0].rate

Totals

total_amount (Field): Total amount including taxes.

puts result.inference.prediction.total_amount.value

total_net (Field): Total amount excluding taxes.

puts result.inference.prediction.total_net.value

total_tax (Field): Total tax value from tax lines.

puts result.inference.prediction.total_tax.value
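
One possible use of these three fields is a sanity check on the extraction: the net total plus the tax total should equal the total amount. The sketch below works on plain Floats such as those returned by `.value`; `totals_consistent?` and the 0.01 tolerance are assumptions for this example, not SDK features.

```ruby
# Check that net + tax matches the total amount, within a small tolerance
# to absorb floating-point rounding. Returns false if any value is missing.
def totals_consistent?(total_net, total_tax, total_amount, tolerance: 0.01)
  return false if [total_net, total_tax, total_amount].any?(&:nil?)

  ((total_net + total_tax) - total_amount).abs <= tolerance
end

# Values from the sample invoice output above.
puts totals_consistent?(2415.00, 193.20, 2608.20)
# A hypothetical mismatch, e.g. a misread total.
puts totals_consistent?(2415.00, 193.20, 2600.00)
```

A failed check does not say which field is wrong, only that at least one of the three deserves a manual review.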

Line items

line_items (Array): Line item details.
Each object in the list contains:

  • product_code (String)
  • description (String)
  • quantity (Float)
  • unit_price (Float)
  • total_amount (Float)
  • tax_rate (Float)
  • tax_amount (Float)
  • confidence (Float)
  • page_id (Integer)
  • polygon (Polygon)
result.inference.prediction.line_items.each do |line_item|
  pp line_item
end
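
Beyond printing, line items can be aggregated. The sketch below sums `total_amount` over the items and treats missing amounts as zero; `SketchLineItem` and `line_items_total` are hypothetical stand-ins for this example, populated with the values shown in the sample output above (descriptions shortened to their visible part).

```ruby
# Stand-in for one entry of line_items (hypothetical, not the SDK class).
SketchLineItem = Struct.new(:description, :quantity, :unit_price, :total_amount,
                            keyword_init: true)

# Sum the line totals, counting items without an extracted amount as 0.
def line_items_total(line_items)
  line_items.sum { |item| item.total_amount || 0.0 }
end

items = [
  SketchLineItem.new(description: 'Platinum web hosting package',
                     quantity: 1.0, unit_price: 65.0, total_amount: 65.0),
  SketchLineItem.new(description: '2 page website design',
                     quantity: 3.0, unit_price: 2100.0, total_amount: 2100.0),
  SketchLineItem.new(description: 'Mobile designs',
                     quantity: 1.0, unit_price: 250.0, total_amount: 250.0)
]

puts line_items_total(items)
```

For the sample invoice, this sum can be compared against `total_net` to check that no line item was missed.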
