US W9 OCR Ruby

This page shows how to extract data from a US W9 form with the OCR SDK, using the sample document below.
US W9 sample

Quick-Start

require 'mindee'

# Init a new client
mindee_client = Mindee::Client.new(api_key: 'my-api-key')

# Load a file from disk
input_source = mindee_client.source_from_path('/path/to/the/file.ext')

# Parse the file
result = mindee_client.parse(
  input_source,
  Mindee::Product::US::W9::W9V1
)

# Print a full summary of the parsed data in RST format
puts result.document

# Print the document-level parsed data
# puts result.document.inference.prediction

Output (RST):

########
Document
########
:Mindee ID: d7c5b25f-e0d3-4491-af54-6183afa1aaab
:Filename: default_sample.jpg

Inference
#########
:Product: mindee/us_w9 v1.0
:Rotation applied: Yes

Prediction
==========

Page Predictions
================

Page 0
------
:Name: Stephen W Hawking
:SSN: 560758145
:Address: Somewhere In Milky Way
:City State Zip: Probably Still At Cambridge P O Box CB1
:Business Name:
:EIN: 942203664
:Tax Classification: individual
:Tax Classification Other Details:
:W9 Revision Date: august 2013
:Signature Position: Polygon with 4 points.
:Signature Date Position:
:Tax Classification LLC:

Field Types

Standard Fields

These fields are generic and used in several products.

Basic Field

Each prediction object contains a set of fields that inherit from the generic Field class.
A typical Field object will have the following attributes:

  • value (String, Float, Integer, Boolean): corresponds to the field value. Can be nil if no value was extracted.
  • confidence (Float, nil): the confidence score of the field prediction.
  • bounding_box (Mindee::Geometry::Quadrilateral, nil): exactly four vertices (points), in relative coordinates, of a right rectangle containing the field in the document.
  • polygon (Mindee::Geometry::Polygon, nil): the vertices (Point), in relative coordinates, of a polygon containing the field in the image.
  • page_id (Integer, nil): the ID of the page; nil when the field is at document level.
  • reconstructed (Boolean): indicates whether the value was reconstructed rather than extracted as the API returned it.

Aside from the previous attributes, all basic fields have access to a to_s method that can be used to print their value as a string.
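
As a rough illustration of how these attributes might be read together, the sketch below formats a field for logging. `DemoField` is a stand-in Struct so the example runs without an API call, `describe_field` is a hypothetical helper that is not part of the SDK, and the confidence value is made up for the demo.

```ruby
# Stand-in for an SDK field (assumption: real fields expose the same readers).
DemoField = Struct.new(:value, :confidence, :page_id, keyword_init: true)

# Hypothetical helper: builds a one-line description of a field,
# handling the nil cases listed in the attribute list above.
def describe_field(label, field)
  value = field.value.nil? ? '(not extracted)' : field.value.to_s
  confidence = field.confidence.nil? ? 'n/a' : format('%.2f', field.confidence)
  page = field.page_id.nil? ? 'document-level' : "page #{field.page_id}"
  "#{label}: #{value} (confidence: #{confidence}, #{page})"
end

# Illustrative values only; the confidence score is invented for this demo.
name = DemoField.new(value: 'Stephen W Hawking', confidence: 0.99, page_id: 0)
puts describe_field('Name', name)
```

Any object that responds to `value`, `confidence` and `page_id` should work with this helper, so it could be tried on a parsed field in place of the stand-in.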

Position Field

The position field PositionField does not implement all the basic Field attributes, only bounding_box, polygon and page_id. On top of these, it has access to:

  • rectangle (Mindee::Geometry::Quadrilateral): a Polygon with four points that may be oriented (even beyond canvas).
  • quadrangle (Mindee::Geometry::Quadrilateral): a free polygon made up of four points.
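
Because the vertices are relative, they usually need to be scaled before drawing on the source image. The sketch below assumes that "relative" means fractions of the page width and height (0.0 to 1.0) and that each vertex exposes `x` and `y`. `DemoPoint` is a stand-in, `to_pixels` is a hypothetical helper, and the coordinates and image size are invented.

```ruby
# Stand-in for a geometry point (assumption: real points expose x and y).
DemoPoint = Struct.new(:x, :y)

# Hypothetical helper: scales relative vertices to integer pixel coordinates.
def to_pixels(points, image_width, image_height)
  points.map do |point|
    [(point.x * image_width).round, (point.y * image_height).round]
  end
end

# Illustrative rectangle on a 1000 x 2000 pixel image.
corners = [
  DemoPoint.new(0.1, 0.8), DemoPoint.new(0.5, 0.8),
  DemoPoint.new(0.5, 0.9), DemoPoint.new(0.1, 0.9)
]
p to_pixels(corners, 1000, 2000)
```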

String Field

The text field StringField only has one constraint: its value is a String (or nil).

Page-Level Fields

Some fields are constrained to the page level, and so cannot be retrieved through the document.
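
One way to work with page-level fields is to gather a given field from every page into a hash keyed by page ID. The sketch below uses stand-in Structs and assumes each page exposes `page_id` and a `prediction` whose fields respond to `value`. `values_by_page` is a hypothetical helper, not an SDK method.

```ruby
# Stand-ins so the sketch runs without the SDK (assumed shapes, not SDK classes).
DemoValue = Struct.new(:value)
DemoPrediction = Struct.new(:name, keyword_init: true)
DemoPage = Struct.new(:page_id, :prediction, keyword_init: true)

# Hypothetical helper: maps each page ID to the value of one page-level field.
def values_by_page(pages, field_name)
  pages.to_h do |page|
    [page.page_id, page.prediction.public_send(field_name).value]
  end
end

pages = [
  DemoPage.new(
    page_id: 0,
    prediction: DemoPrediction.new(name: DemoValue.new('Stephen W Hawking'))
  )
]
p values_by_page(pages, :name)
```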

Attributes

The following fields are extracted for US W9 V1:

Address

📄address (StringField): The street address (number, street, and apt. or suite no.) of the applicant.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.address.value
end

Business Name

📄business_name (StringField): The business name or disregarded entity name, if different from Name.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.business_name.value
end

City State Zip

📄city_state_zip (StringField): The city, state, and ZIP code of the applicant.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.city_state_zip.value
end

EIN

📄ein (StringField): The employer identification number.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.ein.value
end

Name

📄name (StringField): Name as shown on the applicant's income tax return.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.name.value
end

Signature Date Position

📄signature_date_position (PositionField): Position of the signature date on the document.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.signature_date_position.polygon
end

Signature Position

📄signature_position (PositionField): Position of the signature on the document.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.signature_position.polygon
end
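
A position field can come back empty, as the Signature Date Position does in the sample output. The sketch below is one possible way to test whether a position was detected. `DemoPosition` is a stand-in, `detected?` is a hypothetical helper, and the coordinates are invented. An empty position only means nothing was located, not that the form is unsigned.

```ruby
# Stand-in for a position field (assumption: real fields expose polygon).
DemoPosition = Struct.new(:polygon)

# Hypothetical helper: true when the field carries at least one polygon vertex.
def detected?(position_field)
  polygon = position_field.polygon
  !polygon.nil? && !polygon.empty?
end

# Illustrative values: a four-point polygon and an empty one.
signature = DemoPosition.new([[0.1, 0.8], [0.5, 0.8], [0.5, 0.9], [0.1, 0.9]])
signature_date = DemoPosition.new([])

puts detected?(signature)
puts detected?(signature_date)
```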

SSN

📄ssn (StringField): The applicant's social security number.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.ssn.value
end
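
In the sample output, the SSN and EIN are returned as bare nine-digit strings. If a display format is needed, a small post-processing step along the following lines could be used. `format_ssn` and `format_ein` are hypothetical helpers, not SDK methods. They apply the conventional AAA-GG-SSSS and XX-XXXXXXX groupings and return nil for anything that does not contain exactly nine digits.

```ruby
# Hypothetical helper: formats a 9-digit SSN as AAA-GG-SSSS, or returns nil.
def format_ssn(raw)
  digits = raw.to_s.delete('^0-9') # keep digits only
  return nil unless digits.length == 9

  "#{digits[0, 3]}-#{digits[3, 2]}-#{digits[5, 4]}"
end

# Hypothetical helper: formats a 9-digit EIN as XX-XXXXXXX, or returns nil.
def format_ein(raw)
  digits = raw.to_s.delete('^0-9') # keep digits only
  return nil unless digits.length == 9

  "#{digits[0, 2]}-#{digits[2, 7]}"
end

# Values taken from the sample output above.
puts format_ssn('560758145')
puts format_ein('942203664')
```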

Tax Classification

📄tax_classification (StringField): The federal tax classification, which can vary depending on the revision date.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.tax_classification.value
end

Tax Classification LLC

📄tax_classification_llc (StringField): Depending on revision year, among S, C, P or D for Limited Liability Company Classification.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.tax_classification_llc.value
end

Tax Classification Other Details

📄tax_classification_other_details (StringField): Tax Classification Other Details.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.tax_classification_other_details.value
end

W9 Revision Date

📄w9_revision_date (StringField): The revision month and year of the W9 form.

# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.w9_revision_date.value
end
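
Since the revision date is a plain string, it may be useful to turn it into a Date for comparisons. The sketch below assumes the value follows the "month year" shape seen in the sample output ("august 2013"); other shapes simply yield nil. `parse_revision_date` is a hypothetical helper, not an SDK method.

```ruby
require 'date'

# Hypothetical helper: parses a "month year" string into the first day of
# that month, or returns nil when the string is empty or does not match.
def parse_revision_date(raw)
  return nil if raw.nil? || raw.strip.empty?

  # Capitalize each word so "august 2013" becomes "August 2013".
  normalized = raw.strip.split.map(&:capitalize).join(' ')
  Date.strptime(normalized, '%B %Y')
rescue ArgumentError
  # Date.strptime raises an ArgumentError (or subclass) on unparseable input.
  nil
end

# Value taken from the sample output above.
puts parse_revision_date('august 2013')
```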

Questions?

Join our Slack