US W9 OCR Ruby
Using the sample below, this page shows how to extract the data you need with the OCR SDK.
Quick-Start
require 'mindee'
# Init a new client
mindee_client = Mindee::Client.new(api_key: 'my-api-key')
# Load a file from disk
input_source = mindee_client.source_from_path('/path/to/the/file.ext')
# Parse the file
result = mindee_client.parse(
  input_source,
  Mindee::Product::US::W9::W9V1
)
# Print a full summary of the parsed data in RST format
puts result.document
# Print the document-level parsed data
# puts result.document.inference.prediction
Output (RST):
########
Document
########
:Mindee ID: d7c5b25f-e0d3-4491-af54-6183afa1aaab
:Filename: default_sample.jpg
Inference
#########
:Product: mindee/us_w9 v1.0
:Rotation applied: Yes
Prediction
==========
Page Predictions
================
Page 0
------
:Name: Stephen W Hawking
:SSN: 560758145
:Address: Somewhere In Milky Way
:City State Zip: Probably Still At Cambridge P O Box CB1
:Business Name:
:EIN: 942203664
:Tax Classification: individual
:Tax Classification Other Details:
:W9 Revision Date: august 2013
:Signature Position: Polygon with 4 points.
:Signature Date Position:
:Tax Classification LLC:
Field Types
Standard Fields
These fields are generic and used in several products.
Basic Field
Each prediction object contains a set of fields that inherit from the generic Field class.
A typical Field object will have the following attributes:
- value (String, Float, Integer, Boolean): corresponds to the field value. Can be nil if no value was extracted.
- confidence (Float, nil): the confidence score of the field prediction.
- bounding_box (Mindee::Geometry::Quadrilateral, nil): contains exactly 4 relative vertices (points) coordinates of a right rectangle containing the field in the document.
- polygon (Mindee::Geometry::Polygon, nil): contains the relative vertices coordinates (Point) of a polygon containing the field in the image.
- page_id (Integer, nil): the ID of the page, is nil when at document-level.
- reconstructed (Boolean): indicates whether or not an object was reconstructed (not extracted as the API gave it).
Aside from the previous attributes, all basic fields have access to a to_s method that can be used to print their value as a string.
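Because value and confidence can both be nil, downstream code usually needs a guard before it uses a field. The following is a minimal sketch of such a guard, not part of the SDK: SketchField is a hypothetical stand-in that only mirrors the attributes listed above, usable_value is a hypothetical helper, and the 0.8 threshold and the confidence numbers are arbitrary illustrative values.

```ruby
# Hypothetical stand-in for an SDK field; it only mirrors the attributes
# described above and is NOT the real Mindee class.
SketchField = Struct.new(:value, :confidence, :page_id, keyword_init: true)

# Returns the field's value when it is present and its confidence meets the
# threshold; otherwise returns the fallback.
# The 0.8 default threshold is an arbitrary assumption.
def usable_value(field, min_confidence: 0.8, fallback: nil)
  return fallback if field.nil? || field.value.nil?
  return fallback if field.confidence.nil? || field.confidence < min_confidence

  field.value
end

# Illustrative data only; the confidence scores are made up.
name = SketchField.new(value: 'Stephen W Hawking', confidence: 0.97, page_id: 0)
business_name = SketchField.new(value: nil, confidence: 0.0, page_id: 0)

puts usable_value(name)
puts usable_value(business_name, fallback: '(none)')
```

With real results, the same helper could presumably be handed any object that responds to value and confidence.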
Position Field
The position field PositionField does not implement all the basic Field attributes, only bounding_box, polygon and page_id. On top of these, it has access to:
- rectangle (Mindee::Geometry::Quadrilateral): a Polygon with four points that may be oriented (even beyond canvas).
- quadrangle (Mindee::Geometry::Quadrilateral): a free polygon made up of four points.
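The geometry attributes hold relative coordinates, so they have to be scaled before they can be drawn on the source image. The sketch below shows one way to do that. It assumes that relative coordinates lie in the 0..1 range and that each point can be read as an [x, y] pair; to_pixels is a hypothetical helper, and plain arrays stand in for the SDK's geometry classes.

```ruby
# Scales relative [x, y] pairs (assumed to be in the 0..1 range) to integer
# pixel coordinates for an image of the given size.
def to_pixels(relative_points, image_width, image_height)
  relative_points.map do |x, y|
    [(x * image_width).round, (y * image_height).round]
  end
end

# Made-up four-point polygon, for illustration only.
signature_polygon = [[0.1, 0.8], [0.5, 0.8], [0.5, 0.9], [0.1, 0.9]]

# Print the pixel coordinates for a 1000 x 2000 pixel image.
p to_pixels(signature_polygon, 1000, 2000)
```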
String Field
The text field StringField only has one constraint: its value is a String (or nil).
Page-Level Fields
Some fields are constrained to the page level, and so will not be retrievable through the document.
Attributes
The following fields are extracted for US W9 V1:
Address
📄address (StringField): The street address (number, street, and apt. or suite no.) of the applicant.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.address.value
end
Business Name
📄business_name (StringField): The business name or disregarded entity name, if different from Name.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.business_name.value
end
City State Zip
📄city_state_zip (StringField): The city, state, and ZIP code of the applicant.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.city_state_zip.value
end
EIN
📄ein (StringField): The employer identification number.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.ein.value
end
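The sample output above shows the EIN as nine bare digits (942203664), while EINs are conventionally written as XX-XXXXXXX. The following sketch normalizes an extracted value into that form; format_ein is a hypothetical helper, not an SDK method, and it assumes the extracted value may or may not already contain separators.

```ruby
# Formats an EIN as XX-XXXXXXX.
# Returns nil when the input does not contain exactly nine digits.
def format_ein(raw)
  digits = raw.to_s.gsub(/\D/, '') # drop hyphens, spaces and other non-digits
  return nil unless digits.length == 9

  "#{digits[0, 2]}-#{digits[2, 7]}"
end

puts format_ein('942203664')
```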
Name
📄name (StringField): Name as shown on the applicant's income tax return.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.name.value
end
Signature Date Position
📄signature_date_position (PositionField): Position of the signature date on the document.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.signature_date_position.polygon
end
Signature Position
📄signature_position (PositionField): Position of the signature on the document.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.signature_position.polygon
end
SSN
📄ssn (StringField): The applicant's social security number.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.ssn.value
end
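An SSN is sensitive, so it is often masked before being logged or displayed. The sketch below keeps only the last four digits, using the conventional XXX-XX-XXXX grouping; mask_ssn is a hypothetical helper, not an SDK method.

```ruby
# Masks an SSN, keeping only its last four digits (***-**-1234 style).
# Returns nil when the input does not contain exactly nine digits.
def mask_ssn(raw)
  digits = raw.to_s.gsub(/\D/, '') # drop hyphens, spaces and other non-digits
  return nil unless digits.length == 9

  "***-**-#{digits[-4, 4]}"
end

puts mask_ssn('560758145')
```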
Tax Classification
📄tax_classification (StringField): The federal tax classification, which can vary depending on the revision date.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.tax_classification.value
end
Tax Classification LLC
📄tax_classification_llc (StringField): Depending on revision year, among S, C, P or D for Limited Liability Company Classification.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.tax_classification_llc.value
end
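Since the LLC classification is described as one of S, C, P or D, an extracted value can be sanity-checked against that set before it is stored. A minimal sketch, assuming the extracted value may differ in case or carry stray whitespace; LLC_CODES and valid_llc_code? are hypothetical names, not part of the SDK.

```ruby
# The four codes named in the field description above.
LLC_CODES = %w[S C P D].freeze

# Returns true when the value, ignoring case and surrounding whitespace,
# is one of the expected LLC classification codes.
def valid_llc_code?(raw)
  return false if raw.nil?

  LLC_CODES.include?(raw.to_s.strip.upcase)
end

puts valid_llc_code?('s')
puts valid_llc_code?('X')
```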
Tax Classification Other Details
📄tax_classification_other_details (StringField): Tax Classification Other Details.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.tax_classification_other_details.value
end
W9 Revision Date
📄w9_revision_date (StringField): The Revision month and year of the W9 form.
# Page-level field: read it from each page's prediction
result.document.inference.pages.each do |page|
  puts page.prediction.w9_revision_date.value
end
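The revision date comes back as free text (for example "august 2013" in the sample output above), so comparing revisions is easier once it is converted to a Date. The sketch below does this with Ruby's standard date library. It assumes the value is an English month name followed by a four-digit year; parse_revision_date is a hypothetical helper, not an SDK method.

```ruby
require 'date'

# Parses a "month year" string such as "august 2013" into a Date set to the
# first day of that month. Returns nil when the text cannot be parsed.
def parse_revision_date(raw)
  return nil if raw.nil?

  # Capitalize so the month name has the usual "August" casing.
  Date.strptime(raw.to_s.strip.capitalize, '%B %Y')
rescue ArgumentError
  nil
end

puts parse_revision_date('august 2013')
```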