Custom OCR Ruby

The Ruby OCR SDK supports custom-built APIs created with the API Builder. If your document isn't covered by one of Mindee's Off-the-Shelf APIs, you can create your own API using the API Builder.

For the following examples, we use our own custom W9 API, created with the API Builder.

📘 Info

We used a data model that may be different from yours. To adapt the examples to your own custom API, change the arguments of the config_custom_doc call.

require 'mindee'

# Init a new client and configure your custom document
mindee_client = Mindee::Client.new.config_custom_doc(
  'wsnine',
  'john',
  api_key: 'w9-form-api-key', # optional, can be set in environment
  version: '1.1' # optional, if not set, use the latest version of the model
)

# Load a file from disk and parse it
w9_data = mindee_client.doc_from_path('/path/to/file.pdf').parse('wsnine')

# Print a brief summary of the parsed data
puts w9_data.document.to_s

If the version argument is set, you'll need to update it every time a new model is trained. Pinning a version is probably not needed for development, but it is essential for production use.

Environment Variables

API keys should be set as environment variables, especially for any production deployment. The environment variables can also be used for basic logging at various levels.

The format of the API key variable name is:

MINDEE_<username>_<document_type>_API_KEY

where <username> and <document_type> are uppercase, with any - replaced by _.

For the example above, the environment variable will be:

export MINDEE_JOHN_WSNINE_API_KEY="w9-form-api-key"
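The naming rule above can be sketched as a small helper. This is only an illustration: mindee_api_key_env_name is a hypothetical function, not part of the Mindee SDK, and it simply applies the uppercase and hyphen-to-underscore rule described in this section.

```ruby
# Hypothetical helper (not part of the Mindee SDK): builds the variable
# name from an account name and a document type, following the rule above.
def mindee_api_key_env_name(username, document_type)
  # Uppercase each part and replace every '-' with '_'
  parts = [username, document_type].map { |part| part.upcase.tr('-', '_') }
  "MINDEE_#{parts.join('_')}_API_KEY"
end

puts mindee_api_key_env_name('john', 'wsnine')
# prints MINDEE_JOHN_WSNINE_API_KEY
```

A helper like this can be handy in deployment scripts to check that the expected variable is defined before starting the application.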

Parsing Documents

Loading a file with doc_from_path returns an object that can be sent to the API. Calling the parse method on that object sends the document and returns the response. The document type must be specified when calling the parse method.

w9_data = mindee_client.doc_from_path('/path/to/custom_file').parse('wsnine')
puts w9_data.document

📘 Info

If your custom document has the same name as an off-the-shelf API document, you must specify your account name when calling the parse method:

mindee_client = Mindee::Client.new.config_custom_doc(
  'receipt',
  'john'
)

receipt_data = mindee_client.doc_from_path('/path/to/receipt.jpg')
                            .parse('receipt', username: 'john')

Response Objects

The response object is common to all documents, including custom documents. The main properties are:

Document Level Prediction

The document attribute contains a CustomDocument object which contains the data extracted from the entire document, all pages included.

It's possible to have the same field in various pages, but at the document level only the highest confidence field data will be shown (this is all done automatically at the API level).

# as object, complete
pp w9_data.document

# as string, summary
puts w9_data.document
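The selection of the highest-confidence value happens on the API side, so no client code is needed for it. Purely to illustrate the rule, here is a minimal sketch; PageCandidate and best_candidate are hypothetical names, the sample values are made up, and none of this reflects the SDK's internal classes.

```ruby
# Stand-in for one page-level reading of a field; not an SDK class.
PageCandidate = Struct.new(:page, :content, :confidence)

# Keep the reading with the highest confidence score.
# Returns nil when there are no candidates.
def best_candidate(candidates)
  candidates.max_by(&:confidence)
end

# Made-up sample data: the same field read on two different pages.
candidates = [
  PageCandidate.new(0, 'Jane Doe', 0.82),
  PageCandidate.new(1, 'Jane D0e', 0.41)
]

best = best_candidate(candidates)
puts "#{best.content} (page #{best.page}, confidence #{best.confidence})"
```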

Page Level Prediction

The pages attribute is an array of CustomDocument objects, each holding the data extracted from one page of the document. All response objects have this property, regardless of the number of pages. Single-page documents will have a single entry.

Iteration is done like any Ruby array:

w9_data.pages.each do |page|
    # as object, complete
    pp page

    # as string, summary
    puts page
end

Raw HTTP Response

The http_response attribute contains the full Mindee API HTTP response in JSON format.

puts w9_data.http_response

Document Fields

All the fields defined in the API builder when creating your custom document are available.

In custom documents, each field will hold an array of all the words in the document which are related to that field.
Each word is an object that has the text content, geometry information, and confidence score.

Value fields can be accessed either via the fields attribute, or as their own attributes set at run-time.

Classification fields can be accessed either via the classifications attribute, or as their own attributes set at run-time.

📘 Info

Both document level and page level objects work in the same way.

Run-time Attributes

Individual field values can be accessed simply by using the field's API name. In the examples below, we'll use the address field.

# raw data, list of each word object
puts w9_data.document.address.values

# list of all values
puts w9_data.document.address.contents_list

# default string representation
puts w9_data.document.address.to_s

# custom string representation
puts w9_data.document.address.contents_str(separator: '_')
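To make the relationship between these accessors easier to picture, here is a minimal stand-in, not the SDK's actual implementation. It assumes that contents_list collects the text of each word and that contents_str joins those texts with the separator; AddressWord, SketchListField, the default separator, and the sample words are all hypothetical.

```ruby
# Stand-ins for illustration only; the real classes live in the mindee gem.
AddressWord = Struct.new(:content, :confidence)

class SketchListField
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Text of every word, in document order.
  def contents_list
    values.map(&:content)
  end

  # Words joined with the given separator (the default is an assumption).
  def contents_str(separator: ' ')
    contents_list.join(separator)
  end

  def to_s
    contents_str
  end
end

# Made-up sample words for an address field.
address = SketchListField.new(
  [AddressWord.new('12', 0.99), AddressWord.new('Main', 0.97), AddressWord.new('St', 0.95)]
)

puts address.contents_list.inspect
puts address.contents_str(separator: '_')
puts address
```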

Fields Property

In addition to accessing a value field directly, it's possible to access it through the fields attribute.
It's a hashmap with the following structure:

  • key: the API name of the field, as a symbol
  • value: a ListField object which has a values attribute, containing a list of all values found for the field.

# raw data, list of each word object
puts w9_data.document.fields[:address].values

This makes it simple to iterate over all the fields:

w9_data.document.fields.each do |name, info|
  puts name
  puts info.values
end
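Since each word carries a confidence score, the same kind of iteration can be used to flag fields that may need a manual review. The sketch below runs on stand-in objects rather than a real API response; FieldWord, SketchField, low_confidence_fields, the attribute names, the threshold, and the sample data are all assumptions made for illustration.

```ruby
# Stand-ins for illustration only; attribute names are assumptions.
FieldWord = Struct.new(:content, :confidence)
SketchField = Struct.new(:values)

# Return the names of the fields containing at least one word whose
# confidence is below the threshold.
def low_confidence_fields(fields, threshold: 0.5)
  fields.select do |_name, field|
    field.values.any? { |word| word.confidence < threshold }
  end.keys
end

# Made-up sample data shaped like the fields hashmap described above.
fields = {
  name: SketchField.new([FieldWord.new('Jane', 0.98), FieldWord.new('Doe', 0.96)]),
  address: SketchField.new([FieldWord.new('12', 0.91), FieldWord.new('Ma1n', 0.32)])
}

puts low_confidence_fields(fields).inspect
```

The threshold is arbitrary here; a suitable value depends on your model and on how costly a wrong value is in your application.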

Classifications Property

In addition to accessing a classification field directly, it's possible to access it through the classifications attribute.
It's a hashmap with the following structure:

  • key: the API name of the field, as a symbol
  • value: a ClassificationField object which has a value attribute, containing a string representation of the detected classification.

# string value of the detected classification
puts w9_data.document.classifications[:doc_type].value

This makes it simple to iterate over all the classifications:

w9_data.document.classifications.each do |name, info|
  puts name
  puts info.value
end

 

Questions?
Join our Slack