This guide will help you get started with the Mindee Ruby client library to easily extract data from your documents.
You can view the source code on GitHub.
Download and install Ruby. This library is officially supported on Ruby
To quickly get started with the Ruby client library, install it by adding this line to your application's Gemfile:
gem 'mindee'
And then execute:
bundle install
Or you can install it like this:
gem install mindee
Finally, Ruby away!
If you'll be modifying the source code, you'll need to install the required libraries to get started.
We recommend using Bundler.
- First clone the repo.
git clone git@github.com:mindee/mindee-api-ruby.git
- Navigate to the cloned directory and install all required libraries.
cd mindee-api-ruby
bundle install
It is important to always check the version of the Mindee client library you are using, as new and updated features won’t work on old versions.
To upgrade the Mindee Ruby client library to the latest version, re-install the gem without specifying any version number.
gem install mindee
To upgrade the Mindee Ruby client library to a specific version, re-install the gem and specify the version number.
gem install mindee -v <version>
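To check which version is in use before relying on a newer feature, a comparison along these lines may help. This is a minimal sketch: `version_at_least?` is a hypothetical helper and the version numbers are made up for illustration. It relies only on RubyGems' `Gem::Version` and `Gem.loaded_specs`.

```ruby
require 'rubygems'

# Hypothetical helper: true when the installed version meets the minimum.
def version_at_least?(installed, minimum)
  Gem::Version.new(installed) >= Gem::Version.new(minimum)
end

# Look up the version of the mindee gem, if it is loaded in this process.
mindee_spec = Gem.loaded_specs['mindee']
installed = mindee_spec ? mindee_spec.version.to_s : nil

if installed
  puts "mindee #{installed} is loaded"
else
  puts 'mindee is not loaded in this process'
end

# Gem::Version compares segment by segment, so 1.10.0 is newer than 1.9.0
# (a plain string comparison would get this wrong).
puts version_at_least?('1.10.0', '1.9.0')
```

From the command line, `gem list mindee` shows the installed versions.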
To get started with Mindee's APIs, you need to create a Client and you're ready to go.
Let's take a deep dive into how this works.
The Client centralizes document configurations in a single object. Documents are added to the Client using a config_xxx method. Since each config_xxx method returns the current Client object, you can chain all the calls together.
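The chaining behavior comes from each configuration method returning the object itself. The sketch below illustrates that pattern with a hypothetical `SketchClient` class. It is a stand-in written for this explanation, not the Mindee::Client implementation, and it makes no API calls.

```ruby
# Hypothetical stand-in for illustration only; this is not Mindee::Client.
class SketchClient
  attr_reader :doc_configs

  def initialize(api_key: nil)
    @api_key = api_key
    @doc_configs = {}
  end

  # Each config_* method records a document type and returns self,
  # which is what allows the calls to be chained.
  def config_invoice(api_key: nil)
    @doc_configs[:invoice] = api_key || @api_key
    self
  end

  def config_receipt(api_key: nil)
    @doc_configs[:receipt] = api_key || @api_key
    self
  end
end

sketch_client = SketchClient.new(api_key: 'my-api-key')
                            .config_invoice
                            .config_receipt(api_key: 'receipt-api-key')

# Both document types end up registered on the same object.
puts sketch_client.doc_configs.keys.inspect
```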
The Client requires your API key. You can either pass it directly to the constructor or set it through environment variables. You only need to specify the API keys for the document endpoints you'll be using.
There are three ways to add documents to the client using the
You can have a separate client for each document. If you have only a single document type you're working with, this is the easiest way to get started.
require 'mindee'

# Init a new client and configure the Invoice API
mindee_client = Mindee::Client.new(api_key: 'my-api-key').config_invoice

# Load a file from disk and parse it
api_response = mindee_client.doc_from_path("/path/to/the/invoice.pdf").parse("invoice")

# Print a brief summary of the parsed data
puts api_response.document.to_s
You can have all your documents configured in the same client. If you're working with multiple document types, this is the easiest way to get started. Since each config_xxx method returns the current client object, you can chain all the calls together.
You can also pass an API key for a specific document.
require 'mindee'

mindee_client = Mindee::Client.new(api_key: 'my-api-key')
mindee_client = mindee_client.config_invoice
                             .config_receipt(api_key: 'receipt-api-key')
                             .config_passport
                             .config_financial_doc
                             .config_custom_doc('wsnine', 'john')
You can also mix and match. This approach is useful if you have a group of documents that needs to be handled in different ways.
require 'mindee'

financial_doc_client = Mindee::Client.new(api_key: 'my-api-key-1', raise_on_error: true)
                                     .config_financial_doc

passport_client = Mindee::Client.new(api_key: 'my-api-key-2', raise_on_error: false)
                                .config_passport
API keys should be set as environment variables, especially for any production deployment.
The following environment variable will set the global API key:
This is generally all you need to do; all your APIs will then work.
However you can also set specific keys as needed, much like the
For off-the-shelf APIs, here are the environment variables for the API keys you can set:
MINDEE_INVOICE_API_KEY="invoice-api-key"
MINDEE_RECEIPT_API_KEY="receipt-api-key"
MINDEE_PASSPORT_API_KEY="passport-api-key"
financial_doc is a mixed data flow of invoices and receipts. You'll need an API key for both receipt and invoice endpoints.
For custom documents, you can also set the environment variables for the API keys. From the example above, we will have:
Order in which keys are applied:
- set in
- set in
- set in MINDEE_XXXX_API_KEY specific environment variable
- set in
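Whatever the exact sources are, applying keys in order comes down to taking the first candidate that is actually set. The sketch below shows that idea only. `first_present_key` is a hypothetical helper, the ordering of the candidates (a key passed in code, then a document-specific variable, then a global one) is an assumption, and `MINDEE_API_KEY` is an assumed name for the global variable. None of this is taken from the library's source.

```ruby
# Hypothetical helper, not part of the Mindee library:
# returns the first candidate that is neither nil nor blank.
def first_present_key(*candidates)
  candidates.find { |key| key && !key.strip.empty? }
end

# A plain Hash stands in for ENV so the sketch has no side effects.
fake_env = {
  'MINDEE_INVOICE_API_KEY' => 'invoice-api-key',
  'MINDEE_API_KEY' => 'global-api-key' # assumed name for the global variable
}

explicit_key = nil # nothing was passed in code

# Assumed order: explicit key, then specific variable, then global variable.
resolved = first_present_key(
  explicit_key,
  fake_env['MINDEE_INVOICE_API_KEY'],
  fake_env['MINDEE_API_KEY']
)

puts resolved
```

With no explicit key given, the document-specific value is picked ahead of the global one.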
When parsing your document, the client calls the parse method, which returns an object that you can serialize to the API. The document parse type must be specified when calling the parse method. The object containing the parsed data will be an attribute of the response object.
The different ways you can load and parse your data are through:
This requires an absolute path, as a string.
invoice_response = mindee_client.doc_from_path("/path/to/the/invoice").parse("invoice")

# Print a summary of the parsed data
puts invoice_response.document.to_s
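Since an absolute path is expected, a relative path can be resolved first with Ruby's standard `File.expand_path`. This is a small sketch; the directory and file names are hypothetical.

```ruby
require 'pathname'

# Hypothetical locations, for illustration only.
base_dir = '/data/documents'
relative = 'invoices/2022/invoice.pdf'

# Resolve the relative path against a known base directory.
# Without the second argument, the current working directory is used.
absolute = File.expand_path(relative, base_dir)
puts absolute

# Pathname#absolute? gives a quick sanity check before loading the file.
puts Pathname.new(absolute).absolute?
```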
A normal Ruby file object with a path. Must be in binary mode.
# INVOICE_FILE is assumed to hold the path to the document
invoice_response = nil
File.open(INVOICE_FILE, 'rb') do |fo|
  invoice_response = mindee_client.doc_from_file(fo, "invoice").parse("invoice")
end

# Print a summary of the parsed data
puts invoice_response.document.to_s
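To see what binary mode changes, the sketch below writes a few non-text bytes to a temporary file and reads them back with the 'rb' mode. It uses only Ruby's standard library. The temporary file stands in for a real document.

```ruby
require 'tempfile'

# Write a few non-text bytes to a temporary file to stand in for a document.
sample = Tempfile.new(['sample', '.bin'])
sample.binmode
sample.write("\xFF\xD8\xFF\xE0".b)
sample.close

# Opening with 'rb' yields binary (ASCII-8BIT) strings and disables
# newline conversion, so the bytes read back match what is on disk.
content = nil
File.open(sample.path, 'rb') do |fo|
  content = fo.read
end

puts content.encoding
puts content.bytesize

# Remove the temporary file once done.
sample.unlink
```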
Requires a base64 encoded string.
Note: The original filename of the encoded file is required when calling the method.
b64_string = "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLD...."
receipt_response = mindee_client.doc_from_b64string(b64_string, "receipt.jpg").parse("receipt")

# Print a summary of the parsed data
puts receipt_response.document.to_s
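If the document is only available as bytes, a base64 string can be produced with core Ruby. The sketch below uses a short stand-in for the first bytes of a JPEG file. In practice the bytes would come from the real document, for example via `File.binread`.

```ruby
# Stand-in for the first bytes of a JPEG file, for illustration only.
image_bytes = "\xFF\xD8\xFF\xE0\x00\x10JFIF".b

# The 'm0' directive produces strict Base64 with no line breaks,
# equivalent to Base64.strict_encode64 from the base64 library.
b64_string = [image_bytes].pack('m0')
puts b64_string

# Decoding gives back the original bytes.
decoded = b64_string.unpack1('m0')
puts decoded == image_bytes
```

The encoded result begins with the same "/9j/4AAQSkZJRg" prefix as the truncated string in the example above, which is typical of JPEG data.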
Requires raw bytes.
Note: The original filename is required when calling the method.
# String#b returns a copy of the string with binary (ASCII-8BIT) encoding
raw_bytes = "%PDF-1.3\n%\xbf\xf7\xa2\xfe\n1 0 ob...".b
invoice_response = mindee_client.doc_from_bytes(raw_bytes, "invoice.pdf").parse("invoice")

# Print a summary of the parsed data
puts invoice_response.document.to_s
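Before sending raw bytes, it can be useful to confirm that the string is binary and looks like the expected file type. The sketch below is one way to do that for PDFs. `looks_like_pdf?` is a hypothetical helper, not a library method, and the bytes are a truncated stand-in.

```ruby
# Hypothetical helper: checks for the "%PDF-" marker at the start of the data.
def looks_like_pdf?(bytes)
  bytes.start_with?('%PDF-')
end

# Stand-in for the start of a PDF file (truncated, illustrative only).
raw_bytes = "%PDF-1.3\n%\xBF\xF7\xA2\xFE\n".b

puts raw_bytes.encoding # binary rather than UTF-8
puts raw_bytes.bytesize
puts looks_like_pdf?(raw_bytes)
```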