This guide will help you get the most out of the Mindee .NET OCR SDK to easily extract data from your documents.

Installation

Prerequisites

This library is compatible with:

  • .NET Framework 4.7.2
  • .NET 6+

You'll also need NuGet for installing the package.

Standard Installation

Using the .NET Core command-line interface (CLI) tools:

dotnet add package Mindee

Or using the NuGet Command Line Interface (CLI):

nuget install Mindee

Or using the Package Manager Console:

Install-Package Mindee

Development Installation

If you plan to update the source code, you'll need to follow these steps to get started.

  1. First, clone the repo.
git clone git@github.com:mindee/mindee-api-dotnet.git
  2. Navigate to the cloned directory and install all required libraries.
dotnet restore

Updating the Library

It is important to always check the version of the Mindee OCR SDK you are using, as new and updated
features won’t work on older versions.

To get the latest version of the SDK, re-run the add command without a version. It updates the package reference to the latest stable release:

dotnet add package Mindee

To install a specific version of Mindee:

dotnet add package Mindee -v <VERSION>

Usage

Using Mindee's APIs can be broken down into the following steps:

  1. Get a MindeeClient
  2. Load a file
  3. Send the file to Mindee's API
  4. Process the result in some way

Let's take a deep dive into how this works.

Initializing the Client

The MindeeClient enables you to load a document and execute the parse method on it, according to a specific model.

In most cases, you'll just need to pass MindeeClient as a constructor parameter of your class, and your DI container will do the rest.

However, you will need to declare the MindeeClient in your Startup.cs or Program.cs file as follows:

services.AddMindeeClient();

(Alternatively, you can instantiate MindeeClient yourself. In that case, you must pass it an instance of IOptions<MindeeSettings> so that it can read the API key.)

The AddMindeeClient() call configures the client entry point and the PDF library used internally.

Do not forget to initialize your API key.

You can pass the value through application arguments, environment variables, or app settings.

Set the API key in the environment

API keys should be set as environment variables, especially for any production deployment.

The following environment variable will set the global API key:

MindeeApiSettings__ApiKey="my-api-key"

You could also define the key in your appsettings.json config file:

{
  "$schema": "https://json.schemastore.org/appsettings.json",
  "MindeeApiSettings": {
    "ApiKey": "my-api-key"
  }
}
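
Because a missing key typically only shows up on the first API call, it can help to check for it when the application starts. The following is a minimal sketch that uses only the .NET base class library. The MindeeApiKey class and its Require method are hypothetical helpers written for this illustration and are not part of the Mindee SDK; the variable name is the one given above.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Mindee SDK.
public static class MindeeApiKey
{
    // Environment variable name taken from this guide.
    public const string VariableName = "MindeeApiSettings__ApiKey";

    // readVariable is injected so the check can be exercised without touching
    // the real process environment.
    public static string Require(Func<string, string> readVariable)
    {
        string key = readVariable(VariableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"Environment variable '{VariableName}' is missing or empty.");
        }
        return key;
    }
}
```

At startup, this could be called as MindeeApiKey.Require(Environment.GetEnvironmentVariable), which fails fast with a clear message when the key is not set.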

Loading a Document File

A document must be loaded before it can be sent to the API.

You don't need to worry about different MIME types; the library handles
all supported types automatically.

Once a document is loaded, interacting with it is done in exactly the same way, regardless
of how it was loaded.

There are a few different ways of loading a document file, depending on your use case:

Path

Load a file directly from disk. Requires an absolute path, as a string.

// LoadDocument is chained with ParseAsync elsewhere in this guide, so it is not awaited here.
var mindeeClient = _mindeeClient
    .LoadDocument(File.OpenRead(Path), System.IO.Path.GetFileName(Path));
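
The stream and file name passed to the call above can be prepared with a small helper. This is only a sketch built on the .NET base class library: DocumentFile and Open are hypothetical names, not SDK types, and Path.IsPathRooted is a rough stand-in for an "absolute path" check.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the Mindee SDK.
public static class DocumentFile
{
    // Returns an open read stream plus the bare file name.
    // The caller owns the stream and should dispose it once the upload is done.
    public static (Stream Content, string FileName) Open(string absolutePath)
    {
        if (!Path.IsPathRooted(absolutePath))
        {
            throw new ArgumentException(
                "An absolute path is required.", nameof(absolutePath));
        }

        return (File.OpenRead(absolutePath), Path.GetFileName(absolutePath));
    }
}
```

For a path such as /data/invoices/receipt.jpg, the FileName part is receipt.jpg.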

Stream Object

Load a standard readable stream object.

Note: The original filename of the encoded file is required when calling the method.

Stream myStream = File.OpenRead(Path); // any readable stream works here
var mindeeClient = _mindeeClient
    .LoadDocument(myStream, System.IO.Path.GetFileName(Path));

Bytes

Load file contents from an array of raw bytes.

Note: The original filename of the encoded file is required when calling the method.

byte[] myFileInBytes = File.ReadAllBytes(Path);
var mindeeClient = _mindeeClient
    .LoadDocument(myFileInBytes, System.IO.Path.GetFileName(Path));
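
When the content arrives as a stream (for example from an upload) but a byte array is more convenient, the stream can be buffered first. This sketch relies only on the .NET base class library; StreamBytes and ReadAll are hypothetical names used for illustration, not SDK members.

```csharp
using System.IO;

// Hypothetical helper for illustration; not part of the Mindee SDK.
public static class StreamBytes
{
    // Copies everything from the stream's current position to its end
    // into a new byte array. The source stream is left open.
    public static byte[] ReadAll(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

Note that this holds the whole file in memory, so the stream overload is usually the better fit for large documents.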

Sending a Document

To send a file to the API, we need to specify how to process the document.
This determines which API endpoint is used and how the library handles the API response internally.

More specifically, we need to set the class that represents the values extracted by the API.

The ParseAsync method is generic, and its return type depends on that class.

Each document type available in the library has its corresponding object class.
This is detailed in each document-specific guide.

Off-the-Shelf Documents

Simply setting the correct class is enough:

var prediction = await _mindeeClient
    .LoadDocument(File.OpenRead(Path), System.IO.Path.GetFileName(Path))
    .ParseAsync<ReceiptV4Prediction>();

Custom Documents

There are two ways to handle custom documents.

The first uses a class that acts as a kind of dictionary, whose keys are
the names of the fields defined in your custom API model (on the Mindee platform).

It also requires you to instantiate a CustomEndpoint object that describes your custom API.

CustomEndpoint myEndpoint = new CustomEndpoint(
    endpointName: "wnine",
    accountName: "john",
    version: "1.1" // optional
);

var prediction = await _mindeeClient
    .LoadDocument(new FileInfo(Path))
    .ParseAsync(myEndpoint);

Process the result

Regardless of the model, the result is encapsulated in a Document object, which has the following attributes:

Inference

Groups the predictions for each page of the document together with the prediction for the document as a whole.

Document level prediction

The prediction attribute is an object specific to the type of document being processed.
It contains the data extracted from the entire document, all pages combined.

The same field can appear on several pages, but at the document level
only the value with the highest confidence is shown (this is done automatically at the API level).

// print a summary of the document-level info
_logger.Debug(prediction.Inference.Prediction.ToString());
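
The highest-confidence rule described above is applied by the API, so no client code is needed for it. Purely to illustrate the rule, here is a self-contained sketch; FieldCandidate and DocumentLevel are hypothetical stand-in types, not Mindee SDK classes, and the real selection logic may differ in its details.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one page's reading of a field; not an SDK type.
public sealed class FieldCandidate
{
    public FieldCandidate(int pageIndex, string value, double confidence)
    {
        PageIndex = pageIndex;
        Value = value;
        Confidence = confidence;
    }

    public int PageIndex { get; }
    public string Value { get; }
    public double Confidence { get; }
}

public static class DocumentLevel
{
    // Returns the candidate with the highest confidence, or null when the
    // input is empty. OrderByDescending is stable, so ties keep the
    // earliest candidate in the input order.
    public static FieldCandidate PickBest(IEnumerable<FieldCandidate> perPage)
    {
        return perPage.OrderByDescending(c => c.Confidence).FirstOrDefault();
    }
}
```

For example, given a total of "12.50" read on page 0 with confidence 0.71 and "12.80" read on page 1 with confidence 0.93, PickBest returns the page 1 candidate.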

The various attributes are detailed in the document-specific guides.

Page level prediction

The pages attribute is a list of prediction objects of the same class as the prediction attribute.

Each page element contains the data extracted for a particular page of the document.
The order of the elements in the array matches the order of the pages in the document.

All response objects have this property, regardless of the number of pages.
Single page documents will have a single entry.

OCR

The ocr attribute is filled in by the API when withFullText is set to true.

It will contain all the words that have been read in the document.

 

Questions?
Join our Slack