Overview
This guide will help you get the most out of the Mindee .NET OCR SDK to easily extract data from your documents.
Installation
Prerequisites
This library is compatible with:
- .NET Framework 4.7.2
- .NET 6+
You'll also need NuGet for installing the package.
Standard Installation
Using the .NET Core command-line interface (CLI) tools:
dotnet add package Mindee
Or using the NuGet Command Line Interface (CLI):
nuget install Mindee
Or using the Package Manager Console:
Install-Package Mindee
Development Installation
If you plan to update the source code, you'll need to follow these steps to get started.
- First clone the repo.
git clone git@github.com:mindee/mindee-api-dotnet.git
- Navigate to the cloned directory and install all required libraries.
dotnet restore
Updating the Library
It is important to always check the version of the Mindee OCR SDK you are using, as new and updated
features won’t work on older versions.
To get the latest version of the SDK, run the add command again without a version; it upgrades the existing package reference:
dotnet add package Mindee
To install a specific version of Mindee:
dotnet add package Mindee -v <VERSION>
Usage
Using Mindee's APIs can be broken down into the following steps:
- Get a MindeeClient
- Load a file
- Send the file to Mindee's API
- Process the result in some way
Let's take a deep dive into how this works.
Initializing the Client
The MindeeClient enables you to load a document and execute the parse method on it, according to a specific model.
In most cases, you'll just need to pass MindeeClient as a constructor parameter of your class, and your DI engine will do the rest.
However, you will need to declare the MindeeClient in your Startup.cs or Program.cs file as follows:
services.AddMindeeClient();
(Alternatively, you can instantiate MindeeClient directly. In that case, you must pass it an instance of IOptions<MindeeSettings> that supplies the API key.)
The AddMindeeClient() call configures the client entry point and the PDF library used internally.
Do not forget to set your API key.
You can supply it through command-line arguments, environment variables, or app settings.
Set the API key in the environment
API keys should be set as environment variables, especially for any production deployment.
The following environment variable will set the global API key:
MindeeApiSettings__ApiKey="my-api-key"
You could also define the key in your appsettings.json config file:
{
  "$schema": "https://json.schemastore.org/appsettings.json",
  "MindeeApiSettings": {
    "ApiKey": "my-api-key"
  }
}
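If the environment variable is your chosen source for the key, a startup check can catch a missing value before any document is sent. The following is a minimal sketch using only the standard library; the ApiKeyCheck class and its members are hypothetical helpers, not part of the Mindee SDK, and the check does not apply when the key comes from appsettings.json instead.

```csharp
using System;

// Hypothetical helper, not part of the Mindee SDK.
public static class ApiKeyCheck
{
    // Name of the environment variable used in this guide.
    public const string VariableName = "MindeeApiSettings__ApiKey";

    // Returns the key, or throws if the variable is missing or blank.
    public static string RequireApiKey()
    {
        string key = Environment.GetEnvironmentVariable(VariableName);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException(
                $"Environment variable '{VariableName}' is not set.");
        }
        return key;
    }
}
```

Calling ApiKeyCheck.RequireApiKey() early in Program.cs makes a configuration mistake fail at startup rather than on the first API call.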
Loading a Document File
A document must be loaded before it can be sent to the API.
You don't need to worry about different MIME types; the library handles all supported types automatically.
Once a document is loaded, you interact with it in exactly the same way, regardless of how it was loaded.
There are a few different ways of loading a document file, depending on your use case:
Path
Load a file directly from disk. Requires an absolute path, as a string.
// LoadDocument returns the client itself so that calls can be chained; it is not awaited.
var mindeeClient = _mindeeClient
    .LoadDocument(File.OpenRead(Path), System.IO.Path.GetFileName(Path));
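Because an absolute path is required, it can help to resolve and validate the path before opening the file. This is a minimal sketch using only System.IO; the DocumentPath class is a hypothetical helper, not part of the Mindee SDK.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Mindee SDK.
public static class DocumentPath
{
    // Resolves a possibly relative path to an absolute one
    // and checks that the file exists before it is opened.
    public static string ResolveExisting(string path)
    {
        string fullPath = Path.GetFullPath(path);
        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException("Document not found.", fullPath);
        }
        return fullPath;
    }
}
```

The returned string can then be passed to File.OpenRead and System.IO.Path.GetFileName exactly as in the example above.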
Stream Object
Load a standard readable stream object.
Note: The original filename of the encoded file is required when calling the method.
Stream myStream = File.OpenRead(Path); // any readable stream works
var mindeeClient = _mindeeClient
    .LoadDocument(myStream, System.IO.Path.GetFileName(Path));
Bytes
Load file contents from an array of raw bytes.
Note: The original filename of the encoded file is required when calling the method.
byte[] myFileInBytes = new byte[] { byte.MinValue }; // placeholder content
var mindeeClient = _mindeeClient
    .LoadDocument(myFileInBytes, System.IO.Path.GetFileName(Path));
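The byte array in the example above is only a placeholder. As a rough sketch of how real content and the required original filename might be obtained together, the following uses only System.IO; the DocumentBytes class is a hypothetical helper, not part of the Mindee SDK.

```csharp
using System.IO;

// Hypothetical helper, not part of the Mindee SDK.
public static class DocumentBytes
{
    // Reads the whole file into memory and returns it together with
    // the original filename, which is required when loading from bytes.
    public static (byte[] Content, string FileName) Read(string path)
    {
        byte[] content = File.ReadAllBytes(path);
        string fileName = Path.GetFileName(path);
        return (content, fileName);
    }
}
```

The two values correspond to the two arguments passed to LoadDocument in the bytes example. Reading the whole file into memory is fine for typical documents; for very large files the stream variant may be a better fit.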
Sending a Document
To send a file to the API, we need to specify how to process the document.
This determines which API endpoint is used and how the API response is handled internally by the library.
More specifically, we need to set the class that will represent the values extracted by the API.
The ParseAsync method is generic, and its return type depends on the class passed as its type argument.
Each document type available in the library has its corresponding object class.
This is detailed in each document-specific guide.
Off-the-Shelf Documents
Simply setting the correct class is enough:
var prediction = await _mindeeClient
    .LoadDocument(File.OpenRead(Path), System.IO.Path.GetFileName(Path))
    .ParseAsync<ReceiptV4Prediction>();
Custom Documents
There are two ways to handle custom documents.
The first uses a class that acts as a kind of dictionary, whose keys are the names of the fields defined in your custom API model (on the Mindee platform).
It also requires that you instantiate a CustomEndpoint object that describes the custom API you built.
CustomEndpoint myEndpoint = new CustomEndpoint(
    endpointName: "wnine",
    accountName: "john",
    version: "1.1" // optional
);
var prediction = await _mindeeClient
    .LoadDocument(new FileInfo(Path))
    .ParseAsync(myEndpoint);
Process the result
Regardless of the model, the result is encapsulated in a Document object and therefore has the following attributes:
Inference
Groups the predictions for each page of the document with the prediction for the document as a whole.
- Prediction: Document level prediction
- Pages: Page level prediction
Document level prediction
The prediction attribute is an object specific to the type of document being processed.
It contains the data extracted from the entire document, all pages combined.
The same field may appear on several pages, but at the document level, only the field data with the highest confidence is shown (this is all done automatically at the API level).
// print a summary of the document-level info
_logger.Debug(prediction.Inference.Prediction.ToString());
The various attributes are detailed in the document-specific guides.
Page level prediction
The pages attribute is a list of prediction objects of the same class as the prediction attribute.
Each page element contains the data extracted for a particular page of the document.
The order of the elements in the array matches the order of the pages in the document.
All response objects have this property, regardless of the number of pages.
Single page documents will have a single entry.
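Since the list order matches the page order, page numbers can be derived from the position in the list. Here is a minimal, SDK-independent sketch; the PageSummaries class is a hypothetical helper, and it assumes only that the page list can be enumerated and that each element has a useful ToString().

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the Mindee SDK.
public static class PageSummaries
{
    // Yields one line per page, numbered from 1 in document order.
    public static IEnumerable<string> Describe<T>(IEnumerable<T> pages)
    {
        int pageNumber = 1;
        foreach (T page in pages)
        {
            yield return $"Page {pageNumber}: {page}";
            pageNumber++;
        }
    }
}
```

Assuming the page list is exposed as prediction.Inference.Pages (the name is inferred from this guide, not verified against the SDK), each line returned by PageSummaries.Describe(prediction.Inference.Pages) could be written to the logger in the same way as the document-level summary.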
OCR
The ocr attribute can be filled by the API when withFullText is set to true.
It will contain all the words that have been read in the document.
Questions?
Join our Slack