Invoice Splitter
What is Invoice Splitter
In production, users often have to handle multi-invoice documents without knowing where each individual invoice begins and ends. The Invoice Splitter API returns these boundaries, so that each individual invoice can then be sent to the Invoices API in its own call.
How it Works
First, subscribe to the API: go to the platform at https://platform.mindee.com/ and, under Utilities, click on the product card.
Invoice Splitter is asynchronous. It requires:
- a POST to retrieve a job_id: POST https://api.mindee.net/v1/products/mindee/invoice_splitter/v1/predict_async
- a GET using the job_id to retrieve the predictions: GET https://api.mindee.net/v1/products/mindee/invoice_splitter/v1/documents/queue/{job_id}
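The GET side of this flow can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names (`enqueue_url`, `queue_url`, `fetch_queue`, `poll_until_completed`) and the default retry count and delay are assumptions chosen for illustration. The `Authorization: Token` header and the `completed` job status follow the shell sample on this page. The upload itself (the POST) is left out because it needs a multipart body.

```python
import json
import time
import urllib.request
from typing import Callable

# Base URL taken from the two endpoints listed above.
BASE_URL = "https://api.mindee.net/v1/products/mindee/invoice_splitter/v1"


def enqueue_url() -> str:
    """URL of the POST call that returns a job_id."""
    return f"{BASE_URL}/predict_async"


def queue_url(job_id: str) -> str:
    """URL of the GET call that returns the predictions for a job_id."""
    return f"{BASE_URL}/documents/queue/{job_id}"


def fetch_queue(job_id: str, api_key: str) -> dict:
    """Perform one GET on the queue endpoint and decode the JSON body."""
    request = urllib.request.Request(
        queue_url(job_id),
        headers={"Authorization": f"Token {api_key}"},
    )
    # urlopen follows HTTP redirects by default.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_until_completed(
    job_id: str,
    fetch: Callable[[str], dict],
    max_retries: int = 10,  # hypothetical default
    delay: float = 6.0,  # hypothetical default, in seconds
) -> dict:
    """Call fetch(job_id) until the job status is "completed"."""
    for _ in range(max_retries):
        time.sleep(delay)
        payload = fetch(job_id)
        if payload.get("job", {}).get("status") == "completed":
            return payload
    raise TimeoutError(f"job {job_id} not completed after {max_retries} tries")
```

With a real key, the `fetch` argument could be `lambda job_id: fetch_queue(job_id, api_key)`. Passing the fetch function in as a parameter keeps the polling loop testable without network access.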
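Note that you can use one of the SDKs instead of calling these endpoints directly. The samples below cover each SDK, plus a plain shell script: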
Python
from mindee import Client, product
from mindee.parsing.common import AsyncPredictResponse

# Init a new client
mindee_client = Client(api_key="my-api-key-here")

# Load a file from disk
input_doc = mindee_client.source_from_path("/path/to/the/file.ext")

# Enqueue the file and wait for the parsed result
result: AsyncPredictResponse = mindee_client.enqueue_and_parse(
    product.InvoiceSplitterV1,
    input_doc,
)

# Print a brief summary of the parsed data
print(result.document)

Node.js
const mindee = require("mindee");
// for TS or modules:
// import * as mindee from "mindee";

// Init a new client
const mindeeClient = new mindee.Client({ apiKey: "my-api-key-here" });

// Load a file from disk
const inputSource = mindeeClient.docFromPath("/path/to/the/file.ext");

// Parse the file asynchronously.
const asyncApiResponse = mindeeClient.enqueueAndParse(
  mindee.product.InvoiceSplitterV1,
  inputSource
);

// Handle the response Promise
asyncApiResponse.then((resp) => {
  // print a string summary
  console.log(resp.document.toString());
});

.NET
using Mindee;
using Mindee.Input;
using Mindee.Product.InvoiceSplitter;

string apiKey = "my-api-key-here";
string filePath = "/path/to/the/file.ext";

// Construct a new client
MindeeClient mindeeClient = new MindeeClient(apiKey);

// Load an input source
var inputSource = new LocalInputSource(filePath);

// Call the product asynchronously with auto-polling
var response = await mindeeClient
    .EnqueueAndParseAsync<InvoiceSplitterV1>(inputSource);

// Print a summary of all the predictions
System.Console.WriteLine(response.Document.ToString());

// Print only the document-level predictions
// System.Console.WriteLine(response.Document.Inference.Prediction.ToString());

Ruby
require 'mindee'

# Init a new client
mindee_client = Mindee::Client.new(api_key: 'my-api-key-here')

# Load a file from disk
input_source = mindee_client.source_from_path('/path/to/the/file.ext')

# Enqueue the file and wait for the parsed result
result = mindee_client.enqueue_and_parse(
  input_source,
  Mindee::Product::InvoiceSplitter::InvoiceSplitterV1
)

# Print a full summary of the parsed data in RST format
puts result.document

# Print the document-level parsed data
# puts result.document.inference.prediction

Java
import com.mindee.MindeeClient;
import com.mindee.input.LocalInputSource;
import com.mindee.parsing.common.AsyncPredictResponse;
import com.mindee.parsing.common.Job;
import com.mindee.parsing.common.Document;
import com.mindee.product.invoicesplitter.InvoiceSplitterV1;
import java.io.File;
import java.io.IOException;
import java.util.Optional;

public class SimpleMindeeClient {

  public static void main(String[] args) throws IOException, InterruptedException {
    String apiKey = "my-api-key-here";
    String filePath = "/path/to/the/file.ext";

    // Init a new client
    MindeeClient mindeeClient = new MindeeClient(apiKey);

    // Load a file from disk
    LocalInputSource inputSource = new LocalInputSource(new File(filePath));

    // Parse the file asynchronously
    AsyncPredictResponse<InvoiceSplitterV1> response = mindeeClient.enqueueAndParse(
        InvoiceSplitterV1.class,
        inputSource
    );

    // Print a summary of the response
    System.out.println(response.toString());

    // Print a summary of the predictions
    // System.out.println(response.getDocumentObj().toString());

    // Print the document-level predictions
    // System.out.println(response.getDocumentObj().getInference().getPrediction().toString());

    // Print the page-level predictions
    // response.getDocumentObj().getInference().getPages().forEach(
    //     page -> System.out.println(page.toString())
    // );
  }
}

Shell
API_KEY='my-api-key-here'
ACCOUNT='mindee'
ENDPOINT='invoice_splitter'
VERSION='1'
FILE_PATH='/path/to/your/file.png'

# Maximum number of attempts to get the result of a queue
MAX_RETRIES=10

# Delay between requests, in seconds
DELAY=6

# Enqueue the document for async parsing
QUEUE_RESULT=$(curl -sS --request POST \
  -H "Authorization: Token $API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "document=@$FILE_PATH" \
  "https://api.mindee.net/v1/products/$ACCOUNT/$ENDPOINT/v$VERSION/predict_async")

# Status code sent back from the server
STATUS_CODE=$(echo "$QUEUE_RESULT" | grep -oP "[\"|']status_code[\"|']:[\s][\"|']*[a-zA-Z0-9-]*" | rev | cut --complement -f2- -d" " | rev)

# Check that the document was properly queued
if [ -z "$STATUS_CODE" ] || [ "$STATUS_CODE" -gt 399 ] || [ "$STATUS_CODE" -lt 200 ]
then
  if [ -z "$STATUS_CODE" ]
  then
    echo "Request couldn't be processed."
    exit 1
  fi
  echo "Error $STATUS_CODE was returned by API during enqueuing. "

  # Print the additional details, if there are any:
  ERROR=$(echo "$QUEUE_RESULT" | grep -oP "[\"|']error[\"|']:[\s]\{[^\}]*" | rev | cut --complement -f2- -d"{" | rev)
  if [ -z "$ERROR" ]
  then
    exit 1
  fi

  # Details on the potential error:
  ERROR_CODE=$(echo "$ERROR" | grep -oP "[\"|']code[\"|']:[\s]\"[^(\"|\')]*" | rev | cut --complement -f2- -d"\"" | rev)
  MESSAGE=$(echo "$QUEUE_RESULT" | grep -oP "[\"|']message[\"|']:[\s]\"[^(\"|\')]*" | rev | cut --complement -f2- -d"\"" | rev)
  DETAILS=$(echo "$QUEUE_RESULT" | grep -oP "[\"|']details[\"|']:[\s]\"[^(\"|\')]*" | rev | cut --complement -f2- -d"\"" | rev)
  echo "This was the given explanation:"
  echo "-------------------------"
  echo "Error Code: $ERROR_CODE"
  echo "Message: $MESSAGE"
  echo "Details: $DETAILS"
  echo "-------------------------"
  exit 1
else
  echo "File sent, starting to retrieve from server..."

  # Get the document's queue ID
  QUEUE_ID=$(echo "$QUEUE_RESULT" | grep -oP "[\"|']id[\"|']:[\s][\"|'][a-zA-Z0-9-]*" | rev | cut --complement -f2- -d"\"" | rev)

  # Number of attempts made so far to retrieve the parsed document
  # (starts at 0 so that exactly MAX_RETRIES attempts are made)
  TIMES_TRIED=0

  # Try to fetch the file until we get it, or until we hit the maximum number of retries
  while [ "$TIMES_TRIED" -lt "$MAX_RETRIES" ]
  do
    # Wait for a bit at each step
    sleep $DELAY

    # Note: we use -L here because the location of the file might be behind a redirection
    PARSED_RESULT=$(curl -sS -L \
      -H "Authorization: Token $API_KEY" \
      "https://api.mindee.net/v1/products/$ACCOUNT/$ENDPOINT/v$VERSION/documents/queue/$QUEUE_ID")

    # Isolating the job (queue) & the status to monitor the document
    JOB=$(echo "$PARSED_RESULT" | grep -ioP "[\"|']job[\"|']:[\s]\{[^\}]*" | rev | cut --complement -f2- -d"{" | rev)
    QUEUE_STATUS=$(echo "$JOB" | grep -ioP "[\"|']status[\"|']:[\s][\"|'][a-zA-Z0-9-]*" | rev | cut --complement -f2- -d"\"" | rev)
    if [ "$QUEUE_STATUS" = "completed" ]
    then
      # Print the result
      echo "$PARSED_RESULT"

      # Optional: isolate the document:
      # DOCUMENT=$(echo "$PARSED_RESULT" | grep -ioP "[\"|']document[\"|']:[\s].*([\"|']job[\"|'])" | rev | cut -f2- -d"," | rev)
      # echo "{$DOCUMENT}"

      # Remark: on compatible shells, fields can also be extracted through the use of tools like jq:
      # DOCUMENT=$(echo "$PARSED_RESULT" | jq '.["document"]')
      exit 0
    fi
    TIMES_TRIED=$((TIMES_TRIED+1))
  done
fi

echo "Operation aborted, document not retrieved after $TIMES_TRIED tries"
exit 1

PHP
<?php

use Mindee\Client;
use Mindee\Product\InvoiceSplitter\InvoiceSplitterV1;

// Init a new client
$mindeeClient = new Client("my-api-key-here");

// Load a file from disk
$inputSource = $mindeeClient->sourceFromPath("/path/to/the/file.ext");

// Parse the file asynchronously
$apiResponse = $mindeeClient->enqueueAndParse(InvoiceSplitterV1::class, $inputSource);

// Print a summary of the parsed data
echo strval($apiResponse->document);

- Replace my-api-key-here with your new API key, or use the "select an API key" feature to have it filled in automatically.
- Copy and paste the sample code of your choice into your application, code environment, or terminal.
- Replace /path/to/the/file.ext (or /path/to/your/file.png in the shell sample) with the path to your document.
API Response
Below is the full sample JSON response you get when you call the API. Since the response is quite verbose, we will walk through the fields section by section.
{
  "api_request": {
    "error": {},
    "resources": [
      "document",
      "job"
    ],
    "status": "success",
    "status_code": 200,
    "url": "https://api.mindee.net/v1/products/mindee/invoice_splitter/v1/documents/9c98445f-b2ae-46eb-99d7-1fb3c2b973a5"
  },
  "document": {
    "id": "9c98445f-b2ae-46eb-99d7-1fb3c2b973a5",
    "inference": {
      "extras": {},
      "finished_at": "2024-01-09T16:16:24.395000",
      "is_rotation_applied": null,
      "pages": [
        {
          "extras": {},
          "id": 0,
          "orientation": {
            "value": null
          },
          "prediction": {}
        },
        {
          "extras": {},
          "id": 1,
          "orientation": {
            "value": null
          },
          "prediction": {}
        },
        {
          "extras": {},
          "id": 2,
          "orientation": {
            "value": null
          },
          "prediction": {}
        }
      ],
      "prediction": {...},
      "processing_time": 2.6746561527252197,
      "product": {
        "features": [
          "invoice_page_groups"
        ],
        "name": "mindee/invoice_splitter",
        "type": "standard",
        "version": "1.2"
      },
      "started_at": "2024-01-09T16:16:21.580548+00:00"
    },
    "n_pages": 3,
    "name": "mydocument.pdf"
  },
  "job": {
    "available_at": "2024-01-09T16:16:24.406040",
    "id": "0e34b684-c12b-403d-9df7-7865b9976f37",
    "issued_at": "2024-01-09T16:16:21.580548",
    "status": "completed"
  }
}
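Before reading the predictions, it can help to check the `api_request` and `job` sections of the response. The sketch below uses only field names visible in the sample above. The helper names are hypothetical, and reading the gap between `issued_at` and `available_at` as the time the job took is an assumption, not something this page states.

```python
from datetime import datetime


def is_completed(response: dict) -> bool:
    """True when the request returned 200 and the job reports "completed"."""
    request_ok = response.get("api_request", {}).get("status_code") == 200
    job_done = response.get("job", {}).get("status") == "completed"
    return request_ok and job_done


def job_duration_seconds(response: dict) -> float:
    """Seconds between job.issued_at and job.available_at (assumed meaning)."""
    job = response["job"]
    issued = datetime.fromisoformat(job["issued_at"])
    available = datetime.fromisoformat(job["available_at"])
    return (available - issued).total_seconds()


# Excerpt of the sample response above, limited to the fields used here.
sample = {
    "api_request": {"status": "success", "status_code": 200},
    "job": {
        "available_at": "2024-01-09T16:16:24.406040",
        "id": "0e34b684-c12b-403d-9df7-7865b9976f37",
        "issued_at": "2024-01-09T16:16:21.580548",
        "status": "completed",
    },
}

print(is_completed(sample))
print(round(job_duration_seconds(sample), 3))
```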
You can find the prediction within the prediction key, found in two locations:
- In document > inference > prediction for document-level predictions: it contains the invoice page groups, each with its list of page indexes and its confidence score.
- In document > inference > pages[ ] > prediction for page-level predictions: it is always empty, as the split is done at document level and not at page level.
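The two locations can be read from a decoded response as follows. This is a minimal sketch working on a plain dictionary (for example the output of `json.loads`). The helper names `document_prediction` and `page_predictions` are hypothetical, and the response literal is a trimmed-down copy of the sample on this page.

```python
from typing import List


def document_prediction(response: dict) -> dict:
    """Return document > inference > prediction."""
    return response["document"]["inference"]["prediction"]


def page_predictions(response: dict) -> List[dict]:
    """Return pages[ ] > prediction for every page, in page order."""
    pages = response["document"]["inference"]["pages"]
    return [page["prediction"] for page in pages]


# Trimmed-down response with the same shape as the sample on this page.
response = {
    "document": {
        "inference": {
            "pages": [
                {"id": 0, "prediction": {}},
                {"id": 1, "prediction": {}},
                {"id": 2, "prediction": {}},
            ],
            "prediction": {
                "invoice_page_groups": [
                    {"confidence": 1.0, "page_indexes": [0]},
                    {"confidence": 1.0, "page_indexes": [1, 2]},
                ]
            },
        }
    }
}

groups = document_prediction(response)["invoice_page_groups"]
print([group["page_indexes"] for group in groups])

# Page-level predictions are expected to be empty for this product.
print(all(prediction == {} for prediction in page_predictions(response)))
```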
Each predicted invoice_page_groups entry contains:
- a page_indexes list defining the pages belonging to a single invoice
- a confidence representing a binary score:
  - 0: the model is not confident in the split.
  - 1: the model is confident in the split.
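One way to carry these two fields around in application code is a small typed record. The sketch below is illustrative only: `InvoicePageGroup` and `parse_page_groups` are hypothetical names, not part of any SDK, and the input values are made up to show both confidence cases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InvoicePageGroup:
    """One entry of invoice_page_groups."""

    page_indexes: List[int]
    confidence: float

    @property
    def is_confident(self) -> bool:
        # The score is binary: 1 means confident, 0 means not confident.
        return self.confidence == 1.0


def parse_page_groups(prediction: dict) -> List[InvoicePageGroup]:
    """Build typed groups from a document-level prediction dictionary."""
    return [
        InvoicePageGroup(
            page_indexes=list(group["page_indexes"]),
            confidence=float(group["confidence"]),
        )
        for group in prediction.get("invoice_page_groups", [])
    ]


# Made-up prediction with one confident and one non-confident group.
example_prediction = {
    "invoice_page_groups": [
        {"confidence": 1.0, "page_indexes": [0]},
        {"confidence": 0.0, "page_indexes": [1, 2]},
    ]
}

parsed = parse_page_groups(example_prediction)
needs_review = [group.page_indexes for group in parsed if not group.is_confident]
print(needs_review)
```

Groups with a confidence of 0 could, for instance, be routed to a manual check before the pages are sent further.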
{
  "prediction": {
    "invoice_page_groups": [
      {
        "confidence": 1.0,
        "page_indexes": [
          0
        ]
      },
      {
        "confidence": 1.0,
        "page_indexes": [
          1,
          2
        ]
      }
    ]
  }
}
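Applied to the sample prediction above, the page groups can be used to cut a page sequence into one chunk per invoice, ready to be sent individually to the Invoices API. The sketch below is a hedged illustration: `split_pages` is a hypothetical helper, and the strings in `pages` are placeholders for whatever per-page objects your PDF library provides.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_pages(pages: Sequence[T], prediction: dict) -> List[List[T]]:
    """Group pages by invoice, following invoice_page_groups."""
    return [
        [pages[index] for index in group["page_indexes"]]
        for group in prediction["invoice_page_groups"]
    ]


# The document-level prediction from the sample above.
prediction = {
    "invoice_page_groups": [
        {"confidence": 1.0, "page_indexes": [0]},
        {"confidence": 1.0, "page_indexes": [1, 2]},
    ]
}

# Placeholder page objects for a 3-page document.
pages = ["page-0", "page-1", "page-2"]

invoices = split_pages(pages, prediction)
print(invoices)  # [['page-0'], ['page-1', 'page-2']]
```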