Receipts API

The Python SDK supports the receipt API for extracting data from receipts.

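from mindee import Client

# Configure the client with your Receipt API key
mindee_client = Client().config_receipt("receipt-api-key")

# Load a local file and parse it with the receipt API
receipt_data = mindee_client.doc_from_path("/path/to/file").parse("receipt")

# Print a summary of the document-level receipt
print(receipt_data.receipt)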

We will use the sample receipt below to illustrate how to extract data with the SDK.
[Image: sample receipt]

Receipt Data Structure

The receipt data structure returned by the SDK consists of the following parts:

Document Level Prediction

For document-level prediction, the SDK builds a single Receipt object by combining all the pages of the document. Fields from the different pages are merged based on their confidence scores.

The SDK iterates over each page and, for each field, keeps the prediction with the highest confidence score.

For example, if you send a three-page receipt, the document-level prediction returns one tax, one total, and so on.
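The merge rule described above (keep, per field, the highest-confidence prediction across pages) can be sketched in plain Python. This is an illustration of the rule, not the SDK's code: the per-page dictionaries mapping a field name to a (value, confidence) pair are a hypothetical representation, and list fields such as taxes are left out.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical per-page prediction: field name -> (value, confidence)
PagePrediction = Dict[str, Tuple[Any, float]]


def merge_pages(pages: List[PagePrediction]) -> PagePrediction:
    """For each field, keep the page prediction with the highest confidence."""
    merged: PagePrediction = {}
    for page in pages:
        for field, (value, confidence) in page.items():
            best = merged.get(field)
            if best is None or confidence > best[1]:
                merged[field] = (value, confidence)
    return merged


# Made-up predictions for a three-page document
pages = [
    {"total_incl": (7.27, 0.99), "date": ("2022-04-03", 0.60)},
    {"total_incl": (7.72, 0.45), "date": ("2022-04-03", 0.99)},
    {"total_incl": (None, 0.0)},
]
print(merge_pages(pages))
# {'total_incl': (7.27, 0.99), 'date': ('2022-04-03', 0.99)}
```

Three pages go in, and one value per field comes out, which is the behavior the three-page example describes.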

receipt_data.receipt # returns a single Receipt object

Output

-----Receipt data-----
Filename: receipt.png
Total amount including taxes: 7.27
Total amount excluding taxes: 6.86
Date: 2022-04-03
Category: food
Time: 10:00
Merchant name: MINDEE TAKE OUT
Taxes: 0.41 None%
Total taxes: 0.41
Locale: en-US; en; US; USD;
----------------------

Page Level Prediction

For page-level prediction, the SDK creates one Receipt object per page. Each page of the PDF is treated as a separate receipt.

For example, if you send a three-page receipt, page-level prediction returns three taxes, three totals, and so on.

receipt_data.receipts # [Receipt, Receipt ...]

Raw HTTP Response

The http_response attribute contains the full Mindee API HTTP response in JSON format.

receipt_data.http_response # full HTTP response object

Output

{
  "api_request": {
    "error": {},
    "resources": [
      "document"
    ],
    "status": "success",
    "status_code": 201,
    "url": "http://api.mindee.net/v1/products/mindee/expense_receipts/v3/predict"
  },
  "document": {
    "annotations": {
      "labels": {}
    },
    "id": "dd457c26-5baa-4612-827b-e10c3d1b7b3d",
    "inference": {
      "extras": {},
      "finished_at": "2022-03-06T07:49:25+00:00",
      "pages": [
        {
          "extras": {},
          "id": 0,
          "prediction": {
            "category": {
              "confidence": 0.96,
              "value": "food"
            },
            "date": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.101,
                  0.233
                ],
                [
                  0.345,
                  0.233
                ],
                [
                  0.345,
                  0.251
                ],
                [
                  0.101,
                  0.251
                ]
              ],
              "raw": "04-03-2022",
              "value": "2022-04-03"
            },
            "locale": {
              "confidence": 0.92,
              "country": "US",
              "currency": "USD",
              "language": "en",
              "value": "en-US"
            },
            "orientation": {
              "confidence": 0.99,
              "degrees": 0
            },
            "supplier": {
              "confidence": 0.64,
              "polygon": [
                [
                  0.319,
                  0.041
                ],
                [
                  0.677,
                  0.041
                ],
                [
                  0.677,
                  0.055
                ],
                [
                  0.319,
                  0.055
                ]
              ],
              "value": "MINDEE TAKE OUT"
            },
            "taxes": [
              {
                "code": "TAX",
                "confidence": 0.98,
                "polygon": [
                  [
                    0.098,
                    0.516
                  ],
                  [
                    0.897,
                    0.516
                  ],
                  [
                    0.897,
                    0.539
                  ],
                  [
                    0.098,
                    0.539
                  ]
                ],
                "rate": null,
                "value": 0.41
              }
            ],
            "time": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.785,
                  0.235
                ],
                [
                  0.901,
                  0.235
                ],
                [
                  0.901,
                  0.25
                ],
                [
                  0.785,
                  0.25
                ]
              ],
              "raw": "10:00",
              "value": "10:00"
            },
            "total_incl": {
              "confidence": 0.99,
              "polygon": [
                [
                  0.801,
                  0.538
                ],
                [
                  0.9,
                  0.538
                ],
                [
                  0.9,
                  0.555
                ],
                [
                  0.801,
                  0.555
                ]
              ],
              "value": 7.27
            }
          }
        }
      ],
      "prediction": {
        "category": {
          "confidence": 0.96,
          "value": "food"
        },
        "date": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.101,
              0.233
            ],
            [
              0.345,
              0.233
            ],
            [
              0.345,
              0.251
            ],
            [
              0.101,
              0.251
            ]
          ],
          "raw": "04-03-2022",
          "value": "2022-04-03"
        },
        "locale": {
          "confidence": 0.92,
          "country": "US",
          "currency": "USD",
          "language": "en",
          "value": "en-US"
        },
        "supplier": {
          "confidence": 0.64,
          "page_id": 0,
          "polygon": [
            [
              0.319,
              0.041
            ],
            [
              0.677,
              0.041
            ],
            [
              0.677,
              0.055
            ],
            [
              0.319,
              0.055
            ]
          ],
          "value": "MINDEE TAKE OUT"
        },
        "taxes": [
          {
            "code": "TAX",
            "confidence": 0.98,
            "page_id": 0,
            "polygon": [
              [
                0.098,
                0.516
              ],
              [
                0.897,
                0.516
              ],
              [
                0.897,
                0.539
              ],
              [
                0.098,
                0.539
              ]
            ],
            "rate": null,
            "value": 0.41
          }
        ],
        "time": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.785,
              0.235
            ],
            [
              0.901,
              0.235
            ],
            [
              0.901,
              0.25
            ],
            [
              0.785,
              0.25
            ]
          ],
          "raw": "10:00",
          "value": "10:00"
        },
        "total_incl": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [
            [
              0.801,
              0.538
            ],
            [
              0.9,
              0.538
            ],
            [
              0.9,
              0.555
            ],
            [
              0.801,
              0.555
            ]
          ],
          "value": 7.27
        }
      },
      "processing_time": 1.948,
      "product": {
        "features": [
          "locale",
          "category",
          "date",
          "time",
          "total_incl",
          "taxes",
          "supplier",
          "orientation"
        ],
        "name": "mindee/expense_receipts",
        "type": "standard",
        "version": "3.0"
      },
      "started_at": "2022-03-06T07:49:23+00:00"
    },
    "n_pages": 1,
    "name": "receipt.png"
  },
  "document_type": "receipt",
  "input_type": "path",
  "filename": "receipt.png",
  "filepath": "receipt.png",
  "file_extension": "image/png"
}
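To read values straight from the raw response, follow the key paths shown in the JSON above. The sketch below assumes the response is available as a Python dict; here it is parsed from a trimmed copy of the JSON, keeping only the keys it uses. The 0.9 confidence threshold is an arbitrary choice for illustration.

```python
import json

# Trimmed copy of the raw response shown above
raw = """
{
  "document": {
    "inference": {
      "pages": [
        {"id": 0, "prediction": {"orientation": {"confidence": 0.99, "degrees": 0}}}
      ],
      "prediction": {
        "total_incl": {"confidence": 0.99, "page_id": 0, "value": 7.27},
        "supplier": {"confidence": 0.64, "page_id": 0, "value": "MINDEE TAKE OUT"}
      }
    }
  }
}
"""

response = json.loads(raw)
inference = response["document"]["inference"]

# Document-level fields live under inference > prediction;
# page_id records which page each merged field came from.
total = inference["prediction"]["total_incl"]
print(total["value"], total["page_id"])  # 7.27 0

# Orientation only exists at page level: inference > pages[] > prediction
for page in inference["pages"]:
    print(page["id"], page["prediction"]["orientation"]["degrees"])  # 0 0

# Flag document-level fields whose confidence is below the chosen threshold
low_confidence = [
    name for name, field in inference["prediction"].items() if field["confidence"] < 0.9
]
print(low_confidence)  # ['supplier']
```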

Extracted Fields

Each receipt object contains a set of fields. Each field has the following four attributes:

  • value (Str or Float depending on the field type): the field value. Set to None if the field was not extracted.
  • probability (Float): the confidence score of the field prediction.
  • bbox (Array[Float]): the relative vertex coordinates of the bounding box containing the field in the image. If the field is not written on the receipt, the bbox is an empty array.
  • reconstructed (Bool): True if the field was reconstructed using other fields.
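Because the bounding box coordinates are relative to the image size (the polygon arrays in the raw response contain fractions between 0 and 1), you need the image dimensions to draw or crop a field. A minimal sketch, assuming the vertices are (x, y) pairs as in the raw JSON. The image size used below is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_pixels(polygon: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Scale relative (x, y) vertices to pixel coordinates.

    An empty polygon (field not written on the receipt) stays empty.
    """
    return [(round(x * width), round(y * height)) for x, y in polygon]


# Polygon of the "date" field from the raw response
date_polygon = [(0.101, 0.233), (0.345, 0.233), (0.345, 0.251), (0.101, 0.251)]

# Hypothetical image size: use the real dimensions of your file
print(to_pixels(date_polygon, width=1000, height=2000))
# [(101, 466), (345, 466), (345, 502), (101, 502)]
```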

Additional Attributes

Depending on the field type, some fields expose additional attributes in the receipt object.

Using the receipt example above, the following fields can be extracted.

Category

  • receipt.category (string): Receipt category as seen on the receipt. The following categories are supported: toll, food, parking, transport, accommodation, gasoline, miscellaneous.
# To get the category

category = receipt_data.receipt.category.value
print("purchase category: ", category)

Output

purchase category:  food

Date

  • receipt.date (string): Payment date as seen on the receipt.
    • value (string): Date in ISO 8601 format (yyyy-mm-dd). Both European and US date formats on the receipt are supported.
    • raw (string): In any format as seen on the receipt.
# To get the receipt date of issuance

receipt_date = receipt_data.receipt.date.value
print("Date on receipt: ", receipt_date)

Output

Date on receipt:  2022-04-03

Locale

  • receipt.locale (string): Concatenation of the language and country codes.
# To get both the language code and country code

locale = receipt_data.receipt.locale.value
print("Locale code: ", locale)

Output

Locale code:  en-US
  • receipt.locale.language (string): Language code in ISO 639-1 format as seen on the receipt. The following language codes are supported: ca, de, en, es, fr, it, nl and pt.
# To get the receipt language code

language = receipt_data.receipt.locale.language
print("Language code: ", language)

Output

Language code:  en
  • receipt.locale.currency (string): Currency code in ISO 4217 format as seen on the receipt. The following currency codes are supported: CAD, CHF, GBP, EUR, USD.
# To get the receipt currency code

currency = receipt_data.receipt.locale.currency
print("Currency code: ", currency)

Output

Currency code:  USD
  • receipt.locale.country (string): Country code in ISO 3166-1 alpha-2 format as seen on the receipt. The following country codes are supported: CA, CH, DE, ES, FR, GB, IT, NL, PT and US.
# To get the receipt country code

country = receipt_data.receipt.locale.country
print("Country code: ", country)

Output

Country code:  US
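Since the locale value concatenates the language and country codes (for example "en-US"), it can be split back into its parts and checked against the supported codes listed in this section. A small sketch: the split_locale helper and the set names are hypothetical. Because the SDK already exposes locale.language and locale.country, this is mainly useful when you only have the combined value.

```python
from typing import Optional, Tuple

# Supported codes as listed in this section
SUPPORTED_LANGUAGES = {"ca", "de", "en", "es", "fr", "it", "nl", "pt"}
SUPPORTED_COUNTRIES = {"CA", "CH", "DE", "ES", "FR", "GB", "IT", "NL", "PT", "US"}


def split_locale(value: str) -> Tuple[str, Optional[str]]:
    """Split a locale value such as "en-US" into (language, country)."""
    language, _, country = value.partition("-")
    return language, country or None


language, country = split_locale("en-US")
print(language, country)  # en US
print(language in SUPPORTED_LANGUAGES, country in SUPPORTED_COUNTRIES)  # True True
```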

Orientation

  • receipt.orientation (number): The orientation field is only available at the page level, as it describes whether the page image should be rotated to be upright. In the raw JSON response, the rotation value is found at: document > inference > pages[] > prediction > orientation > degrees.
    The orientation field gives one of these 3 possible predictions:
    • 0 degrees: the page is already upright
    • 90 degrees: the page must be rotated clockwise to be upright
    • 270 degrees: the page must be rotated counterclockwise to be upright
# To get the receipt orientation
# Orientation is a page-level field, so it is None on the document-level receipt
orientation = receipt_data.receipt.orientation
print("Degree: ", orientation)

Output

Degree:  None
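The three orientation values map directly to a rotation. To keep the example dependency-free, the sketch below assumes the page image is a 2D list of pixel rows (a hypothetical representation). With a real image you would use your imaging library's rotate function instead. The degrees value would come from a page-level prediction, such as the path shown above in the raw JSON.

```python
from typing import List

Grid = List[List[int]]


def upright(pixels: Grid, degrees: int) -> Grid:
    """Rotate a grid of pixel rows according to the predicted orientation.

    0   -> already upright, returned as a copy
    90  -> rotate clockwise
    270 -> rotate counterclockwise
    """
    if degrees == 0:
        return [row[:] for row in pixels]
    if degrees == 90:
        # Clockwise: reverse the row order, then transpose
        return [list(row) for row in zip(*pixels[::-1])]
    if degrees == 270:
        # Counterclockwise: transpose, then reverse the row order
        return [list(row) for row in zip(*pixels)][::-1]
    raise ValueError(f"unexpected orientation: {degrees}")


page = [[1, 2, 3], [4, 5, 6]]
print(upright(page, 90))  # [[4, 1], [5, 2], [6, 3]]
```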

Supplier Information

  • receipt.merchant_name (string): Supplier name as written on the receipt.
# To get the supplier name

supplier_name = receipt_data.receipt.merchant_name.value
print("Supplier Name: ", supplier_name)

Output

Supplier Name:  MINDEE TAKE OUT

Taxes

  • receipt.taxes (list): The tax lines as seen on the receipt. Each tax line contains:
    • value (number): The tax amount.
    • code (string): The tax code (for example HST or GST in Canada, City Tax or State Tax in the US).
    • rate (number): The tax rate.
# To get the list of taxes
taxes = receipt_data.receipt.taxes

# Loop on each tax field
for tax in taxes:
    # To get the tax amount
    tax_amount = tax.value

    # To get the tax code
    tax_code = tax.code

    # To get the tax rate
    tax_rate = tax.rate
    print(f" tax amount: {tax_amount} \n tax_code: {tax_code} \n tax_rate: {tax_rate} ")

Output

 tax amount: 0.41 
 tax_code: TAX 
 tax_rate: None
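A receipt can carry several tax lines, and a line's value may be None if it was not extracted. The sketch below adds up the extracted amounts, which is roughly what the total_tax field represents. The sum_taxes helper is hypothetical, and rounding to two decimals assumes a currency with cents.

```python
from typing import List, Optional


def sum_taxes(tax_values: List[Optional[float]]) -> Optional[float]:
    """Add up the extracted tax amounts, skipping lines with no value.

    Returns None if no tax amount was extracted at all.
    """
    amounts = [v for v in tax_values if v is not None]
    if not amounts:
        return None
    # Round to cents to hide floating-point noise such as 0.41000000000000003
    return round(sum(amounts), 2)


# With the SDK: sum_taxes([tax.value for tax in receipt_data.receipt.taxes])
print(sum_taxes([0.41]))  # 0.41
print(sum_taxes([0.25, 0.16]))  # 0.41
```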

Time

  • receipt.time (string): Time of purchase as seen on the receipt
    • value (string): Time of purchase with 24 hours formatting (hh:mm).
    • raw (string): In any format as seen on the receipt.
# To get the time
time = receipt_data.receipt.time.value
print("Time: ", time)

Output

Time:  10:00
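Since the date value is ISO 8601 (yyyy-mm-dd) and the time value uses 24-hour hh:mm formatting, both can be parsed with the standard library and combined into a single timestamp. In this sketch, purchase_datetime is a hypothetical helper. It returns None when either field was not extracted, and the result has no time zone because the receipt fields do not carry one.

```python
from datetime import date, datetime, time
from typing import Optional


def purchase_datetime(date_value: Optional[str], time_value: Optional[str]) -> Optional[datetime]:
    """Combine the ISO date (yyyy-mm-dd) and 24-hour time (hh:mm) values."""
    if date_value is None or time_value is None:
        return None
    return datetime.combine(date.fromisoformat(date_value), time.fromisoformat(time_value))


# With the SDK: purchase_datetime(receipt_data.receipt.date.value, receipt_data.receipt.time.value)
print(purchase_datetime("2022-04-03", "10:00"))  # 2022-04-03 10:00:00
```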

Total Amounts

  • receipt.total_incl (number): Total amount including taxes
# To get the total amount including taxes value

total_incl = receipt_data.receipt.total_incl.value
print("total with tax", total_incl)

Output

total with tax 7.27
  • receipt.total_excl (number): Total amount paid excluding taxes
# To get the total amount excluding taxes value

total_excl = receipt_data.receipt.total_excl.value
print("total without tax", total_excl)

Output

total without tax 6.86
  • receipt.total_tax (number): Total tax value from tax lines
# To get the total tax amount value

total_tax = receipt_data.receipt.total_tax.value
print("total tax", total_tax)

Output

total tax 0.41
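The raw response above contains total_incl and taxes but no total_excl. This suggests the SDK derives the pre-tax total from the other fields, which is the case the reconstructed attribute flags. The sketch below shows that arithmetic under this assumption. It is not the SDK's code. With the sample receipt, 7.27 - 0.41 = 6.86, which matches the output above.

```python
from typing import Optional


def reconstruct_total_excl(total_incl: Optional[float], total_tax: Optional[float]) -> Optional[float]:
    """Derive the pre-tax total as total_incl - total_tax, rounded to cents."""
    if total_incl is None or total_tax is None:
        return None
    return round(total_incl - total_tax, 2)


print(reconstruct_total_excl(7.27, 0.41))  # 6.86
```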
