Passports API

The Python SDK supports the passport API for extracting data from passports.

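from mindee import Client

# Configure the client with your passport API key
mindee_client = Client().config_passport("passport-api-key")
# Load a file from disk and parse it as a passport
passport_data = mindee_client.doc_from_path("/path/to/file").parse("passport")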

The rest of this page uses a sample fake passport to illustrate how to extract data with the SDK.

Passport Data Structure

The passport object JSON data structure consists of:

Document Level Prediction

For document-level prediction, the SDK builds a single passport object by combining all the pages of the document. The merge relies on field confidence scores: it iterates over each page and, for each field, keeps the candidate with the highest probability.

For example, if you send a three-page passport, the document level will provide you with one name, one country code, and so on.

passport_data.passport  # returns a single object of class Passport

Output

-----Passport data-----
Filename: passport.jpeg
Full name: HENERT PUDARSAN
Given names: HENERT
Surname: PUDARSAN
Country: GBR
ID Number: 707797979
Issuance date: 2012-04-22
Birth date: 1995-05-20
Expiry date: 2017-04-22
MRZ 1: P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
MRZ 2: 7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
MRZ: P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
----------------------
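
The merge rule described above (for each field, keep the candidate with the highest confidence) can be sketched as follows. This is only an illustration under simplifying assumptions, not the SDK's actual implementation: pages are plain dictionaries, `merge_pages` is a hypothetical helper, and list fields such as given_names are left out.

```python
from typing import Any

# Hypothetical per-page predictions: {field_name: {"value": ..., "confidence": ...}}
Page = dict[str, dict[str, Any]]


def merge_pages(pages: list[Page]) -> Page:
    """For each field, keep the candidate with the highest confidence across pages."""
    merged: Page = {}
    for page in pages:
        for name, field in page.items():
            best = merged.get(name)
            # The first candidate wins ties because the comparison is strict
            if best is None or field["confidence"] > best["confidence"]:
                merged[name] = field
    return merged


# Made-up values for a two-page document
pages = [
    {"surname": {"value": "PUDARSAN", "confidence": 0.99},
     "country": {"value": "GBR", "confidence": 0.70}},
    {"surname": {"value": "PUDARSAM", "confidence": 0.40},
     "country": {"value": "GBR", "confidence": 1.0}},
]
document = merge_pages(pages)
```

Here the merged document takes its surname from the first page and its country from the second, since those are the higher-scoring candidates.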

Page Level Prediction

For page-level prediction, each page of the PDF is processed independently and produces its own passport object.

For example, if you send a three-page passport, the page-level prediction will provide you with three names, three country codes, and so on.

passport_data.passports # [Passport, Passport ...]

Raw HTTP response

This contains the full Mindee API HTTP response object in JSON format.

passport_data.http_response # full HTTP response object

Output

{
  "api_request": {
    "error": {},
    "resources": [
      "document"
    ],
    "status": "success",
    "status_code": 201,
    "url": "http://api.mindee.net/v1/products/mindee/passport/v1/predict"
  },
  "document": {
    "annotations": {
      "labels": {}
    },
    "id": "20e60278-6635-41b3-902f-b5ad1ebdbfa0",
    "inference": {
      "extras": {},
      "finished_at": "2022-03-04T12:36:04+00:00",
      "is_rotation_applied": true,
      "pages": [
        {
          "id": 0,
          "prediction": {
            "birth_date": {
              "confidence": 1,
              "polygon": [[0.342, 0.689], [0.569, 0.689], [0.569, 0.713], [0.342, 0.713]],
              "value": "1995-05-20"
            },
            "birth_place": {
              "confidence": 0.89,
              "polygon": [[0.442, 0.725], [0.555, 0.725], [0.555, 0.743], [0.442, 0.743]],
              "value": "CAMTETH"
            },
            "country": {
              "confidence": 1,
              "polygon": [[0.509, 0.548], [0.558, 0.548], [0.558, 0.567], [0.509, 0.567]],
              "value": "GBR"
            },
            "expiry_date": {
              "confidence": 1,
              "polygon": [[0.34, 0.797], [0.575, 0.797], [0.575, 0.82], [0.34, 0.82]],
              "value": "2017-04-22"
            },
            "gender": {
              "confidence": 1,
              "polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
              "value": "M"
            },
            "given_names": [
              {
                "confidence": 0.99,
                "polygon": [[0.342, 0.617], [0.435, 0.617], [0.435, 0.638], [0.342, 0.638]],
                "value": "HENERT"
              }
            ],
            "id_number": {
              "confidence": 1,
              "polygon": [[0.723, 0.548], [0.899, 0.548], [0.899, 0.568], [0.723, 0.568]],
              "value": "707797979"
            },
            "issuance_date": {
              "confidence": 1,
              "polygon": [[0.34, 0.763], [0.564, 0.763], [0.564, 0.785], [0.34, 0.785]],
              "value": "2012-04-22"
            },
            "mrz1": {
              "confidence": 0.99,
              "polygon": [[0.056, 0.883], [0.926, 0.883], [0.926, 0.911], [0.056, 0.911]],
              "value": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
            },
            "mrz2": {
              "confidence": 1,
              "polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
              "value": "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
            },
            "orientation": {
              "confidence": 0.99,
              "degrees": 0
            },
            "surname": {
              "confidence": 0.99,
              "polygon": [[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]],
              "value": "PUDARSAN"
            }
          }
        }
      ],
      "prediction": {
        "birth_date": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.342, 0.689], [0.569, 0.689], [0.569, 0.713], [0.342, 0.713]],
          "value": "1995-05-20"
        },
        "birth_place": {
          "confidence": 0.89,
          "page_id": 0,
          "polygon": [[0.442, 0.725], [0.555, 0.725], [0.555, 0.743], [0.442, 0.743]],
          "value": "CAMTETH"
        },
        "country": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.509, 0.548], [0.558, 0.548], [0.558, 0.567], [0.509, 0.567]],
          "value": "GBR"
        },
        "expiry_date": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.34, 0.797], [0.575, 0.797], [0.575, 0.82], [0.34, 0.82]],
          "value": "2017-04-22"
        },
        "gender": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
          "value": "M"
        },
        "given_names": [
          {
            "confidence": 0.99,
            "page_id": 0,
            "polygon": [[0.342, 0.617], [0.435, 0.617], [0.435, 0.638], [0.342, 0.638]],
            "value": "HENERT"
          }
        ],
        "id_number": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.723, 0.548], [0.899, 0.548], [0.899, 0.568], [0.723, 0.568]],
          "value": "707797979"
        },
        "issuance_date": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.34, 0.763], [0.564, 0.763], [0.564, 0.785], [0.34, 0.785]],
          "value": "2012-04-22"
        },
        "mrz1": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [[0.056, 0.883], [0.926, 0.883], [0.926, 0.911], [0.056, 0.911]],
          "value": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
        },
        "mrz2": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
          "value": "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
        },
        "surname": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]],
          "value": "PUDARSAN"
        }
      },
      "product": {
        "features": [
          "country",
          "id_number",
          "given_names",
          "surname",
          "birth_date",
          "birth_place",
          "gender",
          "issuance_date",
          "expiry_date",
          "orientation",
          "mrz1",
          "mrz2"
        ],
        "name": "mindee/passport",
        "type": "standard",
        "version": "1.0"
      }
    }
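
As a rough guide to navigating this structure, the sketch below reads a few values from a trimmed copy of the response. It assumes the response is available as a parsed dictionary; here it is loaded from a JSON string so the example runs on its own, and only two fields are kept for brevity.

```python
import json

# Trimmed copy of the response shown above
raw = """
{
  "document": {
    "inference": {
      "pages": [
        {"id": 0, "prediction": {
          "surname": {"confidence": 0.99, "value": "PUDARSAN"},
          "given_names": [{"confidence": 0.99, "value": "HENERT"}]
        }}
      ],
      "prediction": {
        "surname": {"confidence": 0.99, "page_id": 0, "value": "PUDARSAN"},
        "given_names": [{"confidence": 0.99, "page_id": 0, "value": "HENERT"}]
      }
    }
  }
}
"""
response = json.loads(raw)
inference = response["document"]["inference"]

# Document-level values live under inference["prediction"]
doc_surname = inference["prediction"]["surname"]["value"]
doc_given_names = [g["value"] for g in inference["prediction"]["given_names"]]

# Page-level values live under inference["pages"][i]["prediction"]
page_surnames = {
    page["id"]: page["prediction"]["surname"]["value"]
    for page in inference["pages"]
}
```

Note that document-level fields carry a page_id telling which page the value came from, while page-level fields do not need one.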

Client Passport Parse Parameters

Parameter name | Description | Default value
cutPdf | (Boolean) If set to true, when sending a multi-page PDF of more than 5 pages, the library creates a new PDF by concatenating the first 4 pages and the last page. | true
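
The page selection rule can be pictured with the short sketch below. It only mirrors the description above and is not the library's code; `pages_to_keep` is a hypothetical helper that returns zero-based page indexes.

```python
def pages_to_keep(page_count: int) -> list[int]:
    """Return the zero-based indexes of the pages kept under the rule above."""
    if page_count <= 5:
        # Documents of 5 pages or fewer are sent unchanged
        return list(range(page_count))
    # Longer documents keep the first 4 pages and the last page
    return [0, 1, 2, 3, page_count - 1]


short_doc = pages_to_keep(3)
long_doc = pages_to_keep(12)
```

With these inputs, the 3-page document keeps all of its pages, while the 12-page document keeps pages 1 to 4 and page 12.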

Extracted Fields

Each passport object contains a set of fields. Each field has the following four attributes:

  • value (Str or Float depending on the field type): corresponds to the field value. Set to None if the field was not extracted.
  • probability (Float): the confidence score of the field prediction.
  • bbox (Array[Float]): the relative coordinates of the vertices of the bounding box that contains the field in the image. If the field does not appear in the image, the bbox is an empty array.
  • reconstructed (Bool): True if the field was reconstructed using other fields.
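
Because the coordinates are relative, they need to be scaled by the image size before drawing or cropping. The sketch below assumes each vertex is an [x, y] pair expressed as a fraction of the image width and height, as the polygon values in the raw response suggest; `to_pixels` is a hypothetical helper, and the 1000 x 1000 image size is made up.

```python
def to_pixels(bbox, width, height):
    """Convert relative [x, y] vertices into integer pixel coordinates."""
    return [(round(x * width), round(y * height)) for x, y in bbox]


# Surname polygon taken from the sample response
bbox = [[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]]
corners = to_pixels(bbox, width=1000, height=1000)

# An empty bbox (field not found) simply yields no corners
no_corners = to_pixels([], width=1000, height=1000)
```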

Passport owner data

passport.given_names: List of the passport owner's given names

# To get the list of names
given_names = passport_data.passport.given_names

# Loop over each given name
for given_name in given_names:
    # To get the name string
    name = given_name.value

passport.surname: The passport owner's surname

# To get the passport owner's surname (string)
surname = passport_data.passport.surname.value

passport.gender: The passport owner's gender (M / F)

# To get the passport owner's gender (string among {"M", "F"})
gender = passport_data.passport.gender.value

passport.full_name: The passport owner's full name, reconstructed from surname and given_names

# To get the passport owner's full name (string)
full_name = passport_data.passport.full_name.value
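
As an illustration of what "reconstructed" means here, the sketch below joins the given names and the surname in the order shown in the sample output ("HENERT PUDARSAN"). The SDK's exact logic may differ; `build_full_name` is a hypothetical helper.

```python
def build_full_name(given_names, surname):
    """Join given names and surname with spaces, skipping missing parts."""
    parts = [*given_names, surname]
    return " ".join(part for part in parts if part)


full_name = build_full_name(["HENERT"], "PUDARSAN")
```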

passport.birth_place: The passport owner's place of birth

# To get the passport owner's place of birth (string)
birth_place = passport_data.passport.birth_place.value

Dates

Each date field comes with an extra attribute:

date_object (Datetime): the date as an object from Python's datetime library
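
A date object makes comparisons straightforward, for example to check whether a passport has expired. The sketch below does not call the SDK: it assumes the string value uses the YYYY-MM-DD format shown in the sample output and parses it with the standard library. `is_expired` is a hypothetical helper.

```python
from datetime import date


def is_expired(expiry: str, today: date) -> bool:
    """True if the ISO formatted expiry date is strictly before `today`."""
    return date.fromisoformat(expiry) < today


# Expiry date from the sample passport, checked against a fixed reference day
expired = is_expired("2017-04-22", today=date(2022, 3, 4))
```

Passing the reference day explicitly keeps the check reproducible; in real code it would typically be date.today().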

passport.birth_date: The passport owner's date of birth

# To get the passport owner's date of birth (string)
birth_date = passport_data.passport.birth_date.value

passport.expiry_date: Passport expiry date

# To get the passport expiry date (string)
expiry_date = passport_data.passport.expiry_date.value

passport.issuance_date: Passport date of issuance

# To get the passport date of issuance (string)
issuance_date = passport_data.passport.issuance_date.value

Passport metadata

passport.mrz1: First line of the passport's machine readable zone

# To get the first line of the passport's machine readable zone (string)
mrz1 = passport_data.passport.mrz1.value

passport.mrz2: Second line of the passport's machine readable zone

# To get the second line of the passport's machine readable zone (string)
mrz2 = passport_data.passport.mrz2.value

passport.mrz: The passport's full machine readable zone, reconstructed from mrz1 and mrz2

# To get the passport's full machine readable zone (string)
mrz = passport_data.passport.mrz.value
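
In the sample output, the full MRZ is the first line followed directly by the second. The sketch below reproduces that concatenation and, as an optional sanity check that is not part of the SDK, verifies three check digits of the second line using the ICAO 9303 rule for passport MRZs (weights 7, 3, 1, sum modulo 10). `check_digit` is a hypothetical helper.

```python
def check_digit(data: str) -> int:
    """ICAO 9303 check digit: weighted sum (7, 3, 1 repeating) modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for index, char in enumerate(data):
        if char.isdigit():
            value = int(char)
        elif char == "<":
            value = 0  # the filler character counts as zero
        else:
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        total += value * weights[index % 3]
    return total % 10


# MRZ lines from the sample output
mrz1 = "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
mrz2 = "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
mrz = mrz1 + mrz2

# Fixed positions in line 2: each value is followed by its check digit
id_number, id_check = mrz2[0:9], int(mrz2[9])
birth, birth_check = mrz2[13:19], int(mrz2[19])
expiry, expiry_check = mrz2[21:27], int(mrz2[27])

id_ok = check_digit(id_number) == id_check
birth_ok = check_digit(birth) == birth_check
expiry_ok = check_digit(expiry) == expiry_check
```

A failed check is a hint that a character in the zone was misread, which can be useful alongside the field confidence scores.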

passport.id_number: Passport identification number

# To get the passport id number (string)
id_number = passport_data.passport.id_number.value

passport.country: Passport country code

# To get the passport country code (string)
country_code = passport_data.passport.country.value

 
