Passport OCR Python

The Python OCR SDK supports the passport API for extracting data from passports.

from mindee import Client

mindee_client = Client().config_passport("your-api-key")
passport_data = mindee_client.doc_from_path("/path/to/file").parse("passport")
print(passport_data.passport)

We will use the fake sample passport below to illustrate how to extract data with the SDK.
[Image: sample fake passport]

Passport Data Structure

The passport object's data structure consists of three parts: the document-level prediction, the page-level predictions, and the raw HTTP response.

Document Level Prediction

For document-level prediction, we build the document object by combining the pages into a single document. This merge, which produces a single passport object from multiple pages, relies on field confidence scores.

We iterate over each page and, for each field, keep the prediction with the highest probability.

For example, if you send a three-page passport, the document-level prediction provides one name, one country code, and so on.

print(passport_data.passport)  # a single object of class Passport

Output

-----Passport data-----
Filename: passport.jpeg
Full name: HENERT PUDARSAN
Given names: HENERT
Surname: PUDARSAN
Country: GBR
ID Number: 707797979
Issuance date: 2012-04-22
Birth date: 1995-05-20
Expiry date: 2017-04-22
MRZ 1: P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
MRZ 2: 7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
MRZ: P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
----------------------
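The confidence-based merge described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SDK's own code. merge_pages is a hypothetical helper. Each page is assumed to be a dict shaped like the "prediction" objects of the raw HTTP response, mapping each field name to a dict with "value" and "confidence". List-valued fields such as given_names are not handled, and ties keep the earlier page. The page_id key mirrors the one in the document-level prediction of the raw response.

```python
from __future__ import annotations

from typing import Any


def merge_pages(pages: list[dict[str, dict[str, Any]]]) -> dict[str, dict[str, Any]]:
    """For each field, keep the page prediction with the highest confidence.

    Hypothetical helper: each page maps a field name to a dict holding at least
    "value" and "confidence", like the raw API "prediction" objects.
    """
    merged: dict[str, dict[str, Any]] = {}
    for page_id, page in enumerate(pages):
        for field_name, field in page.items():
            best = merged.get(field_name)
            # Keep the earlier pick on ties; replace it only if this page is more confident.
            if best is None or field["confidence"] > best["confidence"]:
                merged[field_name] = {**field, "page_id": page_id}
    return merged


# Made-up input: page 0 misreads the surname with low confidence.
pages = [
    {"surname": {"value": "PUDARSAM", "confidence": 0.62},
     "country": {"value": "GBR", "confidence": 1.0}},
    {"surname": {"value": "PUDARSAN", "confidence": 0.99}},
]
merged = merge_pages(pages)
print(merged["surname"])  # {'value': 'PUDARSAN', 'confidence': 0.99, 'page_id': 1}
print(merged["country"]["page_id"])  # 0
```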

Page Level Prediction

For page-level prediction, we iterate over the pages one by one and create one Passport object per page. Each page of the PDF is treated independently.

For example, if you send a three-page passport, the page-level prediction provides three names, three country codes, and so on.

print(passport_data.passports) # [Passport, Passport ...]
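Page-level results correspond to the "pages" array of the raw HTTP response. The sketch below reads one field from every page without calling the SDK, using a trimmed copy of the sample response on this page. page_values is a hypothetical helper, and it only handles single-valued fields, not lists such as given_names.

```python
from __future__ import annotations

from typing import Any, Optional

# Trimmed copy of the sample raw response: two fields on a single page.
response: dict[str, Any] = {
    "document": {
        "inference": {
            "pages": [
                {
                    "id": 0,
                    "prediction": {
                        "country": {"confidence": 1, "value": "GBR"},
                        "surname": {"confidence": 0.99, "value": "PUDARSAN"},
                    },
                },
            ]
        }
    }
}


def page_values(response: dict[str, Any], field_name: str) -> list[tuple[int, Optional[str]]]:
    """Return (page id, value) for one single-valued field on every page.

    The value is None when the field is absent from a page's prediction.
    """
    pages = response["document"]["inference"]["pages"]
    return [
        (page["id"], page["prediction"].get(field_name, {}).get("value"))
        for page in pages
    ]


print(page_values(response, "surname"))  # [(0, 'PUDARSAN')]
```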

Raw HTTP response

The http_response attribute contains the full Mindee API HTTP response in JSON format.

import json

print(passport_data.http_response)  # full HTTP response object
print(json.dumps(passport_data.http_response, indent=4, sort_keys=True))  # pretty-printed JSON

Output

{
  "api_request": {
    "error": {},
    "resources": [
      "document"
    ],
    "status": "success",
    "status_code": 201,
    "url": "http://api.mindee.net/v1/products/mindee/passport/v1/predict"
  },
  "document": {
    "annotations": {
      "labels": {}
    },
    "id": "20e60278-6635-41b3-902f-b5ad1ebdbfa0",
    "inference": {
      "extras": {},
      "finished_at": "2022-03-04T12:36:04+00:00",
      "is_rotation_applied": true,
      "pages": [
        {
          "id": 0,
          "prediction": {
            "birth_date": {
              "confidence": 1,
              "polygon": [[0.342, 0.689], [0.569, 0.689], [0.569, 0.713], [0.342, 0.713]],
              "value": "1995-05-20"
            },
            "birth_place": {
              "confidence": 0.89,
              "polygon": [[0.442, 0.725], [0.555, 0.725], [0.555, 0.743], [0.442, 0.743]],
              "value": "CAMTETH"
            },
            "country": {
              "confidence": 1,
              "polygon": [[0.509, 0.548], [0.558, 0.548], [0.558, 0.567], [0.509, 0.567]],
              "value": "GBR"
            },
            "expiry_date": {
              "confidence": 1,
              "polygon": [[0.34, 0.797], [0.575, 0.797], [0.575, 0.82], [0.34, 0.82]],
              "value": "2017-04-22"
            },
            "gender": {
              "confidence": 1,
              "polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
              "value": "M"
            },
            "given_names": [
              {
                "confidence": 0.99,
                "polygon": [[0.342, 0.617], [0.435, 0.617], [0.435, 0.638], [0.342, 0.638]],
                "value": "HENERT"
              }
            ],
            "id_number": {
              "confidence": 1,
              "polygon": [[0.723, 0.548], [0.899, 0.548], [0.899, 0.568], [0.723, 0.568]],
              "value": "707797979"
            },
            "issuance_date": {
              "confidence": 1,
              "polygon": [[0.34, 0.763], [0.564, 0.763], [0.564, 0.785], [0.34, 0.785]],
              "value": "2012-04-22"
            },
            "mrz1": {
              "confidence": 0.99,
              "polygon": [[0.056, 0.883], [0.926, 0.883], [0.926, 0.911], [0.056, 0.911]],
              "value": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
            },
            "mrz2": {
              "confidence": 1,
              "polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
              "value": "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
            },
            "orientation": {
              "confidence": 0.99,
              "degrees": 0
            },
            "surname": {
              "confidence": 0.99,
              "polygon": [[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]],
              "value": "PUDARSAN"
            }
          }
        }
      ],
      "prediction": {
        "birth_date": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.342, 0.689], [0.569, 0.689], [0.569, 0.713], [0.342, 0.713]],
          "value": "1995-05-20"
        },
        "birth_place": {
          "confidence": 0.89,
          "page_id": 0,
          "polygon": [[0.442, 0.725], [0.555, 0.725], [0.555, 0.743], [0.442, 0.743]],
          "value": "CAMTETH"
        },
        "country": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.509, 0.548], [0.558, 0.548], [0.558, 0.567], [0.509, 0.567]],
          "value": "GBR"
        },
        "expiry_date": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.34, 0.797], [0.575, 0.797], [0.575, 0.82], [0.34, 0.82]],
          "value": "2017-04-22"
        },
        "gender": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
          "value": "M"
        },
        "given_names": [
          {
            "confidence": 0.99,
            "page_id": 0,
            "polygon": [[0.342, 0.617], [0.435, 0.617], [0.435, 0.638], [0.342, 0.638]],
            "value": "HENERT"
          }
        ],
        "id_number": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.723, 0.548], [0.899, 0.548], [0.899, 0.568], [0.723, 0.568]],
          "value": "707797979"
        },
        "issuance_date": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.34, 0.763], [0.564, 0.763], [0.564, 0.785], [0.34, 0.785]],
          "value": "2012-04-22"
        },
        "mrz1": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [[0.056, 0.883], [0.926, 0.883], [0.926, 0.911], [0.056, 0.911]],
          "value": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
        },
        "mrz2": {
          "confidence": 1,
          "page_id": 0,
          "polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
          "value": "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
        },
        "surname": {
          "confidence": 0.99,
          "page_id": 0,
          "polygon": [[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]],
          "value": "PUDARSAN"
        }
      },
      "product": {
        "features": [
          "country",
          "id_number",
          "given_names",
          "surname",
          "birth_date",
          "birth_place",
          "gender",
          "issuance_date",
          "expiry_date",
          "orientation",
          "mrz1",
          "mrz2"
        ],
        "name": "mindee/passport",
        "type": "standard",
        "version": "1.0"
      }
    }
  }
}
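Every field in the document-level "prediction" object carries a confidence score, so one practical use of the raw response is to flag uncertain fields for manual review. The sketch works on a plain dict copied and trimmed from the sample response above. low_confidence_fields and the 0.95 threshold are illustrative assumptions, not SDK features.

```python
from __future__ import annotations

from typing import Any

# Document-level "prediction" values copied from the sample response (trimmed).
prediction: dict[str, Any] = {
    "birth_date": {"confidence": 1, "value": "1995-05-20"},
    "birth_place": {"confidence": 0.89, "value": "CAMTETH"},
    "given_names": [{"confidence": 0.99, "value": "HENERT"}],
    "surname": {"confidence": 0.99, "value": "PUDARSAN"},
}


def low_confidence_fields(prediction: dict[str, Any], threshold: float = 0.95) -> list[str]:
    """Return the sorted names of fields with any confidence below the threshold."""
    flagged = []
    for name, field in prediction.items():
        # List-valued fields such as given_names hold one entry per name.
        entries = field if isinstance(field, list) else [field]
        # Entries without a confidence score are treated as confident (not flagged).
        if any(entry.get("confidence", 1.0) < threshold for entry in entries):
            flagged.append(name)
    return sorted(flagged)


print(low_confidence_fields(prediction))  # ['birth_place']
```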

Extracted Fields

Each passport object contains a set of fields. Each field has the following four attributes:

  • value (Str or Float depending on the field type): the field value, or None if the field was not extracted.
  • probability (Float): the confidence score of the field prediction.
  • bbox (Array[Float]): the relative coordinates of the vertices of the bounding box containing the field in the image. If the field does not appear in the document, bbox is an empty array.
  • reconstructed (Bool): True if the field was reconstructed using other fields.
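In the raw response, each field's location is given as a "polygon" of relative coordinates. The sketch below assumes those coordinates are fractions of the image width and height, which matches the sample values since all of them fall between 0 and 1. It converts one polygon into an axis-aligned pixel box, for example to crop or highlight the field. to_pixel_box and the 1000 x 800 image size are hypothetical.

```python
from typing import Sequence, Tuple


def to_pixel_box(
    polygon: Sequence[Sequence[float]], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Turn relative (x, y) vertices into an (x_min, y_min, x_max, y_max) box in pixels."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (
        round(min(xs) * width),
        round(min(ys) * height),
        round(max(xs) * width),
        round(max(ys) * height),
    )


# Surname polygon from the sample response, on a hypothetical 1000 x 800 px image.
surname_polygon = [[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]]
print(to_pixel_box(surname_polygon, width=1000, height=800))  # (340, 465, 472, 482)
```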

Additional Attributes

Depending on the field type, additional attributes can be extracted from the passport object.

Using the passport example above, the following basic fields can be extracted.

Birth Information

  • passport.birth_date (string): Date of birth of the passport owner.
# To get the passport owner's date of birth
birth_date = passport_data.passport.birth_date.value
print("DOB: ", birth_date)

Output

DOB:  1995-05-20
  • passport.birth_place (string): Place of birth of the passport owner.
# To get the passport owner's place of birth
birth_place = passport_data.passport.birth_place.value
print("birthplace: ", birth_place)

Output

birthplace:  CAMTETH
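Because birth_date.value is an ISO date string, Python's date.fromisoformat can parse it. As an example of using the field, this sketch computes the owner's age on a reference date. age_on is a hypothetical helper, and in real code the value may be None when the field was not extracted, so check for that first.

```python
from datetime import date


def age_on(birth_date_iso: str, on: date) -> int:
    """Age in whole years on a reference date, from an ISO yyyy-mm-dd birth date."""
    born = date.fromisoformat(birth_date_iso)
    # Subtract one year if the birthday has not yet come round in the reference year.
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)


# Age of the sample owner on the passport's issuance date.
print(age_on("1995-05-20", date(2012, 4, 22)))  # 16
```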

Country

  • passport.country (string): Passport country code.
# To get the passport country code
country_code = passport_data.passport.country.value
print("passport country code: ", country_code)

Output

passport country code:  GBR

Expiry Date

  • passport.expiry_date (string): Passport expiry date in ISO format (yyyy-mm-dd).
# To get the passport expiry date
expiry_date = passport_data.passport.expiry_date.value
print("expires: ", expiry_date)

Output

expires:  2017-04-22
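A typical use of the expiry date is to check whether the passport has expired on a given day. The sketch below assumes the value is either an ISO yyyy-mm-dd string or None (not extracted). It treats the passport as still valid on its expiry date, a boundary rule that is an assumption to adjust to your own policy. is_expired is a hypothetical helper.

```python
from __future__ import annotations

from datetime import date


def is_expired(expiry_iso: str | None, on: date) -> bool | None:
    """True if the expiry date is before `on`; None if the date was not extracted."""
    if expiry_iso is None:
        return None
    # Assumption: the passport is still valid on its expiry date itself.
    return date.fromisoformat(expiry_iso) < on


print(is_expired("2017-04-22", date(2016, 1, 1)))  # False
print(is_expired("2017-04-22", date(2022, 3, 4)))  # True
```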

Gender

  • passport.gender (string): Gender of the passport owner (M or F).
# To get the passport owner's gender ("M" or "F")
gender = passport_data.passport.gender.value
print("gender: ", gender)

Output

gender:  M

Given Names

  • passport.given_names (list): List of the passport owner's given names.
# To get the list of given names
given_names = passport_data.passport.given_names
print("Given names:")
# Loop on each given name
for given_name in given_names:
    # To get the name string
    name = given_name.value
    print(name)

Output

Given names:
HENERT
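The "Full name" line in the document-level output combines the given names and the surname. The sketch below shows one way to build such a string yourself while skipping values that were not extracted (None). full_name is a hypothetical helper and not necessarily how the SDK builds that line. With the SDK objects, you would pass [g.value for g in passport_data.passport.given_names] and passport_data.passport.surname.value.

```python
from __future__ import annotations


def full_name(given_names: list[str | None], surname: str | None) -> str:
    """Join given names and surname, skipping values that were not extracted (None)."""
    parts = [*given_names, surname]
    return " ".join(part for part in parts if part)


print(full_name(["HENERT"], "PUDARSAN"))  # HENERT PUDARSAN
```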

ID

  • passport.id_number (string): Passport identification number.
# To get the passport id number (string)
id_number = passport_data.passport.id_number.value
print("passport number: ", id_number)

Output

passport number:  707797979

Issuance Date

  • passport.issuance_date (string): Passport date of issuance in ISO format (yyyy-mm-dd).
# To get the passport date of issuance
issuance_date = passport_data.passport.issuance_date.value
print("issued: ", issuance_date)

Output

issued:  2012-04-22
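The three date fields can be cross-checked against each other to catch OCR misreads. The owner should be born before the issuance date, and the issuance date should precede the expiry date. dates_in_order is a hypothetical helper that assumes all three values are ISO yyyy-mm-dd strings.

```python
from datetime import date


def dates_in_order(birth_iso: str, issuance_iso: str, expiry_iso: str) -> bool:
    """Check that birth date < issuance date < expiry date (ISO yyyy-mm-dd strings)."""
    birth, issued, expires = (
        date.fromisoformat(value) for value in (birth_iso, issuance_iso, expiry_iso)
    )
    return birth < issued < expires


print(dates_in_order("1995-05-20", "2012-04-22", "2017-04-22"))  # True
```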

Machine Readable Zone

  • passport.mrz1 (string): Passport first line of machine-readable zone.
# To get the first line of the passport's machine-readable zone (string)
mrz1 = passport_data.passport.mrz1.value
print("mrz1: ", mrz1)

Output

mrz1:  P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
  • passport.mrz2 (string): Passport second line of machine-readable zone.
# To get the second line of the passport's machine-readable zone (string)
mrz2 = passport_data.passport.mrz2.value
print("mrz2: ", mrz2)

Output

mrz2:  7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
  • passport.mrz (string): Full passport machine-readable zone, reconstructed from mrz1 and mrz2.
# To get the passport's full machine-readable zone (string)
mrz = passport_data.passport.mrz.value
print("mrz: ", mrz)

Output

mrz:  P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
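The second MRZ line of a passport (the TD3 format in ICAO Doc 9303) embeds check digits. Each character is valued as a digit (0 to 9), a letter (A=10 to Z=35), or the filler < (0). The values are weighted 7, 3, 1 repeating, and the sum is taken modulo 10. The sketch below verifies these digits on the sample mrz2 value, which is a cheap way to spot OCR errors in the zone. It is an independent illustration of the ICAO scheme, not an SDK function, and the slice positions assume the standard 44-character TD3 layout.

```python
from __future__ import annotations


def mrz_check_digit(data: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, char in enumerate(data):
        if "0" <= char <= "9":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        else:
            value = 0  # the filler character '<'
        total += value * weights[i % 3]
    return total % 10


def check_td3_line2(line2: str) -> dict[str, bool]:
    """Verify the check digits of a 44-character passport (TD3) MRZ line 2."""

    def ok(data: str, digit: str) -> bool:
        if digit == "<":  # allowed only for an empty (all-filler) personal number
            return set(data) <= {"<"}
        return digit.isdigit() and mrz_check_digit(data) == int(digit)

    return {
        "document_number": ok(line2[0:9], line2[9]),
        "birth_date": ok(line2[13:19], line2[19]),
        "expiry_date": ok(line2[21:27], line2[27]),
        "personal_number": ok(line2[28:42], line2[42]),
        # The composite digit covers document number, birth date, expiry date
        # and personal number, each together with its own check digit.
        "composite": ok(line2[0:10] + line2[13:20] + line2[21:43], line2[43]),
    }


mrz2 = "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
print(all(check_td3_line2(mrz2).values()))  # True
```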

Surname

  • passport.surname (string): Surname of the passport owner.
# To get the passport owner's surname
surname = passport_data.passport.surname.value
print("surname: ", surname)

Output

surname:  PUDARSAN
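The surname and given names can also be cross-checked against the first MRZ line. In the TD3 layout of ICAO Doc 9303, characters 1 to 2 hold the document code, characters 3 to 5 the issuing state, and characters 6 to 44 the names, with the surname separated from the given names by <<. parse_td3_line1 is a hypothetical helper. MRZ names may be truncated or transliterated, so a mismatch with the visual zone is not necessarily an OCR error.

```python
from __future__ import annotations


def parse_td3_line1(line1: str) -> dict[str, object]:
    """Split passport (TD3) MRZ line 1 into document code, issuing state and names."""
    # Characters 6 to 44 hold the names; trailing fillers carry no information.
    name_field = line1[5:44].rstrip("<")
    # The surname is separated from the given names by a double filler "<<".
    surname, _, given = name_field.partition("<<")
    return {
        "document_code": line1[0:2].rstrip("<"),
        "issuing_state": line1[2:5],
        # Single fillers separate the parts of a multi-part surname.
        "surname": surname.replace("<", " "),
        # Single fillers also separate individual given names.
        "given_names": [name for name in given.split("<") if name],
    }


mrz1 = "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
print(parse_td3_line1(mrz1))
# {'document_code': 'P', 'issuing_state': 'GBR', 'surname': 'PUDARSAN', 'given_names': ['HENERT']}
```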

 
