Passports API
The Python SDK supports the passport API for extracting data from passports.
from mindee import Client
mindee_client = Client().config_passport("passport-api-key")
passport_data = mindee_client.doc_from_path("/path/to/file").parse("passport")
The examples below use a sample fake passport to illustrate how to extract the data we want with the SDK.
Passport Data Structure
The passport object JSON data structure consists of:
Document Level Prediction
For document-level prediction, we build a single passport object by combining the pages of the document. The merge relies on field confidence scores.
We iterate over each page and, for each field, keep the candidate with the highest probability.
For example, if you send a three-page passport, the document level will provide you with one name, one country code, and so on.
passport_data.passport # returns a single object of class Passport
Output
-----Passport data-----
Filename: passport.jpeg
Full name: HENERT PUDARSAN
Given names: HENERT
Surname: PUDARSAN
Country: GBR
ID Number: 707797979
Issuance date: 2012-04-22
Birth date: 1995-05-20
Expiry date: 2017-04-22
MRZ 1: P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<
MRZ 2: 7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
MRZ: P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<7077979792GBR9505209M1704224<<<<<<<<<<<<<<00
----------------------
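The merge rule described above (keep, for each field, the candidate with the highest confidence) can be sketched in a few lines. This is only an illustration under stated assumptions: `merge_pages` and the dictionary layout are hypothetical stand-ins, not the SDK's internal code, and the confidence values are made up for the example.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for one page's predictions:
# field name -> {"value": ..., "confidence": ...}
PagePrediction = Dict[str, Dict[str, Any]]


def merge_pages(pages: List[PagePrediction]) -> PagePrediction:
    """Keep, for each field, the candidate with the highest confidence."""
    merged: PagePrediction = {}
    for page_id, page in enumerate(pages):
        for name, candidate in page.items():
            best = merged.get(name)
            if best is None or candidate["confidence"] > best["confidence"]:
                # Record which page the winning value came from.
                merged[name] = {**candidate, "page_id": page_id}
    return merged


# Two pages with made-up confidences, for illustration only.
pages = [
    {"surname": {"value": "PUDARSAN", "confidence": 0.99},
     "country": {"value": "GBR", "confidence": 0.60}},
    {"surname": {"value": "PUDARSA", "confidence": 0.40},
     "country": {"value": "GBR", "confidence": 1.0}},
]
document = merge_pages(pages)
print(document["surname"]["value"], "from page", document["surname"]["page_id"])
print(document["country"]["value"], "from page", document["country"]["page_id"])
```

Here the surname is taken from the first page and the country from the second, so the merged object can mix values from different pages.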
Page Level Prediction
For page-level prediction, we build one passport object per page. Each page in the PDF is treated independently.
For example, if you send a three-page passport, the page-level prediction will provide you with three names, three country codes, and so on.
passport_data.passports # [Passport, Passport ...]
Raw HTTP response
This contains the full Mindee API HTTP response object in JSON format.
passport_data.http_response # full HTTP response object
Output
{
"api_request": {
"error": {},
"resources": [
"document"
],
"status": "success",
"status_code": 201,
"url": "http://api.mindee.net/v1/products/mindee/passport/v1/predict"
},
"document": {
"annotations": {
"labels": {}
},
"id": "20e60278-6635-41b3-902f-b5ad1ebdbfa0",
"inference": {
"extras": {},
"finished_at": "2022-03-04T12:36:04+00:00",
"is_rotation_applied": true,
"pages": [
{
"id": 0,
"prediction": {
"birth_date": {
"confidence": 1,
"polygon": [[0.342, 0.689], [0.569, 0.689], [0.569, 0.713], [0.342, 0.713]],
"value": "1995-05-20"
},
"birth_place": {
"confidence": 0.89,
"polygon": [[0.442, 0.725], [0.555, 0.725], [0.555, 0.743], [0.442, 0.743]],
"value": "CAMTETH"
},
"country": {
"confidence": 1,
"polygon": [[0.509, 0.548], [0.558, 0.548], [0.558, 0.567], [0.509, 0.567]],
"value": "GBR"
},
"expiry_date": {
"confidence": 1,
"polygon": [[0.34, 0.797], [0.575, 0.797], [0.575, 0.82], [0.34, 0.82]],
"value": "2017-04-22"
},
"gender": {
"confidence": 1,
"polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
"value": "M"
},
"given_names": [
{
"confidence": 0.99,
"polygon": [[0.342, 0.617], [0.435, 0.617], [0.435, 0.638], [0.342, 0.638]],
"value": "HENERT"
}
],
"id_number": {
"confidence": 1,
"polygon": [[0.723, 0.548], [0.899, 0.548], [0.899, 0.568], [0.723, 0.568]],
"value": "707797979"
},
"issuance_date": {
"confidence": 1,
"polygon": [[0.34, 0.763], [0.564, 0.763], [0.564, 0.785], [0.34, 0.785]],
"value": "2012-04-22"
},
"mrz1": {
"confidence": 0.99,
"polygon": [[0.056, 0.883], [0.926, 0.883], [0.926, 0.911], [0.056, 0.911]],
"value": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
},
"mrz2": {
"confidence": 1,
"polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
"value": "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
},
"orientation": {
"confidence": 0.99,
"degrees": 0
},
"surname": {
"confidence": 0.99,
"polygon": [[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]],
"value": "PUDARSAN"
}
}
}
],
"prediction": {
"birth_date": {
"confidence": 1,
"page_id": 0,
"polygon": [[0.342, 0.689], [0.569, 0.689], [0.569, 0.713], [0.342, 0.713]],
"value": "1995-05-20"
},
"birth_place": {
"confidence": 0.89,
"page_id": 0,
"polygon": [[0.442, 0.725], [0.555, 0.725], [0.555, 0.743], [0.442, 0.743]],
"value": "CAMTETH"
},
"country": {
"confidence": 1,
"page_id": 0,
"polygon": [[0.509, 0.548], [0.558, 0.548], [0.558, 0.567], [0.509, 0.567]],
"value": "GBR"
},
"expiry_date": {
"confidence": 1,
"page_id": 0,
"polygon": [[0.34, 0.797], [0.575, 0.797], [0.575, 0.82], [0.34, 0.82]],
"value": "2017-04-22"
},
"gender": {
"confidence": 1,
"page_id": 0,
"polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
"value": "M"
},
"given_names": [
{
"confidence": 0.99,
"page_id": 0,
"polygon": [[0.342, 0.617], [0.435, 0.617], [0.435, 0.638], [0.342, 0.638]],
"value": "HENERT"
}
],
"id_number": {
"confidence": 1,
"page_id": 0,
"polygon": [[0.723, 0.548], [0.899, 0.548], [0.899, 0.568], [0.723, 0.568]],
"value": "707797979"
},
"issuance_date": {
"confidence": 1,
"page_id": 0,
"polygon": [[0.34, 0.763], [0.564, 0.763], [0.564, 0.785], [0.34, 0.785]],
"value": "2012-04-22"
},
"mrz1": {
"confidence": 0.99,
"page_id": 0,
"polygon": [[0.056, 0.883], [0.926, 0.883], [0.926, 0.911], [0.056, 0.911]],
"value": "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
},
"mrz2": {
"confidence": 1,
"page_id": 0,
"polygon": [[0.054, 0.92], [0.927, 0.92], [0.927, 0.944], [0.054, 0.944]],
"value": "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
},
"surname": {
"confidence": 0.99,
"page_id": 0,
"polygon": [[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]],
"value": "PUDARSAN"
}
},
"product": {
"features": [
"country",
"id_number",
"given_names",
"surname",
"birth_date",
"birth_place",
"gender",
"issuance_date",
"expiry_date",
"orientation",
"mrz1",
"mrz2"
],
"name": "mindee/passport",
"type": "standard",
"version": "1.0"
}
}
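To read values straight from the raw response, we can walk the nested keys shown in the output above. The sketch below assumes the response is available as a Python dictionary (or as a JSON string that has been passed through `json.loads`); the sample is a trimmed copy of the output that keeps only two fields.

```python
import json

# Trimmed sample shaped like the response above; only two fields are kept.
raw = json.loads("""
{
  "api_request": {"status": "success", "status_code": 201},
  "document": {
    "inference": {
      "pages": [
        {"id": 0, "prediction": {
          "surname": {"confidence": 0.99, "value": "PUDARSAN"},
          "given_names": [{"confidence": 0.99, "value": "HENERT"}]
        }}
      ],
      "prediction": {
        "surname": {"confidence": 0.99, "page_id": 0, "value": "PUDARSAN"},
        "given_names": [{"confidence": 0.99, "page_id": 0, "value": "HENERT"}]
      }
    }
  }
}
""")

inference = raw["document"]["inference"]

# Document-level values live under inference.prediction.
doc_prediction = inference["prediction"]
surname = doc_prediction["surname"]["value"]
given_names = [name["value"] for name in doc_prediction["given_names"]]

# Page-level values live under inference.pages[i].prediction.
page_surnames = [page["prediction"]["surname"]["value"] for page in inference["pages"]]

print(surname, given_names, page_surnames)
```

Note that `given_names` is a list of field objects, while the other fields are single objects.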
Client Passport Parse Parameters
Parameter name | Description | Default value |
---|---|---|
cutPdf | (Boolean) If set to true, a multi-page PDF of more than 5 pages is replaced by a new PDF made of its first 4 pages followed by its last page. | true |
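The page selection rule in the table can be illustrated as follows. This is a sketch of the rule as described, not the library's implementation: `pages_to_keep` is a hypothetical helper that only computes which zero-based page indexes would survive.

```python
from typing import List


def pages_to_keep(n_pages: int, cut_pdf: bool = True) -> List[int]:
    """Return the zero-based indexes of the pages kept under the rule above."""
    if not cut_pdf or n_pages <= 5:
        # Short documents, or cutting disabled: every page is kept.
        return list(range(n_pages))
    # More than 5 pages: the first 4 pages plus the last one.
    return [0, 1, 2, 3, n_pages - 1]


print(pages_to_keep(3))
print(pages_to_keep(8))
print(pages_to_keep(8, cut_pdf=False))
```

With the default setting, an 8-page PDF is reduced to 5 pages, while a 3-page PDF is sent unchanged.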
Extracted Fields
Each passport object contains a set of different fields. Each field contains the four following attributes:
- value (Str or Float depending on the field type): corresponds to the field value. Set to None if the field was not extracted.
- probability (Float): the confidence score of the field prediction.
- bbox (Array[Float]): contains the relative vertices coordinates of the bounding box containing the field in the image. If the field is not written, the bbox is an empty array.
- reconstructed (Bool): True if the field was reconstructed using other fields.
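To make the four attributes concrete, here is a minimal stand-in for a field object and a helper that decides whether a field can be trusted. `FieldSketch`, `usable`, and the 0.9 threshold are hypothetical and chosen for illustration; they are not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class FieldSketch:
    """Hypothetical stand-in for an extracted field; not the SDK's class."""
    value: Optional[Union[str, float]] = None  # None when nothing was extracted
    probability: float = 0.0                   # confidence score of the prediction
    # Relative vertices as [x, y] pairs, like the polygon arrays in the raw response.
    bbox: List[List[float]] = field(default_factory=list)
    reconstructed: bool = False                # True when derived from other fields


def usable(f: FieldSketch, threshold: float = 0.9) -> bool:
    """Trust a field only if it was extracted and scores at or above the threshold."""
    return f.value is not None and f.probability >= threshold


surname = FieldSketch(
    value="PUDARSAN",
    probability=0.99,
    bbox=[[0.34, 0.581], [0.472, 0.581], [0.472, 0.603], [0.34, 0.603]],
)
missing = FieldSketch()

print(usable(surname), usable(missing), missing.bbox)
```

Checking `value` against None before reading it avoids treating a missing field as an empty string.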
Passport's owner data
passport.given_names: List of passport's owner given names
# To get the list of names
given_names = passport_data.passport.given_names
# Loop on each given name
for given_name in given_names:
# To get the name string
name = given_name.value
passport.surname: Passport's owner surname
# To get the passport's owner surname (string)
surname = passport_data.passport.surname.value
passport.gender: Passport's owner gender (M / F)
# To get the passport's owner gender (string among {"M", "F"})
gender = passport_data.passport.gender.value
passport.full_name: Reconstructed Passport's owner full name from surname and given_names
# To get the passport's owner full name (string)
full_name = passport_data.passport.full_name.value
passport.birth_place: Passport's owner birth place
# To get the passport's owner birth place (string)
birth_place = passport_data.passport.birth_place.value
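The full_name field is described above as reconstructed from surname and given_names. A minimal sketch of such a reconstruction is shown below; the ordering (given names first, then surname) is inferred from the sample output "HENERT PUDARSAN", and returning None when a part is missing is an assumption, not documented SDK behavior. `build_full_name` is a hypothetical helper.

```python
from typing import List, Optional


def build_full_name(given_names: List[str], surname: Optional[str]) -> Optional[str]:
    """Join given names and surname with single spaces; None if a part is missing."""
    if not given_names or not surname:
        return None
    return " ".join([*given_names, surname])


print(build_full_name(["HENERT"], "PUDARSAN"))
```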
Dates
Each date field comes with an extra attribute:
date_object (Datetime): the date as an object from the Python datetime library
passport.birth_date: Passport's owner date of birth
# To get the passport's owner date of birth (string)
birth_date = passport_data.passport.birth_date.value
passport.expiry_date: Passport expiry date
# To get the passport expiry date (string)
expiry_date = passport_data.passport.expiry_date.value
passport.issuance_date: Passport date of issuance
# To get the passport date of issuance (string)
issuance_date = passport_data.passport.issuance_date.value
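Date values are strings in the YYYY-MM-DD format, as in the sample output. When only the string is at hand, it can be converted with the standard library and compared with another date, for example to check expiry. `to_date` and `is_expired` below are hypothetical helpers, and treating the passport as valid through its expiry date is an assumption made for this sketch.

```python
from datetime import date, datetime
from typing import Optional


def to_date(value: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None when the field was not extracted."""
    if value is None:
        return None
    return datetime.strptime(value, "%Y-%m-%d").date()


def is_expired(expiry: Optional[str], today: date) -> Optional[bool]:
    """True if the expiry date is strictly before today; None if unknown."""
    expiry_date = to_date(expiry)
    if expiry_date is None:
        return None
    return expiry_date < today


print(to_date("1995-05-20"))
print(is_expired("2017-04-22", today=date(2020, 1, 1)))
```

Passing `today` explicitly keeps the check deterministic; in an application it would typically be `date.today()`.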
Passport metadata
passport.mrz1: Passport first line of machine readable zone
# To get the passport first line of machine readable zone (string)
mrz1 = passport_data.passport.mrz1.value
passport.mrz2: Passport second line of machine readable zone
# To get the passport second line of machine readable zone (string)
mrz2 = passport_data.passport.mrz2.value
passport.mrz: Reconstructed passport full machine readable zone from mrz1 and mrz2
# To get the passport full machine readable zone (string)
mrz = passport_data.passport.mrz.value
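In the sample output, the full MRZ is the first line followed directly by the second. The sketch below rebuilds it that way and then cross-checks a few values in the second line using the check-digit rule of ICAO Doc 9303 for passport booklets (weights 7, 3, 1, sum modulo 10). The cross-check is an extra illustration and is not something this guide states the SDK performs; `check_digit` is a hypothetical helper.

```python
def check_digit(data: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    total = 0
    for i, char in enumerate(data):
        if char.isdigit():
            value = int(char)
        elif char == "<":
            value = 0  # filler character counts as zero
        else:
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        total += value * (7, 3, 1)[i % 3]
    return total % 10


# Values copied from the sample output above.
mrz1 = "P<GBRPUDARSAN<<HENERT<<<<<<<<<<<<<<<<<<<<<<<"
mrz2 = "7077979792GBR9505209M1704224<<<<<<<<<<<<<<00"
mrz = mrz1 + mrz2  # the full zone is line 1 followed by line 2

# Fixed positions in line 2 of a passport booklet MRZ.
id_number, id_check = mrz2[0:9], int(mrz2[9])
nationality = mrz2[10:13]
birth, birth_check = mrz2[13:19], int(mrz2[19])  # YYMMDD
gender = mrz2[20]
expiry, expiry_check = mrz2[21:27], int(mrz2[27])  # YYMMDD

print(id_number, check_digit(id_number) == id_check)
print(birth, check_digit(birth) == birth_check)
print(expiry, check_digit(expiry) == expiry_check)
```

A mismatch between a computed check digit and the one printed in the MRZ is a useful hint that a character was misread.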
passport.id_number: Passport identification number
# To get the passport id number (string)
id_number = passport_data.passport.id_number.value
passport.country: Passport country code
# To get the passport country code (string)
country_code = passport_data.passport.country.value