Proof of Address OCR

Extract recipient and issuer information from utility bills, tax returns, payslips, and more.

Using Mindee's Proof of Address API, you can automatically extract key information about the recipient or the issuer of a document to help you automate customer onboarding or KYC processes:

  • Issuer Name
  • Issuer Address
  • Issuer Company Registration numbers
  • Recipient Name
  • Recipient Address
  • Recipient Company Registration numbers
  • Issuance Date
  • Dates
  • Currency
  • Language
  • Orientation

Set up the API

Before making any API calls, you need to have created your API key.

  1. You'll need to get a utility bill, or any document containing an address block. You can use the following bill for your tests:

Mindee proof of address OCR example document

  2. Access your Proof of Address API by clicking on the Proof of Address card in the APIs Store.

  3. From the left navigation, go to documentation > API reference. There you'll find sample code in popular languages and for the command line.

Proof of address OCR documentation

# cURL
curl -X POST \
  https://api.mindee.net/v1/products/mindee/proof_of_address/v1/predict \
  -H 'Authorization: Token my-api-key-here' \
  -F document=@/path/to/your/file.png

# Python (requires the requests package)
import requests

url = "https://api.mindee.net/v1/products/mindee/proof_of_address/v1/predict"

with open("/path/to/my/file", "rb") as myfile:
    files = {"document": myfile}
    headers = {"Authorization": "Token my-api-key-here"}
    response = requests.post(url, files=files, headers=headers)
    print(response.text)

// Node.js (works for Node > v10)
const axios = require('axios');
const fs = require("fs");
const FormData = require('form-data')

async function makeRequest() {
    let data = new FormData()
    data.append('document', fs.createReadStream('./file.jpg'))
    const config = {
        method: 'POST',
        url: 'https://api.mindee.net/v1/products/mindee/proof_of_address/v1/predict',
        headers: { 
          'Authorization':'Token my-api-key-here',
          ...data.getHeaders()
           },
        data
    }

    try {
      let response = await axios(config)
      console.log(response.data);
    } catch (error) {
      console.log(error)
    }

}

makeRequest()

# Ruby (tested with Ruby 2.5)
require 'uri'
require 'net/http'
require 'net/https'
require 'mime/types'

url = URI("https://api.mindee.net/v1/products/mindee/proof_of_address/v1/predict")
file = "/path/to/your/file.png"

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
request = Net::HTTP::Post.new(url)
request["Authorization"] = 'Token my-api-key-here'
request.set_form([['document', File.open(file)]], 'multipart/form-data')

response = http.request(request)
puts response.read_body

<!-- HTML form with browser JavaScript -->
<form onsubmit="mindeeSubmit(event)" >
  <input type="file" id="my-file-input" name="file" />
  <input type="submit" />
</form>

<script type="text/javascript">
const mindeeSubmit = (evt) => {
  evt.preventDefault()
  let myFileInput = document.getElementById('my-file-input');
  let myFile = myFileInput.files[0]
  if (!myFile) { return }
  let data = new FormData();
  data.append("document", myFile, myFile.name);

  let xhr = new XMLHttpRequest();

  xhr.addEventListener("readystatechange", function () {
    if (this.readyState === 4) {
      console.log(this.responseText);
    }
  });

  xhr.open("POST", "https://api.mindee.net/v1/products/mindee/proof_of_address/v1/predict");
  xhr.setRequestHeader("Authorization", "Token my-api-key-here");
  xhr.send(data);
}
</script>

  • Replace my-api-key-here with your new API key, or use the "select an API key" feature and it will be filled in automatically.
  • Copy and paste the sample code of your choice into your application, code environment, terminal, etc.
  • Replace /path/to/your/file.png with the path to your document.

❗️

Always remember to replace your API key!

  4. Run your code. You will receive a JSON response with the document details.
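
To reduce the risk of sending the placeholder key, the Authorization header can be built from a value kept outside the code. The sketch below assumes the key is stored in an environment variable; the name MINDEE_API_KEY and the helper build_auth_headers are hypothetical and not part of Mindee's tooling.

```python
import os

# Hypothetical environment variable name; use whatever fits your deployment.
API_KEY_ENV_VAR = "MINDEE_API_KEY"


def build_auth_headers(api_key=None):
    """Return the Authorization header in the 'Token <key>' form used by the samples."""
    key = api_key or os.environ.get(API_KEY_ENV_VAR, "")
    # Refuse to continue with an empty key or the placeholder from the sample code.
    if not key or key == "my-api-key-here":
        raise ValueError("Set a real API key before calling the API")
    return {"Authorization": f"Token {key}"}
```

The returned dictionary can be passed as the headers argument of requests.post in the Python sample.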

API Response

Below is the full sample JSON response you get when you call the API. Since the response is quite verbose, we will walk through the fields section by section.

{
  "api_request": {
    "error": {},
    "resources": [
      "document"
    ],
    "status": "success",
    "status_code": 201,
    "url": "http://api.mindee.net/v1/products/mindee/proof_of_address/v1/predict"
  },
  "document": {
    "id":  "ecdbe7bd-1037-47a5-87a8-b90d49475a1f",
    "name": "sample_invoce.jpeg",
    "n_pages": 1,
    "is_rotation_applied": true,
    "inference": {
      "started_at": "2021-05-06T16:37:28",
      "finished_at": "2021-05-06T16:37:29",
      "processing_time": 1.125,
      "pages": [
        {
          "id": 0,
          "orientation": {"value": 0},
          "prediction": { .. },
          "extras": {}
        }
      ],
      "prediction": { .. },
      "extras": {}
    }
  }
}

Predictions appear under the prediction key in two locations:

  • In document > inference > prediction for document-level predictions: it contains the fields extracted at the document level, meaning that for multi-page PDFs, we reconstruct a single document object using all the pages.
  • In document > inference > pages[ ] > prediction for page-level predictions: it gives the prediction for each page independently. With images, there is only one element in this array, but with PDFs, you can find the extracted data for each PDF page.
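
As a rough illustration of these two locations, the sketch below reads both from a parsed response. The helper name extract_predictions and the trimmed sample dictionary are made up for this example; the key names follow the sample response shown above.

```python
def extract_predictions(response):
    """Return (document_level_prediction, list_of_page_level_predictions)."""
    api_request = response.get("api_request", {})
    if api_request.get("status") != "success":
        raise RuntimeError(f"API call failed: {api_request.get('error')}")
    inference = response["document"]["inference"]
    document_prediction = inference["prediction"]
    # One entry per page: a single element for images, several for multi-page PDFs.
    page_predictions = [page["prediction"] for page in inference["pages"]]
    return document_prediction, page_predictions


# Trimmed, hand-written stand-in for a real response.
sample_response = {
    "api_request": {"error": {}, "status": "success", "status_code": 201},
    "document": {
        "inference": {
            "pages": [
                {"id": 0, "prediction": {"recipient_name": {"value": "JIRO DOI"}}}
            ],
            "prediction": {"recipient_name": {"value": "JIRO DOI", "page_id": 0}},
        }
    },
}

document_prediction, page_predictions = extract_predictions(sample_response)
print(document_prediction["recipient_name"]["value"])
print(len(page_predictions))
```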

Alongside its value, each predicted field may contain one or more of the following:

  • a confidence score
  • a polygon highlighting the information location
  • a page_id where the information was found (document level only)
{
  "prediction": {
    "recipient_company_registrations": [
      {
        "confidence": 0.99,
        "page_id": 0,
        "polygon": [[0.515, 0.962], [0.59, 0.962], [0.59, 0.973], [0.515, 0.973]],
        "type": "SIRET",
        "value": "XXX81125600010"
      },
      {
        "confidence": 0.99,
        "page_id": 0,
        "polygon": [[0.658, 0.963], [0.729, 0.963], [0.729, 0.973], [0.658, 0.973]],
        "type": "VAT NUMBER",
        "value": "FR44837811XXX"
      }
    ],
    "recipient_name": {
      "confidence": 0.84,
      "page_id": 0,
      "polygon": [[0.035, 0.284], [0.098, 0.284], [0.098, 0.296], [0.035, 0.296]],
      "value": "JIRO DOI"
    },
    "recipient_address": {
      "confidence": 0.3,
      "page_id": 0,
      "polygon": [[0.035, 0.304], [0.214, 0.304], [0.214, 0.353], [0.035, 0.353]],
      "value": "1954 Bloon Street West Toronto, ON, M6P 3K9 Canada"
    },
    "issuer_company_registrations": [
      {
        "confidence": 0.84,
        "page_id": 0,
        "polygon": [[0.113, 0.251], [0.206, 0.251], [0.206, 0.266], [0.113, 0.266]],
        "type": "TIN",
        "value": "736952710"
      }
    ],
    "dates": [
      {
        "confidence": 0.99,
        "page_id": 0,
        "polygon": [[0.842, 0.305], [0.931, 0.305], [0.931, 0.319], [0.842, 0.319]],
        "value": "2018-09-25"
      }
    ],
    "issuance_date": {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [[0.842, 0.305], [0.931, 0.305], [0.931, 0.319], [0.842, 0.319]],
      "value": "2018-09-25"
    },
    "issuer_name": {
      "confidence": 0.72,
      "page_id": 0,
      "polygon": [[0.164, 0.087], [0.4, 0.087], [0.4, 0.147], [0.164, 0.147]],
      "value": "TURNPIKE DESIGNS CO."
    },
    "issuer_address": {
      "confidence": 0.49,
      "page_id": 0,
      "polygon": [[0.756, 0.128], [0.964, 0.128], [0.964, 0.162], [0.756, 0.162]],
      "value": "156 University Ave, Toronto ON, Canada M5H 2H7"
    }
  }
}
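
One possible use of the confidence score is to flag fields for manual review. The sketch below is only an illustration: the helper name low_confidence_fields and the 0.5 threshold are assumptions, not values recommended by the API, and the sample is trimmed from the response above.

```python
def low_confidence_fields(prediction, threshold=0.5):
    """List (field_name, value, confidence) for entries scoring below the threshold."""
    flagged = []
    for name, field in prediction.items():
        # Some fields hold a single object, others (dates, registrations) hold a list.
        items = field if isinstance(field, list) else [field]
        for item in items:
            confidence = item.get("confidence")
            if confidence is not None and confidence < threshold:
                flagged.append((name, item.get("value"), confidence))
    return flagged


# Values copied from the sample prediction, with polygons omitted for brevity.
sample_prediction = {
    "recipient_name": {"confidence": 0.84, "value": "JIRO DOI"},
    "recipient_address": {
        "confidence": 0.3,
        "value": "1954 Bloon Street West Toronto, ON, M6P 3K9 Canada",
    },
    "issuer_address": {
        "confidence": 0.49,
        "value": "156 University Ave, Toronto ON, Canada M5H 2H7",
    },
    "dates": [{"confidence": 0.99, "value": "2018-09-25"}],
}

for name, value, confidence in low_confidence_fields(sample_prediction):
    print(f"{name}: {value!r} (confidence {confidence})")
```

The threshold is a business decision; a KYC flow may want a stricter cut-off for addresses than for dates.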

For each document, the following fields are extracted.

Recipient Information

  • recipient_name: In the JSON response, we have the value of the recipient name as found on the document.
{
  "recipient_name": {
    "confidence": 0.84,
    "page_id": 0,
    "polygon": [[0.035, 0.284], [0.098, 0.284], [0.098, 0.296], [0.035, 0.296]],
    "value": "JIRO DOI"
  }
}
  • recipient_address: In the JSON response, we have the value of the recipient address as found on the document.
{
  "recipient_address": {
    "confidence": 0.3,
    "page_id": 0,
    "polygon": [[0.035, 0.304], [0.214, 0.304], [0.214, 0.353], [0.035, 0.353]],
    "value": "1954 Bloon Street West Toronto, ON, M6P 3K9 Canada"
  }
}
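  • recipient_company_registrations: In the JSON response, we have the list of the recipient's company registration numbers found on the document, each with its type.
{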
  "recipient_company_registrations": [
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [[ 0.515, 0.962 ], [ 0.59, 0.962 ], [ 0.59, 0.973 ], [ 0.515, 0.973 ]],
      "type": "SIRET",
      "value": "XXX81125600010"
    },
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [[ 0.658, 0.963 ], [ 0.729, 0.963 ], [ 0.729, 0.973 ], [ 0.658, 0.973 ]],
      "type": "VAT NUMBER",
      "value": "FR44837811XXX"
    }
  ]
}
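
Because company registrations come back as a list of typed entries, it can be handy to index them by type. The following sketch assumes only the type and value keys shown in the sample; the helper name registrations_by_type is hypothetical.

```python
def registrations_by_type(registrations):
    """Group registration values under their type, keeping the original order."""
    by_type = {}
    for entry in registrations:
        by_type.setdefault(entry["type"], []).append(entry["value"])
    return by_type


# Entries copied from the sample response, with polygons omitted.
sample_registrations = [
    {"confidence": 0.99, "page_id": 0, "type": "SIRET", "value": "XXX81125600010"},
    {"confidence": 0.99, "page_id": 0, "type": "VAT NUMBER", "value": "FR44837811XXX"},
]

print(registrations_by_type(sample_registrations))
```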

Issuer Information

  • issuer_company_registrations: In the JSON response, we have the list of the issuer's company registration numbers found on the document, each with its type.
{
  "issuer_company_registrations": [
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [[0.515, 0.962], [0.59, 0.962], [0.59, 0.973], [0.515, 0.973]],
      "type": "SIRET",
      "value": "XXX81125600010"
    },
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [[0.658, 0.963], [0.729, 0.963], [0.729, 0.973], [0.658, 0.973]],
      "type": "VAT",
      "value": "FR44837811XXX"
    }
  ]
}
  • issuer_name: In the JSON response below, we have the value of the issuer name as written in the document.
{
  "issuer_name": {
    "confidence": 0.11,
    "page_id": 0,
    "polygon": [[0.165, 0.089], [0.385, 0.089], [0.385, 0.145], [0.165, 0.145]],
    "value": "DESIGNS TURNPIKE CO"
  }
}
  • issuer_address: In the JSON response, we have the value of the issuer address as found on the document.
{
  "issuer_address": {
    "confidence": 0.49,
    "page_id": 0,
    "polygon": [[0.756, 0.128], [0.964, 0.128], [0.964, 0.162], [0.756, 0.162]],
    "value": "156 University Ave, Toronto ON, Canada M5H 2H7"
  }
}

Dates

  • issuance_date: In the JSON response below, we have the value of the issuance date in ISO 8601 format (yyyy-mm-dd).
{
  "issuance_date": {
    "confidence": 0.99,
    "page_id": 0,
    "polygon": [[0.84, 0.305], [0.932, 0.305], [0.932, 0.318], [0.84, 0.318]],
    "value": "2018-09-25"
  }
}
  • dates: In the JSON response below, we have the list of all dates extracted from the document in ISO 8601 format (yyyy-mm-dd).
{
  "dates": [
    {
      "confidence": 0.99,
      "page_id": 0,
      "polygon": [[0.842, 0.305], [0.931, 0.305], [0.931, 0.319], [0.842, 0.319]],
      "value": "2018-09-25"
    }
  ]
}
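
Since dates are returned in yyyy-mm-dd form, they can be parsed with the standard library. The sketch below checks whether a document was issued recently, a common onboarding rule; the 90-day window and the helper name is_recent are assumptions chosen for illustration, not something the API enforces.

```python
from datetime import date, timedelta


def is_recent(issuance_date_field, today, max_age_days=90):
    """Return True when the issuance date is at most max_age_days before today."""
    value = issuance_date_field.get("value")
    if not value:
        # No date was extracted, so recency cannot be confirmed.
        return False
    issued = date.fromisoformat(value)
    age = today - issued
    # Dates in the future are rejected as well.
    return timedelta(0) <= age <= timedelta(days=max_age_days)


issuance_date = {"confidence": 0.99, "page_id": 0, "value": "2018-09-25"}
print(is_recent(issuance_date, today=date(2018, 11, 1)))
```

Passing today explicitly keeps the check deterministic and easy to test.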

Locale

  • locale: In the JSON response, we have the currency and language found on the document.
    • language (String): Language code in ISO 639-1 format as seen on the document. The following language codes are supported: ca, de, en, es, fr, it, nl and pt.
    • currency (String): Currency code in ISO 4217 format as seen on the document. The following currency codes are supported: USD, EUR, GBP, CAD, CHF, AED, AUD, BRL, CNY, COP, CZK, DKK, GNF, HKD, HUF, JPY, NOK, NZD, PLN, SEK, SGD, XPF.
{
  "locale": {
    "confidence": 0.94,
    "currency": "CAD",
    "language": "en"
  }
}
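
A downstream system may want to verify that the returned locale falls within the supported codes listed above before relying on it. This is a minimal sketch; the helper name check_locale is hypothetical, and the code sets are copied from the lists in this section.

```python
SUPPORTED_LANGUAGES = {"ca", "de", "en", "es", "fr", "it", "nl", "pt"}
SUPPORTED_CURRENCIES = {
    "USD", "EUR", "GBP", "CAD", "CHF", "AED", "AUD", "BRL", "CNY", "COP", "CZK",
    "DKK", "GNF", "HKD", "HUF", "JPY", "NOK", "NZD", "PLN", "SEK", "SGD", "XPF",
}


def check_locale(locale):
    """Return a list of problems found in a locale object (empty when it looks fine)."""
    problems = []
    language = locale.get("language")
    currency = locale.get("currency")
    if language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {language}")
    if currency not in SUPPORTED_CURRENCIES:
        problems.append(f"unsupported currency: {currency}")
    return problems


print(check_locale({"confidence": 0.94, "currency": "CAD", "language": "en"}))
```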

Orientation

  • orientation: The orientation field is only available at the page level as it describes whether the page image should be rotated to be upright. The rotation value is also conveniently available in the JSON response at:
    document > inference > pages [ ] > orientation > value.
    The orientation field gives a prediction among these 3 possible outputs:
    • 0 degrees: the page is already upright
    • 90 degrees: the page must be rotated clockwise to be upright
    • 270 degrees: the page must be rotated counterclockwise to be upright

In our example, the document doesn't require any rotation.

{
  "orientation": {
    "confidence": 0.99,
    "degrees": 0
  }
}
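
The three possible outputs can be turned into a per-page action. The sketch below reads the value from document > inference > pages [ ] > orientation > value, as described above; the helper name rotation_actions and the trimmed two-page response are invented for this example.

```python
# Mapping taken from the three outputs described in this section.
ACTIONS = {
    0: "already upright",
    90: "rotate clockwise",
    270: "rotate counterclockwise",
}


def rotation_actions(response):
    """Map each page id to the action needed to display that page upright."""
    pages = response["document"]["inference"]["pages"]
    actions = {}
    for page in pages:
        degrees = page["orientation"]["value"]
        # Values outside the documented set are reported rather than guessed at.
        actions[page["id"]] = ACTIONS.get(degrees, f"unexpected value: {degrees}")
    return actions


# Hand-written two-page stand-in for a real response.
sample_response = {
    "document": {
        "inference": {
            "pages": [
                {"id": 0, "orientation": {"value": 0}},
                {"id": 1, "orientation": {"value": 90}},
            ]
        }
    }
}

print(rotation_actions(sample_response))
```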

📘

All polygon fields across the JSON response are already rotated accordingly!
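
To draw a polygon on the page image, its points need to be scaled to pixels. The sketch below assumes that each point is an [x, y] pair expressed as a fraction of the page width and height, which the sample values (all between 0 and 1) suggest but this page does not state explicitly. The helper name polygon_to_pixel_box and the image size are illustrative.

```python
def polygon_to_pixel_box(polygon, width, height):
    """Return (left, top, right, bottom) in pixels for a polygon of relative points."""
    # Assumption: x is relative to the page width and y to the page height.
    xs = [x * width for x, _ in polygon]
    ys = [y * height for _, y in polygon]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# Polygon of the recipient_name field from the sample response.
recipient_name_polygon = [[0.035, 0.284], [0.098, 0.284], [0.098, 0.296], [0.035, 0.296]]

# Hypothetical image size in pixels.
print(polygon_to_pixel_box(recipient_name_polygon, width=1000, height=2000))
```

The resulting box can be passed to any imaging library to crop or highlight the field.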