French Payslips OCR

This article explains how to build an OCR API that automatically extracts data from French payslips (bulletins de salaire).

Prerequisites

  1. You’ll need a free account on the platform. Sign up and confirm your email address to log in.
  2. You’ll need at least 20 French payslip images or PDFs to train your OCR.
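Before you start, you may want to confirm that your training folder holds enough documents. The following Python sketch counts files that look like images or PDFs. The script name (count_payslips.py) and the list of accepted extensions are our own assumptions, not platform requirements; adjust them to the formats the platform actually accepts.

```python
import sys
from collections.abc import Iterable
from pathlib import Path

# Assumed extensions for "images or PDFs"; not taken from the platform's documentation.
ACCEPTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}
MINIMUM_DOCUMENTS = 20  # minimum stated in the prerequisites


def count_training_documents(filenames: Iterable[str]) -> int:
    """Count the filenames whose extension looks like an image or a PDF."""
    return sum(1 for name in filenames if Path(name).suffix.lower() in ACCEPTED_SUFFIXES)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_payslips.py path/to/payslips
    folder = Path(sys.argv[1])
    found = count_training_documents(p.name for p in folder.iterdir() if p.is_file())
    status = "OK" if found >= MINIMUM_DOCUMENTS else "not enough"
    print(f"{found} payslip files found ({status}, minimum {MINIMUM_DOCUMENTS})")
```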

Define your French payslip use case

First, we need to specify the fields we want to extract from our payslips.

French payslip key data extraction

For our example, we will extract the following fields from our French payslips:

  • Employee full name: First and last names of the employee
  • Employee SSN: Employee social security number
  • Employer SIRET: the employer's 14-digit SIRET establishment number
  • Payslip period: Payslip month and year
  • Net paid: Total net paid
  • Gross salary: Total gross salary before taxes
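Once the API returns these fields, you will likely want them in typed form. Here is a minimal Python sketch under a few assumptions: the raw values arrive as strings keyed by the field names used in this tutorial's JSON config, amounts follow the French convention (comma as decimal separator, spaces as thousands separator, as in 2 345,67 €), and the period is read from a day/month/year date. FrenchPayslip, build_payslip and the two parsing helpers are hypothetical names, not part of the platform's API, and the sample values are made up.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


def parse_french_amount(text: str) -> Decimal:
    """Convert an amount such as '2 345,67 €' into Decimal('2345.67')."""
    # Drop the euro sign and any space used as a thousands separator
    # (regular, no-break U+00A0 and narrow no-break U+202F).
    cleaned = re.sub(r"[\s\u202f€]", "", text)
    return Decimal(cleaned.replace(",", "."))


def parse_payslip_period(text: str) -> tuple[int, int]:
    """Return (month, year) from a day/month/year date such as '31/03/2024'."""
    parsed = datetime.strptime(text.strip(), "%d/%m/%Y")
    return parsed.month, parsed.year


@dataclass
class FrenchPayslip:
    """Hypothetical typed record for the six fields listed above."""
    employee_full_name: str
    employee_ssn: str
    employer_siret: str
    period_month: int
    period_year: int
    net_paid: Decimal
    gross_salary: Decimal


def build_payslip(raw: dict[str, str]) -> FrenchPayslip:
    """Build a FrenchPayslip from raw strings keyed by the config field names."""
    month, year = parse_payslip_period(raw["payslip_period"])
    return FrenchPayslip(
        employee_full_name=raw["employee_full_name"].strip(),
        employee_ssn=re.sub(r"\s", "", raw["employee_ssn"]),
        employer_siret=re.sub(r"\s", "", raw["company_siret"]),
        period_month=month,
        period_year=year,
        net_paid=parse_french_amount(raw["net_paid"]),
        gross_salary=parse_french_amount(raw["gross_salary"]),
    )


if __name__ == "__main__":
    payslip = build_payslip({
        "employee_full_name": "Marie Dupont",
        "employee_ssn": "2 55 08 14 168 025 38",
        "company_siret": "732 829 320 00074",
        "payslip_period": "31/03/2024",
        "net_paid": "2 345,67 €",
        "gross_salary": "3 010,00 €",
    })
    print(payslip.period_month, payslip.period_year)  # 3 2024
    print(payslip.net_paid)                           # 2345.67
```

Using Decimal rather than float keeps cent amounts exact, which matters if you later compare or sum payslip totals.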

Feel free to add any other fields you'd like the OCR to extract.

Deploy your API

Once you have defined the fields you want to extract, head over to the platform and press the ‘create a new API’ button.

You should now be on the setup page, where you configure the API. For example, here is the setup we used:
Setup your model

Once your setup is complete, press the “Next” button to move on to building the data model.

You can define the data model in one of two ways:

Upload a JSON config
Copy the following JSON into a file and upload it through the interface:

{
  "problem_type": {
    "classificator": { "features": [], "features_name": [] },
    "selector": {
      "features": [
        {
          "cfg": { "filter": { "alpha": -1, "numeric": 0 } },
          "handwritten": false,
          "name": "employee_full_name",
          "public_name": "Employee Full Name",
          "semantics": "word"
        },
        {
          "cfg": { "filter": { "alpha": -1, "numeric": -1 } },
          "handwritten": false,
          "name": "employee_ssn",
          "public_name": "Employee SSN",
          "semantics": "word"
        },
        {
          "cfg": { "filter": { "alpha": 0, "numeric": -1 } },
          "handwritten": false,
          "name": "company_siret",
          "public_name": "Company SIRET",
          "semantics": "word"
        },
        {
          "cfg": { "filter": { "convention": "FR" } },
          "handwritten": false,
          "name": "payslip_period",
          "public_name": "Payslip period",
          "semantics": "date"
        },
        {
          "cfg": { "filter": { "is_integer": -1 } },
          "handwritten": false,
          "name": "net_paid",
          "public_name": "Net Paid",
          "semantics": "amount"
        },
        {
          "cfg": { "filter": { "is_integer": -1 } },
          "handwritten": false,
          "name": "gross_salary",
          "public_name": "Gross salary",
          "semantics": "amount"
        }
      ],
      "features_name": [
        "employee_full_name",
        "employee_ssn",
        "company_siret",
        "payslip_period",
        "net_paid",
        "gross_salary"
      ]
    }
  }
}
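If you edit this JSON, for example to add your own fields, a quick local check can catch copy-paste mistakes before you upload it. The Python sketch below assumes, based only on the sample above, that every feature needs the five keys shown and that features_name must list the feature names in the same order. The platform's own validation rules may differ, and the script name check_config.py is hypothetical.

```python
import json
import sys

# Keys present on every feature in the sample config (assumed to be required).
REQUIRED_KEYS = ("cfg", "handwritten", "name", "public_name", "semantics")


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in the selector config (empty if none)."""
    selector = config.get("problem_type", {}).get("selector")
    if selector is None:
        return ["missing problem_type.selector"]

    problems = []
    features = selector.get("features", [])
    for index, feature in enumerate(features):
        for key in REQUIRED_KEYS:
            if key not in feature:
                problems.append(f"feature #{index}: missing '{key}'")

    names = [feature.get("name") for feature in features]
    if len(set(names)) != len(names):
        problems.append("duplicate feature names")
    if names != selector.get("features_name", []):
        problems.append("features_name does not match the feature names in order")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_config.py payslip_config.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        issues = check_config(json.load(handle))
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```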

Or build your data model manually
Using the interface, add each field to your data model one by one.

Upload JSON or manually set up your API

In our example, here are the different field configurations we used:

  • Employee full name: type String with no numeric characters
  • Employee SSN: type String. Note that we didn't check "It never contains alpha characters", because social security numbers of people born in Corsica contain the department code 2A or 2B.
  • Employer SIRET: type String that never contains alpha characters.
  • Payslip period: type Date
  • Net paid: type Amount
  • Gross salary: type Amount
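The alpha-character settings only restrict which characters a field may contain. If you also want to sanity-check extracted values, both identifiers carry a checksum: the 15-character French social security number (NIR) ends with a 2-digit key equal to 97 minus the first 13 characters taken modulo 97, with the Corsican department codes 2A and 2B counted as 19 and 18, and a 14-digit SIRET normally passes the Luhn check. The Python sketch below implements those two checks only; the function names are hypothetical, and it does not handle any registry-specific exceptions.

```python
import re


def is_valid_french_ssn(ssn: str) -> bool:
    """Check the control key of a French social security number (NIR)."""
    compact = re.sub(r"\s", "", ssn).upper()
    # 13 characters (the department may be 2A or 2B) followed by a 2-digit key.
    if not re.fullmatch(r"[0-9]{5}(?:[0-9]{2}|2A|2B)[0-9]{8}", compact):
        return False
    body, key = compact[:13], int(compact[13:])
    # In the checksum, Corsican department 2A counts as 19 and 2B as 18.
    body = body.replace("2A", "19").replace("2B", "18")
    return key == 97 - int(body) % 97


def is_valid_siret(siret: str) -> bool:
    """Check that a 14-digit SIRET passes the Luhn algorithm."""
    digits = re.sub(r"\s", "", siret)
    if not re.fullmatch(r"[0-9]{14}", digits):
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


if __name__ == "__main__":
    print(is_valid_french_ssn("2 55 08 14 168 025 38"))  # True
    print(is_valid_siret("732 829 320 00074"))           # True
```

A failed check usually points to a misread digit, so it is a cheap way to flag documents for manual review.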

You are now ready to train your model!

Ready to train model

Train your Payslip OCR

Training your payslip API

You’re all set!

It's now time to train your custom payslip deep learning model. For more information about the training phase, please refer to the Getting Started tutorial. If you have any questions about your use case, feel free to reach out to us via our chat!
