US Pay Stubs OCR

This article describes the recommended process for building an OCR API that extracts data from US pay stubs using Mindee's deep learning engine.

Prerequisites

  1. You’ll need a free account. Sign up and confirm your email to log in.
  2. You’ll need at least 20 US pay stubs (images or PDFs) to train your OCR.
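
Before uploading anything, it can help to confirm that your sample folder meets the 20-document minimum. The sketch below counts the pay stub files in a local folder. The list of file extensions is an assumption (the article only says "images or PDFs"), and the function names are illustrative.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever the platform accepts.
SUPPORTED = {".pdf", ".jpg", ".jpeg", ".png"}
MIN_SAMPLES = 20  # minimum stated in the prerequisites


def list_samples(folder):
    """Return the sorted pay stub files found directly inside `folder`."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def has_enough_samples(folder, minimum=MIN_SAMPLES):
    """True when the folder holds at least `minimum` usable files."""
    return len(list_samples(folder)) >= minimum
```

Calling `has_enough_samples("pay_stubs/")` on your own folder (the folder name is only an example) tells you whether you are ready to start.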

Define your Pay Stub use case

You might need to automatically extract data from pay stubs to improve your user experience in payroll or loan eligibility workflows. This article guides you through the few steps required to deploy your Pay Stubs data extraction API.

First, we’re going to define the fields we want to extract from your pay stubs.

Pay Stub key data extraction

Here is the list of fields we are going to extract using our OCR API:

  • Employer: The full name of the employer issuing the pay stub
  • Net pay: Total net paid to the employee
  • Pay date: Date of wage payment
  • Period beginning: Pay stub start date
  • Period ending: Pay stub end date
  • Gross pay: Total gross pay before taxes and deductions
  • Total tax: Total tax deducted

You can add as many relevant fields as you need to better fit your requirements.
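
On the application side, the seven fields above could be held in a small typed record. The following is a minimal sketch, not part of the platform: the class name, the attribute names, and the two sanity checks are our own assumptions, and the checks are heuristics rather than rules (a stub with reimbursements could legitimately show a net pay above gross pay).

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class PayStub:
    """Hypothetical container for the fields listed above."""
    employer: str
    net_pay: Decimal
    pay_date: date
    period_beginning: date
    period_ending: date
    gross_pay: Decimal
    total_tax: Decimal

    def problems(self):
        """Return human-readable inconsistencies (empty list if none)."""
        issues = []
        if self.period_beginning > self.period_ending:
            issues.append("period beginning is after period ending")
        if self.net_pay > self.gross_pay:
            issues.append("net pay exceeds gross pay")
        return issues
```

A record whose `problems()` list is not empty is a good candidate for manual review.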

Deploy your API

Once you have defined which fields you want to extract, head over to the platform and press the ‘create a new API’ button.

You now land on the setup page. Here is the image you can use to set up the API. For instance, my setup is as follows:

Set up your model

Once you’re ready, click on the “next” button. We are going to specify the data types for each of the fields we want our API to extract.

To move forward, you have two possibilities:

Upload a JSON config
Copy the following JSON into a file and upload it through the interface:

{
  "problem_type": {
    "classificator": { "features": [], "features_name": [] },
    "selector": {
      "features": [
        {
          "cfg": { "filter": { "alpha": -1, "numeric": -1 } },
          "handwritten": false,
          "name": "employer",
          "public_name": "Employer",
          "semantics": "word"
        },
        {
          "cfg": { "filter": { "is_integer": -1 } },
          "handwritten": false,
          "name": "net_pay",
          "public_name": "Net Pay",
          "semantics": "amount"
        },
        {
          "cfg": { "filter": { "convention": "US" } },
          "handwritten": false,
          "name": "pay_date",
          "public_name": "Pay Date",
          "semantics": "date"
        },
        {
          "cfg": { "filter": {} },
          "handwritten": false,
          "name": "period_beg",
          "public_name": "Period beg",
          "semantics": "date"
        },
        {
          "cfg": { "filter": { "convention": "US" } },
          "handwritten": false,
          "name": "period_end",
          "public_name": "Period end",
          "semantics": "date"
        },
        {
          "cfg": { "filter": { "is_integer": -1 } },
          "handwritten": false,
          "name": "gross_pay",
          "public_name": "Gross Pay",
          "semantics": "amount"
        },
        {
          "cfg": { "filter": { "is_integer": -1 } },
          "handwritten": false,
          "name": "total_tax",
          "public_name": "Total tax",
          "semantics": "amount"
        }
      ],
      "features_name": [
        "employer",
        "net_pay",
        "pay_date",
        "period_beg",
        "period_end",
        "gross_pay",
        "total_tax"
      ]
    }
  }
}
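
If you edit this file by hand, for example to add fields, it is easy to let the "features" list and the "features_name" list drift apart. The sketch below checks a config for that before you upload it. It relies only on the structure visible in the JSON above. Whether the platform enforces the same rules is an assumption on our part, and the function names are illustrative.

```python
import json


def check_config(config):
    """Return a list of problems found in a data model config dict."""
    selector = config["problem_type"]["selector"]
    names = [feature["name"] for feature in selector["features"]]
    problems = []
    # The two lists should name the same fields in the same order.
    if names != selector["features_name"]:
        problems.append("features_name does not match the feature names")
    # Each field name should appear only once.
    if len(set(names)) != len(names):
        problems.append("duplicate feature names")
    return problems


def load_and_check(path):
    """Read a JSON config file from disk and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_config(json.load(handle))
```

For example, `load_and_check("pay_stub_config.json")` returns an empty list when the file is consistent; the file name is only an example.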

Or build your data model manually
Using the interface, add each field to your data model.

Manually set up your API

In our example, here are the different field configurations we used:

  • Employer: type String
  • Net pay: type Amount
  • Pay date: type Date
  • Period beginning: type Date
  • Period ending: type Date
  • Gross pay: type Amount
  • Total tax: type Amount
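
These types also suggest how you might convert extracted values in your own code. The sketch below is illustrative only: it assumes the values reach you as plain strings, with amounts such as "$1,234.56" and dates in the US month/day/year convention. The actual response format is not described in this article, so adapt the parsing to what your API returns. The field names follow the JSON config.

```python
from datetime import date, datetime
from decimal import Decimal

# Field name -> type, mirroring the configuration listed above.
FIELD_TYPES = {
    "employer": "string",
    "net_pay": "amount",
    "pay_date": "date",
    "period_beg": "date",
    "period_end": "date",
    "gross_pay": "amount",
    "total_tax": "amount",
}


def to_python(field, raw):
    """Convert a raw string into a Python value for the given field."""
    kind = FIELD_TYPES[field]
    if kind == "amount":
        # Assumed input such as "$1,234.56": drop symbol and separators.
        return Decimal(raw.replace("$", "").replace(",", "").strip())
    if kind == "date":
        # Assumed US convention, MM/DD/YYYY.
        return datetime.strptime(raw.strip(), "%m/%d/%Y").date()
    return raw.strip()
```

Using `Decimal` rather than `float` avoids rounding surprises when amounts are later summed or compared.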

Your model is now ready to start training.

Ready to train model

Train your Pay Stub OCR

Train your model

You’re all set!

Now is the time to train your US Pay Stub deep learning model in the Training section of your API.

To get more information about the training phase, please refer to the Getting Started tutorial.

If you have any questions regarding your use case, feel free to reach out on the chat!

