Data Model Configuration

What Is a Data Model?

A Data Model is a set of fields and data types that defines the data you wish to extract from a document. Mindee's machine learning algorithms use the Data Model to extract the desired information from your documents.

Define your custom Data Model

To train your custom parsing API, you need to define a Data Model, i.e. the list of fields, with their corresponding data types, that you want to extract from your documents.

This tutorial will walk you through the steps of defining such a Data Model.

For further information on specific data types, see the navigation tree on the left.

Prerequisites

You’ll need a free Mindee account. Sign up and confirm your email to log in.

Let’s get started! 

After giving a name, description and cover image to your new API, you should land on the following page

 

From there you have two options:

  1. Manually add fields one by one by filling in the form on the right side of the screen

  2. Upload a Data Model config file

Let’s start with the manual option.

Manually add a field to your Data Model

 

You can add a new field by filling in the right-side form with the following information:

 
Field Name: The human-readable name that will appear in the annotation interface later on. Use a name that is meaningful to you.

API response key: The name of the key used for this field in the API response.

Field type: The Field Type specifies the type of information we are going to look for on the document and defines the data type that will be returned in the API response.

 

You can choose one of the following pre-built data types from a drop-down list:

String
Number
Date
Email address
Phone number
Url
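
Each of these drop-down labels appears to correspond to a `semantics` value in the JSON config file described later in this guide. The sketch below records that correspondence as a lookup table. The pairs String/Word and Number/Amount are stated in this guide; the others are inferred from the sample config and should be treated as assumptions. The helper name `semantics_for` is hypothetical.

```python
# Assumed mapping from the drop-down label to the `semantics` value used in
# the JSON config file. Inferred from the sample config in this guide, not
# taken from an official reference.
FIELD_TYPE_TO_SEMANTICS = {
    "String": "Word",
    "Number": "Amount",
    "Date": "Date",
    "Email address": "Email",
    "Phone number": "Phone",
    "Url": "Url",
}


def semantics_for(field_type: str) -> str:
    """Return the `semantics` value for a drop-down label (hypothetical helper)."""
    try:
        return FIELD_TYPE_TO_SEMANTICS[field_type]
    except KeyError:
        # Fail loudly on labels that are not in the drop-down list.
        raise ValueError(f"Unknown field type: {field_type!r}") from None
```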

 

Update or Delete a field

 

Now that you’ve created your first field, you should see something like this:

 

 

From there you can: 

 

  • Add a field: Manually add a new field to your Data Model following the same process. You can repeat this step as many times as you want; there is no limit on the number of fields you can extract from a document.
  • Edit a field: Edit a specific field. You can change its Field name, API response key and Field type.
  • Delete a field: Delete a specific field from your Data Model.
  • Start training: When your Data Model is ready, click here to automatically deploy your API and start training it.

 

 

Upload a Data Model config file

 

Alternatively, you can create a whole set of fields at once by uploading a JSON config file in the left-side section. 

 

 

 

For instance, for our starter example, the JSON config file looks like this:

{
  "classificator": {
    "features": []
  },
  "selector": {
    "features": [
      {
        "semantics": "Word",
        "public_name": "Name",
        "name": "name",
        "cfg": {
          "numeric": false,
          "alpha": false
        },
        "handwritten": "false"
      },
      {
        "semantics": "Amount",
        "public_name": "Number",
        "name": "number",
        "cfg": {
          "number_filter": {
            "integer_only": false,
            "float_only": false
          }
        },
        "handwritten": "false"
      },
      {
        "semantics": "Date",
        "public_name": "Date",
        "name": "date",
        "cfg": {
          "country_format": "us"
        },
        "handwritten": "false"
      },
      {
        "semantics": "Email",
        "public_name": "Email",
        "name": "email",
        "handwritten": "false"
      },
      {
        "semantics": "Url",
        "public_name": "Url",
        "name": "url",
        "handwritten": "false"
      },
      {
        "semantics": "Phone",
        "public_name": "Phone",
        "name": "phone",
        "handwritten": "false"
      }
    ]
  }
}
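
If you maintain many fields, it may be easier to generate this file than to write it by hand. The following is a minimal sketch that rebuilds the sample config above in Python. The helper names `make_field` and `build_config` are hypothetical and are not part of any Mindee library; the structure simply mirrors the JSON shown above.

```python
import json


def make_field(semantics, public_name, name, cfg=None, handwritten="false"):
    """Build one entry of selector.features (hypothetical helper)."""
    field = {"semantics": semantics, "public_name": public_name, "name": name}
    if cfg is not None:
        field["cfg"] = cfg
    # The sample config stores this flag as the string "false", so the same
    # representation is kept here.
    field["handwritten"] = handwritten
    return field


def build_config(fields):
    """Wrap a list of fields in the top-level structure of the sample config."""
    return {
        "classificator": {"features": []},
        "selector": {"features": list(fields)},
    }


config = build_config([
    make_field("Word", "Name", "name", {"numeric": False, "alpha": False}),
    make_field(
        "Amount", "Number", "number",
        {"number_filter": {"integer_only": False, "float_only": False}},
    ),
    make_field("Date", "Date", "date", {"country_format": "us"}),
    make_field("Email", "Email", "email"),
    make_field("Url", "Url", "url"),
    make_field("Phone", "Phone", "phone"),
])

# Serialize to the JSON text that would go into the config file.
config_json = json.dumps(config, indent=2)
```

Writing `config_json` to a file (for example `data_model.json`, a name chosen here only for illustration) should give a document equivalent to the sample above, ready to upload.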

 

Each field has the following attributes:

 

  • semantics: the Field Type. Mandatory.

  • public_name: the Field Name. Mandatory.

  • name: the API response key. Mandatory.

  • cfg: An object of additional data constraints depending on the Field Type.

  • handwritten: whether the entry is handwritten or not. Mandatory.
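
Before uploading a config file, it can help to check that every field carries the mandatory attributes listed above. This is a minimal sketch of such a check; `missing_attributes` and `check_config` are hypothetical helpers, and the check covers only the presence of keys, not their values.

```python
# Attributes marked as mandatory in the list above.
MANDATORY_ATTRIBUTES = ("semantics", "public_name", "name", "handwritten")


def missing_attributes(field: dict) -> list:
    """Return the mandatory attributes that are absent from one field."""
    return [key for key in MANDATORY_ATTRIBUTES if key not in field]


def check_config(config: dict) -> dict:
    """Map each incomplete field (by position) to its missing attributes."""
    features = config.get("selector", {}).get("features", [])
    report = {}
    for index, field in enumerate(features):
        missing = missing_attributes(field)
        if missing:
            report[index] = missing
    return report
```

An empty dictionary returned by `check_config` means that no field is missing a mandatory attribute.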

 

 

The parameters of the cfg object depend on the Field Type. All parameters listed for a given type are mandatory:

 

semantics = Word (String):

  • alpha: true if it only contains alphabetic characters (a-z, A-Z), otherwise false

  • numeric: true if it only contains numeric characters (0-9), otherwise false

If the string is made up of both character types, both values are false. It is not possible for both values to be true.

semantics = Amount (Number) :

  • integer_only: true if it is always a whole number, otherwise false
  • float_only: true if it is always a decimal number, otherwise false

If the numbers vary, both values are false. It is not possible for both values to be true.
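
The two flag pairs (alpha/numeric for Word, integer_only/float_only for Amount) can be checked before upload. The sketch below assumes that the flags of a pair may not both be true, since each flag means "only" one kind of value and both are false in the sample config. The function name `flag_pair_problems` is hypothetical.

```python
def flag_pair_problems(field: dict) -> list:
    """Return a list of problems found in the flag pair of one field."""
    semantics = field.get("semantics")
    cfg = field.get("cfg", {})
    if semantics == "Word":
        flags, names = cfg, ("alpha", "numeric")
    elif semantics == "Amount":
        # For Amount, the flags are nested under number_filter.
        flags, names = cfg.get("number_filter", {}), ("integer_only", "float_only")
    else:
        return []  # other field types have no flag pair

    problems = []
    for flag in names:
        # Each flag must be present and must be a JSON boolean.
        if not isinstance(flags.get(flag), bool):
            problems.append(f"{flag} must be true or false")
    if all(flags.get(flag) is True for flag in names):
        problems.append(f"{names[0]} and {names[1]} cannot both be true")
    return problems
```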

semantics = Date :

  • country_format: us if it is mostly in MM-DD-YY(YY) format, fr if the format is mostly DD-MM-YY(YY)

Dates can be written in any format and will be returned in ISO format YYYY-MM-DD.
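
To see why country_format matters, consider a date whose day and month are both 12 or less: the same text maps to two different ISO dates depending on the assumed order. The sketch below only illustrates that ambiguity with Python's standard library; it is not Mindee's parsing logic, and `to_iso` and `DATE_PATTERNS` are hypothetical names.

```python
from datetime import datetime

# Assumed day/month order for each country_format value, following the
# description above. Only four-digit years with "-" separators are handled.
DATE_PATTERNS = {
    "us": "%m-%d-%Y",  # month first
    "fr": "%d-%m-%Y",  # day first
}


def to_iso(raw: str, country_format: str) -> str:
    """Parse a date string in the given order and return it as YYYY-MM-DD."""
    parsed = datetime.strptime(raw, DATE_PATTERNS[country_format])
    return parsed.date().isoformat()


print(to_iso("03-04-2021", "us"))  # 2021-03-04
print(to_iso("03-04-2021", "fr"))  # 2021-04-03
```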
