How does a document get a class and a design?

Learning and extraction in the Xtracta system are based on three key elements: Workflow, Class, and Design.

Workflow

A workflow is a configuration set up by the user to handle specific types of documents. It dictates how each document is processed to meet the user's particular needs.

Class

When a document enters a workflow, Xtracta’s AI begins by identifying the document’s source, searching for distinctive features that could indicate the originator. These features may include:

  • Bank account details
  • Tax Identification Numbers
  • Phone numbers

Upon finding a relevant feature, the system classifies the document and assigns it a class with a unique identifier. Each class is global and unique, allowing the system to apply specialized extraction techniques tailored to that class of documents.
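
The classification step above can be pictured with a small Python sketch. It is an illustration only, not Xtracta's actual implementation: the ClassRegistry name, the (kind, value) feature tuples, and the numbering of class identifiers from 1 are all assumptions made for this example. It also assumes the distinctive features have already been read from the document.

```python
from itertools import count


class ClassRegistry:
    """Hypothetical lookup from distinctive features to class identifiers."""

    def __init__(self):
        self._by_feature = {}      # (kind, value) -> class identifier
        self._next_id = count(1)   # assumed numbering scheme, for illustration

    def classify(self, features):
        """Return the class identifier for a document's features.

        `features` is an iterable of (kind, value) tuples, for example
        ("tax_id", "12-345-678"). Returns None if no features were found.
        """
        features = list(features)
        for feature in features:
            if feature in self._by_feature:
                # A known feature identifies an existing class.
                class_id = self._by_feature[feature]
                break
        else:
            if not features:
                return None
            # No feature is known yet, so a new class is created.
            class_id = next(self._next_id)
        # Remember any features not seen before so later documents can match.
        for feature in features:
            self._by_feature.setdefault(feature, class_id)
        return class_id


registry = ClassRegistry()
first = registry.classify([("tax_id", "12-345-678"), ("phone", "+64 9 555 0100")])
second = registry.classify([("phone", "+64 9 555 0100")])
other = registry.classify([("bank_account", "01-0001-0000001-00")])
```

In this sketch, the first two documents share a phone number and so receive the same class identifier, while the third document has an unseen bank account and receives a new one.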

Design

A design is a distinct document layout that the system recognizes. Xtracta can distinguish between different designs within the same class, which is crucial for the training and learning phases.

For example:
A telecommunications company might use different invoice designs for broadband and telephone services. In Xtracta:

  • The first recognized design is labeled as 0
  • Subsequent layouts are numbered sequentially (e.g., 1, 2, etc.)

This ensures that even if documents of the same class have varying layouts, the system can still accurately process them.
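
The numbering scheme described above can be sketched in Python as follows. This is a simplified illustration, not how Xtracta detects layouts: the DesignRegistry name and the layout_signature strings are hypothetical stand-ins for whatever the system uses to tell one layout from another.

```python
from collections import defaultdict


class DesignRegistry:
    """Hypothetical per-class list of known layouts, numbered from 0."""

    def __init__(self):
        # class identifier -> layout signatures in the order first seen
        self._designs = defaultdict(list)

    def design_for(self, class_id, layout_signature):
        """Return the design number of a layout within its class."""
        known = self._designs[class_id]
        if layout_signature not in known:
            # A new layout takes the next sequential number.
            known.append(layout_signature)
        return known.index(layout_signature)


designs = DesignRegistry()
broadband = designs.design_for(7, "broadband-invoice-layout")
telephone = designs.design_for(7, "telephone-invoice-layout")
repeat = designs.design_for(7, "broadband-invoice-layout")
```

Here the broadband invoice is the first layout seen for the (assumed) class 7 and is numbered 0, the telephone invoice is numbered 1, and a later broadband invoice is matched to design 0 again. Numbering restarts at 0 for each class.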

Class and Design Information

To view the Class and Design for a document:

  1. From the Workflow Dashboard, access a document by clicking on the Document ID.
  2. On the Engine Learning Screen, access the Document Information tab.
  3. The Class and Design will be displayed.