What does training a document mean?
Training documents within the Xtracta system means teaching the system to correctly extract the information for a specific field from a document. While a document is in the Indexing Queue, Rejected Queue, or Quality Assurance Queue, it can be annotated to facilitate the training process. When a document is output, Xtracta considers the captured fields accurate and feeds the data to the AI model to trigger the training process.
For workflows designated for training, documents must be reviewed by a user. Generally, documents will remain in the dashboard queues until they have been reviewed. In contrast, when a workflow is production-ready, it can be configured to output documents automatically.
Furthermore, validation rules based on business logic can be configured in the workflow to make sure data is captured correctly before a document is output. Documents with validation errors are sent to the Rejected Queue, where users can review and correct them. These corrections in turn help improve the model.
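To make the idea of a validation rule concrete, here is a minimal sketch of one possible business-logic check: the net amount plus tax must equal the total. It is an illustration only. The field names (net_amount, tax_amount, total_amount) and the functions validate_invoice and route are hypothetical and are not part of Xtracta's configuration or API; in Xtracta, validation rules are configured in the workflow itself.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical field names, chosen for illustration only.
REQUIRED_FIELDS = ("net_amount", "tax_amount", "total_amount")


def validate_invoice(fields):
    """Return a list of validation errors for one document's captured fields."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    if errors:
        return errors
    try:
        net = Decimal(fields["net_amount"])
        tax = Decimal(fields["tax_amount"])
        total = Decimal(fields["total_amount"])
    except InvalidOperation:
        return ["amount fields must be numeric"]
    # Business rule: the captured amounts must add up.
    if net + tax != total:
        errors.append("net_amount + tax_amount does not equal total_amount")
    return errors


def route(fields):
    """Send a document to 'rejected' if any rule fails, otherwise to 'output'."""
    return "rejected" if validate_invoice(fields) else "output"
```

In this sketch, a document whose amounts do not add up is routed to "rejected", which mirrors a document landing in the Rejected Queue for a user to review and correct.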


Recommendations
It’s advisable to train the system with at least six samples of the same document type or from the same supplier. After training, when a new document is uploaded, the AI will extract the information based on what it has learned. A typical training process takes a few hours.
Important Note:
Processing identical documents is generally not beneficial for the training process. Instead, it is preferable to use different documents with varying content or different numbers of pages. Once the training for a specific document type is complete, reprocessing a previously trained document can serve as a good check to see whether the learning was successful.
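Before uploading a training batch, it can help to check it against the recommendations above. The following is a minimal sketch of such a pre-upload check, run on your own machine and not part of Xtracta. The function check_training_set and the (supplier, file content) input format are assumptions made for illustration. It counts unique samples per supplier and flags byte-identical files.

```python
import hashlib
from collections import defaultdict

MIN_SAMPLES = 6  # "at least six samples" from the recommendation above


def check_training_set(samples):
    """Check a batch of (supplier, file_bytes) pairs before upload.

    Returns (short, duplicates):
      short      - suppliers with fewer than MIN_SAMPLES unique files
      duplicates - number of files identical to an earlier file
                   from the same supplier
    """
    seen = set()
    unique_per_supplier = defaultdict(int)
    duplicates = 0
    for supplier, content in samples:
        digest = hashlib.sha256(content).hexdigest()
        if (supplier, digest) in seen:
            duplicates += 1  # identical file: adds nothing to training
            continue
        seen.add((supplier, digest))
        unique_per_supplier[supplier] += 1
    short = {s: n for s, n in unique_per_supplier.items() if n < MIN_SAMPLES}
    return short, duplicates
```

Note that hashing only catches files that are exactly identical. Two separate scans of the same paper document produce different files and would not be flagged by this check.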