General FAQs

Overview

This FAQ is updated regularly. Please send feedback if you find errors, need more detail, or would like a new topic addressed.

Contact us via the ticketing portal or send an email.

What happens to a document during processing?

When a document is sent to a workflow, the processing steps occur in this order:

  1. Create PDF and thumbnails
  2. Auto-Separation (If an auto-separation workflow)
  3. OCR (Reading the text of the document)
  4. Determine Class and Design
  5. Extraction
  6. Strip and replace rules, applied in the order that the fields appear in the workflow; conditions are checked at the same step
  7. Auto Format (Amounts, dates, times, CAPITALIZATION)
  8. Date Calculations (Differences, add/subtract time)
  9. Field merge
  10. Data Matching
  11. Math Calculations
  12. Check for OCR override
  13. Individual field validations (correct format, required, etc., in field display order)
  14. Advanced validations (Math or database checks)
  15. Check for reassignment (If all validation rules pass)
  16. If all validations pass, the document is sent to Indexing, Quality Assurance, or Output, depending on the workflow configuration. Otherwise, it goes to the Rejected tab

Strip and replace rules run in field display order and, within each field, in the order in which they appear for that field. Each rule runs once. Data matching runs in a loop until a pass makes no further changes.
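The difference between "each rule runs once" and "runs in a loop until no changes were made" can be sketched as follows. This is only an illustration of the two behaviours, not Xtracta's implementation: the function names, the field names, and the sample rules are all hypothetical.

```python
from typing import Callable, Dict, List

Fields = Dict[str, str]
Rule = Callable[[Fields], Fields]


def apply_once(fields: Fields, rules: List[Rule]) -> Fields:
    """Apply each rule exactly once, in the order given."""
    for rule in rules:
        fields = rule(fields)
    return fields


def run_until_stable(fields: Fields, rules: List[Rule], max_passes: int = 50) -> Fields:
    """Repeat full passes over the rules until a pass changes nothing."""
    for _ in range(max_passes):
        updated = apply_once(fields, rules)
        if updated == fields:
            return updated
        fields = updated
    raise RuntimeError("rules did not settle within max_passes")


# Hypothetical strip-and-replace rule: remove the currency symbol.
def strip_dollar(fields: Fields) -> Fields:
    return {**fields, "total": fields["total"].replace("$", "")}


# Hypothetical data-matching rules; the second depends on the first.
def match_supplier_id(fields: Fields) -> Fields:
    if fields.get("supplier_name") == "Acme Ltd" and not fields.get("supplier_id"):
        return {**fields, "supplier_id": "S-001"}
    return fields


def match_gl_code(fields: Fields) -> Fields:
    if fields.get("supplier_id") == "S-001" and not fields.get("gl_code"):
        return {**fields, "gl_code": "6100"}
    return fields


doc = {"total": "$120.00", "supplier_name": "Acme Ltd"}
doc = apply_once(doc, [strip_dollar])
# match_gl_code runs before match_supplier_id, so a single pass is not enough.
doc = run_until_stable(doc, [match_gl_code, match_supplier_id])
```

In this sketch the first data-matching pass fills only supplier_id, the second fills gl_code, and the third makes no change, which ends the loop.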

What are the different OCR engines?

When a workflow is created, the administrator chooses one of five OCR methods.

OCR Methods

In general, the default for the workflow should be the OCR method that succeeds on the highest percentage of its documents.

  1. Nuance: Preferred method. Best for digital PDFs.
  2. Nuance – Rasterized: Best for scanned documents.
  3. Tesseract: Internal to Xtracta. Useful to correct font or special character issues.
  4. Microsoft Azure: Similar to Google Vision and Tesseract, likely with better results. Best for handwriting.
  5. Google Vision: Used when the other versions do not work adequately.

For documents using Microsoft Azure or Google Vision, the document is sent outside of Xtracta to Microsoft or Google. Some partners prefer to not have their data sent to a third party. Tesseract and Nuance OCR engines are internal to Xtracta.
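For partners who do not want documents to leave Xtracta, the choice of engine can be narrowed as in the sketch below. The OcrEngine type and the allowed_engines helper are hypothetical names used for illustration; only the engine names and the internal/external split come from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OcrEngine:
    name: str
    external: bool  # True when the document is sent to a third party


ENGINES = [
    OcrEngine("Nuance", external=False),
    OcrEngine("Nuance – Rasterized", external=False),
    OcrEngine("Tesseract", external=False),
    OcrEngine("Microsoft Azure", external=True),
    OcrEngine("Google Vision", external=True),
]


def allowed_engines(allow_third_party: bool) -> List[str]:
    """Return engine names that respect the partner's data-sharing preference."""
    return [e.name for e in ENGINES if allow_third_party or not e.external]
```

Calling allowed_engines(False) leaves only the Nuance variants and Tesseract.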

Tesseract, Azure, and Google Vision may OCR and extract handwriting; however, handwriting is not officially supported, so support is not available if the results are not as desired. Azure is preferred for handwriting.

If the OCR engine of the workflow does not achieve the desired results, there are generally three options:

Changing the OCR engine can affect future document extraction, as features and word_ids may change. Changing between Nuance – Rasterized and Nuance has only a marginal effect; changing between other engines may require retraining of the documents.

How long is data retained in Xtracta?

When a document is uploaded to Xtracta, several files (PDFs, JPGs, thumbnails) are created.

Retention Standards

  • Field values in documents are retained forever.
  • Regardless of input type, the PDF and JPG files created in Xtracta are deleted after 90 days unless the document is in the Learning tab.
  • Thumbnails are deleted after six months unless the document contains field learning history.
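As a rough illustration of these standards, the sketch below estimates when each file type would be deleted. It rests on two assumptions that the text does not confirm: that the retention period is counted from the file's creation date, and that "six months" can be approximated as 183 days. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

PDF_JPG_RETENTION = timedelta(days=90)
THUMBNAIL_RETENTION = timedelta(days=183)  # assumption: "six months" as 183 days


def pdf_jpg_deletion_date(created: date, in_learning_tab: bool) -> Optional[date]:
    """Estimated deletion date for PDF/JPG files; None means they are kept."""
    if in_learning_tab:
        return None
    return created + PDF_JPG_RETENTION


def thumbnail_deletion_date(created: date, has_field_learning_history: bool) -> Optional[date]:
    """Estimated deletion date for thumbnails; None means they are kept."""
    if has_field_learning_history:
        return None
    return created + THUMBNAIL_RETENTION
```

Field values are not modelled here because they are retained forever.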

Relevant Information

  • Customized retention can be created as needed; contact Xtracta support for this option.
  • Xtracta can comply with GDPR Article 17 (right to erasure) on request.
  • On-premises installations can have retention policies based on partner requirements.
  • For API users, document URLs are not fixed. For the first three days, they remain in place; over the next 90 days, they will move as storage changes.

What happens if I run out of documents?

When a document is uploaded, a document Id is created. If a document is split, new document Ids are created. Each document Id is counted against the billing plan. For groups on billing plans, once the document count reaches the group's allowance, additional documents are held in a pre-processing status.

Billing Plans

There are several types of billing plans in Xtracta:

  • Monthly
  • Annual
  • Free or Trial

Certain partners use free plans (usually ten documents per month) or trial plans with more documents for up to three months. Plans renew when the plan period ends.

Documents held in a Pre-Processing Status

Each billing plan has its own document allowance and cost. When the document count reaches the allowance, additional documents can still be put into the workflow, but they are held in a pre-processing status. They remain in this status until:

  • The plan renews
  • The plan is upgraded

In both cases, the documents will be released for processing.
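The allowance behaviour described above can be pictured with a short sketch. The route_documents function and its return keys are hypothetical, and the sketch assumes documents are counted in arrival order; it only illustrates that documents beyond the allowance are held rather than rejected.

```python
from typing import Dict, List


def route_documents(document_ids: List[str], used: int, allowance: int) -> Dict[str, List[str]]:
    """Split incoming document Ids into processed and held groups."""
    remaining = max(allowance - used, 0)
    return {
        "processing": document_ids[:remaining],
        "held_pre_processing": document_ids[remaining:],
    }


# Example: a plan with an allowance of 10, of which 9 are already used.
result = route_documents(["doc-a", "doc-b", "doc-c"], used=9, allowance=10)
```

When the plan renews or is upgraded, the held documents would be released, which corresponds to calling the function again with the new used and allowance values.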

Which document types does Xtracta support?

Xtracta accepts images and general documents in the following formats:

Images    Documents
Gif       Doc
Heic      Docx
Jpg       Ods
Jpe       Odt
Jpeg      Pdf
Pipg      Xls
Png       Xlsx
Tif
Tiff

Email Footers

If an image in an email footer exceeds 50 pixels in height or width, it may be treated as an attachment and processed.
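To check a signature image before sending emails into a workflow, the 50-pixel rule can be expressed as a small helper. The function name is hypothetical, and the sketch assumes "exceeds" means strictly greater than 50 pixels in either dimension.

```python
FOOTER_IMAGE_LIMIT_PX = 50  # threshold stated above


def may_be_processed_as_attachment(width_px: int, height_px: int) -> bool:
    """True when a footer image is large enough that it may be treated as an attachment."""
    return width_px > FOOTER_IMAGE_LIMIT_PX or height_px > FOOTER_IMAGE_LIMIT_PX
```

Under this reading, a 50 x 50 logo stays within the limit, while a 120 x 40 banner does not.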

How does document classification work?

For learning to occur, documents must be classified. There are five ways of determining class:

  1. Primary classifier: Using features on the document (email addresses, phone numbers, URLs, etc.).
  2. String classifier: A field is specified to use for classification. After five documents have been trained using the same string, they get the same class.
  3. Fixed class: Used for forms that always look the same (templates). Xtracta support will need to assist with this.
  4. Ignore class: For dissimilar documents (contracts, etc.), fields are trained using custom models. Xtracta would need to assist with this.
  5. Custom Classifier: If you know the document source, you can pass a value via API to assign class. Not commonly used, but very helpful for some cases.

Another option is “String fallback to primary”: the workflow uses the string classifier first, if available, and otherwise the primary classifier.

Ideally, several documents from a particular supplier are added to the workflow. If they get a class, training and learning can occur. If the documents do not get a class, the string classifier field needs to be used.

After classification, documents are assigned a design (in general, documents from the same class that look alike).
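The string classifier and the "String fallback to primary" option can be sketched as below. This is a simplified illustration, not Xtracta's classifier: the StringClassifier class, the class labels, and the sample values are hypothetical, and only the five-document threshold and the fallback order come from the text above.

```python
from collections import Counter
from typing import Optional

TRAINED_DOCS_REQUIRED = 5  # documents trained with the same string


class StringClassifier:
    def __init__(self) -> None:
        self._trained = Counter()

    def train(self, value: str) -> None:
        """Record one trained document that carried this string."""
        self._trained[value] += 1

    def classify(self, value: Optional[str]) -> Optional[str]:
        """Return a class once enough documents share the string, else None."""
        if value and self._trained[value] >= TRAINED_DOCS_REQUIRED:
            return f"string:{value}"
        return None


def classify_with_fallback(
    classifier: StringClassifier,
    field_value: Optional[str],
    primary_class: Optional[str],
) -> Optional[str]:
    """Use the string classifier first, if available, otherwise the primary class."""
    return classifier.classify(field_value) or primary_class


clf = StringClassifier()
for _ in range(4):
    clf.train("ACME-123")
before = classify_with_fallback(clf, "ACME-123", "primary:acme")
clf.train("ACME-123")  # fifth trained document with the same string
after = classify_with_fallback(clf, "ACME-123", "primary:acme")
```

Here before falls back to the primary class because only four documents have been trained, while after uses the string-based class.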

What are the maximum file sizes?

Xtracta works with different document sizes, depending on how the document is sent into the workflow.

Via API / Web Uploader

  • Maximum POST: 110 MB
  • Maximum file size: 100 MB

Via Email

  • Maximum file size: 127.98 MB

Via FTP

  • Maximum file size: 200 MB
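A client-side check against these limits might look like the sketch below. The channel keys and the upload_allowed function are hypothetical, and the sketch assumes 1 MB means 1,048,576 bytes, which the text does not specify.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes

MAX_FILE_BYTES = {
    "api": 100 * MB,
    "web_uploader": 100 * MB,
    "email": int(127.98 * MB),
    "ftp": 200 * MB,
}
MAX_POST_BYTES = 110 * MB  # applies to API / Web Uploader requests
POST_CHANNELS = {"api", "web_uploader"}


def upload_allowed(channel: str, file_bytes: int, post_bytes: Optional[int] = None) -> bool:
    """Check a file (and, where relevant, the whole POST body) against the limits."""
    if file_bytes > MAX_FILE_BYTES[channel]:
        return False
    if channel in POST_CHANNELS and post_bytes is not None and post_bytes > MAX_POST_BYTES:
        return False
    return True
```

For example, a 150 MB file passes the FTP limit but would be too large for the API, the Web Uploader, or email.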

What are the web browser requirements for Xtracta?

To ensure optimal performance and compatibility, we recommend using the latest versions of the following web browsers when accessing the Xtracta user interface:

  • Google Chrome
  • Mozilla Firefox
  • Apple Safari
  • Microsoft Edge

Keeping your browser up to date helps guarantee a smooth and secure experience.