Get Document(s)
This endpoint returns either a list of all documents (with their related data and images) that match your query parameters or, if you pass a document_id, the data for that single document. It is read-only: it retrieves data and cannot insert data into the Xtracta App.
POST Parameters
| Parameter | Required | Value | Description |
|---|---|---|---|
| api_key | Yes | {key} | An API key that has access to the resource you want to query. |
| document_id | Either this or workflow_id | {integer} | The ID of a single document to retrieve. |
| workflow_id | Either this or document_id | {integer} | The ID of the workflow whose documents you want to list. |
| any_user_documents | No | 1 or 0 | Set to 1 to return documents from all users, or 0 to return only the current user's documents. |
| document_status | No | pre-processing, indexing, qa, reject, output, output-in-progress, api-ui-in-progress, prelearning, learning | Limit the results to documents with this status within the workflow. |
| api_download_status | No | active, archived | Limit the results to documents with this download status. A typical pattern is to set each document to archived with the Update Document method once you have downloaded it, and then query with active: because previously downloaded documents are archived, only documents you have not yet downloaded through the API are returned. |
| page | No | {integer} | The page of results to return when the number of matching documents exceeds items_per_page (or the 1,000-document maximum). |
| items_per_page | No | {integer} | The maximum number of documents returned per page. The maximum, which is also the default, is 1,000. |
| documents_order | No | asc, desc | Sort order of the returned documents, by document_id. |
| detailed | No | 1, {null} | When set to "1", returns all data about each document, including field data. By default, a workflow_id query returns only summary information about each document, while a document_id query returns full information about that document. Detailed results can save calls (you don't need a separate request per document) but produce a larger XML response, which is wasteful if you only need summary information. |
| locations | No | 1, {null} | When set to "1", includes the on-page location of each field value (the field_value_location element, which gives the page number, position, and size of each word in the value). |
| deleted | No | 1, {null} | When set to "1", returns all deleted documents. |
| days_to_check | No | {integer} | How far back to check for documents, in days (e.g. 180). |

Important Requirements:
- Either document_id or workflow_id must be provided.
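The constraints above can be checked on the client before a request is sent. The following Python sketch builds the form-encoded POST body for a subset of the parameters; the function name `build_documents_query` and its keyword arguments are hypothetical, and it assumes the body is sent as `key=value&...` pairs, as in the sample request. Parameters it does not model explicitly (for example `documents_order` or `locations`) are passed through `extra` without validation.

```python
from urllib.parse import urlencode

# Allowed values, copied from the parameter table.
DOCUMENT_STATUSES = {
    "pre-processing", "indexing", "qa", "reject", "output",
    "output-in-progress", "api-ui-in-progress", "prelearning", "learning",
}
DOWNLOAD_STATUSES = {"active", "archived"}
MAX_ITEMS_PER_PAGE = 1000  # documented maximum (and default)


def build_documents_query(api_key, *, document_id=None, workflow_id=None,
                          document_status=None, api_download_status=None,
                          page=None, items_per_page=None, detailed=False,
                          **extra):
    """Return a form-encoded body for Get Document(s) (hypothetical helper)."""
    if document_id is None and workflow_id is None:
        raise ValueError("either document_id or workflow_id is required")
    if document_status is not None and document_status not in DOCUMENT_STATUSES:
        raise ValueError(f"unknown document_status: {document_status!r}")
    if api_download_status is not None and api_download_status not in DOWNLOAD_STATUSES:
        raise ValueError(f"unknown api_download_status: {api_download_status!r}")
    if items_per_page is not None and not 1 <= items_per_page <= MAX_ITEMS_PER_PAGE:
        raise ValueError("items_per_page must be between 1 and 1000")

    params = {"api_key": api_key}
    optional = {
        "document_id": document_id,
        "workflow_id": workflow_id,
        "document_status": document_status,
        "api_download_status": api_download_status,
        "page": page,
        "items_per_page": items_per_page,
    }
    # Omit parameters the caller did not set instead of sending empty values.
    params.update({k: v for k, v in optional.items() if v is not None})
    if detailed:
        params["detailed"] = 1  # flag is either "1" or omitted ({null})
    params.update(extra)  # unvalidated pass-through parameters
    return urlencode(params)


print(build_documents_query("123", document_id=456789, detailed=True))
# prints: api_key=123&document_id=456789&detailed=1
```

Keeping validation on the client only catches the documented constraints early; the API remains the authority on what it accepts.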
Sample Request
POST https://api-app.xtracta.com/v1/documents HTTP/1.1

api_key=123&document_id=456789&detailed=1
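In Python, the sample request above might be constructed with the standard-library `urllib.request` module. This sketch only builds the request object and does not send it; the helper name `make_documents_request` is hypothetical, and the `application/x-www-form-urlencoded` content type is an assumption based on the `key=value&...` body format of the sample.

```python
from urllib.request import Request

# Endpoint taken from the sample request.
DOCUMENTS_URL = "https://api-app.xtracta.com/v1/documents"


def make_documents_request(body: str) -> Request:
    """Build (but do not send) a POST request for Get Document(s).

    Assumption: the API accepts a form-encoded body, matching the sample.
    """
    return Request(
        DOCUMENTS_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = make_documents_request("api_key=123&document_id=456789&detailed=1")
print(req.get_method(), req.full_url)
# prints: POST https://api-app.xtracta.com/v1/documents

# To actually send it (requires network access and a valid api_key):
#     from urllib.request import urlopen
#     with urlopen(req) as resp:
#         xml_text = resp.read().decode("utf-8")
```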
Sample Response
<?xml version="1.0" encoding="UTF-8"?>
<documents_response>
  <status>200</status>
  <message>The request has been successfully processed</message>
  <workflow_id>595</workflow_id>
  <documents_matching_query>3</documents_matching_query>
  <documents_number>
    <total>3</total>
    <usage-allowance-depleted>0</usage-allowance-depleted>
    <pre-processing>0</pre-processing>
    <indexing>3</indexing>
    <qa>0</qa>
    <reject>0</reject>
    <output-in-progress>0</output-in-progress>
    <completed>0</completed>
    <prelearning>0</prelearning>
    <learning>0</learning>
  </documents_number>
  <items_per_page>20</items_per_page>
  <page>1</page>
  <document revision="2">
    <document_id>228004</document_id>
    <document_status>indexing</document_status>
    <number_of_pages>1</number_of_pages>
    <api_download_status>active</api_download_status>
    <free_form/>
    <classification/>
    <classification_class>awaiting classification</classification_class>
    <classification_design>undetected</classification_design>
    <siblings_documents/>
    <parent_document/>
  </document>
  <document revision="2">
    <document_id>228003</document_id>
    <document_status>indexing</document_status>
    <number_of_pages>1</number_of_pages>
    <api_download_status>active</api_download_status>
    <free_form/>
    <classification/>
    <classification_class>awaiting classification</classification_class>
    <classification_design>undetected</classification_design>
    <siblings_documents/>
    <parent_document/>
  </document>
  <document revision="2">
    <document_id>228002</document_id>
    <document_status>indexing</document_status>
    <number_of_pages>1</number_of_pages>
    <api_download_status>active</api_download_status>
    <free_form/>
    <classification/>
    <classification_class>awaiting classification</classification_class>
    <classification_design>undetected</classification_design>
    <siblings_documents/>
    <parent_document/>
  </document>
</documents_response>
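A summary response like the one above can be read with Python's standard `xml.etree.ElementTree`. The sketch below is hypothetical in two respects: it treats any `status` other than 200 as an error, and it assumes `documents_matching_query` counts all matches across pages, so the number of pages is that count divided by `items_per_page`, rounded up.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample response (documents_number and most
# per-document elements omitted for brevity).
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<documents_response>
  <status>200</status>
  <message>The request has been successfully processed</message>
  <workflow_id>595</workflow_id>
  <documents_matching_query>3</documents_matching_query>
  <items_per_page>20</items_per_page>
  <page>1</page>
  <document revision="2"><document_id>228004</document_id><document_status>indexing</document_status><api_download_status>active</api_download_status></document>
  <document revision="2"><document_id>228003</document_id><document_status>indexing</document_status><api_download_status>active</api_download_status></document>
  <document revision="2"><document_id>228002</document_id><document_status>indexing</document_status><api_download_status>active</api_download_status></document>
</documents_response>"""


def summarize_documents(xml_text):
    """Return paging info and (id, status, download status) per document."""
    root = ET.fromstring(xml_text)
    status = int(root.findtext("status"))
    if status != 200:  # assumption: anything other than 200 signals a failure
        raise RuntimeError(f"API error {status}: {root.findtext('message')}")

    matching = int(root.findtext("documents_matching_query"))
    per_page = int(root.findtext("items_per_page"))
    documents = [
        (int(doc.findtext("document_id")),
         doc.findtext("document_status"),
         doc.findtext("api_download_status"))
        for doc in root.findall("document")
    ]
    return {
        "workflow_id": int(root.findtext("workflow_id")),
        "matching": matching,
        "page": int(root.findtext("page")),
        "total_pages": -(-matching // per_page),  # ceiling division
        "documents": documents,
    }


summary = summarize_documents(SAMPLE_RESPONSE)
print(summary["total_pages"], [doc_id for doc_id, _, _ in summary["documents"]])
# prints: 1 [228004, 228003, 228002]
```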
Schema Definition
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:simpleType name="DocumentStatusType">
<xs:restriction base="xs:string">
<xs:enumeration value="pre-processing"/>
<xs:enumeration value="indexing"/>
<xs:enumeration value="qa"/>
<xs:enumeration value="reject"/>
<xs:enumeration value="output"/>
<xs:enumeration value="output-in-progress"/>
<xs:enumeration value="api-ui-in-progress"/>
<xs:enumeration value="prelearning"/>
<xs:enumeration value="learning"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="ApiDownloadStatusType">
<xs:restriction base="xs:string">
<xs:enumeration value="active"/>
<xs:enumeration value="archived"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="ClassificationType">
<xs:restriction base="xs:string">
<xs:enumeration value="full"/>
<xs:enumeration value="basic"/>
<xs:enumeration value=""/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="ConfidenceType">
<xs:union>
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="100"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value=""/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>
<xs:complexType name="DocumentsNumberType">
<xs:sequence>
<xs:element name="total" type="xs:nonNegativeInteger"/>
<xs:element name="usage-allowance-depleted" type="xs:nonNegativeInteger"/>
<xs:element name="pre-processing" type="xs:nonNegativeInteger"/>
<xs:element name="indexing" type="xs:nonNegativeInteger"/>
<xs:element name="qa" type="xs:nonNegativeInteger"/>
<xs:element name="reject" type="xs:nonNegativeInteger"/>
<xs:element name="output-in-progress" type="xs:nonNegativeInteger"/>
<xs:element name="completed" type="xs:nonNegativeInteger"/>
<xs:element name="prelearning" type="xs:nonNegativeInteger"/>
<xs:element name="learning" type="xs:nonNegativeInteger"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="WordType">
<xs:sequence>
<xs:element name="page_number" type="xs:positiveInteger"/>
<xs:element name="left" type="xs:decimal"/>
<xs:element name="top" type="xs:decimal"/>
<xs:element name="width" type="xs:decimal"/>
<xs:element name="height" type="xs:decimal"/>
<xs:element name="value" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FieldValueLocationType">
<xs:sequence>
<xs:element name="word" type="WordType" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FieldType">
<xs:sequence>
<xs:element name="field_id" type="xs:positiveInteger"/>
<xs:element name="field_name" type="xs:string"/>
<xs:element name="field_value" type="xs:string" nillable="true"/>
<xs:element name="field_extraction_confidence" type="ConfidenceType" nillable="true"/>
<xs:element name="field_value_location" type="FieldValueLocationType" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="RowType">
<xs:sequence>
<xs:element name="field" type="FieldType" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FieldSetType">
<xs:sequence>
<xs:element name="field_set_id" type="xs:positiveInteger"/>
<xs:element name="field_set_name" type="xs:string"/>
<xs:element name="row" type="RowType" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="FieldDataType">
<xs:sequence>
<xs:element name="field" type="FieldType" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="field_set" type="FieldSetType" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ValidationRuleType">
<xs:sequence>
<xs:element name="type" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="LinkedFieldType">
<xs:sequence>
<xs:element name="field_id" type="xs:positiveInteger"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ReasonType">
<xs:sequence>
<xs:element name="message" type="xs:string"/>
<xs:element name="validation_rule" type="ValidationRuleType"/>
<xs:element name="linked_field" type="LinkedFieldType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="RejectionType">
<xs:sequence>
<xs:element name="reason" type="ReasonType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="DocumentType">
<xs:sequence>
<xs:element name="document_id" type="xs:positiveInteger"/>
<xs:element name="document_status" type="DocumentStatusType"/>
<xs:element name="number_of_pages" type="xs:positiveInteger"/>
<xs:element name="api_download_status" type="ApiDownloadStatusType"/>
<xs:element name="free_form" type="xs:string" nillable="true"/>
<xs:element name="classification" type="ClassificationType" nillable="true"/>
<xs:element name="classification_class" type="xs:string"/>
<xs:element name="classification_design" type="xs:string"/>
<xs:element name="document_url" type="xs:anyURI" minOccurs="0"/>
<xs:element name="image_url" type="xs:anyURI" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="image_skew" type="xs:decimal" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="delete" type="xs:nonNegativeInteger" minOccurs="0"/>
<xs:element name="ocr_data_url" type="xs:anyURI" minOccurs="0"/>
<xs:element name="ocr_text_url" type="xs:anyURI" minOccurs="0"/>
<xs:element name="field_data" type="FieldDataType" minOccurs="0"/>
<xs:element name="rejection" type="RejectionType" minOccurs="0"/>
<xs:element name="siblings_documents" type="xs:string" nillable="true"/>
<xs:element name="parent_document" type="xs:string" nillable="true"/>
</xs:sequence>
<xs:attribute name="revision" type="xs:positiveInteger" use="required"/>
</xs:complexType>
<xs:element name="documents_response">
<xs:complexType>
<xs:sequence>
<xs:element name="status" type="xs:positiveInteger"/>
<xs:element name="message" type="xs:string"/>
<xs:choice>
<!-- Option 1: Standard format with workflow_id before documents -->
<xs:sequence>
<xs:element name="workflow_id" type="xs:positiveInteger"/>
<xs:element name="documents_matching_query" type="xs:nonNegativeInteger"/>
<xs:element name="documents_number" type="DocumentsNumberType"/>
<xs:element name="items_per_page" type="xs:positiveInteger"/>
<xs:element name="page" type="xs:positiveInteger"/>
<xs:element name="document" type="DocumentType" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<!-- Option 2: Alternative format with document first, then workflow_id -->
<xs:sequence>
<xs:element name="document" type="DocumentType"/>
<xs:element name="workflow_id" type="xs:positiveInteger"/>
</xs:sequence>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
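The schema nests field values in two ways: top-level `field` elements and `field_set` elements made of `row`s. It also allows `field_extraction_confidence` to be either an integer from 0 to 100 or empty (`ConfidenceType`). The Python sketch below walks a detailed `document` element accordingly. The helper names and the example document, including its field names and values, are hypothetical and not actual API output. Because the parser looks elements up with `find` rather than relying on their position, it works with both layouts permitted by the `xs:choice` in `documents_response`.

```python
import xml.etree.ElementTree as ET

# Hypothetical detailed single-document response, using the second layout
# allowed by the schema (document first, then workflow_id).
EXAMPLE_RESPONSE = """<documents_response>
  <status>200</status>
  <message>The request has been successfully processed</message>
  <document revision="1">
    <document_id>1001</document_id>
    <document_status>indexing</document_status>
    <number_of_pages>1</number_of_pages>
    <api_download_status>active</api_download_status>
    <free_form/>
    <classification/>
    <classification_class>awaiting classification</classification_class>
    <classification_design>undetected</classification_design>
    <field_data>
      <field><field_id>1</field_id><field_name>invoice_number</field_name>
        <field_value>INV-001</field_value>
        <field_extraction_confidence>97</field_extraction_confidence></field>
      <field><field_id>2</field_id><field_name>due_date</field_name>
        <field_value/><field_extraction_confidence/></field>
      <field_set><field_set_id>10</field_set_id><field_set_name>line_items</field_set_name>
        <row><field><field_id>11</field_id><field_name>description</field_name>
          <field_value>Widget</field_value>
          <field_extraction_confidence>88</field_extraction_confidence></field></row>
      </field_set>
    </field_data>
    <siblings_documents/>
    <parent_document/>
  </document>
  <workflow_id>595</workflow_id>
</documents_response>"""


def parse_confidence(text):
    """ConfidenceType: an integer 0-100 or an empty string (mapped to None)."""
    if text is None or text.strip() == "":
        return None
    value = int(text)
    if not 0 <= value <= 100:
        raise ValueError(f"confidence out of range: {value}")
    return value


def parse_field(el):
    """Convert one FieldType element to a dict; empty field_value becomes None."""
    return {
        "id": int(el.findtext("field_id")),
        "name": el.findtext("field_name"),
        "value": el.findtext("field_value") or None,
        "confidence": parse_confidence(el.findtext("field_extraction_confidence")),
    }


def parse_field_data(document):
    """Split a <document>'s field_data into top-level fields and field-set rows."""
    field_data = document.find("field_data")
    if field_data is None:  # summary responses omit field_data
        return {}, {}
    # Keyed by field_name; if two fields share a name, the later one wins.
    fields = {f["name"]: f for f in map(parse_field, field_data.findall("field"))}
    field_sets = {}
    for fs in field_data.findall("field_set"):
        rows = [[parse_field(f) for f in row.findall("field")]
                for row in fs.findall("row")]
        field_sets[fs.findtext("field_set_name")] = rows
    return fields, field_sets


root = ET.fromstring(EXAMPLE_RESPONSE)
fields, field_sets = parse_field_data(root.find("document"))
print(root.findtext("workflow_id"), fields["invoice_number"]["confidence"],
      fields["due_date"]["confidence"], field_sets["line_items"][0][0]["value"])
# prints: 595 97 None Widget
```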