Push API Configuration

The Push API Configuration allows you to set up real-time data delivery from Xtracta to your own systems. Instead of polling for updates, Xtracta will automatically push document data and status changes to your specified endpoint whenever events occur. This enables seamless integration with your business workflows and ensures you receive document processing results immediately.

Key Benefits

  • Real-time Updates - Receive instant notifications when documents change status
  • Reduced API Calls - Eliminate the need for constant polling
  • Flexible Integration - Support for multiple endpoint types (Web Service, AWS SQS, AWS S3)
  • Selective Events - Configure exactly which events and workflows trigger notifications
  • Secure Delivery - Built-in support for authentication and SSL certificates

How It Works

  1. Configure Endpoint - Set up your receiving endpoint (Web Service, AWS SQS, or AWS S3)
  2. Select Events - Choose which document events and workflows to monitor
  3. Receive Data - Xtracta pushes data to your endpoint when configured events occur
  4. Process Results - Your system processes the received data according to your business logic

Service Configuration

Receive Events

The Receive Events setting determines which types of processing events trigger push notifications. Choose the option that best matches your integration needs:

  • Input & Document (Default) - Receive notifications for both input events and document-level events. This is the most comprehensive option
  • Input - Receive notifications only when new documents are uploaded or submitted to Xtracta
  • Document - Receive notifications only for document-level events like status changes, extractions, and validations
  • Batch - Receive notifications for batch-level operations when multiple documents are processed together

Endpoint Type

Select the destination where Xtracta will send your document data. Each endpoint type offers different advantages:

  • Web Service - Direct HTTP/HTTPS POST requests to your REST API or webhook endpoint. Ideal for real-time processing and custom integrations
  • AWS SQS - Push messages to Amazon Simple Queue Service for reliable, scalable message queuing. Perfect for decoupled architectures and asynchronous processing
  • AWS S3 - Store documents and data files directly in Amazon S3 buckets. Best for archival, data lakes, or when you need persistent file storage

The configuration form dynamically displays different fields based on your selected endpoint type. Each type has specific requirements and authentication methods.

Web Service Configuration

Web Service configuration allows you to receive push notifications via HTTP/HTTPS POST requests. This is the most common integration method for custom applications and third-party services.

These fields only appear when "Web Service" is selected as the Endpoint Type.

URL

Enter the complete endpoint URL where your server will receive POST requests from Xtracta. The URL must be publicly accessible and should handle JSON payloads.

Format: https://api.example.com/webhook/xtracta

Requirements:

  • Must be a valid HTTP or HTTPS URL
  • Should respond with 200 OK status for successful receipt
  • Must be able to handle POST requests with JSON content
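
The requirements above can be sketched as a minimal receiver built on Python's standard library. This is only an illustration, not Xtracta's reference implementation: the function names, the port, and the choice to answer 400 for a body that is not valid JSON are assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_push(raw_body: bytes) -> int:
    """Return the HTTP status code to send back for one push request."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return 400  # assumption: reject bodies that are not valid JSON
    # Hand `payload` to your own business logic here.
    return 200  # 200 OK signals successful receipt


class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        status = handle_push(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Blocks forever; call this from your own entry point."""
    HTTPServer(("", port), PushHandler).serve_forever()
```

Calling serve() starts the listener. In production you would normally place it behind an HTTPS-terminating proxy or use a full web framework.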

Headers

Configure custom HTTP headers that will be included with every push request. This is typically used for authentication and content type specification.

Common Use Cases:

  • API key authentication
  • Basic authentication
  • Custom authorization tokens
  • Content type declarations

Example Configuration:

Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
Content-Type: application/json
X-API-Key: your-api-key-here
X-Custom-Header: custom-value

Use the "Credentials?" link to automatically generate properly formatted Basic Authentication headers from your username and password.
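
If you prefer to build the Basic Authentication header yourself, the value is the word Basic followed by the Base64 encoding of username:password. The sketch below shows the calculation; the function name is chosen for illustration only.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    # Join the credentials with a colon, then Base64-encode the bytes.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_auth_header("username", "password"))
# Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

The printed value matches the Authorization line in the example configuration above, which encodes the placeholder credentials username and password.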

SSL Configuration

For enhanced security, you can configure SSL client certificate authentication. This provides an additional layer of security beyond standard HTTPS by requiring Xtracta to present a client certificate when connecting to your endpoint.

Configuration Options:

  • SSL Toggle - Enable or disable SSL client certificate authentication. When enabled, additional certificate fields will appear
  • Cert File (.pem) - Your SSL client certificate in PEM format. You can either upload a .pem file or paste the certificate content directly
  • Cert Password - The password for your certificate file (only required if the certificate is password-protected)
  • Key File (.pem) - Your SSL private key in PEM format. Upload or paste the private key that corresponds to your certificate
  • Key Password - The password for your private key file (only required if the key is password-protected)
Warning

Keep your private keys secure. Never share them or commit them to version control systems.
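
On the receiving side, your endpoint has to request and verify the client certificate that Xtracta presents. The following is a rough sketch using Python's ssl module, assuming you terminate TLS in your own Python process; the function and parameter names are illustrative, and many deployments configure this in a reverse proxy instead.

```python
import ssl
from typing import Optional


def build_server_context(
    server_cert: Optional[str] = None,
    server_key: Optional[str] = None,
    trusted_client_certs: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a server-side TLS context that insists on a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # refuse clients that send no certificate
    if server_cert:
        # Your endpoint's own certificate and private key (PEM files).
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if trusted_client_certs:
        # PEM file holding the client certificate (or its CA) that you trust.
        ctx.load_verify_locations(cafile=trusted_client_certs)
    return ctx
```

The returned context can wrap a listening socket with ctx.wrap_socket(sock, server_side=True).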

Test Connection

Before saving your configuration, use the "Test Connection" button to verify that Xtracta can successfully connect to your endpoint. This test will:

  1. Validate the URL format
  2. Check network connectivity
  3. Verify SSL certificates (if configured)
  4. Send a test POST request with sample data
  5. Display the response status

  • Successful Test: You'll see a success message indicating the connection was established
  • Failed Test: Error details will help you troubleshoot connection issues

AWS SQS Configuration

Amazon SQS (Simple Queue Service) provides a reliable, scalable message queuing service. This option is ideal for decoupled architectures where you want to process documents asynchronously or handle high volumes with automatic scaling.

These fields only appear when "AWS SQS" is selected as the Endpoint Type.

Required Fields

  • Bucket - The full URL of your SQS queue where messages will be sent

    • Format: https://sqs.{region}.amazonaws.com/{account-id}/{queue-name}
    • Example: https://sqs.us-east-1.amazonaws.com/123456789012/xtracta-documents
  • Region - The AWS region where your SQS queue is hosted

    • Examples: us-east-1, eu-west-1, ap-southeast-2
    • Must match the region in your queue URL
  • Access Key - Your AWS IAM access key ID for authentication

    • Format: 20 characters (e.g., AKIAIOSFODNN7EXAMPLE)
    • Associated IAM user must have SQS permissions
  • Secret Key - Your AWS IAM secret access key

    • Format: 40 characters (keep this secure)
    • Never share or expose this key

Optional Fields

  • Message Group Id - Required for FIFO (First-In-First-Out) queues only
    • Specifies which message group the messages belong to
    • Messages in the same group are processed in order
    • Required if your queue name ends with .fifo
    • Example: xtracta-documents-group
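
Two of the rules above are easy to get wrong: the Region must match the queue URL, and a queue whose name ends with .fifo needs a Message Group Id. A small pre-flight check, sketched here with a hypothetical helper name, can catch both before you save the form.

```python
from typing import List, Optional
from urllib.parse import urlparse


def check_sqs_settings(
    queue_url: str, region: str, message_group_id: Optional[str] = None
) -> List[str]:
    """Return a list of problems; an empty list means the values agree."""
    parsed = urlparse(queue_url)
    host = (parsed.hostname or "").split(".")
    path = [part for part in parsed.path.split("/") if part]
    # Documented shape: https://sqs.{region}.amazonaws.com/{account-id}/{queue-name}
    if parsed.scheme != "https" or len(host) < 4 or host[0] != "sqs" or len(path) != 2:
        return ["queue URL does not match the documented format"]

    problems = []
    if host[1] != region:
        problems.append(f"Region is {region!r} but the queue URL says {host[1]!r}")
    if path[1].endswith(".fifo") and not message_group_id:
        problems.append("FIFO queue needs a Message Group Id")
    return problems
```

For example, check_sqs_settings("https://sqs.us-east-1.amazonaws.com/123456789012/xtracta-documents", "us-east-1") returns an empty list.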

Required IAM Permissions

Your AWS IAM user must have the following permissions for the SQS queue:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:SendMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:region:account-id:queue-name"
    }
  ]
}
Warning

Without proper IAM permissions, push notifications will fail. Test your configuration after setup to ensure proper access.
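
The Resource placeholder in the policy above is built from the same three parts as the queue URL. As an illustration, the sketch below derives the ARN from a queue URL and fills in the policy; the helper name is hypothetical and the URL is assumed to follow the documented format.

```python
import json
from urllib.parse import urlparse


def sqs_policy(queue_url: str) -> dict:
    """Build the IAM policy shown above for one specific queue."""
    parsed = urlparse(queue_url)
    region = parsed.hostname.split(".")[1]  # sqs.{region}.amazonaws.com
    account_id, queue_name = parsed.path.strip("/").split("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["sqs:SendMessage", "sqs:GetQueueAttributes"],
                "Resource": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
            }
        ],
    }


policy = sqs_policy("https://sqs.us-east-1.amazonaws.com/123456789012/xtracta-documents")
print(json.dumps(policy, indent=2))
```

For the example queue, the Resource becomes arn:aws:sqs:us-east-1:123456789012:xtracta-documents.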

AWS S3 Configuration

Amazon S3 (Simple Storage Service) integration allows Xtracta to store processed documents and extracted data directly in your S3 buckets. This is perfect for long-term storage, data archival, or integration with data lake architectures.

These fields only appear when "AWS S3" is selected as the Endpoint Type.

Required Fields

  • Bucket - The name of your S3 bucket where files will be stored

    • Format: Bucket name only (not the full URL)
    • Example: my-xtracta-documents
    • Must be an existing bucket in your AWS account
  • Region - The AWS region where your S3 bucket is located

    • Examples: us-west-2, eu-central-1, ap-northeast-1
    • Must match the actual bucket region
    • Find your bucket region in the AWS S3 console
  • Access Key - Your AWS IAM access key ID

    • Format: 20 characters (e.g., AKIAIOSFODNN7EXAMPLE)
    • User must have S3 write permissions
  • Secret Key - Your AWS IAM secret access key

    • Format: 40 characters
    • Keep this secure and never expose it

Receive Files

Choose which file types Xtracta should store in your S3 bucket:

  • PDF File - The original PDF document as processed by Xtracta

    • Useful for archival and compliance
    • Maintains original document format
    • Includes any annotations or stamps applied
  • Data File - The extracted data in structured format

    • Available in JSON or XML format
    • Contains all extracted fields and values
    • Includes metadata and confidence scores
    • Ready for integration with data processing pipelines

S3 File Organization

Files will be organized in your bucket with a logical structure:

bucket-name/
├── {year}/{month}/{day}/
│   ├── documents/
│   │   └── {document-id}.pdf
│   └── data/
│       └── {document-id}.json
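
A downstream job that reads these files needs to compute the object keys. The sketch below follows the layout shown above, with several assumptions that you should verify against your own bucket: month and day are zero-padded, the date is the one under which Xtracta stored the file, and the data file uses the .json extension (it may be .xml if you selected XML).

```python
from datetime import date
from typing import Dict


def expected_keys(document_id: str, day: date) -> Dict[str, str]:
    """Return the S3 object keys for one document under the layout above."""
    # Assumption: {month} and {day} are zero-padded to two digits.
    prefix = f"{day.year:04d}/{day.month:02d}/{day.day:02d}"
    return {
        "pdf": f"{prefix}/documents/{document_id}.pdf",
        "data": f"{prefix}/data/{document_id}.json",
    }


print(expected_keys("12345", date(2024, 3, 7)))
# {'pdf': '2024/03/07/documents/12345.pdf', 'data': '2024/03/07/data/12345.json'}
```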

Required IAM Permissions

Configure your IAM user with these minimum permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
Warning

Ensure your S3 bucket has appropriate access policies and encryption settings for your security requirements.

Workflows and Statuses

This section allows you to precisely control which document processing events trigger push notifications. You can configure notifications at both the workflow level and status level, ensuring you only receive relevant updates.

Warning

At least one workflow must be selected for the Push API to function. The system will display an error if you attempt to save without selecting any workflows.

Understanding Workflows

Workflows in Xtracta represent different document processing pipelines. Each workflow is designed for specific document types (invoices, receipts, statements, etc.) and contains its own extraction rules and validation logic.

All Workflows Option

Selecting the "All Workflows" checkbox gives you the broadest coverage:

  • Current Group Workflows - All workflows configured in your current group
  • Sub-group Workflows - Workflows from any child groups in your hierarchy
  • Future Workflows - Any new workflows added later will automatically be included
  • Simplified Management - No need to update configuration when adding new workflows

This option is ideal when you want complete visibility across all document processing in your organization.

Specific Workflows Configuration

For more granular control, click "- Specify Workflows and Statuses" to reveal individual workflow options. This allows you to:

  1. Select Individual Workflows - Choose only the workflows relevant to your integration
  2. Configure Per-Workflow Settings - Different status and action filters for each workflow
  3. Optimize API Traffic - Reduce unnecessary notifications by filtering precisely
  4. Workflow-Specific Logic - Apply different business rules based on workflow type

Document Statuses

Configure which document processing statuses should trigger notifications:

  • All Statuses - Receive notifications for every status change throughout the document lifecycle

  • Indexing - Document is actively being processed

    • OCR and data extraction in progress
    • Initial validation being performed
    • Useful for tracking processing start times
  • Rejected - Document failed processing or validation

    • Quality checks failed
    • Business rules not met
    • Manual review required
    • Important for exception handling
  • Output - Document successfully completed processing

    • All data extracted and validated
    • Ready for downstream systems
    • The most common trigger for integrations

Document Actions

Monitor specific user or system actions on documents:

  • Deleted - Document moved to recycle bin

    • Soft delete operation
    • Document can still be restored
    • Useful for audit trails
  • Restored - Document recovered from recycle bin

    • Previously deleted document made active again
    • Reprocessing may be required
    • Important for data consistency
  • Hard Deleted - Document permanently removed

    • Irreversible deletion
    • All associated data purged
    • Critical for compliance tracking
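
On the receiving side, the statuses and actions above usually map to different handlers. The sketch below shows one way to route them. The event_name field and the handler functions are hypothetical, since the payload format is not described in this section; only the labels come from the lists above.

```python
from typing import Callable, Dict


def export_document(event: dict) -> str:
    return "export"  # e.g. forward extracted data downstream


def open_exception(event: dict) -> str:
    return "exception"  # e.g. create a manual-review task


def purge_local_copy(event: dict) -> str:
    return "purge"  # e.g. remove your own copy for compliance


def ignore(event: dict) -> str:
    return "ignored"


# Keys are the labels from the lists above, lower-cased.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "output": export_document,
    "rejected": open_exception,
    "hard deleted": purge_local_copy,
}


def route(event: dict) -> str:
    # "event_name" is an assumed field name, not a documented one.
    name = str(event.get("event_name", "")).strip().lower()
    return HANDLERS.get(name, ignore)(event)
```

Events without a registered handler, such as Indexing in this sketch, fall through to ignore rather than raising an error.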

Batch Workflows Configuration

Batch workflows handle the processing of multiple documents as a single unit. This configuration section allows you to receive notifications about batch-level events, which is essential for monitoring bulk document processing operations.

When to Use Batch Workflows

Batch workflow notifications are ideal when:

  • Processing large volumes of documents together
  • Documents need to be grouped for business logic
  • Monitoring overall batch completion rather than individual documents
  • Implementing batch-level quality control

Batch Event Configuration

For each batch workflow, you can configure three key filtering dimensions:

1. Event Type

The specific batch processing milestone to monitor:

  • All Events - Receive notifications for any batch status change

  • Ready - Batch has been created and is awaiting processing

    • All documents uploaded
    • Validation checks passed
    • Ready for extraction
  • Processing - Batch extraction is actively running

    • Documents being analyzed
    • Data extraction in progress
    • Resource intensive phase
  • Locked - Batch is locked for exclusive access

    • Prevents concurrent modifications
    • Usually during manual review
    • System or user initiated
  • Unlocked - Batch lock has been released

    • Available for processing again
    • Edits can be made
    • Review completed
  • On Hold - Batch processing temporarily suspended

    • Awaiting additional information
    • Business rule intervention
    • Manual decision required
  • Deleted - Batch has been marked for deletion

    • Soft delete state
    • Can be restored if needed
    • Cleanup operations
  • Done - Batch processing fully completed

    • All documents processed
    • Results available
    • Ready for export

2. Category Filter

If your batch workflows use categories, filter notifications by specific category types. This helps segment different business processes or document sources.

3. Event Origin

Filter notifications by where the batch event originated:

  • All Origins - Events from any source

  • System - Automated triggers

    • Scheduled processing
    • Rule-based actions
    • System maintenance
  • User - Manual interventions

    • User-initiated processing
    • Manual status changes
    • Review actions
  • API - Programmatic triggers

    • External system requests
    • Integration-driven events
    • Automated workflows

Multiple Event Rules

You can create multiple event configurations for each batch workflow. This allows you to:

  • Set different rules for different event combinations
  • Create complex notification logic
  • Handle various scenarios with precision

Example Configurations:

  1. Notify when batch is "Done" from "System" origin (automated completion)
  2. Notify when batch is "On Hold" from "User" origin (manual intervention needed)
  3. Notify for "All Events" from "API" origin (track all API-driven changes)
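
The three example configurations above can be modelled as rules over event type, category, and origin, with "All" acting as a wildcard. This sketch only illustrates the filtering logic. It assumes a notification fires when any one rule matches, which this section does not state explicitly, and the class and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable

ANY = "All"  # stands for All Events / all categories / All Origins


@dataclass(frozen=True)
class BatchRule:
    event: str = ANY
    category: str = ANY
    origin: str = ANY

    def matches(self, event: str, category: str, origin: str) -> bool:
        # Every dimension must be either the wildcard or an exact match.
        pairs = ((self.event, event), (self.category, category), (self.origin, origin))
        return all(wanted in (ANY, actual) for wanted, actual in pairs)


def should_notify(rules: Iterable[BatchRule], event: str, category: str, origin: str) -> bool:
    # Assumption: rules are combined with OR.
    return any(rule.matches(event, category, origin) for rule in rules)


RULES = [
    BatchRule(event="Done", origin="System"),   # automated completion
    BatchRule(event="On Hold", origin="User"),  # manual intervention needed
    BatchRule(origin="API"),                    # all API-driven changes
]
```

With these rules, a "Done" event from "System" triggers a notification, while a "Done" event from "User" does not.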

Best Practices

Follow these recommendations to ensure reliable and secure Push API integration:

Security Best Practices

  1. Use HTTPS Exclusively

    • Always use HTTPS endpoints for Web Services
    • HTTP should only be used in development environments
    • Protects data in transit from interception
  2. Secure Credential Management

    • Rotate AWS access keys regularly (every 90 days recommended)
    • Use strong, unique passwords for Basic Authentication
    • Never commit credentials to version control
    • Consider using AWS IAM roles where possible
  3. Implement Authentication

    • Always use authentication headers for Web Service endpoints
    • Validate incoming requests with API keys or tokens
    • Consider implementing request signing for additional security
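
To validate incoming requests, the receiving endpoint can compare the configured header against its own copy of the secret. A minimal sketch follows, assuming the X-API-Key header from the Headers example; the function name is illustrative.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Check the X-API-Key header using a constant-time comparison."""
    supplied = headers.get("X-API-Key", "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

A plain dict lookup is case-sensitive. The headers object provided by http.server handles header names case-insensitively, so the same function works there without changes.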

Performance Optimization

  1. Selective Event Filtering

    • Only subscribe to necessary workflows and statuses
    • Use specific workflows instead of "All Workflows" when possible
    • Reduces unnecessary network traffic and processing
  2. Endpoint Reliability

    • Ensure your endpoint can handle concurrent requests
    • Implement proper request queuing
    • Design for horizontal scaling during peak loads
  3. Error Handling Strategy

    • Implement idempotent processing (handle duplicate messages gracefully)
    • Return appropriate HTTP status codes
    • Log failed requests for debugging
    • Implement exponential backoff for retries
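
Idempotent processing means that receiving the same notification twice has the same effect as receiving it once. One common approach is to remember a key for every event already handled. The sketch below keeps the keys in memory and uses hypothetical document_id and status fields; a real integration would use whatever identifiers the payload provides and a durable store such as a database table.

```python
from typing import Callable, Set, Tuple


class IdempotentProcessor:
    """Runs `process` at most once per (document_id, status) pair."""

    def __init__(self, process: Callable[[dict], None]) -> None:
        self._process = process
        self._seen: Set[Tuple[str, str]] = set()

    def handle(self, event: dict) -> bool:
        """Process the event once; return False if it was a duplicate."""
        # Hypothetical field names, chosen for illustration only.
        key = (str(event["document_id"]), str(event["status"]))
        if key in self._seen:
            return False
        self._process(event)
        self._seen.add(key)  # record only after processing succeeded
        return True
```

Because the key is recorded only after processing succeeds, an event that raised an error is processed again if it is redelivered.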

Monitoring and Maintenance

  1. Connection Testing

    • Always test connections before going live
    • Re-test after any endpoint changes
    • Verify after credential rotations
  2. Monitoring Setup

    • Track successful vs failed push notifications
    • Monitor endpoint response times
    • Set up alerts for connection failures
    • Review logs regularly for patterns
  3. Documentation

    • Document your endpoint specifications
    • Maintain a changelog of configuration updates
    • Keep team informed of integration changes

Integration Patterns

  1. Asynchronous Processing

    • Process received data asynchronously
    • Respond quickly to Xtracta (< 5 seconds)
    • Use message queues for heavy processing
  2. Data Validation

    • Validate received JSON structure
    • Check for required fields
    • Handle missing or null values gracefully
  3. Backup Strategy

    • Have a fallback endpoint configured
    • Store failed messages for reprocessing
    • Implement data recovery procedures
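
The asynchronous processing and data validation patterns above can be combined: validate the body, put it on a queue, and answer immediately while a background worker does the slow work. The following is a sketch using only the standard library. The required field names and the 400/422 status codes are assumptions for the example, and an in-process queue would typically be replaced by a durable message queue.

```python
import json
import queue
import threading
from typing import List

REQUIRED_FIELDS = ("document_id", "status")  # hypothetical field names
work_queue = queue.Queue()
processed: List[dict] = []


def accept(raw_body: bytes) -> int:
    """Validate and enqueue; return the HTTP status to send straight away."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return 400  # not valid JSON
    if not isinstance(payload, dict) or any(payload.get(f) is None for f in REQUIRED_FIELDS):
        return 422  # a required field is missing or null
    work_queue.put(payload)
    return 200


def worker() -> None:
    while True:
        item = work_queue.get()
        try:
            processed.append(item)  # stand-in for slow business logic
        finally:
            work_queue.task_done()


# Daemon thread: it does not keep the process alive on shutdown.
threading.Thread(target=worker, daemon=True).start()
```

Because accept() only parses and enqueues, the response time stays well under the 5-second guideline regardless of how long the downstream processing takes.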