Webhook Integration Guide

Overview

The document extraction platform supports webhook integrations that automatically send extracted data to external systems. When a document is successfully processed, the platform can trigger a webhook to deliver the extracted data in real time.

Webhook Payload Format

When a document is extracted, the following JSON payload is sent to your webhook endpoint. The // comments are annotations and are not part of the actual payload:

{
  "extractionId": "uuid",
  "templateId": "uuid",
  "fileName": "document.pdf",
  "timestamp": "2024-01-15T10:30:00Z",
  "organizationId": "uuid",
  "userId": "uuid",
  "data": {
    // Your extracted fields based on template
    "customerName": "John Doe",
    "invoiceNumber": "INV-2024-001",
    "amount": 1500.00,
    // ... other fields
  },
  "confidence": {
    "fields": {
      "customerName": 0.95,
      "invoiceNumber": 0.98,
      "amount": 0.92
      // ... confidence for each field
    },
    "overall": 0.92
  },
  "metadata": {
    "extractionPlatform": "Extractable",
    "version": "1.0.0"
  }
}
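
A receiver will often want to act on the per-field confidence scores in this payload. The following is a minimal sketch, not part of the platform: lowConfidenceFields is a hypothetical helper, and the threshold is an arbitrary value chosen for illustration rather than a platform default.

```javascript
// Hypothetical helper: list the fields whose confidence is below a threshold.
// The threshold is an illustrative assumption, not a platform default.
function lowConfidenceFields(payload, threshold = 0.9) {
  const scores = (payload.confidence && payload.confidence.fields) || {};
  return Object.entries(scores)
    .filter(([, score]) => score < threshold)
    .map(([name]) => name);
}

// Sample payload shaped like the one documented above.
const samplePayload = {
  extractionId: 'uuid',
  data: { customerName: 'John Doe', invoiceNumber: 'INV-2024-001', amount: 1500.0 },
  confidence: {
    fields: { customerName: 0.95, invoiceNumber: 0.98, amount: 0.92 },
    overall: 0.92,
  },
};

console.log(lowConfidenceFields(samplePayload, 0.95)); // [ 'amount' ]
```

Fields returned by such a check could be routed to manual review instead of being written straight into a downstream system.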

API Endpoints

Create Webhook Configuration

POST /api/integrations/webhooks
Content-Type: application/json

{
  "connectionName": "My CRM Webhook",
  "webhookConfig": {
    "url": "https://your-webhook-endpoint.com/webhook",
    "authType": "bearer", // Options: "none", "bearer", "apikey", "hmac"
    "authValue": "your-api-token",
    "headers": {
      "X-Custom-Header": "value"
    },
    "retryEnabled": true,
    "maxRetries": 5
  },
  "fieldMappings": {
    "format": "default", // Options: "default", "flat", "custom"
    "fields": {
      "customerName": "client_name",
      "invoiceNumber": "invoice_id"
    }
  }
}
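
As a rough illustration, the request above could be assembled in Node.js as follows. The base URL and the helper name buildCreateWebhookRequest are assumptions made for this sketch. This guide does not describe how calls to the platform API are authenticated, so any required credentials would have to be added to the headers by the caller.

```javascript
// Hypothetical helper: build the request for POST /api/integrations/webhooks.
// baseUrl is an assumption; headers needed to authenticate against the
// platform API are not covered by this guide and must be added by the caller.
function buildCreateWebhookRequest(baseUrl, connectionName, webhookConfig, fieldMappings) {
  return {
    url: new URL('/api/integrations/webhooks', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ connectionName, webhookConfig, fieldMappings }),
    },
  };
}

const request = buildCreateWebhookRequest(
  'https://extraction.example.com', // placeholder host
  'My CRM Webhook',
  {
    url: 'https://your-webhook-endpoint.com/webhook',
    authType: 'bearer',
    authValue: 'your-api-token',
    retryEnabled: true,
    maxRetries: 5,
  },
  { format: 'default', fields: { customerName: 'client_name' } }
);

// To send it (Node.js 18+ provides a global fetch):
//   const response = await fetch(request.url, request.init);
console.log(request.url);
```

Separating request construction from sending keeps the sketch testable without network access.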

List Webhook Configurations

GET /api/integrations/webhooks

Update Webhook Configuration

PUT /api/integrations/webhooks/{id}
Content-Type: application/json

{
  "webhookConfig": {
    "url": "https://your-webhook-endpoint.com/new-webhook"
  },
  "status": "active" // or "inactive"
}

Delete Webhook Configuration

DELETE /api/integrations/webhooks/{id}

Test Webhook Configuration

POST /api/integrations/webhooks/test
Content-Type: application/json

{
  "webhookId": "webhook-uuid",
  "testData": {
    "data": {
      "customerName": "Test Customer",
      "amount": 100.00
    }
  }
}

Authentication Methods

1. No Authentication

{
  "authType": "none"
}

2. Bearer Token

{
  "authType": "bearer",
  "authValue": "your-bearer-token"
}

3. API Key

{
  "authType": "apikey",
  "authValue": "X-API-Key:your-api-key"
}

Or omit the header name to send the key in the default header:

{
  "authType": "apikey",
  "authValue": "your-api-key"
}
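
The two authValue forms above can be read as "HeaderName:key" and "key". The sketch below shows one plausible way such a value maps to a request header. It is an illustration only: apiKeyHeader is a hypothetical helper, and the default header name used here is an assumption, since this guide does not state which header the platform uses by default.

```javascript
// Hypothetical sketch: turn an "apikey" authValue into a single request header.
// DEFAULT_API_KEY_HEADER is an assumption; this guide does not name the default.
const DEFAULT_API_KEY_HEADER = 'X-API-Key';

function apiKeyHeader(authValue, defaultHeader = DEFAULT_API_KEY_HEADER) {
  const separator = authValue.indexOf(':');
  if (separator === -1) {
    // No header name given: send the key in the default header.
    return { [defaultHeader]: authValue };
  }
  // Split on the first colon only, so the key itself may contain colons.
  const name = authValue.slice(0, separator).trim();
  const value = authValue.slice(separator + 1).trim();
  return { [name]: value };
}

console.log(apiKeyHeader('X-API-Key:your-api-key')); // { 'X-API-Key': 'your-api-key' }
console.log(apiKeyHeader('your-api-key', 'X-Example-Key')); // { 'X-Example-Key': 'your-api-key' }
```

Under this reading, a key that itself contains a colon would need the explicit "HeaderName:key" form to avoid being split in the wrong place.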

4. HMAC Signature

{
  "authType": "hmac",
  "hmacSecret": "your-shared-secret"
}

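With HMAC, the following headers are added to each webhook request:

  • X-Webhook-Signature: HMAC-SHA256 signature of the string timestamp.payload (the timestamp, a period, then the payload), keyed with hmacSecret
  • X-Webhook-Timestamp: Unix timestamp in milliseconds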
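
The signing scheme above can be sketched with Node's built-in crypto module. This is a minimal illustration under stated assumptions: the signature is taken to be hex-encoded (as in the receiver example in this guide), the payload is the request body exactly as sent, and the five-minute freshness window is an arbitrary choice rather than a platform rule. The function names are hypothetical.

```javascript
const crypto = require('crypto');

// Assumption: the signature is the hex-encoded HMAC-SHA256 of
// `${timestamp}.${payload}`, where payload is the body exactly as sent.
function signWebhook(secret, timestamp, payload) {
  return crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${payload}`)
    .digest('hex');
}

// Hypothetical verifier. toleranceMs is an illustrative freshness window,
// not a documented platform value; `now` is injectable for testing.
function verifyWebhook(secret, timestamp, payload, signature,
                       toleranceMs = 5 * 60 * 1000, now = Date.now()) {
  // Reject missing, malformed, or stale timestamps to limit replay attacks.
  const age = Math.abs(now - Number(timestamp));
  if (!Number.isFinite(age) || age > toleranceMs) {
    return false;
  }
  const expected = Buffer.from(signWebhook(secret, timestamp, payload), 'utf8');
  const received = Buffer.from(String(signature), 'utf8');
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

const exampleBody = '{"extractionId":"uuid"}';
const exampleTimestamp = '1700000000000';
const exampleSignature = signWebhook('your-shared-secret', exampleTimestamp, exampleBody);
console.log(verifyWebhook('your-shared-secret', exampleTimestamp, exampleBody,
  exampleSignature, 5 * 60 * 1000, 1700000001000)); // true
```

The constant-time comparison avoids leaking, through response timing, how many leading characters of a forged signature were correct.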

Field Mappings

Default Format

Preserves the structure with renamed fields:

{
  "format": "default",
  "fields": {
    "customerName": "client_name",
    "invoiceNumber": "invoice_id"
  }
}
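
To make the effect of this mapping concrete, here is a small sketch of a rename that keeps the payload structure intact. It assumes the mapping applies to keys under data and that unmapped fields keep their original names; the platform's actual behavior may differ, and applyDefaultMapping is a hypothetical name.

```javascript
// Hypothetical sketch of the "default" format.
// Assumptions: only keys under `data` are renamed, and fields without a
// mapping keep their original names.
function applyDefaultMapping(payload, fields) {
  const renamed = {};
  for (const [key, value] of Object.entries(payload.data)) {
    const hasMapping = Object.prototype.hasOwnProperty.call(fields, key);
    renamed[hasMapping ? fields[key] : key] = value;
  }
  return { ...payload, data: renamed };
}

const mappedDefault = applyDefaultMapping(
  {
    extractionId: 'uuid',
    data: { customerName: 'John Doe', invoiceNumber: 'INV-2024-001', amount: 1500 },
  },
  { customerName: 'client_name', invoiceNumber: 'invoice_id' }
);

console.log(mappedDefault.data);
// { client_name: 'John Doe', invoice_id: 'INV-2024-001', amount: 1500 }
```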

Flat Format

Flattens the entire payload:

{
  "format": "flat",
  "fields": {
    "data.customerName": "client_name",
    "data.amount": "total_amount",
    "confidence.overall": "confidence_score"
  }
}
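
One way to picture the flat format is as two steps: collapse the nested payload into dotted paths, then rename the paths listed in fields. The sketch below follows that reading. It assumes unmapped paths are kept under their dotted names and that arrays are left as values; neither detail is confirmed by this guide, and the function names are hypothetical.

```javascript
// Collapse nested objects into a single level keyed by dotted path.
// Assumption: arrays are kept as values rather than expanded.
function flatten(obj, prefix = '', out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      flatten(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Hypothetical sketch of the "flat" format: flatten, then rename mapped paths.
// Assumption: paths without a mapping keep their dotted name.
function applyFlatMapping(payload, fields) {
  const result = {};
  for (const [path, value] of Object.entries(flatten(payload))) {
    const hasMapping = Object.prototype.hasOwnProperty.call(fields, path);
    result[hasMapping ? fields[path] : path] = value;
  }
  return result;
}

const mappedFlat = applyFlatMapping(
  {
    extractionId: 'uuid',
    data: { customerName: 'John Doe', amount: 1500 },
    confidence: { overall: 0.92 },
  },
  {
    'data.customerName': 'client_name',
    'data.amount': 'total_amount',
    'confidence.overall': 'confidence_score',
  }
);

console.log(mappedFlat);
```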

Custom Format

Use a JavaScript transformation function:

{
  "format": "custom",
  "transform": "return { customer: data.data.customerName, total: data.data.amount };"
}
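
Because the transform is supplied as a string, it helps to try it locally before saving the configuration. The sketch below assumes the string is evaluated as a function body with a single parameter named data bound to the full webhook payload, which is what data.data.customerName in the example suggests. How the platform actually executes transforms is not specified in this guide, and runTransform is a hypothetical helper.

```javascript
// Hypothetical local test harness for a "custom" transform string.
// Assumption: the string is a function body whose single parameter, `data`,
// is the full webhook payload.
// new Function executes arbitrary code: use it only on transforms you wrote.
function runTransform(transformSource, payload) {
  const transform = new Function('data', transformSource);
  return transform(payload);
}

const transformed = runTransform(
  'return { customer: data.data.customerName, total: data.data.amount };',
  { extractionId: 'uuid', data: { customerName: 'John Doe', amount: 1500 } }
);

console.log(transformed); // { customer: 'John Doe', total: 1500 }
```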

Retry Logic

Failed webhook deliveries are automatically retried with exponential backoff:

  • Attempt 1: Immediate
  • Attempt 2: After 1 second
  • Attempt 3: After 2 seconds
  • Attempt 4: After 4 seconds
  • Attempt 5: After 8 seconds

The delay doubles after each failed attempt, up to a maximum of 5 minutes.
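
The schedule above follows a simple doubling rule, which can be written out as a short function. This is a sketch derived from the listed delays and the 5-minute cap, not the platform's implementation; retryDelayMs and the constant names are hypothetical.

```javascript
const BASE_DELAY_MS = 1000;         // delay before attempt 2
const MAX_DELAY_MS = 5 * 60 * 1000; // 5-minute cap

// Delay in milliseconds to wait before the given attempt (1-based).
function retryDelayMs(attempt) {
  if (attempt <= 1) {
    return 0; // the first attempt is sent immediately
  }
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 2), MAX_DELAY_MS);
}

console.log([1, 2, 3, 4, 5].map(retryDelayMs)); // [ 0, 1000, 2000, 4000, 8000 ]
```

With this rule the cap would first take effect at attempt 11, where the uncapped delay of 512 seconds exceeds 5 minutes, so it matters only if a configuration allows more attempts than the five listed.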

Best Practices

  1. Use HTTPS: Always use HTTPS endpoints for security
  2. Implement Idempotency: Use the extractionId to handle duplicate deliveries
  3. Quick Response: Respond to webhooks within 30 seconds; acknowledge first and defer slow processing
  4. Return 2xx Status: Return HTTP 200-299 to indicate success
  5. Verify Signatures: When using HMAC, always verify the signature
  6. Handle Retries: Be prepared to receive the same webhook multiple times
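
Items 2 and 6 can be combined into a small deduplication step keyed on extractionId. The sketch below uses an in-memory Set for brevity; a real receiver would more likely use a database or cache with a unique constraint so that state survives restarts. The names handleOnce and processedExtractions are hypothetical.

```javascript
// In-memory record of processed deliveries (illustration only: it is lost
// on restart and is not shared between server instances).
const processedExtractions = new Set();

// Runs handler at most once per extractionId.
// Returns true if the payload was processed, false if it was a duplicate.
function handleOnce(payload, handler) {
  if (processedExtractions.has(payload.extractionId)) {
    return false; // duplicate delivery: acknowledge it, but skip the work
  }
  handler(payload);
  // Record the ID only after the handler succeeds, so a delivery whose
  // handler threw is processed again when it is retried.
  processedExtractions.add(payload.extractionId);
  return true;
}

let invoicesCreated = 0;
const delivery = { extractionId: 'example-extraction-id', data: { amount: 100 } };
handleOnce(delivery, () => { invoicesCreated += 1; });
handleOnce(delivery, () => { invoicesCreated += 1; }); // simulated retry
console.log(invoicesCreated); // 1
```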

Example Webhook Receiver (Node.js)

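const express = require('express');
const crypto = require('crypto');
const app = express();

// Keep the raw request body so the HMAC is computed over the exact bytes received.
// Assumption: the platform signs the body as sent. Re-serializing with
// JSON.stringify(req.body) can change the bytes (for example 1500.00 becomes 1500).
app.use(express.json({
  verify: (req, res, buf) => { req.rawBody = buf.toString('utf8'); }
}));

app.post('/webhook', (req, res) => {
  // Verify the HMAC signature whenever a shared secret is configured.
  // Requests without a signature header are rejected rather than skipped.
  const secret = process.env.WEBHOOK_SECRET;
  if (secret) {
    const signature = req.headers['x-webhook-signature'] || '';
    const timestamp = req.headers['x-webhook-timestamp'] || '';

    const expectedSignature = crypto
      .createHmac('sha256', secret)
      .update(`${timestamp}.${req.rawBody}`)
      .digest('hex');

    // Constant-time comparison; timingSafeEqual requires equal-length buffers
    const expected = Buffer.from(expectedSignature);
    const received = Buffer.from(String(signature));
    if (expected.length !== received.length ||
        !crypto.timingSafeEqual(expected, received)) {
      return res.status(401).send('Invalid signature');
    }
  }

  // Process the webhook
  const { extractionId, data, confidence } = req.body;

  console.log(`Received extraction ${extractionId}`);
  console.log('Extracted data:', data);
  console.log('Confidence:', confidence);

  // Your business logic here

  // Respond quickly
  res.status(200).json({ received: true });
});

app.listen(3000, () => {
  console.log('Webhook receiver listening on port 3000');
});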

Troubleshooting

Common Issues

  1. Webhook not triggering: Ensure the integration status is "active"
  2. Authentication failures: Double-check your auth credentials
  3. Timeout errors: Ensure your endpoint responds within 30 seconds
  4. SSL errors: Verify that your endpoint's SSL/TLS certificate is valid and has not expired
  5. Field mapping issues: Test with the test endpoint first

Debugging Tips

  1. Use the test endpoint to verify your webhook configuration
  2. Check webhook delivery logs in the extraction history
  3. Monitor your server logs for incoming requests
  4. Verify field mappings produce expected output
  5. Test with small documents first