Document Extraction API Documentation

Authentication

All API requests require authentication using an API key. Include your API key in the Authorization header:

Authorization: Bearer dk_live_1234567890abcdef1234567890abcdef

Creating API Keys

Option 1: Using the Dashboard (Recommended)

Log in to your account
Navigate to Dashboard → API Keys
Click "Create New API Key"
Enter a descriptive name for your key
Select an expiration period (optional)
Click "Create"
Important: Copy and save the API key immediately. It will not be shown again.

Note: Only organization owners and admins can create API keys. Each key is scoped to your organization and can have an optional expiration date.

Option 2: Using the API

You can also create API keys programmatically. Note that this request is authenticated with a session token rather than an API key:

curl -X POST https://draw.extractable.xyz/api/api-keys \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My API Key",
    "expiresIn": "90d",
    "scopes": ["*"]
  }'
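
The same request can be prepared from Python. This is a minimal sketch using only the standard library; it builds the request without sending it. The helper name build_create_key_request is hypothetical, the session token is a placeholder, and the payload fields simply mirror the curl example above.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_KEYS_URL = "https://draw.extractable.xyz/api/api-keys"


def build_create_key_request(session_token, name, expires_in="90d", scopes=("*",)):
    """Prepare (but do not send) the POST request that creates an API key."""
    payload = {"name": name, "expiresIn": expires_in, "scopes": list(scopes)}
    return urllib.request.Request(
        API_KEYS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {session_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_key_request("YOUR_SESSION_TOKEN", "My API Key")
# To send it, pass the prepared request to urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any key is actually created.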

Endpoints

Extract Document

Upload a PDF document for extraction. The API will automatically split multi-document PDFs and extract data from each identified document.

POST /api/documents/extract

Request Parameters

Parameter	Type	Required	Description
`file`	File	Required	PDF file to process (max 100MB)
`documentType`	String	Optional	Hint for document type (invoice, permit, contract, etc.)
`options`	JSON	Optional	Advanced extraction options
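
Checking a file locally before uploading can save a round trip that would end in a 413 or 422 error. The sketch below is one possible pre-flight check, not part of the API. The helper name check_pdf_upload is hypothetical, and reading "100MB" as 100,000,000 bytes is a conservative assumption, since the exact byte limit is not stated.

```python
import os

# Assumption: "100MB" is read conservatively as 10^8 bytes.
MAX_UPLOAD_BYTES = 100 * 1000 * 1000


def check_pdf_upload(path):
    """Return a list of problems found locally; an empty list means the file looks uploadable."""
    problems = []
    if not path.lower().endswith(".pdf"):
        problems.append("file does not have a .pdf extension")
    if not os.path.isfile(path):
        problems.append("file does not exist")
        return problems
    with open(path, "rb") as f:
        # PDF files begin with the signature "%PDF-".
        if f.read(5) != b"%PDF-":
            problems.append("file does not start with the PDF signature")
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        problems.append("file is larger than the 100MB limit")
    return problems
```

A passing check does not guarantee the server will accept the file; it only filters out the obvious failures.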

Options Object

{
  "splitStrategy": "auto",           // auto | manual | none
  "confidenceThreshold": 0.8,        // Minimum confidence (0-1)
  "includeAlternatives": true,       // Include alternative document types
  "maxDocuments": 10,                // Maximum documents to process
  "pageTimeout": 5000                // Timeout per page in ms
}
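
Because the request is multipart form data, the options object is sent as a single JSON-encoded form field, as the Node.js example later in this document does. The sketch below builds the non-file fields and validates the two values whose ranges are listed above. The helper name build_form_fields is hypothetical, and the client-side validation is an illustration rather than a description of what the server enforces.

```python
import json

# Values listed for splitStrategy in the Options Object.
SPLIT_STRATEGIES = {"auto", "manual", "none"}


def build_form_fields(document_type=None, **options):
    """Build the non-file multipart fields for POST /api/documents/extract."""
    strategy = options.get("splitStrategy")
    if strategy is not None and strategy not in SPLIT_STRATEGIES:
        raise ValueError(f"splitStrategy must be one of {sorted(SPLIT_STRATEGIES)}")
    threshold = options.get("confidenceThreshold")
    if threshold is not None and not 0 <= threshold <= 1:
        raise ValueError("confidenceThreshold must be between 0 and 1")

    fields = {}
    if document_type:
        fields["documentType"] = document_type
    if options:
        # The whole options object travels as one JSON-encoded form field.
        fields["options"] = json.dumps(options)
    return fields


fields = build_form_fields("invoice", splitStrategy="auto", confidenceThreshold=0.8)
```

The resulting dictionary can be passed as the data argument of requests.post alongside the files argument.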

curl -X POST https://draw.extractable.xyz/api/documents/extract \
  -H "Authorization: Bearer dk_live_1234567890abcdef1234567890abcdef" \
  -F "file=@invoice.pdf" \
  -F "documentType=invoice"

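const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('documentType', 'invoice');

// Do not set Content-Type manually; the browser adds the multipart boundary.
const response = await fetch('https://draw.extractable.xyz/api/documents/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer dk_live_1234567890abcdef1234567890abcdef'
  },
  body: formData
});

// fetch() only rejects on network failure, so check the HTTP status explicitly.
if (!response.ok) {
  throw new Error(`Extraction failed with status ${response.status}`);
}

const result = await response.json();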

import requests

headers = {
    'Authorization': 'Bearer dk_live_1234567890abcdef1234567890abcdef'
}
data = {'documentType': 'invoice'}

# Open the file in a with-block so the handle is closed after the upload
with open('invoice.pdf', 'rb') as f:
    response = requests.post(
        'https://draw.extractable.xyz/api/documents/extract',
        files={'file': f},
        data=data,
        headers=headers
    )

response.raise_for_status()  # Raise on 4xx/5xx responses
result = response.json()

Response

{
  "success": true,
  "documents": [
    {
      "documentType": "sor",
      "pages": [1, 15],
      "extractedData": {
        "consultantInfo": {
          "name": "John Smith",
          "company": "ABC Consulting LLC",
          "fileNumber": "2024-1234"
        },
        "propertyInfo": {
          "address": "123 Main St, Anytown, CA 90210",
          "buildingSize": "2,500 sq ft",
          "numberOfUnits": 1
        },
        "constructionSections": [
          {
            "sectionNumber": "1.0",
            "sectionName": "Foundations",
            "subTotal": 15000.00,
            "lineItems": [
              {
                "itemName": "Foundation Repair",
                "materials": 8000.00,
                "labor": 7000.00,
                "itemTotal": 15000.00
              }
            ]
          }
        ],
        "grandTotal": 125000.00
      },
      "confidence": 0.92,
      "splitPdfUrl": "https://draw.extractable.xyz/storage/documents/123/sor_pages_1-15.pdf"
    },
    {
      "documentType": "insurance_certificate",
      "pages": [16, 17],
      "extractedData": {
        "certificateNumber": "CERT-2024-5678",
        "dateIssued": "2024-01-15",
        "insuredName": "XYZ Construction Inc",
        "coverages": [
          {
            "typeOfInsurance": "General Liability",
            "policyNumber": "GL-123456",
            "policyEffectiveDate": "2024-01-01",
            "policyExpirationDate": "2025-01-01",
            "limits": {
              "eachOccurrence": 1000000,
              "generalAggregate": 2000000
            }
          }
        ]
      },
      "confidence": 0.95,
      "splitPdfUrl": "https://draw.extractable.xyz/storage/documents/123/insurance_pages_16-17.pdf"
    }
  ],
  "metadata": {
    "totalPages": 17,
    "processedDocuments": 2,
    "unidentifiedDocuments": 0,
    "processingTime": 5200
  }
}
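
To work with a response like the one above, iterate over the documents array and read each entry's type, pages and confidence. The sketch below is illustrative only. The helper name summarize_documents is hypothetical, and it assumes that pages holds the first and last page of each split document, which the sample suggests but does not state.

```python
def summarize_documents(result, min_confidence=0.0):
    """Return one (type, first_page, last_page, confidence) tuple per extracted document."""
    rows = []
    for doc in result.get("documents", []):
        # Assumption: "pages" is [first_page, last_page], as the sample suggests.
        first_page, last_page = doc["pages"][0], doc["pages"][-1]
        if doc["confidence"] >= min_confidence:
            rows.append((doc["documentType"], first_page, last_page, doc["confidence"]))
    return rows


# Trimmed copy of the sample response, keeping only the fields used here.
sample = {
    "success": True,
    "documents": [
        {"documentType": "sor", "pages": [1, 15], "confidence": 0.92},
        {"documentType": "insurance_certificate", "pages": [16, 17], "confidence": 0.95},
    ],
}

for doc_type, first, last, confidence in summarize_documents(sample, min_confidence=0.9):
    print(f"{doc_type}: pages {first}-{last} (confidence {confidence:.2f})")
```

Filtering on confidence on the client side is useful when you want a stricter cutoff for some document types than the confidenceThreshold sent with the request.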

Get Task Status

Check the status of an asynchronous extraction task.

GET /api/tasks/{taskId}

Response

{
  "taskId": "task_123456",
  "status": "completed",    // pending | processing | completed | failed
  "progress": 100,
  "result": {
    // Same as extraction response
  }
}
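
A client typically polls this endpoint until the status is completed or failed. The sketch below shows one way to do that. The helper name wait_for_task, the polling interval and the timeout are all assumptions, and fetch_status is a callable you supply that performs GET /api/tasks/{taskId} and returns the decoded JSON body. How a taskId is obtained is not covered here.

```python
import time


def wait_for_task(fetch_status, task_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the task reaches a terminal status, then return the final task body."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        if task["status"] == "completed":
            return task
        if task["status"] == "failed":
            raise RuntimeError(f"task {task_id} failed")
        # "pending" and "processing" both mean: keep waiting.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task['status']} after {timeout}s")
        sleep(interval)
```

Passing the fetch function in keeps the loop independent of any HTTP library and makes it straightforward to test with canned responses.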

List Supported Document Types

Get a list of all supported document types and their extraction schemas.

GET /api/documents/types

Response

{
  "documentTypes": [
    {
      "type": "sor",
      "name": "Statement of Repair",
      "description": "HUD 203(k) renovation loan documents with detailed construction scopes",
      "supportedFields": [
        {
          "field": "consultantInfo",
          "type": "object",
          "description": "HUD consultant details"
        },
        {
          "field": "propertyInfo",
          "type": "object",
          "description": "Property address and characteristics"
        },
        {
          "field": "constructionSections",
          "type": "array",
          "description": "Detailed line items by section (1.0-8.0)"
        },
        {
          "field": "grandTotal",
          "type": "number",
          "description": "Total project cost including contingency"
        }
      ]
    },
    {
      "type": "insurance_certificate",
      "name": "Insurance Certificate (ACORD)",
      "description": "ACORD insurance certificates for contractor compliance",
      "supportedFields": [
        {
          "field": "certificateNumber",
          "type": "string",
          "required": true
        },
        {
          "field": "insuredName",
          "type": "string",
          "required": true
        },
        {
          "field": "coverages",
          "type": "array",
          "description": "List of insurance coverages with limits"
        },
        {
          "field": "policyDates",
          "type": "object",
          "description": "Effective and expiration dates by coverage"
        }
      ]
    }
  ]
}
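
The response above can be used to check whether a document type is supported and which fields it marks as required. The sketch below is illustrative; the helper names fields_for_type and required_fields are hypothetical, and it treats a missing required flag as "not required", which is an assumption.

```python
def fields_for_type(types_response, doc_type):
    """Return {field name: field definition} for one document type, or None if unsupported."""
    for entry in types_response.get("documentTypes", []):
        if entry["type"] == doc_type:
            return {f["field"]: f for f in entry.get("supportedFields", [])}
    return None


def required_fields(types_response, doc_type):
    """List the field names flagged "required": true for a document type."""
    fields = fields_for_type(types_response, doc_type) or {}
    # Assumption: a field without a "required" flag is treated as optional.
    return [name for name, spec in fields.items() if spec.get("required")]


# Trimmed copy of the sample response above.
types_response = {
    "documentTypes": [
        {
            "type": "insurance_certificate",
            "supportedFields": [
                {"field": "certificateNumber", "type": "string", "required": True},
                {"field": "insuredName", "type": "string", "required": True},
                {"field": "coverages", "type": "array"},
            ],
        }
    ]
}

print(required_fields(types_response, "insurance_certificate"))
```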

Error Handling

The API uses standard HTTP status codes and returns detailed error messages.

Status Code	Error Code	Description
401	UNAUTHORIZED	Invalid or missing API key
413	FILE_TOO_LARGE	File exceeds maximum size limit
422	INVALID_FILE_FORMAT	Unsupported file format
429	RATE_LIMIT_EXCEEDED	Too many requests
500	INTERNAL_ERROR	Server error

Error Response Format

{
  "error": "Rate limit exceeded",
  "code": "RATE_LIMIT_EXCEEDED",
  "retryAfter": 60,
  "limit": 100,
  "remaining": 0,
  "reset": "2024-01-15T12:00:00Z"
}
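
When handling a RATE_LIMIT_EXCEEDED error, a client needs to decide how long to wait. The sketch below prefers retryAfter and falls back to the reset timestamp. The helper name retry_delay_seconds and the 60 second default are assumptions, as is the reading of reset as an ISO 8601 UTC timestamp, which matches the sample above but is not otherwise specified.

```python
from datetime import datetime, timezone


def retry_delay_seconds(error_body, now=None, default=60.0):
    """Pick a wait time from an error body: retryAfter first, then reset, then a default."""
    retry_after = error_body.get("retryAfter")
    if isinstance(retry_after, (int, float)):
        return float(retry_after)
    reset = error_body.get("reset")
    if reset:
        # Assumption: "reset" is an ISO 8601 UTC timestamp such as 2024-01-15T12:00:00Z.
        reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
        now = now or datetime.now(timezone.utc)
        return max(0.0, (reset_at - now).total_seconds())
    return default
```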

Rate Limits

API requests are rate limited to ensure fair usage:

Per API Key: 100 requests per minute
Per Organization: 1,000 requests per minute
File Size: Maximum 100MB per file
Concurrent Requests: Maximum 10 per API key

Rate Limit Headers: Each response includes headers showing your current rate limit status:

X-RateLimit-Limit: Maximum requests allowed
X-RateLimit-Remaining: Requests remaining
X-RateLimit-Reset: Time when limit resets
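
Reading these headers lets a client slow down before it hits a 429. The sketch below is one possible approach; the helper names read_rate_limit and should_slow_down and the reserve of 5 requests are assumptions. The format of X-RateLimit-Reset is not specified here, so the value is passed through as raw text rather than parsed.

```python
def read_rate_limit(headers):
    """Extract rate limit status from response headers (any mapping of name to value)."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        value = lowered.get(name)
        return int(value) if value is not None and str(value).isdigit() else None

    return {
        "limit": as_int("x-ratelimit-limit"),
        "remaining": as_int("x-ratelimit-remaining"),
        # Format not specified in this document, so the raw value is kept.
        "reset": lowered.get("x-ratelimit-reset"),
    }


def should_slow_down(headers, reserve=5):
    """True when fewer than `reserve` requests remain in the current window."""
    remaining = read_rate_limit(headers)["remaining"]
    return remaining is not None and remaining < reserve
```

With the requests library, response.headers can be passed in directly.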

Code Examples

Node.js with Error Handling

// Load dependencies once at module level rather than on every call
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

const extractDocument = async (filePath, apiKey) => {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  form.append('options', JSON.stringify({
    confidenceThreshold: 0.85,
    includeAlternatives: true
  }));
  
  try {
    const response = await axios.post(
      'https://draw.extractable.xyz/api/documents/extract',
      form,
      {
        headers: {
          ...form.getHeaders(),
          'Authorization': `Bearer ${apiKey}`
        }
      }
    );
    
    console.log('Extraction successful:', response.data);
    return response.data;
    
  } catch (error) {
    if (error.response) {
      console.error('API Error:', error.response.data);
      
      if (error.response.status === 429) {
        // Handle rate limit
        const retryAfter = error.response.data.retryAfter;
        console.log(`Rate limited. Retry after ${retryAfter} seconds`);
      }
    } else {
      console.error('Network error:', error.message);
    }
    throw error;
  }
};

Python with Retry Logic

import requests
from time import sleep
from typing import Any, Dict, Optional

class DocumentExtractorClient:
    def __init__(self, api_key: str, base_url: str = "https://draw.extractable.xyz"):
        self.api_key = api_key
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {api_key}"
        }

    def extract_document(
        self,
        file_path: str,
        document_type: Optional[str] = None,
        max_retries: int = 3
    ) -> Dict[str, Any]:
        url = f"{self.base_url}/api/documents/extract"
        
        with open(file_path, 'rb') as f:
            files = {'file': f}
            data = {}
            
            if document_type:
                data['documentType'] = document_type
            
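            for attempt in range(max_retries):
                # Rewind before every attempt; after the first upload the file
                # pointer sits at EOF and a retry would send an empty file.
                f.seek(0)
                try: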
                    response = requests.post(
                        url, 
                        files=files, 
                        data=data,
                        headers=self.headers
                    )
                    
                    if response.status_code == 200:
                        return response.json()
                    
                    elif response.status_code == 429:
                        # Rate limited, wait and retry
                        retry_after = response.json().get('retryAfter', 60)
                        print(f"Rate limited. Waiting {retry_after} seconds...")
                        sleep(retry_after)
                        continue
                    
                    else:
                        response.raise_for_status()
                        
                except requests.exceptions.RequestException as e:
                    print(f"Attempt {attempt + 1} failed: {e}")
                    if attempt == max_retries - 1:
                        raise
                    sleep(2 ** attempt)  # Exponential backoff
        
        raise Exception("Max retries exceeded")

# Usage
client = DocumentExtractorClient("dk_live_your_api_key")
result = client.extract_document("invoice.pdf", document_type="invoice")
print(f"Extracted {len(result['documents'])} documents")

Support

For additional support or questions:

Documentation: https://draw.extractable.xyz/docs
Status Page: https://status.extractable.xyz