Document Extraction API - Quick Start Guide
Overview
The Document Extraction API is a REST API that accepts PDF, JPEG, and PNG files without requiring you to declare the document type. It uses AI to identify and classify each document, then routes it to the matching extraction workflow based on a configurable document type registry.
Getting Started
1. API Key Authentication
All requests require a valid API key in the Authorization header:
Authorization: Bearer dk_live_your_api_key_here
2. Base URL
Production: https://api.extractable.xyz
Core Endpoints
Submit Document for Processing
POST /api/documents/extract
Content-Type: multipart/form-data
Authorization: Bearer dk_live_your_api_key_here
file: document.pdf
webhookUrl: https://your-domain.com/webhook (optional)
metadata: {"clientId": "12345"} (optional)
Response (202 Accepted):
{
"jobId": "123e4567-e89b-12d3-a456-426614174000",
"status": "pending",
"pollingUrl": "https://api.extractable.xyz/api/documents/extract/123e4567-e89b-12d3-a456-426614174000",
"createdAt": "2024-01-01T00:00:00Z",
"message": "Document submitted for processing. Document extraction pipeline initiated."
}
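The submit request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper name build_extract_request and the hand-rolled multipart encoding are assumptions, while the URL, header, and field names (file, webhookUrl, metadata) come from the request described above.

```python
import json
import uuid
import urllib.request

BASE_URL = "https://api.extractable.xyz"


def build_extract_request(api_key, file_name, file_bytes, content_type,
                          webhook_url=None, metadata=None):
    """Build (but do not send) a multipart POST to /api/documents/extract."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name, value):
        # One plain-text form field.
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )

    # Optional fields, as listed in the request description.
    if webhook_url is not None:
        add_field("webhookUrl", webhook_url)
    if metadata is not None:
        add_field("metadata", json.dumps(metadata))

    # The file part carries the raw document bytes.
    # Simplification: file_name is assumed not to contain double quotes.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        f"{BASE_URL}/api/documents/extract",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen and decoding the body with json.loads should yield the 202 response shown above, including the jobId to poll.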
Check Processing Status
GET /api/documents/extract/{jobId}
Authorization: Bearer dk_live_your_api_key_here
Response (200 OK):
{
"jobId": "123e4567-e89b-12d3-a456-426614174000",
"status": "completed",
"fileName": "statement_of_repair.pdf",
"fileType": "pdf",
"documentType": "sor",
"classificationConfidence": 0.95,
"progress": 100,
"extractedData": {
"documentMetadata": {
"consultant": "ABC Construction Consulting",
"propertyAddress": "123 Main St, Anytown, ST 12345",
"borrowerName": "John Doe",
"lenderName": "First National Bank"
},
"constructionSections": [
{
"sectionName": "1.0 GENERAL CONDITIONS",
"lineItems": [
{
"description": "Permits and inspections",
"quantity": 1,
"unit": "LS",
"materialCost": 500.00,
"laborCost": 0.00,
"totalCost": 500.00
}
]
}
],
"recapSummary": {
"subtotal": 45000.00,
"generalConditions": 4500.00,
"overheadAndProfit": 7425.00,
"grandTotal": 56925.00
}
},
"createdAt": "2024-01-01T00:00:00Z",
"updatedAt": "2024-01-01T00:00:45Z",
"completedAt": "2024-01-01T00:00:45Z"
}
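Before using extracted amounts downstream, it can be worth checking that they add up. The sketch below assumes two relationships that the sample response satisfies but the API does not state explicitly: totalCost = materialCost + laborCost for each line item, and grandTotal = subtotal + generalConditions + overheadAndProfit. The function name check_sor_totals is hypothetical.

```python
def check_sor_totals(extracted, tolerance=0.01):
    """Return a list of inconsistencies found in extracted SOR amounts."""
    problems = []

    # Assumption: totalCost = materialCost + laborCost for each line item.
    for section in extracted.get("constructionSections", []):
        for item in section.get("lineItems", []):
            expected = item["materialCost"] + item["laborCost"]
            if abs(expected - item["totalCost"]) > tolerance:
                problems.append(
                    f'{section["sectionName"]}: "{item["description"]}" '
                    f'totals {item["totalCost"]}, expected {expected}'
                )

    # Assumption: grandTotal = subtotal + generalConditions + overheadAndProfit.
    recap = extracted.get("recapSummary")
    if recap:
        expected = (recap["subtotal"] + recap["generalConditions"]
                    + recap["overheadAndProfit"])
        if abs(expected - recap["grandTotal"]) > tolerance:
            problems.append(
                f'grandTotal is {recap["grandTotal"]}, expected {expected}'
            )
    return problems
```

An empty list means the amounts are consistent under these assumptions; any entries are candidates for manual review rather than proof of an extraction error.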
List Supported Document Types
GET /api/documents/types
Authorization: Bearer dk_live_your_api_key_here
Response (200 OK):
{
"supportedTypes": [
{
"typeCode": "sor",
"displayName": "Statement of Repair",
"description": "HUD 203(k) Statement of Repair documents",
"confidenceThreshold": 0.85,
"extractionSupported": true
},
{
"typeCode": "insurance_cert",
"displayName": "Insurance Certificate",
"description": "ACORD Insurance Certificate forms",
"confidenceThreshold": 0.80,
"extractionSupported": true
}
],
"capabilities": [
"AI-powered document classification",
"Structured data extraction",
"Webhook notifications",
"Secure file processing"
]
}
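One possible client-side use of this response is to compare a job's classificationConfidence against the confidenceThreshold of its type. The sketch below assumes that a result is trustworthy only when the confidence is at or above the threshold and extractionSupported is true; the guide does not spell out how the server applies the threshold, and the function name meets_threshold is hypothetical.

```python
def meets_threshold(job, supported_types):
    """Return True if the job's type supports extraction and its
    classification confidence reaches that type's threshold."""
    by_code = {t["typeCode"]: t for t in supported_types}
    doc_type = by_code.get(job.get("documentType"))
    if doc_type is None or not doc_type.get("extractionSupported"):
        return False
    # Assumption: "at or above" the threshold counts as a confident match.
    return job.get("classificationConfidence", 0.0) >= doc_type["confidenceThreshold"]
```

Here job is a parsed status response and supported_types is the supportedTypes array from the response above.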
Key Features
Universal Document Acceptance
- Upload any supported file (PDF, JPEG, PNG) without specifying the document type
- AI automatically identifies document type
- Supports both single and multi-page documents
- Handles unknown document types gracefully
Supported Document Types
- Statement of Repair (SOR): HUD 203(k) documents with detailed construction scope
- Insurance Certificates: ACORD forms with coverage details
- Extensible Registry: New document types can be added without code changes
AI-Powered Classification
- Uses advanced AI models for document identification
- Returns confidence scores for transparency
- Provides alternative type suggestions
- Configurable confidence thresholds
Error Handling
Common Error Responses
401 Unauthorized:
{
"error": "unauthorized",
"message": "Invalid or missing API key"
}
400 Bad Request:
{
"error": "invalid_request",
"message": "No file provided"
}
413 Payload Too Large:
{
"error": "file_too_large",
"message": "File size exceeds limit of 50MB for PDF files"
}
429 Too Many Requests:
{
"error": "rate_limit_exceeded",
"message": "Too many requests. Please try again later."
}
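The error bodies above share an error code and a message, so a client can map them to a single exception type. The following is a minimal sketch: the names ApiError and parse_error and the fallback code "unknown_error" are illustrative, and treating only 429 as retryable is an assumption rather than documented behavior.

```python
import json


class ApiError(Exception):
    """Error returned by the API, built from the JSON error body."""

    def __init__(self, status_code, code, message):
        super().__init__(f"{status_code} {code}: {message}")
        self.status_code = status_code
        self.code = code
        self.message = message

    @property
    def retryable(self):
        # Assumption: only rate-limit responses are worth retrying unchanged.
        return self.status_code == 429


def parse_error(status_code, body_text):
    """Turn an error response into an ApiError, tolerating non-JSON bodies."""
    try:
        body = json.loads(body_text)
    except ValueError:
        body = {}
    if not isinstance(body, dict):
        body = {}
    return ApiError(
        status_code,
        body.get("error", "unknown_error"),  # hypothetical fallback code
        body.get("message", body_text),
    )
```

A caller might raise the returned error immediately for 400, 401, and 413, and wait before retrying when retryable is true.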
File Requirements
Supported Formats
- PDF: Up to 50MB
- JPEG/JPG: Up to 10MB
- PNG: Up to 10MB
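Checking these limits before uploading avoids a round trip that ends in a 413 response. The sketch below is a client-side pre-check only; the function name validate_upload is hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which the guide does not specify.

```python
import os

# Assumption: the guide does not say whether 1 MB is 10**6 or 2**20 bytes.
MB = 1024 * 1024

# Per-format limits from the list above, keyed by lowercase extension.
MAX_BYTES = {
    ".pdf": 50 * MB,
    ".jpg": 10 * MB,
    ".jpeg": 10 * MB,
    ".png": 10 * MB,
}


def validate_upload(file_name, size_bytes):
    """Return None if the file looks acceptable, else a short reason string."""
    ext = os.path.splitext(file_name)[1].lower()
    limit = MAX_BYTES.get(ext)
    if limit is None:
        return f"unsupported file extension: {ext or '(none)'}"
    if size_bytes > limit:
        return f"{ext} files are limited to {limit // MB} MB"
    return None
```

The size can come from os.path.getsize(path). The check looks only at the file extension, so the server may still reject a file whose contents do not match its name.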
Best Practices
- Use high-resolution scans (300 DPI or higher)
- Ensure text is clearly readable
- Avoid heavily redacted or corrupted documents
- Single-page documents process faster than multi-page
Testing
cURL Examples
Submit a document:
curl -X POST https://api.extractable.xyz/api/documents/extract \
-H "Authorization: Bearer dk_live_your_api_key_here" \
-F "file=@document.pdf" \
-F 'metadata={"clientId": "CLIENT123"}'
Check status:
curl https://api.extractable.xyz/api/documents/extract/123e4567-e89b-12d3-a456-426614174000 \
-H "Authorization: Bearer dk_live_your_api_key_here"
List document types:
curl https://api.extractable.xyz/api/documents/types \
-H "Authorization: Bearer dk_live_your_api_key_here"
Integration Notes
Webhook Configuration
When providing a webhook URL, ensure your endpoint can handle POST requests with the following payload structure:
{
"jobId": "123e4567-e89b-12d3-a456-426614174000",
"status": "completed",
"documentType": "sor",
"confidence": 0.95,
"extractedData": { ... },
"metadata": { ... }
}
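A webhook endpoint mostly needs to parse this payload and hand the fields to your own code. The sketch below is framework-agnostic and only covers the parsing step; the function name parse_webhook and the returned key names are illustrative. Because the sample payload reports the score as confidence while the status response uses classificationConfidence, the sketch accepts either key.

```python
import json


def parse_webhook(raw_body):
    """Parse a webhook POST body and return the fields a handler usually needs."""
    payload = json.loads(raw_body)
    if not isinstance(payload, dict) or "jobId" not in payload:
        raise ValueError("webhook payload is missing jobId")
    return {
        "job_id": payload["jobId"],
        "status": payload.get("status"),
        "document_type": payload.get("documentType"),
        # The sample payload uses "confidence"; fall back to the status
        # response's "classificationConfidence" if that key is sent instead.
        "confidence": payload.get("confidence",
                                  payload.get("classificationConfidence")),
        "extracted_data": payload.get("extractedData"),
        "metadata": payload.get("metadata") or {},
    }
```

Call it from the POST handler of whichever web framework you use, passing the raw request body as bytes or str.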
Polling Strategy
- Poll every 5-10 seconds for job status
- Most documents process within 30-60 seconds
- Complex multi-page documents may take up to 2 minutes
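The polling guidance above can be sketched as a small loop with a fixed interval and an overall timeout. This is an illustration under assumptions: fetch_status is a caller-supplied function that performs GET /api/documents/extract/{jobId} and returns the parsed JSON, and "failed" is a hypothetical terminal status, since this guide only shows "pending" and "completed".

```python
import time


def poll_job(fetch_status, job_id, interval_s=5.0, timeout_s=120.0,
             terminal=("completed", "failed"),
             sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status or the timeout expires.

    "failed" in the default terminal tuple is an assumption; adjust it to
    the statuses the API actually returns.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job.get("status") in terminal:
            return job
        # Stop if the next poll would land after the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s} seconds")
        sleep(interval_s)
```

The defaults follow the figures above: a 5 second interval and a 2 minute ceiling. The sleep and clock parameters exist so the loop can be tested without waiting in real time.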
Rate Limits
- Per API Key: 100 requests per minute
- Per Organization: 1000 requests per minute
- Contact support for higher limits
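To stay under the per-key limit, a client can throttle itself before sending. The sketch below uses a sliding window of request timestamps; the class name SlidingWindowLimiter is hypothetical, and the server may count requests differently (for example, in fixed one-minute windows), so treat this as a conservative client-side guard rather than a mirror of the server's logic.

```python
import collections
import time


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_s-second span."""

    def __init__(self, max_requests=100, window_s=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._sent = collections.deque()  # timestamps of recent requests

    def wait_time(self):
        """Seconds to wait before the next request is allowed (0.0 if now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self._sent[0])

    def record(self):
        """Note that a request was just sent."""
        self._sent.append(self._clock())
```

A typical loop calls wait_time(), sleeps for that long if it is positive, sends the request, then calls record(). A 429 response can still occur if several processes share one key or the organization-wide limit is reached.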
Example Workflow
1. Submit Document: Upload any document without specifying type
2. AI Classification: System automatically identifies document type
3. Extraction: If type is supported, structured data is extracted
4. Results: Receive extracted data via polling or webhook
Document Type Registry
The API maintains a registry of supported document types that can be extended:
Current Types
- Statement of Repair: Construction scope documents
- Insurance Certificate: Coverage verification forms
- Invoice: Billing documents (identification only)
- Contract: Agreement documents (identification only)
Adding New Types
Document types can be added to the registry without code changes:
1. Define recognition criteria
2. Set confidence thresholds
3. Optionally add extraction templates
4. Deploy and test
Support
For technical support or questions:
- Documentation: Full API documentation available
- Status Page: Check system status and maintenance windows
Next Steps
- Get your API key: Contact support for API access
- Review the full API documentation: See the complete OpenAPI specification
- Test with sample documents: Use the provided cURL examples
- Set up webhooks: Configure your endpoint to receive results
- Monitor usage: Track API usage and performance