Document Validation API: Developer Guide
Integrate document validation into your application: REST API, webhooks, code examples in Python and Node.js.

This guide covers everything you need to integrate automated document validation into your application -- from authentication to webhook handling. Whether you are building a client onboarding flow, a compliance pipeline, or a back-office automation tool, the CheckFile API gives you programmatic access to the same AI-powered validation engine used in the platform. You will find architecture decisions, endpoint references, code samples in Python and Node.js, webhook payloads, error handling strategies, and integration patterns that scale from prototype to production.
This article is for informational purposes only and does not constitute legal, financial, or regulatory advice. Regulatory references are accurate as of the publication date. Consult a qualified professional for guidance specific to your situation.
Architecture Overview
The CheckFile API follows a standard asynchronous processing model: documents are uploaded via REST and queued for AI analysis, and results are delivered through polling or webhook callbacks. Median processing time for a standard 8-12 document dossier is 12 seconds; P95 is 28 seconds. Our platform processes over 180,000 documents per month with 98.7% OCR accuracy and 99.97% availability. This decoupled architecture processes documents at scale without blocking your application.
The EU AI Act (Regulation 2024/1689, Art. 13) requires high-risk AI systems used in document processing for financial applications to provide transparent, traceable outputs -- a requirement that the CheckFile API satisfies through its deterministic rule engine layer, which produces auditable decision traces for every validation. In Australia, APRA's CPS 234 imposes similar information security and auditability requirements on regulated entities.
                                   +-------------------+
                                   |    Results API    |
                                   |  GET /files/{id}  |
                                   +---------^---------+
                                             |
Client App                                   | Poll or fetch
    |                                        |
    |  POST /v1/files               +--------+-------+
    +------------------------------>|   Upload API   |
    |                               +--------+-------+
    |                                        |
    |                                        v
    |                               +--------+--------+
    |                               | Processing Queue|
    |                               | (AI validation) |
    |                               +--------+--------+
    |                                        |
    |          Webhook callback              |
    |<---------------------------------------+
    |          POST your-endpoint
Three key design decisions shape this architecture:
- Asynchronous by default. Document validation involves OCR, fraud detection, cross-referencing, and rule evaluation. These operations take 2-15 seconds depending on document complexity. The API accepts uploads immediately and processes them in the background.
- Dual delivery. You can poll the status endpoint or register a webhook. Polling works for simple integrations; webhooks are the recommended approach for production systems handling more than a few documents per minute.
- Idempotent uploads. Each upload returns a unique file_id. Re-uploading the same document with the same idempotency key returns the existing result instead of reprocessing, saving both time and API credits.
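One way a client might produce a stable idempotency key is to hash the document bytes together with the rule set, so the same upload always yields the same key. The sketch below assumes the key travels in a request header; the header name `Idempotency-Key` and both helper names are hypothetical, since this guide does not state how the key is transmitted.

```python
import hashlib


def idempotency_key(document: bytes, rule_set: str) -> str:
    """Derive a deterministic key from the rule set and the document bytes."""
    digest = hashlib.sha256()
    digest.update(rule_set.encode("utf-8"))
    digest.update(b"\x00")  # separator so the two fields cannot run together
    digest.update(document)
    return digest.hexdigest()


def upload_headers(api_key: str, document: bytes, rule_set: str) -> dict[str, str]:
    """Build request headers for an upload, including the derived key."""
    return {
        "X-API-Key": api_key,
        # Assumed header name; confirm the real one in the API reference.
        "Idempotency-Key": idempotency_key(document, rule_set),
    }
```

Deriving the key from content rather than generating a random value means a retried upload after a network failure maps to the same key without the client having to persist anything.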
Authentication and Security
All API requests require authentication. The API supports two authentication methods depending on your use case.
API Key Authentication
For server-to-server integrations, pass your API key in the X-API-Key header:
curl -H "X-API-Key: ck_live_abc123..." \
https://api.checkfile.ai/v1/files
API keys are scoped to your organisation. You can generate multiple keys in the dashboard -- one per environment (development, staging, production) is the recommended practice. Keys prefixed with ck_test_ hit the sandbox environment; keys prefixed with ck_live_ hit production.
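Because the prefix determines which environment a key reaches, a startup check can catch a sandbox key deployed to production (or the reverse). This is a minimal sketch: the function names are illustrative, and the `CHECKFILE_API_KEY` variable name follows the Node.js example in this guide.

```python
import os
from typing import Mapping, Optional


def environment_for_key(api_key: str) -> str:
    """Return the environment a key targets, based on its prefix."""
    if api_key.startswith("ck_test_"):
        return "sandbox"
    if api_key.startswith("ck_live_"):
        return "production"
    raise ValueError("API key must start with ck_test_ or ck_live_")


def load_api_key(
    expected_environment: str,
    environ: Optional[Mapping[str, str]] = None,
    env_var: str = "CHECKFILE_API_KEY",
) -> str:
    """Read the key from the environment and check it matches the deployment."""
    environ = os.environ if environ is None else environ
    api_key = environ.get(env_var, "").strip()  # strip stray whitespace
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")
    actual = environment_for_key(api_key)
    if actual != expected_environment:
        raise RuntimeError(
            f"{env_var} holds a {actual} key, expected {expected_environment}"
        )
    return api_key
```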
OAuth 2.0 for User-Scoped Access
If your application acts on behalf of end users (e.g., a multi-tenant SaaS), use OAuth 2.0 with the authorisation code flow. This provides user-level audit trails and granular permission scoping.
POST /oauth/token
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code
&code=AUTH_CODE
&client_id=YOUR_CLIENT_ID
&client_secret=YOUR_CLIENT_SECRET
&redirect_uri=https://yourapp.com/callback
Access tokens expire after 1 hour. Use the refresh token to obtain new access tokens without re-authenticating.
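The refresh step could look roughly like the sketch below. It uses the standard OAuth 2.0 `refresh_token` grant against the same `/oauth/token` endpoint; whether CheckFile expects the client credentials in the body (as here) or in another form is an assumption, and `TokenState`, `needs_refresh`, and `refresh_request_body` are illustrative names.

```python
import time
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class TokenState:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp at which the access token expires


def needs_refresh(state: TokenState, now: Optional[float] = None,
                  margin: float = 60.0) -> bool:
    """True when the access token expires within `margin` seconds."""
    now = time.time() if now is None else now
    return now >= state.expires_at - margin


def refresh_request_body(state: TokenState, client_id: str,
                         client_secret: str) -> str:
    """Form-encoded body for POST /oauth/token using the refresh_token grant."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": state.refresh_token,
        "client_id": client_id,          # assumed to be required in the body
        "client_secret": client_secret,  # assumed to be required in the body
    })
```

Refreshing slightly before the one-hour expiry (the 60-second margin is an arbitrary choice) avoids a request failing because the token lapsed in flight.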
Rate Limits
Rate limits are enforced per API key, measured in requests per minute:
| Plan | Rate Limit | Burst Allowance | Concurrent Uploads |
|---|---|---|---|
| Starter | 100 req/min | 150 req/min (30s window) | 5 |
| Business | 500 req/min | 750 req/min (30s window) | 25 |
| Enterprise | Unlimited | Unlimited | Unlimited |
When you exceed the rate limit, the API returns 429 Too Many Requests with a Retry-After header indicating how many seconds to wait. See pricing for plan details.
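The limits are enforced server-side, but a client-side throttle can keep a busy integration from hitting 429 in the first place. The following is a sketch of a sliding-window limiter, not part of any CheckFile SDK; it ignores the burst allowance and simply caps calls per window (for example 100 per 60 seconds on the Starter plan).

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of requests inside the window

    def wait_time(self) -> float:
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a request slot is free, then record the request."""
        delay = self.wait_time()
        while delay > 0:
            sleep(delay)
            delay = self.wait_time()
        self._sent.append(self.clock())
```

Call `limiter.acquire()` immediately before each API request. The clock and sleep functions are injectable so the limiter can be tested without real waiting.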
Transport Security
All traffic is encrypted with TLS 1.3. The API rejects connections using TLS 1.2 or earlier. Certificate pinning is available for Enterprise customers. All uploaded documents are encrypted at rest using AES-256 and automatically purged after the retention period configured in your account settings.
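On the client side, you can make the TLS requirement explicit so that a misconfigured proxy or outdated runtime fails fast rather than at the API edge. A minimal sketch using Python's standard `ssl` module (the function name is illustrative):

```python
import ssl


def tls13_context() -> ssl.SSLContext:
    """Verifying client context that refuses anything below TLS 1.3."""
    context = ssl.create_default_context()  # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context
```

The context can be passed to standard-library clients, for example `urllib.request.urlopen(request, context=tls13_context())`.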
The Privacy Act 1988 (Cth) and Australian Privacy Principles require that document processing systems implement appropriate technical measures including encryption of personal information in transit and at rest (OAIC guidance). For APRA-regulated entities, CPS 234 imposes additional information security requirements on systems processing financial documents. The Australian Cyber Security Centre (ACSC) provides the Information Security Manual (ISM) as a complementary framework for transport and storage encryption. Read more about our security practices.
Core Endpoints
The API is organised around six primary endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/files | Upload a single document for validation |
| POST | /api/v1/files/batch | Upload multiple files as a dossier |
| GET | /api/v1/files/{id}/status | Check processing status of an upload |
| GET | /api/v1/files/{id}/results | Retrieve validation results |
| POST | /api/v1/rules | Configure custom business rules |
| GET | /api/v1/webhooks | List registered webhook endpoints |
All endpoints accept and return JSON (except file upload endpoints, which accept multipart/form-data). The base URL is https://api.checkfile.ai. API versioning is path-based; the current stable version is v1.
Upload and Validate -- Step by Step
The most common workflow is: upload documents, wait for processing, retrieve results. Here is the complete flow in both Python and Node.js.
Python (requests)
import requests
import time

API_KEY = "ck_live_abc123..."
BASE_URL = "https://api.checkfile.ai/v1"
HEADERS = {"X-API-Key": API_KEY}

# Step 1: Upload a batch of documents as a dossier
response = requests.post(
    f"{BASE_URL}/files/batch",
    headers=HEADERS,
    files=[
        ("files", ("contract.pdf", open("contract.pdf", "rb"), "application/pdf")),
        ("files", ("agreement.pdf", open("agreement.pdf", "rb"), "application/pdf")),
        ("files", ("asic_extract.pdf", open("asic_extract.pdf", "rb"), "application/pdf")),
    ],
    data={"rule_set": "equipment-leasing"}
)
response.raise_for_status()
file_id = response.json()["id"]
print(f"Dossier uploaded: {file_id}")

# Step 2: Poll for completion
while True:
    status_resp = requests.get(
        f"{BASE_URL}/files/{file_id}/status",
        headers=HEADERS
    )
    status_resp.raise_for_status()  # Stop on HTTP errors instead of parsing an error body
    status = status_resp.json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(2)  # Poll every 2 seconds

# Step 3: Retrieve results
results_resp = requests.get(
    f"{BASE_URL}/files/{file_id}/results",
    headers=HEADERS
)
results_resp.raise_for_status()
results = results_resp.json()
for doc in results["documents"]:
    print(f"{doc['filename']}: {doc['verdict']} (confidence: {doc['confidence']})")
    if doc["alerts"]:
        for alert in doc["alerts"]:
            print(f"  - {alert['severity']}: {alert['message']}")
Node.js (fetch + FormData)
import fs from 'node:fs';

const API_KEY = process.env.CHECKFILE_API_KEY;
const BASE_URL = 'https://api.checkfile.ai/v1';

// Step 1: Upload a batch of documents
const form = new FormData();
form.append('files', new Blob([fs.readFileSync('contract.pdf')]), 'contract.pdf');
form.append('files', new Blob([fs.readFileSync('agreement.pdf')]), 'agreement.pdf');
form.append('files', new Blob([fs.readFileSync('asic_extract.pdf')]), 'asic_extract.pdf');
form.append('rule_set', 'equipment-leasing');

const uploadRes = await fetch(`${BASE_URL}/files/batch`, {
  method: 'POST',
  headers: { 'X-API-Key': API_KEY },
  body: form,
});
// fetch does not reject on HTTP errors, so check the status explicitly
if (!uploadRes.ok) throw new Error(`Upload failed: ${uploadRes.status}`);
const { id: fileId } = await uploadRes.json();
console.log(`Dossier uploaded: ${fileId}`);

// Step 2: Poll for completion
let status = 'processing';
while (status !== 'completed' && status !== 'failed') {
  await new Promise((r) => setTimeout(r, 2000)); // Poll every 2 seconds
  const statusRes = await fetch(`${BASE_URL}/files/${fileId}/status`, {
    headers: { 'X-API-Key': API_KEY },
  });
  ({ status } = await statusRes.json());
}

// Step 3: Retrieve results
const results = await fetch(`${BASE_URL}/files/${fileId}/results`, {
  headers: { 'X-API-Key': API_KEY },
}).then((r) => r.json());

for (const doc of results.documents) {
  console.log(`${doc.filename}: ${doc.verdict} (confidence: ${doc.confidence})`);
  for (const alert of doc.alerts) {
    console.log(`  - ${alert.severity}: ${alert.message}`);
  }
}
Both examples follow the same three-step pattern: upload, poll, retrieve. For production systems, replace the polling loop with a webhook listener (covered in the next section).
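If you do keep polling, for example in a prototype, it is worth bounding the loop so a dossier that never reaches a terminal state cannot hang your process. The sketch below is one way to do that; `get_status` is a hypothetical callable that wraps the status request and returns the status string, and the 60-second default is an assumption chosen to sit comfortably above the 28-second P95 quoted earlier.

```python
import time
from typing import Callable


def wait_for_completion(
    get_status: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is terminal or the deadline would be exceeded."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        # Give up if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        sleep(interval)
```

With the Python example above, `get_status` could be a small lambda that performs the GET on the status endpoint and returns `response.json()["status"]`.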
Webhook Payloads
Webhooks eliminate the need for polling. Register a webhook URL in the dashboard or via the API, and CheckFile will POST a signed JSON payload to your endpoint when processing completes or an alert is detected.
Validation Complete Event
{
  "event": "validation.completed",
  "timestamp": "2026-02-09T14:32:08Z",
  "data": {
    "file_id": "dossier_8f3a2b1c",
    "rule_set": "equipment-leasing",
    "verdict": "approved",
    "processing_time_ms": 4280,
    "documents": [
      {
        "filename": "contract.pdf",
        "type": "contract",
        "verdict": "valid",
        "confidence": 0.97,
        "alerts": []
      },
      {
        "filename": "agreement.pdf",
        "type": "agreement",
        "verdict": "valid",
        "confidence": 0.95,
        "alerts": []
      },
      {
        "filename": "asic_extract.pdf",
        "type": "company_registration",
        "verdict": "valid",
        "confidence": 0.99,
        "alerts": []
      }
    ]
  }
}
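A handler will usually turn this payload into typed objects before acting on it. The sketch below parses only the fields shown in the example payload above; the class and function names are illustrative, and the 0.9 review threshold is an arbitrary assumption rather than a CheckFile recommendation.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DocumentResult:
    filename: str
    verdict: str
    confidence: float
    alerts: list = field(default_factory=list)


def parse_validation_completed(raw_body: bytes) -> tuple[str, str, list[DocumentResult]]:
    """Return (file_id, dossier verdict, per-document results)."""
    event = json.loads(raw_body)
    if event.get("event") != "validation.completed":
        raise ValueError(f"unexpected event type: {event.get('event')!r}")
    data = event["data"]
    documents = [
        DocumentResult(d["filename"], d["verdict"], d["confidence"], d.get("alerts", []))
        for d in data["documents"]
    ]
    return data["file_id"], data["verdict"], documents


def needs_review(documents: list[DocumentResult], min_confidence: float = 0.9) -> list[str]:
    """Filenames that raised alerts or fall below the confidence threshold."""
    return [d.filename for d in documents if d.alerts or d.confidence < min_confidence]
```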
Verifying Webhook Signatures
Every webhook request includes an X-Checkfile-Signature header containing an HMAC-SHA256 signature. Verify it against your webhook secret to ensure the payload was not tampered with:
import hmac
import hashlib

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    # Recompute the HMAC over the raw request body
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    # Constant-time comparison against the "sha256=<hex>" header value
    return hmac.compare_digest(f"sha256={expected}", signature)
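To show where the check sits in a request handler, here is a framework-neutral sketch. It assumes the `sha256=<hex>` header format used in the function above; `sign` and `handle_webhook` are illustrative names, and the 401/200 responses are a choice for your own endpoint, not something CheckFile mandates.

```python
import hashlib
import hmac
import json


def sign(payload: bytes, secret: str) -> str:
    """Build the header value in the sha256=<hex> form."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def handle_webhook(raw_body: bytes, headers: dict, secret: str):
    """Return (HTTP status, parsed event or None) for an incoming delivery."""
    signature = headers.get("X-Checkfile-Signature", "")
    expected = sign(raw_body, secret)
    # Compare as bytes so unexpected non-ASCII header values cannot raise.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return 401, None  # reject before parsing untrusted JSON
    return 200, json.loads(raw_body)
```

The signature must be computed over the raw bytes exactly as received. Re-serialising parsed JSON can change whitespace or key order and make a valid signature fail. Note that a plain `dict` lookup is case-sensitive; most web frameworks expose case-insensitive header access.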
Error Handling Best Practices
Standard HTTP status codes map to distinct error classes.
| Code | Meaning | Cause | Recommended Action |
|---|---|---|---|
| 400 | Bad Request | Malformed request body, unsupported file type, missing required field | Fix the request. Check the error.details field for specifics. |
| 401 | Unauthorised | Invalid or missing API key | Verify your API key. Check for whitespace or truncation. |
| 413 | Payload Too Large | File exceeds the 50 MB per-document limit | Compress the file or split multi-page documents. |
| 429 | Too Many Requests | Rate limit exceeded | Back off using the Retry-After header value. |
| 500 | Internal Server Error | Unexpected server-side failure | Retry with exponential backoff. If persistent, contact support. |
Retry Strategy with Exponential Backoff
For transient errors (429, 500, 502, 503), implement exponential backoff with jitter:
import time
import random
import requests

def api_request_with_retry(method, url, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        response = requests.request(method, url, **kwargs)
        if response.status_code < 400:
            return response
        if response.status_code in (429, 500, 502, 503):
            if attempt == max_retries - 1:
                break  # Out of attempts: skip the final, pointless sleep
            base_delay = min(2 ** attempt, 60)  # Cap at 60 seconds
            jitter = random.uniform(0, base_delay * 0.5)
            delay = base_delay + jitter
            if response.status_code == 429:
                # Prefer the server's hint when it is a plain number of seconds
                retry_after = response.headers.get("Retry-After", "")
                if retry_after.isdigit():
                    delay = float(retry_after)
            time.sleep(delay)
            continue
        # Non-retryable error
        response.raise_for_status()
    raise Exception(f"Max retries exceeded for {url}")
Key principles: never retry 400-level client errors (except 429), always respect the Retry-After header, and add jitter to avoid thundering herd problems when multiple clients retry simultaneously.
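HTTP permits Retry-After to carry either a number of seconds or an HTTP-date. This guide describes the header as a number of seconds, so the date branch below is defensive and may never trigger against this API. A sketch of a tolerant parser (the function name is illustrative):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header to seconds; None if absent or unparseable."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative delay.
    return max(0.0, (when - now).total_seconds())
```

When the function returns `None`, fall back to the computed exponential backoff delay.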
Performance Benchmarks
Processing times depend on document count, complexity, and the rule set applied:
| Scenario | Document Count | Rule Set | Median Processing Time | P95 Processing Time |
|---|---|---|---|---|
| Single identity document | 1 | Default | 2.1s | 4.8s |
| Single contract (multi-page) | 1 | Default | 3.4s | 6.2s |
| Standard dossier | 8-12 | Equipment leasing | 12s | 28s |
| Complex dossier | 15-20 | Full compliance | 22s | 45s |
| Batch (100 dossiers) | 800-1,200 | Equipment leasing | 8 min | 14 min |
Get Started
The fastest path from zero to a working integration:
- Create an account and generate a test API key (ck_test_ prefix) from the dashboard.
- Upload a test document using the curl example above or the code samples in this guide.
- Register a webhook to receive results asynchronously.
- Configure a rule set that matches your business requirements.
- Switch to production by replacing your test key with a live key (ck_live_ prefix).
For Australian organisations, the API supports verification of documents against the Australian Business Register (ABR) and ASIC company records, enabling automated ABN/ACN cross-referencing as part of the validation workflow.
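Before sending a document for ABR cross-referencing, a client can cheaply reject mistyped ABNs using the public modulus-89 checksum. This is a local pre-check written for illustration, not part of the CheckFile API, and passing it does not prove that the ABN is actually registered.

```python
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(abn: str) -> bool:
    """Check the ABN checksum; spaces between digit groups are allowed."""
    compact = abn.replace(" ", "")
    if len(compact) != 11 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(ch) for ch in compact]
    digits[0] -= 1  # the algorithm subtracts 1 from the leading digit
    total = sum(d * w for d, w in zip(digits, ABN_WEIGHTS))
    return total % 89 == 0
```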
Full endpoint documentation, SDKs (Python, Node.js, Go), and an interactive API explorer are available at docs.checkfile.ai. If you have questions about which plan fits your volume, see pricing or contact the engineering team directly.
For a comprehensive overview, see our document verification complete guide.
Frequently Asked Questions
How does the CheckFile API handle authentication for server-to-server integrations?
For server-to-server integrations, pass your API key in the X-API-Key header with every request. API keys are scoped to your organisation and should be generated separately for each environment: development keys carry a ck_test_ prefix and hit the sandbox, while production keys carry a ck_live_ prefix. Rotate keys every 90 days using the dual-key support to avoid downtime during rotation. Never store API keys in source code or in a committed .env file -- use a secrets manager such as HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault.
What is the difference between polling and webhooks for retrieving validation results?
Polling means your application repeatedly queries the status endpoint until processing completes, which introduces unnecessary latency and API request overhead. Webhooks invert the flow: CheckFile posts a signed JSON payload to your registered endpoint as soon as processing finishes, eliminating polling entirely. For production systems processing more than a few documents per minute, webhooks are the recommended approach.
How do I verify that a webhook payload has not been tampered with?
Every webhook request includes an X-Checkfile-Signature header containing an HMAC-SHA256 signature computed using your webhook secret. To verify the payload, compute the expected signature by running HMAC-SHA256 on the raw request body with your secret, then use a constant-time comparison function to check it against the header value. Never compare signatures with a standard equality operator, as that is vulnerable to timing attacks.
What file size limits apply and how can I optimise upload performance?
Individual documents are accepted up to 50 MB per file. For latency-sensitive applications, reducing PDF file sizes through compression before upload significantly improves throughput without affecting validation accuracy. For large batches, the batch endpoint accepts up to 20 files per request and delivers 3 to 4 times better performance than equivalent individual uploads.
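A client can enforce both limits before uploading. The sketch below splits a file list into groups of at most 20 and flags files over the size limit; it treats 50 MB as 50 x 1024 x 1024 bytes, which is an assumption, and the helper names are illustrative. Whether a single dossier may span several batch requests is not stated here, so check that before splitting related documents.

```python
from typing import Iterator, Sequence

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumes binary megabytes
MAX_FILES_PER_BATCH = 20


def oversized(sizes: dict[str, int]) -> list[str]:
    """Names of files whose size in bytes exceeds the per-document limit."""
    return [name for name, size in sizes.items() if size > MAX_FILE_BYTES]


def batches(filenames: Sequence[str],
            size: int = MAX_FILES_PER_BATCH) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` filenames."""
    for start in range(0, len(filenames), size):
        yield list(filenames[start:start + size])
```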
What retry strategy should I implement for transient API errors?
Implement exponential backoff with jitter for transient errors including 429 rate limit responses, 500 internal server errors, 502 bad gateway, and 503 service unavailable. Cap the base delay at 60 seconds, add a random jitter of up to 50 percent of the base delay to prevent thundering herd problems, and respect the Retry-After header value when it is present in a 429 response.
The information presented in this article is provided for informational purposes only and does not constitute legal advice. Regulatory obligations vary by state and territory and by organisation size. Consult a legal professional for analysis specific to your situation.