Tutorial: Custom Document Types & AI Extraction

This tutorial walks you through creating a custom document type, uploading a document, running automated classification and extraction, and viewing the results. By the end you’ll have a working pipeline that extracts structured fields from any document your business needs to verify.

What you’ll learn

Creating a custom document type with extraction hints
Uploading a standalone document via the API
How classification matches documents to types
Running verification and field extraction
Reading extraction results and confidence scores
Using document groups for recency checks

Prerequisites

A Verifa account (sign up)
A sandbox API key (starts with vk_sandbox_). Find it under Developers > API > API Keys in the dashboard.

Introduction

Verifa ships with built-in document types for common identity documents (passports, driver’s licenses, national IDs). But many businesses need to verify documents that fall outside the standard set: health insurance cards, business registrations, utility bills with specific fields, professional licenses, and more.

Custom document types let you define exactly which fields to extract from a document. When a document is classified as your custom type, Verifa reads it and extracts the fields you specified. All processing runs on private infrastructure, with no data sent to third-party services.

Step 1: Create a custom document type

Create a document type for a health insurance card. The extraction_hints array tells Verifa which fields to look for.

$ curl -X POST https://api.withverifa.com/api/v1/settings/document-types \
>   -H "X-API-Key: vk_sandbox_your_key_here" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "name": "Health Insurance Card",
>     "slug": "health_insurance_card",
>     "extraction_hints": [
>       "insurer_name",
>       "member_id",
>       "group_number",
>       "plan_type",
>       "effective_date",
>       "member_name"
>     ]
>   }'

{
  "id": "dtype_abc123",
  "name": "Health Insurance Card",
  "slug": "health_insurance_card",
  "extraction_hints": [
    "insurer_name",
    "member_id",
    "group_number",
    "plan_type",
    "effective_date",
    "member_name"
  ],
  "is_system": false,
  "created_at": "2026-04-10T14:00:00Z"
}

You can also create custom document types in the dashboard under Settings > Document Types. The form provides the same options as the API.
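
For scripted setup, the same create request can be assembled in Python. This is a minimal sketch using only the standard library; the helper name build_create_type_request is hypothetical, and the endpoint, header, and body fields are copied from the curl call above. It builds the request without sending it.

```python
import json
import urllib.request

API_BASE = "https://api.withverifa.com/api/v1"


def build_create_type_request(api_key, name, slug, extraction_hints):
    """Build (but do not send) the POST request that creates a document type.

    Hypothetical helper: it mirrors the curl call in this step.
    """
    body = json.dumps({
        "name": name,
        "slug": slug,
        "extraction_hints": list(extraction_hints),
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/settings/document-types",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_type_request(
    "vk_sandbox_your_key_here",
    "Health Insurance Card",
    "health_insurance_card",
    ["insurer_name", "member_id", "group_number",
     "plan_type", "effective_date", "member_name"],
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Keeping construction separate from sending makes the payload easy to inspect or log before it reaches the API.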

Step 2: Upload a document

Upload a health insurance card image as a standalone document (not attached to a verification session).

$ curl -X POST https://api.withverifa.com/api/v1/documents \
>   -H "X-API-Key: vk_sandbox_your_key_here" \
>   -F "file=@insurance-card-front.jpg" \
>   -F "document_type=other" \
>   -F "original_filename=insurance-card-front.jpg"

{
  "id": "doc_def456",
  "document_type": "other",
  "classification": null,
  "status": "pending",
  "mime_type": "image/jpeg",
  "file_size_bytes": 245120,
  "original_filename": "insurance-card-front.jpg",
  "download_url": "https://api.withverifa.com/api/v1/documents/doc_def456/download?expires=1744300800&sig=a1b2c3...",
  "preview_url": "https://api.withverifa.com/api/v1/documents/doc_def456/preview?expires=1744300800&sig=a1b2c3...",
  "extracted_fields": null,
  "extracted_fields_confidence": null,
  "created_at": "2026-04-10T14:05:00Z"
}

The document is stored and ready for classification and extraction. At this point classification, extracted_fields, and extracted_fields_confidence are all null. The preview_url provides a server-side JPEG rendering (useful for PDFs with page navigation).

Step 3: Classification

When you trigger verification (Step 4), classification runs automatically. The system compares the document against all system types and your custom types, then selects the best match.

You can also trigger classification manually:

$ curl -X POST https://api.withverifa.com/api/v1/documents/doc_def456/classify \
>   -H "X-API-Key: vk_sandbox_your_key_here"

{
  "id": "doc_def456",
  "classification": "health_insurance_card",
  "status": "pending",
  "updated_at": "2026-04-10T14:06:00Z"
}

Verifa analyzed the document and matched it to your custom “Health Insurance Card” type. Classification runs entirely on private infrastructure — document data is never sent to third-party services.

Step 4: Run verification

Trigger the verification pipeline, which includes field extraction when the matched type has extraction_hints.

$ curl -X POST https://api.withverifa.com/api/v1/documents/doc_def456/verify \
>   -H "X-API-Key: vk_sandbox_your_key_here"

{
  "id": "doc_def456",
  "status": "completed",
  "classification": "health_insurance_card",
  "updated_at": "2026-04-10T14:07:00Z"
}

The pipeline runs OCR, classification (if not already done), and structured field extraction in sequence.
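
The example response above already reports "completed". If a verify call ever returns while the document is still "pending", a client could poll the document until it finishes. The sketch below assumes that situation can occur, which this tutorial does not confirm. The helper wait_for_completion and its parameters are hypothetical, and fetch_document stands in for whatever function you use to GET the document and parse its JSON. Only the "pending" and "completed" statuses shown in this tutorial are handled; any failure statuses the API may define are not.

```python
import time


def wait_for_completion(fetch_document, doc_id, interval_s=2.0,
                        max_attempts=30, sleep=time.sleep):
    """Poll until the document reports status "completed".

    fetch_document: callable taking a document ID and returning the parsed
    JSON of GET /documents/{id} as a dict (supplied by the caller).
    Raises TimeoutError if the status never becomes "completed".
    """
    for attempt in range(max_attempts):
        document = fetch_document(doc_id)
        if document.get("status") == "completed":
            return document
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"{doc_id} not completed after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the helper testable without real delays. For production use, the webhooks mentioned under Next steps are likely a better fit than polling.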

Step 5: View results

Retrieve the document to see extracted fields and confidence scores.

$ curl https://api.withverifa.com/api/v1/documents/doc_def456 \
>   -H "X-API-Key: vk_sandbox_your_key_here"

{
  "id": "doc_def456",
  "document_type": "other",
  "classification": "health_insurance_card",
  "status": "completed",
  "mime_type": "image/jpeg",
  "file_size_bytes": 245120,
  "original_filename": "insurance-card-front.jpg",
  "download_url": "https://api.withverifa.com/api/v1/documents/doc_def456/download?expires=1744300800&sig=a1b2c3...",
  "preview_url": "https://api.withverifa.com/api/v1/documents/doc_def456/preview?expires=1744300800&sig=a1b2c3...",
  "extracted_fields": {
    "insurer_name": "Nationwide Health Insurance",
    "member_id": "NHI-882-456-7721",
    "group_number": "GRP-40291",
    "plan_type": "PPO",
    "effective_date": "2025-01-01",
    "member_name": "Jane A. Doe"
  },
  "extracted_fields_confidence": {
    "insurer_name": 0.997,
    "member_id": 0.981,
    "group_number": 0.964,
    "plan_type": 0.991,
    "effective_date": 0.973,
    "member_name": 0.998
  },
  "created_at": "2026-04-10T14:05:00Z",
  "updated_at": "2026-04-10T14:07:00Z"
}
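
Before relying on the results, it can help to confirm that every field named in extraction_hints actually came back. The sketch below is one way to do that. The helper missing_fields is hypothetical, and it assumes a field that could not be extracted is either absent, null, or an empty string, which this tutorial does not specify.

```python
def missing_fields(extraction_hints, document):
    """Return the hinted field names that are absent or empty in the results."""
    # extracted_fields is null until verification has run, so fall back to {}.
    extracted = document.get("extracted_fields") or {}
    return [name for name in extraction_hints
            if extracted.get(name) in (None, "")]


hints = ["insurer_name", "member_id", "group_number",
         "plan_type", "effective_date", "member_name"]

# Trimmed copy of the response shown above.
document = {
    "id": "doc_def456",
    "status": "completed",
    "extracted_fields": {
        "insurer_name": "Nationwide Health Insurance",
        "member_id": "NHI-882-456-7721",
        "group_number": "GRP-40291",
        "plan_type": "PPO",
        "effective_date": "2025-01-01",
        "member_name": "Jane A. Doe",
    },
}

print(missing_fields(hints, document))  # all six fields are present
```

For a document that has not been verified yet, the same call returns every hinted field, because extracted_fields is still null.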

Understanding confidence scores

Each field in extracted_fields_confidence has a score between 0.0 and 1.0. The score is computed by checking whether the extracted value’s tokens actually appear in the underlying OCR text.

Range	Interpretation
0.9 to 1.0	High confidence. The extracted value closely matches the OCR text.
0.5 to 0.9	Moderate confidence. Only some tokens matched; review recommended.
Below 0.5	Likely hallucination. Few or none of the value’s tokens appear in the OCR text, so the model may have produced a value that is not in the document. Flag for manual review.
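
To make the idea concrete, here is a toy illustration of a token-overlap score and of triaging fields by the ranges above. It is not Verifa's scoring implementation, which this tutorial does not describe in detail. The function names, the tokenization rule, and the choice to treat a score of exactly 0.9 as high confidence are all assumptions.

```python
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap_score(value, ocr_text):
    """Fraction of the value's tokens that also appear in the OCR text."""
    value_tokens = tokens(value)
    if not value_tokens:
        return 0.0
    ocr_tokens = set(tokens(ocr_text))
    matched = sum(1 for token in value_tokens if token in ocr_tokens)
    return matched / len(value_tokens)


def triage(score):
    """Map a confidence score to the ranges in the table above."""
    if score >= 0.9:   # assumption: 0.9 itself counts as high
        return "high"
    if score >= 0.5:
        return "moderate"
    return "likely_hallucination"


# Hypothetical OCR text for the sample card.
ocr_text = "MEMBER: JANE A. DOE  PLAN: PPO"
for value in ("Jane A. Doe", "Jane B. Doe", "John Smith"):
    score = token_overlap_score(value, ocr_text)
    print(value, round(score, 3), triage(score))
```

In practice you would apply triage to each entry of extracted_fields_confidence and send anything outside the high range to a reviewer.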

Step 6: Document groups and recency checks

Document groups let you enforce recency requirements. For example, you might require that a proof of address document was issued within the last 90 days.

$ curl -X POST https://api.withverifa.com/api/v1/settings/document-groups \
>   -H "X-API-Key: vk_sandbox_your_key_here" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "name": "Recent Insurance",
>     "document_type_slugs": ["health_insurance_card"],
>     "recency_config": {
>       "max_age_days": 365,
>       "date_field": "effective_date"
>     }
>   }'

When a document in this group is verified, the system checks whether the date in the effective_date field falls within the allowed window. If that date is more than max_age_days old (365 days in this example), the recency check fails.
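
The recency rule can be sketched locally as a date comparison. This is an illustration only: the helper names are hypothetical, and it assumes that the date field is an ISO 8601 date string, that age is measured in whole days up to the day of the check, and that a missing date fails the check. Verifa's actual evaluation may differ, for example in how it treats future dates.

```python
from datetime import date


def age_in_days(document, date_field, today):
    """Days between the extracted date and today, or None if it is missing."""
    raw = (document.get("extracted_fields") or {}).get(date_field)
    if raw is None:
        return None
    return (today - date.fromisoformat(raw)).days


def passes_recency(document, recency_config, today):
    """True if the configured date field is at most max_age_days old."""
    age = age_in_days(document, recency_config["date_field"], today)
    return age is not None and age <= recency_config["max_age_days"]


recency_config = {"max_age_days": 365, "date_field": "effective_date"}
sample = {"extracted_fields": {"effective_date": "2025-01-01"}}

print(age_in_days(sample, "effective_date", date(2026, 4, 10)))  # 464
print(passes_recency(sample, recency_config, date(2026, 4, 10)))  # False
```

Under these assumptions, the sample card from Step 5 (effective_date 2025-01-01) checked on 2026-04-10 is 464 days old, so it would fall outside the 365-day window.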

Dashboard

Everything in this tutorial is also available in the Verifa dashboard. Navigate to Documents in the left sidebar to see:

Document list — all standalone documents with status, classification, and upload date. Use the filters to narrow by type, status, or date range.
Document detail — full metadata, extraction results displayed as a structured card, classification details, check results, and a PDF/image preview with page navigation.
Upload — drag-and-drop document upload directly from the dashboard.

The extraction fields card shows each extracted value alongside its confidence score, color-coded by the ranges described above.

Next steps

Browse the API Reference for the full Documents API
Learn about Verifications & Checks to understand how document checks fit into the broader pipeline
Set up Webhooks to get notified when document verification completes
