Tutorial: Custom Document Types & AI Extraction

This tutorial walks you through creating a custom document type, uploading a document, running automated classification and extraction, and viewing the results. By the end you’ll have a working pipeline that extracts structured fields from any document your business needs to verify.

What you’ll learn

  • Creating a custom document type with extraction hints
  • Uploading a standalone document via the API
  • How classification matches documents to types
  • Running verification and field extraction
  • Reading extraction results and confidence scores
  • Using document groups for recency checks

Prerequisites

  • A Verifa account (sign up)
  • A sandbox API key (starts with vk_sandbox_). Find it under Developers > API > API Keys in the dashboard.

Introduction

Verifa ships with built-in document types for common identity documents (passports, driver’s licenses, national IDs). But many businesses need to verify documents that fall outside the standard set: health insurance cards, business registrations, utility bills with specific fields, professional licenses, and more.

Custom document types let you define exactly which fields to extract from a document. When a document is classified as your custom type, Verifa reads it and extracts every field you specified. All processing runs on private infrastructure, with no data sent to third-party services.

Step 1: Create a custom document type

Create a document type for a health insurance card. The extraction_hints array tells Verifa which fields to look for.

curl -X POST https://api.withverifa.com/api/v1/settings/document-types \
  -H "X-API-Key: vk_sandbox_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Health Insurance Card",
    "slug": "health_insurance_card",
    "extraction_hints": [
      "insurer_name",
      "member_id",
      "group_number",
      "plan_type",
      "effective_date",
      "member_name"
    ]
  }'

Response:

{
  "id": "dtype_abc123",
  "name": "Health Insurance Card",
  "slug": "health_insurance_card",
  "extraction_hints": [
    "insurer_name",
    "member_id",
    "group_number",
    "plan_type",
    "effective_date",
    "member_name"
  ],
  "is_system": false,
  "created_at": "2026-04-10T14:00:00Z"
}

You can also create custom document types in the dashboard under Settings > Document Types. The form provides the same options as the API.
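
The same request can be assembled from Python. The following is a minimal sketch using only the standard library; the endpoint, header, and payload fields come from the curl example above, while the helper name build_document_type_request and the duplicate-hint guard are illustrative assumptions rather than part of the Verifa API.

```python
import json
import urllib.request

API_BASE = "https://api.withverifa.com/api/v1"  # base URL from the curl example


def build_document_type_request(api_key, name, slug, extraction_hints):
    """Build (but do not send) the POST request that creates a custom document type."""
    # Local sanity check only (an assumption, not a documented API rule):
    # listing the same hint twice is almost certainly a typo.
    if len(set(extraction_hints)) != len(extraction_hints):
        raise ValueError("extraction_hints contains duplicate field names")

    body = json.dumps(
        {"name": name, "slug": slug, "extraction_hints": list(extraction_hints)}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/settings/document-types",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


request = build_document_type_request(
    "vk_sandbox_your_key_here",
    "Health Insurance Card",
    "health_insurance_card",
    ["insurer_name", "member_id", "group_number",
     "plan_type", "effective_date", "member_name"],
)
```

To send it, pass the request to urllib.request.urlopen and decode the JSON response. Building the request separately from sending it keeps the payload easy to inspect and test.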

Step 2: Upload a document

Upload a health insurance card image as a standalone document (not attached to a verification session).

curl -X POST https://api.withverifa.com/api/v1/documents \
  -H "X-API-Key: vk_sandbox_your_key_here" \
  -F "file=@insurance-card-front.jpg" \
  -F "document_type=other" \
  -F "original_filename=insurance-card-front.jpg"

Response:

{
  "id": "doc_def456",
  "document_type": "other",
  "classification": null,
  "status": "pending",
  "mime_type": "image/jpeg",
  "file_size_bytes": 245120,
  "original_filename": "insurance-card-front.jpg",
  "download_url": "https://api.withverifa.com/api/v1/documents/doc_def456/download?expires=1744300800&sig=a1b2c3...",
  "preview_url": "https://api.withverifa.com/api/v1/documents/doc_def456/preview?expires=1744300800&sig=a1b2c3...",
  "extracted_fields": null,
  "extracted_fields_confidence": null,
  "created_at": "2026-04-10T14:05:00Z"
}

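The document is stored with status pending and is ready for classification and extraction. The classification, extracted_fields, and extracted_fields_confidence values stay null until those steps run. The preview_url provides a server-side JPEG rendering (useful for PDFs with page navigation).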
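
For reference, the curl -F flags in the upload example produce a multipart/form-data body. The sketch below builds an equivalent request by hand with the Python standard library. The form field names mirror the curl example; the helper name build_upload_request and the default MIME type are assumptions made for illustration.

```python
import uuid
import urllib.request

API_BASE = "https://api.withverifa.com/api/v1"  # base URL from the curl example


def build_upload_request(api_key, filename, file_bytes,
                         mime_type="image/jpeg", document_type="other"):
    """Build (but do not send) a multipart upload mirroring the curl -F flags."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, using the same names as the curl example.
    for field, value in (("document_type", document_type),
                         ("original_filename", filename)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{field}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The file part carries the raw bytes and its own Content-Type.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: {mime_type}\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{API_BASE}/documents",
        data=b"".join(parts),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder bytes stand in for the real image file contents.
upload = build_upload_request(
    "vk_sandbox_your_key_here", "insurance-card-front.jpg", b"\xff\xd8 placeholder"
)
```

In practice an HTTP client library would handle the multipart encoding; the manual version is shown only to make the wire format visible.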

Step 3: Classification

When you trigger verification (Step 4), classification runs automatically. The system compares the document against all system types and your custom types, then selects the best match.

You can also trigger classification manually:

curl -X POST https://api.withverifa.com/api/v1/documents/doc_def456/classify \
  -H "X-API-Key: vk_sandbox_your_key_here"

Response:

{
  "id": "doc_def456",
  "classification": "health_insurance_card",
  "status": "pending",
  "updated_at": "2026-04-10T14:06:00Z"
}

Verifa analyzed the document and matched it to your custom “Health Insurance Card” type. Note that status is still pending, because classification alone does not extract fields. Classification runs entirely on private infrastructure, and document data is never sent to third-party services.

Step 4: Run verification

Trigger the verification pipeline, which includes field extraction when the matched type has extraction_hints.

curl -X POST https://api.withverifa.com/api/v1/documents/doc_def456/verify \
  -H "X-API-Key: vk_sandbox_your_key_here"

Response:

{
  "id": "doc_def456",
  "status": "completed",
  "classification": "health_insurance_card",
  "updated_at": "2026-04-10T14:07:00Z"
}

The pipeline runs OCR, classification (if not already done), and structured field extraction in sequence.
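
The verify call and the follow-up retrieval can be chained in a small helper. This is a sketch under assumptions: it treats the verify response as synchronous (status "completed", as in the example above) and raises otherwise. The names http_json and verify_and_fetch are hypothetical, and the send parameter exists only so the flow can be exercised without network access.

```python
import json
import urllib.request

API_BASE = "https://api.withverifa.com/api/v1"  # base URL from the curl examples


def http_json(method, url, api_key):
    """Send a request with the X-API-Key header and decode the JSON response."""
    req = urllib.request.Request(url, method=method, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def verify_and_fetch(document_id, api_key, send=http_json):
    """Trigger verification, then retrieve the full document record."""
    verified = send("POST", f"{API_BASE}/documents/{document_id}/verify", api_key)
    if verified.get("status") != "completed":
        # Assumption: any other status means the results are not ready yet.
        raise RuntimeError(f"verification not completed: {verified.get('status')!r}")
    return send("GET", f"{API_BASE}/documents/{document_id}", api_key)
```

Calling verify_and_fetch("doc_def456", "vk_sandbox_your_key_here") would issue the POST and then the GET in order. If verification turns out to be asynchronous in your account, polling or webhooks would replace the status check.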

Step 5: View results

Retrieve the document to see extracted fields and confidence scores.

curl https://api.withverifa.com/api/v1/documents/doc_def456 \
  -H "X-API-Key: vk_sandbox_your_key_here"

Response:

{
  "id": "doc_def456",
  "document_type": "other",
  "classification": "health_insurance_card",
  "status": "completed",
  "mime_type": "image/jpeg",
  "file_size_bytes": 245120,
  "original_filename": "insurance-card-front.jpg",
  "download_url": "https://api.withverifa.com/api/v1/documents/doc_def456/download?expires=1744300800&sig=a1b2c3...",
  "preview_url": "https://api.withverifa.com/api/v1/documents/doc_def456/preview?expires=1744300800&sig=a1b2c3...",
  "extracted_fields": {
    "insurer_name": "Nationwide Health Insurance",
    "member_id": "NHI-882-456-7721",
    "group_number": "GRP-40291",
    "plan_type": "PPO",
    "effective_date": "2025-01-01",
    "member_name": "Jane A. Doe"
  },
  "extracted_fields_confidence": {
    "insurer_name": 0.997,
    "member_id": 0.981,
    "group_number": 0.964,
    "plan_type": 0.991,
    "effective_date": 0.973,
    "member_name": 0.998
  },
  "created_at": "2026-04-10T14:05:00Z",
  "updated_at": "2026-04-10T14:07:00Z"
}

Understanding confidence scores

Each field in extracted_fields_confidence has a score between 0.0 and 1.0. The score is computed by checking whether the extracted value’s tokens actually appear in the underlying OCR text.
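
To make the idea of token matching concrete, here is a simplified illustration. It is not Verifa's actual scoring algorithm, which this tutorial does not specify; it only shows one plausible way to measure how many tokens of an extracted value appear in OCR text. The function name, the tokenization rule, and the sample OCR string are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap_score(extracted_value, ocr_text):
    """Fraction of the value's tokens that also occur in the OCR text (0.0 to 1.0)."""
    value_tokens = tokenize(extracted_value)
    if not value_tokens:
        return 0.0
    ocr_tokens = set(tokenize(ocr_text))
    matched = sum(1 for token in value_tokens if token in ocr_tokens)
    return matched / len(value_tokens)


# Hypothetical OCR output for the sample card.
ocr_text = "NATIONWIDE HEALTH INSURANCE  Member: Jane A. Doe  ID NHI-882-456-7721  PPO"

grounded = token_overlap_score("Jane A. Doe", ocr_text)      # every token is present
ungrounded = token_overlap_score("Jane B. Smith", ocr_text)  # only "jane" is present
```

Here grounded is 1.0 and ungrounded is one third, which is the kind of gap that separates a value read from the page from one the model invented.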

| Range | Interpretation |
| --- | --- |
| 0.9 to 1.0 | High confidence. The extracted value closely matches OCR text. |
| 0.5 to 0.9 | Moderate confidence. Some tokens matched; review recommended. |
| Below 0.5 | Likely hallucination. The LLM produced a value not found in the document. Flag for manual review. |
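
The ranges above translate directly into a triage step on your side. The sketch below sorts fields into three buckets; the function name and bucket labels are illustrative, and treating a score of exactly 0.9 as high confidence is an assumption, since the documented ranges meet at that value.

```python
def triage_fields(confidence, high=0.9, low=0.5):
    """Group field names by confidence range: accept, review, or manual_review."""
    buckets = {"accept": [], "review": [], "manual_review": []}
    for field, score in confidence.items():
        if score >= high:      # assumption: exactly 0.9 counts as high confidence
            buckets["accept"].append(field)
        elif score >= low:
            buckets["review"].append(field)
        else:
            buckets["manual_review"].append(field)
    return buckets


# Scores from the Step 5 response: every field lands in "accept".
sample = {
    "insurer_name": 0.997, "member_id": 0.981, "group_number": 0.964,
    "plan_type": 0.991, "effective_date": 0.973, "member_name": 0.998,
}
result = triage_fields(sample)
```

Pass the extracted_fields_confidence object from the document response as the argument. The thresholds are parameters so they can be tightened for high-risk fields.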

Step 6: Document groups and recency checks

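Document groups let you enforce recency requirements. For example, you might require that a proof of address document was issued within the last 90 days. The request below applies the same idea to the health insurance card type, using a 365-day window measured from the effective_date field.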

curl -X POST https://api.withverifa.com/api/v1/settings/document-groups \
  -H "X-API-Key: vk_sandbox_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Recent Insurance",
    "document_type_slugs": ["health_insurance_card"],
    "recency_config": {
      "max_age_days": 365,
      "date_field": "effective_date"
    }
  }'

When a document in this group is verified, the system checks whether the date in the configured date_field (here, effective_date) falls within the allowed window. If that date is more than max_age_days in the past, the recency check fails.
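
The recency rule can be sketched locally to check expectations before relying on it. This is an approximation under assumptions: it measures age from the date the check runs, expects ISO 8601 dates as in the Step 5 response, and treats a missing date field as a failure. The function name passes_recency is hypothetical, and the server-side logic may differ.

```python
from datetime import date


def passes_recency(extracted_fields, recency_config, today):
    """Return True if the configured date field is within max_age_days of today."""
    raw = extracted_fields.get(recency_config["date_field"])
    if raw is None:
        # Assumption: a document without the date field cannot pass.
        return False
    age_days = (today - date.fromisoformat(raw)).days
    return age_days <= recency_config["max_age_days"]


config = {"max_age_days": 365, "date_field": "effective_date"}
fields = {"effective_date": "2025-01-01"}

mid_2025 = passes_recency(fields, config, today=date(2025, 6, 1))    # 151 days old
spring_2026 = passes_recency(fields, config, today=date(2026, 4, 10))  # 464 days old
```

Under these assumptions mid_2025 is True and spring_2026 is False, so the sample card from Step 5 would fall outside a 365-day window if checked on its 2026-04-10 upload date. Choose max_age_days and date_field with that kind of case in mind.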

Dashboard

Everything in this tutorial is also available in the Verifa dashboard. Navigate to Documents in the left sidebar to see:

  • Document list — all standalone documents with status, classification, and upload date. Use the filters to narrow by type, status, or date range.
  • Document detail — full metadata, extraction results displayed as a structured card, classification details, check results, and a PDF/image preview with page navigation.
  • Upload — drag-and-drop document upload directly from the dashboard.

The extraction fields card shows each extracted value alongside its confidence score, color-coded by the ranges described above.

Next steps

  • Browse the API Reference for the full Documents API
  • Learn about Verifications & Checks to understand how document checks fit into the broader pipeline
  • Set up Webhooks to get notified when document verification completes