Tutorial: Custom Document Types & AI Extraction
This tutorial walks you through creating a custom document type, uploading a document, running automated classification and extraction, and viewing the results. By the end you’ll have a working pipeline that extracts structured fields from any document your business needs to verify.
You need a Verifa API key (sandbox keys start with vk_sandbox_). Find it under Developers > API > API Keys in the dashboard.
Verifa ships with built-in document types for common identity documents (passports, driver’s licenses, national IDs). But many businesses need to verify documents that fall outside the standard set: health insurance cards, business registrations, utility bills with specific fields, professional licenses, and more.
Custom document types let you define exactly which fields to extract from a document. When a document is classified against your custom type, Verifa reads the document and extracts every field you specified — all processing runs on private infrastructure, with no data sent to third-party services.
Create a document type for a health insurance card. The extraction_hints
array tells Verifa which fields to look for.
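A request body for this step might look like the sketch below. Only extraction_hints comes from the tutorial; the name and description keys, the specific field names, and the assumption that each hint is a plain string are illustrative guesses, so check the API reference for the exact schema. The helper only validates and serializes the payload locally; it does not call Verifa.

```python
import json

# Hypothetical payload: only "extraction_hints" is named in the tutorial.
# "name", "description", and plain-string hints are assumptions.
health_insurance_card_type = {
    "name": "Health Insurance Card",
    "description": "Front side of a member health insurance card",
    "extraction_hints": [
        "insurer_name",
        "member_name",
        "member_id",
        "group_number",
        "plan_name",
        "effective_date",
    ],
}


def validate_extraction_hints(hints):
    """Check that hints are non-empty, unique, non-blank strings."""
    if not hints:
        raise ValueError("extraction_hints must list at least one field")
    seen = set()
    for hint in hints:
        if not isinstance(hint, str) or not hint.strip():
            raise ValueError(f"invalid hint: {hint!r}")
        if hint in seen:
            raise ValueError(f"duplicate hint: {hint}")
        seen.add(hint)
    return True


# Validate locally, then serialize the request body (no network call here).
validate_extraction_hints(health_insurance_card_type["extraction_hints"])
body = json.dumps(health_insurance_card_type, indent=2)
print(body)
```

Catching duplicate or blank hints before sending keeps a typo from silently producing an empty extracted field later.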
You can also create custom document types in the dashboard under Settings > Document Types. The form provides the same options as the API.
Upload a health insurance card image as a standalone document (not attached to a verification session).
The document is stored and ready for classification and extraction. The
preview_url provides a server-side JPEG rendering (useful for PDFs with
page navigation).
When you trigger verification (Step 4), classification runs automatically. The system compares the document against all system types and your custom types, then selects the best match.
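The tutorial doesn't say how Verifa scores candidate types, so the following is only a toy illustration of "compare against every type, keep the best match." Each type carries a hypothetical list of indicative keywords, the score is the fraction of those keywords found in the OCR text, and a best score below a minimum (0.5 here, an arbitrary choice) is treated as unclassified. Verifa's real classifier is not keyword matching as far as this tutorial states.

```python
from dataclasses import dataclass


@dataclass
class DocumentType:
    name: str
    keywords: list  # hypothetical phrases that indicate this type
    is_custom: bool = False


def keyword_score(doc_type, ocr_text):
    """Fraction of the type's keywords that appear in the OCR text."""
    if not doc_type.keywords:
        return 0.0
    text = ocr_text.lower()
    hits = sum(1 for kw in doc_type.keywords if kw.lower() in text)
    return hits / len(doc_type.keywords)


def classify(ocr_text, system_types, custom_types, min_score=0.5):
    """Score every system and custom type; return (best_type, score).

    Returns (None, score) when no type reaches min_score.
    """
    best_type, best_score = None, 0.0
    for doc_type in list(system_types) + list(custom_types):
        score = keyword_score(doc_type, ocr_text)
        if score > best_score:
            best_type, best_score = doc_type, score
    if best_score < min_score:
        return None, best_score
    return best_type, best_score


system_types = [
    DocumentType("Passport", ["passport", "nationality", "date of birth"]),
    DocumentType("Driver's License", ["driver", "license", "class"]),
]
custom_types = [
    DocumentType("Health Insurance Card", ["member id", "group", "plan", "rx"], is_custom=True),
]

ocr_text = "ACME HEALTH  Member ID: XJ123  Group: 4567  Plan: Gold PPO  RxBIN 610014"
match, score = classify(ocr_text, system_types, custom_types)
print(match.name, score)  # Health Insurance Card 1.0
```

System and custom types go into one candidate pool, which mirrors the tutorial's point that your custom types compete on equal footing with the built-in ones.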
You can also trigger classification manually:
Verifa analyzed the document and matched it to your custom “Health Insurance Card” type. Classification runs entirely on private infrastructure — document data is never sent to third-party services.
Trigger the verification pipeline, which includes field extraction when the
matched type has extraction_hints.
The pipeline runs OCR, classification (if not already done), and structured field extraction in sequence.
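The ordering described above can be sketched as a small orchestration function. The stage callables and dictionary keys (ocr_text, document_type, extracted_fields) are hypothetical stand-ins, not Verifa's internals; the point is the control flow: OCR always runs, classification is skipped when a type is already assigned, and extraction runs only when the matched type defines extraction_hints.

```python
def run_pipeline(document, ocr, classify, extract):
    """Run OCR, then classification if needed, then extraction if hints exist.

    `document` is a plain dict; `ocr`, `classify`, and `extract` are
    hypothetical callables standing in for the real pipeline stages.
    Returns the names of the stages that ran, in order.
    """
    stages = []

    # 1. OCR always runs first; later stages read its text.
    document["ocr_text"] = ocr(document)
    stages.append("ocr")

    # 2. Classification is skipped if a type was already assigned
    #    (for example, by a manual classification call).
    if document.get("document_type") is None:
        document["document_type"] = classify(document["ocr_text"])
        stages.append("classification")

    # 3. Structured extraction runs only when the matched type defines hints.
    doc_type = document["document_type"] or {}
    hints = doc_type.get("extraction_hints", [])
    if hints:
        document["extracted_fields"] = extract(document["ocr_text"], hints)
        stages.append("extraction")

    return stages


# Usage with stub stages.
card_type = {"name": "Health Insurance Card", "extraction_hints": ["member_id"]}
doc = {"id": "doc_123"}
stages = run_pipeline(
    doc,
    ocr=lambda d: "Member ID: XJ123",
    classify=lambda text: card_type,
    extract=lambda text, hints: {"member_id": "XJ123"},
)
print(stages)  # ['ocr', 'classification', 'extraction']
```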
Retrieve the document to see extracted fields and confidence scores.
Each field in extracted_fields_confidence has a score between 0.0 and 1.0.
The score is computed by checking whether the extracted value’s tokens actually
appear in the underlying OCR text.
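A minimal sketch of that token check, assuming values and OCR text are lowercased and split on word characters, and that the score is the fraction of the value's tokens found in the OCR text. Verifa's actual tokenizer and formula may differ.

```python
import re


def tokens(text):
    """Lowercase word tokens; punctuation such as '-' and '.' splits tokens."""
    return re.findall(r"\w+", text.lower())


def field_confidence(value, ocr_text):
    """Fraction of the extracted value's tokens that appear in the OCR text."""
    value_tokens = tokens(value)
    if not value_tokens:
        return 0.0
    ocr_tokens = set(tokens(ocr_text))
    found = sum(1 for tok in value_tokens if tok in ocr_tokens)
    return found / len(value_tokens)


ocr_text = "ACME HEALTH PLAN\nMember: Jane Q. Doe\nMember ID: XJ-4421-09\nGroup: 88123"

print(field_confidence("Jane Q. Doe", ocr_text))  # 1.0, every token found
print(field_confidence("XJ-4421-09", ocr_text))   # 1.0
print(field_confidence("Jane Smith", ocr_text))   # 0.5, "smith" not in OCR text
```

Under this scheme, a value whose tokens never occur in the OCR text scores low even if it looks plausible, which is what makes the score useful for spotting values that the document doesn't actually support.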
Document groups let you enforce recency requirements. For example, you might require that a proof of address document was issued within the last 90 days.
When a document in this group is verified, the system checks whether the
effective_date field falls within the allowed window. If the document is
older than the configured threshold, the recency check fails.
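The recency rule reduces to a date comparison. This sketch assumes effective_date arrives as an ISO 8601 string (YYYY-MM-DD) and uses the 90-day window from the example; how Verifa treats future-dated documents isn't covered here, so the function simply lets them pass.

```python
from datetime import date, timedelta


def passes_recency(effective_date, max_age_days=90, today=None):
    """True if effective_date (ISO 'YYYY-MM-DD') is at most max_age_days old.

    Future-dated documents pass here; the tutorial doesn't specify them.
    """
    today = today or date.today()
    issued = date.fromisoformat(effective_date)
    return (today - issued).days <= max_age_days


today = date(2024, 6, 1)
print(passes_recency("2024-03-15", today=today))  # True (78 days old)
print(passes_recency("2024-02-01", today=today))  # False (121 days old)

# A document dated exactly 90 days ago is still inside the window.
boundary = (today - timedelta(days=90)).isoformat()
print(passes_recency(boundary, today=today))  # True
```

Passing today explicitly keeps the check deterministic in tests; in production it defaults to the current date.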
Everything in this tutorial is also available in the Verifa dashboard. Navigate to Documents in the left sidebar to see:
The extraction fields card shows each extracted value alongside its confidence score, color-coded by the ranges described above.