Extract Structured Data on the Kognitos Platform

Name: Extract Structured Data on the Kognitos Platform
Uploaded: 2026-04-08
Duration: 8 min 34 s

Tutorial: extract structured data from a remittance, including how to use first/last/second to pick the right field, and find vs get for graceful exception handling.

What you'll learn

A 9-minute hands-on tutorial for extracting structured data from a remittance advice PDF on Kognitos, and for handling the common edge cases (multiple matches, missing fields, wording variation) without leaving the playground.

The setup

Add the Document Processing book: Books > New Book > Document Processing. No credentials required. This book is what enables Kognitos's English commands for OCR; under the hood it's AWS Textract plus Kognitos's own extensions. Add the book before creating the playground, otherwise the playground won't see it.
Create a playground: name it something like get data.
Load the document: get the file as a scanned document. The “as a scanned document” phrase is what invokes Textract.

Extracting fields

get the document's payment number
get the document's date → the demo deliberately has multiple dates, which Kognitos surfaces as “multiple values found for date.” The fix: use a directional word, as in get the document's first date, get the document's last date, get the document's second date, etc.
get the document's customer number
get the document's table. The same first/last directional words work on tables when there are several.
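The directional-word behavior can be pictured with a small Python model. This is only a sketch of the idea, not Kognitos code: the names pick, ORDINALS, and MultipleValuesFound are hypothetical, and the sample dates are made up.

```python
# Illustrative model only; none of these names belong to the Kognitos platform.
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}


class MultipleValuesFound(Exception):
    """Raised when a field matches several values and no ordinal was given."""


def pick(field, candidates, ordinal=None):
    """Return one value for `field` from the candidates found on the page."""
    if not candidates:
        raise LookupError(f"{field} not found")
    if ordinal is None:
        # Mirrors the "multiple values found for date" message in the demo.
        if len(candidates) > 1:
            raise MultipleValuesFound(f"multiple values found for {field}")
        return candidates[0]
    return candidates[ORDINALS[ordinal]]


dates = ["2026-01-05", "2026-01-12", "2026-02-01"]  # made-up sample values
first_date = pick("date", dates, "first")
last_date = pick("date", dates, "last")
second_date = pick("date", dates, "second")
```

Calling pick("date", dates) with no ordinal raises MultipleValuesFound in this model, which corresponds to the point in the demo where a directional word is needed.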

Find vs Get, graceful exception flow

The tutorial explains a critical nuance: find returns “not found” if the value is missing, while get throws an exception. Use find when you want to branch on absence; use get when you want Kognitos to pause and let a business user resolve the gap.
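For readers who think in code, the distinction resembles the familiar sentinel-versus-exception pattern. The Python sketch below is an analogy under stated assumptions, not the platform's implementation: the document is modeled as a plain dictionary, and NOT_FOUND, FieldMissing, and the sample values are hypothetical.

```python
# Analogy only: a dictionary stands in for an extracted document.
NOT_FOUND = "not found"  # stand-in for what find reports on absence


class FieldMissing(Exception):
    """Stand-in for the exception that pauses the run for a business user."""


def find(document, field):
    """Return the value, or NOT_FOUND so the caller can branch on absence."""
    return document.get(field, NOT_FOUND)


def get(document, field):
    """Return the value, or raise so that someone has to resolve the gap."""
    value = find(document, field)
    if value == NOT_FOUND:
        raise FieldMissing(f"{field} not found")
    return value


def describe_customer(document):
    """Branch on absence with find instead of stopping the run."""
    customer = find(document, "customer number")
    if customer == NOT_FOUND:
        return "no customer number; continuing without it"
    return f"customer {customer}"


document = {"payment number": "PN-1001"}  # made-up sample document
summary = describe_customer(document)
payment = get(document, "payment number")
```

In this model, get(document, "customer number") raises FieldMissing, while find returns the sentinel and lets the flow continue.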

Teaching Kognitos a fallback technique

The demo also processes a second document that uses Payment ID instead of Payment Number. Kognitos raises an exception, the user opens the mini playground, types get the document's payment ID, verifies the value Wi789 was extracted correctly, and clicks Always use this answer for this department / process. The technique becomes a stored learning; context-based embeddings apply it automatically the next time a similar document arrives.
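The exception-then-learn flow can be sketched as a lookup that consults stored alternatives before giving up. This is a deliberately simplified model: Kognitos matches learnings with context-based embeddings, whereas the sketch uses an exact-match dictionary, and the names learnings, extract, always_use, and the "finance" department are all hypothetical.

```python
# Simplified model: exact-match dictionary instead of embedding similarity.
learnings = {}  # (department, field) -> alternative label taught by a user


def extract(document, department, field):
    """Try the requested label, then any learned alternative, else raise."""
    if field in document:
        return document[field]
    alternative = learnings.get((department, field))
    if alternative is not None and alternative in document:
        return document[alternative]
    raise LookupError(f"{field} not found")


def always_use(department, field, alternative_label):
    """Model of clicking 'Always use this answer' after verifying the value."""
    learnings[(department, field)] = alternative_label


second_doc = {"payment ID": "Wi789"}  # label and value taken from the demo

try:
    extract(second_doc, "finance", "payment number")
    raised_before_learning = False
except LookupError:
    raised_before_learning = True  # the point where a user steps in

always_use("finance", "payment number", "payment ID")
value_after_learning = extract(second_doc, "finance", "payment number")
```

The first call fails because only Payment ID is present; after the learning is stored, the same request resolves without user intervention.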

Questions answered in this video

How do I enable scanned-document extraction in Kognitos?

Add the Document Processing book to the department (Books > New Book > Document Processing). No credentials are needed. This book is what enables the as a scanned document capability; under the hood it is AWS Textract plus Kognitos's own extensions.

Why does it matter whether I add a book before or after creating the playground?

A playground created before the book is added will not see the book's capabilities. Add the book first, then create the playground so it can use the English commands the book enables.

How do I handle “multiple values found” for a field?

Use a directional word such as first, last, second, or third. For example, get the document's first date picks the first date on the page. The same modifier works on tables when a document has more than one.

What's the difference between find and get in Kognitos?

find returns “not found” when the value isn't on the document, which is useful when you want to branch on absence. get throws an exception when the value isn't found, which is useful when you want a business user to step in and resolve the gap (and optionally teach a technique).

How do I make Kognitos handle a document where the label is different (Payment ID vs Payment Number)?

Open the mini playground from the exception, write the alternative English (e.g. get the document's payment ID), verify the extracted value, and click Always use this answer. Kognitos saves it as a learning and applies it automatically on future similar documents via context-based embeddings.
