# Extract Structured Data on the Kognitos Platform

> Hands-on tutorial: add the Document Processing book, extract payment number/customer number/date and the payment table from a remittance, and handle a "not found" with find vs get.

**Page**: https://www.kognitos.com/videos/extract-structured-data-on-the-kognitos-platform/
**Watch on YouTube**: https://www.youtube.com/watch?v=JgfHYbTmUgg
**Length**: 8m 34s

## What you'll learn

A hands-on tutorial of roughly eight and a half minutes on extracting structured data from a remittance advice PDF on Kognitos, and on handling the common edge cases (multiple matches, missing fields, wording variation) without leaving the playground.

## The setup

- **Add the Document Processing book**: Books > New Book > Document Processing. No credentials required. This book is what enables Kognitos's English commands for OCR — under the hood it's AWS Textract plus Kognitos's own extensions. *Add the book before creating the playground*, otherwise the playground won't see it.
- **Create a playground**: name it something like *get data*.
- **Load the document**: `get the file as a scanned document` — that “as a scanned document” phrase is what invokes Textract.

## Extracting fields

- `get the document's payment number`
- `get the document's date` → the demo deliberately has multiple dates, which Kognitos surfaces as “multiple values found for date.” The fix: use a directional word, as in `get the document's first date`, `get the document's last date`, `get the document's second date`, etc.
- `get the document's customer number`
- `get the document's table` (the same first/last directional words work on tables when there are several).
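
The disambiguation rule above can be modeled outside Kognitos. The following Python sketch is only an analogy, not Kognitos code and not a description of its internals: `pick`, `AmbiguousField`, and `ORDINALS` are hypothetical names, and the dates are made-up sample values.

```python
class AmbiguousField(Exception):
    """Raised when a field has several candidates and no directional word."""


# Directional words mapped to list positions (hypothetical helper table).
ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}


def pick(field, candidates, direction=None):
    """Return one candidate, loosely mimicking a directional word on a field."""
    if not candidates:
        raise LookupError(f"{field} not found")
    if direction is None:
        # No directional word: only safe when exactly one candidate exists.
        if len(candidates) > 1:
            raise AmbiguousField(f"multiple values found for {field}")
        return candidates[0]
    return candidates[ORDINALS[direction]]


dates = ["2024-01-05", "2024-01-12", "2024-02-01"]  # made-up sample values
first_date = pick("date", dates, "first")
last_date = pick("date", dates, "last")
```

Calling `pick("date", dates)` without a direction raises `AmbiguousField`, which corresponds to the “multiple values found for date” message in the demo.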

## Find vs Get — graceful exception flow

The tutorial explains a critical nuance: `find` returns “not found” if the value is missing, while `get` throws an exception. Use `find` when you want to branch on absence; use `get` when you want Kognitos to pause and let a business user resolve the gap.
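
The contrast between the two verbs resembles the difference between a lookup that returns a sentinel and one that raises. The Python sketch below is an analogy under that assumption, not Kognitos code: the functions `find` and `get`, the `FieldMissing` exception, and the sample payment number are all hypothetical.

```python
NOT_FOUND = "not found"


class FieldMissing(Exception):
    """Stands in for the exception that pauses a run for a business user."""


def find(document, field):
    """Return the value, or the NOT_FOUND sentinel when the field is absent."""
    return document.get(field, NOT_FOUND)


def get(document, field):
    """Return the value, or raise so that someone has to resolve the gap."""
    if field not in document:
        raise FieldMissing(f"{field} is missing from the document")
    return document[field]


remittance = {"payment number": "PN-1001"}  # made-up extracted fields

# find: branch on absence and keep going.
customer = find(remittance, "customer number")
status = "needs review" if customer == NOT_FOUND else "complete"

# get: absence interrupts the flow instead of returning a placeholder.
try:
    get(remittance, "customer number")
    interrupted = False
except FieldMissing:
    interrupted = True
```

With `find`, the caller decides what absence means; with `get`, the flow stops until the gap is resolved.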

## Teaching Kognitos a fallback technique

The demo also processes a second document that uses *Payment ID* instead of *Payment Number*. Kognitos raises an exception, the user opens the mini playground, types `get the document's payment ID`, verifies the value Wi789 was extracted correctly, and clicks *Always use this answer for this department / process*. The technique becomes a stored learning — context-based embeddings apply it automatically the next time a similar document arrives.
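
The effect of a stored learning can be pictured as a remembered alternative label that is tried when the original one fails. The Python sketch below is a deliberate simplification: it swaps Kognitos's context-based embeddings for an exact dictionary lookup, and `get_with_learnings` and the `learnings` structure are hypothetical names. Only the value Wi789 comes from the demo.

```python
def get_with_learnings(document, field, learnings):
    """Look up a field, falling back to labels remembered from earlier fixes."""
    if field in document:
        return document[field]
    # Try each alternative label that a user previously taught for this field.
    for alias in learnings.get(field, []):
        if alias in document:
            return document[alias]
    raise KeyError(f"{field} not found and no learning applies")


second_document = {"payment id": "Wi789"}  # label differs from "payment number"

# Before the fix: no learning exists, so the lookup fails.
try:
    get_with_learnings(second_document, "payment number", {})
    failed_before = False
except KeyError:
    failed_before = True

# After the user resolves the exception once, the alias is remembered.
learnings = {"payment number": ["payment id"]}
payment = get_with_learnings(second_document, "payment number", learnings)
```

After the alias is stored, the same request for the payment number succeeds on documents that label it *Payment ID*.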

## FAQs

**Q: How do I enable scanned-document extraction in Kognitos?**

Add the Document Processing book to the department (Books > New Book > Document Processing). No credentials are needed. This book is what enables the `as a scanned document` capability; under the hood, it is AWS Textract plus Kognitos's own extensions.


**Q: Why does it matter whether I add a book before or after creating the playground?**

A playground created before the book is added will not see the book's capabilities. Add the book first, then create the playground so it can use the English commands the book enables.


**Q: How do I handle “multiple values found” for a field?**

Use a directional word such as *first*, *last*, *second*, or *third*. For example, `get the document's first date` picks the first date on the page. The same modifier works on tables when a document has more than one.


**Q: What's the difference between find and get in Kognitos?**

`find` returns “not found” when the value isn't on the document, which is useful when you want to branch on absence. `get` throws an exception when the value isn't found, which is useful when you want a business user to step in and resolve the gap (and optionally teach a technique).


**Q: How do I make Kognitos handle a document where the label is different (Payment ID vs Payment Number)?**

Open the mini playground from the exception, write the alternative English (e.g. `get the document's payment ID`), verify the extracted value, and click *Always use this answer*. Kognitos saves it as a learning and applies it automatically on future similar documents via context-based embeddings.