171 changes: 144 additions & 27 deletions api-reference/workflow/workflows.mdx
@@ -1416,7 +1416,10 @@ import EnrichmentImageSummaryHiResOnly from '/snippets/general-shared-text/enric
name="Enrichment",
subtype="<subtype>",
type="prompter",
settings={}
settings={
"provider_type": "<provider-type>",
"model": "<model>"
}
)
```
</Accordion>
@@ -1426,17 +1429,43 @@ import EnrichmentImageSummaryHiResOnly from '/snippets/general-shared-text/enric
"name": "Enrichment",
"type": "prompter",
"subtype": "<subtype>",
"settings": {}
"settings": {
"provider_type": "<provider-type>",
"model": "<model>"
}
}
```
</Accordion>
</AccordionGroup>

Allowed values for `<subtype>` include:
Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

- `anthropic_image_description`, `anthropic`, and:

- `claude-3-7-sonnet-20250219`
- `claude-sonnet-4-20250514`
- `claude-sonnet-4-5-20250929`

- `bedrock_image_description`, `bedrock`, and:

- `openai_image_description`
- `anthropic_image_description`
- `bedrock_image_description`
- `us.amazon.nova-lite-v1:0`
- `us.amazon.nova-pro-v1:0`
- `us.anthropic.claude-3-haiku-20240307-v1:0`
- `us.anthropic.claude-3-opus-20240229-v1:0`
- `us.anthropic.claude-3-sonnet-20240229-v1:0`
- `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
- `us.anthropic.claude-sonnet-4-20250514-v1:0`
- `us.anthropic.claude-sonnet-4-5-20250929-v1:0`

- `openai_image_description`, `openai`, and:

- `gpt-4o`
- `gpt-4o-mini`
- `gpt-5-mini`

- `vertexai_image_description`, `vertexai`, and:

- `gemini-2.0-flash-001`
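
As a rough illustration of how the three placeholders fit together, the following sketch assembles the JSON form of the node from the lists above and rejects any provider and model pair that is not listed. The names `IMAGE_DESCRIPTION_MODELS` and `image_description_node` are hypothetical helpers introduced here for illustration; they are not part of the Unstructured API or SDK, and the subtype is assumed to follow the `<provider-type>_image_description` pattern seen in the list.

```python
# Hypothetical helper, not part of the Unstructured SDK. It only builds the
# plain dictionary shown in the JSON example for the Image Description task.
IMAGE_DESCRIPTION_MODELS = {
    "anthropic": [
        "claude-3-7-sonnet-20250219",
        "claude-sonnet-4-20250514",
        "claude-sonnet-4-5-20250929",
    ],
    "bedrock": [
        "us.amazon.nova-lite-v1:0",
        "us.amazon.nova-pro-v1:0",
        "us.anthropic.claude-3-haiku-20240307-v1:0",
        "us.anthropic.claude-3-opus-20240229-v1:0",
        "us.anthropic.claude-3-sonnet-20240229-v1:0",
        "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        "us.anthropic.claude-sonnet-4-20250514-v1:0",
        "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    ],
    "openai": ["gpt-4o", "gpt-4o-mini", "gpt-5-mini"],
    "vertexai": ["gemini-2.0-flash-001"],
}


def image_description_node(provider_type: str, model: str) -> dict:
    """Return the node dictionary, or raise ValueError for an unlisted pair."""
    allowed = IMAGE_DESCRIPTION_MODELS.get(provider_type)
    if allowed is None:
        raise ValueError(f"Unlisted provider type: {provider_type!r}")
    if model not in allowed:
        raise ValueError(f"Model {model!r} is not listed for {provider_type!r}")
    return {
        "name": "Enrichment",
        "type": "prompter",
        # Assumption: the subtype is the provider type plus a fixed suffix.
        "subtype": f"{provider_type}_image_description",
        "settings": {"provider_type": provider_type, "model": model},
    }


node = image_description_node("openai", "gpt-4o")
```

Keeping the allowed pairs in one dictionary means a mismatched combination (for example, `openai` with a Bedrock model ID) fails locally before any request is sent.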

#### Table Description task

@@ -1451,7 +1480,10 @@ import EnrichmentTableSummaryHiResOnly from '/snippets/general-shared-text/enric
name="Enrichment",
subtype="<subtype>",
type="prompter",
settings={}
settings={
"provider_type": "<provider-type>",
"model": "<model>"
}
)
```
</Accordion>
@@ -1461,17 +1493,43 @@ import EnrichmentTableSummaryHiResOnly from '/snippets/general-shared-text/enric
"name": "Enrichment",
"type": "prompter",
"subtype": "<subtype>",
"settings": {}
"settings": {
"provider_type": "<provider-type>",
"model": "<model>"
}
}
```
</Accordion>
</AccordionGroup>

Allowed values for `<subtype>` include:
Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

- `anthropic_table_description`, `anthropic`, and:

- `claude-3-7-sonnet-20250219`
- `claude-sonnet-4-20250514`
- `claude-sonnet-4-5-20250929`

- `bedrock_table_description`, `bedrock`, and:

- `us.amazon.nova-lite-v1:0`
- `us.amazon.nova-pro-v1:0`
- `us.anthropic.claude-3-haiku-20240307-v1:0`
- `us.anthropic.claude-3-opus-20240229-v1:0`
- `us.anthropic.claude-3-sonnet-20240229-v1:0`
- `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
- `us.anthropic.claude-sonnet-4-20250514-v1:0`
- `us.anthropic.claude-sonnet-4-5-20250929-v1:0`

- `openai_table_description`, `openai`, and:

- `gpt-4o`
- `gpt-4o-mini`
- `gpt-5-mini`

- `openai_table_description`
- `anthropic_table_description`
- `bedrock_table_description`
- `vertexai_table_description`, `vertexai`, and:

- `gemini-2.0-flash-001`

#### Table to HTML task

@@ -1484,9 +1542,12 @@ import EnrichmentTableToHTMLHiResOnly from '/snippets/general-shared-text/enrich
```python
table_to_html_enrichment_workflow_node = WorkflowNode(
name="Enrichment",
subtype="openai_table2html",
subtype="<subtype>",
type="prompter",
settings={}
settings={
"provider_type": "<provider-type>",
"model": "<model>"
}
)
```
</Accordion>
@@ -1495,13 +1556,31 @@ import EnrichmentTableToHTMLHiResOnly from '/snippets/general-shared-text/enrich
{
"name": "Enrichment",
"type": "prompter",
"subtype": "openai_table2html",
"settings": {}
"subtype": "<subtype>",
"settings": {
"provider_type": "<provider-type>",
"model": "<model>"
}
}
```
</Accordion>
</AccordionGroup>

Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

- `twopass_table2html` for agentic AI table-to-HTML output. Do not specify a `<provider-type>` or `<model>` for this subtype.
- `anthropic_table2html` for VLM table-to-HTML output by using Anthropic, `anthropic`, and:

- `claude-3-7-sonnet-20250219`
- `claude-sonnet-4-20250514`
- `claude-sonnet-4-5-20250929`

- `openai_table2html` for VLM table-to-HTML output by using OpenAI, `openai`, and:

- `gpt-4o`
- `gpt-4o-mini`
- `gpt-5-mini`

#### Named Entity Recognition (NER) task

<AccordionGroup>
@@ -1512,6 +1591,8 @@ import EnrichmentTableToHTMLHiResOnly from '/snippets/general-shared-text/enrich
subtype="<subtype>",
type="prompter",
settings={
"provider_type": "<provider-type>",
"model": "<model>",
"prompt_interface_overrides": {
"prompt": {
"user": "<user-prompt-override>"
@@ -1528,6 +1609,8 @@ import EnrichmentTableToHTMLHiResOnly from '/snippets/general-shared-text/enrich
"type": "prompter",
"subtype": "<subtype>",
"settings": {
"provider_type": "<provider-type>",
"model": "<model>",
"prompt_interface_overrides": {
"prompt": {
"user": "<user-prompt-override>"
@@ -1541,6 +1624,20 @@ import EnrichmentTableToHTMLHiResOnly from '/snippets/general-shared-text/enrich

Fields for settings include:

- Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

- `anthropic_ner`, `anthropic`, and:

- `claude-3-7-sonnet-20250219`
- `claude-sonnet-4-20250514`
- `claude-sonnet-4-5-20250929`

- `openai_ner`, `openai`, and:

- `gpt-4o`
- `gpt-4o-mini`
- `gpt-5-mini`

- `prompt_interface_overrides.prompt.user`: _Optional_. An alternative prompt to use with the underlying NER model. The default is none, in which case Unstructured's internal default prompt is used when calling the NER model.
The internal default prompt is as follows, which you can override by providing an alternative prompt:

@@ -1615,11 +1712,6 @@ Fields for settings include:

- Changing any other portions of the internal default prompt could produce unexpected results.

Allowed values for `<subtype>` include:

- `openai_ner`
- `anthropic_ner`

#### Generative OCR task

import EnrichmentOCRHiResOnly from '/snippets/general-shared-text/enrichment-ocr-high-res-only.mdx';
@@ -1645,7 +1737,10 @@ import EnrichmentOCRHiResOnly from '/snippets/general-shared-text/enrichment-ocr
name="Enrichment",
subtype="<subtype>",
type="prompter",
settings={}
settings={
"provider_type": "<provider-type>",
"model": "<model>"
}
)
```
</Accordion>
@@ -1655,17 +1750,39 @@ import EnrichmentOCRHiResOnly from '/snippets/general-shared-text/enrichment-ocr
"name": "Enrichment",
"type": "prompter",
"subtype": "<subtype>",
"settings": {}
"settings": {
"provider_type": "<provider-type>",
"model": "<model>"
}
}
```
</Accordion>
</AccordionGroup>

Allowed values for `<subtype>` include:
Allowed values for `<subtype>`, `<provider-type>`, and `<model>` include, respectively:

- `anthropic_ocr`, `anthropic`, and:

- `claude-3-7-sonnet-20250219`
- `claude-sonnet-4-20250514`
- `claude-sonnet-4-5-20250929`

- `bedrock_ocr`, `bedrock`, and:

- `us.amazon.nova-lite-v1:0`
- `us.amazon.nova-pro-v1:0`
- `us.anthropic.claude-3-haiku-20240307-v1:0`
- `us.anthropic.claude-3-opus-20240229-v1:0`
- `us.anthropic.claude-3-sonnet-20240229-v1:0`
- `us.anthropic.claude-3-7-sonnet-20250219-v1:0`
- `us.anthropic.claude-sonnet-4-20250514-v1:0`
- `us.anthropic.claude-sonnet-4-5-20250929-v1:0`

- `openai_ocr`, `openai`, and:

- `anthropic_ocr`
- `openai_ocr`
- `vertexai_ocr`
- `gpt-4o`
- `gpt-4o-mini`
- `gpt-5-mini`

### Chunker node

14 changes: 11 additions & 3 deletions ui/enriching/table-to-html.mdx
@@ -14,7 +14,8 @@ title: Tables to HTML

After partitioning, you can have Unstructured generate representations of each detected table in HTML markup format.

This table-to-HTML output is done by using [GPT-4o](https://openai.com/index/hello-gpt-4o/), provided through OpenAI.
This table-to-HTML output is generated by using agentic AI or a vision language model (VLM).
The agentic AI option typically provides more accurate table-to-HTML output than the VLM option.

Here is an example of the HTML markup output of a detected table using GPT-4o. Note specifically the `text_as_html` field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
@@ -86,9 +87,16 @@ For workflows that use [chunking](/ui/chunking), note the following changes:
import EnrichmentTableToHTMLHiResOnly from '/snippets/general-shared-text/enrichment-table-to-html-hi-res-only.mdx';
import DeprecatedModelsUI from '/snippets/general-shared-text/deprecated-models-ui.mdx';

To generate table-to-HTML output, in an **Enrichment** node in a workflow, for **Model**, select **OpenAI (GPT-4o)**.
To generate table-to-HTML output, for an **Enrichment** node in a workflow, do the following:

Make sure after you choose this provider and model, that **Table to HTML** is also selected.
1. For **Input Type**, click **Table**.
2. To use agentic AI to generate the table-to-HTML output, for **Provider**, click **Agentic Table Parsing**.

To use a VLM to generate the table-to-HTML output instead, do the following:

a. For **Provider**, click **Anthropic** or **OpenAI**.<br/>
b. For **Model**, click one of the available models that are shown.<br/>
c. For **Task**, click **Table to HTML**.<br/>

<Note>
You can change a workflow's table description settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings.
8 changes: 5 additions & 3 deletions ui/walkthrough.mdx
@@ -277,9 +277,11 @@ HTML representations of detected tables, and detected entities (such as people a
In the node's settings pane's **Details** tab, click:

- **Table** under **Input Type**.
- Any available choice for **Provider** (for example, **Anthropic**).
- Any available choice for **Model** (for example, **Claude Sonnet 4.5** if you chose **Anthropic** for **Provider**).
- **Table to HTML** under **Task**.
- Any available choice for **Provider** (for example, **Agentic Table Parsing** to use agentic AI, or **Anthropic** to use a VLM).
- If you did not choose **Agentic Table Parsing** for **Provider**, click the following:

- Any available choice for **Model** (for example, **Claude Sonnet 4.5** if you chose **Anthropic** for **Provider**).
- **Table to HTML** under **Task**.

<Tip>
The table to HTML enrichment generates an HTML representation of each detected table. This can help you to more quickly and accurately recreate the table's content elsewhere later as needed. This also provides additional context about the table's structure for your RAG apps, agents, and models. [Learn more](/ui/enriching/table-to-html).
9 changes: 7 additions & 2 deletions ui/workflows.mdx
@@ -336,9 +336,14 @@ import DeprecatedModelsUI from '/snippets/general-shared-text/deprecated-models-

[Learn more](/ui/enriching/table-descriptions).

- **Table** to convert tables to HTML. Also select one of the available provider (and model) combinations that are shown.
- **Table** to convert tables to HTML. Also select the following:

- To use agentic AI to convert tables to HTML, select **Agentic Table Parsing**.
- To use a VLM to convert tables to HTML instead, do the following:

Make sure after you choose this provider and model, that **Table to HTML** is also selected.
a. For **Provider**, click **Anthropic** or **OpenAI**.<br/>
b. For **Model**, click one of the available models that are shown.<br/>
c. For **Task**, click **Table to HTML**.<br/>

[Learn more](/ui/enriching/table-to-html).
