api-reference/how-to/embedding.mdx
Lines changed: 6 additions & 4 deletions
@@ -5,9 +5,7 @@ title: Set embedding behavior
 <Note>
 The following information applies only to the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) and the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library).

-For the Unstructured open-source library, see [Embedding](/open-source/core-functionality/embedding) instead.
-
-The Unstructured SDKs for Python and JavaScript/TypeScript do not support this functionality.
+The Unstructured SDKs for Python and JavaScript/TypeScript, and the Unstructured open-source library, do not support this functionality.
 </Note>

 ## Concepts
@@ -51,6 +49,7 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 - `langchain-openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/).
 - `langchain-vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/).
 - `langchain-voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/).
+- `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview).
 - `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents).

 2. Run the following command to install the required Python package for the embedding provider:
@@ -60,6 +59,7 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 - For `langchain-openai`, run `pip install "unstructured-ingest[openai]"`.
 - For `langchain-vertexai`, run `pip install "unstructured-ingest[embed-vertexai]"`.
 - For `langchain-voyageai`, run `pip install "unstructured-ingest[embed-voyageai]"`.
+- For `mixedbread-ai`, run `pip install "unstructured-ingest[embed-mixedbreadai]"`.
 - For `octoai`, run `pip install "unstructured-ingest[embed-octoai]"`.
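
As a rough illustration of step 2, the provider-to-extra mapping shown in this hunk could be captured in a small Python helper. The names `PIP_EXTRAS` and `install_command` are hypothetical and are not part of `unstructured-ingest`. The extras are copied from the list above, and providers that fall outside this hunk are not covered.

```python
# Hypothetical helper for illustration; not part of unstructured-ingest.
# The extras are copied from the install list above and cover only the
# providers that are visible in this hunk.
PIP_EXTRAS = {
    "langchain-openai": "openai",
    "langchain-vertexai": "embed-vertexai",
    "langchain-voyageai": "embed-voyageai",
    "mixedbread-ai": "embed-mixedbreadai",
    "octoai": "embed-octoai",
}


def install_command(provider: str) -> str:
    """Return the pip command that installs the extra for a provider."""
    try:
        extra = PIP_EXTRAS[provider]
    except KeyError:
        raise ValueError(f"No known install extra for provider {provider!r}") from None
    return f'pip install "unstructured-ingest[{extra}]"'


if __name__ == "__main__":
    # Print the command for the provider added in this change.
    print(install_command("mixedbread-ai"))
```

Raising on an unknown provider, rather than guessing an extra name, keeps a typo in the provider name from turning into a failed install later.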

 3. For the following embedding providers, you can choose the model that you want to use. If you do choose a model, note the model's name:
@@ -69,15 +69,17 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 - `langchain-openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
 - `langchain-vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
 - `langchain-voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
+- `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
 - `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
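
The default-model behavior in step 3 can be sketched as a lookup with a fallback. This is only an illustration: `DEFAULT_MODELS` and `resolve_model` are hypothetical names, and the defaults are copied from the list above rather than read from the library itself.

```python
from typing import Optional

# Hypothetical lookup table for illustration; defaults are copied from the
# list above. None marks a provider for which the list states that no
# default model is provided.
DEFAULT_MODELS = {
    "langchain-openai": "text-embedding-ada-002",
    "langchain-vertexai": "textembedding-gecko@001",
    "langchain-voyageai": None,
    "mixedbread-ai": "mixedbread-ai/mxbai-embed-large-v1",
    "octoai": "thenlper/gte-large",
}


def resolve_model(provider: str, chosen: Optional[str] = None) -> str:
    """Return the chosen model name, or the provider's default if none is chosen."""
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unknown provider {provider!r}")
    if chosen:
        return chosen
    default = DEFAULT_MODELS[provider]
    if default is None:
        raise ValueError(f"{provider} has no default model; choose one explicitly")
    return default


if __name__ == "__main__":
    print(resolve_model("mixedbread-ai"))
```

Under these assumptions, `langchain-voyageai` is the one listed provider for which a model name must always be supplied.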

 4. Note the special settings to connect to the provider:

 - For `langchain-aws-bedrock`, you'll need an AWS access key value, the corresponding AWS secret access key value, and the corresponding AWS Region identifier. [Get an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
-- For `langchain-huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation
+- For `langchain-huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation.
 - For `langchain-openai`, you'll need an OpenAI API key value. [Get an OpenAI API key](https://platform.openai.com/docs/quickstart/create-and-export-an-api-key).
 - For `langchain-vertexai`, you'll need the path to a Google Cloud credentials JSON file. Learn more [here](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [here](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth).
 - For `langchain-voyageai`, you'll need a Voyage AI API key value. [Get a Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys).
+- For `mixedbread-ai`, you'll need a Mixedbread API key value. [Get a Mixedbread API key](https://www.mixedbread.ai/dashboard?next=api-keys).
 - For `octoai`, you'll need an Octo AI API token value. [Get an Octo AI API token](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).
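
The per-provider connection settings in step 4 lend themselves to a pre-flight check before running a command or code. The sketch below is an assumption-laden illustration: `REQUIRED_SETTINGS`, `missing_settings`, and `looks_like_hf_key` are hypothetical names, and the setting labels are descriptive placeholders, not real CLI flags, parameters, or environment variables.

```python
# Hypothetical checklist for illustration. The setting labels are
# descriptive placeholders, not real CLI flags or environment variables.
REQUIRED_SETTINGS = {
    "langchain-aws-bedrock": ["aws_access_key", "aws_secret_access_key", "aws_region"],
    # An HF inference API key is needed only for gated models, so it is
    # not listed as always required here.
    "langchain-huggingface": [],
    "langchain-openai": ["api_key"],
    "langchain-vertexai": ["credentials_json_path"],
    "langchain-voyageai": ["api_key"],
    "mixedbread-ai": ["api_key"],
    "octoai": ["api_token"],
}


def missing_settings(provider: str, supplied: dict) -> list:
    """Return the required setting labels that are absent or empty."""
    if provider not in REQUIRED_SETTINGS:
        raise ValueError(f"Unknown provider {provider!r}")
    return [name for name in REQUIRED_SETTINGS[provider] if not supplied.get(name)]


def looks_like_hf_key(value: str) -> bool:
    """Check the `hf_` prefix that the list above gives for HF inference API keys."""
    return value.startswith("hf_")


if __name__ == "__main__":
    print(missing_settings("langchain-aws-bedrock", {"aws_access_key": "example"}))
```

Returning the list of missing labels, instead of failing on the first one, lets a caller report everything that still needs to be collected in a single pass.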

 5. Now, apply all of this information as follows, and then run your command or code: