Commit 9ff0642

Mixedbread: adding embedding details (#231)
1 parent 478d5f3 commit 9ff0642

3 files changed: +9 -4 lines changed

api-reference/how-to/embedding.mdx

Lines changed: 6 additions & 4 deletions
@@ -5,9 +5,7 @@ title: Set embedding behavior
 <Note>
 The following information applies only to the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) and the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library).
 
-For the Unstructured open-source library, see [Embedding](/open-source/core-functionality/embedding) instead.
-
-The Unstructured SDKs for Python and JavaScript/TypeScript do not support this functionality.
+The Unstructured SDKs for Python and JavaScript/TypeScript, and the Unstructured open-source library, do not support this functionality.
 </Note>
 
 ## Concepts
@@ -51,6 +49,7 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 - `langchain-openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/).
 - `langchain-vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/).
 - `langchain-voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/).
+- `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview).
 - `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents).
 
 2. Run the following command to install the required Python package for the embedding provider:
@@ -60,6 +59,7 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 - For `langchain-openai`, run `pip install "unstructured-ingest[openai]"`.
 - For `langchain-vertexai`, run `pip install "unstructured-ingest[embed-vertexai]"`.
 - For `langchain-voyageai`, run `pip install "unstructured-ingest[embed-voyageai]"`.
+- For `mixedbread-ai`, run `pip install "unstructured-ingest[embed-mixedbreadai]"`.
 - For `octoai`, run `pip install "unstructured-ingest[embed-octoai]"`.
 
 3. For the following embedding providers, you can choose the model that you want to use. If you do choose a model, note the model's name:
@@ -69,15 +69,17 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 - `langchain-openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
 - `langchain-vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
 - `langchain-voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
+- `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
 - `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
 
 4. Note the special settings to connect to the provider:
 
 - For `langchain-aws-bedrock`, you'll need an AWS access key value, the corresponding AWS secret access key value, and the corresponding AWS Region identifier. [Get an AWS access key and secret access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
-- For `langchain-huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation
+- For `langchain-huggingface`, if you use a gated model (a model with special conditions that you must accept before you can use it, or a privately published model), you'll need an HF inference API key value, beginning with `hf_`. [Get an HF inference API key](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token). To learn whether your model requires an HF inference API key, see your model provider's documentation.
 - For `langchain-openai`, you'll need an OpenAI API key value. [Get an OpenAI API key](https://platform.openai.com/docs/quickstart/create-and-export-an-api-key).
 - For `langchain-vertexai`, you'll need the path to a Google Cloud credentials JSON file. Learn more [here](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) and [here](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth).
 - For `langchain-voyageai`, you'll need a Voyage AI API key value. [Get a Voyage AI API key](https://docs.voyageai.com/docs/api-key-and-installation#authentication-with-api-keys).
+- For `mixedbread-ai`, you'll need a Mixedbread API key value. [Get a Mixedbread API key](https://www.mixedbread.ai/dashboard?next=api-keys).
 - For `octoai`, you'll need an Octo AI API token value. [Get an Octo AI API token](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).
 
 5. Now, apply all of this information as follows, and then run your command or code:

api-reference/ingest/ingest-dependencies.mdx

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@ To add support for available embedding libraries, run the following:
 | `pip install "unstructured-ingest[embed-octoai]"` | OctoAI |
 | `pip install "unstructured-ingest[embed-vertexai]"` | Google Vertex AI |
 | `pip install "unstructured-ingest[embed-voyageai]"` | Voyage AI |
+| `pip install "unstructured-ingest[embed-mixedbreadai]"` | Mixedbread |
 | `pip install "unstructured-ingest[openai]"` | OpenAI |
 
 For details about the specific dependencies that are installed, see:

snippets/ingest-configuration-shared/embedding-configuration.mdx

Lines changed: 2 additions & 0 deletions
@@ -39,4 +39,6 @@ A common embedding configuration is a critical component that allows for dynamic
 
 * `langchain-voyageai`: None
 
+* `mixedbread-ai`: `mixedbread-ai/mxbai-embed-large-v1`
+
 * `octoai`: `thenlper/gte-large`
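
For reference, the provider details touched by this commit can be summarized as a small lookup. This is only an illustrative sketch, not part of the commit or of the Unstructured Ingest library: the helper names `install_command` and `resolve_model` are hypothetical, and the tables cover only the providers whose pip extras and default models are visible in the hunks above.

```python
from typing import Optional

# Provider name -> pip extra, copied from the install lines shown in the diff.
# Only the providers visible in these hunks are listed.
PIP_EXTRAS = {
    "langchain-openai": "openai",
    "langchain-vertexai": "embed-vertexai",
    "langchain-voyageai": "embed-voyageai",
    "mixedbread-ai": "embed-mixedbreadai",
    "octoai": "embed-octoai",
}

# Provider name -> default model, copied from the diff.
# None means the docs state that no default model is provided.
DEFAULT_MODELS = {
    "langchain-openai": "text-embedding-ada-002",
    "langchain-vertexai": "textembedding-gecko@001",
    "langchain-voyageai": None,
    "mixedbread-ai": "mixedbread-ai/mxbai-embed-large-v1",
    "octoai": "thenlper/gte-large",
}


def install_command(provider: str) -> str:
    """Build the pip command documented for a provider (hypothetical helper)."""
    return f'pip install "unstructured-ingest[{PIP_EXTRAS[provider]}]"'


def resolve_model(provider: str, model: Optional[str] = None) -> str:
    """Return the explicit model if given, else the documented default (hypothetical helper)."""
    chosen = model or DEFAULT_MODELS[provider]
    if chosen is None:
        raise ValueError(f"{provider} has no default model; specify one explicitly")
    return chosen


if __name__ == "__main__":
    print(install_command("mixedbread-ai"))
    print(resolve_model("mixedbread-ai"))
```

Keeping the two tables separate mirrors the docs, where the install extras and the default models are listed in different steps.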
