feat(google-vertexai): add taskType and title support to VertexAI embeddings #9377
+64
−3
Description
This PR adds support for the `taskType` and `title` parameters in the VertexAI embeddings constructor, enabling users to optimize embeddings for specific downstream applications such as retrieval, semantic similarity, and classification.

Closes #9371
Motivation
Currently, the `taskType` parameter can only be provided per request. Users requested the ability to define it at initialization for better embedding optimization across tasks.

This update improves flexibility, alignment with Vertex AI API capabilities, and ease of configuration.
Changes
Type Definitions (`@langchain/google-common`)
- Added `taskType?: GoogleEmbeddingsTaskType` to `BaseGoogleEmbeddingsParams`
- Added `title?: string` to `BaseGoogleEmbeddingsParams`

Implementation (`@langchain/google-common`)
- Added `taskType` and `title` as class properties in `BaseGoogleEmbeddings`
- Updated `embedDocuments()` to forward `taskType` and `title` to embedding instances when specified

Tests (`@langchain/google-vertexai`)
- `taskType` parameter
- `taskType` combined with `dimensions`
- `outputDimensionality` parameter (backward compatibility)

Usage Example
Before
After
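The constructor-level configuration described in this PR can be modeled with a small self-contained sketch. It deliberately does not import the real packages: `SketchEmbeddings`, `SketchParams`, `SketchInstance`, `buildInstances`, the instance field names, and the model name are all hypothetical stand-ins, not the library's API. It only illustrates the idea of setting `taskType` and `title` once at construction and forwarding them to every embedding instance when they are specified.

```typescript
// Hypothetical stand-ins for illustration only; these are NOT the real
// @langchain/google-common or @langchain/google-vertexai types.
interface SketchParams {
  model: string;
  taskType?: string; // e.g. "RETRIEVAL_DOCUMENT"
  title?: string;
}

// Assumed shape of one embedding instance sent with a request.
interface SketchInstance {
  content: string;
  taskType?: string;
  title?: string;
}

class SketchEmbeddings {
  readonly model: string;
  readonly taskType: string | undefined;
  readonly title: string | undefined;

  constructor(params: SketchParams) {
    this.model = params.model;
    this.taskType = params.taskType;
    this.title = params.title;
  }

  // Build one instance per document, attaching taskType and title
  // only when they were provided at construction time.
  buildInstances(documents: string[]): SketchInstance[] {
    return documents.map((content) => {
      const instance: SketchInstance = { content };
      if (this.taskType !== undefined) instance.taskType = this.taskType;
      if (this.title !== undefined) instance.title = this.title;
      return instance;
    });
  }
}

// Without constructor-level settings, instances carry only the content.
const plain = new SketchEmbeddings({ model: "example-embedding-model" });
const plainInstances = plain.buildInstances(["hello"]);

// With constructor-level settings, every instance carries them.
const tuned = new SketchEmbeddings({
  model: "example-embedding-model",
  taskType: "RETRIEVAL_DOCUMENT",
  title: "Product handbook",
});
const tunedInstances = tuned.buildInstances(["hello", "world"]);

console.log(JSON.stringify(plainInstances));
console.log(JSON.stringify(tunedInstances));
```

Leaving the optional fields off entirely when they are unset, rather than sending them as `undefined`, keeps the request shape unchanged for callers that never configure them.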
Available Task Types
The `taskType` parameter allows you to optimize embeddings for different downstream applications.

Below are the supported values and their typical use cases:
- `RETRIEVAL_QUERY`: Optimize embeddings for search or query text.
- `RETRIEVAL_DOCUMENT`: Optimize embeddings for documents in a retrieval corpus.
- `SEMANTIC_SIMILARITY`: Generate embeddings for measuring semantic similarity between texts.
- `CLASSIFICATION`: Optimize embeddings for text classification tasks.
- `CLUSTERING`: Generate embeddings suitable for clustering or grouping similar content.
- `QUESTION_ANSWERING`: Optimize embeddings for question-answering datasets.
- `FACT_VERIFICATION`: Generate embeddings to support fact-checking or verification pipelines.
- `CODE_RETRIEVAL_QUERY`: Optimize embeddings for code or developer-related retrieval queries.
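
When a task type comes from configuration or an environment variable, it can help to validate it against the values listed above before passing it on. The following is a minimal sketch of that idea. `SKETCH_TASK_TYPES`, `SketchTaskType`, `isSketchTaskType`, and `parseSketchTaskType` are hypothetical names defined locally for illustration; the real `GoogleEmbeddingsTaskType` type lives in `@langchain/google-common` and may be declared differently.

```typescript
// Locally defined list mirroring the task types named in this description.
// This is an illustrative stand-in, not the library's own definition.
const SKETCH_TASK_TYPES = [
  "RETRIEVAL_QUERY",
  "RETRIEVAL_DOCUMENT",
  "SEMANTIC_SIMILARITY",
  "CLASSIFICATION",
  "CLUSTERING",
  "QUESTION_ANSWERING",
  "FACT_VERIFICATION",
  "CODE_RETRIEVAL_QUERY",
] as const;

// Union of the string literals above.
type SketchTaskType = (typeof SKETCH_TASK_TYPES)[number];

// Type guard: true only for an exact, case-sensitive match.
function isSketchTaskType(value: string): value is SketchTaskType {
  return SKETCH_TASK_TYPES.some((taskType) => taskType === value);
}

// Parse an optional raw string; undefined passes through unchanged,
// and any unrecognized value raises an error.
function parseSketchTaskType(
  value: string | undefined
): SketchTaskType | undefined {
  if (value === undefined) return undefined;
  if (!isSketchTaskType(value)) {
    throw new Error(`Unsupported task type: ${value}`);
  }
  return value;
}

const parsed = parseSketchTaskType("SEMANTIC_SIMILARITY");
console.log(parsed);
```

Failing early on an unrecognized string surfaces a typo at startup instead of at request time.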