embedDocuments() batch support and possibly large document splitting #390

@krodyrobi

Description

Describe the Problem

The OpenAI APIs limit both the number of elements that can be embedded in a single request and the size of each element. To work around this, the JS SDK and the Python SDK provide the following:

  • batching when the number of documents exceeds 2048 (Promise.all over the batched API calls)
  • splitting large (>~8000-token) documents into sections and then joining the retrieved embeddings in a normalized fashion (the JS SDK does not do this, but the Python one does)
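The batching behavior described above could be sketched as follows. This is a minimal illustration, not the SDK's actual code; the `embedBatch` callback and the `embedDocuments` helper name are hypothetical stand-ins for whatever the SDK's embedding client exposes.

```typescript
// Hypothetical batching helper. The 2048 limit comes from the issue text;
// the function names are illustrative, not a confirmed SDK API.
const MAX_BATCH_SIZE = 2048;

// Split `items` into chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Embed all documents, issuing one API call per batch of <= 2048 documents
// and running the calls concurrently with Promise.all, then flattening the
// per-batch results back into one embedding per input document.
async function embedDocuments(
  documents: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const batches = chunk(documents, MAX_BATCH_SIZE);
  const results = await Promise.all(batches.map(embedBatch));
  return results.flat();
}
```

With 5000 documents this would issue three concurrent calls (2048 + 2048 + 904 documents), preserving input order in the flattened result.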

Propose a Solution

Base the implementation of the SDK chat model on @langchain/oneai models.
For the per-document split, I believe this needs a custom implementation on top of the core SDK; I didn't see it in LangChain JS.
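A custom per-document split could follow what the Python SDK does: cut the token sequence into windows that fit the model limit, embed each window, then combine the window embeddings by a length-weighted average and L2-normalize the result. The sketch below assumes hypothetical names (`embedChunk`, `embedLongDocument`) and a pre-tokenized input; it is not an existing SDK API.

```typescript
// Illustrative only: token limit and combination strategy mirror the
// behavior described for the Python SDK, with hypothetical names.
const MAX_TOKENS = 8191;

// Split a token sequence into consecutive windows of at most `size` tokens.
function splitTokens(tokens: number[], size: number): number[][] {
  const out: number[][] = [];
  for (let i = 0; i < tokens.length; i += size) {
    out.push(tokens.slice(i, i + size));
  }
  return out;
}

// Embed an over-long document: embed each token window separately, then
// take the length-weighted mean of the window embeddings and L2-normalize.
async function embedLongDocument(
  tokens: number[],
  embedChunk: (chunk: number[]) => Promise<number[]>,
  maxTokens: number = MAX_TOKENS,
): Promise<number[]> {
  const chunks = splitTokens(tokens, maxTokens);
  const embeddings = await Promise.all(chunks.map(embedChunk));
  const dim = embeddings[0].length;
  const combined = new Array<number>(dim).fill(0);
  let totalWeight = 0;
  for (let i = 0; i < embeddings.length; i++) {
    const w = chunks[i].length; // weight by chunk length in tokens
    totalWeight += w;
    for (let j = 0; j < dim; j++) {
      combined[j] += embeddings[i][j] * w;
    }
  }
  const mean = combined.map((v) => v / totalWeight);
  // L2-normalize so the joined embedding has unit length
  const norm = Math.sqrt(mean.reduce((s, v) => s + v * v, 0)) || 1;
  return mean.map((v) => v / norm);
}
```

Weighting by chunk length keeps a short trailing window from dominating the result, and the final normalization keeps the joined embedding comparable (via cosine similarity) to embeddings of documents that fit in a single request.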

Describe Alternatives

Create a wrapper over the existing model.

Affected Development Phase

Development

Impact

Inconvenience

Timeline

No response

Additional Context

No response
