I know Triton Inference Server supports serving LoRA models, but it seems this is only supported for LLMs with the vLLM and TensorRT-LLM backends.
Our use case is:
We have a foundation model for EEG data, which we fine-tune on different tasks with the PEFT library from Hugging Face.
The output is:
- the base model
- many LoRA adapters, one per task
How can we run inference with Triton Server without duplicating the base model's weights? Something along the lines of the sketch below is what we have in mind.
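To clarify what we're after, here is a minimal sketch of a Triton Python-backend `model.py` that loads the base model once and switches PEFT adapters per request. Everything specific here is a hypothetical assumption on our side: the model name `eeg_peft`, the tensor names `INPUT`/`TASK`/`OUTPUT`, the adapter paths, and the forward call (a real EEG foundation model would have its own input signature).

```python
# model.py for a hypothetical Triton Python-backend model named "eeg_peft".
# Sketch only: one base model in memory, one LoRA adapter per task,
# adapter selected per request instead of duplicating the base weights.
import torch
import triton_python_backend_utils as pb_utils
from peft import PeftModel
from transformers import AutoModel


class TritonPythonModel:
    def initialize(self, args):
        # Hypothetical layout: base checkpoint plus one LoRA directory per task.
        base = AutoModel.from_pretrained("/models/eeg_peft/1/base")
        self.model = PeftModel.from_pretrained(
            base, "/models/eeg_peft/1/adapters/task_a", adapter_name="task_a"
        )
        # Additional adapters reuse the frozen base weights already loaded.
        self.model.load_adapter(
            "/models/eeg_peft/1/adapters/task_b", adapter_name="task_b"
        )
        self.model.eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            # "INPUT" (FP32 EEG tensor) and "TASK" (BYTES adapter name) are
            # hypothetical names that would be declared in config.pbtxt.
            eeg = pb_utils.get_input_tensor_by_name(request, "INPUT").as_numpy()
            task = pb_utils.get_input_tensor_by_name(request, "TASK").as_numpy()[0].decode()

            # Activate the requested task's LoRA weights.
            self.model.set_adapter(task)
            with torch.no_grad():
                # Placeholder forward pass; the real call depends on how the
                # EEG foundation model consumes its input.
                out = self.model(torch.from_numpy(eeg)).last_hidden_state

            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor("OUTPUT", out.numpy())
            ]))
        return responses
```

One caveat we see with this approach: `set_adapter` switches the active LoRA for the whole model, so requests targeting different tasks can't share a batch. That is part of why we're asking whether Triton offers a better-supported multi-LoRA path outside the vLLM/TensorRT-LLM backends.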