I know Triton Inference Server supports serving LoRA models, but it seems this is only supported for LLMs with the vLLM and TensorRT-LLM backends.
Our use case is:
We have a foundation model for EEG data, which we fine-tune on different tasks with the PEFT library from Hugging Face.
The output is:
- the base model
- many LoRA adapters, one per task
How can we run inference with Triton Server without duplicating the base model's weights? Something along the lines of the sketch below is what we have in mind.
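To clarify what we're after, here is a minimal sketch of a Triton Python-backend `model.py` that loads the base model once and switches PEFT adapters per request. Everything specific here is a hypothetical assumption on our side: the model name `eeg_peft`, the tensor names `INPUT`/`TASK`/`OUTPUT`, the adapter paths, and the forward call (a real EEG foundation model would have its own input signature).

```python
# model.py for a hypothetical Triton Python-backend model named "eeg_peft".
# Sketch only: one base model in memory, one LoRA adapter per task,
# adapter selected per request instead of duplicating the base weights.
import torch
import triton_python_backend_utils as pb_utils
from peft import PeftModel
from transformers import AutoModel


class TritonPythonModel:
    def initialize(self, args):
        # Hypothetical layout: base checkpoint plus one LoRA directory per task.
        base = AutoModel.from_pretrained("/models/eeg_peft/1/base")
        self.model = PeftModel.from_pretrained(
            base, "/models/eeg_peft/1/adapters/task_a", adapter_name="task_a"
        )
        # Additional adapters reuse the frozen base weights already loaded.
        self.model.load_adapter(
            "/models/eeg_peft/1/adapters/task_b", adapter_name="task_b"
        )
        self.model.eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            # "INPUT" (FP32 EEG tensor) and "TASK" (BYTES adapter name) are
            # hypothetical names that would be declared in config.pbtxt.
            eeg = pb_utils.get_input_tensor_by_name(request, "INPUT").as_numpy()
            task = pb_utils.get_input_tensor_by_name(request, "TASK").as_numpy()[0].decode()

            # Activate the requested task's LoRA weights.
            self.model.set_adapter(task)
            with torch.no_grad():
                # Placeholder forward pass; the real call depends on how the
                # EEG foundation model consumes its input.
                out = self.model(torch.from_numpy(eeg)).last_hidden_state

            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor("OUTPUT", out.numpy())
            ]))
        return responses
```

One caveat we see with this approach: `set_adapter` switches the active LoRA for the whole model, so requests targeting different tasks can't share a batch. That is part of why we're asking whether Triton offers a better-supported multi-LoRA path outside the vLLM/TensorRT-LLM backends.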