
Commit 6861772

Author: Paweł Kędzia (committed)
Merge branch 'features/balancing'
2 parents 7093dde + 73fe61d

File tree: 10 files changed (+270, −57 lines)

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,4 +7,4 @@
 | 0.0.3 | Proper `AutoLoading` for each found endpoint. Implementation of `ApiTypesDispatcher`, `ApiModelConfig`, `ModelHandler`. Ollama endpoints: `/`, `tags`. Added endpoint to full proxy with params. Streaming in case when external api provides stream. |
 | 0.0.4 | All llama-service endpoints are refactored to `llm-proxy-api`. Refactoring base `ep_run` method. Proper handling system message, prompt name, model etc. |
 | 0.1.0 | Repository name changed from `llm-proxy-api` to `llm-router`. Added class `HttpRequestExecutor` to handle http requests from `EndpointWithHttpRequestI`. Handled routing between any models: `openai -> ollama` and `ollama -> openai` |
-| 0.1.1 | Prometheus metrics logging. Workers/Threads/Workers class is able to set by environments. Streaming fixes. |
+| 0.1.1 | Prometheus metrics logging. Workers/Threads/Workers class is able to set by environments. Streaming fixes. Multi-providers for single model with default-balanced strategy. |
```

README.md

Lines changed: 84 additions & 2 deletions
````diff
@@ -14,13 +14,19 @@ allowing your application to talk to any supported LLM through a single, consist
 | **Unified REST interface** | One endpoint schema works for OpenAI‑compatible, Ollama, vLLM and any future provider. |
 | **Provider‑agnostic streaming** | The `stream` flag (default `true`) controls whether the proxy forwards **chunked** responses as they arrive or returns a **single** aggregated payload. |
 | **Built‑in prompt library** | Language‑aware system prompts stored under `resources/prompts` can be referenced automatically. |
-| **Dynamic model configuration** | JSON file (`models-config.json`) defines provider, model name, default options and per‑model overrides. |
-| **Pluggable providers** | New providers are added by implementing the `BaseProvider` interface in `llm_proxy_rest/core/api_types`. |
+| **Dynamic model configuration** | JSON file (`models-config.json`) defines providers, model name, default options and per‑model overrides. |
 | **Request validation** | Pydantic models guarantee correct payloads; errors are returned with clear messages. |
 | **Structured logging** | Configurable log level, filename, and optional JSON formatting. |
 | **Health & metadata endpoints** | `/ping` (simple 200 OK) and `/tags` (available model tags/metadata). |
 | **Simple deployment** | One‑liner run script or `python -m llm_proxy_rest.rest_api`. |
 | **Extensible conversation formats** | Basic chat, conversation with system prompt, and extended conversation with richer options (e.g., temperature, top‑k, custom system prompt). |
+| **Multi‑provider model support** | Each model can be backed by multiple providers (VLLM, Ollama, OpenAI) defined in `models-config.json`. |
+| **Provider selection abstraction** | `ProviderChooser` delegates to a configurable strategy, enabling easy swapping of load‑balancing, round‑robin, weighted‑random, etc. |
+| **Load‑balanced default strategy** | `LoadBalancedStrategy` distributes requests evenly across providers using in‑memory usage counters. |
+| **Dynamic model handling** | `ModelHandler` loads model definitions at runtime and resolves the appropriate provider per request. |
+| **Pluggable endpoint architecture** | Automatic discovery and registration of all concrete `EndpointI` implementations via `EndpointAutoLoader`. |
+| **Prometheus metrics integration** | Optional `/metrics` endpoint for latency, error counts, and provider usage statistics. |
+| **Docker ready** | Dockerfile and scripts for containerised deployment. |
 
 ---
 
@@ -150,6 +156,82 @@ LLM_ROUTER_MINIMUM=1 python3 -m llm_router_api.rest_api
 
 ---
 
+## Provider Selection
+
+The LLM‑router supports **multiple providers** for a single model. Provider selection is handled by
+the **ProviderChooser** class, which delegates the choice to a configurable **strategy** implementation.
+
+### Chooser
+
+``` python
+from llm_router_api.base.lb.chooser import ProviderChooser
+from llm_router_api.base.lb.strategy import LoadBalancedStrategy
+
+# By default the chooser uses the LoadBalancedStrategy
+provider_chooser = ProviderChooser(strategy=LoadBalancedStrategy())
+```
+
+`ProviderChooser.get_provider(model_name, providers)` receives the model name
+and the list of provider configurations (as defined in `models-config.json`) and
+returns the chosen provider dictionary.
+
+### Strategy Interface
+
+All strategies must implement `ChooseProviderStrategyI`:
+
+``` python
+class ChooseProviderStrategyI(ABC):
+    @abstractmethod
+    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
+        """Select one provider configuration for the given model."""
+        raise NotImplementedError
+```
+
+### Built‑in Strategy: LoadBalancedStrategy
+
+The default `LoadBalancedStrategy` distributes requests evenly across providers
+by keeping an in‑memory usage counter per model/provider pair.
+
+``` python
+class LoadBalancedStrategy(ChooseProviderStrategyI):
+    def __init__(self) -> None:
+        self._usage_counters: Dict[str, Dict[str, int]] = defaultdict(
+            lambda: defaultdict(int)
+        )
+
+    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
+        # selects the provider with the smallest usage count
+        ...
+```
+
+### Current Setting
+
+In **`engine.py`** the Flask engine creates the chooser like this:
+
+``` python
+self._provider_chooser = ProviderChooser(strategy=LoadBalancedStrategy())
+```
+
+Therefore, unless overridden, the application uses the **load‑balanced** provider
+selection strategy out of the box.
+
+### Extending with Custom Strategies
+
+To use a different strategy (e.g., round‑robin, random weighted, latency‑based),
+implement `ChooseProviderStrategyI` and pass the instance to `ProviderChooser`:
+
+``` python
+from llm_router_api.base.lb.chooser import ProviderChooser
+from my_strategies import RoundRobinStrategy
+
+chooser = ProviderChooser(strategy=RoundRobinStrategy())
+```
+
+The rest of the code – `ModelHandler`, endpoint implementations, etc. – will
+automatically use the chooser you provide.
+
+---
+
 ## 🛣️ Endpoints Overview
 
 All endpoints are exposed under the REST API service. Unless stated otherwise, methods are POST and consume/produce
````
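The "Extending with Custom Strategies" subsection above imports `RoundRobinStrategy` from a `my_strategies` module that is not part of this commit. A minimal sketch of what such a strategy could look like, assuming only the `ChooseProviderStrategyI` interface added below in `strategy.py` (the module name and implementation are hypothetical):

``` python
from collections import defaultdict
from itertools import count
from typing import Dict, List

from llm_router_api.base.lb.strategy import ChooseProviderStrategyI


class RoundRobinStrategy(ChooseProviderStrategyI):
    """Hypothetical strategy: cycle through providers in a fixed order per model."""

    def __init__(self) -> None:
        # One monotonically increasing counter per model name.
        self._counters: Dict[str, count] = defaultdict(count)

    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
        if not providers:
            raise ValueError(f"No providers configured for model '{model_name}'")
        # Next provider index, wrapping around the provider list.
        return providers[next(self._counters[model_name]) % len(providers)]
```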

llm_router_api/base/lb/__init__.py

Whitespace-only changes.

llm_router_api/base/lb/chooser.py

Lines changed: 17 additions & 0 deletions
```diff
@@ -0,0 +1,17 @@
+from typing import List, Dict, Optional
+
+from llm_router_api.base.lb.strategy import (
+    ChooseProviderStrategyI,
+    LoadBalancedStrategy,
+)
+
+
+class ProviderChooser:
+
+    def __init__(self, strategy: Optional[ChooseProviderStrategyI] = None) -> None:
+        self.strategy: ChooseProviderStrategyI = strategy or LoadBalancedStrategy()
+
+    def get_provider(self, model_name: str, providers: List[Dict]) -> Dict:
+        if not providers:
+            raise RuntimeError(f"{model_name} does not have any providers!")
+        return self.strategy.choose(model_name, providers)
```
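For illustration, a minimal usage sketch of `ProviderChooser.get_provider` (the model name and provider dictionaries are hypothetical; the `id`/`api_host`/`api_type` keys follow the fields read elsewhere in this commit):

``` python
from llm_router_api.base.lb.chooser import ProviderChooser

chooser = ProviderChooser()  # no strategy given -> falls back to LoadBalancedStrategy

providers = [
    {"id": "bielik_ollama", "api_host": "http://10.0.0.1:11434", "api_type": "ollama"},
    {"id": "bielik_vllm", "api_host": "http://10.0.0.2:8000", "api_type": "vllm"},
]

cfg = chooser.get_provider("bielik", providers)
print(cfg["id"])  # "bielik_ollama": with fresh counters the first provider wins

# chooser.get_provider("bielik", [])  # would raise RuntimeError: no providers
```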

llm_router_api/base/lb/strategy.py

Lines changed: 41 additions & 0 deletions
```diff
@@ -0,0 +1,41 @@
+from abc import ABC, abstractmethod
+from collections import defaultdict
+from typing import List, Dict
+
+
+class ChooseProviderStrategyI(ABC):
+
+    @staticmethod
+    def _provider_key(provider_cfg: Dict) -> str:
+        return provider_cfg.get("id") or provider_cfg.get("api_host", "unknown")
+
+    @abstractmethod
+    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
+        raise NotImplementedError
+
+
+class LoadBalancedStrategy(ChooseProviderStrategyI):
+
+    def __init__(self) -> None:
+        self._usage_counters: Dict[str, Dict[str, int]] = defaultdict(
+            lambda: defaultdict(int)
+        )
+
+    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
+        if not providers:
+            raise ValueError(f"No providers configured for model '{model_name}'")
+
+        min_used = None
+        chosen_cfg = None
+        for cfg in providers:
+            key = self._provider_key(cfg)
+            used = self._usage_counters[model_name][key]
+
+            if min_used is None or used < min_used:
+                min_used = used
+                chosen_cfg = cfg
+
+        chosen_key = self._provider_key(chosen_cfg)
+        self._usage_counters[model_name][chosen_key] += 1
+
+        return chosen_cfg
```
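A short sketch of the balancing behaviour: because `choose` picks the provider with the smallest usage count and then increments it, repeated calls for the same model alternate across providers (the provider dictionaries are hypothetical):

``` python
from llm_router_api.base.lb.strategy import LoadBalancedStrategy

strategy = LoadBalancedStrategy()
providers = [{"id": "a", "api_host": "http://a"}, {"id": "b", "api_host": "http://b"}]

picks = [strategy.choose("demo", providers)["id"] for _ in range(4)]
print(picks)  # ['a', 'b', 'a', 'b'] -- usage counters keep the spread even
```

Note that the counters live in process memory, so the spread is per worker process and resets on restart.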

llm_router_api/base/model_config.py

Lines changed: 12 additions & 5 deletions
```diff
@@ -3,7 +3,7 @@
 """
 
 import json
-from typing import Dict
+from typing import Dict, List
 
 
 class ApiModelConfig:
@@ -68,13 +68,20 @@ def _read_active_models(self) -> Dict[str, str]:
     def _active_models_configuration(self) -> Dict:
         """
         Build a dictionary containing the configuration for each active model.
-
+        Now each model maps to a **list** of provider configurations.
         Returns:
-            Dict[str, Dict]: Mapping of model name to its configuration dictionary.
+            Dict[str, List[Dict]]: Mapping of model name to a list of provider dicts.
         """
-        models_configuration = {}
+        models_configuration: Dict[str, List[Dict]] = {}
         models_json = json.load(open(self.models_config_path, "rt"))
         for m_type, models_list in self.active_models.items():
             for m_name in models_list:
-                models_configuration[m_name] = models_json[m_type][m_name]
+                provider_cfg = models_json[m_type][m_name]
+                if "providers" not in provider_cfg:
+                    raise KeyError(f"{m_type}:{m_name} has no providers!")
+                # if "api_host" in provider_cfg:
+                #     provider_cfg["id"] = f"{m_name}_{provider_cfg['api_host']}"
+                #     provider_cfg = {"providers": [provider_cfg]}
+                models_configuration[m_name] = provider_cfg
+
         return models_configuration
```
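The loader now requires a `providers` key for every active model. A hypothetical `models-config.json` entry consistent with the fields this commit reads (`id`, `api_host`, `api_type`, optional `api_token`, `input_size`, `model_path`); the model name, hosts, and top-level grouping are invented for illustration:

``` json
{
  "ollama": {
    "bielik": {
      "providers": [
        {"id": "bielik_ollama_1", "api_host": "http://10.0.0.1:11434", "api_type": "ollama", "input_size": 8192},
        {"id": "bielik_vllm_1", "api_host": "http://10.0.0.2:8000", "api_type": "vllm", "input_size": 8192}
      ]
    }
  }
}
```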

llm_router_api/base/model_handler.py

Lines changed: 16 additions & 13 deletions
```diff
@@ -10,6 +10,7 @@
 from dataclasses import dataclass
 from typing import Dict, Optional, Any
 
+from llm_router_api.base.lb.chooser import ProviderChooser
 from llm_router_api.base.model_config import ApiModelConfig
 
 
@@ -34,6 +35,7 @@ class ApiModel:
     Optional path to model (in case when local model is used)
     """
 
+    id: str
     name: str
     api_host: str
     api_type: str
@@ -64,18 +66,13 @@ def from_config(name: str, cfg: Dict) -> "ApiModel":
         The "input_size" value can be an integer or a numeric string;
         it will be converted to an int. If conversion fails, defaults to 0.
         """
-        raw_size = cfg.get("input_size", 0)
-        try:
-            input_size = int(raw_size)
-        except (TypeError, ValueError):
-            input_size = 0
-
         return ApiModel(
+            id=str(cfg["id"]),
             name=name,
-            api_host=str(cfg.get("api_host", "")),
-            api_type=str(cfg.get("api_type")),
+            api_host=str(cfg["api_host"]),
+            api_type=str(cfg["api_type"]),
             api_token=str(cfg.get("api_token", "")),
-            input_size=input_size,
+            input_size=int(cfg.get("input_size", 0)),
             model_path=str(cfg.get("model_path", "")),
         )
 
@@ -108,7 +105,7 @@ class ModelHandler:
     Loader responsible for reading and exposing model configuration.
     """
 
-    def __init__(self, models_config_path: str):
+    def __init__(self, models_config_path: str, provider_chooser: ProviderChooser):
         """
         Initialize the handler with the provided configuration path.
 
@@ -117,6 +114,7 @@ def __init__(self, models_config_path: str):
         models_config_path : str
             Path to the JSON configuration file containing model definitions.
         """
+        self.provider_chooser = provider_chooser
         self.api_model_config: ApiModelConfig = ApiModelConfig(models_config_path)
 
     def get_model(self, model_name: str) -> Optional[ApiModel]:
@@ -133,10 +131,15 @@ def get_model(self, model_name: str) -> Optional[ApiModel]:
         Optional[ApiModel]
             ApiModel instance if found; otherwise, None.
         """
-        cfg = self.api_model_config.models_configs.get(model_name)
-        if cfg is None:
+        providers = self.api_model_config.models_configs[model_name].get(
+            "providers", []
+        )
+        model_host_cfg = self.provider_chooser.get_provider(
+            model_name=model_name, providers=providers
+        )
+        if model_host_cfg is None:
             return None
-        return ApiModel.from_config(model_name, cfg)
+        return ApiModel.from_config(model_name, model_host_cfg)
 
     def list_active_models(self) -> Dict[str, Any]:
         """
```

llm_router_api/core/engine.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -23,6 +23,8 @@
 from flask import Flask, Response
 from typing import List, Type, Optional
 
+from llm_router_api.base.lb.chooser import ProviderChooser
+from llm_router_api.base.lb.strategy import LoadBalancedStrategy
 from llm_router_api.endpoints.endpoint_i import EndpointI
 from llm_router_api.register.auto_loader import EndpointAutoLoader
 from llm_router_api.register.register import FlaskEndpointRegistrar
@@ -101,6 +103,11 @@ def __init__(
         self.logger_level = logger_level
         self.logger_file_name = logger_file_name
 
+        # NOTE: Currently LoadBalancedStrategy is implemented. Should be replaced
+        # NOTE: in the future when new strategies will be implemented.
+        # IDEA: Provider should be configurable by ENV value
+        self._provider_chooser = ProviderChooser(strategy=LoadBalancedStrategy())
+
     def prepare_flask_app(
         self,
     ):
@@ -176,6 +183,7 @@ def __auto_load_endpoints(self, base_class: Type[EndpointI]):
             base_class=base_class,
             prompts_dir=self.prompts_dir,
             models_config_path=self.models_config_path,
+            provider_chooser=self._provider_chooser,
             logger_file_name=self.logger_file_name,
             logger_level=self.logger_level,
         )
```
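The `IDEA` note suggests selecting the strategy via an environment variable. One possible wiring, sketched under the assumption that more strategies will exist later (the `LLM_ROUTER_LB_STRATEGY` variable name and the registry are invented; only `LoadBalancedStrategy` ships in this commit):

``` python
import os
from typing import Dict, Type

from llm_router_api.base.lb.chooser import ProviderChooser
from llm_router_api.base.lb.strategy import ChooseProviderStrategyI, LoadBalancedStrategy

# Hypothetical registry; new strategies would be registered here as they land.
_STRATEGIES: Dict[str, Type[ChooseProviderStrategyI]] = {
    "load_balanced": LoadBalancedStrategy,
}


def chooser_from_env() -> ProviderChooser:
    name = os.getenv("LLM_ROUTER_LB_STRATEGY", "load_balanced")
    return ProviderChooser(strategy=_STRATEGIES.get(name, LoadBalancedStrategy)())
```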

llm_router_api/register/auto_loader.py

Lines changed: 5 additions & 1 deletion
```diff
@@ -8,6 +8,7 @@
 from rdl_ml_utils.utils.logger import prepare_logger
 from rdl_ml_utils.handlers.prompt_handler import PromptHandler
 
+from llm_router_api.base.lb.chooser import ProviderChooser
 from llm_router_api.endpoints.passthrough import PassthroughI
 from llm_router_api.endpoints.endpoint_i import EndpointI, EndpointWithHttpRequestI
 from llm_router_api.base.model_handler import ModelHandler
@@ -29,6 +30,7 @@ def __init__(
         base_class: Type[EndpointI],
         prompts_dir: str,
         models_config_path: str,
+        provider_chooser: ProviderChooser,
         logger_file_name: Optional[str] = None,
         logger_level: Optional[str] = REST_API_LOG_LEVEL,
     ):
@@ -48,7 +50,9 @@
         self.prompts_dir = prompts_dir
 
         self._prompt_handler = PromptHandler(base_dir=prompts_dir)
-        self._model_handler = ModelHandler(models_config_path=models_config_path)
+        self._model_handler = ModelHandler(
+            models_config_path=models_config_path, provider_chooser=provider_chooser
+        )
 
         self._logger_level = logger_level
         self._logger_file_name = logger_file_name
```
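Since the loader simply threads the chooser down to `ModelHandler`, constructing it by hand with a custom strategy is straightforward. A sketch based on the constructor signature above (paths are hypothetical):

``` python
from llm_router_api.base.lb.chooser import ProviderChooser
from llm_router_api.endpoints.endpoint_i import EndpointI
from llm_router_api.register.auto_loader import EndpointAutoLoader

loader = EndpointAutoLoader(
    base_class=EndpointI,
    prompts_dir="resources/prompts",          # hypothetical path
    models_config_path="models-config.json",  # hypothetical path
    provider_chooser=ProviderChooser(),       # or ProviderChooser(strategy=...)
)
```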
