
Commit 6861772

Author: Paweł Kędzia (committed)
Merge branch 'features/balancing'
2 parents 7093dde + 73fe61d

File tree: 10 files changed (+270, −57 lines)

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,4 +7,4 @@
 | 0.0.3 | Proper `AutoLoading` for each found endpoint. Implementation of `ApiTypesDispatcher`, `ApiModelConfig`, `ModelHandler`. Ollama endpoints: `/`, `tags`. Added endpoint to full proxy with params. Streaming in case when external api provides stream. |
 | 0.0.4 | All llama-service endpoints are refactored to `llm-proxy-api`. Refactoring base `ep_run` method. Proper handling system message, prompt name, model etc. |
 | 0.1.0 | Repository name changed from `llm-proxy-api` to `llm-router`. Added class `HttpRequestExecutor` to handle http requests from `EndpointWithHttpRequestI`. Handled routing between any models: `openai -> ollama` and `ollama -> openai` |
-| 0.1.1 | Prometheus metrics logging. Workers/Threads/Workers class is able to set by environments. Streaming fixes. |
+| 0.1.1 | Prometheus metrics logging. Workers/Threads/Workers class is able to set by environments. Streaming fixes. Multi-providers for single model with default-balanced strategy. |
```

README.md

Lines changed: 84 additions & 2 deletions
````diff
@@ -14,13 +14,19 @@ allowing your application to talk to any supported LLM through a single, consist
 | **Unified REST interface** | One endpoint schema works for OpenAI‑compatible, Ollama, vLLM and any future provider. |
 | **Provider‑agnostic streaming** | The `stream` flag (default `true`) controls whether the proxy forwards **chunked** responses as they arrive or returns a **single** aggregated payload. |
 | **Built‑in prompt library** | Language‑aware system prompts stored under `resources/prompts` can be referenced automatically. |
-| **Dynamic model configuration** | JSON file (`models-config.json`) defines provider, model name, default options and per‑model overrides. |
-| **Pluggable providers** | New providers are added by implementing the `BaseProvider` interface in `llm_proxy_rest/core/api_types`. |
+| **Dynamic model configuration** | JSON file (`models-config.json`) defines providers, model name, default options and per‑model overrides. |
 | **Request validation** | Pydantic models guarantee correct payloads; errors are returned with clear messages. |
 | **Structured logging** | Configurable log level, filename, and optional JSON formatting. |
 | **Health & metadata endpoints** | `/ping` (simple 200 OK) and `/tags` (available model tags/metadata). |
 | **Simple deployment** | One‑liner run script or `python -m llm_proxy_rest.rest_api`. |
 | **Extensible conversation formats** | Basic chat, conversation with system prompt, and extended conversation with richer options (e.g., temperature, top‑k, custom system prompt). |
+| **Multi‑provider model support** | Each model can be backed by multiple providers (VLLM, Ollama, OpenAI) defined in `models-config.json`. |
+| **Provider selection abstraction** | `ProviderChooser` delegates to a configurable strategy, enabling easy swapping of load‑balancing, round‑robin, weighted‑random, etc. |
+| **Load‑balanced default strategy** | `LoadBalancedStrategy` distributes requests evenly across providers using in‑memory usage counters. |
+| **Dynamic model handling** | `ModelHandler` loads model definitions at runtime and resolves the appropriate provider per request. |
+| **Pluggable endpoint architecture** | Automatic discovery and registration of all concrete `EndpointI` implementations via `EndpointAutoLoader`. |
+| **Prometheus metrics integration** | Optional `/metrics` endpoint for latency, error counts, and provider usage statistics. |
+| **Docker ready** | Dockerfile and scripts for containerised deployment. |
 
 ---
 
@@ -150,6 +156,82 @@ LLM_ROUTER_MINIMUM=1 python3 -m llm_router_api.rest_api
 
 ---
 
+## Provider Selection
+
+The LLM‑router supports **multiple providers** for a single model. Provider selection is handled by
+the **ProviderChooser** class, which delegates the choice to a configurable **strategy** implementation.
+
+### Chooser
+
+``` python
+from llm_router_api.base.lb.chooser import ProviderChooser
+from llm_router_api.base.lb.strategy import LoadBalancedStrategy
+
+# By default the chooser uses the LoadBalancedStrategy
+provider_chooser = ProviderChooser(strategy=LoadBalancedStrategy())
+```
+
+`ProviderChooser.get_provider(model_name, providers)` receives the model name
+and the list of provider configurations (as defined in `models-config.json`) and
+returns the chosen provider dictionary.
+
+### Strategy Interface
+
+All strategies must implement `ChooseProviderStrategyI`:
+
+``` python
+class ChooseProviderStrategyI(ABC):
+    @abstractmethod
+    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
+        """Select one provider configuration for the given model."""
+        raise NotImplementedError
+```
+
+### Built‑in Strategy: LoadBalancedStrategy
+
+The default `LoadBalancedStrategy` distributes requests evenly across providers
+by keeping an in‑memory usage counter per model/provider pair.
+
+``` python
+class LoadBalancedStrategy(ChooseProviderStrategyI):
+    def __init__(self) -> None:
+        self._usage_counters: Dict[str, Dict[str, int]] = defaultdict(
+            lambda: defaultdict(int)
+        )
+
+    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
+        # selects the provider with the smallest usage count
+        ...
+```
+
+### Current Setting
+
+In **`engine.py`** the Flask engine creates the chooser like this:
+
+``` python
+self._provider_chooser = ProviderChooser(strategy=LoadBalancedStrategy())
+```
+
+Therefore, unless overridden, the application uses the **load‑balanced** provider
+selection strategy out of the box.
+
+### Extending with Custom Strategies
+
+To use a different strategy (e.g., round‑robin, random weighted, latency‑based),
+implement `ChooseProviderStrategyI` and pass the instance to `ProviderChooser`:
+
+``` python
+from llm_router_api.base.lb.chooser import ProviderChooser
+from my_strategies import RoundRobinStrategy
+
+chooser = ProviderChooser(strategy=RoundRobinStrategy())
+```
+
+The rest of the code – `ModelHandler`, endpoint implementations, etc. – will
+automatically use the chooser you provide.
+
+---
+
 ## 🛣️ Endpoints Overview
 
 All endpoints are exposed under the REST API service. Unless stated otherwise, methods are POST and consume/produce
````
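The "Extending with Custom Strategies" subsection above imports `RoundRobinStrategy` from a `my_strategies` module that is not part of this commit. A minimal sketch of what such a strategy could look like, assuming only the `ChooseProviderStrategyI` interface added below in `strategy.py` (the module name and implementation are hypothetical):

``` python
from collections import defaultdict
from itertools import count
from typing import Dict, List

from llm_router_api.base.lb.strategy import ChooseProviderStrategyI


class RoundRobinStrategy(ChooseProviderStrategyI):
    """Hypothetical strategy: cycle through providers in a fixed order per model."""

    def __init__(self) -> None:
        # One monotonically increasing counter per model name.
        self._counters: Dict[str, count] = defaultdict(count)

    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
        if not providers:
            raise ValueError(f"No providers configured for model '{model_name}'")
        # Next provider index, wrapping around the provider list.
        return providers[next(self._counters[model_name]) % len(providers)]
```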

llm_router_api/base/lb/__init__.py

Whitespace-only changes.

llm_router_api/base/lb/chooser.py

Lines changed: 17 additions & 0 deletions
```diff
@@ -0,0 +1,17 @@
+from typing import List, Dict, Optional
+
+from llm_router_api.base.lb.strategy import (
+    ChooseProviderStrategyI,
+    LoadBalancedStrategy,
+)
+
+
+class ProviderChooser:
+
+    def __init__(self, strategy: Optional[ChooseProviderStrategyI] = None) -> None:
+        self.strategy: ChooseProviderStrategyI = strategy or LoadBalancedStrategy()
+
+    def get_provider(self, model_name: str, providers: List[Dict]) -> Dict:
+        if not providers:
+            raise RuntimeError(f"{model_name} does not have any providers!")
+        return self.strategy.choose(model_name, providers)
```
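For illustration, a minimal usage sketch of `ProviderChooser.get_provider` (the model name and provider dictionaries are hypothetical; the `id`/`api_host`/`api_type` keys follow the fields read elsewhere in this commit):

``` python
from llm_router_api.base.lb.chooser import ProviderChooser

chooser = ProviderChooser()  # no strategy given -> falls back to LoadBalancedStrategy

providers = [
    {"id": "bielik_ollama", "api_host": "http://10.0.0.1:11434", "api_type": "ollama"},
    {"id": "bielik_vllm", "api_host": "http://10.0.0.2:8000", "api_type": "vllm"},
]

cfg = chooser.get_provider("bielik", providers)
print(cfg["id"])  # "bielik_ollama": with fresh counters the first provider wins

# chooser.get_provider("bielik", [])  # would raise RuntimeError: no providers
```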

llm_router_api/base/lb/strategy.py

Lines changed: 41 additions & 0 deletions
```diff
@@ -0,0 +1,41 @@
+from abc import ABC, abstractmethod
+from collections import defaultdict
+from typing import List, Dict
+
+
+class ChooseProviderStrategyI(ABC):
+
+    @staticmethod
+    def _provider_key(provider_cfg: Dict) -> str:
+        return provider_cfg.get("id") or provider_cfg.get("api_host", "unknown")
+
+    @abstractmethod
+    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
+        raise NotImplementedError
+
+
+class LoadBalancedStrategy(ChooseProviderStrategyI):
+
+    def __init__(self) -> None:
+        self._usage_counters: Dict[str, Dict[str, int]] = defaultdict(
+            lambda: defaultdict(int)
+        )
+
+    def choose(self, model_name: str, providers: List[Dict]) -> Dict:
+        if not providers:
+            raise ValueError(f"No providers configured for model '{model_name}'")
+
+        min_used = None
+        chosen_cfg = None
+        for cfg in providers:
+            key = self._provider_key(cfg)
+            used = self._usage_counters[model_name][key]
+
+            if min_used is None or used < min_used:
+                min_used = used
+                chosen_cfg = cfg
+
+        chosen_key = self._provider_key(chosen_cfg)
+        self._usage_counters[model_name][chosen_key] += 1
+
+        return chosen_cfg
```
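A short sketch of the balancing behaviour: because `choose` picks the provider with the smallest usage count and then increments it, repeated calls for the same model alternate across providers (the provider dictionaries are hypothetical):

``` python
from llm_router_api.base.lb.strategy import LoadBalancedStrategy

strategy = LoadBalancedStrategy()
providers = [{"id": "a", "api_host": "http://a"}, {"id": "b", "api_host": "http://b"}]

picks = [strategy.choose("demo", providers)["id"] for _ in range(4)]
print(picks)  # ['a', 'b', 'a', 'b'] -- usage counters keep the spread even
```

Note that the counters live in process memory, so the spread is per worker process and resets on restart.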

llm_router_api/base/model_config.py

Lines changed: 12 additions & 5 deletions
```diff
@@ -3,7 +3,7 @@
 """
 
 import json
-from typing import Dict
+from typing import Dict, List
 
 
 class ApiModelConfig:
@@ -68,13 +68,20 @@ def _read_active_models(self) -> Dict[str, str]:
     def _active_models_configuration(self) -> Dict:
         """
         Build a dictionary containing the configuration for each active model.
-
+        Now each model maps to a **list** of provider configurations.
         Returns:
-            Dict[str, Dict]: Mapping of model name to its configuration dictionary.
+            Dict[str, List[Dict]]: Mapping of model name to a list of provider dicts.
         """
-        models_configuration = {}
+        models_configuration: Dict[str, List[Dict]] = {}
         models_json = json.load(open(self.models_config_path, "rt"))
         for m_type, models_list in self.active_models.items():
             for m_name in models_list:
-                models_configuration[m_name] = models_json[m_type][m_name]
+                provider_cfg = models_json[m_type][m_name]
+                if "providers" not in provider_cfg:
+                    raise KeyError(f"{m_type}:{m_name} has no providers!")
+                # if "api_host" in provider_cfg:
+                #     provider_cfg["id"] = f"{m_name}_{provider_cfg['api_host']}"
+                #     provider_cfg = {"providers": [provider_cfg]}
+                models_configuration[m_name] = provider_cfg
+
         return models_configuration
```
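The loader now requires a `providers` key for every active model. A hypothetical `models-config.json` entry consistent with the fields this commit reads (`id`, `api_host`, `api_type`, optional `api_token`, `input_size`, `model_path`); the model name, hosts, and top-level grouping are invented for illustration:

``` json
{
  "ollama": {
    "bielik": {
      "providers": [
        {"id": "bielik_ollama_1", "api_host": "http://10.0.0.1:11434", "api_type": "ollama", "input_size": 8192},
        {"id": "bielik_vllm_1", "api_host": "http://10.0.0.2:8000", "api_type": "vllm", "input_size": 8192}
      ]
    }
  }
}
```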

llm_router_api/base/model_handler.py

Lines changed: 16 additions & 13 deletions
```diff
@@ -10,6 +10,7 @@
 from dataclasses import dataclass
 from typing import Dict, Optional, Any
 
+from llm_router_api.base.lb.chooser import ProviderChooser
 from llm_router_api.base.model_config import ApiModelConfig
 
 
@@ -34,6 +35,7 @@ class ApiModel:
     Optional path to model (in case when local model is used)
     """
 
+    id: str
     name: str
     api_host: str
     api_type: str
@@ -64,18 +66,13 @@ def from_config(name: str, cfg: Dict) -> "ApiModel":
         The "input_size" value can be an integer or a numeric string;
         it will be converted to an int. If conversion fails, defaults to 0.
         """
-        raw_size = cfg.get("input_size", 0)
-        try:
-            input_size = int(raw_size)
-        except (TypeError, ValueError):
-            input_size = 0
-
         return ApiModel(
+            id=str(cfg["id"]),
             name=name,
-            api_host=str(cfg.get("api_host", "")),
-            api_type=str(cfg.get("api_type")),
+            api_host=str(cfg["api_host"]),
+            api_type=str(cfg["api_type"]),
             api_token=str(cfg.get("api_token", "")),
-            input_size=input_size,
+            input_size=int(cfg.get("input_size", 0)),
             model_path=str(cfg.get("model_path", "")),
         )
 
@@ -108,7 +105,7 @@ class ModelHandler:
     Loader responsible for reading and exposing model configuration.
     """
 
-    def __init__(self, models_config_path: str):
+    def __init__(self, models_config_path: str, provider_chooser: ProviderChooser):
         """
         Initialize the handler with the provided configuration path.
 
@@ -117,6 +114,7 @@ def __init__(self, models_config_path: str):
         models_config_path : str
             Path to the JSON configuration file containing model definitions.
         """
+        self.provider_chooser = provider_chooser
         self.api_model_config: ApiModelConfig = ApiModelConfig(models_config_path)
 
     def get_model(self, model_name: str) -> Optional[ApiModel]:
@@ -133,10 +131,15 @@ def get_model(self, model_name: str) -> Optional[ApiModel]:
         Optional[ApiModel]
             ApiModel instance if found; otherwise, None.
         """
-        cfg = self.api_model_config.models_configs.get(model_name)
-        if cfg is None:
+        providers = self.api_model_config.models_configs[model_name].get(
+            "providers", []
+        )
+        model_host_cfg = self.provider_chooser.get_provider(
+            model_name=model_name, providers=providers
+        )
+        if model_host_cfg is None:
             return None
-        return ApiModel.from_config(model_name, cfg)
+        return ApiModel.from_config(model_name, model_host_cfg)
 
     def list_active_models(self) -> Dict[str, Any]:
         """
```

llm_router_api/core/engine.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -23,6 +23,8 @@
 from flask import Flask, Response
 from typing import List, Type, Optional
 
+from llm_router_api.base.lb.chooser import ProviderChooser
+from llm_router_api.base.lb.strategy import LoadBalancedStrategy
 from llm_router_api.endpoints.endpoint_i import EndpointI
 from llm_router_api.register.auto_loader import EndpointAutoLoader
 from llm_router_api.register.register import FlaskEndpointRegistrar
@@ -101,6 +103,11 @@ def __init__(
         self.logger_level = logger_level
         self.logger_file_name = logger_file_name
 
+        # NOTE: Currently LoadBalancedStrategy is implemented. Should be replaced
+        # NOTE: in the future when new strategies will be implemented.
+        # IDEA: Provider should be configurable by ENV value
+        self._provider_chooser = ProviderChooser(strategy=LoadBalancedStrategy())
+
     def prepare_flask_app(
         self,
     ):
@@ -176,6 +183,7 @@ def __auto_load_endpoints(self, base_class: Type[EndpointI]):
             base_class=base_class,
             prompts_dir=self.prompts_dir,
             models_config_path=self.models_config_path,
+            provider_chooser=self._provider_chooser,
             logger_file_name=self.logger_file_name,
             logger_level=self.logger_level,
         )
```
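The `IDEA` note suggests selecting the strategy via an environment variable. One possible wiring, sketched under the assumption that more strategies will exist later (the `LLM_ROUTER_LB_STRATEGY` variable name and the registry are invented; only `LoadBalancedStrategy` ships in this commit):

``` python
import os
from typing import Dict, Type

from llm_router_api.base.lb.chooser import ProviderChooser
from llm_router_api.base.lb.strategy import ChooseProviderStrategyI, LoadBalancedStrategy

# Hypothetical registry; new strategies would be registered here as they land.
_STRATEGIES: Dict[str, Type[ChooseProviderStrategyI]] = {
    "load_balanced": LoadBalancedStrategy,
}


def chooser_from_env() -> ProviderChooser:
    name = os.getenv("LLM_ROUTER_LB_STRATEGY", "load_balanced")
    return ProviderChooser(strategy=_STRATEGIES.get(name, LoadBalancedStrategy)())
```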

llm_router_api/register/auto_loader.py

Lines changed: 5 additions & 1 deletion
```diff
@@ -8,6 +8,7 @@
 from rdl_ml_utils.utils.logger import prepare_logger
 from rdl_ml_utils.handlers.prompt_handler import PromptHandler
 
+from llm_router_api.base.lb.chooser import ProviderChooser
 from llm_router_api.endpoints.passthrough import PassthroughI
 from llm_router_api.endpoints.endpoint_i import EndpointI, EndpointWithHttpRequestI
 from llm_router_api.base.model_handler import ModelHandler
@@ -29,6 +30,7 @@ def __init__(
         base_class: Type[EndpointI],
         prompts_dir: str,
         models_config_path: str,
+        provider_chooser: ProviderChooser,
         logger_file_name: Optional[str] = None,
         logger_level: Optional[str] = REST_API_LOG_LEVEL,
     ):
@@ -48,7 +50,9 @@
         self.prompts_dir = prompts_dir
 
         self._prompt_handler = PromptHandler(base_dir=prompts_dir)
-        self._model_handler = ModelHandler(models_config_path=models_config_path)
+        self._model_handler = ModelHandler(
+            models_config_path=models_config_path, provider_chooser=provider_chooser
+        )
 
         self._logger_level = logger_level
         self._logger_file_name = logger_file_name
```
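Since the loader simply threads the chooser down to `ModelHandler`, constructing it by hand with a custom strategy is straightforward. A sketch based on the constructor signature above (paths are hypothetical):

``` python
from llm_router_api.base.lb.chooser import ProviderChooser
from llm_router_api.endpoints.endpoint_i import EndpointI
from llm_router_api.register.auto_loader import EndpointAutoLoader

loader = EndpointAutoLoader(
    base_class=EndpointI,
    prompts_dir="resources/prompts",          # hypothetical path
    models_config_path="models-config.json",  # hypothetical path
    provider_chooser=ProviderChooser(),       # or ProviderChooser(strategy=...)
)
```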
