CHANGELOG.md (1 addition & 1 deletion)

@@ -7,4 +7,4 @@
 | 0.0.3 | Proper `AutoLoading` for each found endpoint. Implementation of `ApiTypesDispatcher`, `ApiModelConfig`, `ModelHandler`. Ollama endpoints: `/`, `tags`. Added an endpoint for full proxying with params. Streaming when the external API provides a stream. |
 | 0.0.4 | All llama-service endpoints refactored into `llm-proxy-api`. Refactored the base `ep_run` method. Proper handling of system message, prompt name, model, etc. |
 | 0.1.0 | Repository renamed from `llm-proxy-api` to `llm-router`. Added the `HttpRequestExecutor` class to handle HTTP requests from `EndpointWithHttpRequestI`. Routing handled between any models: `openai -> ollama` and `ollama -> openai`. |
-| 0.1.1 | Prometheus metrics logging. Workers/Threads/Workers class is able to set by environments. Streaming fixes. |
+| 0.1.1 | Prometheus metrics logging. Workers, threads, and the worker class can be set via environment variables. Streaming fixes. Multiple providers for a single model with a default load-balanced strategy (see the example config below). |
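As a rough illustration of the multi-provider entry referenced in the 0.1.1 row above, a model definition in `models-config.json` might look like the sketch below. Only the file name comes from the README; every field name here (`models`, `name`, `providers`, `type`, `url`, `options`) is an assumption, not the repository's actual schema.

```jsonc
{
  "models": [
    {
      "name": "llama3",                 // model name exposed by the router (assumed)
      "providers": [                    // several providers backing one model
        { "type": "ollama", "url": "http://ollama:11434" },
        { "type": "vllm",   "url": "http://vllm:8000/v1" }
      ],
      "options": { "temperature": 0.7 } // per-model default options (assumed)
    }
  ]
}
```

With two providers declared for one model, the default load-balanced strategy can spread requests across them.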

README.md (84 additions & 2 deletions)

@@ -14,13 +14,19 @@ allowing your application to talk to any supported LLM through a single, consist
 |**Unified REST interface**| One endpoint schema works for OpenAI‑compatible, Ollama, vLLM and any future provider. |
 |**Provider‑agnostic streaming**| The `stream` flag (default `true`) controls whether the proxy forwards **chunked** responses as they arrive or returns a **single** aggregated payload (a client sketch follows the table). |
 |**Built‑in prompt library**| Language‑aware system prompts stored under `resources/prompts` can be referenced automatically. |
-|**Dynamic model configuration**| JSON file (`models-config.json`) defines provider, model name, default options and per‑model overrides. |
-|**Pluggable providers**| New providers are added by implementing the `BaseProvider` interface in `llm_proxy_rest/core/api_types`. |
+|**Dynamic model configuration**| JSON file (`models-config.json`) defines providers, model name, default options and per‑model overrides. |
 |**Request validation**| Pydantic models guarantee correct payloads; errors are returned with clear messages. |
 |**Health & metadata endpoints**|`/ping` (simple 200 OK) and `/tags` (available model tags/metadata). |
 |**Simple deployment**| One‑liner run script or `python -m llm_proxy_rest.rest_api`. |
 |**Extensible conversation formats**| Basic chat, conversation with system prompt, and extended conversation with richer options (e.g., temperature, top‑k, custom system prompt). |
+|**Multi‑provider model support**| Each model can be backed by multiple providers (vLLM, Ollama, OpenAI) defined in `models-config.json`. |
+|**Provider selection abstraction**|`ProviderChooser` delegates to a configurable strategy, enabling easy swapping of load‑balancing, round‑robin, weighted‑random, etc. (see the sketch after the table). |
+|**Load‑balanced default strategy**|`LoadBalancedStrategy` distributes requests evenly across providers using in‑memory usage counters. |
+|**Dynamic model handling**|`ModelHandler` loads model definitions at runtime and resolves the appropriate provider per request. |
+|**Pluggable endpoint architecture**| Automatic discovery and registration of all concrete `EndpointI` implementations via `EndpointAutoLoader`. |
+|**Prometheus metrics integration**| Optional `/metrics` endpoint for latency, error counts, and provider usage statistics. |
+|**Docker ready**| Dockerfile and scripts for containerised deployment. |
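Given the unified REST interface and the `stream` flag described above, a minimal client call might look like the following. This is a hedged sketch: the host, port, route, and payload fields are assumptions (an OpenAI-style chat endpoint); only the meaning of the `stream` flag is taken from the table.

```python
# Minimal client sketch against the proxy. The URL and payload schema are
# assumed (OpenAI-style); only the `stream` flag semantics come from the README.
import requests

payload = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,  # True: forward chunks as they arrive; False: one aggregated payload
}

with requests.post("http://localhost:8080/v1/chat/completions",
                   json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))  # each non-empty line is one streamed chunk
```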
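The `ProviderChooser`/`LoadBalancedStrategy` pairing described in the table could be implemented roughly as sketched below. The two class names come from the README; every method name, signature, and the counter layout are assumptions made for illustration.

```python
# Sketch of a pluggable provider-selection strategy. Class names ProviderChooser
# and LoadBalancedStrategy come from the README table; all method names,
# signatures, and internals here are assumptions for illustration.
from collections import defaultdict
from typing import Protocol


class SelectionStrategy(Protocol):
    """Anything that can pick one provider for a model."""

    def choose(self, model: str, providers: list[str]) -> str: ...


class LoadBalancedStrategy:
    """Pick the provider with the lowest in-memory usage count, then bump it."""

    def __init__(self) -> None:
        self._usage: dict[tuple[str, str], int] = defaultdict(int)

    def choose(self, model: str, providers: list[str]) -> str:
        provider = min(providers, key=lambda p: self._usage[(model, p)])
        self._usage[(model, provider)] += 1
        return provider


class ProviderChooser:
    """Delegates selection to whatever strategy it was configured with."""

    def __init__(self, strategy: SelectionStrategy) -> None:
        self._strategy = strategy

    def choose(self, model: str, providers: list[str]) -> str:
        return self._strategy.choose(model, providers)


if __name__ == "__main__":
    chooser = ProviderChooser(LoadBalancedStrategy())
    for _ in range(4):
        # With equal counters the two providers alternate: ollama, vllm, ...
        print(chooser.choose("llama3", ["ollama", "vllm"]))
```

Swapping in round-robin or weighted-random selection would then only mean passing a different strategy object to `ProviderChooser`, which is the point of the abstraction the table describes.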