
Commit 211a6fa

Author: Paweł Kędzia (committed)
Merge branch 'features/web'
2 parents fce2089 + 01207d8 · commit 211a6fa

File tree: 2 files changed, +249 −40 lines

llm_router_api/README.md

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
# llm‑router‑api

**llm‑router‑api** is a lightweight Python library that provides a flexible, extensible proxy for Large Language Model (LLM) back‑ends. It abstracts the details of multiple model providers (OpenAI‑compatible, Ollama, vLLM, LM Studio, etc.) and offers a unified REST interface with built‑in load‑balancing, health‑checking, and monitoring.

> **Repository:** <https://github.com/radlab-dev-group/llm-router>

---

## Table of Contents

1. [Features](#features)
2. [Installation](#installation)
3. [Configuration](#configuration)
4. [Running the Server](#running-the-server)
5. [REST API Overview](#rest-api-overview)
6. [Load‑Balancing Strategies](#load‑balancing-strategies)
7. [Extending the Router](#extending-the-router)
8. [Monitoring & Metrics](#monitoring--metrics)
9. [Development & Testing](#development--testing)
10. [License](#license)

---

## Features

- **Unified API** – One REST surface (`/api/...`) that proxies calls to any supported LLM back‑end.
- **Provider Selection** – Choose a provider per request using pluggable strategies (balanced, weighted, adaptive, first‑available).
- **Prompt Management** – System prompts are stored as files and can be dynamically injected with placeholder substitution.
- **Streaming Support** – Transparent streaming for both OpenAI‑compatible and Ollama endpoints.
- **Health Checks** – Built‑in ping endpoint and Redis‑based provider health monitoring.
- **Prometheus Metrics** – Optional instrumentation for request counts, latencies, and error rates.
- **Auto‑Discovery** – Endpoints are automatically discovered and instantiated at startup.
- **Extensible** – Add new providers, strategies, or custom endpoints with minimal boilerplate.

---
## Installation

The project uses **Python 3.10.6** and a **virtualenv**‑based workflow.

```shell script
# Clone the repository
git clone https://github.com/radlab-dev-group/llm-router.git
cd llm-router

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the package (including optional extras)
pip install -e .[metrics]  # installs Prometheus support
```

All required third‑party libraries are listed in `requirements.txt` (e.g., Flask, requests, redis, rdl‑ml‑utils, etc.).

---

## Configuration

Configuration is driven primarily by environment variables and a JSON model‑config file.

### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `LLM_ROUTER_PROMPTS_DIR` | Directory containing system prompt files. | `resources/prompts` |
| `LLM_ROUTER_MODELS_CONFIG` | Path to the JSON file defining models and providers. | `resources/configs/models-config.json` |
| `LLM_ROUTER_EXTERNAL_TIMEOUT` | HTTP timeout (seconds) for outbound LLM calls. | `300` |
| `LLM_ROUTER_TIMEOUT` | Timeout for the proxy server itself. | `0` (no timeout) |
| `LLM_ROUTER_LOG_FILENAME` | Log file name for the router. | `llm-router.log` |
| `LLM_ROUTER_LOG_LEVEL` | Logging level (`DEBUG`, `INFO`, …). | `INFO` |
| `LLM_ROUTER_EP_PREFIX` | Global URL prefix (e.g., `/api`). | `/api` |
| `LLM_ROUTER_MINIMUM` | Must be set to enable proxy mode (`1`/`true`). | *required* |
| `LLM_ROUTER_BALANCE_STRATEGY` | Load‑balancing strategy (`balanced`, `weighted`, `dynamic_weighted`, `first_available`). | `balanced` |
| `LLM_ROUTER_REDIS_HOST` / `LLM_ROUTER_REDIS_PORT` | Redis connection details for provider locking/monitoring. | `""` / `6379` |
| `LLM_ROUTER_USE_PROMETHEUS` | Enable Prometheus metrics (`1`/`true`). | `False` |
| `LLM_ROUTER_SERVER_TYPE` | Server backend (`flask`, `gunicorn`, `waitress`). | `flask` |
| `LLM_ROUTER_SERVER_PORT` | Port the server listens on. | `8080` |
| `LLM_ROUTER_SERVER_HOST` | Host/interface to bind. | `0.0.0.0` |
| `LLM_ROUTER_SERVER_WORKERS_COUNT` | Number of workers (Gunicorn/Waitress). | `2` |
| `LLM_ROUTER_SERVER_THREADS_COUNT` | Number of threads per worker. | `8` |
| `LLM_ROUTER_SERVER_WORKER_CLASS` | Gunicorn worker class (e.g., `gevent`). | *empty* |
### Model Configuration

`models-config.json` follows the schema:

```json
{
  "active_models": {
    "openai_models": [
      "gpt-4",
      "gpt-3.5-turbo"
    ],
    "ollama_models": [
      "llama2"
    ]
  },
  "openai_models": {
    "gpt-4": {
      "providers": [
        {
          "id": "openai-gpt4-1",
          "api_host": "https://api.openai.com/v1",
          "api_token": "sk-...",
          "api_type": "openai",
          "input_size": 8192,
          "model_path": ""
        }
      ]
    }
  },
  ...
}
```

The router only needs the fields `id`, `api_host`, `api_token` (optional), `api_type`, `input_size`, and, optionally, `model_path`.
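
For a quick sanity check of a config file, the snippet below is a minimal sketch (not part of the library) that loads the file pointed to by `LLM_ROUTER_MODELS_CONFIG` and prints the active models and their provider ids; the key names follow the schema shown above, and the assumption that each family key in `active_models` matches a top‑level section is based on that example.

```python
import json
import os

# Path resolution mirrors the default documented above; adjust as needed.
config_path = os.environ.get(
    "LLM_ROUTER_MODELS_CONFIG", "resources/configs/models-config.json"
)

with open(config_path, encoding="utf-8") as fh:
    config = json.load(fh)

# "active_models" groups model names by provider family (e.g. "openai_models").
for family, models in config["active_models"].items():
    for model in models:
        providers = config.get(family, {}).get(model, {}).get("providers", [])
        print(f"{family}/{model}: providers={[p['id'] for p in providers]}")
```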

---

## Running the Server

The entry point is `llm_router_api.rest_api`. Choose a server backend via the `LLM_ROUTER_SERVER_TYPE` variable or command‑line flags.

```shell script
# Using the built‑in Flask development server (default)
python -m llm_router_api.rest_api

# Production‑grade with Gunicorn (streaming supported)
python -m llm_router_api.rest_api --gunicorn

# Windows‑friendly Waitress server
python -m llm_router_api.rest_api --waitress
```

The server starts on the host/port defined by `LLM_ROUTER_SERVER_HOST` and `LLM_ROUTER_SERVER_PORT` (default `0.0.0.0:8080`).

**Note:** The service must be launched with `LLM_ROUTER_MINIMUM=1` (or any truthy value) because it operates in “proxy‑only” mode.
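
As a convenience, the sketch below (an illustration, not part of the package) sets the minimum environment from Python and launches the Gunicorn variant in a subprocess; the variable names come from the table in [Configuration](#configuration), and the launch command is the one shown above.

```python
import os
import subprocess
import sys

# Start from the current environment and add the router settings.
env = dict(os.environ)
env.update(
    {
        "LLM_ROUTER_MINIMUM": "1",              # required: enables proxy mode
        "LLM_ROUTER_SERVER_TYPE": "gunicorn",   # flask | gunicorn | waitress
        "LLM_ROUTER_SERVER_PORT": "8080",
        "LLM_ROUTER_BALANCE_STRATEGY": "balanced",
    }
)

# Equivalent to: LLM_ROUTER_MINIMUM=1 ... python -m llm_router_api.rest_api --gunicorn
subprocess.run(
    [sys.executable, "-m", "llm_router_api.rest_api", "--gunicorn"],
    env=env,
    check=True,
)
```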

---

## REST API Overview

All routes are prefixed by `LLM_ROUTER_EP_PREFIX` (default `/api`).

| Method | Path | Description |
|---|---|---|
| `GET` | `/api/ping` | Health‑check, returns `"pong"`. |
| `GET` | `/api/ollama/` | Ollama health endpoint (`"Ollama is running"`). |
| `GET` | `/api/ollama/tags` | List available Ollama model tags. |
| `GET` | `/api/openai/models` | List OpenAI‑compatible model tags. |
| `POST` | `/api/conversation_with_model` | Chat endpoint (builtin). |
| `POST` | `/api/extended_conversation_with_model` | Chat with extra fields (builtin). |
| `POST` | `/api/generate_questions` | Generate questions from texts (builtin). |
| `POST` | `/api/translate` | Translate a list of texts (builtin). |
| `POST` | `/api/simplify_text` | Simplify texts (builtin). |
| `POST` | `/api/generate_article_from_text` | Generate an article from a single text (builtin). |
| `POST` | `/api/create_full_article_from_texts` | Generate a full article from multiple texts (builtin). |
| `POST` | `/api/generative_answer` | Answer a question using a context (builtin). |
| `POST` | `/api/v0/models` | List LM Studio models. |
| `POST` | `/api/chat` (or provider‑specific paths) | Proxy to the underlying provider’s chat/completions endpoint. |

**Payload format** follows the OpenAI schema (`model`, `messages`, optional `stream`, etc.) unless a custom endpoint overrides it.

All endpoints automatically:

- Validate required arguments (via `REQUIRED_ARGS`).
- Resolve the appropriate provider using the configured **load‑balancing strategy**.
- Inject system prompts when `SYSTEM_PROMPT_NAME` is defined.
- Return a JSON response with `{ "status": true, "body": … }` or an error payload.

Streaming responses are returned as **Server‑Sent Events (SSE)** (`text/event-stream`) and are compatible with both OpenAI‑style and Ollama‑style streams.
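
As an illustration, the sketch below calls the built‑in chat endpoint with `requests`, assuming a router running locally on the default `0.0.0.0:8080` and a model name (`gpt-4` here) taken from `models-config.json`; the payload fields follow the OpenAI schema mentioned above, and the exact set of accepted fields is defined by each endpoint’s `REQUIRED_ARGS`/`OPTIONAL_ARGS`.

```python
import requests

BASE_URL = "http://localhost:8080/api"  # LLM_ROUTER_EP_PREFIX defaults to /api

# Non-streaming request: OpenAI-style payload (model + messages).
response = requests.post(
    f"{BASE_URL}/conversation_with_model",
    json={
        "model": "gpt-4",  # must be listed under active_models
        "messages": [{"role": "user", "content": "Summarise what an LLM router does."}],
    },
    timeout=300,
)
response.raise_for_status()
data = response.json()
print(data["status"], data["body"])  # responses are wrapped as {"status": ..., "body": ...}

# Streaming variant: the router replies with Server-Sent Events (text/event-stream).
with requests.post(
    f"{BASE_URL}/conversation_with_model",
    json={"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}], "stream": True},
    stream=True,
    timeout=300,
) as stream:
    for line in stream.iter_lines(decode_unicode=True):
        if line:  # SSE frames are separated by blank lines
            print(line)
```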
183+
184+
---
185+
186+
## Load‑Balancing Strategies
187+
188+
The router selects a provider for a given model request using the **ProviderChooser**. The strategy can be chosen via
189+
the `LLM_ROUTER_BALANCE_STRATEGY` variable.
190+
191+
| Strategy | Description |
192+
|----------------------|-----------------------------------------------------------------------------------------------|
193+
| **balanced** | Simple round‑robin based on usage counters. |
194+
| **weighted** | Static weights defined in each provider configuration (`weight` field). |
195+
| **dynamic_weighted** | Weights are updated at runtime; tracks latency and failure penalties. |
196+
| **first_available** | Uses Redis locks to guarantee exclusive access to a provider (useful for stateful back‑ends). |
197+
198+
Custom strategies can be added by subclassing `ChooseProviderStrategyI` and registering the class in
199+
`llm_router_api.base.lb.chooser.STRATEGIES`.
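
A minimal sketch of such a strategy is shown below. It is illustrative only: the registry assignment follows the sentence above, but the import location of `ChooseProviderStrategyI`, the hook name (`choose`), and the shape of the provider list are assumptions to verify against the interface in the repository.

```python
import random

# NOTE: assumed import location for the strategy interface -- check the repository.
from llm_router_api.base.lb.chooser import STRATEGIES, ChooseProviderStrategyI


class RandomStrategy(ChooseProviderStrategyI):
    """Hypothetical strategy: pick a provider uniformly at random."""

    def choose(self, providers):  # assumed hook name -- see ChooseProviderStrategyI
        # `providers` is assumed to be the list of provider configs for the requested model.
        return random.choice(providers)


# Register under the key used in LLM_ROUTER_BALANCE_STRATEGY.
STRATEGIES["random"] = RandomStrategy
```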

---

## Extending the Router

### Adding a New Provider Type

1. **Implement `ApiTypesI`**
   Create a class (e.g., `MyProviderType`) that implements the abstract methods `chat_ep`, `chat_method`, `completions_ep`, and `completions_method` (see the sketch after this list).
2. **Register in Dispatcher**
   Add the class to `ApiTypesDispatcher._REGISTRY` with a lowercase key.
3. **Update Constants (optional)**
   If you need a new balance strategy, extend `BalanceStrategies` in `constants_base.py`.
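
A rough sketch of steps 1–2 follows, under the assumption that `chat_ep`/`completions_ep` return the provider’s URL paths and `chat_method`/`completions_method` return the HTTP verbs; the import path and exact signatures should be checked against `ApiTypesI` in the repository.

```python
# NOTE: the import path below is an assumption -- locate ApiTypesI/ApiTypesDispatcher in the repository.
from llm_router_api.base.api_types import ApiTypesDispatcher, ApiTypesI


class MyProviderType(ApiTypesI):
    """Hypothetical provider type for an OpenAI-like backend."""

    def chat_ep(self) -> str:
        return "/v1/chat/completions"   # path appended to the provider's api_host (assumed)

    def chat_method(self) -> str:
        return "POST"

    def completions_ep(self) -> str:
        return "/v1/completions"

    def completions_method(self) -> str:
        return "POST"


# Step 2: register under a lowercase key so it can be referenced as "api_type" in models-config.json.
ApiTypesDispatcher._REGISTRY["myprovider"] = MyProviderType
```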

### Adding a New Endpoint

1. Choose a base class:
   - `EndpointWithHttpRequestI` for full proxy behaviour (default).
   - `PassthroughI` if you only need to forward the request unchanged.
   - Directly subclass `EndpointI` for non‑proxy use cases.
2. Define `REQUIRED_ARGS`, `OPTIONAL_ARGS`, and optionally `SYSTEM_PROMPT_NAME`.
3. Implement `prepare_payload(self, params)` – convert incoming parameters to the payload expected by the downstream model (see the sketch after this list).
4. (Optional) Set `self._prepare_response_function` to post‑process the model response.
5. The endpoint will be auto‑discovered by `EndpointAutoLoader` at startup.
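
A sketch of such an endpoint is shown below. It is illustrative: the import path, the endpoint name, and the prompt id are made up, while the class attributes and the `prepare_payload` hook are those named in the steps above.

```python
# NOTE: import path is an assumption -- locate EndpointWithHttpRequestI in the repository.
from llm_router_api.base.endpoint import EndpointWithHttpRequestI


class SummarizeText(EndpointWithHttpRequestI):
    """Hypothetical POST /api/summarize_text endpoint."""

    REQUIRED_ARGS = ["model", "text"]
    OPTIONAL_ARGS = ["max_sentences"]
    SYSTEM_PROMPT_NAME = "summarization/system/en/default"  # hypothetical prompt id

    def prepare_payload(self, params):
        # Convert incoming params into an OpenAI-style chat payload for the chosen provider.
        return {
            "model": params["model"],
            "messages": [{"role": "user", "content": params["text"]}],
        }
```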

### Prompt Files

Prompt files live under the directory configured by `LLM_ROUTER_PROMPTS_DIR`.
File naming convention: `<category>/system/<lang>/<prompt-id>`.
Placeholders such as `##PLACEHOLDER##` can be replaced via `self._map_prompt` in the endpoint implementation.
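
For orientation, the snippet below resolves a prompt path according to this convention and substitutes a placeholder by hand; inside an endpoint the same substitution would go through `self._map_prompt`, whose exact signature should be checked in the base class. The category, language, prompt id, and placeholder name used here are made‑up examples.

```python
import os
from pathlib import Path

prompts_dir = Path(os.environ.get("LLM_ROUTER_PROMPTS_DIR", "resources/prompts"))

# <category>/system/<lang>/<prompt-id> -- all four parts below are hypothetical examples.
prompt_path = prompts_dir / "translation" / "system" / "en" / "default"

prompt_text = prompt_path.read_text(encoding="utf-8")
prompt_text = prompt_text.replace("##TARGET_LANGUAGE##", "Polish")  # manual placeholder substitution
print(prompt_text)
```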

---

## Monitoring & Metrics

When `LLM_ROUTER_USE_PROMETHEUS=1` (or `true`) the router automatically:

- Exposes a `/metrics` endpoint (Prometheus format).
- Tracks request counts, latency histograms, in‑progress gauges, and error counters.

You can scrape this endpoint with a Prometheus server or query it manually.
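
For a quick manual check, a minimal sketch is shown below; it assumes the default bind address and that `/metrics` is served without the `/api` prefix — verify the exact path against your deployment.

```python
import requests

# Assumes the default host/port and an unprefixed /metrics path.
metrics = requests.get("http://localhost:8080/metrics", timeout=10)
metrics.raise_for_status()

# Print only the request-related samples to keep the output short.
for line in metrics.text.splitlines():
    if line and not line.startswith("#") and "request" in line:
        print(line)
```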

---

## License

`llm-router-api` is released under the **MIT License**. See the `LICENSE` file in the repository for full terms.

llm_router_lib/README.md

Lines changed: 1 addition & 40 deletions
```diff
@@ -178,46 +178,7 @@ malformed responses.
 All exceptions inherit from `LLMRouterError`, allowing a single `except LLMRouterError:` clause to catch any SDK‑related
 problem.
 
----
-
-## Testing
-
-The library ships with a pytest‑compatible test suite in `llm_router_lib/tests/`.
-
-```shell script
-# Activate the virtualenv first
-source .venv/bin/activate
-
-# Run the tests
-pytest -q llm_router_lib/tests
-```
-
-The tests cover:
-
-* Model serialization/deserialization.
-* Service endpoint configuration.
-* HTTP error handling (mocked).
-
----
-
-## Contributing
-
-Contributions are welcome! Please follow these steps:
-
-1. Fork the repository.
-2. Create a new branch (`git checkout -b feature/awesome‑feature`).
-3. Write code **and** accompanying tests.
-4. Run the full test suite (`pytest`).
-5. Submit a pull request with a clear description of the change.
-
-Before committing, run the code‑formatters and linters that the project uses:
-
-```shell script
-autopep8 --in-place --aggressive --aggressive **/*.py
-pylint llm_router_lib
-```
-
----
+---
 
 ## License
```
