# llm‑router‑api

**llm‑router‑api** is a lightweight Python library that provides a flexible, extensible proxy for Large Language Model
(LLM) back‑ends. It abstracts the details of multiple model providers (OpenAI‑compatible, Ollama, vLLM, LM Studio,
etc.) and offers a unified REST interface with built‑in load‑balancing, health‑checking, and monitoring.

> **Repository:** <https://github.com/radlab-dev-group/llm-router>

---

## Table of Contents

1. [Features](#features)
2. [Installation](#installation)
3. [Configuration](#configuration)
4. [Running the Server](#running-the-server)
5. [REST API Overview](#rest-api-overview)
6. [Load‑Balancing Strategies](#load‑balancing-strategies)
7. [Extending the Router](#extending-the-router)
8. [Monitoring & Metrics](#monitoring--metrics)
9. [License](#license)

---

## Features

- **Unified API** – One REST surface (`/api/...`) that proxies calls to any supported LLM back‑end.
- **Provider Selection** – Choose a provider per request using pluggable strategies (balanced, weighted, adaptive,
  first‑available).
- **Prompt Management** – System prompts are stored as files and can be dynamically injected with placeholder
  substitution.
- **Streaming Support** – Transparent streaming for both OpenAI‑compatible and Ollama endpoints.
- **Health Checks** – Built‑in ping endpoint and Redis‑based provider health monitoring.
- **Prometheus Metrics** – Optional instrumentation for request counts, latencies, and error rates.
- **Auto‑Discovery** – Endpoints are automatically discovered and instantiated at startup.
- **Extensible** – Add new providers, strategies, or custom endpoints with minimal boilerplate.

---

## Installation

The project targets **Python 3.10.6** and uses a **virtualenv**‑based workflow.

```shell script
# Clone the repository
git clone https://github.com/radlab-dev-group/llm-router.git
cd llm-router

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the package (including optional extras)
pip install -e ".[metrics]"   # installs Prometheus support
```

All required third‑party libraries are listed in `requirements.txt` (Flask, requests, redis, rdl‑ml‑utils, and others).

---

## Configuration

Configuration is driven primarily by environment variables and a JSON model‑config file.

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `LLM_ROUTER_PROMPTS_DIR` | Directory containing system prompt files. | `resources/prompts` |
| `LLM_ROUTER_MODELS_CONFIG` | Path to the JSON file defining models and providers. | `resources/configs/models-config.json` |
| `LLM_ROUTER_EXTERNAL_TIMEOUT` | HTTP timeout (seconds) for outbound LLM calls. | `300` |
| `LLM_ROUTER_TIMEOUT` | Timeout for the proxy server itself. | `0` (no timeout) |
| `LLM_ROUTER_LOG_FILENAME` | Log file name for the router. | `llm-router.log` |
| `LLM_ROUTER_LOG_LEVEL` | Logging level (`DEBUG`, `INFO`, …). | `INFO` |
| `LLM_ROUTER_EP_PREFIX` | Global URL prefix (e.g., `/api`). | `/api` |
| `LLM_ROUTER_MINIMUM` | Must be set to enable proxy mode (`1`/`true`). | *required* |
| `LLM_ROUTER_BALANCE_STRATEGY` | Load‑balancing strategy (`balanced`, `weighted`, `dynamic_weighted`, `first_available`). | `balanced` |
| `LLM_ROUTER_REDIS_HOST` / `LLM_ROUTER_REDIS_PORT` | Redis connection details for provider locking/monitoring. | `""` / `6379` |
| `LLM_ROUTER_USE_PROMETHEUS` | Enable Prometheus metrics (`1`/`true`). | `False` |
| `LLM_ROUTER_SERVER_TYPE` | Server backend (`flask`, `gunicorn`, `waitress`). | `flask` |
| `LLM_ROUTER_SERVER_PORT` | Port the server listens on. | `8080` |
| `LLM_ROUTER_SERVER_HOST` | Host/interface to bind. | `0.0.0.0` |
| `LLM_ROUTER_SERVER_WORKERS_COUNT` | Number of workers (Gunicorn/Waitress). | `2` |
| `LLM_ROUTER_SERVER_THREADS_COUNT` | Number of threads per worker. | `8` |
| `LLM_ROUTER_SERVER_WORKER_CLASS` | Gunicorn worker class (e.g., `gevent`). | *empty* |

### Model Configuration

`models-config.json` follows the schema:

```json
{
  "active_models": {
    "openai_models": [
      "gpt-4",
      "gpt-3.5-turbo"
    ],
    "ollama_models": [
      "llama2"
    ]
  },
  "openai_models": {
    "gpt-4": {
      "providers": [
        {
          "id": "openai-gpt4-1",
          "api_host": "https://api.openai.com/v1",
          "api_token": "sk-...",
          "api_type": "openai",
          "input_size": 8192,
          "model_path": ""
        }
      ]
    }
  },
  ...
}
```

Each provider entry only needs the fields the router actually uses: `id`, `api_host`, `api_type`, and `input_size`,
plus the optional `api_token` and `model_path`.

---

## Running the Server

The entry point is `llm_router_api.rest_api`. Choose a server backend via the `LLM_ROUTER_SERVER_TYPE` variable or
command‑line flags.

```shell script
# Using the built‑in Flask development server (default)
python -m llm_router_api.rest_api

# Production‑grade with Gunicorn (streaming supported)
python -m llm_router_api.rest_api --gunicorn

# Windows‑friendly Waitress server
python -m llm_router_api.rest_api --waitress
```

The server starts on the host/port defined by `LLM_ROUTER_SERVER_HOST` and `LLM_ROUTER_SERVER_PORT` (default
`0.0.0.0:8080`).

**Note:** The service must be launched with `LLM_ROUTER_MINIMUM=1` (or any truthy value), which enables the
“proxy‑only” mode the router operates in.

---

## REST API Overview

All routes are prefixed by `LLM_ROUTER_EP_PREFIX` (default `/api`).

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/ping` | Health‑check, returns `"pong"`. |
| `GET` | `/api/ollama/` | Ollama health endpoint (`"Ollama is running"`). |
| `GET` | `/api/ollama/tags` | List available Ollama model tags. |
| `GET` | `/api/openai/models` | List OpenAI‑compatible model tags. |
| `POST` | `/api/conversation_with_model` | Chat endpoint (builtin). |
| `POST` | `/api/extended_conversation_with_model` | Chat with extra fields (builtin). |
| `POST` | `/api/generate_questions` | Generate questions from texts (builtin). |
| `POST` | `/api/translate` | Translate a list of texts (builtin). |
| `POST` | `/api/simplify_text` | Simplify texts (builtin). |
| `POST` | `/api/generate_article_from_text` | Generate article from a single text (builtin). |
| `POST` | `/api/create_full_article_from_texts` | Generate a full article from multiple texts (builtin). |
| `POST` | `/api/generative_answer` | Answer a question using a context (builtin). |
| `POST` | `/api/v0/models` | List LM Studio models. |
| `POST` | `/api/chat` (or provider‑specific paths) | Proxy to the underlying provider’s chat/completions endpoint. |

**Payload format** follows the OpenAI schema (`model`, `messages`, optional `stream`, etc.) unless a custom endpoint
overrides it.
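
For example, a non‑streaming chat request can be issued with any HTTP client. The sketch below uses `requests` and
assumes the default prefix (`/api`), the default port (`8080`), and the `llama2` model from the sample configuration
above; the exact response shape depends on the endpoint being called.

```python
import requests

# Assumes the router runs locally with the default host/port and prefix.
BASE_URL = "http://localhost:8080/api"

payload = {
    "model": "llama2",                 # any model declared in models-config.json
    "messages": [
        {"role": "user", "content": "Summarise what an LLM router does."}
    ],
    "stream": False,                   # request a single, non-streaming response
}

response = requests.post(f"{BASE_URL}/chat", json=payload, timeout=300)
response.raise_for_status()
print(response.json())
```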

All endpoints automatically:

- Validate required arguments (via `REQUIRED_ARGS`).
- Resolve the appropriate provider using the configured **load‑balancing strategy**.
- Inject system prompts when `SYSTEM_PROMPT_NAME` is defined.
- Return a JSON response with `{ "status": true, "body": … }` or an error payload.

Streaming responses are returned as **Server‑Sent Events (SSE)** (`text/event-stream`) and are compatible with both
OpenAI‑style and Ollama‑style streams.
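
To consume a streamed response, set `stream` to `true` and read the body line by line. This is a minimal sketch,
assuming OpenAI‑style `data: ...` SSE lines terminated by `data: [DONE]`; the exact chunk format depends on the
upstream provider.

```python
import json
import requests

payload = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Stream a short greeting."}],
    "stream": True,
}

with requests.post("http://localhost:8080/api/chat", json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue                      # skip keep-alives and blank separators
        data = line[len("data:"):].strip()
        if data == "[DONE]":              # OpenAI-style end-of-stream marker
            break
        chunk = json.loads(data)
        print(chunk)                      # each chunk carries a partial completion
```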

---

## Load‑Balancing Strategies

The router selects a provider for a given model request using the **ProviderChooser**. The strategy can be chosen via
the `LLM_ROUTER_BALANCE_STRATEGY` variable.

| Strategy | Description |
|----------|-------------|
| **balanced** | Simple round‑robin based on usage counters. |
| **weighted** | Static weights defined in each provider configuration (`weight` field). |
| **dynamic_weighted** | Weights are updated at runtime; tracks latency and failure penalties. |
| **first_available** | Uses Redis locks to guarantee exclusive access to a provider (useful for stateful back‑ends). |

Custom strategies can be added by subclassing `ChooseProviderStrategyI` and registering the class in
`llm_router_api.base.lb.chooser.STRATEGIES`.
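
A custom strategy might look roughly like the sketch below. This is only an illustration: the real
`ChooseProviderStrategyI` interface is defined in the repository, and the import path of the base class, the
`choose()` method name and signature, and the assumption that `STRATEGIES` is a plain name‑to‑class mapping are all
guesses made here for the example.

```python
import random

from llm_router_api.base.lb.chooser import STRATEGIES
# Assumed import path and method name for the strategy interface -- check the
# repository sources for the actual abstract class and its required methods.
from llm_router_api.base.lb.strategy import ChooseProviderStrategyI


class RandomStrategy(ChooseProviderStrategyI):
    """Pick a provider uniformly at random from the candidates."""

    def choose(self, providers):
        # `providers` is assumed to be the list of provider configs registered
        # for the requested model; return one of them.
        return random.choice(providers)


# Register under the name used in LLM_ROUTER_BALANCE_STRATEGY.
STRATEGIES["random"] = RandomStrategy
```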

---

## Extending the Router

### Adding a New Provider Type

1. **Implement `ApiTypesI`**
   Create a class (e.g., `MyProviderType`) that implements the abstract methods `chat_ep`, `chat_method`,
   `completions_ep`, and `completions_method`.
2. **Register in Dispatcher**
   Add the class to `ApiTypesDispatcher._REGISTRY` with a lowercase key (see the sketch after this list).
3. **Update Constants (optional)**
   If you need a new balance strategy, extend `BalanceStrategies` in `constants_base.py`.
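
Put together, steps 1 and 2 might look like the following. The import path, the return values, and whether these
methods are plain methods, properties, or static methods are assumptions here; mirror one of the existing provider
types in the repository (e.g., the OpenAI or Ollama implementation) for the authoritative shape.

```python
from llm_router_api.base.api_types import ApiTypesI, ApiTypesDispatcher
# The import path above is an assumption -- locate ApiTypesI and
# ApiTypesDispatcher in the repository sources.


class MyProviderType(ApiTypesI):
    """Provider type for a hypothetical back-end with OpenAI-like routes."""

    def chat_ep(self) -> str:
        return "/v1/chat/completions"   # path appended to the provider's api_host

    def chat_method(self) -> str:
        return "POST"

    def completions_ep(self) -> str:
        return "/v1/completions"

    def completions_method(self) -> str:
        return "POST"


# Register under a lowercase key so "api_type": "myprovider" can be used
# in models-config.json.
ApiTypesDispatcher._REGISTRY["myprovider"] = MyProviderType
```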

### Adding a New Endpoint

1. Choose a base class:
   - `EndpointWithHttpRequestI` for full proxy behaviour (default).
   - `PassthroughI` if you only need to forward the request unchanged.
   - Directly subclass `EndpointI` for non‑proxy use cases.
2. Define `REQUIRED_ARGS`, `OPTIONAL_ARGS`, and optionally `SYSTEM_PROMPT_NAME`.
3. Implement `prepare_payload(self, params)` – convert incoming parameters to the payload expected by the downstream
   model (see the sketch after this list).
4. (Optional) Set `self._prepare_response_function` to post‑process the model response.
5. The endpoint will be auto‑discovered by `EndpointAutoLoader` at startup.
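
A minimal custom endpoint could look like this. The import path, the route derived from the class, the prompt name,
and the payload shape returned by `prepare_payload` are assumptions; the builtin endpoints in the repository are the
authoritative reference.

```python
from llm_router_api.base.endpoint import EndpointWithHttpRequestI
# Assumed import path -- the base classes live somewhere under llm_router_api.


class SummarizeText(EndpointWithHttpRequestI):
    """Summarise a single text (illustrative endpoint only)."""

    REQUIRED_ARGS = ["model", "text"]
    OPTIONAL_ARGS = ["max_sentences"]
    SYSTEM_PROMPT_NAME = "summarize"      # hypothetical prompt file name

    def prepare_payload(self, params):
        # Convert the incoming parameters into an OpenAI-style chat payload
        # for the downstream model.
        return {
            "model": params["model"],
            "messages": [
                {"role": "user", "content": params["text"]},
            ],
        }
```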

### Prompt Files

Prompt files live under the directory configured by `LLM_ROUTER_PROMPTS_DIR`.
File naming convention: `<category>/system/<lang>/<prompt-id>`.
Placeholders such as `##PLACEHOLDER##` can be replaced via `self._map_prompt` in the endpoint implementation.

---

## Monitoring & Metrics

When `LLM_ROUTER_USE_PROMETHEUS=1` (or `true`) is set, the router automatically:

- Exposes a `/metrics` endpoint (Prometheus format).
- Tracks request counts, latency histograms, in‑progress gauges, and error counters.

You can scrape this endpoint with a Prometheus server or query it manually.
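
For a quick manual check, the endpoint can be fetched like any other route. The sketch assumes the default host and
port and that `/metrics` is served at the server root rather than under the API prefix; adjust the URL if your
deployment differs.

```python
import requests

# Fetch the raw Prometheus exposition text and show the first few lines.
metrics = requests.get("http://localhost:8080/metrics", timeout=10).text
print("\n".join(metrics.splitlines()[:10]))
```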

---

## License

`llm-router-api` is released under the **MIT License**. See the `LICENSE` file in the repository for full terms.