You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Note: model instances inherit both command line arguments and environment variables from the router server.
1516
1521
1522
+
Alternatively, you can also add GGUF based preset (see next section)
1523
+
1524
+
### Model presets
1525
+
1526
+
Model presets allow advanced users to define custom configurations using an `.ini` file:
1527
+
1528
+
```sh
1529
+
llama-server --models-preset ./my-models.ini
1530
+
```
1531
+
1532
+
Each section in the file defines a new preset. Keys within a section correspond to command-line arguments (without leading dashes). For example, the argument `--n-gpu-layer 123` is written as `n-gpu-layer = 123`.
1533
+
1534
+
Short argument forms (e.g., `c`, `ngl`) and environment variable names (e.g., `LLAMA_ARG_N_GPU_LAYERS`) are also supported as keys.
1535
+
1536
+
Example:
1537
+
1538
+
```ini
1539
+
version = 1
1540
+
1541
+
; If the key corresponds to an existing model on the server,
1542
+
; this will be used as the default config for that model
1543
+
[ggml-org/MY-MODEL-GGUF:Q8_0]
1544
+
; string value
1545
+
chat-template = chatml
1546
+
; numeric value
1547
+
n-gpu-layer = 123
1548
+
; boolean value
1549
+
jinja = false
1550
+
; shorthand argument (for example, context size)
1551
+
c = 4096
1552
+
; environment variable name
1553
+
LLAMA_ARG_CACHE_RAM = 0
1554
+
; file paths are relative to server's CWD
1555
+
model-draft = ./my-models/draft.gguf
1556
+
; but it's RECOMMENDED to use absolute path
1557
+
model-draft = /Users/abc/my-models/draft.gguf
1558
+
1559
+
; If the key does NOT correspond to an existing model,
1560
+
; you need to specify at least the model path
1561
+
[custom_model]
1562
+
model = /Users/abc/my-awesome-model-Q4_K_M.gguf
1563
+
```
1564
+
1565
+
Note: some arguments are controlled by router (e.g., host, port, API key, HF repo, model alias). They will be removed or overwritten upload loading.
1566
+
1517
1567
### Routing requests
1518
1568
1519
1569
Requests are routed according to the requested model name.
0 commit comments