|`--keep N`| number of tokens to keep from the initial prompt (default: 0, -1 = all) |
|`--swa-full`| use full-size SWA cache (default: false)<br/>[(more info)](https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)<br/>(env: LLAMA_ARG_SWA_FULL) |
|`--kv-unified, -kvu`| use single unified KV buffer for the KV cache of all sequences (default: false)<br/>[(more info)](https://github.com/ggml-org/llama.cpp/pull/14363)<br/>(env: LLAMA_ARG_KV_UNIFIED) |
|`-fa, --flash-attn [on\|off\|auto]`| set Flash Attention use ('on', 'off', or 'auto', default: 'auto')<br/>(env: LLAMA_ARG_FLASH_ATTN) |
|`-hffv, --hf-file-v FILE`| Hugging Face model file for the vocoder model (default: unused)<br/>(env: LLAMA_ARG_HF_FILE_V) |
|`-hft, --hf-token TOKEN`| Hugging Face access token (default: value from HF_TOKEN environment variable)<br/>(env: HF_TOKEN) |
|`--log-disable`| Disable logging |
|`--log-file FNAME`| Log to file<br/>(env: LLAMA_LOG_FILE) |
|`--log-colors [on\|off\|auto]`| Set colored logging ('on', 'off', or 'auto', default: 'auto')<br/>'auto' enables colors when output is to a terminal<br/>(env: LLAMA_LOG_COLORS) |
|`-v, --verbose, --log-verbose`| Set verbosity level to infinity (i.e. log all messages, useful for debugging) |
|`--offline`| Offline mode: forces use of cache, prevents network access<br/>(env: LLAMA_OFFLINE) |
|`-lv, --verbosity, --log-verbosity N`| Set the verbosity threshold. Messages with a higher verbosity will be ignored. Values:<br/> - 0: generic output<br/> - 1: error<br/> - 2: warning<br/> - 3: info<br/> - 4: debug<br/>(default: 3)<br/><br/>(env: LLAMA_LOG_VERBOSITY) |
|`--log-prefix`| Enable prefix in log messages<br/>(env: LLAMA_LOG_PREFIX) |
|`--log-timestamps`| Enable timestamps in log messages<br/>(env: LLAMA_LOG_TIMESTAMPS) |
|`-ctkd, --cache-type-k-draft TYPE`| KV cache data type for K for the draft model<br/>allowed values: f32, f16, bf16, q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1<br/>(default: f16)<br/>(env: LLAMA_ARG_CACHE_TYPE_K_DRAFT) |
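As an illustrative sketch of how the flags above combine, either on the command line or through the environment variables listed in the table (the model path, log file name, and chosen values here are hypothetical examples, not project defaults):

```shell
# Sketch only: ./models/model.gguf and server.log are placeholder names.
llama-server \
  -m ./models/model.gguf \
  -fa auto \
  --kv-unified \
  --log-file server.log \
  --log-timestamps \
  --log-verbosity 2   # threshold 2: keep generic/error/warning, drop info/debug

# The same settings expressed via the environment variables from the table:
LLAMA_ARG_FLASH_ATTN=auto LLAMA_LOG_FILE=server.log LLAMA_LOG_VERBOSITY=2 \
  llama-server -m ./models/model.gguf
```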