## 3.8.0 (2025-05-17)
### Features
- save and restore a context sequence state (#460) (f2cb873) (documentation: Saving and restoring a context sequence evaluation state) (example below)
- stream function call parameters (#460) (f2cb873) (documentation: API: `LLamaChatPromptOptions["onFunctionCallParamsChunk"]`) (example below)
- configure Hugging Face remote endpoint for resolving URIs (#460) (f2cb873) (documentation: API: `ResolveModelFileOptions["endpoints"]`) (example below)
- Qwen 3 support (#460) (f2cb873)
- `QwenChatWrapper`: support discouraging the generation of thoughts (#460) (f2cb873) (documentation: API: `QwenChatWrapper` constructor > `thoughts` option) (example below)
- `getLlama`: `dryRun` option (#460) (f2cb873) (documentation: API: `LlamaOptions["dryRun"]`) (example below)
- `getLlamaGpuTypes` function (#460) (f2cb873) (documentation: API: `getLlamaGpuTypes`) (example below)
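A minimal sketch of saving a context sequence's evaluation state and restoring it later, so a follow-up run can resume without re-evaluating the whole conversation. The `saveStateToFile`/`loadStateFromFile` method names and the `acceptRisk` flag are assumptions here; the linked documentation page covers the actual API and its caveats:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});
const context = await model.createContext();
const sequence = context.getSequence();

const session = new LlamaChatSession({contextSequence: sequence});
await session.prompt("Summarize the rules of chess.");

// persist the evaluated state to disk (assumed method name)
await sequence.saveStateToFile("chat.state");

// later, on a fresh sequence from a context of the same model
// (assumed signature; see the linked docs for the risks of loading state files)
await sequence.loadStateFromFile("chat.state", {acceptRisk: true});
```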
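A sketch of streaming function call parameters as the model generates them, using the new `onFunctionCallParamsChunk` prompt option. The exact shape of the chunk object (a `paramsChunk` string field is assumed below) should be verified against the linked API docs:

```typescript
import {getLlama, LlamaChatSession, defineChatSessionFunction} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const functions = {
    getWeather: defineChatSessionFunction({
        description: "Get the current weather in a city",
        params: {
            type: "object",
            properties: {
                city: {type: "string"}
            }
        },
        handler({city}) {
            return {city, temperatureC: 21};
        }
    })
};

const response = await session.prompt("What's the weather in Paris?", {
    functions,
    // print the function call parameters as they stream in;
    // the chunk field name is an assumption - check the linked API docs
    onFunctionCallParamsChunk(chunk) {
        process.stdout.write(chunk.paramsChunk);
    }
});
console.log(response);
```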
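A sketch of pointing model URI resolution at a custom Hugging Face-compatible endpoint (e.g. a mirror or proxy). The `endpoints.huggingFace` option shape and the mirror URL are assumptions based on the linked `ResolveModelFileOptions["endpoints"]` documentation:

```typescript
import {resolveModelFile} from "node-llama-cpp";

const modelPath = await resolveModelFile("hf:user/model/model.gguf", {
    directory: "./models",
    // assumed option shape: resolve hf: URIs against a custom endpoint
    endpoints: {
        huggingFace: "https://hf-mirror.com/"
    }
});
console.log("Model file at:", modelPath);
```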
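A sketch of discouraging thought generation with the new `QwenChatWrapper` constructor option; the `thoughts: "discourage"` value is an assumption, so check the linked constructor docs for the accepted values:

```typescript
import {getLlama, LlamaChatSession, QwenChatWrapper} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/qwen3-model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    // assumed option value - the docs describe a `thoughts` option
    // on the QwenChatWrapper constructor
    chatWrapper: new QwenChatWrapper({thoughts: "discourage"})
});

console.log(await session.prompt("Reply briefly: what is 2 + 2?"));
```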
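A sketch combining the new `getLlamaGpuTypes` function with the `dryRun` option to probe which GPU backends can load on the current machine; the return shape of `getLlamaGpuTypes` and the exact `dryRun` semantics are assumptions here:

```typescript
import {getLlama, getLlamaGpuTypes} from "node-llama-cpp";

// assumed: resolves to GPU type names usable on this machine (e.g. "cuda", "vulkan")
const gpuTypes = await getLlamaGpuTypes();
console.log("GPU types:", gpuTypes);

// probe each backend without setting up a full instance
// (assumed semantics of the new dryRun option; see LlamaOptions["dryRun"] docs)
for (const gpu of gpuTypes) {
    try {
        await getLlama({gpu, dryRun: true});
        console.log(`${gpu}: loadable`);
    } catch {
        console.log(`${gpu}: unavailable`);
    }
}
```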
### Bug Fixes
- adapt to breaking `llama.cpp` changes (#460) (f2cb873)
- capture multi-token segment separators (#460) (f2cb873)
- race condition when reading extremely long gguf metadata (#460) (f2cb873)
- adapt memory estimation to newly added model architectures (#460) (f2cb873)
- skip binary testing on certain problematic conditions (#460) (f2cb873)
- improve GPU backend loading error description (#460) (f2cb873)
Shipped with `llama.cpp` release `b5414`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)