## 3.8.0 (2025-05-17)
### Features
- save and restore a context sequence state (#460) (f2cb873) (documentation: Saving and restoring a context sequence evaluation state) (example below)
- stream function call parameters (#460) (f2cb873) (documentation: API: `LLamaChatPromptOptions["onFunctionCallParamsChunk"]`) (example below)
- configure Hugging Face remote endpoint for resolving URIs (#460) (f2cb873) (documentation: API: `ResolveModelFileOptions["endpoints"]`) (example below)
- Qwen 3 support (#460) (f2cb873)
- `QwenChatWrapper`: support discouraging the generation of thoughts (#460) (f2cb873) (documentation: API: `QwenChatWrapper` constructor > `thoughts` option) (example below)
- `getLlama`: `dryRun` option (#460) (f2cb873) (documentation: API: `LlamaOptions["dryRun"]`) (example below)
- `getLlamaGpuTypes` function (#460) (f2cb873) (documentation: API: `getLlamaGpuTypes`) (example below)
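A minimal sketch of saving a context sequence's evaluation state and restoring it later, so a follow-up run can resume without re-evaluating the whole conversation. The `saveStateToFile`/`loadStateFromFile` method names and the `acceptRisk` flag are assumptions here; the linked documentation page covers the actual API and its caveats:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});
const context = await model.createContext();
const sequence = context.getSequence();

const session = new LlamaChatSession({contextSequence: sequence});
await session.prompt("Summarize the rules of chess.");

// persist the evaluated state to disk (assumed method name)
await sequence.saveStateToFile("chat.state");

// later, on a fresh sequence from a context of the same model
// (assumed signature; see the linked docs for the risks of loading state files)
await sequence.loadStateFromFile("chat.state", {acceptRisk: true});
```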
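A sketch of streaming function call parameters as the model generates them, using the new `onFunctionCallParamsChunk` prompt option. The exact shape of the chunk object (a `paramsChunk` string field is assumed below) should be verified against the linked API docs:

```typescript
import {getLlama, LlamaChatSession, defineChatSessionFunction} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const functions = {
    getWeather: defineChatSessionFunction({
        description: "Get the current weather in a city",
        params: {
            type: "object",
            properties: {
                city: {type: "string"}
            }
        },
        handler({city}) {
            return {city, temperatureC: 21};
        }
    })
};

const response = await session.prompt("What's the weather in Paris?", {
    functions,
    // print the function call parameters as they stream in;
    // the chunk field name is an assumption - check the linked API docs
    onFunctionCallParamsChunk(chunk) {
        process.stdout.write(chunk.paramsChunk);
    }
});
console.log(response);
```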
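A sketch of pointing model URI resolution at a custom Hugging Face-compatible endpoint (e.g. a mirror or proxy). The `endpoints.huggingFace` option shape and the mirror URL are assumptions based on the linked `ResolveModelFileOptions["endpoints"]` documentation:

```typescript
import {resolveModelFile} from "node-llama-cpp";

const modelPath = await resolveModelFile("hf:user/model/model.gguf", {
    directory: "./models",
    // assumed option shape: resolve hf: URIs against a custom endpoint
    endpoints: {
        huggingFace: "https://hf-mirror.com/"
    }
});
console.log("Model file at:", modelPath);
```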
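A sketch of discouraging thought generation with the new `QwenChatWrapper` constructor option; the `thoughts: "discourage"` value is an assumption, so check the linked constructor docs for the accepted values:

```typescript
import {getLlama, LlamaChatSession, QwenChatWrapper} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/qwen3-model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    // assumed option value - the docs describe a `thoughts` option
    // on the QwenChatWrapper constructor
    chatWrapper: new QwenChatWrapper({thoughts: "discourage"})
});

console.log(await session.prompt("Reply briefly: what is 2 + 2?"));
```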
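A sketch combining the new `getLlamaGpuTypes` function with the `dryRun` option to probe which GPU backends can load on the current machine; the return shape of `getLlamaGpuTypes` and the exact `dryRun` semantics are assumptions here:

```typescript
import {getLlama, getLlamaGpuTypes} from "node-llama-cpp";

// assumed: resolves to GPU type names usable on this machine (e.g. "cuda", "vulkan")
const gpuTypes = await getLlamaGpuTypes();
console.log("GPU types:", gpuTypes);

// probe each backend without setting up a full instance
// (assumed semantics of the new dryRun option; see LlamaOptions["dryRun"] docs)
for (const gpu of gpuTypes) {
    try {
        await getLlama({gpu, dryRun: true});
        console.log(`${gpu}: loadable`);
    } catch {
        console.log(`${gpu}: unavailable`);
    }
}
```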
### Bug Fixes
- adapt to breaking `llama.cpp` changes (#460) (f2cb873)
- capture multi-token segment separators (#460) (f2cb873)
- race condition when reading extremely long gguf metadata (#460) (f2cb873)
- adapt memory estimation to newly added model architectures (#460) (f2cb873)
- skip binary testing on certain problematic conditions (#460) (f2cb873)
- improve GPU backend loading error description (#460) (f2cb873)
Shipped with `llama.cpp` release `b5414`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)