3.4.0 (2025-01-08)
Features
- token prediction (speculative decoding) (#405) (632a7bf) (documentation: Token Prediction) (see the first sketch after this list)
- `controlledEvaluate` (#405) (632a7bf) (documentation: Low Level API)
- `evaluateWithMetadata` (#405) (632a7bf) (documentation: Low Level API)
- reranking (#405) (632a7bf) (documentation: Reranking Documents) (see the second sketch after this list)
- token confidence (#405) (632a7bf) (documentation: Low Level API) (see the third sketch after this list)
- `experimentalChunkDocument` (#405) (632a7bf)
- build on arm64 using LLVM (#405) (632a7bf)
- try compiling with LLVM on Windows x64 when available (#405) (632a7bf)
- minor: dynamically load `llama.cpp` backends (#405) (632a7bf)
- minor: more token values support in `SpecialToken` (#405) (632a7bf)
- minor: improve memory usage estimation (#405) (632a7bf)
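
A minimal sketch of the new token prediction feature for speculative decoding: a small draft model proposes tokens that the main model then verifies, speeding up generation without changing the output. This follows the Token Prediction documentation; the model paths are placeholders, and treat the exact names (`DraftSequenceTokenPredictor`, the `tokenPredictor` option) as assumptions to verify against the docs.

```typescript
import {getLlama, LlamaChatSession, DraftSequenceTokenPredictor} from "node-llama-cpp";

const llama = await getLlama();

// The main model and a smaller, faster draft model (paths are placeholders)
const model = await llama.loadModel({modelPath: "path/to/main-model.gguf"});
const draftModel = await llama.loadModel({modelPath: "path/to/draft-model.gguf"});

const context = await model.createContext();
const draftContext = await draftModel.createContext();

// The draft sequence generates candidate tokens for the main sequence to verify
const sequence = context.getSequence({
    tokenPredictor: new DraftSequenceTokenPredictor(draftContext.getSequence())
});

const session = new LlamaChatSession({contextSequence: sequence});
console.log(await session.prompt("Summarize speculative decoding in one sentence."));
```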
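Reranking scores documents by their relevance to a query using a reranking-capable model. A short sketch assuming the ranking-context API from the Reranking Documents documentation (`createRankingContext`, `rankAndSort`, and a `{document, score}` result shape); the model path is a placeholder.

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/reranker-model.gguf"}); // placeholder

const context = await model.createRankingContext();

const documents = [
    "The capital of France is Paris",
    "Mount Everest is the tallest mountain in the world",
    "I love eating pizza"
];

// Rank every document against the query; results come back sorted by score
const ranked = await context.rankAndSort("Tell me a geographical fact", documents);
console.log(ranked.map(({document, score}) => `${score.toFixed(3)}  ${document}`).join("\n"));
```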
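Token confidence surfaces through the low-level `evaluateWithMetadata` API. A sketch under the assumption (per the Low Level API documentation) that requesting `{confidence: true}` yields, alongside each generated token, the probability the model assigned to it.

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // placeholder
const context = await model.createContext();
const sequence = context.getSequence();

const tokens = model.tokenize("The quick brown fox");

// Each yielded item carries the generated token plus the requested metadata
let generated = 0;
for await (const {token, confidence} of sequence.evaluateWithMetadata(tokens, {confidence: true})) {
    console.log(model.detokenize([token]), confidence);
    if (++generated >= 10) break; // stop after a few tokens for this demo
}
```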
Bug Fixes
- check for Rosetta usage on macOS x64 when using the `inspect gpu` command (#405) (632a7bf)
- detect running under Rosetta on Apple Silicon and show an error message instead of crashing (#405) (632a7bf)
- switch from `"nextTick"` to `"nextCycle"` for the default batch dispatcher (#405) (632a7bf) (see the sketch after this list)
- remove deprecated CLS token (#405) (632a7bf)
- pipe error logs in the `inspect gpu` command (#405) (632a7bf)
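
The default batch dispatcher now waits for the next event-loop cycle instead of the next tick. If your code depended on the old timing, a sketch of pinning it explicitly, assuming the `batching.dispatchSchedule` context option accepts these values:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // placeholder

const context = await model.createContext({
    batching: {
        // "nextCycle" is now the default; "nextTick" restores the previous behavior
        dispatchSchedule: "nextTick"
    }
});
```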
Shipped with `llama.cpp` release `b4435`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)