+ <div align="center">
+
# Node Llama.cpp
Node.js bindings for llama.cpp.

- Pre-built bindings are provided with a fallback to building from source with `node-gyp`.
+ <sub>Pre-built bindings are provided with a fallback to building from source with `node-gyp`.</sub>

[![Build](https://github.com/withcatai/node-llama-cpp/actions/workflows/build.yml/badge.svg)](https://github.com/withcatai/node-llama-cpp/actions/workflows/build.yml)
+ [![License](https://badgen.net/badge/color/MIT/green?label=license)](https://www.npmjs.com/package/node-llama-cpp)
+ [![Types](https://badgen.net/badge/color/TypeScript/blue?label=types)](https://www.npmjs.com/package/node-llama-cpp)
[![Version](https://badgen.net/npm/v/node-llama-cpp)](https://www.npmjs.com/package/node-llama-cpp)

+ </div>

## Installation
```bash
@@ -113,8 +118,8 @@ console.log("AI: " + q1);

const tokens = context.encode(q1);
const res: number[] = [];
- for await (const chunk of context.evaluate(tokens)) {
-     res.push(chunk);
+ for await (const modelToken of context.evaluate(tokens)) {
+     res.push(modelToken);

    // it's important to not concatenate the results as strings,
    // as doing so will break some characters (like some emojis) that are made of multiple tokens.
@@ -130,15 +135,55 @@ const a1 = context.decode(Uint32Array.from(res)).split("USER:")[0];
console.log("AI: " + a1);
```

+ #### With grammar
+ Use this to direct the model to generate a specific format of text, like `JSON`, for example.
+
+ > **Note:** there's an issue with some grammars where the model won't stop generating output,
+ > so it's advised to use it together with `maxTokens` set to the context size of the model.
+
+ ```typescript
+ import {fileURLToPath} from "url";
+ import path from "path";
+ import {LlamaModel, LlamaGrammar, LlamaContext, LlamaChatSession} from "node-llama-cpp";
+
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+ const model = new LlamaModel({
+     modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
+ });
+ const grammar = await LlamaGrammar.getFor("json");
+ const context = new LlamaContext({
+     model,
+     grammar
+ });
+ const session = new LlamaChatSession({context});
+
+
+ const q1 = 'Create a JSON that contains a message saying "hi there"';
+ console.log("User: " + q1);
+
+ const a1 = await session.prompt(q1, {maxTokens: context.getContextSize()});
+ console.log("AI: " + a1);
+ console.log(JSON.parse(a1));
+
+
+ const q2 = 'Add another field to the JSON with the key being "author" and the value being "LLama"';
+ console.log("User: " + q2);
+
+ const a2 = await session.prompt(q2, {maxTokens: context.getContextSize()});
+ console.log("AI: " + a2);
+ console.log(JSON.parse(a2));
+ ```
+
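The same JSON-constrained generation is also reachable from the CLI documented below, through the `chat` command's `--grammar` flag. A minimal sketch, assuming the package is invoked through `npx` and using a hypothetical local model path; `--maxTokens=-1` mirrors the `maxTokens: context.getContextSize()` workaround from the example above:

```bash
# hypothetical model path; restrict responses to the bundled JSON grammar
# and cap generation at the model's context size
npx node-llama-cpp chat --model ./models/codellama-13b.Q3_K_M.gguf --grammar json --maxTokens=-1
```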
### CLI
```
Usage: node-llama-cpp <command> [options]

Commands:
-   node-llama-cpp download      Download a release of llama.cpp and compile it
-   node-llama-cpp build         Compile the currently downloaded llama.cpp
-   node-llama-cpp clear [type]  Clear files created by llama-cli
-   node-llama-cpp chat          Chat with a LLama model
+   node-llama-cpp download      Download a release of llama.cpp and compile it
+   node-llama-cpp build         Compile the currently downloaded llama.cpp
+   node-llama-cpp clear [type]  Clear files created by node-llama-cpp
+   node-llama-cpp chat          Chat with a LLama model

Options:
  -h, --help     Show help                                                                [boolean]
@@ -152,15 +197,17 @@ node-llama-cpp download
Download a release of llama.cpp and compile it

Options:
-   -h, --help     Show help                                                              [boolean]
-   --repo         The GitHub repository to download a release of llama.cpp from. Can also be set v
-                  ia the NODE_LLAMA_CPP_REPO environment variable
+   -h, --help         Show help                                                          [boolean]
+   --repo             The GitHub repository to download a release of llama.cpp from. Can also be
+                      set via the NODE_LLAMA_CPP_REPO environment variable
                                                          [string] [default: "ggerganov/llama.cpp"]
-   --release      The tag of the llama.cpp release to download. Can also be set via the NODE_LLAMA
-                  _CPP_REPO_RELEASE environment variable           [string] [default: "latest"]
-   --arch         The architecture to compile llama.cpp for                               [string]
-   --nodeTarget   The Node.js version to compile llama.cpp for. Example: v18.0.0          [string]
-   -v, --version  Show version number                                                    [boolean]
+   --release          The tag of the llama.cpp release to download. Set to "latest" to download t
+                      he latest release. Can also be set via the NODE_LLAMA_CPP_REPO_RELEASE envi
+                      ronment variable                             [string] [default: "latest"]
+   -a, --arch         The architecture to compile llama.cpp for                           [string]
+   -t, --nodeTarget   The Node.js version to compile llama.cpp for. Example: v18.0.0      [string]
+   --skipBuild, --sb  Skip building llama.cpp after downloading it   [boolean] [default: false]
+   -v, --version      Show version number                                                [boolean]
```

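As an illustration of the flags above (assuming the package is run through `npx`; the values are examples taken from the help text):

```bash
# fetch the latest llama.cpp release and compile it for Node.js v18.0.0
npx node-llama-cpp download --release latest --nodeTarget v18.0.0
```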
#### `build` command
@@ -171,16 +218,16 @@ Compile the currently downloaded llama.cpp

Options:
  -h, --help           Show help                                                          [boolean]
-   --arch             The architecture to compile llama.cpp for                          [string]
-   --nodeTarget       The Node.js version to compile llama.cpp for. Example: v18.0.0     [string]
+   -a, --arch         The architecture to compile llama.cpp for                          [string]
+   -t, --nodeTarget   The Node.js version to compile llama.cpp for. Example: v18.0.0     [string]
  -v, --version        Show version number                                                [boolean]
```

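For example, recompiling the already-downloaded llama.cpp for a specific Node.js version might look like this (again assuming `npx` invocation; the target version is only illustrative):

```bash
# rebuild the previously downloaded llama.cpp, targeting Node.js v18.0.0
npx node-llama-cpp build -t v18.0.0
```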
#### `clear` command
```
node-llama-cpp clear [type]

- Clear files created by llama-cli
+ Clear files created by node-llama-cpp

Options:
  -h, --help     Show help                                                                [boolean]
@@ -195,20 +242,45 @@ node-llama-cpp chat
Chat with a LLama model

Required:
-   --model           LLama model file to use for the chat                    [string] [required]
+   -m, --model          LLama model file to use for the chat                 [string] [required]

Optional:
-   --systemInfo      Print llama.cpp system info                    [boolean] [default: false]
-   --systemPrompt    System prompt to use against the model. [default value: You are a helpful, res
-                     pectful and honest assistant. Always answer as helpfully as possible. If a que
-                     stion does not make any sense, or is not factually coherent, explain why inste
-                     ad of answering something not correct. If you don't know the answer to a quest
-                     ion, please don't share false information.]
+   -i, --systemInfo     Print llama.cpp system info                  [boolean] [default: false]
+   -s, --systemPrompt   System prompt to use against the model. [default value: You are a helpful,
+                        respectful and honest assistant. Always answer as helpfully as possible. If
+                        a question does not make any sense, or is not factually coherent, explain
+                        why instead of answering something not correct. If you don't know the answe
+                        r to a question, please don't share false information.]
   [string] [default: "You are a helpful, respectful and honest assistant. Always answer as helpfully
  as possible.
  If a question does not make any sense, or is not factually coherent, explain why ins
  tead of answering something not correct. If you don't know the answer to a question, please don't
  share false information."]
+   -w, --wrapper        Chat wrapper to use. Use `auto` to automatically select a wrapper based on
+                        the model's BOS token
+                        [string] [choices: "auto", "general", "llamaChat", "chatML"] [default: "general"]
+   -c, --contextSize    Context size to use for the model                  [number] [default: 4096]
+   -g, --grammar        Restrict the model response to a specific grammar, like JSON for example
+                        [string] [choices: "text", "json", "list", "arithmetic", "japanese", "chess"] [default: "text"]
+   -t, --temperature    Temperature is a hyperparameter that controls the randomness of the generat
+                        ed text. It affects the probability distribution of the model's output toke
+                        ns. A higher temperature (e.g., 1.5) makes the output more random and creat
+                        ive, while a lower temperature (e.g., 0.5) makes the output more focused, d
+                        eterministic, and conservative. The suggested temperature is 0.8, which pro
+                        vides a balance between randomness and determinism. At the extreme, a tempe
+                        rature of 0 will always pick the most likely next token, leading to identic
+                        al outputs in each run. Set to `0` to disable.        [number] [default: 0]
+   -k, --topK           Limits the model to consider only the K most likely next tokens for samplin
+                        g at each step of sequence generation. An integer number between `1` and th
+                        e size of the vocabulary. Set to `0` to disable (which uses the full vocabu
+                        lary). Only relevant when `temperature` is set to a value greater than 0.
+                        [number] [default: 40]
+   -p, --topP           Dynamically selects the smallest set of tokens whose cumulative probability
+                        exceeds the threshold P, and samples the next token only from this set. A
+                        float number between `0` and `1`. Set to `1` to disable. Only relevant when
+                        `temperature` is set to a value greater than `0`.  [number] [default: 0.95]
+   --maxTokens, --mt    Maximum number of tokens to generate in responses. Set to `0` to disable. S
+                        et to `-1` to set to the context size                 [number] [default: 0]

Options:
  -h, --help     Show help                                                                [boolean]