
Commit e4a911c: Update Kobold_API.md (1 parent: fbfd5df)

1 file changed: Kobold_API.md (10 additions, 5 deletions)
## Using Multiple GPUs

The program first determines how many layers are computed on the GPU(s) based on `--gpulayers`. Those layers are split according to the `--tensor_split` parameter. Layers not offloaded will be computed on the CPU. It is possible to specify `--usecublas`, `--usevulkan`, or `--useclblast` and not specify `--gpulayers`, in which case the prompt processing will occur on the GPU(s) but the per-token inference will not.

### Not Specifying GPU IDs:

- By default, if no GPU IDs are specified after `--usecublas` or `--usevulkan`, all compatible GPUs will be used and layers will be distributed equally.
  - NOTE: This can be bad if the GPUs are different sizes.
- Use `--tensor_split` to control the ratio, e.g., `--tensor_split 4 1` for an 80%/20% split on two GPUs.
- The number of values in `--tensor_split` should match the total number of available GPUs.
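As a rough sketch (not KoboldCpp's actual implementation), `--tensor_split` can be read as a list of proportional weights: each GPU receives a whole-number share of the offloaded layers in that ratio. A minimal model of that arithmetic:

```python
def split_layers(gpu_layers, tensor_split):
    """Distribute gpu_layers across GPUs in proportion to tensor_split.

    Hypothetical helper for illustration only; integer remainders are
    handed to the first GPU here, which may differ from the real program.
    """
    total = sum(tensor_split)
    counts = [gpu_layers * share // total for share in tensor_split]
    counts[0] += gpu_layers - sum(counts)  # leftover layers go to GPU 0
    return counts

# --tensor_split 4 1 with 40 offloaded layers -> 80%/20%
print(split_layers(40, [4, 1]))  # [32, 8]
```

Because only the ratio matters in this reading, `--tensor_split 4 1` and `--tensor_split 8 2` should describe the same 80%/20% split.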

### Specifying a Single GPU ID:

- Don't use `--tensor_split`. However, you can still use `--gpulayers`.

### Specifying Some GPUs and Offloading Layers to Those GPUs:

- If some (but not all) GPU IDs are provided after `--usecublas` or `--usevulkan`, only those GPUs will be used for layer offloading.
- Use `--tensor_split` to control the distribution ratio among the specified GPUs.
- The number of values in `--tensor_split` should match the number of GPUs selected.
- Example: With four GPUs available but only specifying the last two with `--usecublas 2 3`, using `--tensor_split 1 1` would offload an equal number of layers to the third and fourth GPUs but none to the first two.
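The ID-selection behavior above can be sketched the same way (a hypothetical model for illustration, not the program's real code): only the GPUs listed after `--usecublas` receive layers, matched positionally against `--tensor_split`:

```python
def layers_per_device(num_gpus, selected_ids, tensor_split, gpu_layers):
    """Illustrative only: map each selected GPU ID to its layer share.

    GPUs that were not selected receive zero layers; any integer
    remainder is given to the first selected GPU in this sketch.
    """
    total = sum(tensor_split)
    counts = [gpu_layers * share // total for share in tensor_split]
    counts[0] += gpu_layers - sum(counts)
    plan = {gpu_id: 0 for gpu_id in range(num_gpus)}
    for gpu_id, n in zip(selected_ids, counts):
        plan[gpu_id] = n
    return plan

# --usecublas 2 3 --tensor_split 1 1 with 30 layers on a 4-GPU machine
print(layers_per_device(4, [2, 3], [1, 1], 30))  # {0: 0, 1: 0, 2: 15, 3: 15}
```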

### Specifying Some GPUs to Process Layers While Allowing Other GPUs for Prompt Processing:

- Use `--usecublas` or `--usevulkan` without specifying the GPU IDs, which makes all GPUs available for prompt processing.
- Only assign layers to certain GPUs. Example: Using `--usecublas` and `--tensor_split 5 0 3 2` will offload 50% of the layers to the first GPU, 30% to the third, and 20% to the fourth. However, the second GPU will still be available for other processing that doesn't require layers of the model.
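The percentages in that example follow directly from normalizing the `--tensor_split` weights; a zero weight simply means that GPU holds no layers:

```python
ratios = [5, 0, 3, 2]                        # --tensor_split 5 0 3 2
total = sum(ratios)                          # 10
shares = [100 * r / total for r in ratios]   # percentage per GPU
print(shares)  # [50.0, 0.0, 30.0, 20.0] -> second GPU gets no layers
```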

### Usage with `--useclblast`:
