## Using Multiple GPUs
The program first determines how many layers are computed on the GPU(s) based on `--gpulayers`. Those layers are split according to the `--tensor_split` parameter. Layers not offloaded will be computed on the CPU. It is possible to specify `--usecublas`, `--usevulkan`, or `--useclblast` and not specify `--gpulayers`, in which case the prompt processing will occur on the GPU(s) but the per-token inference will not.
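For concreteness, a multi-GPU launch combining these flags might look like the following sketch. The binary name, model path, and layer count here are placeholders, not values taken from this document:

```shell
# Hypothetical invocation, assuming a CUDA build:
# offload 40 layers to the GPUs, split 4:1 between them;
# any remaining layers run on the CPU.
# "koboldcpp" and "model.gguf" are placeholder names.
koboldcpp --usecublas --gpulayers 40 --tensor_split 4 1 --model model.gguf
```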
### Not Specifying GPU IDs:
- By default, if no GPU IDs are specified after `--usecublas` or `--usevulkan`, all compatible GPUs will be used and layers will be distributed equally.
- NOTE: This can be bad if the GPUs are different sizes.
- Use `--tensor_split` to control the ratio, e.g., `--tensor_split 4 1` for an 80%/20% split on two GPUs.
- The number of values in `--tensor_split` should match the total number of available GPUs.
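The `--tensor_split` values are proportional weights, not percentages. A small sketch of the arithmetic, using the `4 1` example above:

```shell
# Sketch: how `--tensor_split` ratios translate into per-GPU shares.
ratios="4 1"          # as in `--tensor_split 4 1`

# Sum the weights.
total=0
for r in $ratios; do
  total=$((total + r))
done

# Each GPU's share is its weight over the total.
for r in $ratios; do
  echo "share: $((100 * r / total))%"   # prints 80% then 20%
done
```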
### Specifying a Single GPU ID:
- Don't use `--tensor_split`. However, you can still use `--gpulayers`.
### Specifying Some GPUs and Offloading Layers to Those GPUs:
- If some (but not all) GPU IDs are provided after `--usecublas` or `--usevulkan`, only those GPUs will be used for layer offloading.
- Use `--tensor_split` to control the distribution ratio among the specified GPUs.
- The number of values in `--tensor_split` should match the number of GPUs selected.
- Example: With four GPUs available but only specifying the last two with `--usecublas 2 3`, using `--tensor_split 1 1` would offload an equal amount of layers to the third and fourth GPUs but none to the first two.
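The example above corresponds to an invocation like the following sketch (binary name, model path, and layer count are placeholders):

```shell
# Hypothetical invocation: only GPU IDs 2 and 3 receive model layers,
# split equally between them; GPUs 0 and 1 get none.
koboldcpp --usecublas 2 3 --gpulayers 40 --tensor_split 1 1 --model model.gguf
```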
### Specifying Some GPUs to Process Layers While Allowing Other GPUs for Prompt Processing:
- Use `--usecublas` or `--usevulkan` without specifying GPU IDs, which makes all GPUs available for prompt processing.
- Only assign layers to certain GPUs. Example: Using `--usecublas` and `--tensor_split 5 0 3 2` will offload 50% of the layers to the first GPU, 30% to the third, and 20% to the fourth. However, the second GPU will still be available for other processing that doesn't require layers of the model.
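As a sketch (placeholder binary and model names), the `5 0 3 2` example corresponds to an invocation like:

```shell
# Hypothetical invocation: no GPU IDs follow --usecublas, so all four GPUs
# can assist with prompt processing, but the 0 in --tensor_split keeps model
# layers off the second GPU (5:0:3:2 = 50%/0%/30%/20%).
koboldcpp --usecublas --gpulayers 40 --tensor_split 5 0 3 2 --model model.gguf
```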