ggml webgpu: add support for emscripten builds #17184

reeselevine · 2025-11-12T01:25:59Z

This PR builds on and supersedes #15826 from @ngxson.

Adds __EMSCRIPTEN__ preprocessors conditionals in places necessary for compilation (this included some OS specific things in common/
Adds flags for emscripten builds for 64-bit memory, memory growth that I found to be required to get the backend operations to pass when running through the browser (Chrome/M3 system)
Also adds a Github workflow that ensures emscripten builds compile (at least with test-backend-ops), to avoid other code additions that accidentally break things.
Disables dawn/native-only WebGPU features when building for the browser, like experimental support for subgroup matrices

Add fast matrix and matrix/vector multiplication.

* webgpu : fix build on emscripten * more debugging stuff * test-backend-ops: force single thread on wasm * fix single-thread case for init_tensor_uniform * use jspi * add pthread * test: remember to set n_thread for cpu backend * Add buffer label and enable dawn-specific toggles to turn off some checks * Intermediate state * Fast working f16/f32 vec4 * Working float fast mul mat * Clean up naming of mul_mat to match logical model, start work on q mul_mat * Setup for subgroup matrix mat mul * Basic working subgroup matrix * Working subgroup matrix tiling * Handle weirder sg matrix sizes (but still % sg matrix size) * Working start to gemv * working f16 accumulation with shared memory staging * Print out available subgroup matrix configurations * Vectorize dst stores for sg matrix shader * Gemv working scalar * Minor set_rows optimization (#4) * updated optimization, fixed errors * non vectorized version now dispatches one thread per element * Simplify * Change logic for set_rows pipelines --------- Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan> Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local> Co-authored-by: Reese Levine <reeselevine1@gmail.com> * Comment on dawn toggles * Working subgroup matrix code for (semi)generic sizes * Remove some comments * Cleanup code * Update dawn version and move to portable subgroup size * Try to fix new dawn release * Update subgroup size comment * Only check for subgroup matrix configs if they are supported * Add toggles for subgroup matrix/f16 support on nvidia+vulkan * Make row/col naming consistent * Refactor shared memory loading * Move sg matrix stores to correct file * Working q4_0 * Formatting * Work with emscripten builds * Fix test-backend-ops emscripten for f16/quantized types * Use emscripten memory64 to support get_memory * Add build flags and try ci --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

slaren · 2025-11-13T19:53:52Z

tests/test-backend-ops.cpp

 #include <ggml-alloc.h>
 #include <ggml-backend.h>
 #include <ggml-cpp.h>
+#include <ggml-cpu.h>


The functions in this header are not compatible with DL backends, so they cannot be used here. There is already code here to obtain the set_threads function dynamically, so it shouldn't be necessary.

This change is carried over from my original PR. IIRC just using set_threads is not enough, probably it's a bug?

Related comment: #15826 (comment)

I am not aware of any bugs. I think that the call works as intended, I have changed the value many times before to test performance of some ops with different numbers of threads.

Looking again at the code, seems like the earlier condition skips the set_n_threads for CPU backend altogether (the Skipping CPU backend message). This likely be the problem. Can you propose a fix? @reeselevine

llama.cpp/tests/test-backend-ops.cpp

Lines 8154 to 8169 in becc481

if (backend_filter == NULL && ggml_backend_dev_type(dev) == GGML_BACKEND_DEVICE_TYPE_CPU && mode != MODE_GRAD) {

output_printer->print_backend_init(backend_init_info(

i, ggml_backend_dev_count(), ggml_backend_dev_name(dev), true, "Skipping CPU backend"));

n_ok++;

continue;

}

ggml_backend_t backend = ggml_backend_dev_init(dev, NULL);

GGML_ASSERT(backend != NULL);

ggml_backend_reg_t reg = ggml_backend_dev_backend_reg(dev);

auto ggml_backend_set_n_threads_fn = (ggml_backend_set_n_threads_t) ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads");

if (ggml_backend_set_n_threads_fn) {

// TODO: better value for n_threads

ggml_backend_set_n_threads_fn(backend, std::thread::hardware_concurrency());

}

You may need to change the number of threads of the CPU backend used for reference here:

llama.cpp/tests/test-backend-ops.cpp

Lines 7880 to 7889 in becc481

if (mode == MODE_TEST) {

auto test_cases = make_test_cases_eval();

filter_test_cases(test_cases, params_filter);

ggml_backend_t backend_cpu = ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_CPU, NULL);

if (backend_cpu == NULL) {

test_operation_info info("", "", "CPU");

info.set_error("backend", "Failed to initialize CPU backend");

output_printer->print_operation(info);

return false;

}

However, if the problem ultimately is that emscripten does not work at all with more than 1 thread, it may be better to fix this in the CPU backend. I would prefer to avoid backend/platform-specific hacks in test-backend-ops.

That sounds good. You could also modify ggml_backend_cpu_get_proc_address to just not return a set_n_threads function when built with emscripten.

updated in my latest commit!

There are two places where cpu_set_n_threads itself is specifically called: tools/cvector-generator and tools/export-lor.

If we wanted to go further, I could refactor these to use the get_proc_adress function instead, and remove the cpu_set_n_threads function from the CPU backend API completely.

Edit: I realize this might break users of llama.cpp, if they happen to use this function though.

The change looks good, although it may be more reliable to change the CPU backend graph_compute function to not launch any threads when built with emscripten, rather than trying to force n_threads to be 1 externally. It would be easy to forget some way to change the number of threads, or reintroduce the issue in the future.

As you noted, applications may still depend on the static API rather than using get_proc_address, so we shouldn't remove these functions entirely at this point. Removing the function conditionally when building with emscripten may be ok, but would also make it more difficult to port ggml applications that rely on these functions to webassembly, so I don't think there is a good enough reason to do so at this moment.

That makes sense. It looks like ggml_graph_compute() relies on a ggml_cplan, where tasks are actually assigned to threads. So I actually modified ggml_graph_plan() to always use one thread for single-thread emscripten builds, if that makes sense to you.

Based on that, I think we can also probably remove the forcing to one thread in the previous commit, since this change should override those anyways.

…graph_plan

reeselevine and others added 4 commits November 5, 2025 08:24

Faster tensors (#8)

c6bc125

Add fast matrix and matrix/vector multiplication.

Use map for shader replacements instead of pair of strings

7c2b2ef

Merge remote-tracking branch 'upstream/master'

6db7298

reeselevine requested review from CISC, ggerganov and slaren as code owners November 12, 2025 01:26

DajanaV mentioned this pull request Nov 12, 2025

UPSTREAM PR #17184: ggml webgpu: add support for emscripten builds auroralabs-loci/llama.cpp#176

Open

github-actions bot added build Compilation issues script Script related testing Everything test related devops improvements to build systems and github actions ggml changes relating to the ggml tensor library for machine learning labels Nov 12, 2025

Remove extra whitespace

56e6959

slaren reviewed Nov 13, 2025

View reviewed changes

reeselevine added 3 commits November 13, 2025 19:34

Merge remote-tracking branch 'upstream/master'

2abcdd6

Move wasm single-thread logic out of test-backend-ops for cpu backend

5bcd577

Disable multiple threads for emscripten single-thread builds in ggml_…

f9ba819

…graph_plan

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ggml webgpu: add support for emscripten builds #17184

ggml webgpu: add support for emscripten builds #17184

reeselevine commented Nov 12, 2025

Uh oh!

slaren Nov 13, 2025

Uh oh!

ngxson Nov 13, 2025

Uh oh!

slaren Nov 13, 2025

Uh oh!

ngxson Nov 13, 2025

Uh oh!

slaren Nov 13, 2025 •

edited

Loading

Uh oh!

slaren Nov 14, 2025

Uh oh!

reeselevine Nov 14, 2025

Uh oh!

reeselevine Nov 14, 2025 •

edited

Loading

Uh oh!

slaren Nov 17, 2025

Uh oh!

reeselevine Nov 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	if (backend_filter == NULL && ggml_backend_dev_type(dev) == GGML_BACKEND_DEVICE_TYPE_CPU && mode != MODE_GRAD) {
	output_printer->print_backend_init(backend_init_info(
	i, ggml_backend_dev_count(), ggml_backend_dev_name(dev), true, "Skipping CPU backend"));
	n_ok++;
	continue;
	}

	ggml_backend_t backend = ggml_backend_dev_init(dev, NULL);
	GGML_ASSERT(backend != NULL);

	ggml_backend_reg_t reg = ggml_backend_dev_backend_reg(dev);
	auto ggml_backend_set_n_threads_fn = (ggml_backend_set_n_threads_t) ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads");
	if (ggml_backend_set_n_threads_fn) {
	// TODO: better value for n_threads
	ggml_backend_set_n_threads_fn(backend, std::thread::hardware_concurrency());
	}

	if (mode == MODE_TEST) {
	auto test_cases = make_test_cases_eval();
	filter_test_cases(test_cases, params_filter);
	ggml_backend_t backend_cpu = ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_CPU, NULL);
	if (backend_cpu == NULL) {
	test_operation_info info("", "", "CPU");
	info.set_error("backend", "Failed to initialize CPU backend");
	output_printer->print_operation(info);
	return false;
	}

ggml webgpu: add support for emscripten builds #17184

Are you sure you want to change the base?

ggml webgpu: add support for emscripten builds #17184

Conversation

reeselevine commented Nov 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

slaren Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

reeselevine Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

slaren Nov 13, 2025 •

edited

Loading

reeselevine Nov 14, 2025 •

edited

Loading