Skip to content

Conversation

@aheejin
Copy link
Member

@aheejin aheejin commented Nov 26, 2025

This scans the code after optimizations and removes unused function names from 'names' field in the source map, reducing its size. Emscripten has not been generating 'names' field so far, but after emscripten-core/emscripten#25870, it will generate the field in case llvm-dwarfdump supports a new option --filter-child-tag.

This scans the code after optimizations and removes unused function
names from 'names' field in the source map, reducing its size.
Emscripten has not been generating 'names' field so far, but after #???,
it will generate the field in case `llvm-dwarfdump` supports a new
option `--filter-child-tag`.
@aheejin aheejin requested a review from dschuff November 26, 2025 01:21
aheejin added a commit to aheejin/emscripten that referenced this pull request Nov 26, 2025
This adds support for `names` field in source maps, which contains
function names. Source map mappings are correspondingly updated and
emsymbolizer now can provide function name information only with source
maps.

While source maps don't provide the full inlined hierarchies, this
provides the name of the original (= pre-inlining) function, which may
not exist in the final binary because they were inlined. This is because
source maps are primarily intended for user debugging.

This also demangles C++ function names using `llvm-cxxfilt`, so the
printed names can be human-readable.

I tested with `wasm-opt.wasm` from Binaryen by `if (EMSCRIPTEN)` setup
here:
https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372
with `-g -gsource-map`. With this PR and
WebAssembly/binaryen#8068, the source map file
size increases by 3.5x (8632423 -> 30070042) primarily due to the
function name strings.

From `llvm-dwarfdump` output, this also requires additional parsing of
`DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags which can be at
any depths (because functions can be within nested namespaces or
classes), so we cannot use `--recurse-depth=0` (emscripten-core#9580) anymore. In case
of `wasm-opt.wasm` built with DWARF info, without `--recurse-depth=0` in
the command line, the size of its text output increased by 27.5x, but
with the `--filter-child-tag` / `-t` option
(llvm/llvm-project#165720), the text output
increased only (?) by 3.2x, which I think is tolerable. This disables
`names` field generation when `-t` option is not available in
`llvm-dwarfdump` because it was added recently. To avoid this text size
problem, we can consider using DWARF-parsing Python libraries like
https://github.com/eliben/pyelftools, but this will make another third
party dependency, so I'm not sure if it's worth it at this point.

This also increased running time of `wasm-sourcemap.py`, in case of the
`wasm-opt.wasm`, by 2.3x (6.6s -> 15.4s), but compared to the linking
time this was not very noticeable.

Fixes emscripten-core#20715 and closes emscripten-core#25116.
@aheejin aheejin changed the title Update 'names' field in source maps Update names field in source maps Nov 26, 2025
@aheejin
Copy link
Member Author

aheejin commented Dec 4, 2025

Ping 😄

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments


// Create the new list of names and the mapping from old to new indices.
uint32_t newIndex = 0;
for (auto& pair : oldToNewIndex) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on an unordered map, so the indexing may end up nondeterministic, I worry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed oldToNewIndex to std::map.

;;@ src.cpp:2:1:used
(nop)
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add another used symbol, to get some coverage of the order of the symbols being deterministic? (not great coverage, but it might help)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: a55a844


// Collect all used symbol name indexes.
for (auto& func : wasm->functions) {
for (auto& pair : func->debugLocations) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you even do something like [_, location] here instead of having to use pair.second?

Copy link
Member Author

@aheejin aheejin Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

aheejin and others added 2 commits December 4, 2025 23:36
Co-authored-by: Derek Schuff <dschuff@chromium.org>
@kripken
Copy link
Member

kripken commented Dec 5, 2025

CI error might be fixed eventually by #8094

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants