Conversation


@tlively tlively commented Dec 3, 2025

DAE can be slow because it performs several rounds of interleaved
analysis and optimization. On top of this, the analysis it performs is
not as precise as it could be because it never removes parameters from
referenced functions and it cannot optimize unused parameters or results
that are forwarded through recursive cycles.

Start improving both the performance and the power of DAE by creating a
new pass, called DAE2 for now. DAE2 performs a single parallel walk of
the module to collect information with which it performs a fixed point
analysis to find unused parameters, then does a single parallel walk of
the module to optimize based on this analysis.

@tlively tlively marked this pull request as draft December 3, 2025 06:31

tlively commented Dec 3, 2025

@kripken, PTAL. If this seems promising to you, I'm happy to keep working on it.


@kripken kripken left a comment

Nice work!

What is unclear to me is how much work the TODOs here would take, including my two comments below. Addressing them is a significant cause of complexity in the current pass, and I'm not sure that can be avoided. So I honestly can't tell if replacing the existing pass would be the right goal - maybe, or maybe not.

However, it is possible that this pass already handles the common case in Dart and elsewhere! That is, running this to remove those chains, then the existing DAE, may be faster, without any loss of power.

I tried to test that but hit a compile error on this PR:

src/passes/DeadArgumentElimination2.cpp:216:21:   required from here
  216 |     funcInfos.resize(wasm->functions.size());
      |     ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/14/bits/stl_uninitialized.h:90:56: error: static assertion failed: result type must be constructible from input type
   90 |       static_assert(is_constructible<_ValueType, _Tp>::value,
      |                                                        ^~~~~
/usr/include/c++/14/bits/stl_uninitialized.h:90:56: note: ‘std::integral_constant<bool, false>::value’ evaluates to false

// check whether the user is a drop, but for simplicity we assume that
// Vacuum would have already removed such patterns.
funcInfos[index].paramUsages[curr->index] = used.getTop();
}
@kripken (Member)

What about things between the call and the get, like this?

(call $foo
  (i32.eqz
    (local.get $param)
  )
)

(things like this are a major reason for the multiple iterations in the old pass)

@tlively (Member Author)

That would still require multiple iterations of the pass, but I would not expect those iterations to be internal to the pass. My hope is that the new pass being more powerful in a single iteration makes this a good trade off.

@kripken (Member)

In what way would it be more powerful in a single iteration?

@tlively (Member Author)

For example the fixed point analysis will be able to fulfill this TODO: https://github.com/WebAssembly/binaryen/blob/main/src/passes/DeadArgumentElimination.cpp#L75-L77 and will similarly be able to handle unused parameters in recursive functions.

@tlively (Member Author)

And also by being able to remove parameters from referenced functions.

@kripken (Member)

  1. Improving tail-call handling and recursive functions sounds good, but I'd expect the benefits to be rare & minor? The loss of multiple iterations (especially in dae-optimizing) would outweigh that by a lot, I worry.
  2. SignaturePruning already removes parameters from referenced functions (and I think you'd need to reproduce the type analysis there to achieve the same?)

@tlively (Member Author)

This will be more powerful than SignaturePruning, too, because IIUC that pass does not analyze unused parameters. I agree recursive cases (including mutual recursions!) will be more rare than non-recursive cases, but I would be surprised if they never show up.

Going back to the original scenario where a parameter flows indirectly into another call, I think we can actually handle this without further cycles by looking at not just the direct parent but the whole parent chain to see if we find a call. We would need to be careful to handle side effects that still depend on the parameter value, though. Examples would be the value escaping due to a local.tee, or trapping or branching due to an explicit or implicit null check or cast. A simple conservative approximation would be to check that none of the parents up to the call have side effects.

@kripken kripken Dec 3, 2025

This will be more powerful than SignaturePruning, too, because IIUC that pass does not analyze unused parameters

No, it does do that, using ParamUtils::getUsedParams and then marking constant ones as unused, etc. - the same as DAE, but at the type level.

Overall I remain skeptical that you can improve the power here. But I am optimistic this can help with speed! If we call this CallChainOptimization, and use it to simplify common call chains, it could make DAE (and maybe Inlining) faster.

//
// To match and exceed the power of DAE, we will need to extend this backward
// analysis to find unused results as well, and also add a forward analysis that
// propagates constants and types through parameters and results.
@kripken (Member)

Note that these analyses interact: constant params become unused.

@tlively (Member Author)

The way I plan to handle this is to keep the analyses separate: we will separately know whether a parameter is used from the current backward analysis, and what value it might have (including none or many) from the future forward analysis. Then the optimization can remove call operands if the parameter is unused according to the backward analysis or if it is a constant according to the forward analysis.


tlively commented Dec 3, 2025

Good point that this could potentially combine well with the existing DAE pass. I'll let you know once this actually compiles and runs so we can try it out.
