[WIP] Rewrite DAE to use a fixed point analysis #8085
Conversation
DAE can be slow because it performs several rounds of interleaved analysis and optimization. On top of this, the analysis it performs is not as precise as it could be because it never removes parameters from referenced functions and it cannot optimize unused parameters or results that are forwarded through recursive cycles. Start improving both the performance and the power of DAE by creating a new pass, called DAE2 for now. DAE2 performs a single parallel walk of the module to collect information with which it performs a fixed point analysis to find unused parameters, then does a single parallel walk of the module to optimize based on this analysis.
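To sketch the idea, here is a minimal, self-contained approximation of the fixed point step (this is illustrative only, not the code in this PR; names like FuncInfo and computeUsedParams are hypothetical). It assumes the collection walk has already recorded, for each parameter, whether the body uses it directly and, if not, which callee parameters it is forwarded to as call operands:

#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-function summary produced by the parallel collection walk.
struct FuncInfo {
  // For each parameter: true if the function body uses it directly.
  std::vector<bool> usedDirectly;
  // For each parameter: the (callee function index, callee parameter index)
  // pairs it is forwarded to as a call operand. Same size as usedDirectly.
  std::vector<std::vector<std::pair<size_t, size_t>>> forwardedTo;
};

// Returns, for each function, which parameters are truly used. A parameter is
// used if the body uses it directly or if it is forwarded to a parameter that
// is (transitively) used. Parameters that are only forwarded around a cycle
// never get marked, so they come out as unused, which is what lets the fixed
// point analysis optimize recursive (and mutually recursive) forwarding.
std::vector<std::vector<bool>>
computeUsedParams(const std::vector<FuncInfo>& funcs) {
  std::vector<std::vector<bool>> used(funcs.size());
  for (size_t f = 0; f < funcs.size(); ++f) {
    used[f] = funcs[f].usedDirectly;
  }
  // Iterate until nothing changes, propagating use information from callee
  // parameters back to the call operands that feed them.
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t f = 0; f < funcs.size(); ++f) {
      for (size_t p = 0; p < used[f].size(); ++p) {
        if (used[f][p]) {
          continue;
        }
        for (auto [callee, calleeParam] : funcs[f].forwardedTo[p]) {
          if (used[callee][calleeParam]) {
            used[f][p] = true;
            changed = true;
            break;
          }
        }
      }
    }
  }
  return used;
}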
@kripken, PTAL. If this seems promising to you, I'm happy to keep working on it.
kripken left a comment
Nice work!
What is unclear to me is how much work the TODOs here would take, including my two comments below. The cases they address are a significant cause of complexity in the current pass, and I'm not sure that complexity can be avoided. So I honestly can't tell if replacing the existing pass would be the right goal - maybe, or maybe not.
However, it is possible that this pass already handles the common case in Dart and elsewhere! That is, running this to remove those chains, then the existing DAE, may be faster, without any loss of power.
I tried to test that but hit a compile error on this PR:
src/passes/DeadArgumentElimination2.cpp:216:21: required from here
  216 |   funcInfos.resize(wasm->functions.size());
      |   ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/14/bits/stl_uninitialized.h:90:56: error: static assertion failed: result type must be constructible from input type
   90 |       static_assert(is_constructible<_ValueType, _Tp>::value,
      |                                                        ^~~~~
/usr/include/c++/14/bits/stl_uninitialized.h:90:56: note: ‘std::integral_constant<bool, false>::value’ evaluates to false
  // check whether the user is a drop, but for simplicity we assume that
  // Vacuum would have already removed such patterns.
  funcInfos[index].paramUsages[curr->index] = used.getTop();
}
What about things between the call and the get, like this?
(call $foo
(i32.eqz
(local.get $param)
)
)

(things like this are a major reason for the multiple iterations in the old pass)
That would still require multiple iterations of the pass, but I would not expect those iterations to be internal to the pass. My hope is that the new pass being more powerful in a single iteration makes this a good trade off.
In what way would it be more powerful in a single iteration?
For example, the fixed point analysis will be able to fulfill this TODO: https://github.com/WebAssembly/binaryen/blob/main/src/passes/DeadArgumentElimination.cpp#L75-L77 and will similarly be able to handle unused parameters in recursive functions.
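As a concrete illustration of the recursive case, reusing the hypothetical FuncInfo/computeUsedParams sketch from the PR description above: a parameter that is only passed along as the matching operand of a function's own recursive call never becomes "used" during the iteration, so the fixed point reports it as removable.

#include <cassert>

int main() {
  // One function with two parameters: parameter 0 is used directly,
  // parameter 1 is only forwarded to itself in a recursive call.
  std::vector<FuncInfo> funcs(1);
  funcs[0].usedDirectly = {true, false};
  funcs[0].forwardedTo = {{}, {{0, 1}}};

  auto used = computeUsedParams(funcs);
  assert(used[0][0]);  // parameter 0 is still needed
  assert(!used[0][1]); // parameter 1 is removable despite the recursion
}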
And also by being able to remove parameters from referenced functions.
- Improving tail-call handling and recursive functions sounds good, but I'd expect the benefits to be rare & minor? The loss of multiple iterations (especially in dae-optimizing) would outweigh that by a lot, I worry.
- SignaturePruning removes parameters from referenced functions already (and I think you'd need to reproduce the type analysis there to reproduce the same?)
This will be more powerful than SignaturePruning, too, because IIUC that pass does not analyze unused parameters. I agree recursive cases (including mutual recursions!) will be more rare than non-recursive cases, but I would be surprised if they never show up.
Going back to the original scenario where a parameter flows indirectly into another call, I think we can actually handle this without further cycles by looking at not just the direct parent but the whole parent chain to see if we find a call. We would need to be careful to handle side effects that still depend on the parameter value, though. Examples would be the value escaping due to a local.tee, or trapping or branching due to an explicit or implicit null check or cast. A simple conservative approximation would be to check that none of the parents up to the call have side effects.
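A rough sketch of that conservative check, using a hypothetical, simplified expression type (the real pass would work on Binaryen expressions and would also need to record which callee parameter the operand feeds):

// Hypothetical simplified expression node, for illustration only.
struct Expr {
  Expr* parent = nullptr;
  bool isCall = false;
  bool hasSideEffects = false; // tees, traps, branches, etc.
};

// Starting at a local.get of a parameter, walk up the parent chain. If we
// reach a call and none of the intermediate expressions have side effects,
// then this use only feeds a call operand, so whether it is really needed
// depends on whether the callee's parameter is used. Otherwise conservatively
// treat the parameter as used.
bool onlyFlowsIntoCall(Expr* localGet) {
  for (Expr* e = localGet->parent; e != nullptr; e = e->parent) {
    if (e->isCall) {
      return true;
    }
    if (e->hasSideEffects) {
      // e.g. a local.tee that lets the value escape, or a cast that may trap.
      return false;
    }
  }
  return false;
}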
This will be more powerful than SignaturePruning, too, because IIUC that pass does not analyze unused parameters
No, it does do that, using ParamUtils::getUsedParams and then marking constant ones as unused, etc. - the same as DAE, but at the type level.
Overall I remain skeptical that you can improve the power here. But I am optimistic this can help with speed! If we call this CallChainOptimization, and use it to simplify common call chains, it could make DAE (and maybe Inlining) faster.
//
// To match and exceed the power of DAE, we will need to extend this backward
// analysis to find unused results as well, and also add a forward analysis that
// propagates constants and types through parameters and results.
Note that these analyses interact: constant params become unused.
The way I plan to handle this is to keep the analyses separate: we will separately know whether a parameter is used from the current backward analysis and what value it might have (including none or many) from the future forward analysis. Then the optimization can remove call operands if the parameter is unused according to the backward analysis or it is a constant according to the forward analysis.
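As a rough sketch of how the two results could combine at optimization time (hypothetical types; the real forward analysis would track wasm Literals and types rather than plain ints):

// Hypothetical value lattice for the forward analysis: nothing seen yet
// (None), exactly one constant seen (Constant), or conflicting values (Many).
struct ParamValue {
  enum Kind { None, Constant, Many } kind = None;
  int constant = 0; // stands in for a wasm Literal in this sketch

  // Join in one more value observed at a call site.
  void note(int value) {
    if (kind == None) {
      kind = Constant;
      constant = value;
    } else if (kind == Constant && constant != value) {
      kind = Many;
    }
  }
};

// The optimization walk can drop a call operand if the backward analysis found
// the parameter unused, or the forward analysis found a single constant (which
// the callee can then materialize locally).
bool canRemoveOperand(bool usedParam, const ParamValue& value) {
  return !usedParam || value.kind == ParamValue::Constant;
}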
Good point that this could potentially combine well with the existing DAE pass. I'll let you know once this actually compiles and runs so we can try it out.