Improve FastLDF type stability when all parameters are linked or unlinked #1141
Conversation
Benchmark Report
Computer Information
Benchmark Results

Codecov Report
❌ Patch coverage is …
Additional details and impacted files

```
@@            Coverage Diff             @@
##           breaking    #1141      +/-   ##
============================================
+ Coverage     80.60%    80.66%    +0.05%
============================================
  Files            41        41
  Lines          3861      3878       +17
============================================
+ Hits           3112      3128       +16
- Misses          749       750        +1
```
DynamicPPL.jl documentation for PR #1141 is available at:
Would something bad happen if we just wrapped all the arrays returned by Bijectors in trivial SubArrays? I did some very crude benchmarks locally and at least …
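For concreteness, "wrapping in a trivial SubArray" means something like the following (the patch in the comment below does this via a small `View` callable):

```julia
x = [0.5, 0.5]
# A "trivial" view: same values and shape, but the wrapper type is now a
# SubArray instead of a Vector.
xv = view(x, 1:length(x))
typeof(xv)  # SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}
```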
I tried with the following patch applied to #1139 (the changed method only applies to arrayvariates, because univariates are specially handled below):

```diff
diff --git a/src/utils.jl b/src/utils.jl
index 2d7b0404..4e341ee7 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -462,6 +462,13 @@ size of the realization after pushed through the transformation.
 """
 from_vec_transform(f, sz) = from_vec_transform_for_size(Bijectors.output_size(f, sz))
 
+struct View end
+(::View)(x::AbstractVector) = view(x, 1:length(x))
+function Bijectors.with_logabsdet_jacobian(::View, x::AbstractVector)
+    return view(x, :), zero(LogProbType)
+end
+Bijectors.inverse(::View) = View()
+
 """
     from_linked_vec_transform(dist::Distribution)
 
@@ -475,7 +482,7 @@ See also: [`DynamicPPL.invlink_transform`](@ref), [`DynamicPPL.from_vec_transform`](@ref)
 function from_linked_vec_transform(dist::Distribution)
     f_invlink = invlink_transform(dist)
     f_vec = from_vec_transform(inverse(f_invlink), size(dist))
-    return f_invlink ∘ f_vec
+    return View() ∘ f_invlink ∘ f_vec
 end
 
 # UnivariateDistributions need to be handled as a special case, because size(dist) is (),
```

This makes the output type the same for both linked and unlinked:

```julia
julia> dist = product_distribution([Beta(2, 2), Beta(2, 2)])
Product{Continuous, Beta{Float64}, Vector{Beta{Float64}}}(v=Beta{Float64}[Beta{Float64}(α=2.0, β=2.0), Beta{Float64}(α=2.0, β=2.0)])

julia> x = @view [0.5, 0.5][1:2]
2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 0.5
 0.5

julia> DynamicPPL.from_linked_vec_transform(dist)(x)
2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 0.6224593312018546
 0.6224593312018546

julia> DynamicPPL.from_vec_transform(dist)(x)
2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 0.5
 0.5

julia> typeof(DynamicPPL.from_linked_vec_transform(dist)(x)) == typeof(DynamicPPL.from_vec_transform(dist)(x))
true
```

But performance is still poor (it's basically exactly the same as in #1139, including the atrocious Mooncake slowdown), and Enzyme instead errors with a different message.
Our benchmark models clearly aren't hitting these cases.
mhauru left a comment:
Very nice. I stuck in a proposal for some extra checks and documentation, but this is optional since it isn't exported.
```diff
-ADTYPES = Dict(
+ADTYPES = (
     "EnzymeForward" =>
```
Entirely ambivalent about which constructor to use, but curious if you had a reason for changing.
Yeah, CI crashes with a Julia GC error when using a Dict.
(don't ask 🙃)
...
The approach used in FastLDF potentially suffers from type stability issues, due to a slightly subtle issue with using views. For example, this is responsible for the failing type stability tests on #1115, which implements the naive solution of adding `@view` throughout the DefaultContext code. It's also (partly) responsible for the Enzyme failures on previous PRs.

The crux of the issue is that if you cannot tell whether a parameter is linked or unlinked, then you have to do something like this:
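A rough illustration of the kind of runtime branch this forces (the function name here is illustrative, not the actual DynamicPPL internals):

```julia
using DynamicPPL

# Illustrative only: if linked-ness is only known at runtime, the transform has
# to be chosen inside a branch, so the return type is a Union of the output
# types of the two transforms.
function reconstruct_param(dist, x_vec::AbstractVector, is_linked::Bool)
    f = if is_linked
        DynamicPPL.from_linked_vec_transform(dist)
    else
        DynamicPPL.from_vec_transform(dist)
    end
    return f(x_vec)
end
```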
Now, consider `dist = product_distribution([Beta(2, 2), Beta(2, 2)])` and the effects of this transformation on a view:
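Roughly speaking (a sketch based on the REPL session in the comment above; exact output types depend on the DynamicPPL and Bijectors versions):

```julia
using Distributions, DynamicPPL

dist = product_distribution([Beta(2, 2), Beta(2, 2)])
x = @view [0.5, 0.5][1:2]

# The unlinked transform is essentially a reshape, so it can keep the SubArray
# wrapper; the linked transform pushes the values through the inverse bijector,
# which materialises a fresh Vector{Float64}.
typeof(DynamicPPL.from_vec_transform(dist)(x))
typeof(DynamicPPL.from_linked_vec_transform(dist)(x))
# These are not the same type, hence the Union in the branch above.
```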
So, generally when executing this code, if you can't tell ahead of time whether the parameter is linked, you will get a union type. Running this in Julia itself doesn't hurt performance much, because Julia can handle it via union splitting. However, a test like `@inferred` in #1115, or Enzyme's analysis, requires stricter type stability.

This PR therefore implements special cases for what are by far the two most common use cases: either all of the parameters are linked, or all of them are unlinked. This is determined at LogDensityFunction construction time and passed all the way down into `init` via a type parameter.
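Conceptually it looks something like the following (a hypothetical sketch, not the actual implementation, which threads this information through the evaluation context):

```julia
using DynamicPPL

# Hypothetical sketch: encode the link status as a type, so the compiler knows
# which transform will be used.
struct AllLinked end
struct AllUnlinked end
struct Mixed end

choose_transform(::AllLinked, dist, _) = DynamicPPL.from_linked_vec_transform(dist)
choose_transform(::AllUnlinked, dist, _) = DynamicPPL.from_vec_transform(dist)
# Only the mixed case branches at runtime, and only it pays the
# type-instability cost.
function choose_transform(::Mixed, dist, is_linked::Bool)
    return if is_linked
        DynamicPPL.from_linked_vec_transform(dist)
    else
        DynamicPPL.from_vec_transform(dist)
    end
end
```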
initvia a type parameter.I am still quite unsure whether there is a real scenario where mixed linked and unlinked variables. I think this was something to do with Gibbs, but if some samplers need linking (e.g. HMC), then surely we can just force all variables to be linked. This would only not be possible if some samplers need to be not linked, but I'm genuinely not sure if there is any sampler that has that property.
However, Gibbs doesn't use LDF, so I am not sure that this is an important consideration for this PR. Even so, there should be no regression in performance for the mixed linked/unlinked case: this PR should just be a strict improvement for the all-linked or all-unlinked case.
Why can't we just store the transform in the LDF?
The transform has to be constructed on-the-fly from `dist`, and can't be stored ahead of time, because `dist` itself is only available during model evaluation.

Benchmarks (unlinked)
For most of the models that were benchmarked previously, the only real difference is that this PR makes Enzyme quite a bit faster. Still, it's good to verify that for those models, this PR does not cause any regressions.
Here 'before this PR' = #1139, 'after this PR' = this branch, 'v0.38.9' is current main.
The 'problem' with these benchmarks is that those models didn't catch this type stability issue. For a model where the type instability actually kicks in (`demo3` here is `DynamicPPL.TestUtils.DEMO_MODELS[3]`, see definition here), this makes a huge difference.

Benchmarks (linked)
Here are the same benchmarks but run with linked parameters instead. This is arguably the more important case because HMC/NUTS use this.
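For reference, benchmarks along these lines can be reproduced roughly as follows (a sketch only: the exact `LogDensityFunction` constructor varies between DynamicPPL versions, and each AD backend needs to be set up separately):

```julia
using Chairmarks, DynamicPPL, LogDensityProblems

model = DynamicPPL.TestUtils.DEMO_MODELS[3]

vi = VarInfo(model)                     # unlinked parameters
vi_linked = DynamicPPL.link(vi, model)  # all parameters linked

for varinfo in (vi, vi_linked)
    # NOTE: on some DynamicPPL versions the constructor takes extra arguments,
    # e.g. `LogDensityFunction(model, getlogjoint, varinfo)`.
    ldf = DynamicPPL.LogDensityFunction(model, varinfo)
    x = varinfo[:]
    display(@be LogDensityProblems.logdensity(ldf, x))
end
```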
Benchmark code