Improve FastLDF type stability when all parameters are linked or unlinked #1141
Conversation
Benchmark Report
Computer Information
Benchmark Results

Codecov Report
❌ Patch coverage is …
Additional details and impacted files

```
@@            Coverage Diff             @@
##           breaking    #1141      +/-   ##
============================================
+ Coverage     80.60%    80.66%    +0.05%
============================================
  Files            41        41
  Lines          3861      3878       +17
============================================
+ Hits           3112      3128       +16
- Misses          749       750        +1
```
DynamicPPL.jl documentation for PR #1141 is available at:
Would something bad happen if we just wrapped all the arrays returned by Bijectors in trivial SubArrays? I did some very crude benchmarks locally and at least …
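For concreteness, "wrapping in a trivial SubArray" means something like the following (the patch in the comment below does this via a small `View` callable):

```julia
x = [0.5, 0.5]
# A "trivial" view: same values and shape, but the wrapper type is now a
# SubArray instead of a Vector.
xv = view(x, 1:length(x))
typeof(xv)  # SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}
```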
I tried with the following patch applied to #1139 (the changed method only applies to arrayvariates, because univariates are specially handled below):

```diff
diff --git a/src/utils.jl b/src/utils.jl
index 2d7b0404..4e341ee7 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -462,6 +462,13 @@ size of the realization after pushed through the transformation.
 """
 from_vec_transform(f, sz) = from_vec_transform_for_size(Bijectors.output_size(f, sz))
 
+struct View end
+(::View)(x::AbstractVector) = view(x, 1:length(x))
+function Bijectors.with_logabsdet_jacobian(::View, x::AbstractVector)
+    return view(x, :), zero(LogProbType)
+end
+Bijectors.inverse(::View) = View()
+
 """
     from_linked_vec_transform(dist::Distribution)
 
@@ -475,7 +482,7 @@ See also: [`DynamicPPL.invlink_transform`](@ref), [`DynamicPPL.from_vec_transform`](@ref)
 function from_linked_vec_transform(dist::Distribution)
     f_invlink = invlink_transform(dist)
     f_vec = from_vec_transform(inverse(f_invlink), size(dist))
-    return f_invlink ∘ f_vec
+    return View() ∘ f_invlink ∘ f_vec
 end
 
 # UnivariateDistributions need to be handled as a special case, because size(dist) is (),
```

This makes the output type the same for both linked and unlinked:

```julia
julia> dist = product_distribution([Beta(2, 2), Beta(2, 2)])
Product{Continuous, Beta{Float64}, Vector{Beta{Float64}}}(v=Beta{Float64}[Beta{Float64}(α=2.0, β=2.0), Beta{Float64}(α=2.0, β=2.0)])

julia> x = @view [0.5, 0.5][1:2]
2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 0.5
 0.5

julia> DynamicPPL.from_linked_vec_transform(dist)(x)
2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 0.6224593312018546
 0.6224593312018546

julia> DynamicPPL.from_vec_transform(dist)(x)
2-element view(::Vector{Float64}, 1:2) with eltype Float64:
 0.5
 0.5

julia> typeof(DynamicPPL.from_linked_vec_transform(dist)(x)) == typeof(DynamicPPL.from_vec_transform(dist)(x))
true
```

But performance is still poor (it's basically exactly the same as in #1139, including the atrocious Mooncake slowdown), and Enzyme instead errors with a different message.
Our benchmark models clearly aren't hitting these cases.
mhauru left a comment:
Very nice. I stuck in a proposal for some extra checks and documentation, but this is optional since it isn't exported.
```diff
-ADTYPES = Dict(
+ADTYPES = (
     "EnzymeForward" =>
```
Entirely ambivalent about which constructor to use, but curious if you had a reason for changing.
Yeah, CI crashes with a Julia GC error when using a Dict.
(don't ask 🙃)
...
The approach used in FastLDF potentially suffers from type stability issues, due to a slightly subtle issue with using views. For example, this is responsible for the failing type stability tests on #1115, which implements the naive solution of adding `@view` throughout the DefaultContext code. It's also (partly) responsible for the Enzyme failures on previous PRs.

The crux of the issue is that if you cannot tell whether a parameter is linked or unlinked, then you have to do something like this:
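A rough illustration of the kind of runtime branch this forces (the function name here is illustrative, not the actual DynamicPPL internals):

```julia
using DynamicPPL

# Illustrative only: if linked-ness is only known at runtime, the transform has
# to be chosen inside a branch, so the return type is a Union of the output
# types of the two transforms.
function reconstruct_param(dist, x_vec::AbstractVector, is_linked::Bool)
    f = if is_linked
        DynamicPPL.from_linked_vec_transform(dist)
    else
        DynamicPPL.from_vec_transform(dist)
    end
    return f(x_vec)
end
```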
Now, consider `dist = product_distribution([Beta(2, 2), Beta(2, 2)])` and the effects of this transformation on a view:
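Roughly speaking (a sketch based on the REPL session in the comment above; exact output types depend on the DynamicPPL and Bijectors versions):

```julia
using Distributions, DynamicPPL

dist = product_distribution([Beta(2, 2), Beta(2, 2)])
x = @view [0.5, 0.5][1:2]

# The unlinked transform is essentially a reshape, so it can keep the SubArray
# wrapper; the linked transform pushes the values through the inverse bijector,
# which materialises a fresh Vector{Float64}.
typeof(DynamicPPL.from_vec_transform(dist)(x))
typeof(DynamicPPL.from_linked_vec_transform(dist)(x))
# These are not the same type, hence the Union in the branch above.
```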
So, generally when executing this code, if you can't tell ahead of time whether the parameter is linked, you will get a union type. Running this in Julia itself doesn't hurt performance much, because Julia can handle it via union splitting. However, a test like `@inferred` in #1115, or Enzyme's analysis, requires stricter type stability.

This PR therefore implements special cases for what are by far the two most common use cases: either all of the parameters are linked, or all of them are unlinked. This is determined at LogDensityFunction construction time and passed all the way down into `init` via a type parameter.
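Conceptually it looks something like the following (a hypothetical sketch, not the actual implementation, which threads this information through the evaluation context):

```julia
using DynamicPPL

# Hypothetical sketch: encode the link status as a type, so the compiler knows
# which transform will be used.
struct AllLinked end
struct AllUnlinked end
struct Mixed end

choose_transform(::AllLinked, dist, _) = DynamicPPL.from_linked_vec_transform(dist)
choose_transform(::AllUnlinked, dist, _) = DynamicPPL.from_vec_transform(dist)
# Only the mixed case branches at runtime, and only it pays the
# type-instability cost.
function choose_transform(::Mixed, dist, is_linked::Bool)
    return if is_linked
        DynamicPPL.from_linked_vec_transform(dist)
    else
        DynamicPPL.from_vec_transform(dist)
    end
end
```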
initvia a type parameter.I am still quite unsure whether there is a real scenario where mixed linked and unlinked variables. I think this was something to do with Gibbs, but if some samplers need linking (e.g. HMC), then surely we can just force all variables to be linked. This would only not be possible if some samplers need to be not linked, but I'm genuinely not sure if there is any sampler that has that property.
However, Gibbs doesn't use LDF, so I am not sure that this is an important consideration for this PR. Even so, there should be no regression in performance for the mixed linked/unlinked case: this PR should just be a strict improvement for the all-linked or all-unlinked case.
Why can't we just store the transform in the LDF?
The transform has to be constructed on-the-fly from `dist`, and can't be stored ahead of time, because `dist` itself is only available during model evaluation.

Benchmarks (unlinked)
For most of the models that were benchmarked previously, the only real difference is that this PR makes Enzyme quite a bit faster. Still, it's good to verify that for those models, this PR does not cause any regressions.
Here 'before this PR' = #1139, 'after this PR' = this branch, 'v0.38.9' is current main.
The 'problem' with these benchmarks is that those models didn't catch this type stability issue. For a model where the type instability actually kicks in (`demo3` here is `DynamicPPL.TestUtils.DEMO_MODELS[3]`, see definition here), this makes a huge difference.

Benchmarks (linked)
Here are the same benchmarks but run with linked parameters instead. This is arguably the more important case because HMC/NUTS use this.
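For reference, benchmarks along these lines can be reproduced roughly as follows (a sketch only: the exact `LogDensityFunction` constructor varies between DynamicPPL versions, and each AD backend needs to be set up separately):

```julia
using Chairmarks, DynamicPPL, LogDensityProblems

model = DynamicPPL.TestUtils.DEMO_MODELS[3]

vi = VarInfo(model)                     # unlinked parameters
vi_linked = DynamicPPL.link(vi, model)  # all parameters linked

for varinfo in (vi, vi_linked)
    # NOTE: on some DynamicPPL versions the constructor takes extra arguments,
    # e.g. `LogDensityFunction(model, getlogjoint, varinfo)`.
    ldf = DynamicPPL.LogDensityFunction(model, varinfo)
    x = varinfo[:]
    display(@be LogDensityProblems.logdensity(ldf, x))
end
```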
Benchmark code