Commit f433446

Fix VI interface

1 parent e4d56e8 commit f433446
tutorials/variational-inference/index.qmd

Lines changed: 45 additions & 12 deletions
@@ -148,6 +148,7 @@ m = linear_regression(train, train_label, n_obs, n_vars);
 To run VI, we must first set a *variational family*.
 For instance, the most commonly used family is the mean-field Gaussian family.
 For this, Turing provides functions that automatically construct the initialisation corresponding to the model `m`:
+
 ```{julia}
 q_init = q_meanfield_gaussian(m);
 ```
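
As an aside, `q_init` should behave like an ordinary multivariate distribution. Assuming the standard `Distributions.jl` `rand` interface applies to it, a minimal sketch for sanity-checking the initialisation:

```{julia}
# Sketch: draw 5 samples from the initial mean-field approximation.
# Assumes `q_init` from the cell above and that it supports the
# Distributions.jl `rand` interface.
z_init = rand(q_init, 5)
```
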
@@ -161,10 +162,12 @@ Here is a detailed documentation for the constructor:
 As we can see, the precise initialisation can be customized through the keyword arguments.
 
 Let's run VI with the default setting:
+
 ```{julia}
 n_iters = 1000
-q_avg, q_last, info, state = vi(m, q_init, n_iters; show_progress=false);
+q_avg, info, state = vi(m, q_init, n_iters; show_progress=false);
 ```
+
 The default setting uses the `AdvancedVI.RepGradELBO` objective, which corresponds to a variant of what is known as *automatic differentiation VI*[^KTRGB2017] or *stochastic gradient VI*[^TL2014] or *black-box VI*[^RGB2014] with the reparameterization gradient[^KW2014][^RMW2014][^TL2014].
 The default optimiser we use is `AdvancedVI.DoWG`[^KMJ2023] combined with a proximal operator.
 (The use of proximal operators with VI on a location-scale family is discussed in detail by J. Domke[^D2020][^DGG2023] and others[^KOWMG2023].)
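
As a hedged aside, the objective can also be estimated outside the optimisation loop. Reusing the `estimate_objective`, `AdvancedVI.RepGradELBO`, and `LogDensityFunction` calls that appear later in this tutorial (the sample count `32` and the name `elbo_init` are illustrative choices), one could check the ELBO of the initial approximation before running `vi`:

```{julia}
# Sketch: estimate the ELBO of the untrained approximation `q_init`.
# `estimate_objective` and `LogDensityFunction` are used the same way later in
# this tutorial; 32 Monte Carlo samples is an arbitrary illustrative choice.
elbo_init = estimate_objective(AdvancedVI.RepGradELBO(32), q_init, LogDensityFunction(m))
@info "ELBO at initialisation" elbo_init
```
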
@@ -178,8 +181,24 @@ First, here is the full documentation for `vi`:
 ## Values Returned by `vi`
 The main output of the algorithm is `q_avg`, the average of the parameters generated by the optimisation algorithm.
 For computing `q_avg`, the default setting uses what is known as polynomial averaging[^SZ2013].
-Usually, `q_avg` will perform better than the last-iterate `q_last`.
+Usually, `q_avg` will perform better than the last-iterate `q_last`, which can be obtained by disabling averaging:
+
+```{julia}
+q_last, _, _ = vi(
+    m,
+    q_init,
+    n_iters;
+    show_progress=false,
+    algorithm=KLMinRepGradDescent(
+        AutoForwardDiff();
+        operator=AdvancedVI.ClipScale(),
+        averager=AdvancedVI.NoAveraging()
+    ),
+);
+```
+
 For instance, we can compare the ELBO of the two:
+
 ```{julia}
 @info("Objective of q_avg and q_last",
     ELBO_q_avg = estimate_objective(AdvancedVI.RepGradELBO(32), q_avg, LogDensityFunction(m)),
@@ -194,6 +213,7 @@ For the default setting, which is `RepGradELBO`, it contains the ELBO estimated
 ```{julia}
 Plots.plot([i.elbo for i in info], xlabel="Iterations", ylabel="ELBO", label="info")
 ```
+
 Since the ELBO is estimated by a small number of samples, it appears noisy.
 Furthermore, at each step, the ELBO is evaluated on `q_last`, not `q_avg`, which is the actual output that we care about.
 To obtain more accurate ELBO estimates evaluated on `q_avg`, we have to define a custom callback function.
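
As a brief, hedged illustration of the point above, one could compare ELBO estimates computed with different numbers of Monte Carlo samples; the larger count should give a noticeably less noisy value. The calls below reuse `estimate_objective` with the sample counts 32 and 128 that appear in this tutorial; only the side-by-side comparison is new:

```{julia}
# Sketch: the Monte Carlo sample count controls the noise of the ELBO estimate.
# Both calls mirror usages elsewhere in this tutorial.
elbo_32  = estimate_objective(AdvancedVI.RepGradELBO(32),  q_avg, LogDensityFunction(m))
elbo_128 = estimate_objective(AdvancedVI.RepGradELBO(128), q_avg, LogDensityFunction(m))
@info "ELBO estimates with different sample counts" elbo_32 elbo_128
```
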
@@ -203,30 +223,38 @@ To inspect the progress of optimisation in more detail, one can define a custom
 For example, the following callback function estimates the ELBO on `q_avg` every 10 steps with a larger number of samples:
 
 ```{julia}
-function callback(; stat, averaged_params, restructure, kwargs...)
-    if mod(stat.iteration, 10) == 1
+using DynamicPPL: DynamicPPL
+
+function callback(; iteration, averaged_params, restructure, kwargs...)
+    if mod(iteration, 10) == 1
         q_avg = restructure(averaged_params)
-        obj = AdvancedVI.RepGradELBO(128)
-        elbo_avg = estimate_objective(obj, q_avg, LogDensityFunction(m))
+        obj = AdvancedVI.RepGradELBO(128) # 128 samples for ELBO estimation
+        vi = DynamicPPL.link!!(DynamicPPL.VarInfo(m), m);
+        elbo_avg = -estimate_objective(obj, q_avg, LogDensityFunction(m, DynamicPPL.getlogjoint_internal, vi))
         (elbo_avg = elbo_avg,)
     else
         nothing
     end
 end;
 ```
+
 The `NamedTuple` returned by `callback` will be appended to the corresponding entry of `info`, and it will also be displayed on the progress meter if `show_progress` is set as `true`.
 
 The custom callback can be supplied to `vi` as a keyword argument:
+
 ```{julia}
-q_mf, _, info_mf, _ = vi(m, q_init, n_iters; show_progress=false, callback=callback);
+q_mf, info_mf, _ = vi(m, q_init, n_iters; show_progress=false, callback=callback);
 ```
 
 Let's plot the result:
+
 ```{julia}
 iters = 1:10:length(info_mf)
 elbo_mf = [i.elbo_avg for i in info_mf[iters]]
-Plots.plot!(iters, elbo_mf, xlabel="Iterations", ylabel="ELBO", label="callback", ylims=(-200,Inf))
+Plots.plot([i.elbo for i in info], xlabel="Iterations", ylabel="ELBO", label="info", linewidth=0.4)
+Plots.plot!(iters, elbo_mf, xlabel="Iterations", ylabel="ELBO", label="callback", ylims=(-200,Inf), linewidth=2)
 ```
+
 We can see that the ELBO values are less noisy and progress more smoothly due to averaging.
 
 ## Using Different Optimisers
@@ -244,7 +272,7 @@ Since `AdvancedVI` does not implement a proximal operator for `Optimisers.Adam`,
 ```{julia}
 using Optimisers
 
-_, _, info_adam, _ = vi(
+_, info_adam, _ = vi(
     m, q_init, n_iters;
     show_progress=false,
     callback=callback,
@@ -265,6 +293,7 @@ That is, most common optimisers require some degree of tuning to perform better
 Due to this fact, they are referred to as parameter-free optimizers.
 
 ## Using Full-Rank Variational Families
+
 So far, we have only used the mean-field Gaussian family.
 This, however, approximates the posterior covariance with a diagonal matrix.
 To model the full covariance matrix, we can use the *full-rank* Gaussian family[^TL2014][^KTRGB2017]:
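
The full-rank initialisation referenced below as `q_init_fr` is presumably constructed much like the mean-field one. A minimal sketch, assuming Turing provides a `q_fullrank_gaussian` counterpart to the `q_meanfield_gaussian` constructor shown earlier:

```{julia}
# Sketch: construct a full-rank Gaussian initialisation for the model `m`.
# `q_fullrank_gaussian` is assumed to mirror `q_meanfield_gaussian` above.
q_init_fr = q_fullrank_gaussian(m);
```
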
@@ -283,7 +312,7 @@ This term, however, traditionally comes from the fact that full-rank families us
 In contrast to the mean-field family, the full-rank family will often result in more computation per optimisation step and slower convergence, especially in high dimensions:
 
 ```{julia}
-q_fr, _, info_fr, _ = vi(m, q_init_fr, n_iters; show_progress=false, callback)
+q_fr, info_fr, _ = vi(m, q_init_fr, n_iters; show_progress=false, callback)
 
 Plots.plot(elbo_mf, xlabel="Iterations", ylabel="ELBO", label="Mean-Field", ylims=(-200, Inf))
@@ -292,7 +321,7 @@ Plots.plot!(elbo_fr, xlabel="Iterations", ylabel="ELBO", label="Full-Rank", ylim
 ```
 
 However, we can see that the full-rank families achieve a higher ELBO in the end.
-Due to the relationship between the ELBO and the Kullback-Leibler divergence, this indicates that the full-rank covariance is much more accurate.
+Due to the relationship between the ELBO and the Kullback–Leibler divergence, this indicates that the full-rank covariance is much more accurate.
 This trade-off between statistical accuracy and optimisation speed is often referred to as the *statistical-computational trade-off*.
 The fact that we can control this trade-off through the choice of variational family is a strength, rather than a limitation, of variational inference.
 
@@ -342,26 +371,29 @@ avg[union(sym2range[:coefficients]...)]
 ```
 
 For further convenience, we can wrap the samples into a `Chains` object to summarise the results.
+
 ```{julia}
 varinf = Turing.DynamicPPL.VarInfo(m)
 vns_and_values = Turing.DynamicPPL.varname_and_value_leaves(Turing.DynamicPPL.values_as(varinf, OrderedDict))
 varnames = map(first, vns_and_values)
 vi_chain = Chains(reshape(z', (size(z,2), size(z,1), 1)), varnames)
 ```
+
 (Since we're drawing independent samples, we can simply ignore the ESS and Rhat metrics.)
 Unfortunately, extracting `varnames` is a bit verbose at the moment, but hopefully it will become simpler in the near future.
 
 Let's compare this against samples from `NUTS`:
 
 ```{julia}
-mcmc_chain = sample(m, NUTS(), 10_000, drop_warmup=true, progress=false);
+mcmc_chain = sample(m, NUTS(), 10_000; progress=false);
 
 vi_mean = mean(vi_chain)[:, 2]
 mcmc_mean = mean(mcmc_chain, names(mcmc_chain, :parameters))[:, 2]
 
 plot(mcmc_mean; xticks=1:1:length(mcmc_mean), label="mean of NUTS")
 plot!(vi_mean; label="mean of VI")
 ```
+
 That looks pretty good! But let's see how the predictive distributions look for the two.
 
 ## Making Predictions
@@ -516,6 +548,7 @@ title!("MCMC (NUTS)")
 
 plot(p1, p2, p3; layout=(1, 3), size=(900, 250), label="")
 ```
+
 We can see that the full-rank VI approximation is very close to the predictions from MCMC samples.
 Also, the coverage of full-rank VI and MCMC is much better than the crude mean-field approximation.
 