`tutorials/variational-inference/index.qmd` (45 additions, 12 deletions)
@@ -148,6 +148,7 @@ m = linear_regression(train, train_label, n_obs, n_vars);
To run VI, we must first choose a *variational family*.
For instance, the most commonly used family is the mean-field Gaussian family.
For this, Turing provides functions that automatically construct the initialisation corresponding to the model `m`:
```{julia}
q_init = q_meanfield_gaussian(m);
```
@@ -161,10 +162,12 @@ Here is a detailed documentation for the constructor:
As we can see, the precise initialisation can be customised through the keyword arguments.
Let's run VI with the default setting:
```{julia}
n_iters = 1000
q_avg, info, state = vi(m, q_init, n_iters; show_progress=false);
```
The default setting uses the `AdvancedVI.RepGradELBO` objective, which corresponds to a variant of what is known as *automatic differentiation VI*[^KTRGB2017] or *stochastic gradient VI*[^TL2014] or *black-box VI*[^RGB2014] with the reparameterization gradient[^KW2014][^RMW2014][^TL2014].
The default optimiser we use is `AdvancedVI.DoWG`[^KMJ2023] combined with a proximal operator.
(The use of proximal operators with VI on a location-scale family is discussed in detail by J. Domke[^D2020][^DGG2023] and others[^KOWMG2023].)
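
To make these defaults concrete, here is a sketch of how they might be spelled out explicitly. The `objective` and `optimizer` keyword names, and the `RepGradELBO`/`DoWG` constructor arguments, are assumptions to verify against the `vi` docstring for your installed versions:

```{julia}
# Hypothetical sketch of the defaults, written out explicitly.
# Keyword names and constructor arguments are assumptions; consult the docstrings.
using AdvancedVI

q_avg, info, state = vi(
    m, q_init, n_iters;
    objective=AdvancedVI.RepGradELBO(10),  # reparameterization-gradient ELBO estimator
    optimizer=AdvancedVI.DoWG(),           # parameter-free optimiser
    show_progress=false,
);
```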
@@ -178,8 +181,24 @@ First, here is the full documentation for `vi`:
## Values Returned by `vi`
The main output of the algorithm is `q_avg`, the average of the parameters generated by the optimisation algorithm.
For computing `q_avg`, the default setting uses what is known as polynomial averaging[^SZ2013].
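
As a sketch of the idea (following Shamir and Zhang[^SZ2013]; the exact indexing and the exponent $\eta \geq 0$ vary between presentations), the averaged parameters $\bar{\lambda}_t$ are updated from the iterates $\lambda_t$ as

$$
\bar{\lambda}_t = \left(1 - \frac{\eta + 1}{t + \eta}\right)\bar{\lambda}_{t-1} + \frac{\eta + 1}{t + \eta}\,\lambda_t,
$$

which places polynomially more weight on later iterates than uniform averaging does.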
Usually, `q_avg` will perform better than the last-iterate `q_last`, which can be obtained by disabling averaging:
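
A minimal sketch of what this might look like, assuming `vi` accepts an `averager` keyword and that `AdvancedVI` provides a `NoAveraging` strategy (both are assumptions to verify against the documentation):

```{julia}
# Hypothetical: recover the last iterate by turning averaging off.
# The `averager` keyword and `NoAveraging()` are assumptions; check the `vi` docstring.
q_last, info_last, state_last = vi(
    m, q_init, n_iters;
    averager=AdvancedVI.NoAveraging(), show_progress=false,
);
```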
The `NamedTuple` returned by `callback` will be appended to the corresponding entry of `info`, and it will also be displayed on the progress meter if `show_progress` is set to `true`.
The custom callback can be supplied to `vi` as a keyword argument:
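
As an illustration, here is a minimal hypothetical callback. The exact set of keyword arguments passed to it is version-dependent (caught here with `kwargs...`), and the `iteration` field of `stat` is an assumption:

```{julia}
# Hypothetical sketch: a callback returning a `NamedTuple`, which is appended
# to the corresponding entry of `info`. The keyword arguments it receives and
# the `iteration` field of `stat` are assumptions; check the AdvancedVI docs.
function callback(; stat, kwargs...)
    return (iteration = stat.iteration,)
end

q_avg, info, state = vi(m, q_init, n_iters; show_progress=false, callback=callback);
```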
We can see that the ELBO values are less noisy and progress more smoothly due to averaging.
## Using Different Optimisers
@@ -244,7 +272,7 @@ Since `AdvancedVI` does not implement a proximal operator for `Optimisers.Adam`,
```{julia}
using Optimisers
_, info_adam, _ = vi(
    m, q_init, n_iters;
    show_progress=false,
    callback=callback,
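    # Hypothetical continuation: the remainder of this call is elided in the diff.
    # The `optimizer` keyword and the Adam step size are assumptions to verify
    # against the `vi` docstring.
    optimizer=Optimisers.Adam(1e-3),
);
```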
@@ -265,6 +293,7 @@ That is, most common optimisers require some degree of tuning to perform better
Due to this fact, they are referred to as parameter-free optimisers.
## Using Full-Rank Variational Families
So far, we have only used the mean-field Gaussian family.
This, however, approximates the posterior covariance with a diagonal matrix.
To model the full covariance matrix, we can use the *full-rank* Gaussian family[^TL2014][^KTRGB2017]:
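
By analogy with `q_meanfield_gaussian`, here is a sketch of constructing and fitting a full-rank initialisation; the constructor name `q_fullrank_gaussian` is an assumption to verify against Turing's documentation:

```{julia}
# Assumed by analogy with `q_meanfield_gaussian`; verify the name in Turing's docs.
q_init_fr = q_fullrank_gaussian(m);
q_avg_fr, info_fr, state_fr = vi(m, q_init_fr, n_iters; show_progress=false);
```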
@@ -283,7 +312,7 @@ This term, however, traditionally comes from the fact that full-rank families us
In contrast to the mean-field family, the full-rank family will often result in more computation per optimisation step and slower convergence, especially in high dimensions:
However, we can see that the full-rank families achieve a higher ELBO in the end.
Due to the relationship between the ELBO and the Kullback–Leibler divergence, this indicates that the full-rank covariance is much more accurate.
This trade-off between statistical accuracy and optimisation speed is often referred to as the *statistical-computational trade-off*.
The fact that we can control this trade-off through the choice of variational family is a strength, rather than a limitation, of variational inference.