Skip to content

Commit 038cfdc

Browse files
committed
2 parents b97ac22 + 272f071 commit 038cfdc

File tree

5 files changed

+479
-33
lines changed

5 files changed

+479
-33
lines changed

docs/src/lecture_05/lab.md

Lines changed: 181 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ Once you have tested the functionality, you can start exploring the performance
1212
## Checking type stability
1313
Recall that type stable function is written in a way, that allows Julia's compiler to infer all the types of all the variables and produce an efficient native code implementation without the need of boxing some variables in a structure whose types is known only during runtime. Probably unbeknown to you we have already seen an example of type unstable function (at least in some situations) in the first lab, where we have defined the `polynomial` function:
1414

15-
```julia
15+
```@repl lab05_polynomial
1616
function polynomial(a, x)
1717
accumulator = 0
1818
for i in length(a):-1:1
@@ -25,20 +25,20 @@ end
2525
The exact form of compiled code and also the type stability depends on the arguments of the function. Let's explore the following two examples of calling the function:
2626

2727
- Integer number valued arguments
28-
```julia
28+
```@example lab05_polynomial
2929
a = [-19, 7, -4, 6]
3030
x = 3
3131
polynomial(a, x)
3232
```
3333

3434
- Float number valued arguments
35-
```julia
35+
```@example lab05_polynomial
3636
xf = 3.0
3737
polynomial(a, xf)
3838
```
3939

4040
The result they produce is the "same" numerically, however it differs in the output type. Though you have probably not noticed it, there should be a difference in runtime (assuming that you have run it once more after its compilation). It is probably a surprise to no one, that one of the methods that has been compiled is type unstable. This can be check with the `@code_warntype` macro:
41-
```julia
41+
```@repl lab05_polynomial
4242
using InteractiveUtils #hide
4343
@code_warntype polynomial(a, x) # type stable
4444
@code_warntype polynomial(a, xf) # type unstable
@@ -56,6 +56,45 @@ We are getting a little ahead of ourselves in this lab, as understanding of thes
5656
- run the function with the argument once before running `@time` or use `@btime` if you have `BenchmarkTools` readily available in your environment
5757
- To see some measurable difference with this simple function, a longer vector of coefficients may be needed.
5858

59+
!!! details
60+
```@repl lab05_polynomial
61+
function polynomial_stable(a, x)
62+
accumulator = zero(x)
63+
for i in length(a):-1:1
64+
accumulator += x^(i-1) * a[i]
65+
end
66+
accumulator
67+
end
68+
```
69+
70+
```@repl lab05_polynomial
71+
@code_warntype polynomial_stable(a, x) # type stable
72+
@code_warntype polynomial_stable(a, xf) # type stable
73+
```
74+
75+
```@repl lab05_polynomial
76+
polynomial(a, xf) #hide
77+
polynomial_stable(a, xf) #hide
78+
@time polynomial(a, xf)
79+
@time polynomial_stable(a, xf)
80+
```
81+
82+
Only really visible when evaluating multiple times.
83+
```julia
84+
julia> using BenchmarkTools
85+
86+
julia> @btime polynomial($a, $xf)
87+
31.806 ns (0 allocations: 0 bytes)
88+
128.0
89+
90+
julia> @btime polynomial_stable($a, $xf)
91+
28.522 ns (0 allocations: 0 bytes)
92+
128.0
93+
```
94+
Difference only a few nanoseconds.
95+
96+
97+
*Note*: Recalling homework from lab 1. Adding `zero` also extends this function to the case of `x` being a matrix, see `?` menu.
5998

6099
Code stability issues are something unique to Julia, as its JIT compilation allows it to produce code that contains boxed variables, whose type can be inferred during runtime. This is one of the reasons why interpreted languages are slow to run but fast to type. Julia's way of solving it is based around compiling functions for specific arguments, however in order for this to work without the interpreter, the compiler has to be able to infer the types.
61100

@@ -78,7 +117,7 @@ The number of samples/evaluations can be set manually, however most of the time
78117

79118
The most commonly used interface of `Benchmarkools` is the `@btime` macro, which returns an output similar to the regular `@time` macro however now aggregated over samples by taking their minimum (a robust estimator for the location parameter of the time distribution, should not be considered an outlier - usually the noise from other processes/tasks puts the results to the other tail of the distribution and some miraculous noisy speedups are uncommon. In order to see the underlying sampling better there is also the `@benchmark` macro, which runs in the same way as `@btime`, but prints more detailed statistics which are also returned in the `Trial` type instead of the actual code output.
80119

81-
```julia
120+
```
82121
julia> @btime sum($(rand(1000)))
83122
174.274 ns (0 allocations: 0 bytes)
84123
504.16236531044757
@@ -160,6 +199,20 @@ In order to get more of a visual feel for profiling, there are packages that all
160199
!!! warning "Exercise"
161200
Let's compare this with the type unstable situation.
162201

202+
!!! details
203+
First let's define the function that allows us to run the `polynomial` multiple times.
204+
```@repl lab05_polynomial
205+
function run_polynomial(a, x, n)
206+
for _ in 1:n
207+
polynomial(a, x)
208+
end
209+
end
210+
```
211+
212+
```julia
213+
@profview run_polynomial(a, xf, Int(1e5)) # clears the profile for us
214+
```
215+
![poly_unstable](poly_unstable.png)
163216

164217
Other options for viewing profiler outputs
165218
- [ProfileView](https://github.com/timholy/ProfileView.jl) - close cousin of `ProfileSVG`, spawns GTK window with interactive FlameGraph
@@ -176,8 +229,48 @@ We have noticed that no matter if the function is type stable or unstable the ma
176229

177230
**BONUS**: Profile the new method and compare the differences in traces.
178231

179-
[^1]: Explanation of the Horner schema can be found on [https://en.wikipedia.org/wiki/Horner%27s\_method](https://en.wikipedia.org/wiki/Horner%27s_method).
232+
[^1]: Explanation of the Horner schema can be found on [https://en.wikipedia.org/wiki/Horner%27s\_method](https://en.wikipedia.org/wiki/Horner%27s_method).
233+
234+
!!! details
235+
```julia
236+
function polynomial(a, x)
237+
accumulator = a[end] * one(x)
238+
for i in length(a)-1:-1:1
239+
accumulator = accumulator * x + a[i]
240+
end
241+
accumulator
242+
end
243+
```
244+
245+
Speed up:
246+
- 49ns -> 8ns ~ 6x on integer valued input
247+
- 59ns -> 8ns ~ 7x on real valued input
248+
249+
```
250+
julia> @btime polynomial($a, $x)
251+
8.008 ns (0 allocations: 0 bytes)
252+
97818
253+
254+
julia> @btime polynomial_stable($a, $x)
255+
49.173 ns (0 allocations: 0 bytes)
256+
97818
257+
258+
julia> @btime polynomial($a, $xf)
259+
8.008 ns (0 allocations: 0 bytes)
260+
97818.0
261+
262+
julia> @btime polynomial_stable($a, $xf)
263+
58.773 ns (0 allocations: 0 bytes)
264+
97818.0
265+
```
266+
These numbers will be different on different HW.
267+
268+
**BONUS**: The profile trace does not even contain the calling of mathematical operators and is mainly dominated by the iteration utilities. In this case we had to increase the number of runs to `1e6` to get some meaningful trace.
180269

270+
```julia
271+
@profview run_polynomial(a, xf, Int(1e6))
272+
```
273+
![poly_horner](poly_horner.png)
181274

182275
---
183276

@@ -270,7 +363,7 @@ BenchmarkTools.Trial: 10000 samples with 7 evaluations.
270363
Let's now apply what we have learned so far on the much bigger codebase of our
271364
`Ecosystem`.
272365

273-
```julia
366+
```@example block
274367
include("ecosystems/lab04/Ecosystem.jl")
275368
276369
function make_counter()
@@ -295,7 +388,6 @@ world = create_world();
295388
nothing # hide
296389
```
297390

298-
299391
!!! warning "Exercise"
300392
Use `@profview` and `@code_warntype` to find the type unstable and slow parts of
301393
our simulation.
@@ -312,6 +404,42 @@ nothing # hide
312404
![lab04-ecosystem](ecosystems/lab04-worldstep.png)
313405

314406

407+
!!! details
408+
Red bars indicate type instabilities. The bars stacked on top of them are high,
409+
narrow and not filling the whole width, indicating that the problem is pretty
410+
serious. In our case the worst offender is the `filter` method inside
411+
`find_food` and `find_mate` functions.
412+
In both cases the bars on top of it are narrow and not the full with, meaning
413+
that not that much time has been really spend working, but instead inferring the
414+
types in the function itself during runtime.
415+
416+
As a reminder, this is the `find_food` function:
417+
```julia
418+
# original
419+
function find_food(a::Animal, w::World)
420+
as = filter(x -> eats(a,x), w.agents |> values |> collect)
421+
isempty(as) ? nothing : sample(as)
422+
end
423+
```
424+
Just from looking at that piece of code its not obvious what is the problem,
425+
however the red color indicates that the code may be type unstable. Let's see if
426+
that is the case by evaluation the function with some isolated inputs.
427+
428+
```@example block
429+
using InteractiveUtils # hide
430+
w = Wolf(4000)
431+
find_food(w, world)
432+
@code_warntype find_food(w, world)
433+
```
434+
435+
Indeed we see that the return type is not inferred precisely but ends up being
436+
just the `Union{Nothing, Agent}`, this is better than straight out `Any`, which
437+
is the union of all types but still, julia has to do dynamic dispatch here, which is slow.
438+
439+
The underlying issue here is that we are working array of type `Vector{Agent}`,
440+
where `Agent` is abstract, which does not allow the compiler to specialize the
441+
code for the loop body.
442+
315443
## Different `Ecosystem.jl` versions
316444

317445
In order to fix the type instability in the `Vector{Agent}` we somehow have to
@@ -345,6 +473,50 @@ end
345473

346474
Which differences can you observe? Why is one version faster than the other?
347475

476+
!!! details
477+
It turns out that with this simple change we can already gain a little bit of speed:
478+
479+
| | `find_food` | `reproduce!` |
480+
|-------------------------------------------|-------------|--------------|
481+
|`Animal{A}` & `Dict{Int,Agent}` | 43.917 μs | 439.666 μs |
482+
|`Animal{A}` & `Dict{Int,Union{...}}` | 12.208 μs | 340.041 μs |
483+
484+
We are gaining performance here because for small `Union`s of types the julia
485+
compiler can precompile the multiple available code branches. If we have just a
486+
`Dict` of `Agent`s this is not possible.
487+
488+
This however, does not yet fix our type instabilities completely. We are still working with `Union`s of types
489+
which we can see again using `@code_warntype`:
490+
```@setup uniondict
491+
include("ecosystems/animal_S_world_DictUnion/Ecosystem.jl")
492+
493+
function make_counter()
494+
n = 0
495+
counter() = n += 1
496+
end
497+
498+
function create_world()
499+
n_grass = 1_000
500+
n_sheep = 40
501+
n_wolves = 4
502+
503+
nextid = make_counter()
504+
505+
World(vcat(
506+
[Grass(nextid()) for _ in 1:n_grass],
507+
[Sheep(nextid()) for _ in 1:n_sheep],
508+
[Wolf(nextid()) for _ in 1:n_wolves],
509+
))
510+
end
511+
world = create_world();
512+
```
513+
```@example uniondict
514+
using InteractiveUtils # hide
515+
w = Wolf(4000)
516+
find_food(w, world)
517+
@code_warntype find_food(w, world)
518+
```
519+
348520
---
349521

350522
Julia still has to perform runtime dispatch on the small `Union` of `Agent`s that is in our dictionary.
@@ -460,6 +632,7 @@ end
460632
world = create_world();
461633
nothing # hide
462634
```
635+
463636
```@example newblock
464637
using InteractiveUtils # hide
465638
w = Wolf(4000)
@@ -468,5 +641,3 @@ find_food(w, world)
468641
```
469642

470643
## Useful resources
471-
472-
The problem we have been touching in the latter part is quite pervasive in some systems with many agents. One solution we have not used here is to use SumTypes. Julia does not have a native support, but offers solutions thourough libraries like [SumTypes.jl](https://github.com/JuliaDynamics/LightSumTypes.jl), [UniTyper.jl](https://github.com/YingboMa/Unityper.jl) or [Moshi](https://rogerluo.dev/Moshi.jl/).

0 commit comments

Comments
 (0)