Commit 49751f2

Remove prefer_threads from docstrings

1 parent: 2d86364

8 files changed: +8, -38 lines

src/accumulate/accumulate.jl

Lines changed: 1 addition & 5 deletions
@@ -38,7 +38,6 @@ include("accumulate_nd.jl")
     # CPU settings
     max_tasks::Int=Threads.nthreads(),
     min_elems::Int=2,
-    prefer_threads::Bool=true,
 
     # Algorithm choice
     alg::AccumulateAlgorithm=DecoupledLookback(),
@@ -59,7 +58,6 @@ include("accumulate_nd.jl")
     # CPU settings
     max_tasks::Int=Threads.nthreads(),
     min_elems::Int=2,
-    prefer_threads::Bool=true,
 
     # Algorithm choice
     alg::AccumulateAlgorithm=DecoupledLookback(),
@@ -82,9 +80,7 @@ we do not need the constraint of `dst` and `src` being different; to minimise me
 recommend using the single-array interface (the first one above).
 
 ## CPU
-Use at most `max_tasks` threads with at least `min_elems` elements per task. `prefer_threads` tells
-AK to prioritize using the CPU algorithm implementation (default behaviour) over the KA algorithm
-through POCL.
+Use at most `max_tasks` threads with at least `min_elems` elements per task.
 
 Note that accumulation is typically a memory-bound operation, so multithreaded accumulation only
 becomes faster if it is a more compute-heavy operation to hide memory latency - that includes:
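
For reference, the CPU keywords kept in this docstring are passed as in the minimal sketch below - an illustrative call assuming the `AK.accumulate` entry point whose signature is edited above; the keyword values are placeholders, not recommendations.

    import AcceleratedKernels as AK

    x = rand(Float32, 1_000_000)

    # Inclusive prefix sum on the CPU; `init` is the neutral element for `+`.
    # `max_tasks` and `min_elems` are the CPU tuning knobs kept in the docstring;
    # the values here are illustrative only.
    y = AK.accumulate(+, x; init=0.0f0, max_tasks=4, min_elems=10_000)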

src/foreachindex.jl

Lines changed: 2 additions & 6 deletions
@@ -47,7 +47,6 @@ end
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size=256,
@@ -61,8 +60,7 @@ MtlArray, oneArray - with one GPU thread per index.
 On CPUs at most `max_tasks` threads are launched, or fewer such that each thread processes at least
 `min_elems` indices; if a single task ends up being needed, `f` is inlined and no thread is
 launched. Tune it to your function - the more expensive it is, the fewer elements are needed to
-amortise the cost of launching a thread (which is a few μs). `prefer_threads` tells AK to prioritize
-using the CPU algorithm implementation (default behaviour) over the KA algorithm through POCL.
+amortise the cost of launching a thread (which is a few μs).
 
 # Examples
 Normally you would write a for loop like this:
@@ -147,7 +145,6 @@ end
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size=256,
@@ -161,8 +158,7 @@ MtlArray, oneArray - with one GPU thread per index.
 On CPUs at most `max_tasks` threads are launched, or fewer such that each thread processes at least
 `min_elems` indices; if a single task ends up being needed, `f` is inlined and no thread is
 launched. Tune it to your function - the more expensive it is, the fewer elements are needed to
-amortise the cost of launching a thread (which is a few μs). `prefer_threads` tells AK to prioritize
-using the CPU algorithm implementation (default behaviour) over the KA algorithm through POCL.
+amortise the cost of launching a thread (which is a few μs).
 
 # Examples
 Normally you would write a for loop like this:
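
The do-block form this docstring refers to looks like the minimal sketch below; the array and loop body are made up for illustration, while `max_tasks` and `min_elems` are the CPU keywords retained above.

    import AcceleratedKernels as AK

    v = zeros(Int, 100_000)

    # One parallel iteration per index: on the CPU at most `max_tasks` threads are
    # launched, each handling at least `min_elems` indices; values are illustrative.
    AK.foreachindex(v; max_tasks=Threads.nthreads(), min_elems=1_000) do i
        v[i] = i * i
    end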

src/map.jl

Lines changed: 0 additions & 2 deletions
@@ -5,7 +5,6 @@
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size=256,
@@ -54,7 +53,6 @@ end
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size=256,

src/predicates.jl

Lines changed: 2 additions & 8 deletions
@@ -39,7 +39,6 @@ end
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -54,9 +53,7 @@ reduction.
 ## CPU
 Multithreaded parallelisation is only worth it for large arrays, relatively expensive predicates,
 and/or rare occurrence of true; use `max_tasks` and `min_elems` to only use parallelism when worth
-it in your application. When only one thread is needed, there is no overhead. `prefer_threads`
-tells AK to prioritize using the CPU algorithm implementation (default behaviour) over the KA
-algorithm through POCL.
+it in your application. When only one thread is needed, there is no overhead.
 
 ## GPU
 There are two possible `alg` choices:
@@ -176,7 +173,6 @@ end
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -191,9 +187,7 @@ reduction.
 ## CPU
 Multithreaded parallelisation is only worth it for large arrays, relatively expensive predicates,
 and/or rare occurrence of true; use `max_tasks` and `min_elems` to only use parallelism when worth
-it in your application. When only one thread is needed, there is no overhead. `prefer_threads`
-tells AK to prioritize using the CPU algorithm implementation (default behaviour) over the KA
-algorithm through POCL.
+it in your application. When only one thread is needed, there is no overhead.
 
 ## GPU
 There are two possible `alg` choices:
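
As a usage sketch of the retained CPU keywords - illustrative values, assuming the `AK.any`/`AK.all` predicates whose signatures are edited above:

    import AcceleratedKernels as AK

    v = rand(Float32, 1_000_000)

    # Predicate checks over a large array; parallelism only pays off for large inputs
    # or expensive predicates, hence the high `min_elems` (values are illustrative).
    found  = AK.any(x -> x > 0.999f0, v; max_tasks=Threads.nthreads(), min_elems=100_000)
    all_ok = AK.all(x -> x >= 0.0f0, v; max_tasks=Threads.nthreads(), min_elems=100_000)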

src/reduce/reduce.jl

Lines changed: 2 additions & 6 deletions
@@ -15,7 +15,6 @@ include("mapreduce_nd.jl")
     # CPU settings
     max_tasks::Int=Threads.nthreads(),
     min_elems::Int=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -32,8 +31,7 @@ The returned type is the same as `init` - to control output precision, specify `
 ## CPU settings
 Use at most `max_tasks` threads with at least `min_elems` elements per task. For N-dimensional
 arrays (`dims::Int`) multithreading currently only becomes faster for `max_tasks >= 4`; all other
-cases are scaling linearly with the number of threads. `prefer_threads` tells AK to prioritize
-using the CPU algorithm implementation (default behaviour) over the KA algorithm through POCL.
+cases are scaling linearly with the number of threads.
 
 Note that multithreading reductions only improves performance for cases with more compute-heavy
 operations, which hide the memory latency and thread launch overhead - that includes:
@@ -100,7 +98,6 @@ end
     # CPU settings
     max_tasks::Int=Threads.nthreads(),
     min_elems::Int=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -120,8 +117,7 @@ The returned type is the same as `init` - to control output precision, specify `
 ## CPU settings
 Use at most `max_tasks` threads with at least `min_elems` elements per task. For N-dimensional
 arrays (`dims::Int`) multithreading currently only becomes faster for `max_tasks >= 4`; all other
-cases are scaling linearly with the number of threads. `prefer_threads` tells AK to prioritize
-using the CPU algorithm implementation (default behaviour) over the KA algorithm through POCL.
+cases are scaling linearly with the number of threads.
 
 ## GPU settings
 The `block_size` parameter controls the number of threads per block.
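
A minimal sketch of the reduction calls these docstrings describe - assuming the `AK.reduce` entry point edited above, with illustrative keyword values:

    import AcceleratedKernels as AK

    x = rand(Float32, 10_000, 1_000)

    # Whole-array sum; the returned type follows `init`, as the docstring notes.
    s = AK.reduce(+, x; init=0.0f0, max_tasks=Threads.nthreads(), min_elems=10_000)

    # Reduction along one dimension; per the docstring, multithreading the
    # N-dimensional case only pays off for `max_tasks >= 4`. Values are illustrative.
    col_sums = AK.reduce(+, x; init=0.0f0, dims=1, max_tasks=8)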

src/searchsorted.jl

Lines changed: 0 additions & 4 deletions
@@ -80,7 +80,6 @@ end
     # CPU settings
     max_tasks::Int=Threads.nthreads(),
     min_elems::Int=1000,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -129,7 +128,6 @@ end
     # CPU settings
     max_tasks::Int=Threads.nthreads(),
     min_elems::Int=1000,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -165,7 +163,6 @@ end
     # CPU settings
     max_tasks::Int=Threads.nthreads(),
     min_elems::Int=1000,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -214,7 +211,6 @@ end
     # CPU settings
     max_tasks::Int=Threads.nthreads(),
     min_elems::Int=1000,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,

src/sort/sort.jl

Lines changed: 0 additions & 6 deletions
@@ -21,7 +21,6 @@ include("cpu_sample_sort.jl")
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -37,8 +36,6 @@ arguments are the same as for `Base.sort`.
 CPU settings: use at most `max_tasks` threads to sort the array such that at least `min_elems`
 elements are sorted by each thread. A parallel [`sample_sort!`](@ref) is used, processing
 independent slices of the array and deferring to `Base.sort!` for the final local sorts.
-`prefer_threads` tells AK to prioritize using the CPU algorithm implementation (default behaviour)
-over the KA algorithm through POCL.
 
 Note that the Base Julia `sort!` is mainly memory-bound, so multithreaded sorting only becomes
 faster if it is a more compute-heavy operation to hide memory latency - that includes:
@@ -129,7 +126,6 @@ end
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -166,7 +162,6 @@ end
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
@@ -243,7 +238,6 @@ end
     # CPU settings
     max_tasks=Threads.nthreads(),
     min_elems=1,
-    prefer_threads::Bool=true,
 
     # GPU settings
     block_size::Int=256,
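
A minimal sketch of the CPU path described above - assuming the `AK.sort!` entry point whose signature is edited in this file, with illustrative keyword values:

    import AcceleratedKernels as AK

    v = rand(Float32, 1_000_000)

    # Parallel sample sort on the CPU: at most `max_tasks` threads, each sorting a
    # slice of at least `min_elems` elements, deferring to `Base.sort!` locally.
    AK.sort!(v; max_tasks=Threads.nthreads(), min_elems=100_000)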

test/runtests.jl

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ include("partition.jl")
 include("looping.jl")
 include("map.jl")
 include("sort.jl")
-include("reduce.jl")
+prefer_threads && include("reduce.jl") # Reduce is very broken when using the KA CPU backend
 include("accumulate.jl")
 include("predicates.jl")
 include("binarysearch.jl")
