@@ -14,10 +14,10 @@ in order to maintain data coherence across the cluster. When a kernel exhibits a
  access pattern, two core requirements are easy to violate from user code – even when using
  Celerity's built-in range mappers.

- > A work item `item` must never access the buffer outside the range of `range_mapper(chunk)`,
- > where `chunk` is any chunk of the iteration space that contains `item`. Out-of-bounds accesses
- > can be detected at runtime by enabling the `CELERITY_ACCESSOR_BOUNDARY_CHECK` CMake option at
- > the cost of some runtime overhead (enabled by default in debug builds).
+ ### Out-Of-Bounds Accesses
+
+ A work item `item` must never access the buffer outside the range of `range_mapper(chunk)`,
+ where `chunk` is any chunk of the iteration space that contains `item`:

  ``` cpp
  // INCORRECT example: access pattern inside the kernel does not follow the range mapper
@@ -31,10 +31,14 @@ celerity::distr_queue().submit([&](celerity::handler &cgh) {
  });
  ```

- > Range mappers for `write_only` or `read_write` accessors must never produce overlapping buffer
- > ranges for non-overlapping chunks of the iteration space. This means that none of the `all`,
- > `fixed`, `neighborhood` or `slice` built-in range mappers can be used for a writing access
- > (unless the kernel only operates on a single work item).
+ > Out-of-bounds accesses can be detected at runtime by enabling the
+ > `CELERITY_ACCESSOR_BOUNDARY_CHECK` CMake option at the cost of some runtime
+ > overhead (enabled by default in debug builds).
+
+ ### Overlapping Writes
+
+ Range mappers for `write_only` or `read_write` accessors must never produce overlapping buffer
+ ranges for non-overlapping chunks of the iteration space.

  A likely beginner mistake is to violate the second constraint when implementing a **stencil code**.
  The first intuition might be to operate on a single buffer using a `read_write` accessor together
@@ -80,6 +84,10 @@ Note that this is not just a Celerity limitation, but inherent to the implementa
  on GPUs, which must avoid races between reads and writes through a strategy like double-buffering
  anyway.

+ > None of the `all`, `fixed`, `neighborhood` or `slice` built-in range mappers
+ > can be used for a writing access (unless the kernel only operates on a single
+ > work item).
+
  ## Illegal Reference Captures in Kernel and Host Task Functions

  Celerity tasks submitted to the `celerity::distr_queue` are executed
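The two requirements restated in this diff (no out-of-bounds accesses, no overlapping writes) can be illustrated with a small standalone sketch. It deliberately does not use the Celerity API: `range1d`, `one_to_one_like`, `neighborhood_like`, `contains`, `overlaps` and `run_example` are hypothetical helpers invented for this illustration, and the one-element border is an assumed value. The sketch models a 1D iteration space of 8 items split into two chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical 1D stand-in for a buffer range; NOT a Celerity type.
struct range1d {
    std::size_t begin; // inclusive
    std::size_t end;   // exclusive
};

// Assumed analogue of a one-to-one mapping: the chunk maps to the identical buffer range.
inline range1d one_to_one_like(range1d chunk, std::size_t /*buffer_size*/) {
    return chunk;
}

// Assumed analogue of a neighborhood mapping with a border of one element,
// clamped to the buffer bounds.
inline range1d neighborhood_like(range1d chunk, std::size_t buffer_size) {
    const std::size_t begin = chunk.begin > 0 ? chunk.begin - 1 : 0;
    const std::size_t end = std::min(chunk.end + 1, buffer_size);
    return range1d{begin, end};
}

// True if `index` lies inside the half-open range `r`.
inline bool contains(range1d r, std::size_t index) {
    return index >= r.begin && index < r.end;
}

// True if the two half-open ranges share at least one element.
inline bool overlaps(range1d a, range1d b) {
    return std::max(a.begin, b.begin) < std::min(a.end, b.end);
}

inline void run_example() {
    const std::size_t buffer_size = 8;
    const range1d chunk_a{0, 4}; // work items 0..3
    const range1d chunk_b{4, 8}; // work items 4..7

    // Out-of-bounds access: work item 3 touching buffer index 3 + 1 leaves the
    // range of the one-to-one-like mapping, but stays inside the neighborhood-like one.
    assert(!contains(one_to_one_like(chunk_a, buffer_size), 3 + 1));
    assert(contains(neighborhood_like(chunk_a, buffer_size), 3 + 1));

    // Overlapping writes: disjoint chunks stay disjoint under the one-to-one-like
    // mapping, but the neighborhood-like mapping yields [0, 5) and [3, 8), which overlap.
    assert(!overlaps(one_to_one_like(chunk_a, buffer_size),
                     one_to_one_like(chunk_b, buffer_size)));
    assert(overlaps(neighborhood_like(chunk_a, buffer_size),
                    neighborhood_like(chunk_b, buffer_size)));
}
```

Under these assumptions, a neighborhood-style mapping suits a reading accessor in a stencil, while the writing accessor needs a mapping that keeps the ranges of distinct chunks disjoint.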