
Commit 8d89077

Docs: Update section on Issues & Limitations
1 parent 36c0ffa commit 8d89077

File tree

1 file changed: +46 -5 lines changed


docs/issues-and-limitations.md

Lines changed: 46 additions & 5 deletions
@@ -4,13 +4,54 @@ title: Known Issues and Current Limitations
sidebar_label: Issues & Limitations
---

Removed:

> **This section is still a work in progress.**

Incomplete and very brief list of current issues:

- No weak scaling (yet).
- No form of control flow (yet).
- Only simple `parallel_for` execution (for now).

Added:

While Celerity can already do a lot, there are still some things it cannot
do. This is usually either because of a SYCL limitation, because we are still
figuring out how to fit certain functionality into the programming model, or
because we simply haven't had the time yet to implement a given feature. If
you are blocked by any of these or other issues, please
[let us know](https://github.com/celerity/celerity-runtime/issues/new).

Here is a (potentially incomplete) list of currently known issues:

## No Reductions

Celerity currently offers no dedicated API for performing distributed
reduction operations. While the experimental support for
[collective host tasks](host-tasks.md#experimental-collective-host-tasks)
makes it possible to implement distributed reductions using e.g. `MPI_Reduce`,
the calculations have to be performed on the host. First-class support for
device-accelerated distributed reductions will be added to Celerity in the
future.
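
For illustration, here is a minimal sketch of such a host-side reduction,
based on the collective host task API described in the linked host tasks
documentation. The `global_sum` helper and the way the node-local partial
result is obtained are assumptions made for this example, not part of the
Celerity API.

```cpp
#include <cstdio>
#include <mpi.h>
#include <celerity.h>

// Sketch: combine one node-local partial result into a global sum on the host.
// In a real program the partial result would typically be read from a Celerity
// buffer via a host accessor (see host-tasks.md); here it is passed in directly.
void global_sum(celerity::distr_queue& queue, double local_partial) {
    queue.submit([=](celerity::handler& cgh) {
        cgh.host_task(celerity::experimental::collective,
            [=](celerity::experimental::collective_partition part) {
                // Every node enters this task and shares an MPI communicator.
                double total = 0.0;
                MPI_Allreduce(&local_partial, &total, 1, MPI_DOUBLE, MPI_SUM,
                    part.get_collective_mpi_comm());
                printf("global sum: %f\n", total);
            });
    });
}
```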

We are currently evaluating the reduction functionalities proposed in the
[SYCL 2020 Provisional Specification](https://www.khronos.org/registry/SYCL/),
and how we could build a distributed variant on top of it.

## No Control Flow

In some situations, the number of Celerity tasks required for a computation
may not be fully known in advance. For example, when using an iterative
method, a kernel might be repeated until some error metric threshold is
reached. Celerity currently offers no canonical way of incorporating such
branching decisions into the data flow execution graph.

That being said, this behavior can still be achieved today. For example, the
branching decision can be made within a [distributed host
task](host-tasks.md#distributed-host-tasks) and then relayed into the main
execution thread. The main thread waits using
`celerity::distr_queue::slow_full_sync` until a corresponding predicate has
been set, and then continues submitting Celerity tasks depending on the
predicate.
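
The following sketch shows one way this pattern could look. The helpers
`submit_solver_step` and `get_local_error` are hypothetical placeholders for
the actual computation, and a collective host task (see the previous section)
is used instead of a plain distributed host task so that all nodes arrive at
the same decision.

```cpp
#include <atomic>
#include <memory>
#include <mpi.h>
#include <celerity.h>

void submit_solver_step(celerity::distr_queue& queue); // hypothetical
double get_local_error();                              // hypothetical

// Sketch: repeat a solver step until the global error drops below a threshold.
void solve(celerity::distr_queue& queue, double threshold) {
    const auto converged = std::make_shared<std::atomic<bool>>(false);

    while(!converged->load()) {
        submit_solver_step(queue); // submits the actual compute kernels

        queue.submit([=](celerity::handler& cgh) {
            cgh.host_task(celerity::experimental::collective,
                [=](celerity::experimental::collective_partition part) {
                    // Make the branching decision on the host, agreeing across nodes.
                    const double local_error = get_local_error();
                    double global_error = 0.0;
                    MPI_Allreduce(&local_error, &global_error, 1, MPI_DOUBLE, MPI_MAX,
                        part.get_collective_mpi_comm());
                    if(global_error < threshold) converged->store(true);
                });
        });

        // Wait until the host task above has executed before reading the
        // predicate and deciding whether to submit another iteration.
        queue.slow_full_sync();
    }
}
```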

## Only Basic `parallel_for` Overload

Due to various rather technical issues with the SYCL 1.2.1 standard, Celerity
is currently unable to support the `nd_range` overload of `parallel_for`, as
well as `parallel_for_work_group`. However, thanks to improvements made in
SYCL 2020, Celerity will be able to support the former as soon as SYCL
implementations catch up, giving users explicit control over work group sizes
and access to local shared memory.
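
For comparison, here is a sketch of the basic overload that is supported
today, following the pattern from the Celerity tutorial. The buffer, range
mapper, and kernel shown here are example assumptions, and accessor details
may differ between Celerity versions.

```cpp
#include <celerity.h>

// Sketch: the supported `parallel_for` form takes a global range and a
// per-item kernel -- no nd_range, work group sizes, or local accessors.
void scale(celerity::distr_queue& queue, celerity::buffer<float, 1> data,
    float factor, size_t count) {
    queue.submit([=](celerity::handler& cgh) {
        auto acc = data.get_access<cl::sycl::access::mode::read_write>(
            cgh, celerity::access::one_to_one<1>());
        cgh.parallel_for<class scale_kernel>(
            cl::sycl::range<1>(count), [=](cl::sycl::item<1> item) {
                acc[item] *= factor;
            });
        // Not (yet) expressible: the nd_range overload and
        // parallel_for_work_group, which would provide explicit work group
        // sizes and local memory.
    });
}
```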

---

If you encounter any additional issues, please
[let us know](https://github.com/celerity/celerity-runtime/issues/new).
