title: Known Issues and Current Limitations
sidebar_label: Issues & Limitations
---

While Celerity can already do a lot, there are still some things it cannot
do. This is usually either because of a SYCL limitation, because we are still
figuring out how to fit certain functionality into the programming model, or
because we simply haven't had the time yet to implement a given feature. If
you are blocked by any of these or other issues, please
[let us know](https://github.com/celerity/celerity-runtime/issues/new).

Here is a (potentially incomplete) list of currently known issues:

## No Reductions

Celerity currently offers no dedicated API for performing distributed
reduction operations. While the experimental support for [collective host
tasks](host-tasks.md#experimental-collective-host-tasks) makes it possible to
implement distributed reductions using e.g. `MPI_Reduce`, the calculations
have to be performed on the host. First-class support for device-accelerated
distributed reductions will be added to Celerity in the future.
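
As a rough sketch of this workaround, the following collective host task
reduces one per-node partial result with `MPI_Reduce`. The `reduce_on_host`
helper and the `local_sum` value are hypothetical stand-ins for however your
application computes its per-node contribution, and exact API details may
differ slightly between Celerity versions:

```cpp
#include <celerity.h>
#include <mpi.h>

// Hypothetical helper: reduce one per-node partial result on the host.
// local_sum is assumed to already contain this node's contribution.
void reduce_on_host(celerity::distr_queue& q, double local_sum) {
    q.submit([=](celerity::handler& cgh) {
        cgh.host_task(celerity::experimental::collective,
            [=](celerity::experimental::collective_partition part) {
                double global_sum = 0.0;
                // The partition exposes an MPI communicator that spans all
                // participating nodes (see the collective host tasks docs).
                MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                    0 /* root */, part.get_collective_mpi_comm());
                // On rank 0, global_sum now holds the reduced value.
            });
    });
}
```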

We are currently evaluating the reduction functionality proposed in the
[SYCL 2020 Provisional Specification](https://www.khronos.org/registry/SYCL/),
and how we could build a distributed variant on top of it.

## No Control Flow

In some situations, the number of Celerity tasks required for a computation
may not be fully known in advance. For example, when using an iterative
method, a kernel might be repeated until some error metric threshold is
reached. Celerity currently offers no canonical way of incorporating such
branching decisions into the data flow execution graph.

That being said, it is not impossible to achieve this behavior today. For
example, the branching decision can be made within a [distributed host
task](host-tasks.md#distributed-host-tasks) and then relayed into the main
execution thread. The latter waits using
`celerity::distr_queue::slow_full_sync` until a corresponding predicate has
been set, and then continues submitting Celerity tasks depending on that
predicate.
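
A sketch of how this might look, assuming two hypothetical application
functions `submit_step` (enqueues the device kernels of one iteration) and
`compute_local_error` (computes this node's partial error on the host), and
using a collective host task so that every node arrives at the same decision:

```cpp
#include <celerity.h>
#include <mpi.h>

#include <memory>

void submit_step(celerity::distr_queue& q); // hypothetical: one iteration's kernels
double compute_local_error();               // hypothetical: this node's error term

void iterate_until_converged(celerity::distr_queue& q) {
    auto converged = std::make_shared<bool>(false);
    while (!*converged) {
        submit_step(q);

        q.submit([=](celerity::handler& cgh) {
            cgh.host_task(celerity::experimental::collective,
                [=](celerity::experimental::collective_partition part) {
                    const double local = compute_local_error();
                    double global = 0.0;
                    // Ensure all nodes base their decision on the same value.
                    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX,
                        part.get_collective_mpi_comm());
                    *converged = global < 1e-6;
                });
        });

        // Block the main thread until the host task above has run,
        // so that *converged reflects the branching decision.
        q.slow_full_sync();
    }
}
```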

## Only Basic `parallel_for` Overload

Due to various rather technical issues with the SYCL 1.2.1 standard, Celerity
is currently unable to support the `nd_range` overload of `parallel_for`, as
well as `parallel_for_work_group`. However, thanks to improvements made in
SYCL 2020, Celerity will be able to support the former as soon as SYCL
implementations catch up, giving users explicit control over work group
sizes and access to local shared memory.
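
For reference, only the basic form sketched below is currently available. The
`MyKernel` name is a placeholder and buffer accesses are omitted for brevity;
the point is that a kernel only receives a global range and an item, with no
way to request a work group size or local memory:

```cpp
#include <celerity.h>

void launch(celerity::distr_queue& q) {
    q.submit([=](celerity::handler& cgh) {
        // Supported: basic overload with a global range (plus optional offset).
        cgh.parallel_for<class MyKernel>(
            cl::sycl::range<1>(1024), [=](cl::sycl::item<1> item) {
                // ... per-item work; no work group barriers or local memory here
            });

        // Not currently supported: the nd_range overload, e.g.
        //   cgh.parallel_for<class MyKernel>(cl::sycl::nd_range<1>(1024, 64), ...);
        // as well as parallel_for_work_group.
    });
}
```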

---

If you encounter any additional issues, please [let us
know](https://github.com/celerity/celerity-runtime/issues/new).