The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [0.6.0] - 2024-08-12

This release includes changes that may require adjustments when upgrading:

- A single Celerity process can now manage multiple devices.
  This means that on a cluster with 4 GPUs per node, only a single MPI rank needs to be spawned per node.
  The previous behavior of having a separate process per device is still supported but discouraged, as it incurs additional overhead.
- It is no longer possible to assign a device to a Celerity process using the `CELERITY_DEVICES` environment variable.
  Please use vendor-specific mechanisms (such as `CUDA_VISIBLE_DEVICES`) for limiting the set of visible devices instead.
- We recommend performing a clean build when updating Celerity so that updated submodule dependencies are properly propagated.

We recommend using the following SYCL versions with this release:

- DPC++: 89327e0a or newer
- AdaptiveCpp (formerly hipSYCL): v24.06
- SimSYCL: master

See our [platform support guide](docs/platform-support.md) for a complete list of all officially supported configurations.

### Added

- Add support for SimSYCL as a SYCL implementation (#238)
- Extend compiler support to GCC (optionally with sanitizers) and C++20 code bases (#238)
- `celerity::hints::oversubscribe` can be passed to a command group to increase split granularity and improve computation-communication overlap (#249); see the first sketch below
- Reductions are now unconditionally supported on all SYCL implementations (#265)
- Add support for profiling with [Tracy](https://github.com/wolfpld/tracy), via the `CELERITY_TRACY_SUPPORT` build option and the `CELERITY_TRACY` environment variable (#267)
- The active SYCL implementation can now be queried via `CELERITY_SYCL_IS_*` macros (#277); see the second sketch below
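
Below is a minimal sketch of how the oversubscription hint from #249 might be attached to a command group. The hint type name follows the entry above; the `celerity::experimental::hint` attachment call, the oversubscription factor of 4, and the kernel name are assumptions for illustration, so consult the Celerity documentation for the exact API.

```cpp
#include <celerity.h>

int main() {
	celerity::distr_queue q;
	celerity::buffer<float, 1> buf{celerity::range<1>{1024}};

	q.submit([&](celerity::handler& cgh) {
		// Ask for a finer split of this command group so communication can overlap
		// with computation. The attachment call and the factor are illustrative
		// assumptions; see the Celerity docs for the exact signature.
		celerity::experimental::hint(cgh, celerity::hints::oversubscribe{4});

		celerity::accessor acc{buf, cgh, celerity::access::one_to_one{}, celerity::write_only, celerity::no_init};
		cgh.parallel_for<class fill_values>(buf.get_range(), [=](celerity::item<1> it) { acc[it] = 1.0f; });
	});
}
```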
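
The `CELERITY_SYCL_IS_*` macros from #277 allow implementation-specific code paths at compile time. A short sketch follows; the exact macro suffixes (`DPCPP`, `ACPP`, `SIMSYCL`) are assumptions derived from the SYCL implementations listed above, so check the generated `version.h` for the authoritative names.

```cpp
#include <celerity.h>
#include <cstdio>

// Report which SYCL implementation Celerity was built against. The macro
// suffixes used here are assumptions; the generated version.h lists the
// actual names.
void report_sycl_implementation() {
#if CELERITY_SYCL_IS_DPCPP
	std::puts("Celerity was built against DPC++");
#elif CELERITY_SYCL_IS_ACPP
	std::puts("Celerity was built against AdaptiveCpp");
#elif CELERITY_SYCL_IS_SIMSYCL
	std::puts("Celerity was built against SimSYCL");
#else
	std::puts("Unknown SYCL implementation");
#endif
}
```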

### Changed

- All low-level host / device operations such as memory allocations, copies, and kernel launches are now represented in the single Instruction Graph for improved asynchronicity (#249)
- Celerity can now maintain multiple disjoint backing allocations per buffer, so disjoint accesses to the same buffer do not trigger bounding-box allocations (#249)
- The previous implicit size limit of 128 GiB on buffer transfers is lifted (#249, #252)
- Celerity now manages multiple devices per node / MPI rank, which significantly reduces overhead in multi-GPU setups (#265)
- Runtime lifetime is extended until destruction of the last queue, buffer, or host object (#265); see the sketch after this list
- Host object instances are now destroyed from a runtime background thread instead of the application thread (#265)
- Collective host tasks in the same collective group continue to execute on the same communicator, but no longer necessarily on the same background thread (#265)
- Updated the internal [libenvpp](https://github.com/ph3at/libenvpp) dependency to 1.4.1 and use its new features (#271)
- Celerity's compile-time feature flags and options are now written to `version.h` instead of being passed on the command line (#277)
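
To illustrate the lifetime change from #265, here is a minimal sketch assuming the `celerity::distr_queue`-based API of this release; the function and kernel names are made up. The runtime is now kept alive until the last of `q` and `buf` is destroyed.

```cpp
#include <celerity.h>

void submit_work() {
	celerity::distr_queue q;
	celerity::buffer<int, 1> buf{celerity::range<1>{256}};

	q.submit([&](celerity::handler& cgh) {
		celerity::accessor acc{buf, cgh, celerity::access::one_to_one{}, celerity::write_only, celerity::no_init};
		cgh.parallel_for<class init_values>(buf.get_range(), [=](celerity::item<1> it) { acc[it] = 0; });
	});

	// As of 0.6.0 the runtime is torn down only after the last queue, buffer, or
	// host object is destroyed, i.e. once both `q` and `buf` go out of scope here.
	// Host object instances are destroyed from a runtime background thread.
}
```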

### Fixed

- Scheduler tracking structures are now garbage-collected after buffers and host objects go out of scope (#246)
- The previous requirement to order accessors by access mode is lifted (#265)
- SYCL reductions to which only some Celerity nodes contribute partial results would read uninitialized data (#265); see the sketch after this list
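
Since both the Added and Fixed sections above touch on reductions (#265), here is a rough sketch of a Celerity reduction. It is modeled on SYCL 2020 reductions; the exact `celerity::reduction` factory signature and the `initialize_to_identity` property shown are assumptions, so refer to the Celerity reduction documentation for the precise API.

```cpp
#include <celerity.h>
#include <sycl/sycl.hpp>

void sum_example() {
	const size_t n = 4096;
	celerity::distr_queue q;
	celerity::buffer<float, 1> data{celerity::range<1>{n}};
	celerity::buffer<float, 1> sum{celerity::range<1>{1}};

	q.submit([&](celerity::handler& cgh) {
		celerity::accessor in{data, cgh, celerity::access::one_to_one{}, celerity::read_only};
		// The factory call below follows the SYCL 2020 reduction interface and is
		// an assumed signature, not a verbatim copy of the Celerity API.
		auto rd = celerity::reduction(sum, cgh, sycl::plus<float>{},
		                              celerity::property::reduction::initialize_to_identity{});
		cgh.parallel_for<class reduce_sum>(celerity::range<1>{n}, rd,
		                                   [=](celerity::item<1> it, auto& acc_sum) { acc_sum += in[it]; });
	});
}
```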

### Removed

- Celerity no longer attempts to spill device allocations to the host if resizing a buffer fails due to an out-of-memory condition (#265)
- The `CELERITY_DEVICES` environment variable is removed in favor of platform-specific visibility specifiers such as `CUDA_VISIBLE_DEVICES` (#265)
- The obsolete `experimental::user_benchmarker` infrastructure has been removed (#268)