Commit 8341c51

fknorr authored and psalz committed
Prepare 0.6.0 Release
1 parent 1cd701e commit 8341c51

12 files changed, +52 -24 lines changed

.hdoc.toml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 [project]
 name = "Celerity"
-version = "0.5.0"
+version = "0.6.0"

 # Optional, adding this will enable direct links from the documentation
 # to your source code.

CHANGELOG.md

Lines changed: 34 additions & 6 deletions
@@ -6,28 +6,56 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic
 Versioning](http://semver.org/spec/v2.0.0.html).

-## [Unreleased]
+## [0.6.0] - 2024-08-12
+
+This release includes changes that may require adjustments when upgrading:
+- A single Celerity process can now manage multiple devices.
+  This means that on a cluster with 4 GPUs per node, only a single MPI rank needs to be spawned per node.
+- The previous behavior of having a separate process per device is still supported but discouraged, as it incurs additional overhead.
+- It is no longer possible to assign a device to a Celerity process using the `CELERITY_DEVICES` environment variable.
+  Please use vendor-specific mechanisms (such as `CUDA_VISIBLE_DEVICES`) for limiting the set of visible devices instead.
+- We recommend performing a clean build when updating Celerity so that updated submodule dependencies are properly propagated.

 We recommend using the following SYCL versions with this release:

-- DPC++: ???
+- DPC++: 89327e0a or newer
 - AdaptiveCpp (formerly hipSYCL): v24.06
-- SimSYCL: ???
+- SimSYCL: master

 See our [platform support guide](docs/platform-support.md) for a complete list of all officially supported configurations.

 ### Added

 - Add support for SimSYCL as a SYCL implementation (#238)
 - Extend compiler support to GCC (optionally with sanitizers) and C++20 code bases (#238)
+- `celerity::hints::oversubscribe` can be passed to a command group to increase split granularity and improve computation-communication overlap (#249)
+- Reductions are now unconditionally supported on all SYCL implementations (#265)
 - Add support for profiling with [Tracy](https://github.com/wolfpld/tracy), via `CELERITY_TRACY_SUPPORT` and environment variable `CELERITY_TRACY` (#267)
-- The active SYCL implementation can now be queried via `CELERITY_SYCL_IS_*` macros (#??)
+- The active SYCL implementation can now be queried via `CELERITY_SYCL_IS_*` macros (#277)

 ### Changed

-- Updated the internal [libenvpp](https://github.com/ph3at/libenvpp) dependency to 1.4.1 and use its new features. (#271)
+- All low-level host / device operations such as memory allocations, copies, and kernel launches are now represented in the single Instruction Graph for improved asynchronicity (#249)
+- Celerity can now maintain multiple disjoint backing allocations per buffer, so disjoint accesses to the same buffer do not trigger bounding-box allocations (#249)
+- The previous implicit size limit of 128 GiB on buffer transfers is lifted (#249, #252)
+- Celerity now manages multiple devices per node / MPI rank. This significantly reduces overhead in multi-GPU setups (#265)
+- Runtime lifetime is extended until destruction of the last queue, buffer, or host object (#265)
+- Host object instances are now destroyed from a runtime background thread instead of the application thread (#265)
+- Collective host tasks in the same collective group continue to execute on the same communicator, but not necessarily on the same background thread anymore (#265)
+- Updated the internal [libenvpp](https://github.com/ph3at/libenvpp) dependency to 1.4.1 and use its new features (#271)
+- Celerity's compile-time feature flags and options are now written to `version.h` instead of being passed on the command line (#277)
+
+### Fixed
+
+- Scheduler tracking structures are now garbage-collected after buffers and host objects go out of scope (#246)
+- The previous requirement to order accessors by access mode is lifted (#265)
+- SYCL reductions to which only some Celerity nodes contribute partial results would read uninitialized data (#265)
+
+### Removed

-*Note:* We recommend performing a clean build when updating Celerity so that updated submodule dependencies are properly propagated.
+- Celerity does not attempt to spill device allocations to the host if resizing buffers fails due to an out-of-memory condition (#265)
+- The `CELERITY_DEVICES` environment variable is removed in favor of platform-specific visibility specifiers such as `CUDA_VISIBLE_DEVICES` (#265)
+- The obsolete `experimental::user_benchmarker` infrastructure has been removed (#268).

 ## [0.5.0] - 2023-12-21
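
The "Changed" entry about multiple disjoint backing allocations (#249) can be illustrated with a small standalone sketch. This is not Celerity code: `interval`, `bounding_box_size`, and `disjoint_allocations_size` are hypothetical names, and the sketch assumes one-dimensional, pairwise disjoint accesses. It only compares how many elements a single bounding-box allocation would have to cover with how many are needed when each accessed region gets its own allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical half-open 1D index interval [begin, end); not a Celerity type.
struct interval {
    std::size_t begin;
    std::size_t end;
};

// Elements required if one contiguous allocation must span every accessed interval.
std::size_t bounding_box_size(const std::vector<interval>& accesses) {
    assert(!accesses.empty());
    std::size_t lo = accesses.front().begin;
    std::size_t hi = accesses.front().end;
    for (const interval& a : accesses) {
        lo = std::min(lo, a.begin);
        hi = std::max(hi, a.end);
    }
    return hi - lo;
}

// Elements required if each (pairwise disjoint) interval gets its own allocation.
std::size_t disjoint_allocations_size(const std::vector<interval>& accesses) {
    std::size_t total = 0;
    for (const interval& a : accesses) {
        assert(a.begin <= a.end);
        total += a.end - a.begin;
    }
    return total;
}
```

For accesses to `[0, 100)` and `[900, 1000)`, the bounding box spans 1000 elements, while separate allocations need only 200. The gap grows with the distance between the accessed regions.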

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 The MIT License (MIT)

-Copyright (c) 2018-2023 DPS Group, University of Innsbruck, Austria.
+Copyright (c) 2018-2024 DPS Group, University of Innsbruck, Austria.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 8 additions & 8 deletions
@@ -28,13 +28,13 @@ Celerity without much hassle. If you know SYCL already, this will probably
 look very familiar to you:

 ```cpp
-celerity::buffer<float> buf{celerity::range<1>{1024}};
-queue.submit([=](celerity::handler& cgh) {
-    celerity::accessor acc{buf, cgh,
-        celerity::access::one_to_one{}, // 1
-        celerity::write_only, celerity::no_init};
-    cgh.parallel_for<class MyKernel>(
-        celerity::range<1>{1024}, // 2
+celerity::buffer<float> buf(celerity::range(1024));
+queue.submit([&](celerity::handler& cgh) {
+    celerity::accessor acc(buf, cgh,
+        celerity::access::one_to_one(), // 1
+        celerity::write_only, celerity::no_init);
+    cgh.parallel_for(
+        celerity::range(1024), // 2
         [=](celerity::item<1> item) { // 3
             acc[item] = sycl::sin(item[0] / 1024.f); // 4
         });
@@ -128,4 +128,4 @@ Celerity's runtime behavior:
 `fast` for light integration with little runtime overhead, and `full` for
 integration with extensive performance debug information included in the trace.
 Only available if integration was enabled enabled at build time through the
-CMake option `-DCELERITY_TRACY_SUPPORT=ON`.
+CMake option `-DCELERITY_TRACY_SUPPORT=ON`.
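
The README change above replaces `celerity::range<1>{1024}` with `celerity::range(1024)`, which relies on C++17 class template argument deduction to pick the dimensionality from the number of arguments. The following is a minimal sketch of that language mechanism only. `toy_range` is a hypothetical stand-in, not Celerity's implementation, and how Celerity declares its own deduction guides is not shown in this commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a dimension-templated range; not Celerity's implementation.
template <int Dims>
class toy_range {
  public:
    // Accepts exactly one extent per dimension.
    template <typename... Extents>
    explicit toy_range(Extents... extents) : m_extents{static_cast<std::size_t>(extents)...} {
        static_assert(sizeof...(Extents) == Dims, "one extent per dimension");
    }

    // Total number of elements across all dimensions.
    std::size_t size() const {
        std::size_t n = 1;
        for (std::size_t e : m_extents) n *= e;
        return n;
    }

  private:
    std::array<std::size_t, Dims> m_extents;
};

// Deduction guide: the number of constructor arguments selects Dims,
// so toy_range(1024) is a toy_range<1> and toy_range(4, 8) is a toy_range<2>.
template <typename... Extents>
toy_range(Extents...) -> toy_range<static_cast<int>(sizeof...(Extents))>;
```

With this guide in place, `toy_range r(1024);` declares a `toy_range<1>` without spelling out the template argument, mirroring the shorter spelling used in the updated README snippet.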

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-0.5.0
+0.6.0

examples/convolution/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 cmake_minimum_required(VERSION 3.13)
 project(convolution LANGUAGES CXX)

-find_package(Celerity 0.5.0 REQUIRED)
+find_package(Celerity 0.6.0 REQUIRED)

 add_executable(convolution convolution.cc)
 set_property(TARGET convolution PROPERTY CXX_STANDARD ${CELERITY_CXX_STANDARD})

examples/distr_io/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 cmake_minimum_required(VERSION 3.13)
 project(distr_io LANGUAGES CXX)

-find_package(Celerity 0.5.0 REQUIRED)
+find_package(Celerity 0.6.0 REQUIRED)
 find_package(PkgConfig REQUIRED)
 pkg_search_module(HDF5 REQUIRED IMPORTED_TARGET hdf5-openmpi hdf5-1.12.0 hdf5)

examples/hello_world/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 cmake_minimum_required(VERSION 3.13)
 project(hello_world LANGUAGES CXX)

-find_package(Celerity 0.5.0 REQUIRED)
+find_package(Celerity 0.6.0 REQUIRED)

 add_executable(hello_world hello_world.cc)
 set_property(TARGET hello_world PROPERTY CXX_STANDARD ${CELERITY_CXX_STANDARD})

examples/matmul/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 cmake_minimum_required(VERSION 3.13)
 project(matmul LANGUAGES CXX)

-find_package(Celerity 0.5.0 REQUIRED)
+find_package(Celerity 0.6.0 REQUIRED)

 add_executable(matmul matmul.cc)
 set_property(TARGET matmul PROPERTY CXX_STANDARD ${CELERITY_CXX_STANDARD})

examples/reduction/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 cmake_minimum_required(VERSION 3.13)
 project(syncing LANGUAGES CXX)

-find_package(Celerity 0.5.0 REQUIRED)
+find_package(Celerity 0.6.0 REQUIRED)

 add_executable(reduction reduction.cc)
 set_property(TARGET reduction PROPERTY CXX_STANDARD ${CELERITY_CXX_STANDARD})
