Release 0.17.0 · IntelPython/dpctl

This release features an updated documentation web page (https://intelpython.github.io/dpctl/latest/index.html), adds cumulative reductions,
and complies with revision 2023.12 of the Python Array API specification.

Added a pybind11 caster for sycl::half, mapping to/from Python float, to the "dpctl4pybind11.hpp" header: gh-1655
Added support for DLPack data interchange per Python Array API 2023.12 specification: gh-1667
Implemented tensor.cumulative_sum, tensor.cumulative_prod and tensor.cumulative_logsumexp: gh-1602
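
To illustrate what a cumulative log-sum-exp reduction computes, here is a minimal pure-Python sketch. It is not dpctl's implementation and does not call dpctl; the helper name `cumulative_logsumexp_reference` is hypothetical, and the sketch assumes a one-dimensional sequence of finite floats, where element i of the result is log(exp(x[0]) + ... + exp(x[i])).

```python
import math


def cumulative_logsumexp_reference(values):
    """Reference sketch: out[i] = log(sum(exp(values[j]) for j <= i)).

    Hypothetical helper for illustration only; assumes finite float inputs.
    """
    out = []
    running = -math.inf  # log(0), i.e. the log of an empty sum
    for v in values:
        hi, lo = max(running, v), min(running, v)
        if hi == -math.inf:
            running = -math.inf
        else:
            # log(exp(hi) + exp(lo)) rewritten so that exp() never overflows
            running = hi + math.log1p(math.exp(lo - hi))
        out.append(running)
    return out


result = cumulative_logsumexp_reference([0.0, 0.0, 0.0])
large = cumulative_logsumexp_reference([1000.0, 1000.0])
```

The running update keeps the larger term outside the exponential, so inputs such as 1000.0 do not overflow even though exp(1000.0) itself is not representable as a float.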

Expanded documentation for dpctl: gh-1619
Expanded utils.intel_device_info functionality: gh-1656
Improved performance of elementwise operations: gh-1651
Improved efficiency by avoiding unnecessary copying of sycl::queue: gh-1645
dpctl uses pybind11 2.12.0: gh-1640
Improved performance of tensor.reshape operation with order="F" when copying is needed or requested: gh-1677

Fixed initialization of byte type constants in dpctl_capi Python/C API loader class in "dpctl4pybind11.hpp": gh-1665
Fixed crash in tensor.sort reported for a CPU device and a CUDA device: gh-1676
Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: gh-1624
Fixed comparison operators for mixed signed and unsigned integral types: gh-1650
Added support for index arrays of different integral types in indexing operations: gh-47
Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: gh-1630
Corrected tensor.tile for scalar inputs and empty repetitions: gh-1628
Fixed support for out keyword in tensor.matmul: gh-1610
Fixed bug in basic slicing of empty arrays: gh-1680
Fixed bug in tensor.bitwise_invert for boolean input array: gh-1681
Fixed bug in tensor.repeat on zero-size input arrays: gh-1682
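
As background for the mixed signed/unsigned comparison fix listed above, the sketch below shows one common way such comparisons go wrong in fixed-width arithmetic. It is an illustration under stated assumptions, not a description of the actual bug or of dpctl's code: the helper names `naive_less` and `sign_aware_less` are hypothetical, and 64-bit two's complement integers are assumed.

```python
U64_MASK = (1 << 64) - 1  # assumption: 64-bit integer width


def naive_less(signed_val, unsigned_val):
    """Compare after reinterpreting the signed operand as unsigned.

    Emulates two's complement wraparound, so negative values turn
    into very large unsigned ones.
    """
    return (signed_val & U64_MASK) < unsigned_val


def sign_aware_less(signed_val, unsigned_val):
    """Check the sign first; a negative value is below any unsigned value."""
    return signed_val < 0 or signed_val < unsigned_val


wrong = naive_less(-1, 1)
right = sign_aware_less(-1, 1)
```

With these definitions, `naive_less(-1, 1)` evaluates to `False` because -1 wraps to 2**64 - 1, while `sign_aware_less(-1, 1)` evaluates to `True`.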

Full Changelog: https://github.com/IntelPython/dpctl/blob/master/CHANGELOG.md