Add alternative implementation of device timer to SyclTimer class #1872
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_149 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_150 ran successfully.
Force-pushed from 52e211c to 9322201.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_174 ran successfully.
Force-pushed from 9322201 to ecd1c17.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_177 ran successfully.
Force-pushed from ecd1c17 to f4fa901.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_199 ran successfully.
SyclTimer now supports a device_timer keyword argument with two values: "queue_barrier" (the legacy behavior) and a new one based on the sequential order manager, which inserts an empty task into the manager to record the start and end of the timed block of code. The SyclTimer docstring is updated. All data attributes the timer needs are now created during class instance construction.
Check different device_timer values, test argument validation, and test cumulative timing.
Force-pushed from f4fa901 to d1011c5.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_209 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_210 ran successfully.
ndgrigorian left a comment:
LGTM!
This PR adds a Python API to submit an empty-body single task to a queue.
`dpctl.SyclTimer` is modified to accept a `device_timer` keyword argument. Supported values are `"queue_barrier"` (the legacy behavior and the default) and `"order_manager"`.
With `"order_manager"`, the timer submits empty-body single tasks (fence tasks) to the queue and uses the order manager to order them so that they fence the timed submissions. For example, timing a block of compute submissions results in the task graph
`[prior_tasks] -> [fence_start_task] -> [compute_tasks] -> [fence_end_task] -> [subsequent_tasks]`.
The timer uses profiling data from the events associated with the fence tasks to estimate the execution time of the compute tasks as measured by the device's timer.
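To make the fencing idea concrete, here is a toy, pure-Python model of it. It is not dpctl's implementation: `ToySequentialQueue`, `ToyFenceTimer`, and the integer nanosecond clock are all hypothetical names invented for this sketch, and the rule used for the device time (start of the end fence minus end of the start fence) is an assumption about how such an estimate could be formed from the fence events' profiling data.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Profiling record of one task (hypothetical): device timestamps in ns."""
    name: str
    start_ns: int
    end_ns: int


@dataclass
class ToySequentialQueue:
    """Toy in-order device: every task starts when the previous one ends."""
    clock_ns: int = 0
    log: list = field(default_factory=list)

    def submit(self, name, duration_ns=0):
        ev = Event(name, self.clock_ns, self.clock_ns + duration_ns)
        self.clock_ns = ev.end_ns
        self.log.append(ev)
        return ev


class ToyFenceTimer:
    """Brackets a timed block with two empty (zero-duration) fence tasks."""

    def __init__(self):
        self._queue = None
        self._start = None
        self._pairs = []  # one (start fence, end fence) pair per timed block

    def __call__(self, queue):
        self._queue = queue
        return self

    def __enter__(self):
        self._start = self._queue.submit("fence_start")
        return self

    def __exit__(self, *exc_info):
        end = self._queue.submit("fence_end")
        self._pairs.append((self._start, end))
        return False  # do not swallow exceptions from the timed block

    @property
    def device_dt_ns(self):
        # Assumed rule: time between the end of the start fence and the
        # start of the end fence, accumulated over all timed blocks.
        return sum(end.start_ns - start.end_ns for start, end in self._pairs)


q = ToySequentialQueue()
timer = ToyFenceTimer()

q.submit("prior", 5)           # not timed
with timer(q):
    q.submit("compute_a", 30)  # timed
    q.submit("compute_b", 12)  # timed
q.submit("subsequent", 7)      # not timed
with timer(q):
    q.submit("compute_c", 8)   # timed, adds to the cumulative total

order = [ev.name for ev in q.log]
print(order)
print(timer.device_dt_ns)
```

In this model only the tasks submitted between the two fences contribute to the total, and reusing the same timer for a second block accumulates the time, which mirrors the cumulative timing mentioned in the commit message.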
`device_timer="order_manager"` is useful for timing `dpctl.tensor` operations, which leverage the order manager.
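A minimal sketch of how timing `dpctl.tensor` operations this way might look, assuming a dpctl build that includes this change and a SYCL device that supports profiling. The helper name `time_tensor_ops`, the array size, and the choice of operations are illustrative only; the function returns `None` when dpctl, a usable device, or the `device_timer` keyword is unavailable.

```python
def time_tensor_ops(n=1_000_000):
    """Return (host_dt, device_dt) in seconds, or None if timing is not possible."""
    try:
        import dpctl
        import dpctl.tensor as dpt
    except ImportError:
        return None  # dpctl is not installed

    try:
        # Profiling must be enabled on the queue for device timings.
        q = dpctl.SyclQueue(property="enable_profiling")
        # The device_timer keyword is the one introduced by this PR;
        # older dpctl versions raise TypeError here.
        timer = dpctl.SyclTimer(device_timer="order_manager")
    except (dpctl.SyclQueueCreationError, TypeError):
        return None

    x = dpt.linspace(0, 1, num=n, dtype="f4", sycl_queue=q)  # untimed setup

    with timer(q):
        # Timed block: tensor operations submitted to queue q.
        y = dpt.sin(x)
        z = dpt.sum(y * y)

    dt = timer.dt  # cumulative over all timed blocks of this timer
    return dt.host_dt, dt.device_dt


result = time_tensor_ops(1024)
print(result)
```

Passing `device_timer="queue_barrier"` instead selects the legacy behavior described above.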