Roofline quantized conv3d/2d layer #3419
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3419. Note: links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures as of commit 44ce800 with merge base 73730e8.
(force-pushed from 8fba23d to 79cdaec)
```python
    and not is_sm_at_least_100()
)

if skip_conv_benchmarks:
```
I feel the conditions are a bit convoluted here. Maybe:

```python
if do_benchmarks:
    if op_name in ("conv2d", "conv3d") and not is_sm_at_least_100():
        print(f"warning: skipping {op_name} benchmarks, unsupported on this GPU")
    else:
        # can also move this part to a function to make it clearer
        ...
```
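If that inner part is factored out as the comment suggests, a minimal runnable sketch could look like the following. `run_benchmarks` is a hypothetical stand-in for the factored-out benchmark body, and I'm assuming `is_sm_at_least_100` is importable from `torchao.utils` like the other SM helpers:

```python
from torchao.utils import is_sm_at_least_100  # assumption: same module as other SM helpers


def maybe_run_benchmarks(op_name: str, do_benchmarks: bool) -> None:
    if not do_benchmarks:
        return
    # conv benchmarks need SM 10.0+; warn and skip instead of failing
    if op_name in ("conv2d", "conv3d") and not is_sm_at_least_100():
        print(f"warning: skipping {op_name} benchmarks, SM 10.0+ required")
        return
    run_benchmarks(op_name)  # hypothetical: the factored-out benchmark body
```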
```python
r_speedup = None
# use roofline model to estimate gemm time using equivalent GEMM dims
r_bf16_gemm_time_s = float(
    bf16_gemm_time_sympy.subs(M, gemm_M).subs(K, gemm_K).subs(N, gemm_N)
)
```
Are the memory operations of conv the same as for linear as well?
ao/torchao/testing/training/roofline_utils.py, line 332 in 0975a40:

```python
mem_gemm_time_s = (
```
As conv is an implicit GEMM, I'm assuming the memory operations for gemm and conv should be the same.
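For reference, a minimal sketch of the implicit-GEMM dimension mapping that the `gemm_M`/`gemm_K`/`gemm_N` substitution above relies on. The peak-FLOPS figure and conv shapes here are illustrative assumptions; the real roofline expressions live in `roofline_utils.py`:

```python
import sympy

M, K, N = sympy.symbols("M K N")

# illustrative compute-bound roofline: total MACs * 2 / peak FLOPS
PEAK_BF16_FLOPS = 989e12  # assumption: H100 dense bf16 tensor-core peak
bf16_gemm_time_sympy = (2 * M * K * N) / PEAK_BF16_FLOPS

# conv3d -> implicit GEMM:
#   M = batch * output spatial volume
#   K = in_channels * kernel volume
#   N = out_channels
n_batch, c_in, c_out = 4, 64, 128
d_out, h_out, w_out = 16, 32, 32
k_d, k_h, k_w = 3, 3, 3

gemm_M = n_batch * d_out * h_out * w_out
gemm_K = c_in * k_d * k_h * k_w
gemm_N = c_out

r_bf16_gemm_time_s = float(
    bf16_gemm_time_sympy.subs(M, gemm_M).subs(K, gemm_K).subs(N, gemm_N)
)
print(f"estimated bf16 gemm time: {r_bf16_gemm_time_s:.3e} s")
```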
```python
# real gemm benchmark time, also not added yet
# if enabled, also measure observed gemm time
# gemm benchmarks for conv not implemented, as conv uses implicit GEMM
```
We should run the conv ops, I think?
We're running the conv op in the benchmarks.
That's a bit different, I think; that one is measuring e2e speedup. We can do the same thing that linear is doing:
```python
def get_gemm_times(
```
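A conv counterpart to `get_gemm_times` could time the raw op directly with `torch.utils.benchmark`; the sketch below is one way to do that, with `get_conv_times` and the example shapes being hypothetical rather than the PR's actual API:

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark


def get_conv_times(x_shape, w_shape, dtype=torch.bfloat16):
    """Hypothetical analogue of get_gemm_times: time just the conv op."""
    x = torch.randn(*x_shape, device="cuda", dtype=dtype)
    w = torch.randn(*w_shape, device="cuda", dtype=dtype)
    timer = benchmark.Timer(
        stmt="F.conv3d(x, w)",
        globals={"F": F, "x": x, "w": w},
    )
    return timer.timeit(100).median  # seconds


# example: N=4, C_in=64, D=H=W=32 input with a 128-filter 3x3x3 kernel
conv_time_s = get_conv_times((4, 64, 32, 32, 32), (128, 64, 3, 3, 3))
```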