Initial TBR chapter. #338
base: main
…utorial when it is published.
|
Can I ask if a lot of this was written by AI? I'm very surprised by a lot of the text. Also the notion that one might as a SW developer 'choose' between TBR vs IMR, and have code trying to determine what to pick (ref the |
|
Parts were written by AI, specifically the power consumption analyzer code as that is outside my normal wheelhouse; but looking at the references they looked solid and I edited it to make it read mostly correct to me. So, genesis by AI sure, but heavy human editing. |
|
I'm not able to give point-by-point feedback, but for example the |
|
Okay, no worries, I'll rewrite it. |
|
Probably not too useful to drop more 'random' drive-by comments like this, but for another example I think none of the use-cases mentioned for VK_EXT_shader_tile_image (bloom, edge-detection, FXAA, SSR) make sense, as the extension only gives access to the current pixel while all of these effects need access to other pixels. FWIW I've pinged some folks here at Arm to see if we can help review and support development of the guide. To be clear, I think it's a great initiative, but it probably needs some close review, especially as there is not too much good and up-to-date public info about current mobile GPUs to pull from (hence also why the idea of the guide is good, of course) :) |
|
Thanks it's MUCH appreciated. I'm by far not the best expert at TBR; and I really want to try to get updated information out there. There's a reason I read all of the research articles linked and tried to put as much research into this chapter as I could. If we could get more details and more review, I'm much happier. Soon as I get a chance, I'm going to update from the comments already generated here. |
|
The chapter title is Tile-Based Rendering Best Practices, but most of what it talks about is nothing to do with tile-based rendering but related to other aspects of vendor-specific implementation detail or orthogonal mobile GPU issues (constant registers, coherent memory, thermal, etc). For a Vulkan guide I'd probably split this up - having a topic focused only on the effects of being tile based is useful and the rest is somewhat a distraction. The most important things for tilers (good use of loadOp/storeOp) seems to be buried right at the end, and the second most important (good use of pipeline barriers to get pipelining) isn't mentioned at all. |
|
Not that much of a hardware guy, but aren't lazily allocated memory / transient attachments an important Vulkan concept for TBRs? If so, it might be good to add that. |
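For reference, a rough sketch of what that could look like in the chapter. It assumes the usual pattern: create the attachment image with `VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT` and prefer a memory type that advertises `VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT`. The helpers `transientColorUsage` and `pickTransientMemoryType`, the `kNoMemoryType` sentinel, and the flattened `typeFlags` vector are hypothetical names for illustration, not something from the PR; in real code the flags would come from `vkGetPhysicalDeviceMemoryProperties` and the type bits from `vkGetImageMemoryRequirements`.

```cpp
#include <cstdint>
#include <vector>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Stand-ins so the sketch compiles without the Vulkan SDK; values match the Vulkan headers.
constexpr std::uint32_t VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT     = 0x1;
constexpr std::uint32_t VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT = 0x10;
constexpr std::uint32_t VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT     = 0x10;
constexpr std::uint32_t VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT = 0x40;
constexpr std::uint32_t VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT     = 0x80;
#endif

// Hypothetical sentinel: no acceptable memory type was found.
constexpr std::uint32_t kNoMemoryType = UINT32_MAX;

// Usage flags for a colour attachment that only lives inside one render pass.
// TRANSIENT may only be combined with colour, depth/stencil and input attachment usage.
inline std::uint32_t transientColorUsage() {
    return VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT |
           VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT |
           VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT;
}

// typeFlags[i] is the propertyFlags of memory type i (assumed copied out of
// VkPhysicalDeviceMemoryProperties); memoryTypeBits is the mask reported for the image.
inline std::uint32_t pickTransientMemoryType(const std::vector<std::uint32_t>& typeFlags,
                                             std::uint32_t memoryTypeBits) {
    std::uint32_t fallback = kNoMemoryType;
    for (std::uint32_t i = 0; i < typeFlags.size() && i < 32; ++i) {
        if ((memoryTypeBits & (1u << i)) == 0) continue;  // type not allowed for this image
        if (typeFlags[i] & VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT) return i;  // best case
        if (fallback == kNoMemoryType && (typeFlags[i] & VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT)) {
            fallback = i;  // remember the first device-local type in case nothing is lazy
        }
    }
    return fallback;
}
```

The attachment would then typically pair this with `VK_ATTACHMENT_STORE_OP_DONT_CARE`, since nothing is expected to read it after the render pass. Whether the implementation actually avoids backing the image with memory is up to the driver.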
|
And I second the remarks about the power consumption part of that chapter. I tried to understand the code and data, but felt kinda lost. Wouldn't stuff like that require querying vendor specific apis to get real world power usage? Didn't see that mentioned anywhere. |
|
Also some of the links don't point to anything useful, e.g. these:
- Imagination PowerVR Architecture Guide: Shows tile memory providing 10-20x bandwidth compared to external memory
- Qualcomm Adreno Performance Guide: Demonstrates GMEM (tile memory) efficiency in mobile gaming scenarios
- NVIDIA Tegra TBR Analysis: Research paper showing 60% power reduction through bandwidth optimization
- IEEE Computer Graphics and Applications: Tile-Based Rendering analysis and improvements research
- IEEE Transactions on Computers: Thermal management in mobile graphics processing research

They either point to or redirect to a (company) landing page instead of the advertised "research papers" or documents. |
|
And other links don't make sense, e.g. this: Vulkan-Hpp: Modern C++ bindings with TBR optimization examples That links to the Vulkan-Hpp headers, I don't see why or how that relates to TBR optimizations? |
|
I'm going to rewrite this. Sorry not ready for prime time. |
|
Huawei Maleoon GPU Guide: Maleoon GPU Rendering Optimization |
|
|
||
| - **Attachment Configuration**: Final attachments use `VK_ATTACHMENT_STORE_OP_STORE`, intermediate attachments use `VK_ATTACHMENT_STORE_OP_DONT_CARE` | ||
| - **Load Operations**: Use `VK_ATTACHMENT_LOAD_OP_CLEAR` for new content, `VK_ATTACHMENT_LOAD_OP_DONT_CARE` for intermediate results | ||
| - **MSAA Efficiency**: TBR handles 4x MSAA efficiently due to tile memory resolve capabilities |
Not sure I'd call out 4x specifically - makes it sound like you should prefer it over 2x, or 8x for example - which I'd not say is generic advice. Though tile memory resolve can be a good source of performance gain if you are going to be using MSAA.
Removed that advice to ensure it's clear.
|
|
||
| **Tile Memory Management Strategies:** | ||
|
|
||
| - **Memory Calculation**: Typical tile memory 512KB, calculate usage based on tile size (32x32 pixels), format size, and sample count |
This calculation is not easy for a developer to perform, as the determinations are not quite as simple as that. Different formats might not be stored in tile memory in the way you might naively expect. How MSAA affects tile size is also not widely documented.
Good call, removed
| === Half-Precision Float Optimization | ||
|
|
||
| Using half-precision floats in shaders can speed up execution and reduce bandwidth on mobile TBR devices. Use low-precision numbers in fragment and compute shaders when visual quality permits: | ||
|
|
Might be worth pointing out that mediump should be checked on as many devices as possible, as it acts as something of a hint. Using mediump and testing only on one device that may, under the hood, still be using F32 can be very misleading and lead to visual issues on devices that actually employ mediump.
I'd still recommend using it whenever possible, but it might be a worthwhile note/pointer.
Removed it to ensure the information I have in here is correct and something I've been able to verify. If you recommend adding it back, I'd be happy to.
…ines, and implementation-agnostic practices.
| **Bandwidth Optimization Strategies:** | ||
|
|
||
| - **Attachment configuration**: Final attachments use `VK_ATTACHMENT_STORE_OP_STORE`; intermediate attachments use `VK_ATTACHMENT_STORE_OP_DONT_CARE` when you do not need the results. | ||
| - **Load operations**: Use `VK_ATTACHMENT_LOAD_OP_CLEAR` for new content; `VK_ATTACHMENT_LOAD_OP_DONT_CARE` for intermediate results you overwrite. |
I am not sure here. I think that the loadOp can also be set to dont_care when rendering opaque objects, even for new content.
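To make the distinction concrete, here is a minimal sketch of how the load/store choice could be expressed. It assumes the commonly given reasoning: load only if previous contents are needed, skip the clear if every pixel in the render area is overwritten by opaque geometry, otherwise clear. `AttachmentUse`, `chooseLoadOp`, and `chooseStoreOp` are hypothetical helpers for illustration and are not part of the chapter under review.

```cpp
#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Stand-ins so the sketch compiles without the Vulkan SDK; values match the Vulkan headers.
enum VkAttachmentLoadOp {
    VK_ATTACHMENT_LOAD_OP_LOAD = 0,
    VK_ATTACHMENT_LOAD_OP_CLEAR = 1,
    VK_ATTACHMENT_LOAD_OP_DONT_CARE = 2
};
enum VkAttachmentStoreOp {
    VK_ATTACHMENT_STORE_OP_STORE = 0,
    VK_ATTACHMENT_STORE_OP_DONT_CARE = 1
};
#endif

// Hypothetical description of how one render pass treats one attachment.
struct AttachmentUse {
    bool needsPreviousContents;  // the pass blends with or reads what is already there
    bool everyPixelOverwritten;  // opaque geometry covers the whole render area
    bool readAfterPass;          // a later pass or presentation consumes the result
};

inline VkAttachmentLoadOp chooseLoadOp(const AttachmentUse& use) {
    // LOAD forces a read of the attachment from external memory into the tile.
    if (use.needsPreviousContents) return VK_ATTACHMENT_LOAD_OP_LOAD;
    // Old contents are irrelevant and nothing is left uncovered: no clear needed.
    if (use.everyPixelOverwritten) return VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    // New content with possible gaps: clear so uncovered pixels are defined.
    return VK_ATTACHMENT_LOAD_OP_CLEAR;
}

inline VkAttachmentStoreOp chooseStoreOp(const AttachmentUse& use) {
    // Only write the tile back to external memory if something reads it later.
    return use.readAfterPass ? VK_ATTACHMENT_STORE_OP_STORE
                             : VK_ATTACHMENT_STORE_OP_DONT_CARE;
}
```

For example, a full-screen opaque pass writing the swapchain image would get `DONT_CARE` / `STORE`, while a depth buffer used only inside the pass would get `CLEAR` / `DONT_CARE`.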
|
|
||
| **Advanced TBR considerations:** | ||
|
|
||
| - Use subpasses and `VK_DEPENDENCY_BY_REGION_BIT` to enable local data reuse where beneficial; always measure on target devices. |
I think it's worth mentioning here the subpassLoad operator to read pixel value from tile memory.
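A possible shape for such an example, as a sketch only: one subpass writes a colour attachment, the next reads the same pixel through an input attachment with `subpassLoad`, and the dependency between them carries `VK_DEPENDENCY_BY_REGION_BIT`. The helper `tileLocalDependency`, the constant `kConsumerFragmentGlsl`, and the attachment name `gbufferColor` are made up for illustration.

```cpp
#include <cstdint>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Stand-ins so the sketch compiles without the Vulkan SDK; layout and values match the headers.
struct VkSubpassDependency {
    std::uint32_t srcSubpass, dstSubpass;
    std::uint32_t srcStageMask, dstStageMask;
    std::uint32_t srcAccessMask, dstAccessMask;
    std::uint32_t dependencyFlags;
};
constexpr std::uint32_t VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT         = 0x80;
constexpr std::uint32_t VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT = 0x400;
constexpr std::uint32_t VK_ACCESS_INPUT_ATTACHMENT_READ_BIT           = 0x10;
constexpr std::uint32_t VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT          = 0x100;
constexpr std::uint32_t VK_DEPENDENCY_BY_REGION_BIT                   = 0x1;
#endif

// Dependency from a subpass that writes a colour attachment to a subpass that
// reads the same pixel as an input attachment.
inline VkSubpassDependency tileLocalDependency(std::uint32_t producer, std::uint32_t consumer) {
    VkSubpassDependency dep{};
    dep.srcSubpass    = producer;
    dep.dstSubpass    = consumer;
    dep.srcStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    dep.dstStageMask  = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    dep.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    dep.dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
    // BY_REGION: the consumer only depends on the producer's output for the same
    // framebuffer region, which is what lets a tiler keep the data on chip.
    dep.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
    return dep;
}

// Consumer fragment shader (GLSL): subpassLoad reads the current pixel only.
constexpr const char* kConsumerFragmentGlsl = R"glsl(
#version 450
layout(input_attachment_index = 0, set = 0, binding = 0) uniform subpassInput gbufferColor;
layout(location = 0) out vec4 outColor;
void main() {
    outColor = subpassLoad(gbufferColor);
}
)glsl";
```

Note that `subpassLoad` takes no coordinate, so it cannot sample neighbouring pixels. Whether the attachment really stays in tile memory between the two subpasses is an implementation detail.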
|
|
||
| - No explicit on-chip tile memory model exposed to applications. | ||
| - Overdraw tends to generate more external memory traffic than on tilers; minimizing overdraw is important. | ||
| - Applications should rely on standard Vulkan techniques (early depth/stencil, appropriate load/store ops, and subpasses where helpful) and profile on target devices. |
I am seeing "profile on target devices", "measure on target devices", and "profiling results on target hardware" many times in this documentation. These redundant phrases should be cleaned up.
|
Currently, information is scattered in various corners, and the same information appears several times, including things like "profile on target devices" or "Tile size not exposed by core Vulkan". |
NB, fix the TBR link to the Simple Game Engine tutorial when it is published.