@gpx1000
Contributor

@gpx1000 gpx1000 commented Aug 4, 2025

NB, fix the TBR link to the Simple Game Engine tutorial when it is published.

@cforfang

cforfang commented Aug 5, 2025

Can I ask if a lot of this was written by AI? I'm very surprised by a lot of the text. Also the notion that one might as a SW developer 'choose' between TBR vs IMR, and have code trying to determine what to pick (ref the PowerConsumptionAnalyzer code) is very strange to me.

@gpx1000
Contributor Author

gpx1000 commented Aug 5, 2025

Parts were written by AI, specifically the power consumption analyzer code as that is outside my normal wheelhouse; but looking at the references they looked solid and I edited it to make it read mostly correct to me. So, genesis by AI sure, but heavy human editing.

@cforfang

cforfang commented Aug 5, 2025

I'm not able to give point-by-point feedback, but for example the VK_EXT_robustness2 section seems like total nonsense to me. And the claims about use of VK_KHR_dynamic_rendering_local_read by Unity and Unreal are, as far as I know, also not true. As I scroll through the guide, there are in general a lot of strange claims and commentary, I think.

@gpx1000
Contributor Author

gpx1000 commented Aug 5, 2025

Okay, no worries, I'll rewrite it.

@cforfang

cforfang commented Aug 5, 2025

Probably not too useful to drop more 'random' drive-by comments like this, but for another example I think none of the use-cases mentioned for VK_EXT_shader_tile_image (bloom, edge-detection, FXAA, SSR) make sense, as the extension only gives access to the current pixel while all of these effects need access to other pixels.

FWIW I've pinged some folks here at Arm to see if we can help review and support development of the guide. To be clear, I think it's a great initiative, but it probably needs some close review, especially as there is not much good, up-to-date public info about current mobile GPUs to pull from (hence also why the idea of the guide is good, of course) :)

@gpx1000
Contributor Author

gpx1000 commented Aug 5, 2025

Thanks, it's MUCH appreciated. I'm by far not the best expert at TBR, and I really want to try to get updated information out there. There's a reason I read all of the research articles linked and tried to put as much research into this chapter as I could. If we can get more details and more review, I'll be much happier. As soon as I get a chance, I'm going to update from the comments already generated here.

@solidpixel

solidpixel commented Aug 5, 2025

The chapter title is Tile-Based Rendering Best Practices, but most of what it talks about has nothing to do with tile-based rendering; it relates to other aspects of vendor-specific implementation detail or to orthogonal mobile GPU issues (constant registers, coherent memory, thermal, etc). For a Vulkan guide I'd probably split this up: having a topic focused only on the effects of being tile-based is useful, and the rest is somewhat of a distraction.

The most important things for tilers (good use of loadOp/storeOp) seem to be buried right at the end, and the second most important (good use of pipeline barriers to get pipelining) isn't mentioned at all.
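
To make the pipelining point concrete, here is a minimal sketch of one reading of it: when one pass writes a color attachment and a later pass samples it in the fragment shader, a barrier whose destination scope is only the fragment stage leaves the later pass's vertex-side work free to overlap with the earlier pass. The constants are stand-ins named after `VkPipelineStageFlagBits`, and the `sketch` namespace and both helper functions are hypothetical; real code would include `<vulkan/vulkan.h>` and put these masks into an actual barrier or subpass dependency.

```cpp
#include <cstdint>

namespace sketch {
// Stand-ins named after VkPipelineStageFlagBits (assumption: real code
// would take these from <vulkan/vulkan.h> instead of redefining them).
constexpr std::uint32_t STAGE_TOP_OF_PIPE             = 0x00000001;
constexpr std::uint32_t STAGE_DRAW_INDIRECT           = 0x00000002;
constexpr std::uint32_t STAGE_VERTEX_INPUT            = 0x00000004;
constexpr std::uint32_t STAGE_VERTEX_SHADER           = 0x00000008;
constexpr std::uint32_t STAGE_FRAGMENT_SHADER         = 0x00000080;
constexpr std::uint32_t STAGE_COLOR_ATTACHMENT_OUTPUT = 0x00000400;
constexpr std::uint32_t STAGE_ALL_GRAPHICS            = 0x00008000;
constexpr std::uint32_t STAGE_ALL_COMMANDS            = 0x00010000;

// Every stage mask that makes the consumer wait before its geometry work.
constexpr std::uint32_t GEOMETRY_SIDE_STAGES =
    STAGE_TOP_OF_PIPE | STAGE_DRAW_INDIRECT | STAGE_VERTEX_INPUT |
    STAGE_VERTEX_SHADER | STAGE_ALL_GRAPHICS | STAGE_ALL_COMMANDS;

struct StageScope {
    std::uint32_t src;  // stages that must finish first
    std::uint32_t dst;  // stages that must wait
};

// Hypothetical helper: narrow scope for "rendered as color, then sampled
// in a fragment shader".
constexpr StageScope colorWriteThenFragmentRead() {
    return {STAGE_COLOR_ATTACHMENT_OUTPUT, STAGE_FRAGMENT_SHADER};
}

// Hypothetical helper: true if the destination scope also blocks the
// consumer's vertex-side work, which removes the overlap on a tiler.
constexpr bool waitsBeforeGeometry(std::uint32_t dstStageMask) {
    return (dstStageMask & GEOMETRY_SIDE_STAGES) != 0;
}
}  // namespace sketch
```

With these definitions, `waitsBeforeGeometry(colorWriteThenFragmentRead().dst)` is false, while a catch-all destination mask such as `STAGE_ALL_COMMANDS` makes it true.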

@SaschaWillems
Collaborator

Not that much of a hardware guy, but aren't lazily allocated memory / transient attachments an important Vulkan concept for TBRs? If so, it might be good to add that.
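
As a rough illustration of what that could look like: an attachment that only lives within a render pass is created with `VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT`, and the application then prefers a memory type that advertises `VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT`, falling back to device-local memory when none is offered. The sketch below shows only the memory-type selection step. The flag constants are stand-ins named after the Vulkan ones, and `pickTransientMemoryType` and the `sketch` namespace are hypothetical names, not part of any API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {
// Stand-ins named after VkMemoryPropertyFlagBits (assumption: real code
// would read these from <vulkan/vulkan.h>).
constexpr std::uint32_t MEMORY_PROPERTY_DEVICE_LOCAL_BIT     = 0x00000001;
constexpr std::uint32_t MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT = 0x00000010;

// typeFlags[i] holds the property flags of memory type i, as reported in
// VkPhysicalDeviceMemoryProperties. allowedTypeBits has bit i set when
// type i is allowed for the image (VkMemoryRequirements::memoryTypeBits).
// Returns the chosen memory type index, or -1 if nothing suitable exists.
inline int pickTransientMemoryType(const std::vector<std::uint32_t>& typeFlags,
                                   std::uint32_t allowedTypeBits) {
    int fallback = -1;
    for (std::size_t i = 0; i < typeFlags.size() && i < 32; ++i) {
        if ((allowedTypeBits & (1u << i)) == 0) {
            continue;  // the image cannot be bound to this memory type
        }
        if (typeFlags[i] & MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT) {
            return static_cast<int>(i);  // best case: may never be backed
        }
        if (fallback < 0 && (typeFlags[i] & MEMORY_PROPERTY_DEVICE_LOCAL_BIT)) {
            fallback = static_cast<int>(i);  // remember first device-local type
        }
    }
    return fallback;
}
}  // namespace sketch
```

Pairing such an attachment with `VK_ATTACHMENT_STORE_OP_DONT_CARE` is what lets a tiler keep it entirely in tile memory; whether physical memory is ever committed is up to the implementation.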

@SaschaWillems
Collaborator

And I second the remarks about the power consumption part of that chapter. I tried to understand the code and data, but felt kind of lost. Wouldn't stuff like that require querying vendor-specific APIs to get real-world power usage? I didn't see that mentioned anywhere.

@SaschaWillems
Collaborator

SaschaWillems commented Aug 5, 2025

Also, some of the links don't point to anything useful, e.g. these:

Imagination PowerVR Architecture Guide: Shows tile memory providing 10-20x bandwidth compared to external memory

Qualcomm Adreno Performance Guide: Demonstrates GMEM (tile memory) efficiency in mobile gaming scenarios

NVIDIA Tegra TBR Analysis: Research paper showing 60% power reduction through bandwidth optimization

IEEE Computer Graphics and Applications: Tile-Based Rendering analysis and improvements research

IEEE Transactions on Computers: Thermal management in mobile graphics processing research

They either point to, or redirect to, a (company) landing page instead of the linked item, e.g. the "Research papers" or documents.

@SaschaWillems
Collaborator

And other links don't make sense, e.g. this:

Vulkan-Hpp: Modern C++ bindings with TBR optimization examples

That links to the Vulkan-Hpp headers. I don't see why or how that relates to TBR optimizations.

@gpx1000
Contributor Author

gpx1000 commented Aug 5, 2025

I'm going to rewrite this. Sorry, it's not ready for prime time.

@ZehuiLin-Huawei

Huawei Maleoon GPU Guide: Maleoon GPU Rendering Optimization


- **Attachment Configuration**: Final attachments use `VK_ATTACHMENT_STORE_OP_STORE`, intermediate attachments use `VK_ATTACHMENT_STORE_OP_DONT_CARE`
- **Load Operations**: Use `VK_ATTACHMENT_LOAD_OP_CLEAR` for new content, `VK_ATTACHMENT_LOAD_OP_DONT_CARE` for intermediate results
- **MSAA Efficiency**: TBR handles 4x MSAA efficiently due to tile memory resolve capabilities

Not sure I'd call out 4x specifically. It makes it sound like you should prefer it over 2x or 8x, for example, which I wouldn't call generic advice. That said, tile memory resolve can be a good source of performance gain if you are going to be using MSAA.

Contributor Author

Removed that advice to ensure it's clear.


**Tile Memory Management Strategies:**

- **Memory Calculation**: Typical tile memory 512KB, calculate usage based on tile size (32x32 pixels), format size, and sample count

This calculation is not easy for a developer to perform, as the determinations are not quite as simple as that. Different formats might not be stored in tile memory in the way you might naively expect. How MSAA affects tile size is also not widely documented.

Contributor Author

Good call, removed.

=== Half-Precision Float Optimization

Using half-precision floats in shaders can speed up execution and reduce bandwidth on mobile TBR devices. Use low-precision numbers in fragment and compute shaders when visual quality permits:

Might be worth pointing out that mediump should be checked on as many devices as possible, as it acts as something of a hint: using mediump and testing on only one device, which may still be using F32 under the hood, can be very misleading and lead to visual issues on devices that actually employ mediump.

I'd still recommend using it whenever possible, but it might be a worthwhile note/pointer.

Contributor Author

Removed it to ensure the information I have in here is correct and something I've been able to verify. If you recommend adding it back, I'd be happy to.

…ines, and implementation-agnostic practices.
**Bandwidth Optimization Strategies:**

- **Attachment configuration**: Final attachments use `VK_ATTACHMENT_STORE_OP_STORE`; intermediate attachments use `VK_ATTACHMENT_STORE_OP_DONT_CARE` when you do not need the results.
- **Load operations**: Use `VK_ATTACHMENT_LOAD_OP_CLEAR` for new content; `VK_ATTACHMENT_LOAD_OP_DONT_CARE` for intermediate results you overwrite.

I am not sure about this one. I think `loadOp` can also be set to `VK_ATTACHMENT_LOAD_OP_DONT_CARE` when rendering opaque objects, even for new content.
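
One way to write that rule down, as a sketch rather than settled guidance: load only when the previous contents are needed, skip both load and clear when opaque geometry is known to overwrite every pixel, clear otherwise, and store only what is read after the pass. The enums below are stand-ins named after `VkAttachmentLoadOp` and `VkAttachmentStoreOp`. `AttachmentUse`, `chooseOps` and the `sketch` namespace are hypothetical, and the "every pixel overwritten" condition is an assumption the application has to guarantee itself.

```cpp
namespace sketch {
// Stand-ins for VkAttachmentLoadOp / VkAttachmentStoreOp (assumption:
// real code would use the enums from <vulkan/vulkan.h>).
enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

// How one attachment is used by one render pass (hypothetical description).
struct AttachmentUse {
    bool needsPreviousContents;  // pass blends with or reads old pixels
    bool everyPixelOverwritten;  // opaque geometry covers the whole area
    bool readAfterPass;          // presented, sampled, or copied later
};

struct AttachmentOps {
    LoadOp load;
    StoreOp store;
};

inline AttachmentOps chooseOps(const AttachmentUse& use) {
    AttachmentOps ops;
    if (use.needsPreviousContents) {
        ops.load = LoadOp::Load;      // costs a read from external memory
    } else if (use.everyPixelOverwritten) {
        ops.load = LoadOp::DontCare;  // old contents are never observed
    } else {
        ops.load = LoadOp::Clear;     // uncovered pixels need a defined value
    }
    // Storing is only needed if something consumes the result afterwards.
    ops.store = use.readAfterPass ? StoreOp::Store : StoreOp::DontCare;
    return ops;
}
}  // namespace sketch
```

For example, a final color attachment fully covered by an opaque skybox maps to `DontCare` / `Store`, while a depth buffer used only inside the pass maps to `Clear` / `DontCare`.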


**Advanced TBR considerations:**

- Use subpasses and `VK_DEPENDENCY_BY_REGION_BIT` to enable local data reuse where beneficial; always measure on target devices.

I think it's worth mentioning `subpassLoad` here, which reads the current pixel's value from tile memory.
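
A sketch of the host-side half of that pattern, under stated assumptions: subpass 0 writes a color attachment, subpass 1 reads it as an input attachment (in GLSL, through a `subpassInput` and `subpassLoad`), and the dependency between them carries `VK_DEPENDENCY_BY_REGION_BIT` so the data can stay local to the tile. The struct mirrors the field order of `VkSubpassDependency`, but it and the constants are stand-ins, and `colorToInputAttachment` and the `sketch` namespace are hypothetical names.

```cpp
#include <cstdint>

namespace sketch {
// Stand-ins named after the Vulkan flag bits (assumption: real code would
// take these, and VkSubpassDependency itself, from <vulkan/vulkan.h>).
constexpr std::uint32_t STAGE_FRAGMENT_SHADER         = 0x00000080;
constexpr std::uint32_t STAGE_COLOR_ATTACHMENT_OUTPUT = 0x00000400;
constexpr std::uint32_t ACCESS_INPUT_ATTACHMENT_READ  = 0x00000010;
constexpr std::uint32_t ACCESS_COLOR_ATTACHMENT_WRITE = 0x00000100;
constexpr std::uint32_t DEPENDENCY_BY_REGION          = 0x00000001;

// Same field order as VkSubpassDependency.
struct SubpassDependency {
    std::uint32_t srcSubpass;
    std::uint32_t dstSubpass;
    std::uint32_t srcStageMask;
    std::uint32_t dstStageMask;
    std::uint32_t srcAccessMask;
    std::uint32_t dstAccessMask;
    std::uint32_t dependencyFlags;
};

// Hypothetical helper: the consumer's fragment shader reads, per pixel,
// what the producer wrote as a color attachment.
inline SubpassDependency colorToInputAttachment(std::uint32_t producer,
                                                std::uint32_t consumer) {
    SubpassDependency dep = {};
    dep.srcSubpass      = producer;
    dep.dstSubpass      = consumer;
    dep.srcStageMask    = STAGE_COLOR_ATTACHMENT_OUTPUT;  // color writes done
    dep.dstStageMask    = STAGE_FRAGMENT_SHADER;          // before subpassLoad
    dep.srcAccessMask   = ACCESS_COLOR_ATTACHMENT_WRITE;
    dep.dstAccessMask   = ACCESS_INPUT_ATTACHMENT_READ;
    dep.dependencyFlags = DEPENDENCY_BY_REGION;  // same-pixel dependency only
    return dep;
}
}  // namespace sketch
```

Because `subpassLoad` can only fetch the pixel currently being shaded, this fits per-pixel work such as deferred lighting, not effects that sample neighbouring pixels.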


- No explicit on-chip tile memory model exposed to applications.
- Overdraw tends to generate more external memory traffic than on tilers; minimizing overdraw is important.
- Applications should rely on standard Vulkan techniques (early depth/stencil, appropriate load/store ops, and subpasses where helpful) and profile on target devices.

I am seeing "profile on target devices", "measure on target devices", and "profiling results on target hardware" many times in this documentation. These redundant phrases should be cleaned up.

@ZehuiLin-Huawei

Currently, information is scattered across various corners, and the same information appears several times, including things like "profile on target devices" or "Tile size not exposed by core Vulkan".
The documentation structure could be improved by establishing a main line of reasoning and developing the content within that framework. The current version does not seem to be really useful for developers.

7 participants