[Bugfix] Fix HunyuanVL XD-RoPE and smart_resize bugs #29425

grider-transwithai · 2025-11-25T18:04:04Z

Purpose

Fix two critical bugs in HunyuanVL implementation that cause incorrect spatial localization (e.g., "squeezed" Y coordinates in detection masks):

XD-RoPE height index zeroing: In get_xdrope_input_positions, the h_index was incorrectly zeroed after being computed, removing vertical spatial awareness for image tokens. Fixed by setting t_index = 0 instead (correct for static images since temporal dimension is always 1).
smart_resize argument swap: The function signature is smart_resize(height, width, ...) returning (h_bar, w_bar), but it was called with (width, height) and assigned to (resized_width, resized_height). This caused incorrect resizing for non-square aspect ratios.

Test Plan

Tested with HunyuanOCR model on images with various aspect ratios. Before the fix, detected text bounding boxes were vertically compressed. After the fix, bounding boxes correctly match text positions.

Test Result

Before: Detection mask Y coordinates were "squeezed" - vertical positions incorrect
After: Detection mask correctly covers full image height with accurate text localization

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Fix two critical bugs in HunyuanVL implementation: 1. XD-RoPE height index zeroing: The h_index was incorrectly being zeroed instead of t_index, removing vertical spatial awareness for image tokens. Fixed by setting t_index to 0 (correct for static images) instead of h_index. 2. smart_resize argument swap: The function was called with (width, height) but expects (height, width), and return values were assigned incorrectly. Fixed argument order and assignment to match function signature. Signed-off-by: grider-transwithai <grider@transwith.ai>

gemini-code-assist

Code Review

This pull request addresses two critical bugs in the HunyuanVL implementation. The first fix correctly handles the temporal index (t_index) for XD-RoPE, which resolves an issue with vertical spatial awareness for image tokens. The second fix corrects a swapped argument issue in the smart_resize function call, ensuring correct image resizing for non-square aspect ratios. Both fixes are correct and well-justified. I've added one suggestion to improve code robustness by using keyword arguments in the smart_resize call, which would have prevented the original bug.

ywang96

Thank you for catching this bug! I'v checked versus the official impl and these look reasonable to me. cc @sergey-tc @ManaEstras

vllm/transformers_utils/processors/hunyuan_vl_image.py

ywang96 · 2025-11-27T08:04:22Z

Hey @grider-transwithai! We have communicated with the Tencent HunyuanOCR team and there's another bug we need to fix on top of this PR, but unfortunately I don't have access to modify your PR so I'm going to open another one that carries over your change and make you co-author. Thanks!

ywang96 · 2025-11-27T08:15:28Z

Closing as superseded by #29593

gemini-code-assist bot reviewed Nov 25, 2025

View reviewed changes

ywang96 approved these changes Nov 25, 2025

View reviewed changes

ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 25, 2025

ywang96 enabled auto-merge (squash) November 25, 2025 18:58

ywang96 reviewed Nov 25, 2025

View reviewed changes

vllm/transformers_utils/processors/hunyuan_vl_image.py Show resolved Hide resolved

ywang96 disabled auto-merge November 27, 2025 07:58

ywang96 mentioned this pull request Nov 27, 2025

[Bugfix] Fix HunyuanVL XD-RoPE #29593

Merged

5 tasks

ywang96 closed this Nov 27, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Bugfix] Fix HunyuanVL XD-RoPE and smart_resize bugs #29425

[Bugfix] Fix HunyuanVL XD-RoPE and smart_resize bugs #29425

Uh oh!

grider-transwithai commented Nov 25, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

ywang96 left a comment

Uh oh!

Uh oh!

ywang96 commented Nov 27, 2025

Uh oh!

ywang96 commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

[Bugfix] Fix HunyuanVL XD-RoPE and smart_resize bugs #29425

[Bugfix] Fix HunyuanVL XD-RoPE and smart_resize bugs #29425

Uh oh!

Conversation

grider-transwithai commented Nov 25, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

ywang96 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ywang96 commented Nov 27, 2025

Uh oh!

ywang96 commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

grider-transwithai commented Nov 25, 2025 •

edited by github-actions bot

Loading