Fine-tuning Axial RoPE with frequency scaling? #32

@tasansal


Hi @lucidrains

We have trained a 3D ViT masked autoencoder using axial RoPE with an input size of 512x512x512 (3D scientific images, sampled from much larger volumes). Now I want to try fine-tuning the pre-trained model for a larger context size (i.e. 1024x1024x1024). However, it isn't obvious to me how to do this. I am especially unclear on how to calculate the scale for axial RoPE correctly. Important note: we are not resizing the images; we tile the larger image with these "mini-cubes". So, going up in size means we have more context.

I would love to hear your feedback on how to do this properly. Below is my thought process (and please correct me where I am wrong!).

Normally, with 1D RoPE, we have the theta_rescale_factor, which changes freqs in RoPE directly. However, when freqs_for is set to pixel, the theta parameter isn't used to build freqs, which is probably fine since we don't have a single sequence and reuse the [-1, 1] range for the axial case.
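To make the difference concrete, here is a minimal pure-Python sketch of the two frequency schedules. The formulas are an assumption based on reading the RotaryEmbedding constructor, and the helper names lang_freqs and pixel_freqs are hypothetical, not library functions.

```python
import math

def lang_freqs(dim, theta=10000.0, theta_rescale_factor=1.0):
    # Assumed 'lang' schedule: the base theta is rescaled first
    # (NTK-style), then used to build geometrically spaced frequencies.
    theta = theta * theta_rescale_factor ** (dim / (dim - 2))
    return [1.0 / theta ** (i / dim) for i in range(0, dim, 2)]

def pixel_freqs(dim, max_freq=10.0):
    # Assumed 'pixel' schedule: dim // 2 values evenly spaced from
    # 1 to max_freq / 2, multiplied by pi. theta does not appear here.
    n = dim // 2
    if n == 1:
        return [math.pi]
    step = (max_freq / 2 - 1.0) / (n - 1)
    return [(1.0 + k * step) * math.pi for k in range(n)]

# Rescaling theta lowers the slowest 'lang' frequency...
print(lang_freqs(8)[-1], lang_freqs(8, theta_rescale_factor=2.0)[-1])
# ...while the 'pixel' frequencies depend only on dim and max_freq.
print(pixel_freqs(8))
```

Under these assumptions, theta_rescale_factor has no effect in pixel mode, so any rescaling for a larger volume would have to come from max_freq or from the positions themselves.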

Anyhow, assuming the above is fine, we apply axial RoPE with apply_rotary_emb instead of rotate_queries_and_keys. It seems that rotate_queries_and_keys uses get_scale to calculate the scale and applies it to q and k separately. But if caching is disabled, is the scale hard-coded to 1?
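For reference, here is a simplified sketch of the xPos-style scaling that this q/k treatment resembles. It assumes the scale is a plain multiplier applied after the rotation, with q receiving the scale and k its inverse. The scalar xpos_scale, the value zeta=0.98, and the single frequency are hypothetical simplifications; the library works with per-dimension scale vectors.

```python
import math

def xpos_scale(pos, zeta=0.98, scale_base=512):
    # Hypothetical scalar stand-in for the per-position xPos scale.
    return zeta ** (pos / scale_base)

def rotate(vec, angle, scale=1.0):
    # Rotate a 2-D feature pair by `angle`, then multiply it by `scale`.
    # The scale changes the magnitude only, not the rotation angle.
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (x * c - y * s), scale * (x * s + y * c))

def logit(q, k, m, n, freq=0.1):
    # q at position m is multiplied by the scale, k at position n by scale ** -1.
    q_rot = rotate(q, m * freq, xpos_scale(m))
    k_rot = rotate(k, n * freq, 1.0 / xpos_scale(n))
    return q_rot[0] * k_rot[0] + q_rot[1] * k_rot[1]

q, k = (1.0, 0.5), (0.3, -0.2)
near = logit(q, k, m=10, n=4)
shifted = logit(q, k, m=110, n=104)  # same offset m - n, both shifted by 100
print(near, shifted)
```

In this sketch the scale factors combine into zeta ** ((m - n) / scale_base), so the attention logit still depends only on the relative offset. The scale acts as a distance-dependent magnitude term and leaves the rotation frequencies untouched.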

Q1: Would it make sense to implement the same logic as in rotate_queries_and_keys for the axial variant?

Q2: Maybe an ignorant question, but why scale q with scale and k with scale**-1?

Q3: Is it OK to apply the scale directly using apply_rotary_emb and then fine-tune the model?

Q4: Is the scaling linear in the size of the dimension change? I.e., if I double the resolution, should the scale be 2.0 in that direction? Or do we need to account for diagonal distances etc. in the N-D case?

Q5: Is there any writeup (paper, pre-print etc) about axial-RoPE?
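To make Q4 concrete, here is a small sketch of how the spacing between neighbouring positions changes along one axis. It assumes positions are evenly spaced over a symmetric range, as with the [-1, 1] range mentioned above. The patch counts 32 and 64 and the helper names are hypothetical, chosen only for illustration.

```python
def axial_positions(n, extent=1.0):
    # n evenly spaced positions covering [-extent, extent] along one axis.
    if n == 1:
        return [0.0]
    return [-extent + 2.0 * extent * i / (n - 1) for i in range(n)]

def spacing(positions):
    # Distance between two neighbouring positions.
    return positions[1] - positions[0]

n_pre, n_ft = 32, 64  # hypothetical patches per axis before / after

pre = spacing(axial_positions(n_pre))    # spacing seen during pre-training
naive = spacing(axial_positions(n_ft))   # same [-1, 1] range, more patches

# Extent that keeps the pre-training spacing with n_ft patches.
extent_needed = (n_ft - 1) / (n_pre - 1)
matched = spacing(axial_positions(n_ft, extent_needed))

print(pre, naive, matched, extent_needed)
```

In this sketch, keeping the [-1, 1] range roughly halves the rotation angle between neighbouring patches, while stretching the range by (n_ft - 1) / (n_pre - 1), close to 2, keeps it unchanged. Whether matching the pre-training spacing is the right target for fine-tuning is exactly what Q4 is asking.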

I may be completely off and need to understand the logic better. If that's the case, I would appreciate any help!
