
Feature request: Switch out the U-Net for a DiT #11

@moiseshorta


Hello,

I've been reading a lot of the state-of-the-art (SOTA) papers on audio and video generation with rectified flows, and most of them seem to use Transformer backbones instead of U-Nets.

Are there any plans to make this architecture change? Transformer backbones seem to improve performance considerably, as in this implementation: https://github.com/cloneofsimo/minRF

It would be great to see it here, as this codebase is very easy to understand. Thanks again for open-sourcing it!

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
