Misalignment problem #46

@l1346792580123

Thanks for open-sourcing such great work! I have a question about the misalignment between the generated video and the low-quality input video.

According to the largest_8n1_leq function, the low-quality video has 8n+1 frames, while the noise input to the diffusion model has length 2n, which corresponds to 8n-3 frames of generated video. Moreover, the first 6 noise latents take the first 25 frames of the low-quality video as their condition, and each subsequent noise latent corresponds to 4 frames of the low-quality video. That adds up to 25 + 4(2n-6) = 8n+1, so 8n+1 low-quality frames are needed as input to the diffusion model. However, when the latents are converted into video by the TCDecoder, only 8n-3 frames of the low-quality video are needed.

Why is there a misalignment of 4 frames between the input video and the generated video?
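To make the arithmetic concrete, here is a minimal sketch of the frame counts as I understand them. It is not the repository's code: the body of `largest_8n1_leq` is my guess based on its name (largest value of the form 8n+1 not exceeding the input), and `frame_counts` is a hypothetical helper written only for this question.

```python
def largest_8n1_leq(num_frames: int) -> int:
    """Assumed behaviour, inferred from the name: largest 8n+1 <= num_frames."""
    if num_frames < 1:
        raise ValueError("need at least one frame")
    return ((num_frames - 1) // 8) * 8 + 1


def frame_counts(num_frames: int) -> dict:
    """Hypothetical helper that tabulates the counts described in this issue."""
    lq_frames = largest_8n1_leq(num_frames)  # 8n+1 low-quality frames kept
    n = (lq_frames - 1) // 8
    latents = 2 * n                          # length of the noise input
    if latents < 6:
        raise ValueError("the 25-frame first chunk assumes at least 6 latents")
    # First 6 latents are conditioned on 25 frames, each later latent on 4 more.
    conditioned = 25 + 4 * (latents - 6)
    generated = 8 * n - 3                    # frames produced after decoding
    return {
        "n": n,
        "lq_frames": lq_frames,
        "latents": latents,
        "conditioned": conditioned,
        "generated": generated,
        "gap": lq_frames - generated,
    }


if __name__ == "__main__":
    print(frame_counts(100))
```

For a 100-frame clip this gives n = 12, 97 low-quality frames used as condition, 24 latents, and 93 generated frames, so the last 4 conditioning frames have no generated counterpart. Under these assumptions the gap is 4 frames for every n.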
