
Commit cbf4b5e

Merge branch 'main' into feat/mag-cache
2 parents a8a57c6 + edf36f5 commit cbf4b5e

File tree

63 files changed (+6009 / −95 lines)


docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
@@ -359,6 +359,8 @@
   title: HunyuanDiT2DModel
 - local: api/models/hunyuanimage_transformer_2d
   title: HunyuanImageTransformer2DModel
+- local: api/models/hunyuan_video15_transformer_3d
+  title: HunyuanVideo15Transformer3DModel
 - local: api/models/hunyuan_video_transformer_3d
   title: HunyuanVideoTransformer3DModel
 - local: api/models/latte_transformer3d
@@ -433,6 +435,8 @@
   title: AutoencoderKLHunyuanImageRefiner
 - local: api/models/autoencoder_kl_hunyuan_video
   title: AutoencoderKLHunyuanVideo
+- local: api/models/autoencoder_kl_hunyuan_video15
+  title: AutoencoderKLHunyuanVideo15
 - local: api/models/autoencoderkl_ltx_video
   title: AutoencoderKLLTXVideo
 - local: api/models/autoencoderkl_magvit
@@ -652,6 +656,8 @@
   title: Framepack
 - local: api/pipelines/hunyuan_video
   title: HunyuanVideo
+- local: api/pipelines/hunyuan_video15
+  title: HunyuanVideo1.5
 - local: api/pipelines/i2vgenxl
   title: I2VGen-XL
 - local: api/pipelines/kandinsky5_video

docs/source/en/api/loaders/lora.md

Lines changed: 5 additions & 0 deletions
@@ -31,6 +31,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
 - [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
 - [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen).
+- [`ZImageLoraLoaderMixin`] provides similar functions for [Z-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/zimage).
 - [`Flux2LoraLoaderMixin`] provides similar functions for [Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2).
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.

@@ -112,6 +113,10 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi

 [[autodoc]] loaders.lora_pipeline.QwenImageLoraLoaderMixin

+## ZImageLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.ZImageLoraLoaderMixin
+
 ## KandinskyLoraLoaderMixin
 [[autodoc]] loaders.lora_pipeline.KandinskyLoraLoaderMixin
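The new `ZImageLoraLoaderMixin` entry above plugs into the same LoRA-loading API exposed by `LoraBaseMixin`. Below is a minimal sketch of how a Z-Image pipeline would consume a LoRA through that API; the checkpoint and LoRA repository ids, the prompt-only call signature, and the `.images` output attribute are illustrative assumptions and are not taken from this commit.

```python
import torch
from diffusers import DiffusionPipeline

# Both repository ids are illustrative placeholders; any pipeline that mixes in
# ZImageLoraLoaderMixin exposes the same load_lora_weights/unload_lora_weights methods.
pipe = DiffusionPipeline.from_pretrained("Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")

pipe.load_lora_weights("your-username/z-image-style-lora")  # hypothetical LoRA repo
image = pipe(prompt="a watercolor fox in a misty forest").images[0]
image.save("z_image_lora.png")

# Remove the LoRA weights when they are no longer needed.
pipe.unload_lora_weights()
```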

docs/source/en/api/models/autoencoder_kl_hunyuan_video15.md (new file)

Lines changed: 36 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanVideo15

The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo1.5](https://github.com/Tencent/HunyuanVideo1-1.5) by Tencent.

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import AutoencoderKLHunyuanVideo15

vae = AutoencoderKLHunyuanVideo15.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="vae", torch_dtype=torch.float32
)

# make sure to enable tiling to avoid OOM
vae.enable_tiling()
```
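For context on the `encode`/`decode` entries documented below, here is a minimal round-trip sketch. The (batch, channels, frames, height, width) tensor layout and the `latent_dist`/`sample` output attributes follow the conventions of other Diffusers video VAEs and are assumptions here, not something stated in this commit.

```python
import torch
from diffusers import AutoencoderKLHunyuanVideo15

vae = AutoencoderKLHunyuanVideo15.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="vae", torch_dtype=torch.float32
)
vae.enable_tiling()

# Dummy clip in (batch, channels, frames, height, width) layout, an assumed layout.
video = torch.randn(1, 3, 9, 256, 256)

with torch.no_grad():
    latents = vae.encode(video).latent_dist.sample()  # sample from the latent posterior
    reconstruction = vae.decode(latents).sample       # DecoderOutput.sample holds pixels

print(latents.shape, reconstruction.shape)
```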
## AutoencoderKLHunyuanVideo15

[[autodoc]] AutoencoderKLHunyuanVideo15
- decode
- encode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
docs/source/en/api/models/hunyuan_video15_transformer_3d.md (new file)

Lines changed: 30 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# HunyuanVideo15Transformer3DModel

A Diffusion Transformer model for 3D video-like data used in [HunyuanVideo1.5](https://github.com/Tencent/HunyuanVideo1-1.5).

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import HunyuanVideo15Transformer3DModel

transformer = HunyuanVideo15Transformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="transformer", torch_dtype=torch.bfloat16
)
```
## HunyuanVideo15Transformer3DModel

[[autodoc]] HunyuanVideo15Transformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
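A common follow-up to loading the transformer on its own is handing it to the video pipeline, for example to control its dtype independently. The sketch below mirrors the standard Diffusers component-override pattern and is an illustration, not something shown in this commit.

```python
import torch
from diffusers import HunyuanVideo15Pipeline, HunyuanVideo15Transformer3DModel

# Load the transformer separately, e.g. to pick its dtype.
transformer = HunyuanVideo15Transformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

# Pass the preloaded transformer so the pipeline reuses it instead of loading its own copy.
pipe = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
```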

docs/source/en/api/pipelines/bria_fibo.md

Lines changed: 6 additions & 6 deletions
@@ -21,9 +21,10 @@ With only 8 billion parameters, FIBO provides a new level of image quality, prom
 FIBO is trained exclusively on a structured prompt and will not work with freeform text prompts.
 you can use the [FIBO-VLM-prompt-to-JSON](https://huggingface.co/briaai/FIBO-VLM-prompt-to-JSON) model or the [FIBO-gemini-prompt-to-JSON](https://huggingface.co/briaai/FIBO-gemini-prompt-to-JSON) to convert your freeform text prompt to a structured JSON prompt.

-its not recommended to use freeform text prompts directly with FIBO, as it will not produce the best results.
+> [!NOTE]
+> Avoid using freeform text prompts directly with FIBO because it does not produce the best results.

-you can learn more about FIBO in [Bria Fibo Hugging Face page](https://huggingface.co/briaai/FIBO).
+Refer to the Bria Fibo Hugging Face [page](https://huggingface.co/briaai/FIBO) to learn more.


 ## Usage
@@ -37,9 +38,8 @@ hf auth login
 ```


-## BriaPipeline
+## BriaFiboPipeline

-[[autodoc]] BriaPipeline
+[[autodoc]] BriaFiboPipeline
 - all
-- __call__
-
+- __call__
docs/source/en/api/pipelines/flux2.md

Lines changed: 6 additions & 0 deletions
@@ -26,6 +26,12 @@ Original model checkpoints for Flux can be found [here](https://huggingface.co/b
 >
 > [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.

+## Caption upsampling
+
+Flux.2 can potentially generate better outputs with better prompts. We can "upsample"
+an input prompt by setting the `caption_upsample_temperature` argument in the pipeline call arguments.
+The [official implementation](https://github.com/black-forest-labs/flux2/blob/5a5d316b1b42f6b59a8c9194b77c8256be848432/src/flux2/text_encoder.py#L140) recommends setting this value to 0.15.
+
 ## Flux2Pipeline

 [[autodoc]] Flux2Pipeline
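To make the caption-upsampling change above concrete, here is a hedged sketch of a pipeline call with `caption_upsample_temperature` set to the recommended 0.15. The checkpoint id and the `.images` output attribute are assumptions for illustration, not taken from this diff.

```python
import torch
from diffusers import Flux2Pipeline

# Checkpoint id is assumed for illustration; substitute the Flux.2 weights you have access to.
pipe = Flux2Pipeline.from_pretrained("black-forest-labs/FLUX.2-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="a tiny robot watering a bonsai tree at sunrise",
    caption_upsample_temperature=0.15,  # value recommended by the official implementation
).images[0]
image.save("flux2_upsampled_prompt.png")
```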
docs/source/en/api/pipelines/hunyuan_video15.md (new file)

Lines changed: 120 additions & 0 deletions
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# HunyuanVideo-1.5

HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. It builds on several key components: meticulous data curation, an advanced DiT architecture with selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Together, these designs form a unified framework for high-quality text-to-video and image-to-video generation across multiple durations and resolutions, establishing a new state of the art among open-source models.

You can find all the original HunyuanVideo checkpoints under the [Tencent](https://huggingface.co/tencent) organization.

> [!TIP]
> Click on the HunyuanVideo models in the right sidebar for more examples of video generation tasks.
>
> The examples below use a checkpoint from [hunyuanvideo-community](https://huggingface.co/hunyuanvideo-community) because the weights are stored in a layout compatible with Diffusers.

The example below demonstrates how to generate a video optimized for memory or inference speed.

<hfoptions id="usage">
<hfoption id="memory">

Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques.

```py
import torch
from diffusers import AutoModel, HunyuanVideo15Pipeline
from diffusers.utils import export_to_video

pipeline = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v",
    torch_dtype=torch.bfloat16,
)

# model offloading and VAE tiling
pipeline.enable_model_cpu_offload()
pipeline.vae.enable_tiling()

prompt = "A fluffy teddy bear sits on a bed of soft pillows surrounded by children's toys."
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "output.mp4", fps=15)
```
## Notes

- HunyuanVideo-1.5 uses attention masks with variable-length sequences. For best performance, we recommend using an attention backend that handles padding efficiently.

  - **H100/H800:** `_flash_3_hub` or `_flash_varlen_3`
  - **A100/A800/RTX 4090:** `flash_hub` or `flash_varlen`
  - **Other GPUs:** `sage_hub`

  Refer to the [Attention backends](../../optimization/attention_backends) guide for more details about using a different backend.

  ```py
  pipe.transformer.set_attention_backend("flash_hub")  # or your preferred backend
  ```

- [`HunyuanVideo15Pipeline`] uses a guider and does not take a `guidance_scale` parameter at runtime.

  You can check the default guider configuration using `pipe.guider`:

  ```py
  >>> pipe.guider
  ClassifierFreeGuidance {
    "_class_name": "ClassifierFreeGuidance",
    "_diffusers_version": "0.36.0.dev0",
    "enabled": true,
    "guidance_rescale": 0.0,
    "guidance_scale": 6.0,
    "start": 0.0,
    "stop": 1.0,
    "use_original_formulation": false
  }

  State:
    step: None
    num_inference_steps: None
    timestep: None
    count_prepared: 0
    enabled: True
    num_conditions: 2
  ```

  To update the guider configuration, you can run `pipe.guider = pipe.guider.new(...)`:

  ```py
  pipe.guider = pipe.guider.new(guidance_scale=5.0)
  ```

  Read more about guiders [here](../../modular_diffusers/guiders).

## HunyuanVideo15Pipeline

[[autodoc]] HunyuanVideo15Pipeline
- all
- __call__

## HunyuanVideo15ImageToVideoPipeline

[[autodoc]] HunyuanVideo15ImageToVideoPipeline
- all
- __call__

## HunyuanVideo15PipelineOutput

[[autodoc]] pipelines.hunyuan_video1_5.pipeline_output.HunyuanVideo15PipelineOutput
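The image-to-video pipeline documented above is listed without a usage example; a hedged sketch patterned after the text-to-video snippet earlier in this file is shown below. The `480p_i2v` checkpoint id and the `image` call argument are assumptions based on the naming of the text-to-video checkpoint, not something this commit states.

```python
import torch
from diffusers import HunyuanVideo15ImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Checkpoint id assumed to follow the 480p_t2v naming pattern; adjust to the real i2v repo.
pipe = HunyuanVideo15ImageToVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

image = load_image("path/to/first_frame.png")  # conditioning image, local placeholder path
prompt = "The subject slowly waves at the camera while snow falls around them."

video = pipe(image=image, prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "i2v_output.mp4", fps=15)
```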

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 [tool.ruff]
 line-length = 119
+extend-exclude = [
+    "src/diffusers/pipelines/flux2/system_messages.py",
+]

 [tool.ruff.lint]
 # Never enforce `E501` (line length violations).
