
Commit 95c9b04: Merge branch 'main' into integrations/wan2.2-s2v
2 parents: cfddf35 + a1f36ee

154 files changed (+7404, -3363 lines)


docs/source/en/_toctree.yml

Lines changed: 6 additions & 0 deletions
@@ -375,6 +375,8 @@
   title: MochiTransformer3DModel
 - local: api/models/omnigen_transformer
   title: OmniGenTransformer2DModel
+- local: api/models/ovisimage_transformer2d
+  title: OvisImageTransformer2DModel
 - local: api/models/pixart_transformer2d
   title: PixArtTransformer2DModel
 - local: api/models/prior_transformer
@@ -567,6 +569,8 @@
   title: MultiDiffusion
 - local: api/pipelines/omnigen
   title: OmniGen
+- local: api/pipelines/ovis_image
+  title: Ovis-Image
 - local: api/pipelines/pag
   title: PAG
 - local: api/pipelines/paint_by_example
@@ -660,6 +664,8 @@
   title: HunyuanVideo1.5
 - local: api/pipelines/i2vgenxl
   title: I2VGen-XL
+- local: api/pipelines/kandinsky5_image
+  title: Kandinsky 5.0 Image
 - local: api/pipelines/kandinsky5_video
   title: Kandinsky 5.0 Video
 - local: api/pipelines/latte

docs/source/en/api/loaders/lora.md

Lines changed: 5 additions & 0 deletions
@@ -31,6 +31,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
 - [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
 - [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen).
+- [`ZImageLoraLoaderMixin`] provides similar functions for [Z-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/zimage).
 - [`Flux2LoraLoaderMixin`] provides similar functions for [Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2).
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.

@@ -112,6 +113,10 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi

 [[autodoc]] loaders.lora_pipeline.QwenImageLoraLoaderMixin

+## ZImageLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.ZImageLoraLoaderMixin
+
 ## KandinskyLoraLoaderMixin
 [[autodoc]] loaders.lora_pipeline.KandinskyLoraLoaderMixin

docs/source/en/api/models/ovisimage_transformer2d.md (new file)

Lines changed: 24 additions & 0 deletions

<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# OvisImageTransformer2DModel

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import OvisImageTransformer2DModel

transformer = OvisImageTransformer2DModel.from_pretrained("AIDC-AI/Ovis-Image-7B", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## OvisImageTransformer2DModel

[[autodoc]] OvisImageTransformer2DModel

docs/source/en/api/pipelines/bria_fibo.md

Lines changed: 6 additions & 6 deletions
@@ -21,9 +21,10 @@ With only 8 billion parameters, FIBO provides a new level of image quality, prom
 FIBO is trained exclusively on a structured prompt and will not work with freeform text prompts.
 you can use the [FIBO-VLM-prompt-to-JSON](https://huggingface.co/briaai/FIBO-VLM-prompt-to-JSON) model or the [FIBO-gemini-prompt-to-JSON](https://huggingface.co/briaai/FIBO-gemini-prompt-to-JSON) to convert your freeform text prompt to a structured JSON prompt.

-its not recommended to use freeform text prompts directly with FIBO, as it will not produce the best results.
+> [!NOTE]
+> Avoid using freeform text prompts directly with FIBO because it does not produce the best results.

-you can learn more about FIBO in [Bria Fibo Hugging Face page](https://huggingface.co/briaai/FIBO).
+Refer to the Bria Fibo Hugging Face [page](https://huggingface.co/briaai/FIBO) to learn more.


 ## Usage
@@ -37,9 +38,8 @@ hf auth login
 ```


-## BriaPipeline
+## BriaFiboPipeline

-[[autodoc]] BriaPipeline
+[[autodoc]] BriaFiboPipeline
 - all
-- __call__
-
+- __call__

docs/source/en/api/pipelines/flux2.md

Lines changed: 6 additions & 0 deletions
@@ -26,6 +26,12 @@ Original model checkpoints for Flux can be found [here](https://huggingface.co/b
 >
 > [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.

+## Caption upsampling
+
+Flux.2 can potentially generate better outputs from better prompts. We can "upsample"
+an input prompt by setting the `caption_upsample_temperature` argument in the pipeline call arguments.
+The [official implementation](https://github.com/black-forest-labs/flux2/blob/5a5d316b1b42f6b59a8c9194b77c8256be848432/src/flux2/text_encoder.py#L140) recommends setting this value to 0.15.
+
 ## Flux2Pipeline

 [[autodoc]] Flux2Pipeline
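Since `caption_upsample_temperature` is a sampling temperature for decoding the expanded prompt, a low value like the recommended 0.15 keeps the upsampled caption close to the model's most likely wording. A standalone sketch of what a sampling temperature does (an illustration only, not the Flux.2 or diffusers implementation):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    # Scale logits by 1/temperature and sample from the resulting softmax.
    # A low temperature sharpens the distribution toward the top logit.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(exps)
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(exps) - 1

rng = random.Random(0)
logits = [2.0, 1.0, 0.1]  # toy token scores

# At the recommended 0.15, sampling is nearly deterministic.
cold = [sample_with_temperature(logits, 0.15, rng) for _ in range(100)]
# At a high temperature, the choice spreads across tokens.
hot = [sample_with_temperature(logits, 2.0, rng) for _ in range(100)]
print(cold.count(0), hot.count(0))
```

In the pipeline itself this reduces to passing the argument in the call, per the paragraph above: `pipe(prompt, caption_upsample_temperature=0.15, ...)`.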

docs/source/en/api/pipelines/hunyuan_video15.md

Lines changed: 2 additions & 2 deletions
@@ -56,8 +56,8 @@ export_to_video(video, "output.mp4", fps=15)

 - HunyuanVideo1.5 uses attention masks with variable-length sequences. For best performance, we recommend using an attention backend that handles padding efficiently.

-  - **H100/H800:** `_flash_3_hub` or `_flash_varlen_3`
-  - **A100/A800/RTX 4090:** `flash_hub` or `flash_varlen`
+  - **H100/H800:** `_flash_3_hub` or `_flash_3_varlen_hub`
+  - **A100/A800/RTX 4090:** `flash_hub` or `flash_varlen_hub`
   - **Other GPUs:** `sage_hub`

 Refer to the [Attention backends](../../optimization/attention_backends) guide for more details about using a different backend.
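The corrected recommendations above can be folded into a small lookup helper for picking a default backend per GPU. A hedged sketch: the backend strings come from the list above, but the helper name and the substring matching are illustrative assumptions, not diffusers API:

```python
def recommended_attention_backends(gpu_name: str) -> list[str]:
    # Map a GPU model name to the backends suggested in the docs above.
    # Hypothetical helper; matching on substrings of torch.cuda.get_device_name().
    name = gpu_name.upper()
    if any(chip in name for chip in ("H100", "H800")):
        return ["_flash_3_hub", "_flash_3_varlen_hub"]
    if any(chip in name for chip in ("A100", "A800", "4090")):
        return ["flash_hub", "flash_varlen_hub"]
    return ["sage_hub"]

print(recommended_attention_backends("NVIDIA H100 80GB HBM3"))
```

The chosen string would then be passed to the attention-backend mechanism described in the guide linked above.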
docs/source/en/api/pipelines/kandinsky5_image.md (new file)

Lines changed: 112 additions & 0 deletions

<!--Copyright 2025 The HuggingFace Team and Kandinsky Lab Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Kandinsky 5.0 Image

[Kandinsky 5.0](https://arxiv.org/abs/2511.14993) is a family of diffusion models for Video & Image generation.

Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters).

The model introduces several key innovations:
- **Latent diffusion pipeline** with **Flow Matching** for improved training stability
- **Diffusion Transformer (DiT)** as the main generative backbone with cross-attention to text embeddings
- Dual text encoding using **Qwen2.5-VL** and **CLIP** for comprehensive text understanding
- **Flux VAE** for efficient image encoding and decoding

The original codebase can be found at [kandinskylab/Kandinsky-5](https://github.com/kandinskylab/Kandinsky-5).

## Available Models

Kandinsky 5.0 Image Lite:

| model_id | Description | Use Cases |
|----------|-------------|-----------|
| [**kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers) | 6B image Supervised Fine-Tuned model | Highest generation quality |
| [**kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers) | 6B image editing Supervised Fine-Tuned model | Highest generation quality |
| [**kandinskylab/Kandinsky-5.0-T2I-Lite-pretrain-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2I-Lite-pretrain-Diffusers) | 6B image Base pretrained model | Research and fine-tuning |
| [**kandinskylab/Kandinsky-5.0-I2I-Lite-pretrain-Diffusers**](https://huggingface.co/kandinskylab/Kandinsky-5.0-I2I-Lite-pretrain-Diffusers) | 6B image editing Base pretrained model | Research and fine-tuning |

## Usage Examples

### Basic Text-to-Image Generation

```python
import torch

from diffusers import Kandinsky5T2IPipeline

# Load the pipeline
model_id = "kandinskylab/Kandinsky-5.0-T2I-Lite-sft-Diffusers"
pipe = Kandinsky5T2IPipeline.from_pretrained(model_id)
pipe.to(device="cuda", dtype=torch.bfloat16)

# Generate an image
prompt = "A fluffy, expressive cat wearing a bright red hat with a soft, slightly textured fabric. The hat should look cozy and well-fitted on the cat’s head. On the front of the hat, add clean, bold white text that reads “SWEET”, clearly visible and neatly centered. Ensure the overall lighting highlights the hat’s color and the cat’s fur details."

output = pipe(
    prompt=prompt,
    negative_prompt="",
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=3.5,
).image[0]
```

### Basic Image-to-Image Generation

```python
import torch

from diffusers import Kandinsky5I2IPipeline
from diffusers.utils import load_image

# Load the pipeline
model_id = "kandinskylab/Kandinsky-5.0-I2I-Lite-sft-Diffusers"
pipe = Kandinsky5I2IPipeline.from_pretrained(model_id)
pipe.to(device="cuda", dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # enable CPU offloading for single-GPU inference

# Edit the input image
image = load_image(
    "https://huggingface.co/kandinsky-community/kandinsky-3/resolve/main/assets/title.jpg?download=true"
)

prompt = "Change the background from a winter night scene to a bright summer day. Place the character on a sandy beach with clear blue sky, soft sunlight, and gentle waves in the distance. Replace the winter clothing with a light short-sleeved T-shirt (in soft pastel colors) and casual shorts. Ensure the character’s fur reflects warm daylight instead of cold winter tones. Add small beach details such as seashells, footprints in the sand, and a few scattered beach toys nearby. Keep the oranges in the scene, but place them naturally on the sand."
negative_prompt = ""

output = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=3.5,
).image[0]
```

## Kandinsky5T2IPipeline

[[autodoc]] Kandinsky5T2IPipeline
- all
- __call__

## Kandinsky5I2IPipeline

[[autodoc]] Kandinsky5I2IPipeline
- all
- __call__

## Citation

```bibtex
@misc{kandinsky2025,
  author = {Alexander Belykh and Alexander Varlamov and Alexey Letunovskiy and Anastasia Aliaskina and Anastasia Maltseva and Anastasiia Kargapoltseva and Andrey Shutkin and Anna Averchenkova and Anna Dmitrienko and Bulat Akhmatov and Denis Dimitrov and Denis Koposov and Denis Parkhomenko and Dmitrii and Ilya Vasiliev and Ivan Kirillov and Julia Agafonova and Kirill Chernyshev and Kormilitsyn Semen and Lev Novitskiy and Maria Kovaleva and Mikhail Mamaev and Mikhailov and Nikita Kiselev and Nikita Osterov and Nikolai Gerasimenko and Nikolai Vaulin and Olga Kim and Olga Vdovchenko and Polina Gavrilova and Polina Mikhailova and Tatiana Nikulina and Viacheslav Vasilev and Vladimir Arkhipkin and Vladimir Korviakov and Vladimir Polovnikov and Yury Kolabushin},
  title = {Kandinsky 5.0: A family of diffusion models for Video & Image generation},
  howpublished = {\url{https://github.com/kandinskylab/Kandinsky-5}},
  year = 2025
}
```
