### Wan-S2V: Audio-Driven Cinematic Video Generation

[Wan-S2V](https://huggingface.co/papers/2508.18621) by the Wan Team.

*Current state-of-the-art (SOTA) methods for audio-driven character animation demonstrate promising performance for scenarios primarily involving speech and singing. However, they often fall short in more complex film and television productions, which demand sophisticated elements such as nuanced character interactions, realistic body movements, and dynamic camera work. To address this long-standing challenge of achieving film-level character animation, we propose an audio-driven model, which we refer to as Wan-S2V, built upon Wan. Our model achieves significantly enhanced expressiveness and fidelity in cinematic contexts compared to existing approaches. We conducted extensive experiments, benchmarking our method against cutting-edge models such as Hunyuan-Avatar and Omnihuman. The experimental results consistently demonstrate that our approach significantly outperforms these existing solutions. Additionally, we explore the versatility of our method through its applications in long-form video generation and precise video lip-sync editing.*

Project page: https://humanaigc.github.io/wan-s2v-webpage/

This model was contributed by [M. Tolga Cangöz](https://github.com/tolgacangoz).

The example below demonstrates how to use the speech-to-video pipeline to generate a video from a text description, a starting frame, an audio clip, and a pose video.

<hfoptions id="S2V usage">
<hfoption id="usage">

```python
import math

import numpy as np
import torch
from diffusers import AutoencoderKLWan, WanSpeechToVideoPipeline
from diffusers.utils import export_to_merged_video_audio, load_image, load_audio, load_video, export_to_video
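# --- The original example is truncated after the imports above. ---
# What follows is a minimal sketch of how the call might continue; the
# checkpoint ID, argument names, and helper signatures below are assumptions
# (not confirmed by this document), so adapt them to the released API.

model_id = "Wan-AI/Wan2.2-S2V-14B-Diffusers"  # assumed checkpoint name
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanSpeechToVideoPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Inputs: a starting frame, driving audio, and a pose video
# (all file names below are hypothetical placeholders).
image = load_image("first_frame.png")
audio, sampling_rate = load_audio("speech.wav")  # assumed to return (waveform, sampling_rate)
pose_video = load_video("pose.mp4")

output = pipe(
    prompt="A person speaks expressively to the camera, cinematic lighting",
    image=image,
    audio=audio,
    sampling_rate=sampling_rate,
    pose_video=pose_video,  # argument name is an assumption
    num_frames=80,          # divisible by 4; see the num_frames note below
).frames[0]

export_to_video(output, "output.mp4", fps=16)
# Mux the driving audio into the rendered video (signature assumed).
export_to_merged_video_audio("output.mp4", "speech.wav")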
```

- **num_frames**: Total number of frames to generate. Should be divisible by `vae_scale_factor_temporal` (default: 4).

</hfoption>
</hfoptions>