@apolinario
Contributor

@apolinario apolinario commented Dec 3, 2025

The goal of these new tasks is to support models that take both image and text as input and output either an image or a video.

The goal of this PR is to make the tasks as analogous to image-to-image and image-to-video as possible. The only difference is that the image input is now optional: an empty image with a valid prompt should still work for a model like FLUX.2 (which supports both text-to-image and image-to-image) or LTX Video (both text-to-video and image-to-video).
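
As a rough illustration of the optional-image idea, here is a minimal TypeScript sketch. The names `ImageTextToImageInput` and `resolveMode` are hypothetical and are not the types added in this PR; treating the prompt as required and a zero-byte image as "no image" are assumptions made for the example.

```typescript
// Hypothetical input shape, for illustration only (not the actual huggingface.js types).
interface ImageTextToImageInput {
  /** Text prompt guiding the generation. Assumed to be required here. */
  prompt: string;
  /** Optional raw image bytes. When absent, the request behaves like text-to-image. */
  image?: Uint8Array;
}

type EffectiveMode = "text-to-image" | "image-to-image";

// Decide which underlying behaviour a request maps to.
function resolveMode(input: ImageTextToImageInput): EffectiveMode {
  if (input.prompt.trim().length === 0) {
    throw new Error("A non-empty prompt is required");
  }
  // Assumption: a missing or zero-byte image both count as "no image provided".
  const hasImage = input.image !== undefined && input.image.length > 0;
  return hasImage ? "image-to-image" : "text-to-image";
}
```

With this shape, a single task entry can cover both behaviours: callers pass only `prompt` for pure text conditioning and add `image` when they have one.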

Once this is in, I'll also open a widget PR in Moon to support this task in the model cards and widgets, plus a follow-up PR adding it to the inference providers, so that we can then open PRs on repos to change the task for compatible models.

@apolinario
Contributor Author

AI agent disclosure: I added the about.md files with Claude and haven't reviewed its slop yet. I'll do that, but it shouldn't be a blocker for the more structural stuff.

@pcuenca
Member

pcuenca commented Dec 3, 2025

cc @merveenoyan

Collaborator

@gary149 gary149 left a comment

looks good - maybe @merveenoyan you'll want to do some edits to md files

Contributor

@merveenoyan merveenoyan left a comment

thanks a lot for seeing this gap and working on it!

],
models: [
{
description: "A powerful model for image-text-to-video generation.",
Contributor

would be nice to add more here and the Space too!

@Wauplin
Contributor

Wauplin commented Dec 3, 2025

cc @julien-c as well on the (orthogonal) topic of generic any-to-any task + modality selection

@apolinario
Contributor Author

apolinario commented Dec 4, 2025

Thanks @merveenoyan, I modified the examples and opened this PR: https://huggingface.co/datasets/huggingfacejs/tasks/discussions/12

Contributor

@hanouticelina hanouticelina left a comment

Reviewed the inference part, all good! thanks

Contributor

@merveenoyan merveenoyan left a comment

thank you!

@merveenoyan merveenoyan merged commit 2c2de89 into main Dec 5, 2025
5 checks passed
@merveenoyan merveenoyan deleted the new-image-text-tasks branch December 5, 2025 15:41
@kefranabg
Contributor

Can someone release a new version of @hf/inference so I can reflect the updates on the hub? @Wauplin maybe?

@Wauplin
Contributor

Wauplin commented Dec 8, 2025

> Can someone release a new version of @hf/inference so I can reflect the updates on the hub? @Wauplin maybe?

Done: https://github.com/huggingface/huggingface.js/actions/runs/20035108258
You should also be able to do it from the Actions tab :) The update is then automated every day in Moon.


8 participants