[Z-Image] various small changes, Z-Image transformer tests, etc. #12741
Conversation
```diff
-    return x, {}
+    if not return_dict:
+        return (x,)
+
+    return Transformer2DModelOutput(sample=x)
```
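The diff above follows the `return_dict` convention used across diffusers models. A minimal, self-contained sketch of the pattern (using a stand-in dataclass for `Transformer2DModelOutput`, whose real definition lives in `diffusers.models.modeling_outputs`):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Transformer2DModelOutput:
    # Stand-in for diffusers.models.modeling_outputs.Transformer2DModelOutput,
    # which carries the model output in its `sample` field.
    sample: Any


def forward(x, return_dict: bool = True):
    # ... model computation would happen here; `x` stands for the result ...
    if not return_dict:
        # Plain-tuple return path.
        return (x,)
    return Transformer2DModelOutput(sample=x)
```

Callers can then use either `forward(x).sample` or `forward(x, return_dict=False)[0]`, matching other models in the library.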
Should be a very safe change?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Failures in "Fast tests for PRs / Fast PyTorch Models & Schedulers CPU tests (pull_request)" pass even when run with …. Edit: it likely fails when ….
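For context on the `is_flaky` decorator discussed in this PR: the sketch below is a generic, hypothetical retry decorator illustrating the idea (rerun a flaky test a few times, surface only the final failure), not diffusers' actual implementation in `testing_utils`.

```python
import functools


def retry_flaky(max_attempts: int = 5):
    """Hypothetical sketch of a flaky-test retry decorator: rerun the
    wrapped test up to `max_attempts` times, re-raising only the last
    failure if every attempt fails."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return test_fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of retries: surface the real failure
        return wrapper
    return decorator
```

A test that fails nondeterministically (e.g., due to NaNs from uninitialized parameters) would then pass as long as one of the attempts succeeds.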
## What does this PR do?

- Adds the `is_flaky` decorator to `test_inference()` in the Z-Image pipeline test suite.
- Adds a `return_dict` argument to the `forward()` of the Z-Image DiT, following other models in the library.
- Follows the standard `return` pattern, i.e., returning a `Transformer2DModelOutput`-type output or something like `return (out,)`.

## Notes
- The Z-Image DiT accepts inputs as `list[torch.Tensor]`, which differs from other models. The output also follows the same type. This is why I had to modify a couple of tests (where it was reasonably easy) to allow this. Tests where it was not relatively easy were skipped (such as `test_training`, `test_ema_training`, etc.).
- The same `ZImageTransformerBlock` is used for `noise_refiner`, `context_refiner`, and `layers`. As a consequence of this, the inputs recorded for the block would vary during compilation, and full compilation with `fullgraph=True` would trigger recompilation at least thrice.
- The `x_pad_token` and `cap_pad_token` params within the DiT are initialized with `torch.empty()`, possibly for memory efficiency, but they interfere during tests in very weird ways. This is because `torch.empty()` can render NaNs. To prevent this from creeping into the tests, I tried adding `is_flaky()` to some of the tests that got affected by this, but that didn't help (see this). @JerryWu-code, would it be safe to have `x_pad_token` and `cap_pad_token` initialized deterministically, maybe with something like `torch.ones()`? Or do you think it would have memory implications?

## Minor nits
- We have been moving away from `assert` statements inside the model implementations in favor of properly raising errors. Should we follow something similar here, too?
- `self.scheduler.sigma_min = 0.0` inside the Z-Image pipeline: diffusers/src/diffusers/pipelines/z_image/pipeline_z_image.py (line 477 in 1b91856)
- The `forward()` of the DiT has shorthand variable names (`x`, `t`, `cap_feats`), unlike the `hidden_states`, `timestep`, and `encoder_hidden_states` used elsewhere.
- Could `_cfg_normalization` and `_cfg_truncation` inside the pipeline be turned into properties like `guidance_scale`?

Maybe we could consider revisiting them (but not a priority, perhaps).
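On the last nit, a minimal sketch of what exposing those attributes as read-only properties could look like (the class name and attribute values here are hypothetical; only the `guidance_scale`-style property pattern is taken from diffusers pipelines):

```python
class ZImagePipelineSketch:
    """Hypothetical sketch: expose private CFG settings as read-only
    properties, mirroring how `guidance_scale` is exposed on pipelines."""

    def __init__(self):
        self._guidance_scale = 4.0
        self._cfg_normalization = True   # hypothetical value
        self._cfg_truncation = 0.99      # hypothetical value

    @property
    def guidance_scale(self):
        return self._guidance_scale

    @property
    def cfg_normalization(self):
        return self._cfg_normalization

    @property
    def cfg_truncation(self):
        return self._cfg_truncation
```

This keeps the internal attributes private while giving users a stable, documented way to inspect them.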
Cc: @JerryWu-code
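Regarding the pad-token note above: the sketch below contrasts the two initialization styles (the tensor shape is a made-up stand-in; the actual parameter shapes live in the Z-Image DiT). `torch.empty()` returns uninitialized memory, which can contain NaN/Inf garbage, while `torch.ones()` is deterministic and always finite.

```python
import torch

dim = 16  # hypothetical hidden size, for illustration only

# Current-style init: values are whatever happened to be in memory,
# so NaNs can leak into test outputs nondeterministically.
x_pad_token_empty = torch.nn.Parameter(torch.empty(1, dim))

# Proposed-style init: deterministic and always finite.
x_pad_token_ones = torch.nn.Parameter(torch.ones(1, dim))
cap_pad_token_ones = torch.nn.Parameter(torch.ones(1, dim))
```

The memory cost is identical (both allocate the same tensor); the only difference is the extra write to fill the buffer at init time.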