@E-Anlia commented Dec 4, 2025

This PR introduces a new text-to-image pipeline named NewbiePipeline, as well as a new
NextDiT-based transformer architecture,
NextDiT_3B_GQA_patch2_Adaln_Refiner_WHIT_CLIP, fully implemented following
Diffusers' pipeline and model design principles.

🚀 Main additions

• New pipeline
Adds NewbiePipeline under src/diffusers/pipelines/newbie/.
The pipeline follows the standard Diffusers structure (DiffusionPipeline subclass) and
supports loading via from_pretrained.

• New transformer architecture
Adds transformer_newbie.py, implementing:

  • NextDiT backbone with grouped-query attention (GQA)
  • Adaln-Refiner blocks
  • Patch-size 2 vision encoder
  • 36 transformer layers
  • 2304 hidden dims
  • WHIT CLIP–style text conditioning

The transformer inherits from ModelMixin, enabling standard save/load, weight
serialization and integration with Diffusers utilities.

• RMSNorm implementation
Adds RMSNorm to diffusers.models.components, using a PyTorch fallback and supporting
Apex fused RMSNorm if available.

• Scheduler compatibility
The pipeline is compatible with FlowMatchEulerDiscreteScheduler without requiring
additional custom scheduler code.
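
A minimal usage sketch of the loading path and scheduler compatibility described above. It assumes this PR's branch is installed so that `NewbiePipeline` is importable from `diffusers`, that `repo_id` points at a checkpoint saved in Diffusers format, and that the call follows the usual text-to-image convention (`prompt`, `num_inference_steps`, `.images`). The helper name `generate_image` and its defaults are hypothetical and not part of the PR.

```python
def generate_image(prompt: str, repo_id: str, steps: int = 28, device: str = "cuda"):
    """Load the pipeline, attach a flow-matching Euler scheduler, and sample one image.

    Imports are kept inside the function so that defining it does not require
    the PR branch; calling it does.
    """
    import torch
    from diffusers import FlowMatchEulerDiscreteScheduler
    from diffusers import NewbiePipeline  # assumption: exported by this PR's branch

    # Standard Diffusers loading path; bfloat16 is an illustrative choice.
    pipe = NewbiePipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

    # Rebuild the stock scheduler from the pipeline's own config, illustrating
    # that no custom scheduler class is needed.
    pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(pipe.scheduler.config)

    pipe = pipe.to(device)

    # Assumed call signature and output field, following common Diffusers pipelines.
    result = pipe(prompt=prompt, num_inference_steps=steps)
    return result.images[0]
```

A call would look like `generate_image("a watercolor fox", repo_id="<checkpoint repo or local path>")`; the step count and dtype may need adjusting for the actual model.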

🧩 Motivation

This PR provides an implementation of a modern NextDiT-style text-to-image architecture
with high-resolution capability and strong conditioning support.
The goal is to enable researchers and users to load, run, and fine-tune this model
directly through Diffusers with minimal friction.

📁 Files added

src/diffusers/models/components.py
src/diffusers/models/transformers/transformer_newbie.py
src/diffusers/pipelines/newbie/pipeline_newbie.py
src/diffusers/pipelines/newbie/__init__.py


📁 Files modified

src/diffusers/__init__.py
src/diffusers/models/__init__.py
src/diffusers/models/transformers/__init__.py
src/diffusers/pipelines/__init__.py


✔ Notes

  • No external dependencies required
  • Apex is optional; PyTorch RMSNorm is the default path
  • The pipeline has been tested locally with from_pretrained and produces expected outputs
  • Follows the established structure of Diffusers pipelines & transformer modules
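
The optional-Apex behavior noted above can be sketched as follows. This is an illustration of the general pattern (a plain PyTorch RMSNorm as the default, with Apex's `FusedRMSNorm` used only when it is installed and requested), not the code in this PR. The `build_rms_norm` helper, the `prefer_fused` flag, and the float32 upcast are assumptions made for the example.

```python
import torch
from torch import nn

try:
    # Optional fused kernel from NVIDIA Apex; absent on most installs.
    from apex.normalization import FusedRMSNorm
except ImportError:
    FusedRMSNorm = None


class RMSNorm(nn.Module):
    """Plain PyTorch RMSNorm: y = x / sqrt(mean(x^2) + eps) * weight."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute the statistics in float32 for stability, then cast back.
        x32 = x.float()
        inv_rms = torch.rsqrt(x32.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return (x32 * inv_rms).to(x.dtype) * self.weight


def build_rms_norm(dim: int, eps: float = 1e-6, prefer_fused: bool = False) -> nn.Module:
    """Hypothetical factory: use Apex only if requested and importable."""
    if prefer_fused and FusedRMSNorm is not None:
        return FusedRMSNorm(dim, eps=eps)
    return RMSNorm(dim, eps=eps)
```

With the default arguments the factory always returns the PyTorch module, so the behavior is the same whether or not Apex is installed.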

Fixes # (no issue linked)


Before submitting

  • I have read the contributor guidelines
  • This PR introduces a new pipeline and model
  • All necessary registration points are updated
  • The implementation is consistent with existing Diffusers conventions

Who can review?

Tagging pipeline & transformer reviewers:
@asomoza @yiyixuxu @sayakpaul

@sayakpaul (Member) commented

Can you link the original codebase, paper, and some results of this model?

@E-Anlia (Author) commented Dec 5, 2025

Model weights: https://huggingface.co/NewBie-AI/NewBie-image-Exp0.1
Codebase: https://github.com/NewBieAI-Lab/NewBie-image-Exp0.1
[Image: NewBie_image_Exp0 1_Training]
This model builds on the Lumina line of research and is based on NextDiT.
Example:
[Image: newbie_image]
