- Implementing a Decoder-Only Transformer: The goal of this assignment is to develop a decoder-only transformer language model from scratch (a minimal block sketch follows this list).
- Training and Inference Enhancements: Beam Search Decoding, KV Caching, Gradient Accumulation, and Gradient Checkpointing (a KV-caching sketch also follows below).
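To make the first goal concrete, here is a minimal sketch of a single decoder block in PyTorch. It leans on nn.MultiheadAttention for brevity (the assignment itself implements attention from scratch), and the class and argument names are illustrative rather than taken from the repository. The width of 296 follows the parameter breakdown further down: 300 does not split evenly across 8 heads, so the 300-d FastText embeddings are projected to 296.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Illustrative decoder-only block (post-norm, dropout omitted)."""
    def __init__(self, d_model=296, num_heads=8, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: True marks the positions a query must NOT attend to,
        # i.e. everything to the right of the current token.
        T = x.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.attn(x, x, x, attn_mask=causal, need_weights=False)
        x = self.norm1(x + attn_out)      # residual + norm around attention
        x = self.norm2(x + self.ff(x))    # residual + norm around feed-forward
        return x
```

The full model stacks num_layers = 3 of these blocks between the embedding/projection layers and the output linear layer counted below.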
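Of the enhancements listed above, KV caching changes the inference loop the most: once the prompt has been processed, each decoding step feeds only the newest token and reuses the cached keys and values. The sketch below assumes a hypothetical model interface model(ids, past_kv=...) that returns logits together with the updated per-layer cache; the repository's actual interface may differ.

```python
import torch

@torch.no_grad()
def greedy_decode_with_cache(model, input_ids, max_new_tokens):
    """Greedy decoding with a KV cache (model interface is assumed, see above)."""
    past_kv = None              # per-layer (K, V) tensors, filled on the first step
    generated = input_ids
    step_input = input_ids      # the full prompt is processed only once
    for _ in range(max_new_tokens):
        logits, past_kv = model(step_input, past_kv=past_kv)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=1)
        step_input = next_token  # the cache makes re-feeding the prefix unnecessary
    return generated
```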
- vocab_size = 10000
- d_model = 300
- num_layers = 3
- num_heads = 8
- d_ff = 1024
- max_seq_length = 64
- batch_size = 32
- learning_rate = 3e-4
- num_epochs = 3
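The same hyperparameters, collected into a single config object for reference; the field names are illustrative and may not match those used in transformer_model.py.

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    # Values copied from the list above; names are illustrative.
    vocab_size: int = 10000
    d_model: int = 300          # FastText embedding size; projected to 296 inside the model
    num_layers: int = 3
    num_heads: int = 8
    d_ff: int = 1024
    max_seq_length: int = 64
    batch_size: int = 32
    learning_rate: float = 3e-4
    num_epochs: int = 3
```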
Input Embedding:
- Embedding matrix: 10000 × 300 = 3,000,000
- Projection layer (300 → 296, so the model width divides evenly across the 8 attention heads): 300 × 296 + 296 = 89,096
- Input embedding total: 3,089,096

Per block:
- Multi-head Attention:
  - Q, K, V projections: 3 × (296 × 296 + 296) = 263,736
  - Output projection: 296 × 296 + 296 = 87,912
  - Attention total: 263,736 + 87,912 = 351,648
- Feed Forward:
  - First linear: 296 × 1024 + 1024 = 304,128
  - Second linear: 1024 × 296 + 296 = 303,400
  - FF total: 304,128 + 303,400 = 607,528
- Layer Norms (2 per block): 2 × (296 + 296) = 1,184
- Per-block total: 351,648 + 607,528 + 1,184 = 960,360
- 3 blocks total: 3 × 960,360 = 2,881,080

Output Layers:
- Final LayerNorm: 296 + 296 = 592
- Output linear: 296 × 10000 + 10000 = 2,970,000
- Output layers total: 592 + 2,970,000 = 2,970,592

Input Embedding: 3,089,096
Transformer Blocks: 2,881,080
Output Layers: 2,970,592
────────────────────────────────
TOTAL: 8,940,768 parameters
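The total can be re-derived with a short script (bias terms are included throughout, matching the breakdown above):

```python
def count_params(vocab_size=10000, emb_dim=300, d_model=296,
                 num_layers=3, d_ff=1024):
    """Recompute the parameter breakdown above; every linear layer has a bias."""
    embedding = vocab_size * emb_dim                          # 3,000,000
    projection = emb_dim * d_model + d_model                  # 89,096
    attention = 4 * (d_model * d_model + d_model)             # Q, K, V + output: 351,648
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # 607,528
    layer_norms = 2 * 2 * d_model                             # 1,184
    per_block = attention + feed_forward + layer_norms        # 960,360
    final_norm = 2 * d_model                                  # 592
    output_linear = d_model * vocab_size + vocab_size         # 2,970,000
    return (embedding + projection
            + num_layers * per_block
            + final_norm + output_linear)

print(count_params())  # 8,940,768
```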
# Clone the repository and install dependencies
$ git clone https://github.com/lohar-animesh-27112001/ELL881-advance_LLM-assignment.git
$ cd ELL881-advance_LLM-assignment
$ pip install -r requirements.txt

# Part I: run fasttext_model.py so that layers/cc.en.300.bin is available,
# copy it next to transformer_model.py, then train
$ cd part-i/layers
$ python fasttext_model.py
$ cd ..
$ cp layers/cc.en.300.bin .
$ python transformer_model.py

# Part II: reuse the same cc.en.300.bin for the FastText-embedding variant
$ cd ..
$ cp part-i/cc.en.300.bin part-ii/transformer_model-with_fasttext_embeddings/
$ cd part-ii/transformer_model-with_fasttext_embeddings/
$ python transformer_model.py