
IIT Delhi - ELL881 - Advanced LLM - Decoder-Only Transformer Model From Scratch

  1. Implementing a Decoder-Only Transformer: The goal of this assignment is to develop a decoder-only transformer language model from scratch.

  2. Training and Inference Enhancements: Beam Search Decoding, KV Caching, Gradient Accumulation, and Gradient Checkpointing.
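
A minimal sketch of one of these enhancements, gradient accumulation, assuming a standard PyTorch training loop (the model, optimizer, and data below are toy placeholders, not the repository's code):

  import torch
  import torch.nn as nn

  # Toy model and data purely for illustration; the real loop lives in transformer_model.py.
  model = nn.Linear(300, 10000)
  optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
  batches = [(torch.randn(32, 300), torch.randint(0, 10000, (32,))) for _ in range(8)]

  accum_steps = 4  # effective batch size = 32 * 4 = 128
  optimizer.zero_grad()
  for step, (x, y) in enumerate(batches):
      loss = nn.functional.cross_entropy(model(x), y)
      (loss / accum_steps).backward()       # scale so accumulated gradients average out
      if (step + 1) % accum_steps == 0:
          optimizer.step()                  # one optimizer update per accum_steps mini-batches
          optimizer.zero_grad()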

Hyperparameters:

  • vocab_size = 10000,
  • d_model = 300,
  • num_layers = 3,
  • num_heads = 8,
  • d_ff = 1024,
  • max_seq_length = 64,
  • batch_size = 32,
  • learning_rate = 3e-4,
  • num_epochs = 3
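
For reference, these values could be grouped into a single config object; a sketch (the name TransformerConfig is illustrative, the repository may organise them differently):

  from dataclasses import dataclass

  @dataclass
  class TransformerConfig:
      vocab_size: int = 10000
      d_model: int = 300
      num_layers: int = 3
      num_heads: int = 8
      d_ff: int = 1024
      max_seq_length: int = 64
      batch_size: int = 32
      learning_rate: float = 3e-4
      num_epochs: int = 3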

Model Parameters Breakdown:

1. Input Embedding:

  • Embedding matrix: 10000 × 300 = 3,000,000
  • Projection layer (300→296, so the model width is divisible by num_heads = 8): 300 × 296 + 296 = 89,096
  • Total: 3,089,096

2. Transformer Blocks (3 layers):

Per block:

  • Multi-head Attention:
    • Q, K, V projections: 3 × (296 × 296 + 296) = 263,736
    • Output projection: 296 × 296 + 296 = 87,912
    • Attention total: 351,648
  • Feed Forward:
    • First linear: 296 × 1024 + 1024 = 304,128
    • Second linear: 1024 × 296 + 296 = 303,400
    • FF total: 607,528
  • Layer Norms (2 per block): 2 × (296 + 296) = 1,184
  • Per block total: 351,648 + 607,528 + 1,184 = 960,360
  • 3 blocks total: 3 × 960,360 = 2,881,080

3. Output Layers:

  • Final LayerNorm: 296 + 296 = 592
  • Output linear: 296 × 10000 + 10000 = 2,970,000
  • Total: 2,970,592

Total:

  Input Embedding:    3,089,096
  Transformer Blocks: 2,881,080
  Output Layers:      2,970,592
  ────────────────────────────────
  TOTAL:              8,940,768 parameters
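
These totals can be re-derived directly from the hyperparameters; a quick check in plain Python, with the values taken from the breakdown above:

  # Re-derive the parameter counts (d_model = 296 after the 300 -> 296 projection).
  vocab_size, d_embed, d_model, num_layers, d_ff = 10_000, 300, 296, 3, 1024

  embedding = vocab_size * d_embed + (d_embed * d_model + d_model)       # 3,089,096
  attention = 4 * (d_model * d_model + d_model)                          # 351,648 (Q, K, V, output)
  feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)    # 607,528
  layer_norms = 2 * 2 * d_model                                          # 1,184
  blocks = num_layers * (attention + feed_forward + layer_norms)         # 2,881,080
  output = 2 * d_model + (d_model * vocab_size + vocab_size)             # 2,970,592

  print(embedding + blocks + output)                                     # 8,940,768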

Run These Commands:

  $ git clone https://github.com/lohar-animesh-27112001/ELL881-advance_LLM-assignment.git
  $ cd ELL881-advance_LLM-assignment
  $ pip install -r requirements.txt
  $ cd part-i
  $ cd layers
  $ python fasttext_model.py
  $ cd ..
  $ cp layers/cc.en.300.bin .
  $ python transformer_model.py

Running transformer_model.py (part-i) requires 32 GB of RAM; you can run it on Google Colab.

  $ cd ..
  $ cp part-i/cc.en.300.bin part-ii/transformer_model-with_fasttext_embeddings/
  $ cd part-ii/transformer_model-with_fasttext_embeddings/
  $ python transformer_model.py

Running transformer_model.py (part-ii) requires 40 GB of RAM; you can run it on Google Colab.

Decoder-only model architecture: see the architecture_diagram image in the repository.
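
For orientation, a minimal PyTorch sketch consistent with the parameter breakdown above. This is a sketch, not the repository's implementation: the GELU activation, norm placement, and the omission of a positional encoding (the breakdown lists no positional parameters, suggesting a fixed sinusoidal encoding) are assumptions.

  import torch
  import torch.nn as nn

  class DecoderBlock(nn.Module):
      def __init__(self, d_model=296, num_heads=8, d_ff=1024):
          super().__init__()
          self.ln1 = nn.LayerNorm(d_model)
          self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
          self.ln2 = nn.LayerNorm(d_model)
          self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

      def forward(self, x, causal_mask):
          h = self.ln1(x)
          a, _ = self.attn(h, h, h, attn_mask=causal_mask)   # masked self-attention
          x = x + a
          return x + self.ff(self.ln2(x))

  class TinyGPT(nn.Module):
      def __init__(self, vocab_size=10000, d_embed=300, d_model=296,
                   num_layers=3, num_heads=8, d_ff=1024):
          super().__init__()
          self.embed = nn.Embedding(vocab_size, d_embed)    # 10000 x 300 token embeddings
          self.proj = nn.Linear(d_embed, d_model)           # 300 -> 296
          self.blocks = nn.ModuleList(
              [DecoderBlock(d_model, num_heads, d_ff) for _ in range(num_layers)])
          self.ln_f = nn.LayerNorm(d_model)
          self.out = nn.Linear(d_model, vocab_size)         # 296 -> 10000

      def forward(self, idx):
          T = idx.size(1)
          mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)  # causal mask
          x = self.proj(self.embed(idx))
          for block in self.blocks:
              x = block(x, mask)
          return self.out(self.ln_f(x))

  model = TinyGPT()
  print(sum(p.numel() for p in model.parameters()))         # 8,940,768, matching the breakdown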

Output figures (included as images in the repository):

  • output_i_training_curves.png: training curves
  • output_ii_attention_visualization.png: attention visualization
  • output_iii
  • output_iv

About

TinyGPT - 9M Parameters
