raeesiarya/ForgeLM


Forge-LM: A Transformer Language Model from Bytes to QA

Forge-LM is a decoder-only transformer language model built entirely from first principles. Starting from raw bytes, it implements a byte-level BPE tokenizer, a modern transformer stack (RMSNorm, RoPE, SwiGLU), the training utilities needed to optimize it, and an end-to-end pipeline that pretrains the model and adapts it to multiple-choice question answering.

Nothing here is imported from a high-level modeling library. Every core component is written by hand and validated against reference snapshots.

Highlights

  • Byte-level BPE tokenizer trained directly from a text corpus
  • Modern transformer architecture: RMSNorm, Rotary Position Embeddings (RoPE), and a SwiGLU feed-forward network
  • Hand-written training primitives: numerically stable softmax, cross-entropy, gradient clipping, token accuracy, and perplexity
  • Analytical FLOPs and memory estimators
  • A full pretrain plus fine-tune pipeline that adapts the model to QA, with a zero-shot prompting baseline for comparison

Architecture at a Glance

Stage    Module              What it does
Part 1   part1/              Byte-level BPE tokenizer (train, encode, decode)
Part 2   part2/model.py      Full transformer LM with RoPE and SwiGLU
Part 3   part3/nn_utils.py   Training and evaluation utilities
Part 4   part4/              Pretraining, QA fine-tuning, and prompting

Installation

Prerequisites

  • Python 3.10+
  • A CUDA-capable GPU (recommended for Part 4, not required for Parts 1 to 3)

Setup

conda create -n cs288a2 python=3.10 -y
conda activate cs288a2
pip install -r requirements.txt

Running Tests

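Run each part's tests from inside that part's directory. The commands below assume you start in the source directory; each one runs in a subshell, so your working directory is unchanged afterwards:

# Part 1: Tokenization
(cd part1 && python -m pytest tests/ -v)

# Part 2: Transformer Model
(cd part2 && python -m pytest tests/ -v)

# Part 3: NN Utilities
(cd part3 && python -m pytest tests/ -v)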

Or run them all from the source directory:

cd source
python -m pytest part1/tests/ part2/tests/ part3/tests/ -v

Training and Evaluation

After the core components are in place, you can train and evaluate the full model.

Run the Training Pipeline

cd part4
python train_baseline.py

This will:

  1. Train a BPE tokenizer on TinyStories
  2. Pretrain a transformer language model
  3. Fine-tune on multiple-choice QA
  4. Evaluate using zero-shot prompting
  5. Save predictions to part4/outputs/

Configuration Options

# Quick test run (smaller model, fewer steps)
python train_baseline.py --quick

# Medium configuration
python train_baseline.py --medium

# Full training (default)
python train_baseline.py

Output Files

After training, prediction files are saved to part4/outputs/:

  • finetuned_predictions.json: fine-tuned model predictions
  • prompting_predictions.json: zero-shot prompting predictions
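
To compare the two prediction files, a small script along these lines may help. It is only a sketch: the JSON schema is not documented here, so it assumes each file holds a JSON object mapping a question ID to a predicted choice. The helper names load_predictions and agreement_rate are hypothetical and not part of the repository.

```python
import json


def load_predictions(path):
    """Load a predictions file. ASSUMPTION: a JSON object {question_id: choice}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def agreement_rate(preds_a, preds_b):
    """Fraction of shared question IDs on which the two mappings agree."""
    shared = preds_a.keys() & preds_b.keys()
    if not shared:
        return 0.0
    return sum(preds_a[k] == preds_b[k] for k in shared) / len(shared)
```

Usage would look like agreement_rate(load_predictions("part4/outputs/finetuned_predictions.json"), load_predictions("part4/outputs/prompting_predictions.json")). If the files turn out to use a different layout (for example a list of records), the loader needs adjusting.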

Component Reference

Part 1: Tokenization

  • train_bpe(): train a BPE vocabulary from a text corpus
  • Tokenizer._bpe(): apply BPE merges to a token
  • Tokenizer._encode_chunk(): encode text to token IDs
  • Tokenizer.decode(): decode token IDs back to text
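
As a rough illustration of what these functions do, here is a toy byte-level BPE in plain Python. It is a sketch under simplifying assumptions, not the code in part1/: it skips pre-tokenization and special tokens, breaks frequency ties arbitrarily, and every name ending in _sketch (plus apply_merge) is hypothetical.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with the merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bpe_sketch(text, num_merges):
    """Learn up to `num_merges` merges, starting from single UTF-8 bytes."""
    seq = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        # Most frequent pair; ties go to the lexicographically larger pair
        # (an arbitrary choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seq = apply_merge(seq, best)
    return merges


def encode_sketch(text, merges):
    """Split text into bytes, then apply the merges in the order they were learned."""
    seq = [bytes([b]) for b in text.encode("utf-8")]
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


def decode_sketch(tokens):
    """Concatenate token bytes and decode back to a string."""
    return b"".join(tokens).decode("utf-8", errors="replace")
```

Because every token is just a byte string, decoding is concatenation, and any input text round-trips regardless of which merges were learned.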

Part 2: Model Components

  • Linear: linear transformation layer
  • Embedding: token embedding layer
  • RMSNorm: root mean square layer normalization
  • softmax(): numerically stable softmax
  • silu(): SiLU activation function
  • SwiGLU: gated feed-forward network
  • RotaryPositionEmbedding: RoPE positional encoding
  • scaled_dot_product_attention(): attention mechanism
  • MultiHeadSelfAttention: multi-head attention
  • MultiHeadSelfAttentionWithRoPE: attention with RoPE
  • TransformerBlock: a single transformer layer
  • TransformerLM: the complete language model
  • count_flops_per_token(): FLOPs estimation
  • estimate_memory_bytes(): memory estimation
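
For readers new to these layers, the sketch below shows the math behind RMSNorm, SiLU, SwiGLU, and RoPE on single vectors in plain Python. It is illustrative only and makes assumptions the list above does not confirm: the function names are hypothetical, the W1/W2/W3 naming follows a common convention, and RoPE rotates interleaved pairs (x[2i], x[2i+1]) with base 10000, whereas the repository may pair dimensions differently.

```python
import math


def rms_norm_sketch(x, weight, eps=1e-5):
    """RMSNorm: divide x by sqrt(mean(x^2) + eps), then scale elementwise by weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu_sketch(v):
    """SiLU: v * sigmoid(v), written to avoid overflow for large negative v."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def swiglu_sketch(x, W1, W2, W3):
    """SwiGLU feed-forward: W2 @ (silu(W1 @ x) * (W3 @ x))."""
    gate = [silu_sketch(g) for g in matvec(W1, x)]
    up = matvec(W3, x)
    return matvec(W2, [g * u for g, u in zip(gate, up)])


def rope_sketch(x, position, theta=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * theta**(-2i/d)."""
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        angle = position * theta ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out
```

The rotation in rope_sketch preserves vector length, and the dot product of a rotated query and key depends only on the difference between their positions, which is the property that makes RoPE a relative position encoding.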

Part 3: Training Utilities

  • softmax(): numerically stable softmax for training
  • cross_entropy(): cross-entropy loss
  • gradient_clipping(): gradient norm clipping
  • token_accuracy(): token-level accuracy
  • perplexity(): language model perplexity
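
The sketch below shows one way these utilities can be written on plain Python lists. It is a minimal illustration, not the code in part3/nn_utils.py: the _sketch names and signatures are hypothetical, and gradients are modeled as a flat list of floats rather than tensors.

```python
import math


def softmax_sketch(logits):
    """Softmax with the max subtracted first, so exp() never overflows."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_sketch(logits_batch, targets):
    """Mean negative log-likelihood, computed with a stable log-sum-exp."""
    total = 0.0
    for logits, t in zip(logits_batch, targets):
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[t]
    return total / len(targets)


def perplexity_sketch(logits_batch, targets):
    """Perplexity is the exponential of the mean cross-entropy."""
    return math.exp(cross_entropy_sketch(logits_batch, targets))


def token_accuracy_sketch(logits_batch, targets):
    """Fraction of positions where the highest-scoring token equals the target."""
    hits = 0
    for logits, t in zip(logits_batch, targets):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        hits += predicted == t
    return hits / len(targets)


def clip_gradients_sketch(grads, max_norm, eps=1e-6):
    """Rescale gradients so their global L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / (norm + eps)
        return [g * scale for g in grads]
    return list(grads)
```

As a sanity check, uniform logits over four classes give a cross-entropy of log 4 and therefore a perplexity of 4, which matches the reading of perplexity as an effective number of equally likely choices.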

Submission

Create the submission archive:

bash create_submission.sh

Notes

  • Do not modify function signatures or class interfaces
  • Do not add dependencies beyond requirements.txt
  • Ensure the code passes local tests before submitting
  • The autograder runs additional hidden tests
  • Use the provided fixtures for testing
