Methodology · LLM-App Engineering · emerging · verified

LLM-From-Scratch Build Progression

also known as build-an-LLM seven stages, Raschka build progression

Applies to: llm-app

Tags: from-scratch, pedagogy, transformer, sft

A seven-stage learning path for building a working LLM from scratch on a laptop. Each stage builds on the last and produces something you can run before the next stage starts. Stage 1 builds the text-data pipeline. Stage 2 builds the attention mechanism. Stage 3 builds the full architecture. Stage 4 is pre-training. Stage 5 is supervised fine-tuning for classification. Stage 6 is supervised fine-tuning for instructions. Stage 7 is evaluation with an LLM judge. The goal is not to beat frontier models. It is to remove the black box.


Intent. Walk a practitioner through building a working LLM on a laptop in seven stages. Each stage produces something runnable, so the internals stop being a black box.

When to apply. Use this to onboard ML engineers or applied scientists onto LLM projects, to deepen the instincts of people who have only ever called APIs, or for any team that needs a real grasp of attention, tokenisation, pre-training versus fine-tuning, and instruction tuning. Do not apply it when the goal is to ship product on a deadline. This is a learning path, not a delivery one. Skip it when the team already has deep in-house expertise and the time is better spent on the application itself.

Inputs

  • Laptop or modest GPU box: hardware that can run small-scale training, such as a modern laptop with 16GB or more of RAM and, optionally, a single consumer GPU.
  • Public text corpus: a small public dataset, such as the works of Shakespeare, an OpenWebText snippet, or public-domain books. It only needs to be enough for teaching-scale pre-training.
  • Existing pretrained weights for late stages: GPT-2 small or similar open weights to kick-start the fine-tuning stages without weeks of pre-training.

Outputs

  • Working small LLM: a from-scratch GPT-style model that generates text and has been fine-tuned for both classification and instruction-following.
  • Stage-by-stage runnable code: seven runnable artefacts, each showing a single capability on its own.
  • Internalised mental model: a practitioner-level grasp of tokenisation, attention, pre-training loss, fine-tuning data, and LLM-judge evaluation.

Steps (7)

  1. Stage 1: text data and tokenisation

    Build the data pipeline. Write a byte-pair or word-level tokeniser and use it to produce input-target pairs. Check it with a round trip: decode the token IDs and confirm you get the original text back. Skip this stage and every later debugging session turns into data archaeology.
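    A minimal plain-Python sketch of this stage, assuming a word-level tokeniser (the simpler of the two options the step names; a real BPE tokeniser such as GPT-2's is considerably more involved). The names `WordTokenizer`, `build_vocab`, and `sliding_window_pairs`, the regex split, and the `<|unk|>` token are illustrative assumptions. The `assert` is the round-trip check described above.

```python
import re

# Illustrative split: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def build_vocab(text):
    """Map each distinct token to an integer ID, plus an unknown-token ID."""
    tokens = TOKEN_RE.findall(text)
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
    vocab["<|unk|>"] = len(vocab)
    return vocab


class WordTokenizer:
    def __init__(self, vocab):
        self.str_to_id = vocab
        self.id_to_str = {i: s for s, i in vocab.items()}

    def encode(self, text):
        unk = self.str_to_id["<|unk|>"]
        return [self.str_to_id.get(tok, unk) for tok in TOKEN_RE.findall(text)]

    def decode(self, ids):
        text = " ".join(self.id_to_str[i] for i in ids)
        # Remove the space the join put before punctuation.
        return re.sub(r"\s+([^\w\s])", r"\1", text)


def sliding_window_pairs(ids, context_length, stride):
    """Input-target pairs where each target is the input shifted by one token."""
    pairs = []
    for start in range(0, len(ids) - context_length, stride):
        inputs = ids[start:start + context_length]
        targets = ids[start + 1:start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


text = "To be, or not to be, that is the question."
tokenizer = WordTokenizer(build_vocab(text))
ids = tokenizer.encode(text)
assert tokenizer.decode(ids) == text  # round-trip check
pairs = sliding_window_pairs(ids, context_length=4, stride=1)
print(pairs[0])
```

    The round trip holds here only because the sample text's spacing matches the decoder's join rule; a contraction such as "don't" would not survive it, which is one practical reason production models use byte-level BPE instead.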

  2. Stage 2: attention

    Write scaled dot-product attention from scratch, then multi-head attention. Check it on tiny matrices. Visualise the attention weights so the mechanism stops feeling abstract.
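    A minimal plain-Python sketch of one attention head computing softmax(QKᵀ / √d_k) V on tiny hand-written matrices. The causal mask is an assumption drawn from the GPT-style model this methodology builds, and the helper names and matrix values are hypothetical. Multi-head attention runs several such heads on separate slices of the embedding and concatenates their outputs. Printing `w` gives the weights to visualise.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(Q, K, V):
    """Return (outputs, weights) for one head; row i attends only to positions <= i."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = []
    for i, row in enumerate(scores):
        masked = [s / math.sqrt(d_k) if j <= i else float("-inf")
                  for j, s in enumerate(row)]
        weights.append(softmax(masked))  # exp(-inf) == 0, so future positions get weight 0
    return matmul(weights, V), weights


# Three tokens, d_k = 2 (toy values).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, w = causal_attention(Q, K, V)
for row in w:
    print([round(x, 3) for x in row])
```

    Useful checks at this size: every row of weights sums to 1, everything above the diagonal is 0, and the first token's output equals its own value vector because it can attend only to itself.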

  3. Stage 3: full transformer architecture

    Put together the embeddings, attention blocks, feed-forward layers, residual connections, and layer normalisation. Run a forward pass on a fixed input and confirm the shapes match what you expect.
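    Alongside the forward-pass shape check, a parameter count is a cheap way to confirm that the assembled architecture is the one you meant to build. The sketch assumes a GPT-2-style layout (learned positional embeddings, fused QKV projection, 4× feed-forward expansion, two LayerNorms per block, final LayerNorm, output head optionally tied to the token embedding); that layout and the function name `gpt_param_count` are assumptions rather than details from the source. Plugged with GPT-2 small's sizes, it should reproduce the familiar ~124M figure.

```python
def gpt_param_count(vocab_size, context_length, emb_dim, n_layers,
                    qkv_bias=True, tie_weights=True):
    """Count trainable parameters in a GPT-2-style decoder (illustrative layout)."""
    d = emb_dim
    embeddings = vocab_size * d + context_length * d      # token + learned positional
    attention = 3 * d * d + (3 * d if qkv_bias else 0)    # fused Q, K, V projections
    attention += d * d + d                                # output projection with bias
    feed_forward = (d * 4 * d + 4 * d) + (4 * d * d + d)  # expand 4x, then project back
    layer_norms = 2 * (2 * d)                             # two LayerNorms: scale and shift
    per_block = attention + feed_forward + layer_norms
    final_norm = 2 * d
    output_head = 0 if tie_weights else vocab_size * d    # no bias; a tied head reuses the token embedding
    return embeddings + n_layers * per_block + final_norm + output_head


# GPT-2 small sizes: 50257-token vocabulary, 1024-token context, 768-dim, 12 layers.
total = gpt_param_count(50257, 1024, 768, 12)
print(f"{total:,}")  # → 124,439,808
```

    The number of heads does not appear because heads split the same projection matrices rather than adding new ones. Dropping the QKV biases, as some from-scratch implementations do, lowers the total by 12 × 3 × 768 = 27,648.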

  4. Stage 4: pre-training

    Train next-token prediction on a small corpus. Watch the loss curve and generate sample text along the way. Expect the output to be poor; at this scale that is the point, and it is where you feel what scale actually buys you.
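    Next-token pre-training minimises the average cross-entropy between the model's logits and the true next token. A useful reference line for the loss curve: a model that spreads probability uniformly over a vocabulary of V tokens scores exactly ln V (about 10.8 for GPT-2's 50,257 tokens), so a curve that never drops below that has learned nothing. The plain-Python sketch below computes the loss and perplexity; the function names and toy logits are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mean_loss(batch_logits, targets):
    """Average next-token loss over a batch of positions."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


vocab_size = 50257
uniform_logits = [0.0] * vocab_size
baseline = cross_entropy(uniform_logits, target=42)
print(f"untrained baseline loss={baseline:.3f}, perplexity={math.exp(baseline):.0f}")

# A model that puts a large logit on the right token beats the baseline.
confident = [0.0] * vocab_size
confident[42] = 15.0
print(f"confident loss={cross_entropy(confident, 42):.3f}")
```

    Perplexity is exp(loss), so the uniform baseline's perplexity equals the vocabulary size: the model is as unsure as a uniform guess over every token.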

  5. Stage 5: SFT for classification

    Fine-tune the pre-trained model for a classification task such as sentiment or topic, and confirm the model adapts. This is the simplest fine-tuning loop and the gentlest way into the supervised fine-tuning machinery.
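    A plain-Python sketch of the classification head, assuming the common GPT recipe of replacing the vocabulary-sized output layer with a small linear layer (one output per class) and reading only the last token's hidden state, since in a causal model that is the only position that has attended to the whole input. The weights are toy values and the function names hypothetical. In practice the hidden states come from the pre-trained model, and only the new head, optionally with the last few blocks, is trained.

```python
import math


def linear(x, weights, bias):
    """weights: one row of length emb_dim per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def classify_last_token(hidden_states, weights, bias):
    """Predict a class from the final position's hidden state."""
    last = hidden_states[-1]
    logits = linear(last, weights, bias)
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs


# Toy example: 3 token positions, emb_dim = 2, 2 classes (e.g. negative/positive).
hidden = [[0.2, -0.1], [0.5, 0.5], [1.0, -1.0]]
head_w = [[1.0, 0.0], [0.0, 1.0]]
head_b = [0.0, 0.0]
label, probs = classify_last_token(hidden, head_w, head_b)
print(label, [round(p, 3) for p in probs])
```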

  6. Stage 6: SFT for instructions

    Fine-tune on instruction-and-response pairs. The model goes from completing text to following instructions. Compare it side by side with the pre-trained model to feel the shift.
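    A sketch of turning one instruction-response pair into training inputs and targets. It assumes an Alpaca-style prompt template (a widely used format; the source does not prescribe one) and assumes prompt tokens are masked out of the loss with `-100`, the default `ignore_index` of PyTorch's `cross_entropy`. Masking the prompt is a design choice; some recipes train on the full sequence. The token IDs are toy integers standing in for tokeniser output, and the function names are hypothetical.

```python
IGNORE_INDEX = -100  # PyTorch's cross_entropy skips targets equal to this by default


def format_alpaca(instruction, input_text=""):
    """Render an instruction (and optional input) as an Alpaca-style prompt."""
    prompt = ("Below is an instruction that describes a task. "
              "Write a response that appropriately completes the request."
              f"\n\n### Instruction:\n{instruction}")
    if input_text:
        prompt += f"\n\n### Input:\n{input_text}"
    return prompt + "\n\n### Response:\n"


def make_inputs_and_targets(prompt_ids, response_ids):
    """Shift by one for next-token prediction; mask targets that fall in the prompt."""
    ids = prompt_ids + response_ids
    inputs, targets = ids[:-1], ids[1:]
    n_masked = len(prompt_ids) - 1  # targets[0:n_masked] are still prompt tokens
    targets = [IGNORE_INDEX] * n_masked + targets[n_masked:]
    return inputs, targets


print(format_alpaca("Rewrite the sentence in passive voice.", "The cat chased the mouse."))
inputs, targets = make_inputs_and_targets([11, 12, 13], [21, 22])
print(inputs, targets)
```

    Running the same formatted prompt through the pre-trained and the instruction-tuned model is the side-by-side comparison the step asks for.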

  7. Stage 7: LLM-judge evaluation

    Score the instruction-tuned model with an LLM judge against a rubric. This closes the loop on what 'better' means when there is no crisp reference answer.

    Uses: LLM-as-Judge, Frozen Rubric Reflection
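    A sketch of the Stage 7 judging loop, assuming a hypothetical `call_judge(prompt) -> str` callable that wraps whichever judge model you use. The rubric is a module-level constant so every response is scored against the same fixed wording; its text and the 0 to 100 scale are illustrative assumptions. Replies without a parseable in-range score are counted rather than silently dropped.

```python
import re

# Fixed rubric text (illustrative); keep it unchanged across runs so scores stay comparable.
RUBRIC = ("Score the model response against the reference on a scale from 0 to 100, "
          "where 100 means fully correct and helpful. Reply with the integer only.")


def build_judge_prompt(instruction, reference, response):
    return (f"{RUBRIC}\n\nInstruction:\n{instruction}\n\n"
            f"Reference answer:\n{reference}\n\nModel response:\n{response}\n\nScore:")


def parse_score(reply):
    """Return the first standalone integer if it is in 0..100, else None."""
    match = re.search(r"\b(\d{1,3})\b", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if score <= 100 else None


def judge_all(examples, call_judge):
    """Average judge score over examples, plus a count of unparseable replies."""
    scores, failures = [], 0
    for ex in examples:
        prompt = build_judge_prompt(ex["instruction"], ex["reference"], ex["response"])
        score = parse_score(call_judge(prompt))
        if score is None:
            failures += 1
        else:
            scores.append(score)
    mean = sum(scores) / len(scores) if scores else None
    return mean, failures


# Stand-in judge for a dry run; replace with a real model call.
fake_replies = iter(["85", "Score: 70", "I cannot score this."])
examples = [{"instruction": "i", "reference": "r", "response": "m"}] * 3
mean, failures = judge_all(examples, lambda prompt: next(fake_replies))
print(mean, failures)
```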


Principles

  • Each stage produces something runnable before the next stage starts. The learning follows the build, not the textbook.
  • Small scale, full coverage. Every idea used in frontier models gets exercised at laptop scale.
  • Compare neighbouring stages. You feel the gap between pre-trained and instruction-tuned only by running both.
  • Evaluation is the final stage, not an afterthought. The learning mirrors production.

Known failure modes (2)

Related patterns (3)

Related methodologies (2)

Sources (2)

Provenance

  • Added to catalog:
  • Last updated:
  • Verification status: verified