Methodology · Fine-Tuning

Pretrain Then Adapt

Pay the cost of learning general language once, then spread it across many tasks by training one base and adapting it cheaply for each.

Description

Build one general model first, then specialise it for each task. To build the base, train it to predict the next token over a large corpus of plain, unlabeled text. To specialise it, train that base further on a smaller set of labeled task data (this second step is supervised fine-tuning). One base can then feed many specialised versions, such as a classifier, an instruction follower, or a domain assistant, without paying the heavy base-training cost again. The pattern splits the work into two parts: the expensive part teaches the model general knowledge, and the cheap part shapes how the model behaves. Because adaptation is cheap, you can run it quickly and repeat it for each use case.
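The cost argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `total_cost` and every number in it are hypothetical, chosen to show the shape of the trade-off rather than real training budgets.

```python
def total_cost(pretrain_cost, adapt_cost, num_tasks, share_base=True):
    """Total training cost in arbitrary units (for example GPU-hours).

    share_base=True  : pretrain once, then adapt once per task.
    share_base=False : pay the base-training cost again for every task.
    """
    if share_base:
        return pretrain_cost + adapt_cost * num_tasks
    return (pretrain_cost + adapt_cost) * num_tasks


# Hypothetical figures, not measurements.
PRETRAIN_COST = 10_000  # expensive part: general knowledge
ADAPT_COST = 50         # cheap part: task behaviour
NUM_TASKS = 3           # e.g. classifier, instruction follower, domain assistant

shared = total_cost(PRETRAIN_COST, ADAPT_COST, NUM_TASKS)
separate = total_cost(PRETRAIN_COST, ADAPT_COST, NUM_TASKS, share_base=False)
per_task_shared = shared / NUM_TASKS

print(shared, separate, per_task_shared)
```

Under these assumed numbers the shared-base total grows by only `ADAPT_COST` for each extra task, so the amortised cost per task keeps falling as more variants are added.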

When to apply

Use this when you control the model weights and a generic instruction-tuned model is not good enough for your domain or task family. You also need either an existing base checkpoint or enough unlabeled domain text to pretrain one. Skip it when a hosted instruction-tuned API model already clears your quality bar, because then the adaptation cost is not worth it. Two exceptions can justify the pattern even then: regulated settings that force the weights to stay on your own servers, and research into how base models themselves behave.

What it involves

Assemble and clean the pretraining corpus
Pretrain the base with next-token prediction
Branch from the base into one fine-tuning run per specialisation task
Validate each variant against a task-specific eval
Re-use the base for the next task
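The steps above can be sketched end to end with a deliberately tiny stand-in for a language model. This is a toy under stated assumptions, not a real training recipe: the "base" is a table of next-token counts, "fine-tuning" adds weighted counts from labeled pairs, and all names (`pretrain`, `fine_tune`, `predict`, `evaluate`, the `weight` parameter) and all data are hypothetical.

```python
import copy
from collections import Counter, defaultdict


def pretrain(corpus):
    """Fit next-token counts on unlabeled text (toy stand-in for a base model)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def fine_tune(base, task_pairs, weight=5):
    """Branch from the base and keep training on labeled (prompt, target) pairs."""
    model = copy.deepcopy(base)  # the shared base itself is never modified
    for prompt, target in task_pairs:
        model[prompt][target] += weight  # labeled examples outweigh raw text
    return model


def predict(model, prompt):
    """Return the most likely next token, or None for an unseen prompt."""
    if prompt not in model or not model[prompt]:
        return None
    return model[prompt].most_common(1)[0][0]


def evaluate(model, eval_pairs):
    """Task-specific accuracy over (prompt, expected) pairs."""
    hits = sum(predict(model, p) == t for p, t in eval_pairs)
    return hits / len(eval_pairs)


# 1. Assemble the (toy) pretraining corpus.
corpus = [
    "the movie was good",
    "the movie was long",
    "the food was good",
    "the service was slow",
]

# 2. Pretrain the base once.
base = pretrain(corpus)

# 3. Branch into two specialisation tasks from the same base.
sentiment_data = [("good", "positive"), ("slow", "negative")]
glossary_data = [("movie", "film")]
sentiment_model = fine_tune(base, sentiment_data)
glossary_model = fine_tune(base, glossary_data)

# 4. Validate each variant against its own eval set.
base_sentiment_acc = evaluate(base, sentiment_data)
tuned_sentiment_acc = evaluate(sentiment_model, sentiment_data)
tuned_glossary_acc = evaluate(glossary_model, glossary_data)

# 5. The base is unchanged and ready for the next task.
base_still_general = predict(base, "movie")

print(base_sentiment_acc, tuned_sentiment_acc, tuned_glossary_acc, base_still_general)
```

For brevity the sketch evaluates on the same pairs it fine-tunes on; a real setup would hold out a separate eval set for each task. The design point it illustrates is that each variant starts from a copy of the base, so adapting for one task cannot disturb another variant or the base itself.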
