Pretrain Then Adapt
Pay the cost of learning general language once, then spread it across many tasks by training one base and adapting it cheaply for each.
Description
Build one general model first, then specialise it for each task. To build the base, train it to predict the next token (roughly, the next word) over a large corpus of plain, unlabeled text. To specialise it, train that base further on a smaller set of labeled task data; this second step is supervised fine-tuning. One base can then feed many specialised versions, such as a classifier, an instruction follower, or a domain assistant, without paying the heavy base-training cost again. The pattern splits the work into two parts: the expensive part teaches the model general knowledge, and the cheap part shapes how the model behaves. Because adaptation is cheap, you can run it quickly and repeat it once per use case.
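To make the cost-sharing argument concrete, here is a small arithmetic sketch. The cost figures are made-up units chosen only for illustration, and `total_cost` is a hypothetical helper, not part of any library.

```python
def total_cost(base_cost, adapt_cost, num_tasks, share_base=True):
    """Total training cost for num_tasks specialised models.

    share_base=True:  pretrain once, then adapt once per task.
    share_base=False: pay the base-training cost again for every task.
    """
    if share_base:
        return base_cost + num_tasks * adapt_cost
    return num_tasks * (base_cost + adapt_cost)


# Hypothetical cost units (assumptions, not measurements).
BASE_COST = 1000.0   # one pretraining run
ADAPT_COST = 10.0    # one fine-tuning run

for n in (1, 3, 10):
    shared = total_cost(BASE_COST, ADAPT_COST, n)
    separate = total_cost(BASE_COST, ADAPT_COST, n, share_base=False)
    print(f"tasks={n:2d}  shared per task={shared / n:8.1f}  "
          f"separate per task={separate / n:8.1f}")
```

With these assumed numbers, ten tasks cost 1,100 units in total on a shared base versus 10,100 when each task trains its own base. The per-task cost of the shared approach keeps falling as more tasks reuse the same base.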
When to apply
Use this when you control the model weights and a generic instruction-tuned model is not good enough for your domain or task family. You also need either an existing base checkpoint or enough unlabeled domain text. Skip it when a hosted instruction-tuned API model already clears your quality bar, because then the adaptation cost is not worth it. Two exceptions to that skip rule: regulated settings that force the weights to stay on your own servers, and research into how base models themselves behave.
What it involves
- Assemble and clean the pretraining corpus
- Pretrain the base with next-token prediction
- Branch from the base into one fine-tuned variant per specialisation task
- Validate each variant against a task-specific eval
- Re-use the base for the next task
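The steps above can be sketched end to end with a toy model. This is a minimal illustration under loud assumptions: a bigram count table stands in for a neural network, adding counts stands in for gradient training, and the `weight` argument is a hypothetical knob that lets the small task data outweigh the base counts. All names and data here are invented for the example.

```python
import copy
from collections import Counter, defaultdict


def clean_corpus(lines):
    """Step 1: normalise whitespace and case, drop empty lines and duplicates."""
    seen, cleaned = set(), []
    for line in lines:
        text = " ".join(line.lower().split())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def train_next_token(model, texts, weight=1):
    """Count which token follows which (toy stand-in for next-token training)."""
    for text in texts:
        tokens = text.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += weight
    return model


def predict_next(model, prev):
    """Return the most frequent follower of prev, or None if prev is unseen."""
    followers = model.get(prev)
    return followers.most_common(1)[0][0] if followers else None


def accuracy(model, eval_pairs):
    """Fraction of (previous token, expected next token) pairs predicted correctly."""
    hits = sum(predict_next(model, prev) == want for prev, want in eval_pairs)
    return hits / len(eval_pairs)


# Steps 1-2: clean the unlabeled corpus and pretrain the base once.
raw_corpus = ["the cat sat on the mat", "", "the dog sat on the rug",
              "the cat sat on the mat"]
base = train_next_token(defaultdict(Counter), clean_corpus(raw_corpus))

# Hypothetical task data: a small training set plus a task-specific eval each.
tasks = {
    "support": {"train": ["the ticket is open", "the ticket is closed"],
                "eval": [("the", "ticket"), ("ticket", "is")]},
    "legal": {"train": ["the contract is binding", "the contract is void"],
              "eval": [("the", "contract"), ("contract", "is")]},
}

# Steps 3-5: branch a copy of the base per task, adapt it, validate it,
# and leave the base untouched so the next task can reuse it.
variants, scores = {}, {}
for name, task in tasks.items():
    variant = copy.deepcopy(base)
    train_next_token(variant, task["train"], weight=3)
    variants[name] = variant
    scores[name] = accuracy(variant, task["eval"])

base_scores = {name: accuracy(base, task["eval"]) for name, task in tasks.items()}
print("adapted:", scores)
print("base only:", base_scores)
```

The design choice worth noting is the `copy.deepcopy(base)` call: each variant trains on its own copy, so one task's adaptation never leaks into the base or into another variant. In a real setup the same role is played by loading the shared base checkpoint fresh for every fine-tuning run.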