Methodology · Fine-Tuning

Instruction Fine-tune Then Judge Cycle

Iterate on instruction fine-tunes using one decision signal, a model-graded score on the test set, while tracking training fit (the loss curves) and answer quality (the judge score) as separate readings.

Description

A closed loop for teaching a model to follow instructions, then checking the result. First, split your instruction data into three parts: training, validation, and test. Then train on the training data (this step is supervised fine-tuning, or SFT), logging the loss after each epoch on both the training and validation sets. Next, run the fine-tuned model on the test set to generate answers. Finally, have a local model act as a judge and score those answers against the reference answers. The loop catches two problems on every pass. If the validation loss rises or flattens while the training loss keeps falling, the model is overfitting. If the judge score drops, answer quality is slipping. Keep a change when the judge score improves and roll it back when the score falls. This is the smallest end-to-end fine-tuning workflow that checks its own quality without a human grading every step.
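The two checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `EpochLog`, `first_overfit_epoch`, and `keep_or_rollback`, and the `patience` and `tolerance` parameters, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class EpochLog:
    """One row of the per-epoch loss log (hypothetical structure)."""
    epoch: int
    train_loss: float
    val_loss: float


def first_overfit_epoch(logs, patience=2):
    """Return the epoch where validation loss began rising while training
    loss kept falling for `patience` consecutive epochs, or None."""
    streak = 0
    streak_start = None
    for prev, cur in zip(logs, logs[1:]):
        diverging = cur.val_loss > prev.val_loss and cur.train_loss < prev.train_loss
        if diverging:
            if streak == 0:
                streak_start = cur.epoch
            streak += 1
            if streak >= patience:
                return streak_start
        else:
            streak = 0
    return None


def keep_or_rollback(new_score, best_score, tolerance=0.0):
    """Keep a run whose judge score is at least the best so far, within
    an assumed tolerance for judge noise; otherwise roll it back."""
    return "keep" if new_score >= best_score - tolerance else "rollback"


logs = [
    EpochLog(1, 1.20, 1.30),
    EpochLog(2, 0.90, 1.00),
    EpochLog(3, 0.70, 1.05),
    EpochLog(4, 0.50, 1.15),
]
overfit_at = first_overfit_epoch(logs)
decision = keep_or_rollback(new_score=71.0, best_score=74.5)
```

In this made-up log the validation loss turns upward at epoch 3 while the training loss keeps dropping, so the checkpoint from epoch 2 would be the natural one to keep. The `tolerance` argument is one way to avoid rolling back on judge noise alone; a suitable value depends on how stable the judge is on your test set.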

When to apply

Use this when you are teaching a small or mid-size open-weight model to follow instructions and you need to iterate fast without paying a human to grade every cycle. It fits best when the dataset is small, in the hundreds to low thousands of examples, where overfitting is a real risk and a held-out test set is a must. Don't apply it when your judge model is weaker than the model it is grading, because a weak judge tends to pass flawed answers and the scores drift toward false positives. For safety-critical tasks, do not use it as the only gate; keep humans in the loop. One rule holds either way: a production launch still needs a human-graded acceptance set before shipping.

What it involves

  • Curate and split the instruction dataset
  • Run SFT with per-epoch loss logging
  • Generate responses on the test set
  • Score with a local LLM-as-judge
  • Diagnose and iterate
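The split and scoring steps listed above might look like the following sketch. Everything named here is an assumption made for illustration: the 85/10/5 split fractions, the `instruction` and `reference` dictionary keys, and the `generate` and `judge` callables, which stand in for your fine-tuned model and your local judge model. The judge is assumed to return a numeric score per answer.

```python
import random
from statistics import mean


def split_dataset(examples, train_frac=0.85, test_frac=0.10, seed=123):
    """Shuffle once with a fixed seed, then cut into train / validation / test.
    Whatever is left after train and test becomes the validation set."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_test = round(len(items) * test_frac)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, val, test


def judge_test_set(test_set, generate, judge):
    """Generate an answer per test example and return the mean judge score.
    `generate(instruction)` and `judge(instruction, reference, answer)` are
    placeholders for real model calls."""
    scores = []
    for example in test_set:
        answer = generate(example["instruction"])
        scores.append(judge(example["instruction"], example["reference"], answer))
    return mean(scores)


# Toy stand-ins so the sketch runs without any model.
data = [{"instruction": f"task {i}", "reference": f"TASK {i}"} for i in range(100)]
train, val, test = split_dataset(data)


def toy_generate(instruction):
    return instruction.upper()


def toy_judge(instruction, reference, answer):
    return 100 if answer == reference else 0


average_score = judge_test_set(test, toy_generate, toy_judge)
```

Fixing the shuffle seed keeps the test set identical across iterations, so judge scores from different runs stay comparable. In a real loop, `toy_generate` and `toy_judge` would be replaced by calls to the fine-tuned model and the judge model.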
