Azure AI Foundry RAFT fine-tuning recipe

Type: full-code · Vendor: Microsoft Azure · Language: Python · License: MIT · Status: active · Status in practice: emerging · First released: 2024-05-01

Links: homepage docs repo

This Azure recipe generates a synthetic RAFT training set with a teacher model and fine-tunes a student model on Azure AI Foundry to improve domain-specific RAG.

Description. This is an official Azure-Samples recipe that applies UC Berkeley's RAFT technique on Azure AI Foundry. A teacher model such as GPT-4o or Llama 3.1 405B generates a synthetic dataset using the Gorilla project's RAFT method, with question-document-answer triplets that include oracle and distractor documents. The dataset fine-tunes a smaller student model (such as GPT-4o-mini or Llama 3.1 8B) so it learns to ignore distractor documents in domain-specific RAG. The recipe then deploys the fine-tuned model and evaluates it against a baseline.

Agent loop shape. A teacher model deployed on Azure AI generates a synthetic RAFT dataset of question-document-answer triplets, each combining an oracle document with distractor documents. That dataset fine-tunes a student model on Azure AI Foundry so it learns to attend to the oracle and ignore the distractors. The fine-tuned model is then deployed and its performance evaluated against a baseline model.

Primary use cases

domain-specific RAG fine-tuning on Azure AI Foundry
synthetic RAFT training-data generation
training models to ignore distractor documents
evaluating a fine-tuned model against a baseline

flowchart TD fw["Azure AI Foundry RAFT fine-tuning recipe"] fw --> p1["RAFT<br/>(supported)"] fw --> p2["Eval Harness<br/>(supported)"]

Key concepts

Teacher-student distillation (docs) — A large teacher model (GPT-4o or Llama 3.1 405B) deployed on Azure AI generates the synthetic training set, which is then used to fine-tune a much smaller student model (GPT-4o-mini or Llama 3.1 8B) so the cheaper student inherits the domain RAG behaviour.
Oracle and distractor documents → raft — Each generated training triplet pairs an oracle document that actually answers the question with one or more distractor documents that do not, teaching the student to attend to the oracle and ignore the distractors at inference time.
Gorilla RAFT method (docs) — The synthetic-data generation follows UC Berkeley's Gorilla project RAFT (Retrieval Augmented Fine-Tuning) recipe, a published method for fine-tuning a model to better use retrieved context.