Instructor
Type: full-code · Vendor: Jason Liu / community · Language: Python, TypeScript, Go, Ruby, Elixir, Rust · License: MIT · Status: active · Status in practice: mature · First released: 2023-07-28
Get reliable, type-safe structured data from any LLM by patching the provider client to accept a Pydantic response_model, validate the response, and retry with validation feedback when the model violates the schema.
Description. Instructor is the MIT-licensed library that turns Pydantic models into the contract between application code and an LLM. It patches each supported provider's chat-completion client to add response_model, max_retries, and context parameters, then wraps the call: when validation fails, the library reasks the model with the validation error attached so the next attempt converges on a valid object. Modes (TOOLS, JSON, MD_JSON, FUNCTIONS) select the underlying provider mechanism; from_provider exposes a unified entry point across 15+ providers (OpenAI, Anthropic, Google, Mistral, Cohere, Ollama, DeepSeek, Groq, and more). Instructor is positioned as an extraction library, not an agent framework: 'Instructor for extraction, PydanticAI for agents.'
Agent loop shape. Single-call extraction with internal retry, not an agent loop. The patched create() runs the request, parses the response into the response_model, validates it, and on ValidationError reasks the model up to max_retries times. There is no multi-step tool loop, no handoff, no session memory; agentic behaviour requires composing Instructor inside a higher-level framework.
Primary use cases
- extracting typed objects from LLM responses
- schema-validated tool-call arguments before execution
- multi-provider structured-output code that does not change when swapping models
- self-correcting JSON output via validation-feedback retries
Key concepts
- Patching (docs) — Adds response_model, max_retries, and context to the provider's create() method without changing original client code.
- response_model → structured-output (docs) — Pydantic BaseModel that defines the expected output shape; docstrings and field annotations feed the prompt.
- Modes (TOOLS / JSON / MD_JSON / FUNCTIONS) (docs) — Selects which provider-side mechanism Instructor uses to coerce the model into the schema. OpenAI / Anthropic / Google default to TOOLS.
- max_retries with validation feedback → self-refine (docs) — On Pydantic ValidationError Instructor retries with the validation error fed back to the model.
- from_provider (docs) — Unified entry point across all supported providers; recommended over manual patching.
Patterns this full-code implements —
- ★★Structured Output
Core purpose: response_model is a Pydantic class that defines the LLM output shape, drives the prompt via docstrings and field annotations, and is used to validate the response.
- ★★Self-Refine
On Pydantic ValidationError Instructor reasks the model with the validation feedback attached, up to max_retries times; this is the self-refine loop applied to structured-output extraction.
- ★★Multi-Model Routing
from_provider exposes a single API surface across 15+ providers (OpenAI, Anthropic, Google, Mistral, Cohere, Ollama, DeepSeek, Groq, ...) so the same response_model code works across vendors.
- ★★Tool Use
Instructor does not implement an agent tool loop; it uses the provider's tool-calling API (TOOLS mode) as the transport for structured output. Building a true tool-loop agent requires wrapping Instru…
Neighbourhood
Click any neighbour to follow the lineage. Scroll to zoom, drag to pan.