OpenAI Model Spec

Type: app · Vendor: OpenAI · Language: N/A · License: CC0-1.0 · Status: active · Status in practice: mature · First released: 2024-05-08

The OpenAI Model Spec is a public document that specifies the intended behaviour of OpenAI's models, defining an authority ordering across instruction sources and the rules for when the model should comply, refuse, or partially answer.

Description. The Model Spec is a human-authored specification of desired model behaviour for OpenAI's models in the API and ChatGPT. It defines a chain of command that assigns each instruction source a default authority level: Root rules come first, followed by Platform/System instructions, then Developer instructions, then User instructions, with Guideline-level defaults last, and each level overrides the levels below it. It also sets out when the assistant should refuse a request, decline prohibited help while still assisting with a permissible goal, or provide only neutral factual information. The document is released into the public domain under CC0.

Agent loop shape. The Model Spec is not a runtime but a behavioural contract the model is trained and evaluated against. On each request the model treats incoming instructions according to their position in the chain of command, applying higher-authority instructions over lower-authority ones, and follows the spec's rules for borderline requests: comply when permissible, refuse disallowed content, or offer a constrained partial answer. The spec thus governs how a model resolves conflicting instructions before any tool call or response is emitted.
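The precedence rule described above can be pictured with a toy model. This is only an illustrative sketch, not an algorithm taken from the Model Spec: the spec describes behaviour a model is trained towards, not code it runs. The names `Authority`, `Instruction`, `resolve`, and the `topic` key are hypothetical, and the tie-break (a later instruction wins at the same level) is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Authority(IntEnum):
    """Authority levels named in this entry; a lower number means higher authority."""
    ROOT = 0
    SYSTEM = 1       # this entry also refers to this level as Platform/System
    DEVELOPER = 2
    USER = 3
    GUIDELINE = 4


@dataclass(frozen=True)
class Instruction:
    source: Authority
    topic: str       # hypothetical key marking which instructions conflict
    directive: str


def resolve(instructions: list[Instruction]) -> dict[str, Instruction]:
    """For each topic, keep the instruction from the highest-authority source.

    Assumption for this sketch: at equal authority, the later instruction wins.
    """
    winners: dict[str, Instruction] = {}
    for inst in instructions:
        current = winners.get(inst.topic)
        if current is None or inst.source <= current.source:
            winners[inst.topic] = inst
    return winners


if __name__ == "__main__":
    messages = [
        Instruction(Authority.USER, "language", "reply in French"),
        Instruction(Authority.DEVELOPER, "language", "reply in English"),
        Instruction(Authority.GUIDELINE, "tone", "be concise"),
        Instruction(Authority.USER, "tone", "be detailed"),
    ]
    for topic, inst in resolve(messages).items():
        print(topic, "->", inst.source.name, inst.directive)
```

In this toy run the Developer instruction wins on `language` and the User instruction wins on `tone`, because each outranks the competing source. Real conflicts are not keyed by a tidy `topic` string, so the grouping step stands in for judgement the model has to exercise.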

Primary use cases

specifying intended model behaviour
defining instruction authority and conflict resolution
rules for refusal and partial compliance
shared reference for model alignment and evaluation

flowchart TD
    fw["OpenAI Model Spec"]
    fw --> p1["Priority Matrix (Conflict Resolution)<br/>(first-class)"]
    fw --> p2["Refusal<br/>(first-class)"]
    fw --> p3["Constitutional Charter<br/>(first-class)"]
    fw --> p4["Mandatory Red-Flag Escalation<br/>(supported)"]

Key concepts

Chain of command → priority-matrix-conflict-resolution (docs) — An ordering of instruction sources by default authority level (Root, then Platform/System, Developer, User, and Guideline) that the model uses to resolve conflicting instructions in a consistent order.
Levels of authority → constitutional-charter (docs) — Each section of the spec and each message role is tagged with one of the authority levels (root, system, developer, user, guideline), and higher-authority content overrides lower-authority content.
Seek the truth together → sycophancy — A behavioural principle directing the assistant to be honest and not sycophantic, pushing back when a request conflicts with established principles or the user's interests rather than agreeing to please.
Refusals and partial compliance → refusal (docs) — Rules for borderline requests: refuse disallowed content, decline the prohibited part of a request while still helping with a permissible goal, or give only neutral factual information.
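The outcomes listed under "Refusals and partial compliance" can be sketched as a small decision function. This is a hedged illustration only: the `Assessment` fields are hypothetical labels assumed to come from some upstream judgement, and the Model Spec does not define them or this branching order. In particular, `facts_only` is an invented flag, since this entry does not say when the neutral-information outcome applies.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    COMPLY = "comply"
    PARTIAL = "decline the prohibited part, help with the permissible goal"
    NEUTRAL_FACTS = "give only neutral factual information"
    REFUSE = "refuse"


@dataclass(frozen=True)
class Assessment:
    disallowed_part: bool      # some part of the request asks for disallowed content
    permissible_goal: bool     # there is an underlying goal that can still be helped with
    facts_only: bool = False   # hypothetical flag: only neutral facts are appropriate


def choose_outcome(a: Assessment) -> Outcome:
    """Map a (hypothetical) assessment of a request to one of the listed outcomes."""
    if a.disallowed_part and a.permissible_goal:
        return Outcome.PARTIAL
    if a.disallowed_part:
        return Outcome.REFUSE
    if a.facts_only:
        return Outcome.NEUTRAL_FACTS
    return Outcome.COMPLY


if __name__ == "__main__":
    mixed = Assessment(disallowed_part=True, permissible_goal=True)
    print(choose_outcome(mixed).value)
```

The point of the sketch is that refusal is one of several outcomes rather than the default: a request with a prohibited part and a legitimate goal lands on partial compliance, not outright refusal.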

Alternatives & relatives

Meta Llama Guard 3 (full-code · framework), complements: Llama Guard is a deployed input/output safety classifier; the Model Spec is the upstream behavioural contract a model is trained and evaluated against rather than a runtime filter.

Provenance

Last analyzed: 2026-06-17
Last updated: 2026-06-19
Verification status: needs-verification