MCP Server-Side Sampling

also known as Reasoning Inversion (MCP), createMessage Callback

Let an MCP server, mid-tool-call, send a prompt back to the host through createMessage and use the host's model so the server does language work without holding its own model or key.

This pattern helps complete certain larger patterns —

specialisesModel Context Protocol★★— Standardise how agents discover and call tools so that a tool written once is usable by any conformant agent.

Context

An MCP server exposes tools to a host that owns the model, the credentials, and the user relationship. Some of those tools need a language step partway through — summarising a fetched document, classifying a record, drafting a reply, deciding which branch of an internal workflow to take. The protocol now lets the host expose a sampling primitive to servers, so the direction of the call can reverse: the server, instead of only returning a result, can ask the host's model for a completion.

Problem

A tool that needs reasoning has two unappealing options if the server must supply the model itself. Embedding a model key in the server duplicates billing, leaks a second credential surface, and pins the server to one provider while the host may already be on another. Returning the raw material to the host and asking it to reason instead forces the tool's internal logic out into the host's prompt, where the server cannot control or sequence it. The server needs to borrow the host's existing model for a scoped step without owning it.

Forces

A server that carries its own model key duplicates cost and credentials and pins itself to one provider, while a server with no model cannot do the language step its tool requires.
Reasoning done inside the tool stays encapsulated and sequenced, but reasoning pushed back to the host's prompt leaks the tool's internal logic into the caller.
A callback up to the host's model adds a network round-trip and a point where the host may deny, rate-limit, or modify the request, against the convenience of the server reasoning locally.
The host owns the user relationship and the spend, so an unbounded server-issued completion request is a trust and budget hazard the host must be able to gate.

Example

A document-fetcher MCP server exposes a 'summarise this contract' tool. Rather than carry its own model key, the server fetches the PDF, extracts the text, and sends a createMessage request back to the host: 'summarise the obligations in this text'. The host runs the completion on the model it already holds, returns the summary, and the server attaches it to the tool result — the server never touched a model credential.

Diagram

sequenceDiagram participant H as Host (model + key) participant S as MCP server tool H->>S: call tool(args) S->>S: do work up to reasoning step S->>H: createMessage(prompt, model hint, cap) Note over H: approve / budget / pick model H-->>S: completion text S->>S: fold completion into tool logic S-->>H: tool result

Solution

Therefore:

The host advertises a sampling capability to connected servers. When a tool handler reaches a step that needs language work, instead of calling a model directly it constructs a sampling request — messages, a model-preference hint, a token cap — and sends createMessage back up the connection. The host receives the request, applies its own policy (optionally surfacing it to the user, enforcing a budget, choosing the model), runs the completion on the model it already holds, and returns the text down to the server. The server folds that text into the rest of the tool's logic and returns the final tool result. The model, the key, and the spend stay with the host; the orchestration and the prompt construction stay with the server.

What it gives you

A server gains a language step without shipping a model key, removing a credential surface and the duplicate billing that comes with it.
The server stays provider-agnostic: the host's model choice, not the server's, runs the completion.
The host keeps a single chokepoint for approval, budget, and model selection over every reasoning step its servers trigger.
The tool's multi-step internal logic stays encapsulated on the server rather than leaking into the host's prompt.

What it costs you

Each reasoning step is a round-trip the host can deny, throttle, or alter, so a tool that depends on sampling fails when the host withholds the capability.
A server-constructed prompt is attacker-influenced if it folds in tool inputs or fetched content, turning the callback into a prompt-injection path into the host's model.
Nested completions are hard to attribute: spend and latency from a deep server call surface on the host's bill without obvious provenance.
Not every host implements the sampling primitive, so a server that relies on it is not portable across all clients.

What this pattern forbids. The server must not call any model or hold any model credential of its own; every reasoning step it needs must be requested from the host through createMessage and may be denied, capped, or modified by the host.

And the patterns that stand alongside it, or against it —

composes-withMCP Bidirectional Bridge★— Run a framework as both MCP client (consuming external MCP servers as tools) and MCP server (publishing its own agents, tools, and workflows back over MCP) so capabilities flow both directions across the protocol boundary.
complementsComposite Service★— Expose one MCP tool that orchestrates several underlying API calls into a single higher-level operation, so the agent invokes a task-level capability instead of chaining many low-level endpoints.
complementsDual LLM Pattern★— Split agent work between a privileged model that holds tool access and a quarantined model that reads untrusted content, exchanging only opaque references between them.