MCP Server-Side Sampling
Let an MCP server, mid-tool-call, send a prompt back to the host through createMessage and use the host's model so the server does language work without holding its own model or key.
Problem
A tool that needs reasoning has two unappealing options if the server must supply the model itself. Embedding a model key in the server duplicates billing, leaks a second credential surface, and pins the server to one provider while the host may already be on another. Returning the raw material to the host and asking it to reason instead forces the tool's internal logic out into the host's prompt, where the server cannot control or sequence it. The server needs to borrow the host's existing model for a scoped step without owning it.
Solution
The host advertises a sampling capability to connected servers. When a tool handler reaches a step that needs language work, instead of calling a model directly it constructs a sampling request — messages, a model-preference hint, a token cap — and sends createMessage back up the connection. The host receives the request, applies its own policy (optionally surfacing it to the user, enforcing a budget, choosing the model), runs the completion on the model it already holds, and returns the text down to the server. The server folds that text into the rest of the tool's logic and returns the final tool result. The model, the key, and the spend stay with the host; the orchestration and the prompt construction stay with the server.
When to use
- A tool handler needs a language step (summarise, classify, draft, route) partway through, but the server should not own a model or a key.
- The host already holds the model, the spend, and the user relationship, and should remain the chokepoint for any reasoning the server triggers.
- The server must stay provider-agnostic and let the host's model choice run the completion.
Open the full interactive page →
Diagram, neighbourhood map, code examples, related patterns and full provenance.