Framework · Model-Vendor Agents

Ollama

Ollama downloads and runs open-weight language models on the operator's own machine and serves them over a local API, including an OpenAI-compatible endpoint, so applications can run inference without sending prompts to a third party.

Description

Ollama is an open-source runtime for running large language models locally. It pulls model weights from a library, runs inference on the local machine, and exposes a local REST API on port 11434. It also provides an OpenAI-compatible endpoint so existing OpenAI client code can be pointed at the local instance by changing the base URL. Because inference runs locally, prompts and outputs do not leave the operator's machine unless cloud models are explicitly used.

Solution

Ollama is the inference engine an agent loop calls rather than the loop itself. A model is pulled into the local library and loaded on demand; the application sends a request naming the model to the local API (native or the OpenAI-compatible /v1 path) and receives a completion. The same endpoint serves any pulled model, so the caller selects which model answers by passing its name, and all computation stays on the local host.

Primary use cases

  • running open-weight LLMs locally
  • serving local inference over a REST API
  • OpenAI-compatible drop-in for existing clients
  • keeping prompts and data on the operator's own machine

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.