Ollama
Type: full-code · Vendor: Ollama · Language: Go · License: MIT · Status: active · Status in practice: mature · First released: 2023-07-07
Ollama downloads and runs open-weight language models on the operator's own machine and serves them over a local API, including an OpenAI-compatible endpoint, so applications can run inference without sending prompts to a third party.
Description. Ollama is an open-source runtime for running large language models locally. It pulls model weights from a library, runs inference on the local machine, and exposes a local REST API on port 11434. It also provides an OpenAI-compatible endpoint so existing OpenAI client code can be pointed at the local instance by changing the base URL. Because inference runs locally, prompts and outputs do not leave the operator's machine unless cloud models are explicitly used.
Agent loop shape. Ollama is the inference engine an agent loop calls rather than the loop itself. A model is pulled into the local library and loaded on demand; the application sends a request naming the model to the local API (native or the OpenAI-compatible /v1 path) and receives a completion. The same endpoint serves any pulled model, so the caller selects which model answers by passing its name, and all computation stays on the local host.
Primary use cases
- running open-weight LLMs locally
- serving local inference over a REST API
- OpenAI-compatible drop-in for existing clients
- keeping prompts and data on the operator's own machine
Key concepts
- Model library / pull (docs) — A registry of open-weight models that `ollama pull` downloads to the local machine; `ollama ls` lists the local catalog the runtime can load on demand.
- Local API (port 11434) → sovereign-inference-stack (docs) — The REST endpoint Ollama serves on the local host for generate/chat/embeddings requests, so applications send a model name and receive completions without leaving the machine.
- OpenAI compatibility → provider-string-routing (docs) — A parallel /v1 endpoint that mimics the OpenAI API so existing OpenAI client code reaches a local model just by changing the base URL and model name.
- Modelfile (docs) — A declarative file specifying a base model plus parameters, system prompt, and template, used to build customized local models with `ollama create`.
Patterns this full-code implements —
- ★Sovereign Inference Stack
Ollama downloads and runs model weights and inference entirely on the operator's own machine and exposes a local API, so prompts and outputs never cross into a third-party API.
- ★Provider-String Routing
Ollama exposes an OpenAI-compatible API so existing clients reach a local model by setting the base URL and naming the model, letting callers select which served model answers via the standard provid…
- ★★Tool Use
Ollama supports tool (function) calling: the caller passes tool definitions with name, description, and parameter schema, the model decides when to emit a typed tool call, and the tool result is fed…
- ★★Structured Output
Ollama lets the caller enforce a JSON schema on model responses via the format field so replies conform to a typed shape for reliable downstream extraction; constraint is applied by the runtime rathe…
Neighbourhood
Click any neighbour to follow the lineage. Scroll to zoom, drag to pan.