Stagehand
Type: full-code · Vendor: Browserbase, Inc. · Language: TypeScript, Python · License: MIT · Status: active · Status in practice: mature · First released: 2024-03-24
Browserbase's open-source SDK for browser agents — a Playwright-based framework with three natural-language primitives (act, extract, observe) plus an agent() mode that supports computer-use models from OpenAI, Anthropic, Google and Microsoft.
Description. Stagehand layers AI primitives onto Playwright. The act() primitive runs single, self-healing actions described in plain English; extract() pulls structured data validated against Zod or JSON schemas; observe() returns the actionable elements on a page so a plan can be built before execution. The agent() primitive turns a high-level instruction into a fully autonomous browser workflow and supports both DOM-based reasoning with any LLM and coordinate-based computer-use models (e.g. Google gemini-2.5-computer-use-preview, OpenAI computer-use models). Stagehand is MIT-licensed and developed by Browserbase, and it runs against Browserbase by default.
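To make the idea of schema-validated extraction concrete, here is a minimal sketch of checking model-produced data against a schema before trusting it. It does not call Stagehand or Zod: Schema, validateExtraction and productSchema are hypothetical stand-ins, and the object literals stand in for whatever a model might return.

```typescript
// Hypothetical stand-in for a Zod/JSON schema: each field name maps to the
// primitive type it must have. This is NOT Stagehand's or Zod's real API.
type FieldType = "string" | "number" | "boolean";
type Schema = Record<string, FieldType>;

type ValidationResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; errors: string[] };

// Check an untrusted, model-produced value against the schema.
// Keys not named in the schema are dropped; missing or mistyped keys are reported.
function validateExtraction(raw: unknown, schema: Schema): ValidationResult {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, errors: ["expected an object"] };
  }
  const input = raw as Record<string, unknown>;
  const data: Record<string, unknown> = {};
  const errors: string[] = [];
  for (const [key, expected] of Object.entries(schema)) {
    const value = input[key];
    if (typeof value !== expected) {
      errors.push(`${key}: expected ${expected}, got ${typeof value}`);
    } else {
      data[key] = value;
    }
  }
  return errors.length === 0 ? { ok: true, data } : { ok: false, errors };
}

// Illustrative schema for a product page (assumed field names).
const productSchema: Schema = { title: "string", price: "number" };

// A well-formed result with one stray key, and a result with a mistyped price.
const good = validateExtraction({ title: "Desk lamp", price: 39.5, extra: 1 }, productSchema);
const bad = validateExtraction({ title: "Desk lamp", price: "39.50" }, productSchema);
```

The point of validating at this boundary is that downstream code only ever sees data of the declared shape; a mistyped field surfaces as an explicit error rather than as a silent bad value.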
Agent loop shape. Stagehand is a Playwright-based SDK with two surfaces. The atomic surface is act / extract / observe: each call is one LLM round-trip that runs an action, pulls structured data, or returns candidate actions. The autonomous surface is agent(), which runs a multi-step loop until the task finishes. In CUA mode the agent grounds actions in screen coordinates; in DOM mode it grounds them in serialised DOM nodes; in hybrid mode it can use both.
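The multi-step loop described above can be sketched as a decide-then-act cycle with a step budget. This is an illustration of the loop shape only, not Stagehand's implementation: Step, Policy, Executor, runAgent and the scripted stand-ins are all hypothetical names, and the policy would be an LLM call in a real agent.

```typescript
// Hypothetical types for illustration; not Stagehand's real API.
type Step =
  | { kind: "act"; instruction: string }
  | { kind: "done"; summary: string };

// A policy picks the next step from the task and the history so far.
type Policy = (task: string, history: string[]) => Promise<Step>;

// An executor performs one atomic action and reports what happened.
type Executor = (instruction: string) => Promise<string>;

interface AgentResult {
  completed: boolean;
  summary: string;
  history: string[];
}

// Loop until the policy reports "done" or the step budget is exhausted.
async function runAgent(
  task: string,
  policy: Policy,
  execute: Executor,
  maxSteps = 10,
): Promise<AgentResult> {
  const history: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await policy(task, history);
    if (step.kind === "done") {
      return { completed: true, summary: step.summary, history };
    }
    const observation = await execute(step.instruction);
    history.push(`${step.instruction} -> ${observation}`);
  }
  return { completed: false, summary: "step budget exhausted", history };
}

// Scripted stand-ins so the sketch runs without a browser or a model.
const scriptedPolicy: Policy = async (_task, history) => {
  const plan = ["open the pricing page", "click the annual toggle"];
  const next = plan[history.length];
  return next !== undefined
    ? { kind: "act", instruction: next }
    : { kind: "done", summary: "annual pricing is visible" };
};
const fakeExecute: Executor = async () => "ok";
```

A step budget is one simple way to keep an autonomous loop from running indefinitely when the task never reaches a finished state; whether and how Stagehand bounds its own loop is not stated in this entry.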
Primary use cases
- AI-driven browser automation with natural-language actions
- structured data extraction from web pages with schema validation
- exploration and planning of multi-step browser workflows
- computer-use agents driving Playwright via OpenAI / Anthropic / Google models
Key concepts
- act() → browser-agent (docs) — Run a single, self-healing browser action from natural language.
- extract() → structured-output (docs) — Pull schema-validated structured data from a page using Zod or JSON schemas.
- observe() → plan-and-execute (docs) — Discover actionable elements on a page and return structured actions you can replay or validate before acting.
- agent() → computer-use (docs) — Multi-step autonomous workflows; supports CUA, DOM and hybrid grounding modes.
- Multi-provider LLMs → multi-model-routing (docs) — Pluggable LLM provider; CUA mode supports OpenAI, Anthropic, Google and Microsoft computer-use models.
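The observe-then-act idea in the list above (discover candidate actions, validate them, then run only the approved ones) can be sketched as follows. CandidateAction, PageLike, planThenExecute, fakePage and noDestructive are hypothetical names chosen for this example; the fake page returns canned actions instead of driving a browser, and the real Stagehand types may differ.

```typescript
// Hypothetical shape of a candidate action returned by an observe-style call.
interface CandidateAction {
  description: string;
  selector: string;
  method: "click" | "fill";
}

// Hypothetical minimal page surface; a real page object would drive a browser.
interface PageLike {
  observe(instruction: string): Promise<CandidateAction[]>;
  act(action: CandidateAction): Promise<void>;
}

interface PlanOutcome {
  executed: string[];
  skipped: string[];
}

// Plan first, check each planned action against a caller-supplied rule,
// then execute only the actions that pass.
async function planThenExecute(
  page: PageLike,
  instruction: string,
  isAllowed: (action: CandidateAction) => boolean,
): Promise<PlanOutcome> {
  const plan = await page.observe(instruction);
  const outcome: PlanOutcome = { executed: [], skipped: [] };
  for (const action of plan) {
    if (isAllowed(action)) {
      await page.act(action);
      outcome.executed.push(action.description);
    } else {
      outcome.skipped.push(action.description);
    }
  }
  return outcome;
}

// In-memory fake so the sketch runs without a browser or a model.
const actLog: string[] = [];
const fakePage: PageLike = {
  async observe(): Promise<CandidateAction[]> {
    return [
      { description: "Fill the email field", selector: "#email", method: "fill" },
      { description: "Click Delete account", selector: "#delete", method: "click" },
      { description: "Click Subscribe", selector: "#subscribe", method: "click" },
    ];
  },
  async act(action: CandidateAction): Promise<void> {
    actLog.push(`${action.method} ${action.selector}`);
  },
};

// Example rule: refuse anything that looks destructive.
const noDestructive = (a: CandidateAction): boolean => !/delete/i.test(a.description);
```

Separating planning from execution gives the caller a checkpoint: the plan can be logged, filtered, or cached before any action touches the page.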
Patterns this full-code framework implements
- ★Browser Agent
Core framing of the SDK: natural-language browser automation.
- ★★Tool Use
act / extract / observe / agent are the documented tool surface; agent() also supports MCP integrations and custom tools.
- ★★Structured Output
extract() validates against Zod (TypeScript) or JSON schemas.
- ★★Plan-and-Execute
observe() returns structured actions you can run later; documented as a plan-then-execute pattern.
- ★Computer Use
agent() supports Computer-Using Agent (CUA) models from Google, OpenAI, Anthropic and Microsoft, plus a hybrid mode that combines coordinate and DOM grounding.
- ★★ReAct
agent() runs a multi-step observe-then-act loop until the high-level task is complete.
- ★★Multi-Model Routing
Pluggable LLM provider; CUA mode spans OpenAI, Anthropic, Google and Microsoft models.