Full-Code · Browser & Computer-Use · active

Stagehand

Type: full-code · Vendor: Browserbase, Inc. · Language: TypeScript, Python · License: MIT · Status: active · Status in practice: mature · First released: 2024-03-24

Browserbase's open-source SDK for browser agents — a Playwright-based framework with three natural-language primitives (act, extract, observe) plus an agent() mode that supports computer-use models from OpenAI, Anthropic, Google and Microsoft.

Description. Stagehand layers AI primitives onto Playwright. The act() primitive runs a single, self-healing action described in plain English; extract() pulls structured data and validates it against a Zod or JSON schema; observe() returns the actionable elements on a page so that a plan can be built before anything is executed. The agent() primitive turns a high-level instruction into a fully autonomous browser workflow and supports both DOM-based reasoning with any LLM and coordinate-based computer-use models (e.g. Google gemini-2.5-computer-use-preview, OpenAI computer-use models). Stagehand is MIT-licensed, developed by Browserbase, and runs against Browserbase sessions by default.
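The observe-then-act pattern described above can be sketched without a browser. This is a minimal illustration, not Stagehand's API: the CandidateAction and AtomicPage shapes, the planThenAct helper and the in-memory fakePage are all hypothetical names introduced here, and the calls are synchronous only to keep the sketch short (a real page would be asynchronous).

```typescript
// Hypothetical shapes for illustration; not Stagehand's actual types.
interface CandidateAction {
  description: string; // human-readable summary of the element
  selector: string; // where the action would be applied
  method: "click" | "fill";
}

interface AtomicPage {
  observe(instruction: string): CandidateAction[];
  act(action: CandidateAction): string;
}

// Plan first, then execute: list candidates, keep only the allowed ones, replay them.
function planThenAct(
  page: AtomicPage,
  instruction: string,
  isAllowed: (a: CandidateAction) => boolean,
): string[] {
  const candidates = page.observe(instruction);
  const plan = candidates.filter(isAllowed);
  return plan.map((action) => page.act(action));
}

// In-memory stand-in for a browser page so the sketch runs on its own.
const fakePage: AtomicPage = {
  observe: () => [
    { description: "Sign in button", selector: "#sign-in", method: "click" },
    { description: "Delete account button", selector: "#delete", method: "click" },
  ],
  act: (a) => `${a.method} ${a.selector}`,
};

const log = planThenAct(
  fakePage,
  "find the account buttons",
  (a) => a.selector !== "#delete",
);
console.log(log); // [ 'click #sign-in' ]
```

Separating discovery from execution is what lets a caller validate or filter candidate actions before anything touches the page.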

Agent loop shape. Stagehand is a Playwright-based SDK with two surfaces. The atomic surface is act / extract / observe: each call is one LLM round-trip that either runs an action, pulls structured data, or returns candidate actions. The autonomous surface is agent(), which runs a multi-step loop until the task finishes. In CUA (computer-use) mode the agent grounds actions in screen coordinates, in DOM mode it grounds them in serialised DOM nodes, and in hybrid mode it can use both.
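The loop shape and the three grounding modes can be modelled in a few lines. The sketch below is an assumption-laden illustration, not Stagehand's implementation: Grounding, Step, Mode, Model, modeAllows and runAgent are hypothetical names, the step budget is an added safeguard, and the "model" is a scripted function standing in for an LLM.

```typescript
// Hypothetical types for illustration; not Stagehand's actual API.
type Grounding =
  | { kind: "coordinates"; x: number; y: number } // CUA-style: a point on the screen
  | { kind: "dom"; nodeId: string }; // DOM-style: a serialised node

type Step =
  | { done: false; action: "click"; target: Grounding }
  | { done: true; summary: string };

type Mode = "cua" | "dom" | "hybrid";

// A model is anything that proposes the next step given the history so far.
type Model = (task: string, history: Step[]) => Step;

function modeAllows(mode: Mode, g: Grounding): boolean {
  if (mode === "hybrid") return true;
  return mode === "cua" ? g.kind === "coordinates" : g.kind === "dom";
}

// Multi-step loop: request a step, check its grounding, record it,
// and stop when the model reports completion or the step budget runs out.
function runAgent(task: string, model: Model, mode: Mode, maxSteps = 10): Step[] {
  const history: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(task, history);
    if (!step.done && !modeAllows(mode, step.target)) {
      throw new Error(`grounding "${step.target.kind}" not allowed in ${mode} mode`);
    }
    history.push(step);
    if (step.done) break;
  }
  return history;
}

// Scripted stand-in for an LLM: one DOM click, one coordinate click, then finish.
const scripted: Model = (_task, history) => {
  if (history.length === 0) {
    return { done: false, action: "click", target: { kind: "dom", nodeId: "n12" } };
  }
  if (history.length === 1) {
    return { done: false, action: "click", target: { kind: "coordinates", x: 40, y: 200 } };
  }
  return { done: true, summary: "submitted the form" };
};

const trace = runAgent("submit the form", scripted, "hybrid");
console.log(trace.length); // 3
```

Running the same scripted model in "dom" mode throws on the second step, because that step is grounded in coordinates rather than in a DOM node.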

Primary use cases

AI-driven browser automation with natural-language actions
structured data extraction from web pages with schema validation
exploration and planning of multi-step browser workflows
computer-use agents driving Playwright via OpenAI / Anthropic / Google models

flowchart TD
    dev[Developer] --> sh[Stagehand on Playwright]
    sh --> atomic{Primitive}
    atomic -->|act| act[act: one self-healing action]
    atomic -->|extract| ex[extract: schema-validated data]
    atomic -->|observe| obs[observe: candidate actions]
    atomic -->|agent| agentmode{Agent mode}
    agentmode -->|CUA| cua[Coordinate grounding]
    agentmode -->|DOM| dom[DOM-node grounding]
    agentmode -->|hybrid| hyb[Both]
    cua --> step[Multi-step loop]
    dom --> step
    hyb --> step
    step --> page[Real browser via Playwright]
    page --> step
    sh -.runs by default against.-> bb[Browserbase session]

Key concepts

act() → browser-agent (docs) — Run a single, self-healing browser action from natural language.
extract() → structured-output (docs) — Pull schema-validated structured data from a page using Zod or JSON.
observe() → plan-and-execute (docs) — Discover actionable elements on a page and return structured actions you can replay or validate before acting.
agent() → computer-use (docs) — Multi-step autonomous workflows; supports CUA, DOM and hybrid grounding modes.
Multi-provider LLMs → multi-model-routing (docs) — Pluggable LLM provider; CUA mode supports OpenAI, Anthropic, Google and Microsoft computer-use models.
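
The idea behind schema-validated extraction can be illustrated with a hand-rolled check. Stagehand itself takes Zod or JSON schemas; the tiny Schema type and validate function below are hypothetical stand-ins introduced only to show why model-produced data is checked against a declared shape before it is used.

```typescript
// Hypothetical mini-schema for illustration; Stagehand accepts Zod or JSON schemas instead.
type FieldType = "string" | "number";
type Schema = Record<string, FieldType>;

type ValidationResult =
  | { ok: true; value: Record<string, string | number> }
  | { ok: false; errors: string[] };

// Check that untrusted, model-produced JSON has every declared field with the declared type.
function validate(schema: Schema, raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, errors: ["expected an object"] };
  }
  const record = raw as Record<string, unknown>;
  const value: Record<string, string | number> = {};
  const errors: string[] = [];
  for (const key of Object.keys(schema)) {
    const expected = schema[key];
    const field = record[key];
    if (expected === "string" && typeof field === "string") value[key] = field;
    else if (expected === "number" && typeof field === "number") value[key] = field;
    else errors.push(`${key}: expected ${expected}, got ${typeof field}`);
  }
  return errors.length === 0 ? { ok: true, value } : { ok: false, errors };
}

const productSchema: Schema = { title: "string", price: "number" };

// Two simulated model outputs: one well-typed, one with the price as a string.
const good = validate(productSchema, JSON.parse('{"title":"Desk lamp","price":24.5}'));
const bad = validate(productSchema, JSON.parse('{"title":"Desk lamp","price":"24.50"}'));
console.log(good.ok, bad.ok); // true false
```

A schema library such as Zod does the same job with richer types and error reporting; the point here is only that a mistyped field is rejected rather than passed downstream.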
