Stagehand
Browserbase's open-source SDK for browser agents — a Playwright-based framework with three natural-language primitives (act, extract, observe) plus an agent() mode that supports computer-use models from OpenAI, Anthropic, Google and Microsoft.
Description
Stagehand layers AI primitives onto Playwright. The act() primitive runs single, self-healing actions described in plain English; extract() pulls structured data validated against Zod or JSON schemas; observe() returns the actionable elements on a page so a plan can be built before execution. The agent() primitive turns a high-level instruction into a fully autonomous browser workflow and supports both DOM-based reasoning with any LLM and coordinate-based computer-use models (e.g. Google gemini-2.5-computer-use-preview, OpenAI computer-use models). Stagehand is MIT-licensed and developed by Browserbase, against which it runs by default.
Solution
Playwright-based SDK with two surfaces. The atomic surface is act / extract / observe — each call is one LLM round-trip that either runs an action, pulls structured data, or returns candidate actions. The autonomous surface is agent(), which runs a multi-step loop until the task finishes; in CUA mode the agent grounds actions in screen coordinates, in DOM mode it grounds them in serialised DOM nodes, and in hybrid mode it can use both.
Primary use cases
- AI-driven browser automation with natural-language actions
- structured data extraction from web pages with schema validation
- exploration and planning of multi-step browser workflows
- computer-use agents driving Playwright via OpenAI / Anthropic / Google models
Open the full interactive page →
Diagram, neighbourhood map, code examples, related patterns and full provenance.