Framework · Browser & Computer-Use

Stagehand

Browserbase's open-source SDK for browser agents — a Playwright-based framework with three natural-language primitives (act, extract, observe) plus an agent() mode that supports computer-use models from OpenAI, Anthropic, Google and Microsoft.

Description

Stagehand layers AI primitives onto Playwright. The act() primitive runs a single, self-healing action described in plain English; extract() pulls structured data validated against a Zod or JSON schema; observe() returns the actionable elements on a page so that a plan can be built before execution. The agent() primitive turns a high-level instruction into a fully autonomous browser workflow and supports both DOM-based reasoning with any LLM and coordinate-based computer-use models (e.g. Google gemini-2.5-computer-use-preview, OpenAI computer-use models). Stagehand is MIT-licensed and developed by Browserbase, and by default it runs against Browserbase.
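The observe-then-act flow and schema-validated extraction described above can be sketched without a browser. This is a minimal illustration, not Stagehand's API: the PagePrimitives interface, the FakePage stand-in, planThenAct and parseProduct are all hypothetical names introduced here, and a hand-written validator stands in for a Zod or JSON schema.

```typescript
// Hypothetical, simplified shapes for illustration; NOT Stagehand's real types.
interface CandidateAction {
  selector: string;          // element the action targets
  description: string;       // plain-English description of that element
  method: "click" | "fill";  // what to do with it
}

interface PagePrimitives {
  observe(instruction: string): Promise<CandidateAction[]>;
  act(action: CandidateAction): Promise<void>;
  extract<T>(instruction: string, validate: (raw: unknown) => T): Promise<T>;
}

// In-memory stand-in so the sketch runs without a browser or an LLM.
class FakePage implements PagePrimitives {
  readonly performed: string[] = [];
  private readonly candidates: CandidateAction[];
  private readonly rawData: unknown;

  constructor(candidates: CandidateAction[], rawData: unknown) {
    this.candidates = candidates;
    this.rawData = rawData;
  }

  // A real implementation would ask a model which elements are actionable.
  async observe(_instruction: string): Promise<CandidateAction[]> {
    return [...this.candidates];
  }

  // Records the action instead of driving a browser.
  async act(action: CandidateAction): Promise<void> {
    this.performed.push(`${action.method} ${action.selector}`);
  }

  // Returns canned data, but only after it passes the caller's validator.
  async extract<T>(_instruction: string, validate: (raw: unknown) => T): Promise<T> {
    return validate(this.rawData);
  }
}

interface Product {
  name: string;
  price: number;
}

// Hand-rolled validator standing in for a schema; throws on a shape mismatch.
function parseProduct(raw: unknown): Product {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("expected an object");
  }
  const record = raw as Record<string, unknown>;
  if (typeof record.name !== "string" || typeof record.price !== "number") {
    throw new Error("expected { name: string, price: number }");
  }
  return { name: record.name, price: record.price };
}

// Observe first, choose one candidate, then act on it.
async function planThenAct(
  page: PagePrimitives,
  instruction: string,
  pick: (candidates: CandidateAction[]) => CandidateAction | undefined,
): Promise<CandidateAction | undefined> {
  const candidates = await page.observe(instruction);
  const chosen = pick(candidates);
  if (chosen) {
    await page.act(chosen);
  }
  return chosen;
}
```

Separating observe from act in this way lets the caller inspect or filter the candidate actions before anything on the page changes, which is the planning step the description refers to.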

Solution

Stagehand is a Playwright-based SDK with two surfaces. The atomic surface is act / extract / observe: each call is one LLM round-trip that runs an action, pulls structured data, or returns candidate actions. The autonomous surface is agent(), which runs a multi-step loop until the task finishes. In CUA (computer-use agent) mode the agent grounds actions in screen coordinates; in DOM mode it grounds them in serialised DOM nodes; in hybrid mode it can use both.
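The autonomous surface can be pictured as a bounded decide-and-act loop whose allowed grounding depends on the mode. The sketch below is an assumption-laden illustration rather than Stagehand's implementation: Grounding, Step, Decide, isAllowed, runAgent and the maxSteps budget are hypothetical names, and the decide callback stands in for the model call.

```typescript
// Hypothetical types for illustration; NOT Stagehand's real API.
type Grounding =
  | { kind: "dom"; nodeId: string }            // a serialised DOM node
  | { kind: "coords"; x: number; y: number };  // a screen coordinate

interface ActionStep {
  done: false;
  action: string;
  target: Grounding;
}

interface DoneStep {
  done: true;
  summary: string;
}

type Step = ActionStep | DoneStep;
type Mode = "dom" | "cua" | "hybrid";

// Stand-in for the model: given the instruction and history, propose the next step.
type Decide = (instruction: string, history: readonly ActionStep[]) => Promise<Step>;

// DOM mode accepts only DOM targets, CUA mode only coordinates, hybrid mode both.
function isAllowed(mode: Mode, target: Grounding): boolean {
  if (mode === "hybrid") return true;
  return mode === "dom" ? target.kind === "dom" : target.kind === "coords";
}

interface AgentResult {
  completed: boolean;
  steps: ActionStep[];
  summary?: string;
}

// Loop until the model reports completion or the step budget runs out.
async function runAgent(
  instruction: string,
  mode: Mode,
  decide: Decide,
  maxSteps = 10,
): Promise<AgentResult> {
  const steps: ActionStep[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await decide(instruction, steps);
    if (step.done) {
      return { completed: true, steps, summary: step.summary };
    }
    if (!isAllowed(mode, step.target)) {
      throw new Error(`${step.target.kind} grounding is not allowed in ${mode} mode`);
    }
    // A real agent would execute the action in the browser at this point.
    steps.push(step);
  }
  return { completed: false, steps };
}
```

The step budget is a design choice of this sketch: it keeps a loop that never reports completion from running indefinitely.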

Primary use cases

AI-driven browser automation with natural-language actions
Structured data extraction from web pages with schema validation
Exploration and planning of multi-step browser workflows
Computer-use agents driving Playwright via OpenAI / Anthropic / Google models
