Full-Code · Browser & Computer-Use · active

Stagehand

Type: full-code  ·  Vendor: Browserbase, Inc.  ·  Language: TypeScript, Python  ·  License: MIT  ·  Status: active  ·  Status in practice: mature  ·  First released: 2024-03-24

Browserbase's open-source SDK for browser agents — a Playwright-based framework with three natural-language primitives (act, extract, observe) plus an agent() mode that supports computer-use models from OpenAI, Anthropic, Google and Microsoft.

Description. Stagehand layers AI primitives onto Playwright. The act() primitive runs single, self-healing actions described in plain English; extract() pulls structured data validated against Zod or JSON schemas; observe() returns the actionable elements on a page so a plan can be built before execution. The agent() primitive turns a high-level instruction into a fully autonomous browser workflow and supports both DOM-based reasoning with any LLM and coordinate-based computer-use models (e.g. Google gemini-2.5-computer-use-preview, OpenAI computer-use models). Stagehand is MIT-licensed and developed by Browserbase; by default it runs against Browserbase's platform.
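The observe-then-act idea described above (list candidate actions, check them, then run only the approved ones) can be sketched without a browser or an LLM. This is a minimal illustration under stated assumptions: CandidateAction, PageDriver, FakePage and planThenExecute are hypothetical names invented for this sketch and are not Stagehand's API or types.

```typescript
// Hypothetical shapes for illustration only; not Stagehand's real types.
interface CandidateAction {
  description: string;
  selector: string;
  method: "click" | "fill";
  args: string[];
}

interface PageDriver {
  observe(instruction: string): Promise<CandidateAction[]>;
  act(action: CandidateAction): Promise<void>;
}

// In-memory stand-in for a page, so the sketch runs without a browser or a model.
class FakePage implements PageDriver {
  readonly log: string[] = [];

  // Returns a fixed candidate list; a real observe step would ask an LLM.
  async observe(): Promise<CandidateAction[]> {
    return [
      { description: "Search box", selector: "#q", method: "fill", args: ["stagehand"] },
      { description: "Delete account button", selector: "#delete", method: "click", args: [] },
      { description: "Search button", selector: "#go", method: "click", args: [] },
    ];
  }

  // Records the action instead of touching a DOM.
  async act(action: CandidateAction): Promise<void> {
    this.log.push(`${action.method} ${action.selector}`);
  }
}

// Observe first, keep only the actions the caller approves, then execute them in order.
async function planThenExecute(
  page: PageDriver,
  instruction: string,
  isAllowed: (action: CandidateAction) => boolean,
): Promise<CandidateAction[]> {
  const candidates = await page.observe(instruction);
  const plan = candidates.filter(isAllowed);
  for (const step of plan) {
    await page.act(step);
  }
  return plan;
}
```

Separating discovery from execution gives the caller a point at which to reject or log actions before anything on the page changes, which is the planning role the description assigns to observe().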

Agent loop shape. Playwright-based SDK with two surfaces. The atomic surface is act / extract / observe: each call is one LLM round-trip that either runs an action, pulls structured data, or returns candidate actions. The autonomous surface is agent(), which runs a multi-step loop until the task finishes. In CUA (computer-use agent) mode the agent grounds actions in screen coordinates, in DOM mode it grounds them in serialised DOM nodes, and in hybrid mode it can use both.
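The autonomous loop shape can be sketched as follows: ask a model for the next step, stop when it reports completion, and cap the number of steps. This is an illustrative sketch, not Stagehand's implementation; Grounding, Decision, StepModel, ScriptedModel and runAgent are hypothetical names, the step cap is an assumption added for safety, and the scripted model stands in for a real LLM.

```typescript
// Hypothetical types: an action is grounded either in a DOM node or in screen coordinates.
type Grounding =
  | { mode: "dom"; nodeId: string }
  | { mode: "coordinate"; x: number; y: number };

type Decision =
  | { kind: "action"; verb: "click" | "type"; target: Grounding; text?: string }
  | { kind: "done"; summary: string };

interface StepModel {
  next(task: string, history: string[]): Promise<Decision>;
}

interface AgentResult {
  finished: boolean;
  history: string[];
  summary?: string;
}

// Render an action as text, branching on its grounding mode.
function describeAction(d: Extract<Decision, { kind: "action" }>): string {
  const where =
    d.target.mode === "dom" ? `node ${d.target.nodeId}` : `(${d.target.x}, ${d.target.y})`;
  return d.text === undefined ? `${d.verb} ${where}` : `${d.verb} "${d.text}" into ${where}`;
}

// Multi-step loop: runs until the model says "done" or maxSteps is reached.
async function runAgent(model: StepModel, task: string, maxSteps = 10): Promise<AgentResult> {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const decision = await model.next(task, history);
    if (decision.kind === "done") {
      return { finished: true, history, summary: decision.summary };
    }
    // A real agent would drive the browser here; the sketch only records the step.
    history.push(describeAction(decision));
  }
  return { finished: false, history };
}

// Stand-in for an LLM: replays a fixed list of decisions, one per completed step.
class ScriptedModel implements StepModel {
  private readonly script: Decision[];

  constructor(script: Decision[]) {
    this.script = script;
  }

  async next(_task: string, history: string[]): Promise<Decision> {
    return this.script[history.length] ?? { kind: "done", summary: "script exhausted" };
  }
}
```

A hybrid mode corresponds to a model that is free to return either Grounding variant from one step to the next, which is why the grounding is modelled per action rather than per run.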

Primary use cases

  • AI-driven browser automation with natural-language actions
  • structured data extraction from web pages with schema validation
  • exploration and planning of multi-step browser workflows
  • computer-use agents driving Playwright via OpenAI / Anthropic / Google models

Key concepts

  • act() [browser-agent]: Run a single, self-healing browser action from natural language.
  • extract() [structured-output]: Pull schema-validated structured data from a page using Zod or JSON schemas.
  • observe() [plan-and-execute]: Discover actionable elements on a page and return structured actions you can replay or validate before acting.
  • agent() [computer-use]: Multi-step autonomous workflows; supports CUA, DOM and hybrid grounding modes.
  • Multi-provider LLMs [multi-model-routing]: Pluggable LLM provider; CUA mode supports OpenAI, Anthropic, Google and Microsoft computer-use models.
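
To make "schema-validated" concrete, the sketch below checks untrusted model output against a declared shape before returning it. It is a stand-in only: in practice a Zod or JSON schema would do this work, and the Schema type and validateExtraction function here are hypothetical names that are not part of Stagehand.

```typescript
// Hypothetical mini-schema: field name -> expected primitive type.
type FieldType = "string" | "number" | "boolean";
type Schema = Record<string, FieldType>;
type Extracted = Record<string, string | number | boolean>;

// Accepts untrusted output and returns only the declared fields,
// throwing if a field is missing or has the wrong type.
function validateExtraction(raw: unknown, schema: Schema): Extracted {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("expected an object");
  }
  const input = raw as Record<string, unknown>;
  const out: Extracted = {};
  for (const [key, expected] of Object.entries(schema)) {
    const value = input[key];
    if (typeof value !== expected) {
      throw new Error(`field "${key}" should be ${expected}, got ${typeof value}`);
    }
    out[key] = value as string | number | boolean;
  }
  return out;
}
```

Fields that the schema does not declare are dropped, so downstream code sees only the shape it asked for.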
