Browser Use
Type: full-code · Vendor: Browser Use (Magnus Mueller, Gregor Zunic) · Language: Python · License: MIT · Status: active · Status in practice: mature · First released: 2024-10-31
Open-source Python library that wraps a Playwright-controlled browser into an agent loop driven by any of 15+ LLM providers, with a paid stealth-browser cloud as the production tier.
Description. Browser Use exposes three primary objects — Agent, Browser and a chat client — that together let a language model navigate, click, type, scroll and extract data from real web pages. The agent extracts an annotated DOM, optionally adds screenshots for vision, and emits structured actions executed by the underlying Playwright browser. The library is MIT-licensed and integrates with OpenAI, Anthropic, Google Gemini, Azure, Bedrock, Groq, Ollama, DeepSeek, OpenRouter and others. A paid managed cloud at api.browser-use.com layers stealth Chromium, residential proxies in 195+ countries and CAPTCHA solving on top.
Agent loop shape. Per-step Playwright-backed loop. Each turn, the Agent snapshots the page (annotated DOM plus optional screenshot), the LLM emits up to max_actions_per_step structured actions, the Controller executes them through Playwright, and the result feeds back into the next observation. The loop runs under agent.run() until the task completes or max_steps is reached.
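The loop shape described above can be sketched without the library. The following is a minimal, library-free illustration, not Browser Use's actual implementation: FakeBrowser, Observation, Action, scripted_policy and the "done" action name are all hypothetical stand-ins for the real Browser, LLM and Controller. Only the two budgets, max_steps and max_actions_per_step, take their names from the description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observation:
    """What the model sees each turn (hypothetical shape)."""
    url: str
    elements: list[str]
    last_result: str = ""


@dataclass
class Action:
    """One structured action emitted by the model (hypothetical shape)."""
    name: str
    arg: str = ""


class FakeBrowser:
    """Hypothetical stand-in for a Playwright-controlled browser."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.log: list[str] = []

    def snapshot(self, last_result: str) -> Observation:
        # A real agent would return an annotated DOM and maybe a screenshot.
        return Observation(self.url, ["[0] input#q", "[1] button#go"], last_result)

    def execute(self, action: Action) -> str:
        self.log.append(action.name)
        if action.name == "navigate":
            self.url = action.arg
        return f"{action.name} ok"


def scripted_policy(obs: Observation) -> list[Action]:
    """Stands in for the LLM: picks actions from the current observation."""
    if obs.url == "about:blank":
        return [Action("navigate", "https://example.com")]
    return [Action("type", "hello"), Action("click", "1"), Action("done")]


def run_loop(
    browser: FakeBrowser,
    policy: Callable[[Observation], list[Action]],
    max_steps: int = 10,
    max_actions_per_step: int = 4,
) -> Optional[int]:
    """Observe, act, feed the result back; return the finishing step or None."""
    last_result = ""
    for step in range(1, max_steps + 1):
        obs = browser.snapshot(last_result)
        # Enforce the per-turn action budget by truncating the model's output.
        for action in policy(obs)[:max_actions_per_step]:
            if action.name == "done":  # hypothetical completion signal
                return step
            last_result = browser.execute(action)
    return None  # step budget exhausted before the task completed


if __name__ == "__main__":
    demo = FakeBrowser()
    print(run_loop(demo, scripted_policy), demo.log)
```

In this sketch the per-turn budget matters: with max_actions_per_step=1 the scripted policy never reaches its "done" action, so the run ends only when max_steps is exhausted.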
Primary use cases
- natural-language web automation and form filling
- agent-driven data extraction from authenticated sites
- browser testing and monitoring via LLM-written tasks
- production scraping behind a stealth cloud with residential proxies
Key concepts
- Agent → browser-agent (docs) — High-level entry point that owns the task, LLM and step loop.
- Browser (docs) — Playwright-controlled browser instance the Agent drives.
- Annotated DOM + optional vision (docs) — DOM elements are extracted and indexed; screenshots are included when the vision mode requests them.
- max_actions_per_step (docs) — Per-turn action budget; defaults to 4 so the model can submit a whole form in one step.
- Multi-provider LLM support → multi-model-routing (docs) — Native integrations for 15+ providers plus LiteLLM and OpenAI-compatible base_url.
- Browser Use Cloud (docs) — Paid stealth-browser API with residential proxies, CAPTCHA solving and managed infrastructure.
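To make the "annotated DOM" concept concrete, here is a small sketch of how interactive elements might be extracted and numbered so a model can refer to them by index. It uses only Python's standard html.parser; the INTERACTIVE_TAGS set, the ElementIndexer class and the "[i] <tag> label" output format are assumptions made for illustration and do not reflect Browser Use's real extraction code, which works against a live browser page.

```python
from html.parser import HTMLParser

# Assumption: tags treated as "interactive" for this illustration only.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class ElementIndexer(HTMLParser):
    """Collects interactive start tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[tuple[str, dict]] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append((tag, dict(attrs)))


def annotate(html: str) -> list[str]:
    """Return one indexed line per interactive element."""
    indexer = ElementIndexer()
    indexer.feed(html)
    lines = []
    for i, (tag, attrs) in enumerate(indexer.elements):
        # Prefer a stable identifier; fall back to an empty label.
        label = attrs.get("id") or attrs.get("name") or attrs.get("href") or ""
        lines.append(f"[{i}] <{tag}> {label}".rstrip())
    return lines


if __name__ == "__main__":
    page = '<form><input name="q"><button id="go">Search</button></form>'
    for line in annotate(page):
        print(line)
```

An index-based listing like this lets an action such as "click element 1" be expressed as a small structured payload instead of a CSS selector or screen coordinate.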
Patterns this full-code implements
- ★Browser Agent
Core abstraction — Agent class drives a real browser to complete natural-language tasks.
- ★★Tool Use
Structured browser actions (click, type, scroll, navigate, extract) are the LLM's tool surface; MCP registry adds external tools.
- ★★Multi-Model Routing
15+ provider integrations plus LiteLLM and any OpenAI-compatible endpoint.
- ★★Model Context Protocol
Browser Use ships a documented MCP server surface that exposes Autonomous Agent Tools, Direct Browser Control (browser_navigate / browser_click / browser_type / browser_get_state), Tab Management, Co…
- ★★ReAct
Per-step observe (DOM + optional screenshot) → act (structured actions) → observe again loop.
- ★★Session Isolation
Browser Use Cloud advertises stealth Chromium sessions per agent run; the local library uses standard Playwright contexts.
- ★Computer Use
Honest downgrade: Browser Use acts primarily through the DOM, not coordinate-based screen control. Vision is an opt-in supplement, not the headline modality.
- ★★Plan-and-Execute
Re-verified 2026-05-24 against docs.browser-use.com customize/agent-settings — no planner, planning mode or plan_steps primitive is documented; max_actions_per_step is a within-step batch, not an exp…