Full-Code · Browser & Computer-Use · active

Skyvern

Type: full-code · Vendor: Skyvern-AI · Language: Python, TypeScript · License: AGPL-3.0 · Status: active · Status in practice: emerging

Automate browser-based workflows with AI using vision LLMs instead of brittle selectors.

Description. Playwright-compatible browser automation that combines vision LLMs with computer vision to interact with websites without per-site XPath or CSS selectors, so it can adapt to layout changes and work on sites it has never seen. It ships a no-code workflow builder alongside the SDK, and the project reports strong results on the WebVoyager benchmark and on form-filling tasks.

Agent loop shape. A vision LLM observes the rendered page, decides the next interaction, and acts through a Playwright-compatible layer, repeating the perceive→decide→act cycle until the task is done, without relying on code-defined selectors.
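
The loop shape described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Skyvern's implementation or API: `Action`, `run_agent_loop`, and the three callbacks are hypothetical names, the "page" is a plain dictionary standing in for a rendered screenshot, and the decision step is a hard-coded rule standing in for a vision LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """One step chosen by the decision stage (hypothetical structure)."""
    kind: str         # "type", "click", or "done"
    target: str = ""  # natural-language description of the element, not a selector
    text: str = ""    # text to enter when kind == "type"


def run_agent_loop(
    perceive: Callable[[], Dict],
    decide: Callable[[Dict, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat perceive -> decide -> act until the policy says "done" or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = perceive()                # e.g. a screenshot in a real system
        action = decide(observation, history)   # e.g. a vision-LLM call in a real system
        history.append(action)
        if action.kind == "done":
            break
        act(action)                             # e.g. a browser-automation call
    return history


def demo() -> List[Action]:
    """Drive the loop against a fake one-field form (all stubs are assumptions)."""
    page = {"email": "", "submitted": False}

    def perceive() -> Dict:
        return dict(page)  # snapshot of the current "rendered" state

    def decide(observation: Dict, history: List[Action]) -> Action:
        if observation["email"] == "":
            return Action("type", "email field", "a@example.com")
        if not observation["submitted"]:
            return Action("click", "submit button")
        return Action("done")

    def act(action: Action) -> None:
        if action.kind == "type":
            page["email"] = action.text
        elif action.kind == "click":
            page["submitted"] = True

    return run_agent_loop(perceive, decide, act)
```

Calling `demo()` returns the three actions the stub policy chose, in order. The point of the design is that `decide` only ever sees the observation and refers to elements by description, so nothing in the loop depends on a site-specific selector.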

Primary use cases

automating browser workflows across unfamiliar or changing sites
form-filling and data-entry automation without per-site selectors
no-code browser automation for non-technical users

flowchart TD
    TASK[Task / workflow] --> PERCEIVE[Vision LLM perceives page]
    PERCEIVE --> DECIDE[Decide next action]
    DECIDE --> ACT[Act via Playwright-compatible layer]
    ACT --> PERCEIVE
    DECIDE -->|done| OUT[Result]
    BUILDER[No-code workflow builder] -.defines.-> TASK

Key concepts

Vision-LLM interaction → computer-use (docs) — The agent sees the rendered page and acts, instead of parsing the DOM.
Workflow builder → visual-workflow-graph — No-code composition of multi-step browser automations.
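
To make the workflow-graph idea concrete, the sketch below models a multi-step automation as a small dependency graph and runs the steps in dependency order. It is an assumption-laden illustration only: `Step`, `WORKFLOW`, `execution_order`, and `run_workflow` are hypothetical names, the prompts are invented examples, and nothing here reflects how Skyvern's builder actually stores or executes workflows. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class Step:
    """One node in the workflow graph (hypothetical structure)."""
    prompt: str                                   # natural-language instruction for the agent
    depends_on: List[str] = field(default_factory=list)


# Hypothetical three-step workflow; each step waits for the one before it.
WORKFLOW: Dict[str, Step] = {
    "login": Step("Log in with the stored credentials"),
    "open_invoices": Step("Open the invoices page", depends_on=["login"]),
    "download_pdf": Step("Download the latest invoice", depends_on=["open_invoices"]),
}


def execution_order(workflow: Dict[str, Step]) -> List[str]:
    """Return step names so that every step comes after its dependencies."""
    graph = {name: set(step.depends_on) for name, step in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


def run_workflow(workflow: Dict[str, Step], run_step: Callable[[str], str]) -> Dict[str, str]:
    """Run each step's prompt through `run_step`, in dependency order."""
    return {name: run_step(workflow[name].prompt) for name in execution_order(workflow)}
```

In a real system `run_step` would hand each prompt to the browser agent; here any function from prompt to result works, for example `run_workflow(WORKFLOW, lambda prompt: "ok")`.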