Full-Desktop Computer Use
Give the agent a complete containerized OS desktop with native apps, a persistent filesystem, and desktop credential stores, so it can finish multi-application workflows a browser-only surface cannot.
This pattern helps complete certain larger patterns —
- specialises Computer Use ★ — Let the model drive a desktop end-to-end via screenshots plus virtual mouse/keyboard tool calls instead of bespoke per-app APIs.
Context
A team needs an agent to complete real end-user workflows that cross several native applications: download an invoice in a mail client, edit it in a spreadsheet app, sign into a vendor portal through a password manager, then file the result in a local folder. The applications have no shared API, some live only on the desktop, and steps depend on files and logins that must survive from one step to the next.
Problem
A browser-only agent reaches web pages but cannot drive native desktop applications, install tools it needs mid-task, or hold a working filesystem across steps. A single-app or pixel-only Computer Use surface drives one screen at a time but provides no durable storage and no credential store, so logins, downloaded artifacts, and installed tooling evaporate between turns. Workflows that span a mail client, an editor, a terminal, and an authenticated portal stall because no single narrow surface covers all of them and nothing carries state across the application boundaries.
Forces
- Full-OS scope covers native apps a browser surface cannot reach.
- A persistent filesystem and installed tooling must survive across steps.
- Desktop credential stores enable logins and 2FA but widen the blast radius.
- A whole OS is heavier and slower to provision than a single browser tab.
Example
An operations agent must pull a PDF invoice from a mail client, reconcile it in a desktop spreadsheet, log into a supplier portal that requires two-factor authentication, and save a receipt locally. A browser-only agent cannot open the mail client or the spreadsheet app, and it loses the downloaded file between steps. Placed on a full containerized desktop with a persistent home directory and a configured password manager, the agent opens each native app in turn, keeps the invoice on disk across steps, and clears the 2FA prompt through the password manager extension.
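One way to make "keeps the invoice on disk across steps" concrete is a small manifest written to the persistent home directory, so that a later turn can see which steps are finished and where their files are. The sketch below is illustrative only: the step names, the `workflow_manifest.json` filename, and the helper functions are all assumptions, not part of the pattern.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical step names for the invoice workflow described above.
STEPS = ["download_invoice", "reconcile_spreadsheet", "portal_login", "save_receipt"]

# Hypothetical manifest filename, stored inside the persistent home directory.
MANIFEST_NAME = "workflow_manifest.json"


def load_manifest(home: Path) -> dict:
    """Read the manifest from the persistent home, or start an empty one."""
    path = home / MANIFEST_NAME
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "artifacts": {}}


def record_step(home: Path, step: str, artifact: Optional[str] = None) -> dict:
    """Mark a step complete and remember the file it produced, if any."""
    manifest = load_manifest(home)
    if step not in manifest["done"]:
        manifest["done"].append(step)
    if artifact is not None:
        manifest["artifacts"][step] = artifact
    (home / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2))
    return manifest


def next_step(home: Path) -> Optional[str]:
    """Return the first step not yet recorded as done, or None when finished."""
    done = set(load_manifest(home)["done"])
    return next((s for s in STEPS if s not in done), None)
```

Because the manifest lives on the mounted filesystem rather than in the agent's context, a new turn can call `next_step` and resume from the right application without repeating the download.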
Solution
Therefore:
Provision a containerized desktop OS (for example Ubuntu with a lightweight window manager) preloaded with a browser, mail client, editor, and terminal. The agent observes the screen and emits mouse and keyboard actions over the whole desktop, not one app. A mounted persistent filesystem retains downloads, installed packages, and intermediate artifacts across steps. A desktop password manager extension supplies credentials and handles two-factor prompts. The entire desktop is the sandbox: the agent has full scope inside it and none outside it.
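As a rough illustration of the provisioning step, the sketch below assembles a `docker run` command line that starts a desktop image with a named volume mounted as the agent's home directory. The image name, container name, volume name, and home path are hypothetical placeholders, and Docker itself is an assumption: the pattern only asks for a containerized desktop, not a particular runtime. The function only builds the argument list; it does not start anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesktopSpec:
    # All defaults are hypothetical placeholders, not values from the pattern.
    image: str = "example/agent-desktop:latest"  # desktop image with the apps preloaded
    name: str = "agent-desktop"                  # container name
    home_volume: str = "agent-home"              # named volume for persistent files
    home_path: str = "/home/agent"               # assumed home directory in the image


def docker_run_args(spec: DesktopSpec) -> list[str]:
    """Build the argv for starting the desktop container in the background."""
    return [
        "docker", "run", "--detach",
        "--name", spec.name,
        # Named volume: downloads and artifacts under the home directory
        # outlive the container itself.
        "--volume", f"{spec.home_volume}:{spec.home_path}",
        spec.image,
    ]
```

Passing the result to `subprocess.run(docker_run_args(DesktopSpec()), check=True)` would launch the container on a host with Docker installed. Note that only the home directory is mounted here; packages installed elsewhere in the container would not survive its removal unless those paths are also placed on a volume.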
What this pattern forbids. The agent must operate only inside the containerized desktop boundary; it must not reach the host OS or any resource outside the provisioned image.
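The boundary can be checked before the desktop is started. The sketch below, which assumes a Docker-style configuration, flags three settings that would let the agent reach the host: a bind mount of a host path, privileged mode, and the host network namespace. The function name and its parameters are hypothetical, and the checks are a minimal illustration rather than a complete sandbox policy.

```python
def boundary_violations(
    volumes: list[str],
    privileged: bool = False,
    network: str = "bridge",
) -> list[str]:
    """Return a description of each setting that exposes the host to the desktop."""
    problems = []
    for volume in volumes:
        if ":" not in volume:
            # A lone container path is an anonymous volume, not a host mount.
            continue
        source = volume.split(":", 1)[0]
        # In Docker's "source:target" syntax, a source that is a path
        # (rather than a volume name) bind-mounts a host directory.
        if source.startswith(("/", ".", "~")):
            problems.append(f"host bind mount: {volume}")
    if privileged:
        problems.append("privileged mode")
    if network == "host":
        problems.append("host network namespace")
    return problems
```

A provisioning script might refuse to launch when the returned list is non-empty. For example, `boundary_violations(["agent-home:/home/agent"])` returns an empty list, while mounting the host's Docker socket is reported as a violation.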
The smaller patterns that complete this one —
- uses Sandbox Isolation ★★ — Run agent-emitted code or actions in a contained environment with restricted filesystem, network, and process privileges.
And the patterns that stand alongside it, or against it —
- alternative-to Browser Agent ★ — Expose websites to the agent through a structured DOM/accessibility tree plus a small action vocabulary, sitting between raw HTML and pixel-level Computer Use.