Tool Use & Environment

Full-Desktop Computer Use

Give the agent a complete containerized OS desktop with native apps, a persistent filesystem, and desktop credential stores, so it can finish multi-application workflows a browser-only surface cannot.

Problem

A browser-only agent reaches web pages but cannot drive native desktop applications, install tools it needs mid-task, or hold a working filesystem across steps. A single-app or pixel-only Computer Use surface drives one screen at a time but provides no durable storage and no credential store, so logins, downloaded artifacts, and installed tooling evaporate between turns. Workflows that span a mail client, an editor, a terminal, and an authenticated portal stall because no single narrow surface covers all of them and nothing carries state across the application boundaries.

Solution

Provision a containerized desktop OS (for example, Ubuntu with a lightweight window manager) preloaded with a browser, mail client, editor, and terminal. The agent observes the screen and emits mouse and keyboard actions across the whole desktop rather than inside a single app. A mounted persistent filesystem retains downloads, installed packages, and intermediate artifacts across steps and sessions. A desktop password manager supplies credentials and handles two-factor prompts. The desktop itself is the sandbox: the agent has full scope inside the container and none outside it.
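As a rough illustration of the two moving parts described above, the Python sketch below builds a container launch command that mounts a persistent volume and then runs a simple observe-act loop over the whole desktop. It is a minimal sketch under stated assumptions, not a reference implementation: the image name `example/agent-desktop:latest`, the volume name `agent-home`, the `Desktop` interface, and the `policy` callable are all hypothetical names introduced for this example. Only the `docker run` flags shown (`--detach`, `--rm`, `--name`, `--volume`) are standard Docker CLI options.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol, Union


@dataclass(frozen=True)
class DesktopSpec:
    """Hypothetical description of the containerized desktop."""
    image: str = "example/agent-desktop:latest"  # assumed image name
    volume: str = "agent-home"                   # named volume that survives restarts
    home: str = "/home/agent"                    # mount point inside the container

    def docker_run_args(self) -> List[str]:
        # The named volume is what makes downloads and installed
        # packages persist across steps and sessions.
        return [
            "docker", "run", "--detach", "--rm",
            "--name", "agent-desktop",
            "--volume", f"{self.volume}:{self.home}",
            self.image,
        ]


@dataclass(frozen=True)
class Click:
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Done:
    summary: str


Action = Union[Click, TypeText, Done]


class Desktop(Protocol):
    """Assumed interface to the running desktop (screen plus input devices)."""
    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


def run_episode(
    desktop: Desktop,
    policy: Callable[[bytes], Action],
    max_steps: int = 50,
) -> Optional[str]:
    """Observe the screen, apply the chosen action, repeat until Done.

    Returns the policy's summary, or None if the step budget runs out.
    """
    for _ in range(max_steps):
        action = policy(desktop.screenshot())
        if isinstance(action, Done):
            return action.summary
        if isinstance(action, Click):
            desktop.click(action.x, action.y)
        elif isinstance(action, TypeText):
            desktop.type_text(action.text)
    return None
```

The loop never names an application: the policy sees the full screen and can move from the mail client to the terminal with an ordinary click. Passing `DesktopSpec().docker_run_args()` to `subprocess.run` would start the container, assuming an image like the one named here actually exists. The step budget is a design choice made for this sketch so that a stuck policy cannot run forever.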

When to use

  • The task spans several native desktop applications with no shared API.
  • State (downloads, installed tools, logins) must persist across steps or sessions.
  • The agent needs authenticated access through a desktop password manager, including 2FA.
