Tool Use & Environment

App Exploration Phase

Before deploying an agent against an opaque app, have it explore the app (or watch a human demonstrate it) and write a knowledge base that documents each UI element; at deployment, retrieve the relevant element docs to ground its actions.

Problem

Without any prior knowledge of what each element does, the agent has to guess on every screen of every task: it confuses the cancel button with the confirm button, misreads which icon opens search, and hallucinates the names of fields it has never seen. Every user task pays for the same rediscovery work, and a single misclick on a sensitive action (payment, deletion) cannot be undone by the agent reasoning harder next turn.

Solution

Split the agent's lifecycle into two phases. (1) Exploration: the agent autonomously interacts with the app, or watches a human demonstration, and writes documentation for each element it encounters: what the element is, what it does, and when to use it. Store these entries as a structured knowledge base. (2) Deployment: for each task, retrieve the relevant element docs (e.g. via vector search), inject them into the agent's context, and only then act. Refresh the docs when the UI changes.
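The two phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `ElementDoc`, `ElementKnowledgeBase`, `record`, `retrieve` and `build_prompt` are hypothetical, the sample element entries are invented for the example, and a bag-of-words cosine similarity stands in for the vector search a real system would more likely run over embeddings.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ElementDoc:
    """One knowledge-base entry written during exploration (hypothetical schema)."""
    element_id: str   # assumed identifier, e.g. a resource id or selector
    screen: str       # screen on which the element was seen
    what: str         # what the element is
    does: str         # what it does
    when_to_use: str  # when to use it

    def text(self) -> str:
        return " ".join([self.element_id, self.screen, self.what,
                         self.does, self.when_to_use])


def _tokens(text: str) -> Counter:
    # Lowercased word counts; a stand-in for a real embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ElementKnowledgeBase:
    def __init__(self) -> None:
        self._docs: dict[str, ElementDoc] = {}

    def record(self, doc: ElementDoc) -> None:
        # Exploration phase: re-recording an id overwrites the old entry,
        # which is also how docs are refreshed after a UI change.
        self._docs[doc.element_id] = doc

    def retrieve(self, task: str, k: int = 3) -> list[ElementDoc]:
        # Deployment phase: rank entries by similarity to the task text
        # and drop entries with no overlap at all.
        query = _tokens(task)
        scored = [(_cosine(query, _tokens(d.text())), d)
                  for d in self._docs.values()]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:k]]


def build_prompt(task: str, docs: list[ElementDoc]) -> str:
    # Inject the retrieved docs into the context before the agent acts.
    lines = [f"Task: {task}", "Known UI elements:"]
    for d in docs:
        lines.append(f"- {d.element_id} ({d.screen}): {d.what}. "
                     f"{d.does}. Use: {d.when_to_use}.")
    return "\n".join(lines)


# Exploration: entries the agent (or a human demo) might produce.
kb = ElementKnowledgeBase()
kb.record(ElementDoc("btn_confirm", "checkout", "green button labelled Pay",
                     "submits the payment and cannot be undone",
                     "use only after the user has approved the total"))
kb.record(ElementDoc("btn_cancel", "checkout", "grey button labelled Cancel",
                     "discards the cart and returns to the home screen",
                     "use when the user abandons the purchase"))
kb.record(ElementDoc("icon_search", "home", "magnifier icon in the top bar",
                     "opens the search field",
                     "use to find a contact or message by name"))

# Deployment: retrieve, then build the grounded prompt.
task = "search for a contact"
prompt = build_prompt(task, kb.retrieve(task))
```

Keying the store by element id keeps refreshes cheap: after a UI change, only the affected elements need to be re-explored and re-recorded, and retrieval picks up the new entries without any other change.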

When to use

  • The agent must operate against an opaque app with no API documentation for its UI elements.
  • The agent will be deployed against the same app many times, amortising up-front exploration cost.
  • Per-element semantics (what each control does and when to use it) are stable enough to document once.
