App Exploration Phase
also known as Pre-Deployment Exploration, App Onboarding Crawl, UI Element Documentation
Before deploying an agent against an opaque app, have it explore (or watch a human demonstrate) the app, generating a per-element documentation knowledge base; at deployment, retrieve element docs to ground actions.
This pattern helps complete certain larger patterns —
- specialises Tool Discovery ★: Let the agent discover available tools at runtime rather than hardcoding the tool list at agent build time.
Context
A team is deploying an agent against a mobile or desktop app whose user interface exposes no public API and no accessibility metadata that names its controls. The only way to learn what a given button does, or which menu reveals a particular setting, is to interact with the app and observe what happens. The same app will be driven many times by many users.
Problem
Without any prior knowledge of what each element does, the agent has to guess on every screen of every task: it confuses the cancel button with the confirm button, misreads which icon opens search, and hallucinates the names of fields it has never seen. Every user task pays for the same rediscovery work, and a single misclick on a sensitive action (payment, deletion) cannot be undone by the agent reasoning harder next turn.
Forces
- Exploration costs time and money up front.
- Demonstrations require a human, but a single demo amortises across many deployments.
- App UIs change; the documentation goes stale and needs refresh.
- Documentation that is too verbose drowns the agent in irrelevant context at deployment.
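The staleness force can be made concrete with a cheap change detector. The sketch below is one possible approach, not something the pattern prescribes: it assumes each screen can be summarised as a set of (role, label) pairs, and the names `screen_fingerprint` and `is_stale` are hypothetical.

```python
import hashlib


def screen_fingerprint(elements):
    """Hash the set of (role, label) pairs visible on a screen.

    `elements` is an iterable of (role, label) tuples. How they are obtained
    (OCR, a detector, a view dump) is outside this sketch and is an assumption.
    """
    # Sort and de-duplicate so the fingerprint ignores detection order.
    canonical = "\n".join(
        f"{role}\t{label}" for role, label in sorted(set(elements))
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(recorded_fingerprint, current_elements):
    """True if the screen no longer matches what was seen during exploration."""
    return screen_fingerprint(current_elements) != recorded_fingerprint
```

A fingerprint recorded per screen during exploration lets deployment flag only the changed screens for refresh, rather than re-exploring the whole app.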
Example
A logistics company points its agent at an internal warehouse app it has never seen before. On every task the agent stumbles: it misreads which button submits, hallucinates field names, and clicks 'Cancel' thinking it confirms. The team runs an exploration phase first: a human demonstrates a few flows while the agent records each element's role and the surrounding context, building a per-element knowledge base. At deployment, the agent retrieves the relevant element docs before each click and stops guessing.
Solution
Therefore:
Split the agent's lifecycle into two phases. (1) Exploration: the agent autonomously interacts with the app, or watches a human demonstration, and writes per-element documentation covering what the element is, what it does, and when to use it. The documentation is stored as a structured knowledge base. (2) Deployment: for each task, the agent retrieves the relevant element docs (e.g. via vector search), injects them into its context, and then acts. Refresh the docs when the UI changes.
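A minimal sketch of the two phases follows, under stated assumptions. The names `ElementDoc`, `ElementKB`, and `build_context` are hypothetical, and retrieval uses plain word overlap as a stand-in for the vector search mentioned above, so that the example runs without any external library.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ElementDoc:
    """One knowledge-base entry, written during exploration."""
    element_id: str  # hypothetical stable identifier for the element
    what: str        # what the element is
    does: str        # what it does when activated
    when: str        # when to use it

    def text(self) -> str:
        return f"{self.what} {self.does} {self.when}"


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ElementKB:
    docs: dict[str, ElementDoc] = field(default_factory=dict)

    def record(self, doc: ElementDoc) -> None:
        """Exploration phase: store or overwrite the doc for one element."""
        self.docs[doc.element_id] = doc

    def retrieve(self, task: str, k: int = 3) -> list[ElementDoc]:
        """Deployment phase: up to k docs ranked by word overlap with the task."""
        query = tokens(task)
        scored = [(len(query & tokens(d.text())), d) for d in self.docs.values()]
        scored = [(score, d) for score, d in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [d for _, d in scored[:k]]


def build_context(task: str, kb: ElementKB) -> str:
    """Inject the retrieved element docs into the text the agent acts on."""
    lines = [
        f"- {d.element_id}: {d.what}. {d.does}. Use {d.when}."
        for d in kb.retrieve(task)
    ]
    return f"Task: {task}\nKnown elements:\n" + "\n".join(lines)
```

In a real deployment the overlap score would likely be replaced by embedding similarity; the split between `record` at exploration time and `retrieve` at deployment time is the part that reflects the pattern.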
What this pattern forbids. At deployment, the agent may not act on an element whose documentation is missing; missing-doc events trigger re-exploration rather than improvisation.
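The prohibition can be enforced as a guard around every action. This is a sketch only: `guarded_act` is a hypothetical name, `docs` is assumed to be a mapping from element id to documentation, and `act` and `re_explore` are callables supplied by the surrounding agent.

```python
def guarded_act(element_id, docs, act, re_explore):
    """Act only on documented elements; otherwise defer to re-exploration.

    docs:       mapping of element id to its documentation (assumed shape)
    act:        callable that performs the UI action on the element
    re_explore: callable that queues the element for re-exploration
    Returns True if the action ran, False if it was deferred.
    """
    if element_id not in docs:
        # Missing documentation: do not improvise, hand off instead.
        re_explore(element_id)
        return False
    act(element_id)
    return True
```

Returning a flag instead of raising keeps the decision about what to do next (pause the task, ask a human, retry after exploration) with the caller.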
The smaller patterns that complete this one —
- uses Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
And the patterns that stand alongside it, or against it —
- complements Skill Library ★: Let the agent grow its own toolkit by writing reusable skills that subsequent runs can call.
- complements Mobile UI Agent ★: Drive a smartphone end-to-end through a small, touch-native action vocabulary (tap, long-press, swipe, type, back, home) over screenshots, as a distinct interaction surface from desktop Computer Use and from web Browser Agents.