Exploration vs Exploitation
Balance taking the best-known action (exploit) with trying alternatives that might be better (explore).
Problem
An agent that always picks the best-known option (pure exploitation) locks into whatever local optimum it stumbled on early and never discovers that a different tool or template would have worked better. An agent that always tries something new (pure exploration) burns budget on unproven options and never compounds what it has already learned. Setting the trade-off informally, by gut feel or occasional manual override, gives neither the predictable improvement of a scheduled policy nor the statistical guarantees, such as regret bounds, that bandit theory provides.
Solution
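Pick a selection strategy: epsilon-greedy (take the best-known option with probability 1-ε and a random option with probability ε), upper-confidence-bound or UCB (add a bonus to each option's estimated value that shrinks the more the option has been tried, then pick the highest total), or Thompson sampling (sample a plausible value for each option from its posterior distribution and pick the highest sample). Apply the strategy wherever the agent chooses among tools, strategies, or prompts. Track the outcome of every choice, update the estimates, and tune the strategy's parameters (such as ε) as results accumulate.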
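The three strategies can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes each outcome is a binary success or failure, uses the standard UCB1 bonus, and gives Thompson sampling a uniform Beta(1, 1) prior. The names `ArmStats`, `record`, and `run_trials` are hypothetical helpers introduced for this example, and the success rates passed to `run_trials` stand in for real outcomes that an agent would observe.

```python
import math
import random


class ArmStats:
    """Running outcome counts for one option (a tool, strategy, or prompt)."""

    def __init__(self):
        self.pulls = 0
        self.successes = 0

    @property
    def mean(self):
        # Observed success rate; 0.0 until the option has been tried.
        return self.successes / self.pulls if self.pulls else 0.0


def epsilon_greedy(stats, epsilon=0.1, rng=random):
    """Explore a random option with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    return max(stats, key=lambda name: stats[name].mean)


def ucb1(stats):
    """Pick the option with the highest mean plus exploration bonus (UCB1)."""
    for name, s in stats.items():
        if s.pulls == 0:
            return name  # try every option once before comparing bounds
    total = sum(s.pulls for s in stats.values())
    return max(
        stats,
        key=lambda name: stats[name].mean
        + math.sqrt(2 * math.log(total) / stats[name].pulls),
    )


def thompson(stats, rng=random):
    """Sample each success rate from a Beta posterior; pick the largest sample."""
    return max(
        stats,
        key=lambda name: rng.betavariate(
            1 + stats[name].successes,                      # assumed Beta(1, 1) prior
            1 + stats[name].pulls - stats[name].successes,
        ),
    )


def record(stats, name, success):
    """Track the outcome of one choice so later choices can use it."""
    stats[name].pulls += 1
    stats[name].successes += int(success)


def run_trials(choose, true_rates, rounds=1000, seed=0):
    """Simulate repeated choices against assumed (hypothetical) success rates."""
    rng = random.Random(seed)
    stats = {name: ArmStats() for name in true_rates}
    for _ in range(rounds):
        name = choose(stats)
        record(stats, name, rng.random() < true_rates[name])
    return stats
```

For example, `run_trials(ucb1, {"tool_a": 0.2, "tool_b": 0.8})` returns per-option counts showing how often each option was chosen. In a real agent, the simulated draw inside `run_trials` would be replaced by the observed result of actually using the chosen tool or prompt.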
When to use
- The agent chooses repeatedly among options (tools, strategies, prompts) and outcomes can be tracked.
- Pure exploitation is locking the agent into local optima.
- There is room to pick a strategy (epsilon-greedy, UCB, or Thompson sampling) and tune it for the workload.