Cognition & Introspection

Reflexive Metacognitive Agent

The agent maintains an explicit self-model of its own capabilities, confidence, and limitations, and reasons over that model when deciding whether to accept, refuse, or hand off a task.

Problem

Without an explicit self-model, the agent has no principled way to refuse tasks outside its competence or to hand them off to a more suitable peer. Refusals are ad hoc, driven by prompt-level instructions that are applied inconsistently across calls. This pattern differs from confidence-reporting, which is per-output, by making the self-model an *input* to decision-making rather than only an output.

Solution

The self-model is a structured artifact: {capabilities: [...], confidence-by-task-class: {...}, declared-limitations: [...]}. At task acceptance, the agent reasons over the self-model: Does this task fall within my capabilities? What is my confidence for this task class? Are any declared limitations triggered? The output is one of accept, refuse-with-reason, or handoff-to-peer-with-capability-X. The self-model is refreshed periodically against eval-suite results. Pair with confidence-reporting, decentralized-swarm-handoff, refusal, and typed-refusal-codes.
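The acceptance check described above can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes each task carries a single required capability, a task class, and a set of tags that can trigger declared limitations. The names `SelfModel`, `Task`, `Decision`, `decide`, `find_peer`, `refresh`, the peer registry (peer name mapped to its capabilities), and the 0.7 confidence threshold are all hypothetical. Refreshing is modeled as replacing per-class confidence with the observed eval pass rate, which is only one simple choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SelfModel:
    # Mirrors {capabilities, confidence-by-task-class, declared-limitations}.
    capabilities: Set[str]
    confidence_by_task_class: Dict[str, float]
    declared_limitations: Set[str]


@dataclass
class Task:
    task_class: str                              # hypothetical task taxonomy label
    required_capability: str                     # assumption: one capability per task
    tags: Set[str] = field(default_factory=set)  # matched against declared limitations


@dataclass
class Decision:
    action: str                 # "accept", "refuse" or "handoff"
    reason: str
    peer: Optional[str] = None  # set only for handoffs


def find_peer(peers: Dict[str, Set[str]], capability: str) -> Optional[str]:
    # Deterministic choice: first peer (by name) that declares the capability.
    for name in sorted(peers):
        if capability in peers[name]:
            return name
    return None


def decide(model: SelfModel, task: Task, peers: Dict[str, Set[str]],
           min_confidence: float = 0.7) -> Decision:
    # 1. Declared limitations act as hard stops: refuse and name what was triggered.
    triggered = model.declared_limitations & task.tags
    if triggered:
        return Decision("refuse", f"declared limitation triggered: {sorted(triggered)}")

    # 2. Capability check: outside competence, hand off if some peer declares it.
    cap = task.required_capability
    if cap not in model.capabilities:
        peer = find_peer(peers, cap)
        if peer is not None:
            return Decision("handoff", f"peer declares capability {cap!r}", peer)
        return Decision("refuse", f"no capability {cap!r} and no peer declares it")

    # 3. Confidence check for this task class; unknown classes default to 0.0.
    confidence = model.confidence_by_task_class.get(task.task_class, 0.0)
    if confidence < min_confidence:
        peer = find_peer(peers, cap)
        if peer is not None:
            return Decision("handoff", f"own confidence {confidence:.2f} < {min_confidence}", peer)
        return Decision("refuse", f"confidence {confidence:.2f} < {min_confidence}")

    return Decision("accept", f"capable, confidence {confidence:.2f}")


def refresh(model: SelfModel, eval_results: Dict[str, List[bool]]) -> SelfModel:
    # Replace per-class confidence with the pass rate from the latest eval run.
    confidence = dict(model.confidence_by_task_class)
    for task_class, outcomes in eval_results.items():
        if outcomes:
            confidence[task_class] = sum(outcomes) / len(outcomes)
    return SelfModel(set(model.capabilities), confidence, set(model.declared_limitations))


# Usage with hypothetical capabilities and peers.
model = SelfModel({"sql", "python"}, {"sql-query": 0.9, "data-viz": 0.4}, {"medical-advice"})
peers = {"viz-agent": {"plotting"}, "sql-agent": {"sql"}}
print(decide(model, Task("sql-query", "sql"), peers).action)      # accept
print(decide(model, Task("chart", "plotting"), peers).peer)       # viz-agent
print(decide(model, Task("data-viz", "python"), peers).action)    # refuse
refreshed = refresh(model, {"data-viz": [True, True, True, False]})
print(decide(refreshed, Task("data-viz", "python"), peers).action)  # accept
```

Checking declared limitations first makes them hard stops that no confidence score can override, while capability and confidence gaps fall through to a peer handoff before the agent resorts to refusing.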

When to use

  • The agent operates in a domain with clear competence boundaries.
  • Handoff to peer agents is feasible.
  • An eval suite is available to refresh the self-model periodically.


Related