Methodology · Prompt Engineering · proven · partial

Iterative Prompt Refinement Loop

also known as prompt engineering lifecycle, test-and-iterate prompting

Applies to: llm-app, agent, coding-agent, rag-pipeline

Tags: prompt-engineering, iteration, evaluation

Treat a prompt like an experiment, not a one-shot write. Draft the simplest prompt that could work, run it on real inputs, read the failures, change one thing, and run again. Stop when the prompt clears a quality bar you set before you started. The loop is cheap, and it beats guessing at wording.

Intent. Turn prompt writing into a measured loop so every change is judged against real outputs instead of a hunch.

When to apply. Use this whenever a single prompt drives a feature and its output quality matters. Reach for it the moment a prompt 'mostly works but sometimes fails'. Don't apply it to a throwaway one-off prompt you will never run again.

Inputs

  • Task and success bar: What the prompt must produce, and the measurable bar that counts as good enough.
  • Real example inputs: A handful of genuine inputs the prompt will face in production, including the awkward ones.
  • Target model: The model the prompt will run against, since wording that helps one model can hurt another.
  • A way to judge output: A human reader or an automated grader that can score each run against the bar.
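
As a minimal sketch of how the success bar and an automated grader might be pinned down before the first run, assuming a toy "required phrase" check stands in for the real judgment. The names `Example`, `SUCCESS_BAR`, `grade`, `pass_rate`, and `clears_bar` are hypothetical and not part of this methodology.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Example:
    """One real input plus a phrase a good answer must contain (assumed shape)."""
    text: str
    must_contain: str


# Assumption: the bar is a pass rate, fixed before any prompt is run.
SUCCESS_BAR = 0.9


def grade(output: str, example: Example) -> bool:
    """Toy grader: pass if the required phrase appears, ignoring case."""
    return example.must_contain.lower() in output.lower()


def pass_rate(run_prompt: Callable[[str], str], examples: list[Example]) -> float:
    """Fraction of examples whose output passes the grader."""
    if not examples:
        raise ValueError("need at least one example input")
    passed = sum(grade(run_prompt(ex.text), ex) for ex in examples)
    return passed / len(examples)


def clears_bar(score: float, bar: float = SUCCESS_BAR) -> bool:
    """True when a scored run meets or exceeds the pre-set bar."""
    return score >= bar
```

Here `run_prompt` is any callable that sends one input through the prompt and the target model and returns the text. A human reader can replace `grade` without changing the rest.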

Outputs

  • A prompt that clears the bar: The frozen prompt text that passed on the real inputs.
  • Failure log: The short record of which inputs broke which version and why.
  • Regression example set: The example inputs, kept so the prompt can be re-checked after any later edit.
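
One possible shape for the failure log is a list of small records stored as JSON Lines, so it can sit next to the prompt in version control. This is a sketch under assumptions: the `FailureRecord` fields and the helper names are illustrative, not a format the methodology prescribes.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FailureRecord:
    """Which input broke which prompt version, and why (assumed fields)."""
    prompt_version: str
    input_text: str
    output: str
    cause: str


def to_jsonl(records: list[FailureRecord]) -> str:
    """Serialize the log as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def from_jsonl(text: str) -> list[FailureRecord]:
    """Parse a JSON Lines log back into records, skipping blank lines."""
    return [FailureRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


def broken_by_version(records: list[FailureRecord]) -> dict[str, list[str]]:
    """Group failing inputs by the prompt version they broke."""
    broken: dict[str, list[str]] = {}
    for r in records:
        broken.setdefault(r.prompt_version, []).append(r.input_text)
    return broken
```

The regression example set is the full list of example inputs, not only the ones in this log; the log explains why earlier versions were rejected.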

Steps (6)

  1. Draft the simplest prompt

    Write the shortest prompt that could plausibly work. Resist adding rules you have not yet seen fail.

  2. Run on real inputs

    Run the prompt against the genuine example inputs, not imagined ones. Capture every output.

    uses: Sampled Prompt Trace Eval

  3. Read and bucket the failures

    Read each bad output and group the failures by cause. The buckets tell you what to fix.

  4. Change one thing

    Make a single change aimed at the biggest bucket: a clearer instruction, one example, or a tighter output format.

    uses: Structured Output, Chain of Thought

  5. Re-run and compare

    Run the new version on the same inputs and compare its score against the bar and against the previous version. If it is still below the bar, go back to step 3 and read the new failures.

    uses: Prompt Versioning

  6. Freeze and keep the examples

    When the prompt clears the bar, freeze it and keep the example inputs as a regression set for future edits.
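
The six steps above can be sketched as a small driver loop. This is an illustration under assumptions, not a definitive implementation: `call_model`, `is_good`, `diagnose`, and `revise` are hypothetical callbacks, and in practice `diagnose` and `revise` are usually a person reading outputs and editing the prompt by hand.

```python
from collections import Counter
from typing import Callable


def refine(
    prompt: str,
    inputs: list[str],
    call_model: Callable[[str, str], str],   # (prompt, input) -> output; hypothetical
    is_good: Callable[[str, str], bool],     # (input, output) -> passes the grader?
    diagnose: Callable[[str, str], str],     # (input, output) -> failure cause label
    revise: Callable[[str, str], str],       # (prompt, biggest cause) -> new prompt
    bar: float,
    max_rounds: int = 10,
) -> tuple[str, list[tuple[str, float]]]:
    """Run, score, bucket, change one thing, repeat until the bar is cleared."""
    if not inputs:
        raise ValueError("need at least one real input")
    if not 0.0 < bar <= 1.0:
        raise ValueError("bar must be a pass rate in (0, 1]")

    history: list[tuple[str, float]] = []  # every version and its score
    for _ in range(max_rounds):
        # Step 2 / 5: run on the same real inputs and capture every output.
        outputs = [(text, call_model(prompt, text)) for text in inputs]
        failures = [(t, o) for t, o in outputs if not is_good(t, o)]
        score = 1.0 - len(failures) / len(outputs)
        history.append((prompt, score))

        # Step 6: freeze once the pre-set bar is cleared.
        if score >= bar:
            return prompt, history

        # Step 3: bucket failures by cause; step 4: fix the biggest bucket only.
        buckets = Counter(diagnose(t, o) for t, o in failures)
        biggest_cause, _count = buckets.most_common(1)[0]
        prompt = revise(prompt, biggest_cause)

    raise RuntimeError(f"bar {bar} not cleared after {max_rounds} rounds")
```

Returning the full `history` keeps every version alongside its score, which makes the comparison in step 5 a lookup rather than a memory exercise.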

Principles

  • Change one variable at a time, or a score change tells you nothing.
  • Judge against real inputs, never imagined ones.
  • Keep every version and the inputs that broke it.
  • Stop at a bar you set before you started, not when you get bored.
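
The first principle can be enforced mechanically if a prompt is stored as named parts rather than one string. The sketch below assumes a hypothetical three-part split (`instruction`, `examples`, `output_format`) that mirrors the kinds of change listed in step 4; a real prompt may need different parts.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PromptVersion:
    """A prompt split into parts that can be changed independently (assumed split)."""
    instruction: str
    examples: tuple[str, ...] = ()
    output_format: str = ""


def changed_parts(old: PromptVersion, new: PromptVersion) -> list[str]:
    """Names of the parts that differ between two versions."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


def require_single_change(old: PromptVersion, new: PromptVersion) -> str:
    """Return the one changed part, or raise if zero or several parts changed."""
    changed = changed_parts(old, new)
    if len(changed) != 1:
        raise ValueError(f"expected exactly one changed part, got {changed}")
    return changed[0]
```

Calling `require_single_change` before each re-run means a score change can be attributed to a single part of the prompt.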

Known failure modes (2)

Related patterns (5)

Related methodologies (1)

Sources (2)

Provenance

  • Added to catalog:
  • Last updated:
  • Verification status: partial