Methodology · Prompt Engineering · emerging · partial

Automatic Prompt Optimization

also known as prompt optimization, DSPy-style compilation, programmatic prompting

Applies to: llm-app, agent, classification-task, pipeline

Tags: prompt-engineering, optimization, dspy

Stop hand-tuning the prompt. Define the task as inputs, outputs, and a metric, then let an optimizer search the prompt space for you. The optimizer proposes prompt variants, scores them on your metric, and keeps the winners. It pays off once you have a clear metric and a labelled example set, and it scales past what a human can tune by hand.
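
The propose, score, keep loop described above can be sketched in a few lines. This is a minimal illustration, not any particular optimizer's implementation: `fake_model`, `EXAMPLES`, and `CANDIDATES` are hypothetical stand-ins, and a real optimizer would generate the candidates itself rather than read them from a fixed list.

```python
# Hypothetical labelled examples: (input text, expected label).
EXAMPLES = [
    ("I love this phone", "positive"),
    ("Terrible battery life", "negative"),
    ("Works great, very happy", "positive"),
    ("Broke after two days", "negative"),
]


def fake_model(prompt: str, text: str) -> str:
    """Stand-in for an LLM call (an assumption, not a real API).

    It returns a bare label only when the prompt asks for one word;
    otherwise it returns a chatty sentence that fails exact match.
    """
    negative_words = ("terrible", "broke")
    label = "negative" if any(w in text.lower() for w in negative_words) else "positive"
    if "one word" in prompt.lower():
        return label
    return f"I think the sentiment is {label}."


def exact_match(output: str, expected: str) -> float:
    """Metric: 1.0 when the output equals the label, else 0.0."""
    return 1.0 if output.strip().lower() == expected else 0.0


def score_prompt(prompt, model, examples, metric) -> float:
    """Average metric score of one prompt variant over the example set."""
    total = sum(metric(model(prompt, x), y) for x, y in examples)
    return total / len(examples)


def search(candidates, model, examples, metric):
    """Score every candidate prompt and keep the best one."""
    scored = [(score_prompt(p, model, examples, metric), p) for p in candidates]
    best_score, best_prompt = max(scored, key=lambda pair: pair[0])
    return best_prompt, best_score


# Hypothetical prompt variants; a real optimizer would propose these.
CANDIDATES = [
    "Classify the sentiment of the review.",
    "Classify the sentiment of the review. Answer in one word: positive or negative.",
]

best_prompt, best_score = search(CANDIDATES, fake_model, EXAMPLES, exact_match)
```

The second variant wins here only because the stub model rewards the "one word" instruction under exact match, which is the point: the search climbs whatever the metric rewards.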

Intent. Replace manual prompt tweaking with a metric-driven search that an optimizer runs over the prompt space.

When to apply. Use this when you have a measurable metric, a labelled example set of a few dozen cases or more, and a task stable enough to be worth optimising. It earns its keep on prompts in pipelines and on tasks too fiddly to tune by hand. Don't apply it when you have no metric, since the optimizer has nothing to climb.

Inputs

  • Typed task signature: The task stated as named inputs and outputs, so a program can call it and check the result.
  • Labelled example set: Input-output pairs the optimizer scores against, split into training and held-out parts.
  • Metric: A function that scores an output, from exact match to an LLM judge with a rubric.
  • An optimizer: The search procedure, such as a DSPy optimizer, that proposes and scores prompt variants.
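
The first three inputs can be written down as plain data and functions before any optimizer is involved. The sketch below assumes a hypothetical ticket-routing task; the names `TicketInput`, `TicketOutput`, `Example`, and `split_examples` are illustrative and do not come from any framework.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketInput:
    """Named input of the hypothetical task."""
    text: str


@dataclass(frozen=True)
class TicketOutput:
    """Named output of the hypothetical task."""
    category: str


@dataclass(frozen=True)
class Example:
    """One labelled input-output pair."""
    inputs: TicketInput
    expected: TicketOutput


def split_examples(examples, held_out_fraction=0.3, seed=0):
    """Shuffle deterministically, then return (train, held_out)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_held_out = max(1, round(len(shuffled) * held_out_fraction))
    return shuffled[n_held_out:], shuffled[:n_held_out]


def exact_match(predicted: TicketOutput, expected: TicketOutput) -> float:
    """Simplest possible metric: 1.0 on an exact category match, else 0.0."""
    return 1.0 if predicted.category == expected.category else 0.0


# Made-up placeholder data, only to show the shapes involved.
examples = [
    Example(TicketInput(f"ticket {i}"), TicketOutput("billing" if i % 2 else "technical"))
    for i in range(10)
]
train, held_out = split_examples(examples)
```

Fixing the shuffle seed keeps the held-out set the same across optimizer runs, so scores from different runs stay comparable.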

Outputs

  • Optimised prompt: The winning prompt text or few-shot demonstration set the optimizer found.
  • Score report: The metric scores on held-out data that justify the chosen variant.

Steps (5)

  1. Express the task as a program

    State the task as a typed signature of named inputs and outputs so an optimizer can call it and check results.

    uses: Structured Output

  2. Assemble a labelled example set

    Collect input-output pairs and split them into a training set the optimizer learns on and a held-out set for honest scoring.

  3. Pick a metric

    Choose a function that scores an output against the label, from exact match to a rubric-driven LLM judge.

    uses: Evaluator-Optimizer

  4. Choose and run an optimizer

    Pick an optimizer and let it propose prompt variants and few-shot demonstrations, scoring each on the metric.

    uses: Prompt Variant Evaluation, Automatic Workflow Search

  5. Lock the winner and re-check

    Freeze the best-scoring prompt and confirm it holds up on the held-out set, not just the training set.

    uses: Prompt/Response Optimiser
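
The five steps can be walked through end to end on a toy task, here with the optimizer searching over few-shot demonstration sets instead of prompt wording. Everything in this sketch is an assumption made for illustration: `stub_model` imitates a model that copies the label of its most similar demonstration, the data is made up, and the exhaustive search in `optimise` stands in for whatever strategy a real optimizer uses.

```python
from itertools import combinations

# Step 1: the task signature is text -> category.
# Step 2: labelled examples, split into training and held-out sets.
TRAIN = [
    ("refund my payment", "billing"),
    ("charged twice on my card", "billing"),
    ("app crashes on login", "technical"),
    ("cannot reset password", "technical"),
]
HELD_OUT = [
    ("payment failed on my card", "billing"),
    ("login page crashes", "technical"),
]


def stub_model(demos, text):
    """Stand-in for an LLM conditioned on few-shot demos (an assumption).

    It copies the label of the demo that shares the most words with the input.
    """
    words = set(text.split())
    best = max(demos, key=lambda demo: len(words & set(demo[0].split())))
    return best[1]


def accuracy(demos, examples):
    """Step 3: the metric, exact match averaged over an example set."""
    hits = sum(stub_model(demos, x) == y for x, y in examples)
    return hits / len(examples)


def optimise(train, n_demos=2):
    """Step 4: try every demo set of size n_demos, keep the best on train."""
    return max(combinations(train, n_demos), key=lambda demos: accuracy(demos, train))


# Step 5: freeze the winner, then re-check it on data the search never saw.
winner = optimise(TRAIN)
report = {
    "train": accuracy(winner, TRAIN),
    "held_out": accuracy(winner, HELD_OUT),
}
```

The `report` dictionary plays the role of the score report: the held-out number, not the training number, is what justifies shipping the frozen demonstration set.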

Principles

  • The optimizer climbs exactly what you measure, so the metric is the design.
  • Always keep a held-out split; tuned prompts over-fit too.
  • A clear typed signature makes the task optimisable in the first place.
  • Automate the search, but a human still chooses the metric and reads the failures.
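
Reading the failures, as the last principle asks, is easier when the failing cases are collected in one place. The helper below is a hypothetical sketch: `collect_failures` and its arguments are illustrative names, and `predict` is assumed to be any callable that wraps the frozen prompt and the model.

```python
def collect_failures(predict, examples, metric, threshold=1.0):
    """Return one record per example whose metric score falls below threshold."""
    failures = []
    for text, expected in examples:
        output = predict(text)
        score = metric(output, expected)
        if score < threshold:
            failures.append(
                {"input": text, "expected": expected, "output": output, "score": score}
            )
    return failures


def exact_match(output, expected):
    """Metric: 1.0 on an exact match, else 0.0."""
    return 1.0 if output == expected else 0.0


# Toy usage: a predictor that always answers "positive".
toy_examples = [("great", "positive"), ("awful", "negative")]
failures = collect_failures(lambda text: "positive", toy_examples, exact_match)
```

Running this on the held-out set after each optimizer run gives a short list a person can actually read, which is where metric blind spots tend to show up.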

Known failure modes (2)

Related patterns (5)

Related methodologies (2)

Sources (2)

Provenance

  • Added to catalog:
  • Last updated:
  • Verification status: partial