Model Selection Workflow

Turn model selection into a repeatable four-step routine. The output is a private leaderboard and a live monitor, not a one-time decision.

Description

Pick a foundation model in four steps, in order. First, discard any model that breaks a hard constraint, such as its licence, where your data may be processed, or the kinds of input it accepts. Second, narrow the remaining candidates using public benchmarks and leaderboards. Third, test the survivors on your real task, rank them in your own private leaderboard, then pick the winner and document why. Fourth, keep monitoring the chosen model in production with the same scores, so you catch it when quality slips. The lasting artifact is the private leaderboard, not a one-off 'we picked GPT-something' call.
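The first two steps can be sketched as a pass/fail filter followed by a ranking. This is a minimal illustration, not a prescribed implementation: the `Candidate` and `HardConstraints` fields, the model names, the licence strings and the benchmark name `bench-x` are all hypothetical placeholders that you would fill from vendor documentation and the public leaderboards you trust.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # Hypothetical model-card fields; fill them from vendor documentation.
    name: str
    licence: str
    regions: set[str]      # where the provider can host or process your data
    modalities: set[str]   # input kinds the model accepts, e.g. {"text", "image"}
    public_scores: dict[str, float] = field(default_factory=dict)


@dataclass
class HardConstraints:
    allowed_licences: set[str]
    required_region: str
    required_modalities: set[str]


def passes_hard_constraints(c: Candidate, hc: HardConstraints) -> bool:
    """Step 1: a single violation removes the model; there are no trade-offs."""
    return (
        c.licence in hc.allowed_licences
        and hc.required_region in c.regions
        and hc.required_modalities <= c.modalities
    )


def public_screen(cands: list[Candidate], benchmark: str, keep: int) -> list[Candidate]:
    """Step 2: rank survivors on one public benchmark and keep the top `keep`.

    Models with no published score on `benchmark` are dropped here.
    """
    scored = [c for c in cands if benchmark in c.public_scores]
    scored.sort(key=lambda c: c.public_scores[benchmark], reverse=True)
    return scored[:keep]


# Hypothetical candidates and scores, for illustration only.
candidates = [
    Candidate("model-a", "apache-2.0", {"eu", "us"}, {"text"}, {"bench-x": 71.0}),
    Candidate("model-b", "proprietary", {"us"}, {"text", "image"}, {"bench-x": 80.0}),
    Candidate("model-c", "mit", {"eu"}, {"text"}, {"bench-x": 65.0}),
    Candidate("model-d", "apache-2.0", {"eu"}, {"text"}),
]
hc = HardConstraints({"apache-2.0", "mit"}, "eu", {"text"})

survivors = [c for c in candidates if passes_hard_constraints(c, hc)]
shortlist = public_screen(survivors, "bench-x", keep=2)
print([c.name for c in shortlist])  # ['model-a', 'model-c']
```

Treat hard constraints as pass/fail, never as weights: `model-b` has the best public score but is out on licence alone. This sketch also drops models with no published score on the chosen benchmark (`model-d`); you might prefer to pass such models straight to your own tests instead.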

When to apply

Use this when you start any LLM application where the choice of model materially affects results, such as chat, retrieval-augmented generation (RAG), an agent, classification, or text generation. Run it before you finalise your first prompt, and run it again when a strong new model ships or when production metrics shift. Don't apply it when an external constraint, such as a regulation or a single-vendor contract, already dictates the model. Skip it, too, for throwaway prototypes that will last less than a day; for those, just use the cheapest model that might work.
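The rules for when to run the workflow fit in a small decision function. A sketch under stated assumptions: the parameter and enum names are hypothetical, the one-day threshold comes from the text, and judging whether a new model is "strong" or whether production metrics have "shifted" is left to the caller.

```python
from enum import Enum


class Action(Enum):
    USE_MANDATED = "use the model the external rule dictates"
    USE_CHEAPEST = "use the cheapest model that might work"
    RUN = "run the selection workflow"
    RERUN = "re-run the selection workflow"
    KEEP = "keep the current model and keep monitoring"


def selection_action(
    model_mandated: bool,            # a regulation or single-vendor contract fixes the model
    expected_lifetime_days: float,   # how long the application is expected to live
    already_selected: bool = False,  # a model was chosen by an earlier run
    strong_new_model: bool = False,  # caller judges a new release worth testing
    metrics_moved: bool = False,     # caller's production monitor reports a shift
) -> Action:
    """Hypothetical encoding of the 'when to apply' rules, checked in order."""
    if model_mandated:
        return Action.USE_MANDATED
    if expected_lifetime_days < 1:
        return Action.USE_CHEAPEST
    if not already_selected:
        return Action.RUN
    if strong_new_model or metrics_moved:
        return Action.RERUN
    return Action.KEEP


print(selection_action(False, 0.5).value)  # use the cheapest model that might work
print(selection_action(False, 180, already_selected=True, metrics_moved=True).name)  # RERUN
```

The order of the checks matters: an external mandate overrides everything, and a throwaway prototype never triggers the full workflow even if a new model ships.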

What it involves

  • Apply the hard-constraint filter
  • Screen on public information
  • Build the private leaderboard
  • Select and document
  • Monitor in production
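The private leaderboard, the selection record and the production monitor can share one scoring function, which is what makes production numbers comparable with the original ranking. A minimal sketch, assuming you have a labelled evaluation set for your task and can label a sample of production traffic the same way; `exact_match` is a stand-in metric, and all function names, model names, outputs and the 0.05 tolerance are hypothetical.

```python
import json
import statistics
from typing import Callable

# A scorer maps (model output, reference answer) to a score in [0, 1].
Scorer = Callable[[str, str], float]


def exact_match(output: str, reference: str) -> float:
    """Placeholder metric: case-insensitive exact match."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def evaluate(outputs: list[str], references: list[str], scorer: Scorer) -> float:
    """Mean score over one eval set; used both offline and in production."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must align one-to-one")
    return statistics.mean(scorer(o, r) for o, r in zip(outputs, references))


def build_leaderboard(
    runs: dict[str, list[str]], references: list[str], scorer: Scorer
) -> list[tuple[str, float]]:
    """Build the private leaderboard: score each shortlisted model, best first."""
    board = [(name, evaluate(outs, references, scorer)) for name, outs in runs.items()]
    board.sort(key=lambda row: row[1], reverse=True)
    return board


def select_and_document(board: list[tuple[str, float]], rationale: str) -> str:
    """Select and document: record the winner, full ranking and reason as JSON."""
    winner, score = board[0]
    return json.dumps(
        {"selected": winner, "score": score, "leaderboard": board, "rationale": rationale},
        indent=2,
    )


def drift_alert(baseline: float, prod_score: float, tolerance: float = 0.05) -> bool:
    """Monitor in production: flag a drop of more than `tolerance` below baseline."""
    return baseline - prod_score > tolerance


# Hypothetical eval set and model outputs, for illustration only.
references = ["paris", "4", "blue"]
runs = {
    "model-a": ["Paris", "4", "green"],
    "model-c": ["Paris", "5", "red"],
}
board = build_leaderboard(runs, references, exact_match)
record = select_and_document(board, "highest private score")
print(board[0][0])  # model-a

# Later: score a labelled sample of production traffic with the same scorer
# (reusing the eval references here for brevity).
prod_score = evaluate(["paris", "3", "red"], references, exact_match)
print(drift_alert(board[0][1], prod_score))  # True
```

In practice you would swap `exact_match` for your task's own metric (a rubric grader, retrieval hit rate, and so on). One option is to store the JSON record alongside the prompt and model version it applies to, so a later re-run can be compared against it directly.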
