Methodology · LLM-App Engineeringprovenverified

Build-or-Buy Foundation Model Decision

also known as seven-axis model sourcing, open vs proprietary decision, build vs buy foundation model

Applies to: llm-appagentrag-system

Tags: build-vs-buymodel-sourcingseven-axes

Choose how to source your model by scoring it on seven clear factors. The choices are: host an open-weights model yourself, call a vendor's API, or fine-tune your own. The seven factors are data privacy, data lineage, performance, functionality, control, cost, and team capability. Give each factor a verdict and a weight. Together they point to the sourcing decision. The thing you keep is a written record that names which factors the losing options failed on. That way the choice can be checked later when conditions change.

Methodology process overview

Intent. Replace gut-feel calls like 'use OpenAI' or 'self-host Llama' with a seven-factor comparison whose verdicts and weights are written down.

When to apply. Use this for any new LLM application, or any system whose current model choice is up for review. Triggers include a new regulation, a new model release, a new cost ceiling, or a new performance gap. Run it again from time to time, because the open-versus-vendor landscape moves fast. Don't apply it when an outside rule already forces the answer, such as a contract that names one vendor. Skip it for throwaway prototypes that last under a day.

Inputs

  • Use case and data profileWhat the model must do and what data it will see. Include how sensitive the data is, where it must live, and whether a third party may log the prompts.
  • Performance targetThe quality bar from your test set, plus how fast and how many requests the system must handle.
  • Team capability inventoryAn honest read on your in-house skills: machine-learning operations, running GPUs, and security review.
  • Cost envelopeThe most you will pay per call, the most you will spend per month, and how long you have to pay off any up-front cost.

Outputs

  • Seven-axis comparison tableA grid of the candidate options scored on each factor, with a weight and a verdict. Options are vendor API, self-hosted open weights, and fine-tuned own model.
  • Decision recordA written reason that names which factors the losing options failed on, and the conditions under which to revisit the choice.

Steps (6)

  1. List candidate options

    List the realistic options. They include a vendor API such as OpenAI, Anthropic, or Google, self-hosted open weights such as Llama, Mistral, or Qwen, a fine-tuned in-house model, and a hybrid of these. Treat hybrid as a real option, not a fallback.

  2. Score each option on data privacy

    Where does the prompt go? Is it logged? Can it be used for training? Does it meet your data-location rules? Vendor APIs usually lose this factor when the data is regulated.

  3. Score on data lineage

    Do you know what the model was trained on? This is its data lineage. Open-weights models tend to be more open about it. Vendor models tend to be less open. It matters when regulators or your own users ask.

  4. Score on performance, functionality, and control

    Performance is the quality from your test set. Functionality is the features the option supports, such as function calling, structured output, vision, and long context. Control is whether you can pin a model version, switch providers, fine-tune, or run offline.

    usesMulti-Model RoutingStructured Output

  5. Score on cost and team needs

    Cost is the per-call cost at your expected volume plus the running cost, such as GPUs, security review, and monitoring. Team is whether you have the people to run this option. A self-hosted model with no on-call rotation is a future incident.

  6. Weight, decide, and document

    Weight each factor by how much it matters for your task. Add up the weighted scores. Pick the highest. Then write down why. Name the factors the losers failed on and the condition that would flip the choice. This makes the next review cheap.

Framework-specific instructions

Pick a framework and generate a framework-targeted rewrite of this methodology's steps.

Choose framework

AI-generated for Agent Development Kit (ADK) (Google) — verify against official docs.

Principles

  • Seven factors, not vibes. Every factor has a verdict and a weight written down.
  • Hybrid is a first-class option. Use a vendor for some calls and self-host for others.
  • Team capability is a factor. The model you cannot operate at 3am is the wrong model.
  • The choice can be checked later. Name the factors the loser failed on, and the next review is cheap.

Known failure modes (3)

Related patterns (4)

Related compositions (1)

Related methodologies (2)

Sources (2)

Provenance

  • Added to catalog:
  • Last updated:
  • Verification status: verified