Build-or-Buy Foundation Model Decision
also known as seven-axis model sourcing, open vs proprietary decision, build vs buy foundation model
Choose how to source your model by scoring it on seven clear factors. The choices are: host an open-weights model yourself, call a vendor's API, or fine-tune your own. The seven factors are data privacy, data lineage, performance, functionality, control, cost, and team capability. Give each factor a verdict and a weight. Together they point to the sourcing decision. The thing you keep is a written record that names which factors the losing options failed on. That way the choice can be checked later when conditions change.
Methodology process overview
Intent. Replace gut-feel calls like 'use OpenAI' or 'self-host Llama' with a seven-factor comparison whose verdicts and weights are written down.
When to apply. Use this for any new LLM application, or any system whose current model choice is up for review. Triggers include a new regulation, a new model release, a new cost ceiling, or a new performance gap. Run it again from time to time, because the open-versus-vendor landscape moves fast. Don't apply it when an outside rule already forces the answer, such as a contract that names one vendor. Skip it for throwaway prototypes that last under a day.
Inputs
- Use case and data profile — What the model must do and what data it will see. Include how sensitive the data is, where it must live, and whether a third party may log the prompts.
- Performance target — The quality bar from your test set, plus how fast and how many requests the system must handle.
- Team capability inventory — An honest read on your in-house skills: machine-learning operations, running GPUs, and security review.
- Cost envelope — The most you will pay per call, the most you will spend per month, and how long you have to pay off any up-front cost.
Outputs
- Seven-axis comparison table — A grid of the candidate options scored on each factor, with a weight and a verdict. Options are vendor API, self-hosted open weights, and fine-tuned own model.
- Decision record — A written reason that names which factors the losing options failed on, and the conditions under which to revisit the choice.
Steps (6)
List candidate options
List the realistic options. They include a vendor API such as OpenAI, Anthropic, or Google, self-hosted open weights such as Llama, Mistral, or Qwen, a fine-tuned in-house model, and a hybrid of these. Treat hybrid as a real option, not a fallback.
Score each option on data privacy
Where does the prompt go? Is it logged? Can it be used for training? Does it meet your data-location rules? Vendor APIs usually lose this factor when the data is regulated.
Score on data lineage
Do you know what the model was trained on? This is its data lineage. Open-weights models tend to be more open about it. Vendor models tend to be less open. It matters when regulators or your own users ask.
Score on performance, functionality, and control
Performance is the quality from your test set. Functionality is the features the option supports, such as function calling, structured output, vision, and long context. Control is whether you can pin a model version, switch providers, fine-tune, or run offline.
Score on cost and team needs
Cost is the per-call cost at your expected volume plus the running cost, such as GPUs, security review, and monitoring. Team is whether you have the people to run this option. A self-hosted model with no on-call rotation is a future incident.
Weight, decide, and document
Weight each factor by how much it matters for your task. Add up the weighted scores. Pick the highest. Then write down why. Name the factors the losers failed on and the condition that would flip the choice. This makes the next review cheap.
Framework-specific instructions
Pick a framework and generate a framework-targeted rewrite of this methodology's steps.
Choose framework
AI-generated for Agent Development Kit (ADK) (Google) — verify against official docs.
Principles
- Seven factors, not vibes. Every factor has a verdict and a weight written down.
- Hybrid is a first-class option. Use a vendor for some calls and self-host for others.
- Team capability is a factor. The model you cannot operate at 3am is the wrong model.
- The choice can be checked later. Name the factors the loser failed on, and the next review is cheap.
Known failure modes (3)
- ✕Vendor Lock-In
Picking a proprietary API without scoring the control axis — switching cost is invisible until the vendor changes terms.
- ✕Top-Tier Model For Everything (Cost)
Defaulting to the flagship API on every call because no axis was given a weight — cost balloons without matching quality gain.
- ✕Shadow AI
Failing to score data-privacy properly so teams start using forbidden providers covertly — the methodology was bypassed because it never produced a usable decision.
Related patterns (4)
- ★★Multi-Model Routing
Send each request to the cheapest model that can handle it well.
- ★★Cost Observability
Surface per-request, per-user, and per-feature cost and token consumption to operators in near-real-time.
- ★★Cost Gating
Block actions whose expected cost exceeds a threshold without explicit user (or operator) acknowledgement.
- ★★Structured Output
Constrain the model's output to conform to a JSON Schema (or similar typed shape).
Related compositions (1)
Related methodologies (2)
- Model Selection Workflow★★
Turn model selection into a repeatable four-step routine. The output is a private leaderboard and a live monitor, not a one-time decision.
- Finetune-as-Last-Resort Escalation★★
Make teams use up prompt engineering, retrieval, and task splitting before they fine-tune, because fine-tuning is the most expensive and the hardest to undo.
Sources (2)
AI Engineering
Ch 4 'Evaluate AI Systems', §'Build vs. Buy' “seven axes for comparison: data privacy, data lineage, performance, functionality, control, and cost”
chiphuyen/aie-book — chapter-summaries.md (Ch 4)
Ch 4 'Evaluate AI Systems' “this chapter outlined the pros and cons of each approach along seven axes, including data privacy, data lineage, performance, functionality, control, and cost”
Provenance
- Added to catalog:
- Last updated:
- Verification status: verified