Full-Code · Orchestration Frameworks · active

Together Mixture-of-Agents (MoA)

Type: full-code · Vendor: Together AI · Language: Python · License: Apache-2.0 · Status: active · Status in practice: emerging · First released: 2024-06-01

Together Mixture-of-Agents sends a prompt to several open-source LLMs acting as proposers and has a final aggregator LLM synthesize their responses into one answer.

Description. Mixture-of-Agents is a method and reference implementation from Together AI that runs several LLMs in parallel on the same prompt, then passes their outputs to a final aggregator LLM that synthesizes a single response. The proposer models run independently in one layer; the aggregator combines their results in a second layer. The architecture can be stacked across multiple layers, where each layer comprises several LLM agents. The reference code is published under Apache 2.0.

Agent loop shape. A prompt is sent in parallel to several proposer LLMs, each producing an independent response. The collected responses are passed to a final aggregator LLM, which is instructed to synthesize them into a single high-quality answer. The proposer layer can be repeated: each later layer receives the previous layer's outputs as auxiliary context and produces refined responses before the final aggregation.
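
The loop shape above can be sketched in a few lines of Python. This is a minimal illustration, not the Together reference code: the `Model` type, `make_stub`, `aggregate_stub`, `build_aggregator_prompt`, and the prompt wording are all hypothetical, and the stub models return canned strings instead of calling any API. It shows only the fan-out with `asyncio.gather` followed by one aggregation call.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical type: any async function mapping a prompt string to a response string.
Model = Callable[[str], Awaitable[str]]


def make_stub(name: str) -> Model:
    """Build a stand-in proposer that labels its input (no network calls)."""
    async def model(prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real API call would
        return f"[{name}] answer to: {prompt}"
    return model


async def aggregate_stub(prompt: str) -> str:
    """Stand-in aggregator: reports how many candidate responses it received."""
    n = sum(1 for line in prompt.splitlines() if line.startswith("Response "))
    return f"synthesis of {n} responses"


def build_aggregator_prompt(user_prompt: str, responses: List[str]) -> str:
    """Assumed prompt layout: an instruction, numbered candidates, then the question."""
    numbered = "\n".join(f"Response {i}: {r}" for i, r in enumerate(responses, 1))
    return (
        "Synthesize the candidate responses below into a single answer.\n"
        f"{numbered}\n"
        f"Question: {user_prompt}"
    )


async def mixture_of_agents(
    user_prompt: str, proposers: List[Model], aggregator: Model
) -> str:
    # Fan out: every proposer sees the same prompt and runs concurrently.
    responses = await asyncio.gather(*(p(user_prompt) for p in proposers))
    # Gather: one aggregator call synthesizes the collected responses.
    return await aggregator(build_aggregator_prompt(user_prompt, list(responses)))


if __name__ == "__main__":
    proposers = [make_stub("model-a"), make_stub("model-b"), make_stub("model-c")]
    print(asyncio.run(mixture_of_agents("What is MoA?", proposers, aggregate_stub)))
```

In a real deployment each stub would be replaced by a call to a hosted model. Keeping the models behind a plain async callable makes it easy to swap providers or mix heterogeneous models in one proposer list.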

Primary use cases

combining multiple open-source LLMs on one prompt
synthesizing several model responses into one answer
quality improvement through proposer-aggregator layering

flowchart TD
    fw["Together Mixture-of-Agents (MoA)"]
    fw --> p1["Heterogeneous-Model Council with Synthesis Judge<br/>(core)"]
    fw --> p2["Parallel Fan-Out / Gather<br/>(core)"]
    fw --> p3["Self-Refine<br/>(supported)"]

Key concepts

Proposer → parallel-fan-out-gather (docs) — An LLM in a layer that generates a candidate response to the prompt; several proposers run in parallel and the next layer treats their outputs as auxiliary context.
Aggregator → heterogeneous-model-council-with-judge (docs) — The final LLM that synthesizes the proposers' responses into a single high-quality answer rather than picking one verbatim.
Layered architecture → self-refine (docs) — The stacked structure in which each layer comprises several LLM agents and each agent consumes the previous layer's outputs, enabling additional refinement rounds.
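
To make the layered architecture concrete, here is a small sketch of how stacked layers might pass outputs forward as auxiliary context. It is an assumption-laden illustration rather than the published implementation: `with_context`, `run_layers`, `stub`, and the context format are hypothetical names and choices, and the proposers in a layer are called one after another for brevity, whereas the description above has them run in parallel.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a synchronous function from prompt text to response text.
Model = Callable[[str], str]


def with_context(user_prompt: str, previous: Sequence[str]) -> str:
    """Attach the previous layer's responses to the prompt as auxiliary context."""
    if not previous:
        return user_prompt  # the first layer sees the bare prompt
    context = "\n".join(f"- {r}" for r in previous)
    return f"Earlier responses:\n{context}\n\nQuestion: {user_prompt}"


def run_layers(
    user_prompt: str, layers: Sequence[Sequence[Model]], aggregator: Model
) -> str:
    """Run each proposer layer in order, then hand the last outputs to the aggregator."""
    previous: List[str] = []
    for proposers in layers:
        layer_prompt = with_context(user_prompt, previous)
        # Every agent in this layer consumes the same previous-layer outputs.
        previous = [propose(layer_prompt) for propose in proposers]
    return aggregator(with_context(user_prompt, previous))


def stub(name: str) -> Model:
    """Stand-in model that reports how many context items it was shown."""
    def model(prompt: str) -> str:
        seen = sum(1 for line in prompt.splitlines() if line.startswith("- "))
        return f"{name} saw {seen}"
    return model


if __name__ == "__main__":
    layers = [[stub("a"), stub("b")], [stub("c")]]
    print(run_layers("What is MoA?", layers, stub("agg")))
```

With the stubs above, the second layer is shown two earlier responses and the aggregator is shown one, which mirrors how each layer only sees the outputs of the layer directly before it.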

Provenance

Last analyzed: 2026-06-17
Last updated: 2026-06-17
Verification status: partial