Together Mixture-of-Agents (MoA)
Type: full-code · Vendor: Together AI · Language: Python · License: Apache-2.0 · Status: active · Status in practice: emerging · First released: 2024-06-01
Together Mixture-of-Agents sends a prompt to several open-source LLMs acting as proposers and has a final aggregator LLM synthesize their responses into one answer.
Description. Mixture-of-Agents is a method and reference implementation from Together AI that runs several LLMs in parallel on the same prompt, then passes their outputs to a final aggregator LLM that synthesizes a single response. The proposer models run independently in one layer; the aggregator combines their results in a second layer. The architecture can be stacked across multiple layers, where each layer comprises several LLM agents. The reference code is published under Apache 2.0.
Agent loop shape. A prompt is sent in parallel to several proposer LLMs, each producing an independent response. The collected responses are passed to a final aggregator LLM, which is instructed to synthesize them into a single high-quality answer. For further refinement, the proposer step can be repeated before the final aggregation: each additional layer receives the original prompt together with the previous layer's responses as auxiliary context.
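The single-layer loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Together's reference code: `Model`, `build_aggregator_prompt`, `run_moa`, and the stub models are hypothetical names, the aggregator instruction is invented wording, and each model is a plain callable standing in for a real LLM API call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
Model = Callable[[str], str]


def build_aggregator_prompt(prompt: str, responses: List[str]) -> str:
    """Combine the user prompt with the numbered proposer responses."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(responses, start=1))
    return (
        "Synthesize the candidate responses below into a single answer.\n"
        f"Candidates:\n{numbered}\n"
        f"Question: {prompt}"
    )


def run_moa(prompt: str, proposers: List[Model], aggregator: Model) -> str:
    # Fan out: every proposer receives the same prompt, in parallel.
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        responses = list(pool.map(lambda model: model(prompt), proposers))
    # Gather: the aggregator sees all candidate responses at once.
    return aggregator(build_aggregator_prompt(prompt, responses))


def make_stub(name: str) -> Model:
    """Stub proposer so the sketch runs without network access."""
    def stub(prompt: str) -> str:
        return f"[{name}] answer to: {prompt}"
    return stub


def stub_aggregator(agg_prompt: str) -> str:
    """Stub aggregator that simply echoes what it was given."""
    return "FINAL: " + agg_prompt


result = run_moa(
    "What is MoA?",
    [make_stub("model-a"), make_stub("model-b"), make_stub("model-c")],
    stub_aggregator,
)
print(result)
```

A thread pool is used here because real proposer calls would mostly wait on network I/O; `pool.map` returns the responses in the same order as the proposer list, so the numbering in the aggregator prompt is stable.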
Primary use cases
- combining multiple open-source LLMs on one prompt
- synthesizing several model responses into one answer
- quality improvement through proposer-aggregator layering
Key concepts
- Proposer → parallel-fan-out-gather (docs) — An LLM in a layer that generates a candidate response to the prompt; several proposers run in parallel and the next layer treats their outputs as auxiliary context.
- Aggregator → heterogeneous-model-council-with-judge (docs) — The final LLM that synthesizes the proposers' responses into a single high-quality answer rather than picking one verbatim.
- Layered architecture → self-refine (docs) — The stacked structure in which each layer comprises several LLM agents and each agent consumes the previous layer's outputs, enabling additional refinement rounds.
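How the three concepts fit together across multiple layers can be sketched as follows. This is an illustrative sketch only: `Model`, `with_references`, `run_layers`, and `make_agent` are hypothetical names, the prompt wording is invented, and the agents are stub callables rather than real LLM clients. Agents within a layer are called sequentially here for brevity, whereas the proposers described above run in parallel.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
Model = Callable[[str], str]


def with_references(prompt: str, references: List[str]) -> str:
    """Attach the previous layer's outputs to the prompt as auxiliary context."""
    if not references:
        return prompt
    refs = "\n".join(f"- {r}" for r in references)
    return f"{prompt}\n\nResponses from the previous layer:\n{refs}"


def run_layers(prompt: str, layers: List[List[Model]], aggregator: Model) -> str:
    references: List[str] = []
    for layer in layers:
        # Each agent in this layer sees the prompt plus the previous layer's outputs.
        layer_prompt = with_references(prompt, references)
        references = [agent(layer_prompt) for agent in layer]
    # The final aggregator synthesizes the last layer's outputs.
    return aggregator(with_references(prompt, references))


def make_agent(name: str) -> Model:
    """Stub agent that reports how many reference responses it received."""
    def agent(prompt: str) -> str:
        seen = prompt.count("\n- ")
        return f"{name} saw {seen} refs"
    return agent


def echo_aggregator(prompt: str) -> str:
    """Stub aggregator that returns its input unchanged."""
    return prompt


a, b = make_agent("a"), make_agent("b")
one_layer = run_layers("Q", [[a, b]], echo_aggregator)
two_layers = run_layers("Q", [[a, b], [a, b]], echo_aggregator)
print(two_layers)
```

With one layer the agents see no references; with two layers the second-layer agents each see both first-layer outputs, which is the refinement step the layered architecture adds.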
Patterns this full-code implements:
- ★Heterogeneous-Model Council with Synthesis Judge
Sends a prompt to several different LLMs acting as proposers, then a final aggregator LLM synthesizes their outputs into one answer; realises the role-specialized-personas-on-different-architectures…
- ★Parallel Fan-Out / Gather
The proposer LLMs run in parallel on the same prompt and a dedicated aggregator LLM gathers and reconciles their independent results into a single output, with the layered architecture stackable for…
- ★★Self-Refine
Stacking layers makes MoA iterative: each additional layer re-processes the prompt together with the previous layer's aggregated outputs as auxiliary information, refining the answer across rounds (r…