III · Tool Use & Environment · Experimental

Large Action Models (LAMs)

also known as LAM, Action-Tuned Model

Use a model class specifically trained for action execution (tool calls, UI navigation, workflow steps) rather than text generation, when the workload is dominated by reliably completing actions in real systems.

Context

The standard LLM is text-tuned: it is optimized for generating fluent prose. Wrapping it in agent scaffolding to drive tools works but is brittle, because the model was not trained on an action-completion objective. For workloads where the value lies in 'did the action commit correctly' rather than 'is the output well-written', text-tuned LLMs leave reliability on the table.

Problem

Text-tuned LLMs are suboptimal for action-completion workloads: they generate plausible-sounding tool calls with wrong arguments, hallucinate UI steps, and fail on long action chains. The mismatch between the training objective (next-token prediction) and the operational objective (action committed) shows up as unreliable execution that no amount of prompting fully fixes.

Forces

  • Training a model class for action completion requires action-completion training data, which is scarce.
  • LAMs may be weaker at text generation than text-tuned LLMs of similar size.
  • The tooling ecosystem (Bedrock, OpenAI, Anthropic) primarily exposes text-tuned models.

Example

A booking automation tool tries text-LLM-driven scaffolding for hotel/flight UI navigation. Success rate plateaus at 62%: the model hallucinates 'click' actions on UI elements that don't exist and generates wrong form field names. The team switches the UI-navigation step to a LAM trained on UI action completion. Success rate climbs to 91%. Generation steps (summarizing the booking) stay on the text-tuned LLM.
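The success rates in this example are measured by whether actions committed, not by how the output reads. A minimal sketch of such a metric follows, assuming an episode counts as a success only when every action in its chain is confirmed by the target system. The names `StepResult`, `episode_succeeded`, and `success_rate` are hypothetical, and the episodes are invented placeholders, not the measurements quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    """Outcome of one attempted action, as verified against the target system."""
    action: str
    committed: bool  # True only if the real system confirmed the action


def episode_succeeded(steps: list[StepResult]) -> bool:
    """An episode succeeds only if it has steps and every one of them committed."""
    return len(steps) > 0 and all(s.committed for s in steps)


def success_rate(episodes: list[list[StepResult]]) -> float:
    """Fraction of episodes in which the whole action chain committed."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(episode_succeeded(e) for e in episodes) / len(episodes)


# Placeholder episodes, invented for illustration only.
episodes = [
    [
        StepResult("open_search", True),
        StepResult("fill_dates", True),
        StepResult("confirm_booking", True),
    ],
    [
        StepResult("open_search", True),
        StepResult("click_nonexistent_button", False),  # hallucinated UI element
    ],
]
rate = success_rate(episodes)
```

Scoring the whole chain rather than individual steps is a design choice made here for illustration: it penalizes long action chains in which a single failed step leaves the booking uncommitted.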

Solution

Therefore:

Identify workloads where success is measured by action completion (UI automation, multi-step API orchestration, structured workflows). Route those workloads to a LAM (for example, the LAM work from Microsoft Research or ByteDance's UI-TARS) rather than a general LLM. Keep text-tuned LLMs for generation workloads. Pair with multi-model-routing, complexity-based-routing, computer-use, and agent-computer-interface.

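What this pattern forbids. Sending a workload classified as action-completion to a text-tuned LLM by default: such workloads route to a LAM. Mixed workloads may not leave the choice implicit; they must decide the routing explicitly, step by step.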
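The per-step routing rule can be sketched as follows. This is an illustrative sketch, not a reference implementation: `StepKind`, `Backend`, `Step`, `route`, and `plan` are hypothetical names, and the backends are plain labels rather than calls to any real model API. The one behaviour it tries to show is that an unclassified step is rejected instead of silently defaulting to a backend.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StepKind(Enum):
    ACTION = "action"          # success means the action committed in the real system
    GENERATION = "generation"  # success means the text output is good


class Backend(Enum):
    LAM = "lam"
    TEXT_LLM = "text_llm"


@dataclass(frozen=True)
class Step:
    name: str
    kind: Optional[StepKind] = None  # None means nobody classified this step


def route(step: Step) -> Backend:
    """Pick a backend for one step; refuse steps with no explicit classification."""
    if step.kind is None:
        raise ValueError(f"step {step.name!r} has no explicit routing decision")
    if step.kind is StepKind.ACTION:
        return Backend.LAM
    return Backend.TEXT_LLM


def plan(workflow: list[Step]) -> dict[str, Backend]:
    """Map every step name in a mixed workflow to its backend."""
    return {step.name: route(step) for step in workflow}


# Mirrors the booking example: UI navigation goes to the LAM,
# summarizing the booking stays on the text-tuned LLM.
booking = [
    Step("navigate_booking_ui", StepKind.ACTION),
    Step("summarize_booking", StepKind.GENERATION),
]
assignments = plan(booking)
```

Raising on an unclassified step, rather than falling back to the text-tuned LLM, is one way to enforce that mixed workloads decide their routing explicitly.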

And the patterns that stand alongside it, or against it —

  • complements Multi-Model Routing ★★: Send each request to the cheapest model that can handle it well.
  • complements Complexity-Based Routing: Estimate a request's difficulty up front and bind it to the cheapest model tier that can answer well, using an explicit complexity classifier as the routing key.
  • complements Computer Use: Let the model drive a desktop end-to-end via screenshots plus virtual mouse/keyboard tool calls instead of bespoke per-app APIs.
  • complements Agent-Computer Interface: Design the tool surface for an LLM agent specifically, with affordances different from human-facing CLIs.
  • complements Tool Use ★★: Let the LLM produce typed calls against an external toolkit instead of producing free-form text the surrounding system has to parse.
