Framework · Agent SDKs

OpenClaw-RL

Train personalised LLM agents by turning live multi-turn conversations into fully asynchronous RL training signals across terminal, GUI, software-engineering, and tool-call settings.

Description

OpenClaw-RL is an Apache-2.0 reinforcement-learning framework from Gen-Verse. It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations through the OpenClaw plugin, and runs four asynchronous loops (agent serving, rollout collection, PRM/judge evaluation, policy training) that continuously optimise the policy without interrupting usage. It supports two training paradigms, Binary RL (GRPO with a Process Reward Model) and On-Policy Distillation (OPD, in which a judge model emits textual hints), as well as a Hybrid mode that combines them. The technical report on arXiv (2603.10165, 2026-03-10) reached #1 on HuggingFace Daily Papers.
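To make the Binary RL paradigm more concrete, here is a minimal sketch of the group-relative advantage step that GRPO is generally built on: rewards for a group of responses to the same prompt are centred on the group mean and scaled by the group standard deviation. The function name `group_relative_advantages`, the `eps` constant, and the sample rewards are illustrative assumptions, not OpenClaw-RL's actual code or data.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise rewards within one group of sampled responses.

    Hypothetical helper for illustration; `eps` avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed binary PRM verdicts (1.0 = good, 0.0 = bad) for four responses
# sampled for the same prompt.
binary_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(binary_rewards)

# Responses judged good get a positive advantage, the rest a negative one.
for reward, advantage in zip(binary_rewards, advantages):
    print(f"reward={reward:.0f} advantage={advantage:+.3f}")
```

When every response in a group gets the same verdict, all advantages come out as zero, so that group contributes no gradient signal.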

Solution

OpenClaw-RL replaces a single synchronous agent loop with four independent asynchronous loops (serving, rollout, judge, training). Conversation traffic flowing through an OpenAI-compatible wrapper feeds the rollout collector; the trainer updates the policy in the background while serving and judging continue concurrently.
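The decoupling described above can be sketched with `asyncio` queues connecting four coroutines. This is a toy illustration under stated assumptions: the coroutine names, the queue layout, the `None` shutdown sentinel, and the length-based placeholder reward are all hypothetical and do not reflect OpenClaw-RL's real interfaces.

```python
import asyncio


async def serve(conversations, rollout_q):
    # Serving loop: answer each turn and hand the exchange on for training.
    for turn in conversations:
        await rollout_q.put({"prompt": turn, "response": f"reply to {turn}"})
        await asyncio.sleep(0)  # yield so the other loops can make progress
    await rollout_q.put(None)  # sentinel: no more traffic


async def collect(rollout_q, judge_q):
    # Rollout loop: forward logged exchanges to the judge.
    while (item := await rollout_q.get()) is not None:
        await judge_q.put(item)
    await judge_q.put(None)


async def judge(judge_q, train_q):
    # Judge loop: attach a reward. The rule below is a placeholder,
    # standing in for a PRM or judge model.
    while (item := await judge_q.get()) is not None:
        item["reward"] = 1.0 if len(item["prompt"]) % 2 == 0 else 0.0
        await train_q.put(item)
    await train_q.put(None)


async def train(train_q, state):
    # Training loop: consume scored samples and record an "update".
    while (item := await train_q.get()) is not None:
        state["updates"] += 1
        state["reward_sum"] += item["reward"]


async def main(conversations):
    rollout_q, judge_q, train_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    state = {"updates": 0, "reward_sum": 0.0}
    # All four loops run concurrently; none waits for another to finish a batch.
    await asyncio.gather(
        serve(conversations, rollout_q),
        collect(rollout_q, judge_q),
        judge(judge_q, train_q),
        train(train_q, state),
    )
    return state


if __name__ == "__main__":
    print(asyncio.run(main(["hi", "fix the bug", "open the file"])))
```

Because each stage only talks to its neighbours through a queue, a slow judge or trainer delays later training updates but never blocks the serving loop from answering the next turn.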

Primary use cases

  • personalising a self-hosted agent from a single user's conversational feedback
  • scaling RL training across terminal, GUI, SWE, and tool-call agent environments
  • continuously updating a deployed policy without taking the inference endpoint offline
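One common way to update a deployed policy without taking the endpoint offline is to keep the current weights behind a small versioned handle that the trainer swaps under a lock. The sketch below assumes that approach purely for illustration; `PolicyHandle`, `snapshot`, and `publish` are hypothetical names, and the source does not say how OpenClaw-RL performs the swap.

```python
import threading


class PolicyHandle:
    """Hypothetical holder for the weights the serving loop reads from."""

    def __init__(self, weights):
        self._lock = threading.Lock()
        self._version = 0
        self._weights = weights

    def snapshot(self):
        # Serving side: take a consistent (version, weights) pair per request.
        with self._lock:
            return self._version, self._weights

    def publish(self, new_weights):
        # Training side: swap in new weights; in-flight requests keep
        # using the snapshot they already hold.
        with self._lock:
            self._version += 1
            self._weights = new_weights
            return self._version


handle = PolicyHandle({"w": 0.0})
version_before, weights_before = handle.snapshot()
handle.publish({"w": 0.1})  # the trainer finishes a step in the background
version_after, weights_after = handle.snapshot()
print(version_before, version_after)
```

Tagging each rollout with the version it was generated under also lets a trainer detect how stale a sample is relative to the current policy.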
