Introduction to Marlo

Marlo is the agent learning platform by Marshmallo that enables AI agents to learn and improve autonomously in production. It captures agent behavior, evaluates outcomes, and turns failures into actionable learnings, so your agents get smarter over time without manual intervention.

The Problem: Silent Agent Failures in Production

In production, agents fail in ways that are hard to see and harder to fix. The evidence is scattered across logs, prompts, tool calls, and model outputs, and there is no single place to understand what actually happened.

Marlo solves this by creating a single, consistent record of what the agent did, grading that record, and turning the result into guidance you can reuse.

The most common failure modes include:

  • The agent produces confident-looking output that is incorrect, making failures appear successful.
  • The same mistakes repeat because failures are not captured in a reusable or actionable form.
  • When a task fails, it is difficult or impossible to determine where the failure occurred or why it happened.

The Solution: A Learning Loop for Production Agents

Marlo solves silent agent failures by adding a persistent learning loop to existing agent systems. Instead of relying on scattered logs and manual debugging, Marlo continuously captures evidence, evaluates outcomes, and feeds lessons back into the agent at runtime.

The learning loop consists of four stages:

  1. Capture: Marlo records the complete execution timeline for each task, including LLM calls, tool calls, and logs, creating a single source of truth for agent behavior.
  2. Evaluate: A judge scores the task outcome and explains the reasoning behind the score, turning raw execution data into a structured evaluation signal.
  3. Learn: Evaluation results are converted into learning objects that describe how the agent should behave in similar situations in the future.
  4. Apply: Active learnings are injected into the agent’s context at runtime, allowing behavior to improve automatically over time.
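
The four stages above can be illustrated with a small, self-contained sketch. This is not Marlo's SDK or API: every name here (TaskRecord, Evaluation, Learning, capture, evaluate, learn, run_loop, toy_agent) is hypothetical, the judge is a simple exact-match check rather than a real scoring model, and the agent is a stand-in function. It is only meant to show how the stages might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# All names below are hypothetical illustrations, not part of Marlo's API.


@dataclass
class Event:
    kind: str    # e.g. "llm_call", "tool_call", or "log"
    detail: str


@dataclass
class TaskRecord:
    task: str
    events: List[Event] = field(default_factory=list)
    output: str = ""


@dataclass
class Evaluation:
    score: float  # 0.0 (failed) to 1.0 (succeeded)
    reason: str


@dataclass
class Learning:
    guidance: str
    active: bool = True


Agent = Callable[[str, List[str]], str]


def capture(task: str, agent: Agent, learnings: List[Learning]) -> TaskRecord:
    """Apply + Capture: inject active learnings, run the agent, record the timeline."""
    record = TaskRecord(task=task)
    context = [l.guidance for l in learnings if l.active]
    record.events.append(Event("log", f"injected {len(context)} learning(s)"))
    record.output = agent(task, context)
    record.events.append(Event("llm_call", record.output))
    return record


def evaluate(record: TaskRecord, expected: str) -> Evaluation:
    """Evaluate: a toy judge that scores by exact match and explains the score."""
    if record.output == expected:
        return Evaluation(1.0, "output matched the expected answer")
    return Evaluation(0.0, f"expected {expected!r}, got {record.output!r}")


def learn(record: TaskRecord, evaluation: Evaluation) -> Optional[Learning]:
    """Learn: turn a failed evaluation into reusable guidance."""
    if evaluation.score >= 1.0:
        return None
    return Learning(guidance=f"For task {record.task!r}: {evaluation.reason}")


def run_loop(task: str, expected: str, agent: Agent,
             attempts: int) -> Tuple[List[float], List[Learning]]:
    """Run the task repeatedly, feeding each new learning into the next attempt."""
    learnings: List[Learning] = []
    scores: List[float] = []
    for _ in range(attempts):
        record = capture(task, agent, learnings)
        evaluation = evaluate(record, expected)
        scores.append(evaluation.score)
        lesson = learn(record, evaluation)
        if lesson is not None:
            learnings.append(lesson)
    return scores, learnings


def toy_agent(task: str, context: List[str]) -> str:
    # Stand-in for an LLM-backed agent: it answers wrongly until guidance is present.
    return "4" if context else "5"


if __name__ == "__main__":
    scores, learnings = run_loop("2 + 2", "4", toy_agent, attempts=2)
    print(scores)  # prints [0.0, 1.0]
```

In this sketch the first attempt fails, the judge's explanation becomes a learning, and the second attempt succeeds because that learning is injected into the agent's context. A production system would replace the exact-match judge and the toy agent with real components.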