Rewards
Rewards evaluate how well your agent completed a task. They turn a raw trace into a score and a written explanation, telling you what worked, what failed, and why.
Why Rewards Matter
Agents can fail in subtle ways. A response might look correct but miss key details. A tool might return data that the agent misinterprets. Without structured evaluation, these failures can go unnoticed.
Rewards solve this by grading every task against consistent criteria. You get a numeric score that tracks quality over time, plus a written rationale that explains the reasoning. This makes failures visible and actionable.
Rewards also feed into Marlo’s learning system. When a task receives a low score, Marlo uses the rationale to generate guidance that prevents similar failures in the future.
How Rewards Work
Rewards are computed automatically after a task ends:
- Trace Collection: Marlo gathers the complete trace for the task, including the agent definition, user input, LLM calls, tool calls, and final output.
- Evaluation: A judge model reviews the trace and evaluates the outcome on task completion, response quality, tool usage, efficiency, and error handling.
- Scoring: The judge assigns a score between 0.0 and 1.0, where 0.0 means complete failure and 1.0 means perfect execution.
- Rationale: The judge writes a short explanation describing what the agent did well and what it did poorly, with references to specific steps in the trace.
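The four steps above can be sketched in Python. This is a minimal illustration, not Marlo's implementation: `Trace`, `Judge`, `evaluate_task`, and `toy_judge` are hypothetical names, and the judge is reduced to a function that returns a score and a rationale. The sketch shows what the judge receives and the checks a caller might apply to its output.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trace:
    """Hypothetical record of everything the judge sees for one finished task."""
    agent_definition: str
    user_input: str
    llm_calls: list[str] = field(default_factory=list)
    tool_calls: list[str] = field(default_factory=list)
    final_output: str = ""


# Hypothetical judge interface: takes a trace, returns (score, rationale).
Judge = Callable[[Trace], tuple[float, str]]


def evaluate_task(trace: Trace, judge: Judge) -> tuple[float, str]:
    """Run the judge on a completed trace and sanity-check what it returns."""
    score, rationale = judge(trace)
    # Scores live on a 0.0 (complete failure) to 1.0 (perfect execution) scale.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    # A score without an explanation is not actionable.
    if not rationale.strip():
        raise ValueError("judge returned an empty rationale")
    return score, rationale


def toy_judge(trace: Trace) -> tuple[float, str]:
    """Stand-in for an LLM judge so the sketch runs end to end."""
    if trace.final_output:
        return 1.0, "The agent produced a final answer."
    return 0.0, "The agent finished without any final output."


trace = Trace(
    agent_definition="Password-reset assistant",
    user_input="Reset my password",
    tool_calls=["send_reset_email(user_id=42)"],
    final_output="A reset link has been sent to your email.",
)
score, rationale = evaluate_task(trace, toy_judge)
```

In practice the judge is a model reviewing the whole trace; the toy judge here only checks for a final output so that the example is runnable.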
What Rewards Evaluate
The judge considers several factors when scoring a task:
- Task Completion: Did the agent accomplish what the user asked for?
- Response Quality: Is the final output accurate, relevant, and well-formed?
- Tool Usage: Did the agent use tools correctly and interpret their outputs properly?
- Efficiency: Did the agent complete the task without unnecessary steps or repeated calls?
- Error Handling: If something went wrong, did the agent recover gracefully or fail silently?
What You Get
Each reward includes:
- Score: A number between 0.0 and 1.0 representing overall task quality.
- Rationale: A written explanation of the score, highlighting strengths and weaknesses.
- Evidence: Links to specific events in the trace that support the evaluation.
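One way to picture these three parts is as a small record. In this hypothetical sketch, `Reward` and `resolve_evidence` are illustrative names, and evidence is assumed to be a tuple of trace event IDs; the source only says that evidence links point to specific events in the trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reward:
    """Hypothetical shape of a reward: score, rationale, and evidence links."""
    score: float
    rationale: str
    # Assumption: evidence is stored as IDs of events in the task's trace.
    evidence: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be between 0.0 and 1.0, got {self.score}")


def resolve_evidence(reward: Reward, events: dict[str, str]) -> list[str]:
    """Look up the trace events that each evidence link points to, in order."""
    missing = [event_id for event_id in reward.evidence if event_id not in events]
    if missing:
        raise KeyError(f"evidence refers to unknown events: {missing}")
    return [events[event_id] for event_id in reward.evidence]


events = {
    "evt-1": "llm_call: plan the search",
    "evt-2": "tool_call: search_orders(customer='acme') -> []",
    "evt-3": "final_output: 'You have no open orders.'",
}
reward = Reward(
    score=0.3,
    rationale="The agent treated an empty search result as proof of no orders.",
    evidence=("evt-2", "evt-3"),
)
cited = resolve_evidence(reward, events)
```

Resolving each evidence ID to its event is what lets a reader jump from the rationale straight to the step that justifies it.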
Viewing Rewards in the Dashboard
You can view rewards for any task in the Marlo dashboard:
- Navigate to your project and select a thread.
- Click on a task to open the trace view.
- The reward score and rationale appear alongside the trace events.
- Click on evidence links to jump directly to the relevant steps.
Over time, you can track reward trends to see whether your agent is improving or degrading.
Managing Rewards in the Dashboard
The dashboard provides tools to review and adjust rewards when the automated evaluation needs correction.
Viewing Reward Details
- Open a task in the dashboard.
- Click on the reward score to expand the full evaluation.
- Review the rationale and linked evidence.
Adjusting a Reward
If the automated evaluation is incorrect, you can provide feedback:
- Click the Adjust button next to the reward score.
- Enter a new score between 0.0 and 1.0.
- Provide a brief explanation for the adjustment.
- Click Save.
Adjusted rewards are marked in the dashboard and used to improve future evaluations.
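The adjustment steps amount to replacing the score, recording why, and marking the reward as adjusted. The sketch below is a hypothetical model of that bookkeeping, not Marlo's data model: `Reward`, `adjust_reward`, and the `original_score` and `adjustment_note` fields are assumptions. It treats the explanation as mandatory, since a specific rationale is what helps the learning system.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Reward:
    """Hypothetical reward record that can carry a manual adjustment."""
    score: float
    rationale: str
    adjusted: bool = False
    # Assumption: the judge's original score is kept for comparison.
    original_score: Optional[float] = None
    adjustment_note: str = ""


def adjust_reward(reward: Reward, new_score: float, note: str) -> Reward:
    """Return a copy of the reward with a manually corrected score."""
    if not 0.0 <= new_score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {new_score}")
    if not note.strip():
        raise ValueError("an explanation is required when adjusting a reward")
    # Keep the first automated score even if the reward is adjusted again.
    original = reward.score if reward.original_score is None else reward.original_score
    return replace(
        reward,
        score=new_score,
        adjusted=True,
        original_score=original,
        adjustment_note=note.strip(),
    )


judged = Reward(score=0.2, rationale="Used the wrong lookup tool.")
fixed = adjust_reward(judged, 0.9, "That tool is correct for enterprise accounts.")
```

Returning a new record instead of mutating the old one keeps the automated evaluation intact alongside the human correction.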
Bulk Review
To review multiple tasks at once:
- Navigate to the Tasks view in your project.
- Filter by date range, agent, or score threshold.
- Select multiple tasks to review their rewards in sequence.
- Use keyboard shortcuts to quickly approve or flag evaluations.
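The filters in the Tasks view can be thought of as a simple query over per-task rows. This sketch assumes a hypothetical `TaskSummary` row and `select_for_review` helper; ordering the result from lowest to highest score is a design choice of the sketch, not documented dashboard behavior.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class TaskSummary:
    """Hypothetical per-task row, similar to what a Tasks list displays."""
    task_id: str
    agent: str
    day: date
    score: float


def select_for_review(
    tasks: Iterable[TaskSummary],
    start: date,
    end: date,
    agent: Optional[str] = None,
    max_score: Optional[float] = None,
) -> list[TaskSummary]:
    """Apply date-range, agent, and score-threshold filters."""
    selected = [
        t
        for t in tasks
        if start <= t.day <= end
        and (agent is None or t.agent == agent)
        and (max_score is None or t.score <= max_score)
    ]
    # Design choice for this sketch: lowest scores first, so likely failures come up early.
    return sorted(selected, key=lambda t: t.score)


tasks = [
    TaskSummary("t1", "support", date(2024, 1, 2), 0.9),
    TaskSummary("t2", "support", date(2024, 1, 3), 0.3),
    TaskSummary("t3", "billing", date(2024, 1, 3), 0.1),
    TaskSummary("t4", "support", date(2024, 2, 1), 0.2),
]
queue = select_for_review(tasks, date(2024, 1, 1), date(2024, 1, 31), agent="support", max_score=0.5)
```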
Reward Trends
The dashboard displays reward trends over time:
- Navigate to the Analytics section.
- View average scores by day, week, or month.
- Break down scores by agent to identify which agents need improvement.
- Identify patterns in low-scoring tasks to guide agent development.
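Average scores per period and per agent come down to grouping and averaging. In this sketch, `period_key`, `average_scores`, and the `(agent, day, score)` row layout are hypothetical; weeks use ISO week numbering, which is an assumption about how the dashboard defines a week.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Iterable


def period_key(day: date, period: str) -> str:
    """Label a date with its day, ISO week, or calendar month."""
    if period == "day":
        return day.isoformat()
    if period == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if period == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown period: {period!r}")


def average_scores(
    rows: Iterable[tuple[str, date, float]], period: str = "week"
) -> dict[tuple[str, str], float]:
    """Average reward scores per (agent, period) bucket."""
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for agent, day, score in rows:
        buckets[(agent, period_key(day, period))].append(score)
    return {key: mean(scores) for key, scores in sorted(buckets.items())}


rows = [
    ("support", date(2024, 1, 1), 0.5),
    ("support", date(2024, 1, 3), 1.0),
    ("support", date(2024, 1, 8), 0.25),
    ("billing", date(2024, 1, 2), 0.8),
]
weekly = average_scores(rows, "week")
```

Keying buckets by agent as well as period means a drop in one agent stays visible even when the overall average looks stable.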
Connection to Learnings
Rewards are the input to Marlo’s learning system. When a task receives a reward, Marlo analyzes the rationale and generates learning objects that capture what the agent should do differently. These learnings are then surfaced in future tasks, allowing the agent to avoid repeating the same mistakes.
Low scores with clear rationales produce the most useful learnings. When you adjust a reward, include a specific rationale to help the learning system generate better guidance.