
Auto Instrumentation

Marlo can automatically capture LLM calls, eliminating the need for manual tracking code. Just call the instrumentation function once after marlo.init(), and all subsequent API calls are tracked automatically.

Supported Providers

  • OpenAI - GPT-4, GPT-4o, GPT-5, o1, o3, and all chat completion models
  • Anthropic - Claude 3, Claude 3.5, Claude 4, including extended thinking
  • LiteLLM - Any model through LiteLLM’s unified interface

OpenAI

import os

import marlo
from openai import OpenAI

marlo.init(api_key=os.getenv("MARLO_API_KEY"))
marlo.instrument_openai()  # Call once after init

client = OpenAI()

with marlo.task(thread_id="user-123", agent="my-agent") as task:
    task.input("What is the capital of France?")

    # This call is automatically tracked
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
    )

    task.output(response.choices[0].message.content)

What Gets Captured

  • Model name
  • Messages sent
  • Response content
  • Token usage (prompt, completion, reasoning)
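The per-call usage fields can be combined downstream. As a rough sketch with plain dicts shaped like the captured fields (the `summarize_usage` helper is hypothetical, not part of the Marlo SDK):

```python
# Plain dicts shaped like the captured per-call usage fields.
calls = [
    {"model": "gpt-4", "prompt_tokens": 120, "completion_tokens": 80},
    {"model": "gpt-4", "prompt_tokens": 60, "completion_tokens": 40},
]

def summarize_usage(calls):
    """Sum token counts across a list of captured LLM calls."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for call in calls:
        totals["prompt_tokens"] += call["prompt_tokens"]
        totals["completion_tokens"] += call["completion_tokens"]
    return totals

print(summarize_usage(calls))  # {'prompt_tokens': 180, 'completion_tokens': 120}
```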

Reasoning Models (o1, o3, GPT-5)

For models with reasoning capabilities, Marlo automatically captures reasoning tokens:

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Solve this step by step..."}],
    reasoning_effort="medium",
    max_completion_tokens=2000,
)

The captured usage includes:

{
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "reasoning_tokens": 5000,  # Automatically captured
    "total_tokens": 5350
}
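In this captured format, `total_tokens` includes the reasoning tokens alongside prompt and completion tokens. A quick sanity check on the sample numbers above:

```python
# Sample usage from a reasoning-model call, as captured by Marlo:
# total_tokens covers prompt, completion, and reasoning tokens.
usage = {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "reasoning_tokens": 5000,
    "total_tokens": 5350,
}

subtotal = (usage["prompt_tokens"]
            + usage["completion_tokens"]
            + usage["reasoning_tokens"])
print(subtotal == usage["total_tokens"])  # True
```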

Anthropic

import os

import marlo
from anthropic import Anthropic

marlo.init(api_key=os.getenv("MARLO_API_KEY"))
marlo.instrument_anthropic()

client = Anthropic()

with marlo.task(thread_id="user-123", agent="my-agent") as task:
    task.input("Explain quantum computing")

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain quantum computing"}],
    )

    task.output(response.content[0].text)

Extended Thinking

When using Claude with extended thinking, reasoning tokens are captured automatically:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{"role": "user", "content": "Solve this logic puzzle..."}],
)

# Extract the text response
output_text = ""
for block in response.content:
    if block.type == "text":
        output_text = block.text

task.output(output_text)
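With extended thinking enabled, `response.content` interleaves thinking blocks with text blocks, which is why the extraction loop filters on `block.type`. A self-contained stand-in (`SimpleNamespace` objects in place of real SDK block types):

```python
from types import SimpleNamespace

# Stand-ins mimicking the shape of response.content when extended
# thinking is enabled: thinking blocks precede the text block.
content = [
    SimpleNamespace(type="thinking", thinking="Let me work through the clues..."),
    SimpleNamespace(type="text", text="The answer is the butler."),
]

# Same extraction pattern as above: keep only the text block
output_text = ""
for block in content:
    if block.type == "text":
        output_text = block.text

print(output_text)  # The answer is the butler.
```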

LiteLLM

LiteLLM provides a unified interface for 100+ LLM providers. Marlo’s LiteLLM instrumentation captures calls to any model:

import os

import marlo
import litellm

marlo.init(api_key=os.getenv("MARLO_API_KEY"))
marlo.instrument_litellm()

with marlo.task(thread_id="user-123", agent="my-agent") as task:
    task.input("What is 2+2?")

    # Works with any LiteLLM-supported model
    response = litellm.completion(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )

    task.output(response.choices[0].message.content)

Why LiteLLM?

  • Provider flexibility - Switch between OpenAI, Anthropic, Azure, Bedrock, Vertex AI, and more
  • Single instrumentation - One instrument_litellm() call captures all providers
  • Consistent tracking - Same event format regardless of underlying provider
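LiteLLM routes each call based on the provider prefix in the model string. A simplified sketch of that convention (parsing only, not LiteLLM's actual implementation):

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split a LiteLLM-style "provider/model" string.

    LiteLLM routes on this prefix; strings without a prefix
    default to OpenAI.
    """
    if "/" in model:
        provider, _, name = model.partition("/")
        return provider, name
    return "openai", model

print(split_model_string("anthropic/claude-3-haiku-20240307"))
# ('anthropic', 'claude-3-haiku-20240307')
print(split_model_string("gpt-4"))  # ('openai', 'gpt-4')
```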

Examples with Different Providers

# OpenAI via LiteLLM
response = litellm.completion(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)

# Anthropic via LiteLLM
response = litellm.completion(
    model="anthropic/claude-3-haiku-20240307",
    messages=[{"role": "user", "content": "Hello"}],
)

# Azure OpenAI via LiteLLM
response = litellm.completion(
    model="azure/gpt-4-deployment",
    messages=[{"role": "user", "content": "Hello"}],
)

Reasoning Tokens with LiteLLM

LiteLLM passes through reasoning parameters and token counts:

response = litellm.completion(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "Calculate 15% of 80. Think step by step."}],
    reasoning_effort="medium",
    max_completion_tokens=2000,
)

Multiple Providers

You can instrument multiple providers in the same application:

marlo.init(api_key=os.getenv("MARLO_API_KEY"))

marlo.instrument_openai()
marlo.instrument_anthropic()
marlo.instrument_litellm()

Each instrumentation is independent—use whichever providers your agent needs.

Token Usage Aggregation

Within a task, Marlo aggregates token usage across all LLM calls:

with marlo.task(thread_id="user-123", agent="my-agent") as task:
    task.input("Complex multi-step question")

    # First LLM call
    response1 = client.chat.completions.create(...)

    # Second LLM call
    response2 = client.chat.completions.create(...)

    task.output(final_answer)

# The task context automatically tracks:
# - Total prompt tokens across all calls
# - Total completion tokens
# - Total reasoning tokens (if applicable)
# - Number of LLM calls

This aggregated usage appears in the dashboard for cost tracking and analysis.
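Per-task totals also make rough cost estimates straightforward. A sketch with placeholder prices (not real rates, and a hypothetical per-million-token pricing table; check your provider's pricing):

```python
# Illustrative cost estimate from aggregated task usage.
# Prices below are placeholders, not real provider rates.
PRICE_PER_MTOK = {"prompt": 2.50, "completion": 10.00, "reasoning": 10.00}

task_usage = {"prompt": 3_200, "completion": 1_100, "reasoning": 4_000}

cost = sum(
    task_usage[kind] / 1_000_000 * PRICE_PER_MTOK[kind]
    for kind in task_usage
)
print(f"${cost:.4f}")  # $0.0590
```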

Manual LLM Tracking

If you’re using an unsupported provider or need explicit control, you can track LLM calls manually:

with marlo.task(thread_id="user-123", agent="my-agent") as task:
    task.input("Hello")

    # Your custom LLM call
    response = my_custom_llm(prompt="Hello")

    # Manual tracking
    task.llm(
        model="custom-model",
        usage={"input_tokens": 50, "output_tokens": 25},
        messages=[{"role": "user", "content": "Hello"}],
        response=response.text,
    )

    task.output(response.text)

Both automatic and manual tracking can be used in the same application.
