Observability

@agentium/observability is a separate, opt-in package that adds tracing, metrics, and structured logging to any Agentium agent. It listens to the agent’s EventBus from the outside, so it requires no changes to core and adds no overhead when it is not installed.

npm install @agentium/observability

Quick Start

import { Agent, openai } from "@agentium/core";
import { instrument } from "@agentium/observability";

const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
});

// One-liner — just pass exporter names as strings
const obs = instrument(agent, {
  exporters: ["console"],
});

await agent.run("Hello!");

// Access metrics
const m = obs.metrics.getMetrics();
console.log(`Runs: ${m.counters.runs_total}, Tokens: ${m.gauges.total_tokens}`);

// Clean up when done
await obs.tracer.flush();
obs.detach();

Exporter Shorthands

Pass exporter names as strings — credentials are read from env vars automatically:

// Langfuse — reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_BASE_URL
instrument(agent, { exporters: ["langfuse"] });

// Multiple exporters
instrument(agent, { exporters: ["langfuse", "console"] });

// OpenTelemetry — reads OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS
instrument(agent, { exporters: ["otel"] });

// JSON file — writes to traces-<timestamp>.json
instrument(agent, { exporters: ["json-file"] });

You can also mix shorthands with custom instances when you need to override defaults:

import { instrument, LangfuseExporter } from "@agentium/observability";

instrument(agent, {
  exporters: [
    new LangfuseExporter({ baseUrl: "https://self-hosted.example.com" }),
    "console",
  ],
});
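
Because the shorthands read credentials from the environment, a missing variable may only show up once traces fail to export. The sketch below checks for that before calling instrument(). `REQUIRED_ENV` and `missingEnvVars` are hypothetical helpers, not part of @agentium/observability. The variable names come from the comments above; treating the Langfuse keys and the OTLP endpoint as required, and the other variables as optional, is an assumption.

```typescript
// Hypothetical preflight helper; not part of @agentium/observability.
// Which variables are strictly required is an assumption.
type Shorthand = "console" | "langfuse" | "otel" | "json-file";

const REQUIRED_ENV: Record<Shorthand, string[]> = {
  console: [],
  langfuse: ["LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"],
  otel: ["OTEL_EXPORTER_OTLP_ENDPOINT"],
  "json-file": [],
};

// Returns the names of required variables that are unset or empty.
function missingEnvVars(
  exporters: Shorthand[],
  env: Record<string, string | undefined>,
): string[] {
  const result: string[] = [];
  for (const exporter of exporters) {
    for (const key of REQUIRED_ENV[exporter]) {
      if (!env[key] && result.indexOf(key) === -1) result.push(key);
    }
  }
  return result;
}

// Example: only the public key is set, so the secret key is reported.
const missingVars = missingEnvVars(["langfuse", "console"], {
  LANGFUSE_PUBLIC_KEY: "pk-test",
});
console.log(missingVars);
```

In a Node process you would pass `process.env` as the second argument and fail fast if the returned list is not empty.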

How It Works

The instrument() function attaches three listeners to the agent’s EventBus:

Tracer — builds a span tree from events (run.start → tool.call → tool.result → run.complete)
MetricsCollector — counts runs, tool calls, errors, cache hits, and tracks latency histograms
StructuredLogger — emits JSON log entries correlated with trace IDs

Since core already emits rich events for every operation, observability works automatically with all features: handoffs, teams, cost tracking, caching, tools, etc.
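
To make the listener pattern concrete, here is a minimal sketch of a tracer that builds a span tree purely by subscribing to events. It is not the package's implementation: `TinyBus` is a stand-in for the real EventBus, and the payload fields `callId` and `toolName` are assumptions made for illustration.

```typescript
// Stand-in event bus; the real EventBus API may differ.
type BusHandler = (payload: Record<string, unknown>) => void;

class TinyBus {
  private handlers = new Map<string, BusHandler[]>();

  on(eventName: string, handler: BusHandler): void {
    const list = this.handlers.get(eventName) ?? [];
    list.push(handler);
    this.handlers.set(eventName, list);
  }

  emit(eventName: string, payload: Record<string, unknown> = {}): void {
    for (const handler of this.handlers.get(eventName) ?? []) handler(payload);
  }
}

interface SpanNode {
  label: string;
  startMs: number;
  endMs?: number;
  children: SpanNode[];
}

// Subscribes from the outside and assembles a span tree; the bus is not modified.
function attachTracer(bus: TinyBus, now: () => number): { root: () => SpanNode | undefined } {
  let root: SpanNode | undefined;
  const openToolSpans = new Map<string, SpanNode>();

  bus.on("run.start", () => {
    root = { label: "agent.run", startMs: now(), children: [] };
  });
  bus.on("tool.call", (p) => {
    if (!root) return;
    const span: SpanNode = { label: `tool.${String(p.toolName)}`, startMs: now(), children: [] };
    root.children.push(span);
    openToolSpans.set(String(p.callId), span);
  });
  bus.on("tool.result", (p) => {
    const span = openToolSpans.get(String(p.callId));
    if (span) span.endMs = now();
    openToolSpans.delete(String(p.callId));
  });
  bus.on("run.complete", () => {
    if (root) root.endMs = now();
  });

  return { root: () => root };
}

// Drive the sketch with a fake clock so the timings are deterministic.
let clock = 0;
const sketchBus = new TinyBus();
const sketchTracer = attachTracer(sketchBus, () => clock);

sketchBus.emit("run.start");
clock = 450;
sketchBus.emit("tool.call", { callId: "c1", toolName: "get_weather" });
clock = 485;
sketchBus.emit("tool.result", { callId: "c1" });
clock = 1240;
sketchBus.emit("run.complete");

console.log(JSON.stringify(sketchTracer.root(), null, 2));
```

The emitting side never references the tracer, which is why this style of instrumentation needs no changes to the code that emits the events.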

Provider Metrics in Traces & Logs

When a run completes, the run.complete event includes providerMetrics — the raw usage object from the underlying model API. This is automatically captured by:

Tracer — stored as a span attribute (providerMetrics) on the root run span
StructuredLogger — included in the JSON log payload for run.complete events
MetricsExporter — stored in the RunRecord for export and dashboard consumption
LangfuseExporter — forwarded as generation metadata in Langfuse

This means you get full provider-level transparency (e.g., thoughtsTokenCount, prompt_tokens_details, cache_read_input_tokens) in your observability pipeline without any extra configuration.
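
Since providerMetrics is the raw usage object, its shape depends on the provider. The sketch below shows one way to read a cached-prompt-token count regardless of which provider produced it. `cachedPromptTokens` is a hypothetical helper, and the nested field names (`cached_tokens`, `cachedContentTokenCount`) reflect my understanding of the providers' public usage formats rather than anything stated on this page, so check them against the provider documentation.

```typescript
// Raw usage objects differ by provider, so every field is optional here.
// Field names beyond those mentioned above are assumptions.
interface RawUsage {
  prompt_tokens_details?: { cached_tokens?: number }; // OpenAI-style usage
  cache_read_input_tokens?: number; // Anthropic-style usage
  cachedContentTokenCount?: number; // Gemini-style usage metadata
  thoughtsTokenCount?: number; // Gemini-style reasoning tokens
}

// Returns the first cached-token count found, or 0 when none is present.
function cachedPromptTokens(usage: RawUsage | undefined): number {
  if (!usage) return 0;
  return (
    usage.prompt_tokens_details?.cached_tokens ??
    usage.cache_read_input_tokens ??
    usage.cachedContentTokenCount ??
    0
  );
}

console.log(cachedPromptTokens({ prompt_tokens_details: { cached_tokens: 128 } }));
console.log(cachedPromptTokens({ cache_read_input_tokens: 64 }));
console.log(cachedPromptTokens(undefined));
```

How you obtain the object (span attribute, log payload, or RunRecord) depends on which component you read it from.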

Trace Tree

Every agent.run() produces a trace like:

──────────────────────────────────────────
  Trace abc123  duration=1240ms
  agent=assistant
──────────────────────────────────────────
  ✓ agent.run        [0ms → +1240ms]  578 tok
  ├─ ✓ tool.get_weather  [450ms → +35ms]
  └─ ✓ tool.search       [500ms → +120ms]
──────────────────────────────────────────

Exporters

Shorthand	Env Vars	Description
`"console"`	—	Pretty-print trace tree to terminal
`"langfuse"`	`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_BASE_URL`	Native Langfuse format with generations and spans
`"otel"`	`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`	OTLP/HTTP JSON to any OpenTelemetry collector
`"json-file"`	—	Append traces to a JSON file

Plus CallbackExporter for custom integrations:

import { instrument, CallbackExporter } from "@agentium/observability";

instrument(agent, {
  exporters: [new CallbackExporter((trace) => myCustomSink(trace))],
});
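
A custom sink often wants to forward traces in batches rather than one at a time. The sketch below is one possible shape for such a sink. `createBatchingSink` is a hypothetical helper, not part of the package, and it assumes the exporter invokes the callback once per completed trace.

```typescript
// Hypothetical helper: buffers items and hands them to `send` in batches.
function createBatchingSink<T>(batchSize: number, send: (batch: T[]) => void) {
  let buffer: T[] = [];

  // Sends whatever is buffered, if anything.
  const flush = (): void => {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    send(batch);
  };

  // Buffers one item and flushes once the batch is full.
  const push = (item: T): void => {
    buffer.push(item);
    if (buffer.length >= batchSize) flush();
  };

  return { push, flush };
}

// Example with strings standing in for trace objects.
const batches: string[][] = [];
const sink = createBatchingSink<string>(2, (batch) => {
  batches.push(batch);
});

sink.push("trace-1");
sink.push("trace-2"); // the first batch is sent here
sink.push("trace-3");
sink.flush(); // the remainder is sent on flush

console.log(batches);
```

With the real exporter this would be wired up as `new CallbackExporter((trace) => sink.push(trace))`, with a final `sink.flush()` before the process exits.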

Metrics

const snap = obs.metrics.getMetrics();

snap.counters.runs_total;          // Total runs
snap.counters.runs_success;        // Successful runs
snap.counters.runs_error;          // Failed runs
snap.counters.tool_calls_total;    // Total tool invocations
snap.counters.handoffs_total;      // Agent handoffs
snap.counters.cache_hits;          // Semantic cache hits
snap.counters.cache_misses;        // Semantic cache misses
snap.histograms.run_duration_ms;   // Array of run durations
snap.histograms.tool_latency_ms;   // Array of tool latencies
snap.gauges.total_tokens;          // Total tokens consumed
snap.gauges.total_cost_usd;        // Total cost from CostTracker events
snap.rates.cache_hit_ratio;        // Hits / (hits + misses)
snap.rates.error_rate;             // Errors / total
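
Because the histograms are exposed as plain arrays of samples, percentiles can be computed on the client side. The sketch below uses the nearest-rank method. `percentile` is a hypothetical helper, not part of the package.

```typescript
// Nearest-rank percentile over raw samples, e.g. snap.histograms.run_duration_ms.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Rank of the smallest sample with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

const durationsMs = [120, 80, 100, 400, 90];
console.log(percentile(durationsMs, 50)); // 100
console.log(percentile(durationsMs, 95)); // 400
```

Tail percentiles such as p95 usually say more about user-facing latency than the mean, which a single slow run can distort.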

Structured Logging

Three drain modes:

// JSON to stdout (for log aggregators like Datadog, ELK)
instrument(agent, { exporters: ["console"], structuredLogs: "json" });

// Plain text to stdout
instrument(agent, { exporters: ["console"], structuredLogs: "console" });

// Custom function
instrument(agent, {
  exporters: ["console"],
  structuredLogs: (entry) => myLogger.log(entry),
});

Each entry includes traceId for correlation with traces.
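
As an illustration of the custom-function mode, the sketch below groups incoming entries by traceId so that all log lines for one run can be inspected together. `createTraceGroupingDrain` is a hypothetical helper, and apart from traceId the entry shape is an assumption.

```typescript
// Only traceId is documented above; all other fields are left open.
interface LogEntryLike {
  traceId: string;
  [key: string]: unknown;
}

// Hypothetical drain that keeps entries in memory, grouped by trace.
function createTraceGroupingDrain() {
  const byTrace = new Map<string, LogEntryLike[]>();

  const drain = (entry: LogEntryLike): void => {
    const list = byTrace.get(entry.traceId) ?? [];
    list.push(entry);
    byTrace.set(entry.traceId, list);
  };

  const entriesFor = (traceId: string): LogEntryLike[] => byTrace.get(traceId) ?? [];

  return { drain, entriesFor };
}

const grouping = createTraceGroupingDrain();
grouping.drain({ traceId: "abc123", message: "run started" });
grouping.drain({ traceId: "abc123", message: "run completed" });
grouping.drain({ traceId: "def456", message: "run started" });

console.log(grouping.entriesFor("abc123").length); // 2
```

The function would be passed as `structuredLogs: grouping.drain`. An in-memory map like this grows without bound, so it suits tests and debugging rather than long-running processes.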

Works With Teams & Workflows

Use instrumentBus() to attach to any EventBus:

import { Team } from "@agentium/core"; // assumption: Team is exported from core
import { instrumentBus } from "@agentium/observability";

const team = new Team({ ... });
const obs = instrumentBus(team.eventBus, { exporters: ["langfuse", "console"] });

Langfuse Integration Example

Langfuse provides an open-source LLM observability dashboard. Set it up in three steps:

# 1. Set environment variables
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
export LANGFUSE_BASE_URL="https://cloud.langfuse.com" # or self-hosted URL

// 2. Instrument your agent
import { Agent, openai } from "@agentium/core";
import { instrument } from "@agentium/observability";

const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
  instructions: "You are a helpful assistant.",
});

const obs = instrument(agent, {
  exporters: ["langfuse"],
});

// 3. Every run is now traced in Langfuse
await agent.run("What is quantum computing?", {
  sessionId: "session-abc",
  userId: "user-42",
});

// Flush traces before process exit
await obs.tracer.flush();

In the Langfuse dashboard, you’ll see:

Traces for each agent.run() with duration, token usage, and cost
Generations for each LLM call within a run
Spans for tool calls, handoffs, and other operations
Sessions grouping traces by sessionId

OpenTelemetry Export

Send traces to any OTLP-compatible backend (Jaeger, Grafana Tempo, Honeycomb, etc.):

export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer my-token"

import { instrument } from "@agentium/observability";

const obs = instrument(agent, {
  exporters: ["otel"],
});

Traces follow the OpenTelemetry semantic conventions for GenAI, making them compatible with standard OTLP tooling.
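
The OpenTelemetry specification describes OTEL_EXPORTER_OTLP_HEADERS as a comma-separated list of key=value pairs. The sketch below shows how such a value maps to HTTP headers. `parseOtlpHeaders` is a hypothetical helper, and how the "otel" exporter itself parses the variable is not stated here. The sketch does not percent-decode values, which some SDKs do.

```typescript
// Parses "k1=v1,k2=v2" into a header map, splitting each pair on the first "=".
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers;

  for (const pair of raw.split(",")) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip entries with no key or no "="
    const key = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    if (key) headers[key] = value;
  }
  return headers;
}

const parsedHeaders = parseOtlpHeaders("Authorization=Bearer my-token,x-team=agents");
console.log(parsedHeaders);
```

Splitting on the first "=" only means that values which themselves contain "=" (for example base64 padding) are kept intact.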

Building a Custom Dashboard

Combine metrics and events to build a real-time dashboard:

import { Agent, openai, CostTracker } from "@agentium/core";
import { instrument } from "@agentium/observability";

const tracker = new CostTracker();
const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
  costTracker: tracker,
});

const obs = instrument(agent, { exporters: ["console"] });

// Periodic metrics snapshot
setInterval(() => {
  const m = obs.metrics.getMetrics();
  const cost = tracker.getSummary();

  console.log({
    totalRuns: m.counters.runs_total,
    successRate: (1 - m.rates.error_rate) * 100,
    cacheHitRate: m.rates.cache_hit_ratio * 100,
    avgLatency: average(m.histograms.run_duration_ms),
    totalCost: cost.totalCost,
    totalTokens: m.gauges.total_tokens,
  });
}, 30_000);

function average(arr: number[]): number {
  return arr.length ? arr.reduce((a, b) => a + b, 0) / arr.length : 0;
}

Capacity Metrics

When the Session Profiler is attached to the same EventBus, MetricsExporter automatically captures capacity-related metrics:

AgentMetrics fields

Field	Type	Description
`estimatedKvCacheGb`	`number?`	Estimated total KV cache memory across all sessions
`avgContextLength`	`number?`	Average prompt tokens per run
`sessionCategories`	`Record<string, number>?`	Session counts by category (light/medium/heavy/extreme)

Prometheus output

The toPrometheus() method includes three capacity metrics, one gauge and two counters:

# HELP agentium_kv_cache_estimated_gb Estimated KV cache size in GB
# TYPE agentium_kv_cache_estimated_gb gauge
agentium_kv_cache_estimated_gb 12.5

# HELP agentium_session_category_total Sessions by category
# TYPE agentium_session_category_total counter
agentium_session_category_total{category="light"} 5
agentium_session_category_total{category="medium"} 2
agentium_session_category_total{category="heavy"} 1

# HELP agentium_capacity_sessions_total Total tracked sessions
# TYPE agentium_capacity_sessions_total counter
agentium_capacity_sessions_total 8

These appear automatically when capacity.session.classified and capacity.warning events are emitted on the EventBus — no additional configuration needed.
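
To show how the AgentMetrics fields relate to the exposition text, here is a sketch that renders the same three metrics from a snapshot. `renderCapacityMetrics` is a hypothetical function, not the package's toPrometheus(). It assumes the total session count is the sum of the per-category counts, which matches the sample output (5 + 2 + 1 = 8) but is not confirmed by the source.

```typescript
interface CapacitySnapshot {
  estimatedKvCacheGb?: number;
  sessionCategories?: Record<string, number>;
}

// Renders capacity fields in the Prometheus text exposition format.
function renderCapacityMetrics(snap: CapacitySnapshot): string {
  const lines: string[] = [];

  if (snap.estimatedKvCacheGb !== undefined) {
    lines.push(
      "# HELP agentium_kv_cache_estimated_gb Estimated KV cache size in GB",
      "# TYPE agentium_kv_cache_estimated_gb gauge",
      `agentium_kv_cache_estimated_gb ${snap.estimatedKvCacheGb}`,
    );
  }

  const categories = Object.entries(snap.sessionCategories ?? {});
  if (categories.length > 0) {
    lines.push(
      "# HELP agentium_session_category_total Sessions by category",
      "# TYPE agentium_session_category_total counter",
    );
    let total = 0;
    for (const [category, count] of categories) {
      lines.push(`agentium_session_category_total{category="${category}"} ${count}`);
      total += count;
    }
    // Assumption: the total is the sum of all category counts.
    lines.push(
      "# HELP agentium_capacity_sessions_total Total tracked sessions",
      "# TYPE agentium_capacity_sessions_total counter",
      `agentium_capacity_sessions_total ${total}`,
    );
  }

  return lines.join("\n") + "\n";
}

const capacityText = renderCapacityMetrics({
  estimatedKvCacheGb: 12.5,
  sessionCategories: { light: 5, medium: 2, heavy: 1 },
});
console.log(capacityText);
```

Fields that are undefined are simply omitted from the output.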

Accuracy Metrics (v2.4+)

MetricsExporter tracks how often agent output needs human correction, and the agent’s self-critique quality when Reflection is enabled.

AgentMetrics fields

Field	Type	Description
`correctionsTotal`	`number`	Total human corrections recorded against this agent’s output
`correctionRate`	`number`	Corrections per run — the inverse of first-pass accuracy
`avgCritiqueScore`	`number?`	Average reflection self-critique score (0–1)

How they’re captured

correctionsTotal / correctionRate increment on every memory.correction.recorded event — emitted whenever a correction is recorded via the corrections endpoint, agent.memory.recordCorrection(), or the record_correction tool. See Correction Capture.
avgCritiqueScore aggregates reflection.critique events emitted during runs.
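
The arithmetic behind these fields can be sketched as a small tally. `AccuracyTally` is a hypothetical class, not the MetricsExporter itself, and it assumes that the correction rate is total corrections divided by total runs and that the critique score is a plain mean.

```typescript
// Hypothetical tally mirroring the three accuracy fields.
class AccuracyTally {
  private runs = 0;
  private corrections = 0;
  private critiqueSum = 0;
  private critiqueCount = 0;

  // Call once per completed run.
  recordRun(): void {
    this.runs += 1;
  }

  // Call once per memory.correction.recorded event.
  recordCorrection(): void {
    this.corrections += 1;
  }

  // Call once per reflection.critique event with its score (0 to 1).
  recordCritique(score: number): void {
    this.critiqueSum += score;
    this.critiqueCount += 1;
  }

  snapshot(): { correctionsTotal: number; correctionRate: number; avgCritiqueScore: number | undefined } {
    return {
      correctionsTotal: this.corrections,
      correctionRate: this.runs > 0 ? this.corrections / this.runs : 0,
      // Undefined until at least one critique has been seen.
      avgCritiqueScore: this.critiqueCount > 0 ? this.critiqueSum / this.critiqueCount : undefined,
    };
  }
}

const tally = new AccuracyTally();
for (let i = 0; i < 25; i++) tally.recordRun();
for (let i = 0; i < 3; i++) tally.recordCorrection();
tally.recordCritique(0.75);
tally.recordCritique(1.0);

console.log(tally.snapshot());
```

With 3 corrections over 25 runs the rate is 0.12, and the two critique scores average to 0.875.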

Prometheus output

# HELP agentium_agent_corrections_total Total human corrections recorded
# TYPE agentium_agent_corrections_total counter
agentium_agent_corrections_total{agent="ap-reconciler"} 17

# HELP agentium_agent_correction_rate Corrections per run (inverse of first-pass accuracy)
# TYPE agentium_agent_correction_rate gauge
agentium_agent_correction_rate{agent="ap-reconciler"} 0.12

# HELP agentium_agent_critique_score_avg Average reflection self-critique score (0-1)
# TYPE agentium_agent_critique_score_avg gauge
agentium_agent_critique_score_avg{agent="ap-reconciler"} 0.84

A falling correction_rate over time is the clearest signal that the correction-capture learning loop is working — each recorded correction is retrieved on future relevant runs, so the same mistake stops recurring.
