Observability
@agentium/observability is a separate, opt-in package that adds tracing, metrics, and structured logging to any Agentium agent. It listens to the agent’s EventBus from the outside — zero changes to core, zero overhead when not installed.
Quick Start
Exporter Shorthands
Pass exporter names as strings; credentials are read from environment variables automatically.
How It Works
The instrument() function attaches three listeners to the agent’s EventBus:
- Tracer: builds a span tree from events (run.start → tool.call → tool.result → run.complete)
- MetricsCollector: counts runs, tool calls, errors, and cache hits, and tracks latency histograms
- StructuredLogger: emits JSON log entries correlated with trace IDs
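The listener model can be sketched in plain TypeScript. This is not the package's code. The EventBus class, the event shape, and the factory names below are simplified, hypothetical stand-ins. The sketch shows how three independent observers can derive a span tree, counters, and correlated log lines from one event stream without touching the agent.

```typescript
// Hypothetical, simplified event shape; real Agentium events may carry more fields.
type AgentEvent = {
  type: string; // "run.start" | "tool.call" | "tool.result" | "run.complete" | ...
  traceId: string; // shared by every event of one run
  name?: string; // tool name on tool.* events
  timestamp: number; // ms since epoch
};

type Listener = (event: AgentEvent) => void;

// Minimal stand-in for an EventBus: the agent emits, observers subscribe.
class EventBus {
  private listeners: Listener[] = [];
  on(listener: Listener): void {
    this.listeners.push(listener);
  }
  emit(event: AgentEvent): void {
    for (const listener of this.listeners) listener(event);
  }
}

type Span = { name: string; start: number; end?: number; children: Span[] };

// Tracer: run.start opens a root span, tool.call opens a child span,
// tool.result and run.complete close them. Assumes sequential tool calls;
// parallel calls would need a call id to pair each call with its result.
function createTracer() {
  const roots: Span[] = [];
  let run: Span | undefined;
  let tool: Span | undefined;
  const listener: Listener = (e) => {
    if (e.type === "run.start") {
      run = { name: "run", start: e.timestamp, children: [] };
      roots.push(run);
    } else if (e.type === "tool.call" && run) {
      tool = { name: `tool:${e.name}`, start: e.timestamp, children: [] };
      run.children.push(tool);
    } else if (e.type === "tool.result" && tool) {
      tool.end = e.timestamp;
      tool = undefined;
    } else if (e.type === "run.complete" && run) {
      run.end = e.timestamp;
      run = undefined;
    }
  };
  return { listener, roots };
}

// Metrics: plain counters (errors and cache hits would follow the same pattern).
function createMetrics() {
  const counts = { runs: 0, toolCalls: 0 };
  const listener: Listener = (e) => {
    if (e.type === "run.start") counts.runs += 1;
    if (e.type === "tool.call") counts.toolCalls += 1;
  };
  return { listener, counts };
}

// Logger: one JSON line per event, tagged with the traceId for correlation.
function createLogger(sink: string[]): Listener {
  return (e) => {
    sink.push(JSON.stringify({ level: "info", event: e.type, traceId: e.traceId, ts: e.timestamp }));
  };
}

// Wiring: all three observers attach from the outside; the agent only emits.
const bus = new EventBus();
const tracer = createTracer();
const metrics = createMetrics();
const logLines: string[] = [];
bus.on(tracer.listener);
bus.on(metrics.listener);
bus.on(createLogger(logLines));

bus.emit({ type: "run.start", traceId: "t1", timestamp: 0 });
bus.emit({ type: "tool.call", traceId: "t1", name: "search", timestamp: 5 });
bus.emit({ type: "tool.result", traceId: "t1", name: "search", timestamp: 40 });
bus.emit({ type: "run.complete", traceId: "t1", timestamp: 50 });
```

Each observer is isolated, so removing one (for example, the logger) does not affect the others. This mirrors the opt-in, outside-the-core design described above.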
Provider Metrics in Traces & Logs
When a run completes, the run.complete event includes providerMetrics, the raw usage object from the underlying model API. It is captured automatically by:
- Tracer: stored as a span attribute (providerMetrics) on the root run span
- StructuredLogger: included in the JSON log payload for run.complete events
- MetricsExporter: stored in the RunRecord for export and dashboard consumption
- LangfuseExporter: forwarded as generation metadata in Langfuse
As a result, provider-specific usage fields (for example thoughtsTokenCount, prompt_tokens_details, or cache_read_input_tokens) appear in your observability pipeline without any extra configuration.
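Because providerMetrics is passed through raw, its shape differs between model APIs. The sketch below shows how a downstream consumer might read cached-prompt and reasoning-token counts regardless of provider. The field names follow the public usage objects of OpenAI Chat Completions (prompt_tokens_details.cached_tokens, completion_tokens_details.reasoning_tokens), Anthropic Messages (cache_read_input_tokens), and Gemini (cachedContentTokenCount, thoughtsTokenCount). The normalizeUsage helper itself is hypothetical and not part of @agentium/observability.

```typescript
// providerMetrics is forwarded untouched, so treat it as arbitrary JSON.
type Json = Record<string, unknown>;

// Return the value only if it is a number.
function num(value: unknown): number | undefined {
  return typeof value === "number" ? value : undefined;
}

// Safely read a nested object; missing or non-object values become {}.
function child(obj: Json, key: string): Json {
  const v = obj[key];
  return typeof v === "object" && v !== null ? (v as Json) : {};
}

// Hypothetical helper: pick cached-prompt and reasoning token counts from
// whichever provider shape is present; undefined means "not reported".
function normalizeUsage(providerMetrics: Json): { cachedPromptTokens?: number; reasoningTokens?: number } {
  const cachedPromptTokens =
    num(child(providerMetrics, "prompt_tokens_details")["cached_tokens"]) ?? // OpenAI
    num(providerMetrics["cache_read_input_tokens"]) ?? // Anthropic
    num(providerMetrics["cachedContentTokenCount"]); // Gemini
  const reasoningTokens =
    num(child(providerMetrics, "completion_tokens_details")["reasoning_tokens"]) ?? // OpenAI
    num(providerMetrics["thoughtsTokenCount"]); // Gemini
  return { cachedPromptTokens, reasoningTokens };
}

const openai = normalizeUsage({
  prompt_tokens: 2000,
  completion_tokens: 500,
  prompt_tokens_details: { cached_tokens: 1536 },
  completion_tokens_details: { reasoning_tokens: 320 },
});
const anthropic = normalizeUsage({ input_tokens: 1200, output_tokens: 300, cache_read_input_tokens: 1000 });
const gemini = normalizeUsage({ promptTokenCount: 800, candidatesTokenCount: 200, thoughtsTokenCount: 250 });
```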
Trace Tree
Every agent.run() produces one trace: a root run span with child spans for the operations inside the run, such as tool calls.
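A trace tree like this is what the "console" exporter pretty-prints. The following is a minimal sketch of depth-first rendering with indentation and per-span durations. The Span type, the span names (such as llm.generate), and the output format are illustrative assumptions, not the exporter's actual output.

```typescript
// Hypothetical finished span with absolute start/end times in ms.
type Span = { name: string; startMs: number; endMs: number; children: Span[] };

// Two spaces of indentation per tree level.
function indent(depth: number): string {
  let s = "";
  for (let i = 0; i < depth; i++) s += "  ";
  return s;
}

// Depth-first walk: each span on its own line, children indented under their parent.
function renderTree(span: Span, depth = 0, lines: string[] = []): string[] {
  lines.push(`${indent(depth)}${span.name} (${span.endMs - span.startMs}ms)`);
  for (const c of span.children) renderTree(c, depth + 1, lines);
  return lines;
}

const trace: Span = {
  name: "run",
  startMs: 0,
  endMs: 1200,
  children: [
    { name: "llm.generate", startMs: 0, endMs: 400, children: [] },
    { name: "tool:search", startMs: 400, endMs: 900, children: [] },
    { name: "llm.generate", startMs: 900, endMs: 1200, children: [] },
  ],
};

const output = renderTree(trace).join("\n");
```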
Exporters
| Shorthand | Env Vars | Description |
|---|---|---|
| "console" | none | Pretty-prints the trace tree to the terminal |
| "langfuse" | LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY | Native Langfuse format with generations and spans |
| "otel" | OTEL_EXPORTER_OTLP_ENDPOINT | OTLP/HTTP JSON to any OpenTelemetry collector |
| "json-file" | none | Appends traces to a JSON file |
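For the "otel" exporter, the OTLP specification defines both the JSON body and the endpoint URL. The sketch below converts finished spans into an OTLP/HTTP JSON export request and derives the traces URL from OTEL_EXPORTER_OTLP_ENDPOINT. The spec has HTTP exporters append v1/traces to that base endpoint, with http://localhost:4318 as the default. The sketch follows the public OTLP JSON encoding: hex trace and span IDs, nanosecond timestamps as decimal strings, and integer span kinds. The FinishedSpan type, the scope name, and the attribute keys are assumptions, and this is not the package's own serializer.

```typescript
// Hypothetical finished span as the exporter might hold it.
type FinishedSpan = {
  traceId: string; // 32 hex characters
  spanId: string; // 16 hex characters
  parentSpanId?: string; // absent on the root run span
  name: string;
  startMs: number;
  endMs: number;
  attributes: Record<string, string | number | boolean>;
};

// OTLP AnyValue: exactly one typed field per value; int64 travels as a decimal string.
function toAnyValue(v: string | number | boolean) {
  if (typeof v === "string") return { stringValue: v };
  if (typeof v === "boolean") return { boolValue: v };
  return Math.floor(v) === v ? { intValue: String(v) } : { doubleValue: v };
}

// ms -> ns as a string; string concatenation avoids exceeding Number.MAX_SAFE_INTEGER.
function msToUnixNano(ms: number): string {
  const whole = Math.round(ms);
  return whole === 0 ? "0" : `${whole}000000`;
}

function toOtlpRequest(spans: FinishedSpan[], serviceName: string) {
  return {
    resourceSpans: [
      {
        resource: { attributes: [{ key: "service.name", value: { stringValue: serviceName } }] },
        scopeSpans: [
          {
            scope: { name: "agentium-observability-sketch" }, // hypothetical scope name
            spans: spans.map((s) => ({
              traceId: s.traceId,
              spanId: s.spanId,
              parentSpanId: s.parentSpanId, // undefined is dropped by JSON.stringify
              name: s.name,
              kind: 1, // SPAN_KIND_INTERNAL
              startTimeUnixNano: msToUnixNano(s.startMs),
              endTimeUnixNano: msToUnixNano(s.endMs),
              attributes: Object.keys(s.attributes).map((key) => ({ key, value: toAnyValue(s.attributes[key]) })),
            })),
          },
        ],
      },
    ],
  };
}

// Base endpoint + "/v1/traces"; env is passed in so the sketch runs without Node typings.
function otlpTracesUrl(env: Record<string, string | undefined>): string {
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? "http://localhost:4318";
  return base.replace(/\/+$/, "") + "/v1/traces";
}

const request = toOtlpRequest(
  [
    {
      traceId: "5b8efff798038103d269b633813fc60c",
      spanId: "eee19b7ec3c1b174",
      name: "run",
      startMs: 1700000000000,
      endMs: 1700000001500,
      attributes: { "agent.name": "support-bot", "tool.calls": 2 },
    },
  ],
  "support-bot",
);
// JSON.stringify(request) would be POSTed to otlpTracesUrl(...) with Content-Type: application/json.
```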
Use CallbackExporter for custom integrations.
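CallbackExporter's exact constructor is not reproduced here. The following is only a sketch of the pattern: an exporter that hands each finished trace to a user-supplied function and contains errors, so a failing integration cannot break the agent run. The TraceExporter interface, the CompletedTrace shape, and CallbackExporterSketch are all hypothetical names.

```typescript
// Hypothetical summary of a finished trace handed to exporters.
type CompletedTrace = { traceId: string; durationMs: number; spanCount: number };

interface TraceExporter {
  export(trace: CompletedTrace): void;
}

// Forwards every trace to a callback; exceptions are caught and counted, never rethrown.
class CallbackExporterSketch implements TraceExporter {
  failures = 0;
  constructor(private readonly callback: (trace: CompletedTrace) => void) {}

  export(trace: CompletedTrace): void {
    try {
      this.callback(trace);
    } catch {
      this.failures += 1; // observability must not take down the agent
    }
  }
}

// Usage: collect slow runs, e.g. to alert on them.
const slowRuns: string[] = [];
const exporter = new CallbackExporterSketch((t) => {
  if (t.durationMs > 5000) slowRuns.push(t.traceId);
});
exporter.export({ traceId: "a", durationMs: 6200, spanCount: 7 });
exporter.export({ traceId: "b", durationMs: 800, spanCount: 3 });

// A callback that throws is contained rather than propagated.
const flaky = new CallbackExporterSketch(() => {
  throw new Error("webhook down");
});
flaky.export({ traceId: "c", durationMs: 100, spanCount: 1 });
```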
Metrics
Structured Logging
The logger supports three drain modes. Every log entry includes a traceId for correlation with traces.
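The traceId is what links a log line to its trace. The sketch below shows that idea with a pluggable drain function standing in for the three drain modes, whose names and options are not reproduced here. The logger factory, the entry fields, and linesForTrace are hypothetical.

```typescript
type LogEntry = {
  level: "debug" | "info" | "warn" | "error";
  msg: string;
  traceId: string;
  ts: string;
  [key: string]: unknown;
};

// A drain is any sink for one serialized JSON line (stdout, file, HTTP batcher, ...).
type Drain = (line: string) => void;

// Hypothetical logger factory: every entry carries the traceId of the run it belongs to.
function createStructuredLogger(drain: Drain, now: () => Date = () => new Date()) {
  return (level: LogEntry["level"], msg: string, traceId: string, fields: Record<string, unknown> = {}): void => {
    const entry: LogEntry = { ...fields, level, msg, traceId, ts: now().toISOString() };
    drain(JSON.stringify(entry));
  };
}

// Correlation: pull every log line that belongs to one trace.
function linesForTrace(lines: string[], traceId: string): LogEntry[] {
  return lines.map((l) => JSON.parse(l) as LogEntry).filter((e) => e.traceId === traceId);
}

const buffer: string[] = [];
const log = createStructuredLogger((line) => buffer.push(line), () => new Date(0));
log("info", "run.start", "trace-a");
log("info", "tool.call", "trace-b", { tool: "search" });
log("error", "tool.error", "trace-a", { tool: "fetch" });
const traceA = linesForTrace(buffer, "trace-a");
```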
Works With Teams & Workflows
Use instrumentBus() to attach to any EventBus.
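Because observers only need an EventBus, a single collector can be attached to several buses, for example one per team member or workflow step. The sketch below models that with a hypothetical Bus class and a shared collector that labels events by source. It illustrates the attach/detach pattern and is not instrumentBus() itself, whose signature is not reproduced here.

```typescript
type BusEvent = { type: string; traceId: string };
type Handler = (e: BusEvent) => void;

// Minimal bus whose subscribe() returns an unsubscribe function.
class Bus {
  private handlers: Handler[] = [];
  subscribe(h: Handler): () => void {
    this.handlers.push(h);
    return () => {
      this.handlers = this.handlers.filter((x) => x !== h);
    };
  }
  emit(e: BusEvent): void {
    for (const h of this.handlers) h(e);
  }
}

// Hypothetical shared collector: counts events per source across any number of buses.
function createSharedCollector() {
  const perSource: Record<string, number> = {};
  // Attach to one bus under a source label; returns a function that detaches again.
  function attach(bus: Bus, source: string): () => void {
    return bus.subscribe(() => {
      perSource[source] = (perSource[source] || 0) + 1;
    });
  }
  return { attach, perSource };
}

const researcher = new Bus();
const writer = new Bus();
const collector = createSharedCollector();
const detachResearcher = collector.attach(researcher, "researcher");
collector.attach(writer, "writer");

researcher.emit({ type: "run.start", traceId: "t1" });
writer.emit({ type: "run.start", traceId: "t2" });
detachResearcher();
researcher.emit({ type: "run.complete", traceId: "t1" }); // not counted: detached
```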
Langfuse Integration Example
Langfuse provides an open-source LLM observability dashboard. Setup takes three steps. Once connected, Langfuse shows:
- Traces for each agent.run() with duration, token usage, and cost
- Generations for each LLM call within a run
- Spans for tool calls, handoffs, and other operations
- Sessions grouping traces by sessionId
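The mapping onto Langfuse's data model can be pictured as follows. Each run becomes one trace carrying its sessionId, LLM calls become observations of type GENERATION, and everything else becomes a SPAN. Langfuse does use traces, sessions, and GENERATION/SPAN observations, but the object shapes, toLangfuseShape, and groupBySession below are simplified assumptions. They are neither Langfuse's ingestion schema nor the exporter's code.

```typescript
// Hypothetical per-run record as an exporter might hold it.
type RunSpan = { id: string; kind: "llm" | "tool" | "handoff"; name: string; inputTokens?: number; outputTokens?: number };
type RunRecordSketch = { traceId: string; sessionId: string; spans: RunSpan[] };

type Observation = { id: string; type: "GENERATION" | "SPAN"; name: string; usage?: { input: number; output: number } };

// LLM calls become generations (with token usage); tools and handoffs become plain spans.
function toLangfuseShape(run: RunRecordSketch) {
  const observations = run.spans.map(
    (s): Observation =>
      s.kind === "llm"
        ? { id: s.id, type: "GENERATION", name: s.name, usage: { input: s.inputTokens ?? 0, output: s.outputTokens ?? 0 } }
        : { id: s.id, type: "SPAN", name: s.name }
  );
  return { trace: { id: run.traceId, sessionId: run.sessionId }, observations };
}

// Sessions view: trace ids grouped by sessionId.
function groupBySession(runs: RunRecordSketch[]): Record<string, string[]> {
  const sessions: Record<string, string[]> = {};
  for (const r of runs) {
    if (!sessions[r.sessionId]) sessions[r.sessionId] = [];
    sessions[r.sessionId].push(r.traceId);
  }
  return sessions;
}

const runA: RunRecordSketch = {
  traceId: "t1",
  sessionId: "s1",
  spans: [
    { id: "g1", kind: "llm", name: "generate", inputTokens: 900, outputTokens: 120 },
    { id: "x1", kind: "tool", name: "search" },
  ],
};
const runB: RunRecordSketch = { traceId: "t2", sessionId: "s1", spans: [] };
const shaped = toLangfuseShape(runA);
const sessions = groupBySession([runA, runB]);
```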
OpenTelemetry Export
Send traces to any OTLP-compatible backend (Jaeger, Grafana Tempo, Honeycomb, etc.).
Building a Custom Dashboard
Combine metrics and events to build a real-time dashboard.
Capacity Metrics
When the Session Profiler is attached to the same EventBus, MetricsExporter automatically captures capacity-related metrics:
AgentMetrics fields
| Field | Type | Description |
|---|---|---|
| estimatedKvCacheGb | number? | Estimated total KV cache memory across all sessions, in GB |
| avgContextLength | number? | Average prompt tokens per run |
| sessionCategories | Record<string, number>? | Session counts by category (light/medium/heavy/extreme) |
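The Session Profiler's exact estimation method is not documented here. For intuition, a common back-of-the-envelope estimate of KV cache size is 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The sketch below applies that formula to the summed context length of all sessions and also computes an average context length. The model dimensions are example values for a grouped-query-attention model, not values read from Agentium, and the result is in GiB (2^30 bytes).

```typescript
type ModelShape = { layers: number; kvHeads: number; headDim: number; bytesPerElement: number };

const BYTES_PER_GIB = 1024 * 1024 * 1024;

// Bytes of KV cache per token: one key and one value vector per layer per KV head.
function kvBytesPerToken(m: ModelShape): number {
  return 2 * m.layers * m.kvHeads * m.headDim * m.bytesPerElement;
}

// Estimated total KV cache across sessions, given each session's context length in tokens.
function estimateKvCacheGib(contextTokensPerSession: number[], m: ModelShape): number {
  const totalTokens = contextTokensPerSession.reduce((a, b) => a + b, 0);
  return (totalTokens * kvBytesPerToken(m)) / BYTES_PER_GIB;
}

// Average prompt tokens per run; undefined when there are no runs yet.
function averageContextLength(promptTokensPerRun: number[]): number | undefined {
  if (promptTokensPerRun.length === 0) return undefined;
  return promptTokensPerRun.reduce((a, b) => a + b, 0) / promptTokensPerRun.length;
}

// Example: 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes) => 131,072 bytes (128 KiB) per token.
const exampleModel: ModelShape = { layers: 32, kvHeads: 8, headDim: 128, bytesPerElement: 2 };
const gib = estimateKvCacheGib([4096, 2048, 2048], exampleModel); // 8,192 tokens in total
```

With these example dimensions, 8,192 tokens of live context come to exactly 1 GiB. This explains why a few "heavy" or "extreme" sessions can dominate the total.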
Prometheus output
The toPrometheus() method includes three new capacity counters.
These metrics are populated automatically as capacity.session.classified and capacity.warning events are emitted on the EventBus; no additional configuration is needed.
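Prometheus' text exposition format is line-based: # HELP and # TYPE lines, then one sample per line with labels in braces. The sketch below renders capacity metrics that way. The metric names and values are hypothetical, since the names actually emitted by toPrometheus() are not reproduced here. Characters outside letters, digits, underscores, and colons are replaced, because Prometheus metric names may not contain them (dots included).

```typescript
type Sample = { labels?: Record<string, string>; value: number };

// Escape label values per the exposition format: backslash, double quote, newline.
function escapeLabel(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Render one metric family: HELP, TYPE, then one line per sample.
function renderMetric(name: string, type: "counter" | "gauge", help: string, samples: Sample[]): string {
  const safeName = name.replace(/[^a-zA-Z0-9_:]/g, "_");
  const lines = [`# HELP ${safeName} ${help}`, `# TYPE ${safeName} ${type}`];
  for (const s of samples) {
    const labels = s.labels || {};
    const keys = Object.keys(labels);
    const labelText = keys.length
      ? "{" + keys.map((k) => `${k}="${escapeLabel(labels[k])}"`).join(",") + "}"
      : "";
    lines.push(`${safeName}${labelText} ${s.value}`);
  }
  return lines.join("\n");
}

// Hypothetical values, as if aggregated from capacity.session.classified events.
const sessionCategories: Record<string, number> = { light: 12, medium: 5, heavy: 2, extreme: 0 };
const text = [
  renderMetric("agent.capacity.kv_cache_gb", "gauge", "Estimated total KV cache memory", [{ value: 1.5 }]),
  renderMetric(
    "agent.capacity.sessions_total",
    "counter",
    "Sessions by category",
    Object.keys(sessionCategories).map((c) => ({ labels: { category: c }, value: sessionCategories[c] }))
  ),
].join("\n");
```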
Accuracy Metrics (v2.4+)
MetricsExporter tracks how often agent output needs human correction and, when Reflection is enabled, the quality of the agent’s self-critique.
AgentMetrics fields
| Field | Type | Description |
|---|---|---|
| correctionsTotal | number | Total human corrections recorded against this agent’s output |
| correctionRate | number | Corrections per run; a rough complement of first-pass accuracy (lower is better) |
| avgCritiqueScore | number? | Average reflection self-critique score (0–1) |
How they’re captured
- correctionsTotal increments, and correctionRate is updated, on every memory.correction.recorded event. That event is emitted whenever a correction is recorded via the corrections endpoint, agent.memory.recordCorrection(), or the record_correction tool. See Correction Capture.
- avgCritiqueScore aggregates reflection.critique events emitted during runs.
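This aggregation can be sketched as an event-driven accumulator. The event names memory.correction.recorded, reflection.critique, and run.complete come from this page. The score payload field and the accumulator itself are assumptions. In the sketch, correctionRate is corrections divided by completed runs, and avgCritiqueScore is a mean that stays undefined until the first critique arrives.

```typescript
type Evt =
  | { type: "run.complete" }
  | { type: "memory.correction.recorded" }
  | { type: "reflection.critique"; score: number }; // assumed 0–1 score field

// Hypothetical accumulator mirroring the three AgentMetrics accuracy fields.
function createAccuracyMetrics() {
  let runs = 0;
  let correctionsTotal = 0;
  let critiqueSum = 0;
  let critiqueCount = 0;

  function onEvent(e: Evt): void {
    switch (e.type) {
      case "run.complete":
        runs += 1;
        break;
      case "memory.correction.recorded":
        correctionsTotal += 1;
        break;
      case "reflection.critique":
        critiqueSum += e.score;
        critiqueCount += 1;
        break;
    }
  }

  // Derived values are computed on read, so the rate is always current.
  function snapshot() {
    return {
      correctionsTotal,
      correctionRate: runs === 0 ? 0 : correctionsTotal / runs,
      avgCritiqueScore: critiqueCount === 0 ? undefined : critiqueSum / critiqueCount,
    };
  }
  return { onEvent, snapshot };
}

const acc = createAccuracyMetrics();
[0, 1, 2, 3].forEach(() => acc.onEvent({ type: "run.complete" }));
acc.onEvent({ type: "memory.correction.recorded" });
acc.onEvent({ type: "reflection.critique", score: 0.5 });
acc.onEvent({ type: "reflection.critique", score: 1 });
const m = acc.snapshot();
```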
Prometheus output
Tracking correction_rate over time is the clearest signal that the correction-capture learning loop is working: each recorded correction is retrieved on future relevant runs, so the same mistake should stop recurring.
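One way to check that trend is to bucket runs and corrections into fixed time windows and compare the per-window rate. The sketch below does this with hypothetical timestamps. In a Prometheus setup the same question is usually answered with PromQL's rate() over the exported counters instead.

```typescript
type Stamped = { t: number }; // event time in ms

// Corrections per run in each window of `windowMs`; windows with no runs yield undefined.
function correctionRateByWindow(runs: Stamped[], corrections: Stamped[], windowMs: number): (number | undefined)[] {
  const all = runs.concat(corrections);
  if (all.length === 0) return [];
  let start = all[0].t;
  let end = all[0].t;
  for (const e of all) {
    if (e.t < start) start = e.t;
    if (e.t > end) end = e.t;
  }
  const n = Math.floor((end - start) / windowMs) + 1;
  const runCounts: number[] = [];
  const corrCounts: number[] = [];
  for (let i = 0; i < n; i++) {
    runCounts.push(0);
    corrCounts.push(0);
  }
  for (const r of runs) runCounts[Math.floor((r.t - start) / windowMs)] += 1;
  for (const c of corrections) corrCounts[Math.floor((c.t - start) / windowMs)] += 1;
  return runCounts.map((count, i) => (count === 0 ? undefined : corrCounts[i] / count));
}

// Four runs per window; corrections thin out as the loop learns (hypothetical data).
const runs: Stamped[] = [];
for (const base of [0, 1000, 2000]) for (const off of [0, 100, 200, 300]) runs.push({ t: base + off });
const corrections: Stamped[] = [{ t: 50 }, { t: 150 }, { t: 250 }, { t: 1050 }];
const rates = correctionRateByWindow(runs, corrections, 1000);
```

A falling sequence of per-window rates (here 0.75, 0.25, then 0) is the pattern you would hope to see once corrections start being retrieved.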