Performance

RadarOS is built for minimal overhead — fewer tokens, faster responses, lower cost. This page covers the key optimizations and benchmark results.

Benchmark Results

All benchmarks use gpt-4o-mini, identical prompts, and 5 runs per scenario. RadarOS and LangChain run on Node.js; Agno runs on Python.

Simple Completion

| Metric | RadarOS | LangChain | Agno |
| --- | --- | --- | --- |
| Startup (ms) | 17 | 1301 | 2730 |
| Avg Response (ms) | 769 | 737 | 2077 |
| Avg Prompt Tokens | 28 | 28 | 28 |
| Avg Total Tokens | 35 | 35 | 35 |
| Avg Cost / Run | $0.000008 | $0.000008 | $0.000008 |

Tool Calling

| Metric | RadarOS | LangChain | Agno |
| --- | --- | --- | --- |
| Avg Response (ms) | 1617 | 1678 | 3064 |
| Avg Prompt Tokens | 167 | 167 | 173 |
| Avg Total Tokens | 196 | 196 | 202 |
| Avg Cost / Run | $0.000042 | $0.000042 | $0.000043 |

RadarOS and LangChain produce identical tool schemas (167 prompt tokens). RadarOS strips verbose JSON Schema metadata ($schema, additionalProperties) to keep schemas compact.

Multi-turn Memory

| Metric | RadarOS | LangChain | Agno |
| --- | --- | --- | --- |
| Avg Response (ms) | 2408 | 2324 | 5892 |
| Avg Prompt Tokens | 189 | 309 | 94 |
| Avg Completion Tokens | 30 | 57 | 66 |
| Avg Total Tokens | 219 | 366 | 160 |
| Avg Cost / Run | $0.000046 | $0.000081 | $0.000054 |

RadarOS uses 39% fewer prompt tokens and incurs 43% lower cost than LangChain on multi-turn conversations, because LangChain injects heavier system prompts and history-formatting overhead.

Summary

| Scenario | Fastest | Fewest Tokens | Cheapest |
| --- | --- | --- | --- |
| Simple Completion | LangChain (737ms) | Tied (35) | Tied |
| Tool Calling | RadarOS (1617ms) | RadarOS (196) | Tied |
| Multi-turn Memory | LangChain (2324ms) | Agno (160) | RadarOS ($0.000046) |

RadarOS is the fastest for tool calling, the cheapest for multi-turn conversations, and matches LangChain on tool schema efficiency. Response latency is within noise across simple completions.

Optimizations

Tool Schema Caching & Optimization

Tool definitions (Zod-to-JSON Schema conversion) are computed once at agent construction and cached. Verbose JSON Schema metadata ($schema, additionalProperties, description on the root object) is stripped automatically — reducing token overhead without losing semantic information.
```typescript
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  tools: [weatherTool, searchTool],
});
```
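The stripping step can be sketched as a plain function. This is a minimal illustration of the idea, not RadarOS internals; `stripSchemaMetadata` and the `isRoot` flag are hypothetical names:

```typescript
type JsonSchema = Record<string, unknown>;

// Sketch: remove keys that cost tokens without changing what the model
// can infer. Only the ROOT description is dropped; field descriptions
// carry semantic information and are kept.
function stripSchemaMetadata(schema: JsonSchema, isRoot = true): JsonSchema {
  const rest: JsonSchema = { ...schema };
  delete rest["$schema"];
  delete rest["additionalProperties"];
  if (isRoot) delete rest["description"];
  // Recurse into nested property schemas so the whole tree is compacted.
  if (rest["properties"] && typeof rest["properties"] === "object") {
    rest["properties"] = Object.fromEntries(
      Object.entries(rest["properties"] as Record<string, JsonSchema>).map(
        ([key, sub]) => [key, stripSchemaMetadata(sub, false)]
      )
    );
  }
  return rest;
}
```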
For OpenAI models, tools can opt into strict mode for guaranteed valid JSON output:
```typescript
const weatherTool = defineTool({
  name: "getWeather",
  description: "Get weather for a city",
  parameters: z.object({ city: z.string() }),
  execute: async ({ city }) => `Sunny in ${city}`,
  strict: true, // enables OpenAI Structured Outputs on this tool
});
```

Automatic Retry

Transient LLM API failures are automatically retried with exponential backoff + jitter. Retryable errors include HTTP 429, 5xx, and network errors.
```typescript
const agent = new Agent({
  name: "reliable-bot",
  model: openai("gpt-4o"),
  retry: {
    maxRetries: 5,
    initialDelayMs: 1000,
    maxDelayMs: 30000,
  },
});
```
Default: 3 retries, 500ms initial delay, 10s max delay.
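The delay schedule can be sketched with a generic "full jitter" formula. The documented defaults appear below, but the exact jitter strategy is an assumption, not confirmed RadarOS internals:

```typescript
// Generic exponential backoff with full jitter:
//   delay = random(0, min(maxDelay, initialDelay * 2^attempt))
function backoffDelayMs(
  attempt: number,               // 0-based retry attempt
  initialDelayMs = 500,          // documented default
  maxDelayMs = 10_000,           // documented default
  random: () => number = Math.random
): number {
  const cap = Math.min(maxDelayMs, initialDelayMs * 2 ** attempt);
  return random() * cap;
}
```

Jitter spreads simultaneous retries apart so many clients hitting a rate limit don't all retry in lockstep.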

Token-Based History Trimming

Set maxContextTokens to automatically trim conversation history (oldest messages first) to fit within a token budget:
```typescript
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  maxContextTokens: 8000,
});
```
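Oldest-first trimming under a token budget can be sketched as follows. The `countTokens` heuristic here is a stand-in, not the tokenizer RadarOS actually uses:

```typescript
type Message = { role: string; content: string };

// Stand-in counter (~4 chars per token); a real implementation would use
// the model's tokenizer.
const countTokens = (m: Message) => Math.ceil(m.content.length / 4);

// Drop the oldest messages first until the history fits the budget,
// always keeping at least the most recent message.
function trimHistory(history: Message[], maxContextTokens: number): Message[] {
  const trimmed = [...history];
  let total = trimmed.reduce((sum, m) => sum + countTokens(m), 0);
  while (trimmed.length > 1 && total > maxContextTokens) {
    total -= countTokens(trimmed.shift()!);
  }
  return trimmed;
}
```

Real implementations typically also pin the system prompt so it is never trimmed; that detail is omitted here.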

Non-Blocking User Memory

When userMemory is configured, fact extraction runs asynchronously in the background after the response is returned. This eliminates 500-1000ms+ of latency per request.
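The non-blocking pattern amounts to returning the reply before awaiting extraction. A generic sketch, where `generate` and `extractFacts` are hypothetical stand-ins for the model call and the memory pipeline:

```typescript
// Fire-and-forget sketch: the reply returns immediately; fact extraction
// runs afterwards, off the request's hot path.
async function respond(
  userMessage: string,
  generate: (msg: string) => Promise<string>,
  extractFacts: (msg: string) => Promise<void>
): Promise<string> {
  const reply = await generate(userMessage);
  // Deliberately not awaited: extraction latency never reaches the caller.
  void extractFacts(userMessage).catch(() => {
    /* log and continue; a memory failure must not fail the request */
  });
  return reply;
}
```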

Smart Context Deduplication

When userMemory.asTool() is registered in the agent’s tools, user facts are not also injected into the system prompt. The agent retrieves facts on demand via the tool, saving tokens.
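Registering memory as a tool might look like the following sketch. It reuses `userMemory.asTool()` from the text, but the surrounding option names mirror the earlier examples and are assumptions:

```typescript
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  userMemory,                    // configured memory store (assumed option)
  tools: [userMemory.asTool()],  // facts fetched on demand, not injected
});
```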

Streaming Usage Tracking

Token usage (promptTokens, completionTokens, totalTokens, reasoningTokens) is accurately tracked in both run() and stream() modes. Stream usage is accumulated from provider finish chunks.
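The accumulation step can be sketched as a reducer over stream chunks. The field names follow the ones listed above, but the chunk shape itself is an assumption:

```typescript
type Usage = {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  reasoningTokens: number;
};

// Sum the usage reported by provider finish chunks; content chunks
// (no usage field) are skipped.
function accumulateUsage(chunks: Array<{ usage?: Partial<Usage> }>): Usage {
  const total: Usage = {
    promptTokens: 0,
    completionTokens: 0,
    totalTokens: 0,
    reasoningTokens: 0,
  };
  for (const chunk of chunks) {
    if (!chunk.usage) continue;
    total.promptTokens += chunk.usage.promptTokens ?? 0;
    total.completionTokens += chunk.usage.completionTokens ?? 0;
    total.totalTokens += chunk.usage.totalTokens ?? 0;
    total.reasoningTokens += chunk.usage.reasoningTokens ?? 0;
  }
  return total;
}
```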

Methodology

  • All benchmarks use gpt-4o-mini with identical prompts
  • Each scenario runs 5 times; results are averaged
  • Startup time measures framework import + agent initialization
  • Cost uses gpt-4o-mini pricing: $0.15/1M input tokens, $0.60/1M output tokens
  • Network latency to OpenAI is shared across all frameworks
  • Full benchmark scripts are in benchmarks/ in the repository