Performance

RadarOS is built for minimal overhead — fewer tokens, faster responses, lower cost. This page covers the key optimizations and benchmark results.

Benchmark Results

All benchmarks use gpt-4o-mini, identical prompts, and 5 runs per scenario. RadarOS and LangChain run on Node.js; Agno runs on Python.

Simple Completion

| Metric | RadarOS | LangChain | Agno |
| --- | --- | --- | --- |
| Startup (ms) | 17 | 1301 | 2730 |
| Avg Response (ms) | 769 | 737 | 2077 |
| Avg Prompt Tokens | 28 | 28 | 28 |
| Avg Total Tokens | 35 | 35 | 35 |
| Avg Cost / Run | $0.000008 | $0.000008 | $0.000008 |

Tool Calling

| Metric | RadarOS | LangChain | Agno |
| --- | --- | --- | --- |
| Avg Response (ms) | 1617 | 1678 | 3064 |
| Avg Prompt Tokens | 167 | 167 | 173 |
| Avg Total Tokens | 196 | 196 | 202 |
| Avg Cost / Run | $0.000042 | $0.000042 | $0.000043 |

RadarOS and LangChain produce identical tool schemas (167 prompt tokens). RadarOS strips verbose JSON Schema metadata ($schema, additionalProperties) to keep schemas compact.

Multi-turn Memory

| Metric | RadarOS | LangChain | Agno |
| --- | --- | --- | --- |
| Avg Response (ms) | 2408 | 2324 | 5892 |
| Avg Prompt Tokens | 189 | 309 | 94 |
| Avg Completion Tokens | 30 | 57 | 66 |
| Avg Total Tokens | 219 | 366 | 160 |
| Avg Cost / Run | $0.000046 | $0.000081 | $0.000054 |

RadarOS uses 39% fewer prompt tokens and incurs 43% lower cost than LangChain on multi-turn conversations, because LangChain injects heavier system prompts and history-formatting overhead.

Summary

| Scenario | Fastest | Fewest Tokens | Cheapest |
| --- | --- | --- | --- |
| Simple Completion | LangChain (737ms) | Tied (35) | Tied |
| Tool Calling | RadarOS (1617ms) | RadarOS (196) | Tied |
| Multi-turn Memory | LangChain (2324ms) | Agno (160) | RadarOS ($0.000046) |

RadarOS is the fastest for tool calling, the cheapest for multi-turn conversations, and matches LangChain on tool schema efficiency. Response latency is within noise across simple completions.

Optimizations

Tool Schema Caching & Optimization

Tool definitions (Zod-to-JSON Schema conversion) are computed once at agent construction and cached. Verbose JSON Schema metadata ($schema, additionalProperties, description on the root object) is stripped automatically — reducing token overhead without losing semantic information.
```typescript
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  tools: [weatherTool, searchTool],
});
```
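The stripping step can be sketched as a plain function. This is a minimal illustration of the idea, not RadarOS internals; `stripSchemaMetadata` and the `isRoot` flag are hypothetical names:

```typescript
type JsonSchema = Record<string, unknown>;

// Sketch: remove keys that cost tokens without changing what the model
// can infer. Only the ROOT description is dropped; field descriptions
// carry semantic information and are kept.
function stripSchemaMetadata(schema: JsonSchema, isRoot = true): JsonSchema {
  const rest: JsonSchema = { ...schema };
  delete rest["$schema"];
  delete rest["additionalProperties"];
  if (isRoot) delete rest["description"];
  // Recurse into nested property schemas so the whole tree is compacted.
  if (rest["properties"] && typeof rest["properties"] === "object") {
    rest["properties"] = Object.fromEntries(
      Object.entries(rest["properties"] as Record<string, JsonSchema>).map(
        ([key, sub]) => [key, stripSchemaMetadata(sub, false)]
      )
    );
  }
  return rest;
}
```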
For OpenAI models, tools can opt into strict mode for guaranteed valid JSON output:
```typescript
const weatherTool = defineTool({
  name: "getWeather",
  description: "Get weather for a city",
  parameters: z.object({ city: z.string() }),
  execute: async ({ city }) => `Sunny in ${city}`,
  strict: true, // enables OpenAI Structured Outputs on this tool
});
```

Automatic Retry

Transient LLM API failures are automatically retried with exponential backoff + jitter. Retryable errors include HTTP 429, 5xx, and network errors.
```typescript
const agent = new Agent({
  name: "reliable-bot",
  model: openai("gpt-4o"),
  retry: {
    maxRetries: 5,
    initialDelayMs: 1000,
    maxDelayMs: 30000,
  },
});
```
Default: 3 retries, 500ms initial delay, 10s max delay.
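The delay schedule can be sketched with a generic "full jitter" formula. The documented defaults appear below, but the exact jitter strategy is an assumption, not confirmed RadarOS internals:

```typescript
// Generic exponential backoff with full jitter:
//   delay = random(0, min(maxDelay, initialDelay * 2^attempt))
function backoffDelayMs(
  attempt: number,               // 0-based retry attempt
  initialDelayMs = 500,          // documented default
  maxDelayMs = 10_000,           // documented default
  random: () => number = Math.random
): number {
  const cap = Math.min(maxDelayMs, initialDelayMs * 2 ** attempt);
  return random() * cap;
}
```

Jitter spreads simultaneous retries apart so many clients hitting a rate limit don't all retry in lockstep.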

Token-Based History Trimming

Set maxContextTokens to automatically trim conversation history (oldest messages first) to fit within a token budget:
```typescript
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  maxContextTokens: 8000,
});
```
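Oldest-first trimming under a token budget can be sketched as follows. The `countTokens` heuristic here is a stand-in, not the tokenizer RadarOS actually uses:

```typescript
type Message = { role: string; content: string };

// Stand-in counter (~4 chars per token); a real implementation would use
// the model's tokenizer.
const countTokens = (m: Message) => Math.ceil(m.content.length / 4);

// Drop the oldest messages first until the history fits the budget,
// always keeping at least the most recent message.
function trimHistory(history: Message[], maxContextTokens: number): Message[] {
  const trimmed = [...history];
  let total = trimmed.reduce((sum, m) => sum + countTokens(m), 0);
  while (trimmed.length > 1 && total > maxContextTokens) {
    total -= countTokens(trimmed.shift()!);
  }
  return trimmed;
}
```

Real implementations typically also pin the system prompt so it is never trimmed; that detail is omitted here.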

Non-Blocking User Memory

When userMemory is configured, fact extraction runs asynchronously in the background after the response is returned. This eliminates 500-1000ms+ of latency per request.
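The non-blocking pattern amounts to returning the reply before awaiting extraction. A generic sketch, where `generate` and `extractFacts` are hypothetical stand-ins for the model call and the memory pipeline:

```typescript
// Fire-and-forget sketch: the reply returns immediately; fact extraction
// runs afterwards, off the request's hot path.
async function respond(
  userMessage: string,
  generate: (msg: string) => Promise<string>,
  extractFacts: (msg: string) => Promise<void>
): Promise<string> {
  const reply = await generate(userMessage);
  // Deliberately not awaited: extraction latency never reaches the caller.
  void extractFacts(userMessage).catch(() => {
    /* log and continue; a memory failure must not fail the request */
  });
  return reply;
}
```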

Smart Context Deduplication

When userMemory.asTool() is registered in the agent’s tools, user facts are not also injected into the system prompt. The agent retrieves facts on demand via the tool, saving tokens.
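Registering memory as a tool might look like the following sketch. It reuses `userMemory.asTool()` from the text, but the surrounding option names mirror the earlier examples and are assumptions:

```typescript
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  userMemory,                    // configured memory store (assumed option)
  tools: [userMemory.asTool()],  // facts fetched on demand, not injected
});
```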

Streaming Usage Tracking

Token usage (promptTokens, completionTokens, totalTokens, reasoningTokens) is accurately tracked in both run() and stream() modes. Stream usage is accumulated from provider finish chunks.
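The accumulation step can be sketched as a reducer over stream chunks. The field names follow the ones listed above, but the chunk shape itself is an assumption:

```typescript
type Usage = {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  reasoningTokens: number;
};

// Sum the usage reported by provider finish chunks; content chunks
// (no usage field) are skipped.
function accumulateUsage(chunks: Array<{ usage?: Partial<Usage> }>): Usage {
  const total: Usage = {
    promptTokens: 0,
    completionTokens: 0,
    totalTokens: 0,
    reasoningTokens: 0,
  };
  for (const chunk of chunks) {
    if (!chunk.usage) continue;
    total.promptTokens += chunk.usage.promptTokens ?? 0;
    total.completionTokens += chunk.usage.completionTokens ?? 0;
    total.totalTokens += chunk.usage.totalTokens ?? 0;
    total.reasoningTokens += chunk.usage.reasoningTokens ?? 0;
  }
  return total;
}
```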

Methodology

  • All benchmarks use gpt-4o-mini with identical prompts
  • Each scenario runs 5 times; results are averaged
  • Startup time measures framework import + agent initialization
  • Cost uses gpt-4o-mini pricing: $0.15/1M input tokens, $0.60/1M output tokens
  • Network latency to OpenAI is shared across all frameworks
  • Full benchmark scripts are in benchmarks/ in the repository