Performance
RadarOS is built for minimal overhead: fewer tokens, faster responses, and lower cost. This page covers the key optimizations and benchmark results.

Benchmark Results
All benchmarks use `gpt-4o-mini`, identical prompts, and 5 runs per scenario. RadarOS and LangChain run on Node.js; Agno runs on Python.
Simple Completion
| Metric | RadarOS | LangChain | Agno |
|---|---|---|---|
| Startup (ms) | 171 | 301 | 2730 |
| Avg Response (ms) | 769 | 737 | 2077 |
| Avg Prompt Tokens | 28 | 28 | 28 |
| Avg Total Tokens | 35 | 35 | 35 |
| Avg Cost / Run | $0.000008 | $0.000008 | $0.000008 |
Tool Calling
| Metric | RadarOS | LangChain | Agno |
|---|---|---|---|
| Avg Response (ms) | 1617 | 1678 | 3064 |
| Avg Prompt Tokens | 167 | 167 | 173 |
| Avg Total Tokens | 196 | 196 | 202 |
| Avg Cost / Run | $0.000042 | $0.000042 | $0.000043 |
RadarOS strips verbose JSON Schema metadata (`$schema`, `additionalProperties`) to keep schemas compact.
Multi-turn Memory
| Metric | RadarOS | LangChain | Agno |
|---|---|---|---|
| Avg Response (ms) | 2408 | 2324 | 5892 |
| Avg Prompt Tokens | 189 | 309 | 94 |
| Avg Completion Tokens | 30 | 57 | 66 |
| Avg Total Tokens | 219 | 366 | 160 |
| Avg Cost / Run | $0.000046 | $0.000081 | $0.000054 |
Summary
| Scenario | Fastest | Fewest Tokens | Cheapest |
|---|---|---|---|
| Simple Completion | LangChain (737ms) | Tied (35) | Tied |
| Tool Calling | RadarOS (1617ms) | RadarOS (196) | Tied |
| Multi-turn Memory | LangChain (2324ms) | Agno (160) | RadarOS ($0.000046) |
Optimizations
Tool Schema Caching & Optimization
Tool definitions (Zod-to-JSON Schema conversion) are computed once at agent construction and cached. Verbose JSON Schema metadata (`$schema`, `additionalProperties`, and the `description` on the root object) is stripped automatically, reducing token overhead without losing semantic information.
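As an illustration of the kind of stripping involved, here is a minimal sketch in TypeScript. The helper name and schema shape are hypothetical, not RadarOS internals:

```typescript
type JsonSchema = Record<string, unknown>;

// Hypothetical helper illustrating the metadata stripping described
// above. Keys like $schema and additionalProperties add tokens without
// changing what the model needs to know about the tool's parameters.
function stripSchemaMetadata(schema: JsonSchema): JsonSchema {
  // Only the root-level description is dropped; nested property
  // descriptions carry semantic information and are kept.
  const { $schema, additionalProperties, description, ...rest } = schema;
  return rest;
}

const verbose: JsonSchema = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  additionalProperties: false,
  description: "Arguments for the weather tool",
  properties: {
    city: { type: "string", description: "City name" },
  },
  required: ["city"],
};

// compact keeps type, properties, and required, but drops the three
// metadata keys above.
const compact = stripSchemaMetadata(verbose);
```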
Automatic Retry
Transient LLM API failures are automatically retried with exponential backoff and jitter. Retryable errors include HTTP 429, 5xx, and network errors.

Token-Based History Trimming
Set `maxContextTokens` to automatically trim conversation history (oldest messages first) to fit within a token budget:
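The trimming policy can be sketched as follows. The token estimator and function below are illustrative assumptions, not the actual RadarOS implementation:

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough token estimate (~4 characters per token); RadarOS's real
// counter may differ. This only illustrates the trimming policy.
const estimateTokens = (m: Message): number => Math.ceil(m.content.length / 4);

// Drop the oldest messages first until the history fits the budget,
// mirroring the maxContextTokens behavior described above.
function trimHistory(history: Message[], maxContextTokens: number): Message[] {
  const trimmed = [...history];
  let total = trimmed.reduce((sum, m) => sum + estimateTokens(m), 0);
  while (trimmed.length > 1 && total > maxContextTokens) {
    total -= estimateTokens(trimmed.shift()!);
  }
  return trimmed;
}
```

Note that a production implementation would typically pin the system message; this sketch only guarantees that the most recent message survives.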
Non-Blocking User Memory
When `userMemory` is configured, fact extraction runs asynchronously in the background after the response is returned. This eliminates 500-1000 ms or more of latency per request.
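The pattern is roughly fire-and-forget; the function names below are hypothetical, not RadarOS internals:

```typescript
// Illustration of the non-blocking pattern described above: the
// response is returned immediately, and fact extraction runs in the
// background without delaying the caller.
async function handleTurn(
  generateResponse: () => Promise<string>,
  extractFacts: () => Promise<void>,
): Promise<string> {
  const response = await generateResponse();
  // Fire and forget: extraction failures are logged, not thrown, so a
  // background error can never break the user-facing response.
  void extractFacts().catch((err) =>
    console.error("background fact extraction failed:", err),
  );
  return response;
}
```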
Smart Context Deduplication
When `userMemory.asTool()` is registered in the agent's tools, user facts are not also injected into the system prompt. The agent retrieves facts on demand via the tool, saving tokens.
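The deduplication rule can be sketched as follows; the function and prompt format are illustrative assumptions, not the RadarOS source:

```typescript
// Hypothetical sketch of the deduplication decision: user facts are
// injected into the system prompt only when no memory tool is
// registered; otherwise the agent fetches them on demand via the tool.
function buildSystemPrompt(
  basePrompt: string,
  userFacts: string[],
  hasMemoryTool: boolean,
): string {
  if (hasMemoryTool || userFacts.length === 0) {
    return basePrompt; // facts reachable via the tool; save the tokens
  }
  const factBlock = userFacts.map((f) => `- ${f}`).join("\n");
  return `${basePrompt}\n\nKnown user facts:\n${factBlock}`;
}
```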
Streaming Usage Tracking
Token usage (`promptTokens`, `completionTokens`, `totalTokens`, `reasoningTokens`) is accurately tracked in both `run()` and `stream()` modes. Stream usage is accumulated from provider finish chunks.
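The accumulation step can be sketched like this; the chunk shape is an assumption for illustration, not the actual provider wire format:

```typescript
interface Usage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  reasoningTokens: number;
}

// Sketch of accumulating usage across stream chunks, as described
// above: only finish chunks carry a usage payload, and each field is
// summed into a running total.
function accumulateUsage(chunks: Array<{ usage?: Partial<Usage> }>): Usage {
  const total: Usage = {
    promptTokens: 0,
    completionTokens: 0,
    totalTokens: 0,
    reasoningTokens: 0,
  };
  for (const chunk of chunks) {
    if (!chunk.usage) continue; // content chunks have no usage payload
    total.promptTokens += chunk.usage.promptTokens ?? 0;
    total.completionTokens += chunk.usage.completionTokens ?? 0;
    total.totalTokens += chunk.usage.totalTokens ?? 0;
    total.reasoningTokens += chunk.usage.reasoningTokens ?? 0;
  }
  return total;
}
```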
Methodology
- All benchmarks use `gpt-4o-mini` with identical prompts
- Each scenario runs 5 times; results are averaged
- Startup time measures framework import + agent initialization
- Cost uses gpt-4o-mini pricing: $0.15/1M input tokens, $0.60/1M output tokens
- Network latency to OpenAI is shared across all frameworks
- Full benchmark scripts are in `benchmarks/` in the repository