Headroom
Context compression system that reduces AI agent token usage by 60-95% while maintaining accuracy
Overview
Headroom is a context compression system that reduces token usage for AI agents by 60-95% while maintaining accuracy. It compresses everything an agent reads (tool outputs, logs, RAG chunks, files, and conversation history) before it is sent to the LLM. It supports four deployment modes (library, proxy, MCP server, and agent wrapper) and offers reversible compression (CCR), which keeps the originals so the LLM can retrieve them when needed.
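A small sketch can make the reversible compression idea concrete. It assumes a plain in-memory dictionary as the store and a made-up marker format; neither is Headroom's actual CCR implementation or API. A long tool output is shortened, the full original is kept under a content hash, and the model receives a key it could use to ask for the original.

```python
import hashlib

# Hypothetical in-memory store; a real system would persist originals elsewhere.
_ORIGINALS: dict[str, str] = {}


def compress_reversibly(text: str, max_lines: int = 5) -> str:
    """Keep the first max_lines lines and replace the rest with a retrieval marker."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # short enough, nothing to compress
    # Content hash used as the retrieval key (hypothetical key scheme).
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    _ORIGINALS[key] = text  # keep the untouched original
    omitted = len(lines) - max_lines
    head = "\n".join(lines[:max_lines])
    return f"{head}\n[... {omitted} lines omitted; retrieve with key {key}]"


def retrieve(key: str) -> str:
    """Return the full original that a marker points to."""
    return _ORIGINALS[key]


# Usage: a 100-line log shrinks to 5 lines plus a marker, yet stays recoverable.
log = "\n".join(f"line {i}" for i in range(100))
short = compress_reversibly(log)
print(short)
key = short.rsplit("key ", 1)[1].rstrip("]")
assert retrieve(key) == log
```

The trade-off this illustrates is the one behind "Workflows requiring exact original text" below: the model sees less by default and must make an extra retrieval step to get the full text back.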
The Verdict
Who Should Use Headroom?
Best For
- Teams running daily AI coding agents
- High-volume agentic workflows
- Multi-agent systems needing shared memory
- SRE/DevOps with large log analysis
- RAG pipelines with token constraints
Not Ideal For
- Simple single-query use cases
- Projects with minimal context needs
- Teams not tracking token costs
- Workflows requiring exact original text
What's Great
- 92% token reduction on code search (100 results)
- 92% savings on SRE incident debugging
- Multiple deployment modes (library, proxy, MCP, wrapper)
- Reversible compression preserves originals
- Cross-agent memory with automatic deduplication
- Accuracy preserved on GSM8K, TruthfulQA, SQuAD v2
- Works with Anthropic, OpenAI, Bedrock, and any OpenAI-compatible API
Watch Out For
- Additional processing overhead
- Compression may lose nuance in some cases
- Requires Python 3.10+ or Node.js
- Learning curve for optimal configuration
Editor's Note
The 90% token savings claim is misleading. Like RTK and similar tools, Headroom focuses on one narrow piece of the waste puzzle: compressing tool output before the model sees it. That's the entire source of savings.
The "90%" figure refers only to the reduction in tool output size, not to your total token usage. In practice, tool output is a small fraction of actual spend. For 85-90% of real-world token waste, these tools simply don't help.
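The arithmetic behind this point is simple: if compression only touches tool output, the overall saving is the tool-output share of tokens multiplied by the compression rate. The 20% share in the sketch below is a hypothetical figure chosen for illustration, not a measured value for any tool or workload.

```python
def overall_reduction(tool_share: float, tool_reduction: float) -> float:
    """Fraction of all tokens saved when only tool output is compressed.

    tool_share: fraction of all tokens that come from tool output (0..1)
    tool_reduction: fraction of tool-output tokens removed by compression (0..1)
    """
    if not (0.0 <= tool_share <= 1.0 and 0.0 <= tool_reduction <= 1.0):
        raise ValueError("both arguments must be between 0 and 1")
    return tool_share * tool_reduction


# Hypothetical session: 1,000,000 tokens, 20% of them tool output,
# with tool output compressed by 90%.
total_tokens = 1_000_000
saved = overall_reduction(0.20, 0.90)
print(f"{saved:.0%} of total tokens saved")  # prints: 18% of total tokens saved
print(f"{round(total_tokens * saved):,} tokens saved")
```

Under that assumption, a "90%" compression rate translates into an 18% cut in the total bill; the real figure depends entirely on how much of a session is tool output.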
Deployment Modes
- Library: compress(messages) API
- Local proxy: zero code changes
- Agent wrapper for claude/cursor/codex/aider
- MCP server with headroom_compress tool
Compression Engines
- SmartCrusher for JSON structures
- CodeCompressor with AST awareness
- Kompress-base (custom Hugging Face model)
- CacheAligner for KV cache optimization
- IntelligentContext for score-based fitting
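"Score-based fitting" generally means ranking pieces of context by importance and keeping as many high-scoring pieces as fit in a token budget. The sketch below is a generic greedy version of that idea, not IntelligentContext's actual algorithm; the `Chunk` type, the scores, and the precomputed token counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # hypothetical relevance score; higher means more important
    tokens: int   # token count, assumed to be precomputed


def fit_to_budget(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that fit within the token budget,
    then return the kept chunks in their original order."""
    order = sorted(range(len(chunks)), key=lambda i: chunks[i].score, reverse=True)
    kept: set[int] = set()
    used = 0
    for i in order:
        if used + chunks[i].tokens <= budget:
            kept.add(i)
            used += chunks[i].tokens
    return [c for i, c in enumerate(chunks) if i in kept]


# Usage: the stale, low-scoring tool output is the one that gets dropped.
chunks = [
    Chunk("system prompt", score=10.0, tokens=200),
    Chunk("old tool output", score=1.0, tokens=3000),
    Chunk("recent error log", score=8.0, tokens=900),
    Chunk("latest user message", score=9.0, tokens=100),
]
print([c.text for c in fit_to_budget(chunks, budget=1500)])
# prints: ['system prompt', 'recent error log', 'latest user message']
```

Greedy selection is not an optimal knapsack solution, but it is cheap and predictable; returning the survivors in their original order keeps the conversation chronological.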
Real-World Results
- Code search: 17,765 → 1,408 tokens (92%)
- SRE debugging: 65,694 → 5,118 tokens (92%)
- GitHub triage: 54,174 → 14,761 tokens (73%)
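The percentages above follow from reduction = (before - after) / before. A quick check using only the token counts listed:

```python
# Token counts taken from the results listed above: (before, after).
results = {
    "Code search": (17_765, 1_408),
    "SRE debugging": (65_694, 5_118),
    "GitHub triage": (54_174, 14_761),
}


def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens removed: (before - after) / before * 100."""
    return (before - after) / before * 100


for name, (before, after) in results.items():
    print(f"{name}: {reduction_pct(before, after):.1f}% fewer tokens")
# prints:
# Code search: 92.1% fewer tokens
# SRE debugging: 92.2% fewer tokens
# GitHub triage: 72.8% fewer tokens
```

The reported 92%, 92%, and 73% figures are consistent with these counts after rounding.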
Supported Platforms
- Python (pip install headroom-ai)
- TypeScript/Node (npm install headroom-ai)
- Docker image available
How It Compares
| Category | Token Optimizer | Headroom | RTK |
|---|---|---|---|
| Tool output compression | 99%+ per-output, progressive disclosure | 60-95% (cherry-picked benchmarks) | 60-90% (CLI only) |
| First-read file skeletons | Shadow-validated, fail-open | — | — |
| Bash/CLI output compression | Generic + git/ls/pytest patterns | Partial | Yes (main feature) |
| Tabular/JSON compression | Value-preserving columnar | Yes (main feature) | — |
| Delta reads (re-read = diff only) | Yes | — | — |
| Model routing (wrong model for task) | 9 waste detectors | — | — |
| Loop/spin detection | Yes | — | — |
| Context quality scoring | Per-session, cross-session average | — | — |
| Cache instability detection | Yes | — | — |
| Retry churn detection | Yes | — | — |
| Tool cascade waste | Yes | — | — |
| Code structure maps | Outlines on repeated reads | — | — |
| Conversation history (60-75% of cost) | Checkpoint + compaction awareness | Doesn’t touch it | Doesn’t touch it |
| Quality gates | 3-tier system, edit-rate proxies | “Same answers” (untested) | — |
| Measured dollar savings | Real bill reduction per category | Per-output ratios only | rtk gain analytics |
| Multi-platform | Claude Code, Codex, OpenClaw, OpenCode | Python library + proxy | macOS, Linux, WSL |
Summary: Token Optimizer covers 16/16 categories. Headroom and RTK each focus on one narrow slice (tool output compression) and miss 85-90% of actual token waste.
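The "Tabular/JSON compression" row is the one category where both Token Optimizer ("value-preserving columnar") and Headroom (its main feature) claim coverage. The general technique can be sketched as factoring repeated key names out of a list of JSON objects. This assumes flat objects that share most of their keys, and it is a generic transform, not the algorithm either product actually uses.

```python
import json


def to_columnar(rows: list[dict]) -> dict:
    """Convert a list of objects into {"columns": [...], "rows": [[...], ...]}.

    Every value is kept; only the repeated key names are factored out.
    Keys missing from a row are filled with None.
    """
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return {"columns": columns, "rows": [[row.get(c) for c in columns] for row in rows]}


def from_columnar(table: dict) -> list[dict]:
    """Rebuild the list of objects (keys filled with None come back as None)."""
    return [dict(zip(table["columns"], values)) for values in table["rows"]]


# Usage: 50 search hits with identical keys, as a code-search tool might return.
rows = [{"file": f"src/mod{i}.py", "line": i, "match": "TODO"} for i in range(50)]
table = to_columnar(rows)
print(len(json.dumps(rows)), "->", len(json.dumps(table)), "characters")
assert from_columnar(table) == rows  # nothing was lost
```

Character counts are only a rough proxy for tokens, and the saving grows with the number of rows, since the key names are written once instead of once per object.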