Headroom

Open source, free, 30k stars

Context compression system that reduces AI agent token usage by 60-95% while maintaining accuracy

18K GitHub Stars
60-95% Token Savings
4 Deployment Modes

Overview

Headroom is a context compression system that reduces token usage for AI agents by 60-95% while maintaining accuracy. It compresses everything agents read (tool outputs, logs, RAG chunks, files, and conversation history) before it is sent to the LLM. It can be deployed as a library, a proxy, an MCP server, or an agent wrapper, and its reversible compression (CCR) preserves the originals so the LLM can retrieve them.
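To make the reversible-compression idea concrete, here is a minimal Python sketch. It is not Headroom's implementation or API: the ReversibleStore class, its method names, and the keep-the-first-lines trimming are hypothetical stand-ins. It only illustrates the general pattern of keeping the original under a key while the model sees a shorter stub.

```python
import hashlib


class ReversibleStore:
    """Hypothetical store that keeps originals so a stub can be expanded later."""

    def __init__(self):
        self._originals = {}

    def compress(self, text, keep_lines=3):
        # Key the original by a content hash so identical inputs share one entry.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = text
        lines = text.splitlines()
        if len(lines) <= keep_lines:
            return text  # short input: nothing worth trimming
        omitted = len(lines) - keep_lines
        head = "\n".join(lines[:keep_lines])
        # Naive stand-in for a real compressor: keep the head, note what was dropped.
        return f"{head}\n[{omitted} lines omitted; retrieve with key {key}]"

    def retrieve(self, key):
        # Return the untouched original for a key printed in a stub.
        return self._originals[key]
```

A real compressor would choose what to keep far more carefully. The point of the sketch is only that nothing is lost, because the key in the stub always leads back to the full text.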

The Verdict

Who Should Use Headroom?

Best For

  • Teams running daily AI coding agents
  • High-volume agentic workflows
  • Multi-agent systems needing shared memory
  • SRE/DevOps with large log analysis
  • RAG pipelines with token constraints

Not Ideal For

  • Simple single-query use cases
  • Projects with minimal context needs
  • Teams not tracking token costs
  • Workflows requiring exact original text

What's Great

  • 92% token reduction on code search (100 results)
  • 92% savings on SRE incident debugging
  • Multiple deployment modes (library, proxy, MCP, wrapper)
  • Reversible compression preserves originals
  • Cross-agent memory with automatic deduplication
  • Accuracy preserved on GSM8K, TruthfulQA, SQuAD v2
  • Works with Anthropic, OpenAI, Bedrock, and any OpenAI-compatible provider

Watch Out For

  • Additional processing overhead
  • Compression may lose nuance in some cases
  • Requires Python 3.10+ or Node.js
  • Learning curve for optimal configuration

Editor's Note

The 90% token savings claim is misleading. Like RTK and similar tools, Headroom focuses on one narrow piece of the waste puzzle: compressing tool output before the model sees it. That's the entire source of savings.

The "90%" figure refers specifically to the reduction in tool output size, not to your total token usage. In practice, tool output is a small fraction of actual spend, so for the other 85-90% of real-world token waste these tools simply don't help.
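The arithmetic behind this note can be sketched in a few lines of Python. The 10-15% share of tokens attributed to tool output is an assumption read off the note's "85-90%" framing, not a measurement, and overall_savings is a hypothetical helper name.

```python
def overall_savings(tool_output_share, compression_ratio):
    """Fraction of total tokens saved when only tool output is compressed."""
    return tool_output_share * compression_ratio


# Assumed shares of total tokens that are tool output (not measured values).
for share in (0.10, 0.15):
    saved = overall_savings(share, 0.90)
    print(f"tool output share {share:.0%}: overall savings {saved:.1%}")
```

Under these assumptions, a 90% per-output reduction works out to roughly 9-13.5% of total tokens.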

Pricing

Free and open source.

Deployment Modes

  • Library: compress(messages) API
  • Local proxy: zero code changes
  • Agent wrapper for claude/cursor/codex/aider
  • MCP server with headroom_compress tool

Compression Engines

  • SmartCrusher for JSON structures
  • CodeCompressor with AST awareness
  • Kompress-base (custom HuggingFace model)
  • CacheAligner for KV cache optimization
  • IntelligentContext for score-based fitting
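As a rough illustration of what "score-based fitting" can mean in general, the Python sketch below greedily keeps the highest-scoring chunks that fit a token budget and returns them in their original order. This is a generic technique, not IntelligentContext's actual algorithm. The fit_to_budget function and the (text, score, tokens) tuple layout are assumptions made for the example.

```python
def fit_to_budget(chunks, budget):
    """Keep the highest-scoring chunks whose token counts fit within budget.

    chunks is a list of (text, score, tokens) tuples (hypothetical layout).
    """
    # Visit chunk indices from highest score to lowest.
    ranked = sorted(range(len(chunks)), key=lambda i: chunks[i][1], reverse=True)
    kept, used = set(), 0
    for i in ranked:
        tokens = chunks[i][2]
        if used + tokens <= budget:
            kept.add(i)
            used += tokens
    # Emit survivors in original order so the context still reads naturally.
    return [chunks[i][0] for i in sorted(kept)]
```

Greedy selection is simple but not optimal. A production system might weigh recency or solve the selection as a knapsack problem instead.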

Real-World Results

  • Code search: 17,765 → 1,408 tokens (92%)
  • SRE debugging: 65,694 → 5,118 tokens (92%)
  • GitHub triage: 54,174 → 14,761 tokens (73%)
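The percentages above follow directly from the before and after token counts. A short Python check, using only the numbers listed (savings_pct is a helper name chosen for this example):

```python
def savings_pct(before, after):
    """Percentage of tokens removed by compression."""
    return 100 * (before - after) / before


results = {
    "Code search": (17_765, 1_408),
    "SRE debugging": (65_694, 5_118),
    "GitHub triage": (54_174, 14_761),
}
for name, (before, after) in results.items():
    print(f"{name}: {savings_pct(before, after):.0f}%")
```

These are per-task ratios on the compressed content, not a share of a total bill.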

Supported Platforms

  • Python (pip install headroom-ai)
  • TypeScript/Node (npm install headroom-ai)
  • Docker image available

How It Compares

Category | Token Optimizer | Headroom | RTK
Tool output compression | 99%+ per-output, progressive disclosure | 60-95% (cherry-picked benchmarks) | 60-90% (CLI only)
First-read file skeletons | Shadow-validated, fail-open | - | -
Bash/CLI output compression | Generic + git/ls/pytest patterns | Partial | Yes (main feature)
Tabular/JSON compression | Value-preserving columnar | Yes (main feature) | -
Delta reads (re-read = diff only) | Yes | - | -
Model routing (wrong model for task) | 9 waste detectors | - | -
Loop/spin detection | Yes | - | -
Context quality scoring | Per-session, cross-session average | - | -
Cache instability detection | Yes | - | -
Retry churn detection | Yes | - | -
Tool cascade waste | Yes | - | -
Code structure maps | Outlines on repeated reads | - | -
Conversation history (60-75% of cost) | Checkpoint + compaction awareness | Doesn’t touch it | Doesn’t touch it
Quality gates | 3-tier system, edit-rate proxies | “Same answers” (untested) | -
Measured dollar savings | Real bill reduction per category | Per-output ratios only | rtk gain analytics
Multi-platform | Claude Code, Codex, OpenClaw, OpenCode | Python library + proxy | macOS, Linux, WSL

Summary: Token Optimizer covers 16/16 categories. Headroom and RTK each focus on one narrow slice (tool output compression) and miss 85-90% of actual token waste.
