Headroom

oss Free Star30k

Context compression system that reduces AI agent token usage by 60-95% while maintaining accuracy

mcp server skill

18K GitHub Stars

60-95% Token Savings

4 Deployment Modes

Overview

Headroom is a context compression system that reduces token usage for AI agents by 60-95% while maintaining accuracy. It compresses everything agents read—tool outputs, logs, RAG chunks, files, and conversation history—before sending to LLMs. Supports library, proxy, MCP server, and agent wrapper deployment modes with reversible compression (CCR) that preserves originals for LLM retrieval.

The Verdict

Who Should Use Headroom?

Best For

Teams running daily AI coding agents
High-volume agentic workflows
Multi-agent systems needing shared memory
SRE/DevOps with large log analysis
RAG pipelines with token constraints

Not Ideal For

Simple single-query use cases
Projects with minimal context needs
Teams not tracking token costs
Workflows requiring exact original text

What's Great

92% token reduction on code search (100 results)
92% savings on SRE incident debugging
Multiple deployment modes (library, proxy, MCP, wrapper)
Reversible compression preserves originals
Cross-agent memory with automatic deduplication
Accuracy preserved on GSM8K, TruthfulQA, SQuAD v2
Works with Anthropic, OpenAI, Bedrock, any OpenAI-compatible

GitHub README

Watch Out For

Additional processing overhead
Compression may lose nuance in some cases
Requires Python 3.10+ or Node.js
Learning curve for optimal configuration

Editor's Note

The 90% token savings claim is misleading. Like RTK and similar tools, Headroom focuses on one narrow piece of the waste puzzle: compressing tool output before the model sees it. That's the entire source of savings.

The "90%" figure refers specifically to reducing tool output size—not your total token usage. In practice, this represents a small fraction of actual spend. For 85-90% of real-world token waste, these tools simply don't help.

Pricing

Free

Open source, Apache 2.0

View all features & details

Deployment Modes

Library: compress(messages) API
Local proxy: zero code changes
Agent wrapper for claude/cursor/codex/aider
MCP server with headroom_compress tool

Compression Engines

SmartCrusher for JSON structures
CodeCompressor with AST awareness
Kompress-base (custom HuggingFace model)
CacheAligner for KV cache optimization
IntelligentContext for score-based fitting

Real-World Results

Code search: 17,765 → 1,408 tokens (92%)
SRE debugging: 65,694 → 5,118 tokens (92%)
GitHub triage: 54,174 → 14,761 tokens (73%)

Supported Platforms

Python (pip install headroom-ai)
TypeScript/Node (npm install headroom-ai)
Docker image available

How It Compares

Category	Token Optimizer	Headroom	RTK
Tool output compression	99%+ per-output, progressive disclosure	60-95% (cherry-picked benchmarks)	60-90% (CLI only)
First-read file skeletons	Shadow-validated, fail-open	—	—
Bash/CLI output compression	Generic + git/ls/pytest patterns	Partial	Yes (main feature)
Tabular/JSON compression	Value-preserving columnar	Yes (main feature)	—
Delta reads (re-read = diff only)	Yes	—	—
Model routing (wrong model for task)	9 waste detectors	—	—
Loop/spin detection	Yes	—	—
Context quality scoring	Per-session, cross-session average	—	—
Cache instability detection	Yes	—	—
Retry churn detection	Yes	—	—
Tool cascade waste	Yes	—	—
Code structure maps	Outlines on repeated reads	—	—
Conversation history (60-75% of cost)	Checkpoint + compaction awareness	Doesn’t touch it	Doesn’t touch it
Quality gates	3-tier system, edit-rate proxies	“Same answers” (untested)	—
Measured dollar savings	Real bill reduction per category	Per-output ratios only	`rtk gain` analytics
Multi-platform	Claude Code, Codex, OpenClaw, OpenCode	Python library + proxy	macOS, Linux, WSL

Summary: Token Optimizer covers 16/16 categories. Headroom and RTK each focus on one narrow slice (tool output compression) and miss 85-90% of actual token waste.

User Reviews

Loading reviews...