Caveman
A token-efficient stack for agent-native builders: it compresses prompts and outputs by ~75% and treats tokens as a precious resource across its compression, workflow, and memory layers
Overview
Caveman is a three-part open-source ecosystem for agent-native developers who treat tokens as a precious resource. The compression primitive cuts prompt and output tokens by ~75% through a deterministic, user-controlled dictionary. The workflow layer (Cavekit) adds spec-driven task execution with acceptance criteria and verification checkpoints. The memory layer (Cavemem) provides cross-agent persistent memory via local SQLite with FTS5 and vector search. When all components are stacked together via the Caveman Code CLI, the project claims ~77% total token savings (21,340 → 4,812 tokens in baseline testing). The entire stack is MIT-licensed and available on npm.
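As a quick check on the headline number, the stacked savings follow directly from the two baseline token counts:

$$
\text{savings} = 1 - \frac{4{,}812}{21{,}340} = \frac{16{,}528}{21{,}340} \approx 0.7745 \approx 77\%
$$

This verifies only the arithmetic, not the self-reported measurement behind it.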
The Verdict
Who Should Use Caveman?
Best For
- Developers running long-context or multi-agent workflows where token costs compound
- Teams building on Claude Code who want a drop-in compression skill
- Agent-native builders who need persistent cross-session memory without a cloud backend
- Anyone hitting usage limits and wanting to do more per dollar
- Projects requiring spec-driven development with structured verification
Not Ideal For
- Teams needing peer-reviewed benchmarks — savings claims are self-reported
- Workflows where output verbosity and natural language are required (compressed output can read awkwardly)
- Teams working outside Claude Code (Cavekit and Cavemem support fewer agent environments than alternatives)
What's Great
- ~75% token reduction from the compression primitive alone, composable across apps and models
- Deterministic output with user-controlled dictionary: no black-box surprises
- Local-first memory (SQLite + FTS5) keeps sensitive data off the cloud with a small footprint (~1.2 MB for 4,812 observations)
- MCP support in Cavemem exposes search, timeline, and get_observations tools to any MCP-compatible agent
- Works with 20+ model providers — not locked to Anthropic
- Full MIT stack: no licensing costs or vendor lock-in
Watch Out For
- Savings figures (77%) are self-reported by the creator — no independent reproduction guide yet
- The companion Cavemem and Cavekit projects are much earlier-stage than the flagship compression primitive
- Individual creator project — community and long-term maintenance are less established than VC-backed alternatives
- Compressed output syntax may reduce readability for humans reviewing agent outputs
Pricing
All components are free and MIT-licensed.
Caveman (Compression Primitive)
- ~75% token reduction on typical agent workloads
- Model-agnostic — works with any LLM provider
- Deterministic output with user-controlled dictionary
- Composable across multiple applications
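To make "deterministic output with user-controlled dictionary" concrete, here is a minimal Python sketch of dictionary-based phrase substitution. It assumes the dictionary is a plain mapping from verbose phrases to short forms; the entries, the function names (`build_pattern`, `compress`, `rough_token_count`), and the whitespace-based token count are all hypothetical and are not Caveman's actual dictionary format or API.

```python
import re

# Hypothetical user-controlled dictionary: verbose phrase -> short form.
# Caveman's real dictionary format is not documented here; this is an assumption.
DICTIONARY = {
    "please make sure to": "ensure",
    "in order to": "to",
    "the following": "these",
    "function": "fn",
}


def build_pattern(dictionary: dict[str, str]) -> re.Pattern[str]:
    # Sort longest-first (ties broken alphabetically) so that, at any position,
    # the longest matching phrase wins regardless of dictionary insertion order.
    keys = sorted(dictionary, key=lambda k: (-len(k), k))
    alternation = "|".join(re.escape(k) for k in keys)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def compress(text: str, dictionary: dict[str, str]) -> str:
    if not dictionary:
        return text  # nothing to substitute
    pattern = build_pattern(dictionary)
    lookup = {k.lower(): v for k, v in dictionary.items()}
    # Single left-to-right pass: substituted text is never re-scanned,
    # so the same input and dictionary always give the same output.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


def rough_token_count(text: str) -> int:
    # Whitespace split is only a crude proxy for a real model tokenizer.
    return len(text.split())
```

For example, `compress("Please make sure to call the function in order to log the following errors.", DICTIONARY)` returns `"ensure call the fn to log these errors."`, cutting the rough count from 14 to 8. Sorting keys longest-first means a longer phrase starting at the same position always beats a shorter one, which is one simple way to keep substitutions repeatable.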
Cavekit (Workflow Layer)
- Spec-driven development: prose to structured plan
- Task-based execution with acceptance criteria
- Verification checkpoints per task
- Iterative spec evolution as requirements change
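Cavekit's loop of tasks, acceptance criteria, and per-task verification checkpoints can be pictured as a small data model. The Python sketch below is a hypothetical illustration: the class names (`Criterion`, `Task`, `CheckpointResult`), the dict-shaped task output, and the stop-on-first-failure policy are assumptions, not Cavekit's actual schema or behavior.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical model of a spec-driven plan; not Cavekit's real schema.
@dataclass
class Criterion:
    description: str
    check: Callable[[dict], bool]  # True if the task output satisfies this criterion


@dataclass
class Task:
    name: str
    criteria: list[Criterion] = field(default_factory=list)


@dataclass
class CheckpointResult:
    task: str
    passed: bool
    failures: list[str]


def verify(task: Task, output: dict) -> CheckpointResult:
    # Verification checkpoint: evaluate every acceptance criterion for one task.
    failures = [c.description for c in task.criteria if not c.check(output)]
    return CheckpointResult(task.name, not failures, failures)


def run_plan(tasks: list[Task], execute: Callable[[Task], dict]) -> list[CheckpointResult]:
    results: list[CheckpointResult] = []
    for task in tasks:
        result = verify(task, execute(task))
        results.append(result)
        if not result.passed:
            # Stop at the first failed checkpoint so the spec can be revised
            # before later tasks build on a bad result.
            break
    return results
```

Stopping at the first failed checkpoint is one design choice among several; a real tool might instead collect all failures or retry a task before moving on.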
Cavemem (Memory Layer)
- Persistent cross-agent memory via SQLite + FTS5
- Vector search capabilities
- Local-first, privacy-preserving (~1.2 MB for 4,812 observations)
- MCP exposure: search, timeline, get_observations tools
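The Cavemem feature list describes a local SQLite store with full-text search, exposed through three tools. The Python sketch below shows the general shape using the standard-library `sqlite3` module and an FTS5 virtual table. Only the tool names search, timeline, and get_observations come from the listing; the schema and function signatures are hypothetical, vector search is omitted, nothing here speaks MCP, and it assumes Python's SQLite was built with FTS5 (as most distributions are).

```python
import sqlite3
import time
from typing import Optional

Row = tuple[int, str, str]  # (rowid, agent, content)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema; Cavemem's real schema is not documented here.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS observations "
        "USING fts5(agent, content, created_at UNINDEXED)"
    )
    return conn


def add_observation(conn: sqlite3.Connection, agent: str, content: str,
                    created_at: Optional[float] = None) -> int:
    cur = conn.execute(
        "INSERT INTO observations (agent, content, created_at) VALUES (?, ?, ?)",
        (agent, content, created_at if created_at is not None else time.time()),
    )
    conn.commit()
    return cur.lastrowid


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[Row]:
    # Full-text match, ordered by FTS5's built-in bm25 rank (best first).
    return conn.execute(
        "SELECT rowid, agent, content FROM observations "
        "WHERE observations MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


def timeline(conn: sqlite3.Connection, limit: int = 10) -> list[Row]:
    # Most recent observations first, across all agents.
    return conn.execute(
        "SELECT rowid, agent, content FROM observations "
        "ORDER BY created_at DESC LIMIT ?",
        (limit,),
    ).fetchall()


def get_observations(conn: sqlite3.Connection, ids: list[int]) -> list[Row]:
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT rowid, agent, content FROM observations WHERE rowid IN ({placeholders})",
        ids,
    ).fetchall()
```

Because everything lives in one local database file, any agent that can open it shares the same memory, which is the property the "local-first, privacy-preserving" bullet points to.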
Caveman Code (CLI)
- Four independent compression layers: prompt, commands, outputs, context
- ~77% total token savings when fully stacked
- Support for 20+ model providers
- Available via npm, pnpm, yarn, bun, or Docker
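The CLI's four layers act on different parts of an agent session, so their individual savings neither add up nor simply average. Assuming each layer handles a disjoint slice of the baseline tokens, the Python sketch below (a hypothetical helper, not part of Caveman Code) computes the overall figure from per-layer before/after counts.

```python
# Hypothetical helper; the layer names come from the listing, but the
# assumption that layers act on disjoint slices of the token budget is ours.
def stacked_savings(layers: dict[str, tuple[int, int]]) -> float:
    """layers maps layer name -> (tokens_before, tokens_after)."""
    before = sum(b for b, _ in layers.values())
    after = sum(a for _, a in layers.values())
    if before <= 0:
        raise ValueError("baseline token count must be positive")
    # A ratio of totals: equivalent to per-layer savings weighted by each
    # layer's share of the baseline, not a plain average of percentages.
    return 1 - after / before
```

Because the result is a ratio of totals, a layer that handles most of the baseline tokens dominates the stacked figure; averaging the four per-layer percentages would overweight small layers.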
How It Compares
| Feature | Caveman | Ponytail | Token Optimizer |
|---|---|---|---|
| Approach | Prompt compression | Code minimalism (YAGNI) | Context auditing |
| Claimed Savings | ~75–77% tokens | 47–77% cost reduction | Varies |
| Memory Layer | Yes (Cavemem + MCP) | No | No |
| Workflow Layer | Yes (Cavekit) | No | No |
| Benchmarks | Self-reported | Published, reproducible | N/A |
| Model / Agent Support | 20+ providers | 11 agents | Claude Code |
| License | MIT | MIT | MIT |