Caveman

Open-source · Free

Token-efficient stack for agent-native builders that treats tokens as a precious resource across compression, workflow, and memory layers; its compression primitive cuts prompt and output tokens by ~75%

Tags: skill, coding, MCP server, multi-model, memory

72.7K GitHub Stars

~75% Token Reduction

~77% Savings (full stack)

20+ Model Providers

Overview

Caveman is a three-part open-source ecosystem for agent-native developers who treat tokens as a precious resource. The compression primitive cuts prompt and output tokens by ~75% through a deterministic, user-controlled dictionary. The workflow layer (Cavekit) adds spec-driven task execution with acceptance criteria and verification checkpoints. The memory layer (Cavemem) provides cross-agent persistent memory via local SQLite with FTS5 and vector search. When all components are stacked together via the Caveman Code CLI, the project claims ~77% total token savings (21,340 → 4,812 tokens in baseline testing). The entire stack is MIT-licensed and available on npm.
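The ~77% headline figure can be checked against the two baseline token counts quoted above. Here is a minimal Python check. The counts are the project's self-reported numbers, not independently measured ones:

```python
# Baseline token counts reported by the project (self-reported, not independently verified).
baseline_tokens = 21_340
stacked_tokens = 4_812

# Fraction of tokens removed when the full stack is applied.
savings = (baseline_tokens - stacked_tokens) / baseline_tokens
print(f"{savings:.1%}")  # 77.5%
```

The result, 77.5%, rounds to the ~77% the project cites. It says nothing about how representative that baseline workload is.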

The Verdict

Who Should Use Caveman?

Best For

Developers running long-context or multi-agent workflows where token costs compound
Teams building on Claude Code who want a drop-in compression skill
Agent-native builders who need persistent cross-session memory without a cloud backend
Anyone hitting usage limits and wanting to do more per dollar
Projects requiring spec-driven development with structured verification

Not Ideal For

Teams needing peer-reviewed benchmarks — savings claims are self-reported
Workflows that require verbose, natural-language output (compressed output can read awkwardly)
Non-Claude Code environments (Cavekit and Cavemem support a narrower range of agents than alternatives)

What's Great

~75% token reduction on the compression primitive alone — composable across apps and models
Deterministic output with user-controlled dictionary: no black-box surprises
Local-first memory (SQLite + FTS5) keeps sensitive data off the cloud (~1.2 MB for 4,812 observations)
MCP protocol support in Cavemem exposes search, timeline, and get_observations tools to any MCP-compatible agent
Works with 20+ model providers — not locked to Anthropic
Full MIT stack: no licensing costs or vendor lock-in


Watch Out For

The ~77% savings figure is self-reported by the creator; no independent reproduction guide exists yet
The companion Cavemem and Cavekit projects are much earlier-stage than the flagship compression primitive
Single-creator project: community and long-term maintenance are less established than for VC-backed alternatives
Compressed output syntax may reduce readability for humans reviewing agent outputs


Pricing

Free

MIT licensed across the full stack


Caveman (Compression Primitive)

~75% token reduction on typical agent workloads
Model-agnostic — works with any LLM provider
Deterministic output with user-controlled dictionary
Composable across multiple applications
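The listing does not document Caveman's dictionary format or matching rules. The sketch below is a hypothetical illustration of a deterministic, user-controlled dictionary pass. Entries are applied longest-first with a fixed alphabetical tie-break, so the same input and dictionary always produce the same output. The DICTIONARY contents, compress(), and the whitespace-based rough_token_count() are assumptions and not Caveman's API. Real savings depend on the model's tokenizer.

```python
import re

# Hypothetical user-controlled dictionary: verbose phrase -> terse replacement.
# Caveman's actual dictionary format is not documented here; this is illustrative only.
DICTIONARY = {
    "please make sure that": "ensure",
    "in order to": "to",
    "the function": "fn",
    "configuration": "config",
}

def compress(text: str, dictionary: dict[str, str]) -> str:
    """Deterministically rewrite text by applying dictionary entries.

    Longer phrases are applied first (ties broken alphabetically), so the
    result never depends on the dictionary's insertion order.
    """
    for phrase in sorted(dictionary, key=lambda p: (-len(p), p)):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # A function replacement avoids backslash interpretation in user-supplied text.
        text = re.sub(pattern, lambda _m, r=dictionary[phrase]: r, text, flags=re.IGNORECASE)
    return text

def rough_token_count(text: str) -> int:
    # Whitespace word count as a crude stand-in for a real model tokenizer.
    return len(text.split())

prompt = "Please make sure that the function reads the configuration in order to start."
short = compress(prompt, DICTIONARY)
print(short)  # ensure fn reads the config to start.
print(rough_token_count(prompt), "->", rough_token_count(short))  # 13 -> 7
```

Because the replacement order comes from the dictionary's contents rather than its insertion order, reordering the entries leaves the output unchanged.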

Cavekit (Workflow Layer)

Spec-driven development: prose to structured plan
Task-based execution with acceptance criteria
Verification checkpoints per task
Iterative spec evolution as requirements change
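Cavekit's actual plan format is not described here. This hypothetical sketch shows the task, acceptance-criteria, and checkpoint pattern the list describes. Each task carries named acceptance checks. A checkpoint after each task marks it done only if every check passes, and the run stops at the first failing task. Task, Spec, verify(), and run() are illustrative names, not Cavekit's API. The execute callback stands in for the agent doing the work.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model of a spec-driven plan; names are illustrative, not Cavekit's API.
Check = Callable[[dict], bool]

@dataclass
class Task:
    name: str
    acceptance: dict[str, Check] = field(default_factory=dict)  # criterion name -> check
    done: bool = False

    def verify(self, state: dict) -> list[str]:
        """Checkpoint: return failed criteria names; mark done only if none fail."""
        failed = [c for c, check in self.acceptance.items() if not check(state)]
        self.done = not failed
        return failed

@dataclass
class Spec:
    tasks: list[Task]

    def run(self, state: dict, execute: Callable[[Task, dict], None]) -> dict[str, list[str]]:
        """Execute tasks in order, stopping at the first task that fails its checkpoint."""
        report: dict[str, list[str]] = {}
        for task in self.tasks:
            execute(task, state)
            failed = task.verify(state)
            report[task.name] = failed
            if failed:
                break
        return report

spec = Spec([
    Task("add endpoint", {"route exists": lambda s: "/health" in s["routes"]}),
    Task("write tests", {"tests pass": lambda s: s.get("tests_passed", False)}),
])

def execute(task: Task, state: dict) -> None:
    # Stand-in for an agent: only the first task is actually carried out.
    if task.name == "add endpoint":
        state["routes"].append("/health")

state = {"routes": []}
report = spec.run(state, execute)
print(report)  # {'add endpoint': [], 'write tests': ['tests pass']}
```

In this model, evolving the spec means editing a task's acceptance dictionary and re-running its checkpoint.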

Cavemem (Memory Layer)

Persistent cross-agent memory via SQLite + FTS5
Vector search capabilities
Local-first, privacy-preserving (~1.2 MB for 4,812 observations)
MCP exposure: search, timeline, get_observations tools
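The listing names the storage engine (SQLite with FTS5) and the three MCP tools, but not the schema. The sketch below assumes a single FTS5 table and plain Python functions named after the tools. It is not Cavemem's code. It omits vector search and the MCP transport entirely. It also needs an SQLite build with FTS5 enabled, which most Python distributions ship. The sample observations are made up.

```python
import sqlite3

# Hypothetical schema: one FTS5 table; agent and created_at are stored but not indexed for search.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE observations USING fts5(agent UNINDEXED, created_at UNINDEXED, body)"
)

def add_observation(agent: str, created_at: str, body: str) -> None:
    conn.execute(
        "INSERT INTO observations(agent, created_at, body) VALUES (?, ?, ?)",
        (agent, created_at, body),
    )
    conn.commit()

def search(query: str, limit: int = 5) -> list[tuple]:
    # Full-text match over body, best BM25 rank first.
    return conn.execute(
        "SELECT rowid, agent, created_at, body FROM observations "
        "WHERE observations MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()

def timeline(limit: int = 10) -> list[tuple]:
    # Most recent first; ISO-8601 strings sort chronologically.
    return conn.execute(
        "SELECT rowid, agent, created_at, body FROM observations "
        "ORDER BY created_at DESC LIMIT ?",
        (limit,),
    ).fetchall()

def get_observations(ids: list[int]) -> list[tuple]:
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    return conn.execute(
        f"SELECT rowid, agent, created_at, body FROM observations WHERE rowid IN ({placeholders})",
        ids,
    ).fetchall()

add_observation("agent-a", "2024-05-01T10:00", "Switched the cache layer to SQLite WAL mode")
add_observation("agent-b", "2024-05-02T09:30", "Login bug caused by a stale session token")
add_observation("agent-a", "2024-05-03T14:15", "Added retry logic to the SQLite migration script")

print([row[0] for row in search("sqlite")])
print([row[0] for row in timeline(2)])  # [3, 2]
```

Because observations from every agent share one local table, any agent querying it sees the others' entries. That sharing is what "cross-agent" means in this model.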

Caveman Code (CLI)

Four independent compression layers: prompt, commands, outputs, context
~77% total token savings when fully stacked
Support for 20+ model providers
Available via npm, pnpm, yarn, bun, or Docker
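The listing names the four layers but not how each one compresses. The sketch below is a hypothetical model of the "independent layers" idea. A request is split into prompt, commands, outputs, and context parts. Each enabled layer rewrites only its own part, and savings are measured over the whole request. drop_filler(), dedupe_lines(), tail_lines(), and apply_stack() are toy stand-ins invented for illustration. Token counts use whitespace splitting rather than a real tokenizer.

```python
from typing import Callable

# Hypothetical sketch of four independent layers. Each layer only sees its own part of
# the request, so any layer can be switched off without affecting the others.
# The toy layer functions are illustrative stand-ins, not Caveman Code's algorithms.
Layer = Callable[[str], str]

FILLER = {"please", "the", "a", "an"}

def drop_filler(text: str) -> str:
    # Prompt layer stand-in: remove a few filler words.
    return " ".join(w for w in text.split() if w.lower() not in FILLER)

def dedupe_lines(text: str) -> str:
    # Commands/context layer stand-in: keep the first copy of each line, in order.
    return "\n".join(dict.fromkeys(text.splitlines()))

def tail_lines(n: int) -> Layer:
    # Outputs layer stand-in: keep only the last n lines of a long output.
    return lambda text: "\n".join(text.splitlines()[-n:])

def count_tokens(text: str) -> int:
    # Whitespace word count as a crude proxy for a model tokenizer.
    return len(text.split())

def apply_stack(parts: dict[str, str], layers: dict[str, Layer]) -> tuple[dict[str, str], float]:
    """Compress each part with its own layer, if enabled, and return the overall savings ratio."""
    out = {name: layers[name](text) if name in layers else text for name, text in parts.items()}
    before = sum(count_tokens(t) for t in parts.values())
    after = sum(count_tokens(t) for t in out.values())
    return out, ((before - after) / before if before else 0.0)

parts = {
    "prompt": "please fix the failing test",
    "commands": "npm test\nnpm test\nnpm test",
    "outputs": "\n".join(f"log line {i}" for i in range(10)),
    "context": "src/app.ts\nsrc/app.ts\nsrc/util.ts",
}
layers = {
    "prompt": drop_filler,
    "commands": dedupe_lines,
    "outputs": tail_lines(2),
    "context": dedupe_lines,
}

_, full = apply_stack(parts, layers)
_, prompt_only = apply_stack(parts, {"prompt": drop_filler})
print(f"all layers: {full:.0%}, prompt layer only: {prompt_only:.0%}")
```

These percentages come from the toy inputs, not from Caveman benchmarks. The point is that the overall ratio is a token-weighted sum over parts. A layer that targets a small share of the request therefore moves the total very little.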

How It Compares

Feature	Caveman	Ponytail	Token Optimizer
Approach	Prompt compression	Code minimalism (YAGNI)	Context auditing
Token Savings	~75–77%	47–77% cost reduction	Varies
Memory Layer	Yes (Cavemem + MCP)	No	No
Workflow Layer	Yes (Cavekit)	No	No
Benchmarks	Self-reported	Published, reproducible	N/A
Model Support	20+ providers	11 agents	Claude Code
License	MIT	MIT	MIT
