Caveman

Open-source · Free · 73k stars

A token-efficient stack for agent-native builders. It compresses prompts and outputs by ~75% and covers three layers: compression, workflow, and memory.

72.7K GitHub Stars
~75% Token Reduction
~77% Savings (full stack)
20+ Model Providers

Overview

Caveman is a three-part open-source ecosystem for agent-native developers who treat tokens as a precious resource. The compression primitive (Caveman) cuts prompt and output tokens by ~75% through a deterministic, user-controlled dictionary. The workflow layer (Cavekit) adds spec-driven task execution with acceptance criteria and verification checkpoints. The memory layer (Cavemem) provides persistent cross-agent memory via local SQLite with FTS5 and vector search. When all three are stacked through the Caveman Code CLI, the project claims ~77% total token savings (21,340 → 4,812 tokens in its own baseline test). The entire stack is MIT-licensed and available on npm.
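The ~77% figure follows directly from the two token counts quoted above. A minimal sketch of the arithmetic (the function and variable names are illustrative and not part of Caveman):

```python
# Token counts quoted in the project's baseline test.
baseline_tokens = 21_340
compressed_tokens = 4_812


def savings_fraction(before: int, after: int) -> float:
    """Fraction of tokens removed: 1 - after / before."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 1 - after / before


saving = savings_fraction(baseline_tokens, compressed_tokens)
print(f"Tokens removed: {baseline_tokens - compressed_tokens}")
print(f"Savings: {saving:.0%}")
```

This only checks that the headline percentage is consistent with the quoted counts. It says nothing about whether the counts themselves can be reproduced.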

The Verdict

Who Should Use Caveman?

Best For

  • Developers running long-context or multi-agent workflows where token costs compound
  • Teams building on Claude Code who want a drop-in compression skill
  • Agent-native builders who need persistent cross-session memory without a cloud backend
  • Anyone hitting usage limits and wanting to do more per dollar
  • Projects requiring spec-driven development with structured verification

Not Ideal For

  • Teams needing peer-reviewed benchmarks — savings claims are self-reported
  • Workflows where output verbosity and natural language are required (compressed output can read awkwardly)
  • Environments outside Claude Code (Cavekit and Cavemem cover a narrower range of agents than alternatives)

What's Great

  • ~75% token reduction on the compression primitive alone — composable across apps and models
  • Deterministic output with user-controlled dictionary: no black-box surprises
  • Local-first memory (SQLite + FTS5) keeps sensitive data off the cloud (~1.2 MB for 4,812 observations)
  • MCP protocol support in Cavemem exposes search, timeline, and get_observations tools to any MCP-compatible agent
  • Works with 20+ model providers — not locked to Anthropic
  • Full MIT stack: no licensing costs or vendor lock-in

Watch Out For

  • Savings figures (77%) are self-reported by the creator — no independent reproduction guide yet
  • The companion Cavemem and Cavekit projects are much earlier-stage than the flagship compression primitive
  • Individual creator project — community and long-term maintenance are less established than VC-backed alternatives
  • Compressed output syntax may reduce readability for humans reviewing agent outputs

Pricing

Caveman (Compression Primitive)

  • ~75% token reduction on typical agent workloads
  • Model-agnostic — works with any LLM provider
  • Deterministic output with user-controlled dictionary
  • Composable across multiple applications
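To illustrate what "deterministic output with a user-controlled dictionary" can mean in practice, here is a minimal sketch of dictionary-based phrase substitution. It is not Caveman's actual algorithm or API: the dictionary entries and the compress function are hypothetical, and word counts stand in for real token counts.

```python
import re

# Hypothetical user-controlled dictionary: verbose phrase -> short form.
# Keys are lowercase; an empty value deletes the phrase.
DICTIONARY = {
    "in order to": "to",
    "please make sure that": "ensure",
    "it is important to note that": "",
    "function": "fn",
}


def compress(text: str, dictionary: dict[str, str]) -> str:
    """Replace whole-word dictionary phrases; same input always gives same output."""
    if not dictionary:
        return text
    # Longest phrases first, so overlapping entries resolve the same way every run.
    phrases = sorted(dictionary, key=lambda p: (-len(p), p))
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    replaced = pattern.sub(lambda m: dictionary[m.group(1).lower()], text)
    # Collapse whitespace left behind by deleted phrases.
    return re.sub(r"\s+", " ", replaced).strip()


prompt = "Please make sure that the function runs in order to pass"
short = compress(prompt, DICTIONARY)
print(short)
print(len(prompt.split()), "words ->", len(short.split()), "words")
```

Because the substitution table is plain data, a user can inspect or edit every rewrite rule, which is the "no black-box surprises" property the listing describes.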

Cavekit (Workflow Layer)

  • Spec-driven development: prose to structured plan
  • Task-based execution with acceptance criteria
  • Verification checkpoints per task
  • Iterative spec evolution as requirements change
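The task, acceptance-criteria, and checkpoint ideas above can be sketched as a small data structure. This is an illustrative model only: Task, run_plan, and the sample criteria are hypothetical names, not Cavekit's real schema or commands.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    name: str
    # Acceptance criteria: description -> check applied to the task's result.
    criteria: dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def verify(self, result: str) -> list[str]:
        """Return descriptions of the criteria that the result fails."""
        return [desc for desc, check in self.criteria.items() if not check(result)]


def run_plan(
    tasks: list[Task], execute: Callable[[Task], str]
) -> tuple[list[str], Optional[tuple[str, list[str]]]]:
    """Run tasks in order, stopping at the first failed verification checkpoint."""
    completed: list[str] = []
    for task in tasks:
        result = execute(task)
        failures = task.verify(result)
        if failures:
            # Checkpoint failed: report which task and which criteria.
            return completed, (task.name, failures)
        completed.append(task.name)
    return completed, None


plan = [
    Task("parse spec", {"non-empty result": lambda r: bool(r)}),
    Task("render page", {"contains <html>": lambda r: "<html>" in r}),
]
# Stand-in executor: a real agent call would go here.
done, failure = run_plan(plan, lambda task: "ok")
print(done, failure)
```

Stopping at the first failed checkpoint keeps later tasks from building on unverified work, which is the usual motivation for per-task verification.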

Cavemem (Memory Layer)

  • Persistent cross-agent memory via SQLite + FTS5
  • Vector search capabilities
  • Local-first, privacy-preserving (~1.2 MB for 4,812 observations)
  • MCP exposure: search, timeline, get_observations tools
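For readers unfamiliar with SQLite's FTS5 extension, the sketch below shows the general pattern of a local full-text-searchable memory store. The table layout, the sample observations, and the search and timeline helpers are assumptions made for illustration (the helper names simply echo the tool names listed above); this is not Cavemem's schema, and vector search is omitted. It also assumes the local SQLite build includes FTS5, which is typical for Python's bundled sqlite3 but not guaranteed.

```python
import sqlite3

# In-memory database for the demo; a local file path would make it persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE observations USING fts5(agent, ts UNINDEXED, body)"
)

# Made-up sample observations from three hypothetical agents.
rows = [
    ("planner", "2025-01-01T10:00:00", "User prefers pnpm over npm for installs"),
    ("coder", "2025-01-01T10:05:00", "Test suite fails on Node 18, passes on Node 20"),
    ("reviewer", "2025-01-02T09:00:00", "Auth module needs rate limiting before release"),
]
conn.executemany(
    "INSERT INTO observations (agent, ts, body) VALUES (?, ?, ?)", rows
)


def search(query: str, limit: int = 5) -> list[tuple]:
    """Full-text search over stored observations, best match first."""
    cur = conn.execute(
        "SELECT agent, ts, body FROM observations "
        "WHERE observations MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()


def timeline(limit: int = 10) -> list[tuple]:
    """Most recent observations first (ISO timestamps sort lexicographically)."""
    cur = conn.execute(
        "SELECT agent, ts, body FROM observations ORDER BY ts DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


print(search("pnpm"))
print(timeline(limit=1))
```

Everything lives in a single local database, which is what makes the local-first, privacy-preserving claim plausible: no observation leaves the machine unless the user exports it.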

Caveman Code (CLI)

  • Four independent compression layers: prompt, commands, outputs, context
  • ~77% total token savings when fully stacked
  • Support for 20+ model providers
  • Available via npm, pnpm, yarn, bun, or Docker

How It Compares

Feature        | Caveman             | Ponytail                | Token Optimizer
Approach       | Prompt compression  | Code minimalism (YAGNI) | Context auditing
Token Savings  | ~75–77%             | 47–77% cost reduction   | Varies
Memory Layer   | Yes (Cavemem + MCP) | No                      | No
Workflow Layer | Yes (Cavekit)       | No                      | No
Benchmarks     | Self-reported       | Published, reproducible | N/A
Model Support  | 20+ providers       | 11 agents               | Claude Code
License        | MIT                 | MIT                     | MIT
