RAPTOR
Autonomous security research framework built on Claude Code that chains static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing into a single workflow
Overview
RAPTOR (Recursive Autonomous Penetration Testing and Observation Robot) is an autonomous security research framework built on top of Claude Code that chains static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing into a single workflow you can run against a codebase or binary. Created by veteran security researchers Gadi Evron, Daniel Cuthbert, Thomas Dullien (Halvar Flake), Michael Bargury, and John Cartwright, it configures the agent for adversarial thinking through rules, sub-agents, and skills. The architecture splits into a Python execution layer (Semgrep, CodeQL, SARIF parsing, LLM dispatch) that can run standalone in CI, and a Claude Code decision layer that prioritizes findings, interprets results, and judges exploitability.
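As a rough illustration of what SARIF parsing in an execution layer like this involves, the sketch below flattens a SARIF 2.1.0 log into simple records. It is not RAPTOR's code: the `Finding` type and the `parse_sarif` function are hypothetical names, and only standard SARIF fields (`runs`, `results`, `ruleId`, `level`, `message.text`, `locations`) are assumed.

```python
import json
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical flat record for one SARIF result (not RAPTOR's type)."""
    tool: str
    rule_id: str
    level: str
    message: str
    path: str
    line: int


def parse_sarif(text: str) -> list[Finding]:
    """Flatten a SARIF 2.1.0 log into a list of Finding records."""
    log = json.loads(text)
    findings = []
    for run in log.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # A result may carry several locations; take the first, if any.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append(Finding(
                tool=tool,
                rule_id=result.get("ruleId", ""),
                # Assumption: treat a missing level as "warning".
                level=result.get("level", "warning"),
                message=result.get("message", {}).get("text", ""),
                path=physical.get("artifactLocation", {}).get("uri", ""),
                line=physical.get("region", {}).get("startLine", 0),
            ))
    return findings
```

Records in this shape are easy to hand to a later triage step, whether that step is a CI gate or an LLM-backed reviewer.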
The Verdict
Who Should Use RAPTOR?
Best For
- Security researchers hunting vulnerabilities in code and binaries
- Teams wanting LLM triage to cut SAST false positives
- Binary fuzzing and crash root-cause analysis (AFL++, rr)
- Supply chain audits with SBOM and SARIF output
- CI pipelines needing structured scan output without Claude Code
Not Ideal For
- Web application scanning (module is an alpha stub)
- Commercial use without license review (CodeQL dependency forbids it)
- Teams needing polished, supported commercial software
What's Great
- Full autonomous pipeline: scan, validate, exploit, patch via /agentic
- Multi-stage exploitability validation (Stages A–D) filters tool noise before reporting
- Z3 SMT pre-screening drops provably unreachable CodeQL paths before any LLM call
- Works fully offline for Semgrep scanning—registry packs shipped in the repo
- Provider-agnostic analysis layer: Anthropic, OpenAI, Gemini, Mistral, Ollama with per-role model assignment
- Built-in cost controls (RAPTOR_MAX_COST) and a trust scorecard for cheap-model short-circuiting
- Authored by recognized security researchers; MIT licensed
Watch Out For
- Self-described as "not polished software... held together with enthusiasm and duct tape"
- Web exploitation module is an alpha-stage stub
- CodeQL's license does not permit commercial use despite RAPTOR's MIT license
- Devcontainer image is large (~6 GB) and needs --privileged Docker for the rr debugger
- Full agentic workflow requires Claude Code; local Ollama models produce unreliable exploit and patch code
Core Commands
- /agentic: full scan, validate, exploit, patch workflow
- /scan: static analysis with Semgrep and CodeQL
- /understand: attack surface mapping and data flow tracing
- /validate: multi-stage exploitability validation
- /fuzz: binary fuzzing with AFL++ and crash analysis
- /crash-analysis: autonomous C/C++ crash root-cause analysis
- /sca: software composition analysis with SBOM output
- /oss-forensics: evidence-backed GitHub repo investigation
Validation Pipeline
- Stage A: real vulnerability or pattern-matching noise?
- Stage B: attacker requirements and obstacles
- Stage C: does the code path exist and is it reachable?
- Stage D: final call on test code, preconditions, hedging
- Cross-finding analysis for shared root causes and attack chains
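The staged structure can be pictured as a chain of checks in which a finding is dropped at the first stage it fails. The sketch below is illustrative only: the stage predicates are hypothetical stand-ins for the LLM judgments, the dictionary keys are invented, and RAPTOR's actual stage logic is not described in this overview.

```python
from typing import Callable, Optional

# Each stage takes a finding and returns True to let it continue.
Stage = Callable[[dict], bool]


def run_stages(finding: dict, stages: list[tuple[str, Stage]]) -> Optional[str]:
    """Return the name of the first stage that rejects the finding, or None."""
    for name, check in stages:
        if not check(finding):
            return name
    return None


# Hypothetical stand-ins for Stages A-D; keys such as "pattern_only" are invented.
STAGES = [
    ("A", lambda f: not f.get("pattern_only", False)),  # real issue, not noise
    ("B", lambda f: f.get("attacker_can_meet_requirements", True)),
    ("C", lambda f: f.get("reachable", False)),         # path exists and is reachable
    ("D", lambda f: not f.get("test_code", False)),     # final call
]


def triage(findings: list[dict]) -> tuple[list[dict], dict[str, int]]:
    """Split findings into survivors and a per-stage rejection count."""
    kept, rejected = [], {}
    for finding in findings:
        stage = run_stages(finding, STAGES)
        if stage is None:
            kept.append(finding)
        else:
            rejected[stage] = rejected.get(stage, 0) + 1
    return kept, rejected
```

Counting rejections per stage shows where tool noise is being filtered, which helps when tuning scanner rules upstream.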
Supply Chain Analysis
- OSV advisories, CISA KEV, EPSS, SSVC enrichment
- CycloneDX SBOM with VEX data
- SARIF output for GitHub/GitLab code scanning
- Manifest, lockfile, workflow, and container package discovery
- Fix, upgrade, diff, and verify subcommands
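One way to use this enrichment data is to order advisories so that known-exploited issues surface first. The sketch below is an assumption about how such a ranking could look, not a description of what /sca does: the `Advisory` record and its field names are hypothetical, and the policy (KEV-listed first, then descending EPSS) is chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Advisory:
    """Hypothetical enriched advisory record; field names are invented."""
    vuln_id: str
    package: str
    in_kev: bool   # listed in the CISA Known Exploited Vulnerabilities catalog
    epss: float    # EPSS exploitation probability in [0, 1]


def prioritize(advisories: list[Advisory]) -> list[Advisory]:
    """Order advisories: KEV-listed first, then by descending EPSS score."""
    return sorted(advisories, key=lambda a: (not a.in_kev, -a.epss))
```

A KEV listing is treated as stronger evidence than any EPSS value here because it records observed exploitation, whereas EPSS is a forecast.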
Model Flexibility
- Roles: analysis, code, consensus, aggregate, fallback
- Multi-model correlation across providers
- Fast-tier short-circuit with Wilson-bound trust scorecard
- Z3 SMT dataflow pre-screening and one-gadget constraint analysis
- Offline Semgrep with bundled registry packs
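The "Wilson-bound" wording presumably refers to the lower end of the Wilson score interval, a conservative estimate of a success rate that penalizes small samples. The sketch below shows that calculation under stated assumptions: the 95% z-value, the 0.9 threshold, and the `can_short_circuit` policy are all hypothetical, since the scorecard's actual formula and cutoffs are not given here.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0  # no evidence yet, so no trust
    p_hat = successes / trials
    z2 = z * z
    centre = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


def can_short_circuit(successes: int, trials: int, threshold: float = 0.9) -> bool:
    """Hypothetical policy: trust the fast model only if the bound clears threshold."""
    return wilson_lower_bound(successes, trials) >= threshold
```

With these assumed settings, a fast model that agreed with the stronger model on 10 of 10 findings would not yet qualify, while 100 of 100 would, because the bound tightens only as evidence accumulates.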
Expert Personas
On-Demand Perspectives
- Mark Dowd — binary exploitation and vulnerability research
- Charlie Miller / Halvar Flake — low-level exploitation
- Penetration Tester — realistic attack scenario assessment
- Fuzzing Strategist — corpus design and triage
- CodeQL Dataflow Analyst — query writing and path analysis
Project Workspaces
- Merged findings across runs
- Coverage tracking per file
- Diffs between runs
- Export and report generation
How It Compares
| Feature | RAPTOR | Semgrep | CodeQL |
|---|---|---|---|
| Finding Validation | LLM multi-stage pipeline | Rule-based only | Query-based only |
| Exploit Generation | PoC + patch output | No | No |
| Binary Analysis | Fuzzing, crash triage, Z3 | No | No |
| Orchestration | Autonomous agent (Claude Code) | CLI/CI | CLI/CI |
| False Positive Filtering | SMT + LLM validation | Manual triage | Manual triage |
| Commercial Use | MIT, but CodeQL dependency restricted | Yes (OSS engine) | Restricted |