RAPTOR

Autonomous security research framework built on Claude Code that chains static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing into a single workflow

Tags: agents, Python, self-hosted

2,950 GitHub Stars

13 Commands

9 Expert Personas

5 LLM Providers

Overview

RAPTOR (Recursive Autonomous Penetration Testing and Observation Robot) is an autonomous security research framework built on top of Claude Code that chains static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing into a single workflow you can run against a codebase or binary. Created by veteran security researchers Gadi Evron, Daniel Cuthbert, Thomas Dullien (Halvar Flake), Michael Bargury, and John Cartwright, it configures the agent for adversarial thinking through rules, sub-agents, and skills. The architecture splits into a Python execution layer (Semgrep, CodeQL, SARIF parsing, LLM dispatch) that can run standalone in CI, and a Claude Code decision layer that prioritizes findings, interprets results, and judges exploitability.
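The execution layer is described as parsing SARIF from the scanners. As a rough illustration of that step, the sketch below flattens a SARIF 2.1.0 document into plain dictionaries using only the standard library. The helper name `extract_findings`, the output field names, and the sample document are assumptions made for this example and are not taken from RAPTOR's code.

```python
import json


def extract_findings(sarif: dict) -> list[dict]:
    """Flatten SARIF 2.1.0 results into simple dicts (hypothetical helper)."""
    findings = []
    for run in sarif.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # SARIF allows results without locations; fall back to an empty one.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append({
                "tool": tool,
                "rule_id": result.get("ruleId"),
                # Simplification: treat a missing level as "warning".
                "level": result.get("level", "warning"),
                "message": result.get("message", {}).get("text", ""),
                "file": physical.get("artifactLocation", {}).get("uri"),
                "line": physical.get("region", {}).get("startLine"),
            })
    return findings


# Made-up sample document; the tool and rule names are placeholders.
SAMPLE = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "ExampleScanner"}},
        "results": [{
            "ruleId": "example.sql-injection",
            "level": "error",
            "message": {"text": "User input reaches a SQL query"},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "app/db.py"},
                "region": {"startLine": 42},
            }}],
        }],
    }],
}

if __name__ == "__main__":
    print(json.dumps(extract_findings(SAMPLE), indent=2))
```

A flat list like this is the kind of structure a later triage step could sort, filter, or hand to a model, whatever shape RAPTOR itself uses internally.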

The Verdict

Who Should Use RAPTOR?

Best For

Security researchers hunting vulnerabilities in code and binaries
Teams wanting LLM triage to cut SAST false positives
Binary fuzzing and crash root-cause analysis (AFL++, rr)
Supply chain audits with SBOM and SARIF output
CI pipelines needing structured scan output without Claude Code

Not Ideal For

Web application scanning (module is an alpha stub)
Commercial use without license review (CodeQL dependency forbids it)
Teams needing polished, supported commercial software

What's Great

Full autonomous pipeline: scan, validate, exploit, patch via /agentic
Multi-stage exploitability validation (Stages A–D) filters tool noise before reporting
Z3 SMT pre-screening drops provably unreachable CodeQL paths before any LLM call
Semgrep scanning works fully offline: the registry packs ship in the repo
Provider-agnostic analysis layer: Anthropic, OpenAI, Gemini, Mistral, Ollama with per-role model assignment
Built-in cost controls (RAPTOR_MAX_COST) and a trust scorecard for cheap-model short-circuiting
Authored by recognized security researchers; MIT licensed

GitHub README

Watch Out For

Self-described as "not polished software... held together with enthusiasm and duct tape"
Web exploitation module is an alpha-stage stub
CodeQL's license does not permit commercial use despite RAPTOR's MIT license
Devcontainer image is large (~6 GB) and needs --privileged Docker for the rr debugger
Full agentic workflow requires Claude Code; local Ollama models produce unreliable exploit and patch code

Pricing

Open Source

Free

MIT license; LLM API costs apply, capped per run via RAPTOR_MAX_COST

Core Commands

/agentic — full scan, validate, exploit, patch workflow
/scan — static analysis with Semgrep and CodeQL
/understand — attack surface mapping and data flow tracing
/validate — multi-stage exploitability validation
/fuzz — binary fuzzing with AFL++ and crash analysis
/crash-analysis — autonomous C/C++ crash root-cause analysis
/sca — software composition analysis with SBOM output
/oss-forensics — evidence-backed GitHub repo investigation

Validation Pipeline

Stage A: real vulnerability or pattern-matching noise?
Stage B: attacker requirements and obstacles
Stage C: does the code path exist and is it reachable?
Stage D: final ruling, accounting for test-only code, unmet preconditions, and hedged reasoning
Cross-finding analysis for shared root causes and attack chains
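The staged design above can be pictured as a chain of gates in which a finding is dropped at the first stage that rejects it. The sketch below is only an illustration of that control flow: in RAPTOR the stages are described as LLM-driven judgments, whereas here each stage is a stub predicate over hypothetical boolean fields (`is_real_flaw`, `attacker_can_trigger`, `path_reachable`, `in_test_code`) invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; the boolean fields stand in for model verdicts."""
    rule_id: str
    is_real_flaw: bool
    attacker_can_trigger: bool
    path_reachable: bool
    in_test_code: bool


# Each stage pairs a label with a check that returns True when the finding survives.
STAGES: list[tuple[str, Callable[[Finding], bool]]] = [
    ("A", lambda f: f.is_real_flaw),          # real vulnerability, not pattern noise
    ("B", lambda f: f.attacker_can_trigger),  # attacker requirements can be met
    ("C", lambda f: f.path_reachable),        # code path exists and is reachable
    ("D", lambda f: not f.in_test_code),      # final ruling, e.g. not test-only code
]


def validate(finding: Finding) -> Optional[str]:
    """Return the label of the first stage that rejects the finding, or None if it passes."""
    for label, check in STAGES:
        if not check(finding):
            return label
    return None


if __name__ == "__main__":
    candidate = Finding("example.overflow", True, True, False, False)
    print(validate(candidate))  # rejected at stage C in this made-up case
```

Ordering the cheapest or most discriminating checks first means later, more expensive stages only see findings that are still plausible.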

Supply Chain Analysis

OSV advisories, CISA KEV, EPSS, SSVC enrichment
CycloneDX SBOM with VEX data
SARIF output for GitHub/GitLab code scanning
Manifest, lockfile, workflow, and container package discovery
Fix, upgrade, diff, and verify subcommands
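One way enrichment data such as KEV membership and EPSS scores can be used is to order advisories for triage. The sketch below is an assumption about how such a ranking might look, not a description of RAPTOR's logic: it puts KEV-listed advisories first and breaks ties by descending EPSS probability. The `Advisory` class, the `prioritize` function, and the placeholder identifiers and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical enriched advisory record."""
    vuln_id: str
    epss: float    # EPSS probability, between 0.0 and 1.0
    in_kev: bool   # True if listed in the CISA KEV catalog


def prioritize(advisories: list[Advisory]) -> list[Advisory]:
    """Order advisories: KEV-listed first, then higher EPSS, then by id for stability."""
    return sorted(advisories, key=lambda a: (not a.in_kev, -a.epss, a.vuln_id))


if __name__ == "__main__":
    # Placeholder identifiers and scores, not real advisories.
    sample = [
        Advisory("EXAMPLE-0001", epss=0.02, in_kev=False),
        Advisory("EXAMPLE-0002", epss=0.10, in_kev=True),
        Advisory("EXAMPLE-0003", epss=0.85, in_kev=False),
    ]
    for advisory in prioritize(sample):
        print(advisory.vuln_id, advisory.in_kev, advisory.epss)
```

Known exploitation outranks predicted exploitation here; a real policy might also weigh SSVC decisions or reachability of the affected package.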

Model Flexibility

Roles: analysis, code, consensus, aggregate, fallback
Multi-model correlation across providers
Fast-tier short-circuit with Wilson-bound trust scorecard
Z3 SMT dataflow pre-screening and one-gadget constraint analysis
Offline Semgrep with bundled registry packs
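The listing does not spell out how the trust scorecard is computed, so the following is a sketch of the standard Wilson score lower bound and of one plausible way to use it: let a cheaper model's verdict stand only once the lower bound on its observed agreement rate clears a threshold. The function names, the 0.9 threshold, and the idea of counting "agreements" are assumptions for illustration.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    denominator = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denominator


def trust_fast_tier(agreements: int, total: int, threshold: float = 0.9) -> bool:
    """Hypothetical gate: trust the cheap model only if the bound clears the threshold."""
    return wilson_lower_bound(agreements, total) >= threshold


if __name__ == "__main__":
    # A perfect record over few samples is still treated cautiously.
    print(round(wilson_lower_bound(10, 10), 3), trust_fast_tier(10, 10))
    print(round(wilson_lower_bound(200, 200), 3), trust_fast_tier(200, 200))
```

The appeal of a lower bound over a raw agreement rate is that small samples are penalized: ten out of ten does not count as much evidence as two hundred out of two hundred.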

Expert Personas

On-Demand Perspectives

Mark Dowd — binary exploitation and vulnerability research
Charlie Miller / Halvar Flake — low-level exploitation
Penetration Tester — realistic attack scenario assessment
Fuzzing Strategist — corpus design and triage
CodeQL Dataflow Analyst — query writing and path analysis

Project Workspaces

Merged findings across runs
Coverage tracking per file
Diffs between runs
Export and report generation
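Diffing two runs comes down to comparing sets of findings under some identity key. The sketch below assumes findings are dictionaries with `rule_id`, `file`, and `line` fields and keys them on that triple; both the field names and the `diff_runs` helper are hypothetical and not drawn from RAPTOR.

```python
def finding_key(finding: dict) -> tuple:
    """Identity of a finding for comparison purposes (an assumption for this sketch)."""
    return (finding["rule_id"], finding["file"], finding["line"])


def diff_runs(previous: list[dict], current: list[dict]) -> dict[str, list[dict]]:
    """Classify findings as new, fixed, or persisting between two runs."""
    prev = {finding_key(f): f for f in previous}
    curr = {finding_key(f): f for f in current}
    return {
        "new": [curr[k] for k in sorted(curr.keys() - prev.keys())],
        "fixed": [prev[k] for k in sorted(prev.keys() - curr.keys())],
        "persisting": [curr[k] for k in sorted(curr.keys() & prev.keys())],
    }


if __name__ == "__main__":
    run_1 = [
        {"rule_id": "example.sqli", "file": "app/db.py", "line": 42},
        {"rule_id": "example.xss", "file": "app/views.py", "line": 10},
    ]
    run_2 = [
        {"rule_id": "example.sqli", "file": "app/db.py", "line": 42},
        {"rule_id": "example.ssrf", "file": "app/fetch.py", "line": 7},
    ]
    for status, items in diff_runs(run_1, run_2).items():
        print(status, [f["rule_id"] for f in items])
```

Keying on line numbers is fragile when code moves between runs; a sturdier key might hash the surrounding snippet instead.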

How It Compares

Feature	RAPTOR	Semgrep	CodeQL
Finding Validation	LLM multi-stage pipeline	Rule-based only	Query-based only
Exploit Generation	PoC + patch output	No	No
Binary Analysis	Fuzzing, crash triage, Z3	No	No
Orchestration	Autonomous agent (Claude Code)	CLI/CI	CLI/CI
False Positive Filtering	SMT + LLM validation	Manual triage	Manual triage
Commercial Use	MIT, but the CodeQL dependency restricts commercial use	Yes (OSS engine)	Restricted
