RAPTOR

Open-source

Autonomous security research framework built on Claude Code that chains static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing into a single workflow

2,950 GitHub Stars
13 Commands
9 Expert Personas
5 LLM Providers

Overview

RAPTOR (Recursive Autonomous Penetration Testing and Observation Robot) is an autonomous security research framework built on top of Claude Code. It chains static analysis, binary analysis, LLM-powered vulnerability validation, exploit generation, and patch writing into a single workflow that you can run against a codebase or a binary. It was created by veteran security researchers Gadi Evron, Daniel Cuthbert, Thomas Dullien (Halvar Flake), Michael Bargury, and John Cartwright, and it configures the agent for adversarial thinking through rules, sub-agents, and skills.

The architecture has two layers: a Python execution layer (Semgrep, CodeQL, SARIF parsing, LLM dispatch) that can run standalone in CI, and a Claude Code decision layer that prioritizes findings, interprets results, and judges exploitability.
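Both Semgrep and CodeQL can emit SARIF, and the execution layer lists SARIF parsing among its jobs. The sketch below shows one way to flatten a SARIF 2.1.0 log into per-finding records in Python. The `Finding` shape and the `parse_sarif` name are hypothetical and are not RAPTOR's actual parser; only the SARIF field names (`runs`, `results`, `ruleId`, `physicalLocation`, `region.startLine`) come from the SARIF specification.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    """One normalized SARIF result (hypothetical shape, not RAPTOR's)."""
    tool: str
    rule_id: str
    level: str
    message: str
    uri: str
    start_line: int


def parse_sarif(text: str) -> List[Finding]:
    """Flatten every result from every run in a SARIF 2.1.0 log."""
    log = json.loads(text)
    findings: List[Finding] = []
    for run in log.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # A result may list several locations; keep only the first one.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append(Finding(
                tool=tool,
                rule_id=result.get("ruleId", ""),
                # Simplification: fall back to "warning" and ignore any
                # rule-level default configuration the log might carry.
                level=result.get("level", "warning"),
                message=result.get("message", {}).get("text", ""),
                uri=physical.get("artifactLocation", {}).get("uri", ""),
                start_line=physical.get("region", {}).get("startLine", 0),
            ))
    return findings


if __name__ == "__main__":
    sample = {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "Semgrep"}},
            "results": [{
                "ruleId": "demo.eval-on-input",  # made-up rule ID
                "message": {"text": "eval() called on request data"},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": "app.py"},
                    "region": {"startLine": 12},
                }}],
            }],
        }],
    }
    for f in parse_sarif(json.dumps(sample)):
        print(f"{f.tool} {f.rule_id} {f.uri}:{f.start_line} [{f.level}]")
```

Normalizing both tools' output into one flat record is what lets a later stage (an LLM or a human) sort and deduplicate findings without caring which scanner produced them.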

The Verdict

Who Should Use RAPTOR?

Best For

  • Security researchers hunting vulnerabilities in code and binaries
  • Teams wanting LLM triage to cut SAST false positives
  • Binary fuzzing and crash root-cause analysis (AFL++, rr)
  • Supply chain audits with SBOM and SARIF output
  • CI pipelines needing structured scan output without Claude Code

Not Ideal For

  • Web application scanning (module is an alpha stub)
  • Commercial use without a license review (the CodeQL dependency restricts it)
  • Teams needing polished, supported commercial software

What's Great

  • Full autonomous pipeline: scan, validate, exploit, patch via /agentic
  • Multi-stage exploitability validation (Stages A–D) filters tool noise before reporting
  • Z3 SMT pre-screening drops provably unreachable CodeQL paths before any LLM call
  • Works fully offline for Semgrep scanning (registry rule packs ship in the repo)
  • Provider-agnostic analysis layer: Anthropic, OpenAI, Gemini, Mistral, Ollama with per-role model assignment
  • Built-in cost controls (RAPTOR_MAX_COST) and a trust scorecard for cheap-model short-circuiting
  • Authored by recognized security researchers; MIT licensed
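The cost control above is exposed through `RAPTOR_MAX_COST`. The sketch below shows the general idea of a spend cap in front of LLM calls. It assumes the variable holds a US-dollar amount and that cost is computed from token counts and per-million-token prices; the `CostBudget` class, its method, and the prices used in testing are hypothetical, not RAPTOR's implementation.

```python
import os
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when a call would push total spend past the cap."""


class CostBudget:
    """Hypothetical spend tracker.

    Assumes RAPTOR_MAX_COST holds a US-dollar amount; the real variable's
    units and semantics may differ.
    """

    def __init__(self, max_cost: Optional[float] = None) -> None:
        if max_cost is None:
            raw = os.environ.get("RAPTOR_MAX_COST")
            max_cost = float(raw) if raw else float("inf")  # unset: no cap
        self.max_cost = max_cost
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
        """Add one call's cost, or raise without adding if it would exceed the cap."""
        cost = (input_tokens * usd_per_mtok_in
                + output_tokens * usd_per_mtok_out) / 1_000_000
        if self.spent + cost > self.max_cost:
            raise BudgetExceeded(
                f"call would cost ${cost:.4f}, "
                f"only ${self.max_cost - self.spent:.4f} remains")
        self.spent += cost
        return cost
```

Charging before a call (with an estimated token count) stops the workflow before money is spent, at the price of some estimation error; charging afterward is exact but can overshoot the cap by one call.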

Watch Out For

  • Self-described as "not polished software... held together with enthusiasm and duct tape"
  • Web exploitation module is an alpha-stage stub
  • CodeQL's license does not permit commercial use despite RAPTOR's MIT license
  • Devcontainer image is large (~6 GB) and needs --privileged Docker for the rr debugger
  • Full agentic workflow requires Claude Code; local Ollama models produce unreliable exploit and patch code

Pricing

Free and open source (MIT license). Hosted LLM providers bill API usage separately; RAPTOR_MAX_COST can cap that spend.

Core Commands

  • /agentic — full scan, validate, exploit, patch workflow
  • /scan — static analysis with Semgrep and CodeQL
  • /understand — attack surface mapping and data flow tracing
  • /validate — multi-stage exploitability validation
  • /fuzz — binary fuzzing with AFL++ and crash analysis
  • /crash-analysis — autonomous C/C++ crash root-cause analysis
  • /sca — software composition analysis with SBOM output
  • /oss-forensics — evidence-backed GitHub repo investigation

Validation Pipeline

  • Stage A: real vulnerability or pattern-matching noise?
  • Stage B: attacker requirements and obstacles
  • Stage C: does the code path exist and is it reachable?
  • Stage D: final ruling that weighs test-only code, preconditions, and hedged conclusions
  • Cross-finding analysis for shared root causes and attack chains
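The stages above act as a sequence of filters: a finding is reported only if no stage rejects it, and survivors are then compared with each other. The sketch below models that control flow in Python. In RAPTOR the stages are LLM judgments; here plain boolean fields stand in for those judgments, and every name (`Candidate`, `validate`, `shared_root_causes`) and rejection rule is hypothetical. Grouping by sink function is one simple stand-in for "shared root cause".

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Candidate:
    """A scanner hit plus the facts each stage checks (hypothetical fields).

    In RAPTOR the stages are LLM judgments; plain booleans stand in here.
    """
    rule_id: str
    path: str
    sink: str                        # function where tainted data ends up
    pattern_noise: bool = False      # Stage A input
    attacker_needs: List[str] = field(default_factory=list)  # Stage B input
    reachable: bool = True           # Stage C input
    in_test_code: bool = False       # Stage D input


# A stage returns None to pass a candidate along, or a reason to drop it.
Stage = Callable[[Candidate], Optional[str]]


def stage_a(c: Candidate) -> Optional[str]:
    return "pattern-matching noise" if c.pattern_noise else None


def stage_b(c: Candidate) -> Optional[str]:
    # Illustrative rule only: drop bugs that need prior admin access.
    return "requires prior admin access" if "admin" in c.attacker_needs else None


def stage_c(c: Candidate) -> Optional[str]:
    return None if c.reachable else "code path not reachable"


def stage_d(c: Candidate) -> Optional[str]:
    return "test code only" if c.in_test_code else None


STAGES: List[Tuple[str, Stage]] = [
    ("A", stage_a), ("B", stage_b), ("C", stage_c), ("D", stage_d),
]


def validate(candidates: List[Candidate]):
    """Run each candidate through A to D in order; stop at the first rejection."""
    survivors: List[Candidate] = []
    dropped: List[Tuple[Candidate, str, str]] = []
    for c in candidates:
        for name, stage in STAGES:
            reason = stage(c)
            if reason is not None:
                dropped.append((c, name, reason))
                break
        else:  # no stage rejected it
            survivors.append(c)
    return survivors, dropped


def shared_root_causes(survivors: List[Candidate]) -> Dict[str, List[Candidate]]:
    """Cross-finding pass: group survivors that reach the same sink."""
    groups: Dict[str, List[Candidate]] = defaultdict(list)
    for c in survivors:
        groups[c.sink].append(c)
    return {sink: cs for sink, cs in groups.items() if len(cs) > 1}
```

Recording which stage dropped each finding, and why, keeps the filtering auditable: a reviewer can spot-check the rejects instead of trusting the pipeline blindly.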

Supply Chain Analysis

  • OSV advisories, CISA KEV, EPSS, SSVC enrichment
  • CycloneDX SBOM with VEX data
  • SARIF output for GitHub/GitLab code scanning
  • Manifest, lockfile, workflow, and container package discovery
  • Fix, upgrade, diff, and verify subcommands
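An SBOM with VEX data pairs each component with vulnerability entries that state whether the product is actually affected. The sketch below builds a minimal CycloneDX 1.5 JSON document of that kind. The field names (`bomFormat`, `specVersion`, `bom-ref`, `purl`, `vulnerabilities`, `affects`, `analysis.state`, `analysis.justification`) follow the CycloneDX JSON schema; the `build_vex_bom` helper is hypothetical, RAPTOR's own SBOM output may include many more fields, and the advisory ID used in testing is a placeholder.

```python
import json
from typing import Dict, List, Tuple


def build_vex_bom(components: List[Tuple[str, str, str]],
                  vulns: List[Dict[str, str]]) -> str:
    """Assemble a minimal CycloneDX 1.5 JSON BOM with VEX analysis entries.

    components: (name, version, purl) tuples; the purl doubles as bom-ref.
    vulns: dicts with "id", "purl", "state" and an optional "justification".
    """
    bom = {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {"type": "library", "bom-ref": purl, "name": name,
             "version": version, "purl": purl}
            for name, version, purl in components
        ],
        "vulnerabilities": [],
    }
    known_refs = {c["bom-ref"] for c in bom["components"]}
    for v in vulns:
        # Every VEX entry must point at a component that exists in the BOM.
        if v["purl"] not in known_refs:
            raise ValueError(f"{v['id']} refers to unknown component {v['purl']}")
        analysis = {"state": v["state"]}  # e.g. "exploitable", "not_affected"
        if v.get("justification"):
            analysis["justification"] = v["justification"]
        bom["vulnerabilities"].append({
            "id": v["id"],
            "affects": [{"ref": v["purl"]}],
            "analysis": analysis,
        })
    return json.dumps(bom, indent=2)
```

A `not_affected` state with a justification such as `code_not_reachable` is what lets downstream consumers suppress an advisory for a dependency that is present but never exercised.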

Model Flexibility

  • Roles: analysis, code, consensus, aggregate, fallback
  • Multi-model correlation across providers
  • Fast-tier short-circuit with Wilson-bound trust scorecard
  • Z3 SMT dataflow pre-screening and one-gadget constraint analysis
  • Offline Semgrep with bundled registry packs
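A Wilson-bound scorecard can be read as follows: track how often a cheap, fast model agrees with a stronger one, and let the fast tier short-circuit only when the lower end of the Wilson score interval for that agreement rate clears a threshold. The sketch below implements the Wilson lower bound itself (a standard statistic) plus a hypothetical `TrustScorecard`; the threshold of 0.9, z = 1.96 (about 95% confidence), and the agreement-tracking design are assumptions, not RAPTOR's documented behavior.

```python
import math
from dataclasses import dataclass


def wilson_lower_bound(agreements: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0  # no evidence yet: assume no trust
    p = agreements / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


@dataclass
class TrustScorecard:
    """Hypothetical record of how often the fast model matched the strong one."""
    threshold: float = 0.9
    agreements: int = 0
    trials: int = 0

    def record(self, fast_verdict: bool, strong_verdict: bool) -> None:
        self.trials += 1
        self.agreements += int(fast_verdict == strong_verdict)

    def can_short_circuit(self) -> bool:
        # Trust the fast tier only when even the pessimistic bound clears the bar.
        return wilson_lower_bound(self.agreements, self.trials) >= self.threshold
```

Using the lower bound rather than the raw agreement rate means a streak of 10 agreements out of 10 is not enough (the bound is about 0.72), while 100 out of 100 is (about 0.96), so the cheap model earns trust only after enough evidence accumulates.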

Expert Personas

On-Demand Perspectives

  • Mark Dowd — binary exploitation and vulnerability research
  • Charlie Miller / Halvar Flake — low-level exploitation
  • Penetration Tester — realistic attack scenario assessment
  • Fuzzing Strategist — corpus design and triage
  • CodeQL Dataflow Analyst — query writing and path analysis

Project Workspaces

  • Merged findings across runs
  • Coverage tracking per file
  • Diffs between runs
  • Export and report generation
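Merging and diffing runs both depend on recognizing "the same finding" across scans. The sketch below fingerprints a finding by rule, file, and flagged code (not line number, which shifts whenever unrelated lines change) and uses that key to merge runs and to split two runs into new, gone, and persisting findings. The `ScanFinding` record, the fingerprint scheme, and both functions are hypothetical illustrations, not RAPTOR's workspace format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ScanFinding:
    """Hypothetical finding record stored per run."""
    rule_id: str
    path: str
    line: int
    snippet: str

    def fingerprint(self) -> Tuple[str, str, str]:
        # Key on the flagged code rather than the line number, so a finding
        # that only moved because of edits elsewhere still matches.
        return (self.rule_id, self.path, self.snippet.strip())


def diff_runs(previous: List[ScanFinding],
              current: List[ScanFinding]) -> Dict[str, List[ScanFinding]]:
    """Split two runs into new, gone (no longer reported), and persisting."""
    prev = {f.fingerprint(): f for f in previous}
    curr = {f.fingerprint(): f for f in current}
    return {
        "new": [curr[k] for k in sorted(curr.keys() - prev.keys())],
        "gone": [prev[k] for k in sorted(prev.keys() - curr.keys())],
        "persisting": [curr[k] for k in sorted(curr.keys() & prev.keys())],
    }


def merge_runs(runs: Iterable[List[ScanFinding]]) -> List[ScanFinding]:
    """Union of all runs; a later run's copy replaces an earlier one."""
    merged: Dict[Tuple[str, str, str], ScanFinding] = {}
    for run in runs:
        for f in run:
            merged[f.fingerprint()] = f
    return list(merged.values())
```

"Gone" is deliberately not called "fixed": a finding can disappear because the code was fixed, but also because a rule changed or a file was excluded.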

How It Compares

| Feature | RAPTOR | Semgrep | CodeQL |
|---|---|---|---|
| Finding validation | LLM multi-stage pipeline | Rule-based only | Query-based only |
| Exploit generation | PoC + patch output | No | No |
| Binary analysis | Fuzzing, crash triage, Z3 | No | No |
| Orchestration | Autonomous agent (Claude Code) | CLI/CI | CLI/CI |
| False-positive filtering | SMT + LLM validation | Manual triage | Manual triage |
| Commercial use | MIT, but CodeQL dependency restricted | Yes (OSS engine) | Restricted |
