Promptfoo

Open-source CLI and library for LLM evaluation and red teaming. Enables systematic prompt testing, model comparison, vulnerability scanning, and automated security assessments with CI/CD integration.

—

22K GitHub Stars

300K+ Open Source Users

156 Fortune 500 Users

Overview

Promptfoo is an open-source CLI and library for evaluating, testing, and red teaming LLM applications. It enables developers to systematically test prompts against datasets, compare model outputs side-by-side, and run automated security assessments including vulnerability scanning and adversarial attack simulations. Built with a developer-first approach, it supports declarative YAML configs, concurrent evaluation execution, and integrates directly into CI/CD pipelines. Now part of OpenAI while maintaining its MIT license and open-source status, Promptfoo is used by engineering teams at major tech companies including OpenAI and Anthropic themselves.

The Verdict

Who Should Use Promptfoo?

Best For

Teams needing systematic prompt evaluation workflows
Security-conscious organizations requiring red teaming
CI/CD-driven development with automated LLM testing
Multi-model comparison and selection
Enterprise security teams and CISOs

Not Ideal For

Simple single-prompt applications
Teams without Node.js in their stack
Those needing production observability (use dedicated tools)
Non-technical users wanting GUI-only workflows

What's Great

Comprehensive security testing - 50+ vulnerability types covered
Works with any LLM provider - OpenAI, Anthropic, Azure, local models
Declarative YAML configs for reproducible evaluations
CI/CD native - GitHub Actions, CLI integration
Local execution - data never leaves your machine
Side-by-side model comparison with matrix views
MIT licensed, fully open source
Backed by OpenAI with active development

Official Site · GitHub

Watch Out For

Requires Node.js 20.20+ or 22.22+ environment
CLI-focused - web UI is secondary
Learning curve for YAML config syntax
Enterprise features require sales contact

GitHub Issues

Pricing

Community

Free

All eval features, all providers, 10K red team probes/mo, self-hosted

Enterprise

Custom

Advanced detection, SSO, managed cloud, priority SLA

On-Premise

Custom

Full infrastructure control, complete data isolation

View all features & details

Evaluation Features

Declarative YAML test configs
Matrix view comparisons
Custom scoring metrics
Concurrent execution
Live reload and caching
Web viewer for results
Team sharing capabilities

Security & Red Teaming

Prompt injection detection
Jailbreak testing
Data leak scanning
Business rule violations
Compliance risk assessment
Real-time guardrails
Code scanning in IDEs/CI

Supported Providers

OpenAI / Azure OpenAI
Anthropic Claude
Google (Gemini)
Amazon Bedrock
HuggingFace
Ollama (local models)
Custom API endpoints

Platform & Installation

npm / npx (primary)
Homebrew (brew install)
pip install
TypeScript core (97%)
Node.js 20.20+ or 22.22+
MIT License

How It Compares

Feature	Promptfoo	Langfuse Evals	Braintrust	Weights & Biases
Open Source	MIT License	Apache 2.0	No	No
Red Teaming	50+ vuln types	Basic	Limited	No
CI/CD Native	Yes	Via API	Via API	Via API
Local Execution	Yes	Self-host	No	No
Free Tier	Full features	50K obs	Limited	Limited
Multi-Provider	All major + custom	All major	All major	All major
Primary Focus	Eval + Security	Observability	Evals	ML Ops
Best For	Security-first teams	Full observability	Data teams	ML workflows

User Reviews

Loading reviews...