Promptfoo iconPromptfoo

oss Free Star22k

Open-source CLI and library for LLM evaluation and red teaming. Enables systematic prompt testing, model comparison, vulnerability scanning, and automated security assessments with CI/CD integration.

22K GitHub Stars
300K+ Open Source Users
156 Fortune 500 Users

Overview

Promptfoo is an open-source CLI and library for evaluating, testing, and red teaming LLM applications. It enables developers to systematically test prompts against datasets, compare model outputs side-by-side, and run automated security assessments including vulnerability scanning and adversarial attack simulations. Built with a developer-first approach, it supports declarative YAML configs, concurrent evaluation execution, and integrates directly into CI/CD pipelines. Now part of OpenAI while maintaining its MIT license and open-source status, Promptfoo is used by engineering teams at major tech companies including OpenAI and Anthropic themselves.

The Verdict

Who Should Use Promptfoo?

Best For

  • Teams needing systematic prompt evaluation workflows
  • Security-conscious organizations requiring red teaming
  • CI/CD-driven development with automated LLM testing
  • Multi-model comparison and selection
  • Enterprise security teams and CISOs

Not Ideal For

  • Simple single-prompt applications
  • Teams without Node.js in their stack
  • Those needing production observability (use dedicated tools)
  • Non-technical users wanting GUI-only workflows

What's Great

  • Comprehensive security testing - 50+ vulnerability types covered
  • Works with any LLM provider - OpenAI, Anthropic, Azure, local models
  • Declarative YAML configs for reproducible evaluations
  • CI/CD native - GitHub Actions, CLI integration
  • Local execution - data never leaves your machine
  • Side-by-side model comparison with matrix views
  • MIT licensed, fully open source
  • Backed by OpenAI with active development

Watch Out For

  • Requires Node.js 20.20+ or 22.22+ environment
  • CLI-focused - web UI is secondary
  • Learning curve for YAML config syntax
  • Enterprise features require sales contact

Pricing

View all features & details

Evaluation Features

  • Declarative YAML test configs
  • Matrix view comparisons
  • Custom scoring metrics
  • Concurrent execution
  • Live reload and caching
  • Web viewer for results
  • Team sharing capabilities

Security & Red Teaming

  • Prompt injection detection
  • Jailbreak testing
  • Data leak scanning
  • Business rule violations
  • Compliance risk assessment
  • Real-time guardrails
  • Code scanning in IDEs/CI

Supported Providers

  • OpenAI / Azure OpenAI
  • Anthropic Claude
  • Google (Gemini)
  • Amazon Bedrock
  • HuggingFace
  • Ollama (local models)
  • Custom API endpoints

Platform & Installation

  • npm / npx (primary)
  • Homebrew (brew install)
  • pip install
  • TypeScript core (97%)
  • Node.js 20.20+ or 22.22+
  • MIT License

How It Compares

Feature Promptfoo Langfuse Evals Braintrust Weights & Biases
Open Source MIT License Apache 2.0 No No
Red Teaming 50+ vuln types Basic Limited No
CI/CD Native Yes Via API Via API Via API
Local Execution Yes Self-host No No
Free Tier Full features 50K obs Limited Limited
Multi-Provider All major + custom All major All major All major
Primary Focus Eval + Security Observability Evals ML Ops
Best For Security-first teams Full observability Data teams ML workflows

User Reviews

Loading reviews...