W&B Weave

Open source · Freemium

Open source LLM observability and evaluation toolkit from Weights & Biases. Trace, evaluate, and monitor AI applications from experimentation to production with a single line of code.

1.1K GitHub Stars
700K+ W&B Users
35+ Integrations

Overview

W&B Weave is an open source observability and evaluation toolkit that helps developers trace, evaluate, and monitor LLM applications from experimentation to production. Adding the @weave.op decorator, a single line of code, automatically logs a function's inputs, outputs, and metadata at a granular level and organizes them into navigable trace trees for debugging complex agentic workflows.

Weave reached General Availability in December 2024 and is now part of CoreWeave, following CoreWeave's acquisition of Weights & Biases in March 2025 at a reported $1.7B valuation. It supports Python (3.10+) and TypeScript/JavaScript, with native integrations for 15+ LLM providers, including OpenAI, Anthropic, and Google, and 19+ frameworks, including LangChain, LlamaIndex, Claude Agent SDK, and CrewAI.

The Verdict

Who Should Use W&B Weave?

Best For

  • Teams already using W&B for ML experiment tracking
  • Production agent debugging and root cause analysis
  • Multi-agent system observability
  • Teams wanting open source with enterprise backing
  • Evaluation-heavy workflows with LLM-as-judge

Not Ideal For

  • Teams that need a fully self-hosted setup (Weave requires a W&B account)
  • LangChain-only shops (LangSmith integrates more natively)
  • Simple single-model apps (likely overkill)
  • Teams avoiding vendor ecosystems

What's Great

  • Open source (Apache 2.0) with enterprise support
  • Single-line integration via @weave.op decorator
  • Native multi-agent trace trees with session/turn organization
  • Built-in scorers for safety (toxicity, PII, hallucinations)
  • Run evaluations on live production traces
  • Automatic code/dataset/scorer versioning
  • 35+ integrations including Claude Agent SDK
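Automatic versioning can be built on content addressing: an object's version id is a hash of its contents, so re-publishing unchanged data reuses the existing version and any edit yields a new one. The sketch below assumes that scheme for a dataset; it illustrates the idea and is not a description of Weave's internals. `content_digest`, `Registry`, and the `name:digest` reference format are hypothetical.

```python
import copy
import hashlib
import json


def content_digest(obj) -> str:
    """Hypothetical version id: SHA-256 of a canonical JSON encoding, shortened to 12 hex chars."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class Registry:
    """Hypothetical in-memory store that records each distinct version of a named object."""

    def __init__(self) -> None:
        self.history: dict[str, list[str]] = {}  # name -> digests, oldest first
        self.objects: dict[str, object] = {}     # digest -> snapshot of the object

    def publish(self, name: str, obj) -> str:
        digest = content_digest(obj)
        versions = self.history.setdefault(name, [])
        if not versions or versions[-1] != digest:
            # Only a change in content creates a new version entry.
            versions.append(digest)
            self.objects[digest] = copy.deepcopy(obj)  # snapshot, so later edits don't leak in
        return f"{name}:{digest}"


reg = Registry()
rows = [{"question": "2 + 2?", "expected": "4"}]
ref1 = reg.publish("qa-dataset", rows)
ref2 = reg.publish("qa-dataset", rows)  # unchanged content: same reference
rows.append({"question": "3 + 3?", "expected": "6"})
ref3 = reg.publish("qa-dataset", rows)  # edited content: new version
print(ref1, ref2, ref3)
```

Serializing with `sort_keys=True` makes the hash independent of dictionary key order, so two logically identical datasets get the same version id.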

Watch Out For

  • Requires a W&B account; can't run fully standalone
  • Younger than competitors (GA December 2024)
  • Less community content than Langfuse/LangSmith
  • Python 3.10+ required (no 3.9 support)
  • Advanced features tied to W&B enterprise tiers

Features

Tracing

  • @weave.op decorator for automatic logging
  • Nested trace trees with session organization
  • Multi-agent turn tracking
  • Input/output/metadata capture
  • Cost and latency tracking
  • Code versioning per trace
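Session organization and cost and latency tracking both come down to grouping logged calls and summing over the groups. The sketch below assumes each logged LLM call carries a session id, a turn number, the agent that made it, and token counts. The `Span` record, the `rollup` helper, the model names, and the prices in `PRICES` are all hypothetical; they do not reflect Weave's schema or any provider's real rates.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens as (input, output); illustrative, not real rates.
PRICES = {"model-a": (2.50, 10.00), "model-b": (0.15, 0.60)}


@dataclass
class Span:
    """One logged LLM call, tagged with the session and turn it belongs to."""
    session_id: str
    turn: int
    agent: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_s: float


def cost_usd(span: Span) -> float:
    """Token cost of one call under the hypothetical PRICES table."""
    in_rate, out_rate = PRICES[span.model]
    return (span.prompt_tokens * in_rate + span.completion_tokens * out_rate) / 1_000_000


def rollup(spans: list[Span]) -> dict:
    """Group spans as session -> turn and total the cost and latency of each turn."""
    sessions: dict = defaultdict(dict)
    for span in spans:
        turn = sessions[span.session_id].setdefault(
            span.turn, {"agents": [], "cost_usd": 0.0, "latency_s": 0.0}
        )
        turn["agents"].append(span.agent)
        turn["cost_usd"] += cost_usd(span)
        turn["latency_s"] += span.latency_s  # assumes calls within a turn run sequentially
    return dict(sessions)


spans = [
    Span("s1", 1, "planner", "model-a", 1_000, 200, 0.8),
    Span("s1", 1, "researcher", "model-b", 4_000, 500, 1.2),
    Span("s1", 2, "planner", "model-a", 1_500, 300, 0.9),
]
by_session = rollup(spans)
for turn_no, turn in sorted(by_session["s1"].items()):
    print(turn_no, turn["agents"], f"${turn['cost_usd']:.6f}", f"{turn['latency_s']:.1f}s")
```

Summing latency gives wall-clock time only when the calls in a turn run one after another; for agents that run in parallel, a rollup would instead take the interval from the earliest start to the latest end.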

Evaluation

  • LLM-as-judge scorers
  • Safety scorers (toxicity, bias, PII, hallucinations)
  • Quality scorers (coherence, fluency, relevance)
  • Custom scorer support
  • Human/expert feedback collection
  • Production trace evaluation
  • Side-by-side comparison
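An evaluation run pairs a dataset with one or more scorers, applies the model to each row, and aggregates the scores. In Weave this role is played by the `weave.Evaluation` class; the sketch below deliberately uses no Weave API, and `dataset`, `model`, `exact_match`, `length_ok`, and `evaluate` are hypothetical. Deterministic scorers stand in for LLM-as-judge scorers so the example runs offline.

```python
from statistics import mean
from typing import Callable

# Hypothetical evaluation dataset: each row has an input and a reference answer.
dataset = [
    {"question": "Capital of France?", "expected": "Paris"},
    {"question": "2 + 2?", "expected": "4"},
    {"question": "Largest planet?", "expected": "Jupiter"},
]


def model(question: str) -> str:
    """Hypothetical model under test; a lookup table stands in for an LLM call."""
    answers = {"Capital of France?": "Paris", "2 + 2?": "4", "Largest planet?": "Saturn"}
    return answers.get(question, "")


def exact_match(expected: str, output: str) -> dict:
    """Scores 1.0 when the output equals the reference, ignoring case and surrounding spaces."""
    return {"correct": float(output.strip().lower() == expected.strip().lower())}


def length_ok(expected: str, output: str, max_chars: int = 50) -> dict:
    """Scores 1.0 when the answer is short enough; an LLM-as-judge scorer would have the same shape."""
    return {"length_ok": float(len(output) <= max_chars)}


def evaluate(rows: list[dict], predict: Callable[[str], str], scorers: list[Callable]) -> dict:
    """Run predict on every row, apply every scorer, and average each metric across rows."""
    per_row = []
    for row in rows:
        output = predict(row["question"])
        scores: dict = {}
        for scorer in scorers:
            scores.update(scorer(expected=row["expected"], output=output))
        per_row.append(scores)
    return {metric: mean(r[metric] for r in per_row) for metric in per_row[0]}


results = evaluate(dataset, model, [exact_match, length_ok])
print(results)
```

Because each scorer only sees an expected value and an output, the same functions could be applied to outputs already captured in production traces instead of fresh predictions, which is the idea behind evaluating live traffic.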

LLM Providers

  • OpenAI
  • Anthropic
  • Google AI
  • Amazon Bedrock
  • Azure OpenAI
  • Cohere, Groq, Mistral
  • LiteLLM (unified interface)
  • Local models

Framework Integrations

  • LangChain / LangGraph
  • LlamaIndex
  • Claude Agent SDK
  • OpenAI Agents SDK
  • CrewAI / AutoGen
  • DSPy
  • Haystack
  • PydanticAI / Instructor
  • Vercel AI SDK

How It Compares

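| Feature                 | W&B Weave          | Langfuse            | LangSmith       | Arize Phoenix                          |
|-------------------------|--------------------|---------------------|-----------------|----------------------------------------|
| Open source license     | Apache 2.0         | MIT                 | No              | Elastic License 2.0 (source-available) |
| Self-hosted             | No (needs W&B)     | Yes                 | No              | Yes                                    |
| Agent tracing           | Native multi-agent | Via SDK             | Native          | Via SDK                                |
| Built-in scorers        | Safety + quality   | LLM-as-judge        | Online evals    | LLM-as-judge                           |
| Production evaluation   | Live traces        | Manual              | Manual          | Manual                                 |
| ML platform integration | Full W&B platform  | None                | None            | Limited                                |
| Free tier               | Limited            | 50K observations/mo | 5K traces/mo    | Unlimited (local)                      |
| Best for                | W&B users, agents  | Full control        | LangChain users | Local development                      |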
