W&B Weave

Open-source LLM observability and evaluation toolkit from Weights & Biases. Trace, evaluate, and monitor AI applications from experimentation to production with a single line of code.

1.1K GitHub Stars

700K+ W&B Users

35+ Integrations

Overview

W&B Weave is an open-source observability and evaluation toolkit that helps developers trace, evaluate, and monitor LLM applications from experimentation to production. Adding the @weave.op decorator to a function, a single line of code, automatically logs its inputs, outputs, and metadata at a granular level and organizes the data into navigable trace trees for debugging complex agentic workflows. Weave reached General Availability in December 2024 and is now part of CoreWeave, following CoreWeave's acquisition of Weights & Biases in March 2025 at a $1.7B valuation. It supports both Python (3.10+) and TypeScript/JavaScript environments, with native integrations for 15+ LLM providers, including OpenAI, Anthropic, and Google, plus 19+ frameworks, including LangChain, LlamaIndex, Claude Agent SDK, and CrewAI.
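
The trace-tree idea behind @weave.op can be illustrated with a small, library-free sketch. The names `toy_op`, `Call`, and `ROOTS` below are hypothetical and invented for this illustration; this is not Weave's implementation or API. With Weave itself, you would instead call `weave.init(...)` with a project name and decorate functions with `@weave.op`, which requires a W&B account.

```python
import functools
import time
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Call:
    """One node in a trace tree: a single decorated function call."""
    name: str
    inputs: dict
    output: Any = None
    latency_s: float = 0.0
    children: list = field(default_factory=list)


# The call currently executing, so nested calls can attach to their parent.
_current: ContextVar[Optional[Call]] = ContextVar("_current", default=None)
ROOTS: list = []  # top-level calls, in the order they started


def toy_op(fn: Callable) -> Callable:
    """Hypothetical stand-in for a tracing decorator (not Weave's code)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call = Call(name=fn.__name__, inputs={"args": args, "kwargs": kwargs})
        parent = _current.get()
        # Attach to the enclosing call if there is one, else start a new tree.
        (parent.children if parent is not None else ROOTS).append(call)
        token = _current.set(call)
        start = time.perf_counter()
        try:
            call.output = fn(*args, **kwargs)
            return call.output
        finally:
            call.latency_s = time.perf_counter() - start
            _current.reset(token)
    return wrapper


@toy_op
def retrieve(query: str) -> list:
    return [f"doc about {query}"]


@toy_op
def answer(query: str) -> str:
    docs = retrieve(query)  # nested call becomes a child node
    return f"Based on {len(docs)} document(s): {docs[0]}"


answer("tracing")
root = ROOTS[0]
print(root.name, "->", [child.name for child in root.children])
```

Each decorated function records its own inputs, output, and latency, and nesting falls out of the call stack, which is why one decorator per function is enough to get a navigable tree.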

The Verdict

Who Should Use W&B Weave?

Best For

Teams already using W&B for ML experiment tracking
Production agent debugging and root cause analysis
Multi-agent system observability
Teams wanting open source with enterprise backing
Evaluation-heavy workflows with LLM-as-judge

Not Ideal For

Teams wanting a fully self-hosted deployment (requires a W&B account)
LangChain-only shops (LangSmith is more native)
Simple single-model apps (overkill)
Teams avoiding vendor ecosystems

What's Great

Open source (Apache 2.0) with enterprise support
Single-line integration via @weave.op decorator
Native multi-agent trace trees with session/turn organization
Built-in scorers for safety (toxicity, PII, hallucinations)
Run evaluations on live production traces
Automatic code/dataset/scorer versioning
35+ integrations including Claude Agent SDK

Watch Out For

Requires a W&B account; can't run fully standalone
Younger than competitors (GA December 2024)
Less community content than Langfuse/LangSmith
Python 3.10+ required (no 3.9 support)
Advanced features tied to W&B enterprise tiers

Pricing

Tier	Price	Includes
Free	Free	Basic tracing, evaluations, limited history
Teams	Usage-based	Unlimited history, team collaboration
Enterprise	Custom	SSO, RBAC, dedicated support, SLAs

Tracing

@weave.op decorator for automatic logging
Nested trace trees with session organization
Multi-agent turn tracking
Input/output/metadata capture
Cost and latency tracking
Code versioning per trace

Evaluation

LLM-as-judge scorers
Safety scorers (toxicity, bias, PII, hallucinations)
Quality scorers (coherence, fluency, relevance)
Custom scorer support
Human/expert feedback collection
Production trace evaluation
Side-by-side comparison
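
To make the custom scorer idea above concrete, here is a minimal, library-free sketch of scoring a model's outputs against a small dataset and averaging the results. It does not use Weave's API; `toy_model`, `contains_expected`, `within_length`, `evaluate`, and the dataset columns are all hypothetical names chosen for illustration, and the model is a canned stand-in for a real LLM call.

```python
from statistics import mean

# Hypothetical dataset rows; the column names are illustrative.
DATASET = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What color is the sky?", "expected": "blue"},
]


def toy_model(question: str) -> str:
    """Stand-in for an LLM call; returns canned answers."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "What is the capital of France?": "Paris",
    }
    return canned.get(question, "I don't know.")


def contains_expected(output: str, expected: str) -> dict:
    """Custom scorer: does the output mention the expected answer?"""
    return {"correct": expected.lower() in output.lower()}


def within_length(output: str, expected: str, limit: int = 40) -> dict:
    """Custom scorer: is the output at most `limit` characters long?"""
    return {"concise": len(output) <= limit}


def evaluate(model, rows, scorers) -> dict:
    """Run every scorer on every row, then average each boolean score."""
    results: dict = {}
    for row in rows:
        output = model(row["question"])
        for scorer in scorers:
            for key, value in scorer(output, row["expected"]).items():
                results.setdefault(key, []).append(float(value))
    return {key: mean(values) for key, values in results.items()}


summary = evaluate(toy_model, DATASET, [contains_expected, within_length])
print(summary)
```

Returning a small dictionary from each scorer keeps scores named, so several scorers can run over the same outputs and be compared side by side. An LLM-as-judge scorer would have the same shape, with the body replaced by a call to a judging model.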

LLM Providers

OpenAI
Anthropic
Google AI
Amazon Bedrock
Azure OpenAI
Cohere, Groq, Mistral
LiteLLM (unified interface)
Local models

Framework Integrations

LangChain / LangGraph
LlamaIndex
Claude Agent SDK
OpenAI Agents SDK
CrewAI / AutoGen
DSPy
Haystack
PydanticAI / Instructor
Vercel AI SDK

How It Compares

Feature	W&B Weave	Langfuse	LangSmith	Arize Phoenix
Open Source	Apache 2.0	MIT	No	Apache 2.0
Self-Hosted	No (needs W&B)	Yes	No	Yes
Agent Tracing	Native multi-agent	Via SDK	Native	Via SDK
Built-in Scorers	Safety + Quality	LLM-as-judge	Online evals	LLM-as-judge
Prod Evaluation	Live traces	Manual	Manual	Manual
ML Integration	Full W&B platform	None	None	Limited
Free Tier	Limited	50K obs/mo	5K traces/mo	Unlimited local
Best For	W&B users, agents	Full control	LangChain users	Local dev
