DSPy
Framework for programming—not prompting—language models through declarative, self-improving Python code
Overview
DSPy (Declarative Self-improving Python) is a framework from Stanford NLP that changes how developers work with language models. Instead of hand-crafting and tuning prompts, you write compositional Python code that declares the structure of your task; DSPy's optimizers then tune the prompts, and can even fine-tune model weights, against a metric you define. It grew out of the research behind the "Demonstrate-Search-Predict" (DSP) paper and treats prompts as parameters to be learned rather than hand-engineered. The approach pays off most in RAG pipelines, multi-hop reasoning, and other LLM systems where reliability matters more than quick prototyping.
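To make "prompts as learned parameters" concrete, here is a minimal, library-free sketch of the idea: score a few candidate instructions against a small labeled set and keep the winner. It does not use DSPy's API. The names `fake_lm`, `score`, and `select_instruction` are hypothetical, and the stub model only imitates an LLM so the example runs offline.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, gold answer)

# Hypothetical stand-in for a language model (an assumption for this sketch):
# it lowercases its answer only when the instruction asks for lowercase.
def fake_lm(instruction: str, question: str) -> str:
    answers = {"Capital of France?": "Paris", "Capital of Japan?": "Tokyo"}
    answer = answers.get(question, "unknown")
    return answer.lower() if "lowercase" in instruction else answer

def score(instruction: str, devset: List[Example],
          lm: Callable[[str, str], str] = fake_lm) -> float:
    """Fraction of dev examples answered with an exact match."""
    hits = sum(lm(instruction, question) == gold for question, gold in devset)
    return hits / len(devset)

def select_instruction(candidates: List[str], devset: List[Example]) -> str:
    """Treat the instruction as a hyperparameter: keep the best-scoring one."""
    return max(candidates, key=lambda candidate: score(candidate, devset))

devset = [("Capital of France?", "paris"), ("Capital of Japan?", "tokyo")]
candidates = ["Answer the question.", "Answer the question in lowercase."]
best = select_instruction(candidates, devset)
print(best, score(best, devset))
```

Real optimizers search a far larger space (instructions, demonstrations, sometimes weights), but the loop has the same shape: propose, score against a metric, keep what works.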
The Verdict
Who Should Use DSPy?
Best For
- ML researchers and academics
- Teams building production RAG systems
- Multi-hop reasoning pipelines
- Projects requiring systematic prompt optimization
- Those tired of brittle, hand-tuned prompts
Not Ideal For
- Simple chatbot applications (overkill)
- Quick prototyping (use LangChain)
- Teams unfamiliar with ML concepts
- Real-time agent orchestration (use LangGraph)
What's Great
- Automatic prompt optimization replaces most manual tuning
- Modular, composable pipeline architecture
- Works with most major LLM providers (OpenAI, Anthropic, local models)
- Strong academic foundation (Stanford NLP)
- Active research community and rapid development
- Clean separation of program logic from prompts
Watch Out For
- Steep learning curve; an ML background helps
- Less ecosystem and integrations than LangChain
- Optimization can be compute-intensive
- Documentation assumes research familiarity
- Smaller community for troubleshooting
Core Modules
- dspy.Predict - basic LM calls
- dspy.ChainOfThought - reasoning steps
- dspy.ReAct - agent-like behavior
- dspy.ProgramOfThought - generates and runs code to produce answers
- dspy.Retrieve - retrieval augmentation
- dspy.Assert - hard runtime constraints (superseded in recent releases by dspy.Refine and dspy.BestOfN)
- dspy.Suggest - soft constraints (likewise superseded)
- Signatures - I/O declarations
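A signature declares what goes in and what comes out, and leaves the prompt wording to the framework. DSPy accepts a shorthand string form such as "question -> answer". The sketch below is a toy illustration of that idea, not DSPy's implementation: `ToySignature`, `parse_signature`, and `render_prompt` are hypothetical names, and the prompt layout is an assumption chosen for readability.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ToySignature:
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]

def parse_signature(spec: str) -> ToySignature:
    """Split 'a, b -> c' into input and output field names."""
    left, arrow, right = spec.partition("->")
    if not arrow:
        raise ValueError(f"expected 'inputs -> outputs', got {spec!r}")
    inputs = tuple(name.strip() for name in left.split(",") if name.strip())
    outputs = tuple(name.strip() for name in right.split(",") if name.strip())
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output")
    return ToySignature(inputs, outputs)

def render_prompt(sig: ToySignature, **values: str) -> str:
    """Fill the input fields and leave each output field open for the model."""
    missing = [name for name in sig.inputs if name not in values]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    lines = [f"{name}: {values[name]}" for name in sig.inputs]
    lines += [f"{name}:" for name in sig.outputs]
    return "\n".join(lines)

sig = parse_signature("context, question -> answer")
prompt = render_prompt(sig, context="Paris is in France.", question="Where is Paris?")
print(prompt)
```

Because the program only names its fields, an optimizer is free to change the surrounding instructions and examples without touching the calling code.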
Optimizers (Teleprompters)
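- BootstrapFewShot - bootstraps few-shot demonstrations that pass your metric
- BootstrapFewShotWithRandomSearch - random search over bootstrapped demonstration sets
- COPRO - instruction optimization
- MIPROv2 - joint optimization of instructions and few-shot examples
- SignatureOptimizer - earlier name for COPRO (deprecated)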
- BootstrapFinetune - weight updates
- GEPA - reflective evolution
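The bootstrapping idea behind the few-shot optimizers can be sketched in a few lines: run the program on training inputs, keep only the outputs that pass the metric, and reuse those as demonstrations. This is a simplified, library-free illustration, not DSPy's code. `bootstrap_demos`, `toy_program`, and `exact_match` are hypothetical names, and the "program" is a deliberately flawed stub rather than an LM call.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]

def bootstrap_demos(program: Callable[[str], str], trainset: List[Example],
                    metric: Callable[[Example, str], bool],
                    max_demos: int = 2) -> List[Example]:
    """Collect up to max_demos program outputs that the metric accepts."""
    demos: List[Example] = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_demos:
            break
    return demos

# Hypothetical stand-in for an LM-backed program: it adds two integers
# but mishandles negative numbers, so some of its outputs get filtered out.
def toy_program(question: str) -> str:
    a, b = (int(part) for part in question.split("+"))
    return str(abs(a) + abs(b))

def exact_match(example: Example, prediction: str) -> bool:
    return prediction == example["answer"]

trainset = [
    {"question": "1+2", "answer": "3"},
    {"question": "-1+2", "answer": "1"},   # the stub gets this one wrong
    {"question": "4+5", "answer": "9"},
    {"question": "6+1", "answer": "7"},
]
demos = bootstrap_demos(toy_program, trainset, exact_match, max_demos=2)
print(demos)
```

The metric acts as the filter, which is why optimization quality depends heavily on how well the metric captures what a good output looks like.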
LLM Providers
- OpenAI (GPT-4, GPT-4o)
- Anthropic (Claude 3.5, Claude 4)
- Google (Gemini)
- Cohere (Command)
- Ollama (local models)
- vLLM, Together, Anyscale
- Hugging Face models
Retrieval Integrations
- Pinecone
- Weaviate
- Chroma
- Qdrant
- FAISS
- ColBERTv2
- RAGatouille
Research Foundation
- Stanford NLP Group
- ICLR 2024 publication
- 10+ research papers
- Matei Zaharia, Omar Khattab et al.
How It Compares
| Feature | DSPy | LangChain | LlamaIndex | Haystack |
|---|---|---|---|---|
| Primary Focus | Prompt optimization | General LLM apps | RAG & indexing | Search pipelines |
| GitHub Stars | 34.8K | 98K+ | 38K+ | 18K+ |
| Programming Model | Declarative Python | Imperative chains | Data-centric | Pipeline-based |
| Auto-optimization | Yes (built-in) | Manual | Manual | Manual |
| Learning Curve | Steep (ML background) | Moderate | Moderate | Easy |
| Agent Support | Basic (ReAct) | Full (LangGraph) | LlamaAgents | Basic |
| Production Tools | None | LangSmith | LlamaCloud | Haystack Cloud |
| Best For | Research & optimization | Full-stack apps | RAG systems | Enterprise search |