Microsoft GraphRAG


A modular graph-based Retrieval-Augmented Generation (RAG) system by Microsoft Research that builds knowledge graphs from private datasets for superior synthesis and holistic reasoning

Tags: RAG, Python, API available, self-hosted

33.7K GitHub Stars

3.6K Forks

v3.1 Latest Release

MIT License

Overview

Microsoft GraphRAG is a structured, hierarchical approach to Retrieval-Augmented Generation (RAG), in contrast to naive semantic search over plain text snippets. Developed by Microsoft Research, it is a data pipeline and transformation suite that uses LLMs to extract structured data from unstructured text, building knowledge graphs with hierarchical community clustering so that large private datasets can be understood as a whole. Where baseline RAG retrieves isolated text chunks, GraphRAG connects disparate information through shared entities and relationship context, which improves answers to complex synthesis questions that span many documents. The project is open source under the MIT license but is not an officially supported Microsoft product: users supply their own LLM API access (for example OpenAI or Azure OpenAI) and bear those costs directly.

The Verdict

Who Should Use Microsoft GraphRAG?

Best For

Teams reasoning over large private document corpora (research, legal, business)
Use cases requiring synthesis across multiple disconnected sources
Organizations needing holistic summaries, not just retrieval of individual passages
ML engineers who can manage indexing infrastructure and LLM API costs
Projects where answer quality justifies higher upfront indexing spend

Not Ideal For

Budget-constrained projects — indexing can be very expensive
Simple keyword or semantic similarity search (overkill)
Real-time, low-latency retrieval (requires offline preprocessing)
Teams wanting a fully managed SaaS solution
Beginners: requires Python expertise and infrastructure setup

What's Great

Reported by Microsoft Research to outperform baseline RAG on complex synthesis questions
Three search modes (Global, Local, DRIFT) cover different query types
Community-based summarization enables holistic reasoning over large corpora
Modular pipeline — swap LLM backends (OpenAI, Azure, local models)
Active Microsoft Research backing with a research paper and blog
MIT licensed — fully open for commercial use
Strong GitHub community with an active issue tracker

Official Docs · MS Research Blog

Watch Out For

Indexing is expensive — Microsoft explicitly warns to "start small" and review costs
Not an officially supported Microsoft product (research project)
Requires LLM API access (OpenAI/Azure) for indexing — adds per-token cost
Offline-only indexing step means no real-time document ingestion
Complex setup compared to turnkey RAG solutions
Config format can change between minor versions

GitHub README · Official Docs

Pricing

Open Source

Free

MIT licensed. Self-host on your infrastructure. LLM API costs (OpenAI/Azure) are separate and can be significant.
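Because the LLM bill is the real price here, a rough back-of-the-envelope estimate before indexing can help. The sketch below is only illustrative arithmetic: the function name, its parameters, and every number in the example (chunk size, calls per chunk, token prices) are placeholder assumptions, not GraphRAG defaults or real provider pricing. It also ignores community summarization and embedding calls, so treat the result as a lower bound.

```python
import math


def estimate_indexing_cost(corpus_tokens, chunk_tokens, calls_per_chunk,
                           prompt_overhead_tokens, output_tokens_per_call,
                           usd_per_1k_input, usd_per_1k_output):
    """Rough extraction-cost estimate; all inputs are caller-supplied assumptions."""
    chunks = math.ceil(corpus_tokens / chunk_tokens)
    calls = chunks * calls_per_chunk
    # Each call sends the chunk text plus a fixed prompt overhead.
    input_tokens = calls * (chunk_tokens + prompt_overhead_tokens)
    output_tokens = calls * output_tokens_per_call
    usd = (input_tokens / 1000 * usd_per_1k_input
           + output_tokens / 1000 * usd_per_1k_output)
    return {"chunks": chunks, "calls": calls, "usd": round(usd, 2)}


# Placeholder numbers only: substitute your own corpus size and provider prices.
estimate = estimate_indexing_cost(
    corpus_tokens=1_000_000,
    chunk_tokens=1000,
    calls_per_chunk=2,
    prompt_overhead_tokens=500,
    output_tokens_per_call=400,
    usd_per_1k_input=0.01,
    usd_per_1k_output=0.03,
)
print(estimate)
```

Running the estimate on a small sample of the corpus first, then comparing it with the actual bill, is one way to calibrate the per-chunk assumptions before a full run.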


Search Modes

Global Search — Holistic reasoning over entire corpus via community summaries; best for synthesis questions
Local Search — Entity-focused retrieval fanning out to neighbors and related concepts
DRIFT Search — Local search enriched with community context for deeper entity reasoning
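The difference between the three modes can be illustrated with a toy index. This is a minimal sketch, not GraphRAG's implementation or API: the graph, the community summaries, and the function names are all made up, and keyword overlap stands in for the LLM-based relevance judgments a real system would make.

```python
import re

# Toy index: every entity and summary here is invented for illustration.
GRAPH = {
    "Acme": {"Alice", "Bob"},
    "Alice": {"Acme"},
    "Bob": {"Acme", "Carol"},
    "Carol": {"Bob"},
}
COMMUNITY_SUMMARIES = {
    "c0": "Acme leadership: Alice founded Acme and Bob runs operations.",
    "c1": "Hiring: Bob recruited Carol to lead research.",
}


def _words(text):
    """Lowercased word set used for crude keyword matching."""
    return set(re.findall(r"[a-z]+", text.lower()))


def local_search(entity, hops=1):
    """Entity-focused retrieval: fan out to neighbors within `hops` steps."""
    seen, frontier = {entity}, {entity}
    for _ in range(hops):
        frontier = {n for node in frontier for n in GRAPH.get(node, set())} - seen
        seen |= frontier
    return sorted(seen - {entity})


def global_search(query, top_k=1):
    """Corpus-wide retrieval: score every community summary, keep the best."""
    terms = _words(query)
    scored = sorted(
        ((len(terms & _words(text)), cid) for cid, text in COMMUNITY_SUMMARIES.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [cid for score, cid in scored[:top_k] if score > 0]


def drift_search(entity, hops=1):
    """Local search enriched with the community summaries that mention the entity."""
    communities = sorted(
        cid for cid, text in COMMUNITY_SUMMARIES.items()
        if entity.lower() in _words(text)
    )
    return {"neighbors": local_search(entity, hops), "communities": communities}


print(local_search("Acme", hops=2))
print(global_search("who recruited carol"))
print(drift_search("Carol"))
```

The point of the sketch is the shape of each query: local search starts from one entity and walks outward, global search never touches individual entities and works only from community summaries, and the DRIFT-style function combines both views.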

Indexing Pipeline

LLM-powered entity and relationship extraction
Knowledge graph construction from raw text
Hierarchical community detection (Leiden algorithm)
Community summary generation at multiple granularities
Configurable chunking and data preparation
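The pipeline stages above can be sketched end to end in a few lines of standard-library Python. This is a deliberately simplified illustration with loud assumptions: a capitalized-word regex stands in for LLM entity extraction, connected components (via union-find) stand in for the Leiden algorithm and produce only a single level instead of a hierarchy, and the "summary" is a string template instead of an LLM-written report. None of the function names come from the graphrag package.

```python
import re
from collections import defaultdict
from itertools import combinations


def chunk_text(text, max_words=12):
    """Split text into fixed-size word windows (stand-in for configurable chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def extract_entities(chunk):
    """Hypothetical stand-in for LLM extraction: treat capitalized words as entities."""
    return sorted(set(re.findall(r"\b[A-Z][a-z]+\b", chunk)))


def build_graph(chunks):
    """Link entities that co-occur in a chunk; the weight counts co-occurrences."""
    edges = defaultdict(int)
    for chunk in chunks:
        for a, b in combinations(extract_entities(chunk), 2):
            edges[(a, b)] += 1
    return dict(edges)


def detect_communities(edges):
    """Connected components via union-find. NOT Leiden: a single-level substitute."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def summarize(community):
    """Placeholder for an LLM-generated community summary."""
    return "Community of " + ", ".join(community)


docs = [
    "Alice founded Acme with Bob.",
    "Bob later hired Carol at Acme.",
    "Dana studies Leiden clustering with Erin.",
]
chunks = [c for doc in docs for c in chunk_text(doc)]
graph = build_graph(chunks)
communities = detect_communities(graph)
summaries = [summarize(c) for c in communities]
print(summaries)
```

Even in this toy form the cost structure is visible: extraction and summarization run once per chunk and once per community, which is where the LLM calls (and the indexing bill) come from in the real pipeline.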

Integrations

OpenAI and Azure OpenAI (primary LLM backends)
LangChain and LlamaIndex compatible
CLI and Python API access
Prompt tuning (auto and manual)
Configurable embedding models

Deployment

Self-hosted on any infrastructure
Python package (pip install graphrag)
CLI for indexing and querying
Compatible with local LLMs via Ollama (community-supported)
Azure-native deployment supported

How It Compares

Feature	Microsoft GraphRAG	Baseline RAG	Graphiti	Neo4j GraphRAG
Knowledge Graph	Auto-built from text	None	Auto-built	Manual modeling
Synthesis Questions	Excellent	Poor	Good	Good
Search Modes	Global / Local / DRIFT	Semantic only	Graph traversal	Cypher + Vector
Indexing Cost	High (LLM calls)	Low	Medium	Low
Real-time Ingestion	No	Yes	Yes	Yes
License	MIT	N/A	Apache 2.0	GPLv3
Managed Option	No	Varies	No	Yes (AuraDB)
Best For	Private corpus synthesis	Simple lookup	Agents/memory	Relationship apps
