Microsoft GraphRAG


A modular graph-based Retrieval-Augmented Generation (RAG) system by Microsoft Research that builds knowledge graphs from private datasets for superior synthesis and holistic reasoning

33.7K GitHub Stars
3.6K Forks
v3.1 Latest Release
MIT License

Overview

Microsoft GraphRAG is a structured, hierarchical approach to Retrieval-Augmented Generation (RAG), in contrast to naive semantic-search approaches that retrieve plain text snippets. Developed by Microsoft Research, it is a data pipeline and transformation suite that uses LLMs to extract structured data from unstructured text, building a knowledge graph with hierarchical community clustering so that large private datasets can be understood as a whole. Where baseline RAG retrieves isolated text chunks, GraphRAG connects disparate information through shared entities and relationship context, which improves answers to complex synthesis questions that span many documents. The project is open-source (MIT) but is not an officially supported Microsoft product: users supply their own LLM API access (OpenAI, Azure OpenAI) and bear those costs directly.

The Verdict

Who Should Use Microsoft GraphRAG?

Best For

  • Teams reasoning over large private document corpora (research, legal, business)
  • Use cases requiring synthesis across multiple disconnected sources
  • Organizations needing holistic summaries, not just retrieval of individual facts
  • ML engineers who can manage indexing infrastructure and LLM API costs
  • Projects where answer quality justifies higher upfront indexing spend

Not Ideal For

  • Budget-constrained projects — indexing can be very expensive
  • Simple keyword or semantic similarity search (overkill)
  • Real-time, low-latency retrieval (requires offline preprocessing)
  • Teams wanting a fully managed SaaS solution
  • Beginners: requires Python expertise and infrastructure setup

What's Great

  • Consistently outperforms baseline RAG on complex synthesis questions
  • Three search modes (Global, Local, DRIFT) cover different query types
  • Community-based summarization enables holistic reasoning over large corpora
  • Modular pipeline — swap LLM backends (OpenAI, Azure, local models)
  • Active Microsoft Research backing with a research paper and blog
  • MIT licensed — fully open for commercial use
  • Strong GitHub community with an active issue tracker

Watch Out For

  • Indexing is expensive — Microsoft explicitly warns to "start small" and review costs
  • Not an officially supported Microsoft product (research project)
  • Requires LLM API access (OpenAI/Azure) for indexing — adds per-token cost
  • Offline-only indexing step means no real-time document ingestion
  • Complex setup compared to turnkey RAG solutions
  • Config format can change between minor versions
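
To make the "start small" advice concrete, here is a rough back-of-envelope estimator. It is only a sketch: the function name, the number of LLM passes over the corpus, the output-to-input token ratio, and the per-million-token prices are all hypothetical inputs, not figures from GraphRAG or any provider. Substitute values measured on a small sample of your own data.

```python
def estimate_indexing_cost(corpus_tokens, passes, output_ratio,
                           input_price_per_m, output_price_per_m):
    """Rough LLM spend for indexing, in the same currency as the prices.

    All parameters are assumptions supplied by the caller:
    - passes: how many times the corpus is sent to the LLM overall
    - output_ratio: generated tokens per input token
    - *_price_per_m: price per one million tokens
    """
    input_tokens = corpus_tokens * passes
    output_tokens = input_tokens * output_ratio
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical numbers, chosen only to show the arithmetic.
sample_cost = estimate_indexing_cost(
    corpus_tokens=1_000_000, passes=3, output_ratio=0.25,
    input_price_per_m=2.0, output_price_per_m=8.0,
)
# Scale the sample estimate linearly to a corpus 50 times larger.
full_cost = sample_cost * 50
print(sample_cost, full_cost)
```

Indexing a small sample first and comparing the real bill with an estimate like this one is a cheap way to calibrate the assumptions before committing to a full run.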

Pricing


Search Modes

  • Global Search — Holistic reasoning over entire corpus via community summaries; best for synthesis questions
  • Local Search — Entity-focused retrieval fanning out to neighbors and related concepts
  • DRIFT Search — Local search enriched with community context for deeper entity reasoning
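
The difference between the three modes can be illustrated with a toy example. This is a simplified sketch in plain Python, not the GraphRAG API: the graph, the community summaries, and the function names are all hypothetical, and keyword overlap stands in for the LLM calls the real system makes.

```python
# Toy stand-ins for an index: an entity graph and per-community summaries.
GRAPH = {
    "Acme Corp": {"Jane Doe", "Project Falcon"},
    "Jane Doe": {"Acme Corp", "Project Falcon"},
    "Project Falcon": {"Acme Corp", "Jane Doe", "Zenith Labs"},
    "Zenith Labs": {"Project Falcon"},
}
COMMUNITY_SUMMARIES = {
    0: "Acme Corp and Jane Doe lead Project Falcon.",
    1: "Zenith Labs supplies hardware to Project Falcon.",
    2: "Unrelated notes about office catering.",
}


def _words(text):
    """Lowercased word set with basic punctuation stripped."""
    return {w.strip(".,?").lower() for w in text.split()}


def local_search(entity, hops=1):
    """Entity-focused: fan out from one entity to its graph neighbors."""
    seen = {entity}
    frontier = {entity}
    for _ in range(hops):
        frontier = {n for e in frontier for n in GRAPH.get(e, ())} - seen
        seen |= frontier
    return sorted(seen)


def global_search(query, top_k=2):
    """Corpus-wide: score every community summary, then keep the best ones."""
    q = _words(query)
    scored = [(len(q & _words(s)), cid) for cid, s in COMMUNITY_SUMMARIES.items()]
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    return [COMMUNITY_SUMMARIES[cid] for score, cid in ranked[:top_k] if score > 0]


def drift_search(entity, hops=1):
    """Local search plus the community summaries that mention those entities."""
    entities = local_search(entity, hops)
    context = [s for s in COMMUNITY_SUMMARIES.values()
               if any(e in s for e in entities)]
    return entities, context


print(local_search("Zenith Labs"))
print(global_search("Who works on Project Falcon?"))
print(drift_search("Zenith Labs"))
```

The point of the sketch is the shape of each retrieval: local search starts from an entity, global search starts from the summaries, and DRIFT combines the two.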

Indexing Pipeline

  • LLM-powered entity and relationship extraction
  • Knowledge graph construction from raw text
  • Hierarchical community detection (Leiden algorithm)
  • Community summary generation at multiple granularities
  • Configurable chunking and data preparation
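
The stages listed above can be sketched end to end in a few lines. This is a minimal illustration under loud assumptions, not GraphRAG's implementation: every function name is hypothetical, the LLM extraction and summarization steps are replaced by hard-coded placeholders, and connected components stand in for Leiden community detection (so the result is flat rather than hierarchical).

```python
from collections import defaultdict


def chunk_text(text, size=8):
    """Split text into chunks of `size` words (stand-in for configurable chunking)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def extract_triples(chunk):
    """Placeholder for LLM extraction: returns known triples whose entities appear in the chunk."""
    known = [
        ("Ada", "founded", "Orbit"),
        ("Orbit", "acquired", "Nimbus"),
        ("Kai", "advises", "Helix"),
    ]
    return [t for t in known if t[0] in chunk and t[2] in chunk]


def build_graph(triples):
    """Undirected adjacency map built from (source, relation, target) triples."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def detect_communities(graph):
    """Connected components: a simple stand-in for Leiden, not the real algorithm."""
    seen, communities = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, members = [node], set()
        while stack:
            current = stack.pop()
            if current in members:
                continue
            members.add(current)
            stack.extend(graph[current] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(community):
    """Placeholder for LLM-generated community summaries."""
    return "Community of " + ", ".join(community)


text = "Ada founded Orbit in 2015. Orbit acquired Nimbus later. Kai advises Helix on strategy."
chunks = chunk_text(text)
triples = [t for c in chunks for t in extract_triples(c)]
graph = build_graph(triples)
communities = detect_communities(graph)
summaries = [summarize(c) for c in communities]
print(summaries)
```

In the real pipeline the two placeholder functions are where the LLM calls (and therefore most of the indexing cost) occur.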

Integrations

  • OpenAI and Azure OpenAI (primary LLM backends)
  • LangChain and LlamaIndex compatible
  • CLI and Python API access
  • Prompt tuning (auto and manual)
  • Configurable embedding models

Deployment

  • Self-hosted on any infrastructure
  • Python package (pip install graphrag)
  • CLI for indexing and querying
  • Compatible with local LLMs via Ollama (community)
  • Azure-native deployment supported
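
A typical CLI workflow runs three steps: initialize a workspace, index it, then query it. The sketch below only assembles and prints the commands; it does not run them. The `init`, `index`, and `query` subcommands and the `--root`, `--method`, and `--query` flags follow the project's getting-started guide for earlier releases and may differ in your installed version (the config format and CLI can change between releases, so check `graphrag --help`). The helper name `graphrag_command` and the `./ragtest` path are hypothetical.

```python
import shlex


def graphrag_command(subcommand, root, **options):
    """Build an argv list for the graphrag CLI without executing it.

    Flag names are assumptions based on earlier documented releases.
    """
    argv = ["graphrag", subcommand, "--root", root]
    for name, value in options.items():
        argv += [f"--{name}", value]
    return argv


workflow = [
    graphrag_command("init", "./ragtest"),    # create config and prompt files
    graphrag_command("index", "./ragtest"),   # run the (costly) indexing pipeline
    graphrag_command("query", "./ragtest",
                     method="global", query="What are the top themes?"),
]

for argv in workflow:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(argv))
```

To execute a step instead of printing it, pass the list to `subprocess.run(argv, check=True)` after installing the package with `pip install graphrag`.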

How It Compares

Feature             | Microsoft GraphRAG       | Baseline RAG  | Graphiti        | Neo4j GraphRAG
Knowledge Graph     | Auto-built from text     | None          | Auto-built      | Manual modeling
Synthesis Questions | Excellent                | Poor          | Good            | Good
Search Modes        | Global / Local / DRIFT   | Semantic only | Graph traversal | Cypher + Vector
Indexing Cost       | High (LLM calls)         | Low           | Medium          | Low
Real-time Ingestion | No                       | Yes           | Yes             | Yes
Open Source         | MIT                      | N/A           | Apache 2.0      | GPL3
Managed Option      | No                       | Varies        | No              | Yes (AuraDB)
Best For            | Private corpus synthesis | Simple lookup | Agents/memory   | Relationship apps
