Docling

Docling is an open-source document parser by IBM Research that converts PDFs, Office files, and 20+ formats into structured, LLM-ready data using layout-aware AI models with best-in-class table extraction accuracy.

Tags: RAG, Python, self-hosted, MCP server

61.7K GitHub Stars

97.9% Table Accuracy

100+ Releases

Free MIT License

Overview

Docling is IBM Research's open-source document intelligence toolkit, designed to prepare documents for generative AI. It converts PDFs, DOCX, PPTX, XLSX, HTML, images, audio, and more into a unified structured representation using layout analysis models and the TableFormer model, which was trained on 1M+ tables and reaches 97.9% cell accuracy on complex tables. It runs fully locally with no API costs, supports GPU acceleration, and can be deployed air-gapped, which makes it a strong fit for privacy-sensitive and cost-sensitive RAG pipelines. IBM donated the project to the Linux Foundation's Agentic AI Foundation in 2025, and it has seen rapid community adoption since its public launch in August 2024.

The Verdict

Who Should Use Docling?

Best For

Teams building RAG pipelines that need the highest document parsing accuracy, especially complex tables
Organizations with compliance or privacy requirements that prohibit sending documents to cloud APIs
Cost-sensitive workloads processing millions of pages — zero per-page cost vs. $0.10+/page for SaaS alternatives
Developers in the LangChain, LlamaIndex, Haystack, or CrewAI ecosystems, where native integrations exist
Teams processing scientific papers, XBRL financial filings, JATS articles, or mixed-media documents

Not Ideal For

Teams needing the fastest processing — cloud APIs like LlamaParse are ~10x faster (6s vs. 65s for 50 pages)
Handwriting recognition or form checkbox extraction — not yet supported
Chart and figure extraction — still listed as coming soon
Teams needing vendor-backed enterprise compliance certifications (SOC 2, HIPAA) from the tool provider

What's Great

Best-in-class table extraction: 97.9% cell accuracy on complex tables (vs. 75% for Unstructured; LlamaParse results are inconsistent)
100% text extraction accuracy on independent benchmarks
Fully free and open-source (MIT) — no per-page fees, no API dependency, no vendor lock-in
Self-hosted and air-gapped deployment for regulated industries
Up to 6x GPU speedup on NVIDIA CUDA, AMD ROCm, and Apple Silicon MLX
TableFormer model trained on 1M+ tables; layout model on 81,000 manually labeled pages
MCP integration for agentic AI workflows
IBM processed 2.1M PDFs from Common Crawl using Docling — proven at scale

Sources: GitHub (official) · procycons Benchmark · IDP Software Review

Watch Out For

Slower than cloud APIs — ~65 seconds for 50 pages locally vs. ~6 seconds for LlamaParse
Chart and figure extraction not yet available — listed as coming soon
No form extraction or handwriting recognition
Multilingual support for Arabic, Chinese, and Japanese is experimental, not enterprise-validated
TableFormer uses a fixed batch size of 4 regardless of GPU VRAM, which is inefficient for high-VRAM setups
Large container images: 4.4GB (CPU) to 11.4GB (CUDA)

Sources: procycons Benchmark · Reducto Comparison

Pricing

Open Source

Free

MIT License. Self-hosted, air-gapped, no API costs. GPU acceleration included. No page limits.

Granite-Docling VLM

Free

258M parameter visual language model (Apache 2.0). Download separately for production-grade visual understanding.

Supported File Formats

PDF (layout-aware, scanned via OCR)
DOCX, PPTX, XLSX (Office documents)
HTML, EPUB (web and ebook)
PNG, TIFF, JPEG (images)
WAV, MP3 (audio transcription)
WebVTT (captions)
EML, MSG (email)
LaTeX, plain text
XBRL (financial/regulatory)
JATS (scientific articles)

Key AI Models

TableFormer — trained on 1M+ tables for complex table extraction
Granite-Docling-258M VLM — 258M parameter visual language model (Apache 2.0)
Layout model trained on 81,000 manually labeled pages
DocTags markup format preserving structure and provenance

Output Formats

Markdown (with structure preserved)
HTML
JSON (DoclingDocument schema)
Structured data via Pydantic schemas
Bounding box metadata for citations
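
The bounding box metadata is what lets a RAG answer point back to a specific page region. The sketch below walks a dictionary shaped like a DoclingDocument JSON export and collects the page number and box for each text item. The field names (texts, prov, page_no, and bbox with l, t, r, b) are assumptions about the export schema and should be checked against the installed version; sample_doc is made-up illustration data, and collect_citations is a hypothetical helper, not part of Docling.

```python
from typing import Any, Dict, List


def collect_citations(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one citation record per provenance entry of each text item."""
    citations = []
    for item in doc.get("texts", []):
        for prov in item.get("prov", []):
            bbox = prov.get("bbox", {})
            citations.append(
                {
                    "text": item.get("text", ""),
                    "page": prov.get("page_no"),
                    # Box as (left, top, right, bottom) in page coordinates.
                    "box": (bbox.get("l"), bbox.get("t"), bbox.get("r"), bbox.get("b")),
                }
            )
    return citations


# Made-up sample in the ASSUMED export shape; not real Docling output.
sample_doc = {
    "texts": [
        {
            "text": "Revenue grew 12% year over year.",
            "prov": [
                {"page_no": 3, "bbox": {"l": 72.0, "t": 540.5, "r": 410.2, "b": 526.1}}
            ],
        },
        # Items without provenance produce no citation record.
        {"text": "Untitled fragment", "prov": []},
    ]
}

citations = collect_citations(sample_doc)
for c in citations:
    print(f"p.{c['page']} {c['box']}: {c['text']}")
```

Storing these records alongside each chunk in the vector store is one way to render a highlighted source region next to a generated answer.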

Integrations

LangChain, LlamaIndex, CrewAI, Haystack (native)
Model Context Protocol (MCP) for agentic workflows
Red Hat AI 3.3 and OpenShift AI
Anyscale / KubeRay for distributed processing
Java via docling-serve REST API

Performance

Up to 6x speedup with GPU over CPU-only
NVIDIA CUDA, AMD ROCm, Apple Silicon MLX support
Distributed batch processing via Ray Data
88.5% mAP on the DocLayNet layout analysis benchmark

Deployment Options

Local Python library (pip install docling)
Docker container (CPU: 4.4GB, CUDA: 11.4GB)
Air-gapped / offline deployment
docling-serve REST API wrapper
OpenShift Operator for enterprise Kubernetes
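
For the local Python library option, a minimal conversion might look like the sketch below. DocumentConverter, convert, and export_to_markdown come from Docling's documented Python API; the convert_to_markdown wrapper and its optional converter parameter are hypothetical names added here so the function can be exercised without Docling installed.

```python
from typing import Any, Optional


def convert_to_markdown(source: str, converter: Optional[Any] = None) -> str:
    """Convert a document (local path or URL) to Markdown.

    `converter` is a hypothetical injection point: pass any object with a
    Docling-style `convert(source)` method, or leave it as None to build a
    real DocumentConverter (requires `pip install docling`).
    """
    if converter is None:
        # Imported lazily so this module loads even without Docling installed.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    result = converter.convert(source)
    return result.document.export_to_markdown()
```

A call such as convert_to_markdown("report.pdf"), where report.pdf is a placeholder path, would run the full pipeline locally. Model weights are normally fetched on first use, so an air-gapped deployment would likely need them staged in advance.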

How It Compares

Feature	Docling	LlamaParse	Unstructured
Text Accuracy	100%	Good	High
Complex Table Accuracy	97.9%	Inconsistent	75%
Processing Speed (50 pages)	~65s local	~6s (cloud)	~141s
Cost	Free (MIT)	$0.0013–$0.056/page	$0.03/page
Self-hosted / Air-gapped	Yes	No	Yes (Business)
GPU Acceleration	Yes	N/A (cloud)	No
Enterprise Compliance	Self-managed	No	SOC 2, HIPAA
File Formats	20+	130+	60+
MCP Support	Yes	No	No
Best For	Accuracy + privacy	Speed + APIs	Compliance + breadth
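
To make the cost and speed rows concrete, the arithmetic can be sketched as below using only the figures from the table. It assumes that per-page prices and the 50-page timings scale linearly with volume, which real workloads (batching, concurrency, pricing tiers) may not. The estimate function and the TOOLS mapping are hypothetical names, and Docling's zero cost excludes the hardware it runs on.

```python
from typing import Dict, Tuple

# (low $/page, high $/page, seconds per 50 pages), copied from the table above.
TOOLS: Dict[str, Tuple[float, float, float]] = {
    "Docling": (0.0, 0.0, 65.0),
    "LlamaParse": (0.0013, 0.056, 6.0),
    "Unstructured": (0.03, 0.03, 141.0),
}


def estimate(tool: str, pages: int) -> Dict[str, float]:
    """Linearly extrapolate cost range (USD) and single-worker time (hours)."""
    low, high, secs_per_50 = TOOLS[tool]
    return {
        "cost_low": low * pages,
        "cost_high": high * pages,
        # ASSUMPTION: throughput stays constant at the 50-page benchmark rate.
        "hours": pages / 50 * secs_per_50 / 3600,
    }


for name in TOOLS:
    e = estimate(name, 1_000_000)
    print(f"{name}: ${e['cost_low']:,.0f}-${e['cost_high']:,.0f}, ~{e['hours']:,.0f} h")
```

Under these assumptions, one million pages costs nothing in fees with Docling but takes roughly 361 hours on a single worker, which is where GPU acceleration and distributed processing become relevant.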
