Docling
Docling is an open-source document parser by IBM Research that converts PDFs, Office files, and other inputs across 20+ formats into structured, LLM-ready data using layout-aware AI models with best-in-class table extraction accuracy.
Overview
Docling is IBM Research's open-source document intelligence toolkit, designed to prepare documents for generative AI. It converts PDFs, DOCX, PPTX, XLSX, HTML, images, audio, and more into a unified structured representation using layout models and the TableFormer model, which was trained on 1M+ tables and reaches 97.9% cell accuracy on complex tables. It runs fully locally with no API costs, supports GPU acceleration, and can be deployed air-gapped, which suits privacy-sensitive and cost-sensitive RAG pipelines. It was donated to the Linux Foundation's Agentic AI Foundation in 2025 and has seen rapid community adoption since its public launch in August 2024.
The Verdict
Who Should Use Docling?
Best For
- Teams building RAG pipelines that need the highest document parsing accuracy, especially complex tables
- Organizations with compliance or privacy requirements that prohibit sending documents to cloud APIs
- Cost-sensitive workloads processing millions of pages — zero per-page cost vs. $0.10+/page for SaaS alternatives
- Developers in the LangChain, LlamaIndex, Haystack, or CrewAI ecosystems, all of which have native integrations
- Teams processing scientific papers, XBRL financial filings, JATS articles, or mixed-media documents
Not Ideal For
- Teams needing the fastest processing — cloud APIs like LlamaParse are ~10x faster (6s vs. 65s for 50 pages)
- Handwriting recognition or form checkbox extraction — not yet supported
- Chart and figure extraction — still listed as coming soon
- Teams needing vendor-backed enterprise compliance certifications (SOC 2, HIPAA) from the tool provider
What's Great
- Best-in-class table extraction: 97.9% cell accuracy on complex tables (vs. 75% for Unstructured and inconsistent results from LlamaParse)
- 100% text extraction accuracy on independent benchmarks
- Fully free and open-source (MIT) — no per-page fees, no API dependency, no vendor lock-in
- Self-hosted and air-gapped deployment for regulated industries
- Up to 6x GPU speedup on NVIDIA CUDA, AMD ROCm, and Apple Silicon MLX
- TableFormer model trained on 1M+ tables; layout model on 81,000 manually labeled pages
- MCP integration for agentic AI workflows
- IBM processed 2.1M PDFs from Common Crawl using Docling — proven at scale
Watch Out For
- Slower than cloud APIs — ~65 seconds for 50 pages locally vs. ~6 seconds for LlamaParse
- Chart and figure extraction not yet available — listed as coming soon
- No form extraction or handwriting recognition
- Multilingual support for Arabic, Chinese, and Japanese is experimental, not enterprise-validated
- TableFormer uses a fixed batch size of 4 regardless of GPU VRAM, which underuses high-VRAM setups
- Large container images: 4.4GB (CPU) to 11.4GB (CUDA)
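
To make the fixed-batch limitation concrete, the sketch below counts how many batches a table model needs at the reported batch size of 4 compared with a larger one. The function name `forward_passes` and the alternative batch size of 32 are hypothetical and chosen only for illustration; they are not Docling settings.

```python
import math

TABLEFORMER_BATCH_SIZE = 4  # fixed batch size reported in the list above


def forward_passes(num_tables: int, batch_size: int = TABLEFORMER_BATCH_SIZE) -> int:
    """Return how many batches are needed to process num_tables tables."""
    if num_tables < 0 or batch_size <= 0:
        raise ValueError("num_tables must be >= 0 and batch_size must be > 0")
    return math.ceil(num_tables / batch_size)


if __name__ == "__main__":
    tables = 1_000
    print(forward_passes(tables))      # batches at the fixed size of 4
    print(forward_passes(tables, 32))  # batches at a hypothetical size of 32
```

On a GPU with enough memory for larger batches, the smaller batch count is where extra throughput would come from, which is why a fixed size of 4 can leave high-VRAM hardware underused.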
Pricing
Free and open-source under the MIT license, with no per-page fees.
Supported File Formats
- PDF (layout-aware, scanned via OCR)
- DOCX, PPTX, XLSX (Office documents)
- HTML, EPUB (web and ebook)
- PNG, TIFF, JPEG (images)
- WAV, MP3 (audio transcription)
- WebVTT (captions)
- EML, MSG (email)
- LaTeX, plain text
- XBRL (financial/regulatory)
- JATS (scientific articles)
Key AI Models
- TableFormer — trained on 1M+ tables for complex table extraction
- Granite-Docling-258M VLM — 258M parameter visual language model (Apache 2.0)
- Layout model trained on 81,000 manually labeled pages
- DocTags markup format preserving structure and provenance
Output Formats
- Markdown (with structure preserved)
- HTML
- JSON (DoclingDocument schema)
- Structured data via Pydantic schemas
- Bounding box metadata for citations
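
A minimal sketch of producing these outputs from Python, assuming Docling is installed (`pip install docling`) and using its `DocumentConverter` class with the `export_to_markdown`, `export_to_html`, and `export_to_dict` methods on the resulting document. The `convert` wrapper and the `EXPORTERS` mapping are hypothetical helpers added here for illustration, not part of Docling.

```python
import json
import sys

# Export method names on the converted document, keyed by a short format label.
EXPORTERS = {
    "markdown": "export_to_markdown",
    "html": "export_to_html",
    "json": "export_to_dict",
}


def convert(source: str, fmt: str = "markdown") -> str:
    """Convert a local path or URL with Docling and return it as text."""
    if fmt not in EXPORTERS:
        raise ValueError(f"unsupported format: {fmt!r}")

    # Imported lazily so this module loads even where Docling is not installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    exported = getattr(result.document, EXPORTERS[fmt])()
    # export_to_dict() returns a dict; serialize it so every format yields str.
    return json.dumps(exported, indent=2) if fmt == "json" else exported


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python convert.py <path-or-url> [markdown|html|json]
    print(convert(sys.argv[1], *sys.argv[2:3]))
```

The first conversion typically downloads model weights, so an air-gapped setup would need those fetched ahead of time.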
Integrations
- LangChain, LlamaIndex, CrewAI, Haystack (native)
- Model Context Protocol (MCP) for agentic workflows
- Red Hat AI 3.3 and OpenShift AI
- Anyscale / KubeRay for distributed processing
- Java via docling-serve REST API
Performance
- Up to 6x speedup with GPU over CPU-only
- NVIDIA CUDA, AMD ROCm, Apple Silicon MLX support
- Distributed batch processing via Ray Data
- 88.5% mAP on the DocLayNet layout analysis benchmark
Deployment Options
- Local Python library (pip install docling)
- Docker container (CPU: 4.4GB, CUDA: 11.4GB)
- Air-gapped / offline deployment
- docling-serve REST API wrapper
- OpenShift Operator for enterprise Kubernetes
How It Compares
| Feature | Docling | LlamaParse | Unstructured |
|---|---|---|---|
| Text Accuracy | 100% | Good | High |
| Complex Table Accuracy | 97.9% | Inconsistent | 75% |
| Processing Speed (50 pages) | ~65s (local) | ~6s (cloud) | ~141s |
| Cost | Free (MIT) | $0.0013–$0.056/page | $0.03/page |
| Self-hosted / Air-gapped | Yes | No | Yes (Business) |
| GPU Acceleration | Yes | N/A (cloud) | No |
| Enterprise Compliance | Self-managed | No | SOC 2, HIPAA |
| File Formats | 20+ | 130+ | 60+ |
| MCP Support | Yes | No | No |
| Best For | Accuracy + privacy | Speed + APIs | Compliance + breadth |
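
The speed and cost rows can be turned into a rough estimate for a large batch. The sketch below takes the per-50-page timings and per-page prices from the table and scales them linearly. It assumes sequential processing on one worker, uses the lowest listed LlamaParse rate, and ignores hardware and hosting costs for self-hosted tools, so treat the results as order-of-magnitude figures only. The `Tool` class and `estimate` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    seconds_per_50_pages: float  # from the "Processing Speed" row
    cost_per_page_usd: float     # from the "Cost" row (single assumed rate)


TOOLS = [
    Tool("Docling", 65.0, 0.0),        # license cost only; compute not included
    Tool("LlamaParse", 6.0, 0.0013),   # assumption: lowest listed rate
    Tool("Unstructured", 141.0, 0.03),
]


def estimate(tool: Tool, pages: int) -> tuple[float, float]:
    """Return (hours, usd) for processing `pages` pages sequentially."""
    hours = pages / 50 * tool.seconds_per_50_pages / 3600
    return hours, pages * tool.cost_per_page_usd


if __name__ == "__main__":
    for tool in TOOLS:
        hours, usd = estimate(tool, 1_000_000)
        print(f"{tool.name:<13} {hours:8.1f} h  ${usd:,.2f}")
```

For one million pages this works out to roughly 361 hours for Docling at no license cost, about 33 hours and $1,300 for LlamaParse at its lowest rate, and about 783 hours and $30,000 for Unstructured. GPU acceleration or distributed processing would shorten the Docling figure.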