Docling

Docling is an open-source document parser by IBM Research that converts PDFs, Office files, and 20+ formats into structured, LLM-ready data using layout-aware AI models with best-in-class table extraction accuracy.

61.7K GitHub Stars
97.9% Table Accuracy
100+ Releases
Free MIT License

Overview

Docling is IBM Research's open-source document intelligence toolkit, designed to get documents ready for generative AI. It converts PDFs, DOCX, PPTX, XLSX, HTML, images, audio, and more into a unified structured representation using state-of-the-art layout models and the TableFormer model, which was trained on 1M+ tables and reaches 97.9% cell accuracy on complex tables. It runs fully locally with no API costs, supports GPU acceleration, and can be deployed air-gapped, which makes it a strong choice for privacy-sensitive and cost-sensitive RAG pipelines. Donated to the Linux Foundation's Agentic AI Foundation in 2025, it has seen rapid community adoption since its public launch in August 2024.

The Verdict

Who Should Use Docling?

Best For

  • Teams building RAG pipelines that need the highest document parsing accuracy, especially complex tables
  • Organizations with compliance or privacy requirements that prohibit sending documents to cloud APIs
  • Cost-sensitive workloads processing millions of pages — zero per-page cost vs. $0.10+/page for SaaS alternatives
  • Developers in the LangChain, LlamaIndex, Haystack, or CrewAI ecosystems, all of which have native integrations
  • Teams processing scientific papers, XBRL financial filings, JATS articles, or mixed-media documents

Not Ideal For

  • Teams needing the fastest processing — cloud APIs like LlamaParse are ~10x faster (6s vs. 65s for 50 pages)
  • Handwriting recognition or form checkbox extraction — not yet supported
  • Chart and figure extraction — still listed as coming soon
  • Teams needing vendor-backed enterprise compliance certifications (SOC 2, HIPAA) from the tool provider
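
The speed gap quoted above is easier to judge as a per-page figure. The sketch below is plain arithmetic on the two timings cited in this review (65 s and 6 s per 50 pages). The function names and the 100,000-page workload are illustrative assumptions, and real throughput will vary with hardware, document mix, and parallelism.

```python
def seconds_per_page(total_seconds: float, pages: int) -> float:
    """Average processing time per page for a timed sample."""
    return total_seconds / pages


def projected_hours(workload_pages: int, total_seconds: float, sample_pages: int) -> float:
    """Linearly extrapolate a timed sample to a larger workload (single worker, no parallelism)."""
    return workload_pages * seconds_per_page(total_seconds, sample_pages) / 3600


docling_spp = seconds_per_page(65, 50)    # local Docling timing cited in this review
llamaparse_spp = seconds_per_page(6, 50)  # cloud LlamaParse timing cited in this review

print(f"Docling:    {docling_spp:.2f} s/page")
print(f"LlamaParse: {llamaparse_spp:.2f} s/page")
print(f"Ratio:      {docling_spp / llamaparse_spp:.1f}x")
print(f"100,000 pages on one Docling worker: {projected_hours(100_000, 65, 50):.1f} h")
```

Because the extrapolation is linear, it is only a rough planning aid. Batch jobs that run overnight may tolerate the slower per-page rate, while interactive uploads usually will not.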

What's Great

  • Best-in-class table extraction — 97.9% cell accuracy on complex tables (vs. 75% for Unstructured, unreliable for LlamaParse)
  • 100% text extraction accuracy on independent benchmarks
  • Fully free and open-source (MIT) — no per-page fees, no API dependency, no vendor lock-in
  • Self-hosted and air-gapped deployment for regulated industries
  • Up to 6x GPU speedup on NVIDIA CUDA, AMD ROCm, and Apple Silicon MLX
  • TableFormer model trained on 1M+ tables; layout model on 81,000 manually labeled pages
  • MCP integration for agentic AI workflows
  • IBM processed 2.1M PDFs from Common Crawl using Docling — proven at scale

Watch Out For

  • Slower than cloud APIs — ~65 seconds for 50 pages locally vs. ~6 seconds for LlamaParse
  • Chart and figure extraction not yet available — listed as coming soon
  • No form extraction or handwriting recognition
  • Multilingual support for Arabic, Chinese, and Japanese is experimental, not enterprise-validated
  • TableFormer uses a fixed batch size of 4 regardless of GPU VRAM, which is inefficient on high-VRAM setups
  • Large container images: 4.4GB (CPU) to 11.4GB (CUDA)

Supported File Formats

  • PDF (layout-aware, scanned via OCR)
  • DOCX, PPTX, XLSX (Office documents)
  • HTML, EPUB (web and ebook)
  • PNG, TIFF, JPEG (images)
  • WAV, MP3 (audio transcription)
  • WebVTT (captions)
  • EML, MSG (email)
  • LaTeX, plain text
  • XBRL (financial/regulatory)
  • JATS (scientific articles)

Key AI Models

  • TableFormer — trained on 1M+ tables for complex table extraction
  • Granite-Docling-258M VLM: a 258M-parameter vision-language model (Apache 2.0)
  • Layout model trained on 81,000 manually labeled pages
  • DocTags markup format preserving structure and provenance

Output Formats

  • Markdown (with structure preserved)
  • HTML
  • JSON (DoclingDocument schema)
  • Structured data via Pydantic schemas
  • Bounding box metadata for citations
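
As a rough illustration of how these output formats are produced, the sketch below converts one file and writes Markdown, HTML, and JSON next to each other. It assumes Docling is installed (`pip install docling`) and uses `DocumentConverter`, `export_to_markdown`, `export_to_html`, and `export_to_dict` as they appear in Docling's Python API at the time of writing; check them against your installed version. The helper names `export_paths` and `convert_and_export` and the file `report.pdf` are hypothetical.

```python
import json
from pathlib import Path


def export_paths(source: str, out_dir: str = "out") -> dict[str, Path]:
    """Map each output extension to a file path derived from the source file name."""
    stem = Path(source).stem
    return {ext: Path(out_dir) / f"{stem}.{ext}" for ext in ("md", "html", "json")}


def convert_and_export(source: str, out_dir: str = "out") -> dict[str, Path]:
    """Convert one document with Docling and write Markdown, HTML, and JSON files."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    doc = DocumentConverter().convert(source).document
    paths = export_paths(source, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    paths["md"].write_text(doc.export_to_markdown(), encoding="utf-8")
    paths["html"].write_text(doc.export_to_html(), encoding="utf-8")
    paths["json"].write_text(json.dumps(doc.export_to_dict(), indent=2), encoding="utf-8")
    return paths


if __name__ == "__main__":
    # Hypothetical input file; the first run may download model weights.
    for fmt, path in convert_and_export("report.pdf").items():
        print(fmt, "->", path)
```

The JSON export is the one to keep if bounding boxes and provenance matter downstream, since Markdown and HTML are rendering formats.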

Integrations

  • LangChain, LlamaIndex, CrewAI, Haystack (native)
  • Model Context Protocol (MCP) for agentic workflows
  • Red Hat AI 3.3 and OpenShift AI
  • Anyscale / KubeRay for distributed processing
  • Java via docling-serve REST API

Performance

  • Up to 6x speedup with GPU over CPU-only
  • NVIDIA CUDA, AMD ROCm, Apple Silicon MLX support
  • Distributed batch processing via Ray Data
  • 88.5% mAP on the DocLayNet layout analysis benchmark

Deployment Options

  • Local Python library (pip install docling)
  • Docker container (CPU: 4.4GB, CUDA: 11.4GB)
  • Air-gapped / offline deployment
  • docling-serve REST API wrapper
  • OpenShift Operator for enterprise Kubernetes

How It Compares

| Feature | Docling | LlamaParse | Unstructured |
| --- | --- | --- | --- |
| Text Accuracy | 100% | Good | High |
| Complex Table Accuracy | 97.9% | Inconsistent | 75% |
| Processing Speed (50 pages) | ~65s local | ~6s (cloud) | ~141s |
| Cost | Free (MIT) | $0.0013–$0.056/page | $0.03/page |
| Self-hosted / Air-gapped | Yes | No | Yes (Business) |
| GPU Acceleration | Yes | N/A (cloud) | No |
| Enterprise Compliance | Self-managed | No | SOC 2, HIPAA |
| File Formats | 20+ | 130+ | 60+ |
| MCP Support | Yes | No | No |
| Best For | Accuracy + privacy | Speed + APIs | Compliance + breadth |
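
To put the Cost row in context, the per-page prices in the comparison above can be projected onto a larger workload. This is a minimal sketch using only those listed prices; the 1,000,000-page volume and the names `PRICE_PER_PAGE` and `cost_range` are illustrative assumptions. Docling's zero is the license cost only, so the hardware, power, and operations time needed to self-host are not reflected.

```python
from decimal import Decimal

# Per-page list prices in USD, taken from the comparison table as (low, high).
# Docling's entry covers licensing only; self-hosting compute is NOT included.
PRICE_PER_PAGE = {
    "Docling (license only)": (Decimal("0"), Decimal("0")),
    "LlamaParse": (Decimal("0.0013"), Decimal("0.056")),
    "Unstructured": (Decimal("0.03"), Decimal("0.03")),
}


def cost_range(tool: str, pages: int) -> tuple[Decimal, Decimal]:
    """Return the (low, high) projected spend for a given page volume."""
    low, high = PRICE_PER_PAGE[tool]
    return low * pages, high * pages


for tool in PRICE_PER_PAGE:
    low, high = cost_range(tool, 1_000_000)
    print(f"{tool}: ${low:,.0f} to ${high:,.0f}")
```

`Decimal` is used instead of `float` so that small per-page prices multiply out to exact dollar amounts.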
