deepgram.md

---
name: Deepgram
slug: deepgram
website: https://deepgram.com/
type: commercial
track: users
category: generative-media
subcategory: voice-speech
status: active
description: Enterprise voice AI platform offering speech-to-text, text-to-speech, and real-time voice agent APIs with industry-leading accuracy and speed
pricing_model: freemium
founded_year: 2015
headquarters: San Francisco, CA
last_verified: '2026-06-09'
confidence_score: 0.95
---

$86M+ Total Funding
$200 Free Credits
10+ Languages
$0.0048 Per Min STT

Overview

Deepgram is an enterprise-grade voice AI platform that provides APIs for speech-to-text (STT), text-to-speech (TTS), and real-time voice agents. Founded in 2015 and backed by more than $86M in funding, the company offers the Nova-3 transcription model, with industry-leading accuracy, and the Aura-2 text-to-speech engine. Its unified Voice Agent API lets developers build conversational AI that handles both listening and speaking in real time. Deepgram's customers include major enterprises such as Twilio, Cloudflare, and IBM, and the platform is available both as a cloud service and as a self-hosted deployment.

The Verdict

Who Should Use Deepgram?

Best For

  • Developers building real-time voice applications
  • Contact centers needing accurate transcription
  • Teams requiring both STT and TTS in one platform
  • High-volume transcription workloads
  • Enterprises needing self-hosted deployment
  • Voice agent builders seeking low latency

Not Ideal For

  • Non-developers wanting turnkey solutions
  • Teams needing 100+ language support
  • Budget-constrained small projects
  • Voice cloning use cases (limited support)

What's Great

  • Industry-leading accuracy with Nova-3 model
  • Unified API for STT, TTS, and voice agents
  • True per-second billing (no rounding)
  • $200 free credits to start
  • Real-time and batch processing options
  • Self-hosted deployment available
  • Flux multilingual model supports 10+ languages
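To make the billing points concrete, the sketch below works through the arithmetic using only the two figures quoted in this listing ($0.0048 per minute of STT and $200 in free credits). The rounded-to-the-minute scheme is a hypothetical comparison point, not a claim about any specific competitor, and the function names are illustrative.

```python
from decimal import Decimal, ROUND_CEILING

RATE_PER_MINUTE = Decimal("0.0048")  # STT rate quoted in this listing
FREE_CREDITS = Decimal("200")        # free credits quoted in this listing


def per_second_cost(seconds: int, rate: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Cost when each second is billed at 1/60 of the per-minute rate."""
    return rate * Decimal(seconds) / Decimal(60)


def rounded_minute_cost(seconds: int, rate: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Cost under a hypothetical scheme that rounds every call up to a whole minute."""
    minutes = (Decimal(seconds) / Decimal(60)).to_integral_value(rounding=ROUND_CEILING)
    return rate * minutes


def minutes_covered(credits: Decimal = FREE_CREDITS, rate: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Minutes of audio the credits would cover at the quoted rate."""
    return credits / rate


if __name__ == "__main__":
    print(per_second_cost(95))       # a 95-second call billed by the second
    print(rounded_minute_cost(95))   # the same call billed as 2 full minutes
    print(int(minutes_covered()))    # whole minutes covered by the free credits
```

At the quoted rate, a 95-second call costs $0.0076 when billed by the second versus $0.0096 when rounded up to two minutes, and the free credits cover roughly 41,666 minutes (about 694 hours) of audio.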

Watch Out For

  • Developer-focused: requires coding to use
  • Limited language support vs competitors (10+ vs 100+)
  • TTS quality not as natural as ElevenLabs
  • Voice cloning not a core offering
  • Enterprise pricing requires contact

Pricing

Speech-to-Text

  • Nova-3 model with Arabic support
  • Real-time streaming transcription
  • Batch processing for recordings
  • Speaker diarization
  • Punctuation and formatting
  • Custom vocabulary support
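As a rough illustration of how the batch features above map onto an API call, here is a minimal sketch that builds (but does not send) a transcription request and groups a diarized response into speaker turns. The endpoint `https://api.deepgram.com/v1/listen`, the `model`, `punctuate`, and `diarize` query parameters, the `Token` authorization scheme, and the response layout reflect the public API as I understand it and should be checked against the current Deepgram documentation; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed pre-recorded transcription endpoint; verify against the official docs.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(api_key: str, audio_url: str, *, diarize: bool = True) -> Request:
    """Build a POST request asking for a Nova-3 transcript of hosted audio."""
    params = {
        "model": "nova-3",
        "punctuate": "true",
        "diarize": "true" if diarize else "false",
    }
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return Request(
        f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


def speaker_turns(response: dict) -> list[tuple[int, str]]:
    """Merge consecutive words from the same speaker into (speaker, text) turns.

    Assumes the word list lives at results.channels[0].alternatives[0].words
    and that each word may carry "speaker" and "punctuated_word" fields.
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns: list[tuple[int, str]] = []
    for word in words:
        speaker = word.get("speaker", 0)
        text = word.get("punctuated_word", word["word"])
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, f"{turns[-1][1]} {text}")
        else:
            turns.append((speaker, text))
    return turns
```

Passing the built request to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the URL and headers easy to inspect before any credits are spent.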

Text-to-Speech

  • Aura-2 voice synthesis
  • Natural-sounding output
  • Multiple voice options
  • Streaming audio generation
  • Low-latency responses

Voice Agent API

  • Unified listening + speaking
  • Real-time conversational AI
  • Sub-100ms latency possible
  • Turn-taking management
  • Interrupt handling
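Turn-taking and interrupt handling can be pictured as a small state machine. The sketch below is a conceptual illustration only: it is not Deepgram's implementation, and the class, state, and event names are hypothetical rather than part of the Voice Agent API.

```python
from enum import Enum, auto


class Turn(Enum):
    LISTENING = auto()        # the agent is waiting for or hearing the user
    AGENT_SPEAKING = auto()   # the agent is playing synthesized audio


class TurnManager:
    """Tracks whose turn it is and detects barge-in (user interrupts the agent)."""

    def __init__(self) -> None:
        self.state = Turn.LISTENING
        self.interruptions = 0

    def on_agent_audio_start(self) -> None:
        self.state = Turn.AGENT_SPEAKING

    def on_agent_audio_end(self) -> None:
        self.state = Turn.LISTENING

    def on_user_speech_start(self) -> bool:
        """Return True when the user has interrupted the agent mid-utterance.

        A real client would stop audio playback at this point.
        """
        if self.state is Turn.AGENT_SPEAKING:
            self.interruptions += 1
            self.state = Turn.LISTENING
            return True
        return False
```

In a unified voice agent API, this bookkeeping is handled by the service; a client mainly needs to react to the interruption signal by discarding any queued audio.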

Languages (Flux Model)

  • English, Spanish, German
  • French, Hindi, Russian
  • Portuguese, Japanese
  • Italian, Dutch
  • Arabic (Nova-3)

Deployment

  • Cloud-hosted API
  • Self-hosted on-premise
  • REST API access
  • WebSocket streaming
  • Python/Node.js SDKs
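One practical consequence of these deployment options is that client code should not hard-code the host or protocol. The sketch below shows one way to derive REST and WebSocket URLs from a single configuration object. The cloud host `api.deepgram.com` and the `/v1/listen` path reflect the public API as I understand it; the assumption that a self-hosted deployment exposes the same path, and the example internal hostname, are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Endpoint:
    """Where transcription requests are sent: cloud by default, self-hosted if overridden."""

    host: str = "api.deepgram.com"  # cloud API host
    secure: bool = True             # use TLS (https/wss) when True
    path: str = "/v1/listen"        # assumed to match on self-hosted deployments

    def _url(self, scheme: str, params: dict) -> str:
        query = f"?{urlencode(params)}" if params else ""
        return f"{scheme}://{self.host}{self.path}{query}"

    def rest_url(self, **params: str) -> str:
        """URL for batch (pre-recorded) requests over HTTP."""
        return self._url("https" if self.secure else "http", params)

    def websocket_url(self, **params: str) -> str:
        """URL for real-time streaming over WebSocket."""
        return self._url("wss" if self.secure else "ws", params)


cloud = Endpoint()
# Hypothetical on-premise host; substitute the address of your own deployment.
on_prem = Endpoint(host="stt.internal.example:8080", secure=False)
```

With this shape, switching from the cloud API to an on-premise instance is a configuration change rather than a code change.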

Enterprise

  • SOC 2 compliance
  • HIPAA compliance
  • Custom concurrency limits
  • Volume discounts
  • Dedicated support

How It Compares

| Feature | Deepgram | AssemblyAI | OpenAI Whisper | Google STT |
|---|---|---|---|---|
| STT Accuracy | Best-in-class | Excellent | Very Good | Good |
| TTS Built-in | Yes | No | No | Yes |
| Voice Agents | Native API | No | No | No |
| Languages | 10+ | 50+ | 100+ | 125+ |
| Self-Hosted | Yes | No | Yes | No |
| Per-Second Billing | Yes | Per-second | Per-second | Per-second |
| Free Credits | $200 | $50 | Open-source | $300 |