deepgram.md

---
name: Deepgram
slug: deepgram
website: https://deepgram.com/
type: commercial
track: users
category: generative-media
subcategory: voice-speech
status: active
description: Enterprise voice AI platform offering speech-to-text, text-to-speech, and real-time voice agent APIs with industry-leading accuracy and speed
pricing_model: freemium
founded_year: 2015
headquarters: San Francisco, CA
last_verified: '2026-06-09'
confidence_score: 0.95
---

$86M+ Total Funding

$200 Free Credits

10+ Languages

$0.0048 Per Min STT

Overview

Deepgram is an enterprise voice AI platform that provides APIs for speech-to-text (STT), text-to-speech (TTS), and real-time voice agents. Founded in 2015 and backed by more than $86M in funding, the company offers the Nova-3 transcription model, which it positions as industry-leading in accuracy, and the Aura-2 text-to-speech engine. Its unified Voice Agent API lets developers build conversational AI that both listens and speaks in real time. Deepgram serves major enterprises including Twilio, Cloudflare, and IBM, and offers both cloud and self-hosted deployment.

The Verdict

Who Should Use Deepgram?

Best For

Developers building real-time voice applications
Contact centers needing accurate transcription
Teams requiring both STT and TTS in one platform
High-volume transcription workloads
Enterprises needing self-hosted deployment
Voice agent builders seeking low latency

Not Ideal For

Non-developers wanting turnkey solutions
Teams needing 100+ language support
Budget-constrained small projects
Voice cloning use cases (limited support)

What's Great

Industry-leading accuracy with Nova-3 model
Unified API for STT, TTS, and voice agents
True per-second billing (no rounding)
$200 free credits to start
Real-time and batch processing options
Self-hosted deployment available
Flux multilingual model supports 10+ languages
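
To make the per-second billing point concrete, here is a minimal sketch comparing a charge prorated per second with a charge rounded up to the next whole minute. The $0.0048 per-minute rate is the STT list price quoted on this page; the function names and the rounded-minute comparison scheme are illustrative assumptions, not a description of any particular vendor's billing.

```python
import math

STT_RATE_PER_MIN = 0.0048  # USD per minute, list price quoted on this page


def per_second_cost(duration_s: float, rate_per_min: float = STT_RATE_PER_MIN) -> float:
    """Charge for the exact duration, prorated per second."""
    return duration_s * rate_per_min / 60


def rounded_minute_cost(duration_s: float, rate_per_min: float = STT_RATE_PER_MIN) -> float:
    """Hypothetical comparison: charge rounded up to the next whole minute."""
    return math.ceil(duration_s / 60) * rate_per_min


# A 61-second call: per-second billing charges for 61 s,
# while minute rounding charges for 2 full minutes.
print(f"per-second:     ${per_second_cost(61):.6f}")
print(f"rounded minute: ${rounded_minute_cost(61):.6f}")
```

The gap matters most for workloads made of many short clips, where each clip would otherwise be rounded up separately.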

Watch Out For

Developer-focused—requires coding to use
Limited language support vs competitors (10+ vs 100+)
TTS quality not as natural as ElevenLabs
Voice cloning not a core offering
Enterprise pricing requires contact

Pricing

Pay As You Go

$0.0048/min

$200 free credit, STT streaming

TTS (Aura-2)

$0.030/1K chars

Natural voice synthesis

Voice Agent

$0.075/min

Real-time conversational AI

Growth

$4K+/year

Up to 20% savings, priority support

Enterprise

Custom

Volume discounts, dedicated support
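
The list prices above translate into a monthly estimate with simple arithmetic. The sketch below uses the three pay-as-you-go rates from this section; the `Usage` dataclass, the `estimate_cost` function, and the example volumes are hypothetical. The optional discount argument only models the "up to 20% savings" figure quoted for the Growth plan as an upper bound, not an actual quote.

```python
from dataclasses import dataclass

# List prices quoted in this section (USD).
STT_PER_MIN = 0.0048       # streaming speech-to-text, per minute
TTS_PER_1K_CHARS = 0.030   # Aura-2 text-to-speech, per 1,000 characters
AGENT_PER_MIN = 0.075      # Voice Agent API, per minute
FREE_CREDIT = 200.0        # one-time free credit


@dataclass
class Usage:
    """Hypothetical monthly usage figures."""
    stt_minutes: float = 0.0
    tts_chars: int = 0
    agent_minutes: float = 0.0


def estimate_cost(usage: Usage, discount: float = 0.0, credit: float = 0.0) -> float:
    """Return the estimated charge in USD after discount and credit."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    gross = (
        usage.stt_minutes * STT_PER_MIN
        + usage.tts_chars / 1000 * TTS_PER_1K_CHARS
        + usage.agent_minutes * AGENT_PER_MIN
    )
    # Never return a negative charge when the credit exceeds the bill.
    return max(0.0, gross * (1 - discount) - credit)


usage = Usage(stt_minutes=100_000, tts_chars=2_000_000, agent_minutes=1_000)
print(f"list price:             ${estimate_cost(usage):,.2f}")
print(f"with 20% discount:      ${estimate_cost(usage, discount=0.20):,.2f}")
print(f"after $200 free credit: ${estimate_cost(usage, credit=FREE_CREDIT):,.2f}")
```

With these example volumes the list-price total is $615.00: $480 for STT, $60 for TTS, and $75 for voice agent minutes.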

Speech-to-Text

Nova-3 model with Arabic support
Real-time streaming transcription
Batch processing for recordings
Speaker diarization
Punctuation and formatting
Custom vocabulary support
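
As a rough illustration of how the features listed above are typically switched on, the sketch below builds (but does not send) a transcription request for a recording. The `/v1/listen` endpoint, the `Token` authorization scheme, and the `model`, `diarize`, `punctuate`, and `smart_format` query parameters are assumptions based on Deepgram's public REST API and should be verified against the current documentation. The `nova-3` model identifier, the helper name, the API key, and the audio URL are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.deepgram.com/v1/listen"  # assumed endpoint; check current docs


def build_transcription_request(api_key: str, audio_url: str) -> Request:
    """Build a POST request asking for a diarized, punctuated transcript."""
    params = {
        "model": "nova-3",       # assumed model identifier
        "diarize": "true",       # speaker diarization
        "punctuate": "true",     # punctuation
        "smart_format": "true",  # formatting of numbers, dates, and similar
    }
    # The recording is referenced by URL in a JSON body (assumed request shape).
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return Request(
        url=f"{API_BASE}?{urlencode(params)}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_transcription_request("YOUR_API_KEY", "https://example.com/call.wav")
print(req.get_method(), req.full_url)
```

Nothing is sent over the network here. Passing the request to `urllib.request.urlopen` would require a valid API key and a reachable audio URL.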

Text-to-Speech

Aura-2 voice synthesis
Natural-sounding output
Multiple voice options
Streaming audio generation
Low-latency responses

Voice Agent API

Unified listening + speaking
Real-time conversational AI
Sub-100ms latency possible
Turn-taking management
Interrupt handling
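
Turn-taking and interrupt handling can be pictured as a small state machine. The sketch below is a conceptual model only, not Deepgram's Voice Agent API: the state names, event names, and the `AgentTurnManager` class are all hypothetical. It shows the core rule of barge-in: if the user starts speaking while the agent is preparing or playing a reply, that reply is dropped and the agent returns to listening.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()  # waiting for or receiving user speech
    THINKING = auto()   # user finished; agent is preparing a reply
    SPEAKING = auto()   # agent audio is playing


class AgentTurnManager:
    """Hypothetical turn-taking model with barge-in support."""

    def __init__(self) -> None:
        self.state = State.LISTENING
        self.interruptions = 0

    def on_event(self, event: str) -> State:
        if event == "user_started_speaking":
            if self.state in (State.SPEAKING, State.THINKING):
                # Barge-in: drop the pending or playing reply and listen again.
                self.interruptions += 1
            self.state = State.LISTENING
        elif event == "user_stopped_speaking" and self.state is State.LISTENING:
            self.state = State.THINKING
        elif event == "reply_ready" and self.state is State.THINKING:
            self.state = State.SPEAKING
        elif event == "reply_finished" and self.state is State.SPEAKING:
            self.state = State.LISTENING
        # Any other event/state combination is ignored.
        return self.state


manager = AgentTurnManager()
for event in [
    "user_started_speaking",
    "user_stopped_speaking",
    "reply_ready",
    "user_started_speaking",  # user interrupts the agent mid-reply
]:
    print(f"{event:>24} -> {manager.on_event(event).name}")
```

A production agent would also need timeouts and audio-level signals such as voice activity detection; this model covers only the state transitions.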

Languages (Flux Model)

English, Spanish, German
French, Hindi, Russian
Portuguese, Japanese
Italian, Dutch
Arabic (Nova-3)

Deployment

Cloud-hosted API
Self-hosted on-premise
REST API access
WebSocket streaming
Python/Node.js SDKs
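
Streaming over WebSocket generally means sending audio in small frames rather than as one large upload. The sketch below shows only the framing step for raw PCM audio and opens no connection. The 16 kHz, 16-bit mono format, the 100 ms frame length, and the `pcm_frames` function name are illustrative assumptions; the accepted encodings and recommended frame sizes should be taken from Deepgram's streaming documentation.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed sample rate
BYTES_PER_SAMPLE = 2     # 16-bit linear PCM
CHANNELS = 1             # mono


def pcm_frames(audio: bytes, frame_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive frames of `frame_ms` milliseconds; the last may be shorter."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * frame_ms // 1000
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


# One second of silence (32,000 bytes) plus a short 500-byte tail.
audio = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE + 500)
frames = list(pcm_frames(audio))
print(len(frames), "frames;", len(frames[0]), "bytes each;", len(frames[-1]), "bytes in the last")
```

Each frame would then be sent as a binary WebSocket message by whichever client library the application uses.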

Enterprise

SOC 2 compliance
HIPAA compliance
Custom concurrency limits
Volume discounts
Dedicated support

How It Compares

Feature	Deepgram	AssemblyAI	OpenAI Whisper	Google STT
STT Accuracy	Best-in-class	Excellent	Very Good	Good
TTS Built-in	Yes	No	No	Yes
Voice Agents	Native API	No	No	No
Languages	10+	50+	100+	125+
Self-Hosted	Yes	No	Yes	No
Per-Second Billing	Yes	Yes	Yes	Yes
Free Credits	$200	$50	Open-source	$300
