---
name: Deepgram
slug: deepgram
website: https://deepgram.com/
type: commercial
track: users
category: generative-media
subcategory: voice-speech
status: active
description: Enterprise voice AI platform offering speech-to-text, text-to-speech, and real-time voice agent APIs with industry-leading accuracy and speed
pricing_model: freemium
founded_year: 2015
headquarters: San Francisco, CA
last_verified: '2026-06-09'
confidence_score: 0.95
---
Overview
Deepgram is an enterprise voice AI platform that provides APIs for speech-to-text (STT), text-to-speech (TTS), and real-time voice agents. Founded in 2015 and backed by more than $86M in funding, the company offers the Nova-3 transcription model, positioned as a leader in accuracy, and the Aura-2 text-to-speech engine. Its unified Voice Agent API lets developers build conversational AI that handles both listening and speaking in real time. Deepgram's customers include Twilio, Cloudflare, and IBM, and the platform is available as a cloud service or as a self-hosted deployment.
The Verdict
Who Should Use Deepgram?
Best For
- Developers building real-time voice applications
- Contact centers needing accurate transcription
- Teams requiring both STT and TTS in one platform
- High-volume transcription workloads
- Enterprises needing self-hosted deployment
- Voice agent builders seeking low latency
Not Ideal For
- Non-developers wanting turnkey solutions
- Teams needing 100+ language support
- Budget-constrained small projects
- Voice cloning use cases (limited support)
What's Great
- Industry-leading accuracy with Nova-3 model
- Unified API for STT, TTS, and voice agents
- True per-second billing (no rounding)
- $200 free credits to start
- Real-time and batch processing options
- Self-hosted deployment available
- Flux multilingual model supports 10+ languages
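
To make the per-second billing point concrete, the sketch below compares charging for the exact seconds used with rounding every call up to the next whole minute. The rate of $0.0060 per minute is a made-up placeholder, not a quoted Deepgram price, and the function names are illustrative only.

```python
from decimal import Decimal

# Hypothetical rate for illustration only; check the vendor's pricing page.
RATE_PER_MINUTE = Decimal("0.0060")


def per_second_cost(duration_seconds: int, rate_per_minute: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Charge for exactly the seconds used."""
    return rate_per_minute * Decimal(duration_seconds) / Decimal(60)


def per_minute_rounded_cost(duration_seconds: int, rate_per_minute: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Charge as if every call were rounded up to the next whole minute."""
    billed_minutes = (duration_seconds + 59) // 60  # integer ceiling
    return rate_per_minute * Decimal(billed_minutes)


# Short calls are where rounding hurts most: 1,000 calls of 61 seconds each.
calls = [61] * 1000
exact = sum(per_second_cost(s) for s in calls)
rounded = sum(per_minute_rounded_cost(s) for s in calls)
```

With these placeholder numbers, `exact` comes to 6.10 and `rounded` to 12.00, so rounding nearly doubles the bill for calls that run just over a minute.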
Watch Out For
- Developer-focused: requires coding to use
- Limited language support vs competitors (10+ vs 100+)
- TTS quality not as natural as ElevenLabs
- Voice cloning not a core offering
- Enterprise pricing requires contact
Pricing
Speech-to-Text
- Nova-3 model with Arabic support
- Real-time streaming transcription
- Batch processing for recordings
- Speaker diarization
- Punctuation and formatting
- Custom vocabulary support
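
As a rough illustration of how diarized output might be consumed, the sketch below groups consecutive words by speaker label. The sample payload is hand-written, and its shape (`results.channels[].alternatives[].words[]` with `speaker` and `punctuated_word` fields) is an assumption based on Deepgram's documented JSON; confirm it against a real response before relying on it.

```python
from itertools import groupby

# Hand-written sample shaped like a diarized transcription response.
# This is an assumption for illustration, not real API output.
sample_response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "words": [
                            {"punctuated_word": "Hi,", "speaker": 0},
                            {"punctuated_word": "thanks", "speaker": 0},
                            {"punctuated_word": "for", "speaker": 0},
                            {"punctuated_word": "calling.", "speaker": 0},
                            {"punctuated_word": "I", "speaker": 1},
                            {"punctuated_word": "need", "speaker": 1},
                            {"punctuated_word": "help.", "speaker": 1},
                        ]
                    }
                ]
            }
        ]
    }
}


def speaker_turns(response: dict) -> list[tuple[int, str]]:
    """Group consecutive words with the same speaker label into turns."""
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        # Prefer the punctuated form; fall back to the raw word if absent.
        text = " ".join(w.get("punctuated_word", w.get("word", "")) for w in group)
        turns.append((speaker, text))
    return turns


turns = speaker_turns(sample_response)
```

Each entry in `turns` pairs a speaker index with the text of one uninterrupted turn, which is a convenient shape for rendering a contact-center transcript.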
Text-to-Speech
- Aura-2 voice synthesis
- Natural-sounding output
- Multiple voice options
- Streaming audio generation
- Low-latency responses
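
A text-to-speech call could look roughly like the sketch below, which builds the HTTP request without sending it. The `/v1/speak` endpoint, the `Token` authorization scheme, and the `aura-2-thalia-en` voice name are assumptions drawn from Deepgram's public documentation as understood here; `build_speak_request` is a hypothetical helper name.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint; verify against the current API reference before use.
SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(
    text: str, api_key: str, model: str = "aura-2-thalia-en"
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    return urllib.request.Request(
        url=f"{SPEAK_URL}?{urlencode({'model': model})}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


speak_request = build_speak_request("Your order has shipped.", api_key="YOUR_API_KEY")

# To send it and save the returned audio bytes (format depends on the
# encoding requested):
# with urllib.request.urlopen(speak_request) as resp, open("speech_output", "wb") as f:
#     f.write(resp.read())
```

Keeping request construction separate from sending makes it easy to inspect or log the call before spending credits.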
Voice Agent API
- Unified listening + speaking
- Real-time conversational AI
- Sub-100ms latency possible
- Turn-taking management
- Interrupt handling
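
Turn-taking and interrupt handling can be pictured as a small state machine. The sketch below is a conceptual toy, not Deepgram's implementation or API: the state names, event names, and `TurnManager` class are all hypothetical.

```python
from enum import Enum


class AgentState(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


class TurnManager:
    """Toy turn-taking model; event names are hypothetical, not a vendor API."""

    def __init__(self) -> None:
        self.state = AgentState.LISTENING
        self.interruptions = 0

    def on_event(self, event: str) -> AgentState:
        if event == "user_started_speaking":
            # Barge-in: if the agent is preparing or delivering a reply,
            # count the interruption and hand the floor back to the user.
            if self.state in (AgentState.THINKING, AgentState.SPEAKING):
                self.interruptions += 1
            self.state = AgentState.LISTENING
        elif event == "user_finished_speaking" and self.state is AgentState.LISTENING:
            self.state = AgentState.THINKING
        elif event == "agent_reply_ready" and self.state is AgentState.THINKING:
            self.state = AgentState.SPEAKING
        elif event == "agent_finished_speaking" and self.state is AgentState.SPEAKING:
            self.state = AgentState.LISTENING
        return self.state


manager = TurnManager()
history = [
    manager.on_event(event).value
    for event in (
        "user_finished_speaking",  # user ends a turn
        "agent_reply_ready",       # agent begins talking
        "user_started_speaking",   # user cuts in mid-reply
    )
]
```

After this sequence the agent is back in the listening state with one recorded interruption; a hosted voice agent API manages this kind of bookkeeping on the developer's behalf.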
Languages (Flux Model)
- English, Spanish, German
- French, Hindi, Russian
- Portuguese, Japanese
- Italian, Dutch
- Arabic (Nova-3)
Deployment
- Cloud-hosted API
- Self-hosted, on-premises
- REST API access
- WebSocket streaming
- Python/Node.js SDKs
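
The deployment options above can be sketched with one URL builder that covers REST, WebSocket streaming, and a self-hosted host. The `/v1/listen` path, the `Token` authorization scheme, and the query parameter names are assumptions based on Deepgram's public documentation as understood here; the self-hosted hostname is a placeholder, and a real self-hosted install may use a different scheme or port.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed cloud hostname; self-hosted deployments would substitute their own.
CLOUD_HOST = "api.deepgram.com"


def listen_url(host: str = CLOUD_HOST, streaming: bool = False, **params: str) -> str:
    """Return the REST (https) or streaming (wss) transcription URL."""
    scheme = "wss" if streaming else "https"
    query = urlencode(params)
    return f"{scheme}://{host}/v1/listen" + (f"?{query}" if query else "")


def build_batch_request(
    audio_url: str, api_key: str, host: str = CLOUD_HOST
) -> urllib.request.Request:
    """Build (but do not send) a request to transcribe a hosted recording."""
    return urllib.request.Request(
        url=listen_url(host, model="nova-3", punctuate="true", diarize="true"),
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


batch_request = build_batch_request("https://example.com/call.wav", api_key="YOUR_API_KEY")
stream_url = listen_url(
    streaming=True, model="nova-3", encoding="linear16", sample_rate="16000"
)
# Hypothetical internal hostname standing in for a self-hosted deployment.
onprem_url = listen_url(host="deepgram.internal.example", model="nova-3")
```

The official Python and Node.js SDKs wrap these details; building the URLs by hand is mainly useful for seeing what the SDKs send or for working in a language without an SDK.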
Enterprise
- SOC 2 compliance
- HIPAA compliance
- Custom concurrency limits
- Volume discounts
- Dedicated support
How It Compares
| Feature | Deepgram | AssemblyAI | OpenAI Whisper | Google STT |
|---|---|---|---|---|
| STT Accuracy | Best-in-class | Excellent | Very Good | Good |
| TTS Built-in | Yes | No | No | Yes |
| Voice Agents | Native API | No | No | No |
| Languages | 10+ | 50+ | 100+ | 125+ |
| Self-Hosted | Yes | No | Yes | No |
| Per-Second Billing | Yes | Yes | Yes | Yes |
| Free Credits | $200 | $50 | Open-source | $300 |