Modal
Serverless cloud platform for running Python code on GPUs with sub-second cold starts and pay-per-second billing
Overview
Modal is a serverless cloud platform for running Python code on GPUs without managing infrastructure. With a traditional cloud provider you rent VMs and pay for idle time; with Modal you define functions that scale automatically from zero to thousands of containers in seconds. The platform is strongest at ML inference, batch processing, and data pipelines: you write Python functions with decorators, and Modal handles containerization, orchestration, and GPU allocation. Founded by Erik Bernhardsson (creator of Luigi at Spotify), Modal emphasizes developer experience, with hot-reloading during development, built-in cron scheduling, and web endpoints generated from functions. Because billing is per second, you pay only while your code runs, which makes it cost-effective for bursty workloads.
The Verdict
Who Should Use Modal?
Best For
- ML engineers deploying inference endpoints
- Data teams running batch GPU jobs
- Startups wanting serverless ML without DevOps
- Bursty workloads (pay only when running)
- Python-first teams (native SDK)
Not Ideal For
- 24/7 persistent workloads (dedicated VMs cheaper)
- Non-Python codebases (Python-only)
- Teams needing on-premise deployment
- Simple calls to hosted model APIs (use Together/Fireworks)
What's Great
- Sub-second cold starts (vs. several seconds to minutes elsewhere)
- Pay-per-second billing with no idle costs
- Python-native SDK with decorators
- Generous free tier ($30/month credits)
- Built-in GPU memory caching
- Web endpoints auto-generated from functions
- Hot-reloading during development
- Excellent developer documentation
Watch Out For
- Python-only (no Node.js, Go, etc.)
- Higher per-hour cost than reserved instances
- Vendor lock-in with proprietary decorators
- Limited GPU memory on smaller tiers (16GB on T4, 24GB on L4 and A10G)
- Learning curve for decorator-based model
Pricing
GPU Options
- NVIDIA T4 (16GB) — $0.59/hr
- NVIDIA L4 (24GB) — $0.80/hr
- NVIDIA A10G (24GB) — $1.10/hr
- NVIDIA A100 40GB — $2.78/hr
- NVIDIA A100 80GB — $3.78/hr
- NVIDIA H100 80GB — $4.76/hr
- Multi-GPU configurations available
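Because billing is per second, the hourly rates above translate directly into a per-run cost. The following is a minimal sketch of that arithmetic, using only the prices listed here (which may differ from Modal's current pricing) and ignoring any CPU, memory, or storage charges. The function names are illustrative and are not part of Modal's SDK.

```python
# Hourly GPU rates copied from the list above (USD per hour).
HOURLY_RATE_USD = {
    "T4": 0.59,
    "L4": 0.80,
    "A10G": 1.10,
    "A100-40GB": 2.78,
    "A100-80GB": 3.78,
    "H100": 4.76,
}

SECONDS_PER_HOUR = 3600


def run_cost(gpu: str, seconds: float) -> float:
    """Cost in USD of keeping one GPU busy for `seconds`, billed per second."""
    return HOURLY_RATE_USD[gpu] / SECONDS_PER_HOUR * seconds


def hours_covered(gpu: str, credit_usd: float = 30.0) -> float:
    """GPU-hours a credit balance buys (default: the $30/month free tier)."""
    return credit_usd / HOURLY_RATE_USD[gpu]


if __name__ == "__main__":
    # Print the cost of a 90-second job and the free-tier runway per GPU type.
    for gpu in HOURLY_RATE_USD:
        print(
            f"{gpu:10s} 90 s run: ${run_cost(gpu, 90):.4f}   "
            f"$30 credit: {hours_covered(gpu):.1f} h"
        )
```

At these rates, a 90-second H100 job comes to about $0.12, and the $30 monthly credit covers roughly 6.3 H100-hours or about 50.8 T4-hours.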
Core Features
- Serverless GPU functions
- Sub-second cold starts
- Pay-per-second billing
- Auto-scaling to 1000s of containers
- Built-in cron scheduling
- Web endpoints (REST & WebSocket)
- Persistent volumes for storage
- Secrets management
Developer Experience
- Python SDK with decorators
- Hot-reload during development
- Local debugging support
- Built-in logging & monitoring
- CLI and dashboard
- Git-based deployments
- Environment snapshots
ML Features
- GPU memory caching (fast model loads)
- Custom container images
- Hugging Face integration
- PyTorch & TensorFlow support
- vLLM & TGI compatible
- Batch inference optimization
Company Background
- Founded 2021 by Erik Bernhardsson
- Creator of Luigi (Spotify)
- $60M+ total funding raised
- Y Combinator backed (W22)
Use Cases
- LLM inference endpoints
- Image/video processing pipelines
- Batch ML training jobs
- Web scrapers & data pipelines
How It Compares
| Feature | Modal | Replicate | RunPod | Lambda Labs |
|---|---|---|---|---|
| Service Model | Serverless functions | API endpoints | VM rental | VM rental |
| Cold Start | <1 second | 5-30 seconds | N/A (always-on) | N/A (always-on) |
| Billing | Per-second | Per-prediction | Per-hour | Per-hour |
| Free Tier | $30/mo | Pay-as-you-go | $25 credit | None |
| H100 Price | $4.76/hr | ~$0.0023/sec (~$8.28/hr) | $4.49/hr | $2.49/hr |
| Python SDK | Native | Yes | REST only | N/A |
| Custom Code | Full control | Pre-built models | Full control | Full control |
| Best For | Bursty workloads | Quick prototypes | 24/7 training | Budget training |
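The split between "bursty" and "24/7" in the table comes down to a break-even utilization: per-second billing wins only while the GPU is busy for less than the ratio of the always-on rate to the serverless rate. The following is a rough sketch using the H100 prices from the table. It assumes the VM is billed around the clock and ignores cold starts, storage, and egress. The helper names are illustrative.

```python
HOURS_PER_DAY = 24

# H100 hourly prices taken from the comparison table (USD per hour).
MODAL_H100 = 4.76
RUNPOD_H100 = 4.49
LAMBDA_H100 = 2.49


def break_even_utilization(serverless_rate: float, vm_rate: float) -> float:
    """Fraction of the day a GPU must be busy before an always-on VM at
    `vm_rate` becomes cheaper than per-second billing at `serverless_rate`.
    Capped at 1.0 for the case where the VM is the more expensive option."""
    return min(vm_rate / serverless_rate, 1.0)


def daily_cost_serverless(rate: float, busy_hours: float) -> float:
    """Daily cost when only the busy hours are billed."""
    return rate * busy_hours


def daily_cost_vm(rate: float) -> float:
    """Daily cost of a VM that is billed whether or not it is busy."""
    return rate * HOURS_PER_DAY


if __name__ == "__main__":
    for name, vm_rate in [("RunPod", RUNPOD_H100), ("Lambda Labs", LAMBDA_H100)]:
        hours = break_even_utilization(MODAL_H100, vm_rate) * HOURS_PER_DAY
        print(f"Modal vs {name}: break-even at {hours:.2f} busy hours/day")
```

At the listed prices, Modal is the cheaper H100 option against Lambda Labs while the GPU is busy for less than about 12.6 hours a day, and against RunPod below about 22.6 hours a day. Above those thresholds an always-on VM costs less.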