Modal

commercial Pay-as-you-go

Serverless cloud platform for running Python code on GPUs with sub-second cold starts and pay-per-second billing

Tags: API available, Python, serverless

$30/mo Free Credits

<1s Cold Start

$4.76/hr H100 GPU

Overview

Modal is a serverless cloud platform for running Python code on GPUs without managing infrastructure. With a traditional cloud provider you rent VMs and pay for idle time; with Modal you define functions that scale automatically from zero to thousands of containers in seconds. The platform is best suited to ML inference, batch processing, and data pipelines: you write Python with decorators, and Modal handles containerization, orchestration, and GPU allocation. Founded by Erik Bernhardsson (creator of Luigi at Spotify), Modal emphasizes developer experience, with hot-reloading during development, built-in cron scheduling, and web endpoints generated from functions. Because billing is per second, you pay only while your code runs, which makes it cost-effective for bursty workloads.

The Verdict

Who Should Use Modal?

Best For

ML engineers deploying inference endpoints
Data teams running batch GPU jobs
Startups wanting serverless ML without DevOps
Bursty workloads (pay only when running)
Python-first teams (native SDK)

Not Ideal For

24/7 persistent workloads (dedicated VMs cheaper)
Non-Python codebases (Python-only)
Teams needing on-premise deployment
Simple calls to hosted model APIs (use Together AI or Fireworks AI instead)

What's Great

Sub-second cold starts (vs. seconds to minutes on other platforms)
Pay-per-second billing with no idle costs
Python-native SDK with decorators
Generous free tier ($30/month credits)
Built-in GPU memory caching
Web endpoints auto-generated from functions
Hot-reloading during development
Excellent developer documentation

Modal · Docs

Watch Out For

Python-only (no Node.js, Go, etc.)
Higher per-hour cost than reserved instances
Vendor lock-in with proprietary decorators
Limited GPU memory on smaller tiers (the T4 has 16GB)
Learning curve for decorator-based model

r/MachineLearning · Developer feedback

Pricing

Free Tier

$30/mo credits

Generous free compute monthly

A10G GPU

$1.10/hr

24GB VRAM, good for inference

A100 40GB

$2.78/hr

Training & large model inference

H100 80GB

$4.76/hr

Frontier models, fastest inference


GPU Options

NVIDIA T4 (16GB) — $0.59/hr
NVIDIA L4 (24GB) — $0.80/hr
NVIDIA A10G (24GB) — $1.10/hr
NVIDIA A100 40GB — $2.78/hr
NVIDIA A100 80GB — $3.78/hr
NVIDIA H100 80GB — $4.76/hr
Multi-GPU configurations available
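
To make per-second billing concrete, the sketch below converts the hourly rates listed above into a cost per job. It is a rough estimate only: it assumes the GPU rate is the sole charge (CPU, memory, and storage are ignored), and the function names and the `gpu_count` parameter are illustrative, not part of Modal's SDK.

```python
# Hourly GPU rates (USD) copied from the GPU Options list above.
HOURLY_RATE_USD = {
    "T4": 0.59,
    "L4": 0.80,
    "A10G": 1.10,
    "A100-40GB": 2.78,
    "A100-80GB": 3.78,
    "H100": 4.76,
}

SECONDS_PER_HOUR = 3600


def job_cost(gpu: str, seconds: float, gpu_count: int = 1) -> float:
    """Estimate the GPU cost of a job billed per second.

    Assumption: only the GPU rate is charged; other resources are ignored.
    """
    if seconds < 0 or gpu_count < 1:
        raise ValueError("seconds must be >= 0 and gpu_count must be >= 1")
    per_second = HOURLY_RATE_USD[gpu] / SECONDS_PER_HOUR
    return per_second * seconds * gpu_count


def hours_covered_by_credits(gpu: str, credits_usd: float = 30.0) -> float:
    """GPU-hours that a given credit balance buys at the listed rate."""
    return credits_usd / HOURLY_RATE_USD[gpu]


if __name__ == "__main__":
    # A 90-second inference burst on one A10G.
    print(f"90 s on A10G: ${job_cost('A10G', 90):.4f}")
    # How far the $30 monthly credit goes on each GPU type.
    for name in HOURLY_RATE_USD:
        print(f"{name}: {hours_covered_by_credits(name):.1f} GPU-hours per $30")
```

Under these assumptions, the $30 monthly credit covers about 27 hours on an A10G or about 6 hours on an H100.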

Core Features

Serverless GPU functions
Sub-second cold starts
Pay-per-second billing
Auto-scaling to 1000s of containers
Built-in cron scheduling
Web endpoints (REST & WebSocket)
Persistent volumes for storage
Secrets management

Developer Experience

Python SDK with decorators
Hot-reload during development
Local debugging support
Built-in logging & monitoring
CLI and dashboard
Git-based deployments
Environment snapshots

ML Features

GPU memory caching (fast model loads)
Custom container images
Hugging Face integration
PyTorch & TensorFlow support
vLLM & TGI compatible
Batch inference optimization

Company Background

Founded 2021 by Erik Bernhardsson
Creator of Luigi (Spotify)
$60M+ total funding raised
Y Combinator backed (W22)

Modal About

Use Cases

LLM inference endpoints
Image/video processing pipelines
Batch ML training jobs
Web scrapers & data pipelines

Modal Examples

How It Compares

Feature	Modal	Replicate	RunPod	Lambda Labs
Model	Serverless functions	API endpoints	VM rental	VM rental
Cold Start	<1 second	5-30 seconds	N/A (always-on)	N/A
Billing	Per-second	Per-prediction	Per-hour	Per-hour
Free Tier	$30/mo	Pay-as-you-go	$25 credit	None
H100 Price	$4.76/hr	~$0.0023/sec (~$8.28/hr)	$4.49/hr	$2.49/hr
Python SDK	Native	Yes	REST only	N/A
Custom Code	Full control	Pre-built models	Full control	Full control
Best For	Bursty workloads	Quick prototypes	24/7 training	Budget training
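
The H100 prices in the table use mixed units, and the "bursty vs. 24/7" trade-off depends on how busy the GPU actually is. The sketch below normalizes the rates to dollars per hour and computes the utilization at which an always-on VM becomes cheaper than per-second billing. It assumes the VM is billed for every hour of the month whether or not it is busy, that the serverless platform bills only for busy time, and a 730-hour month; the function names are illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours

# H100 prices taken from the comparison table above.
MODAL_H100_HOURLY = 4.76
RUNPOD_H100_HOURLY = 4.49
LAMBDA_H100_HOURLY = 2.49
REPLICATE_H100_PER_SECOND = 0.0023


def per_second_to_hourly(rate_per_second: float) -> float:
    """Convert a per-second rate to an hourly rate."""
    return rate_per_second * 3600


def break_even_utilization(serverless_hourly: float, vm_hourly: float) -> float:
    """Fraction of time the GPU must be busy before the always-on VM is cheaper."""
    return vm_hourly / serverless_hourly


def monthly_cost_serverless(hourly: float, busy_hours: float) -> float:
    """Serverless cost: pay only for the hours the code is running."""
    return hourly * busy_hours


def monthly_cost_vm(hourly: float) -> float:
    """Always-on VM cost: pay for every hour in the month."""
    return hourly * HOURS_PER_MONTH


if __name__ == "__main__":
    print(f"Replicate H100: ${per_second_to_hourly(REPLICATE_H100_PER_SECOND):.2f}/hr")
    for name, vm_rate in [("RunPod", RUNPOD_H100_HOURLY), ("Lambda Labs", LAMBDA_H100_HOURLY)]:
        frac = break_even_utilization(MODAL_H100_HOURLY, vm_rate)
        print(f"Modal vs {name}: break-even at {frac:.0%} utilization")
    # Example: 100 busy hours per month on Modal vs. an always-on Lambda Labs VM.
    print(f"Modal, 100 busy hours: ${monthly_cost_serverless(MODAL_H100_HOURLY, 100):.2f}")
    print(f"Lambda Labs, always on: ${monthly_cost_vm(LAMBDA_H100_HOURLY):.2f}")
```

With the listed prices, an always-on Lambda Labs H100 becomes cheaper than Modal once the GPU is busy roughly 52% of the time; below that, per-second billing costs less.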
