Modal

Commercial · Pay-as-you-go

Serverless cloud platform for running Python code on GPUs with instant cold starts and pay-per-second billing

$30/mo Free Credits
<1s Cold Start
$4.76/hr H100 GPU

Overview

Modal is a serverless cloud platform for running Python code on GPUs without managing infrastructure. Unlike traditional cloud providers, where you rent VMs and keep paying for them while they sit idle, Modal lets you define functions that scale automatically from zero to thousands of containers in seconds. The platform is well suited to ML inference, batch processing, and data pipelines: you write ordinary Python annotated with decorators, and Modal handles containerization, orchestration, and GPU allocation. Founded by Erik Bernhardsson (creator of Luigi at Spotify), Modal emphasizes developer experience, with hot-reloading during development, built-in cron scheduling, and web endpoints generated from functions. Because billing is per second and containers scale to zero, you pay only while your code is running, which makes Modal cost-effective for bursty workloads.

The Verdict

Who Should Use Modal?

Best For

  • ML engineers deploying inference endpoints
  • Data teams running batch GPU jobs
  • Startups wanting serverless ML without DevOps
  • Bursty workloads (pay only when running)
  • Python-first teams (native SDK)

Not Ideal For

  • 24/7 persistent workloads (dedicated VMs are cheaper at sustained high utilization)
  • Non-Python codebases (Python-only)
  • Teams needing on-premise deployment
  • Calling hosted models through a simple API (Together AI or Fireworks AI are a better fit)

What's Great

  • Sub-second cold starts (vs. seconds to minutes on other platforms)
  • Pay-per-second billing with no idle costs
  • Python-native SDK with decorators
  • Generous free tier ($30/month credits)
  • Built-in GPU memory caching
  • Web endpoints auto-generated from functions
  • Hot-reloading during development
  • Excellent developer documentation

Watch Out For

  • Python-only (no Node.js, Go, etc.)
  • Higher per-hour cost than reserved instances
  • Vendor lock-in with proprietary decorators
  • Limited GPU memory on smaller GPU types (16GB on the T4)
  • Learning curve for the decorator-based programming model
r/MachineLearning · Developer feedback

Pricing


GPU Options

  • NVIDIA T4 (16GB) — $0.59/hr
  • NVIDIA L4 (24GB) — $0.80/hr
  • NVIDIA A10G (24GB) — $1.10/hr
  • NVIDIA A100 40GB — $2.78/hr
  • NVIDIA A100 80GB — $3.78/hr
  • NVIDIA H100 80GB — $4.76/hr
  • Multi-GPU configurations available
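
Because billing is per second, the cost of a short job is the hourly rate divided by 3,600 and multiplied by the runtime. The sketch below applies the rates listed above to a single job and also shows how many GPU-hours the $30 monthly credit covers. It is a rough estimate under stated assumptions: it counts only GPU time (a real bill may add CPU, memory, or storage charges), it assumes a multi-GPU configuration costs the per-GPU rate times the GPU count, and the helper names (`job_cost`, `hours_covered`) are illustrative, not part of Modal's SDK.

```python
"""Estimate per-second GPU costs from the hourly rates listed above (illustrative sketch)."""

# Hourly rates copied from the GPU Options list; check current pricing before relying on them.
GPU_RATES_PER_HOUR = {
    "T4": 0.59,
    "L4": 0.80,
    "A10G": 1.10,
    "A100-40GB": 2.78,
    "A100-80GB": 3.78,
    "H100": 4.76,
}

SECONDS_PER_HOUR = 3600


def job_cost(gpu: str, runtime_seconds: float, gpu_count: int = 1) -> float:
    """USD cost of `gpu_count` GPUs of type `gpu` running for `runtime_seconds`.

    Assumption: multi-GPU cost scales linearly with the number of GPUs.
    """
    if runtime_seconds < 0 or gpu_count < 1:
        raise ValueError("runtime must be non-negative and gpu_count at least 1")
    per_second = GPU_RATES_PER_HOUR[gpu] / SECONDS_PER_HOUR
    return per_second * runtime_seconds * gpu_count


def hours_covered(gpu: str, credit_usd: float = 30.0) -> float:
    """GPU-hours that a credit of `credit_usd` buys on a single GPU of type `gpu`."""
    return credit_usd / GPU_RATES_PER_HOUR[gpu]


if __name__ == "__main__":
    # A 90-second inference burst on one H100.
    print(f"90 s on H100: ${job_cost('H100', 90):.4f}")
    # How far the $30 monthly credit stretches on each GPU type.
    for name in GPU_RATES_PER_HOUR:
        print(f"{name:>10}: {hours_covered(name):.1f} GPU-hours per $30")
```

At these rates a 90-second burst on one H100 costs about $0.12, and the $30 credit covers roughly 6.3 H100-hours or about 51 T4-hours.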

Core Features

  • Serverless GPU functions
  • Sub-second cold starts
  • Pay-per-second billing
  • Auto-scaling to thousands of containers
  • Built-in cron scheduling
  • Web endpoints (REST & WebSocket)
  • Persistent volumes for storage
  • Secrets management

Developer Experience

  • Python SDK with decorators
  • Hot-reload during development
  • Local debugging support
  • Built-in logging & monitoring
  • CLI and dashboard
  • Git-based deployments
  • Environment snapshots

ML Features

  • GPU memory caching (fast model loads)
  • Custom container images
  • Hugging Face integration
  • PyTorch & TensorFlow support
  • vLLM & TGI compatible
  • Batch inference optimization

Company Background

  • Founded 2021 by Erik Bernhardsson
  • Bernhardsson previously created Luigi at Spotify
  • $60M+ total funding raised
  • Y Combinator backed (W22)

Use Cases

  • LLM inference endpoints
  • Image/video processing pipelines
  • Batch ML training jobs
  • Web scrapers & data pipelines

How It Compares

Feature     | Modal                | Replicate                | RunPod          | Lambda Labs
Model       | Serverless functions | API endpoints            | VM rental       | VM rental
Cold Start  | <1 second            | 5-30 seconds             | N/A (always-on) | N/A
Billing     | Per-second           | Per-prediction           | Per-hour        | Per-hour
Free Tier   | $30/mo credits       | Pay-as-you-go            | $25 credit      | None
H100 Price  | $4.76/hr             | ~$0.0023/sec (~$8.28/hr) | $4.49/hr        | $2.49/hr
Python SDK  | Native               | Yes                      | REST only       | N/A
Custom Code | Full control         | Pre-built models         | Full control    | Full control
Best For    | Bursty workloads     | Quick prototypes         | 24/7 training   | Budget training
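
The H100 prices above make the per-second versus per-hour trade-off easy to quantify. The sketch below computes how many busy hours per day it takes before renting an always-on VM becomes cheaper than paying Modal per second. It assumes the serverless side is billed only while work is running (ignoring idle time before containers scale down and any cold-start overhead) and that the VM is billed for all 24 hours of the day; the helper names are illustrative.

```python
"""Break-even busy hours per day: per-second serverless vs. an always-on hourly VM (illustrative)."""

HOURS_PER_DAY = 24


def daily_cost_serverless(rate_per_hour: float, busy_hours: float) -> float:
    """Serverless: pay only for the hours the container is actually running."""
    return rate_per_hour * busy_hours


def daily_cost_vm(rate_per_hour: float) -> float:
    """Always-on VM: billed for every hour of the day regardless of load."""
    return rate_per_hour * HOURS_PER_DAY


def break_even_hours_per_day(serverless_rate: float, vm_rate: float) -> float:
    """Busy hours/day above which the always-on VM is cheaper.

    Capped at 24: a result of 24 means the VM is never cheaper.
    """
    return min(HOURS_PER_DAY, daily_cost_vm(vm_rate) / serverless_rate)


if __name__ == "__main__":
    modal_h100 = 4.76  # $/hr, from the comparison table
    for vendor, vm_rate in [("RunPod", 4.49), ("Lambda Labs", 2.49)]:
        hours = break_even_hours_per_day(modal_h100, vm_rate)
        print(f"{vendor}: VM cheaper above {hours:.1f} busy h/day "
              f"({hours / HOURS_PER_DAY:.0%} utilization)")
```

With these prices, the Lambda Labs H100 becomes cheaper once the GPU is busy more than about 12.6 hours a day (roughly 52% utilization), while RunPod's only wins above about 22.6 hours. That arithmetic is why the 24/7 workloads listed under "Not Ideal For" tend to favor dedicated VMs.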
