Modal
High-performance AI infrastructure with sub-second cold starts and instant autoscaling
Free tier available · Technical · API available
Key strengths
- Sub-second cold starts with instant container boot times
- Autoscale from 0 to 1000+ GPUs on demand with no capacity planning
- Python-native SDK: define infrastructure and logic in a single file
- Full support for inference, training, sandboxes, and batch processing
- SOC2 & HIPAA compliant with battle-tested isolation and data residency controls
Free tier + paid plans · from $30 USD/mo
San Francisco, USA
Founded 2021
Developer Documentation
Installation & Authentication
pip install modal
modal token new # authenticates via browser
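After running the two commands above, a quick local check can confirm that the SDK is importable and that credentials are in place. The following is a minimal sketch using only the standard library. It assumes that credentials live either in the `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables or in a `~/.modal.toml` file written by `modal token new`; neither detail is stated in this listing, so verify both against the official docs. The helper name `modal_setup_status` is hypothetical.

```python
import importlib.util
import os
from pathlib import Path


def modal_setup_status(env=None, home=None) -> dict:
    """Report whether the Modal SDK and credentials appear to be present.

    Assumptions (not confirmed by this listing): tokens are read from the
    MODAL_TOKEN_ID / MODAL_TOKEN_SECRET environment variables or from a
    ~/.modal.toml config file.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    return {
        # True when `pip install modal` has made the package importable
        "sdk_installed": importlib.util.find_spec("modal") is not None,
        # True only when both token variables are set and non-empty
        "env_token": bool(env.get("MODAL_TOKEN_ID") and env.get("MODAL_TOKEN_SECRET")),
        # True when the assumed config file exists in the home directory
        "config_file": (home / ".modal.toml").is_file(),
    }


if __name__ == "__main__":
    for check, ok in modal_setup_status().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```

The `env` and `home` parameters exist only so the check can be exercised against a fake environment; in normal use, call the function with no arguments.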
Defining a GPU Function
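import modal

app = modal.App("my-inference-app")

# Container image with the Python packages the function needs
image = modal.Image.debian_slim().pip_install("torch", "transformers")

@app.function(gpu="H100", image=image)
def run_inference(prompt: str) -> str:
    # your model logic here; the echo below is only a placeholder
    result = f"echo: {prompt}"
    return result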
Key Primitives
- @app.function(): decorates any Python function to run remotely; accepts gpu, image, timeout, concurrency_limit, and more.
- modal.Image: defines the container environment (base OS, pip packages, system dependencies, custom Dockerfiles).
- modal.Sandbox: programmatically spins up ephemeral, isolated execution environments for running untrusted or agent-generated code.
- modal.web_endpoint(): exposes a function as an HTTPS endpoint with built-in support for streaming, WebSocket, and WebRTC.
Scaling & Deployment
- Autoscaling is handled automatically; set the allow_concurrent_inputs and keep_warm parameters for latency-sensitive workloads.
- Multi-node training uses modal.Cluster with gang scheduling and InfiniBand networking configured in a single line.
- Secrets and environment variables are managed via modal.Secret, injectable at the function level.
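To illustrate the last point, the sketch below shows how function code might read a secret once it has been injected. It assumes that an attached secret surfaces inside the container as ordinary environment variables, which this listing does not spell out. The variable name `HF_TOKEN`, the secret name `my-secret`, and the helper `read_injected_secret` are all hypothetical, and the Modal wiring appears only in a comment so the block runs without the SDK.

```python
import os

# Hypothetical key stored in a Modal secret. The secret would be attached to
# the function roughly like this (assumed wiring, not executed here):
#   @app.function(secrets=[modal.Secret.from_name("my-secret")])
SECRET_ENV_VAR = "HF_TOKEN"


def read_injected_secret(name: str = SECRET_ENV_VAR) -> str:
    """Return a secret from the environment, failing loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; was the secret attached to this function?"
        )
    return value
```

Reading the value at call time, rather than at import time, keeps the module importable on a local machine where the secret is not present.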
