Developer Documentation

Installation & Authentication

pip install modal
modal token new  # authenticates via browser

Defining a GPU Function

import modal

app = modal.App("my-inference-app")
image = modal.Image.debian_slim().pip_install("torch", "transformers")

@app.function(gpu="H100", image=image)
def run_inference(prompt: str) -> str:
    # your model logic here
    return result

Key Primitives

@app.function() — decorates any Python function to run remotely; accepts gpu, image, timeout, concurrency_limit, and more.
modal.Image — defines the container environment (base OS, pip packages, system dependencies, custom Dockerfiles).
modal.Sandbox — programmatically spins up ephemeral, isolated execution environments for running untrusted or agent-generated code.
modal.web_endpoint() — exposes a function as an HTTPS endpoint with built-in support for streaming, WebSocket, and WebRTC.

Scaling & Deployment

Autoscaling is handled automatically; set allow_concurrent_inputs and keep_warm parameters for latency-sensitive workloads.
Multi-node training uses modal.Cluster with gang scheduling and Infiniband networking configured in a single line.
Secrets and environment variables are managed via modal.Secret, injectable at function level.

Modal

Developer Documentation

Installation & Authentication

Defining a GPU Function

Key Primitives

Scaling & Deployment