Modal is an AI-native cloud runtime engineered from the ground up for GPU-heavy workloads. Its Python SDK allows developers to specify hardware requirements, dependencies, and business logic in a single composable code file, which is then executed on Modal's globally distributed GPU fleet (H100s, A100s, A10Gs, B200s). The platform supports online inference with sub-10ms overhead latency, multi-node training with up to 128 B200s connected via 3200 Gbps Infiniband, and programmatically spawned ephemeral sandboxes for agent rollouts. Integrated observability, token streaming, WebSocket/WebRTC support, and automatic autoscaling (0 to 1000+ GPUs) are all built into the stack.