RunPod
The AI Developer Cloud: experiment, train, fine-tune, deploy, and scale on one platform
Free tier available · Technical · API available
Key strengths
- Sub-200ms cold starts with FlashBoot, with no warm-up tax
- 30+ GPU SKUs across 31 global regions
- Autoscaling from 0 to thousands of workers in under 250ms
- Zero idle cost on Serverless endpoints
- Full AI lifecycle: pods, serverless, and multi-node clusters in one account
Free tier + paid plans
Moorestown, United States
Technical Setup & API Usage
Serverless Endpoint (Handler Pattern)
Write a Python handler function, package it in a Docker image, and deploy that image to RunPod Serverless:
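```python
import runpod


def handler(job):
    job_input = job["input"]
    # Your inference logic here. `my_model` is a placeholder for a model
    # that you load once at module level, outside the handler.
    result = my_model.predict(job_input["prompt"])
    return {"output": result}


# Register the handler and start the serverless worker.
runpod.serverless.start({"handler": handler})
```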
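Because the handler is a plain Python function that receives a job dictionary, it can be exercised locally before any image is built. The sketch below assumes the same `job["input"]["prompt"]` shape as the handler above. `EchoModel` is a hypothetical stand-in for a real model, and returning a dictionary with an `error` key for bad input is an assumed convention that should be checked against the RunPod SDK documentation.

```python
class EchoModel:
    """Hypothetical stand-in for a real model; it only upper-cases the prompt."""

    def predict(self, prompt: str) -> str:
        return prompt.upper()


my_model = EchoModel()


def handler(job: dict) -> dict:
    """Validate the job payload, then run inference on the prompt."""
    job_input = job.get("input") or {}
    prompt = job_input.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        # Assumed convention: report bad input in the result instead of raising.
        return {"error": "input.prompt must be a non-empty string"}
    return {"output": my_model.predict(prompt)}


# Call the handler directly with fake jobs; no RunPod account is needed.
print(handler({"input": {"prompt": "hello"}}))
print(handler({"input": {}}))
```

Keeping validation inside the handler means malformed requests fail fast with a readable message rather than surfacing as an exception from the model code.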
Key Deployment Steps
- Containerize your model with any framework (PyTorch, TensorFlow, JAX, etc.) using a standard Dockerfile.
- Push the image to a container registry (Docker Hub, GHCR, etc.) and reference it in the RunPod console or via the API.
- Configure autoscaling: set min/max worker counts, concurrency, and execution timeout per endpoint.
- Cold start optimization: RunPod's FlashBoot technology reduces cold start times to under 200 ms, which removes the need for keep-warm workarounds.
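The autoscaling settings named in the steps above are easy to get inconsistent, for example a minimum worker count above the maximum. One way to catch that before touching the console is to keep the settings in a small local record and validate it. This is only a sketch: `EndpointConfig`, its field names, its default values, and the example image name are all hypothetical and do not reflect RunPod's actual API schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical local record of endpoint settings; not RunPod's API schema."""

    image: str
    min_workers: int = 0
    max_workers: int = 3
    concurrency: int = 1
    execution_timeout_s: int = 600

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings are consistent."""
        problems = []
        if not self.image:
            problems.append("image must not be empty")
        if self.min_workers < 0:
            problems.append("min_workers must be >= 0")
        if self.max_workers < max(self.min_workers, 1):
            problems.append("max_workers must be >= 1 and >= min_workers")
        if self.concurrency < 1:
            problems.append("concurrency must be >= 1")
        if self.execution_timeout_s <= 0:
            problems.append("execution_timeout_s must be positive")
        return problems


# Example with a made-up image name: scale between 0 and 5 workers.
config = EndpointConfig(image="docker.io/example/my-model:latest", max_workers=5)
print(config.validate())
```

Returning a list of problems rather than raising on the first one lets a deployment script report every misconfiguration in a single pass.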
REST API
All Serverless endpoints expose a standard REST interface:
```bash
curl -X POST https://api.runpod.ai/v2/{endpoint_id}/run \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "Hello, world!"}}'
```
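The same request can be issued from Python with only the standard library. The sketch below mirrors the curl call: same URL pattern, same two headers, same JSON body. The function names `build_run_request` and `submit_job` and the 30-second timeout are illustrative choices, and the shape of the JSON response is not described here, so the code returns it undecorated.

```python
import json
import os
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def build_run_request(endpoint_id: str, api_key: str, job_input: dict) -> urllib.request.Request:
    """Build the POST request for the /run route without sending it."""
    body = json.dumps({"input": job_input}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{endpoint_id}/run",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_job(endpoint_id: str, job_input: dict, timeout_s: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    # The API key is read from the environment, as in the curl example.
    request = build_run_request(endpoint_id, os.environ["RUNPOD_API_KEY"], job_input)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Calling `submit_job("your-endpoint-id", {"prompt": "Hello, world!"})` sends the job. Splitting request construction from sending keeps the first function testable without network access or a real key.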
Key Parameters
| Parameter | Description |
|---|---|
| `RUNPOD_API_KEY` | Auth token from your RunPod dashboard |
| `endpoint_id` | Unique ID for your Serverless endpoint |
| `input` | JSON payload passed to your handler |
| `min_workers` / `max_workers` | Autoscaling bounds |
