Developer Documentation

Deployment

Baseten supports deploying models via its Python SDK (truss) or directly through the web dashboard. Models are packaged as Truss containers — a standardized model-serving format that bundles weights, dependencies, and a serving config.

pip install truss
truss init my_model
truss push --trusted

Key Configuration Parameters

resources.accelerator — specify GPU type (e.g., A100, H100)
runtime.predict_concurrency — set per-replica concurrency for batching
autoscaling.min_replica / max_replica — control horizontal scaling bounds
build.base_image — define the base Docker image for your model environment

APIs & Integrations

REST API — each deployed model gets a unique HTTPS endpoint; authenticate with a bearer token.
Baseten Chains — define multi-step compound AI pipelines with per-node GPU allocation and autoscaling using a Python-native DSL.
Baseten Embeddings Inference (BEI) — drop-in high-throughput embeddings API with OpenAI-compatible interface.
Frontier Gateway — monetize and distribute your custom model via a Baseten-powered inference API.
Training SDK (Baseten Loops) — train frontier RL models and deploy them to inference-optimized infra in one click.