Technical Documentation & Integration Guide

Installation

pip install clearml
clearml-init  # Configure credentials against your ClearML server

Experiment Tracking (SDK)

from clearml import Task

task = Task.init(project_name="My Project", task_name="Training Run")
# After Task.init, ClearML automatically captures console output, the Git commit and uncommitted diff,
# installed packages, and (for supported frameworks) hyperparameters, metrics, and model artifacts
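
Values that automatic capture does not pick up can be reported explicitly. The following is a minimal sketch, assuming a configured ClearML server: the toy train loop, the parameter names, and the loss_history artifact name are hypothetical and exist only for illustration, while Task.connect, Logger.report_scalar, and Task.upload_artifact are the SDK calls being shown.

```python
def train(params, report):
    """Toy stand-in for a training loop (hypothetical, not part of ClearML).

    Calls report(iteration, loss) once per epoch and returns the loss history.
    """
    loss, history = 1.0, []
    for epoch in range(params["epochs"]):
        loss *= 1.0 - params["learning_rate"]  # pretend the loss decays geometrically
        report(epoch, loss)
        history.append(loss)
    return history


def track_run():
    """Run the toy loop under a ClearML task; needs a reachable ClearML server."""
    from clearml import Task  # imported here so `train` stays usable without ClearML

    task = Task.init(project_name="My Project", task_name="Training Run")

    # connect() registers the dict as hyperparameters on the task
    params = task.connect({"learning_rate": 0.1, "epochs": 5})

    logger = task.get_logger()
    history = train(
        params,
        lambda i, loss: logger.report_scalar(
            title="loss", series="train", value=loss, iteration=i
        ),
    )

    # Store the loss history as a named artifact on the task
    task.upload_artifact(name="loss_history", artifact_object={"loss": history})
    task.close()
    return history
```

Calling track_run() should produce a task with a "loss" plot and a loss_history artifact. Keeping the loop separate from the reporting callback is a design choice for this sketch so the training logic can be run without a server.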

Key Technical Components

Infrastructure Control Plane: REST API-driven GPU orchestration across Kubernetes, bare-metal, and cloud VMs. Features include dynamic fractional GPU allocation, multi-tenant isolation (separate networks and storage per tenant), priority-based job queues, and chargeback billing by compute hours, storage, and API calls.
AI Development Center: A self-service workbench with integrated IDE access, data versioning, DAG-based automated ML pipelines, a model registry, hyperparameter optimization (HPO), and CI/CD integration hooks.
GenAI App Engine: Deploys LLM inference endpoints onto GPU clusters with built-in auth, monitoring, vector DB tooling, and feedback collection APIs. Supports custom models and fine-tuning workflows.
ClearML Agent: A daemon that pulls tasks from queues and executes them remotely, enabling fully reproducible remote training and experiment orchestration.
Data Management: clearml-data CLI and SDK for versioned dataset management with built-in deduplication and remote storage support (S3, GCS, Azure Blob).
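
As a rough sketch of how the Data Management and ClearML Agent components are used from the SDK, the functions below version a local folder as a dataset and hand a task to an agent queue. The function names, the dataset_cls and task_cls parameters (present only so the flow can be exercised without a server), and the "default" queue name are assumptions for illustration; the SDK calls themselves are Dataset.create, add_files, upload, finalize, Dataset.get, and Task.execute_remotely.

```python
def publish_dataset(folder, project, name, output_url=None, dataset_cls=None):
    """Create, upload, and finalize one dataset version; return its ID."""
    if dataset_cls is None:
        from clearml import Dataset  # lazy import: the sketch loads without ClearML

        dataset_cls = Dataset
    dataset = dataset_cls.create(dataset_project=project, dataset_name=name)
    dataset.add_files(path=folder)
    # output_url may point at remote storage such as an s3:// bucket;
    # None falls back to the server's default file storage
    dataset.upload(output_url=output_url)
    dataset.finalize()  # close this version so it can no longer be modified
    return dataset.id


def fetch_dataset(project, name, dataset_cls=None):
    """Return a local, cached, read-only copy of the dataset."""
    if dataset_cls is None:
        from clearml import Dataset

        dataset_cls = Dataset
    return dataset_cls.get(dataset_project=project, dataset_name=name).get_local_copy()


def run_on_agent(queue_name="default", task_cls=None):
    """Register the current script as a task and send it to an agent queue."""
    if task_cls is None:
        from clearml import Task

        task_cls = Task
    task = task_cls.init(project_name="My Project", task_name="Remote Training Run")
    # Run locally, this enqueues the task and exits the local process;
    # when an agent later runs the script, the call is skipped and execution continues
    task.execute_remotely(queue_name=queue_name)
    return task
```

For run_on_agent to have any effect, an agent has to be serving the queue, for example one started with clearml-agent daemon --queue default.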

Self-Hosting

Deploy the ClearML server with Docker Compose, or with the Helm chart on Kubernetes. Community and Enterprise server options are available, with differing SLAs and support tiers.
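
Once a self-hosted server is running, the SDK can be pointed at it in code instead of through clearml-init. The sketch below is illustrative only: it assumes the default ports of the Docker Compose deployment (web UI 8080, API 8008, file server 8081), reads credentials from the CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY environment variables, and uses hypothetical helper names. A Helm deployment behind an ingress will usually expose different URLs.

```python
import os


def server_urls(host, scheme="http"):
    """Build the three endpoints of a self-hosted server.

    Assumes the default Docker Compose ports; adjust for your deployment.
    """
    return {
        "web_host": f"{scheme}://{host}:8080",
        "api_host": f"{scheme}://{host}:8008",
        "files_host": f"{scheme}://{host}:8081",
    }


def connect_to_server(host):
    """Point the SDK at the given server, then start a task on it."""
    from clearml import Task  # lazy import: server_urls works without ClearML

    # set_credentials must be called before Task.init
    Task.set_credentials(
        key=os.environ["CLEARML_API_ACCESS_KEY"],
        secret=os.environ["CLEARML_API_SECRET_KEY"],
        **server_urls(host),
    )
    return Task.init(project_name="My Project", task_name="Self-Hosted Check")
```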