Distributed LLM tracing: Instrument multi-step agentic workflows with OpenTelemetry spans that follow the OpenInference semantic conventions, capturing full execution context across tool calls, retrievals, and LLM completions.
LLM-as-a-judge evaluation pipelines: Use phoenix.evals to run automated evaluations at scale (relevance, toxicity, hallucination, Q&A correctness), with an LLM of your choice acting as the judge.
Dataset curation from traces: Export production traces into structured datasets for fine-tuning, regression testing, or benchmarking new model versions.
A/B experimentation on prompts and retrievers: Run controlled experiments that compare prompt variants or retrieval strategies against the same dataset and score them with custom eval metrics; prompt variants can also be compared side by side in the Prompt Playground.
Self-hosted observability backend: Deploy Phoenix on-prem via Docker or Kubernetes (Helm) to keep all trace data within your own infrastructure, which is critical for sensitive or regulated environments.
MCP-based agent automation: Connect coding agents to Phoenix through its MCP (Model Context Protocol) interface so they can programmatically instrument applications, query traces, and trigger evaluation runs as part of CI/CD workflows.
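The LLM-as-a-judge pattern in the list above can be sketched without any Phoenix dependency. This is not the phoenix.evals API. It is a minimal, hypothetical illustration that assumes the judge is any callable taking a prompt string and returning the model's raw reply. The names TEMPLATE, LABELS, snap_to_label, evaluate, and fake_judge are all illustrative, and fake_judge is a deterministic stand-in for a real LLM call.

```python
from typing import Callable, Optional

# Hypothetical judge interface: prompt in, raw model reply out.
Judge = Callable[[str], str]

# Illustrative hallucination-check prompt (not a Phoenix built-in template).
TEMPLATE = (
    "Question: {question}\n"
    "Reference: {reference}\n"
    "Answer: {answer}\n"
    "Is the answer consistent with the reference? "
    "Reply with exactly one word: factual or hallucinated."
)
LABELS = ("factual", "hallucinated")


def snap_to_label(reply: str) -> Optional[str]:
    """Normalize the judge's reply; return None if it is not an allowed label."""
    text = reply.strip(" .!\n").lower()
    return text if text in LABELS else None


def evaluate(rows: list[dict[str, str]], judge: Judge) -> list[Optional[str]]:
    """Ask the judge about each row and collect one label (or None) per row."""
    return [snap_to_label(judge(TEMPLATE.format(**row))) for row in rows]


def fake_judge(prompt: str) -> str:
    """Deterministic stand-in for an LLM: checks whether the answer appears in the reference."""
    answer = prompt.split("Answer: ")[1].split("\n")[0]
    reference = prompt.split("Reference: ")[1].split("\n")[0]
    return "Factual." if answer.lower() in reference.lower() else "hallucinated"


rows = [
    {"question": "Capital of France?", "reference": "Paris is the capital of France.", "answer": "Paris"},
    {"question": "Capital of France?", "reference": "Paris is the capital of France.", "answer": "Lyon"},
]
print(evaluate(rows, fake_judge))  # ['factual', 'hallucinated']
```

Returning None for an unparseable reply, rather than guessing a label, keeps malformed judge output visible in the results instead of silently counting it as a pass or a fail.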
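Turning traces into a dataset, as in the tracing and dataset-curation items above, amounts to filtering spans and lifting their inputs and outputs into rows. The sketch below assumes exported spans are plain dicts carrying a trace_id and an attributes mapping; that record shape and the names Example and spans_to_examples are hypothetical, and this is not Phoenix's export API. The attribute keys openinference.span.kind, input.value, and output.value come from the OpenInference semantic conventions.

```python
from dataclasses import dataclass
from typing import Any

# Attribute keys from the OpenInference semantic conventions.
SPAN_KIND = "openinference.span.kind"
INPUT_VALUE = "input.value"
OUTPUT_VALUE = "output.value"


@dataclass
class Example:
    """One dataset row: the recorded input, the recorded output, and its source trace."""
    input: str
    output: str
    trace_id: str


def spans_to_examples(spans: list[dict[str, Any]], kind: str = "LLM") -> list[Example]:
    """Keep spans of the requested kind that recorded both an input and an output."""
    examples = []
    for span in spans:
        attrs = span.get("attributes", {})
        if attrs.get(SPAN_KIND) != kind:
            continue
        if INPUT_VALUE not in attrs or OUTPUT_VALUE not in attrs:
            continue  # skip incomplete spans rather than inventing values
        examples.append(Example(attrs[INPUT_VALUE], attrs[OUTPUT_VALUE], span["trace_id"]))
    return examples


# Hypothetical exported spans (the dict shape is an assumption for illustration).
spans = [
    {"trace_id": "t1", "attributes": {SPAN_KIND: "LLM", INPUT_VALUE: "What is RAG?",
                                      OUTPUT_VALUE: "Retrieval-augmented generation."}},
    {"trace_id": "t1", "attributes": {SPAN_KIND: "RETRIEVER", INPUT_VALUE: "What is RAG?"}},
    {"trace_id": "t2", "attributes": {SPAN_KIND: "LLM", INPUT_VALUE: "Hi"}},
]
dataset = spans_to_examples(spans)
print(len(dataset))  # 1
```

Skipping spans that lack an output, instead of filling in a placeholder, keeps incomplete production traffic out of a regression or fine-tuning set.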
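The A/B experimentation item above boils down to scoring each variant on the identical dataset with the same metric. This is a minimal sketch, assuming each variant is a callable from a dataset input to a model output. run_ab, contains_expected, and the canned replies standing in for real prompt variants are hypothetical, and this is not Phoenix's experiments API.

```python
from statistics import mean
from typing import Callable

Variant = Callable[[str], str]  # hypothetical: dataset input -> model output
Metric = Callable[[str, str], float]  # (output, expected) -> score in [0, 1]


def contains_expected(output: str, expected: str) -> float:
    """Custom metric: 1.0 if the expected answer appears in the output, else 0.0."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def run_ab(dataset: list[dict[str, str]], variants: dict[str, Variant],
           metric: Metric) -> dict[str, float]:
    """Score every variant on the same rows so the mean scores are directly comparable."""
    return {
        name: mean(metric(variant(row["input"]), row["expected"]) for row in dataset)
        for name, variant in variants.items()
    }


# Canned replies stand in for two prompt variants; a real run would call an LLM.
REPLIES_A = {"2+2": "The answer is 4", "capital of France": "Lyon"}
REPLIES_B = {"2+2": "4", "capital of France": "Paris, of course"}


def prompt_a(question: str) -> str:
    return REPLIES_A[question]


def prompt_b(question: str) -> str:
    return REPLIES_B[question]


examples = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
print(run_ab(examples, {"prompt_a": prompt_a, "prompt_b": prompt_b}, contains_expected))
# {'prompt_a': 0.5, 'prompt_b': 1.0}
```

The containment check is deliberately crude; swapping in an LLM-judge metric changes only the metric argument, not the comparison loop.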

Phoenix