Phoenix
Free tier
The open-source platform for AI agent development, tracing, and evaluation
Free tier available · Technical · Powered by: Vendor Agnostic · API available · Open source
Key strengths
- Full OpenTelemetry-native tracing for LLM agents
- LLM-as-a-judge and human annotation for evaluation
- Vendor-agnostic: works with any model, framework, or language
- Self-hostable with zero data leaving your infrastructure
- End-to-end iteration loop: trace → annotate → experiment → measure
Free tier + paid plans
US
Self-hostable
- Distributed LLM tracing: Instrument multi-step agentic workflows with OpenTelemetry spans conforming to the OpenInference spec, capturing full execution context across tool calls, retrievals, and LLM completions.
- LLM-as-a-judge evaluation pipelines: Use `phoenix.evals` to run scalable automated evaluations (relevance, toxicity, hallucination, Q&A correctness) using any LLM as the judge.
- Dataset curation from traces: Export production traces into structured datasets for fine-tuning, regression testing, or benchmarking new model versions.
- A/B experimentation on prompts and retrievers: Run controlled experiments in the Prompt IDE comparing prompt variants or retrieval strategies against the same dataset using custom eval metrics.
- Self-hosted observability backend: Deploy Phoenix on-prem via Docker or Kubernetes (Helm) to keep all trace data within your own infrastructure — critical for sensitive or regulated environments.
- MCP-based agent automation: Connect coding agents to Phoenix via the MCP skill interface to programmatically instrument, query traces, and trigger evaluation runs as part of CI/CD workflows.
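
The evaluation and experimentation use cases above share one loop: run each prompt variant over the same dataset, have a judge label every answer, and compare the aggregate scores. The following is a minimal, standard-library-only sketch of that loop, not the `phoenix.evals` API. Every name in it (`Example`, `run_experiment`, `stub_model`, `stub_judge`) is hypothetical, and the model and judge are deterministic stand-ins for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """One dataset row: a question and its reference answer (hypothetical schema)."""
    question: str
    reference: str


def run_experiment(
    dataset: Sequence[Example],
    prompt_template: str,
    model: Callable[[str], str],
    judge: Callable[[str, str, str], str],
) -> float:
    """Return the fraction of examples the judge labels 'correct'."""
    labels = []
    for example in dataset:
        prompt = prompt_template.format(question=example.question)
        answer = model(prompt)
        labels.append(judge(example.question, example.reference, answer))
    return sum(label == "correct" for label in labels) / len(labels)


def stub_model(prompt: str) -> str:
    """Toy stand-in for an LLM: it only answers when asked for a one-word reply."""
    facts = {"capital of France": "Paris", "2 + 2": "4"}
    if "one word" not in prompt:
        return "I am not sure."
    for key, value in facts.items():
        if key in prompt:
            return value
    return "unknown"


def stub_judge(question: str, reference: str, answer: str) -> str:
    """Toy stand-in for an LLM judge: substring match against the reference."""
    return "correct" if reference.lower() in answer.lower() else "incorrect"


dataset = [
    Example("What is the capital of France?", "Paris"),
    Example("What is 2 + 2?", "4"),
    Example("Who wrote Hamlet?", "Shakespeare"),
]

# Two prompt variants evaluated against the same dataset and the same judge.
variants = {
    "baseline": "Question: {question}",
    "concise": "Answer in one word. Question: {question}",
}

results = {
    name: run_experiment(dataset, template, stub_model, stub_judge)
    for name, template in variants.items()
}

for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

In a real setup the two stubs would be replaced by calls to an actual model and an LLM-based judge, and the per-example labels would typically be stored alongside the traces rather than reduced to a single number.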
