Distributed agent tracing — Instrument multi-step agent workflows with span-level tracing to inspect every LLM call, tool invocation, and retrieval step individually.
Automated online scoring — Configure continuous scorers that evaluate production outputs in real time, triggering alerts or quality gates when scores drop below thresholds.
Prompt & model experimentation — Run versioned eval experiments against curated datasets, comparing prompts and model configurations with reproducible, side-by-side scoring.
Trace-to-dataset pipeline — Automatically promote production traces to labeled datasets for regression testing, closing the loop between observability and evaluation.
MCP server integration — Connect coding agents (e.g., Cursor) to Braintrust via the MCP server to query logs, run evals, and push prompt updates directly from the IDE.
Custom facet & annotation UIs — Define business-specific dimensions (compliance, tone, customer segment) and build task-specific annotation interfaces without any frontend engineering.
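The online-scoring item above can be illustrated with a small, generic sketch of a score-based quality gate. This is not the Braintrust SDK. `OnlineQualityGate`, its rolling-mean rule, and the parameters `threshold`, `window`, and `min_samples` are hypothetical, chosen only to show how a continuous scorer's outputs might trip an alert once they drop below a threshold. Scores are assumed to be normalized to [0, 1].

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class OnlineQualityGate:
    """Hypothetical rolling-window quality gate over production scores.

    Illustrative only: not a Braintrust API. Assumes each scorer emits a
    float in [0, 1] and that "quality dropped" means the rolling mean of
    the most recent scores fell below `threshold`.
    """
    threshold: float       # minimum acceptable rolling mean, e.g. 0.8
    window: int = 50       # how many recent scores to average
    min_samples: int = 10  # suppress alerts until this many scores are seen
    _scores: deque = field(init=False)

    def __post_init__(self) -> None:
        # A bounded deque drops the oldest score automatically once full.
        self._scores = deque(maxlen=self.window)

    def record(self, score: float) -> bool:
        """Add one scorer output; return True if the gate is now tripped."""
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {score}")
        self._scores.append(score)
        return self.tripped()

    def mean(self) -> float:
        """Rolling mean of the scores currently in the window."""
        return sum(self._scores) / len(self._scores) if self._scores else 0.0

    def tripped(self) -> bool:
        """True once enough samples exist and the rolling mean is too low."""
        return len(self._scores) >= self.min_samples and self.mean() < self.threshold


if __name__ == "__main__":
    gate = OnlineQualityGate(threshold=0.8, window=5, min_samples=3)
    for s in [0.9, 0.95, 0.9, 0.4, 0.3]:
        if gate.record(s):
            # In practice this is where an alert or deployment gate would fire.
            print(f"ALERT: rolling mean {gate.mean():.3f} below {gate.threshold}")
```

Averaging over a window keeps a single poor output from firing an alert, while `min_samples` avoids alerts before enough production traffic has been scored. How the tripped state is routed (notification, blocking a release) would depend on the alerting mechanism the platform actually provides.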

Braintrust