Vespa is a distributed AI search platform with native tensor computation support, enabling complex machine-learned model inference directly within the retrieval layer — eliminating the need for separate ranking services. It supports hybrid retrieval combining BM25 text search, approximate nearest neighbor (ANN) vector search, and structured filtering in a single query pass. Vespa's architecture provides automated horizontal scalability, continuous deployment with zero downtime, and a streaming search mode for per-user data that avoids expensive global indexing. The platform integrates with AWS and is available as a managed cloud service (Vespa Cloud) or as an open-source engine deployable on any infrastructure.