LightningTalk: MultiRay: An Accelerated Embedding Service for Content Understanding

Description

We present an overview of MultiRay, our high-performance embedding service. It offers a shared inference architecture that produces embeddings for content using a small set of foundation models shared by all use cases. At present, we run three embedding services: TextRay for text understanding, ImageRay for image understanding, and PostRay for multi-modal whole-post understanding (text, image, etc.). The system serves over 800B requests daily, with up to 20M queries per second, across more than 125 use cases.
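
To put the traffic figures in perspective, the short sketch below converts the stated daily volume into an average request rate and compares it with the stated peak. It assumes, purely for illustration, that "over 800B" is taken as exactly 800 billion, that "up to 20M" is taken as exactly 20 million, and that "requests" and "queries" count the same thing. The description confirms none of these, so the results are rough lower-bound estimates rather than reported figures.

```python
# Back-of-the-envelope check of the traffic figures quoted in the description.
# Assumptions (not stated in the source): exactly 800e9 requests per day,
# exactly 20e6 queries per second at peak, and requests == queries.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(requests_per_day: float) -> float:
    """Average queries per second implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


daily_requests = 800e9  # "over 800B requests daily"
peak_qps = 20e6         # "up to 20M queries per second"

avg = average_qps(daily_requests)
peak_to_avg = peak_qps / avg

print(f"average: {avg / 1e6:.2f}M QPS")       # average: 9.26M QPS
print(f"peak/average: {peak_to_avg:.2f}")     # peak/average: 2.16
```

Under these assumptions the average load is roughly 9.26M queries per second, so the quoted peak is a little more than twice the average.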
