Contribute Media
A thank you to everyone who makes this possible: Read More

Query Embeddings: Web Scale Search powered by Deep Learning and Python

Description

Ankit Bahuguna - Query Embeddings: Web Scale Search powered by Deep Learning and Python [EuroPython 2016] [18 July 2016] [Bilbao, Euskadi, Spain] (https://ep2016.europython.eu//conference/talks/query-embeddings)

Query Embeddings is an unsupervised deep learning based system, built using Python and open source libraries (Annoy, keyvi etc.) which recognizes similarity between queries and their vector representations, for a web scale search engine integrated within Cliqz browser [https://cliqz.com/en]. It improves recall for previously unseen queries and is one of the many key components of our search stack. The framework be utilized by other low latency systems involving vector representations.


A web search engine allows a user to type few words of query and it presents list of potential relevant results within fraction of a second. Traditionally, keywords in the user query were fuzzy-matched in realtime with the keywords within different pages of the index and they didn't really focus on understanding meaning of query. Recently, Deep Learning + NLP techniques try to represent sentences or documents as fixed dimensional vectors in high dimensional space. These special vectors inherit semantics of the document.

Query embeddings is an unsupervised deep learning based system, built using Python, Word2Vec, Annoy and Keyvi (https://github.com/cliqz- oss/keyvi) which recognizes similarity between queries and their vectors for a web scale search engine within Cliqz browser. (https://cliqz.com/en)

alternate text

The goal is to describe how query embeddings contribute to our existing python search stack at scale and latency issues prevailing in real time search system. Also is a preview of separate vector index for queries, utilized by retrieval system at runtime via ANNs to get closest queries to user query, which is one of the many key components of our search stack.

alternate text

Prerequisites: Basic experience in NLP, ML, Deep Learning, Web search and Vector Algebra. Libraries: Annoy.

Improve this page