Description
Short Description:
This talk covers how Vox Media built a content similarity tool for journalists, including algorithm selection, input weighting, UI design, and user feedback.
Abstract:
After finishing a news article, we sometimes find ourselves wanting more: more context, more insight, perhaps something related that dives deeper. In this talk, I will explain how we built a tool in Python for journalists to discover related content from a large corpus of articles.
I'll go over some of the different models we tried (given the abundance of machine learning and NLP libraries in Python) and how we eventually settled on Word2Vec. I will also talk about how we worked with journalists to incrementally improve the tool, from minor tweaks such as weighting title words more heavily than article body words to larger additions such as the ability to feed in external seed articles.
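To give a flavor of the title-weighting idea, here is a minimal sketch. It is not the code behind the tool. It assumes each article is represented by a weighted average of its word vectors and that articles are compared by cosine similarity. The tiny `EMBEDDINGS` table is made up and stands in for a trained Word2Vec model, and `TITLE_WEIGHT`, `document_vector`, and `most_similar` are hypothetical names chosen for illustration.

```python
import math

# Toy 3-dimensional word vectors standing in for a trained Word2Vec model.
# The numbers are invented for illustration only.
EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "senate": [0.8, 0.2, 0.1],
    "vote": [0.7, 0.3, 0.0],
    "budget": [0.5, 0.6, 0.1],
    "recipe": [0.0, 0.1, 0.9],
    "pasta": [0.1, 0.0, 0.8],
}

TITLE_WEIGHT = 3.0  # hypothetical value; title words count three times as much
BODY_WEIGHT = 1.0


def document_vector(title, body, title_weight=TITLE_WEIGHT, body_weight=BODY_WEIGHT):
    """Weighted average of the word vectors found in the title and body."""
    dims = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dims
    weight_sum = 0.0
    for text, weight in ((title, title_weight), (body, body_weight)):
        for token in text.lower().split():
            vector = EMBEDDINGS.get(token)
            if vector is None:
                continue  # skip words the model has never seen
            for i, value in enumerate(vector):
                total[i] += weight * value
            weight_sum += weight
    if weight_sum == 0:
        return total  # no known words: return the zero vector
    return [value / weight_sum for value in total]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(seed, corpus, top_n=2):
    """Rank (title, body) pairs in corpus by similarity to a seed article."""
    seed_vector = document_vector(*seed)
    scored = [
        (cosine_similarity(seed_vector, document_vector(title, body)), title)
        for title, body in corpus
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_n]


corpus = [("election", "vote senate"), ("pasta recipe", "budget")]
ranking = most_similar(("senate vote", "budget"), corpus)
```

Because the seed is just a `(title, body)` pair, it does not have to belong to the corpus, which is one simple way an external seed article could be fed into a ranking like this.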
Intended Audience:
Python developers who are interested in machine learning and neural networks