
Keep it secret, keep it safe! Preserving anonymity by subverting stylometry

Description

In this talk, I will introduce you to adversarial stylometry and demonstrate several techniques with a web tool I built using Flask, the Natural Language Toolkit (NLTK), and scikit-learn.

What’s stylometry? If you wish to remain anonymous, you can use any number of privacy technologies, but you could still be identified simply by the words you use. Using machine learning, stylometry can identify the authors of anonymous documents by analyzing the frequency of function words (“of” and “was,” for example) and comparing the results to known writing samples. Your writing style is therefore uniquely quantifiable and can serve reliably as a biometric. Writers who wish to remain anonymous, such as whistleblowers, activists, and cryptocurrency inventors, should consider using “adversarial” stylometric techniques to outsmart authorship attribution software. In this presentation, I will explain how this is possible and demonstrate a few ways to preserve your anonymity, including using a synonym replacer programmed in Python.
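To make the idea of function-word analysis concrete, here is a minimal sketch that uses only the Python standard library. It is not the tool from the talk: the word list, the sample texts, and the helper names (`function_word_profile`, `cosine_similarity`, `most_similar_author`) are all assumptions made up for illustration, and real attribution systems use far larger word lists and corpora.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative list of function words.
FUNCTION_WORDS = ["the", "of", "and", "was", "to", "in", "a", "that", "it", "but"]


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_author(anonymous_text, known_samples):
    """Pick the known author whose function-word profile is closest."""
    target = function_word_profile(anonymous_text)
    return max(
        known_samples,
        key=lambda author: cosine_similarity(
            target, function_word_profile(known_samples[author])
        ),
    )


# Toy writing samples, invented for this example.
known_samples = {
    "author_a": "It was the best of times and it was the worst of times, "
                "but the end of the day was near.",
    "author_b": "A cat sat in a box. A dog ran to a tree in a park to find a ball.",
}
anonymous = "It was the end of the road, and the sound of the rain was loud."

guess = most_similar_author(anonymous, known_samples)
print(guess)
```

An adversarial technique, in these terms, is any rewrite of the anonymous text that moves its profile away from the author's known samples.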

As a relatively new programmer, I took advantage of several Python libraries to help me build this tool. I will touch on calculating word frequencies with NLTK and using scikit-learn to classify documents. This talk is geared toward people who want to use Python to analyze, transform, and generate written language.
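As a rough illustration of the classification step, the sketch below counts function words with scikit-learn's `CountVectorizer` and trains a `MultinomialNB` classifier on them. The choice of classifier, the word list, and the toy training texts are assumptions for this example; the talk does not say which model or features the tool actually uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Assumption: a tiny illustrative list of function words.
FUNCTION_WORDS = ["the", "of", "and", "was", "to", "in", "a", "that", "it", "but"]

# Toy training data, invented for this example.
train_texts = [
    "It was the best of times and it was the worst of times, "
    "but the end of the day was near.",
    "The sound of the sea was the song of the night, and it was cold.",
    "A cat sat in a box. A dog ran to a tree in a park to find a ball.",
    "To get to a shop in a town, a man had to walk in a field.",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

model = make_pipeline(
    # Count only the listed function words. The custom token_pattern keeps
    # one-letter words such as "a", which the default pattern would drop.
    CountVectorizer(vocabulary=FUNCTION_WORDS, token_pattern=r"(?u)\b\w+\b"),
    MultinomialNB(),
)
model.fit(train_texts, train_authors)

anonymous = "It was the end of the road, and the sound of the rain was loud."
prediction = model.predict([anonymous])[0]
print(prediction)
```

With only four short training texts this is a demonstration of the workflow, not a reliable classifier; any realistic use would need much more text per author and a held-out evaluation.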
