
Using Python for Linguistic Data Analysis


This tutorial is an introduction to Natural Language Processing (NLP) using Python, aimed at helping you rapidly build your own NLP module. We will start with the very basics of NLP (lemmatization, stemming, POS tagging, parsing, language models), move on to the more complex pieces involving probabilities, statistics and word co-occurrences, and finish with deep learning approaches to NLP and word vectorization techniques.

As the amount of unstructured linguistic data increases each day, it is becoming important to develop tools that analyze this data automatically. In this tutorial I will take you through the basics of linguistic data analytics and then build up to some more complicated pieces of NLP. We will start with basic linguistic techniques, such as lemmatization, part-of-speech tagging and parsing, and write some code to implement some of these using NLTK. Next, I will talk about how probabilities and statistics are used in linguistic data processing to develop language models, and finally we will talk about more complicated techniques such as deep learning. If we have time, we will go over two uses of NLP: search engines, and sentiment analysis of customer reviews.
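
As a rough illustration of how probabilities are used to build a language model, here is a minimal sketch of a bigram model in plain Python. It is not taken from the tutorial itself: the toy corpus, the function names and the `<s>` / `</s>` boundary markers are assumptions made up for this example, and no smoothing is applied, so any unseen word pair gets probability zero.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next_word | word) by relative frequency from tokenized sentences."""
    pair_counts = defaultdict(Counter)
    for tokens in sentences:
        # Pad with (hypothetical) start and end markers so sentence
        # boundaries are modelled as well.
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[prev][nxt] += 1

    model = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        model[prev] = {word: count / total for word, count in counter.items()}
    return model


def sentence_probability(model, tokens):
    """Multiply the bigram probabilities along the sentence (no smoothing)."""
    padded = ["<s>"] + tokens + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(padded, padded[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob


# Toy corpus, invented for illustration only.
corpus = [
    "the food was great".split(),
    "the service was slow".split(),
    "the food was cold".split(),
]
model = train_bigram_model(corpus)

print(model["the"])  # distribution over words that follow "the"
print(sentence_probability(model, "the food was great".split()))
```

In this toy corpus "food" follows "the" in two of three sentences, so the model assigns it a probability of 2/3. A realistic model would need a much larger corpus and some form of smoothing for unseen word pairs.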

Slides can be found here:

