
Building Features from Audio for Machine Learning – Jyotika Singh (PyCon Taiwan 2021)


Day 2, 10:00-10:30


Audio signals differ from more commonly seen data types such as text, numbers and images. Numerical feature extraction for audio is therefore different, and follows processes that try to replicate how human ears perceive sound. Audio and speech processing has a large body of research, and many of its methods are available in MATLAB. Given the popularity of Python in machine learning, this talk discusses feature extraction and audio classification model building in Python. It covers audio signals, feature types, feature extraction using Python, and open-source tools, followed by practical examples of training machine learning models on audio data.


Unlike the data types most commonly handled in industry today, such as numerical, text or image data, audio signals need a different approach when extracting information and building machine learning models. This talk highlights the challenges of audio classification problems, starting with what an audio signal is and what its numerical representation means, how it differs from other data types, what feature extraction from audio looks like and how to go about it, and which open-source Python tools can be used for it. Digital signal processing, which includes audio processing, is a separate field of study, and applying parts of it to build successful models on audio data is an interesting and challenging problem. MATLAB is a popular language with strong tools for audio signal processing, whereas Python is the popular choice for machine learning, so building audio and speech classification solutions in Python alone presents its own set of challenges. The focus then shifts to building classification models from features that represent the unseen information in audio and speech signals, using the open-source tools available to Python users. The talk closes with examples of audio classification and prediction problems, such as audio type classification, music genre classification and spoken location name classification, and approaches to solving them in Python with the feature formation techniques and tools discussed earlier in the talk.
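To make the idea of turning an audio signal into numerical features concrete, here is a minimal sketch using only NumPy. It is not the speaker's code and does not use the tools presented in the talk; the function names (`frame_signal`, `extract_features`), the frame and hop sizes, and the choice of three simple features (spectral centroid, zero-crossing rate, RMS energy) are assumptions made for illustration. A synthetic 440 Hz tone stands in for a real recording.

```python
import numpy as np


def frame_signal(signal, frame_length, hop_length):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    starts = hop_length * np.arange(n_frames)[:, None]
    offsets = np.arange(frame_length)[None, :]
    return signal[starts + offsets]


def extract_features(signal, sample_rate, frame_length=1024, hop_length=512):
    """Return one row of [spectral centroid (Hz), zero-crossing rate, RMS] per frame.

    Hypothetical helper for illustration; the parameter defaults are assumptions.
    """
    frames = frame_signal(signal, frame_length, hop_length)

    # Magnitude spectrum of each Hann-windowed frame.
    windowed = frames * np.hanning(frame_length)
    magnitude = np.abs(np.fft.rfft(windowed, axis=1))
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sample_rate)

    # Spectral centroid: magnitude-weighted mean frequency of the frame.
    centroid = (magnitude * freqs).sum(axis=1) / (magnitude.sum(axis=1) + 1e-12)

    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Root-mean-square energy of the raw (unwindowed) frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))

    return np.column_stack([centroid, zcr, rms])


# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

features = extract_features(tone, sample_rate)

# Summarise the per-frame features into one fixed-length vector per clip.
clip_vector = np.concatenate([features.mean(axis=0), features.std(axis=0)])
print(features.shape, clip_vector.shape)
```

Summarising the per-frame features with their mean and standard deviation gives every clip a vector of the same length regardless of duration, which is the shape most classifiers expect. Perceptually motivated features, such as those based on the mel scale, follow the same framing and spectrum steps with additional filtering.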

Slides not uploaded by the speaker. HackMD:

Speaker: Jyotika Singh

Jyotika Singh is the VP of Data Science at ICX Media, where she mentors and manages her team as they work on NLP, feature engineering, supervised & unsupervised machine learning, research, analytics, programming in Python and distributed computing with Spark. She earned her Master’s in Science from UCLA, where she researched signal & speech processing, developed novel approaches to remove noise from speech and worked on a variety of machine learning problems on image, text, social media, consumer & entertainment data. She enjoys working on problem-solving techniques for text, audio & image data, and has released multiple open-source projects to share her work with the Python & Data Science community. She is passionate about women in STEM and continues her mentorship efforts in support of that cause. She volunteers as a mentor at Data Science Nigeria and Women Impact Tech.

