Description
We’ve all heard terms like Bayes error, the perceptron learning theorem, the fundamental theorem of statistical learning, and VC dimension. This talk is about using the math-heavy fundamentals of machine learning to understand the very solvability of classification problems. By the end of the talk, you will have a clear picture of how these ideas can be applied to practical classification problems.
Why does a classifier fail to fit? This can happen for only two reasons:
- Because the model is not smart enough, or
- Because the training data itself is not “classifiable”.
Unfortunately, the only obvious way to determine the classifiability or separability of a training dataset is to try a variety of classification models with a variety of hyperparameters. In other words, the separability of classes in a dataset is usually expressed only in terms of which model worked on that dataset.
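To make that concrete, here is a minimal sketch of the brute-force approach (the model families, hyperparameter grids, and the `X`/`y` arrays are illustrative assumptions, not part of the talk):

```python
# Brute-force probe: try several model families with a few hyperparameter
# settings each and see which, if any, fits the data.
# Assumes scikit-learn and an existing feature matrix X with labels y.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

candidates = {
    "logistic": (LogisticRegression(max_iter=1000), {"C": [0.01, 1.0, 100.0]}),
    "svm_rbf": (SVC(kernel="rbf"), {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1]}),
    "forest": (RandomForestClassifier(), {"max_depth": [3, 10, None]}),
}

for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5)
    search.fit(X, y)
    print(f"{name}: best cross-validated accuracy = {search.best_score_:.3f}")
```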
This still does not answer the fundamental question of whether a dataset is classifiable at all. If we keep increasing the complexity of models and trying them out on a dataset without success, all we can infer is that the set of models we have tried so far is incapable of learning the classification problem. It does not necessarily mean that the problem is unsolvable.
Fortunately, many shallow learning models have been widely studied and are well understood. As such, it is quite possible to place theoretical bounds on their performance on a given dataset, and there are a variety of statistics we can compute a priori to estimate how likely a model is to fit that dataset.
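As one example of such a statistic (chosen here for illustration; the talk's own selection may differ), the classical Cover-Hart result says that, asymptotically, the 1-nearest-neighbour error R_NN brackets the Bayes error R* as R_NN / 2 <= R* <= R_NN, so a cross-validated 1-NN error gives a rough floor on what any classifier can achieve:

```python
# Illustrative a priori statistic: estimate the 1-NN error by cross-validation
# and use the (asymptotic) Cover-Hart bracket R_NN/2 <= R* <= R_NN to get a
# rough floor on the error achievable by *any* classifier on this data.
# Assumes scikit-learn and an existing feature matrix X with labels y.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

nn_error = 1.0 - cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5).mean()
print(f"Estimated 1-NN error: {nn_error:.3f}")
print(f"Approximate Bayes error bracket: [{nn_error / 2:.3f}, {nn_error:.3f}]")
```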
This talk shows how to use these results to develop a strategy: a structured approach to carrying out machine learning experiments, instead of blindly running models and hoping that one of them works. Starting from elementary results like Bayes' theorem and the perceptron learning rule and building all the way up to complex ideas like kernel methods and VC dimension, the talk develops a framework for analyzing data in terms of class separability.
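As a taste of how an elementary result turns into a practical check (again an illustrative sketch, not the talk's exact recipe): by the perceptron convergence theorem, a linearly separable training set is learned by the perceptron with zero errors after finitely many updates, so running it with a generous iteration budget doubles as a cheap separability probe.

```python
# Illustrative separability probe: if the perceptron reaches zero training
# error, a separating hyperplane exists; if it does not within the budget,
# the data may need a kernel or a richer hypothesis class.
# Assumes scikit-learn and an existing feature matrix X with labels y.
from sklearn.linear_model import Perceptron

clf = Perceptron(max_iter=10_000, tol=None).fit(X, y)
if clf.score(X, y) == 1.0:
    print("Perceptron converged: the training set is linearly separable.")
else:
    print("No separating hyperplane found within the iteration budget.")
```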
While the talk might sound theoretical, the major focus will be on making practical, hands-on use of these concepts to better understand your data and your models. By the end, you will know how to prioritize which models to try on which dataset, and how to estimate the likelihood of their fitting the data. This rigorous analysis of models and data saves a lot of effort and money, as the talk will demonstrate with real-world examples.