When developing a predictive machine learning model for a tabular data problem, we are usually inundated with a variety of candidate features to try out. These features are typically a blend of numerical and categorical variables. When handling the categorical features, analysts often default to the most convenient method, or to the one most documented on the web and on Stack Overflow. This is where we risk missing out on significant predictive gains, because how a feature is represented to an algorithm determines how much it can contribute to overall predictive performance.
This talk gives a quick overview of categorical encoding techniques and shares some time-tested intuitions on when to use which.
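As a rough illustration of what "representing a feature to an algorithm" can mean, the sketch below encodes one toy categorical column in four common ways: one-hot, ordinal, frequency, and smoothed target (mean) encoding. It is a minimal plain-Python sketch, not code from the talk. The function names, the toy `city`/`churn` data, and the particular smoothing formula are assumptions made for illustration. scikit-learn's `OneHotEncoder` and `OrdinalEncoder` are production-grade alternatives for the first two.

```python
from collections import Counter, defaultdict


def one_hot_encode(values):
    """One binary column per distinct category (sorted for a stable column order)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def ordinal_encode(values):
    """Map each category to an integer code (alphabetical order is an assumption)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def frequency_encode(values):
    """Replace each category with the fraction of rows in which it appears."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


def target_encode(values, targets, smoothing=1.0):
    """Replace each category with a target mean shrunk toward the global mean.

    Assumed smoothing variant:
        encoded = (sum_of_targets + smoothing * global_mean) / (count + smoothing)
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = Counter(values)
    for v, y in zip(values, targets):
        sums[v] += y
    return [(sums[v] + smoothing * global_mean) / (counts[v] + smoothing) for v in values]


# Hypothetical toy data: one categorical feature and a binary target.
city = ["Sydney", "Melbourne", "Sydney", "Perth", "Melbourne", "Sydney"]
churn = [1, 0, 1, 0, 1, 0]

if __name__ == "__main__":
    print(one_hot_encode(city)[0])      # ['Melbourne', 'Perth', 'Sydney']
    print(ordinal_encode(city))         # [2, 0, 2, 1, 0, 2]
    print(frequency_encode(city))
    print(target_encode(city, churn))   # [0.625, 0.5, 0.625, 0.25, 0.5, 0.625]
```

Note the trade-offs the sketch makes visible: one-hot adds a column per category, ordinal imposes an order the data may not have, frequency collapses equally common categories together, and target encoding as written uses each row's own label, which leaks target information. In practice target encoding is usually fitted on out-of-fold data.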
Produced by NDV: https://youtube.com/channel/UCQ7dFBzZGlBvtU2hCecsBBg?sub_confirmation=1
Python, PyCon, PyConAU, PyConline
Sat Sep 5 16:00:00 2020 at Floperator