Summary
An introduction to the combinatorial construction of features for protein structures, with practical applications and state-of-the-art results on tasks such as structural and functional classification, decoy identification, and fast search for neighboring structures.
Slides available here: http://162.243.152.57:7000/PyDataLDN2015f.pdf
Code and quick tutorial: https://github.com/RicardoCorralC/rccPyDataLondon2015
Description
Proteins are the most abundant macromolecules in cells. They perform a wide range of biological activities thanks to the three-dimensional structures they adopt. The first requirement for applying machine learning in this context is to construct an informative set of features that represents protein structures. We use the Residue Cluster Class system, a labeled Sperner family arising from atomic positions, which yields a total of 26 features. Practical applications are presented for several classical computational biology tasks. The entire code base is implemented in Python, both as an API and as ready-to-use end-user programs.
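
To make the feature construction more concrete, here is a minimal sketch of one way a 26-dimensional vector of this kind could be computed. It is an illustration under stated assumptions, not the code from the repository above: it assumes one representative 3D point per residue, a hypothetical distance cutoff of 5.0, clusters taken as maximal cliques of 3 to 6 residues in the resulting contact graph (maximal cliques form a Sperner family, since none contains another), and a label for each cluster given by how its residues split into runs of consecutive sequence positions. Counting those labels for sizes 3 to 6 gives 3 + 5 + 7 + 11 = 26 classes. The function names are illustrative only.

```python
from itertools import combinations
from math import dist


def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest


# Assumed labeling: one class per partition of each cluster size 3..6.
CLASSES = [p for size in range(3, 7) for p in partitions(size)]


def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch, no pivoting).

    adj maps each node to the set of its neighbors.
    """
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def run_partition(indices):
    """Lengths of runs of consecutive sequence indices, largest first."""
    idx = sorted(indices)
    runs, length = [], 1
    for a, b in zip(idx, idx[1:]):
        if b == a + 1:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    return tuple(sorted(runs, reverse=True))


def cluster_class_vector(coords, cutoff=5.0):
    """Count clusters per class; coords[i] is the point for residue i.

    The cutoff value is a placeholder, not a value taken from the talk.
    """
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    counts = dict.fromkeys(CLASSES, 0)
    for clique in maximal_cliques(adj):
        if 3 <= len(clique) <= 6:
            counts[run_partition(clique)] += 1
    return [counts[c] for c in CLASSES]
```

Calling cluster_class_vector on a list of per-residue coordinates returns a fixed-length list of 26 counts regardless of protein size, which is what makes a representation of this kind convenient as input to standard classifiers.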