Description
Test-driven data analysis (TDDA) fuses and builds upon the ideas of test-driven development and reproducible research to support higher-quality data analysis. This talk will extend the foundations of TDDA to cover tight constraints on string fields, using automatically discovered regular expressions, and automatically discovered relationships between datasets.
Abstract
Test-driven data analysis (TDDA) fuses and builds upon the ideas of test-driven development and reproducible research to support higher-quality data analysis.
The foundational concepts are:
- Level 0: Reference tests
- Level 1: Automatic constraint discovery and validation
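As a rough illustration of what Level 1 means in practice, the sketch below infers simple per-field constraints (type, nullability, minimum and maximum) from example records and then checks new records against them. It uses only the standard library and is not the TDDA library's API; the function names `discover_constraints` and `verify`, and the sample records, are hypothetical and chosen only for this example.

```python
def discover_constraints(rows):
    """Infer simple per-field constraints from a list of dict records.

    Hypothetical helper for illustration; not part of the tdda package.
    """
    constraints = {}
    fields = {name for row in rows for name in row}
    for name in sorted(fields):
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        types = {type(v).__name__ for v in present}
        constraint = {
            "type": sorted(types),
            "allow_nulls": len(present) < len(values),
        }
        # Only record a range when every non-null value has the same type.
        if present and len(types) == 1:
            constraint["min"] = min(present)
            constraint["max"] = max(present)
        constraints[name] = constraint
    return constraints


def verify(rows, constraints):
    """Return a list of (row_index, field, reason) constraint failures."""
    failures = []
    for i, row in enumerate(rows):
        for name, constraint in constraints.items():
            value = row.get(name)
            if value is None:
                if not constraint["allow_nulls"]:
                    failures.append((i, name, "null"))
                continue
            if type(value).__name__ not in constraint["type"]:
                failures.append((i, name, "type"))
            elif "min" in constraint and not (
                constraint["min"] <= value <= constraint["max"]
            ):
                failures.append((i, name, "range"))
    return failures


# Made-up sample data: constraints are discovered on known-good records,
# then used to validate a later batch.
known_good = [{"age": 31, "code": "AB-12"}, {"age": 47, "code": "CD-34"}]
later_batch = [{"age": 99, "code": "AB-12"}, {"age": None, "code": 5}]

constraints = discover_constraints(known_good)
failures = verify(later_batch, constraints)
```

The point of the design is that the constraints are generated from data believed to be good, so that no one has to write them by hand, and are then reused as tests on every later dataset.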
This talk will extend these foundations to cover:
- tight constraints on string fields, using regular expressions discovered automatically with rexpy
- constraints between datasets
and probably more.
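To give a feel for what discovering a regular expression from example strings can look like, here is a deliberately naive sketch. It is not rexpy's algorithm and does not use rexpy's API: it simply splits each string into runs of ASCII letters, digits, or a repeated literal character, groups strings with the same sequence of runs, and widens the repetition counts to cover every example. The names `char_class`, `shape` and `infer_patterns`, and the sample strings, are hypothetical.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Map a character to a regex fragment: letters, digits, or a literal."""
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    if ch in string.digits:
        return r"\d"
    return re.escape(ch)


def shape(s):
    """Split a string into (class, run_length) pairs of consecutive characters."""
    return [(cls, len(list(run))) for cls, run in groupby(s, key=char_class)]


def infer_patterns(examples):
    """Return one anchored regex per distinct sequence of character classes."""
    groups = {}
    for s in examples:
        runs = shape(s)
        key = tuple(cls for cls, _ in runs)
        groups.setdefault(key, []).append([n for _, n in runs])

    patterns = []
    for key, all_lengths in groups.items():
        parts = []
        # zip(*all_lengths) gives, per position, the run lengths seen there.
        for cls, counts in zip(key, zip(*all_lengths)):
            lo, hi = min(counts), max(counts)
            if lo == hi == 1:
                quantifier = ""
            elif lo == hi:
                quantifier = "{%d}" % lo
            else:
                quantifier = "{%d,%d}" % (lo, hi)
            parts.append(cls + quantifier)
        patterns.append("^" + "".join(parts) + "$")
    return patterns


# Made-up identifiers that share one structure: letters, a hyphen, digits.
examples = ["AB-12", "CD-345", "XYZ-9"]
patterns = infer_patterns(examples)
```

A pattern found this way can then serve as a tight constraint on a string field: any later value that fails to match one of the discovered patterns is flagged for inspection.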
Background material:
- PyCon UK talk, Cardiff, "Test-Driven Data Analysis": https://www.youtube.com/watch?v=FIw_7aUuY50
- Blog: http://tdda.info, especially the posts http://www.tdda.info/the-new-referencetest-class-for-tdda and http://www.tdda.info/constraint-discovery-and-verification-for-pandas-dataframes
- Overview: http://www.predictiveanalyticsworld.com/patimes/four-ways-data-science-goes-wrong-and-how-test-driven-data-analysis-can-help/
- New material covered in this talk: http://www.tdda.info/introducing-rexpy-automatic-discovery-of-regular-expressions and http://rexpy.herokuapp.com