Description
PyData SF 2016
Python is a great tool for data analysis, but often the hardest part is getting access to your data, which is scattered across a variety of business systems: files, databases, and SaaS applications. Productionizing this process is even harder: scripts frequently fail and require precious time to fix and re-test. In this talk, I will review some open source tools I authored and show you how
In this talk we will cover:
- How we created a data collection tool that can read any chaotically formatted file called "CSV" by guessing its structure automatically
- Explore the plugin-based architecture that makes it easy to load data from external sources and publish it to production systems, from files to business systems such as Salesforce & Mixpanel
- Review current plugins (over 100 released by the OSS community) and use cases
- Explain how distributed execution enhances stability and scalability
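
To give a feel for what "guessing its structure" can mean, here is a minimal sketch using only the Python standard library. It is not the tool's actual algorithm: the function names `guess_type` and `guess_csv_structure`, the candidate delimiter set, and the header heuristic are all assumptions made for illustration.

```python
import csv
import io


def guess_type(values):
    """Return the narrowest of 'int', 'float', 'str' that parses every value."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "str"


def guess_csv_structure(sample, candidates=",;\t|"):
    """Guess delimiter, header presence, column names and column types.

    Hypothetical helper: the heuristics are deliberately simple.
    """
    # csv.Sniffer picks the candidate delimiter that occurs consistently per line.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    rows = [row for row in csv.reader(io.StringIO(sample), dialect) if row]

    # Assumed header heuristic: the first row is all text, while at least
    # one column below it has a different (numeric) type.
    first_types = [guess_type([cell]) for cell in rows[0]]
    body_types = [guess_type(column) for column in zip(*rows[1:])]
    has_header = first_types != body_types and all(t == "str" for t in first_types)

    data = rows[1:] if has_header else rows
    names = rows[0] if has_header else [f"c{i}" for i in range(len(rows[0]))]
    return {
        "delimiter": dialect.delimiter,
        "has_header": has_header,
        "names": names,
        "types": [guess_type(column) for column in zip(*data)],
    }
```

Calling `guess_csv_structure` on the first few lines of a file returns a small dictionary that a loader could turn into a configuration. A real tool would also need to guess encodings, quoting rules, timestamp formats, and null markers, which this sketch ignores.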