
Democratizing Data Science


What if programming and data science were something everybody learned? As part of the Community Data Science Workshops, more than 50 volunteers have taught more than 200 complete beginners the basics of Python and data analysis in a series of 4-day workshops. We'll talk about our approach and describe the successes and challenges of approaching Python and data science as basic literacies.

The Community Data Science Workshops (CDSW) are a series of project-based workshops for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free and open source software, and civic media. The workshops are for people with no previous programming experience. The workshops bring together researchers, academics, and participants and leaders in online communities. Run three times in 2014 and 2015, the workshops have all been free of charge and are open to the public.

The sessions are scheduled for one Friday evening and three Saturdays all day. Each session involves a period for lecture and technical demonstrations in the morning. The rest of the day consists of self-directed work on programming and data science projects supported by more experienced mentors.

Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like: Are new contributors to an article in Wikipedia sticking around longer or contributing more than people who joined last year? Who are the most active or influential users of a particular Twitter hashtag? Are people who participated in a Wikipedia outreach event staying involved? How do they compare to people who joined the project outside of the event?
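As a rough illustration of the kind of question listed above, the following Python 3 sketch counts the most active users of a hashtag. It is not taken from the CDSW curriculum: the sample data, the `most_active_users` function, and the usernames are all hypothetical, and a real analysis would load tweets from an actual data source rather than a hard-coded list.

```python
from collections import Counter

# Hypothetical sample data: (username, tweet text) pairs.
# A real project would load these from a file or an API instead.
tweets = [
    ("alice", "Learning Python at #cdsw"),
    ("bob", "Great mentors at #cdsw today"),
    ("alice", "My first plot! #cdsw"),
    ("carol", "Unrelated tweet about lunch"),
    ("alice", "Wrapping up #CDSW"),
    ("bob", "See you next Saturday #cdsw"),
]


def most_active_users(tweets, hashtag, top_n=3):
    """Return up to top_n (user, count) pairs for tweets containing hashtag."""
    tag = hashtag.lower()
    # Count one tweet per user whenever the hashtag appears as a whole word,
    # ignoring upper/lower case.
    counts = Counter(
        user for user, text in tweets if tag in text.lower().split()
    )
    return counts.most_common(top_n)


print(most_active_users(tweets, "#cdsw"))
# prints [('alice', 3), ('bob', 2)]
```

Measuring activity by tweet count is only one possible definition. Answering the "influential" half of the question would need additional data, such as retweets or replies, which this sketch does not model.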

Our very first workshop was originally modeled after the Boston Python Workshops, but most of our curriculum is new and has been developed and modified by the mentors with feedback from the participants. The CDSW curriculum, now being taught outside Seattle by others inspired by our model, is entirely based on Python. Our most recent round of workshops in Spring 2015 was taught entirely using Python 3.

Teaching data science in only four days to people who begin without any familiarity with concepts like the command line or variables is a major departure from traditional data science curricula, which assume at least some familiarity with programming and statistics.

This talk will describe the approach we have taken to refine our material over the three times we have run the workshops and will share details of our experience. CDSW's organizers are professional programmers and data scientists, and several of us have experience teaching data science in more traditional university and corporate settings. Our talk will describe how "democratized" data science is similar to, and sometimes extremely different from, these more traditional approaches. We will talk about some of the challenges we have faced and highlight some of our most inspirational successes.

