Reproducible Science: Walking the Walk Part 1

Summary

This tutorial will train reproducible research warriors in the practices and tools that make experimental verification possible through an end-to-end data analysis workflow. It will expose attendees to open science methods at every stage, from data gathering, storage, and analysis through to publication as a reproducible article. Attendees are expected to have basic familiarity with scientific Python and Git.

Description

The tutorial runs for four hours and covers the following topics:

Introduction (10min)
History of scientific societies and publications
- Leeuwenhoek was the Man!
- The Invisible College
- Nullius in Verba
Replication of the early microscope experiments by Leeuwenhoek
Image Acquisition (15min)
- Hands on: Cell phone camera microscope
  - With a drop of water
- Hands on: Each pair acquires images
Data Sharing (45min)
- Image gathering, storage, and sharing (15min)
  - GitHub (www.github.com)
  - Figshare (www.figshare.com)
  - Midas (www.midasplatform.com)
  - Hands on: Upload the images
- Metadata Identifiers (15min)
  - Citable
  - Machine Readable
  - Hands on: Create data citation and machine readable metadata
- Hands on: Download data via RESTful API (15min)
  - Provenance and
  - Python scripts
  - Hands on: Download the data via HTTP
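
The download step above could be sketched in Python along the following lines. This is a minimal illustration under stated assumptions, not the tutorial's actual script: the function names (`fetch`, `provenance_record`, `download_with_provenance`) and the layout of the JSON sidecar file are hypothetical, and only the standard library is used. The idea is to store, next to each downloaded file, a record of where it came from and a checksum of exactly what was received.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone


def fetch(url, timeout=30):
    """Download the resource at `url` and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def provenance_record(url, data):
    """Describe where `data` came from and what exactly was received."""
    return {
        "source_url": url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def download_with_provenance(url, out_path):
    """Save the download to `out_path` and write a JSON sidecar beside it."""
    data = fetch(url)
    with open(out_path, "wb") as f:
        f.write(data)
    record = provenance_record(url, data)
    # Hypothetical naming convention: <file>.provenance.json
    with open(out_path + ".provenance.json", "w") as f:
        json.dump(record, f, indent=2)
    return record
```

Anyone who later obtains the same file can recompute the SHA-256 digest and compare it with the stored value to confirm they are analysing identical data.
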
Break (10min)
Local processing (60min)
- Replication Enablement (20min)
  - Package versioning
  - Virtual Machines
  - Docker
  - Cloud services
  - Hands on:
    - Create a virtualenv
    - Run our tutorial package verification script
- Revision Control with Git (20min)
  - Keeping track of changes
  - Unique hashes
  - Hands on:
    - Forking a repository in GitHub
    - Cloning a repository
    - Creating a branch
    - Making a commit
    - Pushing a branch
    - Diffing
    - Merging
    - Pushing again
    - Create pull request
- Python scripts (20min)
  - Data analysis, particle counting.
  - Hands on:
    - Run scripts on new data
    - Generate histogram for the data
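
As an illustration of the particle counting and histogram steps, the sketch below counts connected bright regions in a small image and tallies their sizes. It is not the tutorial's analysis code: the function names `particle_sizes` and `size_histogram`, the fixed threshold, the 4-connected neighbourhood, and the toy image are all assumptions chosen to keep the example dependency-free. A real analysis would more likely rely on a scientific Python image library.

```python
from collections import Counter


def particle_sizes(image, threshold):
    """Return the pixel count of each connected bright region.

    `image` is a list of rows of numbers. Pixels strictly above `threshold`
    are foreground, and regions are joined through 4-connected neighbours.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood fill starting from this unvisited foreground pixel.
            stack = [(r, c)]
            seen[r][c] = True
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes


def size_histogram(sizes):
    """Map particle size (in pixels) to the number of particles of that size."""
    return dict(sorted(Counter(sizes).items()))


# Toy image with three separate bright regions (values above 5).
image = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 8],
    [0, 0, 0, 0, 8],
    [7, 0, 0, 0, 0],
]
sizes = particle_sizes(image, threshold=5)
print(len(sizes), size_histogram(sizes))  # prints: 3 {1: 1, 2: 1, 3: 1}
```
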
Testing (30min)
- Unit testing with known data
- Regression testing with known data
- Hands on:
  - Run tests
  - Add coverage for another method to the unit tests
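
The difference between the two kinds of test listed above can be sketched with the standard `unittest` module. The function `count_above`, the data, and the recorded baseline are hypothetical stand-ins rather than part of the tutorial package: the unit tests check behaviour that can be reasoned out by hand, while the regression test checks that a result recorded from an earlier trusted run on known data has not changed.

```python
import unittest


def count_above(values, threshold):
    """Count how many values are strictly greater than `threshold`."""
    return sum(1 for v in values if v > threshold)


# Known input data and a result recorded from an earlier, trusted run.
KNOWN_DATA = [0.1, 0.4, 0.35, 0.8, 0.95, 0.5]
RECORDED_BASELINE = 3  # count_above(KNOWN_DATA, 0.45) when it was recorded


class CountAboveUnitTest(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(count_above([], 0.5), 0)

    def test_threshold_is_exclusive(self):
        self.assertEqual(count_above([0.5, 0.6], 0.5), 1)


class CountAboveRegressionTest(unittest.TestCase):
    def test_matches_recorded_baseline(self):
        self.assertEqual(count_above(KNOWN_DATA, 0.45), RECORDED_BASELINE)


def run_suite():
    """Run both test cases and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(CountAboveUnitTest),
        loader.loadTestsFromTestCase(CountAboveRegressionTest),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs all three tests. Adding coverage for another method then means adding another `TestCase` method with inputs whose expected output is known in advance.
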
Break (10min)
Publication Tools (30min)
- Article generation
- RST to HTML
- GitHub replication and sharing
- Hands on:
  - Run dexy to generate the document
Reproducibility Verification (30min)
- Reproducing Works
- Publication of Positive and Negative results
- Hands on:
  - Create Open Science Framework (OSF) project
  - Connect Figshare and GitHub to OSF project
  - Fork or link another group’s project in the OSF to run dexy on their work

Infrastructure:

Attendees will use software installed on their laptops to gather and process data, then publish and share a reproducible report.

They will access repositories on GitHub, upload data to a repository, and publish the materials necessary to replicate their data analysis.

We expect the wireless network to have at least moderate bandwidth, so that all attendees can move data, source code, and publications between their laptops and the hosting servers.