The elements of reproducible open science

Ariel Rokem, University of Washington eScience Institute

Follow along at http://arokem.github.io/2016-02-26-ros-seminar-msu

Notes at https://etherpad.wikimedia.org/p/2016-02-26-ros

The eScience Institute
DSE sponsors

The elements of reproducible open research

Automation and computational reproducibility

Availability of data and code

Open access to publication

A detour through human neuroscience

Normal behavior is supported by brain connectivity

Image from Catani and ffytche (2015)

Not just passive cables

Brain connections change with development

Individual differences account for differences in behaviour

Connections adapt with learning

Clinically useful information

Magnetic Resonance Imaging (MRI)

Neural activity: functional MRI

Anatomy: structural MRI

...

Brain connectivity: diffusion MRI

Diffusion MRI

Isotropic diffusion

Anisotropic diffusion
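
One common way to put a number on the difference between isotropic and anisotropic diffusion is fractional anisotropy (FA), computed from the three eigenvalues of a diffusion tensor. The sketch below is not from the talk: the function name and the eigenvalues are illustrative assumptions, and a real analysis would normally rely on an established library such as DIPY rather than hand-written code.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy (FA) from the three eigenvalues of a diffusion tensor.

    FA is 0 when diffusion is equal in all directions (isotropic) and
    approaches 1 when diffusion happens along a single direction.
    """
    norm = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    if norm == 0:
        return 0.0  # no diffusion at all: define FA as 0
    spread = math.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
    return math.sqrt(0.5) * spread / norm


# Illustrative eigenvalues in arbitrary units (assumptions, not measurements)
isotropic = fractional_anisotropy(1.0, 1.0, 1.0)    # equal in every direction
anisotropic = fractional_anisotropy(1.7, 0.3, 0.3)  # mostly along one axis
```

With these made-up values the isotropic case gives an FA of exactly 0, while the elongated case gives a value much closer to 1.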

From diffusion to tracks

"An article about a computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result" Buckheit and Donoho (1995)

Reproducible and open science are a matter of degree, not of kind!

The elements of reproducible open research

Automation and computational reproducibility

Availability of data and code

Open access to publication

Automate everything

Can you produce all the figures in your paper with a single button press?
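
One way to get close to a "single button" is to register every figure as a function and regenerate them all from one entry point. The sketch below is a hypothetical layout, not the author's setup: the names `figure`, `FIGURES` and `make_all`, the file name, and the toy data are all assumptions. To stay dependency-free it writes a tiny SVG by hand; in a real project each function would call a plotting library instead.

```python
from pathlib import Path

FIGURES = {}  # maps output filename -> function that returns the figure as SVG text


def figure(filename):
    """Register a function as the producer of one figure of the paper."""
    def register(func):
        FIGURES[filename] = func
        return func
    return register


@figure("fig1_signal_decay.svg")
def signal_decay():
    # Hypothetical toy data: points of a decaying curve, drawn as an SVG polyline
    points = " ".join(f"{10 * x},{100 - 100 * 0.8 ** x:.1f}" for x in range(11))
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
        f'<polyline fill="none" stroke="black" points="{points}"/></svg>'
    )


def make_all(out_dir="figures"):
    """The 'single button': regenerate every registered figure from scratch."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, func in FIGURES.items():
        path = out / filename
        path.write_text(func(), encoding="utf-8")
        written.append(path)
    return written
```

Calling `make_all()` from a script or a notebook cell rebuilds every figure, so adding a figure to the paper means adding one decorated function and nothing else.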

Use literate programming methods!

For example, Jupyter notebooks
An example (with Jason Yeatman): Automated Fiber Quantification

Public availability of data and code

"Sharing is caring"

But what if you don't care? Or don't want to be altruistic?

Let's call it "publishing" instead

Reasons that data/code publication is important

Based on Poline et al. (2012)

Accelerate progress in your field

Many fields have already demonstrated the benefits, for example astronomy and genomics

Improve the quality of publications, and of the data

Reduce the cost of research, and increase ROI

Reproducibility

For all these reasons, funding agencies and journals will increasingly require that you do it!

A few more “selfish” benefits

Get cited more: The “data sharing advantage”: ~10% more citations when data is shared (all else being equal).

This is really just a way of saying that the research is better and has more impact when the data is available.

Similar effects when software is shared

For the time being, people still think you’re somehow being altruistic...

Open source everything

Python for neuroscience: the SCIPY & NIPY ecosystem

The scipy & nipy ecosystem

The solar system

DIPY

Considerations in data publication

Make sure your data has a permanent URL

Figshare
Files up to 500 MB can be deposited here

Use your library!

Provide code that makes your data use(ful/able)
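
As a minimal sketch of what such code could look like, a small loader can ship alongside the data, read it into a usable structure, and check that the file matches a published checksum. Everything here is an assumption for illustration: the function names `sha256_of` and `load_table`, the CSV format, and the idea of publishing a SHA-256 digest next to the dataset.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_table(path, expected_sha256=None):
    """Load a published CSV file as a list of dicts.

    If a checksum is given, refuse to load a file that does not match it,
    so users know they are analyzing the same bytes that were published.
    """
    path = Path(path)
    if expected_sha256 is not None and sha256_of(path) != expected_sha256:
        raise ValueError(f"{path.name} does not match the published checksum")
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

A user would download the file from its permanent URL and call `load_table` with the digest listed in the dataset's description; the values come back as strings, so any type conversion is left to the analysis code.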

Start thinking about data/code sharing at the start

Make sure your IRB approval and consent forms include data "sharing"

Arrange your data in files in the way you will ultimately share it

If you create a unique dataset:

Consider publishing a data paper!

Open access to publication

Use preprint servers

Establishes precedence

Makes the research available

Preprint servers are increasingly available for many fields:

arXiv

bioRxiv

Figshare will do!

The elements of reproducible open research

Automation and computational reproducibility

Availability of data and code

Open access to publication

Stay in touch!

http://arokem.org
arokem@gmail.com
@arokem
github.com/arokem