Data "sharing" in neuroimaging: why and how

Ariel Rokem, University of Washington eScience Institute

Follow along at http://arokem.github.io/2015-11-19-public-nidata

The eScience Institute
DSE sponsors

Data Science?

Programming and software engineering

Data management

Statistics and machine learning

Data visualization and communication

A focus on reproducibility and openness

"Sharing" is caring

But what if you don't care? Or don't want to be altruistic?

Let's call it "publishing" instead

Reasons that data publication is important

Based on Poline et al. (2012)

Accelerate progress in understanding the brain

Other fields, such as astronomy and genomics, have already demonstrated the benefits

Improve the quality of publications, and of the data

Reduce the cost of research, and increase ROI

Reproducibility

"An article about a computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result" Buckheit and Donoho (1995)
see here

For all these reasons, funding agencies and journals will increasingly require that you do it!

A few more “selfish” benefits

Get cited more: the “data sharing advantage” amounts to ~10% more citations when data is shared (all else being equal).

This is really just a way of saying that the research is better and has more impact when the data is available.

Similar effects hold when software is shared, but that is a topic for a different day…

For the time being, people still think you’re somehow being altruistic...

OK - sign me up! How do I do it?

Make sure your data has a permanent URL

ResearchWorks
(soon to be superseded by the Data Repository)

Figshare
Files up to 500 MB can be deposited here

Provide code that makes your data use(ful/able)
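Even a few lines of helper code lower the barrier for someone picking up your data. A minimal sketch, assuming each subject's files live under a folder named `sub-<label>` (the convention BIDS uses); `find_subject_files` is a hypothetical helper, not part of any library:

```python
from pathlib import Path


def find_subject_files(root, subject, suffix=".nii.gz"):
    """Return sorted paths of all files for one subject.

    Hypothetical helper: assumes the data for subject <label> lives under
    <root>/sub-<label>/, possibly in subfolders such as anat/ or func/.
    """
    subject_dir = Path(root) / f"sub-{subject}"
    if not subject_dir.is_dir():
        raise FileNotFoundError(f"No directory for subject {subject!r}: {subject_dir}")
    # rglob searches every subfolder for files ending in the requested suffix
    return sorted(p for p in subject_dir.rglob(f"*{suffix}") if p.is_file())
```

Shipping something like this alongside the data means users do not have to reverse-engineer the folder layout before they can load anything.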

License your data

Data without a license cannot be used!

Consider licenses available through Creative Commons

Much more here
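One practical way to keep the license attached to the data is to record it in the dataset's own metadata. A minimal sketch, assuming a BIDS-style `dataset_description.json` where `Name` and `BIDSVersion` are required and `License` is the optional license field; the `"CC0"` identifier and the version string are illustrative assumptions, so check the BIDS specification for the recommended license names:

```python
import json
from pathlib import Path


def write_dataset_description(root, name, license_id="CC0", bids_version="1.0.0"):
    """Write a minimal dataset_description.json that records the license.

    Assumptions: the default license identifier and BIDS version are
    illustrative, not recommendations.
    """
    description = {
        "Name": name,
        "BIDSVersion": bids_version,
        "License": license_id,  # the license travels with the data itself
    }
    path = Path(root) / "dataset_description.json"
    path.write_text(json.dumps(description, indent=2) + "\n")
    return path
```

For example, `write_dataset_description("my_study", "My study")` writes the file into `my_study/` and returns its path.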

Start thinking about data sharing from the very beginning

Make sure your IRB approval and consent forms include data "sharing"

Open brain consent makes it easy

Use the Brain Imaging Data Structure to structure your data even before you analyze it:

- A common exchange format that lives in your file system.

- Will facilitate data sharing across different databases.

- Will facilitate development of common analysis tools.

Use the automated validator to make sure your format is compliant
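To give a flavor of what the validator automates, here is a toy check in Python. It is NOT the official BIDS validator: it only checks for `dataset_description.json` with `Name` and `BIDSVersion`, and it only knows one naming rule (`sub-<label>_T1w.nii[.gz]` inside `sub-<label>/anat/`), so any other anatomical image is reported as unexpected. The `toy_check` name and the reduced rule set are assumptions for illustration:

```python
import json
import re
from pathlib import Path

# Toy subset of the BIDS naming rules (an assumption, far from the full spec)
ANAT_T1W = re.compile(r"^sub-(?P<sub>[a-zA-Z0-9]+)_T1w\.nii(\.gz)?$")


def toy_check(root):
    """Return a list of problems; an empty list means this toy check passed."""
    root = Path(root)
    problems = []
    desc = root / "dataset_description.json"
    if not desc.is_file():
        problems.append("missing dataset_description.json")
    else:
        fields = json.loads(desc.read_text())
        for key in ("Name", "BIDSVersion"):
            if key not in fields:
                problems.append(f"dataset_description.json lacks {key!r}")
    for subject_dir in sorted(root.glob("sub-*")):
        anat = subject_dir / "anat"
        if not subject_dir.is_dir() or not anat.is_dir():
            continue
        for image in sorted(anat.glob("*.nii*")):
            match = ANAT_T1W.match(image.name)
            if match is None:
                problems.append(f"unexpected anat file name: {image.name}")
            elif f"sub-{match['sub']}" != subject_dir.name:
                # The subject label in the file name must match its folder
                problems.append(f"{image.name} is in the wrong folder ({subject_dir.name})")
    return problems
```

The real validator covers far more (every modality, required metadata, and consistency across subjects), which is exactly why it is worth running rather than re-implementing.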

BIDS example
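A sketch of what a small BIDS-style layout looks like, created in Python with empty placeholder files standing in for real images. The subject labels, the `rest` task name, the dataset name and the `BIDSVersion` value are illustrative assumptions, and a real dataset needs more metadata (for example, JSON sidecar files for functional runs) to pass the official validator:

```python
import json
from pathlib import Path


def make_bids_skeleton(root, subjects, task="rest"):
    """Create an empty BIDS-style directory tree; return the created image paths."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "Example dataset", "BIDSVersion": "1.0.0"}, indent=2)
    )
    created = []
    for sub in subjects:
        anat = root / f"sub-{sub}" / "anat"  # anatomical images
        func = root / f"sub-{sub}" / "func"  # functional (BOLD) runs
        anat.mkdir(parents=True, exist_ok=True)
        func.mkdir(parents=True, exist_ok=True)
        for path in (
            anat / f"sub-{sub}_T1w.nii.gz",
            func / f"sub-{sub}_task-{task}_bold.nii.gz",
        ):
            path.touch()  # empty placeholder instead of real image data
            created.append(path.relative_to(root).as_posix())
    return sorted(created)
```

Calling `make_bids_skeleton("my_study", ["01", "02"])` returns `sub-01/anat/sub-01_T1w.nii.gz`, `sub-01/func/sub-01_task-rest_bold.nii.gz`, `sub-02/anat/sub-02_T1w.nii.gz` and `sub-02/func/sub-02_task-rest_bold.nii.gz`: the subject and task are encoded in both the folders and the file names, so any BIDS-aware tool can find them without extra documentation.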

If you create a unique dataset:

Consider publishing a data paper!

http://arokem.org
arokem@gmail.com
@arokem
github.com/arokem