Ariel Rokem, The University of Washington eScience Institute
Follow along at:
"In God we trust; all others must bring data." - W. Edwards Deming
"An article about a computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data that produced the result."
Reproducibility
Public access
Improved research quality
Greater impact
About 10% higher citation rate for papers with publicly available data
Reuse, extension, interoperability
Standards
Software
Compute platforms
Socio-technical: training, incentives, careers
The Brain Imaging Data Structure (BIDS)
Supported by INCF
Developed through an open and collaborative design process
An ongoing process
Minimal curation => reusability
Validation: human and machine readable
Development of automated analysis tools
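Because BIDS encodes metadata in predictable file names, a program can read a dataset's organization without human curation. The Python sketch below illustrates that idea with a deliberately simplified, hypothetical pattern that covers only a few entities (`sub`, `ses`, `task`, `run`) plus a suffix and extension. It is not the BIDS specification's full grammar and not the official bids-validator or PyBIDS; the helper names `parse_bids_name` and `group_by_subject` are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical, simplified pattern for a small subset of BIDS-style names:
#   sub-<label>[_ses-<label>][_task-<label>][_run-<index>]_<suffix><extension>
# The real specification defines many more entities, ordering rules and file types.
BIDS_NAME = re.compile(
    r"^sub-(?P<sub>[a-zA-Z0-9]+)"
    r"(?:_ses-(?P<ses>[a-zA-Z0-9]+))?"
    r"(?:_task-(?P<task>[a-zA-Z0-9]+))?"
    r"(?:_run-(?P<run>[0-9]+))?"
    r"_(?P<suffix>[a-zA-Z0-9]+)"
    r"(?P<ext>\.nii\.gz|\.nii|\.json|\.tsv)$"
)


def parse_bids_name(filename):
    """Return a dict of entities parsed from the name, or None if it does not match."""
    match = BIDS_NAME.match(filename)
    if match is None:
        return None
    # Keep only the optional entities that were actually present in the name
    return {key: value for key, value in match.groupdict().items() if value is not None}


def group_by_subject(filenames):
    """Group matching file names by subject label and collect non-matching names."""
    groups = defaultdict(list)
    invalid = []
    for name in filenames:
        entities = parse_bids_name(name)
        if entities is None:
            invalid.append(name)
        else:
            groups[entities["sub"]].append(name)
    return dict(groups), invalid


files = [
    "sub-01_ses-pre_task-rest_run-1_bold.nii.gz",
    "sub-01_ses-pre_task-rest_run-1_bold.json",
    "sub-02_T1w.nii.gz",
    "subject2_anat.nii",  # does not follow the naming convention
]
groups, invalid = group_by_subject(files)
print(groups)
print(invalid)
```

The same parsed entities that let a machine flag `subject2_anat.nii` as non-conforming are what allow an automated pipeline to iterate over subjects, sessions and runs without dataset-specific code.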
Bring the compute to the data
Scalable computing
Provide useful tools and interfaces
Facilitate interoperability (between datasets, between software libraries)
Control access
Co-localization of data and compute
Scaling and elasticity
Consistent and open to all
Python: an ecosystem for scientific computing
Free and open source
High-level interpreted language
Very wide adoption
Both in academic research and in industry
Project Jupyter
Grew out of IPython (an interactive Python shell)
Received the 2017 ACM Software System Award
Vendor agnostic
Public cloud, HPC, combinations
Focusing on the wrong things
Waste of effort and resources
No immediate utility
Provides a disincentive to doing hard/risky experiments
Reasonable data embargoes
Provide incentives for publication of valuable data and analysis
Make data and code publication easier
Publish useful extractions from the data
Methods in data science are rapidly changing
Tools and practices that are not usually part of the standard neuroscience curriculum
Learning often requires substantial hands-on experience