Ariel Rokem, University of Washington eScience Institute
Slides available at:
The era of brain observatories
Allen Institute for Brain Science
n=1200
n=500,000
New data sets will enable important new discoveries
Data-driven discovery
Data arriving at unprecedented volume, variety and velocity
=> Instead of moving the data to the compute, need to bring the compute to the data
=> Need new tools and approaches to process, analyze and interpret the data
=> Web-based analysis tools become first-class citizens in our toolkit
=> New sociotechnical structures needed to facilitate training and collaboration
To the cloud!
Infinitely scalable
"Elastic"
But the cloud is hard to operate
import cloudknot as ck

def awesome_func(...):
    ...

knot = ck.Knot(func=awesome_func)
...
future = knot.map(args)
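
The snippet above follows a submit-then-collect pattern: an ordinary Python function is wrapped, mapped over a list of arguments, and the results come back later through a future. A minimal local sketch of that pattern, using only the standard library's `concurrent.futures` rather than cloudknot itself; the function body and the argument list here are made up for illustration and run on local threads, not in the cloud.

```python
from concurrent.futures import ThreadPoolExecutor


def awesome_func(x):
    # Hypothetical stand-in for an expensive per-item computation.
    return x * x


# Hypothetical inputs; in practice these might be subject IDs or file names.
args = [1, 2, 3, 4]

# Local analogue of "map a function over arguments and get futures back":
# submit() returns a Future for each argument without waiting for the result.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(awesome_func, a) for a in args]
    # result() blocks until the corresponding call has finished.
    results = [f.result() for f in futures]
```

The calling code never manages workers directly; it only hands over a function and its inputs, then asks the futures for their results.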
Results from large datasets are hard to understand
Hard to communicate
Hard to reproduce
Data sharing is not incentivized and is not easy enough
Classification accuracy of ~84% (AUC of 0.9)
Top 10 features selected include CST
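
Accuracy and AUC answer different questions: accuracy depends on a decision threshold, while AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A small sketch of both computations in plain Python; the labels and scores are toy values chosen for illustration, not data from the study, and the 0.5 threshold is an assumption.

```python
def accuracy(labels, scores, threshold=0.5):
    # Fraction of cases whose thresholded score matches the true label.
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels, scores):
    # Area under the ROC curve, computed as the fraction of
    # (positive, negative) pairs ranked correctly; ties count as half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = patient, 0 = control (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

acc = accuracy(labels, scores)
area = auc(labels, scores)
```

Because AUC does not depend on the threshold, a classifier can have a higher AUC than its accuracy at any single cut-off, as in the result quoted above.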
Required for reproducibility
Enables building on previous work
Similar to the Comp Neuro option
Take a few courses:
Data management
Data analysis
Machine learning and stats
Discussions
Tutorials
Code reviews
...