Ariel Rokem, University of Washington eScience Institute
Follow along at
1. Empirical (experimental)
2. Theoretical (mathematical)
3. Simulation (computational)
4. Data-intensive (eScience)
Programming and software engineering
Data management
Statistics and machine learning
Data visualization and communication
A focus on reproducibility and openness
Facilitate data-intensive research in different fields
(inter- and cross-disciplinary)
Focus on methodology
Focus on reproducibility
Contribute to openly available tools, rather than/in addition to peer-reviewed publications
"Career paths for data scientists that recognize and reward contributions in methodology, computation, or development of tools are important."
Focused, intensive, collaborative projects
Data scientists + domain scientists
Results that wouldn't be possible otherwise
Inspired by DSSG program at U Chicago, GA Tech
10-week internship program
16 DSSG fellows/students
6 high-school students from ALVA program
4 projects (+project leads!)
+ Data scientist mentors
Brain connections change with development
Individual differences account for differences in behaviour
Adapt with learning
This has clinical significance
Started in 2009 by Eleftherios Garyfallidis
Contributors from at least six different countries and many different labs
The lingua franca of reproducible computational science
Open source
Easy to learn
Phenomenal ecosystem of open-source tools
gtab = gradient_table(...)
model = ReconstModel(gtab, ...)
fit = model.fit(data, ...) # => ReconstFit
prediction = fit.predict(gtab, ...)
For example
model = dti.TensorModel(gtab)
fit = model.fit(data1)
prediction = fit.predict(gtab)
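# Root-mean-square error of the prediction against a second measurement
RMSE = np.sqrt(
    np.mean((prediction - data2) ** 2, -1))
# Normalize by the test-retest error between the two measurements
rRMSE = RMSE / np.sqrt(
    np.mean((data1 - data2) ** 2, -1))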
Rokem et al. (2015)
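The relative RMSE above can be tried out on synthetic data without any diffusion model. The sketch below is an illustration only: the function name `relative_rmse`, the array sizes, and the noise level are assumptions, and the "model" is simply the noise-free signal. Under these assumptions, a prediction that matches the noise-free signal has an rRMSE near 1/sqrt(2), because its error against the second measurement contains one noise term while the difference between two measurements contains two.

```python
import numpy as np


def relative_rmse(prediction, data1, data2):
    """RMSE of a prediction against data2, relative to test-retest RMSE.

    All inputs share a shape of (..., n_measurements); the last axis is
    averaged over, as in the snippet above.
    """
    rmse = np.sqrt(np.mean((prediction - data2) ** 2, axis=-1))
    test_retest = np.sqrt(np.mean((data1 - data2) ** 2, axis=-1))
    return rmse / test_retest


# Synthetic example (assumed sizes): 1000 voxels, 150 measurements each.
rng = np.random.default_rng(0)
signal = rng.uniform(0.2, 1.0, size=(1000, 150))
sigma = 0.05  # assumed noise standard deviation

# Two independent noisy measurements of the same underlying signal.
data1 = signal + rng.normal(0, sigma, signal.shape)
data2 = signal + rng.normal(0, sigma, signal.shape)

# Use the noise-free signal as a stand-in for a perfect model prediction.
rrmse = relative_rmse(signal, data1, data2)
print(rrmse.mean())
```

Values below 1 mean the prediction matches the second measurement better than the first measurement does.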
Corpus callosum
Corticospinal tract
Superior longitudinal fasciculus
DTI
Crossing fiber model
Rokem et al. (2015)
When you've only measured once
k-fold cross-validation
# Use a k of 2
dti_pred = kfold_xval(dti_model, data, 2)
csd_pred = kfold_xval(csd_model, data, 2)
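The idea behind k-fold cross-validation with a single measurement can be sketched generically: split the measurements into k folds, fit on all folds but one, and predict the held-out fold, so that every measurement is predicted by a model that never saw it. The code below is not Dipy's `kfold_xval` implementation. It is a minimal sketch in which ordinary least squares stands in for the diffusion model, and the names `kfold_predict` and `design` are hypothetical.

```python
import numpy as np


def kfold_predict(design, signal, folds, seed=0):
    """Predict each measurement from a model fit to the other folds.

    design : (n, p) array, one row of regressors per measurement
    signal : (n,) array of measurements
    folds  : number of folds (k)
    """
    n = signal.shape[0]
    order = np.random.default_rng(seed).permutation(n)
    prediction = np.empty(n, dtype=float)
    for held_out in np.array_split(order, folds):
        # Train on every measurement that is not in the held-out fold.
        train = np.setdiff1d(order, held_out)
        coef, *_ = np.linalg.lstsq(design[train], signal[train], rcond=None)
        # Predict only the measurements the fit has not seen.
        prediction[held_out] = design[held_out] @ coef
    return prediction


# Toy data (assumed): a noisy line, signal = 2 + 3 * x + noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 60)
design = np.column_stack([np.ones_like(x), x])
signal = 2 + 3 * x + rng.normal(0, 0.1, x.shape)

# Use a k of 2, as in the snippet above.
prediction = kfold_predict(design, signal, folds=2)
xval_rmse = np.sqrt(np.mean((prediction - signal) ** 2))
print(xval_rmse)
```

Because every prediction comes from held-out data, the resulting error can be compared across models of different complexity.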
Algorithm 1
Algorithm 2
LiFE: Linear Fascicle Evaluation
Forward model from the tracks to the measured signal
Pestilli et al. (2014)
From diffusion to tracks
From tracks to diffusion
Pestilli et al. (2014)
Solve for
>>> X.shape
(10e8, 10e6)
Pestilli et al. (2014)
fiber_model = life.FiberModel(gtab)
fit = fiber_model.fit(data, tracks)
prediction = fit.predict(gtab)
optimized_tracks = tracks[fit.beta>0]
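The fitting step can be illustrated at toy scale. The sketch below assumes the forward model is linear, `y = X @ beta`, with one column of `X` per track and non-negative weights `beta`, and it solves for the weights with projected gradient descent. This is an illustration, not the solver used in Dipy's `life` module; the matrix sizes, the function name `nnls_projected_gradient`, and the `1e-6` threshold are assumptions.

```python
import numpy as np


def nnls_projected_gradient(X, y, n_iter=5000):
    """Minimize 0.5 * ||X @ beta - y||**2 subject to beta >= 0."""
    # Step size 1 / L, where L is the largest eigenvalue of X.T @ X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = X.T @ (X @ beta - y)
        # Gradient step, then project back onto the non-negative orthant.
        beta = np.maximum(beta - step * gradient, 0.0)
    return beta


# Toy problem (assumed sizes): 200 signal samples, 10 candidate tracks.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 10))
# Only four of the candidate tracks actually contribute to the signal.
beta_true = np.array([1.0, 0.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.3, 0.0])
y = X @ beta_true

beta_hat = nnls_projected_gradient(X, y)

# Keep tracks with a non-zero weight; the small threshold absorbs the
# numerical tolerance of the iterative solver.
kept = beta_hat > 1e-6
print(np.flatnonzero(kept))
```

Tracks whose weight is driven to zero do not help explain the measured signal, which is the rationale for keeping only those with a positive weight.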
The vertical occipital fasciculus - a century-old controversy
Yeatman et al. (2014)
Resolved through computational neuroanatomy!
Yeatman et al. (2014)
The VOF is strategically located
Takemura et al. (2014)
To transmit information between dorsal and ventral visual areas
Takemura et al. (2014)
Summary
The eScience Institute
The Dipy project
In vivo validation through statistical learning
Come visit the Data Science Studio!
http://arokem.org
arokem@gmail.com
@arokem
github.com/arokem