Vision and Cognition Lab
UW Psychology
November 16, 2015

Data Science meets Neuroscience at the University

Ariel Rokem, University of Washington eScience Institute

Follow along at http://arokem.github.io/2015-11-16-viscog

All research is becoming data-intensive research
Including neuroimaging...
Van Horn and Toga (2014)

The fourth paradigm of science

1. Empirical (experimental)

2. Theoretical (mathematical)

3. Simulation (computational)

4. Data-intensive (eScience)


Jim Gray

The eScience Institute
Our mission: "All across our campus, the process of discovery will increasingly rely on researchers’ ability to extract knowledge from vast amounts of data... In order to remain at the forefront, UW must be a leader in advancing these techniques and technologies, and in making [them] accessible to researchers in the broadest imaginable range of fields"
DSE sponsors

Data Science?



Programming and software engineering

Data management

Statistics and machine learning

Data visualization and communication

A focus on reproducibility and openness

DSE

New role for data scientists

Facilitate data-intensive research in different fields
(inter- and cross-disciplinary)

Focus on methodology

Focus on reproducibility

Contribute openly available tools, in addition to (or instead of) peer-reviewed publications

"Career paths for data scientists that recognize and reward contributions in methodology, computation, or development of tools are important."

(From a recent NIH BD2K RFA)

Incubator projects

Focused, intensive, collaborative projects

Data scientists + domain scientists

Results that wouldn't be possible otherwise

The eScience Institute

Data Science for Social Good

Urban@UW

Inspired by the DSSG programs at U Chicago and Georgia Tech

10-week internship program

16 DSSG fellows/students

6 high-school students from ALVA program

4 projects (+project leads!)

+ Data scientist mentors

Predictors of Permanent Housing for Homeless Families

Housing
Project Leads: Anjana Sundaram, Neil Roche, Bill & Melinda Gates Foundation
DSSG Fellows: Joan Wang, Jason Portenoy, Fabliha Ibnat, Chris Suberlak
ALVA Students: Cameron Holt, Xilalit Sanchez
eScience Data Scientist Mentors: Ariel Rokem, Bryna Hazelton
Family trajectories through programs
http://tinyurl.com/dssg-homeless

Neuroimaging and Data Science

Normal behavior is supported by brain connectivity

Image from Catani and ffytche (2015)

Not just passive cables

Brain connections change with development

Individual differences in connectivity account for differences in behavior

Adapt with learning

This has clinical significance

Diffusion MRI

Isotropic diffusion

Anisotropic diffusion

Modeling diffusion

Basser, Mattiello and Le Bihan (1994)
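The tensor model of Basser, Mattiello and Le Bihan relates the diffusion-weighted signal to a symmetric 3×3 diffusion tensor $\mathbf{D}$. For a unit gradient direction $\mathbf{g}$, diffusion weighting $b$, and $S_0$ the signal measured without diffusion weighting:

```latex
S(\mathbf{g}, b) = S_0 \exp\left(-b\, \mathbf{g}^{\top} \mathbf{D}\, \mathbf{g}\right)
```

The eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3$ and eigenvectors of $\mathbf{D}$ summarize the magnitude, shape and orientation of diffusion in each voxel.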

Diffusion statistics

Mean diffusivity
Fractional anisotropy
Principal diffusion direction
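All three statistics come from the eigendecomposition of the fitted tensor. A minimal numpy sketch for a single voxel follows; `tensor_statistics` is an illustrative helper, not a dipy function (in dipy, a fitted tensor exposes these as `fa`, `md` and `evecs`), and the example tensor values are made up:

```python
import numpy as np


def tensor_statistics(D):
    """Mean diffusivity, fractional anisotropy and principal diffusion
    direction of a single 3x3 diffusion tensor D (illustrative helper)."""
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    evals, evecs = np.linalg.eigh(D)
    # Reorder so that index 0 holds the largest eigenvalue
    evals, evecs = evals[::-1], evecs[:, ::-1]
    # Mean diffusivity: average of the eigenvalues
    md = evals.mean()
    # FA: normalized spread of the eigenvalues around their mean (0 to 1)
    fa = np.sqrt(1.5 * np.sum((evals - md) ** 2) / np.sum(evals ** 2))
    # Principal diffusion direction: eigenvector of the largest eigenvalue
    pdd = evecs[:, 0]
    return md, fa, pdd


# Made-up cigar-shaped tensor: fast diffusion along x (units of mm^2/s)
D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
md, fa, pdd = tensor_statistics(D)
```

An isotropic tensor (all eigenvalues equal) gives an FA of 0, and its principal direction is not meaningful.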

From diffusion to tracks
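The simplest way to go from diffusion to tracks is deterministic tracking: start at a seed point and repeatedly take a small step along the principal diffusion direction of the current voxel. The sketch below assumes a precomputed field of unit principal directions and uses nearest-voxel lookup; `track` is a hypothetical helper, not dipy's tracking API. Practical trackers also interpolate directions and stop on criteria such as low FA or sharp turns, which this sketch omits.

```python
import numpy as np


def track(pdd_field, seed, step=0.5, max_steps=200):
    """Hypothetical deterministic tracker (a sketch, not dipy's API).

    pdd_field: array of shape (X, Y, Z, 3), one unit principal diffusion
               direction per voxel
    seed:      starting point inside the volume, in voxel coordinates
    """
    shape = np.array(pdd_field.shape[:3])
    point = np.asarray(seed, dtype=float)
    path = [point.copy()]
    prev_dir = None
    for _ in range(max_steps):
        # Nearest-voxel lookup of the local principal direction
        direction = pdd_field[tuple(np.round(point).astype(int))]
        # Diffusion directions have no sign: flip to keep moving forward
        if prev_dir is not None and np.dot(direction, prev_dir) < 0:
            direction = -direction
        new_point = point + step * direction
        # Stop when the next step would leave the volume
        new_idx = np.round(new_point).astype(int)
        if np.any(new_idx < 0) or np.any(new_idx >= shape):
            break
        point, prev_dir = new_point, direction
        path.append(point.copy())
    return np.array(path)


# Usage: a made-up 10 x 3 x 3 volume in which every voxel points along x
field = np.zeros((10, 3, 3, 3))
field[..., 0] = 1.0
path = track(field, seed=(0, 1, 1))
```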

DIPY: Diffusion MRI in Python

Part of the NIPY community

Started in 2009 by Eleftherios Garyfallidis

Contributors from at least six different countries and many different labs

Why Python?

The lingua franca of reproducible computational science

Open source

Easy to learn

Come learn Python!


Software Carpentry
January 7th-8th, at the WRF Data Science Studio (Physics/Astronomy building)

Why Python? (continued)

Phenomenal ecosystem of open-source tools

The scipy & nipy ecosystem

[Figure: the scipy & nipy ecosystem pictured as a solar system, with DIPY among its members]

Diffusion MRI: the challenge of validation

[Figure panels: Algorithm 1, Algorithm 2]

A statistical learning approach

In vivo validation
[Figure: a model fit to Measurement #1 is used to predict Measurement #2 (cross-validation); the two measurements are also compared directly (test-retest reliability)]
Rokem et al. (2015)

Dipy cross-validation API

http://tinyurl.com/dipy-wmm
(powered by http://mybinder.org)
gtab = gradient_table(...)

model = ReconstModel(gtab, ...)

fit = model.fit(data, ...) # => ReconstFit

prediction = fit.predict(gtab, ...)
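To make the Model/Fit/predict pattern concrete without real data, here is a hypothetical toy model (`ADCModel` and `ADCFit` are illustrative names, not dipy classes) that fits one apparent diffusion coefficient per voxel, $S = S_0 e^{-b \cdot \mathrm{ADC}}$. Plain b-values stand in for a gradient table, an assumption that only works because this model ignores gradient directions:

```python
import numpy as np


class ADCModel:
    """Hypothetical toy model (not part of dipy) with the Model/Fit/predict
    structure: isotropic diffusion, S = S0 * exp(-b * ADC) in every voxel."""

    def __init__(self, bvals):
        # Stand-in for a gradient table: this model only needs the b-values
        self.bvals = np.asarray(bvals, dtype=float)

    def fit(self, data):
        # Log-linearize, log(S) = log(S0) - b * ADC, and solve all voxels
        # at once by ordinary least squares
        A = np.column_stack([np.ones_like(self.bvals), -self.bvals])
        log_s = np.log(data).reshape(-1, data.shape[-1]).T
        coef, *_ = np.linalg.lstsq(A, log_s, rcond=None)
        s0 = np.exp(coef[0]).reshape(data.shape[:-1])
        adc = coef[1].reshape(data.shape[:-1])
        return ADCFit(s0, adc)


class ADCFit:
    def __init__(self, s0, adc):
        self.s0 = s0
        self.adc = adc

    def predict(self, bvals):
        # The fitted parameters can predict the signal for any b-values,
        # including ones that were not used in the fit
        bvals = np.asarray(bvals, dtype=float)
        return self.s0[..., None] * np.exp(-bvals * self.adc[..., None])


# Usage: made-up, noise-free signal in a 2 x 2 block of voxels
bvals = np.array([0, 1000, 1000, 2000])
data = 100 * np.exp(-bvals * 1e-3) * np.ones((2, 2, 1))

model = ADCModel(bvals)
fit = model.fit(data)
prediction = fit.predict(bvals)
```

Because `predict` belongs to the fit, it can be evaluated on measurements that were not used for fitting, which is exactly what cross-validation needs.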

For example

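import numpy as np
import dipy.reconst.dti as dti

# data1, data2: two repeated measurements acquired with the same gtab
model = dti.TensorModel(gtab)

fit = model.fit(data1)

# Scale the prediction by the mean non-diffusion-weighted signal
S0 = np.mean(data1[..., gtab.b0s_mask], -1)
prediction = fit.predict(gtab, S0=S0)

RMSE = np.sqrt(np.mean((prediction - data2) ** 2, -1))

# Relative to test-retest: rRMSE < 1 means the model predicts
# data2 better than data1 does
rRMSE = RMSE / np.sqrt(np.mean((data1 - data2) ** 2, -1))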

Rokem et al. (2015)
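Why compare against test-retest reliability? A small simulation, assuming independent Gaussian noise of equal size in both measurements (all numbers here are made up), shows that even a model that recovers the noise-free signal exactly scores an rRMSE of about $1/\sqrt{2} \approx 0.71$, not 0, because its prediction is still compared against a noisy measurement:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up noise-free signal: 1000 voxels x 60 diffusion measurements
true_signal = rng.uniform(50, 150, size=(1000, 60))
sigma = 5.0

# Two repeated measurements with independent Gaussian noise of equal size
data1 = true_signal + rng.normal(0, sigma, true_signal.shape)
data2 = true_signal + rng.normal(0, sigma, true_signal.shape)

# An ideal model whose prediction is exactly the noise-free signal
prediction = true_signal

# Model error relative to test-retest error, per voxel
RMSE = np.sqrt(np.mean((prediction - data2) ** 2, -1))
rRMSE = RMSE / np.sqrt(np.mean((data1 - data2) ** 2, -1))

# Close to 1 / sqrt(2), rather than 0
median_rrmse = np.median(rRMSE)
```

The model error is about $\sigma$, while the difference between two measurements has an error of about $\sqrt{2}\sigma$; values between roughly 0.71 and 1 therefore indicate a model that predicts a new measurement better than a repeated measurement does.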
[Figure: DTI vs. a crossing fiber model in the corpus callosum, corticospinal tract, and superior longitudinal fasciculus]
Rokem et al. (2015)

When you've only measured once

k-fold cross-validation

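from dipy.reconst.cross_validation import kfold_xval

# Use a k of 2
dti_pred = kfold_xval(dti_model, data, 2)

# The model is re-created in each fold, so additional constructor
# arguments (here, the CSD response function) are passed along
csd_pred = kfold_xval(csd_model, data, 2, response)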
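The idea behind k-fold cross-validation can be sketched generically. The toy version below uses a made-up linear model and made-up data and is not dipy's implementation; in dipy the same scheme is applied to the diffusion-weighted volumes of a single scan, refitting the diffusion model in each fold. `kfold_predict` is a hypothetical helper:

```python
import numpy as np


def kfold_predict(X, y, k, seed=0):
    """Sketch of k-fold cross-validation for a linear model y ~ X @ w.

    The measurements are split at random into k folds. Each fold is held
    out once: the model is fit to the remaining measurements and then used
    to predict the held-out ones, so no prediction comes from a fit that
    saw the value being predicted."""
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(order, k)
    prediction = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(order, held_out)
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        prediction[held_out] = X[held_out] @ w
    return prediction


# Usage with made-up data: y = 2 + 3x plus a little noise, k = 2
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
X = np.column_stack([np.ones_like(x), x])
y = 2 + 3 * x + rng.normal(0, 0.1, 100)

prediction = kfold_predict(X, y, k=2)
rmse = np.sqrt(np.mean((prediction - y) ** 2))
```

With k = 2, every measurement is predicted by a model fit to the other half of the data, so a single scan yields a full set of out-of-sample predictions.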


LiFE: Linear Fascicle Evaluation

Forward model from the tracks to the measured signal

Pestilli et al. (2014)

From diffusion to tracks

From tracks to diffusion

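$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}$: the measured diffusion signal $\mathbf{y}$ is modeled as a weighted sum of the columns of $\mathbf{X}$, each holding the signal predicted from one track
Pestilli et al. (2014)

Solve for the track weights $\boldsymbol{\beta} \geq 0$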
>>> X.shape
(10e8, 10e6)
Pestilli et al. (2014)
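Solving for non-negative weights is a non-negative least squares (NNLS) problem. A small dense stand-in, using `scipy.optimize.nnls` on made-up data with 20 candidate tracks of which only 5 contribute, shows how tracks with zero weight drop out. The real $\mathbf{X}$ is sparse and vastly larger, so a dense solver like this would not scale to it:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Toy stand-in for the LiFE system: 200 measurements, 20 candidate tracks.
# Each column of X is the (made-up) signal predicted by one track.
X = rng.uniform(0, 1, size=(200, 20))

# Only the first 5 candidate tracks actually contribute to the signal
beta_true = np.zeros(20)
beta_true[:5] = rng.uniform(0.5, 2.0, size=5)
y = X @ beta_true + rng.normal(0, 0.01, size=200)

# Non-negative least squares: minimize ||y - X beta|| subject to beta >= 0
beta, residual_norm = nnls(X, y)

# Tracks with (near-)zero weight do not help explain the signal
kept = np.flatnonzero(beta > 0.05)
```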

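import dipy.tracking.life as life

# Forward model from a set of candidate tracks to the measured signal
fiber_model = life.FiberModel(gtab)
fit = fiber_model.fit(data, tracks)

# Signal predicted from the weighted tracks
prediction = fit.predict(gtab)

# Keep the tracks with non-zero weight; tracks is a list of
# streamlines, so select them with a comprehension
optimized_tracks = [t for t, b in zip(tracks, fit.beta) if b > 0]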

The vertical occipital fasciculus: a century-old controversy

Yeatman et al. (2014)

Resolved through computational neuroanatomy!

Yeatman et al. (2014)

The VOF is strategically located

Takemura et al. (2014)

To transmit information between dorsal and ventral visual areas

Takemura et al. (2014)

Summary

The eScience Institute

The Dipy project

In vivo validation through statistical learning

Come visit the Data Science Studio!

http://arokem.org
arokem@gmail.com
@arokem
github.com/arokem