Challenges and opportunities for computational neuroscience

The era of brain observatories

May 24th, 2018
University of Washington
Department of Physiology and Biophysics

Ariel Rokem, University of Washington eScience Institute

Follow along at: https://arokem.github.io/2018-05-24-uw-pbio

The era of brain observatories

Allen Institute for Brain Science

n=1200

n=~10,000

n=500,000

Opportunities

New data sets will enable important new discoveries

New methods

Data-driven discovery

Challenges

Methods that work in standard use may not apply to large datasets

=> Train machine learning algorithms to replace expert decision making

Tools are needed for data exploration and transparent sharing of results

=> Build browser-based applications for exploratory data analysis and data sharing

Algorithms are needed to extract information from complex high-dimensional data

=> Translate statistical techniques into practice in neuroscience

Sociotechnical structures are strained: collaboration, publication, training

=> Open source software collaborations and science-focused hack weeks

Challenge: Methods that work in standard datasets may fail in Big Data

Some methods require expert examination

Time consuming, tedious

=> Do not scale well!

The solution

Expert => results

Expert => training data => machine learning => results

Learning to replace experts

Aaron Lee

Sa Xiao

Parmita Mehta

Magda Balazinska

The UW OCT/EMR database

10 years (2006-2016)

9,285 patients

43,328 OCT volumes

2.64 million OCT images

2.5 TB of data

Linked to EPIC electronic medical records

For each OCT we know:

Visual acuity

OCT interpretation

Diagnosis

Treatment determinations

In some cases - longitudinal measurements

Artificial neural networks

A family of machine learning algorithms

Biologically inspired

LeCun et al. 2015

Artificial neural networks

Minsky and Papert (1969)

Artificial neural networks

A family of machine learning algorithms

Biologically inspired

Implement a cascade of linear/non-linear operations

LeCun et al. 2015
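
The "cascade of linear/non-linear operations" can be sketched in a few lines of NumPy. This is only an illustration: the layer sizes, the ReLU non-linearity, the sigmoid output, and the function names are assumptions made for the example, not details taken from the talk.

```python
import numpy as np

def relu(x):
    # Non-linear step: keep positive values, zero out the rest.
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Pass input x through a list of (weights, bias) pairs.

    Every layer applies a linear map; all but the last are followed by
    a ReLU. The last layer's output is squashed to (0, 1) by a sigmoid.
    """
    h = x
    for W, b in layers[:-1]:
        h = relu(W @ h + b)          # linear operation, then non-linearity
    W, b = layers[-1]
    logits = W @ h + b               # final linear read-out
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical network: 4 inputs -> 8 hidden -> 8 hidden -> 1 output.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(8, 4)), np.zeros(8)),
    (rng.normal(size=(8, 8)), np.zeros(8)),
    (rng.normal(size=(1, 8)), np.zeros(1)),
]
x = rng.normal(size=4)
p = forward(x, layers)
```

Training would adjust the weights and biases to reduce a loss on labeled examples; that step is omitted here.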

Convolutional networks

Capitalize on spatial correlations in images

Inspired by the mammalian visual system

LeCun et al. 2015

Bosking et al. 1997

Krizhevsky et al. 2012
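
A minimal sketch of how a convolutional layer capitalizes on spatial correlations: the same small kernel is applied at every image location, so a local pattern is detected wherever it occurs. The toy image, the kernel, and the function name are assumptions made for the illustration. As in most deep learning libraries, the operation shown is technically a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image without padding ('valid' mode)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the local patch under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half (a vertical edge).
image = np.zeros((4, 6))
image[:, 3:] = 1.0

# Horizontal difference kernel: responds where brightness increases.
kernel = np.array([[-1.0, 1.0]])

edges = conv2d_valid(image, kernel)
```

The response is non-zero only in the column where the edge sits, regardless of the row, which is the weight sharing that convolutional networks rely on.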

Deep learning accurately classifies age-related macular degeneration (AMD)

Patient-level AUC = 0.97

Lee et al. (2016)
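
To make "patient-level AUC" concrete, here is a sketch under the assumption that image-level scores are averaged within each patient before the AUC is computed. The talk does not state the aggregation rule, so the averaging step, the toy numbers, and the function names are all illustrative.

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as half).
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def patient_level_scores(patient_ids, image_scores):
    """Average the image-level scores of each patient (assumed rule)."""
    ids = np.unique(patient_ids)
    means = np.array([image_scores[patient_ids == p].mean() for p in ids])
    return ids, means

# Toy data: 4 patients with 2 images each; patients 1 and 3 have disease.
patient_ids = np.array([1, 1, 2, 2, 3, 3, 4, 4])
image_scores = np.array([0.9, 0.4, 0.2, 0.6, 0.8, 0.7, 0.1, 0.3])
image_labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])
patient_labels = np.array([1, 0, 1, 0])   # ordered as np.unique(patient_ids)

image_auc = auc(image_labels, image_scores)
_, patient_scores = patient_level_scores(patient_ids, image_scores)
patient_auc = auc(patient_labels, patient_scores)
```

In this toy case a single ambiguous image lowers the image-level AUC, while averaging within patients separates the two groups completely.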

Solving multi-class multi-label problems

Binary classification doesn't model clinical decision making

Patients can have any of several diseases

Or more than one disease

=> Train several networks and integrate across them

Mehta, Lee, Lee, Balazinska & Rokem (in review)

Segmenting experimental data:

oxygen induced retinopathy

Retinal segmentation

Xiao, Bucher, Wu, Rokem, Lee, Marra, Fallon, Diaz-Aguilar, Aguilar, Friedlander & Lee (2017), JCI Insight

Segmenting experimental data:

oxygen induced retinopathy

The vaso-obliteration zone

Xiao, Bucher, Wu, Rokem, Lee, Marra, Fallon, Diaz-Aguilar, Aguilar, Friedlander & Lee (2017), JCI Insight

Segmenting experimental data:

oxygen induced retinopathy

The neovascular tufts

Xiao, Bucher, Wu, Rokem, Lee, Marra, Fallon, Diaz-Aguilar, Aguilar, Friedlander & Lee (2017), JCI Insight

The solution

Expert => results

Expert => training data => machine learning => results

But: for many tasks, not enough training data

=> Amplify labeled datasets with citizen science

Expert => citizen science => training data => machine learning => results

Scaling expertise with citizen science

Anisha Keshavan

Jason Yeatman

Example

Quality control of T1-weighted images

Healthy Brain Network (Alexander et al. 2017)

Braindr

Are you at work but feel like playing Tinder? Why not play braindr (https://t.co/yXw191Q7Hy) instead, and help neuroscientists rate the quality of brain images? Swipe left to fail bad quality images! Built with @vuejs and @Firebase #citizenscience
anisha (@akeshavan_), February 7, 2018

Keshavan, Yeatman & Rokem (in prep)

Multiple ratings per image

Keshavan, Yeatman & Rokem (in prep)

But often, no agreement

Keshavan, Yeatman & Rokem (in prep)

Aggregating across raters

XGBoost (Chen & Guestrin, 2016)

Keshavan, Yeatman & Rokem (in prep)
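
The idea of aggregating across raters can be sketched without the boosted-tree model cited on the slide. The example below swaps in a much simpler stand-in: each rater is weighted by their agreement with expert labels on a gold-standard subset. The ratings matrix, the expert labels, and the function names are invented for the illustration, and this is not the authors' method.

```python
import numpy as np

# Rows: images. Columns: raters. 1 = pass, 0 = fail, NaN = not rated.
ratings = np.array([
    [1, 1, 0, np.nan],
    [0, 0, 1, 0],
    [1, np.nan, 0, 1],
    [0, 0, 1, np.nan],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
])
expert = np.array([1, 0, 1, 0, 1, 0])  # hypothetical gold-standard labels

def mean_vote(ratings):
    # Unweighted average of the available ratings for each image.
    return np.nanmean(ratings, axis=1)

def rater_accuracy(ratings, expert):
    # Fraction of each rater's ratings that agree with the expert.
    rated = ~np.isnan(ratings)
    agree = ratings == expert[:, None]   # NaN never compares equal
    return agree.sum(axis=0) / rated.sum(axis=0)

def weighted_vote(ratings, weights):
    # Average the available ratings, weighting each rater by reliability.
    rated = ~np.isnan(ratings)
    return np.nansum(ratings * weights, axis=1) / (rated * weights).sum(axis=1)

weights = rater_accuracy(ratings, expert)
simple = mean_vote(ratings)
weighted = weighted_vote(ratings, weights)
predicted = (weighted > 0.5).astype(int)
```

For the last image the unweighted vote is a tie, while the weighted vote sides with the raters who usually agree with the expert. In practice the weights would be fit on one set of images and evaluated on another; here both are the same toy set.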

How do we scale this up?

Scaling expertise using citizen scientist ratings

Keshavan, Yeatman & Rokem (in prep)

Summary

When there is enough training data: deep learning

When we need to scale up: citizen scientists

Model of expertise (gradient-boosted trees) for aggregation

Model of perception (neural network) for automation and scaling

Future applications

Other tasks

Tumor segmentation in MRI

Other types of data and other procedures

Challenges

Methods that work in standard use may not apply to large datasets

=> Train machine learning algorithms to replace expert decision making

Tools are needed for data exploration and transparent sharing of results

=> Build browser-based applications for exploratory data analysis and data sharing

Challenge: tools for exploration of complex data

Results from large datasets are hard to understand

Hard to communicate

Hard to reproduce

Data sharing is not incentivized and is not easy enough

Normal behavior is supported by brain connectivity

Image from Catani and ffytche (2015)

Not just passive cables

Brain connections develop and mature with age

Individual differences account for differences in behaviour

Adapt and change with learning

Diffusion MRI

Isotropic diffusion

Rokem et al. (2017), Journal of Vision
Rokem et al. (2015), PLoS One

Diffusion MRI

Anisotropic diffusion

Rokem et al. (2017), Journal of Vision
Rokem et al. (2015), PLoS One

Diffusion MRI

Rokem et al. (2017), Journal of Vision
Rokem et al. (2015), PLoS One

Diffusion statistics

Mean diffusivity

Fractional anisotropy

Principal diffusion direction

Rokem et al. (2017), Journal of Vision
Rokem et al. (2015), PLoS One
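
The three statistics can be computed from the eigen-decomposition of a diffusion tensor. The sketch below uses the standard definitions; the example tensors (with diffusivities in mm²/s) and the function name are assumptions chosen for illustration.

```python
import numpy as np

def tensor_stats(D):
    """Summary statistics of a symmetric 3x3 diffusion tensor D.

    Returns (mean diffusivity, fractional anisotropy, principal
    diffusion direction). Assumes D is not the zero tensor.
    """
    evals, evecs = np.linalg.eigh(D)      # eigenvalues in ascending order
    md = evals.mean()                     # mean diffusivity
    # Fractional anisotropy: normalized spread of the eigenvalues,
    # 0 for a sphere and approaching 1 for a thin stick.
    fa = np.sqrt(1.5 * np.sum((evals - md) ** 2) / np.sum(evals ** 2))
    pdd = evecs[:, -1]                    # eigenvector of largest eigenvalue
    return md, fa, pdd

# Isotropic diffusion: equal diffusivity in every direction.
iso = np.eye(3) * 0.7e-3
# Anisotropic diffusion: fastest along the x axis.
aniso = np.diag([1.7e-3, 0.3e-3, 0.3e-3])

md_iso, fa_iso, _ = tensor_stats(iso)
md_aniso, fa_aniso, pdd_aniso = tensor_stats(aniso)
```

The isotropic tensor has a fractional anisotropy of zero, while the anisotropic one has a value of about 0.8 and a principal direction along x (up to sign).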

Amyotrophic Lateral Sclerosis (ALS)

Classify patients based on the tissue properties in this part of the brain

Random Forest algorithm => 80% accuracy

Sarica et al. (2017)

How could we improve on this?

If we can't get the data?

Challenge: improved data exploration and data sharing

Jason Yeatman

Adam Richie-Halford

Josh Smith

Anisha Keshavan

The solution

A web-based application

Builds a website for a diffusion MRI dataset

Automatically uploads the website to GitHub

Yeatman, Richie-Halford, Smith, Keshavan & Rokem (2018), Nature Communications

https://yeatmanlab.github.io/Sarica_2017

Yeatman, Richie-Halford, Smith, Keshavan & Rokem (2018), Nature Communications

Exploratory data analysis

Enhances published results

Linked visualizations facilitate easy exploration

Enables new discoveries in old datasets

Yeatman, Richie-Halford, Smith, Keshavan & Rokem (2018), Nature Communications

Automatic data sharing

Yeatman, Richie-Halford, Smith, Keshavan & Rokem (2018), Nature Communications

Further exploration

Yeatman, Richie-Halford, Smith, Keshavan & Rokem (2018), Nature Communications

Summary

Exploratory data analysis

Automated data sharing

Dimensionality-reduced data in tidy table format

Yeatman, Richie-Halford, Smith, Keshavan & Rokem (2018), Nature Communications

Future applications

Other analysis pipelines

Dimensionality reduction in multi-channel neural recordings

Challenges

Methods that work in standard use may not apply to large datasets

=> Train machine learning algorithms to replace expert decision making

Tools are needed for data exploration and transparent sharing of results

=> Build browser-based applications for exploratory data analysis and data sharing

Algorithms are needed to extract information from complex high-dimensional data

=> Translate statistical techniques into practice in neuroscience

Opportunity: data-driven discovery

Adam Richie-Halford

Noah Simon

Jason Yeatman

Diffusion MRI data has group structure

Logistic regression

But in our case p (number of variables) >> n (number of subjects)

The Lasso

Enforces sparsity

But ignores group structure in the data

Accuracy: ~71% (AUC: ~0.71)

Does not discover the right features

Top 10 features include some CST, but also other parts of the brain

Tibshirani (1996)

The Group Lasso

Where l are groups of variables

p(l) is the number of variables in group l

In our case: all the measurements of a tissue property within a tract

Enforces selection of groups

But does not enforce L1 sparsity within included groups

Yuan and Lin (2006)
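
For reference, the Lasso and Group Lasso estimators are commonly written as follows. The notation is an assumption made here: L(β) stands for the loss (the logistic loss in the classification setting), λ is a single regularization weight, and β^(l) is the sub-vector of coefficients that belong to group l.

```latex
\begin{align}
\hat{\beta}_{\text{Lasso}}
  &= \arg\min_{\beta} \; L(\beta) + \lambda \, \lVert \beta \rVert_1 \\
\hat{\beta}_{\text{Group Lasso}}
  &= \arg\min_{\beta} \; L(\beta)
     + \lambda \sum_{l} \sqrt{p(l)} \, \lVert \beta^{(l)} \rVert_2
\end{align}
```

The L1 norm drives individual coefficients to zero. The sum of unsquared L2 norms drives whole groups to zero at once, and the factor √p(l) keeps large groups from being penalized less per variable than small ones.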

Sparse Group Lasso

Enforces sparsity both at the group level and the within-group level

Subsumes the Lasso (λ₁ = 0)

And the Group Lasso (λ₂ = 0)

But more meta-parameters

Simon et al. (2013)
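
One way to see how the two levels of sparsity arise is the proximal operator used by proximal-gradient solvers. The sketch assumes the penalty λ₁ Σ_l √p(l) ‖β^(l)‖₂ + λ₂ ‖β‖₁, which matches the statement above that λ₁ = 0 recovers the Lasso and λ₂ = 0 recovers the Group Lasso. Function names, the step size, and the toy coefficients are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, thresh):
    # Lasso step: shrink every coefficient toward zero by thresh.
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def sgl_prox(z, groups, lam_group, lam_l1, step=1.0):
    """Proximal operator of the sparse group lasso penalty.

    z         : coefficient vector after a gradient step
    groups    : list of index arrays, one per group
    lam_group : weight on the group (L2) penalty, lambda_1 above
    lam_l1    : weight on the element-wise (L1) penalty, lambda_2 above
    """
    out = soft_threshold(z, step * lam_l1)        # within-group sparsity
    for idx in groups:
        norm = np.linalg.norm(out[idx])
        thresh = step * lam_group * np.sqrt(len(idx))
        # Group step: shrink the whole group, or zero it if it is weak.
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
        out[idx] = scale * out[idx]
    return out

# Toy example: group 0 carries signal, group 1 is weak.
z = np.array([3.0, -0.2, 0.1, 0.05, -0.1, 0.2])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]

beta_sgl = sgl_prox(z, groups, lam_group=0.3, lam_l1=0.15)
beta_lasso = sgl_prox(z, groups, lam_group=0.0, lam_l1=0.15)
beta_group = sgl_prox(z, groups, lam_group=0.3, lam_l1=0.0)
```

With both penalties active, the weak group is removed entirely and one small coefficient inside the retained group is also set to zero. With only the group penalty, the retained group keeps all of its coefficients.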

Fitting meta-parameters

Nested cross-validation
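
A minimal sketch of nested cross-validation: an inner loop chooses the meta-parameter using training data only, and an outer loop estimates performance on data that played no part in that choice. To keep the example short, ridge regression with a single meta-parameter stands in for the Sparse Group Lasso. The fold counts, the candidate values, the simulated data, and the function names are all assumptions.

```python
import numpy as np

def kfold_indices(n, k, rng):
    # Shuffle 0..n-1 and split into k roughly equal folds.
    return np.array_split(rng.permutation(n), k)

def fit_ridge(X, y, lam):
    # Closed-form ridge regression (stand-in model for this sketch).
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

def nested_cv(X, y, lams, outer_k=5, inner_k=4, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    outer_scores, chosen = [], []
    for test_idx in kfold_indices(n, outer_k, rng):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        inner_folds = kfold_indices(len(train_idx), inner_k, rng)
        inner_err = []
        for lam in lams:
            errs = []
            for val_pos in inner_folds:
                # Inner loop: validation data come from the training set.
                val_idx = train_idx[val_pos]
                fit_idx = np.setdiff1d(train_idx, val_idx)
                w = fit_ridge(X[fit_idx], y[fit_idx], lam)
                errs.append(mse(X[val_idx], y[val_idx], w))
            inner_err.append(np.mean(errs))
        best = lams[int(np.argmin(inner_err))]
        # Refit on all training data, then score on the held-out fold.
        w = fit_ridge(X[train_idx], y[train_idx], best)
        outer_scores.append(mse(X[test_idx], y[test_idx], w))
        chosen.append(best)
    return outer_scores, chosen

# Simulated data: 60 observations, 10 features, 3 of them informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.normal(size=60)

lams = [0.01, 0.1, 1.0, 10.0]
outer_scores, chosen = nested_cv(X, y, lams)
```

Because the held-out fold of the outer loop never influences the choice of meta-parameter, the outer scores are not biased by the tuning.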

Accurate classification and feature detection

Classification accuracy of ~84% (AUC of 0.9)

Top 10 features selected include CST

Sarica et al. (2017)

Summary

Sparse Group Lasso accurately discovers structure in dMRI data

Classification of disease states

In a regression setting, prediction of continuous measures (e.g., "brain age", IQ, reading skills)

Future applications

Multi-region, multi-neuron recordings

Neurons => features

Brain regions => groups

Trials => observations

Multi-neuron recordings also have group structure