The Data Science Revolution

in biomedical image processing


Ariel Rokem, University of Washington eScience Institute

Follow along at: http://arokem.github.io/2017-07-24-spore

License

Data Science

Cleveland (2001)
Patil and Hammerbacher (circa 2008)
Conway (2013)
"All across our campus, the process of discovery will increasingly rely on researchers’ ability to extract knowledge from vast amounts of data... In order to remain at the forefront, UW must be a leader in advancing these techniques and technologies, and in making [them] accessible to researchers in the broadest imaginable range of fields"

$37.8M over 5 years:
"Moore-Sloan Data Science Environments"

Additional funding from:

- Washington Research Foundation

- National Science Foundation

- Bill & Melinda Gates Foundation

- National Institute of Mental Health

Images are central to scientific discovery
Images are also useful in clinical practice

Modern biomedical data: drinking from the firehose

Jon Liu, Adam Glaser, Larry True, Nick Reder, Ye Chen

Pathology: the traditional way


Expensive, time-consuming, destructive, hazardous, susceptible to sampling error
Slide: Nick Reder

Light sheet microscopy: pathology in the 21st century

(Glaser et al., 2017)


Rapid, inexpensive, easy to use, non-destructive
Slide: Nick Reder

Outline

  • Modern biomedical data: drinking from the firehose

  • Case studies:

    - Big data technology: models of brain anatomy in MRI data

    - Machine intelligence: towards computer-assisted diagnosis

    - Virtual prototyping: improving retinal prosthetics

  • Reproducibility and scientific transparency:

    - Open source software for science

    - Data sharing made easy and beautiful

  • The era of brain observatories

    Allen Institute for Brain Science

    UK Biobank

    The Human Connectome Project

    - More than 1,000 participants

    - High-quality measurements of MRI

    - Genetics, cognitive measures, etc.

    Normal behavior is supported by brain connectivity

    Image from Catani and ffytche (2015)

    Not just passive cables

    Brain connections change with development

    Individual differences account for differences in behavior

    Adapt with learning

    This has clinical significance

    Magnetic Resonance Imaging (MRI)

    Neural activity: functional MRI

    Anatomy: structural MRI

    ...

    Brain connectivity: diffusion MRI

    Diffusion MRI

    Isotropic diffusion
    Rokem et al. (2017)

    Diffusion MRI

    Anisotropic diffusion
    Rokem et al. (2017)

    Diffusion MRI

    Modeling diffusion

    Basser, Mattiello and Le Bihan (1994)
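
The diffusion tensor model describes the signal measured with gradient direction g and diffusion weighting b as S = S0 exp(-b gᵀ D g), where D is a symmetric 3x3 tensor with 6 free parameters. The sketch below is a minimal illustration under stated assumptions, not the code behind the results in this talk: it fits D by ordinary least squares on the log signal. The function names, the synthetic tensor, and the b-value of 1000 s/mm² are hypothetical choices made for the example.

```python
import numpy as np


def design_matrix(bvals, bvecs):
    """Map the 6 unique tensor elements (plus log S0) to the log signal."""
    gx, gy, gz = bvecs.T
    return np.column_stack([
        -bvals * gx * gx,      # Dxx
        -bvals * gy * gy,      # Dyy
        -bvals * gz * gz,      # Dzz
        -2 * bvals * gx * gy,  # Dxy
        -2 * bvals * gx * gz,  # Dxz
        -2 * bvals * gy * gz,  # Dyz
        np.ones_like(bvals),   # log(S0)
    ])


def fit_tensor(signal, bvals, bvecs):
    """Log-linear least-squares fit; returns the tensor and S0."""
    A = design_matrix(bvals, bvecs)
    coef, *_ = np.linalg.lstsq(A, np.log(signal), rcond=None)
    dxx, dyy, dzz, dxy, dxz, dyz, log_s0 = coef
    D = np.array([[dxx, dxy, dxz],
                  [dxy, dyy, dyz],
                  [dxz, dyz, dzz]])
    return D, np.exp(log_s0)


def predict_signal(D, s0, bvals, bvecs):
    """S = S0 * exp(-b * g^T D g) for every measurement."""
    adc = np.einsum("ij,jk,ik->i", bvecs, D, bvecs)
    return s0 * np.exp(-bvals * adc)


# Synthetic, noiseless example (all values are assumptions for illustration):
# one b=0 measurement plus 30 random directions at b=1000 s/mm^2.
rng = np.random.default_rng(0)
directions = rng.normal(size=(30, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
bvecs = np.vstack([np.zeros(3), directions])
bvals = np.concatenate([[0.0], np.full(30, 1000.0)])

D_true = np.diag([1.7e-3, 0.3e-3, 0.3e-3])  # mm^2/s, elongated along x
signal = predict_signal(D_true, 100.0, bvals, bvecs)
D_fit, s0_fit = fit_tensor(signal, bvals, bvecs)
```

With noiseless data the fit recovers the simulated tensor. Real measurements are noisy, and weighted or nonlinear fitting is often preferred in practice.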

    Diffusion statistics

    Mean diffusivity
    Fractional anisotropy
    Principal diffusion direction
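
The three statistics listed above are all derived from the eigendecomposition of the diffusion tensor. As a rough sketch (the function name and the example tensor are assumptions, not taken from the talk), they can be computed as follows, using the standard definition of fractional anisotropy:

```python
import numpy as np


def tensor_statistics(D):
    """Return mean diffusivity, fractional anisotropy and principal direction.

    Assumes D is a symmetric, non-zero 3x3 diffusion tensor.
    """
    evals, evecs = np.linalg.eigh(D)  # eigenvalues in ascending order
    md = evals.mean()                 # mean diffusivity
    # FA = sqrt(3/2 * sum((lambda_i - MD)^2) / sum(lambda_i^2))
    fa = np.sqrt(1.5 * np.sum((evals - md) ** 2) / np.sum(evals ** 2))
    pdd = evecs[:, -1]                # eigenvector of the largest eigenvalue
    return md, fa, pdd


# Hypothetical anisotropic tensor (mm^2/s), elongated along the x axis.
md, fa, pdd = tensor_statistics(np.diag([1.7e-3, 0.3e-3, 0.3e-3]))

# Isotropic diffusion: equal eigenvalues, so FA is (numerically) zero.
_, fa_iso, _ = tensor_statistics(np.eye(3) * 1.0e-3)
```

The sign of the principal diffusion direction is arbitrary, because diffusion along an axis is symmetric.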

    From diffusion to tracks

    Diffusion MRI: the challenge of validation

    Algorithm 1
    Algorithm 2
    Slide: Franco Pestilli

    A statistical learning approach

    In-vivo validation
    Measurement #1
    Measurement #2
    Test-retest reliability
    Model
    Cross-validation
    Rokem et al. (2015)
    Corpus callosum
    Corticospinal tract
    Superior longitudinal fasciculus
    DTI
    Crossing fiber model
    Rokem et al. (2015)

    Human Connectome Project -- data volume (GB)

    What computational system should we use to analyze these data?

    Database support for image analytics at scale

    - Parmita Mehta (CSE)

    - Sven Dorkenwald (CSE)

    - Dongfang Zhao (eScience/CSE)

    - Tomer Kaftan (CSE)

    - Alvin Cheung (eScience/CSE)

    - Magda Balazinska (eScience/CSE)

    - Andy Connolly (Astronomy)

    - Jake Vanderplas (Astronomy)

    - Yusra AlSayyad (Astronomy)

    Mehta et al. (2017)

    Database support for image analytics at scale

    - Declarative languages to access data

    => Select what to process and prepare the data

    - Declarative languages for specifying computations

    => But can also deploy user-defined code

    - Physical data independence

    => Data ingestion and distribution is automatic

    - Infrastructure independence

    => Can be deployed in institutional HPC resources

    => Can be deployed in cloud computing systems


    The argument for cloud computing

    Scalable

    Elastic

    Secure, reliable

    Model comparison for the Human Connectome Project

    With Jason Yeatman, Libby Huber, Rafael Neto-Henriques

    DTI : 6 parameters

    DKI : 15 parameters

    10-fold cross-validation

    900 participants
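
The comparison above pits a model with fewer parameters (DTI) against one with more (DKI), using 10-fold cross-validation. The sketch below illustrates that logic only. It is not the DTI/DKI code, and it uses two polynomial models on synthetic data as stand-ins. The function name `kfold_rmse`, the data, and the noise level are assumptions made for this example.

```python
import numpy as np


def kfold_rmse(A, y, k=10, seed=0):
    """Cross-validated RMSE of the least-squares model y ~ A @ coef."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_err = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Fit on k-1 folds, then predict the held-out fold.
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        sq_err[test_idx] = (A[test_idx] @ coef - y[test_idx]) ** 2
    return np.sqrt(sq_err.mean())


# Synthetic data with a quadratic component plus noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0, 3, 90)
y = 1 + 2 * x + 0.5 * x ** 2 + rng.normal(scale=0.1, size=x.size)

A_simple = np.column_stack([np.ones_like(x), x])          # 2 parameters
A_rich = np.column_stack([np.ones_like(x), x, x ** 2])    # 3 parameters

rmse_simple = kfold_rmse(A_simple, y)
rmse_rich = kfold_rmse(A_rich, y)
```

Because the error is computed on held-out data, the richer model wins only when its extra parameters capture real structure rather than noise.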

    The argument for cloud computing:

    Scalable

    Elastic

    Secure, reliable

    Reproducible

    Machine intelligence: towards computer-assisted diagnosis

    With: Aaron Lee, Cecilia Lee, Sa Xiao, Yue Wu

    UW Department of Ophthalmology

    Eye diseases

    Common

    Debilitating

    Many forms of eye disease are preventable/treatable

    Age-related macular degeneration (AMD)

    Slide: Ione Fine

    Optical Coherence Tomography (OCT)

    High-fidelity in vivo measurements of retinal structure at micron resolution

    The UW OCT/EMR database

    10 years (2006-2016)

    9,285 patients

    43,328 OCT volumes

    2.64 million OCT images

    2.5 TB of data

    Linked to EPIC electronic medical records

    For each OCT we know:

    Visual acuity

    OCT interpretation

    Diagnosis

    Treatment determinations

    In some cases - longitudinal measurements

    Deep learning

    Krizhevsky et al. (2012)

    Inspired by architecture of the visual system

    Learns a hierarchy of filters through exposure to examples

    Deep learning

    Le et al. (2012)

    Network architecture for AMD classification

    Lee et al. (2016)

    Deep learning network classifies AMD from OCT with high accuracy

    Lee et al. (2016)

    Is there a ball in the picture?

    Zeiler and Fergus (2013)

    How about now?

    Zeiler and Fergus (2013)

    And now?

    Zeiler and Fergus (2013)

    Deep learning network identifies clinical features

    Lee et al. (2016)
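
The "is there a ball in the picture?" slides appear to refer to occlusion sensitivity, as described by Zeiler and Fergus: mask one patch of the image at a time and record how much the classifier's score drops. A minimal sketch of that idea follows. The `score_fn` used here is a toy stand-in for a trained network, and the patch size, stride, and fill value are arbitrary assumptions.

```python
import numpy as np


def occlusion_map(image, score_fn, patch=8, stride=8, fill=0.0):
    """Drop in score when each patch of a 2D image is masked out."""
    base = score_fn(image)
    h, w = image.shape
    rows = range(0, h - patch + 1, stride)
    cols = range(0, w - patch + 1, stride)
    drop = np.zeros((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = fill
            drop[i, j] = base - score_fn(occluded)
    return drop


# Toy example: a bright square, and a hypothetical "classifier" whose score
# is the mean intensity inside that square.
image = np.zeros((32, 32))
image[8:16, 16:24] = 1.0


def toy_score(img):
    return img[8:16, 16:24].mean()


drop = occlusion_map(image, toy_score)
most_important = np.unravel_index(drop.argmax(), drop.shape)
```

Patches with the largest drop mark the image regions that the score depends on most.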

    U-net for segmentation of retinal edema

    Intraretinal fluid segmentation

    Lee et al. (2016)

    Towards computer-assisted diagnosis

    Accurate identification of clinical features

    Save clinician time by presenting relevant evidence

    Prevent error by providing educated guess

    Advance research in large-scale data collections

    Virtual prototyping: improving retinal prosthetics

    With: Michael Beyeler, Ione Fine, Geoff Boynton

    UW Department of Psychology

    Retinitis Pigmentosa (RP)

    Slide: Ione Fine

    The retina

    Electronic retinal prosthetics

    What do retinal prosthetic patients see?

    End-to-end model

    Virtual prototyping

    Scientific reproducibility and the data science revolution

    "An article about a computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result." Buckheit and Donoho (1995)

    Open source software for science

    The Python programming language:

    Relatively easy to learn

    Free and open source

    "Batteries included"

    The scientific Python ecosystem

    DIPY: Diffusion MRI in Python

    Part of the NIPY community

    Started in 2009 by Eleftherios Garyfallidis

    Contributors from at least six different countries and many different labs

    Garyfallidis et al. (2014)

    Always open

    OCT intraretinal fluid segmenter:
    https://github.com/uw-biomedical-ml/irf-segmenter

    Pulse2percept: models for retinal prosthetics:
    https://uwescience.github.io/pulse2percept/

    What about data?

    Sharing data can be really challenging!

    Large volumes

    Domain specific or proprietary formats

    Concerns about privacy and personal health information

    AFQ-browser: an elegant and easy way to share brain dMRI data

    With Jason Yeatman, Adam Richie-Halford & Josh Smith. https://yeatmanlab.github.io/Sarica_2017

    Similarly, for prostate LSM data

    With Nick Reder, eScience Data Science Incubator, Winter 2017 https://uwescience.github.io/alpenglow-viz
    http://arokem.org
    arokem@gmail.com
    @arokem
    github.com/arokem