Ariel Rokem, University of Washington eScience Institute
Follow along at:
Data science education
Development of tools and practices for reproducible research
Building a data science community: open, rigorous and ethical
Data-driven research
A leading cause of irreversible blindness
2020: 76 M people affected
2040 (predicted): 112 M people affected
Several different etiologies all leading to increased intraocular pressure (IOP)
→ Glaucomatous optic neuropathy (GON)
Relies on multiple factors
Requires substantial expertise
Complex, expensive
Early detection of the disease is very important
Crucial for successful clinical intervention
Objective: build an automated system for detection of glaucoma
Incorporate information from multiple sources
Provide interpretable results
863 glaucoma patients
771 healthy controls
55 participants who progress to glaucoma (PtG)
Baseline: age, gender, ethnicity
Model 1: + cardiovascular, pulmonary variables (e.g., BP, FVC, PEF)
Model 2: + ocular data (e.g., IOP)
What features explain the diagnosis?
SHAP values (Lundberg and Lee, 2017)
See excellent explainer here
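A minimal sketch of the idea behind SHAP values: each feature is credited with its average marginal contribution to the prediction, taken over all subsets of the other features. The scoring function `toy_risk`, its coefficients, and the input values are hypothetical stand-ins, not the models or data from this study. Practical SHAP implementations use approximations instead of this exhaustive enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term (illustration only)."""
    age, iop, fvc = features
    return 0.5 * age + 2.0 * iop - 1.0 * fvc + 1.0 * age * iop


x = [1.0, 2.0, 3.0]          # hypothetical standardized age, IOP, FVC
baseline = [0.0, 0.0, 0.0]   # hypothetical reference participant
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The values sum to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes them readable as per-feature contributions to a diagnosis.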
A family of machine learning algorithms
Biologically inspired
Inspired by the visual system
Capitalize on spatial correlations in images
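Assuming the family meant here is convolutional neural networks, the way they use spatial correlations can be sketched with a single filter slid across an image. The image and the kernel below are hypothetical toy values; a trained network learns its kernels from data.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x5 image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-written vertical-edge detector (illustration only)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The same small set of weights is reused at every position, so the filter responds wherever the dark-to-bright edge appears and stays at zero in the uniform region.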
Color fundus photos
OCT
CFP + OCT
CFP + OCT + medical records
SHAP values of individual networks
Saliency maps (Integrated gradients; Sundararajan et al., 2017)
SHAP values of contributions to overall ensemble
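A minimal sketch of integrated gradients: the gradient of the model output is averaged along a straight path from a baseline input to the actual input, then scaled by the difference between the two. The function `toy_model` is a hypothetical stand-in for a network, and gradients are estimated by finite differences here; a deep learning framework would supply them through automatic differentiation.

```python
def numerical_gradient(f, point, eps=1e-5):
    """Central finite-difference estimate of the gradient of f at point."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from baseline to x
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_model(v):
    """Hypothetical model output (illustration only)."""
    return v[0] ** 2 + 3.0 * v[1]


x_in = [2.0, 1.0]       # hypothetical input, e.g. two pixel intensities
x_base = [0.0, 0.0]     # hypothetical all-black baseline
attributions = integrated_gradients(toy_model, x_in, x_base)
print(attributions)
```

For an image, each input dimension is a pixel, so the attributions can be reshaped to the image size and displayed as a saliency map. The attributions sum, up to numerical error, to the change in model output between the baseline and the input.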
Accurate automated glaucoma detection (AUC: 0.97)
Comparison with clinicians
Validation with PtG
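For readers unfamiliar with the AUC metric: it is the probability that a randomly chosen patient receives a higher score than a randomly chosen control. The sketch below computes it directly from that definition. The labels and scores are made-up values for illustration, not results from this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = glaucoma, 0 = healthy control
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of patients from controls.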
Novel association of pulmonary variables with glaucoma
See also Chua et al. (2019)
Effects of glaucoma in photoreceptor layer of retina
See also Choi et al. (2011)
"Found data"
Relatively clean sample (no ocular co-morbidities)
Limited comparison with clinician performance
Correlational
Large datasets
+ Machine learning techniques
+ Interpretational methods
= Scientific insight
+ Potential clinical application