dmriprep development sprint
Reliable, robust and efficient preprocessing of MRI data is hard. So many things
can go wrong. Building a general-purpose pipeline for preprocessing also faces
the challenge that, even for just one type of data (e.g., dMRI), there are multiple variations in how the data can be collected (for example, are multiple gradient strengths collected in each scan, or in separate scans? Are fieldmaps collected for susceptibility distortion correction, or b0 scans with reversed phase-encoding directions? Etc.).
fmriprep provides an excellent template to follow as an example of a robust, general-purpose pipeline for preprocessing of data collected in many different kinds of fMRI experiments. And so, for a while now, we have been thinking about a dmriprep that would emulate the success of fmriprep for the dMRI community. Initially,
this was a local effort (with Adam Richie-Halford and Anisha Keshavan at the helm), but after we presented our early work on this at OHBM, we were very quickly able to bring together other members of the community: Oscar Esteban (Stanford), the lead developer of fmriprep, was already interested in expanding fmriprep into an ecosystem of niprep tools and wanted to make sure that we did it right. Matt Cieslak (Penn), who has in the meantime created qsiprep, which does a lot of what we might want a dmriprep to do, was also interested in contributing his (extensive) knowledge and experience to a community-oriented effort. Over the
course of the last few months, several others joined the effort as well: Gari Lerma (Stanford), Derek
Pisner (UT Austin), and Erin Dickie and Michael Joseph (both at CAMH). We were also able to pull Jelle Veraart (NYU) into some of our discussions, to contribute his expertise on the physics of dMRI, and particularly on ways to mitigate noise and other artifacts in dMRI data processing. Jelle has thought a lot about a process for generating community consensus around dMRI processing (including a session at ISMRM devoted to starting this process), and we’d like to be part of that effort, so his contribution is crucial.
Ross Lawrence (JHU) has more recently joined the effort as well, as part of the contributions that Joshua Vogelstein and his team are making to open source (Ross is part of Jovo’s team).
Distributed software development is challenging. It is hard to figure out who is doing what, and what the overall architecture should look like, even when following the well-worn template laid out by fmriprep. We had started holding bi-weekly telecons, but we needed an opportunity to get together in person to hammer out our process and coordinate our expectations. Luckily, I had some funding available from the Moore and Sloan Data Science Environments grant here at UW eScience that I could use to support travel and accommodation for a three-day code sprint. And so, on January 13th-15th, we all congregated in Seattle
(with the exception of Jelle, who couldn’t make it). The sprint gave us just
the opportunity we needed to lay the groundwork for the library in terms of development infrastructure (testing, documentation, continuous integration, etc.), and to have some in-depth discussions about the things we would like dmriprep to do for us. At the end of all this, we could even go so far as to write down a roadmap for future developments during this year. This sets the stage for the telecons that we will continue to hold on a bi-weekly basis (and that are open to anyone to join…).
For me personally, three things stand out as highlights. The first thing that I take from this sprint is how I might apply the philosophy of “release early and often” more seriously in my own work on other projects. For example, it took us several years to finally release a 0.1 of pyAFQ, but it really shouldn’t have. If we adopt some of the approaches that are part of the genetic make-up of dmriprep, with its origins in fmriprep, we will be releasing more often, and hopefully leveraging this to make more rapid progress and to detect and fix problems with the software sooner.
The second thing that I was excited about is an approach that Matt developed for correcting head motion and potentially also eddy currents. This approach has its roots (at least in my mind) in a 2012 paper by Amitay, Jones and Assaf. The idea is that we are limited in how we can register different volumes to each other, because differences between volumes are due both to artifactual effects that we’d like to correct for (motion and eddy currents) and to systematic effects that we’d like to retain: different parts of the tissue lose signal because of the different gradients applied in different scans. This is particularly pernicious at high b-values (where a lot of the signal is lost) and in parts of the brain where orientation changes gradually. The approach proposed by Amitay et al. is to use a model of the diffusion signal at high b-values to predict what each image should look like; this prediction is then used as the target for registration.
This approach was subsequently popularized by Andersson and Sotiropoulos in FSL’s widely used eddy tool (their approach is also described in a paper). Their model of diffusion is slightly more complex than the CHARMED model used by Amitay et al., but it’s not clear that it represents the data more accurately. Matt has really run with this approach, using the well-motivated 3D SHORE model to fit and predict the data in a cross-validation scheme (he calls it SHORELine).
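To make the idea concrete, here is a minimal sketch of the leave-one-out loop at the heart of this family of methods, written with DIPY. A couple of assumptions to flag: I use DIPY’s TensorModel as a convenient stand-in for the 3D SHORE model that SHORELine actually uses, and the function name and structure are illustrative rather than qsiprep’s implementation (b0 handling, eddy-current-specific transforms, and iterating to convergence are elided):

```python
import numpy as np

from dipy.align.imaffine import AffineRegistration
from dipy.align.transforms import AffineTransform3D
from dipy.core.gradients import gradient_table
from dipy.reconst.dti import TensorModel


def predict_and_register(data, bvals, bvecs, img_affine):
    """Register each DWI volume to a model-based prediction of itself.

    For every volume: fit a model to all *other* volumes, predict the
    left-out volume, and use the prediction as the registration target.
    TensorModel stands in here for the 3D SHORE model used by SHORELine.
    """
    affreg = AffineRegistration()  # mutual-information metric by default
    S0 = data[..., bvals < 50].mean(-1)  # mean b0, needed for prediction
    corrected = data.copy()
    for ii in np.where(bvals >= 50)[0]:  # realign diffusion-weighted vols
        keep = np.ones(data.shape[-1], dtype=bool)
        keep[ii] = False
        # Fit the model to everything except the current volume:
        fit = TensorModel(
            gradient_table(bvals[keep], bvecs=bvecs[keep])).fit(
                data[..., keep])
        # Predict the signal for the left-out gradient direction:
        gtab_ii = gradient_table(bvals[ii:ii + 1], bvecs=bvecs[ii:ii + 1])
        target = fit.predict(gtab_ii, S0=S0)[..., 0]
        # Register the observed volume to the prediction and resample:
        affmap = affreg.optimize(target, data[..., ii],
                                 AffineTransform3D(), None,
                                 img_affine, img_affine)
        corrected[..., ii] = affmap.transform(data[..., ii])
    return corrected
```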
However, in line with the goals of qsiprep, this approach is limited to multi-shell diffusion data. For dmriprep, we need to expand it slightly to also work for single-shell data. So, we need a model that accurately predicts the data. Matt’s previous experiments suggest that DTI systematically fails in some places (this is well understood as a consequence of complex fiber configurations that are not well captured by DTI). On the other hand, CSD seems to overfit. Luckily, during my postdoc, I developed a model that does exactly what we need here: it fits the data and predicts it really accurately. Fortunately, this model is implemented in DIPY, including both fitting and prediction. One of the next stages in development (already prototyped by Derek here) uses this Sparse Fascicle Model as the predictive model at the heart of a SHORELine-like algorithm. To be continued!
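For the curious, fitting and predicting with the Sparse Fascicle Model in DIPY looks roughly like this (the file names are placeholders for a single-shell dataset; in the SHORELine-like loop sketched above, the fit would use all-but-one volume and the prediction would serve as the registration target):

```python
from dipy.core.gradients import gradient_table
from dipy.io.gradients import read_bvals_bvecs
from dipy.io.image import load_nifti
from dipy.reconst.sfm import SparseFascicleModel

# Placeholder file names; substitute your own single-shell dataset:
data, img_affine = load_nifti("dwi.nii.gz")
bvals, bvecs = read_bvals_bvecs("dwi.bval", "dwi.bvec")
gtab = gradient_table(bvals, bvecs=bvecs)

# Fit the Sparse Fascicle Model and predict the signal; any gradient
# table can be passed to predict, not just the one used for fitting:
sfm_fit = SparseFascicleModel(gtab).fit(data)
predicted = sfm_fit.predict(gtab)
```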
Finally, the last take-home is my optimism about community-led projects that pool knowledge, talent and resources across disparate groups and institutions. Working together towards shared goals, when possible, makes a lot of sense. The potential to save duplicated effort and to produce outcomes that take into account a wider range of use-cases is tantalizing. The challenges of bridging different work cultures, different scientific goals and inclinations, and the different incentive structures governing the contributions of individuals are non-trivial, but learning more about the patterns of collaboration that facilitate productive and happy collaborations is a worthwhile endeavour in and of itself. Maybe, like families, all happy collaborations are alike in some essential way? Hopefully, that’s exactly where dmriprep is headed.