class: center, middle

### Binder: an instrument for open and reproducible science

#### Ariel Rokem
##### The University of Washington eScience Institute
Follow along at:
arokem.github.io/2016-03-28-binder
---

layout: true

---

### Jupyter

- Jupyter notebooks are great

--

#### Problem

--

To run my notebooks, you have to install all my dependencies.

--

To reproduce my results, you have to download my code and my data to your machine.

--

If my code has compiled components, you'll need to compile them.

--

If you happen to have a different operating system, a different compiler, different libraries, etc., we're probably out of luck!

---

### Binder

http://mybinder.org

Developed by the
Freeman Lab
at Janelia Research Campus

--

A system for deploying Jupyter notebooks from GitHub repos via Kubernetes

--

Turns a GitHub repo with Jupyter notebooks into an interactive webpage where others can run the notebooks

---

### Let's see how that works

An example: https://t.co/b1DfMFV5HK

--

#### The components:

--

A repo with `.ipynb` files

https://github.com/ctb/2016-mybinder-inflammation

http://mybinder.org/repo/ctb/2016-mybinder-inflammation

---

If you need anything that isn't installed with Anaconda:

A `requirements.txt` file that specifies requirements to be installed with `pip`.

See: https://pip.pypa.io/en/stable/user_guide/#requirements-files

https://github.com/arokem/try-tf

---

Or a conda `environment.yml` file

https://github.com/minrk/ligo-binder

See: http://conda.pydata.org/docs/using/envs.html

---

Or a `Dockerfile`

See: https://docs.docker.com/engine/reference/builder/

This last option will also let you add data to the binder (see examples in http://github.com/arokem/white-matter-matters)

---

### If you need additional services

--

You can attach PostgreSQL: https://github.com/binder-project/example-service-postgres

--

Or Spark: https://github.com/binder-project/example-service-spark

--

Not too sure how that works

---

### It's all in flux!
For example, a `.binder.yml` configuration file format is being developed to specify dependencies

---

### Let's give it a go!

Bust out your notebooks and make a binder!

---

### Technical details

Kubernetes is an open-source system, originally developed at Google, that automates the deployment of containerized applications.

--

#### "Containerized"?!

--

Instead of deploying the application by installing dependencies with the package manager, we use `Docker` containers: a lightweight alternative to virtual machines that can be downloaded and launched in a matter of seconds!

--

This has clear applications in scaling web services up and down with usage.

--

And we're enjoying it vicariously

---

### Are we there yet?

"Is mybinder 95% of the way to next-gen computational science publishing, or only 90%?"

Titus Brown: http://ivory.idyll.org/blog/2016-mybinder.html

--

#### The following are relatively easy

- Authentication and cost. How will this work in the long run?
- Executing from sources other than GitHub repos
- RStudio/R Markdown
- Pre-built Docker images

---

#### These are a bit harder:

- Large amounts of data.
- Large amounts of compute.

---

#### Things to do:

"A grant proposal: A workshop on dockerized notebook computing"

http://ivory.idyll.org/blog/2016-mybinder-workshop-proposal.html

Proposed topics/hacks:

- Hack on mybinder and develop APIs and tools to connect mybinder to other hosting platforms, both commercial (AWS, Azure, etc.) and academic (e.g. XSEDE/TACC).
- Connect mybinder to other versioning sites, including Bitbucket and GitLab.
- Brainstorm and hack on ways to connect credentials to mybinder to support private repositories and for-pay compute.
- Identify missing links and technologies that are needed to more fully realize the promise of mybinder and Jupyter notebook.
- Identify overlaps and complementarity with existing projects that we can make use of.
- More integrated support for Docker Hub (and private hub) based images; brainstorm around blockers that prevent mybinder from being used for more data-intensive workflows.

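---

### Sketch: from a GitHub repo to a binder URL

The example earlier pairs `https://github.com/ctb/2016-mybinder-inflammation` with `http://mybinder.org/repo/ctb/2016-mybinder-inflammation`. Here is a minimal sketch of that mapping, assuming the `/repo/<user>/<repo>` pattern holds for every repo (remember that it's all in flux). The function name `binder_url` is made up for illustration and is not part of Binder.

```python
from urllib.parse import urlparse

# URL pattern inferred from the single example in this talk; an assumption,
# not a documented Binder API.
BINDER_BASE = "http://mybinder.org/repo"


def binder_url(github_url: str) -> str:
    """Map https://github.com/<user>/<repo> to the matching mybinder.org URL."""
    parsed = urlparse(github_url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url!r}")

    # Keep only the non-empty path segments: [user, repo, ...]
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected <user>/<repo> in: {github_url!r}")

    user, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # tolerate clone-style URLs
    return f"{BINDER_BASE}/{user}/{repo}"


print(binder_url("https://github.com/ctb/2016-mybinder-inflammation"))
# prints: http://mybinder.org/repo/ctb/2016-mybinder-inflammation
```

Anything that is not a `github.com/<user>/<repo>` URL raises a `ValueError`, which matches the current GitHub-only state of things.
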
---

class: center
layout: false

### Stay in touch!

http://arokem.org
arokem@gmail.com
@arokem
github.com/arokem