Making Jupyter Notebooks Reproducible with ReproZip
reprozip-jupyter is a plugin for Jupyter Notebooks, a popular open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. These documents are valuable for data cleaning, analysis, writing executable papers and articles, and more. However, Jupyter Notebooks are subject to dependency hell like any other application: the notebook file alone is not enough for full reproducibility. We have written a ReproZip plugin for Jupyter Notebooks to help users automatically capture the dependencies of their notebooks (including data, environment variables, etc.) and automatically set up those dependencies in another computing environment.
You can install reprozip-jupyter with pip:
$ pip install reprozip-jupyter
Or with conda:
$ conda install --channel conda-forge reprozip-jupyter
Once successfully installed, you should then enable the plugin for both the client and server side of Jupyter Notebooks:
$ jupyter nbextension install --py reprozip_jupyter --user
$ jupyter nbextension enable --py reprozip_jupyter --user
$ jupyter serverextension enable --py reprozip_jupyter --user
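To confirm that the plugin is registered before starting the server, you can list the enabled extensions. The following is a minimal sketch, assuming the classic Notebook's `jupyter nbextension list` and `jupyter serverextension list` subcommands are available. The helper name `reprozip_extension_status` is hypothetical, and because the exact listing format varies between Jupyter versions, it only searches for the string "reprozip".

```shell
# Hypothetical helper (not part of reprozip-jupyter): report whether the
# Jupyter extension listings mention reprozip at all.
reprozip_extension_status() {
    if ! command -v jupyter >/dev/null 2>&1; then
        echo "jupyter not found on PATH" >&2
        return 2
    fi
    # The listings may be split across stdout and stderr, so merge both streams.
    listing=$( { jupyter nbextension list; jupyter serverextension list; } 2>&1 )
    if printf '%s\n' "$listing" | grep -qi 'reprozip'; then
        echo "reprozip-jupyter appears in the extension listings"
    else
        echo "reprozip-jupyter not listed; re-run the enable commands" >&2
        return 1
    fi
}

# Usage:
#   reprozip_extension_status
```

A non-zero exit status suggests that one of the enable commands did not take effect for the current user.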
Once these steps are completed, when you start a Jupyter Notebook server, you should be able to see the ReproZip button in your notebook’s toolbar.
Once you have a notebook that executes the way you want, you can trace and pack all the dependencies, data, and provenance with reprozip-jupyter by simply clicking the button on the notebook’s toolbar:
The notebook will execute from top to bottom while reprozip-jupyter traces that execution. If there are no errors in the execution, you’ll see two pop-ups, one after the other:
reprozip-jupyter will name the resulting ReproZip bundle (.rpz) as notebookname_datetime.rpz and save it to the same working directory the notebook is in:
Note that the notebook file itself (.ipynb) is not included in the bundle, so you should share or archive both files. The reason is that many services can render notebooks (GitHub, OSF…), and they would not be able to if the notebook were inside the .rpz file.
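Because the bundle and the notebook travel as separate files, it can help to package them together when archiving. The following is a minimal sketch, assuming a standard `tar` is available; the function name `archive_notebook_pair` and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: put a notebook and its ReproZip bundle into one tarball.
# Usage: archive_notebook_pair <notebook.ipynb> <bundle.rpz> <output.tar.gz>
archive_notebook_pair() {
    notebook=${1:-}
    bundle=${2:-}
    output=${3:-}
    # Refuse to build a partial archive if either file is missing.
    for f in "$notebook" "$bundle"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
    done
    # Assumption: both files sit in the same directory, which is where
    # reprozip-jupyter saves the bundle. Only the base names are stored.
    tar -czf "$output" -C "$(dirname "$notebook")" \
        "$(basename "$notebook")" "$(basename "$bundle")"
}

# Usage (hypothetical file names; pass an absolute path for the output):
#   archive_notebook_pair analysis.ipynb analysis_<datetime>.rpz "$PWD/analysis-share.tar.gz"
```

If you rely on a hosting service to render the notebook, keep the plain .ipynb file available alongside the archive, since the service cannot render a notebook stored inside a tarball.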
Now, anyone can rerun the Jupyter notebook, with all dependencies automatically configured. First, they would need to install reprounzip and the reprounzip-docker plugin (see the installation steps). Second, they need to download or otherwise acquire the .rpz file and original .ipynb notebook they’d like to reproduce.
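Before unpacking, it may be worth checking that the required tools are on the PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper, and the command names in the usage line (`reprounzip`, `docker`, `reprozip-jupyter`) are assumed from the steps described on this page.

```shell
# Hypothetical helper: print the name of each command that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage (prints nothing when every tool is found):
#   missing_tools reprounzip docker reprozip-jupyter
```

Any name it prints points at a tool that still needs to be installed before the bundle can be unpacked.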
To reproduce the notebook using the GUI, follow these steps:
- Double-click the .rpz file.
- The first tab in the window that appears is for you to set up how you’d like ReproUnzip to unpack and configure the contents of the .rpz. Choose docker as your unpacker, and choose the directory you’d like to unpack into.
- Make sure the Jupyter Integration is checked, and click Run experiment:
- This second tab allows you to interact with and rerun the notebook. All you need to do is click ‘Run Experiment’ and the Jupyter Notebook home file list should pop up in your default browser (if not, navigate to localhost:8888). Open the notebook, and rerun with every dependency configured for you!
On the command line, you would:
Set up the experiment using reprounzip-docker:
$ reprounzip docker setup <bundle.rpz> <directory>
Rerun the notebook using reprozip-jupyter:
$ reprozip-jupyter run <directory>
The Jupyter Notebook home file list should pop up in your default browser (if not, navigate to localhost:8888). Open the notebook, and rerun with every dependency configured for you!
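The two command-line steps can be wrapped in a small script. This is a sketch under stated assumptions: `reproduce_notebook` and the `DRY_RUN` variable are hypothetical names introduced for illustration, and only the `reprounzip docker setup` and `reprozip-jupyter run` invocations come from the steps on this page.

```shell
# Hypothetical wrapper around the two documented commands.
# Usage: reproduce_notebook <bundle.rpz> <directory>
# Set DRY_RUN=1 to print the commands instead of running them.
reproduce_notebook() {
    bundle=${1:-}
    directory=${2:-}
    if [ -z "$bundle" ] || [ -z "$directory" ]; then
        echo "usage: reproduce_notebook <bundle.rpz> <directory>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "reprounzip docker setup $bundle $directory"
        echo "reprozip-jupyter run $directory"
        return 0
    fi
    # Unpack the bundle with the docker unpacker, then start Jupyter on it.
    # The second command runs only if the setup step succeeds.
    reprounzip docker setup "$bundle" "$directory" &&
        reprozip-jupyter run "$directory"
}

# Usage (hypothetical names):
#   DRY_RUN=1; reproduce_notebook analysis.rpz analysis-unpacked
```

Running it with `DRY_RUN=1` first is a cheap way to check the bundle and directory arguments before Docker starts building an image.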