Structure of Unpacked Experiments
While reprounzip is designed to let users reproduce an experiment without having to master the tool used to run it (e.g., Vagrant or Docker), in some situations it is useful to go behind the scenes and interact with the unpacked experiment directly.
This page describes in more detail how the unpackers operate.
Note
Future versions of unpackers might work in a different way. No attempt is made to make unpacked experiments compatible across different versions of reprounzip. Bundles will always be compatible though.
Common Files across Unpackers
The unpacked directory contains the original configuration file as config.yml. In fact, the VisTrails integration relies on it.
A file named .reprounzip also marks the directory as an unpacked experiment. This is a Python pickle file containing a dictionary with various types of information:
unpacker maps to the unpacker’s name.
input_files is used by the uploader/downloader machinery to keep the state of the input files inside the experiment, as they may be replaced by the user or overwritten by runs.
Other information specific to the unpacker, as described next.
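As a minimal sketch, the marker file can be inspected with Python's standard pickle module. The helper name read_unpacked_info is hypothetical and not part of reprounzip; only the unpacker and input_files keys are documented above, and any other key depends on the unpacker. Unpickling can execute arbitrary code, so this should only be done on directories you trust.

```python
import pickle
from pathlib import Path


def read_unpacked_info(experiment_dir):
    """Load the dictionary stored in <experiment_dir>/.reprounzip.

    Hypothetical helper for inspection only; not part of reprounzip's API.
    """
    marker = Path(experiment_dir) / ".reprounzip"
    with marker.open("rb") as fp:
        info = pickle.load(fp)  # only do this for directories you trust
    if not isinstance(info, dict):
        raise ValueError("%s does not contain a dictionary" % marker)
    return info
```

For example, read_unpacked_info("unpacked-directory")["unpacker"] would give the name of the unpacker that created the directory.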
The directory Unpacker
The experiment directory contains:
The original configuration file config.yml.
The pickle file .reprounzip.
The tarball inputs.tar.gz, which contains the original files that were identified as input files. This tarball is used for file restoration using upload :<input-id> (see Managing Input and Output Files).
A directory called root, which contains all the bundled files in their original path, with symbolic links to absolute paths rewritten to prepend the path to root.
unpacked-directory/
    .reprounzip
    config.yml
    inputs.tar.gz
    root/
        ...
When running the run command, the unpacker sets LD_LIBRARY_PATH and PATH to point inside root, and optionally sets DISPLAY and XAUTHORITY to the host’s values.
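The following sketch illustrates one way such an environment could be built. It is not reprounzip's actual code: the function names, the colon-separated inputs, and the choice of library directories are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def prefix_with_root(root, path_list):
    """Prepend `root` to each absolute entry of a colon-separated path list."""
    root = PurePosixPath(root)
    entries = []
    for entry in path_list.split(":"):
        if entry.startswith("/"):
            entries.append(str(root / entry.lstrip("/")))
        elif entry:
            entries.append(entry)  # relative entries are kept unchanged
    return ":".join(entries)


def build_run_env(root, orig_path, lib_dirs, host_env=None, forward_x11=False):
    """Return environment variables pointing inside `root` (hypothetical helper).

    `orig_path` and `lib_dirs` are assumed to be supplied by the caller.
    """
    env = {
        "PATH": prefix_with_root(root, orig_path),
        "LD_LIBRARY_PATH": prefix_with_root(root, lib_dirs),
    }
    if forward_x11 and host_env:
        # Optionally reuse the host's X11 settings.
        for name in ("DISPLAY", "XAUTHORITY"):
            if name in host_env:
                env[name] = host_env[name]
    return env
```

With root set to /home/user/exp/root and an original PATH of /usr/bin:/bin, the resulting PATH only lists directories inside the unpacked tree.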
The chroot Unpacker
The experiment directory contains:
The original configuration file config.yml.
The pickle file .reprounzip, which stores whether magic directories are mounted, as explained below.
The tarball inputs.tar.gz, which contains the original files that were identified as input files. This tarball is used for file restoration using upload :<input-id> (see Managing Input and Output Files).
A directory called root, which contains all the bundled files in their original path, with no symbolic links rewritten and file ownership restored.
unpacked-directory/
    .reprounzip
    config.yml
    inputs.tar.gz
    root/
        dev/
        dev/pts/
        proc/
        ...
If a file is listed in the configuration file but wasn’t packed (i.e., pack_files was set to false for a software package), that file is copied from the host; if it does not exist on the host, a warning is shown when unpacking.
Unless --dont-bind-magic-dirs is specified when unpacking, the special directories /dev, /dev/pts, and /proc are mounted with mount -o bind from the host.
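As an illustration, the bind mounts described above could be expressed as the following argument lists. This sketch only builds the commands and does not run them (mounting requires root privileges); the function name and the bind_magic_dirs parameter are hypothetical.

```python
from pathlib import PurePosixPath

MAGIC_DIRS = ("/dev", "/dev/pts", "/proc")


def bind_mount_commands(root, bind_magic_dirs=True):
    """Return the `mount -o bind` argument lists for the magic directories."""
    if not bind_magic_dirs:  # models the --dont-bind-magic-dirs flag
        return []
    root = PurePosixPath(root)
    return [
        ["mount", "-o", "bind", d, str(root / d.lstrip("/"))]
        for d in MAGIC_DIRS
    ]
```

Each returned list could be passed to subprocess.check_call by a process with sufficient privileges.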
Also, if either /bin/sh or /usr/bin/env was not packed, a static build of busybox is downloaded and put under /bin/busybox, and the missing binaries are created as symbolic links pointing to busybox.
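The linking step can be sketched as follows. This assumes busybox has already been placed at bin/busybox under root, and it uses an absolute link target so that the link resolves once chrooted; whether reprounzip uses absolute or relative links is not stated here, and the function name is hypothetical.

```python
import os
from pathlib import Path

SHELL_BINARIES = ("bin/sh", "usr/bin/env")


def link_missing_to_busybox(root):
    """Create symlinks to /bin/busybox for binaries missing under `root`.

    Returns the list of paths (relative to root) that were created.
    """
    root = Path(root)
    created = []
    for rel in SHELL_BINARIES:
        target = root / rel
        if os.path.lexists(target):
            continue  # the binary was packed; leave it alone
        target.parent.mkdir(parents=True, exist_ok=True)
        # Absolute link target: resolves correctly once chrooted into root.
        os.symlink("/bin/busybox", target)
        created.append(rel)
    return created
```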
Should you require a shell inside the experiment environment, you can use:
chroot root/ /bin/sh
The vagrant Unpacker
The experiment directory contains:
The original configuration file config.yml.
The pickle file .reprounzip, which stores whether a chroot is used, as explained below.
The tarball data.tgz, which is part of the .rpz file and used to populate the virtual machine (VM) when it gets created.
The setup script setup.sh.
The file rpz-files.list, which contains the list of files to unpack. This list is passed to tar -T while unpacking.
A Vagrantfile, which is used to build the VM.
unpacked-directory/
    .reprounzip
    config.yml
    data.tgz
    busybox
    Vagrantfile
    setup.sh
    rpz-files.list
Once vagrant up has been run by the setup/start command, a .vagrant subdirectory is created, and its content is managed by Vagrant (and appears to vary among different platforms).
Note that Vagrant drives VirtualBox or similar virtualization software to run the VM, and that software maintains state outside of the experiment directory. If you need to reconfigure or otherwise interact with the VM, do so from the virtualization software itself (e.g., VirtualBox). The VM is named after the experiment directory, with an additional suffix.
There are two modes for the virtual machine, controlled through command-line flags:
The default mode, --use-chroot, creates a chroot environment inside the VM at /experimentroot. This allows ReproZip to unpack very different file system hierarchies without breaking the base system of the VM (in particular, ssh needs to keep working for the VM to be usable). In this mode, software packages that were not packed (i.e., pack_files was set to false) are installed in the VM and their required files are copied to the /experimentroot hierarchy. The software packages that were packed are simply copied over without any interaction with the VM’s system.
If --dont-use-chroot is used, no chroot environment is created. Files from software packages are never copied from the .rpz file; instead, they get installed from the package manager. Other files are simply unpacked in the VM system, possibly overwriting existing files. As long as reprounzip-vagrant manages to find a VM image with the same operating system as the original one, reproduction is expected to work reliably.
In the --use-chroot mode, a static build of busybox is downloaded and put under /experimentroot/busybox, and if /bin/sh wasn’t packed, it is created as a symbolic link pointing to busybox.
Uploading and downloading files from the environment is done via the shared directory /vagrant, which is the experiment directory mounted in the VM by Vagrant.
Should you require a shell inside the experiment environment, you can use:
vagrant ssh
Please be aware of whether --use-chroot is in use when accessing the experiment environment: in this case, the experiment’s files are located under /experimentroot.
The docker Unpacker
The experiment directory contains:
The original configuration file config.yml.
The pickle file .reprounzip, which stores the names of the images built by the unpacker, as explained below.
The tarball data.tgz, which is part of the .rpz file and used to populate the Docker container.
The file rpz-files.list, which contains the list of files to unpack. This list is passed to tar -T while unpacking.
A Dockerfile, which is used to build the original image.
unpacked-directory/
    .reprounzip
    config.yml
    data.tgz
    busybox
    rpzsudo
    Dockerfile
    rpz-files.list
Static builds of busybox and rpzsudo are always downloaded and put into the Docker image as /busybox and /rpzsudo, respectively.
Note that the docker command connects to a Docker daemon over a socket, and state is changed in that daemon. The daemon might not be local: it might run in a virtual machine, on another host, or in the cloud. In particular, docker-machine might be used, which allows reprounzip-docker to be used on non-Linux machines. The docker unpacker keeps the environment variables set when calling Docker, notably DOCKER_HOST, so these can be set accordingly before running the unpacker.
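The general mechanism of passing the caller's environment through to a child command can be sketched as below. This is a generic illustration using the standard subprocess module, not reprounzip-docker's own code; the helper name is hypothetical.

```python
import os
import subprocess


def run_with_inherited_env(argv):
    """Run a command with the caller's environment passed through unchanged."""
    return subprocess.run(
        argv,
        env=dict(os.environ),  # includes DOCKER_HOST if the caller set it
        check=True,
        capture_output=True,
        text=True,
    )
```

A child process started this way sees whatever DOCKER_HOST value was exported in the calling shell, which is why setting the variable before running the unpacker is enough to select a remote daemon.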
Images and containers built by the unpacker are given a random name with the prefixes reprounzip_image_ and reprounzip_run_, respectively; they are cleaned up when the destroy command is invoked. There are two images of which reprounzip-docker keeps track in the .reprounzip pickle file: the initial image, i.e., the one built by setup/build by calling docker build, and the current image (initially the same as the initial image), which has been affected by a number of run and upload calls. Running the reset command returns to the initial image without having to rebuild. After each run invocation, the container is committed to a new current image so that state is kept.
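The bookkeeping described above can be modeled with a small toy class. It does not call Docker at all; the class name, the method names, and the format of the random suffix are assumptions made for illustration, and only the two prefixes come from the text.

```python
import secrets


def random_name(prefix):
    """Return `prefix` followed by a random suffix (suffix format is assumed)."""
    return prefix + secrets.token_hex(5)


class ImageState:
    """Toy model of the two image names tracked in the pickle file."""

    def __init__(self):
        # setup/build: `docker build` produces the initial image.
        self.initial_image = random_name("reprounzip_image_")
        self.current_image = self.initial_image

    def run(self):
        """Model a run: the container is committed to a new current image."""
        container = random_name("reprounzip_run_")
        self.current_image = random_name("reprounzip_image_")
        return container

    def reset(self):
        """Return to the initial image without rebuilding."""
        self.current_image = self.initial_image
```

In this model, reset is cheap because the initial image name is never discarded; only the current image changes across runs.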
A --detach option lets you start a container and forget about it: reprounzip-docker leaves the container running and doesn’t wait for it. This means that you can start a service on a remote machine; however, because that container won’t be committed to a new image, the side effects of running it won’t affect later executions in the same unpacked directory.
Uploading files to the environment is done by running a simple Dockerfile that builds a new image. Downloading files is done via the docker cp command.
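To give an idea of what such an upload Dockerfile might look like, the sketch below generates one from the current image name and a mapping of files. The exact Dockerfile produced by reprounzip-docker may differ; the function name and the shape of the uploads mapping are assumptions.

```python
def upload_dockerfile(base_image, uploads):
    """Build Dockerfile text that copies files on top of `base_image`.

    `uploads` maps a file name in the build context to its absolute
    destination path inside the image. Paths containing spaces would
    need the JSON-array form of COPY, which this sketch does not handle.
    """
    lines = ["FROM %s" % base_image]
    for src, dest in sorted(uploads.items()):
        lines.append("COPY %s %s" % (src, dest))
    return "\n".join(lines) + "\n"
```

Building an image from this text with docker build would yield a new image that could then become the current image.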