How do I Reproduce Your Key Result in the Paper?
================================================
Step 1 - Environment and Dependencies
=====================================
Local Environment
-----------------
- We assume you have the latest ansible_ installed on your work computer (which
  could be your laptop/home computer).
- On your work computer, you have cloned the latest ``libhotstuff`` repo and
  updated all submodules (if not sure, run ``git submodule update --init
  --recursive``). You should now be in the ``scripts/deploy`` directory in your
  shell (``cd <path-to-your-libhotstuff-repo>/scripts/deploy``), as in the
  sketch below.
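
For reference, here is a minimal shell sketch of the local setup described
above. The repository URL is a placeholder for your own clone, and installing
ansible via pip is just one option::

    # install ansible (any installation method works; pip is one option)
    pip install ansible

    # clone libhotstuff and pull in all submodules
    git clone <libhotstuff-repo-url> libhotstuff
    cd libhotstuff
    git submodule update --init --recursive

    # all deployment commands in this guide are issued from scripts/deploy
    cd scripts/deploy
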
Remote Environment
------------------
- In this example, we use a typical Linux image, Ubuntu 18.04, on Amazon EC2,
  but in general any machine with Ubuntu 18.04 installed should work.
- We assume you have already properly configured the internal network for the
  machines that participate in the experiment. This includes some replica
  machines (machines dedicated to running replica processes) and several client
  machines.
- Replica machines should be able to talk to each other via TCP ports starting
  from 10000 (the default base port generated by ``gen_conf.py``, which can be
  changed).
- Each client machine should be able to talk to all replica machines via TCP
  ports starting from 20000 (a quick connectivity check is sketched below).
- NOTE: In our paper, we used ``c5.4xlarge`` instances to match the
  configuration of our baselines.
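
If you are unsure whether these ports are reachable, a quick spot check with
``nc`` might look like the following; the IP addresses are placeholders and the
port numbers assume the defaults mentioned above::

    # from a client machine: is a replica's client-facing port reachable?
    nc -z -w 3 172.31.0.10 20000 && echo "client-to-replica port OK"

    # from one replica machine: is another replica's port reachable?
    nc -z -w 3 172.31.0.11 10000 && echo "replica-to-replica port OK"
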
Step 2 - Generate the Deployment Setup
======================================
- Edit both ``replicas.txt`` and ``clients.txt`` (see the example after this list):
- ``replicas.txt``: each line is the external IP and local IP separated by
one or more spaces. The external IP will be used for control actions
between your work computer and replica machines, whereas the local IP is
the address used in your inter-replica network infrastructure, with which
replicas establish TCP connections with others.
- ``clients.txt``: each line is a single external IP.
- The same IP can appear multiple times in both files. In this case, you will
share the same machine among different processes (not recommended for
replicas due to performance reasons).
- Generate ``node.ini`` and ``hotstuff.gen.*.conf`` by running ``./gen_all.sh``.
- Change the ssh key configuration in ``group_vars/all.yml``.
- Build ``libhotstuff`` on all remote machines by ``./run.sh setup``.
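
As a concrete (hypothetical) example, a four-replica, one-client deployment
could use files like the ones below; the addresses are placeholders for your
own machines. A ``replicas.txt`` listing the external IP followed by the local
IP of each replica::

    34.207.0.10   172.31.0.10
    34.207.0.11   172.31.0.11
    34.207.0.12   172.31.0.12
    34.207.0.13   172.31.0.13

and a matching ``clients.txt`` with a single client machine::

    34.207.0.20

With these two files in place, ``./gen_all.sh`` generates the configuration
files and ``./run.sh setup`` builds ``libhotstuff`` on all remote machines, as
described above.
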
Step 3 - Run the Experiment
===========================
- (optional) Change the parameters in ``hotstuff.gen.conf`` to your liking.
- (optional) Change the parameters in ``group_vars/clients.yml`` to your liking.
- (for replicas) Create a new experiment run and start all replica processes by ``./run.sh new myrun1``.
- (wait a while until all replica processes settle down; for a good network
  like EC2, 10 seconds should be more than enough)
- (for clients) Create a new experiment run and start all client processes by ``./run_cli.sh new myrun1_cli``.
- (wait until all commands are submitted, or until you simply would like to end
  the experiment)
- To collect the results, run ``./run_cli.sh stop myrun1_cli`` and then ``./run_cli.sh fetch myrun1_cli``.
- To analyze the results, run ``cat myrun1_cli/remote/*/log/stderr | python ../thr_hist.py``.
- Finally, stop replicas: ``./run.sh stop myrun1``.
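
Putting the whole step together, a typical run might look like the following
shell session (the run names are just examples)::

    ./run.sh new myrun1            # start all replica processes
    sleep 10                       # give replicas time to settle down
    ./run_cli.sh new myrun1_cli    # start all client processes
    # ... let the experiment run for as long as you like ...
    ./run_cli.sh stop myrun1_cli   # stop the clients
    ./run_cli.sh fetch myrun1_cli  # collect the client logs
    cat myrun1_cli/remote/*/log/stderr | python ../thr_hist.py
    ./run.sh stop myrun1           # finally, stop the replicas
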
Other Notes
===========
- Each ``./run.sh new`` (same for ``./run_cli.sh``) will create a folder that
  contains everything (chosen parameters, raw results) for the run. A good
  practice is to always use a new name for a different run, so you keep all of
  your previous experiments nicely organized.
- The ``run.sh`` script does NOT detect whether there is some other unfinished
  run (it does, however, prevent you from messing up the state of the same run,
  given an id like "myrun1"), so you need to make sure you always ``stop``
  (gracefully exit, with all results made available) or ``reset`` (simply kill
  all processes) any historical runs before starting fresh.
- To check whether the processes are still alive: ``./run.sh check myrun1``.
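
For example, a cautious way to start a fresh run after an earlier one (the run
names are illustrative) is to check and clean up first::

    ./run.sh check myrun1    # are any processes from the old run still alive?
    ./run.sh stop myrun1     # graceful exit; results of the old run are kept
    # or, if you do not care about the old run's results:
    ./run.sh reset myrun1    # simply kill all of its processes
    ./run.sh new myrun2      # then start the next run under a new name
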