How do I Reproduce Your Key Result in the Paper?
================================================

Step 1 - Environment and Dependencies
=====================================

Local Environment
-----------------

- We assume you have the latest Ansible_ installed on your work computer (that
  is, your laptop or home computer).
- On your work computer, you have cloned the latest ``libhotstuff`` repo and
  updated all submodules (if not sure, run ``git submodule update --init
  --recursive``). You have also built the repo, so the binaries
  ``hotstuff-keygen`` and ``hotstuff-tls-keygen`` are available in the root
  directory of the repo.
- Your shell should now be in the ``scripts/deploy`` directory (``cd
  <path-to-your-libhotstuff-repo>/scripts/deploy``).
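
The prerequisites above can be checked with a small script before moving on.
This is only a sketch: the function name ``missing_prerequisites`` is
hypothetical, and it assumes that a working Ansible install puts an
``ansible`` executable on your ``PATH``. The two binary names are taken from
the list above.

```python
import os
import shutil

# Binary names come from the prerequisites listed above.
REQUIRED_BINARIES = ("hotstuff-keygen", "hotstuff-tls-keygen")


def missing_prerequisites(repo_root, which=shutil.which):
    """Return a list of problems; an empty list means every check passed.

    `which` is injectable so the PATH lookup can be replaced in tests.
    """
    problems = []
    # Assumption: an installed Ansible exposes an `ansible` executable.
    if which("ansible") is None:
        problems.append("ansible not found on PATH")
    # The built binaries are expected in the root directory of the repo.
    for name in REQUIRED_BINARIES:
        path = os.path.join(repo_root, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append("missing or non-executable binary: " + path)
    return problems


if __name__ == "__main__":
    # When run from scripts/deploy, the repo root is two levels up.
    root = os.path.abspath(os.path.join("..", ".."))
    for problem in missing_prerequisites(root):
        print(problem)
```

If the script prints nothing when run from ``scripts/deploy``, the local
checks in this sketch have passed.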

Remote Environment
------------------

- In this example, we use a typical Linux image, Ubuntu 18.04, on Amazon EC2.
  In general, any machine with Ubuntu 18.04 installed may also work.
- We assume you have already properly configured the internal network for the
  machines that participate in the experiment. These include some replica
  machines (machines dedicated to running replica processes) and several
  client machines.

  - Replica machines should be able to talk to each other via TCP ports
    starting from 10000 (the default value generated by ``gen_conf.py``, which
    can be changed).
  - Each client machine should be able to talk to all replica machines via TCP
    ports starting from 20000.
  - All machines should be accessible from your work computer via SSH with a
    private key.
  - NOTE: In our paper, we used ``c5.4xlarge`` instances to match the
    configuration of our baselines.
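
To debug the connectivity requirements above, a TCP probe like the following
can be run from one machine against another. It is a minimal sketch, not part
of the deployment scripts: ``probe`` and ``probe_all`` are hypothetical names,
and you choose the ``(host, port)`` pairs yourself, since how ports are
assigned to individual processes is not described here.

```python
import socket


def probe(host, port, timeout=2.0):
    """Classify a TCP connection attempt to (host, port).

    "open":    something is listening and accepted the connection.
    "refused": the host answered but nothing listens on that port
               (expected before the replica processes are started).
    "timeout": no answer at all, which often points at a firewall or
               security-group rule dropping the traffic.
    "error":   any other failure, such as an unroutable address.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


def probe_all(targets, timeout=2.0):
    """Probe every (host, port) pair and return a dict of results."""
    return {target: probe(target[0], target[1], timeout) for target in targets}
```

For example, ``probe_all([("10.0.0.11", 10000)])`` run on one replica machine
probes another replica's local IP (the address here is made up for
illustration).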

Step 2 - Generate the Deployment Setup
======================================

- Edit both ``replicas.txt`` and ``clients.txt``:

  - ``replicas.txt``: each line is the external IP and local IP separated by
    one or more spaces. The external IP will be used for control actions
    between your work computer and replica machines, whereas the local IP is
    the address used in your inter-replica network infrastructure, with which
    replicas establish TCP connections with others.
  - ``clients.txt``: each line is a single external IP.
  - The same IP can appear multiple times in both files. In this case, you will
    share the same machine among different processes (not recommended for
    replicas due to performance reasons).

- Generate ``node.ini`` and ``hotstuff.gen.*.conf`` by running ``./gen_all.sh``.
- Change the ssh key configuration in ``group_vars/all.yml``.
- Build ``libhotstuff`` on all remote machines by running ``./run.sh setup``.
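
The file formats described above for ``replicas.txt`` and ``clients.txt`` are
simple enough to validate before running ``./gen_all.sh``. The sketch below is
not what ``gen_all.sh`` itself does. The function names are hypothetical, the
addresses in the usage note are made up, and skipping blank lines is an
assumption of this example.

```python
import ipaddress
from collections import Counter


def _parse_lines(text, fields_per_line, filename):
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on one or more whitespace characters
        if not fields:
            continue  # assumption of this sketch: blank lines are skipped
        if len(fields) != fields_per_line:
            raise ValueError(
                "%s:%d: expected %d field(s), got %d"
                % (filename, lineno, fields_per_line, len(fields))
            )
        # ip_address() raises ValueError for anything that is not a valid IP.
        rows.append(tuple(str(ipaddress.ip_address(f)) for f in fields))
    return rows


def parse_replicas(text):
    """Return a list of (external_ip, local_ip) pairs."""
    return _parse_lines(text, 2, "replicas.txt")


def parse_clients(text):
    """Return a list of external IPs."""
    return [row[0] for row in _parse_lines(text, 1, "clients.txt")]


def shared_machines(external_ips):
    """Return the external IPs that appear more than once, sorted."""
    counts = Counter(external_ips)
    return sorted(ip for ip, n in counts.items() if n > 1)
```

For instance, ``parse_replicas("192.0.2.10   10.0.0.10\n")`` yields one
``(external, local)`` pair, and passing the external IPs of all replicas to
``shared_machines`` lists the machines that would host more than one replica
process.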

Step 3 - Run the Experiment
===========================

- (optional) Change the parameters in ``hotstuff.gen.conf`` to your liking.
- (optional) Change the parameters in ``group_vars/clients.yml`` to your liking.
- (for replicas) Create a new experiment run and start all replica processes by running ``./run.sh new myrun1``.
- (wait until all replica processes settle down; on a good network like EC2, 10 seconds should be more than enough)
- (for clients) Create a new experiment run and start all client processes by running ``./run_cli.sh new myrun1_cli``.
- (wait until all commands are submitted, or until you simply want to end the experiment)
- To collect the results, run ``./run_cli.sh stop myrun1_cli`` followed by ``./run_cli.sh fetch myrun1_cli``.
- To analyze the results, run ``cat myrun1_cli/remote/*/log/stderr | python ../thr_hist.py``.

  - With all default settings on ``c5.4xlarge``, I got the following results:

    ::

        [349669, 367520, 371855, 370391, 366159, 367565, 365957, 322690]
        lat = 6.955ms # mean end-to-end latency
        lat = 6.970ms # after removing outliers

- Finally, stop replicas: ``./run.sh stop myrun1``.
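
The sequence of commands in this step can also be scripted. The following is
a sketch under stated assumptions, not a tool shipped with the repo:
``experiment_steps`` and ``execute`` are hypothetical helpers, the 10-second
settle time comes from the note above, and the 60-second client duration is
an arbitrary placeholder. The analysis command is left out, so run it by hand
once the results are fetched.

```python
import subprocess
import time


def experiment_steps(run_id, settle_seconds=10, client_seconds=60):
    """Return the ordered steps as ("run", argv) or ("sleep", seconds) tuples."""
    cli_id = run_id + "_cli"
    return [
        ("run", ["./run.sh", "new", run_id]),        # start replicas
        ("sleep", settle_seconds),                   # let replicas settle
        ("run", ["./run_cli.sh", "new", cli_id]),    # start clients
        ("sleep", client_seconds),                   # hypothetical duration
        ("run", ["./run_cli.sh", "stop", cli_id]),   # stop clients
        ("run", ["./run_cli.sh", "fetch", cli_id]),  # collect results
        ("run", ["./run.sh", "stop", run_id]),       # stop replicas
    ]


def execute(steps, run=subprocess.run, sleep=time.sleep):
    """Carry out the steps in order; abort if any command fails."""
    for kind, arg in steps:
        if kind == "run":
            run(arg, check=True)  # check=True raises on a non-zero exit code
        else:
            sleep(arg)
```

Calling ``execute(experiment_steps("myrun1"))`` from ``scripts/deploy`` would
issue the same commands as the list above, in the same order.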

Other Notes
===========

- Each ``./run.sh new`` (same for ``./run_cli.sh``) will create a folder that
  contains everything (chosen parameters, raw results) for the run. A good
  practice is to always use a new name for each run, so that the records of
  all your previous experiments are kept.
- The ``run.sh`` script does NOT detect whether there is some other unfinished
  run (it does, however, prevent you from messing up the state of the same run,
  given an id like "myrun1"), so to start fresh you need to make sure you always
  ``stop`` (gracefully exit, with all results available) or ``reset`` (simply
  kill all processes) any previous runs.
- To check whether processes are still alive: ``./run.sh check myrun1``.
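
To follow the advice about using a fresh name for every run, a helper like
the one below can pick the next unused id. This is a sketch: ``next_run_name``
is a hypothetical function, and it assumes that run folders are created in
the current directory and named exactly after the run id, as in ``myrun1``.

```python
import os
import re


def next_run_name(directory=".", prefix="myrun"):
    """Return prefix + N, where N is one more than the highest existing run."""
    # Match names such as "myrun7" but not "myrun7_cli".
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    used = []
    for entry in os.listdir(directory):
        match = pattern.match(entry)
        if match and os.path.isdir(os.path.join(directory, entry)):
            used.append(int(match.group(1)))
    return prefix + str(max(used, default=0) + 1)
```

The returned name can then be passed to ``./run.sh new``, with the matching
client run named by appending ``_cli`` as in the examples above.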


.. _Ansible: https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html