Collaboration Rules
===================

Introduction
------------

This document stipulates the rules and typical workflows that drive NERV
development. It may be updated or complemented with more details in the
future. Anyone who intends to contribute to the official repository must read
this document before making any pull request to the development group or,
with permission, merging changes into the repository.


Repository
----------

The latest stable and in-progress code is hosted at SpeechLab and maintained
with Git, a distributed version control system. Despite the distributed
nature of the tool, our project management is centralized, just like Linux
kernel development, which was the original use case of Git. The NERV project, in
a general sense, includes two major sub-projects named ``nerv`` and
``nerv-speech``. The former contains the core part of NERV, which
includes a general deep learning implementation. The latter, ``nerv-speech``,
provides modules (classes) that comply with the API of core NERV and offer
support (e.g., for I/O) relevant to speech and language processing
(such as reading HTK/Kaldi features and labels).

Like Torch, NERV uses LuaRocks_ to manage optional components as *packages*.
Running ``make`` in the root directory of the ``nerv`` repository first sets up
LuaRocks and LuaJIT (the compiler), then builds and installs a LuaRocks package
named ``nerv`` via LuaRocks; that is, the core part of NERV is contained in a
single LuaRocks package, ``nerv``. Next, invoking ``make speech`` compiles and
installs several speech processing packages (such as ``htk_io``, ``kaldi_io``,
etc.) from ``nerv/speech``, which should be checked out from the
``nerv-speech`` repository. Thanks to the flexibility of Lua and the
modularity brought by LuaRocks, new functionality can be added to NERV and
managed cleanly by building self-contained LuaRocks packages with possible
dependencies on ``nerv`` or other packages. The package system provides good
isolation, so contributions can be better managed and decoupled from core NERV.

.. _LuaRocks: https://luarocks.org/

Isolation vs. Completeness
---------------------------

The loosely organized nature of Lua and the package manager LuaRocks give us
many possibilities in abstraction and collaboration. However, since the Lua
language does not enforce any particular patterns, we cannot simply rely on
the compiler or interpreter to regulate the implementations of all
contributors. As mentioned in NERV's overview document, one problem of Torch is
that it strives to isolate components and wrap each of them into a separate
LuaRocks package. This seems a good choice for collaboration, but it is not
very wise in the long run: such "collaboration" leads to no collaboration at
all. Under this methodology, each user tends to build her/his own package and
is reluctant to merge others' code. The shared code base keeps shrinking, which
gradually erodes the completeness of the toolkit.

When new functionality is added to NERV, there are several approaches, each
with its merits and demerits. Here we describe each approach and stipulate
under which conditions a contributor should resort to it.

- A gentle *modification* (mod or "hacking"): just like those in video games, a
  mod is a temporary patch applied to the original toolkit that slightly
  *overrides* some default features or behaviors. Thanks to the looseness of
  Lua, any NERV component can be altered or overridden by simply redefining, in
  the user script and after loading the default ones, the functions or classes
  that should be modified. These modifications are only legal in user scripts,
  reflecting the difference between a task-specific user script and the
  standard one. The advantage of this approach is that it confines the
  modifications to one place, so all users can use the same toolkit code base
  while the modifications remain visible to others. The alternative, hacking
  the official source directly and individually, ends up with divergent code
  bases that cannot be shared and whose modifications are difficult to detect
  and synchronize.

  We encourage end users to try this way first if the default behavior of
  NERV cannot be changed to suit their needs due to limited options or
  generality. No matter how general your alternative approach is, try this
  first to make sure your implementation works as expected without touching
  the shared code base.  After that, if your modifications are meaningful for
  many other tasks, that is, general enough, please abstract out the
  non-task-specific part and consider contributing directly to the shared code
  base (taking one of the other approaches listed below).

- Making a *LuaRocks package*: a LuaRocks package is meant to be shared among
  users who need an extra common functionality:

  - which is not generally needed by the majority (e.g., an unusual network
    structure or training method), or
  - which is experimental, so temporarily cannot be merged into NERV (due to
    some implementation or stability issues), or
  - which is naturally a self-contained or decoupled extension for NERV (e.g.,
    I/O readers), or
  - which contains modifications or feature enhancements written not only in
    Lua but also in C/C++ (e.g., efficient data processing or new layer
    computations).

  Please note that making a hybrid LuaRocks package containing C/C++
  implementations might be a little difficult for contributors who are not
  very familiar with writing a ``Makefile`` or similar C/C++ build
  scripts. However, it is extremely easy to write a LuaRocks package in pure
  Lua or to convert an above-mentioned Lua modification into a valid package.

- Creating a Git *branch* from "master": this measure is usually taken by
  developers or contributors who know the NERV internals well. This
  branching technique can be used under the following circumstances:

  - Core developers make major changes to NERV that may break
    existing functionality.
  - Core developers merge major changes from pull requests.
  - Contributors make contributions in C/C++ code.
  - Contributors submit their LuaRocks packages.
  - End users need to locally modify the C/C++ code to change the default behavior
    (these branches will only exist in their local repositories and are less
    likely to be merged into the official master branch unless they generalize
    them and send pull requests to core developers).

  Contributors should keep the changes in their branches clear and should not
  make changes that can only run correctly on their own tasks or with
  particular settings, nor should they break the existing functionality of
  NERV. The developers need to carefully review and qualify the changes by
  understanding the meaning of each line of code as well as its possible
  side effects and, if any exist, leave comments to explain them.

- Copying code: this is only for testing or personal use. It is *NOT* a
  correct way of collaboration or contribution.

When making a Lua modification or LuaRocks package as mentioned, end users or
contributors should always keep in mind the following principles:

- Try to disentangle the original issue by abstraction.
- Try to consider whether the solution could be generalized to solve others' problems.
- Try to override the default components (implemented by functions, classes) at
  as high a level as possible. For example, when there is an opportunity to
  achieve your goal by hacking a trainer (scheduler), DO NOT change the
  implementations of layers or buffers, let alone the CUDA implementation. When
  there is a chance of changing just one function of a trainer, DO NOT
  re-implement the whole trainer.
- Try to follow the coding convention in the official code.
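
The "override at as high a level as possible" principle can be illustrated with
a small sketch. It is written in Python rather than Lua purely for
illustration, and the ``Trainer`` class and its methods are hypothetical
stand-ins, not NERV's actual API. The point is only the shape of a mod: load
the defaults, then redefine the single function that must change.

```python
class Trainer:
    """Hypothetical stand-in for a toolkit-provided trainer (not NERV's API)."""

    def learning_rate(self, epoch):
        # Default behavior shipped with the toolkit: a constant rate.
        return 0.1

    def run(self, epochs):
        # The rest of the trainer is left untouched by the modification.
        return [self.learning_rate(e) for e in range(epochs)]


# --- user script: executed after the default definitions are loaded ---

def halving_learning_rate(self, epoch):
    # Task-specific override: halve the rate at every epoch.
    return 0.1 / (2 ** epoch)


# Override only the one method that must change; do not re-implement Trainer.
Trainer.learning_rate = halving_learning_rate

rates = Trainer().run(3)
```

Because the override lives in the user script, the shared code base stays
identical for everyone and the modification remains easy to spot and review.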

Workflows
---------

- End users usually slightly adjust the behavior of NERV via *modifications* if
  options do not help much. These mods are only for local use.

- For a contributor, when there is a common need for additional
  functionality:
  
  1. Fork ``nerv-speech`` and make a local branch with a concise name consisting
     of only lowercase letters, digits, or hyphens (regex: ``[a-z][a-z0-9-]*``).

  2. Generalize your modifications into a LuaRocks package (naming convention:
     ``[a-z][a-z0-9_]*``).
     
  3. Put the LuaRocks package in a new directory under the root directory of
     ``nerv-speech``. Include tutorials, if any, in ``/tutorial``.
     Package documents should be located in the ``doc`` directory of your
     package. All documents should be in plain-text format; human-readable
     lightweight markup formats such as Markdown or reStructuredText are
     preferred. DO NOT change other directories in ``nerv-speech``.

  4. Commit your changes with a brief but meaningful message. Try to squash your
     commits into a single commit if there are too many. Avoid meaningless
     messages such as "...".

  5. Send a pull request of your branch to the developers.

- For those contributors interested in contributing to core NERV:

  1. Fork ``nerv`` and make a local branch with a concise name consisting
     of only lowercase letters, digits, or hyphens (regex: ``[a-z][a-z0-9-]*``).

  2. Make changes.
  3. Commit your changes with a brief but meaningful message. Try to squash your
     commits into a single one if there are too many. Avoid meaningless
     messages such as "...".

  4. Send a pull request of your branch to the developers.

- Developers may only merge tested code that follows the appropriate coding
  conventions.

- A stable release is denoted by a Git tag with version number as its name.
- The version number has the format
  ``<prefix>-<major number>.<minor number>``, where ``<prefix>-`` and
  ``.<minor number>`` are optional. Here are some examples:

  - ``alpha-1``
  - ``alpha-1.1``
  - ``alpha-4``
  - ``beta-1.2``
  - ``beta-1.21``
  - ``1.0``

- For a given version, the complete release consists of, in each of the two
  repositories (``nerv`` and ``nerv-speech``), the commit tagged with the
  largest version number that does not exceed the given number. For general
  use, end users should check out the tag with the largest version number in
  both repositories; for instructions on checking out, please refer to
  ``README.rst`` in ``nerv``.

- Developers must test major tasks on the version that is going to be tagged.
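
The naming conventions and the release-tag rule above can be checked
mechanically. The following is a minimal sketch in Python, not part of NERV;
all function names are hypothetical. Two points are assumptions, because this
document does not define them: that prefixes are ordered ``alpha`` < ``beta`` <
no prefix, and that a missing minor number counts as 0.

```python
import re

# Patterns quoted in the workflows: branches allow hyphens, packages underscores.
BRANCH_NAME = re.compile(r"[a-z][a-z0-9-]*")
PACKAGE_NAME = re.compile(r"[a-z][a-z0-9_]*")

# <prefix>-<major>.<minor>, where "<prefix>-" and ".<minor>" are optional.
VERSION_TAG = re.compile(r"(?:([a-z]+)-)?(\d+)(?:\.(\d+))?")

# ASSUMPTION: alpha < beta < (no prefix); the document does not define this.
PREFIX_RANK = {"alpha": 0, "beta": 1, None: 2}


def is_valid_branch(name):
    return BRANCH_NAME.fullmatch(name) is not None


def is_valid_package(name):
    return PACKAGE_NAME.fullmatch(name) is not None


def version_key(tag):
    """Turn a tag such as 'beta-1.21' into a sortable tuple."""
    m = VERSION_TAG.fullmatch(tag)
    if m is None or m.group(1) not in PREFIX_RANK:
        raise ValueError(f"not a recognized version tag: {tag!r}")
    prefix, major, minor = m.groups()
    # ASSUMPTION: a missing minor number counts as 0; minor numbers compare
    # as integers, so 1.21 is newer than 1.2.
    return (PREFIX_RANK[prefix], int(major), int(minor or 0))


def release_tag(tags, wanted):
    """Largest tag in `tags` that does not exceed `wanted`, or None."""
    limit = version_key(wanted)
    candidates = [t for t in tags if version_key(t) <= limit]
    return max(candidates, key=version_key, default=None)
```

For example, with the tags ``alpha-1``, ``alpha-1.1``, ``alpha-4`` and
``beta-1.2`` in a repository, ``release_tag(tags, "beta-1")`` selects
``alpha-4`` under the assumed ordering. Running the same lookup on each
repository separately gives the pair of commits that make up a complete
release.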