Collaboration Rules
===================

Introduction
------------

This document stipulates the rules and typical workflows that drive NERV
development. It may be updated or supplemented with more details in the
future. Anyone who intends to contribute to the official repository must read
this document before making any pull requests to the development group or
merging changes into the repository with permission.

Repository
----------

The latest stable and ongoing code is hosted at SpeechLab and maintained with
Git, a distributed version control system. Despite the "distributed" nature of
the tool, our project management is centralized, just like Linux kernel
development, which was the original use case of Git.

The NERV project, in a general sense, includes two major sub-projects named
``nerv`` and ``nerv-speech``. The former contains the core part of NERV, which
includes a general deep learning implementation. The latter, ``nerv-speech``,
provides modules (classes) that comply with the API of core NERV and offer
support (e.g., for I/O) relevant to speech and language processing (such as
reading HTK/Kaldi features and labels).

Like Torch, NERV uses LuaRocks_ to manage optional components as *packages*.
When ``make`` is run in the root directory of the ``nerv`` repository,
LuaRocks and LuaJIT (the compiler) are set up first, and then a LuaRocks
package named ``nerv`` is built and installed via LuaRocks. In other words,
the core part of NERV is contained in a single LuaRocks package, ``nerv``.
Next, invoking ``make speech`` compiles and installs several speech processing
packages (such as ``htk_io`` and ``kaldi_io``) from ``nerv/speech``, which
should be checked out from the ``nerv-speech`` repository.
Therefore, thanks to the flexibility of Lua and the modularity brought by
LuaRocks, new functionality can be added to NERV and managed cleanly by
building self-contained LuaRocks packages that may depend on ``nerv`` or other
packages. The package system provides good isolation, so that contributions
can be better managed and decoupled from core NERV.

.. _LuaRocks: https://luarocks.org/

Isolation vs. Completeness
--------------------------

The loosely organized nature of Lua and the package manager LuaRocks gives us
many possibilities for abstraction and collaboration. However, since the Lua
language does not enforce any particular patterns, we cannot simply hope that
the compiler or interpreter will regulate the implementations of all
contributors. As mentioned in NERV's overview document, one problem of Torch
is that it strives to isolate components and wrap each of them into a separate
LuaRocks package, which seems a good choice for collaboration but is not very
wise in the long run. Such a methodology of "collaboration" leads to no
collaboration at all: each user tends to build her/his own package and is
reluctant to merge others' code. This leads to a shrinking shared code base
and gradually erodes the completeness of the toolkit.

When new functionality is added to NERV, there are several approaches, each
with its merits and demerits. Here we describe each possibility and stipulate
the conditions under which a contributor should resort to it.

- A gentle *modification* (mod or "hacking"): just as in video games, a mod
  is like a temporary patch applied to the original toolkit that slightly
  *overrides* some default features or behaviors. Thanks to the looseness of
  Lua, any NERV component can be altered or overridden by simply redefining,
  in the user script, the functions or classes to be modified after loading
  the default ones.
  These modifications are only legal in user scripts, reflecting the
  difference between a task-specific user script and the standard one. The
  advantage of this approach is that it confines the modifications to one
  place, so all users can use the same toolkit code base while the
  modifications remain visible to others. The alternative, hacking the
  official source directly and individually, ends up in diverging code bases
  that cannot be shared and in which modifications are difficult to detect
  and synchronize.

  We encourage end users to try this approach first if the default behavior
  of NERV cannot be changed to suit their needs due to limited options or
  generality. No matter how general your alternative approach is, try this
  first to make sure your implementation works as expected without touching
  the shared code base. After that, if your modifications are meaningful for
  many other tasks, that is, general enough, please abstract out the
  non-task-specific part and consider contributing it directly to the shared
  code base (using one of the other approaches listed below).

- Making a *LuaRocks package*: a LuaRocks package is meant to be shared among
  users who need extra common functionality:

  - which is not generally needed by the majority (e.g., an unusual network
    structure or training method), or
  - which is experimental and therefore cannot yet be merged into NERV (due
    to implementation or stability issues), or
  - which is naturally a self-contained or decoupled extension for NERV
    (e.g., I/O readers), or
  - which contains modifications or feature enhancements written not only in
    Lua but also in C/C++ (e.g., efficient data processing or new layer
    computations).

  Please note that making a hybrid LuaRocks package containing C/C++
  implementations might be a little difficult for contributors who are not
  very familiar with writing a ``Makefile`` or similar C/C++ build scripts.
  However, it is extremely easy to write a LuaRocks package in pure Lua or to
  convert an above-mentioned Lua modification into a valid package.

- Creating a Git *branch* from "master": this measure is usually taken by
  developers or contributors who know the NERV internals well. Branching can
  be used under the following circumstances:

  - Core developers make major changes to NERV that may break existing
    functionality.
  - Core developers merge major changes from a pull request.
  - Contributors make contributions in C/C++ code.
  - Contributors submit their LuaRocks packages.
  - End users need to locally modify the C/C++ code to change the default
    behavior (these branches exist only in their local repositories and are
    unlikely to be merged into the official master branch unless the users
    generalize them and send pull requests to the core developers).

  Contributors should keep the changes in their branches clear and should not
  make changes that only run correctly on their own tasks or with particular
  settings, nor should they break the existing functionality of NERV. The
  developers need to carefully review and qualify the changes by
  understanding the meaning of each line of code as well as its possible side
  effects and, if any exist, leave comments to explain them.

- Copying code: this is only for testing or personal use. It is *NOT* a
  correct way of collaboration or contribution.

When making a Lua modification or LuaRocks package as mentioned, end users or
contributors should always keep the following principles in mind:

- Try to disentangle the original issue by abstraction.
- Try to consider whether the solution could be generalized to solve others'
  problems.
- Try to override the default components (implemented by functions, classes)
  at as high a level as possible. For example, when you can achieve your goal
  by hacking a trainer (scheduler), DO NOT change the implementations of
  layers or buffers, let alone the CUDA implementation.
  When there is a chance of changing just one function of a trainer, DO NOT
  re-implement the whole trainer.
- Try to follow the coding conventions of the official code.

Workflows
---------

- End users usually adjust the behavior of NERV slightly via *modifications*
  if options do not help much. These mods are only for local use.
- For a contributor, when there is a common need for additional
  functionality:

  1. Fork ``nerv-speech``: make a local branch with a concise name consisting
     of only lowercase letters, digits, or hyphens (regex:
     ``[a-z][a-z0-9-]*``).
  2. Generalize your modifications into a LuaRocks package (naming
     convention: ``[a-z][a-z0-9_]*``).
  3. Put the LuaRocks package in a new directory under the root directory of
     ``nerv-speech``. Include tutorials, if any, in ``/tutorial``. Package
     documents should be located in the ``doc`` directory of your package.
     All documents should be in plain-text format; human-readable
     lightweight markup formats, such as Markdown or reStructuredText, are
     preferred. DO NOT change other directories in ``nerv-speech``.
  4. Commit your changes with a brief but meaningful message. Try to squash
     your commits into a single commit if there are too many. Avoid
     meaningless messages such as "...".
  5. Send a pull request of your branch to the developers.

- For contributors interested in contributing to core NERV:

  1. Fork ``nerv``: make a local branch with a concise name consisting of
     only lowercase letters, digits, or hyphens (regex:
     ``[a-z][a-z0-9-]*``).
  2. Make changes.
  3. Commit your changes with a brief but meaningful message. Try to squash
     your commits into a single one if there are too many. Avoid meaningless
     messages such as "...".
  4. Send a pull request of your branch to the developers.

- Developers may merge only code that has been tested and follows the
  appropriate coding conventions.
- A stable release is denoted by a Git tag with the version number as its
  name.
- The version number is in the format ``<stage>-<major>.<minor>``, where the
  ``<stage>-`` and ``.<minor>`` parts are optional. Here are some examples:

  - ``alpha-1``
  - ``alpha-1.1``
  - ``alpha-4``
  - ``beta-1.2``
  - ``beta-1.21``
  - ``1.0``

- For a given version, the complete release consists of the commits tagged
  with the largest version number that does not exceed the given number in
  both repositories, i.e., ``nerv`` and ``nerv-speech``. For general use, end
  users should check out the latest version, that is, the tags with the
  largest version number in both repositories. For instructions on checking
  out, please refer to ``README.rst`` in ``nerv``.
- Developers must test major tasks on the version that is going to be tagged.
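
The rule for choosing the tags of a complete release can be sketched in a few
lines of Python. This is only an illustration, not part of NERV. It rests on
assumptions that this document does not state: that ``alpha`` and ``beta``
are the only prefixes, that ``alpha`` precedes ``beta`` and both precede an
unprefixed version, that the numeric parts compare as integers (so
``beta-1.21`` is newer than ``beta-1.2``), and that a missing minor part
counts as zero. The function names are hypothetical.

```python
import re

# Assumed ordering of the optional prefix; the document does not define it.
STAGE_RANK = {"alpha": 0, "beta": 1, "": 2}

# Matches tags shaped like the examples: 'alpha-1', 'beta-1.21', '1.0'.
TAG_RE = re.compile(r"^(?:(alpha|beta)-)?(\d+)(?:\.(\d+))?$")


def parse_version(tag):
    """Turn a tag such as 'beta-1.21' into a sortable tuple of integers."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError("not a recognized version tag: %r" % tag)
    stage, major, minor = match.groups()
    # Assumption: a missing minor part is treated as 0.
    return (STAGE_RANK[stage or ""], int(major), int(minor or 0))


def release_tag(tags, wanted):
    """Return the largest tag that does not exceed `wanted`, or None."""
    limit = parse_version(wanted)
    candidates = [t for t in tags if parse_version(t) <= limit]
    return max(candidates, key=parse_version) if candidates else None


def complete_release(nerv_tags, speech_tags, wanted):
    """Pick one tag per repository for the release `wanted`."""
    return {
        "nerv": release_tag(nerv_tags, wanted),
        "nerv-speech": release_tag(speech_tags, wanted),
    }
```

Under these assumptions, if ``nerv`` carries the tags ``alpha-1``,
``alpha-4`` and ``beta-1.2`` while ``nerv-speech`` carries ``alpha-4`` and
``beta-1.21``, then the complete release for ``beta-1.2`` pairs ``beta-1.2``
in ``nerv`` with ``alpha-4`` in ``nerv-speech``.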