 nerv/doc/source/overview.rst | 97 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 92 insertions(+), 5 deletions(-)
diff --git a/nerv/doc/source/overview.rst b/nerv/doc/source/overview.rst
index f28be89..8ff1dd5 100644
--- a/nerv/doc/source/overview.rst
+++ b/nerv/doc/source/overview.rst
@@ -218,8 +218,8 @@ subclass implementing matrices with different types of values:
Layer
*****
-A layer (``nerv.Layer``) in NERV conceptually represents a computation node
-which declaratively defines the computation logic needed to produce the output
+*Layers* (``nerv.Layer``) in NERV conceptually represent computation nodes
+which declaratively define the computation logic needed to produce the output
from the input. This means a layer itself is "oblivious" in the sense that its
computation is time-invariant (except that some layers maintain auditing
information, which does not change the output) when
@@ -259,7 +259,7 @@ subclasses of ``nerv.Layer``) where the structures are preserved.
Parameter
*********
-Parameters (``nerv.Param``) represents the state of layers (``nerv.Layer``) in
+*Parameters* (``nerv.Param``) represent the state of layers (``nerv.Layer``) in
NERV. They are time-variant during training because of updates. They can be
read from files (in the NERV ``nerv.ChunkFile`` format) and written to files.
Take a fully-connected linear layer in a neural network as an example: the layer
@@ -275,7 +275,7 @@ decouples the layer and the corresponding parameters in a clear way.
Buffer
******
-Buffers (``nerv.DataBuffer``), as the name suggests, connect I/O ends with
+*Buffers* (``nerv.DataBuffer``), as the name suggests, connect I/O ends with
different speeds (granularities). Buffers in NERV accept a variable number of
samples (frames) from readers (``nerv.DataReader``) and produce a regularized
sequence of data to feed as the input to a network (``nerv.Network``). This
@@ -285,7 +285,7 @@ samples together and cut samples into mini-batches.
Scheduler (Trainer)
*******************
-Schedulers refer to those top-level scripts that implements the main training
+*Schedulers* refer to those top-level scripts that implement the main training
loop and tick the training process. A general-purpose scheduler typically
takes in a Lua script written by an end user that contains a description of the
network, task-specific processing for reading data and some hyper-parameters.
@@ -297,3 +297,90 @@ of the scheduler by overriding functions in their Lua script read by the
scheduler. Experienced users can also directly write their own schedulers to
train exotic neural networks that do not yet fit into the current
pipeline.
+
+Main Loop
+---------
+
+To demonstrate how NERV works at the top level and how all the components fit
+together, the main loop within a scheduler is shown as follows:
+
+.. image:: _static/nerv-dataflow.svg
+
+Parameter I/O
+*************
+
+At the beginning of a training stage, parameters should be initialized, either
+by loading them from an external file or by some generation method (such as
+random generation).
+
+NERV stores a parameter instance (``nerv.Param``) in a basic unit called a
+*chunk*. Although a chunk currently always stores a parameter instance, chunks
+by definition can be any sequence of serialized data. This means that, by
+design, chunk files (``nerv.ChunkFile``) can store not only parameters, but
+also features and labels in the future. Each chunk contains metadata (such as
+length information) that lets NERV skip over the data content (which could be
+extremely large) on a first pass, so NERV can create a temporary index of all
+chunks in a file. This means that, unlike many other tools, NERV does not scan
+through every bit of the file; it reads only the chunk headers until the user
+needs the data of a certain chunk.
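+
+To make the header-skipping idea concrete, here is a minimal sketch in plain
+Lua. It assumes a toy format in which every chunk begins with a one-line
+header ``<id> <data_length>``; this illustrates the indexing technique only,
+not the actual ``nerv.ChunkFile`` on-disk layout.
+
+.. code-block:: lua
+
+   -- Build an index of all chunks without reading their (possibly huge)
+   -- data. Toy format assumed: each chunk is a header line "<id> <len>"
+   -- followed by exactly <len> bytes of raw data.
+   local function index_chunks(fname)
+       local f = assert(io.open(fname, "rb"))
+       local index = {}
+       while true do
+           local header = f:read("*l")
+           if header == nil then break end
+           local id, len = header:match("^(%S+)%s+(%d+)$")
+           -- remember where the data starts, then skip over it
+           index[id] = {offset = f:seek(), size = tonumber(len)}
+           f:seek("cur", tonumber(len))
+       end
+       f:close()
+       return index
+   end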
+
+A ``nerv.ParamRepo`` instance is constructed by the scheduler and its
+``import`` method is called to import the desired parameters (most of the
+time, all of them) from a chunk file. The instance will later be used for
+parameter binding. When the training finishes, the instance can be exported to
+a chunk file to save the parameters of the trained model.
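+
+In a scheduler this could look like the following sketch; the exact argument
+lists of ``import`` and ``export`` here are assumptions for illustration, not
+the verified API.
+
+.. code-block:: lua
+
+   -- A sketch only: signatures are assumed, not taken from the API docs.
+   local param_repo = nerv.ParamRepo()
+   param_repo:import({"init_model.nerv"}, gconf)   -- read parameter chunks
+   -- ... construct layers, train ...
+   param_repo:export("trained_model.nerv", gconf)  -- save the trained model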
+
+Layer & Parameter Binding
+*************************
+
+Then, the layers are constructed. A typical scenario, for example a
+fully-connected DNN, makes use of an outer graph layer to connect several
+sub-level inner affine (linear) layers; the chain-like connections are
+specified as part of the configuration for the outer graph layer, as sketched
+below. In a more complex case, such as a deep neural network having several
+LSTM layers, each LSTM layer is itself a graph layer containing more
+microscopic structure, i.e., the connections between the internal gates, which
+are implemented by gate layers.
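+
+The following sketch shows what such a construction could look like; the exact
+spec tables accepted by ``nerv.LayerRepo``, the layer class names and the
+connection syntax are assumptions modeled on typical NERV configurations, not
+a verified example.
+
+.. code-block:: lua
+
+   -- A sketch only: spec format and connection syntax are assumptions.
+   local layer_repo = nerv.LayerRepo(
+       {
+           ["nerv.AffineLayer"] = {
+               affine0 = {dim_in = {429}, dim_out = {2048}},
+           },
+           ["nerv.SigmoidLayer"] = {
+               sigmoid0 = {dim_in = {2048}, dim_out = {2048}},
+           },
+       }, param_repo, gconf)
+
+   -- the outer graph layer wires the inner layers into a chain
+   layer_repo:add_layers(
+       {
+           ["nerv.GraphLayer"] = {
+               main = {
+                   dim_in = {429}, dim_out = {2048},
+                   layer_repo = layer_repo,
+                   connections = {
+                       {"<input>[1]", "affine0[1]", 0},
+                       {"affine0[1]", "sigmoid0[1]", 0},
+                       {"sigmoid0[1]", "<output>[1]", 0},
+                   },
+               },
+           },
+       }, param_repo, gconf)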
+
+After a set of layers is constructed in a ``nerv.LayerRepo``, parameters in a
+``nerv.ParamRepo`` are bound to these layers (some layers, such as activation
+functions, do not need to bind parameters). Any missing parameters are
+automatically generated according to the configuration and bound to the layers.
+In some circumstances, the scheduler needs to rebind the parameters to change
+the state of the network.
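+
+Rebinding could be as simple as the following one-liner; the method name is an
+assumption for illustration.
+
+.. code-block:: lua
+
+   -- A sketch only: swap in another set of parameters (method name assumed)
+   layer_repo:rebind(another_param_repo)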
+
+Network Construction
+********************
+
+As stated above, the final network of interest is represented by one outer-most
+graph layer which contains all the other internal layers. This graph layer is
+then passed to the constructor of ``nerv.Network`` and gets "compiled" into a
+network instance that can be trained and updated.
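+
+A sketch of this step follows; the constructor arguments and the ``init`` call
+are assumptions about the API, shown only to illustrate the flow.
+
+.. code-block:: lua
+
+   -- A sketch only: "compile" the outer-most graph layer into a network
+   local network = nerv.Network("network", gconf,
+                                {network = layer_repo:get_layer("main")})
+   network:init(gconf.batch_size, gconf.chunk_size)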
+
+Reader I/O
+**********
+
+On the other side, data are read by data readers (``nerv.DataReader``) and
+passed to the buffer in the form of pairs of a slot identifier and a data
+matrix. Features and labels are both considered to be data here.
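+
+Conceptually, one call to a reader yields something like the sketch below; the
+method name ``get_data`` and the slot identifiers are illustrative assumptions.
+
+.. code-block:: lua
+
+   -- A sketch only: a reader returns a table mapping slot identifiers
+   -- to data matrices (slot names here are made up for illustration).
+   local data = reader:get_data()
+   -- data = {
+   --     main_scp    = <feature matrix, one row per frame>,
+   --     phone_state = <label matrix, one row per frame>,
+   -- }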
+
+Buffering
+*********
+
+The buffer accepts the pairs from readers and tries to regularize these data to
+meet the needs of the network instance during training. For example, a
+sample-level shuffling buffer (``nerv.FrmBuffer``) will concatenate data
+matrices with different numbers of rows into one larger matrix and shuffle all
+of its rows. Finally, it cuts the larger matrix into several equal-length small
+matrices called *batch matrices*. There is also another type of buffer, the
+sequence buffer, which cuts the data in an orthogonal style: it collects the
+first sample from each of a fixed number of on-going sequences to form a batch
+matrix, then the second sample from each of the sequences, and so on.
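+
+Constructing such a buffer could look like the sketch below; the constructor
+fields are assumptions for illustration, not the verified ``nerv.FrmBuffer``
+options.
+
+.. code-block:: lua
+
+   -- A sketch only: constructor fields are assumed, not verified.
+   local buffer = nerv.FrmBuffer(gconf,
+       {
+           buffer_size = 81920,    -- frames buffered before shuffling
+           batch_size  = 256,      -- rows in each batch matrix
+           randomize   = true,     -- shuffle at the sample (frame) level
+           readers     = {reader},
+       })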
+
+Network Computation
+*******************
+
+The buffered data are fed to the network input and the forward propagation
+method is invoked, then the backward propagation method, and finally the update
+method. After the update method returns, the parameters bound to the layers in
+the network have been updated.
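+
+Put together, one iteration of the main loop could look like the following
+sketch; the buffer and network method names and signatures are assumptions
+based on the flow described above, not the verified API.
+
+.. code-block:: lua
+
+   -- A sketch of one epoch of training; names/signatures are assumed.
+   while true do
+       local data = buffer:get_data()
+       if data == nil then break end          -- no more batch matrices
+       local input  = {data.main_scp}         -- feature batch matrix
+       local output = {}                      -- filled by the forward pass
+       local err_in, err_out = {}, {}         -- gradient buffers
+       network:propagate(input, output)       -- forward propagation
+       network:back_propagate(err_in, err_out, input, output)
+       network:update(err_in, input, output)  -- parameters bound to the
+   end                                        -- layers are updated here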