summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorDeterminant <ted.sybil@gmail.com>2016-03-12 13:36:59 +0800
committerDeterminant <ted.sybil@gmail.com>2016-03-12 13:36:59 +0800
commitddc4545050b41d12cfdc19cea9ba31c940d3d537 (patch)
treeb47b54949885a11de97c1406c3a61ab7b0ffeb56
parent54b33aa3a95f5a7a023e9ea453094ae081c91f64 (diff)
adapt kaldi_decode propagator to the new arch
-rwxr-xr-xkaldi_decode/README.timit4
-rw-r--r--kaldi_decode/src/asr_propagator.lua18
-rw-r--r--tutorial/howto_pretrain_from_kaldi.rst98
3 files changed, 78 insertions, 42 deletions
diff --git a/kaldi_decode/README.timit b/kaldi_decode/README.timit
index 0a3e33a..4c4e310 100755
--- a/kaldi_decode/README.timit
+++ b/kaldi_decode/README.timit
@@ -5,8 +5,8 @@ source cmd.sh
gmmdir=/speechlab/users/mfy43/timit/s5/exp/tri3/
data_fmllr=/speechlab/users/mfy43/timit/s5/data-fmllr-tri3/
dir=/speechlab/users/mfy43/timit/s5/exp/dnn4_nerv_dnn/
-nerv_config=/speechlab/users/mfy43/nerv/nerv/examples/timit_baseline2.lua
-decode=/speechlab/users/mfy43/nerv/install/bin/decode_with_nerv.sh
+nerv_config=/speechlab/users/mfy43/timit/s5/timit_baseline2.lua
+decode=/speechlab/users/mfy43/timit/s5/nerv/install/bin/decode_with_nerv.sh
# Decode (reuse HCLG graph)
$decode --nj 20 --cmd "$decode_cmd" --acwt 0.2 \
diff --git a/kaldi_decode/src/asr_propagator.lua b/kaldi_decode/src/asr_propagator.lua
index 5d0ad7c..4005875 100644
--- a/kaldi_decode/src/asr_propagator.lua
+++ b/kaldi_decode/src/asr_propagator.lua
@@ -24,6 +24,9 @@ function build_propagator(ifname, feature)
local input_order = get_decode_input_order()
local readers = make_decode_readers(feature, layer_repo)
+ network = nerv.Network("nt", gconf, {network = network})
+ global_transf = nerv.Network("gt", gconf, {network = global_transf})
+
local batch_propagator = function()
local data = nil
for ri = 1, #readers do
@@ -38,7 +41,10 @@ function build_propagator(ifname, feature)
end
gconf.batch_size = data[input_order[1].id]:nrow()
- network:init(gconf.batch_size)
+ global_transf:init(gconf.batch_size, 1)
+ global_transf:epoch_init()
+ network:init(gconf.batch_size, 1)
+ network:epoch_init()
local input = {}
for i, e in ipairs(input_order) do
@@ -58,7 +64,14 @@ function build_propagator(ifname, feature)
table.insert(input, transformed)
end
local output = {nerv.MMatrixFloat(input[1]:nrow(), network.dim_out[1])}
- network:propagate(input, output)
+ network:mini_batch_init({seq_length = table.vector(gconf.batch_size, 1),
+ new_seq = {},
+ do_train = false,
+ input = {input},
+ output = {output},
+ err_input = {},
+ err_output = {}})
+ network:propagate()
local utt = data["key"]
if utt == nil then
@@ -74,6 +87,7 @@ end
function init(config, feature)
dofile(config)
+ gconf.mmat_type = nerv.MMatrixFloat
gconf.use_cpu = true -- use CPU to decode
trainer = build_propagator(gconf.decode_param, feature)
end
diff --git a/tutorial/howto_pretrain_from_kaldi.rst b/tutorial/howto_pretrain_from_kaldi.rst
index 8f6e0ad..2e8d674 100644
--- a/tutorial/howto_pretrain_from_kaldi.rst
+++ b/tutorial/howto_pretrain_from_kaldi.rst
@@ -1,9 +1,9 @@
-How to Use a Pretrained nnet Model from Kaldi
-=============================================
+How to Use a Pre-trained nnet Model from Kaldi
+==============================================
:author: Ted Yin (mfy43) <ted.sybil@gmail.com>
-:abstract: Instruct on how to pretrain a basic dnn with timit dataset using
- Kaldi and then convert the pretrained model to nerv format to let
+:abstract: Instruct on how to pre-train a basic dnn with timit dataset using
+ Kaldi and then convert the pre-trained model to nerv format to let
NERV finetune. Finally it shows two possible ways to decode the
finetuned model in Kaldi framework.
@@ -29,11 +29,11 @@ How to Use a Pretrained nnet Model from Kaldi
- Run ``./run.sh`` (at ``<timit_home>``) to start the training stages. After that, we will get
tri-phone GMM-HMM trained and the aligned labels. Let's move forward to
- pretrain a DNN.
+ pre-train a DNN.
- Open ``<timit_home>/local/nnet/run_dnn.sh``, there are again several stages.
- Note that the first stage is what we actually need (pretraining the DNN),
- since in this tutorial we want to demonstrate how to get the pretrained model
+ Note that the first stage is what we actually need (pre-training the DNN),
+ since in this tutorial we want to demonstrate how to get the pre-trained model
from stage 1, replace stage 2 with NERV (finetune per-frame cross-entropy),
and decode using the finetuned network. However, here we add a line ``exit
0`` after stage 2 to preserve stage 2 in order to compare the NERV result
@@ -44,10 +44,10 @@ How to Use a Pretrained nnet Model from Kaldi
- You'll find directory like ``dnn4_pretrain-dbn`` and
``dnn4_pretrain-dbn_dnn`` inside the ``<timit_home>/exp/``. They correspond
to stage 1 and stage 2 respectively. To use NERV to do stage 2 instead, we
- need the pretrained network and the global transformation from stage 1:
-
+ need the pre-trained network and the global transformation from stage 1:
+
- Check the file ``<timit_home>/exp/dnn4_pretrain-dbn/6.dbn`` exists.
- (pretrained network)
+ (pre-trained network)
- Check the file
``<timit_home>/exp/dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet`` exists.
@@ -55,28 +55,46 @@ How to Use a Pretrained nnet Model from Kaldi
- Run script from ``<nerv_home>/speech/kaldi_io/tools/convert_from_kaldi_pretrain.sh`` to
generate the parameters for the output layer and the script files for
- training and cross-validation set.
-
- - The previous conversion commands will automatically give identifiers to the
- parameters read from the Kaldi network file. The identifiers are like, for
- example, ``affine0_ltp`` and ``bias0``. These names should correspond to
- the identifiers used in the declaration of the network. Luckily, this
- tutorial comes with a written network declaration at
- ``<nerv_home>/nerv/examples/timit_baseline2.lua``.
+ training and cross-validation set. A new directory will be created at
+ ``<timit_home>/exp/dnn4_nerv_dnn`` with following files:
+
+ - ``nnet_init.nerv``: the converted NERV chunk file containing all pre-trained parameters
+ - ``nnet_trans.nerv``: the converted NERV chunk file containing global transformation
+ - ``nnet_output.proto``: used for random generation of parameters (Kaldi)
+ - ``nnet_output.init``: the randomly generated parameters of output layer in Kaldi nnet format
+ - ``nnet_output.nerv``: the converted NERV chunk file containing parameters of output layer
+ - ``cv.scp``: the script file listing utterances for cross-validation
+ - ``train.scp``: the script file listing utterances for training
+ - ``train_sorted.scp``: sorted version of ``train.scp``
+ - ``final.mdl``: HMM model, used for label generation and decoding
+ - ``ali_train_pdf.counts``: used for decoding
+ - ``tree``: used in decoding
+
+ - The conversion commands in ``convert_from_kaldi_pretrain.sh`` will
+ automatically give identifiers to the parameters read from the Kaldi
+ network file. The identifiers are like, for example, ``affine0_ltp`` and
+ ``bias0``. These names should correspond to the identifiers used in the
+ declaration of the network. Luckily, this tutorial comes with a written
+ network declaration at ``<nerv_home>/nerv/examples/timit_baseline2.lua``.
+ Have a look at ``nnet_init.nerv`` and ``timit_baseline2.lua``.
- Copy the file ``<nerv_home>/nerv/examples/timit_baseline2.lua`` to
``<timit_home>/timit_mybaseline.lua``, and change the line containing
- ``/speechlab`` to your own setting.
+ ``/speechlab`` to your own setting. Also change the dimension of the output
+ layer to the number of tied phone states (change all ``1959`` to your number
+ of states, you can peek ``nnet_output.nerv`` to make sure), because each run
+ of previous Kaldi script would yield a slightly different number.
- Start the NERV training by
-
+
::
-
+
<nerv_home>/install/bin/nerv <nerv_home>/nerv/examples/asr_trainer.lua timit_mybaseline.lua
(at ``<timit_home>``).
- - ``<nerv_home>/install/bin/nerv`` is the program which sets up the NERV environment,
+ - ``<nerv_home>/install/bin/nerv`` is the program which sets up the NERV
+ environment,
- followed by an argument ``<nerv_home>/nerv/examples/asr_trainer.lua`` which
is the script you actually want to run (the general DNN training
@@ -86,22 +104,26 @@ How to Use a Pretrained nnet Model from Kaldi
specifying the network you want to train and some relevant settings, such
as where to find the initialized parameters and learning rate, etc.
-- Finally, after about 13 iterations, the funetune ends. There are two ways to
- decode your model:
-
+- Finally, after about 13 iterations, the finetune ends. You will find the
+ trained models in directory named like ``nerv_*`` in your current working
+ directory. Use the one that has the highest cv (cross-validation) value and
+ ends with the extension ``.nerv``. There are two ways to decode your model:
+
- Plan A:
-
- - Open your ``timit_mybaseline.lua`` again and modify ``decode_param`` to
- your final chunk file (the file with an extension ``.nerv``) and your
- global transformation chunk file once used in training. This part lets
- the decoder know about the set of parameters for decoding.
- - Copy the script ``<nerv_home>/nerv/speech/kaldi_io/README.timit`` to
+ - Open your ``timit_mybaseline.lua`` again and modify the first model in
+ ``decode_param`` to your final chunk file (the file with an extension
+ ``.nerv``) and your global transformation chunk file once used in
+ training (just keep it the same as the one in ``initialized_params``).
+ This configuration lets the decoder know about the set of parameters it
+ should use for decoding.
+
+ - Copy the script ``<nerv_home>/nerv/speech/kaldi_decode/README.timit`` to
``<timit_home>`` and modify the paths listed in the script.
- Run the modified ``README.timit`` (at ``<timit_home>``).
- - After decoding, run ``bash RESULT exp/dnn4_nerv_dnn`` to see the results.
+ - After decoding, run ``bash RESULTS exp/dnn4_nerv_dnn`` to see the results.
- Plan B: In this plan, we manually convert the trained model back to Kaldi
nnet format, and use Kaldi to decode.
@@ -112,26 +134,26 @@ How to Use a Pretrained nnet Model from Kaldi
put into the output Kaldi parameter file in order. (You don't actually
need to change for this tutorial) You may ask why the NERV-to-Kaldi
conversion is so cumbersome. This is because Kaldi nnet is a special
- case of more general NERV toolkit --- it only allows stacked DNNs and
+ case of more general NERV toolkit -- it only allows stacked DNNs and
therefore Kaldi-to-NERV conversion is lossless but the other direction is
not. Your future NERV network may have multiple branches and that's why
you need to specify how to select and "stack" your layers in the Kaldi
parameter output.
- Do the conversion by:
-
+
::
-
- <nerv_home>/install/bin/nerv nerv_to_kaldi.lua timit_mybaseline.lua your_trained_params.nerv your_kaldi_output.nnet
+
+ <nerv_home>/install/bin/nerv --use-cpu nerv_to_kaldi.lua timit_mybaseline.lua <your_trained_params>.nerv <path_to_converted>.nnet
- Finally, locate the directory of stage 2:
``<timit_home>/exp/dnn4_pretrain-dbn_dnn`` and temporarily change the
symbolic link for the final network file to the converted one:
::
-
+
cd <timit_home>/exp/dnn4_pretrain-dbn_dnn
mv final.nnet final.nnet.orig
- ln -sv your_kaldi_output.nnet final.nnet
+ ln -sv <path_to_converted>.nnet final.nnet
Then proceed a normal Kaldi decoding.