author     Determinant <[email protected]>  2016-02-29 20:03:52 +0800
committer  Determinant <[email protected]>  2016-02-29 20:03:52 +0800
commit     1e0ac0fb5c9f517e7325deb16004de1054454da7 (patch)
tree       c75a6f0fc9aa50caa9fb9dccec7a56b41d3b63fd
parent     fda1c8cf07c5130aff53775454a5f2cfc8f5d2e0 (diff)
refactor kaldi_decode
Diffstat (limited to 'tutorial/howto_pretrain_from_kaldi.rst')
-rw-r--r--  tutorial/howto_pretrain_from_kaldi.rst  60
1 file changed, 60 insertions, 0 deletions
diff --git a/tutorial/howto_pretrain_from_kaldi.rst b/tutorial/howto_pretrain_from_kaldi.rst
new file mode 100644
index 0000000..95b5f36
--- /dev/null
+++ b/tutorial/howto_pretrain_from_kaldi.rst
@@ -0,0 +1,60 @@
+How to Use a Pretrained nnet Model from Kaldi
+=============================================
+
+:author: Ted Yin (mfy43) <[email protected]>
+:abstract: Instructions on how to pretrain a basic DNN on the TIMIT dataset
+           using Kaldi, then convert the pretrained model to the NERV format
+           so that NERV can finetune it. Finally, it shows two possible ways
+           to decode the finetuned model within the Kaldi framework.
+
+- Locate the ``egs/timit`` recipe directory inside the Kaldi trunk
+  directory.
+
+- Configure ``cmd.sh`` and ``path.sh`` according to your machine setup.
+
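+  For a single machine without a grid engine, this usually amounts to
+  something like the following sketch (the variable names follow the stock
+  Kaldi ``cmd.sh``; the paths are placeholders for your own checkout)::
+
+      cd /path/to/kaldi-trunk/egs/timit/s5
+
+      # cmd.sh: run all jobs locally instead of through a grid engine
+      export train_cmd=run.pl
+      export decode_cmd=run.pl
+
+      # path.sh: KALDI_ROOT should point to your Kaldi trunk directory
+      export KALDI_ROOT=/path/to/kaldi-trunk
+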
+- Open ``run.sh`` and locate the line saying ``exit 0 # From this point
+  you can run Karel's DNN: local/nnet/run_dnn.sh``. Uncomment this line:
+  in this tutorial we only want to train a basic tri-phone DNN, so we skip
+  MMI training, system combination, and other fancy things like these.
+
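+  After the edit, the relevant line in ``run.sh`` should look like the
+  following (everything after it will then not be executed)::
+
+      exit 0 # From this point you can run Karel's DNN: local/nnet/run_dnn.sh
+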
+- Run ``./run.sh`` to start the training stages. After they finish, we will
+  have a trained tri-phone GMM-HMM and the aligned labels. Let's move on to
+  pretraining a DNN.
+
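+  A sketch of this step (the ``exp/`` subdirectory names follow the stock
+  ``s5`` recipe and may differ)::
+
+      ./run.sh
+      ls exp/tri3       # the trained tri-phone GMM-HMM
+      ls exp/tri3_ali   # the aligned labels
+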
+- Open ``local/nnet/run_dnn.sh``; it again contains several stages. Note
+  that the first stage (pretraining the DNN) is the one we actually need:
+  in this tutorial we want to demonstrate how to take the pretrained model
+  from stage 1, replace stage 2 with NERV (per-frame cross-entropy
+  finetuning), and decode using the finetuned network. However, we keep
+  stage 2 and add a line ``exit 0`` after it, so that we can later compare
+  the NERV result against the standard one (the decoding result of the
+  finetuned model produced by the original stage 2).
+
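+  The edit itself is a single line (a sketch; place it right after the
+  stage 2 block of ``local/nnet/run_dnn.sh``)::
+
+      # ... end of stage 2 (per-frame cross-entropy finetuning) ...
+      exit 0  # stop here; the later stages are not needed in this tutorial
+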
+- Run ``local/nnet/run_dnn.sh`` (the first two stages).
+
+- You will find directories like ``dnn4_pretrain-dbn`` and
+  ``dnn4_pretrain-dbn_dnn`` inside ``exp/``. They correspond to stage 1 and
+  stage 2 respectively. To use NERV for stage 2 instead, we need the
+  pretrained network and the global transformation from stage 1:
+
+  - Check that the file ``exp/dnn4_pretrain-dbn/6.dbn`` exists (the
+    pretrained network).
+  - Check that the file ``exp/dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet``
+    exists (the global transformation).
+  - Run the script ``kaldi_io/tools/convert_from_kaldi_pretrain.sh`` to
+    generate the parameters for the output layer and the script files for
+    the training and cross-validation sets.
+
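+    For example (a sketch; see the conversion script itself for the exact
+    arguments it expects)::
+
+        # make sure the stage 1 outputs are in place
+        ls exp/dnn4_pretrain-dbn/6.dbn                   # pretrained network
+        ls exp/dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet  # global transformation
+        # then run the conversion script
+        bash kaldi_io/tools/convert_from_kaldi_pretrain.sh
+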
+  - The conversion script will automatically assign identifiers to the
+    parameters read from the Kaldi network file, for example,
+    ``affine0_ltp`` and ``bias0``. These names should correspond to the
+    identifiers used in the declaration of the network. Luckily, this
+    tutorial comes with a ready-made network declaration at
+    ``nerv/examples/timit_baseline2.lua``.
+
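+    To check how these identifiers line up with the declaration, you can
+    grep for them (a sketch)::
+
+        grep -n 'affine0_ltp\|bias0' nerv/examples/timit_baseline2.lua
+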
+- Copy the file ``nerv/examples/timit_baseline2.lua`` to
+  ``timit_mybaseline.lua``, and change the lines containing ``/speechlab``
+  to match your own setting.
+
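+  For example::
+
+      cp nerv/examples/timit_baseline2.lua timit_mybaseline.lua
+      grep -n '/speechlab' timit_mybaseline.lua   # the lines you need to edit
+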
+- Start the NERV training with ``install/bin/nerv nerv/examples/asr_trainer.lua timit_mybaseline.lua``.
+
+  - ``install/bin/nerv`` is the program which sets up the NERV environment,
+
+  - followed by the argument ``nerv/examples/asr_trainer.lua``, which is the
+    script you actually want to run (the general DNN training scheduler),
+
+  - followed by the argument ``timit_mybaseline.lua``, which is passed to
+    the scheduler and specifies the network you want to train and some
+    relevant settings, such as where to find the initialized parameters,
+    the learning rate, etc.
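+
+  The full command line, annotated (the comments just restate the roles
+  described above)::
+
+      # install/bin/nerv      -- sets up the NERV environment
+      # asr_trainer.lua       -- the general DNN training scheduler
+      # timit_mybaseline.lua  -- the network to train and its settings
+      install/bin/nerv nerv/examples/asr_trainer.lua timit_mybaseline.lua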