How to Use a Pretrained nnet Model from Kaldi
=============================================

:author: Ted Yin (mfy43) <ted.sybil@gmail.com>
:abstract: Explains how to pretrain a basic DNN on the TIMIT dataset using
           Kaldi and then convert the pretrained model to NERV format so that
           NERV can finetune it. Finally, it shows two possible ways to decode
           the finetuned model in the Kaldi framework.

- Locate the ``egs/timit`` directory inside the Kaldi trunk directory.

- Configure ``cmd.sh`` and ``path.sh`` to match your machine's setup.

- Open ``run.sh`` and locate the line saying ``exit 0 # From this point
  you can run Karel's DNN: local/nnet/run_dnn.sh``. Uncomment this line so
  that the script stops there: in this tutorial we only want to train a basic
  tri-phone DNN, so we skip MMI training, system combination and other more
  advanced steps.

- Run ``./run.sh`` to start the training stages. When it finishes, we will
  have a trained tri-phone GMM-HMM and the aligned labels. Let's move on to
  pretraining a DNN.

- Open ``local/nnet/run_dnn.sh``, which again consists of several stages. The
  first stage (pretraining the DNN) is the one we actually need, since this
  tutorial demonstrates how to take the pretrained model from stage 1, replace
  stage 2 (per-frame cross-entropy finetuning) with NERV, and decode using the
  finetuned network. Here, however, we add a line ``exit 0`` after stage 2
  rather than after stage 1, keeping stage 2 so that the NERV result can be
  compared against the standard one (the decoding result of the finetuned
  model produced by the original stage 2).

- Run ``local/nnet/run_dnn.sh`` (first two stages).

- You'll find directories such as ``dnn4_pretrain-dbn`` and
  ``dnn4_pretrain-dbn_dnn`` inside ``exp/``. They correspond to stage 1 and
  stage 2 respectively. To use NERV for stage 2 instead, we need the
  pretrained network and the global transformation from stage 1:

  - Check that the file ``exp/dnn4_pretrain-dbn/6.dbn`` exists (the pretrained
    network).
  - Check that the file ``exp/dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet``
    exists (the global transformation).
  - Run the script ``kaldi_io/tools/convert_from_kaldi_pretrain.sh`` to
    generate the parameters for the output layer and the script files for the
    training and cross-validation sets.

  - The conversion script above automatically assigns identifiers to the
    parameters read from the Kaldi network file, for example ``affine0_ltp``
    and ``bias0``. These names should match the identifiers used in the
    declaration of the network. Luckily, this tutorial comes with a
    ready-made network declaration at
    ``nerv/examples/timit_baseline2.lua``.

- Copy the file ``nerv/examples/timit_baseline2.lua`` to
  ``timit_mybaseline.lua``, and change the line containing ``/speechlab`` to
  match your own setup.

- Start the NERV training by running ``install/bin/nerv nerv/examples/asr_trainer.lua timit_mybaseline.lua``.

  - ``install/bin/nerv`` is the program which sets up the NERV environment,

  - followed by an argument ``nerv/examples/asr_trainer.lua`` which is the script
    you actually want to run (the general DNN training scheduler),

  - followed by an argument ``timit_mybaseline.lua`` to the scheduler,
    specifying the network you want to train and some relevant settings, such
    as the location of the initialized parameters and the learning rate.
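
The two existence checks on the stage 1 outputs can be sketched as a small shell function, which may be convenient to run before the conversion script. This is only a sketch: ``check_pretrain_outputs`` is a hypothetical helper name and is not part of Kaldi or NERV. The file names are the ones listed in the steps above, and the experiment directory is assumed to be ``exp`` unless another path is given.

```shell
#!/usr/bin/env bash
# Pre-flight check for the Kaldi stage 1 outputs that NERV needs.
# NOTE: check_pretrain_outputs is a hypothetical helper written for this
# tutorial; it is not provided by Kaldi or NERV.

check_pretrain_outputs() {
    # $1: Kaldi experiment directory (assumed to be "exp" if omitted)
    local exp_dir="${1:-exp}"
    local missing=0
    local f
    # The pretrained network and the global transformation from stage 1.
    for f in "dnn4_pretrain-dbn/6.dbn" \
             "dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet"; do
        if [ -f "$exp_dir/$f" ]; then
            echo "found:   $exp_dir/$f"
        else
            echo "missing: $exp_dir/$f"
            missing=1
        fi
    done
    # Return 0 only if every required file is present.
    return "$missing"
}
```

From ``egs/timit``, it could be used as ``check_pretrain_outputs exp && echo "ready for conversion"``. The function returns a non-zero status if either file is missing, so it can also guard a longer script.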