How to Use a Pretrained nnet Model from Kaldi
=============================================

:author: Ted Yin (mfy43) <[email protected]>
:abstract: Explains how to pretrain a basic DNN on the TIMIT dataset using
           Kaldi and then convert the pretrained model to NERV format so that
           NERV can finetune it. Finally, it shows two possible ways to decode
           the finetuned model in the Kaldi framework.

- Locate the ``egs/timit`` directory inside the Kaldi trunk.
- Configure ``cmd.sh`` and ``path.sh`` according to your machine settings.
- Open ``run.sh`` and locate the line saying ``exit 0 # From this point
  you can run Karel's DNN: local/nnet/run_dnn.sh``. Uncomment this line so
  that the script exits there: in this tutorial we only want to train a basic
  tri-phone DNN, so we skip MMI training, system combination, and other fancy
  things like these.
- Run ``./run.sh`` to start the training stages. When it finishes, we have a
  trained tri-phone GMM-HMM and the aligned labels. Let's move on to
  pretraining a DNN.
- Open ``local/nnet/run_dnn.sh``, which again consists of several stages.
  Only the first stage (pretraining the DNN) is actually needed, since this
  tutorial demonstrates how to take the pretrained model from stage 1,
  replace stage 2 (per-frame cross-entropy finetuning) with NERV, and decode
  using the finetuned network. However, we add a line ``exit 0`` after
  stage 2 rather than after stage 1, keeping stage 2 so that the NERV result
  can be compared against the standard one (the decoding result of the
  finetuned model produced by the original stage 2).
- Run ``local/nnet/run_dnn.sh`` (first two stages).
- You'll find directories like ``dnn4_pretrain-dbn`` and
  ``dnn4_pretrain-dbn_dnn`` inside ``exp/``. They correspond to stage 1 and
  stage 2 respectively. To use NERV for stage 2 instead, we need the
  pretrained network and the global transformation from stage 1:

  - Check that the file ``exp/dnn4_pretrain-dbn/6.dbn`` exists (pretrained
    network).
  - Check that the file ``exp/dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet``
    exists (global transformation).

- Run the script ``kaldi_io/tools/convert_from_kaldi_pretrain.sh`` to
  generate the parameters for the output layer and the script files for
  the training and cross-validation sets.
- The conversion script above automatically assigns identifiers to the
  parameters read from the Kaldi network file, for example ``affine0_ltp``
  and ``bias0``. These names must match the identifiers used in the
  declaration of the network. Luckily, this tutorial comes with a ready-made
  network declaration at ``nerv/examples/timit_baseline2.lua``.
- Copy the file ``nerv/examples/timit_baseline2.lua`` to
  ``timit_mybaseline.lua``, and change the line containing ``/speechlab`` to
  match your own setup.
- Start the NERV training with
  ``install/bin/nerv nerv/examples/asr_trainer.lua timit_mybaseline.lua``:

  - ``install/bin/nerv`` is the program that sets up the NERV environment,
  - followed by the argument ``nerv/examples/asr_trainer.lua``, the script
    you actually want to run (the general DNN training scheduler),
  - followed by the argument ``timit_mybaseline.lua``, which is passed to the
    scheduler and specifies the network you want to train along with
    relevant settings such as the learning rate and where to find the
    initialized parameters.

- Finally, after about 13 iterations, the finetuning ends. There are two ways
  to decode your model:

  - Plan A:

    - Open your ``timit_mybaseline.lua`` again and set ``decode_param`` to
      your final chunk file (the file with the extension ``.nerv``) and the
      global transformation chunk file used in training. This tells the
      decoder which set of parameters to use for decoding.
    - Copy the script ``nerv/speech/kaldi_io/README.timit`` to your Kaldi
      working directory (``timit/s5``) and modify the paths listed in the
      script.
    - Run the modified ``README.timit`` in the ``s5`` directory (where
      ``path.sh`` is located).
    - After decoding, run ``bash RESULT exp/dnn4_nerv`` to see the results.

  - Plan B: In this plan, we manually convert the trained model back to the
    Kaldi nnet format and use Kaldi to decode.

    - Create a copy of ``nerv/speech/kaldi_io/tools/nerv_to_kaldi.lua``.
    - Modify the list named ``lnames`` to list, in order, the names of the
      layers you want to put into the output Kaldi parameter file. (You
      don't actually need to change it for this tutorial.) You may ask why
      the NERV-to-Kaldi conversion is so cumbersome. This is because Kaldi
      nnet is a special case of the more general NERV toolkit: it only
      allows stacked DNNs, so Kaldi-to-NERV conversion is lossless but the
      other direction is not. Your future NERV network may have multiple
      branches, which is why you need to specify how to select and "stack"
      your layers in the Kaldi parameter output.
    - Do the conversion by:

      ::

          cat your_trained_params.nerv your_global_trans.nerv > all.nerv
          install/bin/nerv nerv_to_kaldi.lua timit_mybaseline.lua all.nerv your_kaldi_output.nnet

    - Finally, locate the directory of stage 2, ``exp/dnn4_pretrain-dbn_dnn``,
      and temporarily point the symbolic link for the final network file at
      the converted one:

      ::

          cd exp/dnn4_pretrain-dbn_dnn
          mv final.nnet final.nnet.orig
          ln -sv your_kaldi_output.nnet final.nnet

      Then proceed with a normal Kaldi decoding.
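
Before converting the pretrained model, it can help to confirm that stage 1 really produced the two files named in the steps above. The following is only a minimal sketch and is not part of Kaldi or NERV: the function name ``check_pretrain_outputs`` is made up for illustration, and the default of ``exp`` assumes you call it from the ``s5`` working directory.

```shell
# Hypothetical helper (not shipped with Kaldi or NERV): report whether the
# stage 1 outputs named in this tutorial exist under the given exp directory.
check_pretrain_outputs() {
    local exp_dir="${1:-exp}"   # assumption: run from the s5 directory
    local missing=0
    local f
    for f in "dnn4_pretrain-dbn/6.dbn" \
             "dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet"; do
        if [ -f "$exp_dir/$f" ]; then
            echo "found:   $exp_dir/$f"
        else
            echo "missing: $exp_dir/$f" >&2
            missing=1
        fi
    done
    # Returns 0 only when both files are present.
    return "$missing"
}
```

After sourcing the snippet, ``check_pretrain_outputs exp`` returns a non-zero status if either file is absent, so it can guard the conversion step in a wrapper script.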
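
The step that copies ``timit_baseline2.lua`` and edits the ``/speechlab`` path can be sketched as below. This is an illustration under assumptions rather than a NERV tool: ``prepare_baseline`` is a hypothetical name, and the function only copies the file and prints the numbered lines that mention ``/speechlab``, leaving the actual edit to you.

```shell
# Hypothetical helper: copy the network declaration and list the lines that
# still refer to /speechlab, so you know what to edit by hand.
prepare_baseline() {
    local src="$1" dst="$2"
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing file: $dst" >&2
        return 1
    fi
    cp "$src" "$dst" || return 1
    # grep -n prefixes each match with its line number; the status is
    # non-zero when no line contains /speechlab.
    grep -n '/speechlab' "$dst"
}
```

A typical call would be ``prepare_baseline nerv/examples/timit_baseline2.lua timit_mybaseline.lua``, followed by editing the reported lines in ``timit_mybaseline.lua``.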
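
Because Plan B changes ``final.nnet`` only temporarily, you will probably want to put the original link back after decoding. The sketch below wraps the swap and its undo in two functions; ``swap_final_nnet`` and ``restore_final_nnet`` are hypothetical names, not Kaldi or NERV commands. It resolves the converted file to an absolute path first, on the assumption that a relative link target would otherwise be interpreted relative to the stage 2 directory.

```shell
# Hypothetical helpers for Plan B: point final.nnet at the converted network,
# and later restore the original link.
swap_final_nnet() {
    local dnn_dir="$1" converted="$2" converted_abs
    if [ -e "$dnn_dir/final.nnet.orig" ] || [ -L "$dnn_dir/final.nnet.orig" ]; then
        echo "backup already exists: $dnn_dir/final.nnet.orig" >&2
        return 1
    fi
    # Make the link target absolute so it does not depend on the link's directory.
    converted_abs="$(cd "$(dirname "$converted")" && pwd)/$(basename "$converted")" || return 1
    mv "$dnn_dir/final.nnet" "$dnn_dir/final.nnet.orig" &&
        ln -s "$converted_abs" "$dnn_dir/final.nnet"
}

restore_final_nnet() {
    local dnn_dir="$1"
    if [ ! -e "$dnn_dir/final.nnet.orig" ] && [ ! -L "$dnn_dir/final.nnet.orig" ]; then
        echo "no backup to restore in $dnn_dir" >&2
        return 1
    fi
    # Remove the temporary link, then move the original back into place.
    rm -f "$dnn_dir/final.nnet" &&
        mv "$dnn_dir/final.nnet.orig" "$dnn_dir/final.nnet"
}
```

For example, run ``swap_final_nnet exp/dnn4_pretrain-dbn_dnn your_kaldi_output.nnet``, decode as usual, and then run ``restore_final_nnet exp/dnn4_pretrain-dbn_dnn`` to return to the standard stage 2 model.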