How to Use a Pre-trained nnet Model from Kaldi
==============================================
:author: Ted Yin (mfy43) <[email protected]>
:abstract: Instructions on how to pre-train a basic DNN on the TIMIT dataset
           using Kaldi, convert the pre-trained model to the NERV format so
           that NERV can fine-tune it, and finally decode the fine-tuned
           model in the Kaldi framework (two possible ways are shown).
- Note: in this tutorial, we use the following notations to denote directory prefixes:
- ``<nerv_home>``: the path of NERV (the location of the outermost directory ``nerv``)
- ``<timit_home>``: the working directory of TIMIT (the location of the directory ``timit/s5``)
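For convenience, you may export these prefixes as shell variables; the
variable names below are just hypothetical conveniences for this tutorial
and are not used by any script:
::

    export NERV_HOME=/path/to/nerv
    export TIMIT_HOME=/path/to/kaldi/egs/timit/s5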
- Locate ``egs/timit`` inside the Kaldi trunk directory.
- Configure ``<timit_home>/cmd.sh`` and ``<timit_home>/path.sh`` according to your machine settings.
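For example, on a single machine without a grid engine, a minimal
``cmd.sh`` typically points all job dispatchers at ``run.pl`` (this is
standard Kaldi practice; keep the ``queue.pl`` settings instead if you have
a cluster):
::

    # <timit_home>/cmd.sh -- run everything locally, no grid engine
    export train_cmd="run.pl"
    export decode_cmd="run.pl"
    export cuda_cmd="run.pl"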
- Open ``<timit_home>/run.sh`` and locate the line saying
::

    exit 0 # From this point you can run Karel's DNN: local/nnet/run_dnn.sh

Uncomment this line. In this tutorial, we only want to train a basic
tri-phone DNN, so we simply skip MMI training, system combination and other
fancy things like these beyond that point.
- Run ``./run.sh`` (at ``<timit_home>``) to start the training stages. After
it finishes, we will have a trained tri-phone GMM-HMM and the aligned
labels. Let's move on to pre-training a DNN.
- Open ``<timit_home>/local/nnet/run_dnn.sh``; it again has several stages.
The first stage (pre-training the DNN) is what we actually need, since this
tutorial demonstrates how to take the pre-trained model from stage 1,
replace stage 2 (per-frame cross-entropy fine-tuning) with NERV, and decode
using the fine-tuned network. However, we add a line ``exit 0`` after stage
2 in order to preserve stage 2, so that the NERV result can be compared
against the standard one (the decoding result using the fine-tuned model
produced by the original stage 2).
- Run ``local/nnet/run_dnn.sh`` (at ``<timit_home>``); this runs the first two stages.
- You'll find directories named ``dnn4_pretrain-dbn`` and
``dnn4_pretrain-dbn_dnn`` inside ``<timit_home>/exp/``. They correspond to
stage 1 and stage 2 respectively. To use NERV for stage 2 instead, we need
the pre-trained network and the global transformation from stage 1 (a quick
check is sketched after this list):
- Check that the file ``<timit_home>/exp/dnn4_pretrain-dbn/6.dbn`` exists
(the pre-trained network).
- Check that the file
``<timit_home>/exp/dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet`` exists
(the global transformation).
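A quick way to verify both files at once (paths as above):
::

    cd <timit_home>
    ls -l exp/dnn4_pretrain-dbn/6.dbn \
          exp/dnn4_pretrain-dbn/tr_splice5_cmvn-g.nnet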
- Run the script ``<nerv_home>/speech/kaldi_io/tools/convert_from_kaldi_pretrain.sh``
to generate the parameters for the output layer and the script files for the
training and cross-validation sets. A new directory will be created at
``<timit_home>/exp/dnn4_nerv_dnn`` with the following files:
- ``nnet_init.nerv``: the converted NERV chunk file containing all pre-trained parameters
- ``nnet_trans.nerv``: the converted NERV chunk file containing global transformation
- ``nnet_output.proto``: used for random generation of parameters (Kaldi)
- ``nnet_output.init``: the randomly generated parameters of output layer in Kaldi nnet format
- ``nnet_output.nerv``: the converted NERV chunk file containing parameters of output layer
- ``cv.scp``: the script file listing utterances for cross-validation
- ``train.scp``: the script file listing utterances for training
- ``train_sorted.scp``: sorted version of ``train.scp``
- ``final.mdl``: HMM model, used for label generation and decoding
- ``ali_train_pdf.counts``: used for decoding
- ``tree``: used in decoding
- The conversion commands in ``convert_from_kaldi_pretrain.sh`` will
automatically assign identifiers to the parameters read from the Kaldi
network file. The identifiers look like, for example, ``affine0_ltp`` and
``bias0``, and they should correspond to the identifiers used in the
declaration of the network. Luckily, this tutorial comes with a ready-made
network declaration at ``<nerv_home>/nerv/examples/timit_baseline2.lua``.
Have a look at ``nnet_init.nerv`` and ``timit_baseline2.lua`` to see how
they match up (a quick way to list the identifiers is sketched below).
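If the chunk file keeps its per-chunk metadata in plain text (open the file
to verify; the assumption here is that identifiers appear as ``id="..."``
pairs), something like the following lists all identifiers, with ``grep -a``
treating the file as text even though the payload may be binary:
::

    grep -a -o 'id="[^"]*"' exp/dnn4_nerv_dnn/nnet_init.nerv | sort -u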
- Copy the file ``<nerv_home>/nerv/examples/timit_baseline2.lua`` to
``<timit_home>/timit_mybaseline.lua``, and change the line containing
``/speechlab`` to your own setting. Also change the dimension of the output
layer to your number of tied phone states (change all occurrences of
``1959``; you can peek at ``nnet_output.nerv`` to make sure), because each
run of the previous Kaldi scripts may yield a slightly different number.
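To find your number of tied states, you can query the HMM model with
Kaldi's ``hmm-info`` tool (make sure ``path.sh`` has been sourced so the
tool is found), then substitute the reported number of pdfs. The ``sed``
one-liner assumes ``1959`` appears nowhere else in the file, so double-check
the result:
::

    hmm-info exp/dnn4_nerv_dnn/final.mdl | grep pdfs
    sed -i 's/1959/<your_number_of_pdfs>/g' timit_mybaseline.lua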
- Start the NERV training (at ``<timit_home>``) by
::

    <nerv_home>/install/bin/nerv <nerv_home>/nerv/examples/trainer.lua timit_mybaseline.lua
- ``<nerv_home>/install/bin/nerv`` is the program which sets up the NERV
environment;
- the first argument, ``<nerv_home>/nerv/examples/trainer.lua``, is the
script you actually want to run (the general DNN training scheduler);
- the second argument, ``timit_mybaseline.lua``, is passed on to the
scheduler and specifies the network you want to train together with relevant
settings, such as where to find the initial parameters, the learning rate,
etc.
- Finally, after about 13 iterations, the fine-tuning ends. You will find
the trained models in directories named like ``nerv_*`` in your current
working directory. Use the one that has the highest cv (cross-validation)
value and ends with the extension ``.nerv``. There are two ways to decode
your model:
- Plan A:
- Open your ``timit_mybaseline.lua`` again and modify ``decode_param`` so
that its first entry is your final chunk file (the file with the ``.nerv``
extension) and the other entry is the global transformation chunk file used
in training (just keep it the same as the one in ``initialized_params``).
This configuration lets the decoder know which set of parameters it should
use for decoding.
- Copy the script ``<nerv_home>/nerv/speech/kaldi_decode/README.timit`` to
``<timit_home>`` and modify the paths listed in the script.
- Run the modified ``README.timit`` (at ``<timit_home>``).
- After decoding, run ``bash RESULTS exp/dnn4_nerv_dnn`` to see the results.
- Plan B: In this plan, we manually convert the trained model back to Kaldi
nnet format, and use Kaldi to decode.
- Create a copy of ``<nerv_home>/nerv/speech/kaldi_io/tools/nerv_to_kaldi.lua``.
- Modify the list named ``lnames`` to give, in order, the names of the
layers you want to put into the output Kaldi parameter file. (You don't
actually need to change it for this tutorial.) You may wonder why the
NERV-to-Kaldi conversion is so cumbersome. This is because Kaldi nnet is a
special case of the more general NERV toolkit -- it only allows stacked
DNNs, so the Kaldi-to-NERV conversion is lossless but the other direction is
not. Your future NERV network may have multiple branches, and that's why you
need to specify how to select and "stack" your layers in the Kaldi parameter
output.
- Do the conversion by:
::

    <nerv_home>/install/bin/nerv --use-cpu nerv_to_kaldi.lua timit_mybaseline.lua <your_trained_params>.nerv <path_to_converted>.nnet
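Optionally, sanity-check the converted file with Kaldi's ``nnet-info``
tool, which prints the component list and the layer dimensions:
::

    nnet-info <path_to_converted>.nnet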
- Finally, locate the directory of stage 2,
``<timit_home>/exp/dnn4_pretrain-dbn_dnn``, and temporarily change the
symbolic link for the final network file to point at the converted one:
::

    cd <timit_home>/exp/dnn4_pretrain-dbn_dnn
    mv final.nnet final.nnet.orig
    ln -sv <path_to_converted>.nnet final.nnet

Then proceed with a normal Kaldi decoding.
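For reference, the standard decoding in the original stage 2 setup looks
roughly like the following (the graph, data and output directory names here
are assumptions based on the earlier stages of this tutorial; adapt them to
your setup):
::

    cd <timit_home>
    . ./cmd.sh && . ./path.sh
    steps/nnet/decode.sh --nj 20 --cmd "$decode_cmd" --acwt 0.2 \
        exp/tri3/graph data-fmllr-tri3/test exp/dnn4_pretrain-dbn_dnn/decode_test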