• Kaldi TIMIT example, part 1


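    The TIMIT corpus is a speech corpus with phone-level annotation, produced jointly by TI (Texas Instruments) and MIT for developing and evaluating automatic speech recognition systems. It covers 630 speakers from 8 dialect regions of American English.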

    Each speaker reads 10 sentences. Every utterance comes with phone-level and word-level transcriptions, and the audio is 16 kHz, 16-bit.
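    The corpus figures above can be checked against the utterance counts that appear in the console log further down (3696, 400 and 192 files). The sketch below is only worked arithmetic. The per-subset speaker counts (462, 50, 24) and the 8 sentences kept per speaker are assumptions based on the commonly used TIMIT split, which drops the two dialect "SA" sentences per speaker; the post itself does not state them.

```shell
#!/usr/bin/env bash
# Worked arithmetic: corpus size versus the subset sizes printed in the log.
speakers=630
sentences_per_speaker=10
total=$((speakers * sentences_per_speaker))
echo "full corpus: $total utterances"

# Assumption (common TIMIT practice, not stated in the post): the two "SA"
# sentences per speaker are dropped, leaving 8, and the train / dev / core
# test subsets contain 462 / 50 / 24 speakers respectively.
kept_per_speaker=8
train_utts=$((462 * kept_per_speaker))
dev_utts=$((50 * kept_per_speaker))
test_utts=$((24 * kept_per_speaker))
echo "train=$train_utts dev=$dev_utts test=$test_utts"
```

    Under these assumptions the three products match the "Printed duration for ... audio files" lines in the log.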

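    Note: do not use the TIMIT setup as a generic example of how to run Kaldi, because its structure is quite nonstandard.

    Any of the other setups is a better choice for that purpose.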

     --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

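    librispeech/s5 is quite nice, and the data is free.

    yesno is very lightweight and fast to run, and is also free.

    wsj/s5 has an unusually complete set of example scripts, which may however be confusing.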

     --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

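    s5: monophone and triphone GMM/HMM systems trained with maximum likelihood (ML), followed by SGMM and DNN setups.

    Training uses 48 phones, following Lee and Hon, "Speaker-Independent Phone Recognition Using Hidden Markov Models".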

     --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

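    To run it: set the TIMIT corpus path in run.sh; edit cmd.sh so that jobs run locally, changing queue.pl to run.pl; download and install SRILM with tools/extra/install_srilm.sh; copy the irstlm folder into the tools directory; and finally run run.sh.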
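    The cmd.sh change described above can be sketched as a small helper. This is a minimal illustration, not part of Kaldi: use_local_runner is a hypothetical function, and the cmd.sh contents in the demo are assumed for illustration (your copy may define different variables or options). It runs on a throwaway file so that nothing in a real recipe directory is touched.

```shell
#!/usr/bin/env bash
# Sketch: point a Kaldi cmd.sh at run.pl (local execution) instead of queue.pl.
# use_local_runner is a hypothetical helper, not something Kaldi ships.
use_local_runner() {
  local cmd_file="$1"
  # Keep a backup, then rewrite every queue.pl reference to run.pl.
  cp "$cmd_file" "$cmd_file.bak"
  sed 's/queue\.pl/run.pl/g' "$cmd_file.bak" > "$cmd_file"
}

# Demo on a temporary file; these two lines are assumed example contents.
demo_dir=$(mktemp -d)
cat > "$demo_dir/cmd.sh" <<'EOF'
export train_cmd="queue.pl --mem 4G"
export decode_cmd="queue.pl --mem 4G"
EOF

use_local_runner "$demo_dir/cmd.sh"
cat "$demo_dir/cmd.sh"
```

    On a real checkout you would call the helper on the recipe's own cmd.sh and review the result against the .bak copy before starting run.sh.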

    When the run finishes, the console output looks like this:

    ============================================================================
    Data & Lexicon & Language Preparation
    ============================================================================
    wav-to-duration --read-entire-file=true scp:train_wav.scp ark,t:train_dur.ark 
    LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:92) Printed duration for 3696 audio files.
    LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:94) Mean duration was 3.06336, min and max durations were 0.91525, 7.78881
    wav-to-duration --read-entire-file=true scp:dev_wav.scp ark,t:dev_dur.ark 
    LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:92) Printed duration for 400 audio files.
    LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:94) Mean duration was 3.08212, min and max durations were 1.09444, 7.43681
    wav-to-duration --read-entire-file=true scp:test_wav.scp ark,t:test_dur.ark 
    LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:92) Printed duration for 192 audio files.
    LOG (wav-to-duration[5.2.124~1396-70748]:main():wav-to-duration.cc:94) Mean duration was 3.03646, min and max durations were 1.30562, 6.21444
    Data preparation succeeded
    LOGFILE:/dev/null
    $bin/ngt -i="$inpfile" -n=$order -gooout=y -o="$gzip -c > $tmpdir/ngram.${sdict}.gz" -fd="$tmpdir/$sdict" $dictionary $additional_parameters >> $logfile 2>&1
    $scr/build-sublm.pl $verbose $prune $prune_thr_str $smoothing "$additional_smoothing_parameters" --size $order --ngrams "$gunzip -c $tmpdir/ngram.${sdict}.gz" -sublm $tmpdir/lm.$sdict $additional_parameters >> $logfile 2>&1
    inpfile: data/local/lm_tmp/lm_phone_bg.ilm.gz
    outfile: /dev/stdout
    loading up to the LM level 1000 (if any)
    dub: 10000000
    OOV code is 50
    OOV code is 50
    Saving in txt format to /dev/stdout
    Dictionary & language model preparation succeeded
    Checking data/local/dict/silence_phones.txt ...
    --> reading data/local/dict/silence_phones.txt
    --> data/local/dict/silence_phones.txt is OK
    
    Checking data/local/dict/optional_silence.txt ...
    --> reading data/local/dict/optional_silence.txt
    --> data/local/dict/optional_silence.txt is OK
    
    Checking data/local/dict/nonsilence_phones.txt ...
    --> reading data/local/dict/nonsilence_phones.txt
    --> data/local/dict/nonsilence_phones.txt is OK
    
    Checking disjoint: silence_phones.txt, nonsilence_phones.txt
    --> disjoint property is OK.
    
    Checking data/local/dict/lexicon.txt
    --> reading data/local/dict/lexicon.txt
    --> data/local/dict/lexicon.txt is OK
    
    Checking data/local/dict/extra_questions.txt ...
    --> reading data/local/dict/extra_questions.txt
    --> data/local/dict/extra_questions.txt is OK
    --> SUCCESS [validating dictionary directory data/local/dict]
    
    **Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt
    fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int 
    prepare_lang.sh: validating output directory
    utils/validate_lang.pl data/lang
    Checking data/lang/phones.txt ...
    --> data/lang/phones.txt is OK
    
    Checking words.txt: #0 ...
    --> data/lang/words.txt is OK
    
    Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
    --> silence.txt and nonsilence.txt are disjoint
    --> silence.txt and disambig.txt are disjoint
    --> disambig.txt and nonsilence.txt are disjoint
    --> disjoint property is OK
    
    Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
    --> summation property is OK
    
    Checking data/lang/phones/context_indep.{txt, int, csl} ...
    --> 1 entry/entries in data/lang/phones/context_indep.txt
    --> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt
    --> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt
    --> data/lang/phones/context_indep.{txt, int, csl} are OK
    
    Checking data/lang/phones/nonsilence.{txt, int, csl} ...
    --> 47 entry/entries in data/lang/phones/nonsilence.txt
    --> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt
    --> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt
    --> data/lang/phones/nonsilence.{txt, int, csl} are OK
    
    Checking data/lang/phones/silence.{txt, int, csl} ...
    --> 1 entry/entries in data/lang/phones/silence.txt
    --> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt
    --> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt
    --> data/lang/phones/silence.{txt, int, csl} are OK
    
    Checking data/lang/phones/optional_silence.{txt, int, csl} ...
    --> 1 entry/entries in data/lang/phones/optional_silence.txt
    --> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt
    --> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt
    --> data/lang/phones/optional_silence.{txt, int, csl} are OK
    
    Checking data/lang/phones/disambig.{txt, int, csl} ...
    --> 2 entry/entries in data/lang/phones/disambig.txt
    --> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt
    --> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt
    --> data/lang/phones/disambig.{txt, int, csl} are OK
    
    Checking data/lang/phones/roots.{txt, int} ...
    --> 48 entry/entries in data/lang/phones/roots.txt
    --> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt
    --> data/lang/phones/roots.{txt, int} are OK
    
    Checking data/lang/phones/sets.{txt, int} ...
    --> 48 entry/entries in data/lang/phones/sets.txt
    --> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt
    --> data/lang/phones/sets.{txt, int} are OK
    
    Checking data/lang/phones/extra_questions.{txt, int} ...
    --> 2 entry/entries in data/lang/phones/extra_questions.txt
    --> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt
    --> data/lang/phones/extra_questions.{txt, int} are OK
    
    Checking optional_silence.txt ...
    --> reading data/lang/phones/optional_silence.txt
    --> data/lang/phones/optional_silence.txt is OK
    
    Checking disambiguation symbols: #0 and #1
    --> data/lang/phones/disambig.txt has "#0" and "#1"
    --> data/lang/phones/disambig.txt is OK
    
    Checking topo ...
    
    Checking word-level disambiguation symbols...
    --> data/lang/phones/wdisambig.txt exists (newer prepare_lang.sh)
    Checking data/lang/oov.{txt, int} ...
    --> 1 entry/entries in data/lang/oov.txt
    --> data/lang/oov.int corresponds to data/lang/oov.txt
    --> data/lang/oov.{txt, int} are OK
    
    --> data/lang/L.fst is olabel sorted
    --> data/lang/L_disambig.fst is olabel sorted
    --> SUCCESS [validating lang directory data/lang]
    Preparing train, dev and test data
    utils/validate_data_dir.sh: Successfully validated data-directory data/train
    utils/validate_data_dir.sh: Successfully validated data-directory data/dev
    utils/validate_data_dir.sh: Successfully validated data-directory data/test
    Preparing language models for test
    arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_test_bg/words.txt - data/lang_test_bg/G.fst 
    LOG (arpa2fst[5.2.124~1396-70748]:Read():arpa-file-parser.cc:98) Reading data section.
    LOG (arpa2fst[5.2.124~1396-70748]:Read():arpa-file-parser.cc:153) Reading 1-grams: section.
    LOG (arpa2fst[5.2.124~1396-70748]:Read():arpa-file-parser.cc:153) Reading 2-grams: section.
    WARNING (arpa2fst[5.2.124~1396-70748]:ConsumeNGram():arpa-lm-compiler.cc:313) line 60 [-3.26717<s> <s>] skipped: n-gram has invalid BOS/EOS placement
    LOG (arpa2fst[5.2.124~1396-70748]:RemoveRedundantStates():arpa-lm-compiler.cc:359) Reduced num-states from 50 to 50
    fstisstochastic data/lang_test_bg/G.fst 
    0.000510126 -0.0763018
    utils/validate_lang.pl data/lang_test_bg
    Checking data/lang_test_bg/phones.txt ...
    --> data/lang_test_bg/phones.txt is OK
    
    Checking words.txt: #0 ...
    --> data/lang_test_bg/words.txt is OK
    
    Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...
    --> silence.txt and nonsilence.txt are disjoint
    --> silence.txt and disambig.txt are disjoint
    --> disambig.txt and nonsilence.txt are disjoint
    --> disjoint property is OK
    
    Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...
    --> summation property is OK
    
    Checking data/lang_test_bg/phones/context_indep.{txt, int, csl} ...
    --> 1 entry/entries in data/lang_test_bg/phones/context_indep.txt
    --> data/lang_test_bg/phones/context_indep.int corresponds to data/lang_test_bg/phones/context_indep.txt
    --> data/lang_test_bg/phones/context_indep.csl corresponds to data/lang_test_bg/phones/context_indep.txt
    --> data/lang_test_bg/phones/context_indep.{txt, int, csl} are OK
    
    Checking data/lang_test_bg/phones/nonsilence.{txt, int, csl} ...
    --> 47 entry/entries in data/lang_test_bg/phones/nonsilence.txt
    --> data/lang_test_bg/phones/nonsilence.int corresponds to data/lang_test_bg/phones/nonsilence.txt
    --> data/lang_test_bg/phones/nonsilence.csl corresponds to data/lang_test_bg/phones/nonsilence.txt
    --> data/lang_test_bg/phones/nonsilence.{txt, int, csl} are OK
    
    Checking data/lang_test_bg/phones/silence.{txt, int, csl} ...
    --> 1 entry/entries in data/lang_test_bg/phones/silence.txt
    --> data/lang_test_bg/phones/silence.int corresponds to data/lang_test_bg/phones/silence.txt
    --> data/lang_test_bg/phones/silence.csl corresponds to data/lang_test_bg/phones/silence.txt
    --> data/lang_test_bg/phones/silence.{txt, int, csl} are OK
    
    Checking data/lang_test_bg/phones/optional_silence.{txt, int, csl} ...
    --> 1 entry/entries in data/lang_test_bg/phones/optional_silence.txt
    --> data/lang_test_bg/phones/optional_silence.int corresponds to data/lang_test_bg/phones/optional_silence.txt
    --> data/lang_test_bg/phones/optional_silence.csl corresponds to data/lang_test_bg/phones/optional_silence.txt
    --> data/lang_test_bg/phones/optional_silence.{txt, int, csl} are OK
    
    Checking data/lang_test_bg/phones/disambig.{txt, int, csl} ...
    --> 2 entry/entries in data/lang_test_bg/phones/disambig.txt
    --> data/lang_test_bg/phones/disambig.int corresponds to data/lang_test_bg/phones/disambig.txt
    --> data/lang_test_bg/phones/disambig.csl corresponds to data/lang_test_bg/phones/disambig.txt
    --> data/lang_test_bg/phones/disambig.{txt, int, csl} are OK
    
    Checking data/lang_test_bg/phones/roots.{txt, int} ...
    --> 48 entry/entries in data/lang_test_bg/phones/roots.txt
    --> data/lang_test_bg/phones/roots.int corresponds to data/lang_test_bg/phones/roots.txt
    --> data/lang_test_bg/phones/roots.{txt, int} are OK
    
    Checking data/lang_test_bg/phones/sets.{txt, int} ...
    --> 48 entry/entries in data/lang_test_bg/phones/sets.txt
    --> data/lang_test_bg/phones/sets.int corresponds to data/lang_test_bg/phones/sets.txt
    --> data/lang_test_bg/phones/sets.{txt, int} are OK
    
    Checking data/lang_test_bg/phones/extra_questions.{txt, int} ...
    --> 2 entry/entries in data/lang_test_bg/phones/extra_questions.txt
    --> data/lang_test_bg/phones/extra_questions.int corresponds to data/lang_test_bg/phones/extra_questions.txt
    --> data/lang_test_bg/phones/extra_questions.{txt, int} are OK
    
    Checking optional_silence.txt ...
    --> reading data/lang_test_bg/phones/optional_silence.txt
    --> data/lang_test_bg/phones/optional_silence.txt is OK
    
    Checking disambiguation symbols: #0 and #1
    --> data/lang_test_bg/phones/disambig.txt has "#0" and "#1"
    --> data/lang_test_bg/phones/disambig.txt is OK
    
    Checking topo ...
    
    Checking word-level disambiguation symbols...
    --> data/lang_test_bg/phones/wdisambig.txt exists (newer prepare_lang.sh)
    Checking data/lang_test_bg/oov.{txt, int} ...
    --> 1 entry/entries in data/lang_test_bg/oov.txt
    --> data/lang_test_bg/oov.int corresponds to data/lang_test_bg/oov.txt
    --> data/lang_test_bg/oov.{txt, int} are OK
    
    --> data/lang_test_bg/L.fst is olabel sorted
    --> data/lang_test_bg/L_disambig.fst is olabel sorted
    --> data/lang_test_bg/G.fst is ilabel sorted
    --> data/lang_test_bg/G.fst has 50 states
    fstdeterminizestar data/lang_test_bg/G.fst /dev/null 
    --> data/lang_test_bg/G.fst is determinizable
    --> utils/lang/check_g_properties.pl successfully validated data/lang_test_bg/G.fst
    --> utils/lang/check_g_properties.pl succeeded.
    --> Testing determinizability of L_disambig . G
    fstdeterminizestar 
    fsttablecompose data/lang_test_bg/L_disambig.fst data/lang_test_bg/G.fst 
    --> L_disambig . G is determinizable
    --> SUCCESS [validating lang directory data/lang_test_bg]
    Succeeded in formatting data.
    ============================================================================
    ============================================================================
             MFCC Feature Extration & CMVN for Training and Test set          
    ============================================================================
    steps/make_mfcc.sh --cmd run.pl --mem 4G --nj 10 data/train exp/make_mfcc/train mfcc
    utils/validate_data_dir.sh: Successfully validated data-directory data/train
    steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
    Succeeded creating MFCC features for train
    steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
    Succeeded creating CMVN stats for train
    steps/make_mfcc.sh --cmd run.pl --mem 4G --nj 10 data/dev exp/make_mfcc/dev mfcc
    utils/validate_data_dir.sh: Successfully validated data-directory data/dev
    steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
    Succeeded creating MFCC features for dev
    steps/compute_cmvn_stats.sh data/dev exp/make_mfcc/dev mfcc
    Succeeded creating CMVN stats for dev
    steps/make_mfcc.sh --cmd run.pl --mem 4G --nj 10 data/test exp/make_mfcc/test mfcc
    utils/validate_data_dir.sh: Successfully validated data-directory data/test
    steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
    Succeeded creating MFCC features for test
    steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc
    Succeeded creating CMVN stats for test
    ============================================================================
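    The MFCC and CMVN section of the log repeats the same two commands for train, dev and test. The dry-run sketch below prints (it does not execute) those commands, to show the loop structure that would produce them. The variable names train_cmd, feats_nj and mfccdir and the function print_feature_commands are assumptions chosen for illustration; the values are taken from the commands visible in the log.

```shell
#!/usr/bin/env bash
# Dry run: print the feature-extraction commands for each data subset.
# Nothing here calls Kaldi; the variable names are illustrative assumptions.
train_cmd="run.pl --mem 4G"
feats_nj=10
mfccdir=mfcc

print_feature_commands() {
  local x
  for x in train dev test; do
    # MFCC extraction, then per-speaker CMVN statistics for the same subset.
    echo "steps/make_mfcc.sh --cmd $train_cmd --nj $feats_nj data/$x exp/make_mfcc/$x $mfccdir"
    echo "steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x $mfccdir"
  done
}

print_feature_commands
```

    Comparing the printed lines with the log is a quick way to confirm which subset a given "Succeeded creating ..." message belongs to.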
  • Original article: https://www.cnblogs.com/welen/p/7525137.html