• Sphinx Speech Recognition Study Notes (3): Small-Vocabulary English Speech Recognition


    CMUSphinx series index
    http://www.cnblogs.com/yin52133/archive/2012/06/21/2557219.html - (1) Basic run test
    
    http://www.cnblogs.com/yin52133/archive/2012/07/12/2587282.html - (2) A study of natural language processing principles
    
    http://www.cnblogs.com/yin52133/archive/2012/07/12/2587419.html - (3) Small-vocabulary English speech recognition
    
    http://www.cnblogs.com/yin52133/archive/2012/07/12/2588201.html - (4) Small-vocabulary Chinese speech recognition
    
    http://www.cnblogs.com/yin52133/archive/2012/06/22/2558806.html - (5) Debugging errors
    
    http://www.cnblogs.com/yin52133/archive/2012/07/12/2588418.html - (6) My goals and a few imagined approaches (on hold)

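    So how do we improve the accuracy?

    According to the analysis in Chapter 4, we need a better language model, and a better language model has to be built from a few sentences or word combinations rather than isolated words.

    That is because the probabilities we collect are the probability of consecutive words occurring together, and the probability that a given word is followed by some other word.

    For building and using a language model, see http://cmusphinx.sourceforge.net/wiki/tutoriallm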

    To illustrate, I put together a new text file,

    4906.txt

    open browser
    open music
    open note
    close window
    close music
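    The statistics described above can be made concrete on this five-line corpus. The following is only a minimal sketch, assuming plain maximum-likelihood counting; the function names `count_ngrams` and `relative_frequency` are my own, and this is not the algorithm the online tool actually runs, which also applies smoothing.

```python
from collections import Counter

# The five phrases from 4906.txt.
CORPUS = [
    "open browser",
    "open music",
    "open note",
    "close window",
    "close music",
]

def count_ngrams(sentences):
    """Count unigrams and bigrams, with <s> and </s> marking sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def relative_frequency(prev, word, unigrams, bigrams):
    """Maximum-likelihood estimate of P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

if __name__ == "__main__":
    unigrams, bigrams = count_ngrams(CORPUS)
    for (prev, word), n in sorted(bigrams.items()):
        p = relative_frequency(prev, word, unigrams, bigrams)
        print(f"P({word} | {prev}) = {n}/{unigrams[prev]} = {p:.2f}")
```

    With these counts, "close" is followed by "music" in one of its two occurrences, while a pair such as "open window" never appears and gets a raw estimate of zero.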

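    Then I fed it straight into the online tool at http://www.speech.cs.cmu.edu/tools/lmtool.html to generate the lm file and the dic file.

    For the acoustic model I used the default hub4wsj_sc_8k,

    and ran everything directly with pocketsphinx_continuous: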

    pocketsphinx_continuous -hmm hub4wsj_sc_8k -lm 4906.lm -dict 4906.dic 
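    If you launch the recognizer from a script, it can help to check the three paths before starting. This is a small sketch under my own assumptions: the helper names `build_command` and `missing_inputs` are hypothetical, and only the three flags from the command above are used.

```python
import shlex
from pathlib import Path

def build_command(hmm_dir, lm_path, dict_path):
    """Return the argument list for the pocketsphinx_continuous call shown above."""
    return [
        "pocketsphinx_continuous",
        "-hmm", str(hmm_dir),
        "-lm", str(lm_path),
        "-dict", str(dict_path),
    ]

def missing_inputs(hmm_dir, lm_path, dict_path):
    """List the model paths that do not exist, so a typo is caught before launch."""
    return [str(p) for p in (hmm_dir, lm_path, dict_path) if not Path(p).exists()]

if __name__ == "__main__":
    args = ("hub4wsj_sc_8k", "4906.lm", "4906.dic")
    # Print the command exactly as it would be typed in a shell.
    print(shlex.join(build_command(*args)))
    missing = missing_inputs(*args)
    if missing:
        print("missing:", ", ".join(missing))
    # To actually start recognition, pass the list to subprocess.run(...).
```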

    Here are the test results:

    000000010: CLOSE WINDOW
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 52.52  2.30  0.38  0.74 -0.22 -
    0.36 -0.25  0.07  0.17 -0.05  0.12 -0.41 -0.05 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 52.17  2.29  0.39  0.77 -0.19 -
    0.35 -0.23  0.08  0.17 -0.04  0.13 -0.39 -0.04 >
    INFO: ngram_search_fwdtree.c(1549):      822 words recognized (7/fr)
    INFO: ngram_search_fwdtree.c(1551):    14143 senones evaluated (124/fr)
    INFO: ngram_search_fwdtree.c(1553):     6385 channels searched (56/fr), 572 1st,
     4781 last
    INFO: ngram_search_fwdtree.c(1557):     1117 words for which last channels evalu
    ated (9/fr)
    INFO: ngram_search_fwdtree.c(1560):      135 candidate words for entering last p
    hone (1/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.05 CPU 0.041 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.02 wall 1.768 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 8 words
    INFO: ngram_search_fwdflat.c(940):      177 words recognized (2/fr)
    INFO: ngram_search_fwdflat.c(942):    13906 senones evaluated (122/fr)
    INFO: ngram_search_fwdflat.c(944):     7497 channels searched (65/fr)
    INFO: ngram_search_fwdflat.c(946):      546 words searched (4/fr)
    INFO: ngram_search_fwdflat.c(948):      363 word transitions (3/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.03 CPU 0.027 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.018 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.103
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 35 nodes, 37 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:103:112) = -716626
    INFO: ps_lattice.c(1390): Joint P(O,S) = -721218 P(S|O) = -4592
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
    000000011: CLOSE MUSIC
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 52.17  2.29  0.39  0.77 -0.19 -
    0.35 -0.23  0.08  0.17 -0.04  0.13 -0.39 -0.04 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 52.13  2.48  0.07  0.71 -0.04 -
    0.31 -0.25  0.16  0.18 -0.05  0.03 -0.37 -0.08 >
    INFO: ngram_search_fwdtree.c(1549):      724 words recognized (6/fr)
    INFO: ngram_search_fwdtree.c(1551):    14052 senones evaluated (117/fr)
    INFO: ngram_search_fwdtree.c(1553):     5970 channels searched (49/fr), 567 1st,
     4580 last
    INFO: ngram_search_fwdtree.c(1557):     1153 words for which last channels evalu
    ated (9/fr)
    INFO: ngram_search_fwdtree.c(1560):       88 candidate words for entering last p
    hone (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.02 CPU 0.013 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.01 wall 1.675 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 7 words
    INFO: ngram_search_fwdflat.c(940):      152 words recognized (1/fr)
    INFO: ngram_search_fwdflat.c(942):    11290 senones evaluated (94/fr)
    INFO: ngram_search_fwdflat.c(944):     5553 channels searched (46/fr)
    INFO: ngram_search_fwdflat.c(946):      527 words searched (4/fr)
    INFO: ngram_search_fwdflat.c(948):      320 word transitions (2/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.015 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.107
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 30 nodes, 12 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:107:118) = -677028
    INFO: ps_lattice.c(1390): Joint P(O,S) = -677028 P(S|O) = 0
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
    000000012: OPEN BROWSER
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 52.13  2.48  0.07  0.71 -0.04 -
    0.31 -0.25  0.16  0.18 -0.05  0.03 -0.37 -0.08 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 51.56  2.26  0.20  0.84 -0.14 -
    0.35 -0.22  0.12  0.18 -0.03  0.08 -0.42 -0.04 >
    INFO: ngram_search_fwdtree.c(1549):      787 words recognized (7/fr)
    INFO: ngram_search_fwdtree.c(1551):    13726 senones evaluated (117/fr)
    INFO: ngram_search_fwdtree.c(1553):     5723 channels searched (48/fr), 625 1st,
     4153 last
    INFO: ngram_search_fwdtree.c(1557):     1222 words for which last channels evalu
    ated (10/fr)
    INFO: ngram_search_fwdtree.c(1560):       94 candidate words for entering last p
    hone (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.03 CPU 0.027 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.04 wall 1.746 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 6 words
    INFO: ngram_search_fwdflat.c(940):      211 words recognized (2/fr)
    INFO: ngram_search_fwdflat.c(942):    11139 senones evaluated (95/fr)
    INFO: ngram_search_fwdflat.c(944):     5235 channels searched (44/fr)
    INFO: ngram_search_fwdflat.c(946):      497 words searched (4/fr)
    INFO: ngram_search_fwdflat.c(948):      281 word transitions (2/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.01 wall 0.005 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.105
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 43 nodes, 14 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:105:115) = -663256
    INFO: ps_lattice.c(1390): Joint P(O,S) = -663256 P(S|O) = 0
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.001 xRT
    000000013: OPEN MUSIC
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 51.35  2.26  0.23  0.79 -0.10 -
    0.33 -0.25  0.15  0.18 -0.01  0.06 -0.42 -0.04 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 50.94  2.14  0.22  0.80 -0.16 -
    0.34 -0.20  0.14  0.18 -0.00  0.07 -0.44 -0.02 >
    INFO: ngram_search_fwdtree.c(1549):      656 words recognized (7/fr)
    INFO: ngram_search_fwdtree.c(1551):    11822 senones evaluated (119/fr)
    INFO: ngram_search_fwdtree.c(1553):     5069 channels searched (51/fr), 541 1st,
     3713 last
    INFO: ngram_search_fwdtree.c(1557):     1023 words for which last channels evalu
    ated (10/fr)
    INFO: ngram_search_fwdtree.c(1560):       84 candidate words for entering last p
    hone (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.03 CPU 0.032 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 1.89 wall 1.908 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 6 words
    INFO: ngram_search_fwdflat.c(940):      160 words recognized (2/fr)
    INFO: ngram_search_fwdflat.c(942):    11640 senones evaluated (118/fr)
    INFO: ngram_search_fwdflat.c(944):     5898 channels searched (59/fr)
    INFO: ngram_search_fwdflat.c(946):      437 words searched (4/fr)
    INFO: ngram_search_fwdflat.c(948):      263 word transitions (2/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.016 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.018 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.90
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 42 nodes, 12 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:90:97) = -566632
    INFO: ps_lattice.c(1390): Joint P(O,S) = -566744 P(S|O) = -112
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
    000000014: OPEN NOTE
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 50.94  2.14  0.22  0.80 -0.16 -
    0.34 -0.20  0.14  0.18 -0.00  0.07 -0.44 -0.02 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 50.90  2.33  0.24  0.59 -0.04 -
    0.31 -0.26  0.20  0.18 -0.01  0.04 -0.48 -0.01 >
    INFO: ngram_search_fwdtree.c(1549):      533 words recognized (5/fr)
    INFO: ngram_search_fwdtree.c(1551):    13409 senones evaluated (133/fr)
    INFO: ngram_search_fwdtree.c(1553):     5722 channels searched (56/fr), 572 1st,
     4236 last
    INFO: ngram_search_fwdtree.c(1557):     1096 words for which last channels evalu
    ated (10/fr)
    INFO: ngram_search_fwdtree.c(1560):      129 candidate words for entering last p
    hone (1/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.03 CPU 0.031 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 1.86 wall 1.838 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 7 words
    INFO: ngram_search_fwdflat.c(940):      166 words recognized (2/fr)
    INFO: ngram_search_fwdflat.c(942):    14460 senones evaluated (143/fr)
    INFO: ngram_search_fwdflat.c(944):     7607 channels searched (75/fr)
    INFO: ngram_search_fwdflat.c(946):      542 words searched (5/fr)
    INFO: ngram_search_fwdflat.c(948):      336 word transitions (3/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.015 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.017 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.91
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 35 nodes, 12 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:91:99) = -650418
    INFO: ps_lattice.c(1390): Joint P(O,S) = -650418 P(S|O) = 0
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.001 xRT
    000000015: OPEN WINDOW
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 50.90  2.33  0.24  0.59 -0.04 -
    0.31 -0.26  0.20  0.18 -0.01  0.04 -0.48 -0.01 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 50.80  2.08  0.32  0.79 -0.16 -
    0.38 -0.21  0.20  0.21 -0.00  0.08 -0.47 -0.01 >
    INFO: ngram_search_fwdtree.c(1549):      861 words recognized (7/fr)
    INFO: ngram_search_fwdtree.c(1551):    15363 senones evaluated (125/fr)
    INFO: ngram_search_fwdtree.c(1553):     6943 channels searched (56/fr), 614 1st,
     5227 last
    INFO: ngram_search_fwdtree.c(1557):     1227 words for which last channels evalu
    ated (9/fr)
    INFO: ngram_search_fwdtree.c(1560):      134 candidate words for entering last p
    hone (1/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.06 CPU 0.051 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.11 wall 1.720 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 7 words
    INFO: ngram_search_fwdflat.c(940):      225 words recognized (2/fr)
    INFO: ngram_search_fwdflat.c(942):    12072 senones evaluated (98/fr)
    INFO: ngram_search_fwdflat.c(944):     6521 channels searched (53/fr)
    INFO: ngram_search_fwdflat.c(946):      561 words searched (4/fr)
    INFO: ngram_search_fwdflat.c(948):      333 word transitions (2/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.014 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.111
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 42 nodes, 43 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:111:121) = -702331
    INFO: ps_lattice.c(1390): Joint P(O,S) = -707956 P(S|O) = -5625
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.003 xRT
    000000016: CLOSE MUSIC
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 50.44  2.00  0.30  0.77 -0.17 -
    0.37 -0.22  0.23  0.22 -0.01  0.09 -0.45 -0.02 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 51.19  2.05  0.42  0.55 -0.13 -
    0.39 -0.26  0.22  0.19 -0.00  0.09 -0.50 -0.04 >
    INFO: ngram_search_fwdtree.c(1549):      786 words recognized (7/fr)
    INFO: ngram_search_fwdtree.c(1551):    14040 senones evaluated (119/fr)
    INFO: ngram_search_fwdtree.c(1553):     6064 channels searched (51/fr), 649 1st,
     4340 last
    INFO: ngram_search_fwdtree.c(1557):     1260 words for which last channels evalu
    ated (10/fr)
    INFO: ngram_search_fwdtree.c(1560):      141 candidate words for entering last p
    hone (1/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.03 CPU 0.026 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.08 wall 1.760 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 8 words
    INFO: ngram_search_fwdflat.c(940):      213 words recognized (2/fr)
    INFO: ngram_search_fwdflat.c(942):    12917 senones evaluated (109/fr)
    INFO: ngram_search_fwdflat.c(944):     6890 channels searched (58/fr)
    INFO: ngram_search_fwdflat.c(946):      601 words searched (5/fr)
    INFO: ngram_search_fwdflat.c(948):      359 word transitions (3/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.01 wall 0.012 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.108
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 40 nodes, 32 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:108:116) = -682573
    INFO: ps_lattice.c(1390): Joint P(O,S) = -686913 P(S|O) = -4340
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
    000000017: CLOSE WINDOW
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 51.19  2.05  0.42  0.55 -0.13 -
    0.39 -0.26  0.22  0.19 -0.00  0.09 -0.50 -0.04 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 51.03  2.23  0.53  0.47 -0.05 -
    0.38 -0.27  0.29  0.19 -0.01  0.07 -0.47 -0.05 >
    INFO: ngram_search_fwdtree.c(1549):      874 words recognized (7/fr)
    INFO: ngram_search_fwdtree.c(1551):    15967 senones evaluated (133/fr)
    INFO: ngram_search_fwdtree.c(1553):     7237 channels searched (60/fr), 693 1st,
     5296 last
    INFO: ngram_search_fwdtree.c(1557):     1305 words for which last channels evalu
    ated (10/fr)
    INFO: ngram_search_fwdtree.c(1560):      207 candidate words for entering last p
    hone (1/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.02 CPU 0.013 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.08 wall 1.735 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 7 words
    INFO: ngram_search_fwdflat.c(940):      292 words recognized (2/fr)
    INFO: ngram_search_fwdflat.c(942):    16616 senones evaluated (138/fr)
    INFO: ngram_search_fwdflat.c(944):     9007 channels searched (75/fr)
    INFO: ngram_search_fwdflat.c(946):      624 words searched (5/fr)
    INFO: ngram_search_fwdflat.c(948):      334 word transitions (2/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.02 CPU 0.013 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.02 wall 0.020 xRT
    INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.107
    INFO: ngram_search.c(1281): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1386): Lattice has 38 nodes, 33 links
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:107:118) = -797261
    INFO: ps_lattice.c(1390): Joint P(O,S) = -805533 P(S|O) = -8272
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.003 xRT
    000000018: CLOSE NOTE
    

    The result: accuracy immediately jumped to over 90%.
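    A rough way to put a number on this from console output like the log above is to pull out the hypothesis lines and compare them with what was actually spoken. The sketch below assumes a hypothesis line is a numeric utterance id, a colon, then the recognized text; the list of spoken phrases is something you supply yourself, and the function names are hypothetical.

```python
import re

# A hypothesis line looks like "000000010: CLOSE WINDOW":
# a numeric utterance id, a colon, then the recognized text.
HYP_LINE = re.compile(r"^\s*(\d+):\s*(.*\S)\s*$")

def extract_hypotheses(log_text):
    """Return (utterance_id, hypothesis) pairs, skipping INFO/READY/Listening lines."""
    results = []
    for line in log_text.splitlines():
        match = HYP_LINE.match(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results

def sentence_accuracy(hypotheses, references):
    """Fraction of utterances whose hypothesis equals the reference (case-insensitive)."""
    if not references:
        return 0.0
    correct = sum(
        h.strip().upper() == r.strip().upper()
        for h, r in zip(hypotheses, references)
    )
    return correct / len(references)

if __name__ == "__main__":
    sample = """\
    000000010: CLOSE WINDOW
    READY....
    Listening...
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.002 xRT
    000000011: CLOSE MUSIC
    """
    hyps = [text for _, text in extract_hypotheses(sample)]
    spoken = ["close window", "close music"]  # what you actually said, filled in by hand
    print(hyps, sentence_accuracy(hyps, spoken))
```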

    What is more, my original corpus was only

    open browser
    open music
    open note
    close window
    close music

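    Then, while testing, I tried reading "open window" and "close note", which are not in the corpus, just to see what would happen, and it recognized both correctly.

    Accuracy is very high with normal pronunciation, but if you deliberately drag out the syllables, recognition still goes wrong.

    For example, I stretched "open note" out over nearly 5 seconds, and the result was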

     000000020: CLOSE OPEN NOTE OPEN NOTE

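    Why does accuracy improve this much? It comes down to how the statistical model does recognition.

    Recall that it first works out how often two consecutive words occur together, and then computes the relative frequency.

    The single-word sets I naively tested before had no relative frequencies of that kind, because every entry was just one word.

    When I read those words in combination there were no statistics about which word follows which, so accuracy was very low: recognition could rely only on matching pronunciations in the dic.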
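    The difference between the two corpora can be illustrated with a toy scorer. This is only a sketch under stated assumptions: the single-word corpus is my own reconstruction, and the scoring rule (use the bigram relative frequency if the pair was seen, otherwise fall back to a unigram frequency scaled by a fixed factor of 0.4) is a deliberately simplified back-off, not what lmtool or pocketsphinx actually compute.

```python
import math
from collections import Counter

PHRASES = ["open browser", "open music", "open note", "close window", "close music"]
# Hypothetical stand-in for the earlier single-word corpus.
SINGLE_WORDS = ["open", "close", "browser", "music", "note", "window"]

def train(sentences):
    """Count unigrams and bigrams with <s>/</s> sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def score(sentence, model, backoff=0.4):
    """Log-score: bigram relative frequency if seen, else a scaled unigram frequency."""
    unigrams, bigrams = model
    total = sum(unigrams.values())
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[word] == 0:
            raise ValueError(f"out-of-vocabulary word: {word}")
        if bigrams[(prev, word)] > 0:
            p = bigrams[(prev, word)] / unigrams[prev]
        else:
            p = backoff * unigrams[word] / total
        logp += math.log(p)
    return logp

if __name__ == "__main__":
    for name, corpus in (("phrases", PHRASES), ("single words", SINGLE_WORDS)):
        model = train(corpus)
        for sentence in ("open note", "open window", "window open"):
            print(f"{name:12s} {sentence:12s} {score(sentence, model):8.3f}")
```

    Under the phrase corpus, "open window" scores well above "window open" even though neither pair was in the text, because "open" is a known sentence start and "window" a known sentence end. Under the single-word corpus the two orders score the same, so the model gives no help with word order.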

    Source: http://www.cnblogs.com/yin52133/ This article may be freely reposted, but please include a link to the original when you do.
  • Original URL: https://www.cnblogs.com/yin52133/p/2587419.html