The Sphinx Martial Arts Manual (Part 1)

I. Platform

Windows XP, VMware Workstation + Ubuntu 10.10

(1) Check whether Soundrecorder works, to confirm that audio recording is available

(2) sudo apt-get install libasound2-dev
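Besides trying Soundrecorder, the capture devices visible to ALSA can be listed from the command line. The following is only a sketch: it assumes the `arecord` utility from the alsa-utils package is installed and that its output is in English; the helper name `count_capture_devices` is made up for this example.

```shell
# Count "card N:" lines in `arecord -l` style output read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_capture_devices() {
    grep -c '^card [0-9]' || true
}

if command -v arecord >/dev/null 2>&1; then
    n=$(arecord -l 2>/dev/null | count_capture_devices)
    echo "capture devices found: $n"
else
    echo "arecord not found (it ships with alsa-utils)"
fi
```

If the count is 0 inside the virtual machine, the sound card is probably not passed through to the guest, and recognition from the microphone will not work.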

II. The CMUSphinx speech recognition toolkit

Pocketsphinx: a lightweight recognition library written in C

Sphinxbase: the support library required by Pocketsphinx

Sphinx3: a decoder for speech recognition research, written in C

CMUclmtk: language model tools

Sphinxtrain: acoustic model training tools

Download: http://sourceforge.net/projects/cmusphinx/files/

The versions used here are:

pocketsphinx-0.6.1 (pocketsphinx_0.6.1-1.tar.gz)

sphinxbase-0.6.1 (sphinxbase-0.6.1.tar.gz)

sphinx3-0.8 (sphinx3-0.8.tar.bz2)

cmuclmtk (cmusphinx-cmuclmtk.tar.gz)

SphinxTrain-1.0 (SphinxTrain-1.0.tar.bz2)

III. Installing pocketsphinx

pocketsphinx depends on another library, Sphinxbase, so Sphinxbase must be installed first.

(1) Install Sphinxbase

tar xzf sphinxbase-0.6.1.tar.gz

cd sphinxbase-0.6.1

./configure

make

sudo make install

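By default, files are installed under the /usr/local prefix (programs in /usr/local/bin, libraries in /usr/local/lib); use ls to check.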
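Because the pocketsphinx build locates Sphinxbase through pkg-config, it can help to confirm that pkg-config sees the library before going further. This is a sketch under the assumption that Sphinxbase was installed with the default /usr/local prefix and registers itself under the package name `sphinxbase`; the helper `report_pkg` is hypothetical.

```shell
# Assumption: Sphinxbase was installed with the default /usr/local prefix.
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

# Print the version pkg-config reports for a package, or a notice if the
# package cannot be found (or pkg-config itself is not installed).
report_pkg() {
    if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists "$1"; then
        echo "$1: version $(pkg-config --modversion "$1")"
    else
        echo "$1: not visible to pkg-config"
    fi
}

report_pkg sphinxbase
```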

(2) Install pocketsphinx

export LD_LIBRARY_PATH=/usr/local/lib

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

cd pocketsphinx-0.6.1

./configure

make

sudo make install

After installation, three newly generated files appear under /usr/local/bin:

cd /usr/local/bin

ls

pocketsphinx_batch

pocketsphinx_continuous

pocketsphinx_mdef_convert
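The same check can be scripted instead of reading the ls output by eye. A minimal sketch, assuming /usr/local/bin is on the PATH; `check_tool` is a name invented for this example.

```shell
# Report whether a program can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# The three programs listed above.
for tool in pocketsphinx_batch pocketsphinx_continuous pocketsphinx_mdef_convert; do
    check_tool "$tool"
done
```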

Test the installation:

pocketsphinx_continuous

If output like the following appears, the installation succeeded.

INFO: cmd_ln.c(512): Parsing command line:

pocketsphinx_continuous

Current configuration:

[NAME]          [DEFLT]         [VALUE]
-adcdev
-agc            none            none
-agcthresh      2.0             2.000000e+00
-alpha          0.97            9.700000e-01
-argfile
-ascale         20.0            2.000000e+01
-backtrace      no              no
-beam           1e-48           1.000000e-48
-bestpath       yes             yes
-bestpathlw     9.5             9.500000e+00
-bghist         no              no
-ceplen         13              13
-cmn            current         current
-cmninit        8.0             8.0

………………………………….

…………………………………

………………………………….

INFO: ngram_search_fwdtree.c(333): after: 457 root, 13300 non-root channels, 26 single-phone words

INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25

Warning: Could not find Mic element

INFO: continuous.c(261): pocketsphinx_continuous COMPILED ON: Feb 21 2011, AT: 22:31:47

READY....

IV. Building a simple language model

(1) Create a corpus

vi corpus.txt

Enter the following:

stop

forward

backward

turn right

turn left

Save and exit.
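As a non-interactive alternative to typing the corpus in vi, the same five lines could be written with a here-document. This is just a convenience sketch; it creates (or overwrites) corpus.txt in the current directory.

```shell
# Write the five command phrases to corpus.txt, one per line.
cat > corpus.txt <<'EOF'
stop
forward
backward
turn right
turn left
EOF

# Show the result for a quick visual check.
cat corpus.txt
```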

(2) Build the language model with the online tool LMTool

Go to: http://www.speech.cs.cmu.edu/tools/lmtool.html

Click the Browse button, select the corpus.txt created earlier, then click COMPILE KNOWLEDGE BASE.

This generates TAR2916.tar.gz:

tar xzf TAR2916.tar.gz

2916.corpus 2916.lm    2916.sent.arpabo 2916.vocab

2916.dic     2916.sent 2916.token

Only the .dic and .lm files are actually needed.
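The numeric prefix (2916 here) will most likely be different for each LMTool run, so a small helper can confirm that both files exist for a given prefix and print the matching command line. This is only a sketch; `lmtool_command` is a hypothetical name, and the flags are the ones used in the test step of this article.

```shell
# Print the pocketsphinx_continuous command for an LMTool ID prefix,
# or fail if either the .lm or the .dic file is missing.
lmtool_command() {
    id=$1
    if [ -f "$id.lm" ] && [ -f "$id.dic" ]; then
        echo "pocketsphinx_continuous -lm $id.lm -dict $id.dic"
    else
        echo "missing $id.lm or $id.dic" >&2
        return 1
    fi
}

# Example (run in the directory where the archive was unpacked):
#   lmtool_command 2916
```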

(3) Test

pocketsphinx_continuous -lm 2916.lm -dict 2916.dic

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 1 words

INFO: ngram_search_fwdflat.c(912):       97 words recognized (2/fr)

INFO: ngram_search_fwdflat.c(914):     2342 senones evaluated (38/fr)

INFO: ngram_search_fwdflat.c(916):     1011 channels searched (16/fr)

INFO: ngram_search_fwdflat.c(918):      167 words searched (2/fr)

INFO: ngram_search_fwdflat.c(920):       47 word transitions (0/fr)

WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using <sil> instead

INFO: ngram_search.c(1137): lattice start node <s>.0 end node <sil>.56

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(<sil>:56:60) = -341653

INFO: ps_lattice.c(1266): Joint P(O,S) = -341653 P(S|O) = 0

000000000: STOP (-6531224)

READY....

Listening...

Stopped listening, please wait...

INFO: cmn_prior.c(121): cmn_prior_update: from < 37.45 -1.28 -0.16 -0.71 0.19 -0.19 -0.07 0.34 0.13 -0.07 -0.03 -0.42 0.19 >

INFO: cmn_prior.c(139): cmn_prior_update: to   < 42.22 -0.51 -0.35 -0.28 -0.24 -0.37 0.02 0.38 0.03 -0.05 0.10 -0.32 0.05 >

INFO: ngram_search_fwdtree.c(1513):      847 words recognized (9/fr)

INFO: ngram_search_fwdtree.c(1515):    11452 senones evaluated (123/fr)

INFO: ngram_search_fwdtree.c(1517):     4963 channels searched (53/fr), 534 1st, 3470 last

INFO: ngram_search_fwdtree.c(1521):     1094 words for which last channels evaluated (11/fr)

INFO: ngram_search_fwdtree.c(1524):      203 candidate words for entering last phone (2/fr)

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 2 words

INFO: ngram_search_fwdflat.c(912):      225 words recognized (2/fr)

INFO: ngram_search_fwdflat.c(914):    10189 senones evaluated (110/fr)

INFO: ngram_search_fwdflat.c(916):     5206 channels searched (55/fr)

INFO: ngram_search_fwdflat.c(918):      329 words searched (3/fr)

INFO: ngram_search_fwdflat.c(920):      164 word transitions (1/fr)

WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using RIGHT instead

INFO: ngram_search.c(1137): lattice start node <s>.0 end node RIGHT.48

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(RIGHT:48:91) = -647142

INFO: ps_lattice.c(1266): Joint P(O,S) = -647271 P(S|O) = -129

000000001: TURN RIGHT (-12643528)

READY....
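The recognized text is buried in the log: each result appears on a line made of a nine-digit utterance number, the hypothesis, and a score in parentheses. A filter along the following lines could pull out just the hypotheses. It is a sketch based only on the line format seen in the output above; `extract_hypotheses` is a made-up name.

```shell
# Keep only result lines such as "000000001: TURN RIGHT (-12643528)"
# and strip the utterance number and the trailing score.
extract_hypotheses() {
    grep -E '^[0-9]{9}: ' | sed -E 's/^[0-9]{9}: (.*) \(-?[0-9]+\)$/\1/'
}

# Example, merging stderr into stdout in case the log goes there:
#   pocketsphinx_continuous -lm 2916.lm -dict 2916.dic 2>&1 | extract_hypotheses
```

When the output goes through a pipe, results may show up with some delay because of buffering.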

Note: this method cannot be used to build a language model for Chinese command words.

V. Using existing language and acoustic models

(1) Download the Mandarin language and acoustic models

Download: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/

Mandarin language model: zh_broadcastnews_64000_utf8.DMP, zh_broadcastnews_utf8.dic

Mandarin Broadcast News acoustic models: zh_broadcastnews_16k_ptm256_8000.tar.bz2

tar xjf zh_broadcastnews_16k_ptm256_8000.tar.bz2

cd zh_broadcastnews_16k_ptm256_8000

ls

feat.params means            noisedict transition_matrices

mdef         mixture_weights sendump    variances

The files listed above make up the acoustic model.
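Before pointing the recognizer at the model, it may be worth checking that the unpacked directory really contains all eight files from the listing. A minimal sketch; `check_acoustic_model` is a hypothetical helper, and the file list is copied from the ls output above.

```shell
# Report any acoustic model file missing from the given directory.
# Returns 0 if all eight files are present, 1 otherwise.
check_acoustic_model() {
    dir=$1
    missing=0
    for f in feat.params mdef means mixture_weights noisedict sendump transition_matrices variances; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check_acoustic_model . && echo "acoustic model looks complete"
```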

Put zh_broadcastnews_64000_utf8.DMP, zh_broadcastnews_utf8.dic, zh_broadcastnews_16k_ptm256_8000, and pocketsphinx_continuous in the same directory; the models are then ready to use.

(2) Test

huang@ubuntu:/usr/local/bin$ pocketsphinx_continuous -hmm zh_broadcastnews_ptm256_8000 -lm zh_broadcastnews_64000_utf8.DMP -dict zh_broadcastnews_utf8.dic

       -lowerf 133.33334 \

       -upperf 6855.4976 \

       -nfft 512 \

       -wlen 0.0256 \

       -transform legacy \

       -feat s2_4x \

       -agc none \

       -cmn current \

       -varnorm no

Current configuration:

[NAME]          [DEFLT]         [VALUE]
-agc            none            none
-agcthresh      2.0             2.000000e+00
-alpha          0.97            9.700000e-01
-ceplen         13              13
-cmn            current         current
-cmninit        8.0             8.0
-dither         no              no
-doublebw       no              no
-feat           1s_c_d_dd       s2_4x
-frate          100             100
-input_endian   little          little
-lda
-ldadim         0               0
-lifter         0               0
-logspec        no              no
-lowerf         133.33334       1.333333e+02
-ncep           13              13
-nfft           512             512
-nfilt          40              40
-remove_dc      no              no
-round_filters  yes             yes
-samprate       16000           1.600000e+04
-seed           -1              -1
-smoothspec     no              no
-svspec
-transform      legacy          legacy
-unit_area      yes             yes
-upperf         6855.4976       6.855498e+03
-varnorm        no              no
-verbose        no              no
-warp_params
-warp_type      inverse_linear  inverse_linear
-wlen           0.025625        2.560000e-02

…………………………….

…………………………….

……………………………

INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 75539

INFO: ngram_search_fwdtree.c(333): after: 461 root, 75411 non-root channels, 27 single-phone words

INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25

Warning: Could not find Mic element

INFO: continuous.c(261): pocketsphinx_continuous COMPILED ON: Feb 21 2011, AT: 22:31:47

READY....

Listening...

Stopped listening, please wait...

INFO: cmn_prior.c(121): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >

INFO: cmn_prior.c(139): cmn_prior_update: to   < 9.20 -0.17 -0.27 -0.29 -0.38 -0.05 -0.08 -0.15 -0.12 -0.15 0.13 -0.08 -0.07 >

INFO: ngram_search_fwdtree.c(1513):     2628 words recognized (45/fr)

INFO: ngram_search_fwdtree.c(1515):   228830 senones evaluated (3878/fr)

INFO: ngram_search_fwdtree.c(1517):   506870 channels searched (8591/fr), 25129 1st, 119738 last

INFO: ngram_search_fwdtree.c(1521):     7773 words for which last channels evaluated (131/fr)

INFO: ngram_search_fwdtree.c(1524):   146203 candidate words for entering last phone (2478/fr)

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 137 words

INFO: ngram_search_fwdflat.c(912):     1906 words recognized (32/fr)

INFO: ngram_search_fwdflat.c(914):    71680 senones evaluated (1215/fr)

INFO: ngram_search_fwdflat.c(916):   134571 channels searched (2280/fr)

INFO: ngram_search_fwdflat.c(918):     7855 words searched (133/fr)

INFO: ngram_search_fwdflat.c(920):     6388 word transitions (108/fr)

WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using 啊 instead

INFO: ngram_search.c(1137): lattice start node <s>.0 end node 啊(4).21

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(啊(4):21:57) = -296641

INFO: ps_lattice.c(1266): Joint P(O,S) = -296641 P(S|O) = 0

000000000: 啊 (-4653851)

READY....

Listening...

Stopped listening, please wait...

INFO: cmn_prior.c(121): cmn_prior_update: from < 9.37 -0.27 -0.29 -0.06 -0.23 -0.13 -0.09 -0.15 -0.08 -0.15 -0.02 -0.07 -0.10 >

INFO: cmn_prior.c(139): cmn_prior_update: to   < 9.31 -0.25 -0.37 -0.08 -0.22 -0.14 -0.08 -0.11 -0.05 -0.13 -0.02 -0.10 -0.12 >

INFO: ngram_search_fwdtree.c(1513):     2368 words recognized (38/fr)

INFO: ngram_search_fwdtree.c(1515):   251689 senones evaluated (4059/fr)

INFO: ngram_search_fwdtree.c(1517):   499391 channels searched (8054/fr), 26703 1st, 127525 last

INFO: ngram_search_fwdtree.c(1521):     7782 words for which last channels evaluated (125/fr)

INFO: ngram_search_fwdtree.c(1524):   181902 candidate words for entering last phone (2933/fr)

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 106 words

INFO: ngram_search_fwdflat.c(912):     1960 words recognized (32/fr)

INFO: ngram_search_fwdflat.c(914):    40695 senones evaluated (656/fr)

INFO: ngram_search_fwdflat.c(916):   107699 channels searched (1737/fr)

INFO: ngram_search_fwdflat.c(918):     6493 words searched (104/fr)

INFO: ngram_search_fwdflat.c(920):     5071 word transitions (81/fr)

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.50

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:50:60) = -190357

INFO: ps_lattice.c(1266): Joint P(O,S) = -206492 P(S|O) = -16135

000000002: 二 (-3082778)

READY....

Listening...

Stopped listening, please wait...

INFO: cmn_prior.c(121): cmn_prior_update: from < 9.31 -0.25 -0.37 -0.08 -0.22 -0.14 -0.08 -0.11 -0.05 -0.13 -0.02 -0.10 -0.12 >

INFO: cmn_prior.c(139): cmn_prior_update: to   < 9.26 -0.29 -0.28 0.11 -0.18 -0.16 -0.05 -0.16 -0.04 -0.19 -0.05 -0.10 -0.14 >

INFO: ngram_search_fwdtree.c(1513):     1595 words recognized (18/fr)

INFO: ngram_search_fwdtree.c(1515):   302259 senones evaluated (3358/fr)

INFO: ngram_search_fwdtree.c(1517):   487518 channels searched (5416/fr), 37862 1st, 104395 last

INFO: ngram_search_fwdtree.c(1521):     5835 words for which last channels evaluated (64/fr)

INFO: ngram_search_fwdtree.c(1524):   197251 candidate words for entering last phone (2191/fr)

INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 61 words

INFO: ngram_search_fwdflat.c(912):     1027 words recognized (11/fr)

INFO: ngram_search_fwdflat.c(914):    24680 senones evaluated (274/fr)

INFO: ngram_search_fwdflat.c(916):    65722 channels searched (730/fr)

INFO: ngram_search_fwdflat.c(918):     4251 words searched (47/fr)

INFO: ngram_search_fwdflat.c(920):     2861 word transitions (31/fr)

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.82

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:82:88) = -275522

INFO: ps_lattice.c(1266): Joint P(O,S) = -277262 P(S|O) = -1740

000000003: 一年 (-4414731)

READY....

Listening...

Stopped listening, please wait...

……………………………

…………………………..

……………………………

INFO: ngram_search_fwdflat.c(920):     6841 word transitions (73/fr)

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.85

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:85:91) = -261132

INFO: ps_lattice.c(1266): Joint P(O,S) = -278320 P(S|O) = -17188

000000008: 留念 (-3893136)

…………………..

…………………

……………………

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.99

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:99:105) = -305972

INFO: ps_lattice.c(1266): Joint P(O,S) = -325764 P(S|O) = -19792

000000010: 基民 (-4532446)

…………………..

…………………..

…………………..

INFO: ngram_search_fwdflat.c(920):     5175 word transitions (46/fr)

INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.102

INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:102:110) = -283182

INFO: ps_lattice.c(1266): Joint P(O,S) = -283589 P(S|O) = -407

000000012: 一九八九 (-4134767)

READY....

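As the results above show, Mandarin recognition accuracy is much lower than the English recognition accuracy seen earlier, which is partly due to our accents. To get a reasonably high Mandarin recognition rate, it is best to build your own language model and acoustic model.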
Original article: https://www.cnblogs.com/einyboy/p/2796965.html