An Overview of Kaldi's data Directory
- data/test
# Files in data/test and data/train

cmvn.scp # Maps each speaker ID (by default) to the location, ark file plus byte offset, of its CMVN (Cepstral Mean and Variance Normalization) statistics

feats.scp # Maps each utterance ID to the location, ark file (binary) plus byte offset, of its feature matrix

spk2utt # Maps each speaker ID to the list of that speaker's utterance IDs

text # Maps each utterance ID to its transcript

utt2spk # Maps each utterance ID to its speaker ID

wav.scp # Maps each utterance (or recording) ID to the location of its audio file

/splitN # Directory that splits the data into N parts for parallel jobs; each part holds its own share of the files above
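
The relationship between utt2spk and spk2utt can be illustrated with a short sketch. Kaldi ships its own script for this conversion (utils/utt2spk_to_spk2utt.pl); the Python below is not that script, only a minimal illustration of the same inversion, assuming each utt2spk line has the form `<utt-id> <spk-id>`. The function name and the IDs are hypothetical.

```python
from collections import defaultdict


def utt2spk_to_spk2utt(lines):
    """Invert utt2spk lines ("<utt-id> <spk-id>") into spk2utt lines."""
    spk2utt = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        utt_id, spk_id = parts
        spk2utt[spk_id].append(utt_id)
    # One line per speaker: "<spk-id> <utt-id1> <utt-id2> ..."
    return [f"{spk} {' '.join(utts)}" for spk, utts in sorted(spk2utt.items())]


# Hypothetical IDs, for illustration only.
example_utt2spk = [
    "spk1-utt1 spk1",
    "spk1-utt2 spk1",
    "spk2-utt1 spk2",
]
spk2utt_lines = utt2spk_to_spk2utt(example_utt2spk)
print("\n".join(spk2utt_lines))
```

Prefixing each utterance ID with its speaker ID, as in the example, keeps both files in a consistent sort order.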
- data/lang
# Files in data/lang, the language directory
- /phones
  - align_lexicon.txt
    
    WORD WORD PRONUNCIATION
    
    e.g.
    
    HI HI HH_B AY_E
/tmp

G.fst # Finite state transducer (FST) of the grammar

L.fst # FST of the lexicon

L_disambig.fst # FST of the lexicon with disambiguation symbols

oov.int # Integer ID of the out-of-vocabulary (OOV) word

oov.txt # The word that out-of-vocabulary (OOV) words are mapped to

phones.txt # Phone symbol table: each phone with its integer ID

topo # HMM topology of the phones

words.txt # Word symbol table: each word with its integer ID
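
As a rough illustration of the file formats above, the sketch below parses a symbol table such as words.txt or phones.txt and one line of align_lexicon.txt. It assumes the symbol tables use one `<symbol> <integer-id>` pair per line and that align_lexicon.txt follows the `WORD WORD PRONUNCIATION` layout shown earlier. The function names and the sample table entries are hypothetical, not part of Kaldi.

```python
def parse_symbol_table(lines):
    """Parse "<symbol> <integer-id>" lines into a dict of symbol -> ID."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol, idx = parts
        table[symbol] = int(idx)
    return table


def parse_align_lexicon_line(line):
    """Split a "WORD WORD PRONUNCIATION" line into (word, list of phones)."""
    word, _repeated_word, *phones = line.split()
    return word, phones


# Hypothetical table entries, for illustration only.
example_words = ["<eps> 0", "HI 1"]
word_ids = parse_symbol_table(example_words)

# The align_lexicon.txt example line from the listing above.
word, phones = parse_align_lexicon_line("HI HI HH_B AY_E")
print(word_ids[word], phones)
```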
Original article: https://www.cnblogs.com/JarvanWang/p/7499597.html
