• Python Natural Language Processing Study Notes (15): 2.7 Further Reading


    When reposting, please credit the source, "一块努力的牛皮糖": http://www.cnblogs.com/yuxc/

    I am new to this, so if any part of the translation is inaccurate, please point it out. Many thanks.

    2.7 Further Reading

     

    Extra materials for this chapter are posted at http://www.nltk.org/ , including links to freely available resources on the Web. The corpus methods are summarized in the Corpus HOWTO, at http://www.nltk.org/howto , and documented extensively in the online API documentation.

    Significant sources of published corpora are the Linguistic Data Consortium (LDC) and the European Language Resources Agency (ELRA). Hundreds of annotated text and speech corpora are available in dozens of languages. Non-commercial licenses permit the data to be used in teaching and research. For some corpora, commercial licenses are also available (but for a higher fee).

     

    These and many other language resources have been documented using OLAC Metadata, and can be searched via the OLAC home page at http://www.language-archives.org/ . Corpora List (see http://gandalf.aksis.uib.no/corpora/sub.html ) is a mailing list for discussions about corpora, and you can find resources by searching the list archives or posting to the list. The most complete inventory of the world’s languages is Ethnologue, http://www.ethnologue.com/ . Of 7,000 languages, only a few dozen have substantial digital resources suitable for use in NLP.

     

    This chapter has touched on the field of Corpus Linguistics. Other useful books in this area include (Biber, Conrad, & Reppen, 1998), (McEnery, 2006), (Meyer, 2002), (Sampson & McCarthy, 2005), and (Scott & Tribble, 2006). Further readings in quantitative data analysis in linguistics are: (Baayen, 2008), (Gries, 2009), and (Woods, Fletcher, & Hughes, 1986).

    The original description of WordNet is (Fellbaum, 1998). Although WordNet was originally developed for research in psycholinguistics, it is now widely used in NLP and Information Retrieval. WordNets are being developed for many other languages, as documented at http://www.globalwordnet.org/ . For a study of WordNet similarity measures, see (Budanitsky & Hirst, 2006).

    Other topics touched on in this chapter were phonetics and lexical semantics, and we refer readers to Chapters 7 and 20 of (Jurafsky & Martin, 2008).

  • Original post: https://www.cnblogs.com/yuxc/p/2129038.html