• NLP (12): Anaphora Resolution


    Original link: http://www.one2know.cn/nlp12/

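    • A pronoun stands in for a noun that would otherwise be repeated.
      Example sentences:
      1. Ravi is a boy. He often donates money to the poor.
      The noun (the subject) appears first and the pronoun follows, so the reference flows from left to right. This kind of sentence is called anaphora.
      2. He was already on his way to airport. Realized Ravi.
      Here the order is reversed: the pronoun appears before the noun it refers to. This kind of sentence is called cataphora.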
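The distinction between the two cases comes down to whether the noun or the pronoun appears first. Here is a minimal sketch of that check, assuming the sentence is already tokenized and that the noun and pronoun are known in advance. The helper name `reference_direction` is hypothetical and is not part of the original post.

```python
def reference_direction(tokens, antecedent, pronoun):
    """Return 'anaphora' if the noun precedes the pronoun, else 'cataphora'."""
    antecedent_pos = tokens.index(antecedent)  # first occurrence of the noun
    pronoun_pos = tokens.index(pronoun)        # first occurrence of the pronoun
    return 'anaphora' if antecedent_pos < pronoun_pos else 'cataphora'


# The two example sentences, split on whitespace for simplicity.
s1 = "Ravi is a boy . He often donates money to the poor .".split()
s2 = "He was already on his way to airport . Realized Ravi .".split()

print(reference_direction(s1, 'Ravi', 'He'))  # anaphora
print(reference_direction(s2, 'Ravi', 'He'))  # cataphora
```

This only compares positions. Deciding which noun a pronoun actually refers to is the harder part of the problem.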
    • Code
    import nltk
    from nltk.chunk import tree2conlltags
    from nltk.corpus import names # corpus of first names labeled by gender
    import random
    
    class AnaphoraExample:
        def __init__(self): # takes no arguments
            males = [(name,'male') for name in names.words('male.txt')]
            females = [(name,'female') for name in names.words('female.txt')]
            combined = males + females # list of (name, gender) tuples
            random.shuffle(combined)
            # print(combined)
            training = [(self.feature(name),gender) for (name,gender) in combined]
            self._classifier = nltk.NaiveBayesClassifier.train(training) # train the classifier
    
        def feature(self,word): # use the last letter of the word as the only feature
            return {'last(1)' : word[-1]}
    
        def gender(self,word): # return the gender label the classifier predicts for the word
            return self._classifier.classify(self.feature(word))
    
        def learnAnaphora(self):
            sentences = [
                "John is a man. He walks",
                "John and Mary are married. They have two kids",
                "In order for Ravi to be successful, he should follow John",
                "John met Mary in Barista. She asked him to order a Pizza",
            ]
    
            for sent in sentences:
                chunks = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)),binary=False)
                # tokenize, POS-tag, and chunk named entities; chunks holds the resulting chunk tree
                stack = []
                print(sent)
                items = tree2conlltags(chunks) # flatten the tree into a list of (word, POS tag, IOB tag) triples
                for item in items:
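                    if item[1] == 'NNP' and (item[2] == 'B-PERSON' or item[2] == 'O'): # proper noun treated as a person's name; 'O' (letter O) is the IOB "outside" tag
                        stack.append((item[0],self.gender(item[0]))) # (name, predicted gender) tuple
                    elif item[1] == 'CC': # coordinating conjunction
                        stack.append(item[0])
                    elif item[1] == 'PRP': # personal pronoun
                        stack.append(item[0])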
                print('\t{}'.format(stack))
    
    if __name__ == "__main__":
        anaphora = AnaphoraExample()
        anaphora.learnAnaphora()
    

    Output:

    John is a man. He walks
    	[('John', 'male'), 'He']
    John and Mary are married. They have two kids
    	[('John', 'male'), 'and', ('Mary', 'female'), 'They']
    In order for Ravi to be successful, he should follow John
    	[('Ravi', 'female'), 'he', ('John', 'male')]
    John met Mary in Barista. She asked him to order a Pizza
    	[('John', 'male'), ('Mary', 'female'), 'She', 'him']
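The code above only collects names, conjunctions, and pronouns in order of appearance. It does not yet link a pronoun to a name. One possible next step, sketched here under stated assumptions, is to link each singular pronoun to the closest preceding name with a matching predicted gender, and each plural pronoun to every name seen so far. The names `resolve_pronouns`, `PRONOUN_GENDER`, and `PLURAL_PRONOUNS` are hypothetical and are not part of the original post or of NLTK.

```python
# Hypothetical lookup tables (assumptions, not from the original post).
PRONOUN_GENDER = {
    'he': 'male', 'him': 'male',
    'she': 'female', 'her': 'female',
}
PLURAL_PRONOUNS = {'they', 'them'}


def resolve_pronouns(stack):
    """Link each pronoun in a stack to preceding name(s).

    `stack` has the shape printed above: (name, gender) tuples mixed
    with plain strings for conjunctions and pronouns.
    """
    seen = []   # (name, gender) tuples encountered so far
    links = []  # (pronoun, antecedent) pairs
    for item in stack:
        if isinstance(item, tuple):
            seen.append(item)
            continue
        word = item.lower()
        if word in PLURAL_PRONOUNS:
            # A plural pronoun is linked to every name seen so far.
            links.append((item, [name for name, _ in seen]))
        elif word in PRONOUN_GENDER:
            wanted = PRONOUN_GENDER[word]
            # Closest preceding name with the same gender, or None.
            match = next((name for name, g in reversed(seen) if g == wanted), None)
            links.append((item, match))
        # Conjunctions and any other strings are skipped.
    return links


print(resolve_pronouns([('John', 'male'), ('Mary', 'female'), 'She', 'him']))
# [('She', 'Mary'), ('him', 'John')]
print(resolve_pronouns([('John', 'male'), 'and', ('Mary', 'female'), 'They']))
# [('They', ['John', 'Mary'])]
```

For the third stack in the output, `[('Ravi', 'female'), 'he', ('John', 'male')]`, this heuristic yields `[('he', None)]`: the classifier labeled Ravi as female and John appears only after the pronoun. A gender match on the last letter of a name is a weak signal, so the sketch should be read as an illustration rather than a reliable resolver.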
    
  • Original article: https://www.cnblogs.com/peng8098/p/nlp_12.html