• Python Natural Language Processing (5): WordNet


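    WordNet is a semantically oriented English dictionary. It resembles a traditional thesaurus but has a richer structure. NLTK includes the English WordNet, with 155,287 words and 117,659 synonym sets (synsets).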

    1. Finding synonyms

    Taking motorcar as an example, we look up its synonym set (synset).

    >>> from nltk.corpus import wordnet as wn
    >>> wn.synsets('motorcar')                    # find the synsets that contain the word
    [Synset('car.n.01')]
    >>> wn.synset('car.n.01').lemma_names         # without parentheses this is only the bound method
    <bound method Synset.lemma_names of Synset('car.n.01')>
    >>> wn.synset('car.n.01').lemma_names()       # list the words (lemma names) in the synset
    ['car', 'auto', 'automobile', 'machine', 'motorcar']

    >>> wn.synset('car.n.01').definition()        # definition of this synset
    'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
    >>> wn.synset('car.n.01').examples()          # example sentences for this synset
    ['he needs a car to get to work']
    >>> wn.synset('car.n.01').lemmas()
    [Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
    >>> wn.lemma('car.n.01.automobile')
    Lemma('car.n.01.automobile')
    >>> wn.lemma('car.n.01.automobile').synset()
    Synset('car.n.01')
    >>> wn.lemma('car.n.01.automobile').name()
    'automobile'
    >>> wn.synsets('car')
    [Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
    >>> for synset in wn.synsets('car'):
    ...     print(synset.lemma_names())
    ...
    ['car', 'auto', 'automobile', 'machine', 'motorcar']
    ['car', 'railcar', 'railway_car', 'railroad_car']
    ['car', 'gondola']
    ['car', 'elevator_car']
    ['cable_car', 'car']
    >>> wn.lemmas('car')                          # all lemmas for the word car
    [Lemma('car.n.01.car'), Lemma('car.n.02.car'), Lemma('car.n.03.car'), Lemma('car.n.04.car'), Lemma('cable_car.n.01.car')]

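    2. The WordNet hierarchy

    WordNet synsets correspond to abstract concepts, which do not always have a corresponding English word. These concepts are linked together in a hierarchy.

    The figure above shows a fragment of the WordNet concept hierarchy. Each node corresponds to a synset, and each edge represents a hypernym/hyponym relation, that is, the relation between a superordinate concept and a subordinate one.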

    >>> motorcar = wn.synset('car.n.01')
    >>> types_of_motorcar = motorcar.hyponyms()
    >>> types_of_motorcar[26]
    Synset('stanley_steamer.n.01')
    >>> sorted(
    ... [lemma.name()
    ... for synset in types_of_motorcar
    ... for lemma in synset.lemmas()])
    ['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', 'beach_wagon', 'bus', 'cab', 'compact',
     'compact_car', 'convertible', 'coupe', 'cruiser', 'electric', 'electric_automobile', 'electric_car', 'estate_car',
     'gas_guzzler', 'hack', 'hardtop', 'hatchback', 'heap', 'horseless_carriage', 'hot-rod', 'hot_rod', 'jalopy', 'jeep',
     'landrover', 'limo', 'limousine', 'loaner', 'minicar', 'minivan', 'pace_car', 'patrol_car', 'phaeton', 'police_car',
     'police_cruiser', 'prowl_car', 'race_car', 'racer', 'racing_car', 'roadster', 'runabout', 'saloon', 'secondhand_car',
     'sedan', 'sport_car', 'sport_utility', 'sport_utility_vehicle', 'sports_car', 'squad_car', 'station_waggon',
     'station_wagon', 'stock_car', 'subcompact', 'subcompact_car', 'taxi', 'taxicab', 'tourer', 'touring_car',
     'two-seater', 'used-car', 'waggon', 'wagon']
    >>> motorcar.hypernyms()
    [Synset('motor_vehicle.n.01')]
    >>> paths = motorcar.hypernym_paths()
    >>> len(paths)
    2
    >>> [synset.name for synset in paths[0]]      # without parentheses, name yields bound methods
    [<bound method Synset.name of Synset('entity.n.01')>,
     <bound method Synset.name of Synset('physical_entity.n.01')>,
     <bound method Synset.name of Synset('object.n.01')>,
     <bound method Synset.name of Synset('whole.n.02')>,
     <bound method Synset.name of Synset('artifact.n.01')>,
     <bound method Synset.name of Synset('instrumentality.n.03')>,
     <bound method Synset.name of Synset('container.n.01')>,
     <bound method Synset.name of Synset('wheeled_vehicle.n.01')>,
     <bound method Synset.name of Synset('self-propelled_vehicle.n.01')>,
     <bound method Synset.name of Synset('motor_vehicle.n.01')>,
     <bound method Synset.name of Synset('car.n.01')>]
    >>> [synset.name() for synset in paths[0]]    # call name() to get the synset names
    ['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03',
     'container.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
    >>> [synset.name() for synset in paths[1]]
    ['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03',
     'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01',
     'car.n.01']
    >>> motorcar.root_hypernyms()
    [Synset('entity.n.01')]
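
    The session above reports two hypernym paths for car.n.01 because one of its ancestors, wheeled_vehicle.n.01, has two hypernyms. The following is a minimal pure-Python sketch of that idea. It does not call NLTK: the HYPERNYMS dictionary is a hand-built stand-in whose edges are copied from the two paths printed above, and the hypernym_paths function here is a hypothetical re-implementation for illustration, not the library's code.

```python
# Toy hypernym graph: each synset name maps to its direct hypernyms.
# Edges are copied from the two paths printed in the session above;
# this is an illustrative stand-in, not data loaded from NLTK.
HYPERNYMS = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["self-propelled_vehicle.n.01"],
    "self-propelled_vehicle.n.01": ["wheeled_vehicle.n.01"],
    "wheeled_vehicle.n.01": ["container.n.01", "vehicle.n.01"],  # two parents
    "container.n.01": ["instrumentality.n.03"],
    "vehicle.n.01": ["conveyance.n.03"],
    "conveyance.n.03": ["instrumentality.n.03"],
    "instrumentality.n.03": ["artifact.n.01"],
    "artifact.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],  # the root has no hypernym
}


def hypernym_paths(name):
    """Return every root-to-`name` path as a list of synset names."""
    parents = HYPERNYMS[name]
    if not parents:
        # A root has exactly one path: itself.
        return [[name]]
    paths = []
    for parent in parents:
        for path in hypernym_paths(parent):
            paths.append(path + [name])
    return paths


paths = hypernym_paths("car.n.01")
print(len(paths))
for path in paths:
    print(len(path), " > ".join(path))
```

    Each node with more than one hypernym multiplies the number of paths, which is why a single synset can reach the root entity.n.01 along routes of different lengths.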

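    3. More lexical relations

    Hypernyms and hyponyms are called lexical relations because they relate one synset to another. These two relations navigate up and down the "is-a" hierarchy. Another important way to navigate the WordNet network is from items to their components (meronyms) or to the things that contain them (holonyms).

    1) Part-whole relations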

    >>> wn.synset('tree.n.01').part_meronyms()
    [Synset('burl.n.02'), Synset('crown.n.07'), Synset('limb.n.02'), Synset('stump.n.01'), Synset('trunk.n.01')]
    >>> wn.synset('tree.n.01').substance_meronyms()
    [Synset('heartwood.n.01'), Synset('sapwood.n.01')]
    >>> wn.synset('tree.n.01').member_holonyms()
    [Synset('forest.n.01')]
    >>> for synset in wn.synsets('mint', wn.NOUN):
    ...     print("%s : %s" % (synset.name(), synset.definition()))
    ...
    batch.n.02 : (often followed by `of') a large number or amount or extent
    mint.n.02 : any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
    mint.n.03 : any member of the mint family of plants
    mint.n.04 : the leaves of a mint plant used fresh or candied
    mint.n.05 : a candy that is flavored with a mint oil
    mint.n.06 : a plant where money is coined by authority of the government
    >>> wn.synset('mint.n.04').part_holonyms()
    [Synset('mint.n.02')]
    >>> wn.synset('mint.n.04').substance_holonyms()
    [Synset('mint.n.05')]
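
    Meronyms and holonyms are two directions of the same part-whole link: if X is a part meronym of Y, then Y is a part holonym of X. The sketch below illustrates this with plain dictionaries rather than NLTK. The tree.n.01 entry is copied from the session above; the mint.n.02 entry is inferred from the part_holonyms() output above by that inverse relation, and the invert helper is a hypothetical name introduced only for this example.

```python
from collections import defaultdict

# whole -> parts. The tree.n.01 entry is copied from the session output;
# the mint.n.02 entry is inferred from part_holonyms() of mint.n.04.
PART_MERONYMS = {
    "tree.n.01": ["burl.n.02", "crown.n.07", "limb.n.02", "stump.n.01", "trunk.n.01"],
    "mint.n.02": ["mint.n.04"],
}


def invert(relation):
    """Turn a whole -> parts mapping into a part -> wholes mapping."""
    inverse = defaultdict(list)
    for whole, parts in relation.items():
        for part in parts:
            inverse[part].append(whole)
    return dict(inverse)


PART_HOLONYMS = invert(PART_MERONYMS)
print(PART_HOLONYMS["trunk.n.01"])
print(PART_HOLONYMS["mint.n.04"])
```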

    2) Entailment

    >>> wn.synset('walk.v.01').entailments()
    [Synset('step.v.01')]
    >>> wn.synset('eat.v.01').entailments()
    [Synset('chew.v.01'), Synset('swallow.v.01')]
    >>> wn.synset('tease.v.03').entailments()
    [Synset('arouse.v.07'), Synset('disappoint.v.01')]

    3) Antonyms

    >>> wn.lemma('supply.n.02.supply').antonyms()
    [Lemma('demand.n.02.demand')]
    >>> wn.lemma('rush.v.01.rush').antonyms()
    [Lemma('linger.v.04.linger')]
    >>> wn.lemma('horizontal.a.01.horizontal').antonyms()
    [Lemma('inclined.a.02.inclined'), Lemma('vertical.a.01.vertical')]
    >>> wn.lemma('staccato.r.01.staccato').antonyms()
    [Lemma('legato.r.01.legato')]

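    4. Semantic similarity

    Synsets are linked by a complex network of lexical relations. Given a particular synset, we can traverse the WordNet network to find synsets with related meanings. Each synset has one or more hypernym paths that link it to a root hypernym. Two synsets linked to the same root may have several hypernyms in common. If two synsets share a very specific hypernym, one that sits low in the hypernym hierarchy, they must be closely related.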
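
    The idea of a shared, specific hypernym can be sketched in a few lines of plain Python. NLTK exposes this through Synset.lowest_common_hypernyms() and Synset.min_depth(); the code below does not call them. It is a simplified stand-in that works on a hand-built HYPERNYMS dictionary whose edges are copied from the hypernym paths of car.n.01 shown in section 2, and the function names are assumptions made for this illustration.

```python
# Direct hypernym edges, copied from the hypernym paths of car.n.01
# printed in section 2. Illustrative data only, not loaded from NLTK.
HYPERNYMS = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["self-propelled_vehicle.n.01"],
    "self-propelled_vehicle.n.01": ["wheeled_vehicle.n.01"],
    "wheeled_vehicle.n.01": ["container.n.01", "vehicle.n.01"],
    "container.n.01": ["instrumentality.n.03"],
    "vehicle.n.01": ["conveyance.n.03"],
    "conveyance.n.03": ["instrumentality.n.03"],
    "instrumentality.n.03": ["artifact.n.01"],
    "artifact.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],
}


def ancestors_with_self(name):
    """Return `name` plus every synset reachable via hypernym edges."""
    seen = {name}
    stack = [name]
    while stack:
        for parent in HYPERNYMS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def min_depth(name):
    """Number of edges on the shortest hypernym chain up to the root."""
    parents = HYPERNYMS[name]
    if not parents:
        return 0
    return 1 + min(min_depth(parent) for parent in parents)


def lowest_common_hypernym(a, b):
    """Return the deepest synset that is an ancestor (or self) of both."""
    shared = ancestors_with_self(a) & ancestors_with_self(b)
    # sorted() keeps the result deterministic if two candidates tie.
    return max(sorted(shared), key=min_depth)


for a, b in [("container.n.01", "conveyance.n.03"), ("car.n.01", "vehicle.n.01")]:
    shared = lowest_common_hypernym(a, b)
    print(a, b, "->", shared, "at depth", min_depth(shared))
```

    The deeper the shared hypernym, the more specific the concept the two synsets have in common. A shared hypernym near the root, such as entity.n.01 at depth 0, says almost nothing about how related two synsets are.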

  • Original article: https://www.cnblogs.com/no-tears-girl/p/6416765.html