Detailed usage guide: http://textgrocery.readthedocs.io/zh/latest/index.html
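TextGrocery is a short-text classification tool built on LibLinear and the jieba (结巴) Chinese word segmenter. It is efficient and easy to use, and it supports both Chinese and English corpora.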
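Installation (the package is published on PyPI as tgrocery, matching the module imported in the session that follows):
pip install tgrocery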
Usage:
>>> from tgrocery import Grocery
>>> # Open a new grocery store (don't forget to give it a name)
>>> grocery = Grocery('sample')
>>> # Training texts can be passed in as a list of (label, text) tuples
>>> train_src = [
...     ('education', '名师指导托福语法技巧:名词的复数形式'),
...     ('education', '中国高考成绩海外认可 是“狼来了”吗?'),
...     ('sports', '图文:法网孟菲尔斯苦战进16强 孟菲尔斯怒吼'),
...     ('sports', '四川丹棱举行全国长距登山挑战赛 近万人参与')
... ]
>>> grocery.train(train_src)
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 1.125 seconds.
Prefix dict has been built succesfully.
*
optimization finished, #iter = 3
Objective value = -1.092381
nSV = 8
<tgrocery.Grocery object at 0x7f23cf243b50>
>>> # Save the trained model, then load it into a fresh instance with the same name
>>> grocery.save()
>>> new_grocery = Grocery('sample')
>>> new_grocery.load()
>>> # predict() returns a result object; print it to see the predicted label
>>> new_grocery.predict('考生必读:新托福写作考试评分标准')
<tgrocery.base.GroceryPredictResult object at 0x4490d50>
>>> new_grocery.predict('考生必读:新托福写作考试评分标准')
<tgrocery.base.GroceryPredictResult object at 0x4490d90>
>>> result = new_grocery.predict('考生必读:新托福写作考试评分标准')
>>> print result
education
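To give a rough idea of what happens between train() and predict(), here is a minimal, self-contained sketch of the same kind of pipeline: turn each text into sparse features, then fit a linear model with one weight vector per label. This is not TextGrocery's implementation. Character bigrams stand in for jieba word segmentation, a simple perceptron stands in for LibLinear, and the names char_bigrams and TinyLinearClassifier are hypothetical, chosen only for this illustration. It is written for Python 3 and needs nothing beyond the standard library.

```python
from collections import Counter, defaultdict


def char_bigrams(text):
    """Count overlapping character bigrams (a crude stand-in for word segmentation)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


class TinyLinearClassifier:
    """Hypothetical multiclass perceptron over sparse bigram counts.

    Assumes at least two distinct labels in the training data.
    """

    def __init__(self):
        self.weights = {}  # label -> {feature: weight}

    def _score(self, label, feats):
        # Linear score: dot product of the label's weights with the feature counts.
        w = self.weights[label]
        return sum(w.get(f, 0.0) * c for f, c in feats.items())

    def train(self, samples, epochs=10):
        labels = sorted({label for label, _ in samples})
        self.weights = {label: defaultdict(float) for label in labels}
        data = [(label, char_bigrams(text)) for label, text in samples]
        for _ in range(epochs):
            updates = 0
            for label, feats in data:
                # Strongest competing label for this sample.
                rival = max((l for l in labels if l != label),
                            key=lambda l: self._score(l, feats))
                # Update unless the true label already wins strictly.
                if self._score(label, feats) <= self._score(rival, feats):
                    for f, c in feats.items():
                        self.weights[label][f] += c
                        self.weights[rival][f] -= c
                    updates += 1
            if updates == 0:  # every sample is classified with a positive margin
                break
        return self

    def predict(self, text):
        feats = char_bigrams(text)
        return max(sorted(self.weights), key=lambda label: self._score(label, feats))


train_src = [
    ('education', '名师指导托福语法技巧:名词的复数形式'),
    ('education', '中国高考成绩海外认可 是“狼来了”吗?'),
    ('sports', '图文:法网孟菲尔斯苦战进16强 孟菲尔斯怒吼'),
    ('sports', '四川丹棱举行全国长距登山挑战赛 近万人参与'),
]

model = TinyLinearClassifier().train(train_src)
print(model.predict('考生必读:新托福写作考试评分标准'))  # education
```

In this sketch the test sentence is labeled education only because it shares the bigram 托福 with one education training text. With four training samples, any classifier of this kind is a toy, so a real model needs far more data per label.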
That's all.