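Whoosh is a full-text search engine written in pure Python. This post records some basic usage, following the official documentation.
Here is my code for creating the search documents (that is, the index), run on Windows.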
    # coding=utf-8
    import os

    from whoosh import fields
    from whoosh.index import create_in, open_dir

    # Raw string so the backslash in the Windows path is taken literally.
    WHOOSH_ADD = r'E:\whoosh_index'
    WHOOSH_SCHEMA = fields.Schema(
        title=fields.TEXT(stored=True),
        content=fields.TEXT(stored=True),
    )

    # Create the index only on the first run, then open it.
    if not os.path.exists(WHOOSH_ADD):
        os.mkdir(WHOOSH_ADD)
        ix = create_in(WHOOSH_ADD, schema=WHOOSH_SCHEMA, indexname='comment')
    ix = open_dir(WHOOSH_ADD, indexname='comment')

    # Whoosh expects unicode values for indexed text.
    writer = ix.writer()
    writer.add_document(title=u'chang yanjie add', content=u' zheng wen 我是正文')
    writer.add_document(title=u'chang yan1 jie2 add', content=u' zheng wen 我是正文2')
    writer.commit()
If you are following along, change the WHOOSH_ADD path to a directory on your own machine.
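There is also an update method, of course. Call it on a fresh writer and commit afterwards. Note that update_document only replaces an existing document when the schema has a field marked unique=True; with the schema above, which has none, it simply adds another document.
    writer.update_document(title=u"chang yanjie add", content=u"变啦")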
The search code:
    # coding=utf-8
    from whoosh import index
    from whoosh.qparser import QueryParser

    ix = index.open_dir(r'E:\whoosh_index', indexname='comment')

    hits = []
    query = u' zheng'
    parser = QueryParser("content", schema=ix.schema)
    try:
        word = parser.parse(query)
    except Exception:
        word = None

    if word is not None:
        s = ix.searcher()
        hits = s.search(word)
        # Note: with the "with" form below, the searcher is closed automatically
        # when the block ends, so the hits can no longer be used afterwards.
        # with ix.searcher() as s:
        #     hits = s.search(word)
    print(len(hits))
On a freshly built index the result should be 2, haha.