• lucene vs zoie


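    Some time ago I used the performance-test code in Zoie's perf package to benchmark the real-time search of Lucene against Zoie. The results surprised me: according to the data, Lucene is better suited than Zoie to typical real-time search scenarios.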

    Zoie's perf tool measures four things: search latency, indexing latency, indexing event rate, and indexing event size. Figure 1 shows Zoie's results and Figure 2 shows Lucene NRT's results.
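    The post does not say exactly how Zoie's perf console computes these numbers. As a rough, hypothetical illustration of what such metrics usually mean, the Java sketch below computes a latency percentile, an event rate (events per second) and an average event size from raw measurements. The class and method names are invented, and the nearest-rank percentile is an assumption, not necessarily what Zoie reports.

```java
import java.util.Arrays;

/**
 * Hypothetical helper illustrating the kinds of metrics a perf console reports.
 * Not Zoie's code: names and formulas are assumptions for illustration.
 */
public class PerfMetrics {

    /** p-th percentile (0 < p <= 100) of the samples, using the nearest-rank method. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based nearest rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Indexing events processed per second over a measured window. */
    static double eventRate(long eventCount, long windowMillis) {
        return eventCount * 1000.0 / windowMillis;
    }

    /** Average size in bytes of one indexing event (e.g. one document). */
    static double avgEventSize(long totalBytes, long eventCount) {
        return (double) totalBytes / eventCount;
    }

    public static void main(String[] args) {
        // Made-up sample timings in milliseconds, only to exercise the helpers.
        long[] searchLatencies = {3, 5, 4, 12, 6, 5, 4, 30, 5, 6};
        System.out.println("search p50 = " + percentile(searchLatencies, 50) + " ms");
        System.out.println("search p99 = " + percentile(searchLatencies, 99) + " ms");
        System.out.println("event rate = " + eventRate(5000, 2000) + " events/s");
        System.out.println("event size = " + avgEventSize(1_024_000, 5000) + " bytes");
    }
}
```

    Percentiles are shown alongside averages because a few slow queries (like the 30 ms sample above) barely move the median but dominate the tail.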

    Figure 1: Zoie test results (Zoie Perf Console screenshot, 2012-10-09 17-32-50)

    Figure 2: Lucene NRT test results (Zoie Perf Console screenshot, 2012-10-09 17-34-29)

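    The data show clearly that Lucene wins on search response time, while Zoie performs better at indexing. In the comments on his blog post "Lucene's near-real-time search is fast!", Mike McCandless explained the difference between NRT and Zoie: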

    The biggest difference is that Zoie aims for immediate consistency
    (reopen after every index change & next query), which I think very few
    apps really require, given how fast NRT is.
    Also, NRTCachingDir (caching small segments in RAM) achieves the
    biggest (in my opinion) benefit of Zoie, but with substantially less
    added complexity. Reducing complexity is important because it means
    less risk of bugs; for example, Zoie had some scary corruption bugs,
    which took quite some time to track down; see
    https://issues.apache.org/jira/browse/LUCENE-2729
    The other part of Zoie I remember is deferring resolving deletions to
    Lucene docIDs, and instead using a bloom filter to post-filter
    collected documents. While I understand the motivation for this
    ("immediate consistency") I think it's the wrong tradeoff since it
    necessarily slows down all searching (checking a bloom filter is more
    costly than Lucene's checking a bit set), not to mention the added RAM
    required for the bloom filter.
    Ie, it's better to spend more time during reopen to resolve the
    deletions, so that searches don't slow down.
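    The deletion trade-off McCandless describes can be sketched in plain Java without any Lucene or Zoie classes. In the Lucene-style path, deletions are resolved to docIDs ahead of time (at reopen), so filtering a hit costs one BitSet lookup; in the Zoie-style path, deletions stay as application UIDs and each collected hit is post-filtered through a Bloom filter. All names here are hypothetical, and backing the Bloom filter with an exact set to discard false positives is an assumption of this sketch, not a description of Zoie's code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative comparison of two ways to hide deleted documents at search time. */
public class DeletionFilters {

    /** A tiny Bloom filter over long UIDs: k hash probes into an m-bit BitSet. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int m;
        private final int k;

        BloomFilter(int m, int k) {
            this.bits = new BitSet(m);
            this.m = m;
            this.k = k;
        }

        // Double hashing: probe i is (h1 + i * h2) mod m.
        private int probe(long uid, int i) {
            int h1 = Long.hashCode(uid * 0x9E3779B97F4A7C15L);
            int h2 = Long.hashCode(uid * 0xC2B2AE3D27D4EB4FL) | 1; // odd step
            return Math.floorMod(h1 + i * h2, m);
        }

        void add(long uid) {
            for (int i = 0; i < k; i++) bits.set(probe(uid, i));
        }

        /** false = definitely not added; true = possibly added. */
        boolean mightContain(long uid) {
            for (int i = 0; i < k; i++) if (!bits.get(probe(uid, i))) return false;
            return true;
        }
    }

    /** Lucene-style: deletions already resolved to docIDs, one bit test per hit. */
    static List<Integer> filterByDocIdBits(int[] hits, BitSet deletedDocIds) {
        List<Integer> live = new ArrayList<>();
        for (int doc : hits) if (!deletedDocIds.get(doc)) live.add(doc);
        return live;
    }

    /** Zoie-style: map each hit to its UID, then consult the Bloom filter (and the exact set on "maybe"). */
    static List<Integer> filterByUidBloom(int[] hits, long[] uidOfDoc,
                                          BloomFilter bloom, Set<Long> deletedUids) {
        List<Integer> live = new ArrayList<>();
        for (int doc : hits) {
            long uid = uidOfDoc[doc];
            // k hash probes per hit, plus an exact lookup whenever the filter says "maybe".
            boolean deleted = bloom.mightContain(uid) && deletedUids.contains(uid);
            if (!deleted) live.add(doc);
        }
        return live;
    }

    public static void main(String[] args) {
        long[] uidOfDoc = {1001, 1002, 1003, 1004, 1005}; // docID -> application UID
        int[] hits = {0, 1, 2, 3, 4};
        Set<Long> deletedUids = new HashSet<>(List.of(1002L, 1004L));

        // Lucene-style: resolve deleted UIDs to docIDs once, at reopen time.
        BitSet deletedDocIds = new BitSet();
        for (int doc = 0; doc < uidOfDoc.length; doc++)
            if (deletedUids.contains(uidOfDoc[doc])) deletedDocIds.set(doc);

        // Zoie-style: keep the UIDs and build a Bloom filter over them.
        BloomFilter bloom = new BloomFilter(1 << 10, 3);
        for (long uid : deletedUids) bloom.add(uid);

        System.out.println(filterByDocIdBits(hits, deletedDocIds));               // [0, 2, 4]
        System.out.println(filterByUidBloom(hits, uidOfDoc, bloom, deletedUids)); // [0, 2, 4]
    }
}
```

    Both paths return the same live documents; the difference is the per-hit cost. The Bloom filter path hashes each UID several times and does an exact lookup on every "maybe", while the bit-set path does a single bit test, which is why McCandless prefers paying the resolution cost once at reopen time.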

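    In short, Zoie's immediate consistency and its deferred resolution of deletions make its search response times longer than Lucene's. Zoie's specialized design also adds code complexity, which makes bugs hard to track down, and for users the documentation is scarce and reading the code takes a lot of time and effort. I suspect this is one reason it never caught on. Search scenarios with updates as frequent as LinkedIn's are rare; in the more common case Lucene NRT is fully up to the job, so I honestly think CNTV and NetEase could do without Zoie...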
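    The consistency trade-off can also be made concrete. The hypothetical class below (plain Java, no Lucene API) compares reopening the searcher after every change, the immediate consistency McCandless attributes to Zoie, with reopening at most once per staleness window. Time is passed in explicitly so the simulation is deterministic; the policy and all names are illustrative assumptions, not Lucene's or Zoie's actual code.

```java
/**
 * Hypothetical sketch of two reopen policies for a near-real-time searcher.
 * "Reopen" stands in for the expensive step of opening a new point-in-time view.
 */
public class ReopenPolicy {
    private final long maxStalenessMillis; // 0 = reopen after every change (immediate consistency)
    private long lastReopenMillis;
    private boolean dirty;                 // true if changes arrived since the last reopen
    private int reopenCount;

    ReopenPolicy(long maxStalenessMillis, long nowMillis) {
        this.maxStalenessMillis = maxStalenessMillis;
        this.lastReopenMillis = nowMillis;
    }

    /** Records an index change (add, update or delete). */
    void onChange() {
        dirty = true;
    }

    /** Called before each query: reopen only if there are unseen changes and the view is old enough. */
    void beforeSearch(long nowMillis) {
        if (dirty && nowMillis - lastReopenMillis >= maxStalenessMillis) {
            reopenCount++; // a real system would open a new reader here
            lastReopenMillis = nowMillis;
            dirty = false;
        }
    }

    int reopenCount() {
        return reopenCount;
    }

    /** Simulates one change followed by one query every millisecond for the given duration. */
    static int simulate(long maxStalenessMillis, int millis) {
        ReopenPolicy p = new ReopenPolicy(maxStalenessMillis, 0);
        for (long t = 1; t <= millis; t++) {
            p.onChange();
            p.beforeSearch(t);
        }
        return p.reopenCount();
    }

    public static void main(String[] args) {
        System.out.println("immediate consistency: " + simulate(0, 1000) + " reopens");
        System.out.println("max 100 ms staleness:  " + simulate(100, 1000) + " reopens");
    }
}
```

    With one change per simulated millisecond over one second, simulate(0, 1000) performs 1000 reopens while simulate(100, 1000) performs 10; in exchange, a query under the second policy may miss up to about 100 ms of recent updates.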

  • Original article: https://www.cnblogs.com/nanpo/p/2731713.html