• [ Solr extension ] Different ways to implement autosuggest using SOLR


    Reposted from: http://knowlspace.wordpress.com/2011/06/15/different-ways-to-implement-autosuggest-using-solr/


    There are currently five techniques that can be used to implement auto-suggest functionality:

    1- The TermsComponent
    2- Facet Prefixes
    3- The new Suggester component
    4- Edge N-Grams
    5- Wildcard queries.

    TermsComponent

    Implementing autosuggest with the TermsComponent is probably the easiest way of doing it. The TermsComponent is a low-level Solr component that returns all the terms indexed for one field across all the documents in the index. It also accepts a parameter called “terms.prefix”, which restricts the returned terms to those that start with that prefix. So, using this component for autosuggest is as easy as querying it with “terms.prefix” set to the text entered by the user.
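
    As a minimal sketch of such a request, the Python snippet below only builds the URL and does not contact a server. The base URL, the “/terms” handler path and the field name “name” are assumptions for illustration; the handler path depends on how the component is registered in solrconfig.xml.

```python
from urllib.parse import urlencode


def terms_suggest_url(base_url, field, prefix, limit=10):
    """Build a TermsComponent request URL for a prefix typed by the user.

    base_url and the "/terms" handler path are assumptions; adjust them
    to match the handler registered in solrconfig.xml.
    """
    params = {
        "terms": "true",
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # text entered by the user so far
        "terms.limit": limit,    # maximum number of terms to return
        "wt": "json",
    }
    return base_url + "/terms?" + urlencode(params)


url = terms_suggest_url("http://localhost:8983/solr", "name", "vos")
print(url)
```

    The prefix is sent exactly as typed, which is why the case mismatch described next matters.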

    Unfortunately, this has significant limitations. First, the component returns the “indexed” terms rather than the stored values, so an extra field with no analysis should be used for it. That leads to a second problem: without analysis, matching is case-sensitive. If the indexed term is “Vostro” and the user enters “vos” (in lowercase), the TermsComponent won’t return “Vostro”, because it starts with an uppercase letter.

    Faceting to suggest
    Faceting is sometimes used for autosuggest. The idea is similar to the TermsComponent approach, as faceting also has a “facet.prefix” parameter. By faceting on a field that contains the product names and passing the user-entered text as facet.prefix, the returned facets can serve as the suggestions. Unfortunately, this approach suffers from the same problems as the TermsComponent approach.
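
    A facet-based suggestion request might look like the sketch below, which again only builds the URL. The base URL, the “/select” handler and the field name “name” are assumptions; rows=0 is used here because only the facet counts are of interest, not the matching documents.

```python
from urllib.parse import urlencode


def facet_suggest_url(base_url, field, prefix, limit=10):
    """Build a faceting request whose facet values act as suggestions.

    base_url, the "/select" handler and the field name are assumptions.
    """
    params = {
        "q": "*:*",              # match every document
        "rows": 0,               # no documents needed, only facets
        "facet": "true",
        "facet.field": field,    # field that holds the product names
        "facet.prefix": prefix,  # text entered by the user so far
        "facet.limit": limit,    # maximum number of facet values
        "facet.mincount": 1,     # skip values with no matching document
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


facet_url = facet_suggest_url("http://localhost:8983/solr", "name", "vos")
print(facet_url)
```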

    Suggester
    This is a new component available in version 3.1 of Solr. Suggester reuses much of the SpellCheckComponent infrastructure, so it also reuses many common SpellCheck parameters, such as spellcheck=true or spellcheck.build=true. It is configured in solrconfig.xml in a very similar way as well. Technically it is a spellchecker, but instead of correcting misspelled words it returns a list of suggested words.

    It was developed with performance and versatility in mind. The other approaches were not designed as suggestion components in the first place; they are components that happen to be usable for the autosuggest use case. The Suggester, by contrast, was built from scratch for this purpose. It obtains its suggestions from an external dictionary or from a field.
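
    Because the Suggester is driven through the SpellCheck parameters, a request can be sketched as follows. The “/suggest” handler path and the dictionary name “suggest” are assumptions that must match whatever is configured in solrconfig.xml; the snippet only builds the URL.

```python
from urllib.parse import urlencode


def suggester_url(base_url, prefix, dictionary="suggest", count=5, build=False):
    """Build a Suggester request using the SpellCheck parameters.

    The "/suggest" handler and the dictionary name are assumptions; they
    must match the names used in solrconfig.xml.
    """
    params = {
        "spellcheck": "true",
        "spellcheck.dictionary": dictionary,
        "spellcheck.q": prefix,     # text entered by the user so far
        "spellcheck.count": count,  # maximum number of suggestions
        "wt": "json",
    }
    if build:
        # Ask Solr to (re)build the suggestion dictionary on this request.
        params["spellcheck.build"] = "true"
    return base_url + "/suggest?" + urlencode(params)


suggest_url = suggester_url("http://localhost:8983/solr", "vos")
rebuild_url = suggester_url("http://localhost:8983/solr", "vos", build=True)
print(suggest_url)
```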

    Edge Ngrams
    Edge Ngrams are the prefixes of a term, that is, the substrings that start at its first letter. For example, the Edge Ngrams of the term “house” are “h”, “ho”, “hou”, “hous” and “house”.
    The idea is to associate each of these Ngrams with the full word. Usually this is accomplished with a dedicated suggestion field that has its own analysis chain. Suggesting text with multiple words is easy with this approach.
    For this example, the user is searching for discs, and the system should recommend “Dark side of the moon” when the user begins to type “side”. To do this, the schema of the recommendation index would include an Edge Ngrams field, that is, a field whose analysis chain has at least the following components:

    Whitespace tokenizer
    Lowercase filter
    Edge Ngrams filter

    Applying this chain to the title of that disc will produce:
    Original text: Dark side of the moon
    Whitespace tokenizer: Dark | side | of | the | moon
    Lowercase filter: dark | side | of | the | moon
    EdgeNgrams filter: d | da | dar | dark | s | si | sid | side | o | of | t | th | the | m | mo | moo | moon
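
    The three steps above can be imitated in a few lines of Python. This is only a simulation of the analysis chain, not Solr code; the function names and the min_gram/max_gram defaults are chosen for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the prefixes of token, from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze_for_index(text):
    """Simulate: whitespace tokenizer -> lowercase filter -> edge n-gram filter."""
    tokens = text.split()                 # whitespace tokenizer
    tokens = [t.lower() for t in tokens]  # lowercase filter
    grams = []
    for token in tokens:                  # edge n-gram filter
        grams.extend(edge_ngrams(token))
    return grams


grams = analyze_for_index("Dark side of the moon")
print(" | ".join(grams))
# d | da | dar | dark | s | si | sid | side | o | of | t | th | the | m | mo | moo | moon
```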

    The best way of implementing this approach for this example is to add an extra field named “edge_title” or similar, indexed with the analysis chain described above (it does not need to be stored if the title is already stored in another field). The auto-suggest should issue queries like:
    …&q=edge_title:[user-entered-text]&fl=title
    The query analysis chain should be the same as the one used at index time, except for the EdgeNgrams filter, which should not be applied at query time.
    The drawback of this approach is disk space usage: with edge n-grams, the index will grow significantly.
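
    To see why the EdgeNgrams filter is applied only at index time, the sketch below simulates a tiny in-memory inverted index in Python. It is not Solr code; the sample titles and all function names are made up for illustration, and requiring every query term to match is an assumption about how the query is combined.

```python
from collections import defaultdict


def index_analyze(text):
    """Index-time chain: whitespace tokenizer, lowercase, edge n-grams."""
    grams = []
    for token in text.lower().split():
        grams.extend(token[:n] for n in range(1, len(token) + 1))
    return grams


def query_analyze(text):
    """Query-time chain: same as index time but WITHOUT edge n-grams."""
    return text.lower().split()


def build_index(titles):
    """Map every indexed n-gram to the ids of the titles that contain it."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for gram in index_analyze(title):
            index[gram].add(doc_id)
    return index


def suggest(index, titles, user_text):
    """Return titles where every query term equals some indexed n-gram."""
    terms = query_analyze(user_text)
    if not terms:
        return []
    matching = None
    for term in terms:
        ids = index.get(term, set())
        matching = ids if matching is None else matching & ids
    return [titles[i] for i in sorted(matching)]


titles = ["Dark side of the moon", "The wall", "Wish you were here"]
index = build_index(titles)
print(suggest(index, titles, "side"))  # ['Dark side of the moon']
print(suggest(index, titles, "W"))     # ['The wall', 'Wish you were here']
```

    If n-grams were also generated at query time, “side” would expand to “s”, “si”, “sid” and “side”, and the single letter “s” could match any title containing a word that starts with “s”.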

    Execute Wildcard queries
    This approach queries the index with the user-entered text plus a wildcard. There are two problems with it. First, wildcard queries are not as fast as regular queries. Autosuggest must be fast, and with a relatively large index this approach probably won’t achieve the necessary speed.
    The other big issue with this approach is the analysis. When a query contains wildcards, Solr doesn’t analyze it. So, if there is a small difference between the text entered by the user and the indexed text (case, etc.), Solr won’t suggest that document, even when the user enters the text correctly. In the first example, if the user enters “Vostro” or “Dell”, Solr won’t suggest “Dell Vostro”, as that field was lowercased at index time.
    One advantage of this approach over all the others is that when the user enters a part of a word that is not its beginning, like “str”, “Dell Vostro” can still be suggested.
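
    Both the analysis problem and the infix advantage can be reproduced with a small Python simulation. It is not Solr code: the indexed terms are lowercased to mimic index-time analysis, while the wildcard pattern is matched exactly as typed. The product names and function names are illustrative, and fnmatchcase is used here only as a stand-in for wildcard matching.

```python
from fnmatch import fnmatchcase


def index_terms(text):
    """Index-time analysis: split on whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


def wildcard_suggest(docs, pattern):
    """Match the pattern as typed (no analysis) against the indexed terms."""
    return [
        doc
        for doc in docs
        if any(fnmatchcase(term, pattern) for term in index_terms(doc))
    ]


docs = ["Dell Vostro", "Dell Inspiron"]
print(wildcard_suggest(docs, "Vos*"))   # [] because the indexed term is "vostro"
print(wildcard_suggest(docs, "vos*"))   # ['Dell Vostro']
print(wildcard_suggest(docs, "*str*"))  # ['Dell Vostro'], an infix match
```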

  • Original address: https://www.cnblogs.com/huangfox/p/2350738.html