• Elasticsearch search: an analysis of most_fields


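         As the name suggests, most_fields gives a document a higher score the more of its fields match the query terms; a per-field weight (boost) can also be set.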

         A simplified formula follows (for the detailed scoring algorithm, see: http://m.blog.csdn.net/article/details?id=50623948):

         score = match_field1_score * boost1 + match_field2_score * boost2 + ... + match_fieldN_score * boostN
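The simplified formula can be sketched in a few lines of Python. This only illustrates the sum above and is not Elasticsearch's actual scoring code. The helper names `parse_field_spec` and `most_fields_score` and the per-field scores in the usage lines are made up for the example. The `field^boost` notation is the syntax that multi_match accepts in its `fields` list.

```python
def parse_field_spec(spec):
    """Split a multi_match field spec such as 'title^2' into (name, boost)."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


def most_fields_score(field_specs, match_scores):
    """Sum boost * score over the fields that matched (simplified model)."""
    total = 0.0
    for spec in field_specs:
        name, boost = parse_field_spec(spec)
        # Fields that did not match contribute nothing to the sum.
        total += boost * match_scores.get(name, 0.0)
    return total


# Hypothetical per-field scores for one document.
example = most_fields_score(["title^2", "address", "tags"],
                            {"title": 0.5, "address": 0.25})
print(example)  # 1.25
```

Every additional matching field adds to the total, which is why a document that repeats the same term across several fields can outscore one that matches well in a single field.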

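         In many cases this kind of search works well, but it has a weakness: documents that repeat the same information across many fields push down the scores of documents that are concise yet complete in meaning.

         Both operator and minimum_should_match are applied to each field individually, so they cannot be used to cut the long tail of low-relevance docs. Put simply, the document with the most term matches wins.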
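A small sketch shows why minimum_should_match does little to trim the tail for short queries. A percentage value is converted to a term count by rounding down, and with most_fields that count is checked against each field's own query. The function name `required_matches` is hypothetical, and only positive percentages are handled here.

```python
def required_matches(term_count, percent):
    """Terms that must match for a positive percentage minimum_should_match."""
    # Integer division rounds the computed count down.
    return term_count * percent // 100


print(required_matches(3, 70))  # 2
print(required_matches(2, 70))  # 1
```

With a two-term query and "70%", a field that contains only one of the two terms still satisfies the constraint.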

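        For example:

        Search for the keyword “北京东路” (Beijing East Road). The tokenization result below shows that its terms are “北京” and “东路” (the extra `text` token appears to come from the JSON key in the request body being analyzed along with the value):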

    curl   'localhost:9200/fullbiz_index/_analyze?analyzer=ik_smart&pretty=true' -d '{"text":"北京东路"}'
    {
       "tokens" : [
          {
             "token" : "text",
             "start_offset" : 2,
             "end_offset" : 6,
             "type" : "ENGLISH",
             "position" : 1
          },
          {
             "token" : "北京",
             "start_offset" : 9,
             "end_offset" : 11,
             "type" : "CN_WORD",
             "position" : 2
          },
          {
             "token" : "东路",
             "start_offset" : 11,
             "end_offset" : 13,
             "type" : "CN_WORD",
             "position" : 3
          }
       ]
    }
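As a small illustration, the term list can be pulled out of the _analyze response programmatically. The sketch below embeds a trimmed copy of the response shown above and keeps only the CN_WORD tokens, dropping the stray `text` token. The variable names are arbitrary.

```python
import json

# Token list from the _analyze response above, trimmed to the fields used here.
response = json.loads("""
{"tokens": [
  {"token": "text", "type": "ENGLISH", "position": 1},
  {"token": "北京", "type": "CN_WORD", "position": 2},
  {"token": "东路", "type": "CN_WORD", "position": 3}
]}
""")

# Keep only the Chinese word tokens produced by ik_smart.
terms = [t["token"] for t in response["tokens"] if t["type"] == "CN_WORD"]
print(terms)  # ['北京', '东路']
```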
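    # minimum_should_match is the minimum share of query terms that must match; the
    # computed count is rounded down. With three terms, 70% gives 2.1, so two matching
    # terms are enough; with two terms it gives 1.4, so one is enough. "北京东路"
    # therefore scores if either "北京" or "东路" matches.
    # analyzer: ik has two modes, ik_max_word (finest-grained segmentation) and
    # ik_smart (coarsest-grained segmentation); the second is used here because it
    # is closer to the business expectation.
    curl  'localhost:9200/fullbiz1/fullbizinfo/_search?pretty' -d '
    {
      "from" : 0,
      "size" : 20,
      "query" : {
        "multi_match" : {
          "query" : "北京东路",
          "fields" : [ "title", "highlight", "tags", "address", "businessDistrict", "cuisineStyle" ],
          "type" : "most_fields",
          "minimum_should_match" : "70%",
          "analyzer" : "ik_smart"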
        }
      },
      "post_filter" : {
        "bool" : {
          "must" : [ {
            "term" : {
              "status" : 0
            }
          }, {
            "term" : {
              "hostDisplay" : 1
            }
          }, {
            "term" : {
              "cityId" : 2
            }
          }, {
            "term" : {
              "productType" : 3
            }
          } ]
        }
      }
    }'
     
        "hits" : [ {
          "_index" : "fullbiz1",
          "_type" : "fullbizinfo",
          "_id" : "324239",
          "_score" : 0.33371,
          "_source":{"boost":1,"productId":24239,"productType":3,"subType":2,"title":"城市公牛(南京东路店)","viceTitle":"城市公牛(南京东路店)","personMax":"-1","personMin":"-1","picUrl":"meal/2016/08/11/1470892987880.jpg","recommand":-1,"needReserveTime":-1,"priceStr":"-1","price":"-1","originalPrice":"-1","leadingMinutes":-1,"tags":null,"status":0,"isFree":-1,"duration":"10:00:00-22:30:00","onlineTime":1470280723,"updateTime":1486951326,"applyExpiredTime":0,"beginTime":0,"endTime":0,"isCourse":-1,"isTour":-1,"supportParty":0,"interestedNum":0,"cityId":2,"cityName":"上海","categoryId":"0","categoryName":"","categoryIconUrl":"","businessDistrict":"南京东路","businessDistrictId":73,"hostId":24239,"contactNumber":"13764741956","hostName":"城市公牛(南京东路店)","address":"南京东路300号L221-222室(河南中路口)","hostDisplay":1,"hostPicUrl":"meal/2016/08/11/1470892987880.jpg","hostSharePicUrl":"meal/2016/08/11/1470892987880.jpg","hostLatitude":"31.243455970586","hostLongitude":"121.49099099941","location":{"lat":"31.243455970586","lon":"121.49099099941"},"hostLatitudeGD":"31.237701","hostLongitudeGD":"121.484409","locationGD":{"lat":"31.237701","lon":"121.484409"},"headPics":"","catalogIds":null,"cuisineStyleId":41,"cuisineStyle":"西餐","hideMask":0,"referenceAgeMin":0,"referenceAgeMax":0,"userLimit":-1,"todayReservable":1,"orderNums":3,"pvConversionRate":"-1","interestNums":0,"hotPoints":0,"hostAvgPrice":16000,"hostProductLabelIds":",1,2,4,5,7,8,9,12,13,14,15,","shopPay":0,"hostVipEquities":"0","isHostSale":0,"highlight":"["2010年世博会加拿大馆特约餐厅","加拿大简约西部乡村风格小酒馆餐厅","家庭式的用餐氛围 80%均是外国食客"]","isSeatBook":1,"lastUTCTimestamp":"2017-02-13T10:02:06.000+08:00"}
        }, {
          "_index" : "fullbiz1",
          "_type" : "fullbizinfo",
          "_id" : "392659",
          "_score" : 0.31962717,
          "_source":{"boost":1,"productId":92659,"productType":3,"subType":4,"title":"THAIBEAUTY美容连锁机构(南京东路店)","viceTitle":"THAIBEAUTY美容连锁机构(南京东路店)","personMax":"-1","personMin":"-1","picUrl":"hostInfo/2017/01/11/1484121279773528.jpg","recommand":-1,"needReserveTime":-1,"priceStr":"-1","price":"-1","originalPrice":"-1","leadingMinutes":-1,"tags":"","status":0,"isFree":-1,"duration":null,"onlineTime":1484121281,"updateTime":1484202471,"applyExpiredTime":0,"beginTime":0,"endTime":0,"isCourse":-1,"isTour":-1,"supportParty":0,"interestedNum":0,"cityId":2,"cityName":"上海","categoryId":"0","categoryName":"","categoryIconUrl":"","businessDistrict":"南京东路","businessDistrictId":73,"hostId":92659,"contactNumber":"021-63511876","hostName":"THAIBEAUTY美容连锁机构(南京东路店)","address":"南京东路580号6楼","hostDisplay":1,"hostPicUrl":"hostInfo/2017/01/11/1484121279773528.jpg","hostSharePicUrl":"hostInfo/2017/01/11/1484121279773528.jpg","hostLatitude":"31.241721400027","hostLongitude":"121.48585125776","location":{"lat":"31.241721400027","lon":"121.48585125776"},"hostLatitudeGD":"31.235887","hostLongitudeGD":"121.479289","locationGD":{"lat":"31.235887","lon":"121.479289"},"headPics":"","catalogIds":null,"cuisineStyleId":0,"cuisineStyle":"美容/SPA","hideMask":-1,"referenceAgeMin":0,"referenceAgeMax":0,"userLimit":-1,"todayReservable":0,"orderNums":0,"pvConversionRate":"-1","interestNums":0,"hotPoints":0,"hostAvgPrice":284500,"hostProductLabelIds":",60,","shopPay":0,"hostVipEquities":"0","isHostSale":0,"highlight":"["高端局部瘦身","环境舒适 按摩师手法专业","使用高品质产品"]","isSeatBook":1,"lastUTCTimestamp":"2017-01-12T14:27:51.000+08:00"}
        }, {
          "_index" : "fullbiz1",
          "_type" : "fullbizinfo",
          "_id" : "364804",
          "_score" : 0.31002828,
          "_source":{"boost":1,"productId":64804,"productType":3,"subType":2,"title":"斗牛士(南京东路店)","viceTitle":"斗牛士(南京东路店)","personMax":"-1","personMin":"-1","picUrl":"hostInfo/2016/12/26/1482718008927949.png","recommand":-1,"needReserveTime":-1,"priceStr":"-1","price":"-1","originalPrice":"-1","leadingMinutes":-1,"tags":"","status":0,"isFree":-1,"duration":null,"onlineTime":1482718014,"updateTime":1486569730,"applyExpiredTime":0,"beginTime":0,"endTime":0,"isCourse":-1,"isTour":-1,"supportParty":0,"interestedNum":0,"cityId":2,"cityName":"上海","categoryId":"0","categoryName":"","categoryIconUrl":"","businessDistrict":"南京东路","businessDistrictId":73,"hostId":64804,"contactNumber":"021-33317136","hostName":"斗牛士(南京东路店)","address":"南京东路353号悦荟广场(原353店)7F","hostDisplay":1,"hostPicUrl":"hostInfo/2016/12/26/1482718008927949.png","hostSharePicUrl":"hostInfo/2016/12/26/1482718008927949.png","hostLatitude":"31.24210523683","hostLongitude":"121.49020262932","location":{"lat":"31.24210523683","lon":"121.49020262932"},"hostLatitudeGD":"31.236339","hostLongitudeGD":"121.483623","locationGD":{"lat":"31.236339","lon":"121.483623"},"headPics":"","catalogIds":null,"cuisineStyleId":41,"cuisineStyle":"西餐","hideMask":-1,"referenceAgeMin":0,"referenceAgeMax":0,"userLimit":-1,"todayReservable":0,"orderNums":0,"pvConversionRate":"-1","interestNums":0,"hotPoints":0,"hostAvgPrice":12200,"hostProductLabelIds":",1,","shopPay":0,"hostVipEquities":"0","isHostSale":0,"highlight":"["精选进口澳洲安格斯牛排","严控0度低温 保证牛肉鲜嫩","进口原切牛排保证牛肉口感与外观"]","isSeatBook":1,"lastUTCTimestamp":"2017-02-09T00:02:10.000+08:00"}
    .....
          "_index" : "fullbiz1",
          "_type" : "fullbizinfo",
          "_id" : "353771",
          "_score" : 0.7784657,
          "_source":{"boost":1,"productId":53771,"productType":3,"subType":2,"title":"九储堂创意中国菜(外滩店)","viceTitle":"九储堂创意中国菜(外滩店)","personMax":"-1","personMin":"-1","picUrl":"hostInfo/2016/12/26/1482744127546461.jpg","recommand":-1,"needReserveTime":-1,"priceStr":"-1","price":"-1","originalPrice":"-1","leadingMinutes":-1,"tags":"","status":0,"isFree":-1,"duration":null,"onlineTime":1482744132,"updateTime":1486738928,"applyExpiredTime":0,"beginTime":0,"endTime":0,"isCourse":-1,"isTour":-1,"supportParty":0,"interestedNum":0,"cityId":2,"cityName":"上海","categoryId":"0","categoryName":"","categoryIconUrl":"","businessDistrict":"外滩","businessDistrictId":71,"hostId":53771,"contactNumber":"021-63308900","hostName":"九储堂创意中国菜(外滩店)","address":"北京东路398号新协通国际大酒店18楼","hostDisplay":1,"hostPicUrl":"hostInfo/2016/12/26/1482744127546461.jpg","hostSharePicUrl":"hostInfo/2016/12/26/1482744127546461.jpg","hostLatitude":"31.246247363994","hostLongitude":"121.48894308136","location":{"lat":"31.246247363994","lon":"121.48894308136"},"hostLatitudeGD":"31.240463","hostLongitudeGD":"121.48237","locationGD":{"lat":"31.240463","lon":"121.48237"},"headPics":"","catalogIds":null,"cuisineStyleId":25,"cuisineStyle":"创意菜","hideMask":-1,"referenceAgeMin":0,"referenceAgeMax":0,"userLimit":-1,"todayReservable":0,"orderNums":0,"pvConversionRate":"-1","interestNums":0,"hotPoints":0,"hostAvgPrice":19100,"hostProductLabelIds":",1,","shopPay":0,"hostVipEquities":"0","isHostSale":0,"highlight":"["新加坡同乐餐饮总厨胡于保先生主理","大厅可容纳150人的宴会 包房5间","靠窗座位亦可欣赏浦江两岸美景"]","isSeatBook":1,"lastUTCTimestamp":"2017-02-10T23:02:08.000+08:00"}

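    Yet in the results the document that actually contains the full “北京东路” is ranked further down, which seems wrong. Why does this happen? Let's use explain to look at how the score is calculated:

     curl  'localhost:9200/fullbiz1/fullbizinfo/_search?pretty&explain'  .... (the rest is omitted; it is the same as the request above, with only explain added and size limited to the first hit. The output is large, so only one document is analyzed. The scoring part follows:)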

          "_explanation" : {
            "value" : 0.33371,
            "description" : "product of:",
            "details" : [ {
              "value" : 0.66742,
              "description" : "sum of:",
              "details" : [ {
                "value" : 0.28481156,
                "description" : "product of:",
                "details" : [ {
                  "value" : 0.5696231,
                  "description" : "sum of:",
                  "details" : [ {
                    "value" : 0.5696231,
                    "description" : "weight(title:东路 in 7321) [PerFieldSimilarity], result of:",
                    "details" : [ {
                      "value" : 0.5696231,
                      "description" : "score(doc=7321,freq=1.0), product of:",
                      "details" : [ {
                        "value" : 0.25448462,
                        "description" : "queryWeight, product of:",
                        "details" : [ {
                          "value" : 7.1626873,
                          "description" : "idf(docFreq=244, maxDocs=116302)"
                        }, {
                          "value" : 0.03552921,
                          "description" : "queryNorm"
                        } ]
                      }, {
                        "value" : 2.23834,
                        "description" : "fieldWeight in 7321, product of:",
                        "details" : [ {
                          "value" : 1.0,
                          "description" : "tf(freq=1.0), with freq of:",
                          "details" : [ {
                            "value" : 1.0,
                            "description" : "termFreq=1.0"
                          } ]
                        }, {
                          "value" : 7.1626873,
                          "description" : "idf(docFreq=244, maxDocs=116302)"
                        }, {
                          "value" : 0.3125,
                          "description" : "fieldNorm(doc=7321)"
                        } ]
                      } ]
                    } ]
                  } ]
                }, {
                  "value" : 0.5,
                  "description" : "coord(1/2)"
                } ]
              }, {
                "value" : 0.067192085,
                "description" : "product of:",
                "details" : [ {
                  "value" : 0.13438417,
                  "description" : "sum of:",
                  "details" : [ {
                    "value" : 0.13438417,
                    "description" : "weight(address:东路 in 7321) [PerFieldSimilarity], result of:",
                    "details" : [ {
                      "value" : 0.13438417,
                      "description" : "score(doc=7321,freq=1.0), product of:",
                      "details" : [ {
                        "value" : 0.1477382,
                        "description" : "queryWeight, product of:",
                        "details" : [ {
                          "value" : 4.158218,
                          "description" : "idf(docFreq=4942, maxDocs=116302)"
                        }, {
                          "value" : 0.03552921,
                          "description" : "queryNorm"
                        } ]
                      }, {
                        "value" : 0.90961015,
                        "description" : "fieldWeight in 7321, product of:",
                        "details" : [ {
                          "value" : 1.0,
                          "description" : "tf(freq=1.0), with freq of:",
                          "details" : [ {
                            "value" : 1.0,
                            "description" : "termFreq=1.0"
                          } ]
                        }, {
                          "value" : 4.158218,
                          "description" : "idf(docFreq=4942, maxDocs=116302)"
                        }, {
                          "value" : 0.21875,
                          "description" : "fieldNorm(doc=7321)"
                        } ]
                      } ]
                    } ]
                  } ]
                }, {
                  "value" : 0.5,
                  "description" : "coord(1/2)"
                } ]
              }, {
                "value" : 0.3154164,
                "description" : "product of:",
                "details" : [ {
                  "value" : 0.6308328,
                  "description" : "sum of:",
                  "details" : [ {
                    "value" : 0.6308328,
                    "description" : "weight(businessDistrict:东路 in 7321) [PerFieldSimilarity], result of:",
                    "details" : [ {
                      "value" : 0.6308328,
                      "description" : "score(doc=7321,freq=1.0), product of:",
                      "details" : [ {
                        "value" : 0.22633977,
                        "description" : "queryWeight, product of:",
                        "details" : [ {
                          "value" : 6.3705263,
                          "description" : "idf(docFreq=540, maxDocs=116302)"
                        }, {
                          "value" : 0.03552921,
                          "description" : "queryNorm"
                        } ]
                      }, {
                        "value" : 2.7871053,
                        "description" : "fieldWeight in 7321, product of:",
                        "details" : [ {
                          "value" : 1.0,
                          "description" : "tf(freq=1.0), with freq of:",
                          "details" : [ {
                            "value" : 1.0,
                            "description" : "termFreq=1.0"
                          } ]
                        }, {
                          "value" : 6.3705263,
                          "description" : "idf(docFreq=540, maxDocs=116302)"
                        }, {
                          "value" : 0.4375,
                          "description" : "fieldNorm(doc=7321)"
                        } ]
                      } ]
                    } ]
                  } ]
                }, {
                  "value" : 0.5,
                  "description" : "coord(1/2)"
                } ]
              } ]
            }, {
              "value" : 0.5,
              "description" : "coord(3/6)"
            } ]
          }
        } ]
      }
    }

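    Judging from the analysis above, the documents containing “南京东路” (Nanjing East Road) rank first not because they match the query better but because more of their fields match: “东路” is found in title, address and businessDistrict, and the three partial scores are summed. Their total therefore exceeds that of the document below, in which only one field contains “北京东路”.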
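The arithmetic in the explain output can be retraced by hand. The sketch below copies the idf, queryNorm and fieldNorm values from the output above and recombines them in the same way (queryWeight times fieldWeight, then the coord factors). The function name `term_score` is only a label for this example; the structure follows what explain prints, not the Lucene source.

```python
def term_score(idf, query_norm, tf, field_norm):
    """Term score as laid out in the explain output: queryWeight * fieldWeight."""
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


QUERY_NORM = 0.03552921
# (field, idf, fieldNorm) copied from the explain output; tf is 1.0 everywhere.
fields = [
    ("title", 7.1626873, 0.3125),
    ("address", 4.158218, 0.21875),
    ("businessDistrict", 6.3705263, 0.4375),
]

# Each field matched 1 of the 2 query terms, hence coord(1/2) per field.
per_field = {name: term_score(idf, QUERY_NORM, 1.0, norm) * (1 / 2)
             for name, idf, norm in fields}
# 3 of the 6 queried fields matched, hence the outer coord(3/6).
total = sum(per_field.values()) * (3 / 6)
print(round(total, 5))  # 0.33371, the _score of the first hit
```

None of the three fields matched “北京”; the score comes entirely from “东路” appearing three times.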

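    Summary: most_fields suits searches in which the fields carry clearly different information. It is a poor fit for data like the above, where “东路” appears in the title and again in the business district and address, so the fields are highly redundant.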

  • Original article: https://www.cnblogs.com/clonen/p/6674946.html