• Elasticsearch(全文搜索)


    前言

    收集大量的日志信息之后,把这些日志存放在哪里?才能对其日志内容进行搜素呢?MySQL?

    如果MySQL里存储了1000W条这样的数据,每条记录的details字段有128个字。

    用户想要查询details字段包含“ajax”这个关键词的记录。

    MySQL执行

    select * from logtable where details like "%ajax%";

    每次执行这条SQL语句,都需要逐一查询logtable中每条记录,最头痛的是找到这条记录之后,每次还要对这条记录中details字段里的文本内容进行全文扫描。

    判断这个当前记录中的details字段是的内容否包含 “ajax”?有可能会查询 10000w*128次.

    如果用户想搜素 “ajax”拼错了拼成了“ajxa”,这个sql无法搜素到用户想要的信息。因为不支持尝试把用户输入的错别字"ajxa"拆分开使用‘a‘,‘j‘,‘x‘,'a' 去尽可能多的匹配我想要的信息

    所以想要支持搜素details字段的Text内容的情况下,把海量的日志信息存在MySQL中是不太合理的。

    Elasticsearch简介

    1.倒排索引

    倒排索引是一种索引数据结构:从文本数据内容中提取出不重复的单词进行分词,每1个单词对应1个ID对单词进行区分,还对应1个该单词在那些文档中出现的列表 把这些信息组建成索引。

    倒排索引还记录了该单词在文档中出现位置、频率(次数/TF)用于快速定位文档和对搜素结果进行排序。

    (出现在文档1,<11位置>频率1次)
    (1,<11>,1),(2,<7>,1),(3,<3,9>,2)

    2.全文检索

    全文检索:把用户输入的关键词也进行分词,利用倒排索引,快速锁定关键词出现在那些文档。

    说白了就是根据value查询key(根据文档中内容关键字,找到该该关键字所在的文档的)而非根据key查询value。

    3.Lucene

    Lucene是apache软件基金会4 jakarta项目组的一个java子项目,是一个开放源代码的全文检索引擎JAR包。帮助我我们实现了以上的需求。

    lucene实现倒排索引之后,那么海量的数据如何分布式存储?如何高可用?集群节点之间如何管理?这是Elasticsearch实现的功能。

    常说的ELK是Elasticsearch(全文搜素)+Logstash(内容收集)+Kibana(内容展示)三大开源框架首字母大写简称。

    本文主要简单的介绍Elaticsearch,Elasticsearch是一个基于Lucene的分布式、高性能、可伸缩的搜素和分析系统,它提供了RESTful web API。

    Elasticsearch安装 

    官网下载

    ES的版本和Kibana的版本必须一致,官网下载比较慢,还好有好心人

    系统配置

    vm.max_map_count = 655360
    vim /etc/sysctl.conf

    权限
     chown -R elsearch:elsearch /data/elastic-search
    

     安全

    /etc/security/limits.conf
    [root@zhanggen config]# java -version
    openjdk version "1.8.0_102"
    OpenJDK Runtime Environment (build 1.8.0_102-b14)
    OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode)
    
    [root@zhanggen config]# cat /etc/security/limits.conf
    * soft nofile 65536
    * hard nofile 131072
    * soft nproc 4096
    * hard nproc 4096
    

      

    es配置文件

    # ======================== Elasticsearch Configuration =========================
    #
    # NOTE: Elasticsearch comes with reasonable defaults for most settings.
    #       Before you set out to tweak and tune the configuration, make sure you
    #       understand what are you trying to accomplish and the consequences.
    #
    # The primary way of configuring a node is via this file. This template lists
    # the most important settings you may want to configure for a production cluster.
    #
    # Please consult the documentation for further information on configuration options:
    # https://www.elastic.co/guide/en/elasticsearch/reference/index.html
    #
    # ---------------------------------- Cluster -----------------------------------
    #
    # Use a descriptive name for your cluster:
    #
    cluster.name: my-application
    #
    # ------------------------------------ Node ------------------------------------
    #
    # Use a descriptive name for the node:
    #
    node.name: node-1
    #
    # Add custom attributes to the node:
    #
    #node.attr.rack: r1
    #
    # ----------------------------------- Paths ------------------------------------
    #
    # Path to directory where to store the data (separate multiple locations by comma):
    #
    path.data: /data/elastic-search/data/
    #
    # Path to log files:
    #
    path.logs: /data/elastic-search/log/
    #
    #
    # ----------------------------------- Memory -----------------------------------
    #
    # Lock the memory on startup:
    #
    #bootstrap.memory_lock: false
    
    #
    # Make sure that the heap size is set to about half the memory available
    # on the system and that the owner of the process is allowed to use this
    # limit.
    #
    # Elasticsearch performs poorly when the system is swapping the memory.
    #
    # ---------------------------------- Network -----------------------------------
    #
    # Set the bind address to a specific IP (IPv4 or IPv6):
    #
    network.host: 0.0.0.0
    #
    # Set a custom port for HTTP:
    #
    http.port: 9200
    #
    # For more information, consult the network module documentation.
    #
    # --------------------------------- Discovery ----------------------------------
    #
    # Pass an initial list of hosts to perform discovery when this node is started:
    # The default list of hosts is ["127.0.0.1", "[::1]"]
    #
    #discovery.seed_hosts: ["host1", "host2"]
    #
    # Bootstrap the cluster using an initial set of master-eligible nodes:
    #
    cluster.initial_master_nodes: ["node-1"]
    #
    # For more information, consult the discovery and cluster formation module documentation.
    #
    # ---------------------------------- Gateway -----------------------------------
    #
    # Block initial recovery after a full cluster restart until N nodes are started:
    #
    #gateway.recover_after_nodes: 3
    #
    # For more information, consult the gateway module documentation.
    #
    # ---------------------------------- Various -----------------------------------
    #
    # Require explicit names when deleting indices:
    #
    #action.destructive_requires_name: true
    /elasticsearch-7.3.2/config/elasticsearch.yml

    启动

    [elsearch@zhanggen /]$ ./elasticsearch-7.3.2/bin/elasticsearch

    访问

    Elasticsearch使用

    关于Elasticsearch的使用都是基于RESTful风格的API进行的

    1.查看健康状态

    http://192.168.56.135:9200/_cat/health?v

    epoch      timestamp cluster        status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
    1590655194 08:39:54  my-application green        

    2.创建索引

    {
        "acknowledged": true,
        "shards_acknowledged": true,
        "index": "web"
    }

    3.删除索引

    {
        "acknowledged": true
    }

    4.插入数据

     request body

    {
        "name":"张根",
        "age":22,
        "marrid":"false"
    }

    response body

    {
        "_index": "students",
        "_type": "go",
        "_id": "cuWEWnIBWnQK6MVivzvO",
        "_version": 1,
        "result": "created",
        "_shards": {
            "total": 2,
            "successful": 1,
            "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1
    }

    ps:也可以使用PUT方法,但是需要传入id

     reqeust body

    {
        "name":"李渊",
        "age":1402,
        "marrid":"true"
    }

    response body

    {
        "_index": "students",
        "_type": "go",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
            "total": 2,
            "successful": 1,
            "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1
    }

    5.查询all

    [root@zhanggen zhanggen]#  curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": { "match_all":{}}}'
    {
      "took" : 211,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 6,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "cuWEWnIBWnQK6MVivzvO",
            "_score" : 1.0,
            "_source" : {
              "name" : "张根",
              "age" : 22,
              "marrid" : "false"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "c-WPWnIBWnQK6MViOzt1",
            "_score" : 1.0,
            "_source" : {
              "name" : "张百忍",
              "age" : 3200,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "dOWPWnIBWnQK6MVimTsg",
            "_score" : 1.0,
            "_source" : {
              "name" : "李渊",
              "age" : 1402,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "deWQWnIBWnQK6MViazuu",
            "_score" : 1.0,
            "_source" : {
              "name" : "姜尚",
              "age" : 5903,
              "marrid" : "fale"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "duWSWnIBWnQK6MViXDtD",
            "_score" : 1.0,
            "_source" : {
              "name" : "孛儿只斤.铁木真",
              "age" : 814,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : {
              "query" : {
                "match" : {
                  "name" : "张根"
                }
              }
            }
          }
        ]
      }
    }
    返回

    6.分页查询 (from, size)

    from 偏移,默认为0,size 返回的结果数,默认为10

    [root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{
    "query": { "match_all": {} },
    "from":1,
    "size":2
    }'

    返回

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 6,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "c-WPWnIBWnQK6MViOzt1",
            "_score" : 1.0,
            "_source" : {
              "name" : "张百忍",
              "age" : 3200,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "dOWPWnIBWnQK6MVimTsg",
            "_score" : 1.0,
            "_source" : {
              "name" : "李渊",
              "age" : 1402,
              "marrid" : "true"
            }
          }
        ]
      }
    }
    View Code

     7.模糊查询字段中包含某些关键词

    [root@zhanggen zhanggen]# curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query": {"term": {"name":""}}}'

    返回

    {
      "took" : 155,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 0.8161564,
        "hits" : [
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "cuWEWnIBWnQK6MVivzvO",
            "_score" : 0.8161564,
            "_source" : {
              "name" : "张根",
              "age" : 22,
              "marrid" : "false"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "c-WPWnIBWnQK6MViOzt1",
            "_score" : 0.7083998,
            "_source" : {
              "name" : "张百忍",
              "age" : 3200,
              "marrid" : "true"
            }
          }
        ]
      }
    }
    View Code

    8.range范围查找

    范围查询接收以下参数:

    • gte:大于等于
    • gt:大于
    • lte:小于等于
    • lt:小于
    • boost:设置查询的推动值(boost),默认为1.0
    curl -XGET 'localhost:9200/students/go/_search?pretty' -H 'content-Type:application/json' -d '{"query":{"range":{"age":{"gt":"18"}}}}'
    {
      "took" : 11,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 5,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "cuWEWnIBWnQK6MVivzvO",
            "_score" : 1.0,
            "_source" : {
              "name" : "张根",
              "age" : 22,
              "marrid" : "false"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "c-WPWnIBWnQK6MViOzt1",
            "_score" : 1.0,
            "_source" : {
              "name" : "张百忍",
              "age" : 3200,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "dOWPWnIBWnQK6MVimTsg",
            "_score" : 1.0,
            "_source" : {
              "name" : "李渊",
              "age" : 1402,
              "marrid" : "true"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "deWQWnIBWnQK6MViazuu",
            "_score" : 1.0,
            "_source" : {
              "name" : "姜尚",
              "age" : 5903,
              "marrid" : "fale"
            }
          },
          {
            "_index" : "students",
            "_type" : "go",
            "_id" : "duWSWnIBWnQK6MViXDtD",
            "_score" : 1.0,
            "_source" : {
              "name" : "孛儿只斤.铁木真",
              "age" : 814,
              "marrid" : "true"
            }
          }
        ]
      }
    }
    返回

    安装Kibana

    kibana是针对elasticsearch操作及数据展示的工具,支持中文。安装时请确保kibama和Elasticsearch的版本一致。

    配置文件

    [root@zhanggen config]# cat ./kibana.yml|grep -Ev '^$|#'
    server.port: 5601
    server.host: "0.0.0.0"
    elasticsearch.hosts: ["http://localhost:9200"]
    elasticsearch.username: "kibana"
    elasticsearch.password: "xxxxxxx"
    i18n.locale: "zh-CN"
    [root@zhanggen config]# 

    启动

    [root@zhanggen bin]# ./kibana  --allow-root

    kibana使用

    管理-----》kibana索引模式-----》创建索引模式 

    ps:

    Elasticsearch果然是全文检索, 666。果然对用户输入的搜素关键词,进行了分词。我输入了(访问日志为例)把包含“问”的文档也搜素出来了!

    而不是仅仅搜素内容包含“访问日志为例”这个1个词的文档。

    Go操作Elasticsearch

    我们使用第三方库https://github.com/olivere/elastic来连接ES并进行操作。

    注意下载与你的ES相同版本的client,例如我们这里使用的ES是7.2.1的版本,那么我们下载的client也要与之对应为github.com/olivere/elastic/v7

    使用go.mod来管理依赖下载指定版本的第三库:

    module go相关模块/elasticsearch
    
    go 1.13
    
    require github.com/olivere/elastic/v7 v7.0.4
    

      

    代码 

    package main
    
    
    import (
    "context"
    "fmt"
    
    "github.com/olivere/elastic/v7"
    )
    
    // Elasticsearch demo
    
    type Person struct {
    	Name    string `json:"name"`
    	Age     int    `json:"age"`
    	Married bool   `json:"married"`
    }
    
    func main() {
    	client, err := elastic.NewClient(elastic.SetURL("http://192.168.56.135:9200/"))
    	if err != nil {
    		// Handle error
    		panic(err)
    	}
    
    	fmt.Println("connect to es success")
    	p1 := Person{Name: "曹操", Age: 155, Married: true}
    	put1, err := client.Index().
    		Index("students").Type("go").
    		BodyJson(p1).
    		Do(context.Background())
    	if err != nil {
    		// Handle error
    		panic(err)
    	}
    	fmt.Printf("Indexed user %s to index %s, type %s
    ", put1.Id, put1.Index, put1.Type)
    }
    

    kafka消息---->elasticsearch支持消息检索

    参考

  • 相关阅读:
    简明Python3教程 12.问题解决
    简明Python3教程 11.数据结构
    【SPOJ 694】Distinct Substrings
    【codeforces Manthan, Codefest 17 C】Helga Hufflepuff's Cup
    【CF Manthan, Codefest 17 B】Marvolo Gaunt's Ring
    【CF Manthan, Codefest 17 A】Tom Riddle's Diary
    【SPOJ 220】 PHRASES
    【POJ 3261】Milk Patterns
    【POJ 3294】Life Forms
    【POJ 1226】Substrings
  • 原文地址:https://www.cnblogs.com/sss4/p/12964422.html
Copyright © 2020-2023  润新知