Enterprise Search with Elasticsearch, Part 03: Ingest Processors

I. Ingest Node

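An ingest node preprocesses raw JSON documents before they are indexed. To enable it, add node.ingest: true to elasticsearch.yml. To preprocess a document, define a pipeline that specifies a series of processors; pipelines run on ingest nodes. A pipeline contains one or more processors, and the pipeline to apply is specified when the source document is indexed.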

The syntax for specifying a pipeline when indexing a document is:

PUT <index>/<type>/<id>?pipeline=<pipeline id>
{
  "foo": "bar"
}
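
As a rough illustration of the request syntax above, the following Python sketch only assembles the URL and does not contact a cluster. The helper name index_url and the host localhost:9200 are assumptions made for this example, not part of Elasticsearch.

```python
from urllib.parse import quote, urlencode


def index_url(host, index, doc_type, doc_id, pipeline):
    """Build the URL for indexing one document through an ingest pipeline.

    Hypothetical helper for illustration; it only formats a string.
    """
    # Percent-encode each path segment so unusual ids cannot break the path.
    path = "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    # The pipeline id travels in the `pipeline` query parameter.
    return f"http://{host}/{path}?{urlencode({'pipeline': pipeline})}"


print(index_url("localhost:9200", "user", "info", "12", "myp"))
```

The document body itself is sent unchanged; only the query parameter tells the cluster which pipeline to run first.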

Pipeline maintenance
1) Adding a pipeline (see https://www.elastic.co/guide/en/elasticsearch/reference/current/put-pipeline-api.html)
Syntax for adding a pipeline:

PUT _ingest/pipeline/<pipeline id>
{
  "description" : "test pipeline",
  "processors" : [
     one or more processors
  ]
}
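
The same request can be prepared from Python. This is a minimal sketch using only the standard library; the function name put_pipeline_request, the host localhost:9200, and the sample processor are assumptions chosen for illustration. The request object is built but not sent.

```python
import json
from urllib.request import Request


def put_pipeline_request(host, pipeline_id, description, processors):
    """Prepare (but do not send) a PUT request that defines a pipeline.

    Hypothetical helper; host and pipeline_id are placeholders.
    """
    body = {"description": description, "processors": processors}
    return Request(
        url=f"http://{host}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = put_pipeline_request(
    "localhost:9200",
    "myp",
    "test pipeline",
    [{"set": {"field": "status", "value": "1"}}],
)
print(req.get_method(), req.full_url)
# To actually send it against a running cluster:
#   urllib.request.urlopen(req)
```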

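The following demonstrates a set processor that sets a field when a document is added (using the data from the earlier post, http://blog.csdn.net/liaomin416100569/article/details/78727827).
For example, add a status field, status:1, whenever a document is indexed.

Add the pipeline with the processor: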

curl -XPUT '192.168.58.147:9200/_ingest/pipeline/myp' -d '
{
  "description" : "test pipeline",
  "processors" : [
    {
      "set" : {
        "field": "status",
        "value": "1"
      }
    }
  ]
}';

Try adding a user record:

curl -XPUT '192.168.58.147:9200/user/info/12?pipeline=myp' -d '
{"country":"中国","provice":"广东省","city":"广州市","age":"89","name":"王冠宇","desc":"王冠宇是王五的儿子"}
';

Retrieve the document; it now contains an additional status field with the value "1":

[root@node1 ~]# curl -XGET '192.168.58.147:9200/user/info/12?pretty'
{
  "_index" : "user",
  "_type" : "info",
  "_id" : "12",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "country" : "中国",
    "city" : "广州市",
    "provice" : "广东省",
    "name" : "王冠宇",
    "age" : "89",
    "desc" : "王冠宇是王五的儿子",
    "status" : "1"
  }
}
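
To make the effect of the pipeline concrete, here is a small local model of it in Python. It is only a sketch of the behaviour seen above, not how Elasticsearch implements the processor; the names apply_set and run_pipeline are made up for this example, and only the set processor is modelled.

```python
def apply_set(doc, field, value):
    """Return a copy of doc with field set to value.

    Simplified model: an existing value for the field is overwritten.
    """
    result = dict(doc)
    result[field] = value
    return result


def run_pipeline(doc, processors):
    """Apply each processor in order; only "set" is modelled here."""
    for processor in processors:
        # Each processor is a one-key dict: {"<name>": {<config>}}.
        name, config = next(iter(processor.items()))
        if name != "set":
            raise ValueError(f"unsupported processor: {name}")
        doc = apply_set(doc, config["field"], config["value"])
    return doc


pipeline = [{"set": {"field": "status", "value": "1"}}]
source = {"name": "王冠宇", "age": "89"}
print(run_pipeline(source, pipeline))
```

Note that the value is stored exactly as given in the pipeline definition, so "1" stays a string, which matches the "status" : "1" in the response above.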

2) Deleting a pipeline

curl -XDELETE '192.168.58.147:9200/_ingest/pipeline/myp'

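To modify a pipeline, see the official documentation.

II. Common ingest processors (see https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest-processors.html)

1) set processor (https://www.elastic.co/guide/en/elasticsearch/reference/current/set-processor.html)

Sets a field on the document before it is indexed. Syntax: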

{
  "set": {
    "field": "field1",
    "value": 582.1
  }
}

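2) grok processor (https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html)
Uses a grok expression to extract the parts of a field that match specific patterns into corresponding JSON fields.
Grok expressions are a layer on top of regular expressions, and grok also supports regular expression syntax.
The syntax is
%{SYNTAX:SEMANTIC}
SYNTAX is the name of the pattern to match, and SEMANTIC is the name of the field the matched value is written to. For example, the text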

3.44 55.3.244.1

matches the grok expression

%{NUMBER:duration} %{IP:client}

NUMBER and IP are built-in pattern names that exist by default and can be used directly. As an example, define the following pipeline:

curl -XPUT '192.168.58.147:9200/_ingest/pipeline/myp1?pretty' -d '
{
  "description" : "grok test",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"]
      }
    }
  ]
}';

Test data:

curl -XPUT '192.168.58.147:9200/test/test/1?pipeline=myp1&pretty' -d '
{
  "message": "55.3.244.1 GET /index.html 15824 0.043"
}
';

Query result:

[root@node1 httpd]# curl -XGET '192.168.58.147:9200/test/test/1?pretty'
{
  "_index" : "test",
  "_type" : "test",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "duration" : "0.043",
    "request" : "/index.html",
    "method" : "GET",
    "bytes" : "15824",
    "client" : "55.3.244.1",
    "message" : "55.3.244.1 GET /index.html 15824 0.043"
  }
}
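
The way a grok expression expands into a regular expression with named groups can be sketched in Python. The pattern table below contains deliberately simplified stand-ins for IP, WORD, URIPATHPARAM and NUMBER; the real built-in definitions are more thorough, so this illustrates the idea rather than reproducing the processor. The names PATTERNS, grok_to_regex and grok_match are made up for this example.

```python
import re

# Simplified stand-ins for the built-in grok patterns (assumptions for
# illustration; the real definitions are stricter and more complete).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"\S+",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}


def grok_to_regex(expression):
    """Expand each %{SYNTAX:SEMANTIC} into a named regex group."""
    def expand(match):
        syntax, semantic = match.group(1), match.group(2)
        return f"(?P<{semantic}>{PATTERNS[syntax]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, expression))


def grok_match(expression, text):
    """Return the extracted fields as a dict, or None if nothing matches."""
    match = grok_to_regex(expression).search(text)
    return match.groupdict() if match else None


expr = ("%{IP:client} %{WORD:method} %{URIPATHPARAM:request} "
        "%{NUMBER:bytes} %{NUMBER:duration}")
fields = grok_match(expr, "55.3.244.1 GET /index.html 15824 0.043")
print(fields)
```

As in the query result above, every extracted value is a string; the sketch makes no attempt at type conversion.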

For other processors, see the official documentation.

Original article: https://www.cnblogs.com/liaomin416100569/p/9331155.html