• Elasticsearch mapping + exact match vs. full-text search + inverted index + internal components of an analyzer


    I. A first look at mapping

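    (1) When you insert data directly into ES, it automatically creates the index, together with the type and its corresponding mapping.
    (2) The mapping automatically defines the data type of every field.
    (3) Different data types hold different kinds of values: some (for example date) are exact values, others (for example text) are full text.
    (4) An exact value is written to the inverted index as a single term: the whole value, unchanged. Full text goes through several processing steps, namely tokenization and normalization (tense conversion, synonym conversion, case conversion), before it is written to the inverted index.
    (5) Whether a field holds an exact value or full text also determines how it is searched, and the search behavior mirrors the indexing behavior: an exact-value search matches the whole value as-is, while a full-text query string is likewise tokenized and normalized before it is looked up in the inverted index.
    (6) You can let ES's dynamic mapping build the mapping automatically, including the data types, or you can create the index and the type's mapping manually in advance and configure each field yourself: its data type, indexing behavior, analyzer, and so on.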

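    A mapping is the metadata of a type within an index. Each type has its own mapping, which determines the data types, how the inverted index is built, and how searches behave.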
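    Points (4) and (5) can be illustrated with a toy inverted index. This is a minimal sketch under stated assumptions, not how ES is implemented: the hypothetical `analyze` function only lowercases and splits on non-word characters (no tense or synonym handling), and `build_index`/`search` are illustrative names. It shows that an exact-value field matches only the whole value, while a full-text field analyzes both the stored value and the query.

```python
import re
from collections import defaultdict

# Hypothetical, heavily simplified analyzer: lowercase, then split on
# non-word characters. Real ES analyzers can also stem, apply synonyms, etc.
def analyze(text: str) -> list:
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

def build_index(docs: dict, full_text: bool) -> dict:
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # exact value: the whole value is one term; full text: the analyzed terms
        terms = analyze(value) if full_text else [value]
        for term in terms:
            index[term].add(doc_id)
    return index

def search(index: dict, query: str, full_text: bool) -> set:
    """Treat the query the same way the field was indexed."""
    if not full_text:
        return set(index.get(query, set()))  # whole-value match only
    hits = set()
    for term in analyze(query):               # the query is analyzed too
        hits |= index.get(term, set())        # any matching term counts (OR)
    return hits

docs = {1: "Hello World", 2: "hello elasticsearch"}
exact_index = build_index(docs, full_text=False)
full_index = build_index(docs, full_text=True)

print(search(exact_index, "hello", full_text=False))        # set()
print(search(exact_index, "Hello World", full_text=False))  # {1}
print(search(full_index, "HELLO", full_text=True))          # {1, 2}
```

    The exact-value lookup for "hello" finds nothing because no document stores exactly that value, while the full-text lookup for "HELLO" is normalized to "hello" and hits both documents.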

    II. Core mapping data types and dynamic mapping

    1. Core data types

    text, keyword (these replaced the older string type in ES 5.x)
    byte,short,integer,long
    float,double
    boolean
    date

    2. Dynamic mapping

    true or false --> boolean
    123 --> long
    123.45 --> double
    "2017-01-01" --> date
    "hello world" --> text (with a keyword sub-field)

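    The type-detection rules in the table above can be sketched as a small function. This is a hypothetical illustration, not ES code: `infer_type` and `DATE_PATTERN` are made-up names, and only plain yyyy-MM-dd strings are treated as dates here, whereas ES's date detection is configurable.

```python
import re

# Assumption: only plain yyyy-MM-dd strings are detected as dates here;
# ES's real date detection is configurable and accepts more formats.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def infer_type(value) -> str:
    """Hypothetical sketch of the dynamic mapping rules in the table above."""
    # bool must be tested before int: in Python, bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        # fullmatch, so that e.g. "2017-01-01 meeting" stays text
        return "date" if DATE_PATTERN.fullmatch(value) else "text"
    raise TypeError(f"no rule for {type(value).__name__}")

doc = {"active": True, "count": 123, "price": 123.45,
       "day": "2017-01-01", "title": "hello world"}
print({field: infer_type(v) for field, v in doc.items()})
# {'active': 'boolean', 'count': 'long', 'price': 'double', 'day': 'date', 'title': 'text'}
```

    The order of the checks matters: testing `int` first would map `True` to long, which is why the boolean rule comes first.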
    3. Viewing a mapping

    GET /index/_mapping/type

    III. Creating and modifying a mapping manually, and controlling whether string fields are analyzed

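    1. How a field is indexed (the index option)

    analyzed: the value is tokenized and normalized (full text)
    not_analyzed: the whole value is indexed as a single term (exact value)
    no: the field is not indexed at all and cannot be searched
    (These are the values of the legacy string type's index parameter. From ES 5.x on, use a text field for analyzed, a keyword field for not_analyzed, and "index": false for no.)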
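    The effect of the three index options on what reaches the inverted index can be sketched as follows. `index_terms` is a hypothetical helper, and the "analyzed" branch uses a simple lowercase-and-split instead of a real ES analyzer.

```python
import re

def index_terms(value: str, index_option: str) -> list:
    """Hypothetical helper: terms written to the inverted index for a string
    field under each legacy `index` option."""
    if index_option == "analyzed":      # full text: tokenized and normalized
        return [t for t in re.split(r"\W+", value.lower()) if t]
    if index_option == "not_analyzed":  # exact value: the whole string is one term
        return [value]
    if index_option == "no":            # not indexed: nothing to search on
        return []
    raise ValueError(f"unknown index option: {index_option!r}")

for option in ("analyzed", "not_analyzed", "no"):
    print(option, index_terms("New York", option))
# analyzed ['new', 'york']
# not_analyzed ['New York']
# no []
```

    With not_analyzed, a search for "new" or "New York" in lowercase would not match, because only the exact string "New York" is in the index.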

    2. Modifying a mapping

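    A mapping can only be defined manually when the index is created, or extended later by adding new field mappings; the mapping of an existing field cannot be updated.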

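    Create the index with an explicit mapping. A text field's index parameter only accepts true or false, so an exact-value (not analyzed) field such as publisher_id is declared as keyword:

    PUT /website
    {
      "mappings": {
        "article": {
          "properties": {
            "author_id": {
              "type": "long"
            },
            "title": {
              "type": "text",
              "analyzer": "english"
            },
            "content": {
              "type": "text"
            },
            "post_date": {
              "type": "date"
            },
            "publisher_id": {
              "type": "keyword"
            }
          }
        }
      }
    }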

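    Running PUT /website again, for example to change author_id to text, fails because the index already exists:

    PUT /website
    {
      "mappings": {
        "article": {
          "properties": {
            "author_id": {
              "type": "text"
            }
          }
        }
      }
    }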

    {
      "error": {
        "root_cause": [
          {
            "type": "index_already_exists_exception",
            "reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
            "index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
            "index": "website"
          }
        ],
        "type": "index_already_exists_exception",
        "reason": "index [website/co1dgJ-uTYGBEEOOL8GsQQ] already exists",
        "index_uuid": "co1dgJ-uTYGBEEOOL8GsQQ",
        "index": "website"
      },
      "status": 400
    }

    Adding a new field to the existing mapping, however, is allowed:

    PUT /website/_mapping/article
    {
      "properties": {
        "new_field": {
          "type": "keyword"
        }
      }
    }
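    The rule "new fields may be added, existing field definitions may not be changed" can be modeled with a tiny in-memory class. This is a hypothetical sketch: `Mapping` and `put_mapping` are illustrative names, not the ES client API, and real Elasticsearch does allow a handful of parameters to be updated in place, which this sketch ignores.

```python
class Mapping:
    """Hypothetical in-memory model of one type's mapping."""

    def __init__(self, properties: dict):
        self.properties = dict(properties)

    def put_mapping(self, new_properties: dict) -> None:
        # Validate first, so a rejected request leaves the mapping unchanged.
        for name, definition in new_properties.items():
            existing = self.properties.get(name)
            if existing is not None and existing != definition:
                raise ValueError(f"cannot change the mapping of existing field [{name}]")
        # New fields (and identical re-definitions) are accepted.
        self.properties.update(new_properties)

mapping = Mapping({"author_id": {"type": "long"}, "content": {"type": "text"}})
mapping.put_mapping({"new_field": {"type": "keyword"}})   # OK: a new field
try:
    mapping.put_mapping({"author_id": {"type": "text"}})  # rejected: type change
except ValueError as err:
    print(err)  # cannot change the mapping of existing field [author_id]
```

    Validating all fields before applying any of them keeps a rejected request from partially modifying the mapping.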

    3. Testing a mapping

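    Analyze sample text with the analyzer of the content field. The standard analyzer splits "my-dogs" into the two tokens my and dogs:

    GET /website/_analyze
    {
      "field": "content",
      "text": "my-dogs"
    }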

    Analyzing new_field fails, because new_field is not analyzed:

    GET /website/_analyze
    {
      "field": "new_field",
      "text": "my dogs"
    }

    {
      "error": {
        "root_cause": [
          {
            "type": "remote_transport_exception",
            "reason": "[4onsTYV][127.0.0.1:9300][indices:admin/analyze[s]]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "Can't process field [new_field], Analysis requests are only supported on tokenized fields"
      },
      "status": 400
    }
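    The two _analyze calls above can be mimicked locally. This is a rough sketch under loud assumptions: `FIELD_TYPES` and `analyze_field` are hypothetical, and the tokenizer only approximates the standard analyzer for ASCII text (lowercase, split on anything that is not a letter or digit); the real standard tokenizer follows Unicode text segmentation rules.

```python
import re

# Hypothetical field table mirroring the website index above:
# "text" fields are analyzed, "keyword" fields are not.
FIELD_TYPES = {"content": "text", "new_field": "keyword"}

def analyze_field(field: str, text: str) -> list:
    """Rough stand-in for GET /website/_analyze with a "field" parameter."""
    if FIELD_TYPES.get(field) != "text":
        # mirrors the error returned for the non-tokenized field above
        raise ValueError(f"Can't process field [{field}], "
                         "Analysis requests are only supported on tokenized fields")
    # ASCII-only approximation of the standard analyzer
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

print(analyze_field("content", "my-dogs"))  # ['my', 'dogs']
try:
    analyze_field("new_field", "my dogs")
except ValueError as err:
    print(err)
```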

    IV. Complex mapping data types and how object data is stored internally

    1. Multivalue field

    { "tags": [ "tag1", "tag2" ]}

    A multivalue field is indexed the same way as a single string value; all of its values must have the same data type.

    2. Empty field

    null, [], [null]: all three are treated as an empty field, so no value is indexed for it.
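    The multivalue and empty-field rules can be combined into one small helper. `indexed_values` is hypothetical, and the sketch is simplified: real ES may coerce some values (for example numeric strings into a numeric field) instead of rejecting them.

```python
def indexed_values(value) -> list:
    """Hypothetical helper: the values that get indexed for one field."""
    # a single value behaves like a one-element array
    values = value if isinstance(value, list) else [value]
    # null entries are skipped, so null, [] and [null] all index nothing
    values = [v for v in values if v is not None]
    # every element of a multivalue field must have the same data type
    if len({type(v) for v in values}) > 1:
        raise TypeError(f"mixed data types in array: {values!r}")
    return values

print(indexed_values(["tag1", "tag2"]))  # ['tag1', 'tag2']
print(indexed_values(None), indexed_values([]), indexed_values([None]))  # [] [] []
```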

    3. Object field

    PUT /company/employee/1
    {
      "address": {
        "country": "china",
        "province": "guangdong",
        "city": "guangzhou"
      },
      "name": "jack",
      "age": 27,
      "join_date": "2017-01-01"
    }

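    address is an object field. The mapping that ES generated dynamically for this document (as returned by GET /company/_mapping/employee) is: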

    {
      "company": {
        "mappings": {
          "employee": {
            "properties": {
              "address": {
                "properties": {
                  "city": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "country": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "province": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  }
                }
              },
              "age": {
                "type": "long"
              },
              "join_date": {
                "type": "date"
              },
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }

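    Internally, an object is not stored as a nested structure. A document such as

    {
      "address": {
        "country": "china",
        "province": "guangdong",
        "city": "guangzhou"
      },
      "name": "jack",
      "age": 27,
      "join_date": "2017-01-01"
    }

    is flattened into dotted field names, each holding an array of values:

    {
      "name": [jack],
      "age": [27],
      "join_date": [2017-01-01],
      "address.country": [china],
      "address.province": [guangdong],
      "address.city": [guangzhou]
    }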
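    The flattening step can be sketched as a recursive walk over the document. `flatten` is a hypothetical helper; it keeps values as-is (no analysis) and merges array elements into the list of their field path.

```python
def flatten(doc: dict) -> dict:
    """Hypothetical sketch: turn a nested document into dotted field names,
    each mapped to the list of values found under that path."""
    flat = {}

    def walk(path: str, value) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{path}.{key}" if path else key, child)
        elif isinstance(value, list):
            for item in value:  # array elements share the same field path
                walk(path, item)
        else:
            flat.setdefault(path, []).append(value)

    walk("", doc)
    return flat

employee = {
    "address": {"country": "china", "province": "guangdong", "city": "guangzhou"},
    "name": "jack",
    "age": 27,
    "join_date": "2017-01-01",
}
for field, values in flatten(employee).items():
    print(field, values)
```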

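    An array of objects is flattened the same way. The document

    {
      "authors": [
        { "age": 26, "name": "Jack White" },
        { "age": 55, "name": "Tom Jones" },
        { "age": 39, "name": "Kitty Smith" }
      ]
    }

    becomes the following (the text field name is also analyzed into lowercase terms):

    {
      "authors.age": [26, 55, 39],
      "authors.name": [jack, white, tom, jones, kitty, smith]
    }

    so the association between each author's age and name is lost.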
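    The lost association can be demonstrated with a short sketch. All function names are hypothetical; to keep it short the flattened names are stored whole rather than as analyzed terms, and `matches_flattened` only approximates a query with several conditions against an object array.

```python
doc = {
    "authors": [
        {"age": 26, "name": "Jack White"},
        {"age": 55, "name": "Tom Jones"},
        {"age": 39, "name": "Kitty Smith"},
    ]
}

def flatten_authors(doc: dict) -> dict:
    """Collect each key of the author objects into one dotted field."""
    flat = {}
    for author in doc["authors"]:
        for key, value in author.items():
            flat.setdefault(f"authors.{key}", []).append(value)
    return flat

def matches_flattened(flat: dict, **criteria) -> bool:
    # every condition is checked against its own flattened field, independently
    return all(value in flat.get(f"authors.{key}", []) for key, value in criteria.items())

def matches_per_object(doc: dict, **criteria) -> bool:
    # for comparison: one single author object must satisfy all conditions
    return any(all(author.get(k) == v for k, v in criteria.items())
               for author in doc["authors"])

flat = flatten_authors(doc)
print(flat["authors.age"])                                # [26, 55, 39]
print(matches_flattened(flat, age=26, name="Tom Jones"))  # True
print(matches_per_object(doc, age=26, name="Tom Jones"))  # False
```

    No author is both 26 and named Tom Jones, yet the flattened form matches, because each condition is satisfied by a different object.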

  • Original article: https://www.cnblogs.com/Transkai/p/11277098.html