Elasticsearch analyzers

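1. The default analyzer

The default analyzer is standard. It is made up of:

standard tokenizer: splits text on word boundaries
standard token filter: does nothing (it was removed in Elasticsearch 7.0)
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, it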
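
The pipeline above can be sketched in plain Python. This is only a rough approximation for illustration, not Elasticsearch code: the function names are made up here, and the regular expression \w+ stands in for the real word-boundary rules, which handle many more cases.

```python
import re

def standard_tokenize(text):
    # Rough stand-in for word-boundary splitting:
    # keep runs of letters, digits and underscores, drop everything else.
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    # Mirrors the lowercase token filter.
    return [t.lower() for t in tokens]

def approx_standard_analyzer(text):
    # Tokenize first, then lowercase. No stop words are removed,
    # because the stop filter is disabled by default.
    return lowercase_filter(standard_tokenize(text))

print(approx_standard_analyzer("The QUICK brown-fox"))
# ['the', 'quick', 'brown', 'fox']
```

Note that "the" survives, which matches the default behaviour described above.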

2. Changing analyzer settings

Define an analyzer named es_std that is based on standard but has the English stop words list enabled:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

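Then analyze the same text with both analyzers. standard keeps every word, while es_std drops the stop words a, is, in and the:

GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}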
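
To see what the stop filter changes, here is a minimal Python sketch of the two results. It is an illustration under assumptions, not what Elasticsearch runs internally: ENGLISH_STOPWORDS is typed out here from Lucene's default English stop set and should be checked against your version, and a plain split() replaces the real tokenizer because the sample text is already lowercase and space-separated.

```python
# Assumed copy of the default English stop word list (verify for your version).
ENGLISH_STOPWORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
])

def stop_filter(tokens, stopwords):
    # Drop every token that appears in the stop word set.
    return [t for t in tokens if t not in stopwords]

tokens = "a dog is in the house".split()

standard_like = list(tokens)                          # stop filter disabled
es_std_like = stop_filter(tokens, ENGLISH_STOPWORDS)  # stop filter enabled

print(standard_like)  # ['a', 'dog', 'is', 'in', 'the', 'house']
print(es_std_like)    # ['dog', 'house']
```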

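3. Building a custom analyzer

The analyzer below chains two character filters (html_strip and a mapping that turns & into and), the standard tokenizer, and two token filters (lowercase and a custom stop word list). If my_index already exists from step 2, creating it again fails with an "already exists" error, so delete it first or use a different index name:

DELETE /my_index

PUT /my_index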
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
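
The order of the stages in my_analyzer can be traced with a small Python sketch. It is an approximation with several assumptions, not the real implementation: the helper names are invented for this example, html_strip is reduced to a regex that deletes anything shaped like a tag, the rule "&=> and" is assumed to map & to and with surrounding whitespace trimmed, and \w+ stands in for the standard tokenizer.

```python
import re

def html_strip(text):
    # Crude stand-in for the html_strip char filter: delete tag-like spans.
    return re.sub(r"<[^>]+>", "", text)

def mapping_char_filter(text, rules):
    # Apply each literal replacement rule to the raw text.
    for source, target in rules.items():
        text = text.replace(source, target)
    return text

def my_analyzer(text):
    # 1. Character filters run on the raw string, in the configured order.
    text = html_strip(text)
    text = mapping_char_filter(text, {"&": "and"})  # assumed reading of "&=> and"
    # 2. The tokenizer splits the filtered string into tokens.
    tokens = re.findall(r"\w+", text)
    # 3. Token filters run on the tokens: lowercase first, then the stop list.
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in {"the", "a"}]

print(my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!"))
# ['tomandjerry', 'are', 'friend', 'in', 'house', 'haha']
```

Because only the and a are in the custom stop list, are and in stay in the output, unlike with the _english_ list in step 2.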

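Finally, assign the custom analyzer to a field in the mapping (the my_type path segment applies only to versions that still support mapping types):

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}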

Original article: https://www.cnblogs.com/qinjf/p/8546440.html