62. Modifying Analyzers and Creating Custom Analyzers

Key points
- Modifying an analyzer
- Creating a custom analyzer
I. Modifying an analyzer
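1. The default analyzer, standard, is made up of four components:
- standard tokenizer: splits text on word boundaries
- standard token filter: does nothing
- lowercase token filter: converts all letters to lowercase
- stop token filter (disabled by default): removes stopwords such as a, the, it, and so on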
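To make the chain above concrete, here is a minimal Python sketch of the standard analyzer's components applied in order. It is an approximation, not Elasticsearch code: a regular expression (`\w+`) stands in for the Unicode word-boundary rules of the real standard tokenizer, the function names are hypothetical, and the no-op standard token filter is left out.

```python
import re

def standard_tokenizer(text):
    # Approximation of the standard tokenizer: the real one follows Unicode
    # word-boundary rules (UAX #29); runs of word characters suffice here.
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def stop_filter(tokens, stopwords):
    return [t for t in tokens if t not in stopwords]

def standard_analyzer(text, stopwords=None):
    # Tokenizer first, then token filters in order. stopwords=None mirrors
    # the default where the stop filter is effectively disabled.
    tokens = lowercase_filter(standard_tokenizer(text))
    if stopwords:
        tokens = stop_filter(tokens, stopwords)
    return tokens

print(standard_analyzer("A Dog is in the House"))
# ['a', 'dog', 'is', 'in', 'the', 'house']
print(standard_analyzer("A Dog is in the House", stopwords={"a", "the"}))
# ['dog', 'is', 'in', 'house']
```

Because the stop filter runs after lowercasing, the capitalized "A" is still matched by the lowercase stopword "a".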
2. Changing the analyzer settings
Define an analyzer named es_std of type standard with the English stopword list enabled:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}
Test the modified analyzer by comparing the built-in standard analyzer with es_std:
GET /my_index/_analyze
{
  "analyzer": "standard",
  "text": "a dog is in the house"
}

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text": "a dog is in the house"
}
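The two requests above should differ only in the stop filter. The Python sketch below simulates that comparison under two assumptions: `\w+` approximates the standard tokenizer, and `ENGLISH_STOPWORDS` is Lucene's default English stop set, which is what `_english_` is assumed to resolve to. The printed tokens are the simulation's output; the `_analyze` API on a real cluster is the authoritative result.

```python
import re

# Assumption: Lucene's default English stop set (33 words), used for "_english_".
ENGLISH_STOPWORDS = frozenset("""
a an and are as at be but by for if in into is it no not of on or such
that the their then there these they this to was will with
""".split())

def analyze(text, stopwords=frozenset()):
    # Approximate standard tokenizer + lowercase filter + stop filter.
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in stopwords]

text = "a dog is in the house"
print(analyze(text))                     # like "standard": ['a', 'dog', 'is', 'in', 'the', 'house']
print(analyze(text, ENGLISH_STOPWORDS))  # like "es_std":   ['dog', 'house']
```

Every word except "dog" and "house" in the test sentence is on the English list, which is why es_std keeps only those two tokens.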

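II. Creating a custom analyzer

A custom analyzer is assembled from zero or more char filters, exactly one tokenizer, and zero or more token filters, applied in that order. The index created above already exists and PUT on an existing index fails, so delete it first:

DELETE /my_index

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}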
Test the custom analyzer:
GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
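The Python sketch below mimics `my_analyzer` stage by stage to show what each piece of the definition does to the test string. It rests on stated assumptions: a tag-stripping regex approximates `html_strip`, `\w+` approximates the standard tokenizer, and the rule `&=> and` is assumed to map `&` to `and` with the whitespace around `=>` trimmed. `my_analyzer` here is a hypothetical Python function, not an Elasticsearch API.

```python
import re

def html_strip(text):
    # Rough approximation of the html_strip char filter: drop anything that
    # looks like a tag. The real filter also decodes HTML entities.
    return re.sub(r"<[^>]*>", "", text)

def mapping_char_filter(text, mappings):
    # Assumed behavior of the "&_to_and" mapping char filter.
    for key, value in mappings.items():
        text = text.replace(key, value)
    return text

def my_analyzer(text):
    # 1. Char filters, in the order listed: html_strip, then &_to_and.
    text = html_strip(text)
    text = mapping_char_filter(text, {"&": "and"})
    # 2. Tokenizer (approximation of "standard").
    tokens = re.findall(r"\w+", text)
    # 3. Token filters, in order: lowercase, then my_stopwords.
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in {"the", "a"}]

print(my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!"))
# ['tomandjerry', 'are', 'friend', 'in', 'house', 'haha']
```

Filter order matters: because `lowercase` runs before `my_stopwords`, a capitalized "The" or "A" would also be removed, whereas swapping the two filters would let them through.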

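Finally, apply my_analyzer to the content field. This typed endpoint is Elasticsearch 5.x/6.x syntax; mapping types were deprecated in 7.0 and removed in 8.0, where the equivalent request is PUT /my_index/_mapping with the same body.
PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}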
Original article: https://www.cnblogs.com/liuqianli/p/8475474.html