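Although Elasticsearch requires very little configuration, a few settings must be set by hand before you go to production.
1. Official documentation
For a description of these important settings, see the official documentation: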
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/important-settings.html
2. Parameter descriptions
2.1 path.data and path.logs
If you are using the .zip or .tar.gz archives, the data and logs directories are sub-folders of $ES_HOME. If these important folders are left in their default locations, there is a high risk that they will be deleted when Elasticsearch is upgraded to a new version.
In production, you will almost certainly want to change the locations of the data and log folders:
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
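Additional note: in production, application data and logs are usually placed on their own disk partitions, for example /data as a dedicated data partition and /var/log as the application log partition. This prevents growing application data or logs from filling up the OS partition.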
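The risk described above can be checked mechanically before an upgrade. The following is a minimal sketch, not part of Elasticsearch: the helper names `is_under` and `risky_paths`, the install directory `/opt/elasticsearch-7.0.1`, and the sample settings are all assumptions made for illustration. It flags any path setting that still points inside the installation directory.

```python
from pathlib import PurePosixPath


def is_under(path: str, es_home: str) -> bool:
    """Return True if `path` is es_home itself or is located below it."""
    candidate = PurePosixPath(path)
    home = PurePosixPath(es_home)
    return candidate == home or home in candidate.parents


def risky_paths(settings: dict, es_home: str) -> list:
    """List the path settings that still point inside the install directory."""
    return [key for key, value in sorted(settings.items()) if is_under(value, es_home)]


# Hypothetical values: path.logs was left at its archive default.
settings = {
    "path.data": "/var/data/elasticsearch",
    "path.logs": "/opt/elasticsearch-7.0.1/logs",
}
print(risky_paths(settings, "/opt/elasticsearch-7.0.1"))
```

The comparison works on path components rather than string prefixes, so a sibling directory such as /opt/elasticsearch-7.0.1-data would not be reported by mistake.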
2.2 cluster.name
A node can only join a cluster when it shares its cluster.name with all the other nodes in the cluster. The default name is elasticsearch, but you should change it to a name that describes the purpose of the cluster.
cluster.name: myes
Make sure that you do not reuse the same cluster name in different environments; otherwise nodes might join the wrong cluster.
2.3 node.name
Elasticsearch uses node.name as a human-readable identifier for a particular instance of Elasticsearch, so it is included in the responses of many APIs. In 7.x it defaults to the hostname that the machine has when Elasticsearch starts. (In earlier releases such as 6.x, the default was instead the first seven characters of the randomly generated node ID, which is persisted across restarts.)
The name can also be configured explicitly in elasticsearch.yml, for example by referencing the server's HOSTNAME environment variable:
node.name: ${HOSTNAME}
2.4 network.host
By default, Elasticsearch binds to loopback addresses only, e.g. 127.0.0.1 and [::1]. This is sufficient to run a single development node on a server.
To communicate and form a cluster with nodes on other servers, your node needs to bind to a non-loopback address. There are many network settings, but usually all you need to configure is network.host:
network.host: 192.168.60.101
As soon as you provide a custom setting for network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades a number of system startup checks from warnings to exceptions.
In more detail: by default, Elasticsearch assumes that you are working in development mode. If a system setting covered by these startup checks is not configured correctly, a warning is written to the log file, but the node still starts and runs.
Once you configure a network setting such as network.host, the same warnings become exceptions, and these exceptions prevent the node from starting. This is an important safety measure to ensure that you do not lose data because of a misconfigured server.
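The loopback distinction that drives this behaviour can be illustrated with Python's standard ipaddress module. This is only a rough sketch under simplifying assumptions: the function names are made up for illustration, only literal IP addresses are handled (no hostnames or special values), and the actual decision logic inside Elasticsearch is more involved than this.

```python
import ipaddress


def binds_only_to_loopback(addresses):
    """True when every literal IP address in `addresses` is a loopback address."""
    return all(ipaddress.ip_address(addr).is_loopback for addr in addresses)


def startup_check_severity(addresses):
    """Map bind addresses to how a failed startup check would be reported."""
    return "warning" if binds_only_to_loopback(addresses) else "exception"


# Default development binding versus the network.host value used above.
print(startup_check_severity(["127.0.0.1", "::1"]))
print(startup_check_severity(["192.168.60.101"]))
```

Note that the IPv6 loopback address is passed without brackets here, because ipaddress.ip_address expects the bare address.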
2.5 Discovery settings
There are two important discovery and cluster formation settings that should be configured before going to production, so that nodes in the cluster can discover each other and elect a master node.
discovery.seed_hosts
Out of the box, without any network configuration, Elasticsearch binds to the available loopback addresses and scans local ports 9300 to 9305 to try to connect to other nodes running on the same server. This provides an auto-clustering experience without any configuration.
When you want to form a cluster with nodes on other hosts, you must use the discovery.seed_hosts setting to provide a list of other nodes in the cluster that are master-eligible and likely to be live and contactable, in order to seed the discovery process. This setting should normally contain the addresses of all the master-eligible nodes in the cluster. It accepts either an array of hosts or a comma-delimited string. Each value should be in the form host:port or host (where port defaults to the setting transport.profiles.default.port, falling back to transport.port if that is not set). Note that IPv6 hosts must be bracketed. The default for this setting is 127.0.0.1, [::1].
discovery.seed_hosts:
   - node1
   - node2
   - node3
If no port is specified, it defaults to transport.profiles.default.port and falls back to transport.port.
If a hostname resolves to multiple IP addresses, the node attempts to discover other nodes at all resolved addresses.
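The accepted entry forms (host, host:port, bracketed IPv6, and list or comma-delimited input) can be made concrete with a small parser. This is a sketch for illustration only, not how Elasticsearch parses the setting: the function names are hypothetical, and the default port of 9300 is an assumption standing in for whatever transport.profiles.default.port or transport.port resolves to on your nodes.

```python
DEFAULT_TRANSPORT_PORT = 9300  # assumption: transport port left at its usual default


def parse_seed_host(entry, default_port=DEFAULT_TRANSPORT_PORT):
    """Split one seed host entry into a (host, port) tuple."""
    entry = entry.strip()
    if entry.startswith("["):
        # Bracketed IPv6 literal: [addr] or [addr]:port
        host, closing, rest = entry[1:].partition("]")
        if not closing:
            raise ValueError(f"missing closing bracket in {entry!r}")
        if not rest:
            return host, default_port
        if not rest.startswith(":"):
            raise ValueError(f"unexpected text after bracket in {entry!r}")
        return host, int(rest[1:])
    if entry.count(":") > 1:
        raise ValueError(f"IPv6 hosts must be bracketed: {entry!r}")
    host, colon, port = entry.partition(":")
    return host, (int(port) if colon else default_port)


def parse_seed_hosts(value):
    """Accept either a list of entries or one comma-delimited string."""
    entries = value.split(",") if isinstance(value, str) else value
    return [parse_seed_host(entry) for entry in entries]


print(parse_seed_hosts(["node1", "node2", "node3"]))
print(parse_seed_hosts("node1, node2:9301, [::1]:9305"))
```

An unbracketed IPv6 address such as ::1 is rejected, because its colons cannot be told apart from the host:port separator.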
cluster.initial_master_nodes
When you start a brand new Elasticsearch cluster for the very first time, there is a cluster bootstrapping step, which determines the set of master-eligible nodes whose votes are counted in the very first election. In development mode, with no discovery settings configured, this step is performed automatically by the nodes themselves. Because this auto-bootstrapping is inherently unsafe, when you start a brand new cluster in production mode you must explicitly list the names or IP addresses of the master-eligible nodes whose votes should be counted in the very first election. This list is set using the cluster.initial_master_nodes setting.
cluster.initial_master_nodes:
   - node1
   - node2
   - node3
Initial master nodes can be identified by their node.name, which defaults to the hostname. Make sure that each value in cluster.initial_master_nodes matches the node.name exactly. If you use a fully-qualified domain name such as master-node-a.example.com for your node names, then you must use the fully-qualified name in this list; conversely, if node.name is a bare hostname without any trailing qualifiers, then you must also omit the trailing qualifiers in cluster.initial_master_nodes.
Initial master nodes can also be identified by their IP address.
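The exact-match requirement is easy to get wrong when some names are fully qualified and others are not. The following sketch shows one way to catch that before starting a new cluster. The function names and sample values are hypothetical, and the check only covers entries given as node names, not entries given as IP addresses.

```python
def unmatched_initial_masters(initial_master_nodes, node_names):
    """Return list entries that have no exactly matching node.name."""
    known = set(node_names)
    return [entry for entry in initial_master_nodes if entry not in known]


def near_misses(initial_master_nodes, node_names):
    """Pair each unmatched entry with node names that differ only by domain qualifiers."""
    result = {}
    for entry in unmatched_initial_masters(initial_master_nodes, node_names):
        short = entry.split(".")[0]
        result[entry] = [name for name in node_names if name.split(".")[0] == short]
    return result


# Hypothetical mismatch: bare hostname in the list, fully-qualified node.name.
configured = ["master-node-a", "node2"]
actual_names = ["master-node-a.example.com", "node2"]
print(near_misses(configured, actual_names))
```

An empty result means every entry matches some node.name exactly; a non-empty one points at the entries to correct and the node names they were probably meant to reference.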
2.6 Heap size configuration
By default, Elasticsearch tells the JVM to use a heap with a minimum and maximum size of 1 GB. When moving to production, it is important to configure the heap size so that Elasticsearch has enough heap available.
Elasticsearch assigns the entire heap specified in jvm.options via the Xms (minimum heap size) and Xmx (maximum heap size) settings.
The values for these settings depend on the amount of RAM available on your server. Good rules of thumb are:
Set the minimum heap size (Xms) and maximum heap size (Xmx) to be equal to each other.
The more heap available to Elasticsearch, the more memory it can use for caching. But note that too much heap can subject you to long garbage collection pauses.
Set Xmx to no more than 50% of your physical RAM, to ensure that there is enough physical RAM left for kernel file system caches.
Do not set Xmx above the cutoff that the JVM uses for compressed object pointers (compressed oops); the exact cutoff varies but is near 32 GB. You can verify that you are under the limit by checking the startup log for the line that reports the heap size and whether compressed ordinary object pointers are enabled.
Here is an example of how to set the heap size via the jvm.options file.
Suppose a company runs an Elasticsearch cluster in which each node has 8 GB of RAM; the heap can then be set as follows.
[root@elastic1 elasticsearch-7.0.1]# vi config/jvm.options
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms4g
-Xmx4g
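The rules of thumb above can be turned into a small calculation. This is a sketch under stated assumptions rather than an official formula: the function names are invented for illustration, and the 31 GB cap is a conservative stand-in for the compressed-oops cutoff, which the text only describes as "near 32 GB" and which varies by JVM.

```python
COMPRESSED_OOPS_CAP_MB = 31 * 1024  # assumption: conservative value below the ~32 GB cutoff


def recommended_heap_mb(physical_ram_mb):
    """Half of physical RAM, capped below the compressed-oops cutoff."""
    return min(physical_ram_mb // 2, COMPRESSED_OOPS_CAP_MB)


def jvm_heap_options(physical_ram_mb):
    """Build matching -Xms/-Xmx flags so minimum and maximum heap are equal."""
    heap_mb = recommended_heap_mb(physical_ram_mb)
    if heap_mb % 1024 == 0:
        size = f"{heap_mb // 1024}g"
    else:
        size = f"{heap_mb}m"
    return [f"-Xms{size}", f"-Xmx{size}"]


print(jvm_heap_options(8 * 1024))    # the 8 GB node from the example above
print(jvm_heap_options(128 * 1024))  # a large node, where the cap applies
```

For the 8 GB node this reproduces the -Xms4g and -Xmx4g values shown in the jvm.options example; on a 128 GB node the cap takes over and the result is 31g rather than 64g.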