In this article, I will take you through the steps to delete Elasticsearch unassigned shards. Elasticsearch stores data as documents, which are grouped into indices. With a large amount of data, a single index can outgrow the capacity of the underlying hardware. For example, an index holding a billion documents may take up around 1 TB of disk space, which may not fit on the disk of a single node.
As a solution to this problem, Elasticsearch can split an index into multiple pieces called shards. Each shard can be considered an independent index in its own right and can be stored on any node in the cluster. Sharding also improves search performance, because a search can run on multiple shards in parallel.
What is Index
An index is a collection of documents. It is also known as a logical partition of data or records in Elasticsearch. You can create as many indices as you need.
What are Shards
An index is usually divided into a number of shards that are distributed across the nodes of the cluster. Each shard acts as a smaller unit of the index.
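As a rough illustration of how shard counts add up, the sketch below assumes an index configured with the number_of_shards and number_of_replicas index settings. The function name total_shard_copies and the example values are hypothetical. Every one of these copies has to be placed on a node, and any copy that cannot be placed shows up as an unassigned shard.

```shell
# total_shard_copies PRIMARIES REPLICAS
# Prints how many shard copies the cluster must place for one index:
# each primary shard plus REPLICAS copies of it.
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

# Hypothetical index with 5 primary shards and 1 replica per primary.
total_shard_copies 5 1   # prints 10
```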
Delete Elasticsearch Unassigned Shards
Step 1: Check Elasticsearch Cluster Health
First, check the cluster health using the curl http://localhost:9200/_cluster/health?pretty query. We specify port 9200 because the Elasticsearch cluster is running on this port. From the output below, you can check useful information such as the cluster name, cluster status, number of nodes, active primary shards, active shards, relocating shards and active shards percentage. Here the active shards percentage is stuck at 42.89% and is not increasing, so we need to find out which shards are still unassigned.
[root@localhost ~]# curl http://localhost:9200/_cluster/health?pretty
{
"cluster_name" : "test-cluster",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 3,
"active_primary_shards" : 189,
"active_shards" : 359,
"relocating_shards" : 0,
"initializing_shards" : 4,
"unassigned_shards" : 474,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 1,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 42.89127837514934
}
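If you only need one or two values from this output, a small helper can pull them out. This is a minimal sketch, not part of the original procedure: the function name health_field is hypothetical, and it assumes the pretty-printed layout shown above (one field per line, values without spaces).

```shell
# health_field FIELD
# Reads pretty-printed _cluster/health JSON on stdin and prints the value
# of FIELD with the surrounding quotes and trailing comma removed.
health_field() {
  awk -v key="\"$1\"" '$1 == key { v = $3; gsub(/[",]/, "", v); print v }'
}

# Usage against a running cluster (assumes the same host and port as above):
#   curl -s "http://localhost:9200/_cluster/health?pretty" | health_field status
```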
Please note that I am using the root user to run all the commands below. You can use any user with sudo access to run these commands. For more information, please check Step by Step: How to Add User to Sudoers to provide sudo access to the user.
Step 2: Check all Elasticsearch Unassigned Shards
Next, list all the unassigned shards using the curl query below. The output shows the index each shard belongs to, its state, whether it is a primary (p) or a replica (r), and the reason it is unassigned. In this case I waited for some time and saw that the cluster status was not improving and the unassigned shards were not being allocated. Only then did I go on to remove the unassigned shards in Step 3.
[root@localhost ~]# curl -XGET "localhost:9200/_cat/shards?h=index,state,prirep,unassigned.reason" | grep UNASSIGNED
test-2017.05.16-1 UNASSIGNED r REPLICA_ADDED
test-2017.05.16-1 UNASSIGNED p NODE_LEFT
test-2017.05.16-1 UNASSIGNED r REPLICA_ADDED
test-2017.05.16-1 UNASSIGNED r REPLICA_ADDED
test-2017.05.08 UNASSIGNED r REPLICA_ADDED
test-2017.05.08 UNASSIGNED p NODE_LEFT
test-2017.05.08 UNASSIGNED r REPLICA_ADDED
test-2017.05.08 UNASSIGNED r REPLICA_ADDED
test-2017.05.08 UNASSIGNED r REPLICA_ADDED
test-2017.05.23 UNASSIGNED r INDEX_CREATED
test-2017.05.14-1 UNASSIGNED r REPLICA_ADDED
test-2017.05.14-1 UNASSIGNED p NODE_LEFT
test-2017.05.14-1 UNASSIGNED r REPLICA_ADDED
test-2017.05.14-1 UNASSIGNED r REPLICA_ADDED
test-2017.05.21 UNASSIGNED r INDEX_CREATED
test-2017.05.21 UNASSIGNED r INDEX_CREATED
test-2017.05.21 UNASSIGNED r NODE_LEFT
test-2017.05.09 UNASSIGNED r REPLICA_ADDED
test-2017.05.09-1 UNASSIGNED r REPLICA_ADDED
test-2017.05.09 UNASSIGNED p NODE_LEFT
test-2017.05.09 UNASSIGNED r REPLICA_ADDED
test-2017.05.09 UNASSIGNED r REPLICA_ADDED
test-2017.05.09 UNASSIGNED r REPLICA_ADDED
test-2017.02-10 UNASSIGNED r NODE_LEFT
test-2017.02-10 UNASSIGNED r NODE_LEFT
test-2017.05.22 UNASSIGNED r INDEX_CREATED
test-2017.05.22 UNASSIGNED r INDEX_CREATED
test-2017.05.22 UNASSIGNED r NODE_LEFT
test-2017.05.18-1 UNASSIGNED r REPLICA_ADDED
NOTE:
There are several reasons why a shard can be unassigned. More can be checked on Unassigned Shards Status.
- INDEX_CREATED: The shard is unassigned as a result of an API call that created the index.
- CLUSTER_RECOVERED: The shard is unassigned as a result of a full cluster recovery.
- INDEX_REOPENED: The shard is unassigned as a result of opening a closed index.
- DANGLING_INDEX_IMPORTED: The shard is unassigned as a result of importing a dangling index.
- NEW_INDEX_RESTORED: The shard is unassigned as a result of restoring a snapshot into a new index.
- EXISTING_INDEX_RESTORED: The shard is unassigned as a result of restoring a snapshot into a closed index.
- REPLICA_ADDED: The shard is unassigned as a result of explicitly adding a replica.
- ALLOCATION_FAILED: The shard is unassigned because its allocation failed.
- NODE_LEFT: The shard is unassigned because the node hosting it left the cluster.
- REINITIALIZED: The shard moved from the started state back to initializing (for example, with shadow replicas).
- REROUTE_CANCELLED: The shard is unassigned as a result of an explicit cancel reroute command.
- REALLOCATED_REPLICA: A better replica location was identified, so the existing replica allocation was cancelled. As a result, the shard is unassigned.
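To see which of these reasons dominates in a long listing, the Step 2 output can be tallied. The sketch below assumes the four-column layout shown in that output (index, state, prirep, reason). The function name count_unassigned_by_reason is hypothetical.

```shell
# count_unassigned_by_reason
# Reads "index state prirep reason" rows on stdin and prints one line per
# unassigned reason with the number of shards reporting it.
count_unassigned_by_reason() {
  awk '$2 == "UNASSIGNED" { n[$4]++ } END { for (r in n) print r, n[r] }' | sort
}

# Usage: pipe the Step 2 listing into the function, for example
#   curl -s "localhost:9200/_cat/shards?h=index,state,prirep,unassigned.reason" | count_unassigned_by_reason
```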
Step 3: Delete all Elasticsearch Unassigned Shards
You can use the curl query below to delete the unassigned shards. The query greps all the UNASSIGNED shards and feeds the output to the awk command, which extracts the name of the index each shard belongs to. Each name is passed to xargs, which runs curl with -XDELETE against it. Note that the delete request targets the index, not an individual shard, so this removes every index that has at least one unassigned shard, including its assigned shards and their data.
[root@localhost ~]# curl -s -XGET "http://localhost:9200/_cat/shards" | grep UNASSIGNED | awk '{print $1}' | sort -u | xargs -I{} curl -XDELETE "http://localhost:9200/{}"
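Because the delete cannot be undone, it can help to preview what the pipeline would remove before running it. The following is a minimal dry-run sketch under the same assumptions as the command above (default _cat/shards output, cluster on localhost:9200). The function names indices_to_delete and print_delete_commands are hypothetical.

```shell
# indices_to_delete
# Reads _cat/shards rows on stdin and prints, once each, every index that
# has at least one row containing UNASSIGNED.
indices_to_delete() {
  awk '/UNASSIGNED/ { print $1 }' | sort -u
}

# print_delete_commands
# Prints the DELETE request that would be sent for each such index,
# without sending anything.
print_delete_commands() {
  indices_to_delete | while read -r index; do
    printf 'curl -XDELETE "http://localhost:9200/%s"\n' "$index"
  done
}

# Usage:
#   curl -s "http://localhost:9200/_cat/shards" | print_delete_commands
```

Reviewing the printed list first makes it clear which indices, and therefore which data, would be lost.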
Step 4: Check Cluster Health Status Again
Now check the cluster health again and confirm whether the cluster has gone green. Here we can see that the active shards percentage is now 100% and the cluster status is green. Hence we can confirm that all the remaining shards are allocated and the cluster is healthy again.
[root@localhost ~]# curl http://localhost:9200/_cluster/health?pretty
{
"cluster_name" : "test-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4,
"active_shards" : 4,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
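This final check can also be scripted. The sketch below assumes the pretty-printed health output shown above. The function name cluster_is_green is hypothetical. It succeeds only when the status is green and no shards are unassigned.

```shell
# cluster_is_green
# Reads pretty-printed _cluster/health JSON on stdin. Returns 0 when
# "status" is green and "unassigned_shards" is 0, non-zero otherwise.
cluster_is_green() {
  awk '
    $1 == "\"status\""            { gsub(/[",]/, "", $3); status = $3 }
    $1 == "\"unassigned_shards\"" { gsub(/[",]/, "", $3); unassigned = $3 }
    END { exit !(status == "green" && unassigned == "0") }
  '
}

# Usage:
#   curl -s "http://localhost:9200/_cluster/health?pretty" | cluster_is_green && echo "cluster is green"
```

The health API also accepts the wait_for_status and timeout query parameters (for example ?wait_for_status=green&timeout=30s), which may be more convenient than polling in a loop.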