• How to Delete Elasticsearch Unassigned Shards in 4 Easy Steps


    In this article, I will take you through the steps to delete Elasticsearch unassigned shards. Elasticsearch stores data as documents, which are grouped into an index. With a large amount of data, a single index can exceed the hardware capacity of one node. For example, an index holding a billion documents may need around 1 TB of disk space, which may not fit on the disk of a single node.

    To solve this problem, Elasticsearch can split an index into multiple pieces called shards. Each shard is a fully functional, independent index in its own right and can be stored on any node in the cluster. Sharding also improves search performance, because a search can run on multiple shards in parallel.

    What is an Index

    An index is a collection of documents. It is also known as a logical partition of the data or records in Elasticsearch. You can create as many indices as you need.

    What are Shards

    An index is usually divided into a number of shards that are distributed across the nodes of the cluster. Each shard acts as a smaller unit of the index.
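The number of shards is chosen when an index is created. The sketch below only builds the settings body for such a request; the helper name `index_settings_body` and the index name `my-index` are hypothetical, and the shard and replica counts are arbitrary illustration values.

```shell
# Print an index-settings JSON body with the given number of
# primary shards ($1) and replicas per primary ($2).
index_settings_body() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Example request (not run here; "my-index" is a hypothetical name):
# curl -XPUT "http://localhost:9200/my-index" \
#   -H 'Content-Type: application/json' \
#   -d "$(index_settings_body 3 1)"
```

With 3 primaries and 1 replica each, the index has 6 shards in total, and every one of them must be allocated to a node before the cluster reports green.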

    Delete Elasticsearch Unassigned Shards


    Step 1: Check Elasticsearch Cluster Health

    First, check the cluster health by querying http://localhost:9200/_cluster/health?pretty with curl. Port 9200 is specified because the Elasticsearch HTTP API of this cluster listens on it (it is the default). The output below shows useful information such as the cluster name, cluster status, number of nodes, active primary shards, active shards, relocating shards, and the active shards percentage. Here the status is red and the active shards percentage is stuck at 42.89%, so the next step is to find out which shards are still unassigned.

    [root@localhost ~]# curl "http://localhost:9200/_cluster/health?pretty"
    {
    "cluster_name" : "test-cluster",
    "status" : "red",
    "timed_out" : false,
    "number_of_nodes" : 6,
    "number_of_data_nodes" : 3,
    "active_primary_shards" : 189,
    "active_shards" : 359,
    "relocating_shards" : 0,
    "initializing_shards" : 4,
    "unassigned_shards" : 474,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 1,
    "number_of_in_flight_fetch" : 0,
    "task_max_waiting_in_queue_millis" : 0,
    "active_shards_percent_as_number" : 42.89127837514934
    }
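When the check is repeated often, it can help to reduce the health output to the two fields that matter here. The following is a minimal sketch, assuming the response was requested with `?pretty` so that each key sits on its own line; the function name `health_summary` is hypothetical.

```shell
# Read the JSON printed by /_cluster/health?pretty on stdin and print
# "<status> <active_shards_percent>" on one line.
# Assumes one key per line, as in the pretty-printed output.
health_summary() {
  awk -F'"' '
    /"status"/ { status = $4 }
    /"active_shards_percent_as_number"/ {
      pct = $3                  # text after the key, e.g. " : 42.89"
      gsub(/[^0-9.]/, "", pct)  # keep only digits and the decimal point
    }
    END { print status, pct }
  '
}

# Example (not run here):
# curl -s "http://localhost:9200/_cluster/health?pretty" | health_summary
```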

     

    Please note that I am running all the commands below as the root user. You can use any user with sudo access instead. For more information, see Step by Step: How to Add User to Sudoers to provide sudo access to the User.

    Step 2: Check all Elasticsearch Unassigned Shards

    Next, list all the unassigned shards using the curl query below. The output shows the index each unassigned shard belongs to, its state, whether it is a primary (p) or a replica (r), and the reason it is unassigned. In this case I waited for some time and saw that the cluster status was not changing and the unassigned shards were still not being allocated. Only then did I go on to remove them.

    [root@localhost ~]# curl -XGET "localhost:9200/_cat/shards?h=index,state,prirep,unassigned.reason" | grep UNASSIGNED
    test-2017.05.16-1 UNASSIGNED r REPLICA_ADDED
    test-2017.05.16-1 UNASSIGNED p NODE_LEFT
    test-2017.05.16-1 UNASSIGNED r REPLICA_ADDED
    test-2017.05.16-1 UNASSIGNED r REPLICA_ADDED
    test-2017.05.08   UNASSIGNED r REPLICA_ADDED
    test-2017.05.08   UNASSIGNED p NODE_LEFT
    test-2017.05.08   UNASSIGNED r REPLICA_ADDED
    test-2017.05.08   UNASSIGNED r REPLICA_ADDED
    test-2017.05.08   UNASSIGNED r REPLICA_ADDED
    test-2017.05.23   UNASSIGNED r INDEX_CREATED
    test-2017.05.14-1 UNASSIGNED r REPLICA_ADDED
    test-2017.05.14-1 UNASSIGNED p NODE_LEFT
    test-2017.05.14-1 UNASSIGNED r REPLICA_ADDED
    test-2017.05.14-1 UNASSIGNED r REPLICA_ADDED
    test-2017.05.21   UNASSIGNED r INDEX_CREATED
    test-2017.05.21   UNASSIGNED r INDEX_CREATED
    test-2017.05.21   UNASSIGNED r NODE_LEFT
    test-2017.05.09   UNASSIGNED r REPLICA_ADDED
    test-2017.05.09-1 UNASSIGNED r REPLICA_ADDED
    test-2017.05.09   UNASSIGNED p NODE_LEFT
    test-2017.05.09   UNASSIGNED r REPLICA_ADDED
    test-2017.05.09   UNASSIGNED r REPLICA_ADDED
    test-2017.05.09   UNASSIGNED r REPLICA_ADDED
    test-2017.02-10   UNASSIGNED r NODE_LEFT
    test-2017.02-10   UNASSIGNED r NODE_LEFT
    test-2017.05.22   UNASSIGNED r INDEX_CREATED
    test-2017.05.22   UNASSIGNED r INDEX_CREATED
    test-2017.05.22   UNASSIGNED r NODE_LEFT
    test-2017.05.18-1 UNASSIGNED r REPLICA_ADDED
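An unassigned primary shard is what turns the cluster red, while unassigned replicas alone leave it yellow, so it is worth separating the two before acting on a long listing like the one above. This is a minimal sketch that assumes the four-column format shown (index, state, prirep, reason); the function name `unassigned_by_index` is hypothetical.

```shell
# Read `_cat/shards?h=index,state,prirep,unassigned.reason` output on stdin
# and print, per index, how many unassigned primaries and replicas it has.
unassigned_by_index() {
  awk '
    $2 == "UNASSIGNED" {
      if ($3 == "p") primaries[$1]++; else replicas[$1]++
      seen[$1] = 1
    }
    END {
      for (idx in seen)
        printf "%s primaries=%d replicas=%d\n", idx, primaries[idx], replicas[idx]
    }
  ' | sort
}

# Example (not run here):
# curl -s "localhost:9200/_cat/shards?h=index,state,prirep,unassigned.reason" | unassigned_by_index
```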

    NOTE:

    If shard allocation is still progressing, it is advisable to wait for some time before removing the shards in UNASSIGNED state. Allocating all the shards can take a while.

    A shard can be left unassigned for several reasons. More can be checked on Unassigned Shards Status.

    • INDEX_CREATED: The shard is unassigned because the index was just created through the create index API.
    • CLUSTER_RECOVERED: The shard is unassigned as a result of a full cluster recovery.
    • INDEX_REOPENED: The shard is unassigned as a result of opening a closed index.
    • DANGLING_INDEX_IMPORTED: The shard is unassigned as a result of importing a dangling index.
    • NEW_INDEX_RESTORED: The shard is unassigned as a result of restoring a snapshot into a new index.
    • EXISTING_INDEX_RESTORED: The shard is unassigned as a result of restoring a snapshot into a closed index.
    • REPLICA_ADDED: The shard is unassigned because a replica was added explicitly.
    • ALLOCATION_FAILED: The shard is unassigned because its allocation failed.
    • NODE_LEFT: The shard is unassigned because the node hosting it left the cluster.
    • REINITIALIZED: The shard moved from the started state back to initializing (for example, with shadow replicas).
    • REROUTE_CANCELLED: The shard is unassigned because its allocation was cancelled by an explicit reroute cancel command.
    • REALLOCATED_REPLICA: A better location for the replica was identified, so the existing replica allocation was cancelled and the shard became unassigned.
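To see which of these reasons dominates in a given cluster, the listing from Step 2 can be tallied. The sketch below assumes the same four-column format (index, state, prirep, reason); the function name `count_by_reason` is hypothetical.

```shell
# Read `_cat/shards?h=index,state,prirep,unassigned.reason` output on stdin
# and print "<count> <reason>" lines, most frequent reason first.
count_by_reason() {
  awk '
    $2 == "UNASSIGNED" { count[$4]++ }
    END { for (r in count) print count[r], r }
  ' | sort -k1,1nr -k2,2
}

# Example (not run here):
# curl -s "localhost:9200/_cat/shards?h=index,state,prirep,unassigned.reason" | count_by_reason
```

On Elasticsearch versions that provide it, the cluster allocation explain API (`/_cluster/allocation/explain`) gives a more detailed explanation for an individual shard.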

    Step 3: Delete all Elasticsearch Unassigned Shards

    You can use the curl query below to delete all the unassigned shards. The query greps the UNASSIGNED lines from the shard listing and feeds them to awk, which prints the first column, the name of the index each shard belongs to. The names are de-duplicated with sort -u and passed by xargs to a curl DELETE request, one request per index. Because the DELETE request targets the index name, it removes the whole index, including any of its shards that are still assigned, so the data in those indices is deleted.

    [root@localhost ~]# curl -s -XGET "http://localhost:9200/_cat/shards" | grep UNASSIGNED | awk '{print $1}' | sort -u | xargs -I {} curl -XDELETE "http://localhost:9200/{}"
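Since the delete cannot be undone, it may be useful to print the requests first and review them before running anything. The following is a dry-run sketch under the same assumptions as the command above; the function name `plan_deletes` and the `ES_URL` variable are hypothetical.

```shell
# Base URL of the cluster; ES_URL is an assumed variable, defaulting to
# the address used throughout this article.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read `_cat/shards` output on stdin and print, without executing them,
# the DELETE commands for every index that has an UNASSIGNED shard.
plan_deletes() {
  awk '/UNASSIGNED/ { print $1 }' | sort -u |
    while read -r index; do
      printf 'curl -XDELETE "%s/%s"\n' "$ES_URL" "$index"
    done
}

# Example (not run here):
# curl -s "$ES_URL/_cat/shards" | plan_deletes
```

Each index appears only once in the printed plan, no matter how many of its shards are unassigned.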

    Step 4: Check Cluster Health Status Again

    Now check the cluster health again and confirm whether the cluster has gone green. Here the active shards percentage shows 100% and the cluster status is green, which confirms that all remaining shards are allocated and the cluster is fully active again.

    [root@localhost ~]# curl "http://localhost:9200/_cluster/health?pretty"
    {
    "cluster_name" : "test-cluster",
    "status" : "green",
    "timed_out" : false,
    "number_of_nodes" : 2,
    "number_of_data_nodes" : 1,
    "active_primary_shards" : 4,
    "active_shards" : 4,
    "relocating_shards" : 0,
    "initializing_shards" : 0,
    "unassigned_shards" : 0,
    "delayed_unassigned_shards" : 0,
    "number_of_pending_tasks" : 0,
    "number_of_in_flight_fetch" : 0,
    "task_max_waiting_in_queue_millis" : 0,
    "active_shards_percent_as_number" : 100.0
    }
  • Original article: https://www.cnblogs.com/weifeng1463/p/14476209.html