• 103) Ceph: replacing a failed OSD / adding a new OSD


    1- Replacing a failed OSD

    1.1- Symptoms

    $ ceph health detail
    ......
    OSD_SCRUB_ERRORS 31 scrub errors
    PG_DAMAGED Possible data damage: 5 pgs inconsistent
        pg 41.33 is active+clean+inconsistent, acting [35,33]
        pg 41.42 is active+clean+inconsistent, acting [29,35]
        pg 51.24 is active+clean+inconsistent, acting [35,43]
        pg 51.77 is active+clean+inconsistent, acting [28,35]
        pg 51.7b is active+clean+inconsistent, acting [35,46]
    
    

    1.2- Temporary workaround

    Run ceph pg repair on each inconsistent PG; data that has become unreadable because of bad sectors is copied to another location during the repair. This does not fix the root cause, though: as long as the damaged disk stays in service, similar errors will keep being reported.

    $ ceph pg repair 41.33
    $ ceph pg repair 41.42
    $ ceph pg repair 51.24
    $ ceph pg repair 51.77
    $ ceph pg repair 51.7b
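
    Before repairing, the objects that failed scrub can be inspected in more detail (a minimal sketch; the PG id 41.33 is taken from the output above, and the same command works for the other PGs):

    # list the objects reported as inconsistent in this PG
    rados list-inconsistent-obj 41.33 --format=json-pretty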
    

    1.3- Collecting disk error information

    • Locate the disk (for mapping an OSD id to its host and device, see the note after the smartctl output below):
    apt install -y smartmontools
    # or, on RHEL/CentOS: yum install -y smartmontools
    
    smartctl -a /dev/sdc
    smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-121-generic] (local build)
    Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
     
    === START OF INFORMATION SECTION ===
    Device Model:     TOSHIBA MG04ACA600E
    Serial Number:    57J6KA41F6CD
    LU WWN Device Id: 5 000039 7cb9822be
    Firmware Version: FS1K
    User Capacity:    6,001,175,126,016 bytes [6.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    7200 rpm
    Form Factor:      3.5 inches
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   ATA8-ACS (minor revision not indicated)
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Tue Aug  7 14:46:45 2018 CST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
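
    To work out which host and device back a given OSD before running smartctl, the following can help (a sketch; osd.35 is used because it appears in the acting set of every inconsistent PG above):

    # show which host the OSD lives on and where it sits in the CRUSH map
    ceph osd find 35
    # the OSD's metadata records its hostname and backing device(s)
    ceph osd metadata 35 | grep -E '"hostname"|"devices"'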
    
    

    1.4- Disable cluster data migration

    When an OSD's disk fails, the OSD's state changes to down. After the interval set by mon osd down out interval elapses, Ceph marks the OSD out and starts migrating data to restore redundancy. To reduce the performance impact of recovery and scrub operations, they can be disabled temporarily and re-enabled once the disk has been replaced and the OSD has recovered:
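
    The usual way to do this is to set the corresponding cluster flags; these are the same flags that are cleared again in step 1.10 below:

    for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd set $i;done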

    1.5- On the node hosting the failed OSD, unmount the OSD's mount directory

    umount /var/lib/ceph/osd/ceph-5
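
    If the ceph-osd daemon for this OSD is still running, stop it before the umount above (the same systemctl pattern as in section 3; the id 5 matches this example):

    systemctl stop ceph-osd@5.service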
    

    1.6- Remove the OSD from the CRUSH map

    ceph osd crush remove osd.5
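
    The removal can be verified by listing the CRUSH hierarchy, where osd.5 should no longer appear:

    ceph osd tree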
    

    1.7- Delete the failed OSD's authentication key

    ceph auth del osd.5
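
    Optionally confirm the key is gone; fetching it should now fail with ENOENT:

    ceph auth get osd.5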
    

    1.8- Remove the failed OSD

    ceph osd rm 5
    

    1.9- After installing the replacement disk, note its device name and create the OSD

    ceph-deploy osd create --data /dev/sdd node3
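
    If the replacement disk carries leftover partitions or filesystem signatures, it may need to be wiped before the create command above (the same disk zap form as in section 2; node3 and /dev/sdd follow this example):

    ceph-deploy disk zap node3 /dev/sdd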
    

    1.10- Once the new OSD has been added to the CRUSH map, clear the cluster flags set earlier

    for i in noout nobackfill norecover noscrub nodeep-scrub;do ceph osd unset $i;done
    

    1.11- After a period of data migration, the cluster returns to the active+clean state
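
    Progress can be monitored until all placement groups report active+clean, for example with:

    ceph -s
    ceph pg stat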

    2- Adding a new OSD

    1. Choose an OSD node and install the new disk
    2. Zap (wipe) the disk on that node
    ceph-deploy disk zap [node_name] /dev/sdb
    
    3. Prepare the Object Storage Daemon
    ceph-deploy osd prepare [node_name]:/var/lib/ceph/osd1
    
    4. Activate the Object Storage Daemon (for newer ceph-deploy releases, see the note after this list)
    ceph-deploy osd activate [node_name]:/var/lib/ceph/osd1
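
    With ceph-deploy 2.x the prepare/activate pair above is replaced by a single create call, in the same form as step 1.9 ([node_name] and /dev/sdb are the placeholders used above):

    ceph-deploy osd create --data /dev/sdb [node_name]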
    

    3- Removing an OSD

    1. Mark the OSD out of the cluster (on waiting for rebalancing, see the note after this list)
    ceph osd out osd.4
    
    2. On the corresponding node, stop and disable the ceph-osd service
    systemctl stop ceph-osd@4.service
    systemctl disable ceph-osd@4.service
    
    3. Remove the corresponding OSD entry from the CRUSH map so it no longer receives data
    ceph osd crush remove osd.4
    
    4. Delete the OSD's authentication key
    ceph auth del osd.4
    
    5. Remove osd.4
    ceph osd rm osd.4
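
    As noted in step 1, it is safest to wait for rebalancing to finish after ceph osd out before stopping the daemon; progress and the final state can be checked with:

    # watch cluster status while data migrates off the OSD
    ceph -w
    # afterwards, confirm the OSD no longer appears in the hierarchy
    ceph osd tree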
    
  • Original article: https://www.cnblogs.com/lemanlai/p/13798948.html