• Troubleshooting PCIe SSD issues


    I. Preface

    1. Background

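    In high-performance computing scenarios we often use high-performance SSDs, such as PCIe SSDs, as a cache acceleration layer. This article records the problems encountered when using a PCIe SSD as a Ceph OSD and how each one was resolved.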

    2. Hardware

    2.1. Shannon Direct-IO G3i 1600GB

    [root@node113 redhat7]# shannon-status -a
    Found Shannon PCIE Flash card /dev/scta:
    
    Basic Information:
    Control Device Node:        /dev/scta
    Driver Mode:                Block
    Block Device Node:            /dev/dfa
    Device State:                Attached
    Access Mode:                ReadWrite
    Product Model:                Direct-IO G3i 1600GB
    Serial Number:                SH17705K7320343
    Part Number:                MT29F512G08CMCCB
    UDID:                               1CB00275-1CB00032-AB17705E-73203430
    PCI VendorID:                1CB0
    PCI DeviceID:                0275
    PCI Bus Address:            04:00:0
    PCI Link Speed:                pcie 2.0 x 8 
    Firmware Version:            3.3
    Firmware Build:                3688e1ff
    Driver Version:                3.2.2.10
    FPGA Reconfig Support:              Yes
    Logical Sector:                512
    Physical Sector:            4096
    Disk Capacity:                1600.00 GB
    Physical Capacity:            2115.52 GB
    Overprovision:                24.37%
    Max Write Band                0 MB/s
    Atomic Write:                       Disabled
    Prioritize Write:                   Disabled
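
    As a sanity check on the figures above, the reported Overprovision value appears consistent with (Physical Capacity - Disk Capacity) / Physical Capacity. That formula is inferred from the numbers, not stated by the vendor tool, and the function name below is hypothetical:

```shell
# overprovision_pct PHYSICAL_GB DISK_GB
# Prints the share of physical flash not exposed as disk capacity, in percent.
# Assumption: this is how shannon-status derives its "Overprovision" field.
overprovision_pct() {
    awk -v p="$1" -v d="$2" 'BEGIN { printf "%.2f\n", (p - d) / p * 100 }'
}

# Values copied from the shannon-status output above.
overprovision_pct 2115.52 1600.00
```

    With the capacities listed above this prints 24.37, which matches the Overprovision line.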

    2.2. Environment

    [root@node113 ~]# cat /etc/redhat-release 
    CentOS Linux release 7.6.1810 (Core) 
    [root@node113 ~]# uname -a
    Linux node113 4.14.113 #1 SMP Thu Jul 30 14:55:45 CST 2020 x86_64 x86_64 x86_64 GNU/Linux

    3. Troubleshooting

    3.1. Shannon Direct-IO G3i 1600GB

    3.1.1. The system does not detect the PCIe SSD

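    • Problem:
      After the Shannon PCIe SSD is installed in the server, the operating system does not detect the disk.
    • Root cause:
      The Shannon PCIe SSD requires the shannon-module driver, and this Linux driver is tightly coupled to the kernel version. For Red Hat 7 the vendor ships prebuilt driver RPMs only for 3.10.x kernels, while the test environment runs kernel 4.14.113, so the driver RPM has to be rebuilt.
    • Fix:

    1. Download the Linux driver package Shannon_Linux_Driver_Package_3.2.2.10
    2. Following the user manual, run the steps below to build and install the RPM and load the kernel module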

    [root@node119 ~]# tar -zxvf Shannon_Linux_Driver_Package_3.2.2.10.tar.gz 
    [root@node119 ~]# cd Shannon_Linux_Driver_Package_3.2.2.10
    [root@node119 Shannon_Linux_Driver_Package_3.2.2.10]# cd redhat7/
    [root@node119 redhat7]# rpmbuild --rebuild shannon-module-3.2.2-10.src.rpm 
    Wrote: /root/rpmbuild/RPMS/x86_64/shannon-module-4.14.113-3.2.2-10.x86_64.rpm
    [root@node112 redhat7]# cd /root/rpmbuild/RPMS/x86_64/
    [root@node112 x86_64]# rpm -ivh shannon-module-4.14.113-3.2.2-10.x86_64.rpm 
    [root@node112 x86_64]# modprobe shannon
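
    Because the rebuilt package is tied to the kernel it was built against, it can help to confirm that an RPM matches the target kernel before installing it. The sketch below assumes the kernel release is embedded in the file name exactly as in the rpmbuild output above (shannon-module-<kernel>-<version>...); the function name is hypothetical.

```shell
# rpm_matches_kernel RPM_FILE [KERNEL_RELEASE]
# Succeeds if the RPM file name carries the given kernel release
# (defaults to the running kernel as reported by `uname -r`).
# Assumption: naming follows shannon-module-<kernel>-<driver version>.<arch>.rpm.
rpm_matches_kernel() {
    kernel="${2:-$(uname -r)}"
    case "$(basename "$1")" in
        shannon-module-"$kernel"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here):
#   rpm_matches_kernel /root/rpmbuild/RPMS/x86_64/shannon-module-4.14.113-3.2.2-10.x86_64.rpm \
#       && echo "RPM matches the running kernel"
```

    After a kernel upgrade the check fails for the old RPM, which is the signal to rerun rpmbuild --rebuild.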

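    3.1.2. LVM cannot be created on the PCIe SSD, so adding the OSD fails

    • Problem:
      When the PCIe SSD is used as an OSD, creating the LVM volume fails, which in turn makes adding the OSD fail.
    • Root cause:
      Creating a PV on the PCIe SSD by hand also fails with the error below: the dfa device cannot be found.
    [root@node111 ~]# pvcreate /dev/dfa
    Device /dev/dfa not found (or ignored by filtering).
    • Fix:

    1. Look up the shannon block device entry (major number and name) in /proc/devices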

    [root@node111 ~]# cat /proc/devices | grep shannon
    252 shannon

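    2. Edit lvm.conf and add the PCIe SSD to the types field. Per lvm.conf(5), each entry is a device type name from /proc/devices followed by the maximum number of partitions, so the second value is a partition limit rather than the major number; the value 252 used below was accepted in this environment.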

    [root@node111 ~]# cat /etc/lvm/lvm.conf | grep types
        # Configuration option devices/types.
        # List of additional acceptable block device types.
        types = [ "shannon", 252 ]
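
    The lookup in step 1 can be scripted so that it only matches the "Block devices" section of /proc/devices. This is a minimal sketch; the function name is hypothetical and the optional file argument exists only to make the function easy to try against a sample file.

```shell
# block_major NAME [FILE]
# Prints the major number of block driver NAME as registered in a
# /proc/devices-style file (default: /proc/devices).
# Returns non-zero if NAME is not listed under "Block devices:".
block_major() {
    awk -v n="$1" '
        /^Block devices:/     { in_block = 1; next }
        /^Character devices:/ { in_block = 0; next }
        in_block && $2 == n   { print $1; found = 1; exit }
        END                   { exit !found }
    ' "${2:-/proc/devices}"
}

block_major shannon || echo "shannon block driver is not registered" >&2
```

    Once lvm.conf has been edited, running lvmconfig devices/types should show the value LVM actually picked up, after which pvcreate /dev/dfa can be retried.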

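    3.1.3. After a reboot, OSDs on the PCIe SSD do not start automatically

    • Problem:
      With the PCIe SSD added to the Ceph cluster as an OSD, after the node reboots the OSDs on ordinary disks start normally, but the OSDs on the PCIe SSD fail to start.
    • Root cause:

    1. After the node has rebooted, running systemctl restart ceph-volume@lvm-{osd-id}-{osd-fsid} starts the OSD normally. The initial suspicion is therefore an ordering problem between OSD startup and loading of the shannon module: when the OSD starts, the shannon driver is not yet loaded, so the disk cannot be found and the OSD fails to start.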

    [root@node111 ~]# ceph-volume lvm list
    ====== osd.41 ======
      [block]    /dev/ceph-5a07b4d3-cc9a-4d4c-a29b-877c3b5d875e/osd-block-ad380cf6-774f-4f36-8328-f5f388b9740f
    
    
    
          type                      block
          osd id                    41
          cluster fsid              469729e5-af75-4c18-a58e-28ebe3690e4c
          cluster name              ceph
          osd fsid                  ad380cf6-774f-4f36-8328-f5f388b9740f
          encrypted                 0
          cephx lockbox secret      
          block uuid                Mzoqvr-StiC-8FeI-qUrl-KRBB-onKM-tf7x9l
          block device              /dev/ceph-5a07b4d3-cc9a-4d4c-a29b-877c3b5d875e/osd-block-ad380cf6-774f-4f36-8328-f5f388b9740f
          vdo                       0
          crush device class        None
          devices                   /dev/dfa13
    [root@node111 ~]# systemctl restart ceph-volume@lvm-41-ad380cf6-774f-4f36-8328-f5f388b9740f
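
    The unit name used in the restart command is built from the "osd id" and "osd fsid" fields of the listing. A small sketch that derives it from ceph-volume lvm list output read on stdin; the function name is hypothetical, and it assumes "osd id" is printed before "osd fsid" for each OSD, as in the output above.

```shell
# ceph_volume_units
# Reads `ceph-volume lvm list` text on stdin and prints one
# ceph-volume@lvm-<osd id>-<osd fsid> unit name per OSD.
# Assumption: the "osd id" line precedes the "osd fsid" line for each OSD.
ceph_volume_units() {
    awk '
        $1 == "osd" && $2 == "id"   { id = $3 }
        $1 == "osd" && $2 == "fsid" { print "ceph-volume@lvm-" id "-" $3 }
    '
}

# Example (not executed here):
#   ceph-volume lvm list | ceph_volume_units | xargs -r -n1 systemctl restart
```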

    2. The boot messages in the system log /var/log/messages show the OSDs being started before the shannon driver is loaded, which confirms the suspicion in step 1

    Oct 13 09:35:28 node111 systemd: Started LSB: Starts and stops the generic storage target daemon.
    Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.18.
    Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.1.
    Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.36.
    Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.24.
    Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.12.
    Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.15.
    Oct 13 09:35:28 node111 systemd: Started Ceph object storage daemon osd.6.
    
    
    Oct 13 09:35:44 node111 kernel: <3>shn_info: scta: readwrite, readonly_reason= 0, reduced_write_reason= 0.
    Oct 13 09:35:44 node111 kernel: shn_dbg: shannon_init_gendisk(): disk_name=dfa, major=252, minors=64, first_minor=0.
    Oct 13 09:35:44 node111 kernel: shn_dbg: shannon_init_gendisk(): disk_name=dfa, major=252, minors=64, first_minor=0.
    Oct 13 09:35:44 node111 kernel: shn_dbg: shannon_init_gendisk(): disk_name=dfa, major=252, minors=64, first_minor=0.
    Oct 13 09:35:44 node111 kernel: dfa: dfa1 dfa2 dfa3 dfa4 dfa5 dfa6 dfa7 dfa8 dfa9 dfa10 dfa11 dfa12 dfa13
    Oct 13 09:35:44 node111 kernel: dfa: dfa1 dfa2 dfa3 dfa4 dfa5 dfa6 dfa7 dfa8 dfa9 dfa10 dfa11 dfa12 dfa13
    Oct 13 09:35:44 node111 kernel: <3>shn_info: Attached Direct-IO PCIe Flash /dev/scta as block device /dev/dfa:
    Oct 13 09:35:44 node111 kernel: <3>shn_info: Attached Direct-IO PCIe Flash /dev/scta as block device /dev/dfa:
    Oct 13 09:35:44 node111 kernel: <3>shn_info: sector size: logical 512 / physical 4096, capacity: 1600 GB, overprovision: 24.37%.
    Oct 13 09:35:44 node111 kernel: <3>shn_info: sector size: logical 512 / physical 4096, capacity: 1600 GB, overprovision: 24.37%.
    Oct 13 09:35:44 node111 kernel: <3>shn_info: scta: readwrite, readonly_reason= 0, reduced_write_reason= 0.
    Oct 13 09:35:44 node111 kernel: <3>shn_info: scta: readwrite, readonly_reason= 0, reduced_write_reason= 0.
    Oct 13 09:35:44 node111 kernel: <3>shn_info: Probed Direct-IO PCIe Flash /dev/scta: model: Direct-IO G3i 1600GB, sn: SH17705K7320327
    Oct 13 09:35:44 node111 kernel: <3>shn_info: Probed Direct-IO PCIe Flash /dev/scta: model: Direct-IO G3i 1600GB, sn: SH17705K7320327
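
    The same ordering check can be automated by comparing the first "Started Ceph object storage daemon" timestamp with the first "Attached Direct-IO PCIe Flash" timestamp. This is a rough sketch with a hypothetical function name; it assumes the syslog layout shown above (time in the third field) and that both events fall on the same day within one boot.

```shell
# osd_started_before_driver LOGFILE
# Succeeds if the first OSD start message is timestamped earlier than the
# first shannon "Attached" message. Compares HH:MM:SS strings only, so it
# assumes both lines come from the same day.
osd_started_before_driver() {
    awk '
        osd == "" && /Started Ceph object storage daemon/ { osd = $3 }
        drv == "" && /Attached Direct-IO PCIe Flash/      { drv = $3 }
        END { exit !(osd != "" && drv != "" && osd < drv) }
    ' "$1"
}

# Example (not executed here):
#   osd_started_before_driver /var/log/messages && echo "OSDs raced ahead of the shannon driver"
```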
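    • Fix:
      1. Add the following script by hand to the kernel module loading directory /etc/sysconfig/modules/. On reboot the shannon module is then loaded first, and the OSD services start afterwards.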
    [root@node111 ~]# cat /etc/sysconfig/modules/shannon.modules 
    #!/bin/bash
    /sbin/modinfo -F filename shannon > /dev/null 2>&1
    
    if [ $? -eq 0 ]; then
        /sbin/modprobe shannon
    fi
    [root@node111 ~]# ll /etc/sysconfig/modules/
    total 4
    -rwxr-xr-x 1 root root 116 Oct 13 10:54 shannon.modules
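
    Installing the script can be wrapped in a small helper, sketched below. The function name is hypothetical and the script body is a slightly condensed equivalent of the one above. The listing shows the file is executable, and this sketch assumes that the executable bit is required for the script to be run at boot.

```shell
# install_shannon_modules_script [DIR]
# Writes shannon.modules into DIR (default: /etc/sysconfig/modules)
# and marks it executable, matching the permissions in the listing above.
install_shannon_modules_script() {
    dir="${1:-/etc/sysconfig/modules}"
    mkdir -p "$dir" || return 1
    cat > "$dir/shannon.modules" <<'EOF'
#!/bin/bash
# Load the shannon driver only if the module is installed for this kernel.
if /sbin/modinfo -F filename shannon > /dev/null 2>&1; then
    /sbin/modprobe shannon
fi
EOF
    chmod 755 "$dir/shannon.modules"
}

# Example (run as root, not executed here):
#   install_shannon_modules_script
```

    After the next reboot, the kernel messages for the shannon driver should appear in /var/log/messages before the "Started Ceph object storage daemon" lines.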
  • Original article: https://www.cnblogs.com/luxf0/p/13900470.html