• [Linux Ops -- Hardware] Using smartctl


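    1. What it is

    smartctl is a widely used disk-checking tool from the smartmontools package. It reads the data a drive exposes through SMART (Self-Monitoring, Analysis and Reporting Technology).

    2. Installation

    (1) Ubuntu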

    $ sudo apt-get install smartmontools
    

    (2) Red Hat & CentOS

    $ sudo yum install smartmontools
    

    3. Usage

    (1) Check whether the disk supports SMART

    $ sudo smartctl -i /dev/sda1 
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Seagate Constellation ES (SATA 6Gb/s)
    Device Model:     ST1000NM0011
    Serial Number:    Z1N0EVRZ
    LU WWN Device Id: 5 000c50 03f123968
    Firmware Version: SN02
    User Capacity:    1,000,204,886,016 bytes [1.00 TB]
    Sector Size:      512 bytes logical/physical
    Rotation Rate:    7200 rpm
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ATA8-ACS T13/1699-D revision 4
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
    Local Time is:    Sun Aug 23 23:27:54 2015 CST
    SMART support is: Available - device has SMART capability.          
    SMART support is: Enabled
    

    The last two lines show whether the device supports SMART and whether SMART is currently enabled.
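
    As a minimal sketch of checking those two lines from a script, the helper below reads `smartctl -i` output on stdin and reduces it to a single word. The function name `smart_support_status` and the four result words are assumptions made for illustration; they are not part of smartmontools.

```shell
#!/usr/bin/env bash
# smart_support_status (hypothetical helper): reads `smartctl -i` output on
# stdin and prints "enabled", "disabled", "unavailable", or "unknown".
smart_support_status() {
    awk '
        # Only the "SMART support is:" lines carry the information we need.
        /^SMART support is:/ {
            if ($0 ~ /Unavailable/)   state = "unavailable"
            else if ($0 ~ /Enabled/)  state = "enabled"
            else if ($0 ~ /Disabled/) state = "disabled"
        }
        # No matching line at all means the output was not what we expected.
        END { print (state == "" ? "unknown" : state) }
    '
}
```

    It could be used as `sudo smartctl -i /dev/sda | smart_support_status`; the device path is only an example.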

    (2) Enable SMART support manually

    $ sudo smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda1
    

    The options mean:

    -s VALUE, --smart=VALUE
    Enable/disable SMART on device (on/off)

    -o VALUE, --offlineauto=VALUE (ATA)
    Enable/disable automatic offline testing on device (on/off)

    -S VALUE, --saveauto=VALUE (ATA)
    Enable/disable Attribute autosave on device (on/off)

    (3) Check the disk's overall health

    $ sudo smartctl -H /dev/sda1 
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
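
    Besides printing PASSED or FAILED, smartctl also reports problems through a bitmask in its exit status, which is convenient in scripts. The sketch below decodes that bitmask; the helper name `decode_smartctl_status` is hypothetical, and the bit descriptions are paraphrased from the smartctl man page, so check them against the man page of your installed version.

```shell
#!/usr/bin/env bash
# decode_smartctl_status (hypothetical helper): prints one line per bit that is
# set in a smartctl exit status. Descriptions are paraphrased from the man page.
decode_smartctl_status() {
    local status=$1
    local -a meanings=(
        "command line did not parse"
        "device open failed"
        "SMART command failed or checksum error"
        "SMART status check returned DISK FAILING"
        "prefail attribute at or below threshold"
        "attribute at or below threshold in the past"
        "error log contains records of errors"
        "self-test log contains records of errors"
    )
    local bit
    for bit in 0 1 2 3 4 5 6 7; do
        # Test each bit of the status independently.
        if (( status & (1 << bit) )); then
            echo "bit $bit: ${meanings[bit]}"
        fi
    done
}
```

    A possible use is `sudo smartctl -H /dev/sda >/dev/null; decode_smartctl_status $?`, which prints nothing when the exit status is 0.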
    

    (4) Show the disk's attribute values

    $ sudo smartctl -A /dev/sda1
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF READ SMART DATA SECTION ===
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   084   063   044    Pre-fail  Always       -       238687534
      3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
      5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       573183052
      9 Power_On_Hours          0x0032   063   063   000    Old_age   Always       -       33120
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   075   049   045    Old_age   Always       -       25 (Min/Max 20/30)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
    193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       567
    194 Temperature_Celsius     0x0022   025   051   000    Old_age   Always       -       25 (0 20 0 0 0)
    195 Hardware_ECC_Recovered  0x001a   120   099   000    Old_age   Always       -       238687534
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    

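    In essence, the SMART attribute table lists the attributes the manufacturer has defined for the drive, along with the failure threshold for each one. The drive firmware generates and updates the table automatically.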

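    • ID#: the attribute ID, usually a decimal number between 1 and 255.
    • ATTRIBUTE_NAME: the attribute name defined by the manufacturer.
    • VALUE: one of the most important columns in the table. It is the normalized value of the attribute, between 1 and 253, where 253 is the best case and 1 the worst. Depending on the attribute and the manufacturer, the initial VALUE may be set to 100 or 200.
    • WORST: the lowest VALUE ever recorded for the attribute.
    • FLAG: the attribute handling flags.
    • THRESH: the lowest value WORST may reach before the drive is reported as FAILED.
    • TYPE: the attribute type (Pre-fail or Old_age). Pre-fail attributes are critical: they take part in the drive's overall SMART health assessment (PASSED/FAILED), and if any Pre-fail attribute fails, the drive should be considered about to fail. Old_age attributes are non-critical (normal wear, for example): a failure there does not by itself mean the drive is failing.
    • UPDATED: how often the attribute is updated. "Offline" means it is updated only when offline tests run on the drive.
    • WHEN_FAILED: "FAILING_NOW" if VALUE is less than or equal to THRESH; "In_the_past" if WORST is less than or equal to THRESH; "-" otherwise. With "FAILING_NOW", back up important files as soon as possible, especially when the attribute is of type Pre-fail. "In_the_past" means the attribute failed at some point but was fine when the check ran. "-" means the attribute has never failed.
    • RAW_VALUE: the manufacturer-defined raw value, from which VALUE is derived.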
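
    The WHEN_FAILED rule above can be sketched as a small filter over `smartctl -A` output. This is an illustration under stated assumptions: `classify_attributes` is a hypothetical helper, it recomputes the state from the VALUE, WORST and THRESH columns rather than reading the drive, and it assumes the column layout shown in the table above.

```shell
#!/usr/bin/env bash
# classify_attributes (hypothetical helper): reads `smartctl -A` output on stdin
# and prints "ID NAME STATE" per attribute row, applying the WHEN_FAILED rule.
classify_attributes() {
    awk '
        # Attribute rows start with a numeric ID and have a hex FLAG in column 3.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            if (value <= thresh)      state = "FAILING_NOW"
            else if (worst <= thresh) state = "In_the_past"
            else                      state = "-"
            print $1, $2, state
        }
    '
}
```

    For example, `sudo smartctl -A /dev/sda | classify_attributes | grep -v -- ' -$'` would list only the attributes that are failing now or have failed in the past.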

    (5) Test the disk

    • short test
    $ sudo smartctl -t short /dev/sda
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
    Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
    Testing has begun.
    Please wait 1 minutes for test to complete.
    Test will complete after Mon Aug 24 00:01:22 2015
    
    Use smartctl -X to abort test.
    
    • long test
    $ sudo smartctl -t long /dev/sda
    
    • check test progress and results
    $ sudo smartctl -l selftest /dev/sda
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error       00%     33120         -
    
    • abort a running test
    $ sudo smartctl -X /dev/sda
    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-2.6.18-164.11.1.el5] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
    Sending command: "Abort SMART off-line mode self-test routine".
    Self-testing aborted!
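
    To follow a test from a script, the most recent entry of the self-test log can be reduced to one line. The sketch below is a rough illustration, not a robust parser: `latest_selftest` is a hypothetical helper, and it assumes the log layout shown above, with a two-word test description followed by the status text and a "Remaining" percentage.

```shell
#!/usr/bin/env bash
# latest_selftest (hypothetical helper): reads `smartctl -l selftest` output on
# stdin and prints "DESCRIPTION: STATUS (remaining N%)" for the newest entry.
latest_selftest() {
    awk '
        # Log rows look like "# 1  Short offline  Completed without error  00% ...".
        /^# *[0-9]+ / {
            sub(/^# *[0-9]+ +/, "")          # drop the row number
            pct = 0
            for (i = 1; i <= NF; i++)        # locate the Remaining column
                if ($i ~ /^[0-9]+%$/) { pct = i; break }
            if (pct < 4) next                # unexpected layout, skip the row
            status = $3
            for (i = 4; i < pct; i++) status = status " " $i
            print $1 " " $2 ": " status " (remaining " $pct ")"
            exit                             # the first row is the newest entry
        }
    '
}
```

    It could be run as `sudo smartctl -l selftest /dev/sda | latest_selftest`, for instance in a loop that sleeps between calls while a long test is in progress.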
    

    References:

    (1) http://linux.cn/article-4682-1.html
    (2) http://xmodulo.com/check-hard-disk-health-linux-smartmontools.html
    (3) http://chaorenyong.blog.51cto.com/2163445/1051859
    (4) http://bbs.chinaunix.net/thread-4132241-1-1.html

  • Original article: https://www.cnblogs.com/zk47/p/4753615.html