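Today a disk link failed. When I logged in to the system, the database had already gone down, and its log recorded an IO error.
Operating system log: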
[11640320.581749] sd 12:0:0:2: [sdi] Write Protect is off
[11640320.581751] sd 12:0:0:2: [sdi] Mode Sense: 6b 00 00 08
[11640320.582015] sd 12:0:0:2: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11640320.582126] sd 12:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
[11640320.591970] sdi: sdi1
[11640350.851141] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:0 -- 1 2002.
[11640350.852099] sd 12:0:0:0: [sdd] Attached SCSI disk
[11640351.851404] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:0 -- 1 2002.
[11640351.851604] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:1 -- 1 2002.
[11640351.851796] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:2 -- 1 2002.
[11640351.857535] sdh: unknown partition table
[11640351.857882] sd 12:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
[11640351.858692] sd 12:0:0:1: [sdh] Attached SCSI disk
[11640381.939887] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:2 -- 1 2002.
[11640381.940712] sd 12:0:0:2: [sdi] Attached SCSI disk
[11640390.879505] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:2 -- 1 2002.
[11640421.856372] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:2 -- 1 2002.
[11643411.613773] EXT4-fs (dm-2): error count since last fsck: 9
[11643411.613778] EXT4-fs (dm-2): initial error at time 1511349539: ext4_writepages:2414
[11643411.613781] EXT4-fs (dm-2): last error at time 1511349539: ext4_journal_check_start:56
[11646750.394697] rport-12:0-0: blocked FC remote port time out: removing target and saving binding
[11646750.396193] sd 12:0:0:0: [sdd] Synchronizing SCSI cache
[11646750.396215] sd 12:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[11646750.411178] sd 12:0:0:1: [sdh] Synchronizing SCSI cache
[11646750.411194] sd 12:0:0:1: [sdh] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[11646750.412116] sd 12:0:0:2: [sdi] Synchronizing SCSI cache
[11646750.412129] sd 12:0:0:2: [sdi] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[11646767.708015] scsi 12:0:0:0: Direct-Access HITACHI OPEN-V 8301 PQ: 0 ANSI: 3
[11646767.708735] sd 12:0:0:0: Attached scsi generic sg4 type 0
[11646767.709158] sd 12:0:0:0: [sdd] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[11646767.709436] sd 12:0:0:0: [sdd] Write Protect is off
[11646767.709439] sd 12:0:0:0: [sdd] Mode Sense: 6b 00 00 08
[11646767.709641] sd 12:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11646767.710477] scsi 12:0:0:1: Direct-Access HITACHI OPEN-V 8301 PQ: 0 ANSI: 3
[11646767.711081] sdd: unknown partition table
[11646767.711311] sd 12:0:0:1: Attached scsi generic sg5 type 0
[11646767.711792] sd 12:0:0:0: [sdd] Attached SCSI disk
[11646767.711819] scsi 12:0:0:2: Direct-Access HITACHI OPEN-V 8301 PQ: 0 ANSI: 3
[11646767.712207] sd 12:0:0:1: [sde] Very big device. Trying to use READ CAPACITY(16).
[11646767.712345] sd 12:0:0:1: [sde] 4294967296 512-byte logical blocks: (2.19 TB/2.00 TiB)
[11646767.712620] sd 12:0:0:2: Attached scsi generic sg7 type 0
[11646767.712724] sd 12:0:0:1: [sde] Write Protect is off
[11646767.712726] sd 12:0:0:1: [sde] Mode Sense: 6b 00 00 08
[11646767.712858] sd 12:0:0:2: [sdg] 1048576000 512-byte logical blocks: (536 GB/500 GiB)
[11646767.712952] sd 12:0:0:2: [sdg] Write Protect is off
[11646767.712954] sd 12:0:0:2: [sdg] Mode Sense: 6b 00 00 08
[11646767.713040] sd 12:0:0:1: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11646767.713179] sd 12:0:0:2: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11646767.713546] sd 12:0:0:1: [sde] Very big device. Trying to use READ CAPACITY(16).
[11646767.714450] sde: unknown partition table
[11646767.714800] sd 12:0:0:1: [sde] Very big device. Trying to use READ CAPACITY(16).
[11646767.715479] sd 12:0:0:1: [sde] Attached SCSI disk
[11646767.721519] sdg: sdg1
[11646767.722228] sd 12:0:0:2: [sdg] Attached SCSI disk
[11646797.741521] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:0 -- 1 2002.
[11646828.766379] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:0 -- 1 2002.
[11654134.042047] rport-12:0-0: blocked FC remote port time out: removing target and saving binding
[11654134.042715] sd 12:0:0:0: [sdd] Synchronizing SCSI cache
[11654134.042735] sd 12:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[11654134.056655] sd 12:0:0:1: [sde] Synchronizing SCSI cache
[11654134.056686] sd 12:0:0:1: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[11654134.058829] sd 12:0:0:2: [sdg] Synchronizing SCSI cache
[11654134.058841] sd 12:0:0:2: [sdg] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[11654269.600683] scsi 12:0:0:0: Direct-Access HITACHI OPEN-V 8301 PQ: 0 ANSI: 3
[11654269.601393] sd 12:0:0:0: Attached scsi generic sg4 type 0
[11654269.601821] sd 12:0:0:0: [sdd] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[11654269.601913] sd 12:0:0:0: [sdd] Write Protect is off
[11654269.601916] sd 12:0:0:0: [sdd] Mode Sense: 6b 00 00 08
[11654269.602185] sd 12:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11654269.603154] scsi 12:0:0:1: Direct-Access HITACHI OPEN-V 8301 PQ: 0 ANSI: 3
[11654269.603706] sdd: unknown partition table
[11654269.603891] sd 12:0:0:1: Attached scsi generic sg5 type 0
[11654269.604352] sd 12:0:0:0: [sdd] Attached SCSI disk
[11654269.604379] scsi 12:0:0:2: Direct-Access HITACHI OPEN-V 8301 PQ: 0 ANSI: 3
[11654269.605063] sd 12:0:0:2: Attached scsi generic sg7 type 0
[11654269.605339] sd 12:0:0:2: [sdi] 1048576000 512-byte logical blocks: (536 GB/500 GiB)
[11654269.605452] sd 12:0:0:2: [sdi] Write Protect is off
[11654269.605453] sd 12:0:0:2: [sdi] Mode Sense: 6b 00 00 08
[11654269.605653] sd 12:0:0:2: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11654269.614459] sdi: sdi1
[11654269.615149] sd 12:0:0:2: [sdi] Attached SCSI disk
[11654300.115015] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:0 -- 1 2002.
[11654301.115185] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:1 -- 1 2002.
[11654301.115487] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:2 -- 1 2002.
[11654302.117764] sd 12:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
[11654302.117994] sd 12:0:0:1: [sdh] 4294967296 512-byte logical blocks: (2.19 TB/2.00 TiB)
[11654302.118219] sd 12:0:0:1: [sdh] Write Protect is off
[11654302.118222] sd 12:0:0:1: [sdh] Mode Sense: 6b 00 00 08
[11654302.118611] sd 12:0:0:1: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11654302.119111] sd 12:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
[11654302.120087] sdh: unknown partition table
[11654302.120377] sd 12:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
[11654302.121181] sd 12:0:0:1: [sdh] Attached SCSI disk
[11654333.122856] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:2 -- 1 2002.
[11655264.634648] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:1 -- 1 2002.
[11655295.595455] qla2xxx [0000:82:00.0]-801c:12: Abort command issued nexus=12:0:2 -- 1 2002.
[11729876.334629] EXT4-fs (dm-2): error count since last fsck: 9
[11729876.334635] EXT4-fs (dm-2): initial error at time 1511349539: ext4_writepages:2414
[11729876.334638] EXT4-fs (dm-2): last error at time 1511349539: ext4_journal_check_start:56
[11816341.072955] EXT4-fs (dm-2): error count since last fsck: 9
[11816341.072960] EXT4-fs (dm-2): initial error at time 1511349539: ext4_writepages:2414
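The log contains errors from the filesystem (EXT4-fs), the HBA card (qla2xxx), and the system (SCSI/FC layer). I traced it to a link problem and handed it over to the storage gurus...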
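To see at a glance which layers are complaining, the log lines can be grouped by subsystem. The following is only a rough sketch: the category names (`fs`, `hba`, `fc_link`, `scsi`) and the function names are my own labels, and the patterns are copied from messages that appear in the log above, not from any official list of error strings.

```python
import re
from collections import Counter

# Patterns are taken from messages seen in the log above.
# The category names are illustrative labels, not kernel terminology.
PATTERNS = [
    ("fs", re.compile(r"EXT4-fs \(\S+\): ")),
    ("hba", re.compile(r"qla2xxx \[[0-9a-f:.]+\]")),
    ("fc_link", re.compile(r"rport-\S+: blocked FC remote port time out")),
    ("scsi", re.compile(r"hostbyte=DID_NO_CONNECT")),
]


def classify(line):
    """Return the first matching category for a log line, or None."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = Counter()
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
    return dict(counts)
```

For example, `summarize(open("dmesg.txt"))` (the file name is hypothetical) returns a dictionary of counts per category. Seeing `hba`, `fc_link`, and `scsi` hits together with `fs` errors is what pointed toward the link rather than the filesystem itself.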
After finishing my coffee, I rebooted the system and the problem was resolved.
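Tips: since the storage is a Hitachi array, I recommend using a recent version of the multipath software. After planning the disks and partitioning them, reboot the system several times, refresh the configuration, and check the logs. Only put the system into production once the logs are clean.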
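One thing that may be worth checking in those logs is whether each LUN keeps the same device name. In the log above, LUN 12:0:0:1 was attached as sdh, then sde, then sdh again, and 12:0:0:2 as sdi, sdg, sdi, while 12:0:0:0 stayed sdd throughout. The sketch below extracts that history from "Attached SCSI disk" lines. Treating a name change as something to flag is my own assumption, not something stated in the post, and the function names are hypothetical.

```python
import re

# Matches lines such as: "sd 12:0:0:1: [sdh] Attached SCSI disk"
ATTACH_RE = re.compile(
    r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\] Attached SCSI disk"
)


def name_history(lines):
    """Return {scsi_address: [device names, in order of attach]}."""
    history = {}
    for line in lines:
        match = ATTACH_RE.search(line)
        if match:
            address, device = match.groups()
            history.setdefault(address, []).append(device)
    return history


def unstable(history):
    """Return the SCSI addresses that appeared under more than one name."""
    return sorted(
        address
        for address, names in history.items()
        if len(set(names)) > 1
    )
```

Feeding the saved kernel log through `unstable(name_history(lines))` lists the addresses whose sd name changed between attaches. Anything that references such a disk by its raw `/dev/sdX` name could break after a reattach, which is one reason to go through the multipath device instead.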