Scenario
Three-node MGR in single-primary mode: db46 is the write node; db47/db48 are read nodes.
Tool: sysbench. A 15-hour stress test: 18 pure-write threads plus 12 oltp mixed-load threads against db46, and 12 select threads against db48.
About two hours into the run, both read nodes stalled and stopped applying writes. The script had been started the previous evening and the problem was found the next morning, at which point troubleshooting began.
Log messages
2019-03-01T12:35:23.921815+08:00 2763 [Note] Multi-threaded slave statistics for channel 'group_replication_applier': seconds elapsed = 568; events assigned = 185124865; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 2030997800 waited (count) when Workers occupied = 75103 waited when Workers occupied = 22073734900
2019-03-01T12:37:29.742741+08:00 2763 [Note] Multi-threaded slave statistics for channel 'group_replication_applier': seconds elapsed = 126; events assigned = 185125889; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 2032822300 waited (count) when Workers occupied = 75114 waited when Workers occupied = 22075589000
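Between the two samples only about 1000 events were assigned (185125889 - 185124865) while the wait counters kept climbing, i.e. the applier was barely making progress. To see what the applier workers are doing while a channel is stalled like this, performance_schema can be queried; a diagnostic sketch using the stock 5.7 worker status table:
SELECT WORKER_ID, THREAD_ID, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker
WHERE CHANNEL_NAME = 'group_replication_applier';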
The read nodes could not leave the group; the stop command simply hung:
mysql> stop group_replication;
While it hung, the error log kept reporting:
2019-03-01T12:54:22.192891+08:00 16 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2019-03-01T12:54:22.192959+08:00 16 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2019-03-01T12:54:22.193012+08:00 16 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
2019-03-01T12:54:22.193109+08:00 0 [Warning] Plugin group_replication reported: 'read failed'
2019-03-01T12:54:22.193211+08:00 0 [Warning] Plugin group_replication reported: '[GCS] The member has left the group but the new view will not be installed, probably because it has not been delivered yet.'
2019-03-01T12:54:27.193393+08:00 16 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
2019-03-01T12:54:27.193414+08:00 16 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
2019-03-01T12:54:27.193598+08:00 164 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2019-03-01T12:54:27.194129+08:00 164 [Note] Slave SQL thread for channel 'group_replication_applier' exiting, replication stopped in log 'FIRST' at position 374
2019-03-01T12:54:27.194564+08:00 161 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
After force-killing the mysqld process on read node db48, it could not rejoin the group:
mysql> start group_replication;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.
mysql> set global group_replication_allow_local_disjoint_gtids_join=ON;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> start group_replication;
ERROR 3092 (HY000): The server is not configured properly to be an active member of the group. Please see more details on error log.
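The error message points to the error log for details. Before retrying, it can also help to dump the plugin's settings and compare them against a healthy member; a diagnostic sketch using a standard statement:
SHOW GLOBAL VARIABLES LIKE 'group_replication%';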
Running stop group_replication on the current write node db46 also hung. After a while, the write role switched over to another node, db47.
At that point db48 could be shut down normally; after a restart, start group_replication succeeded and it rejoined the group, with db47 shown as the write node.
The original write node db46 was no longer in the group:
mysql> SELECT MEMBER_ID,MEMBER_HOST,MEMBER_PORT,MEMBER_STATE FROM performance_schema.replication_group_members;
+--------------------------------------+------------+-------------+--------------+
| MEMBER_ID | MEMBER_HOST| MEMBER_PORT | MEMBER_STATE |
+--------------------------------------+------------+-------------+--------------+
| 3b6d8d58-3b31-11e9-b581-6c92bfb23298 | db47 | 3301 | ONLINE |
| d29a16c7-3b2f-11e9-8c10-6c92bfb234c0 | db48 | 3301 | RECOVERING |
+--------------------------------------+------------+-------------+--------------+
The key question: given the stalled applier, would the data written to db46 still be synced to db47? After all, by membership status db46 was no longer in the group.
Checking db47's data files showed their size unchanged and smaller than db46's, i.e. nothing more was being synced from db46.
db48, however, syncs from db47; both db47 and db48 lagged behind db46, with db48 further behind, so once db47 became the write node, db48 began catching up from db47.
Data loss is unacceptable to the business: db47 and db48 would sooner be discarded entirely than db46, because db46 holds the complete data.
At this point, two takeaways can already be drawn:
1. Attach additional replicas directly under the write node, never under a read/secondary node: if that read node dies, every node downstream of it ends up with incomplete data.
2. Normal MGR failover assumes the nodes' data is consistent and that replication and group communication are healthy. If an ordinary command cannot complete, e.g. stop group_replication hangs, the group is already in trouble. Before shutting anything down, record the current positions and check the error log (see the sketch after this list). Do not start by force-killing instances, and above all do not reset a node; work out the recovery plan first.
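A minimal sketch of what to record before any destructive action; all three statements are stock MySQL 5.7:
-- binlog coordinates and Executed_Gtid_Set of this node
show master status\G
-- transactions this node has applied
SELECT @@GLOBAL.gtid_executed;
-- transactions received by each channel, which may not all be applied yet
SELECT CHANNEL_NAME, RECEIVED_TRANSACTION_SET FROM performance_schema.replication_connection_status;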
Recover the write node first. On the original write node db46, bootstrap the group:
set global group_replication_bootstrap_group=on;
start group_replication;
set global group_replication_bootstrap_group=off;
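Before touching the other nodes, it is worth confirming that db46 came up as the only ONLINE member, reusing the membership query from above:
SELECT MEMBER_HOST, MEMBER_STATE FROM performance_schema.replication_group_members;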
Next, reset node db47. These are the steps actually taken at the time; resetting the node without first recording its executed position was a mistake.
Since this is what was actually done, it is recorded here, and the data-recovery consequences of this approach are analyzed below.
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000047
Position: 270
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 3b6d8d58-3b31-11e9-b581-6c92bfb23298:1-583,
aaaaaaaa-bbba-ccca-ddda-aaaaaaaaa104:1-52150701,
c7a444f8-3b30-11e9-8336-6c92bfb2329c:1-49
1 row in set (0.00 sec)
mysql> reset master;
Query OK, 0 rows affected (5.23 sec)
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000001
Position: 150
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
mysql> start group_replication;
Query OK, 0 rows affected (7.58 sec)
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000001
Position: 1946
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: aaaaaaaa-bbba-ccca-ddda-aaaaaaaaa104:1-4,
c7a444f8-3b30-11e9-8336-6c92bfb2329c:1-2
1 row in set (0.00 sec)
mysql> SELECT MEMBER_ID,MEMBER_HOST,MEMBER_PORT,MEMBER_STATE FROM performance_schema.replication_group_members;
+--------------------------------------+------------+-------------+--------------+
| MEMBER_ID | MEMBER_HOST| MEMBER_PORT | MEMBER_STATE |
+--------------------------------------+------------+-------------+--------------+
| 3b6d8d58-3b31-11e9-b581-6c92bfb23298 | db47 | 3301 | RECOVERING |
| c7a444f8-3b30-11e9-8336-6c92bfb2329c | db46 | 3301 | ONLINE |
+--------------------------------------+------------+-------------+--------------+
When no recovery start point is specified, MGR tries to recover from the very beginning. The log shows the recovery channel retrying one donor after another without getting anywhere, until the attempt to sync from the start was finally abandoned:
2019-03-01T14:14:10.430561+08:00 928 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='db46', master_port= 3301, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2019-03-01T14:14:10.432469+08:00 928 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
For db48, recovery was instead started from an explicitly specified executed position. Its current position first:
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000045
Position: 59480248
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: aaaaaaaa-bbba-ccca-ddda-aaaaaaaaa104:1-49285172,
c7a444f8-3b30-11e9-8336-6c92bfb2329c:1-49
1 row in set (0.00 sec)
reset master;
SET @@GLOBAL.GTID_PURGED='aaaaaaaa-bbba-ccca-ddda-aaaaaaaaa104:1-49285172,c7a444f8-3b30-11e9-8336-6c92bfb2329c:1-49';
start group_replication;
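The reset master step here is required: in 5.7, GTID_PURGED can only be set while the node's gtid_executed is empty. Whether the purged set took effect can be sanity-checked with a standard query:
SELECT @@GLOBAL.gtid_purged;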
For db47, reset master had already been executed, so its previous binlogs were cleared and its positions restarted from scratch. db48 had been lagging behind db47,
so if db47 is made to resync from db48's recorded position, a large amount of already-applied data will inevitably be replayed. Case by case:
1. insert: since replication is log-based, re-inserting the same row raises a primary-key conflict, error 1062; this can be skipped via the config file (see the sketch after this list).
2. update: the same update is simply applied twice. For example, set a=1 when column a is already 1 leaves it at 1, so correctness is not harmed.
3. delete: re-deleting an already-deleted row raises error 1032 (row not found); skipping it does not affect correctness either.
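A minimal config-file sketch for skipping the two replay errors named above, using the stock slave_skip_errors option (it is read-only at runtime, so it must go in the config file and takes effect after a restart):
slave_skip_errors = 1062,1032
Note that skipping these errors globally also hides genuine conflicts, so the setting should be removed once the catch-up is done.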
No DDL was run during the test, so once db47 is back ONLINE its data should match db46; verification waits until recovery finishes.
After the sync completed, the test table held 61272706 rows on the write node db46 and likewise 61272706 rows on db47, matching expectations.
Identical row counts do not strictly prove that every row is identical, but time was limited and this was not verified in detail.
The rough argument: about 13 hours of lagged data, containing both inserts and deletes, were replayed onto a table of roughly 60 million rows, and the row counts still match, so the data is taken to be consistent.
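Row counts alone can hide differences; when time allows, a per-table checksum on both nodes is a cheap spot check. A sketch, where sbtest1 stands in for whatever the test table is actually named:
-- run on db46 and on db47, then compare the Checksum values
CHECKSUM TABLE sbtest1;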
With that, the cluster's data is recovered. Returning to the statistics in the very first log messages, which contained
waited when Workers occupied = 22075589000
i.e. a large amount of waiting caused by the applier workers being occupied; the remedy applied here is to raise the slave_pending_jobs_size_max parameter.
mysql> show variables like 'slave_pending_jobs_size_max';
+-----------------------------+----------+
| Variable_name | Value |
+-----------------------------+----------+
| slave_pending_jobs_size_max | 16777216 |
+-----------------------------+----------+
1 row in set (0.00 sec)
The servers are generously provisioned, so a fairly large value was chosen:
set global slave_pending_jobs_size_max=64424509440;
This only changes the running instance and is lost on restart, so the same setting also goes into the config file:
slave_pending_jobs_size_max=64424509440
Next, a different test: take db48 out of the MGR group, attach it to db46 as an ordinary replica, and rerun the stress test,
comparing the performance of an ordinary replica against that of an MGR read node.
Converting MGR read node db48 to ordinary replication:
stop group_replication;
CHANGE MASTER TO MASTER_HOST='db46',MASTER_PORT=3301,MASTER_USER='rpl_user',MASTER_PASSWORD='11111111',MASTER_AUTO_POSITION=1;
start slave;
show slave status\G
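To turn db48 back into an MGR member after the comparison test, something along these lines should work, as a sketch (assuming its executed GTID set is still a subset of the group's):
stop slave;
reset slave all;
start group_replication;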
Conclusions:
Observation 1: Under heavy load, both MGR read nodes stalled while the write node kept working.
Conclusion 1: MGR read nodes withstand load less well than the write node; under heavy load a read node is more likely to fail than the write node.
Observation 2: On identical hardware, with one MGR writer, one MGR reader, and one ordinary replica, the MGR reader performs worse than the ordinary replica.
Conclusion 2: Under high load, an MGR read node carries a performance penalty compared with an ordinary replica; how large depends on hardware and load, so measure it for the specific setup.
Measures:
1. Hang ordinary replicas off the MGR group to serve as disaster-recovery copies of the production database.
2. Provision MGR read nodes with hardware and capacity somewhat higher than the write node's.
3. For systems that tolerate replication lag: one MGR writer + one MGR reader plus two ordinary replicas. The two replicas serve application traffic, while the MGR reader serves no traffic and acts as the DR copy for automatic failover. With machines to spare, use 3 (MGR) + 2 (replicas).
4. If high availability is the priority: 3 (MGR) + 1 (replica), with the replica for DR and the three MGR nodes providing HA.