原理简介:
在MySQL5.5之前,MySQL的复制其实都是异步复制(见下图),主库和从库的数据之间存在一定的延迟,这样存在一个隐患:当在主库上写入一个事务并提交成功,而从库尚未得到主库推送的BinLog日志时,恰好主库宕机了,例如主库可能因磁盘损坏、内存故障等造成主库上该事务Binlog丢失,此时从库就可能损失这个事务,从而造成主从不一致。
为了解决这个问题,从MySQL5.5开始引入了半同步复制机制(Semi_synchronous Replication)。为了保证主库上的每一个Binlog事务都能够被可靠的复制到从库上,主库在每次事务成功提交时,并不及时反馈给前端用户,而是等待其中一个从库也接受到Binlog事务并成功写入中继日志后,主库才返回Commit操作成功给客户端。半同步复制保证了事务成功提交后,至少有两份日志记录,一份在主库的Binlog日志上,另一份在至少一个从库的中继日志Relay Log上,从而更进一步保证了数据的完整性。半同步复制的大致流程如下图:
半同步复制模式下,假如在上图步骤①②③中任何一个步骤中主库宕机,则事务并未提交成功,从库上也没收到事务对应的Binlog日志,所以主从数据是一致的;假如在步骤④传送Binlog日志到从库时,从库宕机或者网络故障,导致Binlog并没有及时地传送到从库上,此时主库上的事务会等待一段时间(时间长短由参数rpl_semi_sync_master_timeout设置的毫秒数决定),如果Binlog在这段时间内都无法成功推送到从库上,则MySQL自动调整复制为异步复制,事务正常返回提交结果给客户端。
半同步复制很大程度上取决于主从库之间的网络情况,往返时延RTT(Round-Trip Time)越小决定了从库的实时性越好。通俗地说,主从库之间的网络越快,从库越实时。
测试环境:
安装步骤:
半同步复制是以插件形式来实现的,安装比较简单,在异步复制的环境上(这里已经安装好传统的异步复制),安装半同步插件即可,也可在新建时写入my.cnf,参考文章尾部;
1. 查看MySQL服务器是否支持动态增加插件
mysql> select @@have_dynamic_loading; +------------------------+ | @@have_dynamic_loading | +------------------------+ | YES | //YES表示支持 +------------------------+ 1 row in set (0.00 sec)
2. 确认支持动态插件后,检查安装目录是否存在所需插件,一般在mysql安装目录中的一个.../plugin/目录下,可以搜索一下:
[root@server-10 ~]# find / -name semisync_*.so /usr/lib64/mysql/plugin/debug/semisync_master.so /usr/lib64/mysql/plugin/debug/semisync_slave.so /usr/lib64/mysql/plugin/semisync_master.so /usr/lib64/mysql/plugin/semisync_slave.so
3. 在主库上安装插件semisync_master.so
mysql> install plugin rpl_semi_sync_master SONAME 'semisync_master.so'; Query OK, 0 rows affected (0.03 sec)
4. 在从库上安装插件semisync_master.so
mysql> install plugin rpl_semi_sync_slave SONAME 'semisync_slave.so'; Query OK, 0 rows affected (0.00 sec)
插件安装完成后,从plugin表中能够看到刚才安装的插件(这里在主库上查看一下):
mysql> select * from mysql.plugin; +----------------------+--------------------+ | name | dl | +----------------------+--------------------+ | rpl_semi_sync_master | semisync_master.so | +----------------------+--------------------+ 1 row in set (0.00 sec)
也就是说,安装完成后,MySQL会在系统表plugin中记录刚才安装的插件,下次系统重启后会自动加载插件。
5. 分别在主库和从库上配置参数打开半同步semi-sync,默认半同步设置是不打开的。
在主库上配置全局参数:
mysql> show variables like 'rpl_semi_sync_master%'; +------------------------------------+-------+ | Variable_name | Value | +------------------------------------+-------+ | rpl_semi_sync_master_enabled | OFF | | rpl_semi_sync_master_timeout | 10000 | | rpl_semi_sync_master_trace_level | 32 | | rpl_semi_sync_master_wait_no_slave | ON | +------------------------------------+-------+ 4 rows in set (0.00 sec) mysql> set global rpl_semi_sync_master_enabled = 1; Query OK, 0 rows affected (0.00 sec) mysql> set global rpl_semi_sync_master_timeout = 20000; Query OK, 0 rows affected (0.00 sec) mysql> show variables like 'rpl_semi_sync_master%'; +------------------------------------+-------+ | Variable_name | Value | +------------------------------------+-------+ | rpl_semi_sync_master_enabled | ON | | rpl_semi_sync_master_timeout | 20000 | | rpl_semi_sync_master_trace_level | 32 | | rpl_semi_sync_master_wait_no_slave | ON | +------------------------------------+-------+ 4 rows in set (0.00 sec)
在从库上配置参数:
mysql> set global rpl_semi_sync_slave_enabled = 1; Query OK, 0 rows affected (0.00 sec)
6. 由于之前配置的是传统的异步复制,所以需要重启一下从库上的I/O线程(如果是全新配置的半同步复制则不需要,后面会提到全新配置):
mysql> stop slave io_thread; Query OK, 0 rows affected (0.05 sec) mysql> start slave io_thread; Query OK, 0 rows affected (0.00 sec)
到此,半同步复制配置完成,下面可以来验证一下。
实际测试:
1. 先查看当前主库上半同步复制的一些状态值:
mysql> show status like 'Rpl_semi_sync%'; +--------------------------------------------+-------+ | Variable_name | Value | +--------------------------------------------+-------+ | Rpl_semi_sync_master_clients | 1 | | Rpl_semi_sync_master_net_avg_wait_time | 0 | | Rpl_semi_sync_master_net_wait_time | 0 | | Rpl_semi_sync_master_net_waits | 0 | | Rpl_semi_sync_master_no_times | 0 | | Rpl_semi_sync_master_no_tx | 0 | //留意该值,后面测试会有变化 | Rpl_semi_sync_master_status | ON | //留意该值,后面测试会有变化 | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 0 | | Rpl_semi_sync_master_tx_wait_time | 0 | | Rpl_semi_sync_master_tx_waits | 0 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 0 | //留意该值,后面测试会有变化 +--------------------------------------------+-------+ 14 rows in set (0.00 sec)
注意:环境不一样,可能显示也不一样,着重关注以下3个状态值的变化,而不是上面这些初始值。
Rpl_semi_sync_master_status :值为ON,表示半同步复制目前处于打开状态。
Rpl_semi_sync_master_yes_tx:值为0,表示主库当前尚未有任何一个事务是通过半同步复制到从库。
Rpl_semi_sync_master_no_tx:值为0,表示当前有0个事务不是半同步模式下从库及时响应的。
在主库上执行一个事务,然后再检查一下状态:
mysql> use mydb; Database changed mysql> show tables; Empty set (0.00 sec) mysql> CREATE TABLE customers -> ( -> cust_id int NOT NULL AUTO_INCREMENT, -> cust_name char(50) NOT NULL , -> cust_address char(50) NULL , -> cust_city char(50) NULL , -> cust_state char(5) NULL , -> cust_zip char(10) NULL , -> cust_country char(50) NULL , -> cust_contact char(50) NULL , -> cust_email char(255) NULL , -> PRIMARY KEY (cust_id) -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.10 sec) mysql> show tables; +----------------+ | Tables_in_mydb | +----------------+ | customers | +----------------+ 1 row in set (0.00 sec) mysql> show status like 'Rpl_semi_sync%'; +--------------------------------------------+-------+ | Variable_name | Value | +--------------------------------------------+-------+ | Rpl_semi_sync_master_clients | 1 | | Rpl_semi_sync_master_net_avg_wait_time | 316 | | Rpl_semi_sync_master_net_wait_time | 316 | | Rpl_semi_sync_master_net_waits | 1 | | Rpl_semi_sync_master_no_times | 0 | | Rpl_semi_sync_master_no_tx | 0 | | Rpl_semi_sync_master_status | ON | | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 488 | | Rpl_semi_sync_master_tx_wait_time | 488 | | Rpl_semi_sync_master_tx_waits | 1 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 1 | +--------------------------------------------+-------+ 14 rows in set (0.00 sec)
此时会发现Rpl_semi_sync_master_yes_tx的值变为1,即刚才的CREATE事务通过半同步复制到从库上了,Rpl_semi_sync_master_yes_tx计数增加1。
到从库确认一下,新建的customers表确实被复制过去了:
2. 接下来模仿网络异常的场景下,主库在等待 rpl_semi_sync_master_timeout毫秒超时后,自动转成异步复制的场景。
在主库上确认半同步复制会等待20s超时:
mysql> show variables like 'rpl_semi_sync_master_timeout'; +------------------------------+-------+ | Variable_name | Value | +------------------------------+-------+ | rpl_semi_sync_master_timeout | 20000 | //单位:ms +------------------------------+-------+ 1 row in set (0.00 sec)
在从库上通过iptables命令模拟从库宕机或者网络故障:
[root@server-11 ~]# iptables -A INPUT -s 138.138.82.10 -j DROP
在主库上执行一个事务并提交(默认提交即可),主库上的提交操作会被阻塞20秒:
mysql> INSERT INTO customers(cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_country, cust_contact, cust_email) -> VALUES(10001, 'Coyote Inc.', '200 Maple Lane', 'Detroit', 'MI', '44444', 'USA', 'Y Lee', 'ylee@coyote.com'); Query OK, 1 row affected (20.05 sec) //回车后,会卡主(阻塞)20秒,然后才会跳出Query OK...这行,并显示用时20秒
在这个20秒阻塞过程中,新开一个窗口检查当前主库的线程,会发现提交操作在等待从库上半同步复制操作的响应:
mysql> show processlistG
......
*************************** 3. row *************************** Id: 12 User: root Host: localhost db: mydb Command: Query Time: 3 State: Waiting for semi-sync ACK from slave //阻塞,等待从库确认 Info: INSERT INTO customers(cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_countr 3 rows in set (0.00 sec)
阻塞结束后,再次查看半同步复制的一些状态值:
mysql> show status like 'Rpl_semi_sync%'; +--------------------------------------------+----------+ | Variable_name | Value | +--------------------------------------------+----------+ | Rpl_semi_sync_master_clients | 1 | | Rpl_semi_sync_master_net_avg_wait_time | 15011119 | | Rpl_semi_sync_master_net_wait_time | 30022238 | | Rpl_semi_sync_master_net_waits | 2 | | Rpl_semi_sync_master_no_times | 1 | | Rpl_semi_sync_master_no_tx | 1 |//该值更新为1,表示在半同步复制模式下,从库没有及时响应的事务增加1个 | Rpl_semi_sync_master_status | OFF |//表示主库上半同步复制已经关闭了 | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 488 | | Rpl_semi_sync_master_tx_wait_time | 488 | | Rpl_semi_sync_master_tx_waits | 1 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 1 |//该值仍然为1,表示刚才的事务并不是通过半同步复制完成的,所以半同步成功事务仍然为1个 +--------------------------------------------+----------+ 14 rows in set (0.00 sec)
继续测试:如果从库正常连接上主库之后,主库是否会自动切换回半同步复制模式呢?
那么把之前从库上面的iptables限制条目去除:
[root@server-11 ~]# iptables -F [root@server-11 ~]# iptables -nL //查看一下确实没了 Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy ACCEPT) target prot opt source destination Chain OUTPUT (policy ACCEPT) target prot opt source destination
然后在从库上查看slave状态:
mysql> show slave statusG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 138.138.82.10 Master_User: repl_user Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000010 Read_Master_Log_Pos: 988 Relay_Log_File: relay-bin.000016 Relay_Log_Pos: 704 Relay_Master_Log_File: master-bin.000010 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 988 Relay_Log_Space: 5483 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 10 Master_UUID: 8086bac0-a428-11e8-8bf9-00505691656b Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec) mysql> select * from customers; //之前在主库插入时阻塞20.05秒的条目也复制过来了 +---------+-------------+----------------+-----------+------------+----------+--------------+--------------+-----------------+ | cust_id | cust_name | cust_address | cust_city | cust_state | cust_zip | cust_country | cust_contact | cust_email | +---------+-------------+----------------+-----------+------------+----------+--------------+--------------+-----------------+ | 10001 | Coyote Inc. | 200 Maple Lane | Detroit | MI | 44444 | USA | Y Lee | ylee@coyote.com | +---------+-------------+----------------+-----------+------------+----------+--------------+--------------+-----------------+ 1 row in set (0.00 sec)
以上显示说明在网络状态恢复后(去掉iptables),从库会自动尝试连接主库,几秒钟后I/O线程状态从Connecting变成了YES,并且主库和从库的数据一致了。
再次查看主库上半同步复制的状态值:
mysql> show status like 'Rpl_semi_sync%'; +--------------------------------------------+----------+ | Variable_name | Value | +--------------------------------------------+----------+ | Rpl_semi_sync_master_clients | 1 | | Rpl_semi_sync_master_net_avg_wait_time | 10033583 | | Rpl_semi_sync_master_net_wait_time | 30100750 | | Rpl_semi_sync_master_net_waits | 3 | | Rpl_semi_sync_master_no_times | 1 | | Rpl_semi_sync_master_no_tx | 1 | | Rpl_semi_sync_master_status | ON | | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 488 | | Rpl_semi_sync_master_tx_wait_time | 488 | | Rpl_semi_sync_master_tx_waits | 1 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 1 | +--------------------------------------------+----------+ 14 rows in set (0.00 sec)
以上发现Rpl_semi_sync_master_status的值自动从OFF变成ON,说明在检测到从库正常之后,主库到从库的复制方式会自动切换为半同步复制模式。
我们继续在主库上做一个INSERT事务测试,确认当前的复制模式确实是半同步复制:
mysql> INSERT INTO customers(cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_country, cust_contact) -> VALUES(10002, 'Mouse House', '333 Fromage Lane', 'Columbus', 'OH', '43333', 'USA', 'Jerry Mouse'); Query OK, 1 row affected (0.07 sec) mysql> show status like 'Rpl_semi_sync%'; +--------------------------------------------+----------+ | Variable_name | Value | +--------------------------------------------+----------+ | Rpl_semi_sync_master_clients | 1 | | Rpl_semi_sync_master_net_avg_wait_time | 7525296 | | Rpl_semi_sync_master_net_wait_time | 30101185 | | Rpl_semi_sync_master_net_waits | 4 | | Rpl_semi_sync_master_no_times | 1 | | Rpl_semi_sync_master_no_tx | 1 | | Rpl_semi_sync_master_status | ON | | Rpl_semi_sync_master_timefunc_failures | 0 | | Rpl_semi_sync_master_tx_avg_wait_time | 517 | | Rpl_semi_sync_master_tx_wait_time | 1034 | | Rpl_semi_sync_master_tx_waits | 2 | | Rpl_semi_sync_master_wait_pos_backtraverse | 0 | | Rpl_semi_sync_master_wait_sessions | 0 | | Rpl_semi_sync_master_yes_tx | 2 | //计数增加了1个,变为2 +--------------------------------------------+----------+ 14 rows in set (0.00 sec)
可以看出,以上的一个INSERT事务提交后,Rpl_semi_sync_master_yes_tx 值从1变成2,确认了刚才事务的复制事半同步复制。
测试结束;
小结:
从半同步复制的流程会发现,半同步复制的“半”就体现在:虽然主库和从库的Binlog日志时同步的,但是主库并不等待从库应用这部分日志就返回提交结果,这部分操作是异步的,从库的数据并不是和主库实时同步的,所以只能成为半同步,而不是完全的实时同步。
补充:
通过配置文件添加半同步插件和参数,操作如下:
在/etc/my.cnf中添加以下参数
plugin_load = "rpl_semi_sync_master=semisync_master.so;rpl_semi_sync_slave=semisync_slave.so"
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_slave_enabled = 1
rpl_semi_sync_master_timeout = 20000
结束.