gh—ost是github的dba开源一款使用go语言开发的MySQL在线改表工具,解决了目前采用pt-online-schema-change遇到的一些问题,思路也很新颖;作者很厉害,也是是openark kit工具集的作者(主要是用python写的一套工具集)。
为什么叫gh-ost而不是叫gh-osc,创造了一个新的缩写,gh-ost可以发音为ghost,t可以解读为Transmogrifier/Translator/Transformer/Transfigurator。
相关链接:
原理
抛弃了pt-online-schema-change使用trigger来同步增量数据的方法,而通过模拟slave获取row格式的binlog的方式来获取增量数据,具体的数据流图可以看下图。
使用tirgger的方式有很多缺点,作者在triggerless-design里面做了详尽的阐述,对与我们在工作中遇到的具体问题主要是MDL锁的问题、死锁的问题,这两个问题在使用pt-online-schema-change的时候没有办法绕过去。我们线上在使用pt-osc的时候遇到过多起因为MDL锁和死锁导致的问题,具体的DML锁和死锁问题在其他文章里面做详尽的阐述。
因为trigger的方式,trigger触发的语句和被触发的语句在一个事务中;所以当时看到这种方式的第一反应就是会不会存在时序的问题(binlog和copy数据产生的时序问题);经过和朋友的讨论以及推理之后发现不会存在问题。
因为binlog中记录的是full image,所以binlog中的数据是最权威的,而且读取的binlog在应用的时候做了如下转化,而且copy old data是insert ignore,因此会以binlog的优先级为最高,因此不会有问题。
源类型 | 目标类型 |
---|---|
insert | replace |
update | update |
delete | delete |
对与insert和update是没有问题的,因为无论copy old row和apply binlog的先后顺序,如果apply binlog在后,会覆盖掉copy old row,如果apply binlog在前面,copy old row因为使用insert ignore,因此会被ignore掉;
对与delete数据,我们可以演算一下,abc三个操作,可能存在三种情况(b肯定在a的后面):
- a.delete old row
- b.delete binlog apply
- c.copy old row
- cab,c会将数据copy到ghost表,最后b会把ghost表中的数据delete掉;
- acb,c空操作,b也是空操作;
- abc,b空操作,c也是空操作;
详细过程
一、通过count(*)来获取这个表的准确数据量,或者explain来获取大概值,用于计算进度;通过exact-rowcount参数来决定,默认方式是通过explain。
--方法1:通过explain来获取大概的数据
explain select /* gh-ost */ * from %s.%s where 1=1;
--方法2:通过count(*)获取一个准确值
select /* gh-ost */ count(*) as rows from %s.%s;
二、通过order by分别获取最大值和最小值范围
SELECT /* gh-ost `test`.`test` */ `id`
FROM `test`.`test`
ORDER BY `id` ASC
LIMIT 1;
SELECT /* gh-ost `test`.`test` */ `id`
FROM `test`.`test`
ORDER BY `id` DESC
LIMIT 1;
三、通过一个下面的sql不断的探测是否还有数据需要copy
SELECT /* gh-ost `test`.`faredb_detail` iteration:25 */ `faredb_detail_id`
FROM (SELECT `faredb_detail_id`
FROM `test`.`faredb_detail`
WHERE (( `faredb_detail_id` > 118887 ))
AND ( ( `faredb_detail_id` < '105070846' )
OR (( `faredb_detail_id` = '105070846' )) )
ORDER BY `faredb_detail_id` ASC
LIMIT 1000) select_osc_chunk
ORDER BY `faredb_detail_id` DESC
LIMIT 1;
四、通过下面的SQL Copy原始表的数据
insert ignore into new_table select from old_table force index (`PRIMARY`) where lock in share mode
五、通过go/binlog/gomysql_reader.go读取binlog,然后通过go/logic/applier.go将读取的binlog进行转化写入到ghost表中;
通过buildDMLEventQuery可以看到binlog的转化细节;
源类型 | 目标类型 |
---|---|
insert | replace |
update | update |
delete | delete |
六、cut-over阶段(即rename表阶段),作者连续写了三篇blog来描述,比较复杂,有兴趣的可以阅读,1.Solving the Facebook-OSC non-atomic table swap problem;2.Solving the non-atomic table swap, Take II;3.Solving the non-atomic table swap, Take III: making it atomic;
六、详细可以参照下面的日志
2016-08-12 12:00:31 INFO starting gh-ost 1.0.8
2016-08-12 12:00:31 INFO Migrating `test`.`sbtest`
2016-08-12 12:00:31 INFO connection validated on test02:5002
2016-08-12 12:00:31 INFO User has ALL privileges
2016-08-12 12:00:31 INFO binary logs validated on test02:5002
2016-08-12 12:00:31 INFO Restarting replication on test02:5002 to make sure binlog settings apply to replication thread
2016-08-12 12:00:31 INFO Table found. Engine=InnoDB
2016-08-12 12:00:31 INFO Estimated number of rows via EXPLAIN: 9936
2016-08-12 12:00:31 INFO Master found to be test01:5002
2016-08-12 12:00:31 INFO connection validated on test02:5002
2016-08-12 12:00:31 INFO Registering replica at test02:5002
2016-08-12 12:00:31 INFO Connecting binlog streamer at mysql-bin.000023:954323927
2016-08-12 12:00:31 INFO rotate to next log name: mysql-bin.000023
2016-08-12 12:00:31 INFO connection validated on test01:5002
2016-08-12 12:00:31 INFO connection validated on test01:5002
2016-08-12 12:00:31 INFO Droppping table `test`.`_sbtest_gho`
2016-08-12 12:00:31 INFO Table dropped
2016-08-12 12:00:31 INFO Droppping table `test`.`_sbtest_ghc`
2016-08-12 12:00:31 INFO Table dropped
2016-08-12 12:00:31 INFO Creating changelog table `test`.`_sbtest_ghc`
2016-08-12 12:00:32 INFO Changelog table created
2016-08-12 12:00:32 INFO Creating ghost table `test`.`_sbtest_gho`
2016-08-12 12:00:32 INFO Ghost table created
2016-08-12 12:00:32 INFO Altering ghost table `test`.`_sbtest_gho`
2016-08-12 12:00:32 INFO Ghost table altered
2016-08-12 12:00:32 INFO Chosen shared unique key is PRIMARY
2016-08-12 12:00:32 INFO Shared columns are id,k,c,pad
2016-08-12 12:00:32 INFO Listening on unix socket file: /tmp/gh-ost.test.sbtest.sock
2016-08-12 12:00:32 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2016-08-12 12:00:32 INFO Exact number of rows via COUNT: 10000
2016-08-12 12:00:32 INFO Migration min values: [1]
2016-08-12 12:00:32 INFO Migration max values: [10000]
# Migrating `test`.`sbtest`; Ghost table is `test`.`_sbtest_gho`
# Migrating test01.dx.sankuai.com:5002; inspecting test02.dx.sankuai.com:5002; executing on dx-dba01.dx.sankuai.com
# Migration started at Fri Aug 12 12:00:31 +0800 2016
# chunk-size: 1000; max-lag-millis: 1500ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.test.sbtest.sock
Copy: 0/10000 0.0%; Applied: 0; Backlog: 0/100; Time: 0s(total), 0s(copy); streamer: mysql-bin.000023:954326679; ETA: N/A
Copy: 0/10000 0.0%; Applied: 0; Backlog: 0/100; Time: 1s(total), 1s(copy); streamer: mysql-bin.000023:954328041; ETA: N/A
2016-08-12 12:00:33 INFO Row copy complete
Copy: 10000/10000 100.0%; Applied: 0; Backlog: 0/100; Time: 1s(total), 1s(copy); streamer: mysql-bin.000023:954831040; ETA: 0s
2016-08-12 12:00:33 INFO Grabbing voluntary lock: gh-ost.84092.lock
2016-08-12 12:00:33 INFO Setting LOCK timeout as 6 seconds
2016-08-12 12:00:33 INFO Looking for magic cut-over table
2016-08-12 12:00:33 INFO Creating magic cut-over table `test`.`_sbtest_del`
2016-08-12 12:00:33 INFO Magic cut-over table created
2016-08-12 12:00:33 INFO Locking `test`.`sbtest`, `test`.`_sbtest_del`
2016-08-12 12:00:33 INFO Tables locked
2016-08-12 12:00:33 INFO Session locking original & magic tables is 84092
2016-08-12 12:00:33 INFO Writing changelog state: AllEventsUpToLockProcessed
2016-08-12 12:00:33 INFO Waiting for events up to lock
Copy: 10000/10000 100.0%; Applied: 0; Backlog: 1/100; Time: 2s(total), 1s(copy); streamer: mysql-bin.000023:954959331; ETA: 0s
2016-08-12 12:00:34 INFO Done waiting for events up to lock; duration=971.748469ms
# Migrating `test`.`sbtest`; Ghost table is `test`.`_sbtest_gho`
# Migrating test01.dx.sankuai.com:5002; inspecting test02.dx.sankuai.com:5002; executing on dx-dba01.dx.sankuai.com
# Migration started at Fri Aug 12 12:00:31 +0800 2016
# chunk-size: 1000; max-lag-millis: 1500ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
# panic-flag-file: /tmp/ghost.panic.flag
# Serving on unix socket: /tmp/gh-ost.test.sbtest.sock
Copy: 10000/10000 100.0%; Applied: 0; Backlog: 0/100; Time: 2s(total), 1s(copy); streamer: mysql-bin.000023:954960101; ETA: 0s
2016-08-12 12:00:34 INFO Setting RENAME timeout as 3 seconds
2016-08-12 12:00:34 INFO Session renaming tables is 84090
2016-08-12 12:00:34 INFO Issuing and expecting this to block: rename /* gh-ost */ table `test`.`sbtest` to `test`.`_sbtest_del`, `test`.`_sbtest_gho` to `test`.`sbtest`
2016-08-12 12:00:34 INFO Found atomic RENAME to be blocking, as expected. Double checking the lock is still in place (though I don't strictly have to)
2016-08-12 12:00:34 INFO Checking session lock: gh-ost.84092.lock
2016-08-12 12:00:34 INFO Connection holding lock on original table still exists
2016-08-12 12:00:34 INFO Will now proceed to drop magic table and unlock tables
2016-08-12 12:00:34 INFO Dropping magic cut-over table
2016-08-12 12:00:34 INFO Releasing lock from `test`.`sbtest`, `test`.`_sbtest_del`
2016-08-12 12:00:34 INFO Tables unlocked
2016-08-12 12:00:34 INFO Tables renamed
2016-08-12 12:00:34 INFO Lock & rename duration: 981.435405ms. During this time, queries on `sbtest` were blocked
2016-08-12 12:00:34 INFO Looking for magic cut-over table
2016-08-12 12:00:34 INFO Droppping table `test`.`_sbtest_ghc`
2016-08-12 12:00:34 INFO Table dropped
2016-08-12 12:00:34 INFO Am not dropping old table because I want this operation to be as live as possible. If you insist I should do it, please add `--ok-to-drop-table` next time. But I prefer you do not. To drop the old table, issue:
2016-08-12 12:00:34 INFO -- drop table `test`.`_sbtest_del`
2016-08-12 12:00:34 INFO Done migrating `test`.`sbtest`
2016-08-12 12:00:34 INFO Done
重要的一些选项以及典型使用方式
重要的一些选项:
-critical-load --max-load
Comma delimited status-name=threshold, same format as --max-load. When status exceeds threshold, app panics and quits
改表完成之后是否删除老表。
-ok-to-drop-table
Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?
如果执行之前发现old-table,删除还是终止?默认终止。
-initially-drop-old-table
Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists
port选项仅对应的是host吗?其他地方都需要设置全
-port int
MySQL port (preferably a replica, not the master) (default 3306)
因为gh-ost没有提供recursion-method=processlist方法,因此需要通过throttle-control-replicas指定所有的需要检查的slave,并且注意上面的port仅仅对应于host选项,因此需要host:port的方式来写全称
-throttle-control-replicas string
List of replicas on which to check for lag; comma delimited. Example: myhost1.com:3306,myhost2.com,myhost3.com:3307
拷贝每个chunk之后sleep的时间=nice-ratio*copy-chunk-time,默认值是0,表示不sleep
-nice-ratio float
force being 'nice', imply sleep time per chunk time; range: [0.0..100.0]. Example values: 0 is aggressive. 1.5: for every ms spend in a rowcopy chunk, spend 1.5ms sleeping immediately after
默认是不删除socket文件的,这样当第二次运行的时候报错,提示socket文件已经存在
-initially-drop-socket-file
Should gh-ost forcibly delete an existing socket file. Be careful: this might drop the socket file of a running migration!
-exact-rowcount
actually count table rows as opposed to estimate them (results in more accurate progress estimation)
典型使用方式:
./gh-ost
--max-load=Threads_running=25
--critical-load=Threads_running=64
--chunk-size=1000
--throttle-control-replicas="test02:3306"
--max-lag-millis=1500
--initially-drop-old-table
--initially-drop-ghost-table
--initially-drop-socket-file
--ok-to-drop-table
--conf="./my.cnf"
--host="test02"
--port=3306
--user="admin"
--password="admin"
--database="test"
--table="test"
--verbose
--alter="drop index id1"
--switch-to-rbr
--allow-master-master
--cut-over=default
--default-retries=120
--panic-flag-file=/tmp/ghost.panic.flag
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag
--execute
发现的一些不足:
- 相比pt-osc缺少–check-interval参数;目前这个值写死是1s;issue地址默认是1s,足够小了,作者认为当前可以;
- –conf选项设置的my.cnf文件中不支持设置prompt=[h]u@d :m:s>;issue地址,作者已经修复;
- 缺少增加unique index的时候的检查;如果增加unique index的时候会丢失数据;(pt-osc也存在这个问题);作者认为不需要修复,问题的解决难度也比较大,DBA在使用前需要注意;
- 相比pt-osc缺少–check-replication-filters;是否确实对从库是否存在这个表的检查
- 相比pt-osc缺少–recursion-method=processlist;需要通过参数进行设置,issue地址
问题:
1.相比pt-osc缺乏–check-interval参数,那么检查的频率是多少呢?
~~~go // Migrate executes the complete migration logic. This is the major gh-ost function. func (this *Migrator) Migrate() (err error) { //在这个函数中进行load和slave lag的检查; go this.initiateThrottler() }
//通过这个函数可以很清晰的看到,间隔是每秒钟检查一次load和slave lag;通过time.Tick(1 * time.Second)创建一个定时器,每秒钟往管道throttlerTick中写一个值,然后再通过for range从管道中读取; // initiateThrottler initiates the throttle ticker and sets the basic behavior of throttling. func (this *Migrator) initiateThrottler() error { throttlerTick := time.Tick(1 * time.Second)
throttlerFunction := func() {
alreadyThrottling, currentReason := this.migrationContext.IsThrottled()
shouldThrottle, throttleReason := this.shouldThrottle()
if shouldThrottle && !alreadyThrottling {
// New throttling
this.applier.WriteAndLogChangelog("throttle", throttleReason)
} else if shouldThrottle && alreadyThrottling && (currentReason != throttleReason) {
// Change of reason
this.applier.WriteAndLogChangelog("throttle", throttleReason)
} else if alreadyThrottling && !shouldThrottle {
// End of throttling
this.applier.WriteAndLogChangelog("throttle", "done throttling")
}
this.migrationContext.SetThrottled(shouldThrottle, throttleReason)
}
throttlerFunction()
for range throttlerTick {
throttlerFunction()
}
return nil }