• MySQL之Innodb恢复的学习笔记


    MySQL · 引擎特性 · InnoDB 崩溃恢复过程

    enum {
        SRV_FORCE_IGNORE_CORRUPT = 1,    /*!< let the server run even if it
                        detects a corrupt page */
        SRV_FORCE_NO_BACKGROUND    = 2,    /*!< prevent the main thread from
                        running: if a crash would occur
                        in purge, this prevents it */
        SRV_FORCE_NO_TRX_UNDO = 3,    /*!< do not run trx rollback after
                        recovery */
        SRV_FORCE_NO_IBUF_MERGE = 4,    /*!< prevent also ibuf operations:
                        if they would cause a crash, better
                        not do them */
        SRV_FORCE_NO_UNDO_LOG_SCAN = 5,    /*!< do not look at undo logs when
                        starting the database: InnoDB will
                        treat even incomplete transactions
                        as committed */
        SRV_FORCE_NO_LOG_REDO = 6    /*!< do not do the log roll-forward
                        in connection with recovery */
    };

    innodb中的 3个lsn

    innodb的lsn和oracle的scn一样,是一个重要的概念。比如

    • 在flush list中正是是使用low lsn作为链表的条件
      参考buf_page_t中的lsn_t oldest_modification;
    • 在checkpoint中记录的也是lsn
      参考宏
    #define LOG_CHECKPOINT_NO        0
    #define LOG_CHECKPOINT_LSN        8
    #define LOG_CHECKPOINT_OFFSET        16
    #define LOG_CHECKPOINT_LOG_BUF_SIZE    24
    • 在物理文件中每个块最后的刷新lsn
      参考宏FIL_PAGE_LSN
    • 在写日志落盘的时候也是以lsn为标准的
      参考函数log_write_up_to

    实际上lsn就是表示的日志量的字节数,是一个累加的值,在5.7中表现为:

    /* Type used for all log sequence number storage and arithmetics */
    typedef    ib_uint64_t        lsn_t;

    及一个8字节非负的整数。最大值及2的64次方。有了这种物理上概念,lsn很容易换算为当前日志的偏移量。

    下面描述一下和检查点相关的几个lsn

    • ibdata第一个块FIL中的lsn(flush lsn):ibdata的26后面8字节是在innodb 干净关闭的时候进行更新的,如果不正常关闭不会进行写入(FIL_PAGE_FILE_FLUSH_LSN)
    • redolog中MLOG_CHECKPOINT的lsn: redolog MLOG_CHECKPOINT lsn的写入是在每次checkpoint的时候同步写入的.干净关闭会更新。
    • redolog header中的lsn:是在每次checkpoint的时候异步写入的在MLOG_CHECKPOINT写入之后.干净关闭会更新。
      我们表示为lsn1/lsn2/lsn3

    正常关闭3个lsn是相等的,如果非正常关闭innodb,lsn1不会更新,因此lsn3必然不和lsn1相等,则判定需要进行carsh recovery。

    if (checkpoint_lsn != flush_lsn) {
    
        ......
            if (!recv_needed_recovery) {
    
                ib::info() << "The log sequence number " << flush_lsn
                    << " in the system tablespace does not match"
                    " the log sequence number " << checkpoint_lsn
                    << " in the ib_logfiles!"; //出现这个警告说明需要恢复了,因为没有干净的关闭数据库,那么flush_lsn一定比checkpoint_lsn小
    
                if (srv_read_only_mode) {
                    ib::error() << "Can't initiate database"
                        " recovery, running in read-only-mode.";
                    log_mutex_exit();
                    return(DB_READ_ONLY);
                }
                recv_init_crash_recovery(); //初始化
            }
        }

    MLOG_CHECKPOINT

    • log_checkpoint 函数由master线程调用,以及关闭数据库的时候调用,不断的向redo log中写入MLOG_CHECKPOINT和MLOG_FILE_NAME

    log_group_checkpoint 写入checkpoint信息到log header

    #0  mtr_t::commit_checkpoint (this=0x7fff761fb830, checkpoint_lsn=697558445) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/mtr/mtr0mtr.cc:592
    #1  0x0000000001ceb4b9 in fil_names_clear (lsn=697558445, do_write=true) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/fil/fil0fil.cc:7067
    #2  0x0000000001a521cc in log_checkpoint (sync=true, write_always=false) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1927
    #3  0x0000000001b856f2 in srv_master_do_idle_tasks () at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0srv.cc:2596
    #4  0x0000000001b85b6b in srv_master_thread (arg=0x0) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0srv.cc:2744
    #5  0x0000003f74807aa1 in start_thread () from /lib64/libpthread.so.0
    #6  0x0000003f740e8bcd in clone () from /lib64/libc.so.6

    正常innodb运行的情况下 checkpoint是由master线程触发。我们知道脏数据通过page clean线程和lru manager线程是在不断写盘的,那么在进行异常重启的的时候我们必须要知道一个恢复的起点,但是这个起点是不能记录在内存中必要固化到磁盘,恢复的时候读取这个点以后的redo进行恢复,而checkpoint就是完成这个事情下面是checkpoint的执行流程。

    正常情况下master会每秒进行检查点其作用有:(参考log_checkpoint函数)
    1、检查是否有自上次检查点以来的脏数据写盘了
    2、如果有则在redo里面会为每个修改过的文件写入MLOG_FILE_NAME,完成后写入一个总的MLOG_CHECKPOINT(参考fil_names_clear函数)
    MLOG_FILE_NAME主要记录至上次检查点以来更改过的数据文件
    MLOG_CHECKPOINT主要记录检查点的lsn
    3、如果有则在redo header中写入相应的检查点信息包含(异步写)(参考log_group_checkpoint函数)

    Log sequence number 697794162
    Log flushed up to 697794162
    Pages flushed up to 697794162
    Last checkpoint at 697794153

    697794162-697794153 = 9 刚好是MLOG_CHECKPOINT的长度
    oldest_lsn <= log_sys->last_checkpoint_lsn + SIZE_OF_MLOG_CHECKPOINT 
    SIZE_OF_MLOG_CHECKPOINT=9

    redo的写入有6个途径:
    1、master 线程每秒调用 栈帧(可能是idle可能是active 和检测是否需要插入缓存合并有关)
    2、master 线程每秒checkpoint调用 (可能是idle可能是active 和检测是否需要插入缓存合并有关)
    3、page clean 线程调用 栈帧 / Force the log to the disk before writing the modified block /
    4、主线程commit 调用 栈帧
    5、innodb shutdown
    6、redo buffer不足

    1、master线程调用 栈帧(可能是idle可能是active 和检测是否需要插入缓存合并有关)

    #0  log_group_write_buf (group=0x33f29f8, buf=0x7fffa5b38000 "20024", len=512, pad_len=0, start_lsn=697764864, new_data_offset=166)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1145
    #1  0x0000000001a50f95 in log_write_up_to (lsn=697765068, flush_to_disk=true) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1493
    #2  0x0000000001a51163 in log_buffer_sync_in_background (flush=true) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1553
    #3  0x0000000001b84bd1 in srv_sync_log_buffer_in_background () at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0srv.cc:2312
    #4  0x0000000001b85666 in srv_master_do_idle_tasks () at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0srv.cc:2586
    #5  0x0000000001b85b6b in srv_master_thread (arg=0x0) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0srv.cc:2744

    2、master 线程checkpoint调用 (可能是idle可能是active 和检测是否需要插入缓存合并有关)

    #0  log_group_write_buf (group=0x33f29f8, buf=0x7fffa5a38000 "2002402", len=1024, pad_len=0, start_lsn=697789952, new_data_offset=139)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1145
    #1  0x0000000001a50f95 in log_write_up_to (lsn=697790725, flush_to_disk=true) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1493
    #2  0x0000000001a52247 in log_checkpoint (sync=true, write_always=false) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1934
    #3  0x0000000001b856f2 in srv_master_do_idle_tasks () at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0srv.cc:2596
    #4  0x0000000001b85b6b in srv_master_thread (arg=0x0) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0srv.cc:2744

    3、page clean 线程调用 栈帧 / Force the log to the disk before writing the modified block /

    #0  log_group_write_buf (group=0x33f29f8, buf=0x7fffa5a38000 "2002402", len=13312, pad_len=1024, start_lsn=697778176, new_data_offset=468)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1145
    #1  0x0000000001a50f95 in log_write_up_to (lsn=697790015, flush_to_disk=true) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1493
    #2  0x0000000001c704c7 in buf_flush_write_block_low (bpage=0x7fffc0cae940, flush_type=BUF_FLUSH_LIST, sync=false)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:1035
    #3  0x0000000001c70cea in buf_flush_page (buf_pool=0x33247d8, bpage=0x7fffc0cae940, flush_type=BUF_FLUSH_LIST, sync=false)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:1237
    #4  0x0000000001c717f4 in buf_flush_try_neighbors (page_id=..., flush_type=BUF_FLUSH_LIST, n_flushed=0, n_to_flush=25)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:1466
    #5  0x0000000001c71b57 in buf_flush_page_and_try_neighbors (bpage=0x7fffc0cae940, flush_type=BUF_FLUSH_LIST, n_to_flush=25, count=0x7fffa02867c0)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:1558
    #6  0x0000000001c72862 in buf_do_flush_list_batch (buf_pool=0x33247d8, min_n=25, lsn_limit=18446744073709551615)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:1846
    #7  0x0000000001c72cb6 in buf_flush_batch (buf_pool=0x33247d8, flush_type=BUF_FLUSH_LIST, min_n=25, lsn_limit=18446744073709551615)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:1926
    #8  0x0000000001c73104 in buf_flush_do_batch (buf_pool=0x33247d8, type=BUF_FLUSH_LIST, min_n=25, lsn_limit=18446744073709551615, n_processed=0x7fffa0286938)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:2071
    #9  0x0000000001c734ee in buf_flush_lists (min_n=25, lsn_limit=18446744073709551615, n_processed=0x7fffa02869c8)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:2202
    #10 0x0000000001c76a97 in buf_flush_page_cleaner_coordinator (arg=0x0) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/buf/buf0flu.cc:3362

    4、主线程commit 调用 栈帧

    #0  log_group_write_buf (group=0x33f29f8, buf=0x7fffa5a38000 "2002402", len=2560, pad_len=0, start_lsn=697762816, new_data_offset=230)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1145
    #1  0x0000000001a50f95 in log_write_up_to (lsn=697765030, flush_to_disk=true) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1493
    #2  0x0000000001a51087 in log_buffer_flush_to_disk (sync=true) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:1524
    #3  0x00000000019a9157 in innobase_flush_logs (hton=0x2e9fdd0, binlog_group_flush=true)
        at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/handler/ha_innodb.cc:4407
    #4  0x0000000000f65893 in flush_handlerton (thd=0x0, plugin=0x7ffff03588e8, arg=0x7ffff0358944) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/handler.cc:2606
    #5  0x00000000015d7716 in plugin_foreach_with_mask (thd=0x0, func=0xf65835 <flush_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=4294967287, 
        arg=0x7ffff0358944) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_plugin.cc:2318
    #6  0x0000000000f658ef in ha_flush_logs (db_type=0x0, binlog_group_flush=true) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/handler.cc:2617
    #7  0x000000000185733d in MYSQL_BIN_LOG::process_flush_stage_queue (this=0x2e02c80, total_bytes_var=0x7ffff0358a88, rotate_var=0x7ffff0358a87, 
        out_queue_var=0x7ffff0358a78) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/binlog.cc:8541
    #8  0x000000000185899f in MYSQL_BIN_LOG::ordered_commit (this=0x2e02c80, thd=0x7fff2c000b70, all=false, skip_commit=false)
        at /root/mysql5.7.14/percona-server-5.7.14-7/sql/binlog.cc:9189
    #9  0x000000000185700c in MYSQL_BIN_LOG::commit (this=0x2e02c80, thd=0x7fff2c000b70, all=false) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/binlog.cc:8440
    #10 0x0000000000f63df8 in ha_commit_trans (thd=0x7fff2c000b70, all=false, ignore_global_read_lock=false)

    入口函数
    innobase_start_or_create_for_mysql

    recv_recovery_from_checkpoint_start

    1、此函数需要输入flush的lsn及脏数据写入到的位置

    这个数据从FIL_PAGE_FILE_FLUSH_LSN中读取,这个值
    只在ibdata的第一个page有效其他均为0,他的读取来自于
    函数Datafile::validate_first_page,
    其中
    *flush_lsn = mach_read_from_8(
    m_first_page + FIL_PAGE_FILE_FLUSH_LSN);
    
    
    其写入由函数  正常shutdown才会写入,非正常关闭不会写入,正常运行检查点也不会写入
    fil_write_flushed_lsn 写入
    mach_write_to_8(buf + FIL_PAGE_FILE_FLUSH_LSN, lsn);
    
    栈帧
        #0  fil_write_flushed_lsn (lsn=696973727) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/fil/fil0fil.cc:1946
        #1  0x0000000001a538a7 in logs_empty_and_mark_files_at_shutdown () at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/log/log0log.cc:2464
        #2  0x0000000001b915a9 in innobase_shutdown_for_mysql () at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/srv/srv0start.cc:2803
        #3  0x00000000019a8ffc in innobase_end (hton=0x2e9edd0, type=HA_PANIC_CLOSE) at /root/mysql5.7.14/percona-server-5.7.14-7/storage/innobase/handler/ha_innodb.cc:4360
        #4  0x0000000000f62621 in ha_finalize_handlerton (plugin=0x3015cd0) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/handler.cc:813
        #5  0x00000000015d3d25 in plugin_deinitialize (plugin=0x3015cd0, ref_check=true) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_plugin.cc:995
        #6  0x00000000015d410e in reap_plugins () at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_plugin.cc:1077
        #7  0x00000000015d6073 in plugin_shutdown () at /root/mysql5.7.14/percona-server-5.7.14-7/sql/sql_plugin.cc:1845
        #8  0x0000000000ebf7eb in clean_up (print_message=true) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/mysqld.cc:1336
        #9  0x0000000000ec701b in mysqld_main (argc=98, argv=0x2e9cf08) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/mysqld.cc:5386
        #10 0x0000000000ebd604 in main (argc=10, argv=0x7fffffffe458) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/main.cc:25

    2、此函数首先建立红黑树用于恢复并且做 force recovery判断

       (srv_force_recovery >= SRV_FORCE_NO_LOG_REDO) 
       然后调用 
       recv_find_max_checkpoint 此函数就是找到最大的checkpoint lsn、redo文件,以及field 就是相对于一个日志而讲所在的offset set (512*1 or 512*3)_
       读取如下:
                       group->lsn = mach_read_from_8(
                      buf + LOG_CHECKPOINT_LSN);
                    group->lsn_offset = mach_read_from_8(
                      buf + LOG_CHECKPOINT_OFFSET);
                    checkpoint_no = mach_read_from_8(
                      buf + LOG_CHECKPOINT_NO);

    3、recv_group_scan_log_recs

       循环读取64K(RECV_SCAN_SIZE)日志到redo buffer(log_group_read_log_seg)
       recv_scan_log_recs
         本函数将64K 的redo 通过每个512 bytes block大小循环加入到(recv_sys_add_to_parsing_buf) 扫描到parse buffer 
         代码片段拷贝
              ut_memcpy(recv_sys->buf + recv_sys->len,
                  log_block + start_offset, end_offset - start_offset);
                
               当每次扫描了64K*80的日志量的时候会输出,级5M
               ib::info() << "Doing recovery: scanned up to"
                    " log sequence number " << scanned_lsn;
         并且进行分析对这加入到 parse buffer的日志进行分析(recv_parse_log_recs) /其中parse buffer 为2M(宏RECV_PARSING_BUF_SIZE大小)
            recv_parse_log_recs函数对每次加入 parse buffer的64k进行分析主要分析是MLOG_SINGLE_REC_FLAG还是MLOG_MULTI_REC_END 同时确认MLOG_MULTI_REC_END
            是否完整,完成后加入到hash table,函数是调用recv_parse_log_rec进行 (type spaceid page_no data)的剥离。 
    

    那么recv_group_scan_log_recs为第一层循环,循环扫描64K日志到redo buffer
    然后调用recv_scan_log_recs以block单位(512b)大小循环加入到parse buffer,这是通过函数recv_sys_add_to_parsing_buf完成
    等到64k加入到parse buffer后调用recv_parse_log_recs函数对每次加入 parse buffer的64k进行分析,分析是循环以record为单位的,主要分析MLOG_SINGLE_REC_FLAG还是MLOG_MULTI_REC_END,同时确保MLOG_MULTI_REC_END记录是完整的。完成后recv_parse_log_recs函数 还要将其加入到hash table函数为recv_add_to_hash_table,本函数是通过recv_parse_log_rec进行 (type spaceid page_no data)的剥离。

    recv_apply_hashed_log_recs 最后完成分析完成后的日志应用,从hash table中

  • 相关阅读:
    PAT B1021 个位数统计 (15)
    PAT B1006 换个格式输出整数 (15)
    PAT A1058 A+B in Hogwarts (20)
    PAT A1027 Colors in Mars (20)
    PAT A1019 General Palindromic Number (20 分)
    PAT B1022 D进制的A+B
    J2EE
    MVC
    数据库长连接短连接
    twisted
  • 原文地址:https://www.cnblogs.com/DataArt/p/9873200.html
Copyright © 2020-2023  润新知