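InnoDB starts up as follows:
1. Initialize innobase_hton, a pointer of type handlerton, so that the server layer can call the storage engine's interfaces.
2. Check and initialize the InnoDB parameters, including those for the system tablespace, temporary tablespace, undo tablespaces, redo log files, and doublewrite buffer.
3. Call innobase_start_or_create_for_mysql() to create or start InnoDB.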
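To make step 1 concrete, the sketch below models the idea behind handlerton: a table of function pointers that the engine fills in during plugin initialization and that the server later calls through. Every name here (mini_handlerton, mini_thd, demo_engine_init, server_commit) is hypothetical and heavily simplified; the real handlerton has many more members and different signatures.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified stand-ins for the server's THD and handlerton.
struct mini_thd {
    std::string last_action;  // records what the engine did, for illustration only
};

struct mini_handlerton {
    int (*commit)(mini_handlerton* hton, mini_thd* thd);
    int (*rollback)(mini_handlerton* hton, mini_thd* thd);
};

// Engine-side implementations (hypothetical).
static int demo_commit(mini_handlerton*, mini_thd* thd) {
    thd->last_action = "commit";
    return 0;
}

static int demo_rollback(mini_handlerton*, mini_thd* thd) {
    thd->last_action = "rollback";
    return 0;
}

// Mirrors the shape of innobase_init(void* p): the server passes an opaque
// pointer, and the engine casts it and fills in its entry points.
int demo_engine_init(void* p) {
    mini_handlerton* hton = static_cast<mini_handlerton*>(p);
    assert(hton != nullptr);
    hton->commit = demo_commit;
    hton->rollback = demo_rollback;
    return 0;  // 0 on success, like innobase_init()
}

// Server-side code only sees the table, never the engine's functions directly.
int server_commit(mini_handlerton* hton, mini_thd* thd) {
    return hton->commit(hton, thd);
}
```

The server never links against the engine's commit function by name; it only holds the table, which is why the same server code can drive different storage engines.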
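innobase_start_or_create_for_mysql() proceeds as follows:
1. Reset the start state.
2. Process innodb_flush_method. Production deployments typically use O_DIRECT or O_DIRECT_NO_FSYNC.
3. Set the maximum number of InnoDB threads.
4. Adjust innodb_buffer_pool_instances and innodb_buffer_pool_size.
5. Cap the number of innodb_page_cleaners at srv_buf_pool_instances.
6. Boot the InnoDB server and initialize the related parameters and components.
7. Initialize the asynchronous I/O subsystem.
8. Create the InnoDB buffer pool; an error is reported if there is not enough memory.
9. Call fsp_init and log_init to initialize the file space (fsp) system and the redo log system.
10. Call recv_sys_create and recv_sys_init to create and initialize the recovery system.
11. Call lock_sys_create to create the lock system.
12. Call os_thread_create to create the I/O threads.
13. Call buf_flush_page_cleaner_init to initialize the page cleaner system, then create the buf_flush_page_cleaner_coordinator and buf_flush_page_cleaner_worker threads.
14. Wait for the page cleaner to become active.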
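15. Call check_file_spec to check whether the data files (ibdata1, ibdata2, and so on) exist, and decide whether a new database must be created.
16. If a new database must be created, check whether redo log files or undo tablespaces already exist.
17. Call srv_sys_space.open_or_create() to open or create the data files (ibdata...). If no new database is being created, read flushed_lsn from ibdata1.
18. If create_new_db is set:
18.1 Synchronously flush dirty pages from the tail of the flush list of every buffer pool instance.
18.2 Get the current LSN.
18.3 Create the redo log files.
19. If create_new_db is not set, open the redo log files.
20. Call fil_space_create to create the in-memory space object for the redo log.
21. Add the redo log files to the redo log space.
22. Initialize the redo log group.
23. Call fil_open_log_and_system_tablespace_files to open all log files and system tablespace data files.
24. Call srv_undo_tablespaces_init to open the undo tablespaces. Once all undo files have been found and opened, they are added to the file management system.
25. Call trx_sys_file_format_init to initialize the file_format_max variable.
26. Create the trx_sys instance and initialize purge_queue and the mutex.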
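27. If create_new_db is set:
27.1 Call fsp_header_init to allocate space at the start of the ibdata file for system modules such as the transaction system.
27.2 Call trx_sys_create_sys_pages to create the transaction system's file page, which is the sixth page of ibdata.
27.3 Call trx_sys_init_at_db_start to create and initialize the transaction system's in-memory structures.
27.4 Call trx_purge_sys_create to create and initialize the purge system.
27.5 Call dict_create to create a new data dictionary and initialize the change buffer.
28. Invalidate the whole buffer pool so that pages read earlier are read again during recovery. This is a lightweight operation: at this point the LRU list holds only one page and the flush list is empty.
29. Call recv_recovery_from_checkpoint_start() to begin recovery:
29.1 Initialize the flush red-black tree so that pages can be inserted into the flush list quickly during recovery.
29.2 Find the latest checkpoint in the log groups.
29.3 Read the redo log page that holds the latest checkpoint into log_sys->checkpoint_buf.
29.4 Get checkpoint_lsn and checkpoint_no.
29.5 Read the redo log from checkpoint_lsn onward into the hash table.
29.6 Check the tablespaces needed for crash recovery, then process and remove the pages in the doublewrite buffer. Each doublewrite page is checked against the corresponding real data page; if the data page is damaged, it is restored from the doublewrite copy. A background thread, recv_writer_thread, is also spawned to flush dirty pages from the buffer pool.
29.7 Copy the log segment from the latest log group to the other groups. There is currently only one log group.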
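30. Clear the pages in the doublewrite buffer.
31. Call dict_boot to initialize the data dictionary system and the change buffer.
32. Call trx_sys_init_at_db_start to create and initialize the transaction system.
33. Call recv_apply_hashed_log_recs to apply the redo log.
34. Call trx_purge_sys_create to create the purge system.
35. Call recv_recovery_from_checkpoint_finish to complete recovery from the checkpoint:
35.1 Make sure the recv_writer thread has finished its work.
35.2 Wait for the flushing of dirty pages to complete.
35.3 Wait for the recv_writer thread to exit.
35.4 Free the flush red-black tree.
35.5 Roll back all data dictionary transactions so that the data dictionary tables are not locked. The data dictionary latch should guarantee that only one data dictionary transaction is active at a time.
36. Call recv_recovery_rollback_active to roll back incomplete transactions that were never committed in InnoDB (those in TRX_STATE_ACTIVE that have not reached TRX_STATE_PREPARED). This runs in a background thread.
37. Call srv_open_tmp_tablespace to open the temporary tablespace.
38. Call trx_sys_create_rsegs to create the rollback segments.
39. Create the lock wait timeout thread; its thread function is lock_wait_timeout_thread.
40. Create the semaphore timeout monitor thread, which prints a warning when a semaphore wait lasts too long; its thread function is srv_error_monitor_thread.
41. Create the master thread; its thread function is srv_master_thread.
42. Create the purge threads: srv_purge_coordinator_thread and srv_worker_thread.
43. Call srv_start_wait_for_purge_to_start to wait for the purge system to start.
44. Create the buffer pool dump/load thread; its thread function is buf_dump_thread.
45. Create the statistics collection thread; its thread function is dict_stats_thread.
46. Call fts_optimize_init to create the full-text optimize thread; its thread function is fts_optimize_thread.
47. Create the thread that resizes the buffer pool dynamically; its thread function is buf_resize_thread.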
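Steps 4 and 5 of the list above are easy to misread in the source, so here is a minimal standalone sketch of that adjustment, following the logic of the innobase_start_or_create_for_mysql() listing further down for the case that is not 32-bit Windows. The names BufPoolConfig and adjust_buf_pool_config are hypothetical; the 1 GiB threshold is an assumed stand-in for BUF_POOL_SIZE_THRESHOLD, 0 is an assumed stand-in for srv_buf_pool_instances_default, and the final rounding done by buf_pool_size_align is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the few settings involved in steps 4 and 5.
struct BufPoolConfig {
    std::uint64_t size;           // innodb_buffer_pool_size, in bytes
    unsigned long instances;      // innodb_buffer_pool_instances (0 = not set)
    std::uint64_t chunk_unit;     // buffer pool chunk size, in bytes
    unsigned long page_cleaners;  // innodb_page_cleaners
};

// Assumed values: a 1 GiB threshold and 0 as the "not set" default.
const std::uint64_t kSizeThreshold = 1024ULL * 1024 * 1024;
const unsigned long kInstancesDefault = 0;

BufPoolConfig adjust_buf_pool_config(BufPoolConfig c) {
    assert(c.size > 0);
    if (c.size >= kSizeThreshold) {
        // Large pool: default to 8 instances unless the user chose a value.
        if (c.instances == kInstancesDefault) {
            c.instances = 8;
        }
    } else {
        // Small pool: always a single instance.
        c.instances = 1;
    }
    // The chunk unit times the instance count must not exceed the pool size.
    if (c.chunk_unit * c.instances > c.size) {
        c.chunk_unit = c.size / c.instances;
        if (c.size % c.instances != 0) {
            ++c.chunk_unit;
        }
    }
    // Page cleaners cannot usefully outnumber buffer pool instances.
    if (c.page_cleaners > c.instances) {
        c.page_cleaners = c.instances;
    }
    return c;
}
```

Under these assumptions, a pool below the threshold always ends up with one instance and therefore one page cleaner, regardless of what was configured.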
The InnoDB storage engine's startup code lives in innobase_init() in ha_innodb.cc. Its source is as follows:
/*********************************************************************//**
Initializes the InnoDB plugin.
Opens an InnoDB database.
@return 0 on success, 1 on failure */
static
int
innobase_init(
/*==========*/
    void    *p)     /*!< in: InnoDB handlerton */
{
    static char current_dir[3];     /*!< Set if using current lib */
    int     err;
    char    *default_path;
    uint    format_id;
    ulong   num_pll_degree;

    // Initialize innobase_hton so that the server layer can call InnoDB's interfaces.
    DBUG_ENTER("innobase_init");
    handlerton* innobase_hton= (handlerton*) p;
    innodb_hton_ptr = innobase_hton;

    innobase_hton->state = SHOW_OPTION_YES;
    innobase_hton->db_type = DB_TYPE_INNODB;
    innobase_hton->savepoint_offset = sizeof(trx_named_savept_t);
    innobase_hton->close_connection = innobase_close_connection;
    innobase_hton->kill_connection = innobase_kill_connection;
    innobase_hton->savepoint_set = innobase_savepoint;
    innobase_hton->savepoint_rollback = innobase_rollback_to_savepoint;
    innobase_hton->savepoint_rollback_can_release_mdl =
        innobase_rollback_to_savepoint_can_release_mdl;
    innobase_hton->savepoint_release = innobase_release_savepoint;
    innobase_hton->commit = innobase_commit;
    innobase_hton->rollback = innobase_rollback;
    innobase_hton->prepare = innobase_xa_prepare;
    innobase_hton->recover = innobase_xa_recover;
    innobase_hton->commit_by_xid = innobase_commit_by_xid;
    innobase_hton->rollback_by_xid = innobase_rollback_by_xid;
    innobase_hton->create = innobase_create_handler;
    innobase_hton->alter_tablespace = innobase_alter_tablespace;
    innobase_hton->drop_database = innobase_drop_database;
    innobase_hton->panic = innobase_end;
    innobase_hton->partition_flags= innobase_partition_flags;
    innobase_hton->start_consistent_snapshot =
        innobase_start_trx_and_assign_read_view;
    innobase_hton->flush_logs = innobase_flush_logs;
    innobase_hton->show_status = innobase_show_status;
    innobase_hton->fill_is_table = innobase_fill_i_s_table;
    innobase_hton->flags =
        HTON_SUPPORTS_EXTENDED_KEYS | HTON_SUPPORTS_FOREIGN_KEYS
        | HTON_SUPPORTS_TABLE_ENCRYPTION;
    innobase_hton->release_temporary_latches =
        innobase_release_temporary_latches;
    innobase_hton->replace_native_transaction_in_thd =
        innodb_replace_trx_in_thd;
    innobase_hton->data = &innodb_api_cb;
    innobase_hton->is_reserved_db_name= innobase_check_reserved_file_name;
    innobase_hton->is_supported_system_table=
        innobase_is_supported_system_table;
    innobase_hton->rotate_encryption_master_key =
        innobase_encryption_key_rotation;

    ut_a(DATA_MYSQL_TRUE_VARCHAR == (ulint)MYSQL_TYPE_VARCHAR);

#ifndef NDEBUG
    static const char   test_filename[] = "-@";
    char            test_tablename[sizeof test_filename
        + sizeof(srv_mysql50_table_name_prefix) - 1];
    if ((sizeof(test_tablename)) - 1
        != filename_to_tablename(test_filename,
                     test_tablename,
                     sizeof(test_tablename), true)
        || strncmp(test_tablename,
               srv_mysql50_table_name_prefix,
               sizeof(srv_mysql50_table_name_prefix) - 1)
        || strcmp(test_tablename
              + sizeof(srv_mysql50_table_name_prefix) - 1,
              test_filename)) {

        sql_print_error("tablename encoding has been changed");
        DBUG_RETURN(innobase_init_abort());
    }
#endif /* NDEBUG */

    /* Check that values don't overflow on 32-bit systems. */
    if (sizeof(ulint) == 4) {
        if (innobase_buffer_pool_size > UINT_MAX32) {
            sql_print_error(
                "innodb_buffer_pool_size can't be over 4GB"
                " on 32-bit systems");

            DBUG_RETURN(innobase_init_abort());
        }
    }

    os_file_set_umask(my_umask);

    /* Setup the memory alloc/free tracing mechanisms before calling
    any functions that could possibly allocate memory. */
    ut_new_boot();

    /* First calculate the default path for innodb_data_home_dir etc.,
    in case the user has not given any value.

    Note that when using the embedded server, the datadirectory is not
    necessarily the current directory of this program.
    */

    if (mysqld_embedded) {
        default_path = mysql_real_data_home;
    } else {
        /* It's better to use current lib, to keep paths short */
        current_dir[0] = FN_CURLIB;
        current_dir[1] = FN_LIBCHAR;
        current_dir[2] = 0;
        default_path = current_dir;
    }

    ut_a(default_path);

    fil_path_to_mysql_datadir = default_path;
    folder_mysql_datadir = fil_path_to_mysql_datadir;

    /* Set InnoDB initialization parameters according to the values
    read from MySQL .cnf file */

    /* The default dir for data files is the datadir of MySQL */
    srv_data_home = innobase_data_home_dir
        ? innobase_data_home_dir : default_path;

    /*--------------- Shared tablespaces -------------------------
    Shared tablespaces are the system tablespace and the shared
    temporary tablespace. */

    /* Check that the value of system variable innodb_page_size was
    set correctly. Its value was put into srv_page_size. If valid,
    return the associated srv_page_size_shift. */
    srv_page_size_shift = innodb_page_size_validate(srv_page_size);
    if (!srv_page_size_shift) {
        sql_print_error("InnoDB: Invalid page size=%lu.\n",
                srv_page_size);
        DBUG_RETURN(innobase_init_abort());
    }

    /* Set default InnoDB data file size to 12 MB and let it be
    auto-extending. */
    if (!innobase_data_file_path) {
        innobase_data_file_path = (char*) "ibdata1:12M:autoextend";
    }

    /* This is the first time univ_page_size is used.
    It was initialized to 16k pages before srv_page_size was set */
    univ_page_size.copy_from(
        page_size_t(srv_page_size, srv_page_size, false));

    // Set the space_id of the system tablespace.
    srv_sys_space.set_space_id(TRX_SYS_SPACE);

    /* Create the filespace flags.
    Set the system tablespace's flags, name, and path. */
    ulint   fsp_flags = fsp_flags_init(
        univ_page_size, false, false, false, false);
    srv_sys_space.set_flags(fsp_flags);
    srv_sys_space.set_name(reserved_system_space_name);
    srv_sys_space.set_path(srv_data_home);

    /* Supports raw devices */
    if (!srv_sys_space.parse_params(innobase_data_file_path, true)) {
        ib::error() << "Unable to parse innodb_data_file_path="
            << innobase_data_file_path;
        DBUG_RETURN(innobase_init_abort());
    }

    /* Set default InnoDB temp data file size to 12 MB and let it be
    auto-extending. */
    if (!innobase_temp_data_file_path) {
        innobase_temp_data_file_path = (char*) "ibtmp1:12M:autoextend";
    }

    /* We set the temporary tablspace id later, after recovery.
    The temp tablespace doesn't support raw devices.
    Set the name and path. */
    srv_tmp_space.set_name(reserved_temporary_space_name);
    srv_tmp_space.set_path(srv_data_home);

    /* Create the filespace flags with the temp flag set. */
    fsp_flags = fsp_flags_init(
        univ_page_size, false, false, false, true);
    srv_tmp_space.set_flags(fsp_flags);

    if (!srv_tmp_space.parse_params(innobase_temp_data_file_path, false)) {
        ib::error() << "Unable to parse innodb_temp_data_file_path="
            << innobase_temp_data_file_path;
        DBUG_RETURN(innobase_init_abort());
    }

    /* Perform all sanity check before we take action of deleting files*/
    // Check whether the system and temporary tablespaces share a data file.
    if (srv_sys_space.intersection(&srv_tmp_space)) {
        sql_print_error("%s and %s file names seem to be the same.",
            srv_tmp_space.name(), srv_sys_space.name());
        DBUG_RETURN(innobase_init_abort());
    }

    /* ------------ UNDO tablespaces files ---------------------*/
    // Default the undo tablespace directory.
    if (!srv_undo_dir) {
        srv_undo_dir = default_path;
    }

    // Normalize the undo tablespace directory path.
    os_normalize_path(srv_undo_dir);

    if (strchr(srv_undo_dir, ';')) {
        sql_print_error("syntax error in innodb_undo_directory");
        DBUG_RETURN(innobase_init_abort());
    }

    /* -------------- All log files ---------------------------*/

    /* The default dir for log files is the datadir of MySQL */
    if (!srv_log_group_home_dir) {
        srv_log_group_home_dir = default_path;
    }

    // Normalize the redo log directory path.
    os_normalize_path(srv_log_group_home_dir);

    if (strchr(srv_log_group_home_dir, ';')) {
        sql_print_error("syntax error in innodb_log_group_home_dir");
        DBUG_RETURN(innobase_init_abort());
    }

    if (!innobase_large_prefix) {
        ib::warn() << deprecated_large_prefix;
    }

    if (!THDVAR(NULL, support_xa)) {
        ib::warn() << deprecated_innodb_support_xa_off;
        THDVAR(NULL, support_xa) = TRUE;
    }

    if (innobase_file_format_name != innodb_file_format_default) {
        ib::warn() << deprecated_file_format;
    }

    /* Validate the file format by animal name (innodb_file_format) */
    if (innobase_file_format_name != NULL) {

        format_id = innobase_file_format_name_lookup(
            innobase_file_format_name);

        if (format_id > UNIV_FORMAT_MAX) {
            sql_print_error("InnoDB: wrong innodb_file_format.");
            DBUG_RETURN(innobase_init_abort());
        }
    } else {
        /* Set it to the default file format id. Though this
        should never happen. */
        format_id = 0;
    }

    srv_file_format = format_id;

    /* Given the type of innobase_file_format_name we have little
    choice but to cast away the constness from the returned name.
    innobase_file_format_name is used in the MySQL set variable
    interface and so can't be const.
    */

    innobase_file_format_name =
        (char*) trx_sys_file_format_id_to_name(format_id);

    /* Check innobase_file_format_check variable */
    if (!innobase_file_format_check) {
        ib::warn() << deprecated_file_format_check;

        /* Set the value to disable checking. */
        srv_max_file_format_at_startup = UNIV_FORMAT_MAX + 1;
    } else {
        /* Set the value to the lowest supported format. */
        srv_max_file_format_at_startup = UNIV_FORMAT_MIN;
    }

    if (innobase_file_format_max != innodb_file_format_max_default) {
        ib::warn() << deprecated_file_format_max;
    }

    /* Did the user specify a format name that we support?
    As a side effect it will update the variable
    srv_max_file_format_at_startup */
    if (innobase_file_format_validate_and_set(
            innobase_file_format_max) < 0) {

        sql_print_error("InnoDB: invalid"
                " innodb_file_format_max value:"
                " should be any value up to %s or its"
                " equivalent numeric id",
                trx_sys_file_format_id_to_name(
                    UNIV_FORMAT_MAX));

        DBUG_RETURN(innobase_init_abort());
    }

    /** InnoDB change buffer */
    if (innobase_change_buffering) {
        ulint   use;

        for (use = 0;
             use < UT_ARR_SIZE(innobase_change_buffering_values);
             use++) {
            if (!innobase_strcasecmp(
                    innobase_change_buffering,
                    innobase_change_buffering_values[use])) {
                ibuf_use = (ibuf_use_t) use;
                goto innobase_change_buffering_inited_ok;
            }
        }

        sql_print_error("InnoDB: invalid value"
                " innodb_change_buffering=%s",
                innobase_change_buffering);
        DBUG_RETURN(innobase_init_abort());
    }

innobase_change_buffering_inited_ok:
    // innodb_change_buffering = all (the default)
    ut_a((ulint) ibuf_use < UT_ARR_SIZE(innobase_change_buffering_values));
    innobase_change_buffering = (char*)
        innobase_change_buffering_values[ibuf_use];

    /* Check that interdependent parameters have sane values.
    The pairs checked are srv_max_buf_pool_modified_pct against
    srv_max_dirty_pages_pct_lwm, and srv_max_io_capacity against
    srv_io_capacity (with SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT meaning
    "not set"). */
    if (srv_max_buf_pool_modified_pct < srv_max_dirty_pages_pct_lwm) {
        sql_print_warning("InnoDB: innodb_max_dirty_pages_pct_lwm"
                  " cannot be set higher than"
                  " innodb_max_dirty_pages_pct.\n"
                  "InnoDB: Setting"
                  " innodb_max_dirty_pages_pct_lwm to %lf\n",
                  srv_max_buf_pool_modified_pct);

        srv_max_dirty_pages_pct_lwm = srv_max_buf_pool_modified_pct;
    }

    if (srv_max_io_capacity == SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT) {

        if (srv_io_capacity >= SRV_MAX_IO_CAPACITY_LIMIT / 2) {
            /* Avoid overflow. */
            srv_max_io_capacity = SRV_MAX_IO_CAPACITY_LIMIT;
        } else {
            /* The user has not set the value. We should
            set it based on innodb_io_capacity. */
            srv_max_io_capacity =
                ut_max(2 * srv_io_capacity, 2000UL);
        }

    } else if (srv_max_io_capacity < srv_io_capacity) {
        sql_print_warning("InnoDB: innodb_io_capacity"
                  " cannot be set higher than"
                  " innodb_io_capacity_max.\n"
                  "InnoDB: Setting"
                  " innodb_io_capacity to %lu\n",
                  srv_max_io_capacity);

        srv_io_capacity = srv_max_io_capacity;
    }

    // Check the innodb_buffer_pool_filename setting.
    if (!is_filename_allowed(srv_buf_dump_filename,
                 strlen(srv_buf_dump_filename), FALSE)) {
        sql_print_error("InnoDB: innodb_buffer_pool_filename"
            " cannot have colon (:) in the file name.");
        DBUG_RETURN(innobase_init_abort());
    }

    /* --------------------------------------------------
    The code below handles these settings:
    innodb_flush_method, innodb_log_file_size,
    innodb_log_write_ahead_size, innodb_log_buffer_size,
    innodb_buffer_pool_size, innodb_read_io_threads,
    innodb_write_io_threads, innodb_doublewrite,
    innodb_log_checksums, innodb_rollback_on_timeout,
    innodb_locks_unsafe_for_binlog, innodb_open_files,
    the InnoDB monitor settings, innodb_old_blocks_pct,
    and innodb_undo_logs. */

    srv_file_flush_method_str = innobase_file_flush_method;

    srv_log_file_size = (ib_uint64_t) innobase_log_file_size;

    if (UNIV_PAGE_SIZE_DEF != srv_page_size) {
        ib::warn() << "innodb-page-size has been changed from the"
            " default value " << UNIV_PAGE_SIZE_DEF << " to "
            << srv_page_size << ".";
    }

    if (srv_log_write_ahead_size > srv_page_size) {
        srv_log_write_ahead_size = srv_page_size;
    } else {
        ulong   srv_log_write_ahead_size_tmp = OS_FILE_LOG_BLOCK_SIZE;

        while (srv_log_write_ahead_size_tmp
               < srv_log_write_ahead_size) {
            srv_log_write_ahead_size_tmp
                = srv_log_write_ahead_size_tmp * 2;
        }
        if (srv_log_write_ahead_size_tmp
            != srv_log_write_ahead_size) {
            srv_log_write_ahead_size
                = srv_log_write_ahead_size_tmp / 2;
        }
    }

    srv_log_buffer_size = (ulint) innobase_log_buffer_size;

    srv_buf_pool_size = (ulint) innobase_buffer_pool_size;

    srv_n_read_io_threads = (ulint) innobase_read_io_threads;
    srv_n_write_io_threads = (ulint) innobase_write_io_threads;

    srv_use_doublewrite_buf = (ibool) innobase_use_doublewrite;

    if (!innobase_use_checksums) {
        ib::warn() << "Setting innodb_checksums to OFF is DEPRECATED."
            " This option may be removed in future releases. You"
            " should set innodb_checksum_algorithm=NONE instead.";
        srv_checksum_algorithm = SRV_CHECKSUM_ALGORITHM_NONE;
    }

    innodb_log_checksums_func_update(innodb_log_checksums);

#ifdef HAVE_LINUX_LARGE_PAGES
    if ((os_use_large_pages = my_use_large_pages)) {
        os_large_page_size = opt_large_page_size;
    }
#endif

    row_rollback_on_timeout = (ibool) innobase_rollback_on_timeout;

    srv_locks_unsafe_for_binlog = (ibool) innobase_locks_unsafe_for_binlog;
    if (innobase_locks_unsafe_for_binlog) {
        ib::warn() << "Using innodb_locks_unsafe_for_binlog is"
            " DEPRECATED. This option may be removed in future"
            " releases. Please use READ COMMITTED transaction"
            " isolation level instead; " << SET_TRANSACTION_MSG;
    }

    if (innobase_open_files < 10) {
        innobase_open_files = 300;
        if (srv_file_per_table && table_cache_size > 300) {
            innobase_open_files = table_cache_size;
        }
    }

    if (innobase_open_files > (long) open_files_limit) {
        ib::warn() << "innodb_open_files should not be greater"
            " than the open_files_limit.\n";
        if (innobase_open_files > (long) table_cache_size) {
            innobase_open_files = table_cache_size;
        }
    }

    srv_max_n_open_files = (ulint) innobase_open_files;
    srv_innodb_status = (ibool) innobase_create_status_file;

    srv_print_verbose_log = mysqld_embedded ? 0 : 1;

    /* Round up fts_sort_pll_degree to nearest power of 2 number */
    for (num_pll_degree = 1;
         num_pll_degree < fts_sort_pll_degree;
         num_pll_degree <<= 1) {

        /* No op */
    }

    fts_sort_pll_degree = num_pll_degree;

    /* Store the default charset-collation number of this MySQL
    installation */
    data_mysql_default_charset_coll = (ulint) default_charset_info->number;

    // Initialize the default for innodb_commit_concurrency (limits concurrent commits).
    innobase_commit_concurrency_init_default();

    // Initialize the os_event subsystem.
    os_event_global_init();

    /* Set buffer pool size to default for fast startup when mysqld is
    run with --help --verbose options. */
    ulint   srv_buf_pool_size_org = 0;
    if (opt_help && opt_verbose
        && srv_buf_pool_size > srv_buf_pool_def_size) {
        ib::warn() << "Setting innodb_buf_pool_size to "
            << srv_buf_pool_def_size << " for fast startup, "
            << "when running with --help --verbose options.";
        srv_buf_pool_size_org = srv_buf_pool_size;
        srv_buf_pool_size = srv_buf_pool_def_size;
    }

    /* Since we in this module access directly the fields of a trx
    struct, and due to different headers and flags it might happen that
    ib_mutex_t has a different size in this module and in InnoDB
    modules, we check at run time that the size is the same in
    these compilation modules.
    */

    // Start InnoDB, creating the database first if necessary.
    err = innobase_start_or_create_for_mysql();

    // Report the effective innodb_buffer_pool_size.
    if (srv_buf_pool_size_org != 0) {
        /* Set the original value back to show in help. */
        srv_buf_pool_size_org =
            buf_pool_size_align(srv_buf_pool_size_org);
        innobase_buffer_pool_size =
            static_cast<long long>(srv_buf_pool_size_org);
    } else {
        innobase_buffer_pool_size =
            static_cast<long long>(srv_buf_pool_size);
    }

    if (err != DB_SUCCESS) {
        DBUG_RETURN(innobase_init_abort());
    }

    /* Create mutex to protect encryption master_key_id. */
    mutex_create(LATCH_ID_MASTER_KEY_ID_MUTEX, &master_key_id_mutex);

    /* Adjust the innodb_undo_logs config object */
    innobase_undo_logs_init_default_max();

    innobase_old_blocks_pct = static_cast<uint>(
        buf_LRU_old_ratio_update(innobase_old_blocks_pct, TRUE));

    ibuf_max_size_update(srv_change_buffer_max_size);

    innobase_open_tables = hash_create(200);
    mysql_mutex_init(innobase_share_mutex_key.m_value,
             &innobase_share_mutex,
             MY_MUTEX_INIT_FAST);
    mysql_mutex_init(commit_cond_mutex_key.m_value,
             &commit_cond_m, MY_MUTEX_INIT_FAST);
    mysql_cond_init(commit_cond_key.m_value, &commit_cond);
    innodb_inited= 1;
#ifdef MYSQL_DYNAMIC_PLUGIN
    if (innobase_hton != p) {
        innobase_hton = reinterpret_cast<handlerton*>(p);
        *innobase_hton = *innodb_hton_ptr;
    }
#endif /* MYSQL_DYNAMIC_PLUGIN */

    /* Get the current high water mark format. */
    innobase_file_format_max = (char*) trx_sys_file_format_max_get();

    /* Currently, monitor counter information are not persistent.
    (InnoDB monitor) */
    memset(monitor_set_tbl, 0, sizeof monitor_set_tbl);

    memset(innodb_counter_value, 0, sizeof innodb_counter_value);

    /* Do this as late as possible so server is fully starts up,
    since we might get some initial stats if user choose to turn
    on some counters from start up */
    if (innobase_enable_monitor_counter) {
        innodb_enable_monitor_at_startup(
            innobase_enable_monitor_counter);
    }

    /* Turn on monitor counters that are default on */
    srv_mon_default_on();

    /* Unit Tests */
#ifdef UNIV_ENABLE_UNIT_TEST_GET_PARENT_DIR
    unit_test_os_file_get_parent_dir();
#endif /* UNIV_ENABLE_UNIT_TEST_GET_PARENT_DIR */

#ifdef UNIV_ENABLE_UNIT_TEST_MAKE_FILEPATH
    test_make_filepath();
#endif /*UNIV_ENABLE_UNIT_TEST_MAKE_FILEPATH */

#ifdef UNIV_ENABLE_DICT_STATS_TEST
    test_dict_stats_all();
#endif /*UNIV_ENABLE_DICT_STATS_TEST */

#ifdef UNIV_ENABLE_UNIT_TEST_ROW_RAW_FORMAT_INT
# ifdef HAVE_UT_CHRONO_T
    test_row_raw_format_int();
# endif /* HAVE_UT_CHRONO_T */
#endif /* UNIV_ENABLE_UNIT_TEST_ROW_RAW_FORMAT_INT */

#ifndef UNIV_HOTBACKUP
#ifdef _WIN32
    if (ut_win_init_time()) {
        DBUG_RETURN(innobase_init_abort());
    }
#endif /* _WIN32 */
#endif /* !UNIV_HOTBACKUP */

    DBUG_RETURN(0);
}
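The interdependent-parameter checks in innobase_init() (the dirty-page low-water mark against its maximum, and innodb_io_capacity against innodb_io_capacity_max) can be isolated into a small standalone sketch. The names IoSettings and normalize_io_settings are hypothetical, and kUnsetSentinel and kCapacityLimit are assumed stand-ins for SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT and SRV_MAX_IO_CAPACITY_LIMIT, whose actual values do not appear in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical container for the four interdependent settings.
struct IoSettings {
    double max_dirty_pct;           // innodb_max_dirty_pages_pct
    double max_dirty_pct_lwm;       // innodb_max_dirty_pages_pct_lwm
    unsigned long io_capacity;      // innodb_io_capacity
    unsigned long io_capacity_max;  // innodb_io_capacity_max
};

// Assumed values for this sketch only.
const unsigned long kCapacityLimit = std::numeric_limits<unsigned long>::max();
const unsigned long kUnsetSentinel = kCapacityLimit;

IoSettings normalize_io_settings(IoSettings s) {
    assert(s.max_dirty_pct >= 0.0);
    // The low-water mark may not exceed the maximum dirty-page percentage.
    if (s.max_dirty_pct < s.max_dirty_pct_lwm) {
        s.max_dirty_pct_lwm = s.max_dirty_pct;
    }
    if (s.io_capacity_max == kUnsetSentinel) {
        if (s.io_capacity >= kCapacityLimit / 2) {
            // Doubling would overflow, so clamp to the limit.
            s.io_capacity_max = kCapacityLimit;
        } else {
            // Not set by the user: derive it from io_capacity, at least 2000.
            s.io_capacity_max = std::max(2 * s.io_capacity, 2000UL);
        }
    } else if (s.io_capacity_max < s.io_capacity) {
        // An explicit maximum wins: io_capacity is lowered to match it.
        s.io_capacity = s.io_capacity_max;
    }
    return s;
}
```

Note the asymmetry: when the maximum is left unset it is derived from innodb_io_capacity, but when both are set and conflict, it is innodb_io_capacity that gets reduced.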
The innobase_start_or_create_for_mysql() function is annotated below:
dberr_t
innobase_start_or_create_for_mysql(void)
/*====================================*/
{
    bool        create_new_db = false;
    lsn_t       flushed_lsn;
    ulint       sum_of_data_file_sizes;
    ulint       tablespace_size_in_header;
    dberr_t     err;
    ulint       srv_n_log_files_found = srv_n_log_files;
    mtr_t       mtr;
    purge_pq_t* purge_queue;
    char        logfilename[10000];
    char*       logfile0    = NULL;
    size_t      dirnamelen;
    unsigned    i = 0;

    /* Reset the start state. */
    srv_start_state = SRV_START_STATE_NONE;

    // SRV_FORCE_NO_LOG_REDO: do not roll the redo log forward.
    if (srv_force_recovery == SRV_FORCE_NO_LOG_REDO) {
        srv_read_only_mode = true;
    }

    // Set high_level_read_only.
    high_level_read_only = srv_read_only_mode
        || srv_force_recovery > SRV_FORCE_NO_TRX_UNDO;

    // In read-only mode nothing is written except to intrinsic tables,
    // so the doublewrite mechanism is turned off.
    if (srv_read_only_mode) {
        ib::info() << "Started in read only mode";

        /* There is no write except to intrinsic table and so turn-off
        doublewrite mechanism completely. */
        srv_use_doublewrite_buf = FALSE;
    }

#ifdef _WIN32
    srv_use_native_aio = TRUE;

#elif defined(LINUX_NATIVE_AIO)

    if (srv_use_native_aio) {
        ib::info() << "Using Linux native AIO";
    }
#else
    /* Currently native AIO is supported only on windows and linux
    and that also when the support is compiled in. In all other
    cases, we ignore the setting of innodb_use_native_aio. */
    srv_use_native_aio = FALSE;
#endif /* _WIN32 */

    /* Register performance schema stages before any real work has been
    started which may need to be instrumented.
    */
    mysql_stage_register("innodb", srv_stages, UT_ARR_SIZE(srv_stages));

    /** Process the innodb_flush_method parameter. It is usually set
    to O_DIRECT or O_DIRECT_NO_FSYNC. */
    if (srv_file_flush_method_str == NULL) {
        /* These are the default options */
#ifndef _WIN32
        srv_unix_file_flush_method = SRV_UNIX_FSYNC;
    } else if (0 == ut_strcmp(srv_file_flush_method_str, "fsync")) {
        srv_unix_file_flush_method = SRV_UNIX_FSYNC;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DSYNC")) {
        srv_unix_file_flush_method = SRV_UNIX_O_DSYNC;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DIRECT")) {
        srv_unix_file_flush_method = SRV_UNIX_O_DIRECT;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DIRECT_NO_FSYNC")) {
        srv_unix_file_flush_method = SRV_UNIX_O_DIRECT_NO_FSYNC;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "littlesync")) {
        srv_unix_file_flush_method = SRV_UNIX_LITTLESYNC;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "nosync")) {
        srv_unix_file_flush_method = SRV_UNIX_NOSYNC;
#else
        srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
    } else if (0 == ut_strcmp(srv_file_flush_method_str, "normal")) {
        srv_win_file_flush_method = SRV_WIN_IO_NORMAL;
        srv_use_native_aio = FALSE;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "unbuffered")) {
        srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
        srv_use_native_aio = FALSE;

    } else if (0 == ut_strcmp(srv_file_flush_method_str,
                  "async_unbuffered")) {
        srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
#endif /* _WIN32 */
    } else {
        ib::error() << "Unrecognized value "
            << srv_file_flush_method_str
            << " for innodb_flush_method";
        return(srv_init_abort(DB_ERROR));
    }

    /* Note that the call srv_boot() also changes the values of
    some variables to the units used by InnoDB internally */

    /* Set the maximum number of threads which can wait for a semaphore
    inside InnoDB: this is the 'sync wait array' size, as well as the
    maximum number of threads that can wait in the 'srv_conc array' for
    their time to enter InnoDB.
    (This value sizes both the sync wait array and the srv_conc
    array.) */

    srv_max_n_threads = 1   /* io_ibuf_thread */
                + 1 /* io_log_thread */
                + 1 /* lock_wait_timeout_thread */
                + 1 /* srv_error_monitor_thread */
                + 1 /* srv_monitor_thread */
                + 1 /* srv_master_thread */
                + 1 /* srv_purge_coordinator_thread */
                + 1 /* buf_dump_thread */
                + 1 /* dict_stats_thread */
                + 1 /* fts_optimize_thread */
                + 1 /* recv_writer_thread */
                + 1 /* trx_rollback_or_clean_all_recovered */
                + 128 /* added as margin, for use of
                      InnoDB Memcached etc. */
                + max_connections
                + srv_n_read_io_threads
                + srv_n_write_io_threads
                + srv_n_purge_threads
                + srv_n_page_cleaners
                /* FTS Parallel Sort */
                + fts_sort_pll_degree * FTS_NUM_AUX_INDEX
                  * max_connections;

    /** Adjust innodb_buffer_pool_instances. */
    if (srv_buf_pool_size >= BUF_POOL_SIZE_THRESHOLD) {

        if (srv_buf_pool_instances == srv_buf_pool_instances_default) {
#if defined(_WIN32) && !defined(_WIN64)
            /* Do not allocate too large of a buffer pool on
            Windows 32-bit systems, which can have trouble
            allocating larger single contiguous memory blocks. */
            srv_buf_pool_instances = ut_min(
                static_cast<ulong>(MAX_BUFFER_POOLS),
                static_cast<ulong>(srv_buf_pool_size
                           / (128 * 1024 * 1024)));
#else /* defined(_WIN32) && !defined(_WIN64) */
            /* Default to 8 instances when size > 1GB. */
            srv_buf_pool_instances = 8;
#endif /* defined(_WIN32) && !defined(_WIN64) */
        }
    } else {
        /* If buffer pool is less than 1 GiB, assume fewer
        threads. Also use only one buffer pool instance. */
        if (srv_buf_pool_instances != srv_buf_pool_instances_default
            && srv_buf_pool_instances != 1) {
            /* We can't distinguish whether the user has explicitly
            started mysqld with --innodb-buffer-pool-instances=0,
            (srv_buf_pool_instances_default is 0) or has not
            specified that option at all. Thus we have the
            limitation that if the user started with =0, we
            will not emit a warning here, but we should actually
            do so.
            */
            ib::info()
                << "Adjusting innodb_buffer_pool_instances"
                " from " << srv_buf_pool_instances << " to 1"
                " since innodb_buffer_pool_size is less than "
                << BUF_POOL_SIZE_THRESHOLD / (1024 * 1024)
                << " MiB";
        }

        srv_buf_pool_instances = 1;
    }

    // Adjust srv_buf_pool_chunk_unit.
    if (srv_buf_pool_chunk_unit * srv_buf_pool_instances
        > srv_buf_pool_size) {
        /* Size unit of buffer pool is larger than srv_buf_pool_size.
        adjust srv_buf_pool_chunk_unit for srv_buf_pool_size. */
        srv_buf_pool_chunk_unit
            = static_cast<ulong>(srv_buf_pool_size)
              / srv_buf_pool_instances;
        if (srv_buf_pool_size % srv_buf_pool_instances != 0) {
            ++srv_buf_pool_chunk_unit;
        }
    }

    // Align srv_buf_pool_size to srv_buf_pool_chunk_unit.
    srv_buf_pool_size = buf_pool_size_align(srv_buf_pool_size);

    // Cap innodb_page_cleaners at srv_buf_pool_instances.
    if (srv_n_page_cleaners > srv_buf_pool_instances) {
        /* limit of page_cleaner parallelizability
        is number of buffer pool instances. */
        srv_n_page_cleaners = srv_buf_pool_instances;
    }

    /** Boot the InnoDB server and initialize the related
    parameters and components. */
    srv_boot();

    ib::info() << (ut_crc32_sse2_enabled ? "Using" : "Not using")
        << " CPU crc32 instructions";

    // InnoDB monitor files.
    if (!srv_read_only_mode) {

        mutex_create(LATCH_ID_SRV_MONITOR_FILE,
                 &srv_monitor_file_mutex);

        if (srv_innodb_status) {

            srv_monitor_file_name = static_cast<char*>(
                ut_malloc_nokey(
                    strlen(fil_path_to_mysql_datadir)
                    + 20 + sizeof "/innodb_status."));

            sprintf(srv_monitor_file_name,
                "%s/innodb_status."
                ULINTPF,
                fil_path_to_mysql_datadir,
                os_proc_get_number());

            srv_monitor_file = fopen(srv_monitor_file_name, "w+");

            if (!srv_monitor_file) {
                ib::error() << "Unable to create "
                    << srv_monitor_file_name << ": "
                    << strerror(errno);
                return(srv_init_abort(DB_ERROR));
            }
        } else {

            srv_monitor_file_name = NULL;
            srv_monitor_file = os_file_create_tmpfile(NULL);

            if (!srv_monitor_file) {
                return(srv_init_abort(DB_ERROR));
            }
        }

        mutex_create(LATCH_ID_SRV_DICT_TMPFILE,
                 &srv_dict_tmpfile_mutex);

        srv_dict_tmpfile = os_file_create_tmpfile(NULL);

        if (!srv_dict_tmpfile) {
            return(srv_init_abort(DB_ERROR));
        }

        mutex_create(LATCH_ID_SRV_MISC_TMPFILE,
                 &srv_misc_tmpfile_mutex);

        srv_misc_tmpfile = os_file_create_tmpfile(NULL);

        if (!srv_misc_tmpfile) {
            return(srv_init_abort(DB_ERROR));
        }
    }

    /** File I/O threads: innodb_read_io_threads plus
    innodb_write_io_threads. */
    srv_n_file_io_threads = srv_n_read_io_threads;

    srv_n_file_io_threads += srv_n_write_io_threads;

    // Unless read-only, add the log and ibuf I/O threads.
    if (!srv_read_only_mode) {
        /* Add the log and ibuf IO threads. */
        srv_n_file_io_threads += 2;
    } else {
        ib::info() << "Disabling background log and ibuf IO write"
            << " threads.";
    }

    ut_a(srv_n_file_io_threads <= SRV_MAX_N_IO_THREADS);

    // Initialize the asynchronous I/O subsystem.
    if (!os_aio_init(srv_n_read_io_threads,
             srv_n_write_io_threads,
             SRV_MAX_N_PENDING_SYNC_IOS)) {

        ib::error() << "Cannot initialize AIO sub-system";

        return(srv_init_abort(DB_ERROR));
    }

    // Initialize the in-memory cache of tablespaces.
    fil_init(srv_file_per_table ?
         50000 : 5000, srv_max_n_open_files);

    double  size;
    char    unit;

    // Format innodb_buffer_pool_size and the chunk size for logging.
    if (srv_buf_pool_size >= 1024 * 1024 * 1024) {
        size = ((double) srv_buf_pool_size) / (1024 * 1024 * 1024);
        unit = 'G';
    } else {
        size = ((double) srv_buf_pool_size) / (1024 * 1024);
        unit = 'M';
    }

    double  chunk_size;
    char    chunk_unit;

    if (srv_buf_pool_chunk_unit >= 1024 * 1024 * 1024) {
        chunk_size = srv_buf_pool_chunk_unit / 1024.0 / 1024 / 1024;
        chunk_unit = 'G';
    } else {
        chunk_size = srv_buf_pool_chunk_unit / 1024.0 / 1024;
        chunk_unit = 'M';
    }

    ib::info() << "Initializing buffer pool, total size = "
        << size << unit << ", instances = " << srv_buf_pool_instances
        << ", chunk size = " << chunk_size << chunk_unit;

    // Create the buffer pool; fails if there is not enough memory.
    err = buf_pool_init(srv_buf_pool_size, srv_buf_pool_instances);

    if (err != DB_SUCCESS) {
        ib::error() << "Cannot allocate memory for the buffer pool";

        return(srv_init_abort(DB_ERROR));
    }

    ib::info() << "Completed initialization of buffer pool";

    // Initialize the fsp system and the redo log.
    fsp_init();
    log_init();

    // Create the recovery system and initialize it for a recovery operation.
    recv_sys_create();
    recv_sys_init(buf_pool_get_curr_size());

    // Create the lock system at database startup.
    lock_sys_create(srv_lock_table_size);
    // Record that the lock system has been started.
    srv_start_state_set(SRV_START_STATE_LOCK_SYS);

    /* Create i/o-handler threads: */
    for (ulint t = 0; t < srv_n_file_io_threads; ++t) {

        n[t] = t;

        os_thread_create(io_handler_thread, n + t, thread_ids + t);
    }

    /* Even in read-only mode there could be flush job generated by
    intrinsic table operations. Initialize the page cleaner. */
    buf_flush_page_cleaner_init();

    // Create the buf_flush_page_cleaner_coordinator thread.
    os_thread_create(buf_flush_page_cleaner_coordinator,
             NULL, NULL);

    // Create the buf_flush_page_cleaner_worker threads.
    for (i = 1; i < srv_n_page_cleaners; ++i) {
        os_thread_create(buf_flush_page_cleaner_worker,
                 NULL, NULL);
    }

    /* Make sure page cleaner is active.
    */
    while (!buf_page_cleaner_is_active) {
        os_thread_sleep(10000);
    }

    // Record that the I/O threads have been started.
    srv_start_state_set(SRV_START_STATE_IO);

    // Normalize the data home path.
    os_normalize_path(srv_data_home);

    /* Check if the data files exist or not. */
    // Looks for ibdata1, ibdata2, and so on, to decide whether a new
    // database has to be created.
    err = srv_sys_space.check_file_spec(
        &create_new_db, MIN_EXPECTED_TABLESPACE_SIZE);

    if (err != DB_SUCCESS) {
        return(srv_init_abort(DB_ERROR));
    }

    // For an existing database, unfinished transactions still have to be
    // rolled back.
    srv_startup_is_before_trx_rollback_phase = !create_new_db;

    /* Check if undo tablespaces and redo log files exist before creating
    a new system tablespace */
    if (create_new_db) {
        err = srv_check_undo_redo_logs_exists();
        if (err != DB_SUCCESS) {
            return(srv_init_abort(DB_ERROR));
        }
        recv_sys_debug_free();
    }

    /* Open or create the data files. */
    ulint   sum_of_new_sizes;

    // Open or create the ibdata files and read flushed_lsn from ibdata1.
    err = srv_sys_space.open_or_create(
        false, create_new_db, &sum_of_new_sizes, &flushed_lsn);

    switch (err) {
    case DB_SUCCESS:
        break;
    case DB_CANNOT_OPEN_FILE:
        ib::error()
            << "Could not open or create the system tablespace. If"
            " you tried to add new data files to the system"
            " tablespace, and it failed here, you should now"
            " edit innodb_data_file_path in my.cnf back to what"
            " it was, and remove the new ibdata files InnoDB"
            " created in this failed attempt. InnoDB only wrote"
            " those files full of zeros, but did not yet use"
            " them in any way. But be careful: do not remove"
            " old data files which contain your precious data!";
        /* fall through */
    default:
        /* Other errors might come from
        Datafile::validate_first_page() */
        return(srv_init_abort(err));
    }

    dirnamelen = strlen(srv_log_group_home_dir);
    ut_a(dirnamelen < (sizeof logfilename) - 10 - sizeof "ib_logfile");
    memcpy(logfilename, srv_log_group_home_dir, dirnamelen);

    /* Add a path separator if needed.
    */
    if (dirnamelen && logfilename[dirnamelen - 1] != OS_PATH_SEPARATOR) {
        logfilename[dirnamelen++] = OS_PATH_SEPARATOR;
    }

    srv_log_file_size_requested = srv_log_file_size;

    if (create_new_db) {
        /** Creating a new database. */
        // Synchronously flush dirty blocks from the tail of the flush
        // list of every buffer pool instance.
        buf_flush_sync_all_buf_pools();

        // Get the current LSN.
        flushed_lsn = log_get_lsn();

        // Create the redo log files.
        err = create_log_files(
            logfilename, dirnamelen, flushed_lsn, logfile0);

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }
    } else {
        // Not creating a new database.
        for (i = 0; i < SRV_N_LOG_FILES_MAX; i++) {
            os_offset_t     size;
            os_file_stat_t  stat_info;

            sprintf(logfilename + dirnamelen,
                    "ib_logfile%u", i);

            // Get the status of the log file.
            err = os_file_get_status(
                logfilename, &stat_info, false,
                srv_read_only_mode);

            if (err == DB_NOT_FOUND) {
                if (i == 0) {
                    if (flushed_lsn
                        < static_cast<lsn_t>(1000)) {
                        ib::error()
                            << "Cannot create"
                            " log files because"
                            " data files are"
                            " corrupt or the"
                            " database was not"
                            " shut down cleanly"
                            " after creating"
                            " the data files.";
                        return(srv_init_abort(
                            DB_ERROR));
                    }

                    err = create_log_files(
                        logfilename, dirnamelen,
                        flushed_lsn, logfile0);

                    if (err != DB_SUCCESS) {
                        return(srv_init_abort(err));
                    }

                    create_log_files_rename(
                        logfilename, dirnamelen,
                        flushed_lsn, logfile0);

                    /* Suppress the message about
                    crash recovery.
                    */
                    flushed_lsn = log_get_lsn();
                    goto files_checked;
                } else if (i < 2) {
                    /* must have at least 2 log files */
                    ib::error() << "Only one log file"
                        " found.";
                    return(srv_init_abort(err));
                }

                /* opened all files */
                break;
            }

            // Check the access mode of the log file.
            if (!srv_file_check_mode(logfilename)) {
                return(srv_init_abort(DB_ERROR));
            }

            // Open the redo log file.
            err = open_log_file(&files[i], logfilename, &size);

            if (err != DB_SUCCESS) {
                return(srv_init_abort(err));
            }

            ut_a(size != (os_offset_t) -1);

            if (size & ((1 << UNIV_PAGE_SIZE_SHIFT) - 1)) {
                ib::error() << "Log file " << logfilename
                    << " size " << size << " is not a"
                    " multiple of innodb_page_size";
                return(srv_init_abort(DB_ERROR));
            }

            size >>= UNIV_PAGE_SIZE_SHIFT;

            if (i == 0) {
                srv_log_file_size = size;
            } else if (size != srv_log_file_size) {
                ib::error() << "Log file " << logfilename
                    << " is of different size "
                    << (size << UNIV_PAGE_SIZE_SHIFT)
                    << " bytes than other log files "
                    << (srv_log_file_size
                        << UNIV_PAGE_SIZE_SHIFT)
                    << " bytes!";
                return(srv_init_abort(DB_ERROR));
            }
        }

        // Number of log files found.
        srv_n_log_files_found = i;

        /* Create the in-memory file space objects. */
        sprintf(logfilename + dirnamelen, "ib_logfile%u", 0);

        /* Disable the doublewrite buffer for log files.
        */
        fil_space_t*    log_space = fil_space_create(
            "innodb_redo_log",
            SRV_LOG_SPACE_FIRST_ID,
            fsp_flags_set_page_size(0, univ_page_size),
            FIL_TYPE_LOG);

        ut_a(fil_validate());
        ut_a(log_space);

        /* srv_log_file_size is measured in pages; if page size is 16KB,
        then we have a limit of 64TB on 32 bit systems */
        ut_a(srv_log_file_size <= ULINT_MAX);

        // Add every log file to the redo log space.
        for (unsigned j = 0; j < i; j++) {
            sprintf(logfilename + dirnamelen, "ib_logfile%u", j);

            if (!fil_node_create(logfilename,
                                 (ulint) srv_log_file_size,
                                 log_space, false, false)) {
                return(srv_init_abort(DB_ERROR));
            }
        }

        // Initialize the redo log group.
        if (!log_group_init(0, i, srv_log_file_size * UNIV_PAGE_SIZE,
                            SRV_LOG_SPACE_FIRST_ID)) {
            return(srv_init_abort(DB_ERROR));
        }
    }

files_checked:
    /* Open all log files and data files in the system
    tablespace: we keep them open until database
    shutdown */
    fil_open_log_and_system_tablespace_files();

    // Open the undo tablespaces. Once all undo files have been found and
    // opened, they are added to the file management system.
    err = srv_undo_tablespaces_init(
        create_new_db,
        srv_undo_tablespaces,
        &srv_undo_tablespaces_open);

    /* If the force recovery is set very high then we carry on regardless
    of all errors. Basically this is fingers crossed mode. */
    // The steps that follow deal with data recovery.
    if (err != DB_SUCCESS
        && srv_force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN) {
        return(srv_init_abort(err));
    }

    /* Initialize objects used by dict stats gathering thread, which
    can also be used by recovery if it tries to drop some table */
    if (!srv_read_only_mode) {
        dict_stats_thread_init();
    }

    // Initialize the file_format_max variable.
    trx_sys_file_format_init();

    // Create the trx_sys instance and initialize purge_queue and the mutex.
    trx_sys_create();

    if (create_new_db) {
        ut_a(!srv_read_only_mode);

        mtr_start(&mtr);

        bool ret = fsp_header_init(0, sum_of_new_sizes, &mtr);

        mtr_commit(&mtr);

        if (!ret) {
            return(srv_init_abort(DB_ERROR));
        }

        /* To maintain backward compatibility we create only
        the first rollback segment before the double write buffer.
        All the remaining rollback segments will be created later,
        after the double write buffer has been created. */
        trx_sys_create_sys_pages();

        purge_queue = trx_sys_init_at_db_start();

        DBUG_EXECUTE_IF("check_no_undo",
                        ut_ad(purge_queue->empty());
                        );

        /* The purge system needs to create the purge view and
        therefore requires that the trx_sys is inited. */
        trx_purge_sys_create(srv_n_purge_threads, purge_queue);

        err = dict_create();

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        buf_flush_sync_all_buf_pools();

        flushed_lsn = log_get_lsn();

        fil_write_flushed_lsn(flushed_lsn);

        create_log_files_rename(
            logfilename, dirnamelen, flushed_lsn, logfile0);
    } else {
        /* Check if we support the max format that is stamped
        on the system tablespace.
        Note: We are NOT allowed to make any modifications to
        the TRX_SYS_PAGE_NO page before recovery because this
        page also contains the max_trx_id etc. important system
        variables that are required for recovery. We need to
        ensure that we return the system to a state where normal
        recovery is guaranteed to work. We do this by
        invalidating the buffer cache, this will force the
        reread of the page and restoration to its last known
        consistent state, this is REQUIRED for the recovery
        process to work. */
        err = trx_sys_file_format_max_check(
            srv_max_file_format_at_startup);

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        /* Invalidate the buffer pool to ensure that we reread
        the page that we read above, during recovery.
        Note that this is not as heavy weight as it seems. At
        this point there will be only ONE page in the buf_LRU
        and there must be no page in the buf_flush list. */
        buf_pool_invalidate();

        /* Scan and locate truncate log files. Parsed located files
        and add table to truncate information to central vector for
        truncate fix-up action post recovery.
        */
        err = TruncateLogParser::scan_and_parse(srv_log_group_home_dir);
        if (err != DB_SUCCESS) {
            return(srv_init_abort(DB_ERROR));
        }

        /* We always try to do a recovery, even if the database had
        been shut down normally: this is the normal startup path */

        /* Start recovery from the checkpoint at flushed_lsn:
        1. Initialize the red-black tree that allows fast insertion
           into the flush list during recovery.
        2. Find the latest checkpoint in the log groups.
        3. Read the redo log page that holds the latest checkpoint
           into log_sys->checkpoint_buf.
        4. Get checkpoint_lsn and checkpoint_no.
        5. Read the redo log from checkpoint_lsn into the hash table.
        6. Check the tablespaces needed for crash recovery, then
           process and remove the pages in the doublewrite buffer.
           The real data page behind each doublewrite page is checked
           for integrity and, if it is damaged, restored from the
           doublewrite copy. The background thread recv_writer_thread
           is also spawned to clean dirty pages from the buffer pool.
        7. Copy the log segment from the latest log group to the
           other groups; there is currently only one log group. */
        err = recv_recovery_from_checkpoint_start(flushed_lsn);

        // Clear the pages collected from the doublewrite buffer.
        recv_sys->dblwr.pages.clear();

        // Boot the data dictionary system and initialize the change buffer.
        if (err == DB_SUCCESS) {
            /* Initialize the change buffer. */
            err = dict_boot();
        }

        if (err != DB_SUCCESS) {
            /* A tablespace was not found during recovery. The
            user must force recovery. */
            if (err == DB_TABLESPACE_NOT_FOUND) {
                srv_fatal_error();
                ut_error;
            }

            return(srv_init_abort(DB_ERROR));
        }

        // Create and initialize the transaction system.
        purge_queue = trx_sys_init_at_db_start();

        if (srv_force_recovery < SRV_FORCE_NO_LOG_REDO) {
            /* Apply the hashed log records to the
            respective file pages, for the last batch of
            recv_group_scan_log_recs(). */
            // Apply the redo log to complete crash recovery.
            recv_apply_hashed_log_recs(TRUE);
            DBUG_PRINT("ib_log", ("apply completed"));

            if (recv_needed_recovery) {
                // Prints a line such as: Last MySQL binlog file
                // position 0 894036112, file name mysql-bin.002128
                trx_sys_print_mysql_binlog_offset();
            }
        }

        if (recv_sys->found_corrupt_log) {
            ib::warn()
                << "The log file may have been corrupt and it"
                " is possible that the log scan or parsing"
                " did not proceed far enough in recovery."
                " Please run CHECK TABLE on your InnoDB tables"
                " to check that they are ok!"
                " It may be safest to recover your"
                " InnoDB database from a backup!";
        }

        /* The purge system needs to create the purge view and
        therefore requires that the trx_sys is inited. */
        // Create trx_purge_sys.
        trx_purge_sys_create(srv_n_purge_threads, purge_queue);

        /* recv_recovery_from_checkpoint_finish needs trx lists which
        are initialized in trx_sys_init_at_db_start(). */

        /* Finish the recovery:
        1. Make sure the recv_writer thread has finished its work.
        2. Wait until the flushing of dirty pages has completed.
        3. Wait for the recv_writer thread to terminate.
        4. Free the flush red-black tree.
        5. Roll back all data dictionary transactions so that the
           dictionary tables are not locked. The data dictionary latch
           should guarantee that only one dictionary transaction is
           active at a time. */
        recv_recovery_from_checkpoint_finish();

        /* Fix-up truncate of tables in the system tablespace
        if server crashed while truncate was active. The non-
        system tables are done after tablespace discovery. Do
        this now because this procedure assumes that no pages
        have changed since redo recovery. Tablespace discovery
        can do updates to pages in the system tablespace.*/
        // Fix up tables in the system tablespace.
        err = truncate_t::fixup_tables_in_system_tablespace();

        if (srv_force_recovery < SRV_FORCE_NO_IBUF_MERGE) {
            /* Open or Create SYS_TABLESPACES and SYS_DATAFILES
            so that tablespace names and other metadata can be
            found. */
            srv_sys_tablespaces_open = true;
            // Create or check the SYS_TABLESPACES and SYS_DATAFILES
            // system tables.
            err = dict_create_or_check_sys_tablespace();
            if (err != DB_SUCCESS) {
                return(srv_init_abort(err));
            }

            /* The following call is necessary for the insert
            buffer to work with multiple tablespaces. We must
            know the mapping between space id's and .ibd file
            names.
            In a crash recovery, we check that the info in data
            dictionary is consistent with what we already know
            about space id's from the calls to fil_ibd_load().
            In a normal startup, we create the space objects for
            every table in the InnoDB data dictionary that has
            an .ibd file.
            We also determine the maximum tablespace id used.
            The 'validate' flag indicates that when a tablespace
            is opened, we also read the header page and validate
            the contents to the data dictionary. This is time
            consuming, especially for databases with lots of ibd
            files.
            So only do it after a crash and not forcing
            recovery. Open rw transactions at this point is not
            a good reason to validate. */
            bool validate = recv_needed_recovery
                && srv_force_recovery == 0;

            dict_check_tablespaces_and_store_max_id(validate);
        }

        /* Rotate the encryption key for recovery. It's because
        server could crash in middle of key rotation. Some tablespace
        didn't complete key rotation. Here, we will resume the
        rotation. */
        if (!srv_read_only_mode
            && srv_force_recovery < SRV_FORCE_NO_LOG_REDO) {
            fil_encryption_rotate();
        }

        /* Fix-up truncate of table if server crashed while truncate
        was active. */
        err = truncate_t::fixup_tables_in_non_system_tablespace();

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        if (!srv_force_recovery
            && !recv_sys->found_corrupt_log
            && (srv_log_file_size_requested != srv_log_file_size
                || srv_n_log_files_found != srv_n_log_files)) {

            /* Prepare to replace the redo log files. */

            if (srv_read_only_mode) {
                ib::error() << "Cannot resize log files"
                    " in read-only mode.";
                return(srv_init_abort(DB_READ_ONLY));
            }

            /* Prepare to delete the old redo log files */
            flushed_lsn = srv_prepare_to_delete_redo_log_files(i);

            /* Prohibit redo log writes from any other
            threads until creating a log checkpoint at the
            end of create_log_files(). */
            ut_d(recv_no_log_write = true);
            ut_ad(!buf_pool_check_no_pending_io());

            RECOVERY_CRASH(3);

            /* Stamp the LSN to the data files. */
            fil_write_flushed_lsn(flushed_lsn);

            RECOVERY_CRASH(4);

            /* Close and free the redo log files, so that
            we can replace them. */
            fil_close_log_files(true);

            RECOVERY_CRASH(5);

            /* Free the old log file space.
            */
            log_group_close_all();

            ib::warn() << "Starting to delete and rewrite log"
                " files.";

            srv_log_file_size = srv_log_file_size_requested;

            err = create_log_files(
                logfilename, dirnamelen, flushed_lsn,
                logfile0);

            if (err != DB_SUCCESS) {
                return(srv_init_abort(err));
            }

            create_log_files_rename(
                logfilename, dirnamelen, flushed_lsn,
                logfile0);
        }

        // Roll back uncommitted, incomplete transactions; this runs in
        // a background thread.
        recv_recovery_rollback_active();

        /* It is possible that file_format tag has never
        been set. In this case we initialize it to minimum
        value. Important to note that we can do it ONLY after
        we have finished the recovery process so that the
        image of TRX_SYS_PAGE_NO is not stale. */
        trx_sys_file_format_tag_init();
    }

    if (!create_new_db) {
        /* Check and reset any no-redo rseg slot on disk used by
        pre-5.7.2 redo rseg with no data to purge. */
        trx_rseg_reset_pending();
    }

    if (!create_new_db && sum_of_new_sizes > 0) {
        /* New data file(s) were added */
        mtr_start(&mtr);

        fsp_header_inc_size(0, sum_of_new_sizes, &mtr);

        mtr_commit(&mtr);

        /* Immediately write the log record about increased tablespace
        size to disk, so that it is durable even if mysqld would crash
        quickly */
        log_buffer_flush_to_disk();
    }

    /* Open temp-tablespace and keep it open until shutdown. */
    err = srv_open_tmp_tablespace(create_new_db, &srv_tmp_space);

    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    /* Create the doublewrite buffer to a new tablespace */
    if (buf_dblwr == NULL && !buf_dblwr_create()) {
        return(srv_init_abort(DB_ERROR));
    }

    /* Here the double write buffer has already been created and so
    any new rollback segments will be allocated after the double
    write buffer. The default segment should already exist.
    We create the new segments only if it's a new database or
    the database was shutdown cleanly. */

    /* Note: When creating the extra rollback segments during an upgrade
    we violate the latching order, even if the change buffer is empty.
    We make an exception in sync0sync.cc and check srv_is_being_started
    for that violation.
    It cannot create a deadlock because we are still
    running in single threaded mode essentially. Only the IO threads
    should be running at this stage. */

    /* Deprecate innodb_undo_logs. But still use it if it is set to
    non-default and innodb_rollback_segments is default. */
    ut_a(srv_rollback_segments > 0);
    ut_a(srv_rollback_segments <= TRX_SYS_N_RSEGS);
    ut_a(srv_undo_logs > 0);
    ut_a(srv_undo_logs <= TRX_SYS_N_RSEGS);
    if (srv_undo_logs < TRX_SYS_N_RSEGS) {
        ib::warn() << deprecated_undo_logs;
        if (srv_rollback_segments == TRX_SYS_N_RSEGS) {
            srv_rollback_segments = srv_undo_logs;
        }
    }

    /* The number of rsegs that exist in InnoDB is given by status
    variable srv_available_undo_logs. The number of rsegs to use can
    be set using the dynamic global variable srv_rollback_segments. */
    // Create the rollback segments.
    srv_available_undo_logs = trx_sys_create_rsegs(
        srv_undo_tablespaces, srv_rollback_segments, srv_tmp_undo_logs);

    if (srv_available_undo_logs == ULINT_UNDEFINED) {
        /* Can only happen if server is read only. */
        ut_a(srv_read_only_mode);
        srv_rollback_segments = ULONG_UNDEFINED;
    } else if (srv_available_undo_logs < srv_rollback_segments
               && !srv_force_recovery && !recv_needed_recovery) {
        ib::error() << "System or UNDO tablespace is running of out"
            << " of space";
        /* Should due to out of file space.
        */
        return(srv_init_abort(DB_ERROR));
    }

    srv_startup_is_before_trx_rollback_phase = false;

    if (!srv_read_only_mode) {
        /* Create the thread which watches the timeouts
        for lock waits */
        os_thread_create(
            lock_wait_timeout_thread,
            NULL, thread_ids + 2 + SRV_MAX_N_IO_THREADS);

        /* Create the thread which warns of long semaphore waits */
        os_thread_create(
            srv_error_monitor_thread,
            NULL, thread_ids + 3 + SRV_MAX_N_IO_THREADS);

        /* Create the thread which prints InnoDB monitor info */
        os_thread_create(
            srv_monitor_thread,
            NULL, thread_ids + 4 + SRV_MAX_N_IO_THREADS);

        srv_start_state_set(SRV_START_STATE_MONITOR);
    }

    /* Create the SYS_FOREIGN and SYS_FOREIGN_COLS system tables */
    err = dict_create_or_check_foreign_constraint_tables();
    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    /* Create the SYS_TABLESPACES system table */
    err = dict_create_or_check_sys_tablespace();
    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }
    srv_sys_tablespaces_open = true;

    /* Create the SYS_VIRTUAL system table */
    err = dict_create_or_check_sys_virtual();
    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    srv_is_being_started = false;

    ut_a(trx_purge_state() == PURGE_STATE_INIT);

    /* Create the master thread which does purge and other utility
    operations */
    if (!srv_read_only_mode) {
        os_thread_create(
            srv_master_thread,
            NULL, thread_ids + (1 + SRV_MAX_N_IO_THREADS));

        srv_start_state_set(SRV_START_STATE_MASTER);
    }

    // Create the purge coordinator thread and the purge worker threads.
    if (!srv_read_only_mode
        && srv_force_recovery < SRV_FORCE_NO_BACKGROUND) {

        os_thread_create(
            srv_purge_coordinator_thread,
            NULL, thread_ids + 5 + SRV_MAX_N_IO_THREADS);

        ut_a(UT_ARR_SIZE(thread_ids)
             > 5 + srv_n_purge_threads + SRV_MAX_N_IO_THREADS);

        /* We've already created the purge coordinator thread above.
        */
        for (i = 1; i < srv_n_purge_threads; ++i) {
            os_thread_create(
                srv_worker_thread, NULL,
                thread_ids + 5 + i + SRV_MAX_N_IO_THREADS);
        }

        // Wait for the purge threads to start.
        srv_start_wait_for_purge_to_start();

        srv_start_state_set(SRV_START_STATE_PURGE);
    } else {
        purge_sys->state = PURGE_STATE_DISABLED;
    }

    /* wake main loop of page cleaner up */
    os_event_set(buf_flush_event);

    sum_of_data_file_sizes = srv_sys_space.get_sum_of_sizes();
    ut_a(sum_of_new_sizes != ULINT_UNDEFINED);

    tablespace_size_in_header = fsp_header_get_tablespace_size();

    if (!srv_read_only_mode
        && !srv_sys_space.can_auto_extend_last_file()
        && sum_of_data_file_sizes != tablespace_size_in_header) {

        ib::error() << "Tablespace size stored in header is "
            << tablespace_size_in_header << " pages, but the sum"
            " of data file sizes is " << sum_of_data_file_sizes
            << " pages";

        if (srv_force_recovery == 0
            && sum_of_data_file_sizes < tablespace_size_in_header) {
            /* This is a fatal error, the tail of a tablespace is
            missing */
            ib::error()
                << "Cannot start InnoDB."
                " The tail of the system tablespace is"
                " missing. Have you edited"
                " innodb_data_file_path in my.cnf in an"
                " inappropriate way, removing"
                " ibdata files from there?"
                " You can set innodb_force_recovery=1"
                " in my.cnf to force"
                " a startup if you are trying"
                " to recover a badly corrupt database.";

            return(srv_init_abort(DB_ERROR));
        }
    }

    if (!srv_read_only_mode
        && srv_sys_space.can_auto_extend_last_file()
        && sum_of_data_file_sizes < tablespace_size_in_header) {

        ib::error() << "Tablespace size stored in header is "
            << tablespace_size_in_header << " pages, but the sum"
            " of data file sizes is only "
            << sum_of_data_file_sizes << " pages";

        if (srv_force_recovery == 0) {
            ib::error()
                << "Cannot start InnoDB. The tail of"
                " the system tablespace is"
                " missing. Have you edited"
                " innodb_data_file_path in my.cnf in an"
                " InnoDB: inappropriate way, removing"
                " ibdata files from there?"
                " You can set innodb_force_recovery=1"
                " in my.cnf to force"
                " InnoDB: a startup if you are trying to"
                " recover a badly corrupt database.";

            return(srv_init_abort(DB_ERROR));
        }
    }

    if (srv_print_verbose_log) {
        ib::info() << INNODB_VERSION_STR
            << " started; log sequence number "
            << srv_start_lsn;
    }

    if (srv_force_recovery > 0) {
        ib::info() << "!!! innodb_force_recovery is set to "
            << srv_force_recovery << " !!!";
    }

    if (srv_force_recovery == 0) {
        /* In the insert buffer we may have even bigger tablespace
        id's, because we may have dropped those tablespaces, but
        insert buffer merge has not had time to clean the records from
        the ibuf tree. */
        ibuf_update_max_tablespace_id();
    }

    if (!srv_read_only_mode) {
        if (create_new_db) {
            srv_buffer_pool_load_at_startup = FALSE;
        }

        /* Create the buffer pool dump/load thread */
        os_thread_create(buf_dump_thread, NULL, NULL);

        /* Create the dict stats gathering thread */
        os_thread_create(dict_stats_thread, NULL, NULL);

        /* Create the thread that will optimize the FTS sub-system. */
        fts_optimize_init();

        srv_start_state_set(SRV_START_STATE_STAT);
    }

    /* Create the buffer pool resize thread */
    os_thread_create(buf_resize_thread, NULL, NULL);

    srv_was_started = TRUE;
    return(DB_SUCCESS);
}