• Redis Source Code Analysis: AOF Persistence


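    AOF persistence:

    Unlike RDB persistence, which records the database state by saving the key-value pairs in the database, AOF persistence records the database state by saving the write commands that the Redis server executes.

    I. Structure:

    Fields in redisServer related to AOF persistence: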

    list *aof_rewrite_buf_blocks;   /* Hold changes during an AOF rewrite. */
    sds aof_buf;      /* AOF buffer, written before entering the event loop */
    pid_t aof_child_pid;            /* PID if rewriting process */
    
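    • aof_rewrite_buf_blocks: the AOF rewrite buffer;
    • aof_buf: the AOF buffer;
    • aof_child_pid: the PID of the child process forked to rewrite the AOF file; a value other than -1 means an AOF rewrite is in progress;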

    II. Command append:

    After executing a write command, the server appends it to the AOF buffer; this happens as part of command propagation [1].
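    To make the append step concrete, here is a minimal sketch of how a command can be serialized into the Redis protocol format before it is appended to a buffer. The helper name append_command and the fixed-size char buffer are assumptions made for this illustration; Redis itself builds the string with sds, and this sketch is not binary-safe because it relies on strlen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: serialize argv[0..argc-1] in the Redis protocol
 * format and append it to buf (capacity cap, current length *len).
 * Returns 0 on success, -1 if the command does not fit.
 * Not binary-safe: arguments are treated as NUL-terminated strings. */
int append_command(char *buf, size_t cap, size_t *len,
                   int argc, const char **argv)
{
    assert(buf != NULL && len != NULL && argv != NULL);
    size_t pos = *len;
    if (pos >= cap) return -1;

    /* Header: number of arguments. */
    int n = snprintf(buf + pos, cap - pos, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap - pos) return -1;
    pos += (size_t)n;

    /* Each argument: its byte length, then the bytes. */
    for (int i = 0; i < argc; i++) {
        n = snprintf(buf + pos, cap - pos, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - pos) return -1;
        pos += (size_t)n;
    }

    *len = pos;   /* commit the new length only if everything fit */
    return 0;
}
```

    With this helper, SET key value becomes the 33-byte string "*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n", which is the form in which commands are stored in the AOF file.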


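    III. Command write and sync:

    Within one iteration of the event loop the server may execute many commands, each of which appends to the AOF buffer. Before the server blocks in the next epoll_wait(2) call, the contents of the AOF buffer must be written to the AOF file.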

    void aeMain(aeEventLoop *eventLoop) {
        eventLoop->stop = 0;
        while (!eventLoop->stop) {
            if (eventLoop->beforesleep != NULL)
                eventLoop->beforesleep(eventLoop);
            aeProcessEvents(eventLoop, AE_ALL_EVENTS);
        }
    }
    
    • eventLoop->beforesleep(eventLoop): the beforesleep callback invokes the write function flushAppendOnlyFile:
    void flushAppendOnlyFile(int force) {
        ssize_t nwritten;
        int sync_in_progress = 0;
    
        if (sdslen(server.aof_buf) == 0) return;
    
        if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
            /* Is a background fsync still in progress? */
            sync_in_progress = bioPendingJobsOfType(REDIS_BIO_AOF_FSYNC) != 0;
    
        if (server.aof_fsync == AOF_FSYNC_EVERYSEC && !force) {
            /* With this append fsync policy we do background fsyncing.
             * If the fsync is still in progress we can try to delay
             * the write for a couple of seconds. */
            if (sync_in_progress) {
                if (server.aof_flush_postponed_start == 0) {
                    /* No previous write postponing, remember that we are
                     * postponing the flush and return. */
                    server.aof_flush_postponed_start = server.unixtime;
                    return;
                } else if (server.unixtime - server.aof_flush_postponed_start < 2) {
                    /* We were already waiting for fsync to finish, but for less
                     * than two seconds this is still ok. Postpone again. */
                    return;
                }
                /* Otherwise fall through, and go write since we can't wait
                 * over two seconds. */
                server.aof_delayed_fsync++;
                redisLog(REDIS_NOTICE,"Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.");
            }
        }
        /* If you are following this code path, then we are going to write so
         * reset the postponed flush sentinel to zero. */
        server.aof_flush_postponed_start = 0;
    
        /* We want to perform a single write. This should be guaranteed atomic
         * at least if the filesystem we are writing is a real physical one.
         * While this will save us against the server being killed I don't think
         * there is much to do about the whole server stopping for power problems
         * or alike. */
        /* Try to write the whole buffer in one call; if that fails, give up. */
        nwritten = write(server.aof_fd,server.aof_buf,sdslen(server.aof_buf));
        if (nwritten != (signed)sdslen(server.aof_buf)) {
            /* Ooops, we are in troubles. The best thing to do for now is
             * aborting instead of giving the illusion that everything is
             * working as expected. */
            if (nwritten == -1) {
                redisLog(REDIS_WARNING,"Exiting on error writing to the append-only file: %s",strerror(errno));
            } else {
                redisLog(REDIS_WARNING,"Exiting on short write while writing to "
                                       "the append-only file: %s (nwritten=%ld, "
                                       "expected=%ld)",
                                       strerror(errno),
                                       (long)nwritten,
                                       (long)sdslen(server.aof_buf));
                /* On a short write, use ftruncate to discard the partially written data */
                if (ftruncate(server.aof_fd, server.aof_current_size) == -1) {
                    redisLog(REDIS_WARNING, "Could not remove short write "
                             "from the append-only file.  Redis may refuse "
                             "to load the AOF the next time it starts.  "
                             "ftruncate: %s", strerror(errno));
                }
            }
            exit(1);
        }
        /* The single write succeeded; update the AOF file size */
        server.aof_current_size += nwritten;
    
        /* Re-use AOF buffer when it is small enough. The maximum comes from the
         * arena size of 4k minus some overhead (but is otherwise arbitrary). */
        if ((sdslen(server.aof_buf)+sdsavail(server.aof_buf)) < 4000) {
            sdsclear(server.aof_buf);
        } else {
            sdsfree(server.aof_buf);
            server.aof_buf = sdsempty();
        }
    
        /* Don't fsync if no-appendfsync-on-rewrite is set to yes and there are
         * children doing I/O in the background. */
        if (server.aof_no_fsync_on_rewrite &&
            (server.aof_child_pid != -1 || server.rdb_child_pid != -1))
                return;
    
        /* Perform the fsync if needed. */
        /* This is where the ALWAYS and EVERYSEC modes differ */
        if (server.aof_fsync == AOF_FSYNC_ALWAYS) {
            /* aof_fsync is defined as fdatasync() for Linux in order to avoid
             * flushing metadata. */
            aof_fsync(server.aof_fd); /* Let's try to get this data on the disk */
            server.aof_last_fsync = server.unixtime;
        } else if ((server.aof_fsync == AOF_FSYNC_EVERYSEC &&
                    server.unixtime > server.aof_last_fsync)) {
            if (!sync_in_progress) aof_background_fsync(server.aof_fd);
            server.aof_last_fsync = server.unixtime;
        }
    }
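    The all-or-nothing write in flushAppendOnlyFile can be isolated into a small sketch. The function name append_all_or_rollback and the size parameter are assumptions made for this illustration; the sketch also assumes fd was opened with O_APPEND, and it leaves the decision to exit to the caller, whereas the Redis code above calls exit(1).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper modelled on the write path above: append len bytes
 * to fd with a single write(2). *size tracks the known-good file length.
 * On a short write the partial data is truncated away again so the file
 * still ends at the last complete write.
 * Returns 0 on success, -1 on failure. */
int append_all_or_rollback(int fd, const void *buf, size_t len, off_t *size)
{
    assert(buf != NULL && size != NULL);

    ssize_t nwritten = write(fd, buf, len);
    if (nwritten == (ssize_t)len) {
        *size += nwritten;          /* whole buffer made it to the file */
        return 0;
    }
    if (nwritten > 0) {
        /* Short write: drop the partial tail. If this fails too, the
         * file may end with a truncated command. */
        if (ftruncate(fd, *size) == -1) return -1;
    }
    return -1;                      /* error or short write */
}
```

    Keeping the last known-good length in a variable is what makes the rollback possible; in Redis that role is played by server.aof_current_size.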
    
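    • The write(2) call and its error handling: the server tries to write the entire AOF buffer in a single call; if the write fails or is short, it truncates away any partial data and exits instead of carrying on;
    • The fsync branches at the end of the function: the AOF sync [2] option has three modes:
      1. ALWAYS: every event-loop iteration writes the whole AOF buffer and then syncs it (fsync); a crash loses at most the commands executed in one iteration of the event loop;
      2. EVERYSEC: every event-loop iteration writes the whole AOF buffer, and a background thread syncs the AOF file once per second; a crash loses roughly the last second of commands;
      3. NO: every event-loop iteration writes the whole AOF buffer, and the operating system decides when the data is synced;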
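    The three modes boil down to one small decision taken after each write. The sketch below restates that decision as a pure function; the enum and function names are made up for this illustration (Redis uses AOF_FSYNC_ALWAYS, AOF_FSYNC_EVERYSEC and AOF_FSYNC_NO), and it omits the no-appendfsync-on-rewrite check.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Illustrative names, not the Redis constants. */
typedef enum { SYNC_NO, SYNC_ALWAYS, SYNC_EVERYSEC } sync_policy;
typedef enum { DO_NOTHING, FSYNC_INLINE, FSYNC_BACKGROUND } sync_action;

/* Decide what to do after the AOF buffer has been written to the file. */
sync_action choose_sync_action(sync_policy policy, time_t now,
                               time_t last_fsync, bool sync_in_progress)
{
    switch (policy) {
    case SYNC_ALWAYS:
        /* Sync in the main thread on every flush. */
        return FSYNC_INLINE;
    case SYNC_EVERYSEC:
        /* At most once per second, and never while a previous
         * background sync is still running. */
        if (now > last_fsync && !sync_in_progress) return FSYNC_BACKGROUND;
        return DO_NOTHING;
    default:
        /* SYNC_NO: leave the timing to the operating system. */
        assert(policy == SYNC_NO);
        return DO_NOTHING;
    }
}
```

    Only ALWAYS pays the fsync cost on the main thread, which is why it is the safest and the slowest of the three.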

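    IV. AOF rewrite:

    AOF rewrite solves the problem of the AOF file growing without bound: the new AOF file produced by a rewrite records the same database state as the old one, but contains none of the redundant commands that waste space.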
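    As an illustration of why the rewritten file is smaller: a list that was built by many separate RPUSH calls can be restored by a single RPUSH generated from its current contents. The sketch below shows that idea; rewrite_list is a hypothetical helper that writes human-readable text rather than the Redis protocol, and it ignores the fact that real Redis splits very large collections across several commands.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: emit ONE "RPUSH key e1 e2 ..." line that rebuilds a
 * list from its current elements, no matter how many commands created it.
 * Returns the number of bytes written, or -1 if out is too small. */
int rewrite_list(const char *key, const char **elems, int n,
                 char *out, size_t cap)
{
    assert(key != NULL && out != NULL);

    int w = snprintf(out, cap, "RPUSH %s", key);
    if (w < 0 || (size_t)w >= cap) return -1;
    size_t pos = (size_t)w;

    for (int i = 0; i < n; i++) {
        w = snprintf(out + pos, cap - pos, " %s", elems[i]);
        if (w < 0 || (size_t)w >= cap - pos) return -1;
        pos += (size_t)w;
    }
    return (int)pos;
}
```

    For a list currently holding A, B, C and D this produces "RPUSH list A B C D", regardless of whether the old AOF file recorded one command or forty pushes and pops to reach that state.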

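    When the AOF file is rewritten in the background (BGREWRITEAOF), a child process creates a new AOF file while the server keeps handling command requests (and keeps updating the old AOF file), so the new AOF file can end up inconsistent with the current database state. To resolve this, while a rewrite is in progress the server appends each command not only to the AOF buffer (to update the old AOF file) but also to the AOF rewrite buffer. When the child finishes, the contents of the AOF rewrite buffer are appended to the new AOF file.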
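    The double bookkeeping can be sketched with a toy model. The struct toy_server and the functions below are assumptions made for this illustration: they use fixed-size char arrays where Redis uses an sds string (aof_buf) and a list of blocks (aof_rewrite_buf_blocks).

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define BUF_CAP 256

/* Toy stand-in for the relevant redisServer fields. */
typedef struct {
    char   aof_buf[BUF_CAP];
    size_t aof_len;
    char   rewrite_buf[BUF_CAP];
    size_t rewrite_len;
    pid_t  aof_child_pid;          /* -1 when no rewrite is running */
} toy_server;

/* Append s to buf, keeping it NUL-terminated; -1 if it does not fit. */
static int buf_append(char *buf, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n >= BUF_CAP) return -1;
    memcpy(buf + *len, s, n + 1);  /* copies the trailing NUL as well */
    *len += n;
    return 0;
}

/* Every command goes to the AOF buffer; while a rewrite child exists it
 * is also recorded in the rewrite buffer. */
int feed_command(toy_server *srv, const char *cmd)
{
    assert(srv != NULL && cmd != NULL);
    if (buf_append(srv->aof_buf, &srv->aof_len, cmd) == -1) return -1;
    if (srv->aof_child_pid != -1)
        return buf_append(srv->rewrite_buf, &srv->rewrite_len, cmd);
    return 0;
}
```

    Commands received before the fork are already reflected in the child's snapshot of the data, so only commands received while aof_child_pid is set need to be replayed onto the new file.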

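    When the server executes bgrewriteaofCommand, it calls rewriteAppendOnlyFileBackground (listed after the note below):

    • Note: if the server is running BGSAVE in the background, BGREWRITEAOF is deferred until BGSAVE has finished; this can be seen in serverCron;

      The reason is that both commands perform heavy disk writes, so running them at the same time hurts performance;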

    int rewriteAppendOnlyFileBackground(void) {
        pid_t childpid;
        long long start;
    
        if (server.aof_child_pid != -1) return REDIS_ERR;
        start = ustime();
        if ((childpid = fork()) == 0) {
            char tmpfile[256];
    
            /* Child */
            closeListeningSockets(0);
            redisSetProcTitle("redis-aof-rewrite");
            snprintf(tmpfile,256,"temp-rewriteaof-bg-%d.aof", (int) getpid());
            /* Perform the rewrite */
            if (rewriteAppendOnlyFile(tmpfile) == REDIS_OK) {
                size_t private_dirty = zmalloc_get_private_dirty();
    
                if (private_dirty) {
                    redisLog(REDIS_NOTICE,
                        "AOF rewrite: %zu MB of memory used by copy-on-write",
                        private_dirty/(1024*1024));
                }
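                /* Exit with status 0 so the parent sees that the rewrite succeeded */
                exitFromChild(0);
            } else {
                /* Exit with status 1 so the parent sees that the rewrite failed */
                exitFromChild(1);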
            }
        } else {
            /* Parent */
            server.stat_fork_time = ustime()-start;
            if (childpid == -1) {
                redisLog(REDIS_WARNING,
                    "Can't rewrite append only file in background: fork: %s",
                    strerror(errno));
                return REDIS_ERR;
            }
            redisLog(REDIS_NOTICE,
                "Background append only file rewriting started by pid %d",childpid);
            server.aof_rewrite_scheduled = 0;
            server.aof_rewrite_time_start = time(NULL);
            server.aof_child_pid = childpid;
            updateDictResizePolicy();
            /* We set appendseldb to -1 in order to force the next call to the
             * feedAppendOnlyFile() to issue a SELECT command, so the differences
             * accumulated by the parent into server.aof_rewrite_buf will start
             * with a SELECT statement and it will be safe to merge. */
            /* In short: setting aof_selected_db to -1 forces the next
             * feedAppendOnlyFile() call to emit a SELECT command, so commands
             * added from now on are applied to the correct database. */
            server.aof_selected_db = -1;
            replicationScriptCacheFlush();
            return REDIS_OK;
        }
        return REDIS_OK; /* unreached */
    }
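    The fork-and-report pattern used by rewriteAppendOnlyFileBackground can be reduced to the following sketch. The function run_child_and_collect and its child_result parameter are assumptions made for this illustration; the child here does no real work, and the parent polls with waitpid(WNOHANG) in a loop, whereas Redis checks for finished children from serverCron between other tasks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork a child that reports its result through its exit status, and
 * collect that status in the parent without a blocking wait.
 * Returns the child's exit code, or -1 on fork/wait failure or if the
 * child was terminated by a signal. */
int run_child_and_collect(int child_result)
{
    assert(child_result >= 0 && child_result <= 255);

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* Child: the real code would write the temporary AOF file here. */
        _exit(child_result);
    }

    /* Parent: poll so that it is never stuck waiting on the child. */
    int status = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) break;
        if (r == -1) return -1;
        struct timespec pause = {0, 1000000};   /* 1 ms of "other work" */
        nanosleep(&pause, NULL);
    }

    if (WIFSIGNALED(status)) return -1;
    return WEXITSTATUS(status);
}
```

    The exit status is the only channel the child needs: 0 means the temporary file is complete, anything else means the parent should discard it.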
    
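    • rewriteAppendOnlyFile(tmpfile): the child performs the rewrite by iterating over all of the server's databases and writing out commands that reproduce the current database state, which is what shrinks the AOF file;

    • exitFromChild(...): when the rewrite is done the child exits, and its exit status tells the parent whether it succeeded; the parent still has work to do:

      As noted above, the server keeps accepting new command requests during a background rewrite, so the new and old AOF files can diverge; this is why the AOF rewrite buffer was introduced;

      Here is what the parent does (called from serverCron):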

    void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
        if (!bysignal && exitcode == 0) {
            int newfd, oldfd;
            char tmpfile[256];
            long long now = ustime();
    
            redisLog(REDIS_NOTICE,
                "Background AOF rewrite terminated with success");
    
            /* Flush the differences accumulated by the parent to the
             * rewritten AOF. */
            snprintf(tmpfile,256,"temp-rewriteaof-bg-%d.aof",
                (int)server.aof_child_pid);
            /* The child has finished writing the new AOF file; open it */
            newfd = open(tmpfile,O_WRONLY|O_APPEND);
            if (newfd == -1) {
                redisLog(REDIS_WARNING,
                    "Unable to open the temporary AOF produced by the child: %s", strerror(errno));
                goto cleanup;
            }
            /* Append the contents of the AOF rewrite buffer to the new AOF file */
            if (aofRewriteBufferWrite(newfd) == -1) {
                redisLog(REDIS_WARNING,
                    "Error trying to flush the parent diff to the rewritten AOF: %s", strerror(errno));
                close(newfd);
                goto cleanup;
            }
    
            redisLog(REDIS_NOTICE,
                "Parent diff successfully flushed to the rewritten AOF (%lu bytes)", aofRewriteBufferSize());
    
            /* The only remaining thing to do is to rename the temporary file to
             * the configured file and switch the file descriptor used to do AOF
             * writes. We don't want close(2) or rename(2) calls to block the
             * server on old file deletion.
             *
             * There are two possible scenarios:
             *
             * 1) AOF is DISABLED and this was a one time rewrite. The temporary
             * file will be renamed to the configured file. When this file already
             * exists, it will be unlinked, which may block the server.
             *
             * 2) AOF is ENABLED and the rewritten AOF will immediately start
             * receiving writes. After the temporary file is renamed to the
             * configured file, the original AOF file descriptor will be closed.
             * Since this will be the last reference to that file, closing it
             * causes the underlying file to be unlinked, which may block the
             * server.
             *
             * To mitigate the blocking effect of the unlink operation (either
             * caused by rename(2) in scenario 1, or by close(2) in scenario 2), we
             * use a background thread to take care of this. First, we
             * make scenario 1 identical to scenario 2 by opening the target file
             * when it exists. The unlink operation after the rename(2) will then
             * be executed upon calling close(2) for its descriptor. Everything to
             * guarantee atomicity for this switch has already happened by then, so
             * we don't care what the outcome or duration of that close operation
             * is, as long as the file descriptor is released again. */
            if (server.aof_fd == -1) {
                /* AOF disabled */
    
                 /* Don't care if this fails: oldfd will be -1 and we handle that.
                  * One notable case of -1 return is if the old file does
                  * not exist. */
                 oldfd = open(server.aof_filename,O_RDONLY|O_NONBLOCK);
            } else {
                /* AOF enabled */
                oldfd = -1; /* We'll set this to the current AOF filedes later. */
            }
    
            /* Rename the temporary file. This will not unlink the target file if
             * it exists, because we reference it with "oldfd". */
            if (rename(tmpfile,server.aof_filename) == -1) {
                redisLog(REDIS_WARNING,
                    "Error trying to rename the temporary AOF file: %s", strerror(errno));
                close(newfd);
                if (oldfd != -1) close(oldfd);
                goto cleanup;
            }
    
            if (server.aof_fd == -1) {
                /* AOF disabled, we don't need to set the AOF file descriptor
                 * to this new file, so we can close it. */
                close(newfd);
            } else {
                /* AOF enabled, replace the old fd with the new one. */
                oldfd = server.aof_fd;
                server.aof_fd = newfd;
                if (server.aof_fsync == AOF_FSYNC_ALWAYS)
                    aof_fsync(newfd);
                else if (server.aof_fsync == AOF_FSYNC_EVERYSEC)
                    aof_background_fsync(newfd);
                server.aof_selected_db = -1; /* Make sure SELECT is re-issued */
                aofUpdateCurrentSize();
                server.aof_rewrite_base_size = server.aof_current_size;
    
                /* Clear regular AOF buffer since its contents was just written to
                 * the new AOF from the background rewrite buffer. */
                sdsfree(server.aof_buf);
                server.aof_buf = sdsempty();
            }
    
            server.aof_lastbgrewrite_status = REDIS_OK;
    
            redisLog(REDIS_NOTICE, "Background AOF rewrite finished successfully");
            /* Change state from WAIT_REWRITE to ON if needed */
            if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
                server.aof_state = REDIS_AOF_ON;
    
            /* Asynchronously close the overwritten (old) AOF file. */
            if (oldfd != -1) bioCreateBackgroundJob(REDIS_BIO_CLOSE_FILE,(void*)(long)oldfd,NULL,NULL);
    
            redisLog(REDIS_VERBOSE,
                "Background AOF rewrite signal handler took %lldus", ustime()-now);
        } else if (!bysignal && exitcode != 0) {
            server.aof_lastbgrewrite_status = REDIS_ERR;
    
            redisLog(REDIS_WARNING,
                "Background AOF rewrite terminated with error");
        } else {
            server.aof_lastbgrewrite_status = REDIS_ERR;
    
            redisLog(REDIS_WARNING,
                "Background AOF rewrite terminated by signal %d", bysignal);
        }
    
    cleanup:
        aofRewriteBufferReset();
        aofRemoveTempFile(server.aof_child_pid);
        server.aof_child_pid = -1;
        server.aof_rewrite_time_last = time(NULL)-server.aof_rewrite_time_start;
        server.aof_rewrite_time_start = -1;
        /* Schedule a new rewrite if we are waiting for it to switch the AOF ON. */
        if (server.aof_state == REDIS_AOF_WAIT_REWRITE)
            server.aof_rewrite_scheduled = 1;
    }
    
    
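    • aofRewriteBufferWrite(newfd): writes the contents of the AOF rewrite buffer to the new AOF file (this buffer is also appended to during command propagation [1]);
    • The oldfd handling and the final bioCreateBackgroundJob(REDIS_BIO_CLOSE_FILE, ...) call: the old AOF file descriptor is closed asynchronously so that the main thread is not blocked.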
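    The trick of holding the old file open across the rename can be shown in isolation. In the sketch below, swap_in_new_file is a hypothetical helper: it opens the current target first, then renames the temporary file over it, so the old file's data is only released when the returned descriptor is closed, which the caller is free to do on another thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: atomically replace `target` with `tmp`.
 * *oldfd receives a descriptor for the previous target (or -1 if there
 * was none). As long as it stays open, rename(2) does not have to free
 * the old file's data; that cost is paid by whoever closes *oldfd.
 * Returns 0 on success, -1 if the rename failed. */
int swap_in_new_file(const char *tmp, const char *target, int *oldfd)
{
    assert(tmp != NULL && target != NULL && oldfd != NULL);

    /* A result of -1 is fine here: it just means there is no old file. */
    *oldfd = open(target, O_RDONLY | O_NONBLOCK);

    if (rename(tmp, target) == -1) {
        if (*oldfd != -1) close(*oldfd);
        *oldfd = -1;
        return -1;
    }
    return 0;
}
```

    After a successful call, target names the new file while *oldfd still reads the old contents; Redis hands that final close to a background I/O thread with bioCreateBackgroundJob.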

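    References:

    1. Redis Source Code Analysis: Command Propagation - macguz - cnblogs (cnblogs.com)

    2. A Summary of the sync, fsync, and fdatasync Functions - Linux - 脚本之家 (jb51.net)

    3. Redis Source Code Analysis (8): AOF Persistence - 李兆龙's blog - CSDN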

  • Original article: https://www.cnblogs.com/macguz/p/15872639.html