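Abstract: Based on a reading of the Redis Sentinel source code, this article explains in detail how Sentinel is implemented.
Redis Sentinel is the high-availability solution that ships with Redis. Sentinel automatically monitors one or more Redis master/replica deployments and performs a failover automatically when a master goes down.
Sentinel uses the same event-driven framework as the Redis core, but it has its own initialization steps. This article covers three topics: Sentinel initialization, Sentinel's main time event function, and Sentinel's network connections together with TILT mode.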
Sentinel initialization
A Sentinel instance can be started with either redis-sentinel <path-to-configfile> or redis-server <path-to-configfile> --sentinel; the two forms are equivalent. The main function in Redis's server.c shows how Redis decides whether the user asked for Sentinel mode:
    int main(int argc, char **argv) {
        ..........
        server.sentinel_mode = checkForSentinelMode(argc,argv);
        ..........
    }
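checkForSentinelMode checks the following two conditions:
- The program was started through the redis-sentinel executable.
- The argument list contains the --sentinel flag.
If either condition holds, Redis runs in Sentinel mode.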
    /* Returns 1 if there is --sentinel among the arguments or if
     * argv[0] contains "redis-sentinel". */
    int checkForSentinelMode(int argc, char **argv) {
        int j;

        if (strstr(argv[0],"redis-sentinel") != NULL) return 1;
        for (j = 1; j < argc; j++)
            if (!strcmp(argv[j],"--sentinel")) return 1;
        return 0;
    }
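The check can be tried outside the Redis tree with a standalone copy. The sketch below uses a hypothetical name, check_for_sentinel_mode, and otherwise mirrors the logic quoted above. One consequence worth noticing is that strstr matches anywhere in argv[0], so a directory name containing "redis-sentinel" would also select Sentinel mode.

```c
#include <string.h>

/* Standalone copy of the detection logic (hypothetical name, not part of
 * Redis). Returns 1 when Sentinel mode would be selected, 0 otherwise. */
static int check_for_sentinel_mode(int argc, char **argv) {
    /* Substring match on the whole argv[0], including any directory part. */
    if (strstr(argv[0], "redis-sentinel") != NULL) return 1;
    /* Otherwise look for the exact flag among the remaining arguments. */
    for (int j = 1; j < argc; j++)
        if (!strcmp(argv[j], "--sentinel")) return 1;
    return 0;
}
```

For example, calling it with argv = {"redis-server", "sentinel.conf", "--sentinel"} returns 1, while argv = {"redis-server", "redis.conf"} returns 0.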
Once Redis has determined whether to run in Sentinel mode, main continues with the following code:
    int main(int argc, char **argv) {
        struct timeval tv;
        int j;
        ............
        /* We need to init sentinel right now as parsing the configuration file
         * in sentinel mode will have the effect of populating the sentinel
         * data structures with master nodes to monitor. */
        if (server.sentinel_mode) {
            initSentinelConfig();
            initSentinel();
        }
        ............
initSentinelConfig replaces the default Redis port (6379) with the Sentinel-specific port (26379 by default). It also disables protected mode, because a Sentinel must be exposed to other hosts.
    /* This function overwrites a few normal Redis config default with Sentinel
     * specific defaults. */
    void initSentinelConfig(void) {
        server.port = REDIS_SENTINEL_PORT;
        server.protected_mode = 0; /* Sentinel must be exposed. */
    }
initSentinel then performs two steps, described after its code:
    /* Perform the Sentinel mode initialization. */
    void initSentinel(void) {
        unsigned int j;

        /* Remove usual Redis commands from the command table, then just add
         * the SENTINEL command. */
        dictEmpty(server.commands,NULL);
        for (j = 0; j < sizeof(sentinelcmds)/sizeof(sentinelcmds[0]); j++) {
            int retval;
            struct redisCommand *cmd = sentinelcmds+j;

            retval = dictAdd(server.commands, sdsnew(cmd->name), cmd);
            serverAssert(retval == DICT_OK);

            /* Translate the command string flags description into an actual
             * set of flags. */
            if (populateCommandTableParseFlags(cmd,cmd->sflags) == C_ERR)
                serverPanic("Unsupported command flag");
        }

        /* Initialize various data structures. */
        sentinel.current_epoch = 0;
        sentinel.masters = dictCreate(&instancesDictType,NULL);
        sentinel.tilt = 0;
        sentinel.tilt_start_time = 0;
        sentinel.previous_time = mstime();
        .............
    }
1. It replaces the native Redis server command table with Sentinel's own command table. The commands Sentinel supports are:
    struct redisCommand sentinelcmds[] = {
        {"ping",pingCommand,1,"",0,NULL,0,0,0,0,0},
        {"sentinel",sentinelCommand,-2,"",0,NULL,0,0,0,0,0},
        {"subscribe",subscribeCommand,-2,"",0,NULL,0,0,0,0,0},
        {"unsubscribe",unsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
        {"psubscribe",psubscribeCommand,-2,"",0,NULL,0,0,0,0,0},
        {"punsubscribe",punsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
        {"publish",sentinelPublishCommand,3,"",0,NULL,0,0,0,0,0},
        {"info",sentinelInfoCommand,-1,"",0,NULL,0,0,0,0,0},
        {"role",sentinelRoleCommand,1,"ok-loading",0,NULL,0,0,0,0,0},
        {"client",clientCommand,-2,"read-only no-script",0,NULL,0,0,0,0,0},
        {"shutdown",shutdownCommand,-1,"",0,NULL,0,0,0,0,0},
        {"auth",authCommand,2,"no-auth no-script ok-loading ok-stale fast",0,NULL,0,0,0,0,0},
        {"hello",helloCommand,-2,"no-auth no-script fast",0,NULL,0,0,0,0,0}
    };
2. It initializes the main Sentinel state structure, defined and commented as follows:
    /* Main state. */
    struct sentinelState {
        char myid[CONFIG_RUN_ID_SIZE+1]; /* This sentinel ID. */
        uint64_t current_epoch;         /* Current epoch. */
        dict *masters;      /* Dictionary of master sentinelRedisInstances.
                               Key is the instance name, value is the
                               sentinelRedisInstance structure pointer. */
        int tilt;           /* Are we in TILT mode? */
        int running_scripts;    /* Number of scripts in execution right now. */
        mstime_t tilt_start_time;       /* When TITL started. */
        mstime_t previous_time;         /* Last time we ran the time handler. */
        list *scripts_queue;            /* Queue of user scripts to execute. */
        char *announce_ip;  /* IP addr that is gossiped to other sentinels if
                               not NULL. */
        int announce_port;  /* Port that is gossiped to other sentinels if
                               non zero. */
        unsigned long simfailure_flags; /* Failures simulation. */
        int deny_scripts_reconfig; /* Allow SENTINEL SET ... to change script
                                      paths at runtime? */
    } sentinel;
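Each value in the masters dictionary corresponds to one master instance monitored by this Sentinel.
After the configuration has been loaded, the Redis server's main function calls sentinelIsRunning, which does the following:
- Checks that a configuration file was specified and that the process has write permission on it, because Sentinel writes its current state back to the configuration file whenever that state changes.
- If the configuration file already specifies an ID, Sentinel uses it as its ID; otherwise Sentinel generates a random ID and persists it to the configuration file.
- Generates the initial +monitor event for every monitored master.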
    /* This function gets called when the server is in Sentinel mode, started,
     * loaded the configuration, and is ready for normal operations. */
    void sentinelIsRunning(void) {
        int j;

        if (server.configfile == NULL) {
            serverLog(LL_WARNING,
                "Sentinel started without a config file. Exiting...");
            exit(1);
        } else if (access(server.configfile,W_OK) == -1) {
            serverLog(LL_WARNING,
                "Sentinel config file %s is not writable: %s. Exiting...",
                server.configfile,strerror(errno));
            exit(1);
        }

        /* If this Sentinel has yet no ID set in the configuration file, we
         * pick a random one and persist the config on disk. From now on this
         * will be this Sentinel ID across restarts. */
        for (j = 0; j < CONFIG_RUN_ID_SIZE; j++)
            if (sentinel.myid[j] != 0) break;

        if (j == CONFIG_RUN_ID_SIZE) {
            /* Pick ID and persist the config. */
            getRandomHexChars(sentinel.myid,CONFIG_RUN_ID_SIZE);
            sentinelFlushConfig();
        }

        /* Log its ID to make debugging of issues simpler. */
        serverLog(LL_WARNING,"Sentinel ID is %s", sentinel.myid);

        /* We want to generate a +monitor event for every configured master
         * at startup. */
        sentinelGenerateInitialMonitorEvents();
    }
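The "generate an ID only if none is set" step can be isolated in a few lines. This is a minimal sketch, not the Redis implementation: RUN_ID_SIZE is assumed to equal CONFIG_RUN_ID_SIZE (40), fill_random_hex is a hypothetical stand-in for getRandomHexChars that uses unseeded rand(), and persisting the configuration is left to the caller.

```c
#include <stdlib.h>

#define RUN_ID_SIZE 40 /* Assumption: same value as CONFIG_RUN_ID_SIZE. */

/* Returns 1 if every byte of the ID buffer is still zero (no ID loaded). */
static int id_is_unset(const char *id) {
    for (int j = 0; j < RUN_ID_SIZE; j++)
        if (id[j] != 0) return 0;
    return 1;
}

/* Hypothetical stand-in for getRandomHexChars(): fills p with hex digits
 * drawn from rand(). Not cryptographically strong and not seeded here. */
static void fill_random_hex(char *p, size_t len) {
    static const char charset[] = "0123456789abcdef";
    for (size_t j = 0; j < len; j++) p[j] = charset[rand() & 0x0F];
}

/* Returns 1 if a new ID was generated (the caller would then persist the
 * config to disk), or 0 if an ID was already present and left untouched. */
static int ensure_id(char *id) {
    if (!id_is_unset(id)) return 0;
    fill_random_hex(id, RUN_ID_SIZE);
    return 1;
}
```

Called on a zero-initialized buffer of RUN_ID_SIZE+1 bytes, ensure_id fills in 40 hex characters and returns 1; a second call returns 0, which corresponds to the ID staying stable across restarts once it has been written to the configuration file.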
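Sentinel's main time event function
Sentinel uses the same event handling mechanism as the Redis server, which distinguishes file events from time events. File events use I/O multiplexing to handle the server's network I/O, such as accepting client connections and reading and writing data. Time events are functions that the main loop calls periodically to handle scheduled work, such as server maintenance, periodic updates, and deletions. The Redis server's main time event function is serverCron, defined in server.c; by default it is called every 100 ms. It contains the following code: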
    int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
        int j;
        UNUSED(eventLoop);
        UNUSED(id);
        UNUSED(clientData);
        ...........
        /* Run the Sentinel timer if we are in sentinel mode. */
        if (server.sentinel_mode) sentinelTimer();
        ...........
    }
When the server runs in Sentinel mode, serverCron calls sentinelTimer, which drives Sentinel's main logic. sentinelTimer is defined in sentinel.c as follows:
    void sentinelTimer(void) {
        sentinelCheckTiltCondition();
        sentinelHandleDictOfRedisInstances(sentinel.masters);
        sentinelRunPendingScripts();
        sentinelCollectTerminatedScripts();
        sentinelKillTimedoutScripts();

        /* We continuously change the frequency of the Redis "timer interrupt"
         * in order to desynchronize every Sentinel from every other.
         * This non-determinism avoids that Sentinels started at the same time
         * exactly continue to stay synchronized asking to be voted at the
         * same time again and again (resulting in nobody likely winning the
         * election because of split brain voting). */
        server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
    }
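sentinelTimer does the following:
- Checks whether Sentinel should enter TILT mode (TILT mode is covered in a later section).
- Checks the connections between this Sentinel and the master and replica instances it monitors, as well as the other Sentinels; updates their state; and performs a failover automatically when a master goes offline.
- Checks the state of the callback scripts and acts on it: runs pending scripts, collects terminated ones, and kills those that have timed out.
- Updates the server frequency (how often serverCron is called) with a random component. This keeps Sentinels that monitor the same master from asking for votes at exactly the same moments during leader election, which could leave no candidate with a majority.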
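To see what the frequency update in the last step means in practice, the formula can be evaluated on its own. The sketch below assumes CONFIG_DEFAULT_HZ is 10, which is consistent with the 100 ms default mentioned earlier; next_hz and period_ms are hypothetical helper names.

```c
#define DEFAULT_HZ 10 /* Assumption: value of CONFIG_DEFAULT_HZ. */

/* Same formula as in sentinelTimer(). r is a non-negative random value,
 * such as the result of rand(). The result lies in the range
 * [DEFAULT_HZ, 2*DEFAULT_HZ - 1]. */
static int next_hz(int r) {
    return DEFAULT_HZ + r % DEFAULT_HZ;
}

/* Approximate interval between timer calls, in whole milliseconds. */
static int period_ms(int hz) {
    return 1000 / hz;
}
```

Under that assumption hz varies between 10 and 19 from one tick to the next, so the interval between sentinelTimer calls moves between roughly 100 ms and 52 ms, which is enough to pull two Sentinels out of lockstep after a few ticks.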
sentinelHandleDictOfRedisInstances is defined as follows:
    /* Perform scheduled operations for all the instances in the dictionary.
     * Recursively call the function against dictionaries of slaves. */
    void sentinelHandleDictOfRedisInstances(dict *instances) {
        dictIterator *di;
        dictEntry *de;
        sentinelRedisInstance *switch_to_promoted = NULL;

        /* There are a number of things we need to perform against every master. */
        di = dictGetIterator(instances);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);

            sentinelHandleRedisInstance(ri);
            if (ri->flags & SRI_MASTER) {
                sentinelHandleDictOfRedisInstances(ri->slaves);
                sentinelHandleDictOfRedisInstances(ri->sentinels);
                if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
                    switch_to_promoted = ri;
                }
            }
        }
        if (switch_to_promoted)
            sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);
        dictReleaseIterator(di);
    }
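sentinelHandleDictOfRedisInstances mainly does the following:
- Calls sentinelHandleRedisInstance on each instance in the dictionary to handle Sentinel's connection to it, state updates, and failover work.
- If the instance is a master, recursively calls itself on that master's replicas and on the other Sentinels monitoring the same master.
- When a failover has reached the SENTINEL_FAILOVER_STATE_UPDATE_CONFIG state, switches the monitored master over to the replica that was promoted.
sentinelHandleRedisInstance is defined as follows: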
    /* Perform scheduled operations for the specified Redis instance. */
    void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
        /* ========== MONITORING HALF ============ */
        /* Every kind of instance */
        sentinelReconnectInstance(ri);
        sentinelSendPeriodicCommands(ri);

        /* ============== ACTING HALF ============= */
        /* We don't proceed with the acting half if we are in TILT mode.
         * TILT happens when we find something odd with the time, like a
         * sudden change in the clock. */
        if (sentinel.tilt) {
            if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;
            sentinel.tilt = 0;
            sentinelEvent(LL_WARNING,"-tilt",NULL,"#tilt mode exited");
        }

        /* Every kind of instance */
        sentinelCheckSubjectivelyDown(ri);

        /* Masters and slaves */
        if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
            /* Nothing so far. */
        }

        /* Only masters */
        if (ri->flags & SRI_MASTER) {
            sentinelCheckObjectivelyDown(ri);
            if (sentinelStartFailoverIfNeeded(ri))
                sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
            sentinelFailoverStateMachine(ri);
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
        }
    }
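This function works in two parts:
1. It checks the connections between Sentinel and the other instances (masters, replicas, and other Sentinels). If a connection has not been set up or has dropped, Sentinel reconnects, and it sends its periodic commands over the link. Note that Sentinel keeps two connections to every master and replica, a command connection and a Pub/Sub connection, but only a command connection to each other Sentinel that monitors the same master. The networking section covers these details.
2. It checks every monitored instance (masters, replicas, and other Sentinels) for the subjectively down state. For masters it also checks the objectively down state and carries out the failover when needed.
The rest of this section walks through the second part in detail. Note that the second part is skipped entirely while Sentinel is in TILT mode.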
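The TILT gate that separates the two parts can be reduced to a small pure function. This is a sketch under assumptions: TILT_PERIOD_MS stands in for SENTINEL_TILT_PERIOD and is assumed to be 30 seconds, struct tilt_state and acting_half_allowed are hypothetical names, and the "-tilt" event that the real code emits is omitted.

```c
#include <stdint.h>

typedef int64_t mstime_t;

#define TILT_PERIOD_MS 30000 /* Assumption: SENTINEL_TILT_PERIOD, 30 s. */

/* Only the two fields of the Sentinel state that the gate reads. */
struct tilt_state {
    int tilt;                 /* Non-zero while in TILT mode. */
    mstime_t tilt_start_time; /* When TILT mode was entered, in ms. */
};

/* Returns 1 if the acting half may run at time `now`, 0 if it must be
 * skipped. Leaves TILT mode once the TILT period has fully elapsed. */
static int acting_half_allowed(struct tilt_state *s, mstime_t now) {
    if (s->tilt) {
        if (now - s->tilt_start_time < TILT_PERIOD_MS) return 0;
        s->tilt = 0; /* Period elapsed: exit TILT and continue. */
    }
    return 1;
}
```

The monitoring half (reconnecting and sending periodic commands) runs before this gate, so a Sentinel in TILT mode keeps collecting information but takes no decisions based on it.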
sentinelCheckSubjectivelyDown checks whether a given instance (master, replica, or other Sentinel) is subjectively down. Its code is as follows:
    /* Is this instance down from our point of view? */
    void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
        mstime_t elapsed = 0;

        if (ri->link->act_ping_time)
            elapsed = mstime() - ri->link->act_ping_time;
        else if (ri->link->disconnected)
            elapsed = mstime() - ri->link->last_avail_time;
        .......
        /* Update the SDOWN flag. We believe the instance is SDOWN if:
         *
         * 1) It is not replying.
         * 2) We believe it is a master, it reports to be a slave for enough time
         *    to meet the down_after_period, plus enough time to get two times
         *    INFO report from the instance. */
        if (elapsed > ri->down_after_period ||
            (ri->flags & SRI_MASTER &&
             ri->role_reported == SRI_SLAVE &&
             mstime() - ri->role_reported_time >
              (ri->down_after_period+SENTINEL_INFO_PERIOD*2)))
        {
            /* Is subjectively down */
            if ((ri->flags & SRI_S_DOWN) == 0) {
                sentinelEvent(LL_WARNING,"+sdown",ri,"%@");
                ri->s_down_since_time = mstime();
                ri->flags |= SRI_S_DOWN;
            }
        } else {
            /* Is subjectively up */
            if (ri->flags & SRI_S_DOWN) {
                sentinelEvent(LL_WARNING,"-sdown",ri,"%@");
                ri->flags &= ~(SRI_S_DOWN|SRI_SCRIPT_KILL_SENT);
            }
        }
    }
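An instance is subjectively down when it meets either of the following conditions:
- No reply to PING has arrived within the instance's configured down_after_milliseconds (ri->down_after_period in the code).
- Sentinel believes the instance is a master, but the instance reports itself as a replica, and more than down_after_milliseconds plus two INFO periods have passed since that role was reported.
If either condition holds, Sentinel sets the instance's SRI_S_DOWN flag and treats the instance as subjectively down.
Subjectively down means only that this Sentinel considers the instance offline; it has not yet asked the other Sentinels monitoring the instance for their view.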
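The two conditions can be written as a side-effect-free predicate. The sketch below is a simplification under assumptions: struct probe is a hypothetical flattening of the fields the real function reads from sentinelRedisInstance and its link, INFO_PERIOD_MS is assumed to equal SENTINEL_INFO_PERIOD (10 seconds), and the part of the real function elided in the listing above is not modeled.

```c
#include <stdint.h>

typedef int64_t mstime_t;

#define INFO_PERIOD_MS 10000 /* Assumption: SENTINEL_INFO_PERIOD, 10 s. */

/* Hypothetical flattened view of the fields the check depends on. */
struct probe {
    mstime_t act_ping_time;      /* When the pending PING was sent, 0 if none. */
    int disconnected;            /* Non-zero if the link is down. */
    mstime_t last_avail_time;    /* Last time the link was known to be usable. */
    int is_master;               /* We believe the instance is a master. */
    int reports_slave;           /* The instance reports the replica role. */
    mstime_t role_reported_time; /* When that role was reported. */
    mstime_t down_after_period;  /* Configured down-after time in ms. */
};

/* Returns 1 if the instance would be flagged subjectively down at `now`. */
static int is_subjectively_down(const struct probe *p, mstime_t now) {
    mstime_t elapsed = 0;

    if (p->act_ping_time)
        elapsed = now - p->act_ping_time;
    else if (p->disconnected)
        elapsed = now - p->last_avail_time;

    /* Condition 1: no reply for longer than the configured period. */
    if (elapsed > p->down_after_period) return 1;

    /* Condition 2: a believed master has reported the replica role for
     * longer than the period plus two INFO intervals. */
    if (p->is_master && p->reports_slave &&
        now - p->role_reported_time >
            p->down_after_period + INFO_PERIOD_MS * 2)
        return 1;

    return 0;
}
```

Both comparisons are strict, so with a 30000 ms period an instance whose PING has been pending for exactly 30000 ms is still considered up, and it is flagged one millisecond later.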
sentinelCheckObjectivelyDown checks whether an instance is objectively down; this check applies only to masters. It is defined as follows:
    /* Is this instance down according to the configured quorum?
     *
     * Note that ODOWN is a weak quorum, it only means that enough Sentinels
     * reported in a given time range that the instance was not reachable.
     * However messages can be delayed so there are no strong guarantees about
     * N instances agreeing at the same time about the down state. */
    void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
        dictIterator *di;
        dictEntry *de;
        unsigned int quorum = 0, odown = 0;

        if (master->flags & SRI_S_DOWN) {
            /* Is down for enough sentinels? */
            quorum = 1; /* the current sentinel. */
            /* Count all the other sentinels. */
            di = dictGetIterator(master->sentinels);
            while((de = dictNext(di)) != NULL) {
                sentinelRedisInstance *ri = dictGetVal(de);

                if (ri->flags & SRI_MASTER_DOWN) quorum++;
            }
            dictReleaseIterator(di);
            if (quorum >= master->quorum) odown = 1;
        }

        /* Set the flag accordingly to the outcome. */
        if (odown) {
            if ((master->flags & SRI_O_DOWN) == 0) {
                sentinelEvent(LL_WARNING,"+odown",master,"%@ #quorum %d/%d",
                    quorum, master->quorum);
                master->flags |= SRI_O_DOWN;
                master->o_down_since_time = mstime();
            }
        } else {
            if (master->flags & SRI_O_DOWN) {
                sentinelEvent(LL_WARNING,"-odown",master,"%@");
                master->flags &= ~SRI_O_DOWN;
            }
        }
    }
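If this Sentinel itself considers the master subjectively down, the function iterates over the other Sentinels monitoring the master and checks whether each one has its SRI_MASTER_DOWN flag set, which means that Sentinel also considers the master down. It counts these votes, including its own. If the count is greater than or equal to the master's configured quorum, Sentinel sets the master's SRI_O_DOWN flag and treats the master as objectively down.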
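The vote count described above can be sketched without the Redis dictionary types. In this simplified version the other Sentinels' opinions are passed as a plain array of flags; is_objectively_down and its parameters are hypothetical names, and the +odown/-odown events are left out.

```c
#include <stddef.h>

/* Returns 1 if the master would be flagged objectively down.
 *   self_sdown       non-zero if this Sentinel sees the master as SDOWN
 *   others_say_down  one entry per other Sentinel, non-zero when that
 *                    Sentinel reported the master as down
 *   n                number of entries in others_say_down
 *   quorum           configured quorum for this master */
static int is_objectively_down(int self_sdown, const int *others_say_down,
                               size_t n, unsigned int quorum) {
    if (!self_sdown) return 0; /* Our own SDOWN is a precondition. */

    unsigned int votes = 1;    /* The current Sentinel counts as one vote. */
    for (size_t i = 0; i < n; i++)
        if (others_say_down[i]) votes++;

    return votes >= quorum;
}
```

With two other Sentinels of which one agrees, the count is 2: that reaches a quorum of 2 but not a quorum of 3. If this Sentinel does not see the master as subjectively down, the result is 0 regardless of what the others report.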
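sentinelStartFailoverIfNeeded first checks that the master is objectively down (its SRI_O_DOWN flag is set), that no failover is already in progress, and that the last failover attempt started at least twice the master's configured failover timeout ago. If all three conditions hold, Sentinel sets the SRI_FAILOVER_IN_PROGRESS flag, sets the failover state to SENTINEL_FAILOVER_STATE_WAIT_START, and begins the failover. The failover section covers the details.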
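Before reading the source, the three preconditions can be condensed into a single predicate. This is a sketch only: may_start_failover and its flattened parameters are hypothetical, and the logging done by the real function is omitted.

```c
#include <stdint.h>

typedef int64_t mstime_t;

/* Returns 1 if a new failover may start at time `now`, 0 otherwise.
 *   odown                non-zero if the master is flagged O_DOWN
 *   in_progress          non-zero if a failover is already running
 *   failover_start_time  start of the last failover attempt, in ms
 *   failover_timeout     configured failover timeout, in ms */
static int may_start_failover(int odown, int in_progress, mstime_t now,
                              mstime_t failover_start_time,
                              mstime_t failover_timeout) {
    if (!odown) return 0;      /* Master must be objectively down. */
    if (in_progress) return 0; /* Do not start a second failover. */
    /* The last attempt must be at least two timeouts in the past. */
    if (now - failover_start_time < failover_timeout * 2) return 0;
    return 1;
}
```

For instance, with a 180000 ms timeout and a previous attempt at t = 0, a new failover is refused until t reaches 360000 ms.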
int sentinelStartFailoverIfNeeded(sentinelRedisInstance *master) { /* We can't failover if the master is not in O_DOWN state. */ if (!(master->flags & SRI_O_DOWN)) return 0; /* Failover already in progress? */ if (master->flags & SRI_FAILOVER_IN_PROGRESS) return 0; /* Last failover attempt started too little time ago? */ if (mstime() - master->failover_start_time < master->failover_timeout*2) { if (master->failover_delay_logged != master->failover_start_time) { time_t clock = (master->failover_start_time + master->failover_timeout*2) / 1000; char ctimebuf[26]; ctime_r(&clock,ctimebuf); ctimebuf[24] = '