• Redis master-slave replication and Sentinel (part 02)


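    I. Why replicate

    • 1. Keep multiple copies of the data, which makes a highly available service possible
    • 2. Improve read performance by spreading read requests across replicas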

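    II. Key points and difficulties of replication

    • 1. How to specify the object (data source) to replicate
    • 2. Full or incremental replication, and how to implement the incremental part
    • 3. Replication must not affect front-end (client) operations
    • 4. How to handle network interruptions
    • 5. How to keep data that has already been sent from being lost before it reaches the replica
    • 6. How to detect that the replicated data source has changed, which would otherwise corrupt the data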

    III. Replication steps

    Flow: full synchronization -> incremental synchronization -> command propagation
    
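    3.1 Specify the master
    • 1. Set slaveof in the configuration file
    • 2. Run the SLAVEOF command on the replica
    3.2 Establish a socket connection
    • Based on the configuration or the SLAVEOF command, the replica creates a socket connected to the master
    3.3 Send PING (once the connection is established)
    • 1. PING checks that the socket can be read from and written to
    • 2. It also checks whether the master can process command requests normally
    • 3. If the replica gets no reply to PING within the configured time, the network is considered abnormal; the replica closes the socket and reconnects
    • 4. If the replica receives an error from the master, such as "BUSY Redis is busy running a script. You can ...", it disconnects and reconnects
    • 5. If the replica receives PONG, everything is normal and it moves on to the next step
    3.4 Authentication
    • 1. If the replica has the masterauth option set, it authenticates; otherwise this step is skipped
    • 2. Authentication is done by sending AUTH to the master: auth <password>
    • 3. If the master has no requirepass set, the reply reports "no password is set"
    • 4. If the master's password differs from the replica's masterauth, an "invalid password" error is returned
    3.5 Send port information
    • 1. The replica runs REPLCONF listening-port <port-number> to tell the master which port it listens on
    • 2. This lets INFO on the master show each replica's port; in other words, the replica proactively tells the master its listening port
    3.6 Synchronization
    • The master and the replica act as clients of each other, so each can send commands to the other and reply to them
    3.7 Command propagation
    • After the master executes a write command, it sends the command on to the replica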
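The handshake in 3.2 to 3.6 is just a sequence of ordinary commands written to the socket. The Python sketch below (an illustration, not Redis source code) encodes that sequence in the Redis protocol (RESP: an array header followed by bulk strings). The helper names `encode_command`, `handshake_commands` and `ping_reply_ok` are hypothetical. It assumes the replica sends `PSYNC ? -1` when it has no cached master state. It also simplifies things by omitting the extra `REPLCONF` options that real replicas send, and by treating only `+PONG` as a good PING reply, as described in 3.3.

```python
from typing import List, Optional


def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)


def handshake_commands(listening_port: int,
                       masterauth: Optional[str] = None,
                       replid: Optional[str] = None,
                       offset: Optional[int] = None) -> List[bytes]:
    """Commands a replica sends after connecting, in order (hypothetical helper)."""
    cmds = [("PING",)]                                   # 3.3
    if masterauth:                                       # 3.4: only if masterauth is set
        cmds.append(("AUTH", masterauth))
    cmds.append(("REPLCONF", "listening-port", str(listening_port)))  # 3.5
    if replid is None:
        cmds.append(("PSYNC", "?", "-1"))                # 3.6: no cached master, full sync
    else:
        cmds.append(("PSYNC", replid, str(offset)))      # 3.6: try a partial resync
    return [encode_command(*c) for c in cmds]


def ping_reply_ok(reply: bytes) -> bool:
    """3.3: PONG means continue; an error reply means disconnect and reconnect."""
    return reply.startswith(b"+PONG")
```

For example, `handshake_commands(6379, masterauth="foobared")` yields four encoded commands: PING, AUTH, REPLCONF and `PSYNC ? -1`.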

    IV. Notes on the synchronization process

    V. Configuration reference

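    slaveof <masterip> <masterport>
    # The data source (master) to replicate from
    masterauth <master-password>
    # Password used to authenticate to the master
    slave-serve-stale-data yes
    # yes: while the replica is disconnected from the master, or while a sync is in progress, it keeps answering client requests; the drawback is that clients may read stale data
    # no: the replica rejects client requests with the error "SYNC with master in progress"
    slave-read-only yes
    # Whether the replica is read-only; a writable replica can become inconsistent with its master
    repl-timeout 60
    # Replication timeout, which covers:
    # bulk data transfer during SYNC, from the replica's point of view (a large transfer can time out)
    # master timeout from the replica's point of view (data, pings)
    # replica timeout from the master's point of view (REPLCONF ACK pings)
    repl-disable-tcp-nodelay no
    # yes: Redis uses fewer TCP packets and less bandwidth to send data to replicas, i.e. it packs more data into each packet, at the cost of extra latency (up to 40 ms with the default Linux kernel settings)
    # no: lower packet utilization but lower latency
    repl-backlog-size 1mb
    # Size of the master's fixed-size replication backlog buffer. It decides whether a replica that reconnects after a network interruption needs a full resync: if all the data the replica is missing is still in the backlog, a partial (incremental) resync happens; if the backlog has been freed or the data has been overwritten, a full resync happens
    repl-backlog-ttl 3600
    # After the master has had no connected replicas for this many seconds, the backlog is freed
    slave-priority 100
    # Replica priority. When the master stops working, Redis Sentinel uses it to pick a replica to promote; a LOWER value means a higher priority, and 0 means the replica is never promoted
    # The master stops accepting client writes when the following condition is not met
    min-slaves-to-write 3
    # Minimum number of connected replicas whose lag is within min-slaves-max-lag; 0 (the default) disables the feature
    min-slaves-max-lag 10
    # Maximum lag in seconds; replicas lagging more than this are not counted, and if too few remain the master stops accepting writes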
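The min-slaves-to-write / min-slaves-max-lag pair boils down to a counting rule. A minimal Python sketch, assuming each replica's lag is the number of seconds since its last REPLCONF ACK (the `lag` field of INFO replication); `count_good_replicas` and `writes_allowed` are hypothetical names, not Redis functions. When the check fails, Redis rejects the write with a NOREPLICAS error.

```python
from typing import Iterable, List


def count_good_replicas(lags: Iterable[int], max_lag: int) -> int:
    """Replicas whose last ACK is at most max_lag seconds old."""
    return sum(1 for lag in lags if lag <= max_lag)


def writes_allowed(lags: List[int], min_to_write: int, max_lag: int) -> bool:
    """Sketch of the master-side write gate configured above."""
    if min_to_write == 0 or max_lag == 0:
        return True  # setting either option to 0 disables the feature
    return count_good_replicas(lags, max_lag) >= min_to_write
```

With `min-slaves-to-write 3` and `min-slaves-max-lag 10`, lags of `[0, 1, 2]` allow writes, while `[0, 1, 11]` do not, because only two replicas count as good.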
    

    VI. Synchronization flow


    VII. Full synchronization process


    • 7.1 The replica runs SLAVEOF
    415:S 20 Nov 14:17:17.330 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
    415:S 20 Nov 14:17:17.331 * SLAVE OF 172.16.10.140:6379 enabled (user request from 'id=4 addr=127.0.0.1:55027 fd=11 name= age=198 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
    415:S 20 Nov 14:17:17.586 * Connecting to MASTER 172.16.10.140:6379
    415:S 20 Nov 14:17:17.586 * MASTER <-> SLAVE sync started
    415:S 20 Nov 14:17:17.586 * Non blocking connect for SYNC fired the event.
    415:S 20 Nov 14:17:17.587 * Master replied to PING, replication can continue...
    415:S 20 Nov 14:17:17.587 * Trying a partial resynchronization (request 572caecf4c0bf264880b2e3899a3dae52e7704e9:1).
    415:S 20 Nov 14:17:17.592 * Full resync from master: 030a3c44c4f64eb9a02c3b36f3891226fc2074fe:0
    415:S 20 Nov 14:17:17.592 * Discarding previously cached master state.
    415:S 20 Nov 14:17:17.681 * MASTER <-> SLAVE sync: receiving 201 bytes from master
    415:S 20 Nov 14:17:17.698 * MASTER <-> SLAVE sync: Flushing old data
    415:S 20 Nov 14:17:19.605 * MASTER <-> SLAVE sync: Loading DB in memory
    415:S 20 Nov 14:17:19.605 * MASTER <-> SLAVE sync: Finished with success
    415:S 20 Nov 14:17:19.606 * Background append only file rewriting started by pid 687
    415:S 20 Nov 14:17:19.631 * AOF rewrite child asks to stop sending diffs.
    687:C 20 Nov 14:17:19.631 * Parent agreed to stop sending diffs. Finalizing AOF...
    687:C 20 Nov 14:17:19.631 * Concatenating 0.00 MB of AOF diff received from parent.
    687:C 20 Nov 14:17:19.632 * SYNC append only file rewrite performed
    687:C 20 Nov 14:17:19.632 * AOF rewrite: 2 MB of memory used by copy-on-write
    415:S 20 Nov 14:17:19.707 * Background AOF rewrite terminated with success
    415:S 20 Nov 14:17:19.707 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
    415:S 20 Nov 14:17:19.707 * Background AOF rewrite finished successfully
    
    • 7.2 Master log
    10110:M 20 Nov 14:17:16.884 * Slave 172.16.10.141:6379 asks for synchronization
    10110:M 20 Nov 14:17:16.884 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '572caecf4c0bf264880b2e3899a3dae52e7704e9', my replication IDs are '14f0fbdd33f13d8e6d07c13bb0a184ba7a43c258' and 'ff98eda832c57bef003947b34ae024063689ca44')
    10110:M 20 Nov 14:17:16.885 * Starting BGSAVE for SYNC with target: disk
    10110:M 20 Nov 14:17:16.888 * Background saving started by pid 11565
    11565:C 20 Nov 14:17:16.891 * DB saved on disk
    11565:C 20 Nov 14:17:16.891 * RDB: 6 MB of memory used by copy-on-write
    10110:M 20 Nov 14:17:16.978 * Background saving terminated with success
    10110:M 20 Nov 14:17:16.978 * Synchronization with slave 172.16.10.141:6379 succeeded
    
    • 7.3 Master shutdown
    # Master log
    12519:C 20 Nov 14:22:10.243 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    12519:C 20 Nov 14:22:10.243 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=12519, just started
    12519:C 20 Nov 14:22:10.243 # Configuration loaded
    12520:M 20 Nov 14:22:10.245 * Increased maximum number of open files to 10032 (it was originally set to 1024).
    12520:M 20 Nov 14:22:10.245 # Creating Server TCP listening socket *:6379: bind: Address already in use
    10110:M 20 Nov 14:23:36.032 # User requested shutdown...
    10110:M 20 Nov 14:23:36.032 * Calling fsync() on the AOF file.
    10110:M 20 Nov 14:23:36.032 * Removing the pid file.
    10110:M 20 Nov 14:23:36.032 # Redis is now ready to exit, bye bye...
    
    # Replica log
    415:S 20 Nov 14:23:36.736 # Connection with master lost.
    415:S 20 Nov 14:23:36.736 * Caching the disconnected master state.
    415:S 20 Nov 14:23:37.456 * Connecting to MASTER 172.16.10.140:6379
    415:S 20 Nov 14:23:37.456 * MASTER <-> SLAVE sync started
    415:S 20 Nov 14:23:37.456 # Error condition on socket for SYNC: Connection refused
    415:S 20 Nov 14:23:38.458 * Connecting to MASTER 172.16.10.140:6379
    415:S 20 Nov 14:23:38.459 * MASTER <-> SLAVE sync started
    415:S 20 Nov 14:23:38.459 # Error condition on socket for SYNC: Connection refused
    415:S 20 Nov 14:23:39.462 * Connecting to MASTER 172.16.10.140:6379
    
    • 7.4 Master startup
    # Replica log
    415:S 20 Nov 14:24:39.625 # Error condition on socket for SYNC: Connection refused
    415:S 20 Nov 14:24:40.626 * Connecting to MASTER 172.16.10.140:6379
    415:S 20 Nov 14:24:40.626 * MASTER <-> SLAVE sync started
    415:S 20 Nov 14:24:40.627 * Non blocking connect for SYNC fired the event.
    415:S 20 Nov 14:24:40.627 * Master replied to PING, replication can continue...
    415:S 20 Nov 14:24:40.628 * Trying a partial resynchronization (request 030a3c44c4f64eb9a02c3b36f3891226fc2074fe:702).
    415:S 20 Nov 14:24:40.629 * Full resync from master: 1e1b4acf86e7882c044eb952136e04e5a70b077b:0
    415:S 20 Nov 14:24:40.629 * Discarding previously cached master state.
    415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: receiving 216 bytes from master
    415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Flushing old data
    415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Loading DB in memory
    415:S 20 Nov 14:24:40.712 * MASTER <-> SLAVE sync: Finished with success
    415:S 20 Nov 14:24:40.713 * Background append only file rewriting started by pid 1102
    415:S 20 Nov 14:24:40.737 * AOF rewrite child asks to stop sending diffs.
    1102:C 20 Nov 14:24:40.737 * Parent agreed to stop sending diffs. Finalizing AOF...
    1102:C 20 Nov 14:24:40.737 * Concatenating 0.00 MB of AOF diff received from parent.
    1102:C 20 Nov 14:24:40.737 * SYNC append only file rewrite performed
    1102:C 20 Nov 14:24:40.738 * AOF rewrite: 2 MB of memory used by copy-on-write
    415:S 20 Nov 14:24:40.829 * Background AOF rewrite terminated with success
    415:S 20 Nov 14:24:40.829 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
    415:S 20 Nov 14:24:40.829 * Background AOF rewrite finished successfully
    
    # Master log: the replication ID (run_id) changed after the restart, so a full resync happens
    12992:M 20 Nov 14:24:39.924 * Slave 172.16.10.141:6379 asks for synchronization
    12992:M 20 Nov 14:24:39.925 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '030a3c44c4f64eb9a02c3b36f3891226fc2074fe', my replication IDs are '510ae8234d41a712b9c60fe63a4cf193fc3a9fe2' and '0000000000000000000000000000000000000000')
    12992:M 20 Nov 14:24:39.925 * Starting BGSAVE for SYNC with target: disk
    12992:M 20 Nov 14:24:39.925 * Background saving started by pid 13002
    13002:C 20 Nov 14:24:39.927 * DB saved on disk
    13002:C 20 Nov 14:24:39.927 * RDB: 6 MB of memory used by copy-on-write
    12992:M 20 Nov 14:24:40.008 * Background saving terminated with success
    12992:M 20 Nov 14:24:40.008 * Synchronization with slave 172.16.10.141:6379 succeeded
    

    VIII. Incremental replication after a disconnection


    Replica restart
    • 8.1 Replica shutdown
    # The master logs the lost connection
    12992:M 20 Nov 14:30:33.092 # Connection with slave 172.16.10.141:6379 lost.
    
    • 8.2 The master keeps writing data
    127.0.0.1:6379> set k11 v11
    OK
    127.0.0.1:6379> set k22 v22
    OK
    
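    • 8.3 Replica startup. A restarted replica also performs a full resync, because its run_id (replication ID) has changed as well: the ID it asks for does not match either of the master's replication IDs
    # Replica log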
    1520:S 20 Nov 14:31:55.315 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
    1520:S 20 Nov 14:31:55.315 * SLAVE OF 172.16.10.140:6379 enabled (user request from 'id=2 addr=127.0.0.1:55195 fd=10 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
    1520:S 20 Nov 14:31:55.712 * Connecting to MASTER 172.16.10.140:6379
    1520:S 20 Nov 14:31:55.712 * MASTER <-> SLAVE sync started
    1520:S 20 Nov 14:31:55.712 * Non blocking connect for SYNC fired the event.
    1520:S 20 Nov 14:31:55.712 * Master replied to PING, replication can continue...
    1520:S 20 Nov 14:31:55.713 * Trying a partial resynchronization (request 3a389f3b7dc9e3a394e6fdac5b7028e59aa635a8:1).
    1520:S 20 Nov 14:31:55.715 * Full resync from master: 1e1b4acf86e7882c044eb952136e04e5a70b077b:575
    1520:S 20 Nov 14:31:55.715 * Discarding previously cached master state.
    1520:S 20 Nov 14:31:55.784 * MASTER <-> SLAVE sync: receiving 235 bytes from master
    1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Flushing old data
    1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Loading DB in memory
    1520:S 20 Nov 14:31:55.785 * MASTER <-> SLAVE sync: Finished with success
    1520:S 20 Nov 14:31:55.786 * Background append only file rewriting started by pid 1533
    1520:S 20 Nov 14:31:55.809 * AOF rewrite child asks to stop sending diffs.
    1533:C 20 Nov 14:31:55.809 * Parent agreed to stop sending diffs. Finalizing AOF...
    1533:C 20 Nov 14:31:55.809 * Concatenating 0.00 MB of AOF diff received from parent.
    1533:C 20 Nov 14:31:55.809 * SYNC append only file rewrite performed
    1533:C 20 Nov 14:31:55.809 * AOF rewrite: 6 MB of memory used by copy-on-write
    1520:S 20 Nov 14:31:55.812 * Background AOF rewrite terminated with success
    1520:S 20 Nov 14:31:55.812 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
    1520:S 20 Nov 14:31:55.812 * Background AOF rewrite finished successfully
    
    # Master log
    12992:M 20 Nov 14:31:55.010 * Slave 172.16.10.141:6379 asks for synchronization
    12992:M 20 Nov 14:31:55.010 * Partial resynchronization not accepted: Replication ID mismatch (Slave asked for '3a389f3b7dc9e3a394e6fdac5b7028e59aa635a8', my replication IDs are '1e1b4acf86e7882c044eb952136e04e5a70b077b' and '0000000000000000000000000000000000000000')
    12992:M 20 Nov 14:31:55.010 * Starting BGSAVE for SYNC with target: disk
    12992:M 20 Nov 14:31:55.011 * Background saving started by pid 14369
    14369:C 20 Nov 14:31:55.013 * DB saved on disk
    14369:C 20 Nov 14:31:55.013 * RDB: 6 MB of memory used by copy-on-write
    12992:M 20 Nov 14:31:55.081 * Background saving terminated with success
    12992:M 20 Nov 14:31:55.081 * Synchronization with slave 172.16.10.141:6379 succeeded
    
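    Replica network disconnection: incremental resync (the backlog still holds the data)
    • 1. While the replica is disconnected, the master keeps writing data
    # On the replica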
    systemctl stop network&&sleep 60&&systemctl start network &
    
    • 2. After the replica comes back online
    # Master log
    12992:M 20 Nov 15:17:37.019 # Disconnecting timedout slave: 172.16.10.141:6379
    12992:M 20 Nov 15:17:37.019 # Connection with slave 172.16.10.141:6379 lost.
    12992:M 20 Nov 15:17:38.092 * Slave 172.16.10.141:6379 asks for synchronization
    12992:M 20 Nov 15:17:38.093 * Partial resynchronization request from 172.16.10.141:6379 accepted. Sending 165 bytes of backlog starting from offset 4388.
    
    # Replica log
    1705:S 20 Nov 15:17:38.792 # MASTER timeout: no data nor PING received...
    1705:S 20 Nov 15:17:38.793 # Connection with master lost.
    1705:S 20 Nov 15:17:38.793 * Caching the disconnected master state.
    1705:S 20 Nov 15:17:38.793 * Connecting to MASTER 172.16.10.140:6379
    1705:S 20 Nov 15:17:38.794 * MASTER <-> SLAVE sync started
    1705:S 20 Nov 15:17:38.795 * Non blocking connect for SYNC fired the event.
    1705:S 20 Nov 15:17:38.795 * Master replied to PING, replication can continue...
    1705:S 20 Nov 15:17:38.795 * Trying a partial resynchronization (request 1e1b4acf86e7882c044eb952136e04e5a70b077b:4388).
    1705:S 20 Nov 15:17:38.796 * Successful partial resynchronization with master.
    1705:S 20 Nov 15:17:38.796 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.
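The logs above show both outcomes of a PSYNC request. The Python sketch below models the master's decision as I understand the Redis 4.0 logic. Two conditions must hold. First, the requested ID must equal the current replication ID, or the previous one (replid2) as long as the requested offset does not exceed the point where replid2 stopped being valid. Second, the requested offset must still lie inside the backlog. Field names follow the INFO replication output; `MasterReplState` and `try_partial_resync` are hypothetical. As a check, the backlog starts at offset 1 (shown by INFO in 13.1). "165 bytes of backlog starting from offset 4388" implies the master was at offset 4552, so 1 + 4552 - 4388 = 165.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasterReplState:
    replid: str                # master_replid
    replid2: str               # master_replid2 (previous ID, kept after a promotion)
    second_replid_offset: int  # replid2 is accepted up to this offset (-1 if unused)
    backlog_off: int           # repl_backlog_first_byte_offset
    backlog_histlen: int       # repl_backlog_histlen


def try_partial_resync(state: MasterReplState, req_replid: str,
                       req_offset: int) -> Optional[int]:
    """Return how many backlog bytes to send, or None if a full resync is needed."""
    id_ok = req_replid == state.replid or (
        req_replid == state.replid2 and req_offset <= state.second_replid_offset)
    if not id_ok:
        return None  # "Partial resynchronization not accepted: Replication ID mismatch"
    end = state.backlog_off + state.backlog_histlen
    if req_offset < state.backlog_off or req_offset > end:
        return None  # the data the replica needs is no longer in the backlog
    return end - req_offset  # send everything from req_offset to the end of the backlog
```

The replid2 branch is what lets replicas of a promoted replica avoid a full resync: the new master in step 6 of section XIII logs "Setting secondary replication ID to 1e1b4a..., valid up to offset: 22832".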
    

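    IX. If the master is stopped and restarted, does a full resync happen again?

    • Yes. The master's run_id (replication ID) changes on restart, so a full resync occurs (see 7.4)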

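    X. Heartbeat

    By default the replica sends the heartbeat command REPLCONF ACK <replication_offset> to the master once per second

    • Heartbeats reveal the network status: the lag field in INFO replication shows the master-replica delay in seconds, usually 0 or 1
    • The heartbeat carries the replica's current replication offset; if commands sent by the master were lost, this frequent check quickly reveals the wrong offset, and the master can resend the missing commands to the replica
    • Heartbeats make the min-slaves feature possible: when the replicas are not in a healthy state, the master refuses writes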
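What the master does with each heartbeat can be sketched as a small bookkeeping object: remember when the last ACK arrived and which offset it carried. From that it can derive the `lag` value and how far the replica trails the master's offset. `ReplicaLink` is a hypothetical class written for illustration, not Redis source. Passing `now` explicitly is only there to make it easy to test.

```python
import time
from typing import Optional


class ReplicaLink:
    """Master-side view of one replica, fed by REPLCONF ACK heartbeats (hypothetical)."""

    def __init__(self) -> None:
        self.ack_offset = 0
        self.ack_time: Optional[float] = None

    def on_ack(self, offset: int, now: Optional[float] = None) -> None:
        # Called whenever "REPLCONF ACK <offset>" arrives from the replica.
        self.ack_offset = offset
        self.ack_time = time.time() if now is None else now

    def lag(self, now: Optional[float] = None) -> int:
        # Whole seconds since the last ACK, like the lag field of INFO replication.
        if self.ack_time is None:
            raise ValueError("no ACK received yet")
        now = time.time() if now is None else now
        return int(now - self.ack_time)

    def missing_bytes(self, master_offset: int) -> int:
        # Bytes of the replication stream the replica has not acknowledged yet.
        return max(0, master_offset - self.ack_offset)
```

Feeding `lag()` of every replica into a min-slaves style check gives the write gate described in the last bullet.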

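    XI. Problems Redis high availability must solve

    • 1. Multiple nodes hold the same data
      • Replication
    • 2. When the master goes down, how is a new master produced
    • 3. When the master goes down, how do the replicas automatically connect to the new master
    • 4. How to determine that the master is down
    • 5. How to handle the old master once it recovers
    • 6. How to monitor the health of all Redis nodes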

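    XII. What is Sentinel

    • 1. It is itself part of the Redis program
    • 2. Main functions
      • 2.1 Monitoring: check the health of Redis nodes
      • 2.2 Notification: report detected changes to related systems or Redis instances, implemented with Redis pub/sub
      • 2.3 Automatic failover: when the master goes down, a new master is elected
      • 2.4 Configuration management: Redis instances can obtain certain shared information through Sentinel
    • 3. Sentinel itself is distributed, which removes its own single point of failure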
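    12.1 Installing and configuring Sentinel
    • 1. Copy the configuration to set up a replica
    port 6380
    logfile "/home/liubx/redisdata/slave1/logs/redis.log"
    # the pidfile path must differ from the master's
    pidfile /var/run/redis.pid
    dir /home/liubx/redisdata/slave1
    slaveof localhost 6379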
    
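    • 2. Sentinel configuration
    The Redis installation directory contains a sample configuration file, sentinel.conf
    daemonize yes
    logfile "/home/liubx/sentinel/sentinel.log"
    sentinel monitor mymaster 127.0.0.1 6379 1
    # <master-name> <ip> <port> <quorum (number of votes)>
    # One Sentinel can monitor multiple masters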
    
    • 3. Start Sentinel (either form works)
      • redis-sentinel ../sentinel.conf
      • redis-server ../sentinel.conf --sentinel
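    12.2 HA steps
    • 1. Subjectively determine whether the master is down (subjective down)
    • 2. Objectively determine that the master is down (objective down)
    • 3. The Sentinels (several Sentinels monitor the master together) elect the one that will perform the failover
    • 4. Failover
      • Select a new master
      • Change the replication target of the replicas
      • Turn the old master into a replica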
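    12.3 Subjective down
    1. By default, Sentinel sends PING once per second to check whether the related nodes are online
    • Targets include the master, the master's replicas, and the other Sentinels
    • A reply of +PONG, -LOADING or -MASTERDOWN means the node is online; any other reply (or no reply) does not
    2. If no valid PING reply arrives within a certain period, the node is considered subjectively down
    • The period is set by sentinel down-after-milliseconds <master-name> 50000, in milliseconds
    • This setting applies not only to the master but also to all of the master's replicas and to the other Sentinels monitoring the same master
      • Example: the master is 1.1 and this Sentinel is 1.2; replicas 1.3 and 1.4 both replicate from 1.1; another Sentinel, 1.5, also monitors 1.1. If Sentinel 1.2 sets the value to 10000 ms, then 1.2 considers each of 1.1, 1.3, 1.4 and 1.5 subjectively down after 10000 ms without a valid reply
    • Different Sentinels can use different values for this setting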
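The subjective-down rule above amounts to a timestamp comparison: remember when the last valid PING reply arrived, and flag the instance once more than down-after-milliseconds have passed. A minimal Python sketch, assuming time is given explicitly in milliseconds; `InstanceMonitor` and `VALID_PING_REPLIES` are hypothetical names, not Sentinel internals.

```python
# Replies that count as "online" according to 12.3.
VALID_PING_REPLIES = ("+PONG", "-LOADING", "-MASTERDOWN")


class InstanceMonitor:
    """One Sentinel's subjective-down (SDOWN) view of one instance (hypothetical)."""

    def __init__(self, down_after_ms: int, now_ms: int) -> None:
        self.down_after_ms = down_after_ms
        self.last_valid_reply_ms = now_ms

    def on_ping_reply(self, reply: str, now_ms: int) -> None:
        # Only a valid reply refreshes the timestamp; other errors are ignored.
        if reply.startswith(VALID_PING_REPLIES):
            self.last_valid_reply_ms = now_ms

    def is_sdown(self, now_ms: int) -> bool:
        return now_ms - self.last_valid_reply_ms > self.down_after_ms
```

Because the same `down_after_ms` is used for every instance attached to one master, this also reflects the point that the setting covers the master, its replicas and the peer Sentinels alike.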
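    12.4 Objective down
    When enough Sentinels (counting itself) judge the master to be down, this Sentinel considers the master objectively down
    • The number is the <quorum> in sentinel monitor <master-name> <ip> <port> <quorum>
    Sentinels open connections to each other and exchange commands to learn each other's judgment
    • Send SENTINEL is-master-down-by-addr <ip> <port> <current_epoch> <runid>
      • current_epoch: the current configuration epoch, which can be thought of as an election round counter
      • runid: the Sentinel's instance ID, or *; * means the request only asks whether the master is down, while a specific ID asks the receiver to vote for that Sentinel as leader
      • ip: the IP address of the master this Sentinel judges subjectively down
      • port: the port of that master
    • A Sentinel that receives this command replies with three values
      • down_state: 1 means the master is down, 0 means it is not
      • leader_runid: * means the reply only concerns whether the master is down; a specific value is the run ID of the Sentinel this one voted for as local leader
      • leader_epoch: when leader_runid is a specific run ID, this is the epoch of that vote (similar to a configuration version); when leader_runid is *, it is 0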
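The objective-down decision can be sketched as a vote count over the down_state values returned by the other Sentinels. In Redis the Sentinel also counts itself when it already sees the master as subjectively down. That is why the log in section XIII reports "#quorum 1/1" with a single Sentinel. `is_odown` and `is_master_down_cmd` are hypothetical helpers. The second one only assembles the argument list of the command described above and does not send anything.

```python
from typing import Iterable, List


def is_odown(self_sdown: bool, peer_down_states: Iterable[int], quorum: int) -> bool:
    """peer_down_states: down_state values (1 or 0) from the other Sentinels."""
    if not self_sdown:
        return False  # a Sentinel only asks others after its own SDOWN
    votes = 1 + sum(1 for state in peer_down_states if state == 1)  # self + peers
    return votes >= quorum


def is_master_down_cmd(ip: str, port: int, current_epoch: int,
                       runid: str = "*") -> List[str]:
    """Arguments of SENTINEL is-master-down-by-addr; runid '*' = only ask about state."""
    return ["SENTINEL", "is-master-down-by-addr", ip, str(port), str(current_epoch), runid]
```

For the scenario in 12.6, `is_master_down_cmd("192.168.1.110", 6379, 1)` builds the first request, and passing Sentinel 1's own run ID instead of `*` turns it into a vote request.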
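    12.5 Electing the leader Sentinel
    • Any Sentinel that finds the master objectively down can start an election
    • In each election a Sentinel can vote only once, first come, first served
    • After each voting round, whether it succeeds or not, the election epoch is incremented by one
    • A Sentinel that receives votes from more than half of the Sentinels (and at least quorum votes) becomes the leader and carries out the failover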
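    12.6 Election example

    Scenario: three Sentinels numbered 1, 2 and 3; the master is 192.168.1.110, port 6379
    Steps:

    • Sentinel 1 is the first to judge the master subjectively down
    • Sentinel 1 sends sentinel is-master-down-by-addr 192.168.1.110 6379 1 * to Sentinels 2 and 3
    • Once the replies arrive, Sentinel 1 meets the condition for judging the master objectively down
    • Sentinel 1 starts an election by sending sentinel is-master-down-by-addr 192.168.1.110 6379 1 ab12cd34 (its own instance ID) to Sentinels 2 and 3
    • Sentinel 2 receives the request; since it is the first vote request it has received, it votes for Sentinel 1 and replies 1, ab12cd34, 1, meaning: the master is down, the elected Sentinel's instance ID is ab12cd34, and the election epoch is 1
    • After receiving Sentinel 2's reply, Sentinel 1 (which also votes for itself) holds more than half of the votes, so it becomes the leader and performs the failover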
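The voting rules in 12.5 and the walk-through in 12.6 can be replayed with a small Python sketch. `grant_vote` models "one vote per epoch, first come, first served". `elect_leader` tallies the votes. As I understand Redis Sentinel, the winner needs a majority of all Sentinels and also at least `quorum` votes; otherwise the round fails and a later epoch tries again. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def grant_vote(voted_in_epoch: Dict[int, str], epoch: int, candidate: str) -> str:
    """A Sentinel's reply to a vote request: the first candidate per epoch wins."""
    if epoch not in voted_in_epoch:
        voted_in_epoch[epoch] = candidate
    return voted_in_epoch[epoch]


def elect_leader(votes: Dict[str, str], num_sentinels: int, quorum: int) -> Optional[str]:
    """votes maps voter -> run ID it voted for in this epoch; None means no leader."""
    if not votes:
        return None
    winner, count = Counter(votes.values()).most_common(1)[0]
    majority = num_sentinels // 2 + 1
    if count >= majority and count >= quorum:
        return winner
    return None
```

Replaying 12.6: Sentinel 1 votes for itself, Sentinel 2 grants its epoch-1 vote to ab12cd34, so ab12cd34 has 2 of 3 votes and wins. A three-way split yields no leader, and the Sentinels retry in a new epoch.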
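    12.7 Failover
    1. Select the new master
    • Remove replicas that are in the down state
    • Remove replicas that have not replied to Sentinel's INFO command within the last 5 seconds
    • Remove replicas that have been disconnected from the master for more than down-after-milliseconds*10 milliseconds
    • Sort the remaining replicas by priority; the higher the priority (the lower the slave-priority value), the more likely the replica is chosen
    • If priorities are equal, sort by replication offset; a larger offset means newer data
    • Send SLAVEOF NO ONE to the chosen replica to change its role
    • Then send INFO once per second; when the reply contains role:master, the promotion has succeeded
    2. Change the replication target of the other replicas
    • Send them the SLAVEOF command
    3. Turn the old master into a replica
    • The old master is offline, so nothing can be sent to it yet; Sentinel records in its internal state that it is now a replica, and sends it SLAVEOF once it reconnects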
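Step 1 of the failover is a filter followed by a sort. The Python sketch below follows the rules listed above. It adds two details from the Redis Sentinel documentation: replicas with priority 0 are never chosen, and a final tie is broken by the lexicographically smaller run ID. One simplification: real Sentinel extends the disconnection limit by the time the master has been down. `Replica` and `select_new_master` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    runid: str
    priority: int             # slave-priority: lower is better, 0 = never promote
    repl_offset: int
    is_down: bool
    ms_since_info_reply: int
    master_link_down_ms: int


def select_new_master(replicas: List[Replica], down_after_ms: int) -> Optional[Replica]:
    candidates = [
        r for r in replicas
        if not r.is_down                                  # skip replicas that are down
        and r.ms_since_info_reply <= 5000                 # answered INFO within 5 s
        and r.master_link_down_ms <= down_after_ms * 10   # not disconnected too long
        and r.priority != 0                               # priority 0 is never promoted
    ]
    if not candidates:
        return None
    # Lower priority value first, then larger offset (newer data), then smaller run ID.
    candidates.sort(key=lambda r: (r.priority, -r.repl_offset, r.runid))
    return candidates[0]
```

The chosen replica is then sent SLAVEOF NO ONE, as described above.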

    XIII. Sentinel in practice

    13.1
    • 1) Current master-replica setup
    127.0.0.1:6379> info replication
    # Replication
    role:master
    connected_slaves:1
    slave0:ip=172.16.10.141,port=6379,state=online,offset=5266,lag=0
    master_replid:1e1b4acf86e7882c044eb952136e04e5a70b077b
    master_replid2:0000000000000000000000000000000000000000
    master_repl_offset:5266
    second_repl_offset:-1
    repl_backlog_active:1
    repl_backlog_size:1048576
    repl_backlog_first_byte_offset:1
    repl_backlog_histlen:5266
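When checking a setup like this from a script, the INFO replication text above is easy to turn into a dictionary. The Python sketch below is a hypothetical `parse_info` helper. It assumes the format shown here: "# Section" headers, "key:value" lines, and replica lines such as slave0:ip=...,port=...,lag=0, which it expands into nested dictionaries. Values stay strings.

```python
from typing import Dict, Union

InfoValue = Union[str, Dict[str, str]]


def parse_info(text: str) -> Dict[str, InfoValue]:
    """Parse INFO output into a dict; slaveN lines become nested dicts."""
    info: Dict[str, InfoValue] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        if key.startswith("slave") and key[5:].isdigit():
            info[key] = dict(item.split("=", 1) for item in value.split(","))
        else:
            info[key] = value
    return info
```

With the output above, `parse_info(text)["slave0"]["lag"]` is `"0"`, and `master_repl_offset` minus the replica's `offset` (5266 - 5266) shows it is fully caught up.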
    
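    • 2) Configure Sentinel on both nodes
    vi /usr/local/redis/etc/sentinel.conf
    dir "/usr/local/redis/work"
    logfile "/usr/local/redis/sentinel.log"
    daemonize yes
    protected-mode no
    sentinel monitor mymaster 172.16.10.140 6379 1
    # The name mymaster is arbitrary, but this monitor line must come before any line that references the name (such as auth-pass below), otherwise Sentinel reports that the name cannot be found
    sentinel auth-pass mymaster foobared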
    
    • 3) Start Sentinel: redis-sentinel /usr/local/redis/etc/sentinel.conf
    # Sentinel log on 172.16.10.140
    25401:X 20 Nov 15:30:06.428 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    25401:X 20 Nov 15:30:06.428 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=25401, just started
    25401:X 20 Nov 15:30:06.428 # Configuration loaded
    25402:X 20 Nov 15:30:06.430 * Increased maximum number of open files to 10032 (it was originally set to 1024).
    25402:X 20 Nov 15:30:06.431 * Running mode=sentinel, port=26379.
    25402:X 20 Nov 15:30:06.431 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    25402:X 20 Nov 15:30:06.432 # Sentinel ID is 20c0b9c989a852c87c59d913cd1c17c5b7bc2414
    25402:X 20 Nov 15:30:06.432 # +monitor master mymaster 172.16.10.140 6379 quorum 1
    25402:X 20 Nov 15:30:06.433 * +slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
    25402:X 20 Nov 15:30:06.902 * +sentinel sentinel ff661bc57580186ec6bd2c5162925381e0eef451 172.16.10.141 26379 @ mymaster 172.16.10.140 6379
    
    # Sentinel log on 172.16.10.141
    5778:X 20 Nov 15:30:03.530 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
    5778:X 20 Nov 15:30:03.530 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=5778, just started
    5778:X 20 Nov 15:30:03.530 # Configuration loaded
    5779:X 20 Nov 15:30:03.532 * Increased maximum number of open files to 10032 (it was originally set to 1024).
    5779:X 20 Nov 15:30:03.533 * Running mode=sentinel, port=26379.
    5779:X 20 Nov 15:30:03.534 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
    5779:X 20 Nov 15:30:03.535 # Sentinel ID is ff661bc57580186ec6bd2c5162925381e0eef451
    5779:X 20 Nov 15:30:03.535 # +monitor master mymaster 172.16.10.140 6379 quorum 1
    5779:X 20 Nov 15:30:03.537 * +slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:30:09.198 * +sentinel sentinel 20c0b9c989a852c87c59d913cd1c17c5b7bc2414 172.16.10.140 26379 @ mymaster 172.16.10.140 6379
    
    • 4) Shut down the master 172.16.10.140
    # After a while the replica becomes master (Sentinel log on 172.16.10.141)
    5779:X 20 Nov 15:32:42.152 # +sdown master mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:42.152 # +odown master mymaster 172.16.10.140 6379 #quorum 1/1
    5779:X 20 Nov 15:32:42.152 # +new-epoch 1
    5779:X 20 Nov 15:32:42.152 # +try-failover master mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:42.153 # +vote-for-leader ff661bc57580186ec6bd2c5162925381e0eef451 1
    5779:X 20 Nov 15:32:42.155 # 20c0b9c989a852c87c59d913cd1c17c5b7bc2414 voted for ff661bc57580186ec6bd2c5162925381e0eef451 1
    5779:X 20 Nov 15:32:42.253 # +elected-leader master mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:42.253 # +failover-state-select-slave master mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:42.308 # +selected-slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:42.308 * +failover-state-send-slaveof-noone slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:42.366 * +failover-state-wait-promotion slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:43.095 # +promoted-slave slave 172.16.10.141:6379 172.16.10.141 6379 @ mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:43.095 # +failover-state-reconf-slaves master mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:43.147 # +failover-end master mymaster 172.16.10.140 6379
    5779:X 20 Nov 15:32:43.147 # +switch-master mymaster 172.16.10.140 6379 172.16.10.141 6379
    5779:X 20 Nov 15:32:43.147 * +slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
    5779:X 20 Nov 15:33:13.204 # +sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
    
    # Sentinel log on the original master (172.16.10.140)
    25402:X 20 Nov 15:32:41.451 # +new-epoch 1
    25402:X 20 Nov 15:32:41.452 # +vote-for-leader ff661bc57580186ec6bd2c5162925381e0eef451 1
    25402:X 20 Nov 15:32:41.459 # +sdown master mymaster 172.16.10.140 6379
    25402:X 20 Nov 15:32:41.459 # +odown master mymaster 172.16.10.140 6379 #quorum 1/1
    25402:X 20 Nov 15:32:41.459 # Next failover delay: I will not start a failover before Tue Nov 20 15:38:42 2018
    25402:X 20 Nov 15:32:42.445 # +config-update-from sentinel ff661bc57580186ec6bd2c5162925381e0eef451 172.16.10.141 26379 @ mymaster 172.16.10.140 6379
    25402:X 20 Nov 15:32:42.445 # +switch-master mymaster 172.16.10.140 6379 172.16.10.141 6379
    25402:X 20 Nov 15:32:42.446 * +slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
    25402:X 20 Nov 15:33:12.473 # +sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
    
    • 5) The master role has moved to one of the replicas
    • 6) Redis log on the new master
    1705:S 20 Nov 15:32:41.912 * MASTER <-> SLAVE sync started
    1705:S 20 Nov 15:32:41.912 # Error condition on socket for SYNC: Connection refused
    1705:M 20 Nov 15:32:42.366 # Setting secondary replication ID to 1e1b4acf86e7882c044eb952136e04e5a70b077b, valid up to offset: 22832. New replication ID is 6e7a0afb3aa5dbfc2c5b6c4f78afe8a9f0d0035c
    1705:M 20 Nov 15:32:42.366 * Discarding previously cached master state.
    1705:M 20 Nov 15:32:42.366 * MASTER MODE enabled (user request from 'id=60 addr=172.16.10.141:55503 fd=11 name=sentinel-ff661bc5-cmd age=159 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')
    1705:M 20 Nov 15:32:42.367 # CONFIG REWRITE executed with success.
    
    • 7) Start the failed master again
    # Sentinel on the original master
    25402:X 20 Nov 15:50:25.990 # -sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
    25402:X 20 Nov 15:50:35.967 * +convert-to-slave slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
    # Sentinel on the new master
    5779:X 20 Nov 15:50:26.744 # -sdown slave 172.16.10.140:6379 172.16.10.140 6379 @ mymaster 172.16.10.141 6379
    
    • 8) sentinel.conf is rewritten automatically (the first file below is from the Sentinel on 172.16.10.140)
    dir "/usr/local/redis/work"
    logfile "/usr/local/redis/work/sentinel.log"
    daemonize yes
    protected-mode no
    sentinel myid 20c0b9c989a852c87c59d913cd1c17c5b7bc2414
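    # The name mymaster is arbitrary, but the monitor line must come before any line that references the name, otherwise Sentinel reports that the name cannot be found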
    sentinel monitor mymaster 172.16.10.141 6379 1
    # Generated by CONFIG REWRITE
    port 26379
    sentinel auth-pass mymaster foobared
    sentinel config-epoch mymaster 1
    sentinel leader-epoch mymaster 1
    sentinel known-slave mymaster 172.16.10.140 6379
    sentinel known-sentinel mymaster 172.16.10.141 26379 ff661bc57580186ec6bd2c5162925381e0eef451
    sentinel current-epoch 1
    
    # The following file is from the Sentinel on 172.16.10.141
    dir "/usr/local/redis/work"
    logfile "/usr/local/redis/work/sentinel.log"
    daemonize yes
    protected-mode no
    sentinel myid ff661bc57580186ec6bd2c5162925381e0eef451
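    # The name mymaster is arbitrary, but the monitor line must come before any line that references the name, otherwise Sentinel reports that the name cannot be found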
    sentinel monitor mymaster 172.16.10.141 6379 1
    # Generated by CONFIG REWRITE
    port 26379
    sentinel auth-pass mymaster foobared
    sentinel config-epoch mymaster 1
    sentinel leader-epoch mymaster 1
    sentinel known-slave mymaster 172.16.10.140 6379
    sentinel known-sentinel mymaster 172.16.10.140 26379 20c0b9c989a852c87c59d913cd1c17c5b7bc2414
    sentinel current-epoch 1
    
    • 9) Note
    21239:X 29 Mar 16:43:12.722 # +try-failover master mymaster 172.16.3.140 6379
    21239:X 29 Mar 16:43:12.724 # +vote-for-leader 863c1c8c627415dbc3004deb529d27df2299c2df 95
    21239:X 29 Mar 16:43:23.438 # -failover-abort-not-elected master mymaster 172.16.3.140 6379
    21239:X 29 Mar 16:43:23.497 # Next failover delay: I will not start a failover before Thu Mar 29 16:49:13 2018
    

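    If failover fails after the master is stopped, as in the log above (-failover-abort-not-elected), use one of the fixes below; I used the first one. A likely cause is that, with protected mode on and no bind address, each Sentinel refuses connections from the other hosts, so no votes can be exchanged.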

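    1) If the Redis instances do not set
    protected-mode yes
    bind 192.168.98.136

    then add the following to the Sentinel configuration file:
    protected-mode no

    2) If the Redis instances do set
    protected-mode yes
    bind 192.168.98.136

    then add the same settings to the Sentinel configuration file:
    protected-mode yes
    bind 192.168.98.136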
    
  • Original article: https://www.cnblogs.com/jenvid/p/10184501.html
Copyright © 2020-2023  润新知