Redis Cluster High-Availability Cluster Online Migration: Operation Notes [Repost]


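    An earlier post covered the architecture of Redis Cluster and how to deploy a high-availability cluster; this one briefly walks through migrating such a cluster. The existing Redis Cluster ran on servers with limited performance and had to be moved to higher-spec machines. Because this is a live production environment, the migration was done online, without interrupting service. The procedure is as follows: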

    1. Machine environment

    Environment before migration
    -------------------------------------------------------------------------------
    Hostname           IP address        Node ports
    redis-node01       172.16.60.207     7000,7001
    redis-node02       172.16.60.208     7002,7003
    redis-node03       172.16.60.209     7004,7005
     
    Environment after migration
    -------------------------------------------------------------------------------
    Hostname          IP address        Node ports
    redis-new01       172.16.60.202     7000,7001
    redis-new02       172.16.60.204     7002,7003
    redis-new03       172.16.60.205     7004,7005

    2. Deploying the pre-migration Redis Cluster high-availability environment (three masters, three slaves)

    Perform the same installation steps on all three node machines
    [root@redis-node01 ~]# yum install -y gcc gcc-c++ make kernel-devel automake autoconf libtool wget tcl vim ruby rubygems unzip git
    [root@redis-node01 ~]# /etc/init.d/iptables stop
    [root@redis-node01 ~]# setenforce 0
    [root@redis-node01 ~]# vim /etc/sysconfig/selinux
    SELINUX=disabled
     
    Make the following preparations in advance; otherwise the Redis log will contain the corresponding warnings
    [root@redis-node01 ~]# echo "512" > /proc/sys/net/core/somaxconn     
    [root@redis-node01 ~]# vim /etc/rc.local
    echo "512" > /proc/sys/net/core/somaxconn  
    [root@redis-node01 ~]# echo 1 > /proc/sys/vm/overcommit_memory
    [root@redis-node01 ~]# sysctl vm.overcommit_memory=1
    vm.overcommit_memory = 1
    [root@redis-node01 ~]# vim /etc/sysctl.conf
    vm.overcommit_memory=1
    [root@redis-node01 ~]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
    [root@redis-node01 ~]# vim /etc/rc.local
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
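
The three kernel settings above can be verified before Redis is started. The following is a minimal sketch and not part of the original procedure: `tuning_ok` is a hypothetical helper that takes the current values as arguments, and the threshold of 512 simply mirrors the value written above.

```shell
#!/usr/bin/env bash
# Report whether the three kernel settings match what the preparation step sets.
# Arguments: somaxconn value, overcommit_memory value, content of the THP "enabled" file.
tuning_ok() {
    local somaxconn="$1" overcommit="$2" thp="$3"
    local rc=0
    if [ "$somaxconn" -lt 512 ]; then
        echo "somaxconn is $somaxconn, expected at least 512"
        rc=1
    fi
    if [ "$overcommit" != "1" ]; then
        echo "vm.overcommit_memory is $overcommit, expected 1"
        rc=1
    fi
    # The kernel marks the active THP mode with brackets, e.g. "always madvise [never]".
    case "$thp" in
        *"[never]"*) ;;
        *) echo "transparent_hugepage is not set to never: $thp"; rc=1 ;;
    esac
    return "$rc"
}

# Usage on a real host (same paths as in the steps above):
# tuning_ok "$(cat /proc/sys/net/core/somaxconn)" \
#           "$(cat /proc/sys/vm/overcommit_memory)" \
#           "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
```

Passing the values in as arguments keeps the check itself free of side effects, so it can be tried with sample values on any machine.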
     
    Download, compile and install Redis
    [root@redis-node01 ~]# mkdir -p /data/software/
    [root@redis-node01 ~]# cd /data/software/
    [root@redis-node01 software]# wget http://download.redis.io/releases/redis-4.0.6.tar.gz
    [root@redis-node01 software]# tar -zvxf redis-4.0.6.tar.gz
    [root@redis-node01 software]# mv redis-4.0.6 /data/
    [root@redis-node01 software]# cd /data/redis-4.0.6/
    [root@redis-node01 redis-4.0.6]# make
         
    -------------------------------------------------------------------------------
    Create and configure the nodes on each machine
    Node 1 configuration
    [root@redis-node01 ~]# mkdir /data/redis-4.0.6/redis-cluster
    [root@redis-node01 ~]# cd /data/redis-4.0.6/redis-cluster
    [root@redis-node01 redis-cluster]# mkdir 7000 7001
    [root@redis-node01 redis-cluster]# mkdir /var/log/redis
    [root@redis-node01 redis-cluster]# vim 7000/redis.conf
    port 7000
    bind 172.16.60.207
    daemonize yes
    pidfile /var/run/redis_7000.pid
    logfile /var/log/redis/redis_7000.log
    cluster-enabled yes
    cluster-config-file nodes_7000.conf
    cluster-node-timeout 10100
    appendonly yes
         
    [root@redis-node01 redis-cluster]# vim 7001/redis.conf
    port 7001
    bind 172.16.60.207
    daemonize yes
    pidfile /var/run/redis_7001.pid
    logfile /var/log/redis/redis_7001.log
    cluster-enabled yes
    cluster-config-file nodes_7001.conf
    cluster-node-timeout 10100
    appendonly yes
         
    Node 2 configuration
    [root@redis-node02 ~]# mkdir /data/redis-4.0.6/redis-cluster
    [root@redis-node02 ~]# cd /data/redis-4.0.6/redis-cluster
    [root@redis-node02 redis-cluster]# mkdir 7002 7003
    [root@redis-node02 redis-cluster]# mkdir /var/log/redis
    [root@redis-node02 redis-cluster]# vim 7002/redis.conf
    port 7002
    bind 172.16.60.208
    daemonize yes
    pidfile /var/run/redis_7002.pid
    logfile /var/log/redis/redis_7002.log
    cluster-enabled yes
    cluster-config-file nodes_7002.conf
    cluster-node-timeout 10100
    appendonly yes
         
    [root@redis-node02 redis-cluster]# vim 7003/redis.conf
    port 7003
    bind 172.16.60.208
    daemonize yes
    pidfile /var/run/redis_7003.pid
    logfile /var/log/redis/redis_7003.log
    cluster-enabled yes
    cluster-config-file nodes_7003.conf
    cluster-node-timeout 10100
    appendonly yes
         
    Node 3 configuration
    [root@redis-node03 ~]# mkdir /data/redis-4.0.6/redis-cluster
    [root@redis-node03 ~]# cd /data/redis-4.0.6/redis-cluster
    [root@redis-node03 redis-cluster]# mkdir 7004 7005
    [root@redis-node03 redis-cluster]# mkdir /var/log/redis
    [root@redis-node03 redis-cluster]# vim 7004/redis.conf
    port 7004
    bind 172.16.60.209
    daemonize yes
    pidfile /var/run/redis_7004.pid
    logfile /var/log/redis/redis_7004.log
    cluster-enabled yes
    cluster-config-file nodes_7004.conf
    cluster-node-timeout 10100
    appendonly yes
         
    [root@redis-node03 redis-cluster]# vim 7005/redis.conf
    port 7005
    bind 172.16.60.209
    daemonize yes
    pidfile /var/run/redis_7005.pid
    logfile /var/log/redis/redis_7005.log
    cluster-enabled yes
    cluster-config-file nodes_7005.conf
    cluster-node-timeout 10100
    appendonly yes
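
The six configuration files above differ only in the port and the bind address, so they can be generated instead of edited by hand. This is a sketch under that assumption; `gen_node_conf` and `write_node_confs` are hypothetical helper names, and the settings are copied from the files shown above.

```shell
#!/usr/bin/env bash
# Print the redis.conf for one cluster node, using the same settings as the files above.
gen_node_conf() {
    local ip="$1" port="$2"
    cat <<EOF
port ${port}
bind ${ip}
daemonize yes
pidfile /var/run/redis_${port}.pid
logfile /var/log/redis/redis_${port}.log
cluster-enabled yes
cluster-config-file nodes_${port}.conf
cluster-node-timeout 10100
appendonly yes
EOF
}

# write_node_confs BASE_DIR IP PORT...
# Create BASE_DIR/PORT/redis.conf for every port given.
write_node_confs() {
    local base="$1" ip="$2"
    shift 2
    local port
    for port in "$@"; do
        mkdir -p "${base}/${port}"
        gen_node_conf "$ip" "$port" > "${base}/${port}/redis.conf"
    done
}

# Example for node 2 (commented out so that nothing is written by accident):
# write_node_confs /data/redis-4.0.6/redis-cluster 172.16.60.208 7002 7003
```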
         
    -------------------------------------------------------------------------------
    Start the Redis services on each machine (always start them from /data/redis-4.0.6/redis-cluster, so that nodes_*.conf and related files are created in that directory)
    Node 1
    [root@redis-node01 redis-cluster]# for((i=0;i<=1;i++)); do /data/redis-4.0.6/src/redis-server /data/redis-4.0.6/redis-cluster/700$i/redis.conf; done
    [root@redis-node01 redis-cluster]# ps -ef|grep redis
    root      1103     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.207:7000 [cluster]              
    root      1105     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.207:7001 [cluster]              
    root      1315 32360  0 16:16 pts/1    00:00:00 grep redis
         
    Node 2
    [root@redis-node02 redis-cluster]# for((i=2;i<=3;i++)); do /data/redis-4.0.6/src/redis-server /data/redis-4.0.6/redis-cluster/700$i/redis.conf; done
    [root@redis-node02 redis-cluster]# ps -ef|grep redis
    root      9446     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.208:7002 [cluster]              
    root      9448     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.208:7003 [cluster]              
    root      9644  8540  0 16:17 pts/0    00:00:00 grep redis
         
    Node 3
    [root@redis-node03 redis-cluster]# for((i=4;i<=5;i++)); do /data/redis-4.0.6/src/redis-server /data/redis-4.0.6/redis-cluster/700$i/redis.conf; done
    [root@redis-node03 ~]# ps -ef|grep redis
    root      9486     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.209:7004 [cluster]              
    root      9488     1  0 15:19 ?        00:00:03 /data/redis-4.0.6/src/redis-server 172.16.60.209:7005 [cluster]              
    root      9686  9555  0 16:17 pts/0    00:00:00 grep redis
         
    -------------------------------------------------------------------------------
    Next, install Ruby on node 1 (installing it on just one of the nodes is enough)
    [root@redis-node01 ~]# yum -y install ruby ruby-devel rubygems rpm-build
    [root@redis-node01 ~]# gem install redis
         
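    Note:
    On CentOS 6.x the "gem install redis" step above may fail, and there are several pitfalls.
    The Ruby version installed by yum by default is 1.8.7, which is too old; it has to be upgraded to Ruby 2.2 or later, otherwise the installation above fails.
         
    First install rvm (alternatively, download the certificate directly from https://pan.baidu.com/s/1slTyJ7n, password: 7uan; after downloading and unpacking it, "curl -L get.rvm.io | bash -s stable" can be run directly)
    [root@redis-node01 ~]# curl -L get.rvm.io | bash -s stable          //may fail; if so, follow the prompt and run the next step
    [root@redis-node01 ~]# curl -sSL https://rvm.io/mpapis.asc | gpg2 --import -      //then run again: curl -L get.rvm.io | bash -s stable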
    [root@redis-node01 ~]# find / -name rvm.sh
    /etc/profile.d/rvm.sh
    [root@redis-node01 ~]# source /etc/profile.d/rvm.sh
    [root@redis-node01 ~]# rvm requirements
           
    Then upgrade Ruby to 2.3
    [root@redis-node01 ~]# rvm install ruby 2.3.1
    [root@redis-node01 ~]# ruby -v
    ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
           
    List all Ruby versions
    [root@redis-node01 ~]# rvm list
           
    Set the default version
    [root@redis-node01 ~]# rvm --default use 2.3.1
           
    Update the gem source
    [root@redis-node01 ~]# gem sources --add https://gems.ruby-china.org/ --remove https://rubygems.org
    https://gems.ruby-china.org/ added to sources
    source https://rubygems.org not present in cache
           
    [root@redis-node01 ~]# gem sources
    *** CURRENT SOURCES ***
           
    https://rubygems.org/
    https://gems.ruby-china.org/
           
    Now the installation succeeds
    [root@redis-node01 ~]# gem install redis
    Successfully installed redis-4.0.6
    Parsing documentation for redis-4.0.6
    Done installing documentation for redis after 1 seconds
    1 gem installed
         
    -------------------------------------------------------------------------------
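    Next, create the Redis Cluster (run this on the node 1 machine only)
         
    First, manually designate the three master nodes. The masters should preferably be spread across the three machines
    [root@redis-node01 ~]# /data/redis-4.0.6/src/redis-trib.rb create  172.16.60.207:7000 172.16.60.208:7002  172.16.60.209:7004
         
    Then manually assign a slave to each of the three masters above. The slaves should also preferably be spread across the three machines, each on a different machine from its master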
    [root@redis-node01 ~]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.208:7003  172.16.60.207:7000
    [root@redis-node01 ~]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.209:7005  172.16.60.208:7002
    [root@redis-node01 ~]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.207:7001  172.16.60.209:7004
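
The three add-node commands above follow a ring layout: the slave that runs on the next machine is paired with the master on the current one. The sketch below only prints the commands for that layout; `pair_commands` is a hypothetical helper, and the port scheme (master on 7000+2i, slave on 7001+2i for machine i) is taken from the environment table. As far as I know, redis-trib.rb attaches a new slave to the master with the fewest replicas unless `--master-id` is given, so treat the pairing as the intended layout and confirm it with `check` afterwards.

```shell
#!/usr/bin/env bash
TRIB=/data/redis-4.0.6/src/redis-trib.rb   # path used in this walkthrough

# pair_commands HOST...
# Host number i (counting from 0) runs a master on port 7000+2i and a slave on 7001+2i.
# The slave on the next host (wrapping around to the first) is paired with the master on host i.
pair_commands() {
    local first="$1" i=0 master next next_i
    while [ "$#" -gt 0 ]; do
        master="$1"
        shift
        if [ "$#" -gt 0 ]; then
            next="$1"
            next_i=$(( i + 1 ))
        else
            next="$first"
            next_i=0
        fi
        echo "$TRIB add-node --slave ${next}:$(( 7001 + 2 * next_i )) ${master}:$(( 7000 + 2 * i ))"
        i=$(( i + 1 ))
    done
}

# Print (not run) the commands for the three original machines.
pair_commands 172.16.60.207 172.16.60.208 172.16.60.209
```

Because the ports are identical on the new machines, calling the function with 172.16.60.202, 172.16.60.204 and 172.16.60.205 prints the same layout for the migration target.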
         
    Then check the state of the Redis Cluster
    [root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.207:7000
    >>> Performing Cluster Check (using node 172.16.60.207:7000)
    M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
       slots:0-5460 (5461 slots) master
       1 additional replica(s)
    S: e7592314869c29375599d781721ad76675645c4c 172.16.60.209:7005
       slots: (0 slots) slave
       replicates 0060012d749167d3f72833d916e53b3445b66c62
    S: 52b8d27838244657d9b01a233578f24d287979fe 172.16.60.208:7003
       slots: (0 slots) slave
       replicates 971d05cd7b9bb3634ad024e6aac3dff158c52eee
    S: 213bde6296c36b5f31b958c7730ff1629125a204 172.16.60.207:7001
       slots: (0 slots) slave
       replicates e936d5b4c95b6cae57f994e95805aef87ea4a7a5
    M: e936d5b4c95b6cae57f994e95805aef87ea4a7a5 172.16.60.209:7004
       slots:10923-16383 (5461 slots) master
       1 additional replica(s)
    M: 0060012d749167d3f72833d916e53b3445b66c62 172.16.60.208:7002
       slots:5461-10922 (5462 slots) master
       1 additional replica(s)
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
       
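    As the output above shows, only master nodes own slots; every slave has 0 slots, which means the keys live on the master nodes.
    The three masters split the 16384 slots between them: 0-5460, 5461-10922 and 10923-16383.
    If a master and its slave both go down, the 16384 slots are no longer fully covered and, with the default cluster-require-full-coverage yes, the whole cluster stops serving requests; it only recovers once that master-slave pair is back.
    A newly added master has 0 slots by default and needs a reshard to be given hash slots (the tool asks how many hash slots to move to the node); this is covered later.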
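
The slot counts quoted above follow directly from the range bounds, and the same arithmetic is needed later to answer the "How many slots do you want to move" prompt. A small sketch, with `slot_count` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Count the slots covered by one or more ranges such as "0-5460" (a bare number counts as one slot).
slot_count() {
    local total=0 range first last
    for range in "$@"; do
        first="${range%-*}"   # text before the dash
        last="${range#*-}"    # text after the dash
        total=$(( total + last - first + 1 ))
    done
    echo "$total"
}

slot_count 0-5460                         # slots of the first master
slot_count 0-5460 5461-10922 10923-16383  # full coverage of the cluster
```

A total other than 16384 across all masters would mean the slot space is not fully covered.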
      
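    Write a few test records
    Log in to the three masters and write data (when writing while logged in to a slave, the client is likewise redirected automatically to the owning master)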
    [root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.207 -c -p 7000
    172.16.60.207:7000> set test1 test-207
    OK
    172.16.60.207:7000> set test11 test-207-207
    -> Redirected to slot [13313] located at 172.16.60.209:7004
    OK
      
    [root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.208 -c -p 7002
    172.16.60.208:7002> set test2 test-208
    OK
    172.16.60.208:7002> set test22 test-208-208
    -> Redirected to slot [4401] located at 172.16.60.207:7000
    OK
      
    [root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.209 -c -p 7004
    172.16.60.209:7004> set test3 test-209
    OK
    172.16.60.209:7004> set test33 test-209-209
    OK
      
    Read the data back
    [root@redis-node01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.207 -c -p 7000
    172.16.60.207:7000> get test1
    "test-207"
    172.16.60.207:7000> get test11
    -> Redirected to slot [13313] located at 172.16.60.209:7004
    "test-207-207"
    172.16.60.209:7004> get test2
    -> Redirected to slot [8899] located at 172.16.60.208:7002
    "test-208"
    172.16.60.208:7002> get test22
    -> Redirected to slot [4401] located at 172.16.60.207:7000
    "test-208-208"
    172.16.60.207:7000> get test3
    -> Redirected to slot [13026] located at 172.16.60.209:7004
    "test-209"
    172.16.60.209:7004> get test33
    "test-209-209"
    172.16.60.209:7004> 

    3. Online migration

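    The Redis installation steps on the three new machines are omitted; they are the same as above.
    The node configuration on the three new machines is identical to that of the three old machines; only the IP address changes. Paths and ports stay the same.
    Start the Redis node services on the three new machines.
    Install Ruby on the new node redis-new01; the steps are omitted because they are the same as above.
     
    Add all the nodes on the three new machines to the existing cluster.
    =====================
    Add the master nodes first
    Command format: "redis-trib.rb add-node <new node> <existing cluster node>"
    The first argument is the IP and port of the new master; the second is the IP and port of any existing master in the cluster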
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.202:7000 172.16.60.207:7000
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.204:7002 172.16.60.207:7000
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node 172.16.60.205:7004 172.16.60.207:7000
     
    =====================
    Then add the slave nodes on the new machines
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.204:7003  172.16.60.202:7000
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.205:7005  172.16.60.204:7002
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb add-node --slave 172.16.60.202:7001  172.16.60.205:7004
     
    Check the cluster state at this point
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000
     
    Check the hash slot distribution of the cluster
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb info 172.16.60.202:7000
    172.16.60.202:7000 (a0169bec...) -> 0 keys | 0 slots | 1 slaves.
    172.16.60.209:7004 (47cde5c7...) -> 3 keys | 5461 slots | 1 slaves.
    172.16.60.208:7002 (656fc84a...) -> 1 keys | 5462 slots | 1 slaves.
    172.16.60.205:7004 (48cbab90...) -> 0 keys | 0 slots | 1 slaves.
    172.16.60.207:7000 (a8fe2d6e...) -> 2 keys | 5461 slots | 1 slaves.
    172.16.60.204:7002 (c6a78cfb...) -> 0 keys | 0 slots | 1 slaves.
    [OK] 6 keys in 6 masters.
    0.00 keys per slot on average.
     
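    A newly added master has 0 slots by default, and a master without slots is never chosen for reads or writes.
    Data is stored only on master nodes.
    So the newly added masters must be given slots, i.e. a reshard operation.
     
    According to the slot information shown after the last new master was added, the existing masters hold the following slots:
    172.16.60.207:7000   -->  slots:0-5460 (5461 slots) master
    172.16.60.208:7002   -->  slots:5461-10922 (5462 slots) master
    172.16.60.209:7004   -->  slots:10923-16383 (5461 slots) master
     
    Now start assigning slots to the three newly added masters
    a) Move all the slots (5461) of 172.16.60.207:7000 to 172.16.60.202:7000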
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
    ........
    How many slots do you want to move (from 1 to 16384)? 5461          #number of slots to move (here, all the slots of 172.16.60.207:7000)
    What is the receiving node ID? a0169becd97ccca732d905fd762b4d615674f7bd      #the node that receives those slots; enter the node ID of 172.16.60.202:7000
    Please enter all the source node IDs.
      Type 'all' to use all the nodes as source nodes for the hash slots.
      Type 'done' once you entered all the source nodes IDs.
    Source node #1:971d05cd7b9bb3634ad024e6aac3dff158c52eee          #the node the slots are taken from; enter the ID of 172.16.60.207:7000. Entering all takes the given number of slots from all existing masters.
    Source node #2:done                       #enter done
    .......
    Do you want to proceed with the proposed reshard plan (yes/no)? yes     #enter yes to confirm the plan
     
    ==================================================================
    A problem may come up here: the resharding aborts midway, leaving slots on both sides.
    Moving slot 4396 from 172.16.60.207:7000 to 172.16.60.202:7000:
    Moving slot 4397 from 172.16.60.207:7000 to 172.16.60.202:7000:
    Moving slot 4398 from 172.16.60.207:7000 to 172.16.60.202:7000:
    Moving slot 4399 from 172.16.60.207:7000 to 172.16.60.202:7000:
    Moving slot 4400 from 172.16.60.207:7000 to 172.16.60.202:7000:
    Moving slot 4401 from 172.16.60.207:7000 to 172.16.60.202:7000:
    [ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)
     
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000  
    >>> Performing Cluster Check (using node 172.16.60.202:7000)
    M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
       slots:0-4400 (4401 slots) master
       1 additional replica(s)
    .......
    M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
       slots:4401-5460 (1060 slots) master
       1 additional replica(s)
     
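    Cause analysis:
    The error reported while resharding is: Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY)
    Yet migrating slots that contain no key-value pairs succeeds, so the problem depends on whether the slot holds any keys.
     
    Tracing the reshard code path shows that the actual migration is done by the move_slot function (in redis-trib.rb).
    Open move_slot and locate the migration code.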
    [root@redis-new01 redis-cluster]# cp /data/redis-4.0.6/src/redis-trib.rb /tmp/
    [root@redis-new01 redis-cluster]# cat /data/redis-4.0.6/src/redis-trib.rb|grep source.r.client.call
                    source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
                        source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
     
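    The source.r.client.call lines found by grep above are where redis-trib.rb issues the command that migrates a slot containing keys.
     
    In effect, the call reaches the server as
    "client migrate target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys]"
     
    So how does the server execute this command?
    It first enters clientCommand(client *c) in networking.c
     
    which compares the arguments one by one (a chain of if statements). This is where the bug shows up: clientCommand has no migrate branch.
    So it returns    Syntax error, try CLIENT (LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY);
    the message says that CLIENT only has the LIST|KILL|GETNAME|SETNAME|PAUSE|REPLY branches.
     
    So how should the script be changed to actually migrate slots that contain keys?
     
    Reading the source, cluster.c contains migrateCommand(client *c), so it is enough to change the migration statements in redis-trib.rb to: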
      source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
      source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
     
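    That is, the call no longer goes through clientCommand but runs migrateCommand directly. Note that the first patched line also passes "replace", which the original first call did not.
     
    In other words, change the original lines in redis-trib.rb
                    source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:keys,*keys])
                        source.r.client.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
    to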
                    source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
                        source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
     
    and the problem is solved.
     
    [root@redis-new01 redis-cluster]# cat /data/redis-4.0.6/src/redis-trib.rb |grep  source.r.call
                    source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,"replace",:keys,*keys])
                        source.r.call(["migrate",target.info[:host],target.info[:port],"",0,@timeout,:replace,:keys,*keys])
     
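    This bug is caused by a difference between versions of the Ruby redis gem. From Redis 5.0 on, redis-trib.rb is dropped and the cluster is managed directly with the redis-cli client.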
    ==================================================================
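
The call-site rename described above can also be applied with a sed substitution instead of editing the file by hand. This is only a sketch: `patch_trib` and the output path are hypothetical names, the patched copy is written to a separate file so that the original script stays untouched, and the substitution only renames `source.r.client.call` to `source.r.call` on the MIGRATE lines. It does not add the extra "replace" argument that the first patched line above carries.

```shell
#!/usr/bin/env bash
# Write a patched copy of redis-trib.rb in which the MIGRATE call sites
# use source.r.call instead of source.r.client.call.
patch_trib() {
    local src="$1" dst="$2"
    sed 's/source\.r\.client\.call(\["migrate"/source.r.call(["migrate"/' "$src" > "$dst"
}

# Example (commented out; review the diff before replacing the real script):
# patch_trib /data/redis-4.0.6/src/redis-trib.rb /tmp/redis-trib.patched.rb
# diff /data/redis-4.0.6/src/redis-trib.rb /tmp/redis-trib.patched.rb
```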
     
    With redis-trib.rb patched, continue moving the remaining slots of 172.16.60.207:7000 to 172.16.60.202:7000
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
    ........
    >>> Check for open slots...
    [WARNING] Node 172.16.60.202:7000 has slots in importing state (4401).
    [WARNING] Node 172.16.60.207:7000 has slots in migrating state (4401).
    [WARNING] The following slots are open: 4401
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
    *** Please fix your cluster problems before resharding
     
    Fix:
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -c -p 7000
    172.16.60.202:7000> cluster setslot 4401 stable
    OK
    172.16.60.202:7000>
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.207 -c -p 7000
    172.16.60.207:7000> cluster setslot 4401 stable
    OK
    172.16.60.207:7000>
     
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb fix 172.16.60.202:7000 
    .......
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
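
When several slots are left open, the slot numbers can be pulled out of the `check` output rather than read by eye. A sketch, assuming the warning line has the form shown above ("The following slots are open: ...") with multiple slots separated by commas; `open_slots` is a hypothetical helper that reads the check output on stdin.

```shell
#!/usr/bin/env bash
# Print one open slot number per line, taken from redis-trib.rb check output on stdin.
open_slots() {
    awk -F'The following slots are open: ' '
        NF > 1 {
            n = split($2, s, /[ ,]+/)
            for (i = 1; i <= n; i++) if (s[i] != "") print s[i]
        }'
}

# Usage against a live cluster (commented out), printing the commands to run on both nodes:
# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000 | open_slots |
#     while read -r slot; do echo "cluster setslot $slot stable"; done
```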
     
     
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.202:7000
    ......
    How many slots do you want to move (from 1 to 16384)? 1060
    What is the receiving node ID? a0169becd97ccca732d905fd762b4d615674f7bd     
    Please enter all the source node IDs.
      Type 'all' to use all the nodes as source nodes for the hash slots.
      Type 'done' once you entered all the source nodes IDs.
    Source node #1:971d05cd7b9bb3634ad024e6aac3dff158c52eee         
    Source node #2:done                      
    .......
    Do you want to proceed with the proposed reshard plan (yes/no)? yes    
     
    Then check the cluster state again.
    All 5461 slots of 172.16.60.207:7000 have now moved to 172.16.60.202:7000.
     [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000     
    >>> Performing Cluster Check (using node 172.16.60.202:7000)
    M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
       slots:0-5460 (5461 slots) master
       2 additional replica(s)
    ........
    M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
       slots: (0 slots) master
       0 additional replica(s)
     
    b) Move all the slots (5462) of 172.16.60.208:7002 to 172.16.60.204:7002
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.204:7002
    .......
    How many slots do you want to move (from 1 to 16384)? 5462
    What is the receiving node ID? c6a78cfbb77804c4837963b5f589064b6111457a
    Please enter all the source node IDs.
      Type 'all' to use all the nodes as source nodes for the hash slots.
      Type 'done' once you entered all the source nodes IDs.
    Source node #1:0060012d749167d3f72833d916e53b3445b66c62
    Source node #2:done
    .......
    Do you want to proceed with the proposed reshard plan (yes/no)? yes
     
    c) Move all the slots (5461) of 172.16.60.209:7004 to 172.16.60.205:7004
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.205:7004
    .........
    How many slots do you want to move (from 1 to 16384)? 5461
    What is the receiving node ID? 48cbab906141dd26241ccdbc38bee406586a8d03
    Please enter all the source node IDs.
      Type 'all' to use all the nodes as source nodes for the hash slots.
      Type 'done' once you entered all the source nodes IDs.
    Source node #1:e936d5b4c95b6cae57f994e95805aef87ea4a7a5
    Source node #2:done
    .........
    Do you want to proceed with the proposed reshard plan (yes/no)? yes
     
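    Once all three new masters have received their hash slots, check the cluster state again
    The three pre-migration masters now have 0 slots; their slots have moved to the corresponding three masters on the new machines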
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000
    >>> Performing Cluster Check (using node 172.16.60.202:7000)
    M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
       slots:0-5460 (5461 slots) master
       2 additional replica(s)
    S: d9671ca6b4235931a2a215cc327a400ad4f9a399 172.16.60.205:7005
       slots: (0 slots) slave
       replicates c6a78cfbb77804c4837963b5f589064b6111457a
    M: e936d5b4c95b6cae57f994e95805aef87ea4a7a5 172.16.60.209:7004
       slots: (0 slots) master
       0 additional replica(s)
    S: 213bde6296c36b5f31b958c7730ff1629125a204 172.16.60.207:7001
       slots: (0 slots) slave
       replicates 48cbab906141dd26241ccdbc38bee406586a8d03
    M: 0060012d749167d3f72833d916e53b3445b66c62 172.16.60.208:7002
       slots: (0 slots) master
       0 additional replica(s)
    S: 52b8d27838244657d9b01a233578f24d287979fe 172.16.60.208:7003
       slots: (0 slots) slave
       replicates a0169becd97ccca732d905fd762b4d615674f7bd
    M: 48cbab906141dd26241ccdbc38bee406586a8d03 172.16.60.205:7004
       slots:10923-16383 (5461 slots) master
       2 additional replica(s)
    S: e7592314869c29375599d781721ad76675645c4c 172.16.60.209:7005
       slots: (0 slots) slave
       replicates c6a78cfbb77804c4837963b5f589064b6111457a
    S: 2950f2cb6d960cd48e792f7c82d62d2cd07d20f9 172.16.60.204:7003
       slots: (0 slots) slave
       replicates a0169becd97ccca732d905fd762b4d615674f7bd
    M: 971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000
       slots: (0 slots) master
       0 additional replica(s)
    M: c6a78cfbb77804c4837963b5f589064b6111457a 172.16.60.204:7002
       slots:5461-10922 (5462 slots) master
       2 additional replica(s)
    S: 6e663a1bcc3d241ed4d1a9667a0cc92fbe554740 172.16.60.202:7001
       slots: (0 slots) slave
       replicates 48cbab906141dd26241ccdbc38bee406586a8d03
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
     
    Check the slot distribution of the cluster
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb info 172.16.60.202:7000
    172.16.60.202:7000 (a0169bec...) -> 2 keys | 5461 slots | 2 slaves.
    172.16.60.209:7004 (47cde5c7...) -> 0 keys | 0 slots | 0 slaves.
    172.16.60.208:7002 (656fc84a...) -> 0 keys | 0 slots | 0 slaves.
    172.16.60.205:7004 (48cbab90...) -> 3 keys | 5461 slots | 2 slaves.
    172.16.60.207:7000 (a8fe2d6e...) -> 0 keys | 0 slots | 0 slaves.
    172.16.60.204:7002 (c6a78cfb...) -> 1 keys | 5462 slots | 2 slaves.
    [OK] 6 keys in 6 masters.
    0.00 keys per slot on average.
     
    Check the data: the test data has also moved to the new masters
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -c -p 7000
    172.16.60.202:7000> get test1
    "test-207"
    172.16.60.202:7000> get test2
    -> Redirected to slot [8899] located at 172.16.60.204:7002
    "test-208"
    172.16.60.204:7002> get test3
    -> Redirected to slot [13026] located at 172.16.60.205:7004
    "test-209"
    172.16.60.205:7004> get test11
    "test-207-207"
    172.16.60.205:7004> get test22
    -> Redirected to slot [4401] located at 172.16.60.202:7000
    "test-208-208"
    172.16.60.202:7000> get test33
    -> Redirected to slot [12833] located at 172.16.60.205:7004
    "test-209-209"
    172.16.60.205:7004>

    Besides the interactive procedure above, a reshard can also be run directly with the following command:

    # redis-trib.rb reshard --from <node-id> --to <node-id> --slots <number of slots> --yes <host>:<port>
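
As an illustration of that command form, the sketch below assembles the non-interactive command for step b) from the node IDs shown earlier. It only prints the command; `reshard_cmd` is a hypothetical helper, and the flags are the ones listed in the format line above.

```shell
#!/usr/bin/env bash
TRIB=/data/redis-4.0.6/src/redis-trib.rb   # path used in this walkthrough

# reshard_cmd FROM_ID TO_ID SLOTS ENTRY_HOST:PORT
# Print (not run) one non-interactive reshard command.
reshard_cmd() {
    local from_id="$1" to_id="$2" slots="$3" entry="$4"
    echo "$TRIB reshard --from $from_id --to $to_id --slots $slots --yes $entry"
}

# Step b): all 5462 slots of 172.16.60.208:7002 go to 172.16.60.204:7002.
reshard_cmd 0060012d749167d3f72833d916e53b3445b66c62 \
            c6a78cfbb77804c4837963b5f589064b6111457a \
            5462 172.16.60.204:7002
```

Printing first and running the line only after review keeps a mistyped node ID from moving slots in the wrong direction.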

    4. After the migration, remove the original nodes from the cluster

    a) Remove the pre-migration slave nodes from the cluster
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.207:7001 213bde6296c36b5f31b958c7730ff1629125a204
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.208:7003 52b8d27838244657d9b01a233578f24d287979fe
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.209:7005 e7592314869c29375599d781721ad76675645c4c
       
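    b) Remove the pre-migration master nodes from the cluster.
    Note the following when deleting a master node:
    -  If the master has slave nodes, move them to another master or delete them beforehand.
    -  If the master still holds slots, reassign those slots first, then delete the master.
      
    A master must hold 0 slots when it is deleted, i.e. it must be empty! Otherwise the whole Redis Cluster may stop working!
    If the master to be removed is not empty, first use the reshard command to move its data to other nodes.
    Another way to remove a master is to perform a manual failover, wait until its slave has been elected as the new master and the old master has rejoined the cluster as a slave, and then remove it.
    Obviously this does not help if the goal is to reduce the number of masters in the cluster. In that case you still need to reshard its data away before removing it.
      
    The slots of the three original masters have already been moved away completely (each now holds 0 slots), and their slaves were deleted above,
    so the three original masters can now be removed from the cluster directly.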
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.207:7000 971d05cd7b9bb3634ad024e6aac3dff158c52eee
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.208:7002 0060012d749167d3f72833d916e53b3445b66c62
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb del-node 172.16.60.209:7004 e936d5b4c95b6cae57f994e95805aef87ea4a7a5
       
    Finally, check the state of the new Redis Cluster once more
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb check 172.16.60.202:7000                                         
    >>> Performing Cluster Check (using node 172.16.60.202:7000)
    M: a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000
       slots:0-5460 (5461 slots) master
       1 additional replica(s)
    S: d9671ca6b4235931a2a215cc327a400ad4f9a399 172.16.60.205:7005
       slots: (0 slots) slave
       replicates c6a78cfbb77804c4837963b5f589064b6111457a
    M: 48cbab906141dd26241ccdbc38bee406586a8d03 172.16.60.205:7004
       slots:10923-16383 (5461 slots) master
       1 additional replica(s)
    S: 2950f2cb6d960cd48e792f7c82d62d2cd07d20f9 172.16.60.204:7003
       slots: (0 slots) slave
       replicates a0169becd97ccca732d905fd762b4d615674f7bd
    M: c6a78cfbb77804c4837963b5f589064b6111457a 172.16.60.204:7002
       slots:5461-10922 (5462 slots) master
       1 additional replica(s)
    S: 6e663a1bcc3d241ed4d1a9667a0cc92fbe554740 172.16.60.202:7001
       slots: (0 slots) slave
       replicates 48cbab906141dd26241ccdbc38bee406586a8d03
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
       
       
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb info 172.16.60.202:7000                                        
    172.16.60.202:7000 (a0169bec...) -> 2 keys | 5461 slots | 1 slaves.
    172.16.60.205:7004 (48cbab90...) -> 3 keys | 5461 slots | 1 slaves.
    172.16.60.204:7002 (c6a78cfb...) -> 1 keys | 5462 slots | 1 slaves.
    [OK] 6 keys in 3 masters.
    0.00 keys per slot on average.
       
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-cli -h 172.16.60.202 -c -p 7000
    172.16.60.202:7000> get test1
    "test-207"
    172.16.60.202:7000> get test11
    -> Redirected to slot [13313] located at 172.16.60.205:7004
    "test-207-207"
    172.16.60.205:7004> get test2
    -> Redirected to slot [8899] located at 172.16.60.204:7002
    "test-208"
    172.16.60.204:7002> get test22
    -> Redirected to slot [4401] located at 172.16.60.202:7000
    "test-208-208"
    172.16.60.202:7000> get test3
    -> Redirected to slot [13026] located at 172.16.60.205:7004
    "test-209"
    172.16.60.205:7004> get test33
    "test-209-209"
    172.16.60.205:7004>
       
    =====================================================
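    Tip:
    If a master to be deleted still holds slots, all of its slots must be moved away first, i.e. its slot assignment must be cleared!
       
    Suppose master 172.16.60.207:7000 still holds 2550 slots. Those 2550 slots then have to be moved from 172.16.60.207:7000 to 172.16.60.202:7000.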
       
    [root@redis-new01 redis-cluster]# /data/redis-4.0.6/src/redis-trib.rb reshard 172.16.60.207:7000
    .......
    How many slots do you want to move (from 1 to 16384)? 2550               // total number of slots held by the master being removed
    What is the receiving node ID? a0169becd97ccca732d905fd762b4d615674f7bd       // ID of the master that receives the 2550 slots, i.e. 172.16.60.202:7000
    Please enter all the source node IDs.
      Type 'all' to use all the nodes as source nodes for the hash slots.
      Type 'done' once you entered all the source nodes IDs.
    Source node #1:971d05cd7b9bb3634ad024e6aac3dff158c52eee        // ID of the master being removed, i.e. 172.16.60.207:7000
    Source node #2:done                                            // enter done
    .......
    Do you want to proceed with the proposed reshard plan (yes/no)? yes           // confirm the operation
       
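    Once the master's slot assignment has been cleared as above (it holds 0 slots), it can be deleted!
       
    Tips:
    1) After adding a new master, a reshard is also required, this time targeting the new node, i.e. "redis-trib.rb reshard <new node>". This assigns slots!
    2) Before deleting a master that still holds slots, a reshard is also required, this time targeting the node to be deleted, i.e. "redis-trib.rb reshard <node to delete>". This clears its slots!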
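
The two preconditions for deleting a master listed above (no slots left, no slaves attached) can be checked mechanically before running `del-node`. The sketch below assumes the text comes from `redis-cli cluster nodes` in the Redis 4.0 format: eight fixed fields per line, followed by any slot entries. The helper name `is_safe_to_delete` is hypothetical, and the sample lines are illustrative: the node IDs come from this article, but the timestamps and epochs are made up.

```python
# Illustrative sample only: IDs are from this article, the numeric
# fields (ping/pong timestamps, config epochs) are invented.
SAMPLE_OUTPUT = """\
971d05cd7b9bb3634ad024e6aac3dff158c52eee 172.16.60.207:7000@17000 master - 0 1540000000000 1 connected
a0169becd97ccca732d905fd762b4d615674f7bd 172.16.60.202:7000@17000 myself,master - 0 0 7 connected 0-5460
2950f2cb6d960cd48e792f7c82d62d2cd07d20f9 172.16.60.204:7003@17003 slave a0169becd97ccca732d905fd762b4d615674f7bd 0 1540000000000 7 connected
"""


def is_safe_to_delete(cluster_nodes_output: str, node_id: str) -> bool:
    """True if node_id owns no slots and no other node replicates it."""
    found = False
    slot_entries = 0
    replicas = 0
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        # Fields: id, addr, flags, master-id, ping-sent, pong-recv,
        # config-epoch, link-state, then zero or more slot entries.
        if fields[0] == node_id:
            found = True
            slot_entries = len(fields) - 8
        elif fields[3] == node_id:
            replicas += 1
    if not found:
        raise ValueError(f"node {node_id} not found in CLUSTER NODES output")
    return slot_entries == 0 and replicas == 0


if __name__ == "__main__":
    old_master = "971d05cd7b9bb3634ad024e6aac3dff158c52eee"
    print(old_master, is_safe_to_delete(SAMPLE_OUTPUT, old_master))
```

The check counts slot entries rather than individual slots, since any remaining entry (a range, a single slot, or a slot in migration) means the node is not yet empty.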

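    In testing, the application was not affected at any point during the Redis Cluster migration described above. Note, however, that after the migration the Redis connection addresses in the application must be updated to the new Redis hosts and ports.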

    Reposted from

    Redis Cluster高可用集群在线迁移操作记录 (Online Migration Record for a Redis Cluster High-Availability Cluster) - 散尽浮华 - 博客园 https://www.cnblogs.com/kevingrace/p/9844310.html

  • Original URL: https://www.cnblogs.com/paul8339/p/9883998.html