• CentOS 7 MFS + DRBD + keepalived


    Environment:

    CentOS 7.3 + MooseFS 3.0.97 + drbd84-utils-8.9.8-1 + keepalived-1.2.13-9

    How it works:

    Architecture diagram:

    Node info:

    Node     MFS role              Hostname     IP
    node1    master & metalogger   node1        172.16.0.41
    node2    master & metalogger   node2        172.16.0.42
    node3    chunk server          node3        172.16.0.43
    node4    chunk server          node4        172.16.0.44
    node5    chunk server          node5        172.16.0.45
    node6    client                node6        172.16.0.11
    node7    client                node7        172.16.0.12
    node8    client                node8        172.16.0.13
    vip      -                     mfsmaster    172.16.0.47

    Notes:

    1) DRBD is installed on the two MFS master servers to provide a replicated network disk; the MFS master's metadata files are stored on it.
    2) keepalived is installed on both servers, with a single VIP floating between them. keepalived runs a check script to monitor the server state; when one server fails, the VIP automatically moves to the other.
    3) Clients, chunk servers and the metalogger all connect through the VIP, so the service is not affected when one of the master servers goes down.

    Add hosts entries on node1 and node2

    cat /etc/hosts

    172.16.0.41    node1
    172.16.0.42    node2
    172.16.0.43    node3
    172.16.0.44    node4
    172.16.0.45    node5
    172.16.0.11    node6
    172.16.0.12    node7
    172.16.0.13    node8
    172.16.0.47    mfsmaster

    On node3 - node8, add the hosts entry; cat /etc/hosts

    172.16.0.47    mfsmaster

    Install the MooseFS yum repository (all nodes)

    Import the public key:
    curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS
    
    
    To add the MooseFS repository for RHEL-7, SL-7 or CentOS-7:
    curl "http://ppa.moosefs.com/MooseFS-3-el7.repo" > /etc/yum.repos.d/MooseFS.repo
    
    
    
    To add the MooseFS repository for RHEL-6, SL-6 or CentOS-6:
    curl "http://ppa.moosefs.com/MooseFS-3-el6.repo" > /etc/yum.repos.d/MooseFS.repo

    Install the ELRepo repository (node1, node2)

    Get started
    
    Import the public key:
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    
    Detailed info on the GPG key used by the ELRepo Project can be found on https://www.elrepo.org/tiki/key
    If you have a system with Secure Boot enabled, please see the SecureBootKey page for more information.
    
    To install ELRepo for RHEL-7, SL-7 or CentOS-7:
    rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

    To make use of our mirror system, please also install yum-plugin-fastestmirror.

    To install ELRepo for RHEL-6, SL-6 or CentOS-6:
    rpm -Uvh http://www.elrepo.org/elrepo-release-6-8.el6.elrepo.noarch.rpm

    Create the partition (node1, node2)

    Add a dedicated disk of the same size to node1 and node2 for DRBD.

    Use LVM so the volume can easily be extended later.

    [root@mfs-n1 /]# fdisk /dev/vdb 
    Welcome to fdisk (util-linux 2.23.2).
    
    Changes will remain in memory only, until you decide to write them.
    Be careful before using the write command.
    
    
    Command (m for help): p
    
    Disk /dev/vdb: 64.4 GB, 64424509440 bytes, 125829120 sectors
    Units = sectors of 1 * 512 = 512 bytes
    Sector size (logical/physical): 512 bytes / 512 bytes
    I/O size (minimum/optimal): 512 bytes / 512 bytes
    Disk label type: dos
    Disk identifier: 0xdaf38769
    
       Device Boot      Start         End      Blocks   Id  System
    
    Command (m for help): n
    Partition type:
       p   primary (0 primary, 0 extended, 4 free)
       e   extended
    Select (default p): p
    Partition number (1-4, default 1): 
    First sector (2048-125829119, default 2048): 
    Using default value 2048
    Last sector, +sectors or +size{K,M,G} (2048-125829119, default 125829119): 
    Using default value 125829119
    Partition 1 of type Linux and of size 60 GiB is set
    
    Command (m for help): m
    Command action
       a   toggle a bootable flag
       b   edit bsd disklabel
       c   toggle the dos compatibility flag
       d   delete a partition
       g   create a new empty GPT partition table
       G   create an IRIX (SGI) partition table
       l   list known partition types
       m   print this menu
       n   add a new partition
       o   create a new empty DOS partition table
       p   print the partition table
       q   quit without saving changes
       s   create a new empty Sun disklabel
       t   change a partition's system id
       u   change display/entry units
       v   verify the partition table
       w   write table to disk and exit
       x   extra functionality (experts only)
    
    Command (m for help): t
    Selected partition 1
    Hex code (type L to list all codes): L
    
     0  Empty           24  NEC DOS         81  Minix / old Lin bf  Solaris        
     1  FAT12           27  Hidden NTFS Win 82  Linux swap / So c1  DRDOS/sec (FAT-
     2  XENIX root      39  Plan 9          83  Linux           c4  DRDOS/sec (FAT-
     3  XENIX usr       3c  PartitionMagic  84  OS/2 hidden C:  c6  DRDOS/sec (FAT-
     4  FAT16 <32M      40  Venix 80286     85  Linux extended  c7  Syrinx         
     5  Extended        41  PPC PReP Boot   86  NTFS volume set da  Non-FS data    
     6  FAT16           42  SFS             87  NTFS volume set db  CP/M / CTOS / .
     7  HPFS/NTFS/exFAT 4d  QNX4.x          88  Linux plaintext de  Dell Utility   
     8  AIX             4e  QNX4.x 2nd part 8e  Linux LVM       df  BootIt         
     9  AIX bootable    4f  QNX4.x 3rd part 93  Amoeba          e1  DOS access     
     a  OS/2 Boot Manag 50  OnTrack DM      94  Amoeba BBT      e3  DOS R/O        
     b  W95 FAT32       51  OnTrack DM6 Aux 9f  BSD/OS          e4  SpeedStor      
     c  W95 FAT32 (LBA) 52  CP/M            a0  IBM Thinkpad hi eb  BeOS fs        
     e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a5  FreeBSD         ee  GPT            
     f  W95 Ext'd (LBA) 54  OnTrackDM6      a6  OpenBSD         ef  EFI (FAT-12/16/
    10  OPUS            55  EZ-Drive        a7  NeXTSTEP        f0  Linux/PA-RISC b
    11  Hidden FAT12    56  Golden Bow      a8  Darwin UFS      f1  SpeedStor      
    12  Compaq diagnost 5c  Priam Edisk     a9  NetBSD          f4  SpeedStor      
    14  Hidden FAT16 <3 61  SpeedStor       ab  Darwin boot     f2  DOS secondary  
    16  Hidden FAT16    63  GNU HURD or Sys af  HFS / HFS+      fb  VMware VMFS    
    17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fc  VMware VMKCORE 
    18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fd  Linux raid auto
    1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fe  LANstep        
    1c  Hidden W95 FAT3 75  PC/IX           be  Solaris boot    ff  BBT            
    1e  Hidden W95 FAT1 80  Old Minix      
    Hex code (type L to list all codes): 8e       
    Changed type of partition 'Linux' to 'Linux LVM'
    
    Command (m for help): w
    The partition table has been altered!

    Calling ioctl() to re-read partition table.
    Syncing disks.

    pvcreate /dev/vdb1             # create the physical volume

    vgcreate vgdrbr /dev/vdb1      # create the volume group

    vgdisplay                      # inspect the volume group

    vgdisplay 
      --- Volume group ---
      VG Name               vgdrbr
      System ID             
      Format                lvm2
      Metadata Areas        1
      Metadata Sequence No  1
      VG Access             read/write
      VG Status             resizable
      MAX LV                0
      Cur LV                0
      Open LV               0
      Max PV                0
      Cur PV                1
      Act PV                1
      VG Size               <60.00 GiB
      PE Size               4.00 MiB
      Total PE              15359
      Alloc PE / Size       0 / 0   
      Free  PE / Size       15359 / <60.00 GiB
      VG UUID               jgrCsU-CQmQ-l2yz-O375-qFQq-Rho9-r6eY54
       

    lvcreate -l +15359 -n mfs vgdrbr        # create the logical volume

    lvdisplay 
      --- Logical volume ---
      LV Path                /dev/vgdrbr/mfs
      LV Name                mfs
      VG Name                vgdrbr
      LV UUID                3pb6ZJ-aMIu-PVbU-PAID-ozvB-XVpz-hdfK1c
      LV Write Access        read/write
      LV Creation host, time mfs-n1, 2017-11-08 10:14:00 +0800
      LV Status              available
      # open                 2
      LV Size                <60.00 GiB
      Current LE             15359
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     8192
      Block device           253:3

    The volume is mapped as /dev/mapper/vgdrbr-mfs.

    DRBD

    Install DRBD (node1, node2)

    yum -y install drbd84-utils kmod-drbd84  # a newer version is also available: yum install -y drbd90 kmod-drbd90

    Load the DRBD module:

    # modprobe drbd

    Check that the DRBD module is loaded into the kernel:

    # lsmod |grep drbd

    Configure DRBD

    node1:

    cat /etc/drbd.conf 

    # You can find an example in  /usr/share/doc/drbd.../drbd.conf.example
    
    include "drbd.d/global_common.conf";
    include "drbd.d/*.res";

    cat /etc/drbd.d/global_common.conf

    # DRBD is the result of over a decade of development by LINBIT.
    # In case you need professional services for DRBD or have
    # feature requests visit http://www.linbit.com
    
    global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
        # cmd-timeout-short 5; cmd-timeout-medium 121; cmd-timeout-long 600;
    }
    
    common {
        handlers {
            # These are EXAMPLE handlers only.
            # They may have severe implications,
            # like hard resetting the node under certain circumstances.
            # Be careful when chosing your poison.
    
            # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
            # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
            # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
            # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
            # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
    
        startup {
            # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
            wfc-timeout 30;
            degr-wfc-timeout 30;
            outdated-wfc-timeout 30;
        }
    
        options {
            # cpu-mask on-no-data-accessible
        }
    
        disk {
            # size on-io-error fencing disk-barrier disk-flushes
            # disk-drain md-flushes resync-rate resync-after al-extents
            # c-plan-ahead c-delay-target c-fill-target c-max-rate
            # c-min-rate disk-timeout
            on-io-error detach;
        }
    
        net {
            # protocol timeout max-epoch-size max-buffers unplug-watermark
            # connect-int ping-int sndbuf-size rcvbuf-size ko-count
            # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
            # after-sb-1pri after-sb-2pri always-asbp rr-conflict
            # ping-timeout data-integrity-alg tcp-cork on-congestion
            # congestion-fill congestion-extents csums-alg verify-alg
            # use-rle
            protocol C;
            cram-hmac-alg sha1;
            shared-secret "bqrPnf9";
        }
    
    }

    cat /etc/drbd.d/mfs.res

    resource mfs{
            device /dev/drbd0;
            meta-disk internal;
    
            on node1{
                disk /dev/vgdrbr/mfs;
                address 172.16.0.41:9876;
            }
    
            on node2{
                disk /dev/vgdrbr/mfs;
                address 172.16.0.42:9876;
            }
    }

    Copy /etc/drbd.d/* from node1 to node2's /etc/drbd.d/ (see the example below).
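    For example (a minimal sketch, assuming password-less root SSH from node1 to node2):

    scp /etc/drbd.d/global_common.conf /etc/drbd.d/mfs.res node2:/etc/drbd.d/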

    Create the DRBD resource

    drbdadm create-md mfs

    initializing activity log
    NOT initializing bitmap
    Writing meta data...
    New drbd meta data block successfully created.

    Start the DRBD service (node1, node2)

    systemctl enable drbd

    systemctl start drbd

    Check the initial DRBD status:

    drbd-overview

    Make node1 the primary node

    drbdadm primary mfs

    If that reports an error, run:

    drbdadm primary --force mfs    # if it still fails, restart the drbd service and try again

    drbdadm --overwrite-data-of-peer primary all

    At this point drbd-overview or cat /proc/drbd shows that data synchronization has started.

    drbdsetup status mfs --verbose --statistics    # show detailed status

    Format the DRBD device on node1

    mkfs -t xfs /dev/drbd0    # or: mkfs.xfs /dev/drbd0

    Test mounting the device on node1

    mkdir -p /data/drbd

    mount /dev/drbd0 /data/drbd

    On node2, create the same mount directory: mkdir -p /data/drbd

    There is no need to create the DRBD resource or format the device on node2; once synchronization completes, node2's data is identical to node1's, so the resource is already created and formatted there.

    Install MFS master + metalogger

    yum -y install moosefs-master moosefs-cli moosefs-cgi moosefs-cgiserv moosefs-metalogger

    Configure MFS master and metalogger

     /etc/mfs/mfsmaster.cfg

    grep -v "^#" mfsmaster.cfg

    WORKING_USER = mfs

    WORKING_GROUP = mfs

    SYSLOG_IDENT = mfsmaster

    LOCK_MEMORY = 0

    NICE_LEVEL = -19


    DATA_PATH = /data/drbd/mfs

    EXPORTS_FILENAME = /etc/mfs/mfsexports.cfg

    TOPOLOGY_FILENAME = /etc/mfs/mfstopology.cfg

    BACK_LOGS = 50

    BACK_META_KEEP_PREVIOUS = 1

    MATOML_LISTEN_HOST = *

    MATOML_LISTEN_PORT = 9419

    MATOCS_LISTEN_HOST = *

    MATOCS_LISTEN_PORT = 9420

    # authentication between the chunkservers and the master
    AUTH_CODE = mfspassword

    REPLICATIONS_DELAY_INIT = 300


    CHUNKS_LOOP_MAX_CPS = 100000

    CHUNKS_LOOP_MIN_TIME = 300

    CHUNKS_SOFT_DEL_LIMIT = 10

    CHUNKS_HARD_DEL_LIMIT = 25

    CHUNKS_WRITE_REP_LIMIT = 2

    CHUNKS_READ_REP_LIMIT = 10

    MATOCL_LISTEN_HOST = *

    MATOCL_LISTEN_PORT = 9421


    SESSION_SUSTAIN_TIME = 86400

    /etc/mfs/mfsmetalogger.cfg

    grep -v "^#" mfsmetalogger.cfg

    WORKING_USER = mfs
    
    WORKING_GROUP = mfs
    
    SYSLOG_IDENT = mfsmetalogger
    
    LOCK_MEMORY = 0
    
    
    
    NICE_LEVEL = -19
    
    
    DATA_PATH = /var/lib/mfs
    
    BACK_LOGS = 50
    
    BACK_META_KEEP_PREVIOUS = 3
    
    META_DOWNLOAD_FREQ = 24
    
    
    MASTER_RECONNECTION_DELAY = 5
    
    
    MASTER_HOST = mfsmaster
    
    MASTER_PORT = 9419
    
    MASTER_TIMEOUT = 10

    /etc/mfs/mfsexports.cfg  (export permissions)

    grep -v "^#" mfsexports.cfg

    *            /    rw,alldirs,admin,maproot=0:0,password=9WpV9odJ
    * . rw
    The password value is the password MFS clients must supply when connecting.

     

    Sync mfsexports.cfg, mfsmaster.cfg, mfsmetalogger.cfg and mfstopology.cfg from node1's /etc/mfs directory to node2's /etc/mfs directory (see the example below).
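    For example, with rsync (a sketch, assuming root SSH access from node1 to node2):

    rsync -av /etc/mfs/{mfsexports.cfg,mfsmaster.cfg,mfsmetalogger.cfg,mfstopology.cfg} node2:/etc/mfs/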

    Create the metadata storage directory:

    mkdir -p /data/drbd/mfs
    cp /var/lib/mfs/metadata.mfs.empty /data/drbd/mfs/metadata.mfs
    chown -R mfs.mfs /data/drbd/mfs

    Start mfsmaster (do not start it on node2; when node1 fails, the keepalived script starts mfsmaster on node2)

    mfsmaster start

    Start the MFS monitoring (CGI) service

    chmod 755 /usr/share/mfscgi/*.cgi                   # make sure the CGI scripts are executable on both node1 and node2

    mfscgiserv start    # or: systemctl start moosefs-cgiserv

    Open http://172.16.0.41:9425 in a browser.

    Install keepalived (node1, node2)

    yum -y install keepalived

    Configure keepalived

    node1:

    Add the scripts

    Mail notification script:

    cat /etc/keepalived/script/mail_notify.py

    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    
    import smtplib
    from email.mime.text import MIMEText
    from email.header import Header
    import sys, time, subprocess, random
    
    
    
    # third-party SMTP service
    mail_host="smtp.qq.com"  # SMTP server
    

    userinfo_list = [{'user':'user1@qq.com','pass':'pass1'}, {'user':'user2@qq.com','pass':'pass2'}, {'user':'user3@qq.com','pass':'pass3'}]

    
    

    user_inst = userinfo_list[random.randint(0, len(userinfo_list)-1)]
    mail_user=user_inst['user'] # username
    mail_pass=user_inst['pass'] # password


    sender = mail_user # sender address

    
    receivers = ['xx1@qq.com', 'xx2@163.com']  # recipients; set to your own mailbox(es)
    
    
    p = subprocess.Popen('hostname', shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    hostname = p.stdout.readline().split('\n')[0]
    
    message_to = ''
    for i in receivers:
        message_to += i + ';'
    
    def print_help():
        note = '''python script.py role ip vip
        '''
        print(note)
        exit(1)
    
    time_stamp = time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))
    
    if len(sys.argv) != 4:
        print_help()
    elif sys.argv[1] == 'master':
        message_content = '%s server: %s(%s) keepalived change to Master, vIP: %s' %(time_stamp, sys.argv[2], hostname, sys.argv[3])
        subject = '%s keepalived change to Master -- keepalived notify' %(sys.argv[2])
    elif sys.argv[1] == 'backup':
        message_content = '%s server: %s(%s) keepalived change to Backup, vIP: %s' %(time_stamp, sys.argv[2], hostname, sys.argv[3])
        subject = '%s keepalived change to Backup -- keepalived notify' %(sys.argv[2])
    elif sys.argv[1] == 'stop':
        message_content = '%s server: %s(%s) keepalived change to Stop, vIP: %s' %(time_stamp, sys.argv[2], hostname, sys.argv[3])
        subject = '%s keepalived change to Stop -- keepalived notify' %(sys.argv[2])
    else:
        print_help()
    
    message = MIMEText(message_content, 'plain', 'utf-8')
    message['From'] = Header(sender, 'utf-8')
    message['To'] =  Header(message_to, 'utf-8')
    
    message['Subject'] = Header(subject, 'utf-8')
    
    try:
        smtpObj = smtplib.SMTP()
        smtpObj.connect(mail_host, 25)    # 25 is the SMTP port
        smtpObj.login(mail_user,mail_pass)
        smtpObj.sendmail(sender, receivers, message.as_string())
        print("邮件发送成功")
    except smtplib.SMTPException as e:
        print("Error: 无法发送邮件")
        print(e)

    DRBD check script:

    cat /etc/keepalived/script/check_drbd.sh

    #!/bin/bash
    
    # set basic parameter
    drbd_res=mfs
    drbd_mountpoint=/data/drbd
    
    
    status="ok"
    #ret=`ps -C mfsmaster --no-header |wc -l`
    ret=`pidof mfsmaster |wc -l`
    
    if [ $ret -eq 0 ]; then
       status="mfsmaster not running"
       umount $drbd_mountpoint
       drbdadm secondary $drbd_res
       mfscgiserv stop
       /bin/python /etc/keepalived/script/mail_notify.py stop 172.16.0.41 172.16.0.47
       systemctl stop keepalived
    fi
    
    echo $status

    Script run when keepalived becomes master:

    cat /etc/keepalived/script/master.sh

    #!/bin/bash
    
    drbdadm primary mfs
    mount /dev/drbd0 /data/drbd
    mfsmaster start
    mfscgiserv start

     chmod +x /etc/keepalived/script/*.sh

    Copy mail_notify.py from node1 to node2's /etc/keepalived/script directory.

    cat /etc/keepalived/keepalived.conf

    ! Configuration File for keepalived
    
     
    global_defs {
    
        notification_email {
            xx@xx.com
        }
        
        notification_email_from keepalived@xx.com
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        
        router_id node1_mfs_master      # string identifying this node, usually the hostname (but not required to be); used in failure notification emails
    }
    
    vrrp_script check_drbd {
        script "/etc/keepalived/script/check_drbd.sh"
        interval 3     # check every 3 seconds
    #    weight -40      # if failed, decrease the priority by 40
    #    fall   2        # require 2 failures before marking as down
    #    rise   1        # require 1 success before marking as up
    }
    
    # net.ipv4.ip_nonlocal_bind=1
    vrrp_instance VI_MFS {
    
        state BACKUP
        interface ens160
        virtual_router_id 16
        #mcast_src_ip 172.16.0.41
        
        nopreempt    ## when node2's keepalived has become MASTER, a restarted keepalived on node1 will not preempt the MASTER role even though node1 has the higher priority; both instances must use state BACKUP, and nopreempt is set only on the higher-priority node
        priority 100
        advert_int 1
        #debug
    
        authentication {
            auth_type PASS
            auth_pass O7F3CjHVXWP
        }
    
        virtual_ipaddress {
            172.16.0.47
        }
    
        track_script {
            check_drbd
        }
    
    }

     systemctl start keepalived

     systemctl disable keepalived

    systemctl enable moosefs-metalogger;  systemctl start moosefs-metalogger

    node2:

    Script run when keepalived becomes master:

    cat /etc/keepalived/script/master.sh

    #!/bin/bash
    
    # set basic parameter
    drbd_res=mfs
    drbd_driver=/dev/drbd0
    drbd_mountpoint=/data/drbd
    
    
    drbdadm primary $drbd_res
    mount $drbd_driver $drbd_mountpoint
    mfsmaster start
    mfscgiserv start
    
    
    /bin/python /etc/keepalived/script/mail_notify.py master 172.16.0.42 172.16.0.47

    Script run when keepalived becomes backup:

    cat /etc/keepalived/script/backup.sh

    #!/bin/bash
    
    # set basic parameter
    drbd_res=mfs
    drbd_mountpoint=/data/drbd
    
    
    mfsmaster stop
    umount $drbd_mountpoint
    drbdadm secondary $drbd_res
    mfscgiserv stop
    
    
    /bin/python /etc/keepalived/script/mail_notify.py backup 172.16.0.42 172.16.0.47

    cat /etc/keepalived/keepalived.conf

    ! Configuration File for keepalived
    
     
    global_defs {
    
        notification_email {
            xx@xx.com
        }
        
        notification_email_from keepalived@xx.com
        smtp_server 127.0.0.1
        smtp_connect_timeout 30
        
        router_id node2_mfs_backup
    }
    
    
    # net.ipv4.ip_nonlocal_bind=1
    vrrp_instance mfs {
    
        state BACKUP
        interface ens160
        virtual_router_id 16
        #mcast_src_ip 172.16.0.42
        
        priority 80
        advert_int 1
        #debug
    
        authentication {
            auth_type PASS
            auth_pass O7F3CjHVXWP
        }
    
        virtual_ipaddress {
            172.16.0.47
        }
    
        notify_master "/etc/keepalived/script/master.sh"
        notify_backup "/etc/keepalived/script/backup.sh"
    }

     systemctl start keepalived

     systemctl disable keepalived

    systemctl enable moosefs-metalogger;  systemctl start moosefs-metalogger

    Note: do not enable mfsmaster or keepalived to start at boot on node1 and node2.

    Install MFS chunk servers

    yum -y install moosefs-chunkserver

    Configure MFS chunk servers (node3, node4, node5)

    /etc/mfs/mfschunkserver.cfg

    grep -v "^#"  /etc/mfs/mfschunkserver.cfg

    WORKING_USER = mfs

    WORKING_GROUP = mfs

    SYSLOG_IDENT = mfschunkserver

    LOCK_MEMORY = 0

    NICE_LEVEL = -19

    DATA_PATH = /var/lib/mfs

    HDD_CONF_FILENAME = /etc/mfs/mfshdd.cfg

    HDD_TEST_FREQ = 10

    BIND_HOST = *

    MASTER_HOST = mfsmaster

    MASTER_PORT = 9420


    MASTER_TIMEOUT = 60

    MASTER_RECONNECTION_DELAY = 5

    # authentication string (used only when master requires authorization)

    AUTH_CODE = mfspassword


    CSSERV_LISTEN_HOST = *

    CSSERV_LISTEN_PORT = 9422

    /etc/mfs/mfshdd.cfg  specifies the disks/paths used by the chunk server

    mkdir -p /data/mfs; chown -R mfs:mfs /data/mfs

    grep -v "^#" /etc/mfs/mfshdd.cfg

    Path where the chunk server stores its data; a dedicated LVM logical volume is recommended (see the sketch after the config line below).

    /data/mfs
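    A possible way to prepare such a dedicated volume on each chunk server (a sketch; /dev/vdb and the vgchunk/mfs names are assumptions, adjust to the actual disk):

    pvcreate /dev/vdb
    vgcreate vgchunk /dev/vdb
    lvcreate -l 100%FREE -n mfs vgchunk
    mkfs.xfs /dev/vgchunk/mfs
    echo '/dev/vgchunk/mfs  /data/mfs  xfs  defaults,noatime  0 0' >> /etc/fstab
    mount /data/mfs
    chown -R mfs:mfs /data/mfs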

    systemctl enable moosefs-chunkserver;  systemctl start moosefs-chunkserver

    Manual start command: mfschunkserver start

    Check on the MFS monitoring page whether the chunk server has connected to the MFS master successfully.

    MFS client

    Install the MFS client

    yum -y install moosefs-client fuse

    Mount the MFS directories automatically when the client reboots

    Shell> vi /etc/rc.local  
    /sbin/modprobe fuse  
    /usr/bin/mfsmount /mnt1 -H mfsmaster -S /backup/db  
    /usr/bin/mfsmount /mnt2 -H mfsmaster -S /app/image

    mfsmount -H mfsmaster /mnt3    # mount the MFS root onto /mnt3
    mfsmount -H <host> -P <port> -p <mountpoint>                           # -p prompts for the authentication password
    mfsmount -H <host> -P <port> -o mfspassword=<PASSWORD> <mountpoint>

    Via /etc/fstab (recommended)

    If mfsmaster is referenced by hostname, the client must be able to resolve it, e.g. via an /etc/hosts entry.

    Shell> vi /etc/fstab  
     
    mfsmount /mnt fuse mfsmaster=MASTER_IP,mfsport=9421,_netdev 0 0    (mounts the MFS root directory after reboot)
    mfsmount /mnt2 fuse mfsmaster=MASTER_IP,mfsport=9421,mfssubfolder=/subdir,_netdev 0 0    (mounts an MFS subdirectory after reboot)

    mfsmount /data/upload    fuse    mfsmaster=mfsmaster,mfsport=9421,mfssubfolder=/pro1,mfspassword=9WpV9odJ,_netdev 0 0    (with password authentication)



    ## _netdev: mount only once the network is available, to avoid mount failures at boot

    With the fstab approach, the following command tests that the configuration is correct and mounts the mfsmount entries from fstab:

    mount -a -t fuse 

    To unmount (the current working directory must not be inside the mount point /mnt):

    umount /mnt

    Check the mounts:

    df -h -T

    Appendix:

    MFS master failover test

    node1:

    Stop the mfsmaster service

    mfsmaster stop

    Check whether node2 has taken over the VIP.

    df -h -T          # check the disk mounts

    cat /proc/drbd    # check the DRBD status

    drbd-overview     # DRBD details

    Check whether the alert email was received.

    If node2's keepalived is now master and node1 should become master again, first make sure the DRBD state on node1 and node2 is fully synchronized (check with drbd-overview).

    Once they are in sync, run /etc/keepalived/script/backup.sh on node2 and then stop its keepalived service;

    on node1, run /etc/keepalived/script/master.sh and then start its keepalived service;

    finally, start the keepalived service on node2 again. A condensed sketch follows.
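    A condensed version of the switch-back (a sketch; it assumes DRBD is already fully synchronized):

    # on node2 (current master)
    /etc/keepalived/script/backup.sh
    systemctl stop keepalived

    # on node1
    /etc/keepalived/script/master.sh
    systemctl start keepalived

    # on node2, once node1 holds the VIP again
    systemctl start keepalived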

    Recovering data with the metalogger

    The metalogger node is not mandatory, but it can be used to recover the master when it crashes, which makes it very useful.
    If the master cannot start, run mfsmetarestore -a to repair it; if that fails, copy the backup changelogs from the metalogger host to the master and recover from those.

    Checking the DRBD status

    cat /proc/drbd

    version: 8.4.9-1 (api:1/proto:86-101)
    GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by akemi@Build64R7, 2016-12-04 01:08:48
    0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:4865024 dw:4865024 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:58046556
    [>...................] sync'ed: 7.8% (56684/61436)M
    finish: 0:23:35 speed: 41,008 (33,784) want: 41,000 K/sec



    Line 1: the DRBD version.
    Line 2: build information.
    Line 3 is the important one:
    a leading "0:" refers to device /dev/drbd0
    a leading "1:" would refer to device /dev/data-1-1
    "cs" (connection state): Connected means the peers are connected
    "ro" (roles): Primary/Secondary, one primary and one secondary, which is the normal state
    "ds" (disk states): UpToDate/UpToDate
    "ns" (network send): volume of net data sent to the partner via the network connection

    "nr" (network receive): volume of net data received from the partner
    "dw" (disk write): data written to the local disk
    "dr" (disk read): data read from the local disk

    MFS installation (reference)

     Add the appropriate key to package manager:

    curl "http://ppa.moosefs.com/RPM-GPG-KEY-MooseFS" > /etc/pki/rpm-gpg/RPM-GPG-KEY-MooseFS

    Next you need to add the repository entry (MooseFS 3.0):

    • For EL7 family:
      # curl "http://ppa.moosefs.com/MooseFS-3-el7.repo" > /etc/yum.repos.d/MooseFS.repo
    • For EL6 family:
      # curl "http://ppa.moosefs.com/MooseFS-3-el6.repo" > /etc/yum.repos.d/MooseFS.repo

    For MooseFS 2.0, use:

    • For EL7 family:
      # curl "http://ppa.moosefs.com/MooseFS-2-el7.repo" > /etc/yum.repos.d/MooseFS.repo
    • For EL6 family:
      # curl "http://ppa.moosefs.com/MooseFS-2-el6.repo" > /etc/yum.repos.d/MooseFS.repo

    After those operations it should be possible to install the packages with the following commands:

    • For Master Server:
      # yum install moosefs-master moosefs-cli moosefs-cgi moosefs-cgiserv
    • For Chunkservers:
      # yum install moosefs-chunkserver
    • For Metaloggers:
      # yum install moosefs-metalogger
    • For Clients:
      # yum install moosefs-client

      If you want MooseFS to be mounted automatically when system starts, first of all install File System in Userspace (FUSE) utilities:
      # yum install fuse
      and then add one of the following entries to your /etc/fstab:

      "classic" entry (works with all MooseFS 3.0 and 2.0 verisons):
      mfsmount                /mnt/mfs    fuse       defaults    0 0

      or "NFS-like" entry (works with MooseFS 3.0.75+):
      mfsmaster.host.name:    /mnt/mfs    moosefs    defaults    0 0

    Running the system

      • To start process manually:
        # mfsmaster start
        # mfschunkserver start
      • For systemd OS family - EL7:
        # systemctl start moosefs-master.service
        # systemctl start moosefs-chunkserver.service
      • For SysV OS family - EL6:
        # service moosefs-master start 
        # service moosefs-chunkserver start

     

    Fixing moosefs-master.service

    The default moosefs-master.service unit times out and fails to start; the fix is to comment out the PIDFile=/var/lib/mfs/.mfsmaster.lock line.

    cat /usr/lib/systemd/system/moosefs-master.service

    [Unit]
    Description=MooseFS Master server
    Wants=network-online.target
    After=network.target network-online.target
    
    [Service]
    Type=forking
    ExecStart=/usr/sbin/mfsmaster start
    ExecStop=/usr/sbin/mfsmaster stop
    ExecReload=/usr/sbin/mfsmaster reload
    #PIDFile=/var/lib/mfs/.mfsmaster.lock
    TimeoutStopSec=60
    TimeoutStartSec=60
    Restart=no
    
    [Install]
    WantedBy=multi-user.target

    drbdadm create-md mfs reports: 'mfs' not defined in your config (for this host)

    The cause is that the hostname does not match the one defined in /etc/drbd.d/mfs.res; make them match (see the sketch below).
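    A quick way to check and fix this (a sketch; node1 is the name used in mfs.res above):

    uname -n                          # must match one of the "on <name>" entries in /etc/drbd.d/mfs.res
    hostnamectl set-hostname node1    # rename the host if it does not match
    drbdadm create-md mfs             # retry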

    Recommendation when multiple applications share MFS

    When multiple applications need to use MFS, first mount the MFS root (/) on one client and create a directory for each application; each application's client then mounts only its own subdirectory (see the sketch below).
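    A minimal sketch of this layout (the app1/app2 directory names and mount points are illustrative only):

    # on one administrative client: mount the MFS root and create per-application directories
    mkdir -p /mnt/mfsroot
    mfsmount /mnt/mfsroot -H mfsmaster
    mkdir -p /mnt/mfsroot/app1 /mnt/mfsroot/app2
    umount /mnt/mfsroot

    # on each application client: mount only that application's subdirectory
    mkdir -p /data/app1
    mfsmount /data/app1 -H mfsmaster -S /app1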

    Using the MFS file system

    Clients manage the MFS file system with the tools shipped with MFS; they are introduced below.

    /usr/local/mfs/bin/mfstools -h
    mfs multi tool
    
    usage:
            mfstools create - create symlinks (mfs<toolname> -> /usr/local/mfs/bin/mfstools)
    
    tools:
            mfsgetgoal                               // get the goal (number of copies)
            mfssetgoal                               // set the goal (number of copies)
            mfsgettrashtime                          // get the trash time
            mfssettrashtime                          // set the trash time
            mfscheckfile                             // check a file
            mfsfileinfo                              // file information
            mfsappendchunks                            
            mfsdirinfo                                // directory information
            mfsfilerepair                             // repair a file
            mfsmakesnapshot                           // snapshot
            mfsgeteattr                               // get extended attributes
            mfsseteattr
            mfsdeleattr
    
    deprecated tools:                                  // recursive variants
            mfsrgetgoal = mfsgetgoal -r
            mfsrsetgoal = mfssetgoal -r
            mfsrgettrashtime = mfsgettrashtime -r
            mfsrsettrashtime = mfssettrashtime -r
    

    Mounting the file system

    A MooseFS file system is mounted with the following command:

    mfsmount mountpoint [-d][-f] [-s][-m] [-n][-p] [-H MASTER][-P PORT] [-S PATH][-o OPT[,OPT...]]
    -H MASTER: IP address of the master server.
    -P PORT: port of the master server; it must match the MATOCL_LISTEN_PORT variable in mfsmaster.cfg. If the master server uses the default port, it can be omitted.
    -S PATH: subdirectory of the MFS tree to mount; the default is /, i.e. the whole MFS tree is mounted.
    mountpoint: the previously created directory on which MFS is mounted.

    When mfsmount is started with the -m or -o mfsmeta option, an auxiliary file system called MFSMETA is mounted instead. Its purpose is to recover files that were accidentally deleted from the MooseFS volume, or files that were moved aside to free disk space and whose trash retention time has already expired. For example:

    /usr/local/mfs/bin/mfsmount -m /MFS_meta/ -H 172.16.18.137
    

    Setting the number of copies (goal)

    The goal is the number of copies of a file that are kept. After setting it, the goal can be verified with the mfsgetgoal command and changed with mfssetgoal (or its recursive variant mfsrsetgoal).

    mfssetgoal 3 /MFS_data/test/
    mfsgetgoal /MFS_data/test/
    

    The same operations can be applied recursively to a whole directory tree with mfsgetgoal -r and mfssetgoal -r, which is equivalent to the mfsrsetgoal command. The actual number of copies can be verified with mfscheckfile and mfsfileinfo.

    Note the following special cases:

    • A zero-length file with no data returns an empty result from mfscheckfile even if a non-zero goal is set; once the file is filled with content, copies are created according to the goal; if the file is then emptied again, its copies remain as empty files.
    • When the goal of an existing file is changed, the number of copies is increased or reduced accordingly, but this takes some time; it can be verified with mfscheckfile.
    • A goal set on a directory is inherited by newly created files and subdirectories, but does not change the number of copies of files and directories that already exist.

    A summary of a whole directory tree can be obtained with mfsdirinfo.

    Trash

    The time a deleted file stays in the "trash" is its quarantine time. It can be checked with the mfsgettrashtime command and set with mfssettrashtime.

    mfssettrashtime 64800 /MFS_data/test/test1 
    mfsgettrashtime /MFS_data/test/test1
    

    The unit is seconds (useful values: 1 hour is 3600 seconds, 24 hours (1 day) is 86400 seconds, 1 week is 604800 seconds). As with the number of copies, a trash time set on a directory is inherited by newly created files and subdirectories. A value of 0 means a deleted file is removed immediately and permanently, with no possibility of recovery.

    Deleted files can be accessed through a separately mounted MFSMETA file system. In particular it contains the directories /trash (holding information about deleted files that can still be restored) and /trash/undel (used to recover them). Only the administrator (uid 0, normally root) has access to MFSMETA.

    /usr/local/mfs/bin/mfsmount -m /MFS_meta/ -H 172.16.18.137
    

    The names of deleted files are still visible in the trash directory. Each name consists of an eight-digit hexadecimal i-node number and the deleted file's path, with '|' used instead of '/' between path components. If such a name exceeds the operating system limit (usually 255 characters), its beginning is truncated. A deleted file can still be read and written under this full-path name.

    Moving such a file into the trash/undel subdirectory restores it to its original location in the MooseFS file system (provided the path has not changed). If a new file with the same name already exists at that path, the restore fails. An example follows.
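    A hedged example of restoring a deleted file (the i-node prefix 0000ABCD and the name dir|file.txt are made up; look up the real entry under the trash directory first):

    mkdir -p /MFS_meta
    mfsmount -m /MFS_meta -H mfsmaster                                    # mount the MFSMETA file system
    find /MFS_meta/trash -name '*file.txt*'                               # find the deleted file's entry
    mv '/MFS_meta/trash/0000ABCD|dir|file.txt' /MFS_meta/trash/undel/    # restore it to its original path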

    Deleting a file from the trash frees the space it previously occupied (the deletion is delayed, because the data is removed asynchronously).

    MFSMETA also contains a reserved directory, which holds files that were deleted while still open. Once the users close these open files, they disappear from reserved and their data is deleted immediately. Files in reserved are named the same way as those in trash, but no other operations on them are possible.

    Snapshots

    Another MooseFS feature is taking a snapshot of a file or directory tree with the mfsmakesnapshot tool:

    mfsmakesnapshot source ... destination
    

    mfsmakesnapshot copies one file or a set of files in a single operation; subsequent modifications of the source files do not affect the snapshot, i.e. writing to a source file will not change the copy (and vice versa).

    mfsappendchunks can also be used:

    mfsappendchunks destination-file source-file ...
    

    When several source files are given, their snapshots are appended to the same destination file (the data is appended in units of chunks).

    MFS cluster maintenance

    Starting the MFS cluster

    The safe way to start a MooseFS cluster (avoiding read or write errors, inconsistent data and similar problems) is the following sequence:

    1. Start the mfsmaster process.
    2. Start all mfschunkserver processes.
    3. Start the mfsmetalogger process (if a metalogger is configured).
    4. Once all chunkservers have connected to the MooseFS master, any number of clients can mount the exported file system with mfsmount. (Whether all chunkservers are connected can be checked in the master's log or in the CGI monitor.)

    Stopping the MFS cluster

    To stop a MooseFS cluster safely:

    1. Unmount the MooseFS file system on all clients (with umount or an equivalent command).
    2. Stop the chunkserver processes with mfschunkserver stop.
    3. Stop the metalogger process with mfsmetalogger stop.
    4. Stop the master process with mfsmaster stop.

    Chunkserver maintenance

    If every file has a goal of at least 2 and there are no under-goal files (this can be checked with mfsgetgoal -r and mfsdirinfo), then a single chunkserver can be stopped or restarted at any time. Whenever another chunkserver needs to be stopped or restarted later, first make sure the previous one has reconnected and that there are no under-goal chunks. A check sketch follows.
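    Before stopping a chunkserver, a check along these lines can be run from any client (the mount point /mnt/mfs is an assumption):

    mfsgetgoal -r /mnt/mfs      # recursive summary of goal settings; look for files with goal < 2
    mfsdirinfo /mnt/mfs         # summary (files, chunks, sizes) for the whole tree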

    MFS metadata backup

    The metadata normally consists of two parts:

    • The main metadata file metadata.mfs, which is named metadata.mfs.back while mfsmaster is running.
    • The metadata changelogs changelog.*.mfs, which store the file changes of the past N hours (N is set by the BACK_LOGS parameter in the mfsmaster.cfg configuration file).

    The main metadata file needs to be backed up regularly; how often depends on how many hours of changelogs are kept. The metadata changelogs are replicated automatically in real time; since version 1.6 this work is done by the metalogger. A backup sketch follows.
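    A simple periodic off-host copy of the metadata could look like this (a sketch; backuphost and /backup/mfs-meta are assumptions, and the metadata lives under /data/drbd/mfs in this setup):

    # /etc/cron.d/mfs-meta-backup: copy the metadata and changelogs off the master every hour
    0 * * * * root rsync -a /data/drbd/mfs/metadata.mfs.back /data/drbd/mfs/changelog.*.mfs backuphost:/backup/mfs-meta/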

    Recovering the MFS master

    If mfsmaster crashes (for example because of a host or power failure), the last metadata changelog must be merged into the main metadata file. This is done with the mfsmetarestore tool; the simplest form is:

    mfsmetarestore -a
    

    If the master data is stored somewhere other than the location compiled into MooseFS, specify the path with the -d option, e.g.:

    mfsmetarestore -a -d /opt/mfsmaster
    

    Recovering the master from the metalogger

    If mfsmetarestore -a cannot repair the metadata, recovery from the metalogger may fail as well; that situation has not been encountered here and is not covered further.

    1. Retrieve the metadata.mfs.back file, either from a backup or from the metalogger host (if the metalogger service was running), and place it in the master's data directory, normally {prefix}/var/mfs.
    2. Copy the latest metadata files from any server that was running the metalogger service before the master went down, and place them in the mfsmaster data directory.
    3. Merge the metadata changelogs with the mfsmetarestore command, either in automatic mode (mfsmetarestore -a) or manually:
    mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog_ml.*.mfs
    

    Alternatively, force the creation of metadata.mfs from metadata.mfs.back; the master can then be started, but the amount of lost data cannot be determined.

    Automated Failover

    When MooseFS is used in production, the master node must be made highly available. ucarp is a fairly mature option, as is DRBD + [heartbeat|keepalived]. ucarp is similar to keepalived: it detects cluster state through health checks between the active and standby servers and acts accordingly. The commercial MooseFS edition also supports a dual-master configuration, which removes the single point of failure.

    Upgrading moosefs-chunkserver

    systemctl stop moosefs-chunkserver

    yum -y update  moosefs-chunkserver

    systemctl start moosefs-chunkserver   # fails to start

    mfschunkserver -u
    open files limit has been set to: 16384
    working directory: /var/lib/mfs
    config: using default value for option 'FILE_UMASK' - '23'
    lockfile created and locked
    config: using default value for option 'LIMIT_GLIBC_MALLOC_ARENAS' - '4'
    setting glibc malloc arena max to 4
    setting glibc malloc arena test to 4
    config: using default value for option 'DISABLE_OOM_KILLER' - '1'
    initializing mfschunkserver modules ...
    config: using default value for option 'HDD_LEAVE_SPACE_DEFAULT' - '256MiB'
    hdd space manager: data folder '/data/mfs/' already locked (used by another process)
    hdd space manager: no hdd space defined in /etc/mfs/mfshdd.cfg file
    init: hdd space manager failed !!!
    error occurred during initialization - exiting

    ps -ef |grep mfs
    mfs 51699 1 1 2017 ? 1-15:27:28 /usr/sbin/mfschunkserver start

    Kill that process.

    Then delete the lock file: rm -f /var/lib/mfs/.mfschunkserver.lock
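    Putting the recovery together (a sketch; 51699 is the PID shown in the ps output above):

    kill 51699                                   # stop the leftover mfschunkserver process
    rm -f /var/lib/mfs/.mfschunkserver.lock      # remove the stale lock file
    systemctl start moosefs-chunkserver          # start the upgraded chunkserver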
