• Configuring a SQL Server 2019 AG on Azure CentOS VMs


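    Preface

    • This article assumes basic knowledge of Azure and SQL Server HA
    • It also assumes basic knowledge of the Azure CLI
    • The goal is to create an availability group with three replicas on Azure Linux VMs, and to configure a listener and fencing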

    Environment

    • SQL Server 2019 Developer on Linux
    • Azure VM Fencing agent
    • Azure CLI for part of the configuration
    • Three CentOS 7.7 Azure VMs named SQL19N1, SQL19N2, and SQL19N3, all in the same VNet

    Steps

    • Create a resource group and an availability set for the VMs

    # Create the resource group in China East 2
    az group create --name SQL-DEMO-RG --location chinaeast2
    
    # Create the availability set for the VMs, with 2 fault domains and 2 update domains
    az vm availability-set create \
        --resource-group SQL-DEMO-RG \
        --name AGLinux-AvailabilitySet \
        --platform-fault-domain-count 2 \
        --platform-update-domain-count 2
    
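    • Deploy the three VMs from a template

    The first time you create a VM, a template is generated. Download and save it, then change the parameter values to create further VMs with a similar configuration. The main VM settings are:

    • Use the availability set created above
    • Use the same subnet
    • Use a Standard SKU IP
    • Configure an SSH public key

    The template and parameter files are too long to show here; you can download them from the Azure Portal.

    # Below is the configuration for SQL19N2; after editing the parameter file, the same commands can be used to create SQL19N3
    templateFile="./templateFile"
    paramFile="./vmParams-sql19n2.json"
    # Validate the template and parameters first
    az deployment group validate --name sql19n2vm \
         -g SQL-DEMO-RG --template-file $templateFile --parameters $paramFile
    # validate only checks the deployment; create actually deploys the VM
    az deployment group create --name sql19n2vm \
         -g SQL-DEMO-RG --template-file $templateFile --parameters $paramFile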
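
Since the same template is reused for every VM, the per-node commands can be generated in a loop. The following is a minimal sketch, not part of the original procedure: the helper name deploy_cmds is hypothetical, and it assumes each node has a parameter file named ./vmParams-<node in lowercase>.json (only the SQL19N2 file name appears in this article). It prints the commands instead of running them, so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "az deployment group create" command per node.
# Assumption: parameter files follow the pattern ./vmParams-<lowercase node>.json.
deploy_cmds() {
  local rg="$1"; shift
  local node lower
  for node in "$@"; do
    # Deployment and file names in this article use the lowercase node name
    lower="$(printf '%s' "$node" | tr '[:upper:]' '[:lower:]')"
    printf 'az deployment group create --name %svm -g %s --template-file ./templateFile --parameters ./vmParams-%s.json\n' \
      "$lower" "$rg" "$lower"
  done
}
```

For example, deploy_cmds SQL-DEMO-RG SQL19N2 SQL19N3 prints two commands that can be pasted into a shell once the parameter files exist.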
    
    
    • Configure a static private IP and a public DNS label for each VM

    All three VMs need this change; the following shows the configuration for one of them.

    # Find the NIC and IP information
    az network nic list -g SQL-DEMO-RG --query "[].{nicName:name,configuration:ipConfigurations[].{ipName:name,ip:privateIpAddress,method:privateIpAllocationMethod}}" -o yaml
    
    # Change privateIpAllocationMethod to Static
    az network nic ip-config update -g SQL-DEMO-RG --nic-name sql19n1152 --name ipconfig1 --set privateIpAllocationMethod=Static
    
    # Find the public IP names
    az network public-ip list -g SQL-DEMO-RG --query "[].name" -o tsv
    
    # Set the DNS name of the public IP; only digits and lowercase letters are allowed
    az network public-ip update -g SQL-DEMO-RG -n SQL19N1ip851 --dns-name sql19n1
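
The DNS label rule stated above (digits and lowercase letters only) is easy to check before calling az. This is a small sketch with a hypothetical helper name, is_valid_dns_label; it encodes only the rule as stated in this article, not Azure's complete naming rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the label consists solely of
# lowercase ASCII letters and digits (the rule stated in this article).
is_valid_dns_label() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]+$'
}
```

For example, is_valid_dns_label sql19n1 succeeds, while is_valid_dns_label SQL19N1 fails because of the uppercase letters.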
    
    • Install the HA packages

    It is best to update the system packages first, then install the HA software.

    yum update -y
    yum install -y pacemaker pcs fence-agents-all resource-agents fence-agents-azure-arm
    reboot
    
    • Open the firewall ports for the cluster and SQL Server

    # Pacemaker and Corosync ports
    # TCP: Ports 2224,3121,21064,5405
    # UDP: Port 5405
    firewall-cmd --add-port=2224/tcp --permanent
    firewall-cmd --add-port=3121/tcp --permanent
    firewall-cmd --add-port=21064/tcp --permanent
    firewall-cmd --add-port=5405/tcp --permanent
    firewall-cmd --add-port=5405/udp --permanent
    
    # SQL Server port and AG mirroring endpoint port
    # TCP: 1433,5022
    firewall-cmd --add-port=1433/tcp --permanent
    firewall-cmd --add-port=5022/tcp --permanent
    firewall-cmd --reload
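
Keeping the port list in one place makes it harder to skip or duplicate a port when typing the commands by hand. The sketch below generates the firewall-cmd lines from the ports listed in the comments above. The names CLUSTER_PORTS, SQL_PORTS, and firewall_cmds are made up for this example; only the firewall-cmd options already used in this article are emitted.

```shell
#!/usr/bin/env bash
# Ports taken from the comments in this article.
CLUSTER_PORTS="2224/tcp 3121/tcp 21064/tcp 5405/tcp 5405/udp"
SQL_PORTS="1433/tcp 5022/tcp"

# Hypothetical helper: print one firewall-cmd line per port, then a reload.
firewall_cmds() {
  local port
  for port in $CLUSTER_PORTS $SQL_PORTS; do
    echo "firewall-cmd --add-port=${port} --permanent"
  done
  echo "firewall-cmd --reload"
}
```

Review the output of firewall_cmds first; to apply it, pipe it to a root shell, for example firewall_cmds | sudo sh.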
    
    • Add hosts entries

    vi /etc/hosts
    172.17.2.8      SQL19N1
    172.17.2.9      SQL19N2
    172.17.2.10     SQL19N3
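
Editing /etc/hosts by hand on three machines is easy to get wrong, so an idempotent append can help. This is a sketch only; add_host_entry is a hypothetical helper that appends an entry unless the host name is already present on a non-comment line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "ip<TAB>name" to a hosts file unless the
# name already appears on a non-comment line.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  if grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
    return 0   # already present, nothing to do
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

As root, add_host_entry /etc/hosts 172.17.2.8 SQL19N1 can be run repeatedly without creating duplicate lines.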
    
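    • Create the Pacemaker cluster

    # Set the password of the default Pacemaker user, on all three VMs
    passwd hacluster
    
    # Enable pcsd and pacemaker at boot and start pcsd, on all three VMs
    systemctl enable pcsd
    systemctl start pcsd
    systemctl enable pacemaker
    
    # Create the cluster, on the primary node
    sudo pcs cluster auth SQL19N1 SQL19N2 SQL19N3 -u hacluster
    sudo pcs cluster setup --name agcluster SQL19N1 SQL19N2 SQL19N3 --token 30000 --force
    sudo pcs cluster start --all
    sudo pcs cluster enable --all
    # Check the cluster status
    pcs status
    # Set the quorum expected-votes to 3 on all three nodes; a three-node cluster already defaults to 3
    # With 3 expected votes, the cluster keeps quorum as long as a majority (2 votes) is present
    # This change only affects the running cluster; it is not saved as permanent cluster configuration
    pcs quorum expected-votes 3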
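
The relationship between expected votes and quorum is simple arithmetic: quorum is a strict majority of the expected votes. The sketch below works this out, assuming each node carries one vote; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Votes needed for quorum: a strict majority of the expected votes.
quorum_votes() {
  local expected="$1"
  echo $(( expected / 2 + 1 ))
}

# Number of single-vote nodes that can fail while quorum is kept.
tolerated_failures() {
  local expected="$1"
  echo $(( expected - (expected / 2 + 1) ))
}
```

With expected-votes set to 3, quorum_votes 3 prints 2 and tolerated_failures 3 prints 1, so the cluster survives the loss of one node.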
    
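    • Configure a Service Principal in Azure for the fencing agent

    # 1. Create the AAD app and note the appID it returns
    # (the credential flags below are from the Azure CLI version used here; newer CLI versions may no longer accept them)
     az ad app create --display-name sqldemorg-app --identifier-uris http://localhost \
     --password "1qaz@WSX3edc" --end-date '2030-04-27' --credential-description "sql19 ag secret"
     
    # 2. Create the Service Principal for the AAD app
    az ad sp create --id <appID>
    
    # 3. Assign the Service Principal a management role on each VM; repeat for every VM
    # I assigned the Owner role here, which is not a secure practice. A custom role that grants only the minimum permissions should be used instead
    # Custom roles require the Azure subscription to be at the PP1 or PP2 level
    
    az role assignment create --assignee <appID> --role owner \
    --scope /subscriptions/<subscription-ID>/resourceGroups/<resourceGroup-Name>/providers/Microsoft.Compute/virtualMachines/SQL19N1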
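
Because the role assignment must be repeated for every VM, the three commands can be generated from a node list. This is a sketch with a hypothetical helper, role_assignment_cmds. It only prints the commands, reusing the same az role assignment create options shown above, and all IDs are supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one role assignment command per VM.
# Arguments: <appID> <subscription-ID> <resourceGroup-Name> <vm>...
role_assignment_cmds() {
  local app_id="$1" sub_id="$2" rg="$3"
  shift 3
  local vm
  for vm in "$@"; do
    echo "az role assignment create --assignee ${app_id} --role owner --scope /subscriptions/${sub_id}/resourceGroups/${rg}/providers/Microsoft.Compute/virtualMachines/${vm}"
  done
}
```

For example, role_assignment_cmds "<appID>" "<subscription-ID>" SQL-DEMO-RG SQL19N1 SQL19N2 SQL19N3 prints the three commands with the placeholders still in place, ready to be filled in.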
    
    • Create the Azure STONITH device

    I am using Azure China, so cloud=china must be specified; with global Azure this parameter is not needed.
    Run fence_azure_arm -h for more help on this fence agent.

    pcs property set stonith-timeout=900
    pcs stonith create rsc_st_azure fence_azure_arm login="<ApplicationID>" passwd="<servicePrincipalPassword>" resourceGroup="<resourceGroupName>" tenantId="<tenantID>" subscriptionId="<subscriptionId>" power_timeout=240 pcmk_reboot_timeout=900 cloud=china
    
    • Install SQL Server 2019 and tools

    # Install SQL Server 2019 and the HA resource agent
    sudo curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2019.repo
    sudo yum install -y mssql-server
    sudo /opt/mssql/bin/mssql-conf setup
    sudo yum install -y mssql-server-ha
    
    # Install mssql-tools
    sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo
    sudo yum install -y mssql-tools unixODBC-devel
    # Add the mssql-tools directory to PATH for convenience
    echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
    echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
    source ~/.bashrc
    
    # Install mssql-cli
    sudo rpm --import https://packages.microsoft.com/keys/microsoft.asc
    sudo curl -o /etc/yum.repos.d/mssql-cli.repo https://packages.microsoft.com/config/rhel/7/prod.repo
    sudo yum install -y mssql-cli
    
    # Check the SQL Server service status
    systemctl status mssql-server
    

    If you are familiar with PowerShell for SQL Server, consider installing PowerShell and the SQLServer module as well. It makes configuring SQL Server much more convenient.

    yum install powershell -y
    pwsh
    Install-Module SQLServer
    # List the SQL Server cmdlets
    Get-Command -Module SQLServer
    
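    • Configure the AG

    • Create a PowerShell function to make running T-SQL easier later on
    # Open the PowerShell profile file; create it if it does not exist
    vi /root/.config/powershell/Microsoft.PowerShell_profile.ps1
    
    # Add the following function to the profile so it can be called every time pwsh starts
    # The function takes two parameters: $sql is the T-SQL to run, ideally passed as a here-string to avoid escaping problems
    # $servers is the array of target instances; it defaults to the three instances in this environment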
    function run-sql ($sql,$servers=("SQL19N1","SQL19N2","SQL19N3"))
    {
            $secpasswd = "1qaz@WSX"|ConvertTo-SecureString -AsPlainText -Force
            $cred=New-Object System.Management.Automation.PSCredential -ArgumentList 'sa', $secpasswd
            $sql
            "---------"
            foreach($svr in $servers) {"Running T-SQL on $svr..."; Invoke-Sqlcmd -ServerInstance $svr -Credential $cred -Query $sql}
    }
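
For readers who prefer to stay in bash, a rough equivalent of the function above can be built on sqlcmd from mssql-tools. This is a sketch, not the author's method. The name run_sql, the SA_PASSWORD environment variable, and the SQLCMD override (which lets a stub replace the real binary for a dry run) are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical bash counterpart of run-sql.
# Usage: run_sql "<T-SQL>" [server ...]   (defaults to the three instances)
# Assumptions: SA_PASSWORD holds the sa password; SQLCMD may name a
# replacement command, otherwise sqlcmd from mssql-tools is used.
run_sql() {
  local sql="$1"; shift
  local servers=("$@")
  [ "${#servers[@]}" -gt 0 ] || servers=(SQL19N1 SQL19N2 SQL19N3)
  local svr
  for svr in "${servers[@]}"; do
    echo "Running T-SQL on ${svr}..."
    "${SQLCMD:-sqlcmd}" -S "$svr" -U sa -P "${SA_PASSWORD:?set SA_PASSWORD first}" -Q "$sql"
  done
}
```

For example, after export SA_PASSWORD='<sa password>', calling run_sql "SELECT @@SERVERNAME" runs the query on all three instances, and run_sql "SELECT @@SERVERNAME" SQL19N1 targets a single one.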
    

    • Enable the hadr feature, on every instance
    sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
    sudo systemctl restart mssql-server
    
    • Start the AG extended event session
    -- T-SQL, on every instance
    ALTER EVENT SESSION  AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON);
    GO
    
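    • Create a certificate on the primary replica instance; it is used to authenticate communication between the mirroring endpoints. Copy the certificate and private key to the same directory on the other nodes, and grant the mssql user access to them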
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX';
    GO
    CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
    GO
    BACKUP CERTIFICATE dbm_certificate
       TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
       WITH PRIVATE KEY (
               FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
               ENCRYPTION BY PASSWORD = '1qaz@WSX'
           );
    
    # Copy the certificate and private key to the secondary replica hosts SQL19N2 and SQL19N3
    cd /var/opt/mssql/data
    scp dbm_certificate.* root@SQL19N2:/var/opt/mssql/data/
    scp dbm_certificate.* root@SQL19N3:/var/opt/mssql/data/
    
    # On the secondary replica nodes, change the file ownership
    cd /var/opt/mssql/data
    chown mssql:mssql dbm_certificate.*
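
A missing file or wrong ownership is a common reason for the certificate import on a secondary to fail, so a quick check before running the T-SQL can save time. This is a sketch with a hypothetical helper, check_cert_files; it verifies only that both files exist in the given directory and are owned by the expected user (mssql by default).

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that both certificate files exist and are
# owned by the expected user. Usage: check_cert_files <dir> [owner]
check_cert_files() {
  local dir="$1" owner="${2:-mssql}" f actual
  for f in dbm_certificate.cer dbm_certificate.pvk; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      return 1
    fi
    # Third column of "ls -ld" is the owning user
    actual="$(ls -ld "$dir/$f" | awk '{print $3}')"
    if [ "$actual" != "$owner" ]; then
      echo "wrong owner on $dir/$f: $actual (expected $owner)"
      return 1
    fi
  done
  echo "ok"
}
```

On each secondary, check_cert_files /var/opt/mssql/data prints ok when both files are in place and owned by mssql.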
    
    • Create a master key and import the certificate on the secondary replica instances
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '1qaz@WSX';
    GO
    CREATE CERTIFICATE dbm_certificate
        FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
        WITH PRIVATE KEY (
        FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
        DECRYPTION BY PASSWORD = '1qaz@WSX'
                );
    
    • Create the AG mirroring endpoint; make sure the firewall and the NSG allow the port
    CREATE ENDPOINT [Hadr_endpoint]
        AS TCP (LISTENER_PORT = 5022)
        FOR DATABASE_MIRRORING (
    	    ROLE = ALL,
    	    AUTHENTICATION = CERTIFICATE dbm_certificate,
    		ENCRYPTION = REQUIRED ALGORITHM AES
    		);
    GO
    ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
    
    • Create a synchronous-commit AG with three replicas; run on the primary replica instance
    CREATE AVAILABILITY GROUP [ag1]
         WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
         FOR REPLICA ON
             N'SQL19N1' 
     	      	WITH (
      	       ENDPOINT_URL = N'tcp://SQL19N1:5022',
      	       AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
      	       FAILOVER_MODE = EXTERNAL,
      	       SEEDING_MODE = AUTOMATIC,
      	       SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
      	       ),
             N'SQL19N2' 
      	    WITH ( 
      	       ENDPOINT_URL = N'tcp://SQL19N2:5022', 
      	       AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
      	       FAILOVER_MODE = EXTERNAL,
      	       SEEDING_MODE = AUTOMATIC,
      	       SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
      	       ),
      	   N'SQL19N3'
             WITH( 
      	      ENDPOINT_URL = N'tcp://SQL19N3:5022', 
      	      AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
      	      FAILOVER_MODE = EXTERNAL,
      	      SEEDING_MODE = AUTOMATIC,
      	      SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)
      	      );
    GO
    ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
    GO
    
    • Create a SQL login for Pacemaker and grant it permissions, on every instance
    USE [master]
    GO
    CREATE LOGIN [pacemakerLogin] with PASSWORD= N'1qaz@WSX'
    go
    ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin];
    GO
    
    • Save the Pacemaker login credentials to a local file
    echo "pacemakerLogin" >> /var/opt/mssql/secrets/passwd
    echo "1qaz@WSX" >> /var/opt/mssql/secrets/passwd
    
    # Allow only root to read the file
    chown root:root /var/opt/mssql/secrets/passwd
    chmod 400 /var/opt/mssql/secrets/passwd
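
Because the commands above append with >>, running them a second time leaves four lines in the file. The sketch below is a variation, not the author's exact method: it recreates the file on each run so it always holds exactly two lines (login, then password) and is never world-readable. The helper name write_pacemaker_secret is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: (re)create the credentials file with exactly two
# lines and mode 400. Usage: write_pacemaker_secret <file> <login> <password>
write_pacemaker_secret() {
  local file="$1" login="$2" password="$3"
  rm -f "$file"
  # Restrictive umask in a subshell so the file is never group/world readable
  ( umask 077; printf '%s\n%s\n' "$login" "$password" > "$file" )
  # Ownership can only be changed when running as root
  if [ "$(id -u)" -eq 0 ]; then
    chown root:root "$file"
  fi
  chmod 400 "$file"
}
```

As root, write_pacemaker_secret /var/opt/mssql/secrets/passwd pacemakerLogin '<password>' produces the same end state as the commands above and is safe to rerun.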
    
    
    • Join the secondary replicas to the AG; run on each secondary replica
    ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
    GO
    -- Permission required by automatic seeding
    ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE;
    GO
    
    • If you do not want pacemakerLogin to have sysadmin rights, remove it from sysadmin and grant the following permissions instead. On every instance
    ALTER SERVER ROLE [sysadmin] DROP MEMBER [pacemakerLogin]
    GO
    GRANT ALTER, CONTROL, VIEW DEFINITION ON AVAILABILITY GROUP::ag1 TO pacemakerLogin;
    GO
    GRANT VIEW SERVER STATE TO pacemakerLogin;
    GO
    
    • Add a database to the AG; run on the primary replica
    CREATE DATABASE [db1];
    GO
    ALTER DATABASE [db1] SET RECOVERY FULL;
    GO
    BACKUP DATABASE [db1]
       TO DISK = N'nul';
    GO
    ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];
    GO
    
    • Check the availability database status
    SELECT * FROM sys.databases WHERE name = 'db1';
    GO
    SELECT DB_NAME(database_id) AS 'database', synchronization_state_desc FROM sys.dm_hadr_database_replica_states;
    
    • Configure the AG in the Pacemaker cluster


    • Create the AG resource; ag_name must be the name of the AG created earlier
    pcs resource create agcluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=30s master notify=true
    
    • Create the virtual IP resource
    # Disable fencing
    pcs property set stonith-enabled=false
    
    # Create the VIP
    pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=172.17.2.7
    
    
    • Create a colocation constraint: the VIP and the master must run on the same node
    pcs constraint colocation add virtualip agcluster-master INFINITY with-rsc-role=Master
    
    • Create an ordering constraint: the AG resource must be promoted to master before the VIP starts
    pcs constraint order promote agcluster-master then start virtualip
    
    # Show the current constraints
    pcs constraint show --full
    
    • Re-enable STONITH and check the cluster status
    pcs property set stonith-enabled=true
    pcs status
    # Status output from my environment
    ---------------------------------------
    Cluster name: agcluster
    Stack: corosync
    Current DC: SQL19N3 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum
    Last updated: Wed Apr 29 04:24:50 2020
    Last change: Wed Apr 29 04:24:45 2020 by root via cibadmin on SQL19N1
    
    3 nodes configured
    5 resources configured
    
    Online: [ SQL19N1 SQL19N2 SQL19N3 ]
    
    Full list of resources:
    
     rsc_st_azure   (stonith:fence_azure_arm):      Started SQL19N1
     Master/Slave Set: agcluster-master [agcluster]
         Masters: [ SQL19N1 ]
         Slaves: [ SQL19N2 SQL19N3 ]
     virtualip      (ocf::heartbeat:IPaddr2):       Started SQL19N1
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
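
When scripting checks around a failover, it helps to extract the current master from the status output. The sketch below parses the "Masters: [ ... ]" line in the format shown above; current_master is a hypothetical helper that reads pcs status output on stdin, and it assumes this pcs version's output layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "pcs status" output on stdin and print the node
# listed on the "Masters: [ NODE ]" line.
current_master() {
  awk '$1 == "Masters:" { print $3 }'
}
```

Typical use is pcs status | current_master; comparing its output before and after a manual move confirms that the AG actually changed nodes.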
    
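    • Test failover and fencing
    # Manual failover
    pcs resource move agcluster-master SQL19N2 --master
    pcs status
    
    # A manual failover creates a constraint that keeps the AG resource from moving back to the original node
    # Remove it manually if you want the AG to be able to fail back later
    pcs constraint show --full
    pcs constraint remove cli-prefer-agcluster-master
    
    # Try fencing the cluster nodes; test each node
    # The command below only reboots the node; to power the node off, use the --off option
    pcs stonith fence SQL19N3 --debug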
    
  • Original article: https://www.cnblogs.com/Joe-T/p/12803084.html