• Setting Up DolphinScheduler on AWS



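    DolphinScheduler is a popular workflow scheduler that offers a full visual, drag-and-drop interface for building and scheduling workflows. AWS provides two orchestration tools, Airflow and Step Functions, but both offer only limited support for visual operation and do not cover every AWS user scenario. Some users and use cases call for DolphinScheduler, so this article gives detailed steps for setting it up on AWS.

    Goal: create a DolphinScheduler cluster with one MasterServer and one WorkerServer. The cluster integrates directly with an EMR environment, so Hive, Spark, and other jobs can be submitted straight to the backend EMR cluster.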

    1. Planning

    1.1. Node selection

    Two EC2 nodes:

    •  MasterServer: runs MasterServer, ZooKeeper, ApiServer, and AlertServer

    •  WorkerServer: runs WorkerServer

    Both are c5.xlarge (4 vCPU, 8 GB) running Amazon Linux 2 with a 50 GB EBS volume.

    One RDS MySQL 8 instance, db.m6g.large with 20 GB of storage, in the same subnet as the DolphinScheduler nodes.

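    2. Deployment

    Prerequisites

    1. An EMR cluster is ready (tested with EMR 5.30.1)
    2. An RDS MySQL 8 instance is ready

    2.1. Build the Hadoop environment

    Following the references below, configure the instance with the same environment as the target EMR cluster:

    https://laurence.blog.csdn.net/article/details/108529087

    https://github.com/bluishglc/emr-edgenode-maker

    Launch an EC2 instance with the node type described in 1.1. It is recommended to launch it in the same subnet as the EMR cluster and to allow all traffic from the EMR security group in the instance's security group.

    Upload the pem file to the instance:

    scp -i keys/xx.pem keys/xx.pem ec2-user@xxx:

    SSH into the instance and run: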

    sudo yum install -y git

    git clone https://github.com/bluishglc/emr-edgenode-maker.git

    cd emr-edgenode-maker/

    chmod a+x make-emr-edge-node.sh

    # 10.0.0.193 is the private IP of the EMR master node

    sudo ./make-emr-edge-node.sh init /home/ec2-user/xx.pem 10.0.0.193

    # Reboot the instance

    sudo reboot

    # SSH into the instance again

    # Build the Hadoop client

    sudo ./emr-edgenode-maker/make-emr-edge-node.sh make-hadoop-client /home/ec2-user/xx.pem 10.0.0.193

    # Build the Spark client

    sudo ./emr-edgenode-maker/make-emr-edge-node.sh make-spark-client /home/ec2-user/xx.pem 10.0.0.193

    # Build the Hive client

    sudo ./emr-edgenode-maker/make-emr-edge-node.sh make-hive-client /home/ec2-user/xx.pem 10.0.0.193

    # Build the HBase client

    sudo ./emr-edgenode-maker/make-emr-edge-node.sh make-hbase-client /home/ec2-user/xx.pem 10.0.0.193

    # Set up the SSH public key for the hadoop user

    sudo cp ~/.ssh/authorized_keys /home/hadoop/.ssh/authorized_keys

    sudo chown hadoop:hadoop /home/hadoop/.ssh/authorized_keys

    # Test HDFS and Hive

    ssh -i xx.pem hadoop@localhost

    hdfs dfs -ls /

    hive -e "select version()"
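The four client builds above differ only in the sub-command name, so they can be driven from a loop. The following is a minimal sketch, assuming the same script path, key file, and EMR master IP used in this article. The function name `make_emr_clients` and the `DRY_RUN` switch are illustrative and are not part of emr-edgenode-maker.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the make-*-client calls shown above.
# PEM and EMR_MASTER_IP mirror the example values used in this article.
PEM="${PEM:-/home/ec2-user/xx.pem}"
EMR_MASTER_IP="${EMR_MASTER_IP:-10.0.0.193}"
MAKER="${MAKER:-./emr-edgenode-maker/make-emr-edge-node.sh}"

make_emr_clients() {
  local client
  for client in hadoop spark hive hbase; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run (the default): only print the command that would be executed.
      echo "sudo $MAKER make-${client}-client $PEM $EMR_MASTER_IP"
    else
      # Stop at the first client that fails to build.
      sudo "$MAKER" "make-${client}-client" "$PEM" "$EMR_MASTER_IP" || return 1
    fi
  done
}
```

Calling `make_emr_clients` prints the four commands for review; `DRY_RUN=0 make_emr_clients` runs them in order and stops on the first failure.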

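    2.2. Install DolphinScheduler

    Install version 3.0.0-beta-1, the latest release at the time of writing.

    Official download page: https://dolphinscheduler.apache.org/zh-cn/download/download.html

    Official installation guides:

    https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/guide/installation/cluster.html

    https://dolphinscheduler.apache.org/zh-cn/docs/latest/user_doc/guide/installation/pseudo-cluster.html

    Log in to the instance as the hadoop user, then create the deployment user and give it passwordless sudo and the required permissions: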

    # Creating the user requires root privileges

    sudo useradd dolphinscheduler

    # Set a password

    sudo su

    echo "dolphinscheduler" | passwd --stdin dolphinscheduler

    exit

    # Enable passwordless sudo

    sudo sed -i '$adolphinscheduler  ALL=(ALL)  NOPASSWD: ALL' /etc/sudoers

    sudo sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

    # Download the DolphinScheduler binary package

    wget https://dlcdn.apache.org/dolphinscheduler/3.0.0-beta-1/apache-dolphinscheduler-3.0.0-beta-1-bin.tar.gz

    tar -zxvf apache-dolphinscheduler-3.0.0-beta-1-bin.tar.gz

    # Change ownership so that the deployment user can operate on the extracted apache-dolphinscheduler-*-bin directory

    sudo chown -R dolphinscheduler:dolphinscheduler apache-dolphinscheduler-3.0.0-beta-1-bin

    Set up passwordless SSH login on the machine:

    sudo su dolphinscheduler

    cd ~

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    chmod 600 ~/.ssh/authorized_keys

    Note: once this is done, run ssh localhost to check the result. If you can log in over SSH without entering a password, the setup succeeded.
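The manual check can also be scripted so that it fails instead of prompting. This is a minimal sketch. The function name `check_passwordless_ssh` is hypothetical, and it assumes the host key has already been accepted once (for example by the interactive `ssh localhost` run), because `BatchMode` also refuses to answer the host-key question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if key-based login works without any prompt.
check_passwordless_ssh() {
  local host="${1:-localhost}"
  # BatchMode=yes makes ssh fail rather than ask for a password or passphrase.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
    echo "passwordless ssh to $host: OK"
  else
    echo "passwordless ssh to $host: FAILED" >&2
    return 1
  fi
}
```

Run it as the dolphinscheduler user, for example `check_passwordless_ssh localhost`, and later against the WorkerServer address as well, since install.sh needs to reach every node listed in `ips`.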

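    2.3. Configure DolphinScheduler

    Create an AMI from the current instance, then launch a new instance from that AMI.

    The current instance becomes the MasterServer (10.0.0.41) and the new instance becomes the WorkerServer (10.0.0.244).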
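The AMI step can be done in the console or with the AWS CLI. The sketch below uses the CLI and assumes it is installed and has credentials allowed to call EC2. The function name `clone_as_worker`, the AMI name prefix, and all IDs passed to it are placeholders, not values from this setup. Note that `create-image` reboots the source instance by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: image the prepared instance and launch a copy of it.
# Arguments: <source-instance-id> <subnet-id> <security-group-id> <key-pair-name>
clone_as_worker() {
  local instance_id="$1" subnet_id="$2" sg_id="$3" key_name="$4"
  local ami_id

  # Create the AMI and capture its ID.
  ami_id="$(aws ec2 create-image --instance-id "$instance_id" \
      --name "dolphinscheduler-base-$(date +%Y%m%d%H%M%S)" \
      --query ImageId --output text)" || return 1

  # Block until the AMI can be used to launch instances.
  aws ec2 wait image-available --image-ids "$ami_id" || return 1

  # Launch the new instance with the node type from section 1.1 and print its ID.
  aws ec2 run-instances --image-id "$ami_id" --instance-type c5.xlarge \
      --subnet-id "$subnet_id" --security-group-ids "$sg_id" --key-name "$key_name" \
      --query 'Instances[0].InstanceId' --output text
}
```

Using the same subnet and security group as the first instance keeps the new node reachable from both the MasterServer and the EMR cluster.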

    2.3.1. Configure the MasterServer

    Start ZooKeeper:

    sudo zookeeper-server start

    Update the DolphinScheduler configuration. First, edit apache-dolphinscheduler-3.0.0-beta-1-bin/bin/env/install_env.sh:

    sudo vi apache-dolphinscheduler-3.0.0-beta-1-bin/bin/env/install_env.sh

    Changes:

    # A comma separated list of machine hostname or IP would be installed DolphinScheduler,

    # including master, worker, api, alert. If you want to deploy in pseudo-distributed

    # mode, just write a pseudo-distributed hostname

    # Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"

    ips=${ips:-"10.0.0.41,10.0.0.244"}

    # A comma separated list of machine hostname or IP would be installed Master server, it

    # must be a subset of configuration `ips`.

    # Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"

    masters=${masters:-"10.0.0.41"}

    # A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a

    # subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts

    # Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"

    workers=${workers:-"10.0.0.244:default"}

    # A comma separated list of machine hostname or IP would be installed Alert server, it

    # must be a subset of configuration `ips`.

    # Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"

    alertServer=${alertServer:-"10.0.0.41"}

    # A comma separated list of machine hostname or IP would be installed API server, it

    # must be a subset of configuration `ips`.

    # Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"

    apiServers=${apiServers:-"10.0.0.41"}

    # The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.

    # Do not set this configuration same as the current path (pwd)

    installPath=${installPath:-"~/dolphinscheduler"}

    # The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`

    # script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs

    # to be created by this user

    deployUser=${deployUser:-"dolphinscheduler"}
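The comments in install_env.sh require every role list to be a subset of `ips`. A small check can catch a mistyped address before install.sh runs. This is a minimal sketch; `hosts_in_ips` is a hypothetical helper, not something shipped with DolphinScheduler.

```shell
#!/usr/bin/env bash
# Hypothetical check: every host in a role list must also appear in `ips`.
# Entries may carry a ":<workerGroup>" suffix, which is ignored.
hosts_in_ips() {
  local ips="$1" role_hosts="$2" entry host
  local IFS=','
  for entry in $role_hosts; do
    host="${entry%%:*}"          # strip the optional ":<workerGroup>" part
    case ",$ips," in
      *",$host,"*) ;;            # exact match between two commas: host is listed
      *) echo "host $host is not listed in ips" >&2; return 1 ;;
    esac
  done
}
```

After sourcing install_env.sh, a call such as `hosts_in_ips "$ips" "$masters" && hosts_in_ips "$ips" "$workers"` returns non-zero if any role references a host that is missing from `ips`.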

    Edit the dolphinscheduler_env.sh file:

    sudo vi apache-dolphinscheduler-3.0.0-beta-1-bin/bin/env/dolphinscheduler_env.sh

    # Changes

    export HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}

    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

    export SPARK_HOME1=${SPARK_HOME1:-/usr/lib/spark}

    export PYTHON_HOME=${PYTHON_HOME:-/usr/lib64/python3.7}

    export JAVA_HOME=${JAVA_HOME:-/etc/alternatives/jre}

    export HIVE_HOME=${HIVE_HOME:-/usr/lib/hive}

    #export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}

    #export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}

    export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH

    export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}

    export DATABASE=${DATABASE:-mysql}

    export SPRING_PROFILES_ACTIVE=${DATABASE}

    export SPRING_DATASOURCE_DRIVER_CLASS_NAME=com.mysql.cj.jdbc.Driver

    export SPRING_DATASOURCE_URL=jdbc:mysql://xxxx:3306/dolphinscheduler

    export SPRING_DATASOURCE_USERNAME=tang

    export SPRING_DATASOURCE_PASSWORD=password

    export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}

    export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

    export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}

    export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-10.0.0.41:2181}
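Since the paths above depend on what the edge-node scripts installed, it can help to confirm that each directory exists before continuing. This is a minimal sketch; `check_env_dirs` is a hypothetical helper that takes the names of exported variables.

```shell
#!/usr/bin/env bash
# Hypothetical check: report every named variable that does not point to an existing directory.
check_env_dirs() {
  local name dir missing=0
  for name in "$@"; do
    # printenv prints the value of an exported variable, or nothing if it is unset.
    dir="$(printenv "$name" || true)"
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "$name='$dir' is not an existing directory" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

One way to use it is to source dolphinscheduler_env.sh in a shell and then run `check_env_dirs HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME1 PYTHON_HOME JAVA_HOME HIVE_HOME`; any variable listed on stderr needs a corrected path.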

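    2.3.2. Initialize the database

    Use RDS MySQL 8 as the DolphinScheduler database.

    Download the MySQL connector and copy it into tools/libs/ and into the libs/ directory of each DolphinScheduler service.

    On the MasterServer node, run: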

    wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar

    sudo cp mysql-connector-java-8.0.28.jar apache-dolphinscheduler-3.0.0-beta-1-bin/tools/libs/

    sudo cp mysql-connector-java-8.0.28.jar apache-dolphinscheduler-3.0.0-beta-1-bin/master-server/libs/

    sudo cp mysql-connector-java-8.0.28.jar apache-dolphinscheduler-3.0.0-beta-1-bin/worker-server/libs/

    sudo cp mysql-connector-java-8.0.28.jar apache-dolphinscheduler-3.0.0-beta-1-bin/api-server/libs/

    sudo cp mysql-connector-java-8.0.28.jar apache-dolphinscheduler-3.0.0-beta-1-bin/alert-server/libs/
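The copy commands above repeat one operation per module, which a loop expresses more compactly. This is a minimal sketch; `copy_jar_to_modules` is a hypothetical helper, and it calls plain `cp`, so run it as a user who can write to the target directories (the commands above use sudo for that reason).

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one JAR into the libs/ directory of every DolphinScheduler module.
# Arguments: <path-to-jar> <extracted-dolphinscheduler-directory>
copy_jar_to_modules() {
  local jar="$1" ds_home="$2" module
  for module in tools master-server worker-server api-server alert-server; do
    # Abort on the first failed copy so a missing directory is noticed immediately.
    cp "$jar" "$ds_home/$module/libs/" || return 1
  done
}
```

For example: `copy_jar_to_modules mysql-connector-java-8.0.28.jar apache-dolphinscheduler-3.0.0-beta-1-bin`.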

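    Log in to MySQL and create the database (this walkthrough reuses the existing RDS user):

    sudo yum install -y mysql

    # Log in to RDS MySQL

    mysql -hxxx -utang -p

    # Create the dolphinscheduler database

    CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

    Initialize the database: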

    bash apache-dolphinscheduler-3.0.0-beta-1-bin/tools/bin/upgrade-schema.sh
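To confirm that the schema script did its work, one option is to count the tables it created. The sketch below relies on DolphinScheduler naming its tables with a `t_ds_` prefix. The function name `count_ds_tables` is hypothetical, and the host and user arguments stand in for the RDS endpoint and account used above.

```shell
#!/usr/bin/env bash
# Hypothetical check: print how many t_ds_* tables exist in the given database.
# Arguments: <rds-host> <user> [database]
count_ds_tables() {
  local host="$1" user="$2" db="${3:-dolphinscheduler}"
  # -N drops the column header, -B gives plain batch output, -p prompts for the password.
  # The backslashes make the underscores literal instead of LIKE wildcards.
  mysql -h "$host" -u "$user" -p -N -B -e \
    "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = '$db' AND table_name LIKE 't\\_ds\\_%';"
}
```

A result of 0 suggests the script did not reach the database, in which case the SPRING_DATASOURCE_* values in dolphinscheduler_env.sh are the first thing to recheck.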

    2.3.3. Start DolphinScheduler

    su dolphinscheduler

    cd ~

    sudo cp -r /home/hadoop/apache-dolphinscheduler-3.0.0-beta-1-bin ./

    sudo chown -R dolphinscheduler:dolphinscheduler apache-dolphinscheduler-3.0.0-beta-1-bin

    # Add the ZooKeeper library

    sudo cp /usr/lib/hadoop/lib/zookeeper-3.4.14.jar apache-dolphinscheduler-3.0.0-beta-1-bin/master-server/libs/

    sudo cp /usr/lib/hadoop/lib/zookeeper-3.4.14.jar apache-dolphinscheduler-3.0.0-beta-1-bin/worker-server/libs/

    sudo cp /usr/lib/hadoop/lib/zookeeper-3.4.14.jar apache-dolphinscheduler-3.0.0-beta-1-bin/api-server/libs/

    sudo cp /usr/lib/hadoop/lib/zookeeper-3.4.14.jar apache-dolphinscheduler-3.0.0-beta-1-bin/alert-server/libs/

    # Start the services (install.sh deploys to the configured nodes and starts them)

    apache-dolphinscheduler-3.0.0-beta-1-bin/bin/install.sh

    2.3.4. Log in to DolphinScheduler

    Open http://masterserver_ip:12345/dolphinscheduler/ui in a browser to reach the login page.

    The default username and password are admin/dolphinscheduler123.
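The API server can take a little while to come up after install.sh returns. A polling loop avoids guessing when the UI is ready. This is a minimal sketch using curl; `wait_for_ui` and its retry defaults are illustrative choices, not part of DolphinScheduler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers with a 2xx or 3xx status.
# Arguments: <url> [retries, default 30] [delay in seconds, default 5]
wait_for_ui() {
  local url="$1" retries="${2:-30}" delay="${3:-5}" i=0 code
  while [ "$i" -lt "$retries" ]; do
    # Discard the body and keep only the HTTP status code.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
    case "$code" in
      2??|3??) echo "UI is up (HTTP $code)"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "UI did not respond after $retries attempts" >&2
  return 1
}
```

For example: `wait_for_ui http://10.0.0.41:12345/dolphinscheduler/ui`, run from a host that can reach the MasterServer on port 12345.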

  • Original article: https://www.cnblogs.com/zackstang/p/16380306.html