• Setting up Hadoop on Kubernetes


    There is not much material online about running Hadoop on Kubernetes, so I tried it myself and wrote down the process and the problems I ran into.

    I. Choosing an image

    First, pick a popular image from Docker Hub. I chose the bde2020 series because its GitHub documentation is fairly complete: https://github.com/big-data-europe/docker-hadoop

    II. Testing with docker-compose

    The repository explains how to run this Hadoop image with docker-compose; just follow the steps there.

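    docker-compose is Docker's tool for orchestrating containers and is simple to use: download docker-compose.yml and hadoop.env to the local machine and start everything with docker-compose up. Stop the services with docker-compose down.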

    III. Writing Kubernetes YAML files for each component

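    The docker-compose example above is simple, but it offers limited functionality and runs everything on a single machine. What we need to do is rewrite the docker-compose YAML in Kubernetes YAML syntax.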

    1. Create the ConfigMap

    Configuration can be supplied through a ConfigMap. Based on hadoop.env, write configmap.yaml as follows:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: hadoop-config
    data:
      CORE_CONF_fs_defaultFS: "hdfs://namenode:8020"
      CORE_CONF_hadoop_http_staticuser_user: "root"
      CORE_CONF_hadoop_proxyuser_hue_hosts: "*"
      CORE_CONF_hadoop_proxyuser_hue_groups: "*"
    
      HDFS_CONF_dfs_webhdfs_enabled: "true"
      HDFS_CONF_dfs_permissions_enabled: "false"
     
      YARN_CONF_yarn_log___aggregation___enable: "true"
      YARN_CONF_yarn_resourcemanager_recovery_enabled: "true"
      YARN_CONF_yarn_resourcemanager_store_class: "org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore"
      YARN_CONF_yarn_resourcemanager_fs_state___store_uri: "/rmstate"
      YARN_CONF_yarn_nodemanager_remote___app___log___dir: "/app-logs"
      YARN_CONF_yarn_log_server_url: "http://historyserver:8188/applicationhistory/logs/"
      YARN_CONF_yarn_timeline___service_enabled: "true"
      YARN_CONF_yarn_timeline___service_generic___application___history_enabled: "true"
      YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled: "true"
      YARN_CONF_yarn_resourcemanager_hostname: "resourcemanager"
      YARN_CONF_yarn_timeline___service_hostname: "historyserver"
      YARN_CONF_yarn_resourcemanager_address: "resourcemanager:8032"
      YARN_CONF_yarn_resourcemanager_scheduler_address: "resourcemanager:8030"
      YARN_CONF_yarn_resourcemanager_resource___tracker_address: "resourcemanager:8031"
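
    The bde2020 images turn these variables into Hadoop XML properties when the container starts. As far as I can tell from entrypoint.sh in the docker-hadoop repository (check it against the image tag you actually use), the prefix picks the file (CORE_CONF → core-site.xml, HDFS_CONF → hdfs-site.xml, YARN_CONF → yarn-site.xml). In the rest of the key, "___" becomes "-", "__" becomes "_", and a single "_" becomes ".". The Python sketch below reproduces that rule, which is handy for checking which property a key will produce. env_to_property is a hypothetical helper and is not part of the image:

```python
# Sketch of the key-to-property rule the bde2020 entrypoint appears to use.
# Assumption: based on reading entrypoint.sh; verify against your image.
PREFIX_TO_FILE = {
    "CORE_CONF": "core-site.xml",
    "HDFS_CONF": "hdfs-site.xml",
    "YARN_CONF": "yarn-site.xml",
}


def env_to_property(key):
    """Map e.g. CORE_CONF_fs_defaultFS to ("core-site.xml", "fs.defaultFS")."""
    for prefix, filename in PREFIX_TO_FILE.items():
        if key.startswith(prefix + "_"):
            rest = key[len(prefix) + 1:]
            # Longest underscore runs first, using a placeholder for a literal "_".
            rest = rest.replace("___", "-").replace("__", "\0")
            rest = rest.replace("_", ".").replace("\0", "_")
            return filename, rest
    raise ValueError(f"unsupported prefix in {key!r}")


if __name__ == "__main__":
    for key in ("CORE_CONF_fs_defaultFS",
                "YARN_CONF_yarn_log___aggregation___enable",
                "YARN_CONF_yarn_resourcemanager_fs_state___store_uri"):
        print(key, "->", env_to_property(key))
```

    For example, YARN_CONF_yarn_log___aggregation___enable maps to yarn.log-aggregation-enable in yarn-site.xml.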
    

    2. Create the namenode

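    Hadoop nodes talk to each other by hostname. When a pod is created, however, the system gives it a generated hostname (by default the pod name, which for a Deployment ends in a random suffix) and writes it into the pod's own /etc/hosts file. This breaks communication between the nodes and produces errors such as UnresolvedAddressException. This problem cost me a lot of time, and I only found the cause after reading a lot of material.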
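
    To see the problem directly, check what each component name resolves to from inside a pod. The following is a minimal diagnostic sketch using only the Python standard library. It assumes python3 is available in the container or in a debug pod in the same namespace. The script and its list of names are my own and are not part of the original setup:

```python
import socket

# Service names (= hostnames) used in this post; adjust to your own manifests.
COMPONENTS = ["namenode", "datanode1", "resourcemanager", "nodemanager1", "historyserver"]


def resolve_all(names):
    """Map each name to a sorted list of its IPv4 addresses, or None if lookup fails."""
    result = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
        except socket.gaierror:
            result[name] = None
            continue
        # info[4] is the sockaddr tuple (address, port)
        result[name] = sorted({info[4][0] for info in infos})
    return result


if __name__ == "__main__":
    print("this pod is", socket.gethostname())
    for name, ips in resolve_all(COMPONENTS).items():
        print(f"{name:16} {ips if ips else 'UNRESOLVED'}")
```

    With a headless Service (clusterIP: None) whose selector matches the pod, each name should resolve to the IP of that pod; compare with kubectl get pods -o wide. With a normal ClusterIP Service it resolves to a virtual IP instead. An UNRESOLVED entry points to the kind of failure behind UnresolvedAddressException.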

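    The fix is to set clusterIP to None in the Service, which makes it a headless Service, and to set hostname in the Deployment to the same value as the Service name. To avoid confusion, the Service name, container name, hostname and so on are all set to the same value from here on.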

    Note that clusterIP in the Service must be set to None. Otherwise, MapReduce jobs run through YARN will fail with errors!
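
    The two rules above are easy to break when copying manifests from one component to the next. The following is a small sketch that checks them on a Service/Deployment pair loaded as Python dicts. It also checks that the Service selector matches the pod labels, which the headless name depends on. check_pair is a hypothetical helper of mine:

```python
def check_pair(service, deployment):
    """Return a list of rule violations for one Service/Deployment pair (empty if OK)."""
    problems = []
    name = service["metadata"]["name"]
    svc_spec = service.get("spec", {})
    # Rule 1: the Service must be headless. YAML `clusterIP: None` loads as the string "None".
    if svc_spec.get("clusterIP") != "None":
        problems.append(f"Service {name}: clusterIP must be None")

    template = deployment["spec"]["template"]
    # Rule 2: the pod hostname must equal the Service name.
    hostname = template["spec"].get("hostname")
    if hostname != name:
        problems.append(
            f"Deployment {deployment['metadata']['name']}: hostname {hostname!r} != {name!r}")

    # The Service must also select this pod; otherwise it has no endpoints
    # and its name does not resolve to the pod.
    labels = template["metadata"].get("labels", {})
    selector = svc_spec.get("selector", {})
    if not selector or any(labels.get(k) != v for k, v in selector.items()):
        problems.append(f"Service {name}: selector {selector} does not match pod labels {labels}")
    return problems
```

    With PyYAML installed (an extra dependency not used elsewhere in this post), svc, dep = yaml.safe_load_all(open("namenode.yaml")) yields the two documents in the order used here, and check_pair(svc, dep) should return an empty list.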

    The namenode needs a volume, so first write pvc.yaml. A StorageClass must exist beforehand; see my earlier post https://www.cnblogs.com/00986014w/p/9406962.html for details:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: hadoop-namenode-pvc
    spec:
      storageClassName: nfs
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Gi
    

      

    Write the namenode Service and Deployment in namenode.yaml as follows. Every port that might be needed is exposed here, although far fewer are actually required:

    apiVersion: v1
    kind: Service
    metadata:
      name: namenode
      labels:
        name: namenode
    spec:
      ports:
        - port: 50070
          name: http
        - port: 8020
          name: hdfs
        - port: 50075
          name: hdfs1
        - port: 50010
          name: hdfs2
        - port: 50020
          name: hdfs3
        - port: 9000
          name: hdfs4
        - port: 50090
          name: hdfs5
        - port: 31010
          name: hdfs6
        - port: 8030
          name: yarn1
        - port: 8031
          name: yarn2
        - port: 8032
          name: yarn3
        - port: 8033
          name: yarn4
        - port: 8040
          name: yarn5
        - port: 8042
          name: yarn6
        - port: 8088
          name: yarn7
        - port: 8188
          name: historyserver
      selector:
        name: namenode
      clusterIP: None
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: namenode
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: namenode
      template:
        metadata:
          labels:
            name: namenode
        spec:
          hostname: namenode
          containers:
            - name: namenode
              image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 50070
                  name: http
                - containerPort: 8020
                  name: hdfs
                - containerPort: 50075
                  name: hdfs1
                - containerPort: 50010
                  name: hdfs2
                - containerPort: 50020
                  name: hdfs3
                - containerPort: 9000
                  name: hdfs4
                - containerPort: 50090
                  name: hdfs5
                - containerPort: 31010
                  name: hdfs6
                - containerPort: 8030
                  name: yarn1
                - containerPort: 8031
                  name: yarn2
                - containerPort: 8032
                  name: yarn3
                - containerPort: 8033
                  name: yarn4
                - containerPort: 8040
                  name: yarn5
                - containerPort: 8042
                  name: yarn6
                - containerPort: 8088
                  name: yarn7
                - containerPort: 8188
                  name: historyserver
              env:
                - name: CLUSTER_NAME
                  value: test
              envFrom:
                - configMapRef:
                    name: hadoop-config
              volumeMounts:
                - name: hadoop-namenode
                  mountPath: /hadoop/dfs/name
          volumes:
            - name: hadoop-namenode
              persistentVolumeClaim:
                claimName: hadoop-namenode-pvc
    

    3. datanode

    Create three datanodes. Using datanode1 as the example, write datanode.yaml as follows. The PVC is similar to the namenode's, so it is not shown:

    apiVersion: v1
    kind: Service
    metadata:
      name: datanode1
      labels:
        name: datanode1
    spec:
      ports:
        - port: 50070
          name: http
        - port: 8020
          name: hdfs
        - port: 50075
          name: hdfs1
        - port: 50010
          name: hdfs2
        - port: 50020
          name: hdfs3
        - port: 9000
          name: hdfs4
        - port: 50090
          name: hdfs5
        - port: 31010
          name: hdfs6
        - port: 8030
          name: yarn1
        - port: 8031
          name: yarn2
        - port: 8032
          name: yarn3
        - port: 8033
          name: yarn4
        - port: 8040
          name: yarn5
        - port: 8042
          name: yarn6
        - port: 8088
          name: yarn7
        - port: 8188
          name: historyserver
      selector:
        name: datanode1
      clusterIP: None
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: datanode1
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: datanode1
      template:
        metadata:
          labels:
            name: datanode1
        spec:
          hostname: datanode1
          containers:
            - name: datanode1
              image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 50070
                  name: http
                - containerPort: 8020
                  name: hdfs
                - containerPort: 50075
                  name: hdfs1
                - containerPort: 50010
                  name: hdfs2
                - containerPort: 50020
                  name: hdfs3
                - containerPort: 9000
                  name: hdfs4
                - containerPort: 50090
                  name: hdfs5
                - containerPort: 31010
                  name: hdfs6
                - containerPort: 8030
                  name: yarn1
                - containerPort: 8031
                  name: yarn2
                - containerPort: 8032
                  name: yarn3
                - containerPort: 8033
                  name: yarn4
                - containerPort: 8040
                  name: yarn5
                - containerPort: 8042
                  name: yarn6
                - containerPort: 8088
                  name: yarn7
                - containerPort: 8188
                  name: historyserver
              envFrom:
                - configMapRef:
                    name: hadoop-config
              volumeMounts:
                - name: hadoop-datanode1
                  mountPath: /hadoop/dfs/data
          volumes:
            - name: hadoop-datanode1
              persistentVolumeClaim:
                claimName: hadoop-datanode1-pvc     
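
    Because the manifests differ only in name, image and volume, creating datanode2 and datanode3 by copying datanode1 risks leaving a stale name behind. One alternative is to generate the manifests. The following is a Python sketch that builds each headless Service + Deployment pair and prints them as one JSON List, which kubectl apply -f also accepts. Assumptions: it uses apps/v1 with an explicit selector, and the function name is hypothetical. The PVCs still have to be created separately:

```python
import json

# Same (name, port) list as the hand-written manifests in this post.
PORTS = [
    ("http", 50070), ("hdfs", 8020), ("hdfs1", 50075), ("hdfs2", 50010),
    ("hdfs3", 50020), ("hdfs4", 9000), ("hdfs5", 50090), ("hdfs6", 31010),
    ("yarn1", 8030), ("yarn2", 8031), ("yarn3", 8032), ("yarn4", 8033),
    ("yarn5", 8040), ("yarn6", 8042), ("yarn7", 8088), ("historyserver", 8188),
]
IMAGE = "bde2020/hadoop-{role}:1.1.0-hadoop2.7.1-java8"


def manifests(name, role, mount_path=None, env=None):
    """Return [Service, Deployment] for one component; hostname == Service name."""
    labels = {"name": name}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "clusterIP": "None",  # headless, as required above
            "selector": labels,
            "ports": [{"name": n, "port": p} for n, p in PORTS],
        },
    }
    container = {
        "name": name,
        "image": IMAGE.format(role=role),
        "imagePullPolicy": "IfNotPresent",
        "ports": [{"name": n, "containerPort": p} for n, p in PORTS],
        "envFrom": [{"configMapRef": {"name": "hadoop-config"}}],
    }
    if env:
        container["env"] = [{"name": k, "value": v} for k, v in env.items()]
    pod_spec = {"hostname": name, "containers": [container]}
    if mount_path:
        # Volume and PVC names follow the hadoop-<name>[-pvc] pattern used in this post.
        volume = f"hadoop-{name}"
        container["volumeMounts"] = [{"name": volume, "mountPath": mount_path}]
        pod_spec["volumes"] = [
            {"name": volume, "persistentVolumeClaim": {"claimName": f"{volume}-pvc"}}
        ]
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }
    return [service, deployment]


if __name__ == "__main__":
    items = manifests("namenode", "namenode", "/hadoop/dfs/name", {"CLUSTER_NAME": "test"})
    for i in (1, 2, 3):
        items += manifests(f"datanode{i}", "datanode", "/hadoop/dfs/data")
    items += manifests("resourcemanager", "resourcemanager")
    items += manifests("nodemanager1", "nodemanager")
    items += manifests("historyserver", "historyserver", "/hadoop/yarn/timeline")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

    Saved as a file such as gen_manifests.py (a hypothetical name), it can be applied with python3 gen_manifests.py | kubectl apply -f -.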
    

    After creating them, always check the logs with kubectl logs and make sure there are no errors before moving on to the next step.
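
    With several pods, reading every log by eye gets tedious. The following is a small filter that reads the output of kubectl logs, or saved log files, and prints only lines that look like problems. The pattern is my own guess at what matters (for example, the UnresolvedAddressException mentioned earlier), and scan_logs.py is a hypothetical file name:

```python
import re
import sys

# What counts as "suspicious" is an assumption; extend the pattern for your logs.
SUSPICIOUS = re.compile(r"\b(?:ERROR|FATAL)\b|Exception")


def suspicious_lines(lines):
    """Yield (line_number, text) for lines that mention ERROR/FATAL or an exception."""
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: kubectl logs <pod> | python3 scan_logs.py -   (or: scan_logs.py FILE...)")
    for path in sys.argv[1:]:
        stream = sys.stdin if path == "-" else open(path, encoding="utf-8", errors="replace")
        with stream:
            for number, text in suspicious_lines(stream):
                print(f"{path}:{number}: {text}")
```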

    4. resourcemanager

    Write resourcemanager.yaml as follows:

    apiVersion: v1
    kind: Service
    metadata:
      name: resourcemanager
      labels:
        name: resourcemanager
    spec:
      ports:
        - port: 50070
          name: http
        - port: 8020
          name: hdfs
        - port: 50075
          name: hdfs1
        - port: 50010
          name: hdfs2
        - port: 50020
          name: hdfs3
        - port: 9000
          name: hdfs4
        - port: 50090
          name: hdfs5
        - port: 31010
          name: hdfs6
        - port: 8030
          name: yarn1
        - port: 8031
          name: yarn2
        - port: 8032
          name: yarn3
        - port: 8033
          name: yarn4
        - port: 8040
          name: yarn5
        - port: 8042
          name: yarn6
        - port: 8088
          name: yarn7
        - port: 8188
          name: historyserver
      selector:
        name: resourcemanager
      clusterIP: None
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: resourcemanager
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: resourcemanager
      template:
        metadata:
          labels:
            name: resourcemanager
        spec:
          hostname: resourcemanager
          containers:
            - name: resourcemanager
              image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 50070
                  name: http
                - containerPort: 8020
                  name: hdfs
                - containerPort: 50075
                  name: hdfs1
                - containerPort: 50010
                  name: hdfs2
                - containerPort: 50020
                  name: hdfs3
                - containerPort: 9000
                  name: hdfs4
                - containerPort: 50090
                  name: hdfs5
                - containerPort: 31010
                  name: hdfs6
                - containerPort: 8030
                  name: yarn1
                - containerPort: 8031
                  name: yarn2
                - containerPort: 8032
                  name: yarn3
                - containerPort: 8033
                  name: yarn4
                - containerPort: 8040
                  name: yarn5
                - containerPort: 8042
                  name: yarn6
                - containerPort: 8088
                  name: yarn7
                - containerPort: 8188
                  name: historyserver
              envFrom:
                - configMapRef:
                    name: hadoop-config 
    

    5. nodemanager

    Write nodemanager.yaml as follows:

    apiVersion: v1
    kind: Service
    metadata:
      name: nodemanager1
      labels:
        name: nodemanager1
    spec:
      ports:
        - port: 50070
          name: http
        - port: 8020
          name: hdfs
        - port: 50075
          name: hdfs1
        - port: 50010
          name: hdfs2
        - port: 50020
          name: hdfs3
        - port: 9000
          name: hdfs4
        - port: 50090
          name: hdfs5
        - port: 31010
          name: hdfs6
        - port: 8030
          name: yarn1
        - port: 8031
          name: yarn2
        - port: 8032
          name: yarn3
        - port: 8033
          name: yarn4
        - port: 8040
          name: yarn5
        - port: 8042
          name: yarn6
        - port: 8088
          name: yarn7
        - port: 8188
          name: historyserver
      selector: 
        name: nodemanager1
      clusterIP: None
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nodemanager1
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: nodemanager1
      template:
        metadata:
          labels:
            name: nodemanager1
        spec:
          hostname: nodemanager1
          containers:
            - name: nodemanager1
              image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 50070
                  name: http
                - containerPort: 8020
                  name: hdfs
                - containerPort: 50075
                  name: hdfs1
                - containerPort: 50010
                  name: hdfs2
                - containerPort: 50020
                  name: hdfs3
                - containerPort: 9000
                  name: hdfs4
                - containerPort: 50090
                  name: hdfs5
                - containerPort: 31010
                  name: hdfs6
                - containerPort: 8030
                  name: yarn1
                - containerPort: 8031
                  name: yarn2
                - containerPort: 8032
                  name: yarn3
                - containerPort: 8033
                  name: yarn4
                - containerPort: 8040
                  name: yarn5
                - containerPort: 8042
                  name: yarn6
                - containerPort: 8088
                  name: yarn7
                - containerPort: 8188
                  name: historyserver
              envFrom:
                - configMapRef:
                    name: hadoop-config
    

    6. historyserver

    The PVC is similar to the previous ones. Write historyserver.yaml as follows:

    apiVersion: v1
    kind: Service
    metadata:
      name: historyserver
      labels:
        name: historyserver
    spec:
      ports:
        - port: 50070
          name: http
        - port: 8020
          name: hdfs
        - port: 50075
          name: hdfs1
        - port: 50010
          name: hdfs2
        - port: 50020
          name: hdfs3
        - port: 9000
          name: hdfs4
        - port: 50090
          name: hdfs5
        - port: 31010
          name: hdfs6
        - port: 8030
          name: yarn1
        - port: 8031
          name: yarn2
        - port: 8032
          name: yarn3
        - port: 8033
          name: yarn4
        - port: 8040
          name: yarn5
        - port: 8042
          name: yarn6
        - port: 8088
          name: yarn7
        - port: 8188
          name: historyserver
      selector:
        name: historyserver
      clusterIP: None
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: historyserver
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: historyserver
      template:
        metadata:
          labels:
            name: historyserver
        spec:
          hostname: historyserver
          containers:
            - name: historyserver
              image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
              imagePullPolicy: IfNotPresent
              ports:
                - containerPort: 50070
                  name: http
                - containerPort: 8020
                  name: hdfs
                - containerPort: 50075
                  name: hdfs1
                - containerPort: 50010
                  name: hdfs2
                - containerPort: 50020
                  name: hdfs3
                - containerPort: 9000
                  name: hdfs4
                - containerPort: 50090
                  name: hdfs5
                - containerPort: 31010
                  name: hdfs6
                - containerPort: 8030
                  name: yarn1
                - containerPort: 8031
                  name: yarn2
                - containerPort: 8032
                  name: yarn3
                - containerPort: 8033
                  name: yarn4
                - containerPort: 8040
                  name: yarn5
                - containerPort: 8042
                  name: yarn6
                - containerPort: 8088
                  name: yarn7
                - containerPort: 8188
                  name: historyserver
              envFrom:
                - configMapRef:
                    name: hadoop-config 
              volumeMounts:
                - name: hadoop-historyserver
                  mountPath: /hadoop/yarn/timeline
          volumes:
            - name: hadoop-historyserver
              persistentVolumeClaim:
                claimName: hadoop-historyserver-pvc
    

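    After creating all of the above with kubectl create, follow the GitHub README and open the endpoint of each of these 5 components, with its corresponding port, in a browser. This has to be done from a machine inside the cluster. If the Hadoop pages display correctly, the setup works!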
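
    The same check can be scripted from a machine or pod inside the cluster. The following is a sketch using the standard library. The ports are the Hadoop 2.7 web UI defaults I would expect behind each Service (NameNode 50070, DataNode 50075, ResourceManager 8088, NodeManager 8042, history/timeline server 8188). Treat them as assumptions and check them against the GitHub README for your image version. probe and WEB_UIS are hypothetical names:

```python
import urllib.error
import urllib.request

# (Service name, web UI port): Hadoop 2.7 defaults, assumed for this image.
WEB_UIS = [
    ("namenode", 50070),        # NameNode UI
    ("datanode1", 50075),       # DataNode UI
    ("resourcemanager", 8088),  # ResourceManager UI
    ("nodemanager1", 8042),     # NodeManager UI
    ("historyserver", 8188),    # application history / timeline server UI
]

# Bypass any http_proxy settings: in-cluster names should be reached directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(host, port, timeout=5.0):
    """Return the HTTP status for http://host:port/, or a string describing the failure."""
    try:
        with _OPENER.open(f"http://{host}:{port}/", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return exc.code
    except OSError as exc:  # DNS failure, refused, timeout (URLError is an OSError)
        return f"unreachable: {exc}"


if __name__ == "__main__":
    for host, port in WEB_UIS:
        print(f"{host}:{port} -> {probe(host, port)}")
```

    A status of 200 for every entry corresponds to the pages displaying correctly in the browser.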

    7. Testing HDFS

    A quick test of whether the nodes can communicate with each other properly.

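    Enter the namenode container with kubectl exec -it <namenode-pod-name> -- /bin/bash. The pod name of a Deployment carries a random suffix; look it up with kubectl get pods. Then run hdfs dfs -put /etc/issue / and check that the upload succeeds.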
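
    The ConfigMap enables WebHDFS (HDFS_CONF_dfs_webhdfs_enabled: "true"), so the upload can also be verified over HTTP without entering the container. The following sketch uses the WebHDFS LISTSTATUS operation on the NameNode web port, assumed to be 50070 for this Hadoop 2.7 image. user.name=root matches hadoop.http.staticuser.user in the ConfigMap. liststatus_url and file_names are hypothetical helpers:

```python
import json
import urllib.parse
import urllib.request


def liststatus_url(host, path="/", port=50070, user="root"):
    """WebHDFS LISTSTATUS URL for `path` on the NameNode at host:port."""
    query = urllib.parse.urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?{query}"


def file_names(listing):
    """Names of the entries in a parsed LISTSTATUS response."""
    return [status["pathSuffix"] for status in listing["FileStatuses"]["FileStatus"]]


if __name__ == "__main__":
    try:
        with urllib.request.urlopen(liststatus_url("namenode"), timeout=10) as resp:
            names = file_names(json.load(resp))
        print("issue found in /" if "issue" in names else f"/ contains: {names}")
    except OSError as exc:
        print(f"could not reach WebHDFS: {exc}")
```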

    8. Testing YARN

    Inside the namenode container, follow the steps in https://www.cnblogs.com/ccskun/p/7820977.html. Check whether the job runs correctly and whether the finished job shows up on the resourcemanager web page.
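
    Finished jobs can also be listed through the ResourceManager REST API (/ws/v1/cluster/apps) instead of the web page. The following is a sketch that assumes the ResourceManager web port 8088 exposed above. finished_apps is a hypothetical helper:

```python
import json
import urllib.request

RM_APPS_URL = "http://resourcemanager:8088/ws/v1/cluster/apps?states=FINISHED"


def finished_apps(response_json):
    """Return (id, name, finalStatus) for each app in a /ws/v1/cluster/apps response."""
    apps = response_json.get("apps") or {}  # "apps" is null when there are no matches
    return [(a["id"], a["name"], a["finalStatus"]) for a in apps.get("app", [])]


if __name__ == "__main__":
    try:
        with urllib.request.urlopen(RM_APPS_URL, timeout=10) as resp:
            for app in finished_apps(json.load(resp)):
                print(*app)
    except OSError as exc:
        print(f"could not reach the ResourceManager: {exc}")
```

    A finalStatus of SUCCEEDED corresponds to a job that finished correctly.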

  • Original post: https://www.cnblogs.com/00986014w/p/9732796.html