• Apache Spark on Kubernetes: Deployment Trial


    Spark is a solid platform that supports RDDs, streaming analytics, machine learning, and more.
    Below are notes on deploying it with Kubernetes, along with a few caveats.

    The container images used here are prebuilt ones from the community (bde2020).

    Deployment YAML file

    deploy-k8s.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: spark-master
      namespace: big-data
      labels:
        app: spark-master
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: spark-master
      template:
        metadata:
          labels:
            app: spark-master
        spec:
          containers:
          - name: spark-master
            image: bde2020/spark-master:2.3.1-hadoop2.7
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 7077
            - containerPort: 8080
            env:
            - name: ENABLE_INIT_DAEMON
              value: "false"
            - name: SPARK_MASTER_PORT
              value: "7077"
    
    ---
    
    apiVersion: v1
    kind: Service
    metadata:
      name: spark-master-service
      namespace: big-data
    spec:
      type: NodePort
      ports:
        - port: 7077
          targetPort: 7077
          protocol: TCP
          name: master
      selector:
        app: spark-master
    
    ---
    
    
    apiVersion: v1
    kind: Service
    metadata:
      name: spark-webui-service
      namespace: big-data
    spec:
      ports:
        - port: 8080
          targetPort: 8080
          protocol: TCP
          name: ui
      selector:
        app: spark-master
      type: NodePort
    
    
    ---
    
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: spark-webui-ingress
      namespace: big-data
    spec:
      rules:
      - host: spark-webui.data.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spark-webui-service
                port:
                  number: 8080
    
    ---
    
    
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: spark-worker
      namespace: big-data
      labels:
        app: spark-worker
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: spark-worker
      template:
        metadata:
          labels:
            app: spark-worker
        spec:
          containers:
          - name: spark-worker
            image: bde2020/spark-worker:2.3.1-hadoop2.7
            imagePullPolicy: IfNotPresent
            env:
            - name: SPARK_MASTER
              value: spark://spark-master-service:7077
            - name: ENABLE_INIT_DAEMON
              value: "false"
            - name: SPARK_WORKER_WEBUI_PORT
              value: "8081"
            ports:
            - containerPort: 8081
    
    ---
    
    apiVersion: v1
    kind: Service
    metadata:
      name: spark-worker-service
      namespace: big-data
    spec:
      type: NodePort
      ports:
        - port: 8081
          targetPort: 8081
          protocol: TCP
          name: worker
      selector:
        app: spark-worker
    
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: spark-worker-ingress
      namespace: big-data
    spec:
      rules:
      - host: spark-worker.data.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spark-worker-service
                port:
                  number: 8081
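    All of the manifests above target the big-data namespace, which the YAML itself does not create. One way to set it up before applying, assuming kubectl already points at the right cluster:

```shell
# Create the namespace the manifests reference
# (skip or ignore the AlreadyExists error if it was created earlier)
kubectl create namespace big-data

# Apply the combined manifest
kubectl apply -f deploy-k8s.yaml
```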

    Deploy && Run

    • Deploy
    kubectl apply -f deploy-k8s.yaml
    • Result

      Access the web UI through the Ingress at the domain spark-webui.data.com
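    Before relying on the Ingress, it is worth confirming the rollout. A sketch of the usual checks (pod names and NodePorts will differ per cluster):

```shell
# Check that the master and worker pods are Running
kubectl -n big-data get pods -l 'app in (spark-master, spark-worker)'

# Check the Services and their assigned NodePorts
kubectl -n big-data get svc

# If no ingress controller is installed, port-forwarding is a fallback
# for reaching the master web UI on http://localhost:8080
kubectl -n big-data port-forward svc/spark-webui-service 8080:8080
```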


    Notes

    • Naming
    My habit is to give a Deployment and its Service the same name, but here that causes a real problem: Kubernetes injects environment variables for every Service into pods, so a Service named spark-master produces SPARK_MASTER_PORT=tcp://<cluster-ip>:7077 in the worker pods, which clashes with the plain port value the image expects.
    The fix is to rename the Service (hence spark-master-service) and redeploy.
    The conflicting variable comes from the image's Dockerfile:
    ENV SPARK_MASTER_PORT 7077
    • Running Spark jobs
    For actually running jobs, refer to the official demos; examples will be added here later.
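    As a quick smoke test, the bundled SparkPi example can be submitted from inside the master pod. This is only a sketch: the /spark install path and the examples jar name are assumptions based on the bde2020 image layout, so verify them in the container first.

```shell
# Find the master pod name
MASTER_POD=$(kubectl -n big-data get pods -l app=spark-master \
  -o jsonpath='{.items[0].metadata.name}')

# Submit the SparkPi example against the standalone master
# (paths assume the bde2020/spark-master:2.3.1-hadoop2.7 image layout)
kubectl -n big-data exec "$MASTER_POD" -- \
  /spark/bin/spark-submit \
  --master spark://spark-master-service:7077 \
  --class org.apache.spark.examples.SparkPi \
  /spark/examples/jars/spark-examples_2.11-2.3.1.jar 100
```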

    References

    https://github.com/rongfengliang/spark-k8s-deploy
    https://github.com/big-data-europe/docker-spark

  • Original post: https://www.cnblogs.com/rongfengliang/p/9560329.html