• Kerberos series: Spark authentication configuration


    Other articles in the big data security series:

    https://www.cnblogs.com/bainianminguo/p/12548076.html ----------- installing Kerberos

    https://www.cnblogs.com/bainianminguo/p/12548334.html ----------- Kerberos authentication for Hadoop

    https://www.cnblogs.com/bainianminguo/p/12548175.html ----------- Kerberos authentication for ZooKeeper

    https://www.cnblogs.com/bainianminguo/p/12584732.html ----------- Kerberos authentication for Hive

    https://www.cnblogs.com/bainianminguo/p/12584880.html ----------- Search Guard authentication for ES

    https://www.cnblogs.com/bainianminguo/p/12639821.html ----------- Kerberos authentication for Flink

    https://www.cnblogs.com/bainianminguo/p/12639887.html ----------- Kerberos authentication for Spark

    Today's post in the big data security series covers the Kerberos configuration for Spark.

    I. Spark installation

    1. Unpack the archive and rename the installation directory

    tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz -C /usr/local/
    cd /usr/local/
    ll
    mv spark-2.4.0-bin-hadoop2.7/ spark
    

      

    2. Set the Spark environment variables

    export SPARK_HOME=/usr/local/spark
    export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
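
    Assuming the two exports above were appended to /etc/profile (the exact file is an assumption; a per-user ~/.bashrc works too), reload it so the current shell picks them up:

    source /etc/profile
    echo $SPARK_HOME    # should print /usr/local/spark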
    

      

    3. Edit spark-env.sh

    [root@cluster2-host1 conf]# vim spark-env.sh
    

      

    export JAVA_HOME=/usr/local/java                 # Java installation
    export SCALA_HOME=/usr/local/scala               # Scala installation
    export SPARK_WORKER_MEMORY=1g                    # maximum memory available on each worker node
    export SPARK_MASTER_IP=cluster2-host1            # Spark master host
    export HADOOP_HOME=/usr/local/hadoop             # Hadoop installation path
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop   # Hadoop configuration directory
    

      

    4. Edit spark-defaults.conf

    [root@cluster2-host1 conf]# cp spark-defaults.conf.template spark-defaults.conf
    [root@cluster2-host1 conf]# pwd
    /usr/local/spark/conf
    

      

    Add the following line to spark-defaults.conf:

    spark.yarn.jars=hdfs://cluster2-host1:9000/spark_jars/*
    

      

    5. Edit the slaves file

    [root@cluster2-host1 conf]# cp slaves.template slaves

    # contents of slaves:
    cluster2-host2
    cluster2-host3
    

      

    6. Create the Spark jar directory on HDFS

    [root@cluster2-host1 conf]# hadoop fs -mkdir /spark_jars
    [root@cluster2-host1 conf]# hadoop dfs -ls /
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.
    
    Found 1 items
    drwxr-xr-x   - root supergroup          0 2020-03-02 04:30 /spark_jars
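
    The directory is still empty at this point; for spark.yarn.jars to resolve, Spark's bundled jars need to be uploaded. A minimal sketch, assuming the default jars location under /usr/local/spark:

    hadoop fs -put /usr/local/spark/jars/* /spark_jars/
    hadoop fs -ls /spark_jars | head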
    

      

    7. Distribute the installation directory to the other nodes, as sketched below
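
    A minimal sketch, assuming passwordless SSH from cluster2-host1 to the worker nodes:

    scp -r /usr/local/spark cluster2-host2:/usr/local/
    scp -r /usr/local/spark cluster2-host3:/usr/local/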

    8. Start Spark

    cd /usr/local/spark/sbin
    [root@cluster2-host1 sbin]# ./start-all.sh
    

      

    Check the processes on each node:

    [root@cluster2-host1 sbin]# jps
    25922 ResourceManager
    31875 Master
    6101 Jps
    26152 NodeManager
    22924 NameNode
    23182 DataNode
    

      

    [root@cluster2-host2 conf]# jps
    22595 SecondaryNameNode
    29043 Jps
    22268 DataNode
    24462 NodeManager
    27662 Worker
    

      

    [root@cluster2-host3 ~]# jps
    25025 NodeManager
    28404 Worker
    12537 Jps
    22910 DataNode
    [root@cluster2-host3 ~]# 
    

      

    9. Access the Spark master web UI in a browser (port 8080 by default)

    http://10.87.18.34:8080/
    

      

    II. Configuring Kerberos for Spark

    Spark itself needs no extra Kerberos configuration; it is enough that the Kerberos setup for HDFS is correct.

    Make sure the user accessing HDFS has already authenticated and holds a ticket in the local credential cache; specifying a keytab file also works.
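
    If no ticket is cached yet, one can be obtained from a keytab first. A sketch, where the keytab path is an assumption and must match your KDC setup:

    kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/cluster2-host1@HADOOP.COM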

     

    [root@cluster2-host1 bin]# klist
    Ticket cache: FILE:/tmp/krb5cc_0
    Default principal: hdfs/cluster2-host1@HADOOP.COM
    
    Valid starting       Expires              Service principal
    03/03/2020 08:06:49  03/04/2020 08:06:49  krbtgt/HADOOP.COM@HADOOP.COM
    	renew until 03/10/2020 09:06:49
    

     

      

    Run the following check; if it can read data from HDFS, the configuration works.

    ./spark-shell

    scala> var file = "/input/test.txt"
    file: String = /input/test.txt
    
    
    scala> spark.read.textFile(file).flatMap(_.split(" ")).collect
    res1: Array[String] = Array(adfaljal, fjalfjalf, falfja, lfajsa, 23fdjalfja, abc, dda, haoop, cluster, cluster)
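
    For jobs submitted to YARN, the principal and keytab can also be passed to spark-submit directly, so that tickets get renewed for long-running applications. A sketch, where the keytab path and the examples jar are assumptions:

    spark-submit \
      --master yarn \
      --principal hdfs/cluster2-host1@HADOOP.COM \
      --keytab /etc/security/keytabs/hdfs.keytab \
      --class org.apache.spark.examples.SparkPi \
      /usr/local/spark/examples/jars/spark-examples_2.11-2.4.0.jar 100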
    

      

     
