1. Installing Scala
Note: the Scala version must match the Spark version:
Spark 1.6.2 -- Scala 2.10
Spark 2.0.0 -- Scala 2.11
# Download page: https://www.scala-lang.org/files/archive
$ wget https://www.scala-lang.org/files/archive/scala-2.11.6.tgz
$ tar -zxvf ./scala-2.11.6.tgz
$ mv ./scala-2.11.6 /usr/local/scala

# Add the environment variables
$ vim ~/.bashrc
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
$ source ~/.bashrc
2. Installing Spark
# Download page: http://spark.apache.org/downloads.html
$ wget https://archive.apache.org/dist/spark/spark-2.0.2/spark-2.0.2-bin-hadoop2.6.tgz
$ tar -zxvf spark-2.0.2-bin-hadoop2.6.tgz
$ mv ./spark-2.0.2-bin-hadoop2.6 /usr/local/spark

# Add the environment variables
$ vim ~/.bashrc
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
$ source ~/.bashrc
Run pyspark; if the Spark welcome banner and the >>> prompt appear, the installation succeeded.
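As a quick sanity check, a minimal sketch like the following can be run inside the pyspark shell, which already provides a SparkContext named sc; the numbers are arbitrary:

# Run inside the pyspark shell; `sc` is created by the shell itself
rdd = sc.parallelize(range(1, 101))   # distribute the numbers 1..100
rdd.count()                           # expected: 100
rdd.sum()                             # expected: 5050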
3. Running pyspark locally
# Run locally with 4 threads
$ pyspark --master local[4]
# Check the current master
sc.master
# Read a local file
textFile = sc.textFile("file:/usr/local/spark/README.md")
textFile.count()
# Read a file from HDFS
textFile = sc.textFile("hdfs://master:9000/user/hadoop/result.csv")
textFile.count()
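Building on the commands above, a short word-count sketch shows a few more RDD operations in the same shell; it assumes the README.md path used above and is only illustrative:

# Count word frequencies in the local README.md (path as above)
textFile = sc.textFile("file:/usr/local/spark/README.md")
counts = (textFile.flatMap(lambda line: line.split())
                  .map(lambda word: (word, 1))
                  .reduceByKey(lambda a, b: a + b))
counts.takeOrdered(5, key=lambda kv: -kv[1])   # 5 most frequent words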
4. Running pyspark on Hadoop YARN
$ vim ~/.bashrc
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
$ source ~/.bashrc

# Start pyspark on YARN
$ pyspark --master yarn --deploy-mode client
# Read a file from HDFS
textFile = sc.textFile("hdfs://master:9000/user/hadoop/result.csv")
textFile.count()
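Besides the interactive shell, a self-contained script can be submitted to YARN with spark-submit. Below is a minimal sketch; the file name yarn_count.py is hypothetical and the HDFS path simply reuses the example above:

# yarn_count.py -- submit with:
#   spark-submit --master yarn --deploy-mode client yarn_count.py
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("yarn-line-count")
sc = SparkContext(conf=conf)

# Count the lines of the HDFS file used in the shell example
textFile = sc.textFile("hdfs://master:9000/user/hadoop/result.csv")
print(textFile.count())

sc.stop()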
5. Setting up a Spark Standalone cluster
$ cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
$ vim /usr/local/spark/conf/spark-env.sh
export SPARK_MASTER_IP=master
export SPARK_WORKER_CORES=1        # CPU cores used by each worker
export SPARK_WORKER_MEMORY=512m    # memory used by each worker
export SPARK_WORKER_INSTANCES=4    # number of worker instances

# Copy Spark to data1, data2, data3
$ ssh data1
$ mkdir /usr/local/spark
$ logout
$ scp -r /usr/local/spark root@data1:/usr/local
# scp -r [local path] [remote user]@[remote host]:[remote directory]
# -r copies the whole directory recursively
# Repeat for data2 and data3

# Edit the slaves file
$ vim /usr/local/spark/conf/slaves
data1
data2
data3
6. Running pyspark on the Spark Standalone cluster
# Start the cluster
$ /usr/local/spark/sbin/start-all.sh
$ pyspark --master spark://master:7077
# Stop the cluster
$ /usr/local/spark/sbin/stop-all.sh
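A script can also target the standalone cluster by pointing it at the master started above. The following is a minimal sketch; the file name standalone_count.py is hypothetical, and the executor memory is kept within the 512m configured per worker in spark-env.sh:

# standalone_count.py -- submit with:
#   spark-submit standalone_count.py
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://master:7077")
        .setAppName("standalone-count")
        .set("spark.executor.memory", "512m"))   # stay within SPARK_WORKER_MEMORY
sc = SparkContext(conf=conf)

print(sc.master)                            # spark://master:7077
print(sc.parallelize(range(1000)).count())  # expected: 1000

sc.stop()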
7. Spark Web UI
http://master:8080/   # Standalone master UI: workers and running applications
http://master:4040/   # Spark Jobs UI of the currently running application