Download the required packages
Extract
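The two steps above can be sketched as follows; the mirror URL, Spark version, and the /opt/software install directory are assumptions, not from the original notes. Substitute the release that matches your Hadoop build.

```shell
# Download a Spark release (version and mirror URL below are placeholders)
cd /opt/software
wget https://archive.apache.org/dist/spark/spark-1.6.3/spark-1.6.3-bin-hadoop2.6.tgz

# Extract and rename to a shorter path
tar -zxvf spark-1.6.3-bin-hadoop2.6.tgz
mv spark-1.6.3-bin-hadoop2.6 spark
```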
Edit the configuration files
mv slaves.template slaves
mv spark-env.sh.template spark-env.sh
spark-env.sh
# When starting the workers they failed to come up: the same startup script
# worked, but the workers could not find the hadoop command, so source the
# profile first here to pick up the environment.
source /etc/profile
# JDK path
export JAVA_HOME=/opt/software/jdk
# Scala path
export SCALA_HOME=/opt/software/scala
# Hadoop path
export HADOOP_HOME=/opt/software/hadoop
# Hadoop configuration path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Use Hadoop's classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# Master node
export SPARK_MASTER_IP=192.168.200.128
export SPARK_MASTER_HOST=192.168.200.128
export SPARK_MASTER_PORT=7077
# Local IP used when starting spark-shell
export SPARK_LOCAL_IP=192.168.200.128
# Number of cores each worker may use
export SPARK_WORKER_CORES=1
# Memory per worker node
export SPARK_WORKER_MEMORY=512m
export PYSPARK_PYTHON=/usr/bin/python
slaves
192.168.200.128
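The slaves file above lists only the master's own IP, i.e. a single-node setup. If more worker entries are added later, the configured Spark directory must exist on every node; a minimal sketch using scp (the worker IPs below are placeholders):

```shell
# Copy the configured Spark directory to each worker node
# (the worker IPs in this loop are hypothetical examples)
for host in 192.168.200.129 192.168.200.130; do
  scp -r /opt/software/spark root@$host:/opt/software/
done
```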
Start
Start spark-shell
./spark-shell
Start the cluster
./sbin/start-all.sh
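After start-all.sh, a quick sanity check is to confirm the JVM processes are up and the master web UI (default port 8080) lists the worker, its cores, and memory:

```shell
# On this single-node setup, jps should show both a Master and a Worker process
jps

# Query the master web UI for registered workers
# (assumes the master IP configured above; 8080 is the default UI port)
curl -s http://192.168.200.128:8080 | grep -i worker
```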
Problems that may occur at startup:
java.lang.ClassNotFoundException: parquet.hadoop.ParquetOutputCommitter
Use Maven to download the corresponding jar, then place it on Spark's classpath:
<dependency>
    <groupId>com.twitter</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.4.3</version>
</dependency>
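Once Maven has resolved the dependency, the jar sits in the local repository; one way to get it onto Spark's classpath is to copy it into Spark's jar directory. The ~/.m2 layout and the /opt/software/spark/lib destination below are assumptions (Spark 1.x uses lib/, later releases use jars/):

```shell
# Copy the jar from the local Maven repository onto Spark's classpath
# (~/.m2 repository layout and the Spark lib/ directory are assumptions)
cp ~/.m2/repository/com/twitter/parquet-hadoop/1.4.3/parquet-hadoop-1.4.3.jar \
   /opt/software/spark/lib/
```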
java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/Module
Fix this the same way: download the Jackson jars below with Maven and place them on Spark's classpath.
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.4.4</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.4.4</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.4.4</version>
</dependency>
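The same copy step works for the three Jackson jars; again, the ~/.m2 repository layout and the Spark lib/ destination are assumptions:

```shell
# Copy the three Jackson jars onto Spark's classpath
# (paths assume the default local Maven repository layout)
V=2.4.4
M2=~/.m2/repository/com/fasterxml/jackson/core
cp $M2/jackson-databind/$V/jackson-databind-$V.jar \
   $M2/jackson-core/$V/jackson-core-$V.jar \
   $M2/jackson-annotations/$V/jackson-annotations-$V.jar \
   /opt/software/spark/lib/
```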