Manually install Maven 3.3.3 or later
Download and extract it, then edit ~/.bashrc:
export MAVEN_HOME=/usr/local/apache-maven-3.3.9
export PATH=$MAVEN_HOME/bin:$PATH
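To confirm that the mvn on PATH meets the 3.3.3 minimum, a small version comparison can help. This is a sketch that assumes GNU sort with the -V (version sort) option; the helper name version_ge is hypothetical.

```shell
# Succeeds (exit 0) when version $1 >= version $2, e.g. version_ge 3.3.9 3.3.3.
# Relies on GNU `sort -V`, which orders dotted version strings numerically.
version_ge() {
  local lowest
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

# Example: the first line of `mvn -v` looks like "Apache Maven 3.3.9 (...)",
# so the third field is the version number.
# mvn_version="$(mvn -v | head -n 1 | awk '{print $3}')"
# version_ge "$mvn_version" 3.3.3 || echo "Maven $mvn_version is older than 3.3.3" >&2
```

Version sort matters here: a plain string comparison would wrongly rank 3.10.0 below 3.3.3.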
Install JDK 1.8.0 and Scala 2.10.6, then set their environment variables:
#JAVA VARIABLES START
#set java environment
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_66
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
#JAVA VARIABLES END
#SCALA VARIABLES START
export SCALA_HOME=/usr/local/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin
#SCALA VARIABLES END
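Before starting a long build, it is worth checking, after reloading the shell (for example with source ~/.bashrc), that each *_HOME variable points at a real installation. A minimal sketch; the helper check_home is hypothetical.

```shell
# Succeeds when directory $1 contains an executable bin/$2,
# e.g. check_home "$JAVA_HOME" java; prints a hint to stderr otherwise.
check_home() {
  if [ -x "$1/bin/$2" ]; then
    return 0
  fi
  echo "no executable $2 under '$1/bin'; check the *_HOME setting" >&2
  return 1
}

# Typical use:
# check_home "$JAVA_HOME" java && check_home "$SCALA_HOME" scala && check_home "$MAVEN_HOME" mvn
```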
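Delete every file whose name starts with ._ under /usr/lib/jvm/jdk1.8.0_66/jre/lib/ext. These are macOS metadata files left in the JDK tarball, not real JARs, and the compiler fails when it tries to read them.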
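The deletion can be scripted with find. This sketch only touches files directly inside the directory and prints each one it removes. The function name is hypothetical, -maxdepth assumes GNU or BSD find, and the JDK directory usually needs root to modify.

```shell
# Remove "._*" files directly inside a directory (non-recursive), printing
# each path before deleting it. Defaults to the JDK's jre/lib/ext.
remove_dot_underscore_files() {
  local dir
  dir="${1:-$JAVA_HOME/jre/lib/ext}"
  find "$dir" -maxdepth 1 -type f -name '._*' -print -exec rm -f {} +
}

# Example: sudo sh -c '. ./helpers.sh; remove_dot_underscore_files /usr/lib/jvm/jdk1.8.0_66/jre/lib/ext'
```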
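Get Cloudera's Spark source and check out the CDH 5.5.1 version (git branch shows what is checked out):
git clone https://github.com/cloudera/spark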
cd spark
git checkout cdh5-1.5.0_5.5.1
git branch
CDH's Spark does not build the hive-thriftserver module by default. To include it, edit the top-level pom.xml and add one line, sql/hive-thriftserver, to the modules list:
<modules>
  <module>core</module>
  <module>bagel</module>
  <module>graphx</module>
  <module>mllib</module>
  <module>tools</module>
  <module>streaming</module>
  <module>sql/catalyst</module>
  <module>sql/core</module>
  <module>sql/hive</module>
  <module>sql/hive-thriftserver</module> <!-- added line -->
  <module>repl</module>
  <module>assembly</module>
  <module>external/twitter</module>
  <module>external/kafka</module>
  <module>external/flume</module>
  <module>external/flume-sink</module>
  <module>external/zeromq</module>
  <module>external/mqtt</module>
  <module>examples</module>
</modules>
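The same pom.xml edit can be scripted. This awk sketch (function name hypothetical) inserts the line right after the first sql/hive module, copies that line's indentation, and does nothing if the next line is already the thriftserver module, so running it twice is harmless.

```shell
# Insert <module>sql/hive-thriftserver</module> after the first
# <module>sql/hive</module> line of the given pom (default ./pom.xml),
# reusing its indentation. No-op if the following line already has it.
add_thriftserver_module() {
  local pom tmp
  pom="${1:-pom.xml}"
  tmp="$(mktemp)" || return 1
  awk '
    !handled && /<module>sql\/hive<\/module>/ {
      print
      hive_nr = NR
      match($0, /^[ \t]*/)
      indent = substr($0, 1, RLENGTH)
      next
    }
    hive_nr && NR == hive_nr + 1 {
      if ($0 !~ /<module>sql\/hive-thriftserver<\/module>/)
        print indent "<module>sql/hive-thriftserver</module>"
      handled = 1
      hive_nr = 0
    }
    { print }
  ' "$pom" > "$tmp" && cat "$tmp" > "$pom"   # cat keeps pom.xml's permissions
  rm -f "$tmp"
}
```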
Build with Maven (JDK 8 has no PermGen, so -XX:MaxPermSize is ignored with a warning):
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Dhadoop.version=2.6.0-cdh5.5.1 -Phive -DskipTests clean package
The freshly built assembly JAR is in the assembly/target/scala-2.10 directory.
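The replacement step further down fetches the new JAR from /user/spark/ in HDFS, so it has to be uploaded there first. A sketch that locates the JAR and refuses to guess when there is not exactly one; the function name is hypothetical.

```shell
# Print the single spark-assembly JAR in the build output directory
# (default assembly/target/scala-2.10); fail if there is not exactly one.
find_assembly_jar() {
  local out_dir
  out_dir="${1:-assembly/target/scala-2.10}"
  set -- "$out_dir"/spark-assembly-*.jar
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one spark-assembly JAR in $out_dir" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Upload it (from the spark source directory) for the nodes to fetch:
# hadoop fs -put -f "$(find_assembly_jar)" /user/spark/
```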
Run the tests:
mvn -Pyarn -Dhadoop.version=2.6.0-cdh5.5.1 -Phive test
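On every node, replace the assembly JAR that ships with CDH. The commands below fetch the new JAR from /user/spark/ in HDFS, so upload it there first (for example with hadoop fs -put):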
cd /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars
mv spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar.bak
hadoop fs -get /user/spark/spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar .
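Because this has to be repeated on every node, and running the mv a second time would overwrite the backup with the rebuilt JAR, a guarded version may be safer. A sketch; the function name and fetching into a scratch directory such as /tmp first are assumptions, not part of the original procedure.

```shell
# Back up the CDH-bundled JAR in directory $1 (only once) and install the
# JAR at path $2 under the same file name. Re-running keeps the first backup.
swap_assembly_jar() {
  local jars_dir new_jar target
  jars_dir="$1"
  new_jar="$2"
  target="$jars_dir/$(basename "$new_jar")"
  if [ -f "$target" ] && [ ! -e "$target.bak" ]; then
    mv "$target" "$target.bak" || return 1
  fi
  cp "$new_jar" "$target"
}

# Example on one node (as root):
# hadoop fs -get /user/spark/spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar /tmp/
# swap_assembly_jar /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars \
#   /tmp/spark-assembly-1.5.0-cdh5.5.1-hadoop2.6.0-cdh5.5.1.jar
```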
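Then copy the spark-sql script into the same directory as spark-shell and create symlinks for it the same way CDH does for spark-shell; after that the spark-sql command can be run directly.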