1. How to install VM14 (VMware Workstation 14) is covered in the post 《跨平台踩的大坑》 under the 人工智能 tag.
2. CentOS partition layout:
/boot: 1024 MB, created as a standard partition.
swap: 4096 MB, created as a standard partition.
/: all remaining space, created as an LVM volume group.
Configure the rest as needed. After installation, set up the network connection with vi /etc/sysconfig/network-scripts/ifcfg-eno16777736:
HWADDR=00:0C:29:B3:AE:0E
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eno16777736
UUID=2cb8e76d-0626-4f8e-87e5-7e0743e4555f
ONBOOT=yes
IPADDR=192.168.10.186
NETMASK=255.255.255.0
GATEWAY=192.168.10.1
DNS1=192.168.10.1
The IP address and gateway must match the Windows host's network; do not pick them arbitrarily.
Run ipconfig /all on the Windows host and configure the VM's network to match.
If pinging the host from the VM hangs forever, change the Windows Firewall inbound rules: enable the File and Printer Sharing (ICMPv4-In) rule for the Public profile.
Restart the network service with service network restart so the new IP address takes effect, then ping www.baidu.com to check connectivity.
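A minimal connectivity check, assuming the interface name and addresses configured above:
ip addr show eno16777736    # the static IP should be applied
ping -c 3 192.168.10.1      # gateway reachable
ping -c 3 www.baidu.com     # DNS resolution and external routing work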
For detailed configuration notes see: http://www.cnblogs.com/wcwen1990/p/7630545.html
Set the hostname and host mappings:
vi /etc/hostname
pc.apache
vi /etc/hosts
192.168.1.186 pc.apache
192.168.1.187 pc.apache2
...
reboot    # the changes take effect after a reboot
Configure passwordless SSH login. Many companies change the default SSH port; set it with vi ~/.ssh/config (note: config must be no more permissive than 600, so chmod -R 600 config). The config looks like:
Host *
    Port 你的端口
0. Install the lrzsz tools: yum install -y lrzsz
1. As root, edit vim /etc/ssh/sshd_config:
   StrictModes no
   RSAAuthentication yes
   PubkeyAuthentication yes
   AuthorizedKeysFile .ssh/authorized_keys
2. Create the hadoop user: useradd hadoop ; passwd hadoop
3. Switch to the hadoop user: su - hadoop
4. Generate a key pair on all three machines: ssh-keygen -t rsa
5. Append each machine's public key into the same file, so authorized_keys ends up containing all three certificates: cat id_rsa.pub >> authorized_keys
6. Distribute the combined authorized_keys file to ~/.ssh/authorized_keys on all three machines (rz to upload, sz to download).
7. On all three machines fix the permissions: chmod 700 ~/.ssh and chmod 644 ~/.ssh/authorized_keys
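As an alternative to steps 5 and 6 (manually merging and rz/sz-copying authorized_keys), a sketch using ssh-copy-id; the hostnames pc1/pc2/pc3.hadoop and port 22022 are placeholders for your own values:
# run as the hadoop user on each of the three machines
for host in pc1.hadoop pc2.hadoop pc3.hadoop; do
    ssh-copy-id -p 22022 hadoop@$host    # appends ~/.ssh/id_rsa.pub to the remote authorized_keys
done
ssh -p 22022 hadoop@pc2.hadoop hostname  # should print the hostname without asking for a password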
3. Install the JDK
Extract the JDK into the software directory, then add it to the environment variables:
vi /etc/profile
export JAVA_HOME=/opt/software/jdk1.8.0_191
export PATH=$JAVA_HOME/bin:$PATH
source /etc/profile
Once Java is installed, jps becomes available.
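A quick sanity check that the environment variables took effect:
java -version    # should report version 1.8.0_191
which jps        # should resolve under $JAVA_HOME/bin
jps              # lists the running JVM processes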
Install ZooKeeper. Inside the ZooKeeper directory:
mkdir zkData
cp conf/zoo_sample.cfg conf/zoo.cfg
vim conf/zoo.cfg
dataDir=/data/software/zookeeper-3.4.5/zkData
server.1=pc1.hadoop:2888:3888
server.2=pc2.hadoop:2888:3888
server.3=pc3.hadoop:2888:3888
# make the same edits on all three machines
# on pc1: vim zkData/myid and write 1 -- the 1 corresponds to server.1 (write 2 and 3 on the other nodes)
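A start-and-verify sketch, run on each of the three nodes after the myid files are in place:
zkServer.sh start
zkServer.sh status                 # one node should report Mode: leader, the other two Mode: follower
echo ruok | nc pc1.hadoop 2181     # optional; assumes nc is installed and the default clientPort 2181, expects the reply imok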
After configuring, copy java, zookeeper, and hadoop to every other node and update each node's environment variables.
Configure ZooKeeper + Hadoop HA:
ZooKeeper + Hadoop HA:
rm -rf /data/software/zookeeper-3.4.5/zkData/*
vim /data/software/zookeeper-3.4.5/zkData/myid
rm -rf /data/dataware/*
zkServer.sh start
zkServer.sh status
zkServer.sh stop
First clear the old configuration on the other nodes, then push the new one:
rm -rf /data/software/hadoop-2.7.3/etc/*
scp -r hadoop/ hadoop@app-003:/data/software/hadoop-2.7.3/etc/
First-time initialization:
Start the journalnodes on every node: hadoop-daemon.sh start journalnode
On the primary namenode: hdfs namenode -format
Copy the hadoop.tmp.dir directory from core-site.xml to the second namenode: scp -r /data/dataware/hadoop/tmp/ hadoop@app-002:/data/dataware/hadoop/
On the primary node: hdfs zkfc -formatZK
On the primary node: start-dfs.sh
# On the primary node: yarn-daemon.sh start resourcemanager
On the primary node: start-yarn.sh
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 1 3
From the second startup onward you can start dfs directly without starting the journalnodes first, but this raises a problem: the journalnodes need time to come up, and the namenode only stabilizes once they do; otherwise you keep seeing failed connections to app:8485.
Fix: set ipc.client.connect.max.retries to 20 and ipc.client.connect.retry.interval to 5000.
On the primary node: stop-yarn.sh and stop-dfs.sh
On each node: zkServer.sh stop
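A sketch of the corresponding core-site.xml entries for that retry fix (values taken from the note above; the interval is in milliseconds):
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>20</value>
</property>
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>5000</value>
</property>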
4. Install single-node Hadoop (skip this for a cluster) and configure the firewall and Linux security mode
First add hadoop to the environment variables ($HADOOP_HOME/bin).
# Stop the firewall (CentOS 6)
service iptables stop
# Disable the firewall on boot
chkconfig iptables off
# Disable the Linux security mode: edit /etc/sysconfig/selinux and set SELINUX=disabled
# Stop the CentOS 7 firewall
systemctl stop firewalld.service      # stop firewalld
systemctl disable firewalld.service   # do not start firewalld on boot

Error: "The authenticity of host 'XXXX' can't be established". Fix: vim /etc/ssh/ssh_config and append:
StrictHostKeyChecking no
UserKnownHostsFile /dev/null

If VMware reports "Unable to connect to the MKS", start all the VM service processes in Task Manager.

bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/start-yarn.sh
Web UIs: pc.apache:50070 (HDFS) and pc.apache:8088 (YARN)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 1 3
Shutdown: use the stop scripts; as a last resort, killall java.

The relevant configuration:
vim hadoop-env.sh
# change export JAVA_HOME=${JAVA_HOME} to:
export JAVA_HOME=/opt/software/jdk1.8.0_191

vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://pc.apache:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/software/hadoop-2.7.3/data</value>
  </property>
</configuration>

vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/software/hadoop-2.7.3/data/name</value>
  </property>
  <property>
    <name>dfs.webhdfs.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions.enable</name>
    <value>false</value>
  </property>
</configuration>

vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- host and RPC port of the jobhistory service -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <!-- use the actual hostname and port -->
    <value>pc.apache:10020</value>
  </property>
  <!-- host and port of the jobhistory web UI -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>pc.apache:19888</value>
  </property>
</configuration>

vim slaves
pc.apache

vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- hostname of the server that runs the ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>pc.apache</value>
  </property>
  <!-- enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- log retention time (seconds) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
</configuration>
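A quick HDFS smoke test once the daemons are up (a sketch, run from $HADOOP_HOME as the user that started the cluster):
hdfs dfs -mkdir -p /user/hadoop
hdfs dfs -put etc/hadoop/core-site.xml /user/hadoop/
hdfs dfs -ls /user/hadoop    # the uploaded file should appear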
5. Install Hive
Completely remove any existing MySQL:
yum remove mysql-community mysql-community-server mysql-community-libs mysql-community-common -y
yum -y remove mysql57-community-release-el7-10.noarch
rpm -qa | grep -i mysql
rpm -ev MySQL-client-5.5.60-1.el7.x86_64 --nodeps
find / -name mysql    # delete everything it finds
rpm -qa | grep mysql
rpm -ev mysql57-community-release-el7-8.noarch
Install MySQL:
wget http://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
yum localinstall mysql57-community-release-el7-8.noarch.rpm
yum repolist enabled | grep "mysql.*-community.*"
yum install mysql-community-server
systemctl start mysqld
systemctl status mysqld
systemctl enable mysqld
systemctl daemon-reload
On a fresh install, get the temporary root password:
grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p    # log in with it
Before logging in, it is easier to relax the password policy first:
vim /etc/my.cnf
validate_password = off
systemctl restart mysqld    # restart the service
ALTER USER 'root'@'localhost' IDENTIFIED BY '111111';              -- change the password
grant all privileges on *.* to 'root'@'%' identified by '111111';  -- grant access to other hosts
flush privileges;
Configure the character set in /etc/my.cnf:
[mysqld]
character_set_server=utf8
init_connect='SET NAMES utf8'
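A quick check that the remote grant and the utf8 setting are in effect, a sketch run from another node (pc1.hadoop is the MySQL host name used later in the Hive section):
mysql -uroot -p111111 -h pc1.hadoop -e "show variables like 'character_set_server';"    # expect utf8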
create user 'hive'@'localhost' identified by 'hive';
create database hive;
alter database hive character set latin1;
grant all on hive.* to hive@'%' identified by 'hive';
grant all on hive.* to hive@'localhost' identified by 'hive';
grant all on metastore.* to hive@'localhost' identified by 'hive';
grant all on metastore.* to hive@'%' identified by 'hive';
show grants for hive@'localhost';
flush privileges;
If you are reinstalling Hive, delete its metadata in MySQL first:
drop database metastore;
-- or inspect and remove the individual tables:
select * from metastore.SDS;
select * from metastore.DBS;
delete from `metastore`.`TABLE_PARAMS`;
drop table `metastore`.`TABLE_PARAMS`;
delete from `metastore`.`TBLS`;
drop table `metastore`.`TBLS`;
delete from metastore.SDS;
delete from metastore.DBS;
drop table metastore.SDS;
drop table metastore.DBS;
Download and extract Hive, then in its conf directory:
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-log4j2.properties.template hive-log4j2.properties

vim hive-site.xml
<configuration>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://pc1.hadoop:3306/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.server2.long.polling.timeout</name>
    <value>5000</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
    <!-- the Spark SQL service uses port 10000, so use 10001 here to avoid a conflict -->
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>pc1.hadoop</value>
  </property>
</configuration>

vim hive-env.sh
export JAVA_HOME=/opt/software/jdk1.8.0_191
export HADOOP_HOME=/opt/software/hadoop-2.7.3/etc/hadoop
export HIVE_CONF_DIR=/opt/software/hive-2.1.1/conf

vim hive-log4j2.properties
hive.log.dir=/    # set this to a suitable log directory
Download mysql-connector-java-5.1.22-bin.jar from a Maven repository and place it in the lib folder under the Hive directory.
In the Hive directory, initialize the metastore database:
schematool -dbType mysql -initSchema
If this fails with "Host is not allowed to connect to this MySQL server", fix the grant in MySQL and retry:
use mysql;
update user set host = '%' where user = 'root';
FLUSH PRIVILEGES;
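A quick check that schematool created the metastore schema (assumes the hive/hive credentials configured above):
mysql -uhive -phive -e "use metastore; show tables;" | head
Tables such as DBS, TBLS and SDS should be listed.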
Add HIVE_HOME to the environment variables. Running hive then fails with: Relative path in absolute URI
In hive-site.xml, replace every occurrence of ${system:java.io.tmpdir} with ./hive/logs/iotemp.
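A one-liner for that replacement, a sketch (it keeps a .bak backup; create the target directory before starting Hive):
mkdir -p ./hive/logs/iotemp
sed -i.bak 's#\${system:java.io.tmpdir}#./hive/logs/iotemp#g' hive-site.xml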
Run hive again.
Start the Hive services:
hive --service metastore &
hive --service hiveserver2 &
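A quick HiveServer2 connectivity check with beeline, a sketch using the host pc1.hadoop and port 10001 configured in hive-site.xml:
beeline -u jdbc:hive2://pc1.hadoop:10001 -n hive -e "show databases;"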
Starting Spark for the first time:
hadoop fs -put /data/software/spark-2.1.1/jars/* /user/spark/libs/
start-master.sh
start-slaves.sh
6. Install Spark
Step 1: download Scala 2.11.8 (https://www.scala-lang.org/download/2.11.8.html) and add it to the environment variables.
Step 2: extract spark-2.2.0-bin-hadoop2.7.tgz and add Spark to the environment variables.
vim spark-env.sh
export JAVA_HOME=/opt/software/jdk1.8.0_191
export HADOOP_CONF_DIR=/opt/software/hadoop-2.7.3
export HIVE_CONF_DIR=/opt/software/hive-2.1.1
export SCALA_HOME=/opt/software/scala-2.11.8
export SPARK_WORK_MEMORY=1g
export MASTER=spark://pc.apache:7077

Because Spark SQL uses Hive as a data source, add the metastore URI to Hive's hive-site.xml:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://pc.apache:9083</value>
</property>
Then copy it into spark/conf:
cp hive-site.xml /opt/software/spark-2.2.0-bin-hadoop2.7/conf/

Copy the dependency jars:
cp $HIVE_HOME/lib/hive-hbase-handler-2.1.1.jar $SPARK_HOME/jars/
mkdir $SPARK_HOME/lib
cp $HIVE_HOME/lib/mysql-connector-java-5.1.34.jar $SPARK_HOME/lib/
cp $HIVE_HOME/lib/metrics-core-2.2.0.jar $SPARK_HOME/lib
cp $HBASE_HOME/lib/guava-12.0.1.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-common-1.2.5-tests.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-client-1.2.5.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-protocol-1.2.5.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-common-1.2.5.jar $SPARK_HOME/lib/
cp $HBASE_HOME/lib/hbase-server-1.2.5.jar $SPARK_HOME/lib/

Add these jars in vim $SPARK_HOME/conf/spark-env.sh:
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/guava-12.0.1.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-client-1.2.5.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-common-1.2.5.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-common-1.2.5-tests.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-protocol-1.2.5.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/hbase-server-1.2.5.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/htrace-core-3.1.0-incubating.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/mysql-connector-java-5.1.34.jar
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/metrics-core-2.2.0.jar

Start the Hive metastore, then test spark-sql:
nohup hive --service metastore >/opt/software/metastore.log 2>&1 &
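Once the metastore is running, a minimal spark-sql check (it should list the same databases that Hive sees):
spark-sql -e "show databases;"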
Start Spark:
/opt/software/spark-2.2.0-bin-hadoop2.7/sbin/start-master.sh
/opt/software/spark-2.2.0-bin-hadoop2.7/sbin/start-slaves.sh
Web UI: http://192.168.1.186:8080/
Starting spark-sql fails with: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
In hive-site.xml, change
<property>
<name>hive.metastore.schema.verification</name>
<value>true</value>
</property>
to false and the error goes away.
However, the Spark SQL thrift service still fails to start: running $SPARK_HOME/sbin/start-thriftserver.sh reports: Could not create ServerSocket on address pc.apache/192.168.1.186:10001
Use jps -ml to inspect the Java processes in detail.
The cause appears to be that the Hive service and the Spark SQL service cannot both bind to the same port: commenting out the Hive service makes it work, but that is not a proper fix. One of the two probably needs to be moved off port 10001; this still needs testing (see the sketch below).
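A possible workaround, not verified in these notes: both services read hive-site.xml, so override the port when launching the Spark thrift server and let HiveServer2 keep 10001 (the 10002 below is just a placeholder choice):
$SPARK_HOME/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10002 --hiveconf hive.server2.thrift.bind.host=pc.apache
beeline -u jdbc:hive2://pc.apache:10002 -n hive -e "show databases;"    # talks to the Spark thrift server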
To finish the Spark installation, here is Spark on YARN.
Starting spark-shell --master yarn-client fails with: Error initializing SparkContext. Fix it in yarn-site.xml:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
Start spark-shell --master yarn-client again and it works.
Summary of the installation workflow:
Role layout across the three nodes:
node1: nn1, dn1, rm1, nm1, zk1, hivserv
node2: nn2, dn2, rm2, nm2, zk2, hivemeta
node3: dn3, nm3, zk3, mysql, hivestat

On the primary node: start-dfs.sh
# On the primary node: yarn-daemon.sh start resourcemanager
On the primary node: start-yarn.sh
stop-yarn.sh
stop-dfs.sh
hive --service metastore > /home/hadoop/hive.meta &
hive --service hiveserver2 > /home/hadoop/hive.log &
#hadoop fs -mkdir -p /user/spark/libs/
#hadoop fs -put /data/software/spark-2.1.1/jars/* /user/spark/libs/
hadoop fs -mkdir -p /tmp/spark/logs/
start-master.sh
start-slaves.sh
zkCli.sh
rm -rf /data/software/spark-2.1.1/conf/
scp -r /data/software/spark-2.1.1/conf/ hadoop@app-002:/data/software/spark-2.1.1/
YARN job logs: /tmp/logs/hadoop/logs
Submit test jobs:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 1 3
spark-shell --master yarn --deploy-mode client
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 1000m --executor-memory 1000m --executor-cores 1 /data/software/spark-2.1.1/examples/jars/spark-examples_2.11-2.1.1.jar 3
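A sketch of the full startup order implied by the summary above (hostnames pc1/pc2/pc3.hadoop are placeholders; assumes zkServer.sh and the Hadoop/Spark scripts are on the PATH of non-interactive shells; run from the primary node as the hadoop user):
# 1. ZooKeeper on every node
for h in pc1.hadoop pc2.hadoop pc3.hadoop; do ssh $h 'zkServer.sh start'; done
# 2. HDFS and YARN from the primary node
start-dfs.sh
start-yarn.sh
# 3. Hive metastore and HiveServer2
nohup hive --service metastore   > /home/hadoop/hive.meta 2>&1 &
nohup hive --service hiveserver2 > /home/hadoop/hive.log  2>&1 &
# 4. Spark standalone master and workers
start-master.sh
start-slaves.sh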
Some debugging techniques:
debug: nohup java -jar sent-mail.jar > log.txt &
Check whether a port is in use: netstat -ntulp | grep 8020
Find a service's processes (and from there its ports): ps -ef | grep mysqld
Clear old logs: rm -rf /data/software/hadoop-2.7.3/logs/*
Start components one at a time to see where the problem comes from:
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
jps -ml
kill -9 <pid>
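When a daemon refuses to start, the usual next step is to tail its log; a sketch (the exact file name depends on the user and hostname):
tail -n 200 /data/software/hadoop-2.7.3/logs/hadoop-hadoop-namenode-*.log
netstat -ntulp | grep 8485    # e.g. confirm the journalnodes are listening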
7. Install ZooKeeper + HBase
If the cluster already has a ZooKeeper ensemble, it is better to use it; for single-machine testing, the ZooKeeper bundled with HBase is fine.
vim hbase-env.sh
export JAVA_HOME=/opt/software/jdk1.8.0_191
export HBASE_MANAGES_ZK=true
export HADOOP_HOME=/opt/software/hadoop-2.7.3
export HBASE_CLASSPATH=/opt/software/hadoop-2.7.3/etc/hadoop
export HBASE_PID_DIR=/opt/software/hbase-1.2.5/pids

vim hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://pc.apache:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>pc.apache</value>
</property>
<property>
<name>hbase.master</name>
<value>hdfs://pc.apache:60000</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/opt/software/hbase-1.2.5/tmp</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/opt/software/hbase-1.2.5/zooData</value>
</property>
vim regionservers
pc.apache
/opt/software/hbase-1.2.5/bin/start-hbase.sh
In the hbase shell, run status to check the cluster.
http://192.168.1.186:16010/
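A quick smoke test in the HBase shell (the table and column-family names are arbitrary examples):
hbase shell
create 'smoke','cf'
put 'smoke','r1','cf:c1','v1'
scan 'smoke'      # should return the single row just written
disable 'smoke'
drop 'smoke'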
Simplest HBase installation on Windows:
Download hbase-1.2.3 (http://apache.fayea.com/hbase/stable/). Prerequisites: JAVA_HOME and HADOOP_HOME.
1. Edit hbase-1.0.2\conf\hbase-env.cmd:
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_05
set HBASE_MANAGES_ZK=false
2. Edit hbase-1.0.2\conf\hbase-env.sh:
export HBASE_MANAGES_ZK=false
3. Edit hbase-1.0.2\conf\hbase-site.xml (change the paths to your own):
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///E:/software/hbase-1.4.10/root</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>E:/software/hbase-1.4.10/tmp</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>E:/software/hbase-1.4.10/zoo</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
</configuration>
4. Go into the bin directory and run start-hbase.cmd, then open a command window in that directory and run:
hbase shell
create 'test','cf'
scan 'test'