Installing a Hadoop Cluster on Ubuntu 16 Desktop
Reference 1: http://blog.csdn.net/mark_lq/article/details/53384358
Reference 2: http://blog.csdn.net/quiet_girl/article/details/74352190
Note: My Ubuntu system already has TensorFlow, Python 2.7, SyntaxNet and so on; it is not a fresh system, and the Hadoop cluster was installed on top of all that. The installation had its share of pitfalls: the biggest was passwordless SSH login, followed by the need to create a hadoop account before installing. I followed Reference 1, and when startup failed I turned to Reference 2. I am not saying anyone's write-up is bad; every system environment is different, and a failed start does not mean the document is poorly written. Now, down to business.
This was originally a closing remark, but it is worth putting at the beginning instead:
For the three machines, it is best to install one, finish all of the steps below, and then clone it. All that remains on the clones is to edit /etc/hostname and /etc/hosts, and to run the following on master:
sudo scp ./authorized_keys hadoop@slaver1:~/.ssh
sudo scp ./authorized_keys hadoop@slaver2:~/.ssh
Finally, format HDFS and start the cluster on master.
1. System Environment
Ubuntu 16.04
vmware 12.5.2 build-4638234
hadoop 2.7.4
java 1.8.0_131
master:192.168.93.140
slaver1:192.168.93.141
slaver2:192.168.93.142
2. Deployment Steps
2.1 Basic Requirements
1. Add a hadoop user and add it to sudoers
sudo adduser hadoop
Note: the sudoers file is read-only, so save it in vim with :wq!
sudo vim /etc/sudoers
Add the following:
# User privilege specification
root    ALL=(ALL:ALL) ALL
hadoop  ALL=(ALL:ALL) ALL
2. Switch to the hadoop user:
su hadoop
Note: on the desktop edition, switch users through the settings icon in the top-right corner of the screen instead of running su hadoop.
3. Edit /etc/hostname and change the hostname to master
sudo vim /etc/hostname
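A non-interactive alternative (a sketch; hostnamectl is available on Ubuntu 16.04):
# Set the hostname to master without opening an editor
sudo hostnamectl set-hostname master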
4. Edit /etc/hosts
127.0.0.1       localhost
127.0.1.1       localhost.localdomain   localhost

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

# hadoop nodes
192.168.93.140 master
192.168.93.141 slaver1
192.168.93.142 slaver2
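A quick check on master that the new names resolve (a sketch):
# Each name should answer from the address listed above
ping -c 1 slaver1
ping -c 1 slaver2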
5. Install and configure the Java environment
Note: I use OpenJDK, which on Ubuntu installs to /usr/lib/jvm/java-1.8.0-openjdk-amd64 by default. (If you download Oracle JDK 1.8 instead, extract it to /usr/local so that all users can use it, and point JAVA_HOME there.) Edit /etc/profile as follows and make it take effect:
# set jdk classpath
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
source /etc/profile
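If OpenJDK 8 is not already on the machine, it can be installed from the standard Ubuntu 16.04 repositories first; a minimal sketch:
# Install OpenJDK 8 (JDK and JRE)
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
# Confirm that the directory used for JAVA_HOME above exists
ls -d /usr/lib/jvm/java-1.8.0-openjdk-amd64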
Verify that the JDK is installed and configured correctly:
hadoop@master:~$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
6. Install openssh-server and generate an SSH key:
sudo apt-get install ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
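At this point SSH to the local machine should already work without a password; a quick check:
# Should log in without prompting for a password
ssh localhost
exit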
7. slaver1 and slaver2 can be created by cloning the virtual machine. After cloning, remember to change /etc/hostname to slaver1 and slaver2 respectively.
8. Configure the master node for passwordless SSH access to the slaver1 and slaver2 nodes.
Copy the generated authorized_keys file to the .ssh directory on slaver1 and slaver2:
sudo scp ./authorized_keys hadoop@slaver1:~/.ssh
sudo scp ./authorized_keys hadoop@slaver2:~/.ssh
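If the passwordless login below still asks for a password (this was the biggest pitfall for me), check the permissions on each slave; a sketch to run on slaver1 and slaver2:
# sshd ignores keys when .ssh or authorized_keys is too permissive
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys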
Test that the master node can reach slaver1 and slaver2 without a password:
ssh slaver1
ssh slaver2
Output:
hadoop@master:/usr/lib/jvm/java-1.8.0-openjdk-amd64$ ssh slaver1
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-35-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
50 packages can be updated.
0 updates are security updates.
Last login: Tue Sep 19 01:20:08 2017 from 192.168.93.140
hadoop@slaver1:~$
2.2 Hadoop 2.7 Cluster Setup
Note: first create a software directory under the hadoop user's home, cd into it, and download hadoop-2.7.4.tar.gz with the following command:
wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
1. Extract the downloaded hadoop-2.7.4.tar.gz under the hadoop user's directory; for example:
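A minimal sketch (the software directory is the one created above):
cd ~/software
tar -xzf hadoop-2.7.4.tar.gz
The directory should then look roughly like this: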
hadoop@master:~/software$ ll
total 260460
drwxrwxr-x  4 hadoop hadoop      4096 Sep 19 03:43 ./
drwxr-xr-x 19 hadoop hadoop      4096 Sep 19 02:37 ../
drwxrwxr-x  3 hadoop hadoop      4096 Sep 19 03:43 hadoop-2.7.0/
drwxr-xr-x 11 hadoop hadoop      4096 Sep 19 02:35 hadoop-2.7.4/
-rw-rw-r--  1 hadoop hadoop 266688029 Aug  6 01:15 hadoop-2.7.4.tar.gz
2. Configure the Hadoop environment variables
sudo vim /etc/profile
Add the following:
# set hadoop classpath
export HADOOP_HOME=/home/hadoop/software/hadoop-2.7.4
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export CLASSPATH=$CLASSPATH:.:$HADOOP_HOME/bin
Apply the file: source /etc/profile
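A quick sanity check that the variables resolve (a sketch; the binary is called by its full path because the profile above does not add $HADOOP_HOME/bin to PATH):
echo $HADOOP_HOME
$HADOOP_HOME/bin/hadoop version    # should report Hadoop 2.7.4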
3. Edit the Hadoop configuration files under etc/hadoop, mainly core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
- core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- master is the hostname configured in /etc/hosts -->
    <value>hdfs://master:9000/</value>
  </property>
</configuration>
- hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/software/hadoop-2.7.0/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/software/hadoop-2.7.0/dfs/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
</configuration>
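The dfs directories above do not exist yet; formatting the NameNode and starting the DataNodes will create them, but they can also be created up front on every node (a sketch using the paths configured above):
mkdir -p /home/hadoop/software/hadoop-2.7.0/dfs/namenode
mkdir -p /home/hadoop/software/hadoop-2.7.0/dfs/datanode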
Note: mapred-site.xml cannot be found under etc/hadoop; the fix is to create it from the template:
sudo cp mapred-site.xml.template mapred-site.xml
- mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
- yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
4. Edit the env files, adding JAVA_HOME to hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
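A quick way to confirm that all three env files now pick up JAVA_HOME (a sketch; run it inside $HADOOP_HOME/etc/hadoop):
grep -n "JAVA_HOME=" hadoop-env.sh mapred-env.sh yarn-env.sh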
5. Configure the masters and slaves files under etc/hadoop.
Only the slaves file is there, so make a copy for masters:
cp slaves masters
The masters file should contain the single line master; the slaves file should list slaver1 and slaver2, one hostname per line (see the sketch below).
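A minimal sketch of writing the two files (run inside $HADOOP_HOME/etc/hadoop):
# masters holds the master hostname
echo master > masters
# slaves lists the worker hostnames, one per line
printf "slaver1\nslaver2\n" > slaves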
6. Copy the entire hadoop-2.7.4 directory to the same location on the slaver1 and slaver2 nodes:
scp -r hadoop-2.7.4/ hadoop@slaver1:~/software
scp -r hadoop-2.7.4/ hadoop@slaver2:~/software
2.3 Start Hadoop cluster from master
1. Format the file system for the first time with bin/hdfs namenode -format
hadoop@master:~/software/hadoop-2.7.4/bin$ ./hdfs namenode -format
Output (excerpt):
17/09/19 03:43:18 INFO common.Storage: Storage directory /home/hadoop/software/hadoop-2.7.0/dfs/namenode has been successfully formatted.
17/09/19 03:43:18 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/software/hadoop-2.7.0/dfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
17/09/19 03:43:18 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/software/hadoop-2.7.0/dfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 322 bytes saved in 0 seconds.
17/09/19 03:43:18 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/09/19 03:43:18 INFO util.ExitUtil: Exiting with status 0
17/09/19 03:43:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.93.140
************************************************************/
Start the Hadoop cluster with start-all.sh:
hadoop@master:~/software/hadoop-2.7.4/sbin$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-namenode-master.out
slaver2: starting datanode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-datanode-slaver2.out
slaver1: starting datanode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-datanode-slaver1.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-resourcemanager-master.out
slaver1: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-nodemanager-slaver1.out
slaver2: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-nodemanager-slaver2.out
hadoop@master:~/software/hadoop-2.7.4/sbin$
Run jps to list the Java processes on each node:
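What jps should show on each node (a rough guide; process IDs will differ):
# On master
jps    # NameNode, SecondaryNameNode, ResourceManager, Jps
# On slaver1 and slaver2
jps    # DataNode, NodeManager, Jps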
View HDFS in a browser: http://192.168.93.140:50070
Overview 'master:9000' (active)
Started: Tue Sep 19 03:43:41 PDT 2017
Version: 2.7.4, rcd915e1e8d9d0131462a0b7301586c175728a282
Compiled: 2017-08-01T00:29Z by kshvachk from branch-2.7.4
Cluster ID: CID-232a6925-5563-411b-9eb9-828aa4623ed0
Block Pool ID: BP-2106543616-192.168.93.140-1505817798572
Summary
Security is off.
Safemode is off.
1 files and directories, 0 blocks = 1 total filesystem object(s).
Heap Memory used 44.74 MB of 221 MB Heap Memory. Max Heap Memory is 889 MB.
Non Heap Memory used 45.64 MB of 46.5 MB Commited Non Heap Memory. Max Non Heap Memory is -1 B.
View MapReduce (YARN) in a browser: http://192.168.93.140:8088
Note: when HDFS or MapReduce will not start properly after hdfs namenode -format or start-all.sh (on the master node or a slave node), delete the dfs, logs, and tmp directories on the master and slave nodes, run hdfs namenode -format again, and then rerun start-all.sh; see the sketch below.
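A cleanup sketch under these assumptions (the dfs path follows hdfs-site.xml above; run the rm commands on master and on both slaves, the format and start on master only):
# Stop everything first
~/software/hadoop-2.7.4/sbin/stop-all.sh
# Remove old HDFS data, logs and temporary directories
rm -rf ~/software/hadoop-2.7.0/dfs
rm -rf ~/software/hadoop-2.7.4/logs ~/software/hadoop-2.7.4/tmp
# Re-format and restart (master only)
~/software/hadoop-2.7.4/bin/hdfs namenode -format
~/software/hadoop-2.7.4/sbin/start-all.sh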
2.4 Stop Hadoop cluster from master
hadoop@master:~/software/hadoop-2.7.4/sbin$ jps
9318 ResourceManager
8840 NameNode
9098 SecondaryNameNode
15628 Jps
hadoop@master:~/software/hadoop-2.7.4/sbin$ ./stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [master]
master: stopping namenode
slaver1: stopping datanode
slaver2: stopping datanode
Stopping secondary namenodes [master]
master: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
slaver1: stopping nodemanager
slaver2: stopping nodemanager
no proxyserver to stop
hadoop@master:~/software/hadoop-2.7.4/sbin$
Hadoop and HBase compatibility list