1 Download a stable Hadoop release from the official site
http://www.apache.org/dyn/closer.cgi/hadoop/common/
2 Install Java; the following blog post is a useful reference
http://www.cnblogs.com/zhangwenjing/p/3580726.html
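Install the JDK
1. Upload the installation package to the directory where Java will be installed.
2. Extract it: tar -zxvf jdk-7u51-linux-i586.gz
3. Delete the archive: rm -Rf jdk-7u51-linux-i586.gz (to save disk space)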
Configuration
# vi /etc/profile
Append the following at the end of the file:
JAVA_HOME=/usr/local/java/jdk1.7.0_51
CLASSPATH=.:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
Save and exit: press Esc, then type :wq.
Apply the change to the current shell with source /etc/profile, then verify: java -version
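If java -version fails, the cause is often a JAVA_HOME that does not point at the extracted JDK. The following is a minimal sketch of that check; check_java_home is a hypothetical helper name, and the only assumption is that a usable JDK directory contains an executable bin/java.

```shell
# check_java_home DIR: succeed if DIR looks like a usable JDK directory.
# (hypothetical helper for illustration; not part of the JDK)
check_java_home() {
    if [ -z "$1" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$1/bin/java" ]; then
        echo "no executable found at $1/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks valid: $1"
}

# Check the current environment; only run java if the directory is valid.
if check_java_home "${JAVA_HOME:-}"; then
    "$JAVA_HOME/bin/java" -version
fi
```

With the profile entries from this guide loaded, the check should report /usr/local/java/jdk1.7.0_51 as valid.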
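3 Install Hadoop
Reference: http://www.21ops.com/front-tech/9766.html
Adjust the directories mentioned below to match your own environment.
I. Edit the configuration files:
The Hadoop 2.2 configuration files are in /opt/hadoop-2.2.0/etc/hadoop. Edit them as follows:
1. Edit /etc/hosts (sudo gedit /etc/hosts) and add the following line: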
192.168.222.154 hd2-single
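Because the later configuration files refer to the machine as hd2-single, it can help to confirm that the mapping is present before going further. This is a rough sketch only; has_host_entry is a hypothetical helper name, and it simply looks for the host name on any non-comment line of the file.

```shell
# has_host_entry FILE NAME: succeed if a non-comment line in FILE lists NAME
# after the address column. (hypothetical helper for illustration)
has_host_entry() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                                   # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }                                   # exit 0 only when found
    ' "$1"
}

if has_host_entry /etc/hosts hd2-single; then
    echo "hd2-single is mapped in /etc/hosts"
else
    echo "hd2-single is missing; add a line such as: 192.168.222.154 hd2-single"
fi
```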
2. Edit core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/sujx/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hd2-single:9000</value>
    <final>true</final>
  </property>
</configuration>
fs.defaultFS: the URL of the HDFS file system
3. Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/sujx/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/sujx/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
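The paths in core-site.xml and hdfs-site.xml all sit under one base directory. Hadoop can usually create them itself when the parent directory is writable, but creating them ahead of time as the Hadoop user makes ownership explicit. A small sketch, where make_hadoop_dirs is a hypothetical helper name and the base directory is passed in rather than hard-coded:

```shell
# make_hadoop_dirs BASE: create the local directories used by hadoop.tmp.dir,
# dfs.namenode.name.dir and dfs.datanode.data.dir under BASE.
# (hypothetical helper; BASE is /home/sujx/hadoop in this guide)
make_hadoop_dirs() {
    for sub in tmp dfs/name dfs/data; do
        mkdir -p "$1/$sub" || return 1
    done
}

# Example for this guide's layout (run as the user that will start Hadoop):
#   make_hadoop_dirs /home/sujx/hadoop
```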
4. Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>file:/home/sujx/hadoop/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>file:/home/sujx/hadoop/mapred/local</value>
    <final>true</final>
  </property>
</configuration>
5. Edit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for MapReduce to run</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hd2-single</value>
    <description>hostname of RM</description>
  </property>
</configuration>
6. Edit the slaves file
hd2-single
That completes the configuration changes. There are quite a few of them, and the process is rather tedious.
II. Start Hadoop with the scripts.
The Hadoop startup scripts rely on several environment variables, so first edit Ubuntu's profile file.
Use the command: sudo gedit /etc/profile
export HADOOP_HOME=/opt/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
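After reloading the profile (for example with source /etc/profile), a quick way to catch a typo in the export lines is to list any variable that did not make it into the environment. The sketch below assumes only the variable names used in this guide; missing_hadoop_vars is a hypothetical helper name.

```shell
# missing_hadoop_vars: print the name of every expected variable that is not
# exported (or is empty) in the current environment.
# (hypothetical helper for illustration)
missing_hadoop_vars() {
    for name in HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
                HADOOP_HDFS_HOME YARN_HOME HADOOP_CONF_DIR YARN_CONF_DIR; do
        # printenv only sees exported variables, which is what the scripts inherit.
        [ -n "$(printenv "$name")" ] || echo "$name"
    done
}

missing="$(missing_hadoop_vars)"
if [ -z "$missing" ]; then
    echo "all Hadoop variables are set"
else
    echo "not set:"
    echo "$missing"
fi
```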
Before running Hadoop for the first time, format the HDFS file system with the following command:
hdfs namenode -format
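Formatting is meant to be done once; running it again on a directory that already holds metadata discards the existing file system state. One way to guard against that is sketched below, on the assumption that a formatted name directory contains a current/VERSION file. is_formatted is a hypothetical helper name, and the path mirrors dfs.namenode.name.dir from this guide.

```shell
# is_formatted NAME_DIR: succeed if NAME_DIR already holds NameNode metadata.
# Assumption: a formatted name directory contains current/VERSION.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Same location as dfs.namenode.name.dir in hdfs-site.xml (without the file: prefix).
name_dir="/home/sujx/hadoop/dfs/name"
if is_formatted "$name_dir"; then
    echo "$name_dir is already formatted; skipping hdfs namenode -format"
else
    echo "$name_dir is not formatted; run: hdfs namenode -format"
fi
```

The sketch only prints a recommendation, so it is safe to run at any time.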
1. Startup script, option 1:
sujx@ubuntu:~$ hadoop-daemon.sh start namenode
starting namenode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-namenode-ubuntu.out
sujx@ubuntu:~$ hadoop-daemon.sh start datanode
starting datanode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-datanode-ubuntu.out
sujx@ubuntu:~$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-secondarynamenode-ubuntu.out
sujx@ubuntu:~$ jps
9310 SecondaryNameNode
9345 Jps
9140 NameNode
9221 DataNode
sujx@ubuntu:~$ yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /opt/hadoop-2.2.0/logs/yarn-sujx-resourcemanager-ubuntu.out
sujx@ubuntu:~$ yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/hadoop-2.2.0/logs/yarn-sujx-nodemanager-ubuntu.out
sujx@ubuntu:~$ jps
9310 SecondaryNameNode
9651 NodeManager
9413 ResourceManager
9140 NameNode
9709 Jps
9221 DataNode
sujx@ubuntu:~$
2. Startup script, option 2:
sujx@ubuntu:~$ start-dfs.sh
Starting namenodes on [hd2-single]
hd2-single: starting namenode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-namenode-ubuntu.out
hd2-single: starting datanode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-secondarynamenode-ubuntu.out
sujx@ubuntu:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.2.0/logs/yarn-sujx-resourcemanager-ubuntu.out
hd2-single: starting nodemanager, logging to /opt/hadoop-2.2.0/logs/yarn-sujx-nodemanager-ubuntu.out
sujx@ubuntu:~$ jps
11414 SecondaryNameNode
10923 NameNode
11141 DataNode
12038 Jps
11586 ResourceManager
11811 NodeManager
sujx@ubuntu:~$
3. Startup script, option 3:
sujx@ubuntu:~$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hd2-single]
hd2-single: starting namenode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-namenode-ubuntu.out
hd2-single: starting datanode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.2.0/logs/hadoop-sujx-secondarynamenode-ubuntu.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.2.0/logs/yarn-sujx-resourcemanager-ubuntu.out
hd2-single: starting nodemanager, logging to /opt/hadoop-2.2.0/logs/yarn-sujx-nodemanager-ubuntu.out
sujx@ubuntu:~$ jps
14156 NodeManager
14445 Jps
13267 NameNode
13759 SecondaryNameNode
13485 DataNode
13927 ResourceManager
sujx@ubuntu:~$
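Whichever startup option is used, the jps listing should end up showing the same five daemons. Rather than reading the list by eye, the check can be scripted. A minimal sketch, where missing_daemons is a hypothetical helper name and the daemon names are the ones that appear in the jps output in this guide:

```shell
# missing_daemons: read jps output on stdin and print each expected daemon
# that does not appear in it. (hypothetical helper for illustration)
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the whole second field so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}

# Usage (requires jps from the JDK on PATH):
#   jps | missing_daemons
```

No output means all five daemons are running; any name printed is a daemon whose log under /opt/hadoop-2.2.0/logs is worth checking.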
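All three options produce the same result, because the higher-level scripts simply call the lower-level ones. The matching stop scripts are just as simple:
1. Stop script, option 1: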
sujx@ubuntu:~$ yarn-daemon.sh stop nodemanager
sujx@ubuntu:~$ yarn-daemon.sh stop resourcemanager
sujx@ubuntu:~$ hadoop-daemon.sh stop secondarynamenode
sujx@ubuntu:~$ hadoop-daemon.sh stop datanode
sujx@ubuntu:~$ hadoop-daemon.sh stop namenode
2. Stop script, option 2:
sujx@ubuntu:~$ stop-yarn.sh
sujx@ubuntu:~$ stop-dfs.sh
3. Stop script, option 3:
sujx@ubuntu:~$ stop-all.sh
At this point, the single-node pseudo-distributed deployment is complete.