Hadoop can run in three modes: standalone (local) mode, pseudo-distributed mode, and fully distributed mode.
First, whichever mode you choose, the JDK must be installed. This was covered in the earlier post "Installing JDK 1.8 on Ubuntu 14.04 LTS", so it will not be repeated here.
Next, install SSH and set up passwordless login: in a cluster you cannot type a password every time you log in to a data node server. This was covered in the earlier post "Configuring passwordless SSH login on Ubuntu 14.04 LTS", so it will not be repeated either.
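Before continuing, it is worth confirming that both prerequisites are actually in place. A minimal sketch (the `require` helper below is just a throwaway function written for this post, not a standard tool):

```shell
# Sanity-check the two prerequisites before installing Hadoop.
# require() reports whether a command is available on PATH.
require() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: OK"
  else
    echo "$1: MISSING"
  fi
}
require java   # installed in the JDK post
require ssh    # installed in the SSH post
```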
Pseudo-distributed mode installation:
First, download Hadoop 1.2.1 to the local machine, then extract it into a directory under the home directory.
jerry@ubuntu:~/Downloads$ tar zxf hadoop-1.2.1.tar.gz -C ~/hadoop_1.2.1
jerry@ubuntu:~/Downloads$ cd ~/hadoop_1.2.1/
jerry@ubuntu:~/hadoop_1.2.1$ ls
hadoop-1.2.1
jerry@ubuntu:~/hadoop_1.2.1$ cd hadoop-1.2.1/
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1$ ls
bin hadoop-ant-1.2.1.jar ivy sbin
build.xml hadoop-client-1.2.1.jar ivy.xml share
c++ hadoop-core-1.2.1.jar lib src
CHANGES.txt hadoop-examples-1.2.1.jar libexec webapps
conf hadoop-minicluster-1.2.1.jar LICENSE.txt
contrib hadoop-test-1.2.1.jar NOTICE.txt
docs hadoop-tools-1.2.1.jar README.txt
Next, edit several of Hadoop's configuration files, all of which are in XML format.
The first is core-site.xml. It sets the address and port of the Hadoop distributed file system, as well as the Hadoop temporary-file directory (whose default is /tmp/hadoop-${user.name}).
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hadooptmp</value>
  </property>
</configuration>
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$
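The value of fs.default.name is a URI that HDFS clients split into a scheme, a NameNode host, and a port. As an illustration only (Hadoop does this parsing internally, in Java), the decomposition can be sketched with plain shell parameter expansion:

```shell
# Split the fs.default.name URI from core-site.xml into its parts.
uri="hdfs://localhost:9000"
rest=${uri#hdfs://}      # strip the scheme  -> "localhost:9000"
host=${rest%%:*}         # before the colon  -> "localhost"
port=${rest##*:}         # after the colon   -> "9000"
echo "NameNode host: $host, port: $port"
```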
Next, edit Hadoop's environment script to tell Hadoop the home directory of the installed JDK:
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1$ cd conf/
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ ls
capacity-scheduler.xml      hadoop-policy.xml      slaves
configuration.xsl           hdfs-site.xml          ssl-client.xml.example
core-site.xml               log4j.properties       ssl-server.xml.example
fair-scheduler.xml          mapred-queue-acls.xml  taskcontroller.cfg
hadoop-env.sh               mapred-site.xml        task-log4j.properties
hadoop-metrics2.properties  masters
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ sudo vim hadoop-env.sh
[sudo] password for jerry:
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ tail -n 1 hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk
Then hdfs-site.xml: set the HDFS replication factor to 1, the NameNode's metadata directory, and the DataNode's data directory.
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/hdfs/data</value>
  </property>
</configuration>
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$
Finally, configure the address and port of the MapReduce JobTracker:
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$
Configure the masters and slaves files. Since this is a pseudo-distributed setup, the name node and the data node are actually the same machine.
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ cat masters
localhost
192.168.2.100
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ cat slaves
localhost
192.168.2.100
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$
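Note that listing both localhost and the machine's own IP address makes the start scripts SSH in once per line and try to start each worker daemon twice (the "secondarynamenode running as process ... Stop it first" message in the start-all.sh output below comes from exactly this). For a single-machine setup, one localhost entry per file is enough; a minimal sketch, run inside the conf/ directory:

```shell
# In pseudo-distributed mode a single "localhost" entry per file is
# sufficient; an extra line with the machine's own IP would make
# start-all.sh attempt to start the same daemon a second time.
echo localhost > masters
echo localhost > slaves
cat masters slaves
```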
Edit /etc/hosts to map host names to IP addresses:
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       ubuntu

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.2.100   master
192.168.2.100   slave
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$
Create the directories referenced in the core-site.xml and hdfs-site.xml configuration files above. (Since /hadoop sits directly under the filesystem root, creating it may require sudo, and the directories must end up writable by the user that runs Hadoop.)
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ mkdir -p /hadoop/hadooptmp
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ mkdir -p /hadoop/hdfs/name
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ mkdir -p /hadoop/hdfs/data
Format HDFS:
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/bin$ ./hadoop namenode -format
Start all the Hadoop services, including the NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/bin$ ./start-all.sh
starting namenode, logging to /home/jerry/hadoop_1.2.1/hadoop-1.2.1/libexec/../logs/hadoop-jerry-namenode-ubuntu.out
192.168.68.130: starting datanode, logging to /home/jerry/hadoop_1.2.1/hadoop-1.2.1/libexec/../logs/hadoop-jerry-datanode-ubuntu.out
localhost: starting datanode, logging to /home/jerry/hadoop_1.2.1/hadoop-1.2.1/libexec/../logs/hadoop-jerry-datanode-ubuntu.out
localhost: ulimit -a for user jerry
localhost: core file size (blocks, -c) 0
localhost: data seg size (kbytes, -d) unlimited
localhost: scheduling priority (-e) 0
localhost: file size (blocks, -f) unlimited
localhost: pending signals (-i) 7855
localhost: max locked memory (kbytes, -l) 64
localhost: max memory size (kbytes, -m) unlimited
localhost: open files (-n) 1024
localhost: pipe size (512 bytes, -p) 8
localhost: starting secondarynamenode, logging to /home/jerry/hadoop_1.2.1/hadoop-1.2.1/libexec/../logs/hadoop-jerry-secondarynamenode-ubuntu.out
192.168.68.130: secondarynamenode running as process 10689. Stop it first.
starting jobtracker, logging to /home/jerry/hadoop_1.2.1/hadoop-1.2.1/libexec/../logs/hadoop-jerry-jobtracker-ubuntu.out
192.168.68.130: starting tasktracker, logging to /home/jerry/hadoop_1.2.1/hadoop-1.2.1/libexec/../logs/hadoop-jerry-tasktracker-ubuntu.out
localhost: starting tasktracker, logging to /home/jerry/hadoop_1.2.1/hadoop-1.2.1/libexec/../logs/hadoop-jerry-tasktracker-ubuntu.out
localhost: ulimit -a for user jerry
localhost: core file size (blocks, -c) 0
localhost: data seg size (kbytes, -d) unlimited
localhost: scheduling priority (-e) 0
localhost: file size (blocks, -f) unlimited
localhost: pending signals (-i) 7855
localhost: max locked memory (kbytes, -l) 64
localhost: max memory size (kbytes, -m) unlimited
localhost: open files (-n) 1024
localhost: pipe size (512 bytes, -p) 8
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/bin$
Check whether the Hadoop services started successfully:
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$ jps
3472 JobTracker
3604 TaskTracker
3084 NameNode
5550 Jps
3247 DataNode
3391 SecondaryNameNode
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/conf$
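Beyond eyeballing the jps output above, the check can be scripted. The daemon names below are the ones jps prints for Hadoop 1.x; the `check` helper itself is just a sketch written for this post, not part of the Hadoop distribution:

```shell
# List the running Java processes and verify the five Hadoop 1.x daemons.
# check() compares one daemon name against the jps listing, line by line.
running=$(jps 2>/dev/null | awk '{print $2}')
check() {
  if echo "$2" | grep -qx "$1"; then
    echo "$1: running"
  else
    echo "$1: NOT running"
  fi
}
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  check "$d" "$running"
done
```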
Check the status of the Hadoop cluster:
jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/bin$ ./hadoop dfsadmin -report
Configured Capacity: 41083600896 (38.26 GB)
Present Capacity: 32723169280 (30.48 GB)
DFS Remaining: 32723128320 (30.48 GB)
DFS Used: 40960 (40 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 41083600896 (38.26 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 8360431616 (7.79 GB)
DFS Remaining: 32723128320 (30.48 GB)
DFS Used%: 0%
DFS Remaining%: 79.65%
Last contact: Sat Dec 26 12:22:07 PST 2015

jerry@ubuntu:~/hadoop_1.2.1/hadoop-1.2.1/bin$
I ran into quite a few problems along the way; here are some useful links: