I have always worked with Java and never touched Hadoop before, so I am using this series of posts to record my learning process.
Software versions and operating system

Operating system: centos-6.5-x86_64

    [bamboo@hadoop-senior opt]$ uname -a
    Linux hadoop-senior.bamboo.com 2.6.32-696.16.1.el6.x86_64

JDK and Hadoop packages:

    [bamboo@hadoop-senior softwares]$ ll
    total 443172
    -rw-r--r--. 1 bamboo bamboo 311430119 Dec 29 23:52 hadoop-2.5.0.tar.gz
    -rw-r--r--. 1 bamboo bamboo 142376665 Dec 30 02:17 jdk-7u67-linux-x64.tar.gz
- 1. Create four directories under /opt: datas, modules, softwares and tools (the root-owned rh directory was already there)

    [bamboo@hadoop-senior opt]$ ll
    total 20
    drwxr-xr-x. 2 bamboo bamboo 4096 Dec 30 18:05 datas
    drwxr-xr-x. 4 bamboo bamboo 4096 Dec 30 18:32 modules
    drwxr-xr-x. 2 root   root   4096 Oct  3 22:14 rh
    drwxr-xr-x. 2 bamboo bamboo 4096 Dec 30 18:16 softwares
    drwxr-xr-x. 2 bamboo bamboo 4096 Dec 30 18:05 tools
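The directory layout above can be created in one go. The following is only a minimal sketch: the function name make_layout and its base-directory argument are my own illustrative choices, not anything Hadoop requires.

```shell
#!/bin/sh
# Create the four working directories used in this series under a base directory.
# make_layout and its argument are hypothetical names chosen for this sketch.
make_layout() {
    base="$1"
    for d in datas modules softwares tools; do
        # -p makes the call safe to repeat if a directory already exists
        mkdir -p "$base/$d" || return 1
    done
}

# Example (run as root, then hand the directories over to the bamboo user):
#   make_layout /opt
#   chown -R bamboo:bamboo /opt/datas /opt/modules /opt/softwares /opt/tools
```

Because of mkdir -p, running it a second time leaves an existing layout untouched.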
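- 2. Check for an existing Java installation before installing

    rpm -qa | grep java

If Java packages are already installed, this prints a list of package names.

    [bamboo@hadoop-senior opt]$ rpm -qa | grep java
    [bamboo@hadoop-senior opt]$

As you can see, my machine has none. If the command does return packages, remove them with the following command, where 1 2 3 stand for the package names returned above:

    rpm -e --nodeps 1 2 3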
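If the check does return packages, the query and the removal can be chained so the names do not have to be copied by hand. This is a hedged sketch: filter_java_pkgs and remove_java_pkgs are names I made up, and it assumes GNU xargs (for -r), which is what CentOS ships.

```shell
#!/bin/sh
# Read package names on stdin and print only those containing "java",
# the same match the manual check uses. "|| true" keeps the exit status
# at 0 when nothing matches.
filter_java_pkgs() {
    grep java || true
}

# Remove every installed package whose name contains "java".
# Needs root. xargs -r (GNU) does not run rpm at all when the list is empty.
remove_java_pkgs() {
    rpm -qa | filter_java_pkgs | xargs -r rpm -e --nodeps
}

# Example (as root):
#   rpm -qa | filter_java_pkgs    # review the list first
#   remove_java_pkgs
```

Since --nodeps skips dependency checks, it is worth reviewing the filtered list before calling remove_java_pkgs.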
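- 3. Install Java

3.1 Extract the JDK

    tar -zxvf jdk*.tar.gz -C /opt/modules

3.2 Configure the environment variables

    vim /etc/profile

Append the following at the end of the file (editing /etc/profile requires root privileges), then save and exit:

    export JAVA_HOME=/opt/modules/jdk1.7.0_67
    export PATH=$PATH:$JAVA_HOME/bin

3.3 Apply the configuration

    source /etc/profile

This makes the settings take effect in the current shell.

3.4 Verify that Java is installed

    [bamboo@localhost jdk1.7.0_67]$ java -version
    java version "1.7.0_67"
    Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

The output shows that the configuration has taken effect.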
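Step 3.2 can also be scripted instead of edited by hand in vim. The sketch below appends the two export lines only if JAVA_HOME is not already set in the file, so it is safe to run twice. The function name add_java_env and its arguments are hypothetical.

```shell
#!/bin/sh
# Append JAVA_HOME and PATH exports to a profile file unless JAVA_HOME is
# already exported there.
# Usage: add_java_env <profile-file> <jdk-dir>   (hypothetical helper)
add_java_env() {
    profile="$1"
    java_home="$2"
    if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        echo "JAVA_HOME already set in $profile" >&2
        return 0
    fi
    {
        echo "export JAVA_HOME=$java_home"
        # Single quotes keep $PATH and $JAVA_HOME unexpanded in the file
        echo 'export PATH=$PATH:$JAVA_HOME/bin'
    } >> "$profile"
}

# Example (as root):
#   add_java_env /etc/profile /opt/modules/jdk1.7.0_67
#   . /etc/profile
```

The PATH line is written literally so that it is expanded each time the profile is sourced, not at the moment the script runs.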
- 4. Install Hadoop

4.1 Extract Hadoop

    tar -zxvf hadoop-2.5.0.tar.gz -C /opt/modules/

4.2 Configure the environment in hadoop-env.sh

The file is /opt/modules/hadoop-2.5.0/etc/hadoop/hadoop-env.sh.

    vim hadoop-env.sh

Set the JDK path:

    export JAVA_HOME=/opt/modules/jdk1.7.0_67

4.3 Run Hadoop (there are 3 modes for running it)

4.3.1 Change to the installation root directory and create an input directory

    [bamboo@localhost hadoop-2.5.0]$ pwd
    /opt/modules/hadoop-2.5.0
    [bamboo@localhost hadoop-2.5.0]$ mkdir input
    [bamboo@localhost hadoop-2.5.0]$ ls
    bin etc include input lib libexec sbin share

4.3.2 Copy the xml files under etc/hadoop into the input directory

    [bamboo@localhost hadoop-2.5.0]$ cp etc/hadoop/*.xml input/
    [bamboo@localhost hadoop-2.5.0]$ cd input/
    [bamboo@localhost input]$ ls
    capacity-scheduler.xml core-site.xml hadoop-policy.xml hdfs-site.xml httpfs-site.xml yarn-site.xml

4.3.3 Run the grep example

Run this from the installation root directory. The output directory must not exist beforehand, otherwise the job fails.

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar grep input output 'dfs[a-z.]+'

The job finished without errors. Running ls shows that there is now an output directory:

    [bamboo@localhost hadoop-2.5.0]$ ls
    bin etc include input lib libexec output sbin share
    [bamboo@localhost hadoop-2.5.0]$ cat output/*
    1 dfsadmin

Next, try a word count example. Create a wcinput directory and a text file inside it (the file name is arbitrary; wc.input is just an example):

    mkdir wcinput
    vim wcinput/wc.input

with this content:

    hadoop yarn hadoop mapreduce hadoop hdfs yarn nodemanager hadoop resourcemanager

Then run the job:

    [bamboo@localhost hadoop-2.5.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount wcinput wcoutput
    [bamboo@localhost hadoop-2.5.0]$ cat wcoutput/*
    hadoop 4
    hdfs 1
    mapreduce 1
    nodemanager 1
    resourcemanager 1
    yarn 2
    [bamboo@localhost hadoop-2.5.0]$ cat wcinput/
    cat: wcinput/: Is a directory
    [bamboo@localhost hadoop-2.5.0]$ cat wcinput/*
    hadoop yarn hadoop mapreduce hadoop hdfs yarn nodemanager hadoop resourcemanager
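To sanity-check the wordcount result without Hadoop, the same counts can be reproduced with a plain shell pipeline. This is only a rough cross-check, not how the MapReduce example works internally. The function name local_wordcount is made up, and it assumes words are separated by whitespace.

```shell
#!/bin/sh
# Count whitespace-separated words in the given files (or stdin) and print
# "word<TAB>count" lines sorted by word.
# local_wordcount is a hypothetical helper for cross-checking only.
local_wordcount() {
    cat "$@" \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example, from the Hadoop installation root:
#   local_wordcount wcinput/*
#   cat wcoutput/*
```

For the sample input above, both commands should list the same words with the same counts, for example hadoop with 4 and yarn with 2.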
The mode used here is Hadoop's Standalone Operation mode. The next post continues with Pseudo-Distributed Mode.
You can also refer to the official getting-started documentation, linked below: