    Building a Hadoop-1.2.1 Cluster on Ubuntu 12.10 under VMware 9

     

    Next: Building a Hadoop-1.2.1 Cluster on Ubuntu 12.10 under VMware 9 - Integrating ZooKeeper and HBase


    I have been learning Hadoop recently, so I am writing down the process of building a Hadoop cluster for later reference. The walkthrough covers a lot of small details and may feel a bit long-winded, but that should make it more useful for newcomers. Without further ado, let's get started.

     

    Building a 5-node Hadoop cluster

    1. Environment

    Five Ubuntu virtual machines are created with VMware; the details are as follows:

    Virtual machine         Operating system             JDK                   Hadoop
    VMWare Workstation 9    ubuntu-12.10-server-amd64    jdk-7u51-linux-x64    hadoop-1.2.1

    Hostname     IP address      VM name              Roles
    master       192.168.1.30    Ubuntu64-Master      namenode, Jobtracker
    secondary    192.168.1.39    Ubuntu64-Secondary   secondarynamenode
    slaver1      192.168.1.31    Ubuntu64-slaver1     datanode, tasktracker
    slaver2      192.168.1.32    Ubuntu64-slaver2     datanode, tasktracker
    slaver3      192.168.1.33    Ubuntu64-slaver3     datanode, tasktracker

    2. Setting up the virtual machines

    Download the 64-bit Ubuntu Server ISO, which is convenient to install in VMware.

    Each virtual machine gets one dual-core CPU, 1 GB of RAM and a 20 GB disk. A Shared Folder is configured so that the Windows host can pass installation packages to the VMs.

    Install Ubuntu with the standard guided setup and create a hadoop user; Hadoop, ZooKeeper and HBase will all be deployed under this hadoop account later on.

    You can install and configure a single machine first, using master as the template, then use VMware's clone feature to produce the other machines and simply adjust their IP addresses and hostnames.

    Create the user

    First create the group:

    sudo addgroup hadoop

    Then create the user:

    sudo adduser --ingroup hadoop hadoop
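    Later steps run sudo as the hadoop user, so the account also needs sudo rights. A minimal sketch, assuming the default Ubuntu sudoers setup in which members of the sudo group may run sudo:

    sudo adduser hadoop sudo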

    Update the package sources

    First back up the sources file that ships with the system (we are logged in as hadoop, hence the sudo):

    sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup

    Edit the sources list:

    sudo vi /etc/apt/sources.list

    Paste in a sources list found online, for example:

    ## Official Ubuntu archive (hosted in Europe; slower from mainland China, but always in sync and reachable from Telecom, Mobile/Tietong and Unicom networks):

    deb http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse

    deb http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse

    deb http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse

    deb http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse

    deb http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse

    deb-src http://archive.ubuntu.com/ubuntu/ quantal main restricted universe multiverse

    deb-src http://archive.ubuntu.com/ubuntu/ quantal-security main restricted universe multiverse

    deb-src http://archive.ubuntu.com/ubuntu/ quantal-updates main restricted universe multiverse

    deb-src http://archive.ubuntu.com/ubuntu/ quantal-proposed main restricted universe multiverse

    deb-src http://archive.ubuntu.com/ubuntu/ quantal-backports main restricted universe multiverse

    ## Additional software provided by Ubuntu (third-party closed-source packages, etc.):

    deb http://archive.canonical.com/ubuntu/ quantal partner

    deb http://extras.ubuntu.com/ubuntu/ quantal main

    ## Community-maintained Ubuntu mirror hosted in a China Telecom data center in Hangzhou, Zhejiang (also mirrors Deepin and others):

    deb http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse

    deb http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse

    deb http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse

    deb http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse

    deb http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse

    deb-src http://ubuntu.srt.cn/ubuntu/ quantal main restricted universe multiverse

    deb-src http://ubuntu.srt.cn/ubuntu/ quantal-security main restricted universe multiverse

    deb-src http://ubuntu.srt.cn/ubuntu/ quantal-updates main restricted universe multiverse

    deb-src http://ubuntu.srt.cn/ubuntu/ quantal-proposed main restricted universe multiverse

    deb-src http://ubuntu.srt.cn/ubuntu/ quantal-backports main restricted universe multiverse

    ## Sohu mirror (gigabit China Unicom link in Shandong; the official mainland-China mirror redirects here), also hosts other open-source mirrors:

    deb http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse

    deb http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse

    deb http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse

    deb http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse

    deb http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse

    deb-src http://mirrors.sohu.com/ubuntu/ quantal main restricted universe multiverse

    deb-src http://mirrors.sohu.com/ubuntu/ quantal-security main restricted universe multiverse

    deb-src http://mirrors.sohu.com/ubuntu/ quantal-updates main restricted universe multiverse

    deb-src http://mirrors.sohu.com/ubuntu/ quantal-proposed main restricted universe multiverse

    deb-src http://mirrors.sohu.com/ubuntu/ quantal-backports main restricted universe multiverse

    Run the update command so that the new package sources take effect:

    sudo apt-get update

    Install vim

    I still cannot get used to plain vi, so install vim as a replacement:

    sudo apt-get install vim

    Configure the IP address

    On Ubuntu the IP address is set by editing /etc/network/interfaces directly:

    sudo vim /etc/network/interfaces

    Taking the master host as an example, change it to the following:

    # The primary network interface

    auto eth0

    iface eth0 inet static

    address 192.168.1.30

    netmask 255.255.255.0

    network 192.168.1.0

    broadcast 192.168.1.255

    gateway 192.168.1.1

    # dns-* options are implemented by the resolvconf package, if installed

    dns-nameservers 8.8.8.8
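    The new address does not take effect until the interface is brought up again (or the machine is rebooted); one way to apply it, assuming the eth0 interface name used above:

    sudo ifdown eth0 && sudo ifup eth0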

    Configure the hostname

    On Ubuntu the hostname lives in /etc/hostname, while /etc/hosts maps hostnames to IP addresses.

    First set the hostname:

    sudo vim /etc/hostname

    Set it to:

    master

    Then configure the hostname-to-IP mapping for all of the hosts:

    sudo vim /etc/hosts

    Set it to the following (adding every server in one go, once and for all):

    127.0.0.1       localhost

    192.168.1.30    master

    192.168.1.31    slaver1

    192.168.1.32    slaver2

    192.168.1.33    slaver3

    192.168.1.39    secondary

    The format of each entry in the hosts file is:

    IP address     hostname     aliases (zero or more, separated by spaces)

    Clone the system

    Clone the installed and configured Ubuntu machine into multiple copies to build the small 5-node cluster, then adjust the IP address and hostname on each copy.
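    As a sketch of the per-clone adjustments (run on each clone and reboot afterwards; the hostnames and addresses are the ones from the table above, here using the clone that becomes slaver1 as an example):

    sudo sed -i 's/^master$/slaver1/' /etc/hostname
    sudo sed -i 's/192\.168\.1\.30/192.168.1.31/' /etc/network/interfaces
    sudo reboot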

    3. Installing and configuring SSH

    Install SSH

    Install it with apt-get, which is the easiest way:

    sudo apt-get install openssh-server

    Check whether the ssh service is running:

    ps -ef | grep ssh

    Output like the following means it is running:

    hadoop    2147  2105  0 13:11 ?        00:00:00 /usr/bin/ssh-agent /usr/bin/dbus-launch --exit-with-session gnome-session --session=ubuntu

    root      7226     1  0 23:31 ?        00:00:00 /usr/sbin/sshd -D

    hadoop    7287  6436  0 23:33 pts/0    00:00:00 grep --color=auto ssh

    SSH has a client and a server side: the client is used to log in to other machines, and the server provides the SSH service that lets users log in remotely. Ubuntu installs the SSH client by default, so only the SSH server needs to be installed.

    Generate an RSA key pair

    As the hadoop user, generate a key pair with the ssh tooling:

    ssh-keygen -t rsa

    You will be asked whether to protect the key with a passphrase; an empty passphrase is fine. If nothing goes wrong, the key pair (id_rsa and id_rsa.pub) is created in hadoop's .ssh directory. id_rsa is the private key, which the server keeps to itself and must not leak; id_rsa.pub is the public key, which is handed out to every server that should be reachable without a password.

    Note: there is no space between ssh and -keygen; running ssh-keygen -t rsa -P "" skips the passphrase prompt entirely.

    Go into the .ssh directory and append the public key to the authorization file (authorized_keys), which stores the public keys of all the servers:

    cat id_rsa.pub >> authorized_keys

    In authorized_keys each public key starts with ssh-rsa and ends with username@hostname; keys from multiple servers are stored one after another, for example:

    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDs5A9sjk+44DtptGw4fXm5n0qbpnSnFsqRQJnbyD4DGMG7AOpfrEZrMmRiNJA8GZUIcrN71pHEgQimoQGD5CWyVgi1ctWFrULOnGksgixJj167m+FPdpcCFJwfAS34bD6DoVXJgyjWIDT5UFz+RnElNC14s8F0f/w44EYM49y2dmP8gGmzDQ0jfIgPSknUSGoL7fSFJ7PcnRrqWjQ7iq3B0gwyfCvWnq7OmzO8VKabUnzGYST/lXCaSBC5WD2Hvqep8C9+dZRukaa00g2GZVH3UqWO4ExSTefyUMjsal41YVARMGLEfyZzvcFQ8LR0MWhx2WMSkYp6Z6ARbdHZB4MN hadoop@master

    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2Hb6mCi6sd6IczIn/pBbj8L9PMS1ac0tlalex/vlSRj2E6kzUrw/urEUVeO76zFcZgjUgKvoZsNAHGrr1Bfw8FiiDcxPtlIREl2L9Qg8Vd0ozgE22bpuxBTn1Yed/bbJ/VxGJsYbOyRB/mBCvEI4ECy/EEPf5CRMDgiTL9XP86MNJ/kgG3odR6hhSE3Ik/NMARTZySXE90cFB0ELr/Io4SaINy7b7m6ssaP16bO8aPbOmsyY2W2AT/+O726Py6tcxwhe2d9y2tnJiELfrMLUPCYGEx0Z/SvEqWhEvvoGn8qnpPJCGg6AxYaXy8jzSqWNZwP3EcFqmVrg9I5v8mvDd hadoop@slaver1

    Distribute the public keys

    Each server hands its public key to the others so that it can log in to them without a password. The approach here is to gather the public keys of all servers on one machine and then push the combined file back out, so that all five servers can reach each other password-free.

    The distribution uses scp, which requires the SSH service to be running on both ends; the first scp to a host will still prompt for a password.

    Every server except master runs the following to copy its public key over (each host uses its own suffix):

    cd .ssh

    scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver1

    scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver2

    scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.slaver3

    scp id_rsa.pub hadoop@master:/home/hadoop/.ssh/id_rsa.pub.secondary

    The master server then runs the following to merge the keys:

    cd .ssh

    cat id_rsa.pub.slaver1 >> authorized_keys

    cat id_rsa.pub.slaver2 >> authorized_keys

    cat id_rsa.pub.slaver3 >> authorized_keys

    cat id_rsa.pub.secondary >> authorized_keys

    Finally, master runs the following to distribute the combined file:

    scp authorized_keys hadoop@slaver1:/home/hadoop/.ssh/authorized_keys

    scp authorized_keys hadoop@slaver2:/home/hadoop/.ssh/authorized_keys

    scp authorized_keys hadoop@slaver3:/home/hadoop/.ssh/authorized_keys

    scp authorized_keys hadoop@secondary:/home/hadoop/.ssh/authorized_keys
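    If passwordless login still asks for a password, a common cause is that sshd (with the default StrictModes setting) refuses keys whose files are group- or world-writable; the usual permissions on every node are:

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys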

    Test the passwordless login by typing:

    ssh slaver1
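    A small loop run on master can confirm that every node from the table is reachable without a password (the hostnames are the ones configured in /etc/hosts above):

    for host in master secondary slaver1 slaver2 slaver3; do
        ssh "$host" hostname
    done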

    4. Installing and configuring the JDK

    Deploy the JDK

    Unpack jdk-7u51-linux-x64.tar.gz into /usr/lib/jdk1.7.0_51 (sudo is needed because /usr/lib is owned by root):

    sudo tar -zxvf jdk-7u51-linux-x64.tar.gz -C /usr/lib/

    Configure the environment variables

    Add the JDK to the global environment variables:

    sudo vim /etc/profile

    Append the following at the bottom:

    export JAVA_HOME=/usr/lib/jdk1.7.0_51

    export JRE_HOME=/usr/lib/jdk1.7.0_51/jre

    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

    Note: on Linux the path separator in environment variables is the colon ':', whereas on Windows it is the semicolon ';'; CLASSPATH must contain '.'.

    Reload the environment with:

    source /etc/profile
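    To confirm that the variables took effect (assuming the paths above), the JDK should now report version 1.7.0_51:

    java -version
    echo $JAVA_HOME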

    Distribute the JDK

    Distribute the installed JDK with scp; copying a directory requires the -r flag (the hadoop account needs write access to /usr/lib on the target, otherwise copy as root):

    scp -r /usr/lib/jdk1.7.0_51 hadoop@slaver1:/usr/lib/

    Distribute the environment variables

    /etc/profile is owned by root, so the scp must be run with sudo and the file must be sent to slaver1's root account:

    sudo scp /etc/profile root@slaver1:/etc/profile

    5. Installing and configuring Hadoop

    Deploy Hadoop

    Unpack hadoop-1.2.1.tar.gz into /home/hadoop/hadoop-1.2.1:

    tar -zxvf hadoop-1.2.1.tar.gz -C /home/hadoop/

    Configure the environment variables

    Add Hadoop to the global environment variables:

    sudo vim /etc/profile

    Append the following at the bottom:

    export HADOOP_HOME=/home/hadoop/hadoop-1.2.1

    export PATH=$PATH:$HADOOP_HOME/bin

    Reload the environment:

    source /etc/profile

    conf/hadoop-env.sh

    export JAVA_HOME=/usr/lib/jdk1.7.0_51

    export HADOOP_TASKTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_TASKTRACKER_OPTS"

    export HADOOP_LOG_DIR=/home/hadoop/hadoop_home/logs

    export HADOOP_MASTER=master:/home/$USER/hadoop-1.2.1

    export HADOOP_SLAVE_SLEEP=0.1

    HADOOP_MASTER points at the master directory from which Hadoop rsyncs its configuration; when Hadoop starts, the configuration is synced from the master to the slaves.

    HADOOP_SLAVE_SLEEP=0.1 sets the sleep time (in seconds) between slave sync requests, so that many nodes requesting a sync at the same time do not overload the master.

    conf/core-site.xml

    <configuration>

        <property>

            <name>fs.default.name</name>

            <value>hdfs://master:9000</value>

        </property>

        <property>

            <name>hadoop.tmp.dir</name>

            <value>/home/hadoop/hadoop_home/tmp</value>

        </property>

        <property>

             <name>fs.trash.interval</name>

             <value>10080</value>

    <description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>

        </property>

        <property>

            <name>fs.checkpoint.period</name>

            <value>600</value>

             <description>The number of seconds between two periodic checkpoints.</description>

        </property>

        <property>

            <name>fs.checkpoint.size</name>

            <value>67108864</value>

            <description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>

        </property>

    </configuration>

    conf/hdfs-site.xml

    <configuration>

        <property>

            <name>dfs.name.dir</name>

            <value>/home/hadoop/hadoop_home/name1,/home/hadoop/hadoop_home/name2</value>

            <description>  </description>

        </property>

        <property>

            <name>dfs.data.dir</name>

            <value>/home/hadoop/hadoop_home/data1,/home/hadoop/hadoop_home/data2</value>

            <description> </description>

        </property>

        <property>

             <name>fs.checkpoint.dir</name>

             <value>/home/hadoop/hadoop_home/namesecondary1,/home/hadoop/hadoop_home/namesecondary2</value>

        </property>

        <property>

            <name>dfs.replication</name>

            <value>3</value>

        </property>

        <property>

             <name>dfs.http.address</name>

             <value>master:50070</value>

        </property>

        <property>

            <name>dfs.https.address</name>

            <value>master:50470</value>

        </property>

        <property>

            <name>dfs.secondary.http.address</name>

            <value>secondary:50090</value>

        </property>

        <property>

            <name>dfs.datanode.address</name>

            <value>0.0.0.0:50010</value>

        </property>

        <property>

            <name>dfs.datanode.ipc.address</name>

            <value>0.0.0.0:50020</value>

        </property>

        <property>

            <name>dfs.datanode.http.address</name>

            <value>0.0.0.0:50075</value>

        </property>

        <property>

            <name>dfs.datanode.https.address</name>

            <value>0.0.0.0:50475</value>

        </property>

    </configuration>

    conf/mapred-site.xml

    <configuration>

        <property>

            <name>mapred.job.tracker</name>

            <value>master:9001</value>

        </property>

        <property>

            <name>mapred.local.dir</name>

            <value>/home/hadoop/hadoop_home/local</value>

        </property>

        <property>

            <name>mapred.system.dir</name>

             <value>/home/hadoop/hadoop_home/system</value>

        </property>

        <property>

             <name>mapred.tasktracker.map.tasks.maximum</name>

             <value>5</value>

        </property>

        <property>

            <name>mapred.tasktracker.reduce.tasks.maximum</name>

            <value>5</value>

        </property>

        <property>

            <name>mapred.job.tracker.http.address</name>

            <value>0.0.0.0:50030</value>

        </property>

        <property>

            <name>mapred.task.tracker.http.address</name>

            <value>0.0.0.0:50060</value>

        </property>

    </configuration>
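    The local paths referenced in the three files above (hadoop.tmp.dir, dfs.name.dir, dfs.data.dir, fs.checkpoint.dir, mapred.local.dir and the log directory from hadoop-env.sh) all live under /home/hadoop/hadoop_home. Hadoop creates most of them itself when formatting or starting up, but pre-creating the base directory on every node avoids permission surprises; a minimal sketch:

    mkdir -p /home/hadoop/hadoop_home/tmp
    mkdir -p /home/hadoop/hadoop_home/logs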

    conf/masters

    secondary

    conf/masters holds the hostname of the secondarynamenode; in this setup the secondarynamenode runs on its own server and is unrelated to the namenode.

    conf/slaves

    slaver1

    slaver2

    slaver3

    Distribute the Hadoop installation

    scp -r /home/hadoop/hadoop-1.2.1 hadoop@slaver1:/home/hadoop/

    Distribute the environment variables

    /etc/profile is owned by root, so the scp must be run with sudo and the file must be sent to slaver1's root account:

    sudo scp /etc/profile root@slaver1:/etc/profile

    6. Starting and testing Hadoop

    Starting the Hadoop cluster

    The Hadoop start and stop commands are as follows:

    Command                                      Purpose
    start-all.sh                                 Start the HDFS and MapReduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker
    stop-all.sh                                  Stop the HDFS and MapReduce daemons: namenode, secondarynamenode, datanode, jobtracker, tasktracker
    start-dfs.sh                                 Start the HDFS daemons: namenode, secondarynamenode, datanode
    stop-dfs.sh                                  Stop the HDFS daemons: namenode, secondarynamenode, datanode
    start-mapred.sh                              Start the MapReduce daemons: jobtracker, tasktracker
    stop-mapred.sh                               Stop the MapReduce daemons: jobtracker, tasktracker
    hadoop-daemons.sh start namenode             Start only the namenode daemon
    hadoop-daemons.sh stop namenode              Stop only the namenode daemon
    hadoop-daemons.sh start datanode             Start only the datanode daemon
    hadoop-daemons.sh stop datanode              Stop only the datanode daemon
    hadoop-daemons.sh start secondarynamenode    Start only the secondarynamenode daemon
    hadoop-daemons.sh stop secondarynamenode     Stop only the secondarynamenode daemon
    hadoop-daemons.sh start jobtracker           Start only the jobtracker daemon
    hadoop-daemons.sh stop jobtracker            Stop only the jobtracker daemon
    hadoop-daemons.sh start tasktracker          Start only the tasktracker daemon
    hadoop-daemons.sh stop tasktracker           Stop only the tasktracker daemon

    Start the cluster:
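    Note: before the very first start, HDFS has to be formatted on master (run once, as the hadoop user):

    hadoop namenode -format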

    start-all.sh
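    Once the scripts finish, jps (shipped with the JDK) on each node shows which daemons are up; with the layout above one would expect roughly:

    jps
    # master:    NameNode, JobTracker
    # secondary: SecondaryNameNode
    # slaver1-3: DataNode, TaskTracker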

    When stopping Hadoop with the stop-all.sh script, the datanode logs always show an error like the following:

    2014-06-10 15:52:20,216 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Call to master/192.168.1.30:9000 failed on local exception: java.io.EOFException

            at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)

            at org.apache.hadoop.ipc.Client.call(Client.java:1118)

            at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)

            at com.sun.proxy.$Proxy5.sendHeartbeat(Unknown Source)

            at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:1031)

            at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1588)

            at java.lang.Thread.run(Thread.java:744)

    Caused by: java.io.EOFException

            at java.io.DataInputStream.readInt(DataInputStream.java:392)

            at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:845)

            at org.apache.hadoop.ipc.Client$Connection.run(Client.java:790)

    The cause is that the datanodes are stopped after the namenode, so their connection to the namenode fails and the exception above is logged. Looking at the stop scripts, the shutdown order in stop-dfs.sh seems a little off (in my opinion): it stops the namenode first and the datanodes afterwards. I think the order can be adjusted so that the namenode is stopped last, which should avoid the connection warning.

    After the adjustment the script reads as follows:

    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR stop datanode

    "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters stop secondarynamenode

    "$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop namenode

    After testing, the datanode exception no longer appears. I am not sure whether this adjustment has any side effects on Hadoop, so corrections are welcome.

    The HDFS monitoring page is at http://master:50070.

    The MapReduce monitoring page is at http://master:50030.

     

    Check the state of HDFS from the command line:

    hadoop dfsadmin -report

    hadoop fsck /

    Testing the Hadoop cluster

    Run the wordcount program that ships with Hadoop to check that the cluster works correctly.

    First create two input data files:

    echo "Hello World Bye World" > text1.txt

    echo "Hello Hadoop Goodbye Hadoop" > text2.txt

    Upload the data files to HDFS:

    hadoop fs -put text1.txt hdfs://master:9000/user/hadoop/input/text1.txt

    hadoop fs -put text2.txt hdfs://master:9000/user/hadoop/input/text2.txt
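    A quick listing confirms that both files landed in HDFS:

    hadoop fs -ls input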

    Run the wordcount program (from the Hadoop installation directory, against the text*.txt files uploaded above):

    hadoop jar hadoop-examples-1.2.1.jar wordcount input/text*.txt output-0

    The job log looks like this:

    14/06/12 01:55:21 INFO input.FileInputFormat: Total input paths to process : 2

    14/06/12 01:55:21 INFO util.NativeCodeLoader: Loaded the native-hadoop library

    14/06/12 01:55:21 WARN snappy.LoadSnappy: Snappy native library not loaded

    14/06/12 01:55:21 INFO mapred.JobClient: Running job: job_201406111818_0001

    14/06/12 01:55:22 INFO mapred.JobClient:  map 0% reduce 0%

    14/06/12 01:55:28 INFO mapred.JobClient:  map 50% reduce 0%

    14/06/12 01:55:30 INFO mapred.JobClient:  map 100% reduce 0%

    14/06/12 01:55:36 INFO mapred.JobClient:  map 100% reduce 33%

    14/06/12 01:55:37 INFO mapred.JobClient:  map 100% reduce 100%

    14/06/12 01:55:38 INFO mapred.JobClient: Job complete: job_201406111818_0001

    14/06/12 01:55:38 INFO mapred.JobClient: Counters: 29

    14/06/12 01:55:38 INFO mapred.JobClient:   Job Counters

    14/06/12 01:55:38 INFO mapred.JobClient:     Launched reduce tasks=1

    14/06/12 01:55:38 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8281

    14/06/12 01:55:38 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

    14/06/12 01:55:38 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

    14/06/12 01:55:38 INFO mapred.JobClient:     Launched map tasks=2

    14/06/12 01:55:38 INFO mapred.JobClient:     Data-local map tasks=2

    14/06/12 01:55:38 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8860

    14/06/12 01:55:38 INFO mapred.JobClient:   File Output Format Counters

    14/06/12 01:55:38 INFO mapred.JobClient:     Bytes Written=41

    14/06/12 01:55:38 INFO mapred.JobClient:   FileSystemCounters

    14/06/12 01:55:38 INFO mapred.JobClient:     FILE_BYTES_READ=79

    14/06/12 01:55:38 INFO mapred.JobClient:     HDFS_BYTES_READ=272

    14/06/12 01:55:38 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=166999

    14/06/12 01:55:38 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41

    14/06/12 01:55:38 INFO mapred.JobClient:   File Input Format Counters

    14/06/12 01:55:38 INFO mapred.JobClient:     Bytes Read=50

    14/06/12 01:55:38 INFO mapred.JobClient:   Map-Reduce Framework

    14/06/12 01:55:38 INFO mapred.JobClient:     Map output materialized bytes=85

    14/06/12 01:55:38 INFO mapred.JobClient:     Map input records=2

    14/06/12 01:55:38 INFO mapred.JobClient:     Reduce shuffle bytes=85

    14/06/12 01:55:38 INFO mapred.JobClient:     Spilled Records=12

    14/06/12 01:55:38 INFO mapred.JobClient:     Map output bytes=82

    14/06/12 01:55:38 INFO mapred.JobClient:     Total committed heap usage (bytes)=336338944

    14/06/12 01:55:38 INFO mapred.JobClient:     CPU time spent (ms)=3010

    14/06/12 01:55:38 INFO mapred.JobClient:     Combine input records=8

    14/06/12 01:55:38 INFO mapred.JobClient:     SPLIT_RAW_BYTES=222

    14/06/12 01:55:38 INFO mapred.JobClient:     Reduce input records=6

    14/06/12 01:55:38 INFO mapred.JobClient:     Reduce input groups=5

    14/06/12 01:55:38 INFO mapred.JobClient:     Combine output records=6

    14/06/12 01:55:38 INFO mapred.JobClient:     Physical memory (bytes) snapshot=394276864

    14/06/12 01:55:38 INFO mapred.JobClient:     Reduce output records=5

    14/06/12 01:55:38 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2918625280

    14/06/12 01:55:38 INFO mapred.JobClient:     Map output records=8

    Listing the output directory shows a _SUCCESS file, which means the job completed successfully and the cluster is basically working:

    hadoop fs -ls output-0

     

    Found 3 items

    -rw-r--r--   3 hadoop supergroup          0 2014-06-12 01:55 /user/hadoop/output-0/_SUCCESS

    drwxr-xr-x   - hadoop supergroup          0 2014-06-12 01:55 /user/hadoop/output-0/_logs

    -rw-r--r--   3 hadoop supergroup         41 2014-06-12 01:55 /user/hadoop/output-0/part-r-00000

    View the result:

    hadoop fs -cat output-0/part-r-00000

    Bye        1

    Goodbye    1

    Hadoop     2

    Hello      2

    World      2

    This matches the expected result for the test data.
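    If the job needs to be run again, the output directory must not already exist; in Hadoop 1.x it can be removed first (with fs.trash.interval set above, the deletion goes to the trash rather than being permanent):

    hadoop fs -rmr output-0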

    Original post: https://www.cnblogs.com/zfyouxi/p/3832033.html