• Configuring a Hadoop cluster on CentOS 7


    Part 1: Test environment plan:

    Hostname   IP              User     HDFS                         YARN
    hadoop11   192.168.1.201   hadoop   NameNode, DataNode           NodeManager
    hadoop12   192.168.1.202   hadoop   DataNode                     NodeManager
    hadoop13   192.168.1.203   hadoop   DataNode, SecondaryNameNode  NodeManager
    hadoop14   192.168.1.204   hadoop   DataNode                     ResourceManager, NodeManager

    (The IPs match the /etc/hosts entries configured in Part 7.)
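The hostname-to-IP mapping in the plan can be turned into /etc/hosts lines with a small loop. This is a sketch; the hostnames, the `192.168.1.2xx` addresses, and the `zm.com` domain are the choices made in this walkthrough:

```shell
# Generate /etc/hosts entries for the planned cluster.
# The hostname/IP pairs mirror the planning table above.
hosts="hadoop11 hadoop12 hadoop13 hadoop14"
base="192.168.1.2"   # last octet becomes 01..04 per the plan
i=1
for h in $hosts; do
    printf '%s0%d %s %s.zm.com\n' "$base" "$i" "$h" "$h"
    i=$((i + 1))
done
```

The output can be appended to /etc/hosts on every node (done by hand in Part 7 below).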

    Part 2: Hadoop installation:

    1. Create the hadoop user (as root), switch to it, and create the working directories:

    useradd hadoop

    passwd hadoop

    su hadoop

    mkdir /home/hadoop/apps

    mkdir /home/hadoop/data
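The steps above can be collected into one idempotent sketch (useradd and chown require root; mkdir -p succeeds even when the directories already exist):

```shell
# Create the hadoop user if it does not exist yet,
# then its working directories, and hand them to the user.
id hadoop >/dev/null 2>&1 || useradd hadoop
mkdir -p /home/hadoop/apps /home/hadoop/data
chown -R hadoop:hadoop /home/hadoop/apps /home/hadoop/data
```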

    Part 3: Stop the firewall (as root):

    systemctl stop firewalld

    systemctl disable firewalld

    Part 4: Disable SELinux (as root):

    vim /etc/sysconfig/selinux

    Change SELINUX=enforcing to SELINUX=disabled (takes effect after a reboot).
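The same edit can be made non-interactively with sed instead of vim (a sketch over the file path above):

```shell
# Flip SELINUX=enforcing to SELINUX=disabled in place.
# The pattern is anchored with ^...$ so commented lines are untouched.
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/sysconfig/selinux
```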

    Part 5: Install Java:

    5-1 Download the JDK:

    https://www.oracle.com/technetwork/java/javase/downloads/index.html

    https://www.oracle.com/technetwork/java/javase/downloads/jdk11-downloads-5066655.html

    Click "Accept License Agreement" and download jdk-11.0.1_linux-x64_bin.tar.gz.

    (Note: JDK 11 turns out to break Hadoop 2.8.5 later in this walkthrough, and the cluster is downgraded to JDK 1.8.0_191; Hadoop 2.x is built for Java 7/8.)

    Upload jdk-11.0.1_linux-x64_bin.tar.gz to /usr/local on the server and unpack it:

    tar -zxvf jdk-11.0.1_linux-x64_bin.tar.gz

    5-2 Add environment variables:

    vim /etc/profile

    Append at the end:

    #java
    export JAVA_HOME='/usr/local/jdk-11.0.1'
    export PATH=$JAVA_HOME/bin:$PATH

    Save and exit, then run:

    source /etc/profile

    Check the version to confirm the variables took effect:

    java -version

    Part 6: Install Hadoop

    6-1 Download Hadoop:

    https://hadoop.apache.org/releases.html

    Choose the binary tarball, copy the mirror URL, and download it:

    wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz

    6-2 Extract to the target directory:

    tar -zxvf hadoop-2.8.5.tar.gz -C /home/hadoop/apps

    6-3 Edit the configuration files:

    cd /home/hadoop/apps/hadoop-2.8.5/etc/hadoop

    Set the JDK path in hadoop-env.sh, mapred-env.sh, and yarn-env.sh:

    #export JAVA_HOME='/usr/local/jdk-11.0.1'

    export JAVA_HOME='/usr/local/jdk1.8.0_191'

    6-3-1 vim core-site.xml

    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
     <property>
       <name>fs.defaultFS</name>
       <value>hdfs://hadoop11.zm.com:8020</value>
     </property>
     <property>
       <name>hadoop.tmp.dir</name>
       <value>/home/hadoop/data/hadoop/tmp</value>
     </property>
    </configuration>

    fs.defaultFS is the NameNode address;

    hadoop.tmp.dir is Hadoop's temporary directory.

    Make sure the directory exists; if it does not, create it first.
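The directories referenced by this and the following configs can be created up front on every node (paths as configured in core-site.xml and hdfs-site.xml):

```shell
# Create the Hadoop data directories referenced in core-site.xml
# (tmp) and hdfs-site.xml (name, data); -p creates parents as needed.
for d in tmp name data; do
    mkdir -p "/home/hadoop/data/hadoop/$d"
done
```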

    6-3-2 vim hdfs-site.xml

    <!-- Put site-specific property overrides in this file. -->

    <configuration>

      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/data/hadoop/name</value>
        <description>Usually several different directories are configured here to keep the metadata safe</description>
      </property>

      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/data/hadoop/data</value>
        <description>DataNode storage directory</description>
      </property>

      <property>
        <name>dfs.replication</name>
        <value>2</value>
        <description>Number of replicas per HDFS block; the default is 3</description>
      </property>


      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop13.zm.com:50090</value>
        </property>

      </configuration>

     dfs.namenode.name.dir: where the NameNode stores its data, i.e. the metadata describing every file in HDFS.

     dfs.datanode.data.dir: where each DataNode stores its data, i.e. the directory holding the actual blocks.

     dfs.replication: the HDFS replication factor. When a file is uploaded and split into blocks, each block is stored this many times; the default is 3.

     dfs.namenode.secondary.http-address: the HTTP address and port of the SecondaryNameNode, which should run on a different node than the NameNode. Since the plan places the SecondaryNameNode on hadoop13, this is set to hadoop13.zm.com:50090.

    6-3-3 Edit the slaves file, which lists the DataNode hosts in HDFS:

    cd /home/hadoop/apps/hadoop-2.8.5/etc/hadoop

    [hadoop@hadoop11 hadoop]$ vim slaves

    hadoop11.zm.com
    hadoop12.zm.com
    hadoop13.zm.com
    hadoop14.zm.com

    6-3-4 cd /home/hadoop/apps/hadoop-2.8.5/etc/hadoop

    vim yarn-site.xml

    <configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop14.zm.com</value>
        </property>
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>106800</value>
        </property>
    </configuration>

    yarn.resourcemanager.hostname points the ResourceManager at hadoop14.zm.com;

    yarn.log-aggregation-enable turns log aggregation on;

    yarn.log-aggregation.retain-seconds sets how long aggregated logs are kept on HDFS.
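The retain value is in seconds; 106800 seconds is just under 30 hours. Shell arithmetic makes the unit conversion explicit:

```shell
# Convert the yarn.log-aggregation.retain-seconds value to hours
# to sanity-check the retention window (integer division).
retain_seconds=106800
echo "$((retain_seconds / 3600)) hours"   # prints "29 hours"
```

For a one-week retention window the value would be 7*24*3600 = 604800.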

    6-3-5 cp mapred-site.xml.template mapred-site.xml

    vim mapred-site.xml

    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop11.zm.com:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop11.zm.com:19888</value>
        </property>
    </configuration>

    mapreduce.framework.name makes MapReduce jobs run on YARN.

    mapreduce.jobhistory.address places the MapReduce JobHistory server on hadoop11.

    mapreduce.jobhistory.webapp.address sets the JobHistory server's web UI address and port.

     Part 7: Configure the hosts file:

    Switch back to root: su

    vim /etc/hosts

    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

    192.168.1.201 hadoop11 hadoop11.zm.com
    192.168.1.202 hadoop12 hadoop12.zm.com
    192.168.1.203 hadoop13 hadoop13.zm.com
    192.168.1.204 hadoop14 hadoop14.zm.com

     Part 8: Passwordless SSH login

    On every server in the cluster:

    su hadoop

    cd /home/hadoop/.ssh

    ssh-keygen -t rsa

    Copy every host's public key to hadoop11 by running on each host:

    ssh-copy-id -i hadoop11

    Log in to hadoop11, go to /home/hadoop/.ssh, and fix the permissions:

    chmod 600 authorized_keys

    Distribute the authorized_keys file to the other hosts:

    scp /home/hadoop/.ssh/authorized_keys hadoop12:/home/hadoop/.ssh/

    scp /home/hadoop/.ssh/authorized_keys hadoop13:/home/hadoop/.ssh/

    scp /home/hadoop/.ssh/authorized_keys hadoop14:/home/hadoop/.ssh/

    From every machine, ssh to every other machine once; the first connection asks you to type "yes", and later connections are not prompted.
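The fan-out of authorized_keys can be written as a loop rather than three separate scp commands. A sketch, run on hadoop11 after every host has pushed its key there (ssh-copy-id prompts for the hadoop password on first use):

```shell
# On hadoop11: fix permissions, then fan the merged
# authorized_keys file out to the remaining hosts.
chmod 600 /home/hadoop/.ssh/authorized_keys
for h in hadoop12 hadoop13 hadoop14; do
    scp /home/hadoop/.ssh/authorized_keys "$h:/home/hadoop/.ssh/"
done
```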

    Part 9: Distribute Hadoop to the other hosts

    Copy hadoop11's /home/hadoop/apps directory to the other hosts:

    scp -rq apps hadoop12:/home/hadoop/

    Copy hadoop11's /home/hadoop/data directory to the other hosts:

    scp -rq data hadoop12:/home/hadoop/

    (Repeat both commands for hadoop13 and hadoop14.)
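Both directories can be pushed to all remaining hosts in one nested loop (a sketch, run from /home/hadoop on hadoop11):

```shell
# Copy the apps and data trees from hadoop11 to the other hosts.
# -r recurses into directories, -q suppresses the progress meter.
for h in hadoop12 hadoop13 hadoop14; do
    for d in apps data; do
        scp -rq "$d" "$h:/home/hadoop/"
    done
done
```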

     Part 10: Configure the Hadoop environment variables (on every node)

    Note:

    1. If installing as root, edit /etc/profile (system-wide variables).

    2. If installing as a regular user, edit ~/.bashrc (per-user variables).

    vim ~/.bashrc

    # User specific aliases and functions
    
    export HADOOP_HOME=/home/hadoop/apps/hadoop-2.8.5
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

    [hadoop@hadoop11 ~]$ source .bashrc

    Check the Hadoop version: hadoop version

    [hadoop@hadoop11 ~]$ hadoop version
    Hadoop 2.8.5
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 0b8464d75227fcee2c6e7f2410377b3d53d3d5f8
    Compiled by jdu on 2018-09-10T03:32Z
    Compiled with protoc 2.5.0
    From source with checksum 9942ca5c745417c14e318835f420733
    This command was run using /home/hadoop/apps/hadoop-2.8.5/share/hadoop/common/hadoop-common-2.8.5.jar

    Part 11: Initialize Hadoop

    HDFS formatting must be done on the NameNode host only.

    If you need to re-format, delete the existing data first. A crude approach is to remove the subdirectories of these three directories under /home/hadoop/data/hadoop:

    [hadoop@hadoop14 hadoop]$ ll
    总用量 0
    drwx------. 3 hadoop hadoop 40 10月 31 18:14 data
    drwxrwxr-x. 3 hadoop hadoop 21 10月 31 18:07 name
    drwxrwxr-x. 3 hadoop hadoop 26 10月 31 18:18 tmp
    [hadoop@hadoop14 hadoop]$ pwd
    /home/hadoop/data/hadoop

    [hadoop@hadoop11 ~]$ hadoop namenode -format

    18/10/31 00:01:14 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1163964159-198.8.8.201-1540915274705
    18/10/31 00:01:14 INFO common.Storage: Storage directory /home/hadoop/data/hadoop/name has been successfully formatted.
    18/10/31 00:01:14 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/data/hadoop/name/current/fsimage.ckpt_0000000000000000000 using no compression
    18/10/31 00:01:15 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/data/hadoop/name/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
    18/10/31 00:01:15 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    18/10/31 00:01:15 INFO util.ExitUtil: Exiting with status 0
    18/10/31 00:01:15 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at hadoop11/198.8.8.201
    ************************************************************/

    After formatting, a current directory is created automatically under /home/hadoop/data/hadoop/name, containing:

    -rw-rw-r-- 1 hadoop hadoop 323 10月 31 18:07 fsimage_0000000000000000000
    -rw-rw-r-- 1 hadoop hadoop  62 10月 31 18:07 fsimage_0000000000000000000.md5
    -rw-rw-r-- 1 hadoop hadoop   2 10月 31 18:07 seen_txid
    -rw-rw-r-- 1 hadoop hadoop 213 10月 31 18:07 VERSION

    Part 12: Start HDFS:

    This can be run from any node in the cluster:

    [hadoop@hadoop12 ~]$ start-dfs.sh

    It reports errors (the illegal-reflective-access warnings come from running Hadoop 2.8.5 on JDK 11):

    [hadoop@hadoop12 ~]$ start-dfs.sh
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/hadoop/apps/hadoop-2.8.5/share/hadoop/common/lib/hadoop-auth-2.8.5.jar) to method sun.security.krb5.Config.getInstance()
    WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    Starting namenodes on [hadoop11.zm.com]
    The authenticity of host 'hadoop11.zm.com (198.8.8.201)' can't be established.
    ECDSA key fingerprint is SHA256:RlH2tSlCWblQ5jZ6DMuCt4+yEmYuWo5MZuqUTGesM9I.
    ECDSA key fingerprint is MD5:de:e9:61:07:c7:14:35:0c:e6:8a:70:0e:93:5f:2b:8d.
    Are you sure you want to continue connecting (yes/no)? no

    Stop HDFS:

    stop-dfs.sh

    Downgrade Java to jdk1.8.0_191, and, as in step 6-3, update the configuration on every node in the cluster:

    cd /home/hadoop/apps/hadoop-2.8.5/etc/hadoop

    Set the JDK path in hadoop-env.sh, mapred-env.sh, and yarn-env.sh:

    #export JAVA_HOME='/usr/local/jdk-11.0.1'

    export JAVA_HOME='/usr/local/jdk1.8.0_191'

    Start HDFS again (from any node):

    start-dfs.sh

    [hadoop@hadoop11 sbin]$ start-dfs.sh 
    Starting namenodes on [hadoop11.zm.com]
    hadoop11.zm.com: starting namenode, logging to /home/hadoop/apps/hadoop-2.8.5/logs/hadoop-hadoop-namenode-hadoop11.out
    hadoop11: starting datanode, logging to /home/hadoop/apps/hadoop-2.8.5/logs/hadoop-hadoop-datanode-hadoop11.out
    hadoop13: starting datanode, logging to /home/hadoop/apps/hadoop-2.8.5/logs/hadoop-hadoop-datanode-hadoop13.out
    hadoop14: starting datanode, logging to /home/hadoop/apps/hadoop-2.8.5/logs/hadoop-hadoop-datanode-hadoop14.out
    hadoop12: starting datanode, logging to /home/hadoop/apps/hadoop-2.8.5/logs/hadoop-hadoop-datanode-hadoop12.out
    Starting secondary namenodes [hadoop13.zm.com]
    hadoop13.zm.com: starting secondarynamenode, logging to /home/hadoop/apps/hadoop-2.8.5/logs/hadoop-hadoop-secondarynamenode-hadoop13.out

     When daemons are started on other nodes, SSH asks for host-key confirmation:

    The authenticity of host 'hadoop13.zm.com (198.8.8.203)' can't be established.
    ECDSA key fingerprint is SHA256:RlH2tSlCWblQ5jZ6DMuCt4+yEmYuWo5MZuqUTGesM9I.
    ECDSA key fingerprint is MD5:de:e9:61:07:c7:14:35:0c:e6:8a:70:0e:93:5f:2b:8d.
    Are you sure you want to continue connecting (yes/no)? 

    Workaround (run on every node):

    Switch to root:

    su

    vim /etc/ssh/ssh_config

    Add the following two lines (note that disabling host-key checking trades security for convenience, which is acceptable on a closed test network):

    #   StrictHostKeyChecking ask
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

     After start-dfs.sh runs:

    the following is generated automatically in /home/hadoop/data/hadoop/data:

    [hadoop@hadoop11 data]$ ll
    总用量 4
    drwxrwxr-x 3 hadoop hadoop 66 10月 31 18:14 current
    -rw-rw-r-- 1 hadoop hadoop 13 10月 31 18:14 in_use.lock

    and in /home/hadoop/data/hadoop/name:

    [hadoop@hadoop11 name]$ ll
    总用量 4
    drwxrwxr-x 2 hadoop hadoop 156 10月 31 18:13 current
    -rw-rw-r-- 1 hadoop hadoop  13 10月 31 18:13 in_use.lock

    Part 13: Start YARN (must be run on the ResourceManager node):

    [hadoop@hadoop14 ~]$ start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-resourcemanager-hadoop11.out
    hadoop12: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-hadoop12.out
    hadoop13: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-hadoop13.out
    hadoop14: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-hadoop14.out
    hadoop11: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-hadoop11.out

     After start-yarn.sh runs:

    the following is generated automatically in /home/hadoop/data/hadoop/tmp:

    [hadoop@hadoop14 tmp]$ ll
    总用量 0
    drwxr-xr-x 5 hadoop hadoop 57 10月 31 18:18 nm-local-dir
    [hadoop@hadoop14 nm-local-dir]$ ll
    总用量 0
    drwxr-xr-x 2 hadoop hadoop 6 10月 31 18:18 filecache
    drwx------ 2 hadoop hadoop 6 10月 31 18:18 nmPrivate
    drwxr-xr-x 2 hadoop hadoop 6 10月 31 18:18 usercache

     With this configuration the ResourceManager did not start, and http://198.8.8.204:8088/ was unreachable.

    The ResourceManager addresses need to be added to yarn-site.xml on all nodes:

    vim yarn-site.xml

    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop14</value>
        </property>

        <property>
            <name>yarn.resourcemanager.address</name>
            <value>${yarn.resourcemanager.hostname}:8032</value>
            <!-- Address the ResourceManager exposes to clients, used to submit and kill applications. -->
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>${yarn.resourcemanager.hostname}:8030</value>
            <!-- Address the ResourceManager exposes to ApplicationMasters, used to request and release resources. -->
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>${yarn.resourcemanager.hostname}:8031</value>
            <!-- Address the ResourceManager exposes to NodeManagers, used for heartbeats and task assignment. -->
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>${yarn.resourcemanager.hostname}:8033</value>
            <!-- Address the ResourceManager exposes to administrators for admin commands. -->
        </property>

        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <!-- <value>198.8.8.204:8088</value> -->
            <value>${yarn.resourcemanager.hostname}:8088</value>
        </property>
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>106800</value>
        </property>
    
    </configuration>

    Save the configuration,

    then restart YARN on the ResourceManager node hadoop14:

    stop-yarn.sh

    start-yarn.sh

    [hadoop@hadoop14 hadoop]$ start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-resourcemanager-hadoop14.out
    hadoop11: Warning: Permanently added 'hadoop11,198.8.8.201' (ECDSA) to the list of known hosts.
    hadoop12: Warning: Permanently added 'hadoop12,198.8.8.202' (ECDSA) to the list of known hosts.
    hadoop14: Warning: Permanently added 'hadoop14,198.8.8.204' (ECDSA) to the list of known hosts.
    hadoop13: Warning: Permanently added 'hadoop13,198.8.8.203' (ECDSA) to the list of known hosts.
    hadoop14: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-hadoop14.out
    hadoop11: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-hadoop11.out
    hadoop13: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-hadoop13.out
    hadoop12: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.8.5/logs/yarn-hadoop-nodemanager-hadoop12.out
    [hadoop@hadoop14 hadoop]$ jps
    2659 ResourceManager
    1511 DataNode
    2791 NodeManager
    3180 Jps

    Now http://198.8.8.204:8088/ is reachable.

    Part 14: Check the processes on every node

     jps

    [hadoop@hadoop11 nm-local-dir]$ jps
    3201 NodeManager
    2934 DataNode
    3351 Jps
    2824 NameNode
    [hadoop@hadoop12 nm-local-dir]$ jps
    2358 NodeManager
    2216 DataNode
    2490 Jps
    [hadoop@hadoop13 nm-local-dir]$ jps
    2642 NodeManager
    2410 DataNode
    2778 Jps
    2524 SecondaryNameNode
    [hadoop@hadoop14 nm-local-dir]$ jps
    2577 DataNode
    2853 NodeManager
    2985 Jps

    Part 15: HDFS and YARN web management UIs:

     HDFS: http://198.8.8.201:50070

     YARN: http://198.8.8.204:8088

    Part 16: A simple Hadoop exercise:

    Create a directory:

    [hadoop@hadoop11 ~]$ hadoop fs -mkdir -p /test/input

    Verify it was created:

    hadoop fs -ls /

    Create a file:

    vim words.txt

    hello zhangsan
    hello lisi
    hello wangwu

    Upload the file:

    hadoop fs -put ~/words.txt /test/input

    Verify the upload:

    hadoop fs -ls /test/input

    Download the file:

    hadoop fs -get /test/input/words.txt /home/hadoop/

    Verify the download:

    cd /home/hadoop

    ll

    Test with the wordcount example:

     [hadoop@hadoop11 hadoop]$ hadoop jar /home/hadoop/apps/hadoop-2.8.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /test/input /test/output

    18/11/02 15:46:58 INFO client.RMProxy: Connecting to ResourceManager at hadoop14/198.8.8.204:8032
    18/11/02 15:46:59 INFO input.FileInputFormat: Total input files to process : 1
    18/11/02 15:46:59 INFO mapreduce.JobSubmitter: number of splits:1
    18/11/02 15:47:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541142769004_0001
    18/11/02 15:47:00 INFO impl.YarnClientImpl: Submitted application application_1541142769004_0001
    18/11/02 15:47:01 INFO mapreduce.Job: The url to track the job: http://hadoop14:8088/proxy/application_1541142769004_0001/
    18/11/02 15:47:01 INFO mapreduce.Job: Running job: job_1541142769004_0001
    18/11/02 15:47:10 INFO mapreduce.Job: Job job_1541142769004_0001 running in uber mode : false
    18/11/02 15:47:10 INFO mapreduce.Job:  map 0% reduce 0%
    18/11/02 15:47:19 INFO mapreduce.Job:  map 100% reduce 0%
    18/11/02 15:47:27 INFO mapreduce.Job:  map 100% reduce 100%
    18/11/02 15:47:27 INFO mapreduce.Job: Job job_1541142769004_0001 completed successfully
    18/11/02 15:47:27 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=57
            FILE: Number of bytes written=315855
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=152
            HDFS: Number of bytes written=35
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=5816
            Total time spent by all reduces in occupied slots (ms)=4662
            Total time spent by all map tasks (ms)=5816
            Total time spent by all reduce tasks (ms)=4662
            Total vcore-milliseconds taken by all map tasks=5816
            Total vcore-milliseconds taken by all reduce tasks=4662
            Total megabyte-milliseconds taken by all map tasks=5955584
            Total megabyte-milliseconds taken by all reduce tasks=4773888
        Map-Reduce Framework
            Map input records=3
            Map output records=6
            Map output bytes=63
            Map output materialized bytes=57
            Input split bytes=113
            Combine input records=6
            Combine output records=4
            Reduce input groups=4
            Reduce shuffle bytes=57
            Reduce input records=4
            Reduce output records=4
            Spilled Records=8
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=449
            CPU time spent (ms)=2730
            Physical memory (bytes) snapshot=445227008
            Virtual memory (bytes) snapshot=4208754688
            Total committed heap usage (bytes)=278396928
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=39
        File Output Format Counters 
            Bytes Written=35

    Inspect the output:

    [hadoop@hadoop11 hadoop]$ hadoop fs -ls /test/output
    Found 2 items
    -rw-r--r--   2 hadoop supergroup          0 2018-11-02 15:47 /test/output/_SUCCESS
    -rw-r--r--   2 hadoop supergroup         35 2018-11-02 15:47 /test/output/part-r-00000
    [hadoop@hadoop11 hadoop]$ hadoop fs -cat /test/output/part-r-00000
    hello    3
    lisi    1
    wangwu    1
    zhangsan    1
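The same counts can be reproduced locally on words.txt with standard text tools, which is a handy cross-check of the MapReduce result (a sketch; the /tmp path is illustrative):

```shell
# Local word count over words.txt: split into one word per line,
# sort, count duplicates, then order by count; this mirrors what
# the wordcount job computed above.
printf 'hello zhangsan\nhello lisi\nhello wangwu\n' > /tmp/words.txt
tr -s ' ' '\n' < /tmp/words.txt | sort | uniq -c | sort -rn
```

The top line of the output is the most frequent word, `hello`, with a count of 3, matching part-r-00000.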

    References:

    https://www.cnblogs.com/qingyunzong/p/8496127.html

    https://blog.csdn.net/hliq5399/article/details/78193113

  • Original post: https://www.cnblogs.com/jackyzm/p/9875645.html