Installing and Configuring Hadoop on the Loongson Platform


     

    1. Hadoop Overview

    Hadoop is a distributed computing infrastructure developed by the Apache Software Foundation.

    It lets users develop distributed programs without knowing the details of the underlying distributed system, harnessing the power of a cluster for high-speed computation and storage.

    Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware; it provides high-throughput access to application data and is well suited to applications with very large data sets. HDFS relaxes some POSIX requirements to allow streaming access to data in the file system.

    The core of the Hadoop framework is HDFS plus MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides the computation over that data.

    2. Software and Installation Environment

    Software version: hadoop-2.6.5.tar.gz

    Download URL:

    http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz

    Installation environment:

    Master node server: Sugon L620-G15 (192.168.32.182)

    Slave node server: Great Wall single-socket Loongson 3A (192.168.32.153)

    These are referred to below simply as the master server and the slave server.

    Operating system: iSoft Server OS 5.0 beta3 for mips

    3. Installation and Configuration

    Perform the following steps on both the master server and the slave server.

    Disable the firewall:

    [hadoop@master hadoop]$ sudo service iptables stop
    [sudo] password for hadoop: 
    iptables: Flushing firewall rules:                         [  OK  ]
    iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
    iptables: Unloading modules:                               [  OK  ]
    [hadoop@master hadoop]$ sudo chkconfig iptables off

    Change the hostnames to master and slave, respectively.

    Master server:

    [root@isoft182 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=master
    GATEWAY=192.168.32.1

    Slave server:

     [root@test-153 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=slave
    GATEWAY=192.168.32.1

    On both servers, add the following entries to the hosts file (/etc/hosts):

    192.168.32.182 master
    192.168.32.153 slave

    Create the hadoop user on both servers and switch to it (a minimal sketch follows), then set up passwordless SSH login from the master to the slave server:
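    The original does not show the user-creation commands; a minimal sketch, run as root on each server (the sudo grant is an assumption, since later steps run sudo as the hadoop user):

    useradd hadoop                 # create the hadoop user
    passwd hadoop                  # set its password interactively
    # grant hadoop sudo rights, e.g. add via visudo:  hadoop ALL=(ALL) ALL
    su - hadoop                    # switch to the hadoop user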

    [hadoop@isoft182 ~]$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    9b:e0:7e:bb:b4:7c:9b:66:e7:1f:5f:49:30:22:be:d0 hadoop@isoft182
    The key's randomart image is:
    +--[ RSA 2048]----+
    |                 |
    |                 |
    |         . . o   |
    |        o . . o  |
    |      ..SE     . |
    |     . ..o.   . .|
    |      . +.    ...|
    |     . o..+..  o.|
    |      ..=*o+... .|
     
    [hadoop@isoft182 ~]$ ssh-copy-id slave
    hadoop@slave's password: 
    Now try logging into the machine, with "ssh 'slave'", and check in:
     
      .ssh/authorized_keys
     
    to make sure we haven't added extra keys that you weren't expecting.
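    To confirm that passwordless login works, a quick check from the master (it should print the slave's hostname without asking for a password):

    ssh slave hostname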

    Configure the Java environment.

    Append the following line to the end of ~/.bashrc:

    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

    Then run

    source ~/.bashrc

    to make the setting take effect.

    Verify the Java version:

    [hadoop@isoft182 ~]$ java -version
    openjdk version "1.8.0_25"
    OpenJDK Runtime Environment (build 1.8.0_25-rc19-b17)
    OpenJDK 64-Bit Server VM (build 25.25-b02, mixed mode)

    Install Hadoop:

    sudo tar -zxf hadoop-2.6.5.tar.gz -C /usr/local    # extract to /usr/local
    cd /usr/local/
    sudo mv ./hadoop-2.6.5/ ./hadoop            # rename the directory to hadoop
    sudo chown -R hadoop:hadoop ./hadoop        # change ownership to the hadoop user

    Check the Hadoop version:

     [hadoop@isoft182 local]$ ./hadoop/bin/hadoop version
    Hadoop 2.6.5
    Subversion https://github.com/apache/hadoop.git -r e8c9fe0b4c252caf2ebf1464220599650f119997
    Compiled by sjlee on 2016-10-02T23:43Z
    Compiled with protoc 2.5.0
    From source with checksum f05c9fa095a395faa9db9f7ba5d754
    This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.5.jar

    Configure the Hadoop environment variables.

    Add the following to ~/.bashrc:

    export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

    Run source ~/.bashrc to make it take effect.
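    Note (an optional step not in the original guide): if the start-up scripts later complain that JAVA_HOME is not set, it can also be exported explicitly in /usr/local/hadoop/etc/hadoop/hadoop-env.sh, since the daemons are launched over SSH and may not read ~/.bashrc:

    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk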

    Edit the Hadoop configuration files.

    Change into the directory /usr/local/hadoop/etc/hadoop.

    Edit the slaves file:

    [hadoop@master hadoop]$ cat slaves
    slave

    Edit core-site.xml:

    <configuration>
            <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://master:9000</value>
            </property>
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>file:/usr/local/hadoop/tmp</value>
                    <description>Abase for other temporary directories.</description>
            </property>
    </configuration>

    Edit hdfs-site.xml:

    [hadoop@master hadoop]$ cat hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
     
        http://www.apache.org/licenses/LICENSE-2.0
     
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
       <property>
               <name>dfs.namenode.secondary.http-address</name>
               <value>master:50090</value>
       </property>
       <property>
                <name>dfs.replication</name>
                <value>1</value>
       </property>
       <property>
              <name>dfs.namenode.name.dir</name>
              <value>file:/usr/local/hadoop/tmp/dfs/name</value>
       </property>
       <property>
               <name>dfs.datanode.data.dir</name>
               <value>file:/usr/local/hadoop/tmp/dfs/data</value>
        </property>
    </configuration>

    Copy mapred-site.xml.template to mapred-site.xml and edit it so that its content is as follows:

    [hadoop@master hadoop]$ cat mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
     
        http://www.apache.org/licenses/LICENSE-2.0
     
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
     
    <!-- Put site-specific property overrides in this file. -->
     
    <configuration>
    <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
     </property>
      <property>
                  <name>mapreduce.jobhistory.address</name>
                  <value>master:10020</value>
      </property>
      <property>
                  <name>mapreduce.jobhistory.webapp.address</name>
                  <value>master:19888</value>
       </property>
    </configuration>

    Edit yarn-site.xml:

    [hadoop@master hadoop]$ cat yarn-site.xml
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
     
        http://www.apache.org/licenses/LICENSE-2.0
     
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
                <property>
                            <name>yarn.resourcemanager.hostname</name>
                            <value>master</value>
                </property>
                <property>
                            <name>yarn.nodemanager.aux-services</name>
                            <value>mapreduce_shuffle</value>
                </property>
    <!-- Site specific YARN configuration properties -->
     
    </configuration>

    Copy the /usr/local/hadoop directory from the master server to the same location on the slave server (a copy sketch is given below), then on the slave run

    sudo chown -R hadoop:hadoop hadoop/
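    The copy command itself is not shown in the original; a minimal sketch using the passwordless SSH set up earlier (packing the tree into a tarball preserves permissions and copies faster than many small files):

    # on the master, as the hadoop user
    cd /usr/local
    tar -czf ~/hadoop.master.tar.gz ./hadoop
    scp ~/hadoop.master.tar.gz slave:/home/hadoop/
    # on the slave
    sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local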

    Initialization on the master server.

    Format the NameNode on the master node:

    [hadoop@master ~]$ hdfs namenode -format

    Start Hadoop:

    start-dfs.sh
    start-yarn.sh
    mr-jobhistory-daemon.sh start historyserver

    The jps command can be used to check which processes are running on each node. On the master node you should see the NameNode, ResourceManager, SecondaryNameNode, and JobHistoryServer processes, as shown below:

    [hadoop@master ~]$ jps
    5176 NameNode
    5883 Jps
    5836 JobHistoryServer
    5548 ResourceManager
    5341 SecondaryNameNode

    On the slave node you should see the DataNode and NodeManager processes, as shown below:

    [hadoop@slave ~]$ jps
    5302 Jps
    5179 NodeManager
    5070 DataNode

    In addition, run hdfs dfsadmin -report on the master node to check whether the DataNode started correctly; if Live datanodes is not 0, the cluster has started successfully.

    [hadoop@master ~]$  hdfs dfsadmin -report
    17/09/01 14:06:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Configured Capacity: 51471126528 (47.94 GB)
    Present Capacity: 43123650560 (40.16 GB)
    DFS Remaining: 43123625984 (40.16 GB)
    DFS Used: 24576 (24 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
     
    -------------------------------------------------
    Live datanodes (1):
     
    Name: 192.168.32.153:50010 (slave)
    Hostname: slave
    Decommission Status : Normal
    Configured Capacity: 51471126528 (47.94 GB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 8347475968 (7.77 GB)
    DFS Remaining: 43123625984 (40.16 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 83.78%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Fri Sep 01 14:07:06 CST 2017

    4. Running a Distributed Example

    Run the WordCount example.

    Change into the /usr/local/hadoop directory.

    Create a directory data_input and create two text files inside it, each containing some arbitrary text:

    mkdir data_input
    touch data_input/file1.txt     # then add arbitrary text to both files
    touch data_input/file2.txt

    Run the following commands to upload the input files to HDFS:

    ./bin/hadoop fs -mkdir /data
    ./bin/hadoop fs -put -f ./data_input/* /data

    Run the WordCount job and view the result:

    ./bin/hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.6.5-sources.jar org.apache.hadoop.examples.WordCount /data /output
    17/09/02 12:06:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/09/02 12:06:08 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.32.182:8032
    17/09/02 12:06:13 INFO input.FileInputFormat: Total input paths to process : 2
    17/09/02 12:06:13 INFO mapreduce.JobSubmitter: number of splits:2
    17/09/02 12:06:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1504324771910_0002
    17/09/02 12:06:18 INFO impl.YarnClientImpl: Submitted application application_1504324771910_0002
    17/09/02 12:06:18 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1504324771910_0002/
    17/09/02 12:06:18 INFO mapreduce.Job: Running job: job_1504324771910_0002
    17/09/02 12:07:09 INFO mapreduce.Job: Job job_1504324771910_0002 running in uber mode : false
    17/09/02 12:07:09 INFO mapreduce.Job:  map 0% reduce 0%
    17/09/02 12:07:44 INFO mapreduce.Job:  map 100% reduce 0%
    17/09/02 12:08:15 INFO mapreduce.Job:  map 100% reduce 100%
    17/09/02 12:08:16 INFO mapreduce.Job: Job job_1504324771910_0002 completed successfully
    17/09/02 12:08:18 INFO mapreduce.Job: Counters: 49
                File System Counters
                            FILE: Number of bytes read=122
                            FILE: Number of bytes written=322095
                            FILE: Number of read operations=0
                            FILE: Number of large read operations=0
                            FILE: Number of write operations=0
                            HDFS: Number of bytes read=278
                            HDFS: Number of bytes written=81
                            HDFS: Number of read operations=9
                            HDFS: Number of large read operations=0
                            HDFS: Number of write operations=2
                Job Counters 
                            Launched map tasks=2
                            Launched reduce tasks=1
                            Data-local map tasks=2
                            Total time spent by all maps in occupied slots (ms)=63108
                            Total time spent by all reduces in occupied slots (ms)=27632
                            Total time spent by all map tasks (ms)=63108
                            Total time spent by all reduce tasks (ms)=27632
                            Total vcore-milliseconds taken by all map tasks=63108
                            Total vcore-milliseconds taken by all reduce tasks=27632
                            Total megabyte-milliseconds taken by all map tasks=64622592
                            Total megabyte-milliseconds taken by all reduce tasks=28295168
                Map-Reduce Framework
                            Map input records=10
                            Map output records=8
                            Map output bytes=112
                            Map output materialized bytes=128
                            Input split bytes=196
                            Combine input records=8
                            Combine output records=7
                            Reduce input groups=6
                            Reduce shuffle bytes=128
                            Reduce input records=7
                            Reduce output records=6
                            Spilled Records=14
                            Shuffled Maps =2
                            Failed Shuffles=0
                            Merged Map outputs=2
                            GC time elapsed (ms)=1846
                            CPU time spent (ms)=20900
                            Physical memory (bytes) snapshot=699613184
                            Virtual memory (bytes) snapshot=5563170816
                            Total committed heap usage (bytes)=603979776
                Shuffle Errors
                            BAD_ID=0
                            CONNECTION=0
                            IO_ERROR=0
                            WRONG_LENGTH=0
                            WRONG_MAP=0
                            WRONG_REDUCE=0
                File Input Format Counters 
                            Bytes Read=82
                File Output Format Counters 
                            Bytes Written=81

    The log above shows the details of the WordCount run. Now run the following command to view the word-count results:

     

    [hadoop@master hadoop]$ ./bin/hadoop fs -cat /output/part-r-00000
    17/09/02 12:11:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    --file1.txt 1
    --file2.txt 1
    dddd        2
    ddddd       2
    dddeeeeeeeeeeeee        1
    dddkkkkkkkkkkkkk        1

    This completes the WordCount example.

    Shutting down the Hadoop cluster is also done on the master node:

    stop-yarn.sh
    stop-dfs.sh
    mr-jobhistory-daemon.sh stop historyserver

    5. Problems and Solutions

    1) When starting the Hadoop services, the following warning is printed: 17/09/01 14:42:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    Cause: the glibc library preinstalled on the system is version 2.12, while Hadoop's native library expects 2.14, so the warning is printed.

    Method 1: build glibc 2.14 from source and install it for Hadoop's exclusive use.

    Method 2: simply suppress the warning via log4j. Add the following line to etc/hadoop/log4j.properties:

                log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR

    Method 1 is relatively risky, so this document uses Method 2.

    2) After the NameNode is reformatted on the master node, the DataNode service fails to start on the slave node.

    Solution: after a reformat, the clusterID in tmp/dfs/name/current/VERSION on the master no longer matches the clusterID in tmp/dfs/data/current/VERSION on the slave, so the DataNode cannot start. Replace the clusterID on the slave with the master's clusterID to resolve the problem (see the sketch below).
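    A minimal sketch of that fix, assuming the data directories configured in hdfs-site.xml above:

    # on the master: read the current clusterID
    grep clusterID /usr/local/hadoop/tmp/dfs/name/current/VERSION
    # on the slave: edit VERSION so its clusterID matches the master's value,
    # then restart the DataNode
    vi /usr/local/hadoop/tmp/dfs/data/current/VERSION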

    3) When running the WordCount example, the job times out and the remote host is unreachable; in addition, hdfs dfsadmin -report shows the datanode's hostname as an IP address, 172.16.0.1.

    Solution: a colleague had previously installed the Docker service on the slave server and used a bridge device, which left an unused network interface on the system. Stop the Docker service and reboot the host (a sketch follows).
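    A minimal sketch of that cleanup on the slave (the SysV service name docker is an assumption):

    sudo service docker stop      # stop the running Docker service
    sudo chkconfig docker off     # prevent it from starting at boot
    sudo reboot                   # reboot so the leftover bridge interface is removed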
