Installing a Hadoop 1.2.1 Cluster Environment





    I. Planning
    (1) Hardware resources
    10.171.29.191 master
    10.173.54.84  slave1
    10.171.114.223 slave2

    (2) Basic information
    User: jediael
    Directory: /opt/jediael/

    II. Environment configuration
    (1) Create the same user and password on every machine, and grant jediael the right to run all commands
    # useradd jediael
    # passwd jediael
    # vi /etc/sudoers
    Add the following line:
    jediael ALL=(ALL) ALL
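    To confirm the grant took effect, a quick check (my addition, not in the original post):
    $ su - jediael
    $ sudo -l        # should list "(ALL) ALL" for jediael
    $ sudo whoami    # should print "root"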
    (2) Create the directory /opt/jediael
    $ sudo chown jediael:jediael /opt
    $ cd /opt
    $ mkdir jediael
    Note: /opt must be owned by jediael, otherwise formatting the NameNode will fail. Since /opt now belongs to jediael, mkdir needs no sudo (running it with sudo would leave /opt/jediael owned by root, defeating the point).
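    A quick ownership check (my addition):
    $ ls -ld /opt /opt/jediael    # both should show jediael jediael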

    (3) Set the hostname and edit the /etc/hosts file
    1. Edit /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=*******
    2. Edit /etc/hosts
    10.171.29.191 master
    10.173.54.84  slave1
    10.171.114.223 slave2
    Note: the hosts file must not map 127.0.0.1 to the ***** hostname, or you will hit exceptions such as: org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.171.29.191:9000. Already trie...
    3. The hostname command
    hostname ****
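    As a concrete sketch for master (run the analogous commands on each slave), assuming a CentOS 6-style system where /etc/sysconfig/network holds the persistent hostname:
    $ sudo hostname master                    # takes effect immediately, but is lost on reboot
    $ grep HOSTNAME /etc/sysconfig/network    # should show HOSTNAME=master for the next boot
    $ hostname                                # verify: should print "master"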

    (4) Configure passwordless SSH login
    Run the following commands on master as the jediael user:
    $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    Then copy authorized_keys to slave1 and slave2:
    $ scp ~/.ssh/authorized_keys slave1:~/.ssh/
    $ scp ~/.ssh/authorized_keys slave2:~/.ssh/
    Notes:
    (1) If scp complains that the .ssh directory does not exist, the target machine has never run ssh; running ssh there once creates the .ssh directory.
    (2) Permissions must be exactly 700 on ~/.ssh and 600 on authorized_keys; anything looser or tighter will stop passwordless login from working.
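    A short sketch to set those permissions and verify the login (the chmod calls are my addition):
    $ chmod 700 ~/.ssh ; chmod 600 ~/.ssh/authorized_keys    # run on all 3 machines
    $ ssh slave1 hostname    # should print "slave1" without prompting for a password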


    (5) Install Java on all 3 machines and set the related environment variables
    See http://blog.csdn.net/jediael_lu/article/details/38925871
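    For reference, a minimal sketch of the variables, assuming the JDK lives in /usr/java/jdk1.7.0_51 (the same path hadoop-env.sh uses below); append to /etc/profile and then run "source /etc/profile":
    export JAVA_HOME=/usr/java/jdk1.7.0_51
    export PATH=$JAVA_HOME/bin:$PATH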

    (6) Download hadoop-1.2.1.tar.gz and extract it into /opt/jediael
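    For example (the Apache archive URL below is one possible source; any mirror that still carries hadoop-1.2.1 works):
    $ cd /opt/jediael
    $ wget https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
    $ tar -xzf hadoop-1.2.1.tar.gz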

    III. Edit the configuration files
    [Do this on all 3 machines]
    (1) Edit conf/hadoop-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0_51
    (2) Edit core-site.xml (the <property> blocks below go inside the file's existing <configuration> element)
    <property>
     <name>fs.default.name</name>
     <value>hdfs://master:9000</value>
    </property>
     
    <property>
     <name>hadoop.tmp.dir</name>
     <value>/opt/tmphadoop</value>
    </property> 
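
    Each of these files is a complete XML document; for example, the finished core-site.xml would look like the sketch below (the XML prolog and <configuration> wrapper come from the stock template):
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
     <property>
      <name>fs.default.name</name>
      <value>hdfs://master:9000</value>
     </property>
     <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/tmphadoop</value>
     </property>
    </configuration>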
    

    (3) Edit hdfs-site.xml
    <property>
     <name>dfs.replication</name>
     <value>2</value>
    </property>

    (4) Edit mapred-site.xml
    <property>
     <name>mapred.job.tracker</name>
     <value>master:9001</value>
    </property>

    (5) Edit masters and slaves
    masters:
    master
    
    slaves:
    slave1
    slave2

    You can finish all of the above configuration on master and then copy it to slave1 and slave2 with scp.
    For example:
    $ scp core-site.xml slave2:/opt/jediael/hadoop-1.2.1/conf
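    Or push the whole conf directory to both slaves in one go (a sketch; it assumes the identical /opt/jediael/hadoop-1.2.1 layout on every machine):
    $ for h in slave1 slave2; do scp -r /opt/jediael/hadoop-1.2.1/conf $h:/opt/jediael/hadoop-1.2.1/ ; done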

    IV. Start and verify


    1. Format the NameNode [only master needs this step; formatting initializes the NameNode's storage directory, so running it on the slaves is unnecessary]
    [jediael@master hadoop-1.2.1]$  bin/hadoop namenode -format
    15/01/21 15:13:40 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = master/10.171.29.191
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.2.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
    STARTUP_MSG:   java = 1.7.0_51
    ************************************************************/
    Re-format filesystem in /opt/tmphadoop/dfs/name ? (Y or N) Y
    15/01/21 15:13:43 INFO util.GSet: Computing capacity for map BlocksMap
    15/01/21 15:13:43 INFO util.GSet: VM type       = 64-bit
    15/01/21 15:13:43 INFO util.GSet: 2.0% max memory = 1013645312
    15/01/21 15:13:43 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    15/01/21 15:13:43 INFO util.GSet: recommended=2097152, actual=2097152
    15/01/21 15:13:43 INFO namenode.FSNamesystem: fsOwner=jediael
    15/01/21 15:13:43 INFO namenode.FSNamesystem: supergroup=supergroup
    15/01/21 15:13:43 INFO namenode.FSNamesystem: isPermissionEnabled=true
    15/01/21 15:13:43 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    15/01/21 15:13:43 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    15/01/21 15:13:43 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
    15/01/21 15:13:43 INFO namenode.NameNode: Caching file names occuring more than 10 times
    15/01/21 15:13:44 INFO common.Storage: Image file /opt/tmphadoop/dfs/name/current/fsimage of size 113 bytes saved in 0 seconds.
    15/01/21 15:13:44 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/opt/tmphadoop/dfs/name/current/edits
    15/01/21 15:13:44 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/opt/tmphadoop/dfs/name/current/edits
    15/01/21 15:13:44 INFO common.Storage: Storage directory /opt/tmphadoop/dfs/name has been successfully formatted.
    15/01/21 15:13:44 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at master/10.171.29.191
    ************************************************************/


    2. Start Hadoop [this step runs on master only]
    [jediael@master hadoop-1.2.1]$ bin/start-all.sh 
    starting namenode, logging to /opt/jediael/hadoop-1.2.1/libexec/../logs/hadoop-jediael-namenode-master.out
    slave1: starting datanode, logging to /opt/jediael/hadoop-1.2.1/libexec/../logs/hadoop-jediael-datanode-slave1.out
    slave2: starting datanode, logging to /opt/jediael/hadoop-1.2.1/libexec/../logs/hadoop-jediael-datanode-slave2.out
    master: starting secondarynamenode, logging to /opt/jediael/hadoop-1.2.1/libexec/../logs/hadoop-jediael-secondarynamenode-master.out
    starting jobtracker, logging to /opt/jediael/hadoop-1.2.1/libexec/../logs/hadoop-jediael-jobtracker-master.out
    slave1: starting tasktracker, logging to /opt/jediael/hadoop-1.2.1/libexec/../logs/hadoop-jediael-tasktracker-slave1.out
    slave2: starting tasktracker, logging to /opt/jediael/hadoop-1.2.1/libexec/../logs/hadoop-jediael-tasktracker-slave2.out
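    (To shut the cluster down later, the matching script is bin/stop-all.sh, also run on master.)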

    3. Verify via the web pages
    NameNode    http://ip:50070
    JobTracker  http://ip:50030



    4. Check the Java processes on each host
    (1) master:
    $ jps
    17963 NameNode
    18280 JobTracker
    18446 Jps
    18171 SecondaryNameNode
    (2) slave1:
    $ jps
    16019 Jps
    15858 DataNode
    15954 TaskTracker
    (3) slave2:
    $ jps
    15625 Jps
    15465 DataNode
    15561 TaskTracker

    V. Run a complete MapReduce program

    Everything below is executed on master only.
    1. Copy wordcount.jar to the server
    For the program source, see http://blog.csdn.net/jediael_lu/article/details/37596469

    2. Create the input directory and copy the input file into it
    [jediael@master166 ~]$ hadoop fs -mkdir /wcin
    [jediael@master166 projects]$ hadoop fs -copyFromLocal /opt/jediael/hadoop-1.2.1/conf/hdfs-site.xml /wcin 
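    To confirm the input landed (my addition):
    $ hadoop fs -ls /wcin    # should list hdfs-site.xml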
    

    3. Run the program
    [jediael@master166 projects]$ hadoop jar wordcount.jar org.jediael.hadoopdemo.wordcount.WordCount /wcin /wcout
    14/08/31 20:04:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    14/08/31 20:04:26 INFO input.FileInputFormat: Total input paths to process : 1
    14/08/31 20:04:26 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/08/31 20:04:26 WARN snappy.LoadSnappy: Snappy native library not loaded
    14/08/31 20:04:26 INFO mapred.JobClient: Running job: job_201408311554_0003
    14/08/31 20:04:27 INFO mapred.JobClient: map 0% reduce 0%
    14/08/31 20:04:31 INFO mapred.JobClient: map 100% reduce 0%
    14/08/31 20:04:40 INFO mapred.JobClient: map 100% reduce 100%
    14/08/31 20:04:40 INFO mapred.JobClient: Job complete: job_201408311554_0003
    14/08/31 20:04:40 INFO mapred.JobClient: Counters: 29
    14/08/31 20:04:40 INFO mapred.JobClient: Job Counters
    14/08/31 20:04:40 INFO mapred.JobClient: Launched reduce tasks=1
    14/08/31 20:04:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=4230
    14/08/31 20:04:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    14/08/31 20:04:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    14/08/31 20:04:40 INFO mapred.JobClient: Launched map tasks=1
    14/08/31 20:04:40 INFO mapred.JobClient: Data-local map tasks=1
    14/08/31 20:04:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8531
    14/08/31 20:04:40 INFO mapred.JobClient: File Output Format Counters
    14/08/31 20:04:40 INFO mapred.JobClient: Bytes Written=284
    14/08/31 20:04:40 INFO mapred.JobClient: FileSystemCounters
    14/08/31 20:04:40 INFO mapred.JobClient: FILE_BYTES_READ=370
    14/08/31 20:04:40 INFO mapred.JobClient: HDFS_BYTES_READ=357
    14/08/31 20:04:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=104958
    14/08/31 20:04:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=284
    14/08/31 20:04:40 INFO mapred.JobClient: File Input Format Counters
    14/08/31 20:04:40 INFO mapred.JobClient: Bytes Read=252
    14/08/31 20:04:40 INFO mapred.JobClient: Map-Reduce Framework
    14/08/31 20:04:40 INFO mapred.JobClient: Map output materialized bytes=370
    14/08/31 20:04:40 INFO mapred.JobClient: Map input records=11
    14/08/31 20:04:40 INFO mapred.JobClient: Reduce shuffle bytes=370
    14/08/31 20:04:40 INFO mapred.JobClient: Spilled Records=40
    14/08/31 20:04:40 INFO mapred.JobClient: Map output bytes=324
    14/08/31 20:04:40 INFO mapred.JobClient: Total committed heap usage (bytes)=238026752
    14/08/31 20:04:40 INFO mapred.JobClient: CPU time spent (ms)=1130
    14/08/31 20:04:40 INFO mapred.JobClient: Combine input records=0
    14/08/31 20:04:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=105
    14/08/31 20:04:40 INFO mapred.JobClient: Reduce input records=20
    14/08/31 20:04:40 INFO mapred.JobClient: Reduce input groups=20
    14/08/31 20:04:40 INFO mapred.JobClient: Combine output records=0
    14/08/31 20:04:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=289288192
    14/08/31 20:04:40 INFO mapred.JobClient: Reduce output records=20
    14/08/31 20:04:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1533636608
    14/08/31 20:04:40 INFO mapred.JobClient: Map output records=20

    4. Check the results
    [jediael@master166 projects]$ hadoop fs -cat /wcout/* 
    --> 1
    <!-- 1
    </configuration> 1
    </property> 1
    <?xml 1
    <?xml-stylesheet 1
    <configuration> 1
    <name>dfs.replication</name> 1
    <property> 1
    <value>2</value> 1
    Put 1
    file. 1
    href="configuration.xsl"?> 1
    in 1
    overrides 1
    property 1
    site-specific 1
    this 1
    type="text/xsl" 1
    version="1.0"?> 1
    cat: File does not exist: /wcout/_logs
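    The trailing "cat: File does not exist: /wcout/_logs" message is harmless: the /wcout/* glob also matches the job's _logs directory, which cannot be cat'ed. To read just the reducer output, cat the part file directly (file name assumed; part-r-00000 is the usual name under the new MapReduce API):
    $ hadoop fs -cat /wcout/part-r-00000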