• Hadoop Pseudo-Distributed Mode Installation


    1. About This Guide:

        This walkthrough builds a pseudo-distributed setup on a single virtual machine. Hadoop's pseudo-distributed mode simulates a Hadoop cluster on one machine; it is not truly distributed, since every daemon runs as a separate Java process on the same host. Hadoop itself cannot tell pseudo-distributed from fully distributed operation, and the two configurations are very similar. The only difference is that in pseudo-distributed mode everything is configured on a single machine, so the data node and the name node are the same host.

        Environment:

          Operating system: Red Hat 5.4 x86

          Hadoop version: hadoop-0.20.2

          JDK version: jdk1.7

    2. Installing the JDK and Configuring the Java Environment Variables

     ---- First, extract the archive ----
    [root@localhost ~]# tar -zxvf jdk-7u9-linux-i586.tar.gz
     ---- Rename (move) the directory ----
    [root@localhost ~]# mv jdk1.7.0_09 /jdk1.7
     ---- Add the following lines to /etc/profile ----
    [root@localhost ~]# vi /etc/profile

    export JAVA_HOME=/jdk1.7
    export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    export PATH=$JAVA_HOME/bin:$PATH
     ---- Reload the profile so the new variables take effect in this shell ----
    [root@localhost ~]# source /etc/profile
     ---- Verify that jdk1.7 installed successfully ----
    [root@localhost ~]# java -version
    java version "1.7.0_09"
    Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
    Java HotSpot(TM) Client VM (build 23.5-b02, mixed mode)

    3. Setting Up Passwordless SSH

        Hadoop relies on SSH: the namenode uses SSH to start the namenode and datanode processes. In pseudo-distributed mode the data node and the name node are the machine itself, so passwordless SSH to localhost must be configured.

    [root@localhost bin]# ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa): 
    /root/.ssh/id_rsa already exists.
    Overwrite (y/n)? y
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    2f:eb:6c:c5:c5:3b:0b:26:a4:7f:0f:7a:d7:3b:5e:e5 root@localhost.localdomain
    You have mail in /var/spool/mail/root
    [root@localhost bin]# cd 
    [root@localhost ~]# cd .ssh
    [root@localhost .ssh]# ls
    authorized_keys  id_rsa  id_rsa.pub  known_hosts
    [root@localhost .ssh]# cat id_rsa.pub > authorized_keys
    [root@localhost .ssh]# ssh 192.168.20.150
    Last login: Fri Apr 26 11:07:21 2013 from 192.168.20.103
    [root@localhost ~]# ssh localhost
    Last login: Fri Apr 26 12:45:43 2013 from master
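
        Note: if the two ssh logins above still prompt for a password, the most common cause is overly permissive modes on ~/.ssh, which makes sshd ignore authorized_keys. A minimal fix, assuming a default sshd configuration:

    [root@localhost ~]# chmod 700 ~/.ssh
    [root@localhost ~]# chmod 600 ~/.ssh/authorized_keys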

    4. Hadoop Configuration

        4.1. Download hadoop-0.20.2.tar.gz and extract it to the /123 directory

    [root@localhost 123]# tar -zxvf hadoop-0.20.2.tar.gz 

        4.2. Go to /123/hadoop-0.20.2/conf, where the Hadoop configuration files live
        4.3. Configure hadoop-env.sh

    [root@localhost conf]# pwd
    /123/hadoop-0.20.2/conf
    [root@localhost conf]# vi hadoop-env.sh 

    # Set Hadoop-specific environment variables here.

    # The only required environment variable is JAVA_HOME.  All others are
    # optional.  When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.

    # The java implementation to use.  Required.
     ---- The next line is the one we add ----
    export JAVA_HOME=/jdk1.7

    # Extra Java CLASSPATH elements.  Optional.
    # export HADOOP_CLASSPATH=
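
        As a quick sanity check that the Hadoop scripts pick up the JAVA_HOME we just set, the version command can be run from the bin directory (a minimal sketch; it should report Hadoop 0.20.2, and the build details will vary):

    [root@localhost conf]# cd ../bin
    [root@localhost bin]# ./hadoop version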

        4.4. Configure core-site.xml

    [root@localhost conf]# cat core-site.xml 
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.20.150:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/123/hadooptmp</value>
      </property>
    </configuration>
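
        Here fs.default.name is the URI that every HDFS daemon and client uses to reach the namenode, and hadoop.tmp.dir is the base directory from which Hadoop derives its other temporary paths. Once the daemons are running (section 5), the URI can be exercised directly; a hedged example:

    [root@localhost bin]# ./hadoop fs -ls hdfs://192.168.20.150:9000/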

        4.5. Configure hdfs-site.xml

    [root@localhost conf]# cat hdfs-site.xml 
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <value>/123/hdfs/name</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>/123/hdfs/data</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
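
        dfs.replication is set to 1 because this cluster has only one datanode; the default of 3 would leave every block permanently under-replicated. Once the cluster is running and holds data, block replication can be checked with fsck (a sketch, assuming some files already exist in HDFS):

    [root@localhost bin]# ./hadoop fsck / -files -blocks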

        4.6. Configure mapred-site.xml

    [root@localhost conf]# cat mapred-site.xml 
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>
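
        mapred.job.tracker tells the tasktracker and job clients where the jobtracker's RPC endpoint is. After the daemons are started in section 5, both RPC ports can be confirmed to be listening (assuming netstat is available on the host):

    [root@localhost ~]# netstat -tlnp | grep -E ':(9000|9001)'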

        4.7. Configure the masters and slaves files

    [root@localhost conf]# cat masters 
    192.168.20.150
    [root@localhost conf]# cat slaves 
    192.168.20.150

        Note: in pseudo-distributed mode the namenode acting as master and the datanode acting as slave are the same server, so both files contain the same IP; a contrasting multi-node example is sketched below.
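
        For contrast, in a fully distributed setup the slaves file would list each datanode host on its own line; a purely illustrative example (these IPs are hypothetical):

    192.168.20.151
    192.168.20.152
    192.168.20.153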
        4.8. Edit the hosts file

    [root@localhost conf]# cat /etc/hosts
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1        localhost.localdomain localhost
    ::1        localhost6.localdomain6 localhost6
    192.168.20.150 master
    192.168.20.150 slave

        4.9. Create the directories referenced in the files edited above

    [root@localhost conf]# mkdir -p /123/hadooptmp
    [root@localhost conf]# mkdir -p /123/hdfs/name
    [root@localhost conf]# mkdir -p /123/hdfs/data

    5. Starting Hadoop and Verifying the Installation

        5.1. Format the namenode

    [root@localhost bin]# ./hadoop namenode -format
    13/04/26 11:08:05 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 0.20.2
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
    ************************************************************/
    Re-format filesystem in /123/hdfs/name ? (Y or N) Y
    13/04/26 11:08:09 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
    13/04/26 11:08:09 INFO namenode.FSNamesystem: supergroup=supergroup
    13/04/26 11:08:09 INFO namenode.FSNamesystem: isPermissionEnabled=true
    13/04/26 11:08:09 INFO common.Storage: Image file of size 94 saved in 0 seconds.
    13/04/26 11:08:09 INFO common.Storage: Storage directory /123/hdfs/name has been successfully formatted.
    13/04/26 11:08:09 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
    ************************************************************/
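
        A caution before moving on: on 0.20.x, reformatting the namenode while /123/hdfs/data still holds blocks from a previous format causes a namespaceID mismatch, and the datanode will refuse to start. If you ever reformat, clear the datanode directory first (destructive, so only on a scratch setup like this one):

    [root@localhost bin]# rm -rf /123/hdfs/data/*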

        5.2. Start all Hadoop daemons

    [root@localhost bin]# ./start-all.sh 
    starting namenode, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
    192.168.20.150: starting datanode, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
    192.168.20.150: starting secondarynamenode, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
    starting jobtracker, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
    192.168.20.150: starting tasktracker, logging to /123/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out

        5.3. Use jps to confirm that all Hadoop processes are up.

    [root@localhost bin]# jps
    15219 JobTracker
    15156 SecondaryNameNode
    15495 Jps
    15326 TaskTracker
    15044 DataNode
    14959 NameNode
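
        If any of the five daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) is missing from the jps output, its log file under the logs directory usually explains why; for example, for the datanode:

    [root@localhost bin]# tail -n 50 /123/hadoop-0.20.2/logs/hadoop-root-datanode-localhost.localdomain.log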

        5.4. Check the cluster status:

    [root@localhost bin]# ./hadoop dfsadmin -report
    Configured Capacity: 19751522304 (18.4 GB)
    Present Capacity: 14953619456 (13.93 GB)
    DFS Remaining: 14953582592 (13.93 GB)
    DFS Used: 36864 (36 KB)
    DFS Used%: 0%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0

    -------------------------------------------------
    Datanodes available: 1 (1 total, 0 dead)

    Name: 192.168.20.150:50010
    Decommission Status : Normal
    Configured Capacity: 19751522304 (18.4 GB)
    DFS Used: 36864 (36 KB)
    Non DFS Used: 4797902848 (4.47 GB)
    DFS Remaining: 14953582592 (13.93 GB)
    DFS Used%: 0%
    DFS Remaining%: 75.71%
    Last contact: Fri Apr 26 13:06:15 CST 2013
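
        As a final smoke test it is worth running one of the bundled MapReduce examples end to end. The sketch below assumes the hadoop-0.20.2-examples.jar that ships in the root of the tarball and uses /etc/profile as an arbitrary input file; the namenode and jobtracker web UIs at http://192.168.20.150:50070 and http://192.168.20.150:50030 show the same cluster state graphically, and ./stop-all.sh shuts all daemons down when you are done.

    [root@localhost bin]# ./hadoop fs -mkdir /input
    [root@localhost bin]# ./hadoop fs -put /etc/profile /input
    [root@localhost bin]# ./hadoop jar ../hadoop-0.20.2-examples.jar wordcount /input /output
    [root@localhost bin]# ./hadoop fs -cat /output/part-r-00000 | head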
• Original post: https://www.cnblogs.com/Richardzhu/p/3043997.html