• Installing Hadoop in Pseudo-Distributed Mode


    Software environment

      Operating system: OracleLinux-R6-U6

      Hostname: hadoop

      java:  jdk1.7.0_75

      hadoop: hadoop-2.4.1

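    Environment setup

      1. Software installation

      Both packages are self-contained archives that need no installer, so simply extract the Java and Hadoop archives into the root directory (/) of the filesystem.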

    [root@hadoop training]# ls -l /
    total 110
    dr-xr-xr-x.   2 root  root   4096 May 17 19:13 bin
    dr-xr-xr-x.   5 root  root   1024 May 17 17:45 boot
    drwxr-xr-x.   2 root  root   4096 Oct 15  2014 cgroup
    drwxr-xr-x.  19 root  root   3780 May 18 01:36 dev
    drwxr-xr-x. 131 root  root  12288 May 18 17:59 etc
    drwxr-xr-x.  11 67974 users  4096 May 18 18:22 hadoop-2.4.1
    drwxr-xr-x.   2 root  root   4096 Nov  1  2011 home
    drwxr-xr-x.   8 uucp    143  4096 Dec 19  2014 jdk1.7.0_75
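
The unpacking step could look roughly like the sketch below. The archive file names are assumptions, since the post does not say which files were downloaded; only the resulting directory names (/jdk1.7.0_75 and /hadoop-2.4.1) come from the listing above. The helper name unpack_into is also made up for illustration.

```shell
# Hypothetical archive names: the post does not name the downloaded files.
JDK_ARCHIVE="${JDK_ARCHIVE:-jdk-7u75-linux-i586.tar.gz}"
HADOOP_ARCHIVE="${HADOOP_ARCHIVE:-hadoop-2.4.1.tar.gz}"

# Extract a gzip-compressed tarball into a destination directory.
unpack_into() {
    archive="$1"
    dest="$2"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    tar -xzf "$archive" -C "$dest"
}

# Unpack both packages into / only when the archives are actually present.
for archive in "$JDK_ARCHIVE" "$HADOOP_ARCHIVE"; do
    if [ -f "$archive" ]; then
        unpack_into "$archive" /
    fi
done
```

Extracting into / needs root privileges, which matches the root prompt used throughout this post.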
    

      

      2. Configure environment variables

      Edit the profile file (~/.bash_profile)

    [root@hadoop training]# cat ~/.bash_profile 
    # .bash_profile
    
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
    	. ~/.bashrc
    fi
    
    # User specific environment and startup programs
    
    PATH=$PATH:$HOME/bin
    
    export PATH
    
    # set python environment
    PYTHON_PATH=/python2.7
    export PATH=$PYTHON_PATH/bin:$PATH
    
    # set java environment: JAVA_HOME, JRE_HOME and CLASSPATH
    export JAVA_HOME=/jdk1.7.0_75
    export JRE_HOME=/jdk1.7.0_75/jre
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
    export HADOOP_HOME=/hadoop-2.4.1
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
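
After reloading the profile (for example with `. ~/.bash_profile`), it is worth confirming that the tools resolve from the new PATH. A minimal sketch, where check_tool is a hypothetical helper name and not part of Hadoop:

```shell
# Report whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 -> $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Check the three commands this post relies on.
for tool in java hadoop hdfs; do
    check_tool "$tool" || true
done

# Print version information only for the tools that were found.
if command -v java >/dev/null 2>&1; then
    java -version
fi
if command -v hadoop >/dev/null 2>&1; then
    hadoop version
fi
```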
    

      Edit the hosts file

    [root@hadoop ~]# cat /etc/hosts
    127.0.0.1         localhost
    172.10.236.21 hadoop
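
Because fs.defaultFS and the YARN settings later refer to the host name hadoop, it helps to confirm that the mapping is really in place. The sketch below reads a hosts-format file directly; lookup_host is a hypothetical helper, and it only inspects the file rather than the full system resolver.

```shell
# Print the address that a hosts-format file maps to a host name.
# Comment lines are skipped; the name may appear in any column after the address.
lookup_host() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        /^[ \t]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i == name) { print $1; found = 1; exit }
            }
        }
        END { exit !found }
    ' "$file"
}

lookup_host hadoop || echo "no entry for hadoop in /etc/hosts" >&2
```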
    

      

      3. Hadoop configuration files

      All Hadoop configuration files are in the /hadoop-2.4.1/etc/hadoop/ directory.

      Configure hadoop-env.sh: set JAVA_HOME

    # The java implementation to use.
    export JAVA_HOME=/jdk1.7.0_75
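
The same edit can be scripted. This is a sketch under the assumption that hadoop-env.sh already contains a line starting with `export JAVA_HOME=` (the stock file typically does); set_java_home is a hypothetical helper name.

```shell
# Point the JAVA_HOME export in a hadoop-env.sh style file at a given JDK path.
# An existing "export JAVA_HOME=" line is rewritten; otherwise one is appended.
set_java_home() {
    file="$1"
    jdk="$2"
    if grep -q '^export JAVA_HOME=' "$file"; then
        # Write through a temporary file, then copy back to keep file permissions.
        sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        echo "export JAVA_HOME=$jdk" >> "$file"
    fi
}

ENV_FILE=/hadoop-2.4.1/etc/hadoop/hadoop-env.sh
if [ -f "$ENV_FILE" ]; then
    set_java_home "$ENV_FILE" /jdk1.7.0_75
fi
```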
    

      

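      Configure hdfs-site.xml. In pseudo-distributed mode there is only one DataNode, so set the replication factor (dfs.replication) to 1.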

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
    	<name>dfs.replication</name>
    	<value>1</value>
    </property>
    </configuration>
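
To double-check what a site file actually contains, a small reader like the one below can help. It is a naive, line-based sketch (get_property is a hypothetical helper, not a Hadoop tool) and assumes each name and value tag sits on its own line, as in the listings in this post. On a working installation, `hdfs getconf -confKey dfs.replication` reports the effective value instead.

```shell
# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
# Naive line-based parsing; this is not a general XML parser.
get_property() {
    file="$1"
    name="$2"
    awk -v name="$name" '
        index($0, "<name>" name "</name>") { want = 1; next }
        want && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            found = 1
            exit
        }
        END { exit !found }
    ' "$file"
}

SITE_FILE=/hadoop-2.4.1/etc/hadoop/hdfs-site.xml
if [ -f "$SITE_FILE" ]; then
    echo "dfs.replication = $(get_property "$SITE_FILE" dfs.replication)"
fi
```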
    

      

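      Configure core-site.xml. fs.defaultFS sets the NameNode address, and hadoop.tmp.dir sets the base directory under which the formatted NameNode data is stored. By default this directory lives under /tmp, whose contents may be cleared when the operating system restarts, so point it at a different directory.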

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop:9000</value>
    </property>
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/hadoop-2.4.1/tmp</value>
    </property>
    </configuration>
    

      Configure mapred-site.xml. Only mapred-site.xml.template ships by default, so first copy mapred-site.xml.template to mapred-site.xml.

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
    </property>
    </configuration>
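
The copy step described above can be sketched as follows. The guard against overwriting an existing mapred-site.xml is an addition of this sketch, and create_mapred_site is a hypothetical helper name.

```shell
CONF_DIR="${CONF_DIR:-/hadoop-2.4.1/etc/hadoop}"

# Copy the template into place unless mapred-site.xml already exists,
# so that an edited file is never overwritten.
create_mapred_site() {
    dir="$1"
    if [ -f "$dir/mapred-site.xml" ]; then
        echo "mapred-site.xml already exists, leaving it untouched"
    elif [ -f "$dir/mapred-site.xml.template" ]; then
        cp "$dir/mapred-site.xml.template" "$dir/mapred-site.xml"
    else
        echo "no template found in $dir" >&2
        return 1
    fi
}

if [ -d "$CONF_DIR" ]; then
    create_mapred_site "$CONF_DIR"
fi
```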
    

      

      Configure yarn-site.xml

    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop</value>
    </property>
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
    </property>
    </configuration>
    

     

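    Final step:

      Format the NameNode

      All Hadoop configuration files are now in place. Next, format the NameNode so that it initializes the storage directory in which it records HDFS metadata.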

      # hdfs namenode -format

    ……
    17/05/18 18:28:19 INFO util.GSet: VM type       = 32-bit
    17/05/18 18:28:19 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
    17/05/18 18:28:19 INFO util.GSet: capacity      = 2^19 = 524288 entries
    17/05/18 18:28:19 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
    17/05/18 18:28:19 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
    17/05/18 18:28:19 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
    17/05/18 18:28:19 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
    17/05/18 18:28:19 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
    17/05/18 18:28:19 INFO util.GSet: Computing capacity for map NameNodeRetryCache
    17/05/18 18:28:19 INFO util.GSet: VM type       = 32-bit
    17/05/18 18:28:19 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
    17/05/18 18:28:19 INFO util.GSet: capacity      = 2^16 = 65536 entries
    17/05/18 18:28:19 INFO namenode.AclConfigFlag: ACLs enabled? false
    17/05/18 18:28:20 INFO namenode.FSImage: Allocated new BlockPoolId: BP-39137453-172.10.236.21-1495103299866
    17/05/18 18:28:20 INFO common.Storage: Storage directory /hadoop-2.4.1/tmp/dfs/name has been successfully formatted.
    17/05/18 18:28:20 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    17/05/18 18:28:20 INFO util.ExitUtil: Exiting with status 0
    17/05/18 18:28:20 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at hadoop/172.10.236.21
    ************************************************************/
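
Formatting is meant to be done once: running it again on an existing installation discards the HDFS metadata. A cautious sketch is shown below. It relies on the storage directory reported in the log above (/hadoop-2.4.1/tmp/dfs/name) and on a formatted directory containing a current/VERSION file; namenode_formatted is a hypothetical helper name.

```shell
# Return success if a NameNode storage directory has already been formatted.
# A formatted directory contains current/VERSION.
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

NAME_DIR="${NAME_DIR:-/hadoop-2.4.1/tmp/dfs/name}"

if namenode_formatted "$NAME_DIR"; then
    echo "$NAME_DIR is already formatted; skipping hdfs namenode -format"
elif command -v hdfs >/dev/null 2>&1; then
    hdfs namenode -format
else
    echo "hdfs not found on PATH" >&2
fi
```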
    

      Check the result of formatting

    [root@hadoop hadoop]# ls /hadoop-2.4.1/tmp/
    dfs  nm-local-dir
    

     

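     At this point the Hadoop environment is ready to use. All Hadoop services can be started with start-all.sh (the script's own output, shown further down, marks it as deprecated in favor of start-dfs.sh and start-yarn.sh).

      Check the processes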

    [root@hadoop ~]# jps
    16570 Jps
    15893 DataNode
    16461 NodeManager
    16179 ResourceManager
    16041 SecondaryNameNode
    15774 NameNode
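
The jps check can be automated. The sketch below compares jps output with the five daemons listed above and prints any that are absent; missing_daemons is a hypothetical helper name.

```shell
# Read jps output on stdin and print each expected daemon that is absent.
# Returns non-zero when at least one daemon is missing.
missing_daemons() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode ResourceManager NodeManager", want, " ")
            for (i = 1; i <= n; i++) {
                if (!(want[i] in seen)) { print want[i]; missing = 1 }
            }
            exit missing + 0
        }
    '
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons || echo "some daemons are not running" >&2
fi
```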
    

      

    Testing that HDFS works

    Create a directory in HDFS

    # hdfs dfs -mkdir /logs

    List the directory that was created

    # hdfs dfs -ls /

    Found 1 items
    drwxr-xr-x - root supergroup 0 2017-05-18 18:32 /logs  

    Upload a data file to the new directory

    # hdfs dfs -put install.log /logs

    Check the result of the upload

    [root@hadoop ~]# hdfs dfs -ls /logs

    Found 1 items
    -rw-r--r-- 1 root supergroup 57162 2017-05-18 18:32 /logs/install.log
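
Beyond listing the file, a round trip confirms that the stored bytes match the local copy. This is a sketch: verify_upload is a hypothetical helper, and the HDFS_BIN variable exists only so that the function can be exercised against a stand-in command.

```shell
# Upload a local file, stream it back with "hdfs dfs -cat", and compare bytes.
HDFS_BIN="${HDFS_BIN:-hdfs}"

verify_upload() {
    local_file="$1"
    remote_dir="$2"
    remote_file="$remote_dir/$(basename "$local_file")"
    "$HDFS_BIN" dfs -mkdir -p "$remote_dir" &&
        "$HDFS_BIN" dfs -put -f "$local_file" "$remote_dir" &&
        "$HDFS_BIN" dfs -cat "$remote_file" | cmp -s - "$local_file"
}

# Run the check only when hdfs and the sample file from this post are available.
if command -v "$HDFS_BIN" >/dev/null 2>&1 && [ -f install.log ]; then
    verify_upload install.log /logs && echo "round trip OK"
fi
```

The -f flag overwrites an existing /logs/install.log, so the check can be repeated.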
    

      

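    Hadoop pseudo-distributed mode is now configured successfully and ready to use.

    Passwordless authentication setup

    Starting the Hadoop services prompts for a password every time, which is clearly impractical on a cluster with many nodes, so configure passwordless SSH login for starting the services.

    Generate a key pair and copy the public key to the Hadoop server (here, the local machine itself)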

    # ssh-keygen -t rsa

    # Append the contents of id_rsa.pub to authorized_keys

    # ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop

    [root@hadoop ~]# ls ~/.ssh/
    authorized_keys  id_rsa  id_rsa.pub  known_hosts
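
For a single-node setup the key can also be created and authorized without any prompts. The sketch below generates a key with an empty passphrase and appends the public key to authorized_keys in the same directory; setup_local_key is a hypothetical helper name, and an empty passphrase is a convenience choice that may not suit every environment.

```shell
# Create an RSA key pair without a passphrase (unless one already exists)
# and authorize it for logins to the same account.
setup_local_key() {
    ssh_dir="$1"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi
    # Append the public key only if it is not already authorized.
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" 2>/dev/null; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Example call, left commented out so that nothing is created by accident:
# setup_local_key "$HOME/.ssh"
```

Once this has been run for the real ~/.ssh directory, `ssh hadoop true` should return without asking for a password.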
    

      

    Start Hadoop

    [root@hadoop ~]# start-all.sh 
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [hadoop]
    The authenticity of host 'hadoop (172.10.236.21)' can't be established.
    RSA key fingerprint is b9:d5:64:bb:f9:34:77:22:d7:a7:09:a6:1e:ab:ba:83.
    Are you sure you want to continue connecting (yes/no)? yes
    hadoop: Warning: Permanently added 'hadoop' (RSA) to the list of known hosts.
    hadoop: starting namenode, logging to /hadoop-2.4.1/logs/hadoop-root-namenode-hadoop.out
    localhost: starting datanode, logging to /hadoop-2.4.1/logs/hadoop-root-datanode-hadoop.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /hadoop-2.4.1/logs/hadoop-root-secondarynamenode-hadoop.out
    starting yarn daemons
    starting resourcemanager, logging to /hadoop-2.4.1/logs/yarn-root-resourcemanager-hadoop.out
    localhost: starting nodemanager, logging to /hadoop-2.4.1/logs/yarn-root-nodemanager-hadoop.out
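
The first line of the output above recommends start-dfs.sh and start-yarn.sh over the deprecated start-all.sh. A minimal sketch of that replacement, with start_hadoop as a hypothetical wrapper name:

```shell
# Start HDFS first, then YARN, using the two scripts named in the
# deprecation notice instead of start-all.sh.
start_hadoop() {
    for script in start-dfs.sh start-yarn.sh; do
        if ! command -v "$script" >/dev/null 2>&1; then
            echo "$script not found on PATH" >&2
            return 1
        fi
    done
    start-dfs.sh && start-yarn.sh
}

start_hadoop || echo "Hadoop was not started" >&2
```

The matching stop-yarn.sh and stop-dfs.sh scripts shut the services down again.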
    

      

  • Original article: https://www.cnblogs.com/kongzhagen/p/6872297.html