• Installing and using Hadoop


    1. Installation

    1.1. Download hadoop-2.5.1.tar.gz

    1.2. Extract it into the installation directory

    tar -zxvf hadoop-2.5.1.tar.gz -C ../soft/
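    Note that `tar -C` fails if the target directory does not exist yet. A minimal sketch of a wrapper that creates the directory first; the function name `extract_to` is hypothetical and not part of Hadoop or of the original steps:

```shell
# Hypothetical helper: extract a gzip-compressed tarball into a target
# directory, creating the directory first so that `tar -C` does not fail.
extract_to() {
    archive="$1"
    target="$2"
    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target"
}

# Example call matching the step above (paths taken from this guide):
#   extract_to hadoop-2.5.1.tar.gz ../soft/
```

    The archive should unpack into a directory named after the release inside the target directory.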
    

    1.3. Configure the Hadoop-related settings

    vim .bashrc
    ## Add the Java settings
    export JAVA_HOME=/usr/xuelu/java
    export PATH=$PATH:$JAVA_HOME/bin
    

    vim .bash_profile

    # .bash_profile
    
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
            . ~/.bashrc
    fi
    
    # User specific environment and startup programs
    
    PATH=$PATH:$HOME/bin
    
    # Set the Hadoop environment variable
    export HADOOP_HOME=/home/xuelul/soft/hadoop251
    # Set the Maven environment variable
    export MAVEN_HOME=/usr/xuelul/maven
    export ZOOKEEPER_HOME=/home/xuelu/soft/zoo346
    PATH=$PATH:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$ZOOKEEPER_HOME/bin
    export PATH
    

    Run source .bash_profile so that the changes above take effect.
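    Before moving on, it can help to confirm that the exported variables are set and point at real directories. The following is a minimal sketch; `check_home_var` is a hypothetical helper, and the variable names are the ones exported in the profile above:

```shell
# Hypothetical helper: report whether a variable is set and points at an
# existing directory. Uses `eval` for indirect lookup so it works in POSIX sh.
check_home_var() {
    name="$1"
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name=$value"
}

# Typical use after `source ~/.bash_profile`:
#   for v in JAVA_HOME HADOOP_HOME MAVEN_HOME ZOOKEEPER_HOME; do check_home_var "$v"; done
#   hadoop version    # should print the Hadoop release if PATH is correct
```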

    Edit the configuration files that ship with Hadoop:

    etc/hadoop/core-site.xml:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

    etc/hadoop/hdfs-site.xml:

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
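    To double-check what was written to these files, the value of a property can be read back. Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the effective value. The sketch below is a rough, line-oriented alternative that needs no running Hadoop; `get_property` is a hypothetical helper, it is not a real XML parser, and it assumes each `<name>` and `<value>` element sits on one line as in the files above:

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style XML config file. Line-oriented; assumes the layout shown above.
get_property() {
    file="$1"
    key="$2"
    awk -v key="$key" '
        # Remember when the wanted <name> element has been seen.
        index($0, "<name>" key "</name>") { found = 1 }
        # Print the text of the first <value> element that follows it.
        found && match($0, /<value>[^<]*<\/value>/) {
            # "<value>" is 7 characters, "</value>" is 8: strip both.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$file"
}

# Example (run from the Hadoop installation directory):
#   get_property etc/hadoop/core-site.xml fs.defaultFS
#   get_property etc/hadoop/hdfs-site.xml dfs.replication
```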

    Set up passphraseless ssh

    Now check that you can ssh to the localhost without a passphrase:

      $ ssh localhost

    If you cannot ssh to localhost without a passphrase, execute the following commands:

      $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
      $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
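    If key-based login still prompts for a password, two common causes are worth checking. First, sshd (with its default StrictModes setting) ignores keys when ~/.ssh or authorized_keys is writable by other users. Second, OpenSSH 7.0 and later disable DSA keys by default, so generating the key with `-t rsa` instead may be necessary on newer systems. The sketch below only covers the permissions part; `secure_ssh_dir` is a hypothetical helper:

```shell
# Hypothetical helper: tighten permissions on an ssh directory and its
# authorized_keys file so that sshd will accept the keys stored there.
secure_ssh_dir() {
    dir="${1:-$HOME/.ssh}"
    mkdir -p "$dir" || return 1
    chmod 700 "$dir" || return 1
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys" || return 1
    fi
}

# After tightening permissions, a non-interactive probe:
#   secure_ssh_dir
#   ssh -o BatchMode=yes localhost true && echo "passphraseless ssh works"
```

    With `BatchMode=yes`, ssh fails instead of prompting, which makes the check usable in scripts.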

    Prepare the data:
    $ mkdir input
    $ cp etc/hadoop/*.xml input
    #1. Format the filesystem:
    
       $ bin/hdfs namenode -format
    
    #2. Start the NameNode daemon and DataNode daemon:
    
          $ sbin/start-dfs.sh
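    One way to confirm that the daemons actually came up is to look at the output of `jps`, the JVM process listing tool shipped with the JDK. In Hadoop 2.x, start-dfs.sh is expected to start NameNode, DataNode and SecondaryNameNode. A minimal sketch, where `missing_daemons` is a hypothetical helper that reads `jps` output from standard input:

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name that does not appear in it. Returns 0 when none are missing.
missing_daemons() {
    running=$(awk '{ print $2 }')   # second column of jps output is the class name
    rc=0
    for daemon in "$@"; do
        # -x matches whole lines only, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$running" | grep -qxF "$daemon"; then
            echo "$daemon"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   jps | missing_daemons NameNode DataNode SecondaryNameNode
```

    If a daemon is reported as missing, the log files under $HADOOP_HOME/logs are the first place to look.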
    
    #3. Browse the web interface for the NameNode; by default it is available at:
            NameNode - http://localhost:50070/
        The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
    #4. Make the HDFS directories required to execute MapReduce jobs:
    
          $ bin/hdfs dfs -mkdir /user
          $ bin/hdfs dfs -mkdir /user/<username>
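    Here <username> is a placeholder for the account that runs the jobs. Relative HDFS paths such as `input` and `output` in the later steps resolve to /user/<username>, which is why this directory has to exist. A small sketch that builds the path from the current login name; `hdfs_home_dir` is a hypothetical helper:

```shell
# Hypothetical helper: print the HDFS home directory for a user,
# defaulting to the current login name.
hdfs_home_dir() {
    echo "/user/${1:-$(id -un)}"
}

# Example: create the directory and any missing parents in one call
# (the -p flag of `hdfs dfs -mkdir` behaves like `mkdir -p`):
#   bin/hdfs dfs -mkdir -p "$(hdfs_home_dir)"
```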
    
    #5. Copy the input files into the distributed filesystem:
    
          $ bin/hdfs dfs -put etc/hadoop input
    
    #6. Run some of the examples provided:
    
          $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'
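    The grep example counts how often each string matching the regular expression 'dfs[a-z.]+' occurs in the input files. To get a feel for what the job should report, roughly the same counting can be done locally with standard tools. This is only an approximation for comparison, not how the MapReduce job is implemented; `count_matches` is a hypothetical helper:

```shell
# Hypothetical helper: read text on stdin and print "count<TAB>match" for every
# distinct string matching dfs[a-z.]+, most frequent first.
count_matches() {
    LC_ALL=C grep -oE 'dfs[a-z.]+' | sort | uniq -c | sort -rn |
        awk '{ print $1 "\t" $2 }'
}

# Example (run from the Hadoop installation directory):
#   cat etc/hadoop/*.xml | count_matches
```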
    
    #7. Examine the output files:
    
        Copy the output files from the distributed filesystem to the local filesystem and examine them:
    
          $ bin/hdfs dfs -get output output
          $ cat output/*
    
        Or view the output files on the distributed filesystem:
    
          $ bin/hdfs dfs -cat output/*
    
    #8. When you're done, stop the daemons with:
    
          $ sbin/stop-dfs.sh

    YARN on Single Node

    You can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

    The following instructions assume that steps #1 to #4 above have already been executed.

    1. Configure parameters as follows:

      etc/hadoop/mapred-site.xml:

      <configuration>
          <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
          </property>
      </configuration>

      etc/hadoop/yarn-site.xml:

      <configuration>
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
      </configuration>
    2. Start ResourceManager daemon and NodeManager daemon:
        $ sbin/start-yarn.sh
    3. Browse the web interface for the ResourceManager; by default it is available at:
      • ResourceManager - http://localhost:8088/
    4. Run a MapReduce job.
    5. When you're done, stop the daemons with:
        $ sbin/stop-yarn.sh
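    A practical note on step 1: Hadoop 2.x releases typically ship only etc/hadoop/mapred-site.xml.template, so mapred-site.xml may have to be created from the template before it can be edited. A minimal sketch, assuming the default configuration directory layout; `ensure_mapred_site` is a hypothetical helper:

```shell
# Hypothetical helper: make sure mapred-site.xml exists in the given config
# directory, copying it from the shipped template when it is missing.
ensure_mapred_site() {
    conf_dir="${1:-etc/hadoop}"
    if [ -f "$conf_dir/mapred-site.xml" ]; then
        return 0    # already present, leave it untouched
    fi
    if [ -f "$conf_dir/mapred-site.xml.template" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    else
        echo "no mapred-site.xml or template found in $conf_dir" >&2
        return 1
    fi
}

# Example (run from the Hadoop installation directory):
#   ensure_mapred_site etc/hadoop
```

    After sbin/start-yarn.sh, `jps` should additionally list ResourceManager and NodeManager.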
  • Original article: https://www.cnblogs.com/xuelu/p/4085573.html