• Installing and configuring Hadoop on Ubuntu running inside a VMware virtual machine


    I. System environment:

    1. Ubuntu version: ubuntu-12.04-desktop-i386.iso
    2. JDK version: jdk1.7.0_67
    3. Hadoop version: hadoop-2.5.0

    II. Download the JDK and Hadoop and copy them onto the Ubuntu system

    For ways to exchange files between the Linux guest in VMware and the Windows host, see: http://blog.chinaunix.net/uid-27717694-id-3834143.html

    III. Set up the hadoop user:

    sudo addgroup hadoop    # create the hadoop group
    
    sudo adduser --ingroup hadoop hadoop    # create the hadoop user and place it in the hadoop group
    
    sudo gedit /etc/sudoers    # grant sudo privileges to the hadoop user
    
    
    Below the line that grants root its privileges, add:

    hadoop ALL=(ALL:ALL) ALL
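
    Before moving on, a quick sanity check (not part of the original steps) is to switch to the new account and confirm that the group membership and sudo rights took effect:

    su - hadoop    # switch to the new user
    id             # the output should list the hadoop group
    sudo -v        # should accept the hadoop user's password without an error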

    IV. Install SSH and configure passwordless login

    1. Install the SSH server: sudo apt-get install openssh-server
    2. Configure passwordless login:
    ssh-keygen -t rsa -P ""
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 644 ~/.ssh/authorized_keys
    sudo gedit /etc/ssh/sshd_config
    Uncomment the line AuthorizedKeysFile   %h/.ssh/authorized_keys

    3. ssh localhost should now log in without a password. If it still prompts for one, see the note below.
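
    A common pitfall not covered above: sshd silently ignores authorized_keys when the permissions on ~/.ssh are too open, and edits to sshd_config only take effect after the service restarts. A hedged fix, assuming Ubuntu's default service name ssh:

    chmod 700 ~/.ssh                    # sshd rejects keys if this directory is group/world writable
    chmod 600 ~/.ssh/authorized_keys    # the conventional permission for this file
    sudo service ssh restart            # reload sshd_config after editing it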

    V. Install the JDK

    1. Create a java directory under /usr/local: sudo mkdir /usr/local/java
    2. Copy the downloaded JDK archive into that directory: sudo cp ***.tar.gz /usr/local/java
    3. Change into the directory: cd /usr/local/java
    4. Extract the archive: sudo tar -xvf ***.tar.gz
    5. Remove the archive: sudo rm ***.tar.gz
    6. Set the JDK environment variables
    Here we use the global approach, i.e. editing /etc/profile, which holds environment variables shared by all users:
    sudo gedit /etc/profile
    
    Append the following at the end of the file:
    export JAVA_HOME=/usr/local/java/jdk1.7.0_67
    export JRE_HOME=/usr/local/java/jdk1.7.0_67/jre
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$PATH

    source /etc/profile    # make the changes take effect

     

    7. Verify the installation

    java -version
    
    On success, the output looks similar to:
    
    java version "1.7.0_67"
    Java(TM) SE Runtime Environment (build 1.7.0_67-b18)
    Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
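
    As an extra optional check that the variables from /etc/profile are the ones actually being picked up:

    echo $JAVA_HOME    # should print /usr/local/java/jdk1.7.0_67
    which java         # should point into $JAVA_HOME/bin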

    VI. Install Hadoop

    1. Copy the Hadoop tarball into /home/hadoop: sudo cp hadoop-2.5.0.tar.gz /home/hadoop
    2. Extract it: sudo tar -xvf hadoop-2.5.0.tar.gz
    3. Configure the Hadoop environment variables (a quick check follows the block below)
    Edit:
    sudo gedit /etc/profile
    
    Add:
    #HADOOP VARIABLES START
     export HADOOP_INSTALL=/home/hadoop/hadoop-2.5.0
     export PATH=$PATH:$HADOOP_INSTALL/bin
     export PATH=$PATH:$HADOOP_INSTALL/sbin
     export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
     export HADOOP_COMMON_HOME=$HADOOP_INSTALL
     export HADOOP_HDFS_HOME=$HADOOP_INSTALL
     export YARN_HOME=$HADOOP_INSTALL
     export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
     export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
     #HADOOP VARIABLES END
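
    After saving, the new variables can be loaded and checked as follows. This is a hedged sanity check assuming the paths above; the chown is only needed if the tarball was extracted with sudo, which leaves the files owned by root:

    source /etc/profile
    hadoop version                                          # should report Hadoop 2.5.0
    sudo chown -R hadoop:hadoop /home/hadoop/hadoop-2.5.0   # avoids permission errors when formatting/starting as the hadoop user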

     

    4. Configure core-site.xml, which holds the configuration Hadoop reads at startup

    sudo gedit /home/hadoop/hadoop-2.5.0/etc/hadoop/core-site.xml
    
    <configuration>
       <property>
             <name>fs.default.name</name>
             <value>hdfs://localhost:9000</value>
        </property>
     </configuration>
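
    Note: in Hadoop 2.x the property fs.default.name still works but is deprecated in favour of fs.defaultFS; either name can be used here with the same hdfs://localhost:9000 value.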


    5. Configure yarn-site.xml, which holds the startup configuration for YARN (the runtime that executes MapReduce jobs in Hadoop 2.x)

    sudo gedit /home/hadoop/hadoop-2.5.0/etc/hadoop/yarn-site.xml
    
    <configuration>
     <!-- Site specific YARN configuration properties -->
       <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
         </property>
    </configuration>

     

    6. Create and configure mapred-site.xml

    cd /home/hadoop/hadoop-2.5.0/etc/hadoop
    cp mapred-site.xml.template mapred-site.xml
    sudo gedit mapred-site.xml
    
    <configuration>
       <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
       </property>
    </configuration>

    7. Configure hdfs-site.xml (the name and data directories referenced here must exist on disk; see the note after this file)

    sudo gedit /home/hadoop/hadoop-2.5.0/etc/hadoop/hdfs-site.xml
    
    <configuration>
        <property>
             <name>dfs.replication</name>
             <value>1</value>
         </property>
         <property>
             <name>dfs.namenode.name.dir</name>
             <value>file:/home/hadoop/hadoop-2.5.0/hdfs/name</value>
         </property>
         <property>
             <name>dfs.datanode.data.dir</name>
             <value>file:/home/hadoop/hadoop-2.5.0/hdfs/data</value>
         </property>
    </configuration>
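
    The directories named in dfs.namenode.name.dir and dfs.datanode.data.dir are not created automatically in every setup, so it is safest to create them and give the hadoop user ownership before formatting. A short sketch using the paths assumed above:

    mkdir -p /home/hadoop/hadoop-2.5.0/hdfs/name
    mkdir -p /home/hadoop/hadoop-2.5.0/hdfs/data
    sudo chown -R hadoop:hadoop /home/hadoop/hadoop-2.5.0/hdfs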

      

      8. Format HDFS: hdfs namenode -format

      9. Start Hadoop

    start-dfs.sh: starts the NameNode, DataNode, and SecondaryNameNode
    
    start-yarn.sh: starts the NodeManager and ResourceManager
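
    To confirm the daemons are up, jps (shipped with the JDK) lists the running Java processes; for the single-node setup described here the expected set is roughly:

    jps
    # NameNode
    # DataNode
    # SecondaryNameNode
    # ResourceManager
    # NodeManager
    # Jps

    The NameNode web UI should also answer at http://localhost:50070 and the ResourceManager UI at http://localhost:8088 (the default ports in Hadoop 2.x).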

    VII. Install Eclipse

    1. Download the Eclipse (Luna) package for Linux from: http://www.eclipse.org/downloads/packages/eclipse-standard-44/lunar
    2. Copy the Eclipse archive into /usr/local: sudo cp eclipse-standard-luna-R-linux-gtk.tar.gz /usr/local
    3. Extract it there: cd /usr/local && sudo tar -xvf eclipse-standard-luna-R-linux-gtk.tar.gz
    4. Run Eclipse: cd /usr/local/eclipse && ./eclipse

    VIII. Configure the Hadoop plugin in Eclipse:

    1. Download the plugin: hadoop-eclipse-kepler-plugin-2.2.0.jar
    2. Copy the plugin into Eclipse's plugins directory: sudo cp hadoop-eclipse-kepler-plugin-2.2.0.jar /usr/local/eclipse/plugins
    3. Restart Eclipse and set the Hadoop installation directory: open Window > Preferences, select the Hadoop Map/Reduce entry, and point it at the Hadoop install path
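
    If Eclipse does not show the new preference page after a plain restart, its bundle cache may still hold the old plugin list; starting it once with the standard -clean launcher option usually clears this:

    cd /usr/local/eclipse
    ./eclipse -clean    # rebuild the plugin/bundle cache on startup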

    IX. Shut down the Hadoop environment:

    stop-dfs.sh: stops the HDFS daemons

    stop-yarn.sh: stops the YARN daemons
  • Original post: https://www.cnblogs.com/luonet/p/3970217.html