《OD大数据实战》(OD Big Data in Practice): Setting Up a Hadoop Pseudo-Distributed Environment


    I. Install and Configure Linux

    8. As the root user, create the working directories, give all folders and files under /opt/ 775 permissions, and change their owner and group to the current user:

    mkdir -p /opt/modules
    mkdir -p /opt/software
    mkdir -p /opt/datas
    mkdir -p /opt/tools
    chmod -R 775 /opt/*
    chown -R beifeng:beifeng /opt/*

    The final result looks like this:

    [beifeng@beifeng-hadoop-02 opt]$ pwd
    /opt
    [beifeng@beifeng-hadoop-02 opt]$ ll
    total 20
    drwxrwxr-x.  5 beifeng beifeng 4096 Jul 30 00:13 clusterapps
    drwxr-xr-x. 11 beifeng beifeng 4096 Jul 21 23:30 datas
    drwxr-xr-x.  6 beifeng beifeng 4096 Jul 31 22:03 modules
    drwxr-xr-x.  2 beifeng beifeng 4096 Jul 30 18:17 software
    drwxr-xr-x.  2 beifeng beifeng 4096 Jul 10 20:26 tools

    II. Install and Configure the JDK

    1. Installation file

    jdk-7u67-linux-x64.tar.gz

    2. Extract

    tar -zxvf jdk-7u67-linux-x64.tar.gz -C /opt/modules

    3. Configure the JDK

    1) Edit /etc/profile with sudo and append the following at the end of the file:

    #JAVA_HOME
    export JAVA_HOME=/opt/modules/jdk1.7.0_67
    export PATH=$PATH:$JAVA_HOME/bin

    2) After editing, switch to the root user with su - root and apply the configuration with the source command (other users' shells must also re-source /etc/profile, or log in again, to pick it up):

    source /etc/profile

    3) Verify that the JDK was installed successfully

    [root@beifeng-hadoop-02 ~]# java -version
    java version "1.7.0_67"
    Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
    Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
    [root@beifeng-hadoop-02 ~]# javac -version
    javac 1.7.0_67

    III. Install and Configure Hadoop

    1. Installation file

    Download page: http://archive.cloudera.com/cdh5/cdh/5/

    Download: hadoop-2.5.0-cdh5.3.6.tar.gz

    2. Extract

    tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/modules/cdh/
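
    Note that tar -C requires the target directory to exist; if /opt/modules/cdh/ has not been created yet, create it first:

    mkdir -p /opt/modules/cdh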

    3. Configure the pseudo-distributed environment

    Reference: http://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-common/ClusterSetup.html

    cd /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop

    Edit /etc/profile and append the following at the end of the file:

    #HADOOP_HOME
    export HADOOP_HOME=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
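
    After appending these lines, reload the profile and confirm that the hadoop command is on the PATH (a quick sanity check; the version output should report 2.5.0-cdh5.3.6):

    source /etc/profile
    hadoop version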

    It is recommended to edit the configuration files over SFTP with a remote editor: Notepad++ on Windows, or skEdit on macOS.

    1) Edit hadoop-env.sh

    export JAVA_HOME=/opt/modules/jdk1.7.0_67

    2) Edit yarn-env.sh

    export JAVA_HOME=/opt/modules/jdk1.7.0_67

    3) Edit mapred-env.sh

    export JAVA_HOME=/opt/modules/jdk1.7.0_67
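
    A quick way to confirm that all three files picked up the setting (the grep pattern is just a spot check):

    grep -n "JAVA_HOME=/opt/modules" hadoop-env.sh yarn-env.sh mapred-env.sh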

    4) Edit core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://beifeng-hadoop-02:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
        </property>
        <property>
            <name>hadoop.http.staticuser.user</name>
            <value>beifeng</value>
        </property>
    </configuration>
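
    The directory referenced by hadoop.tmp.dir is created on demand, but creating it up front as the beifeng user avoids permission surprises:

    mkdir -p /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/data/tmp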

    5) Edit hdfs-site.xml

    <configuration>
    
            <!-- Replication factor; it must not exceed the total number of datanodes, so 1 in pseudo-distributed mode -->
            <property>
                    <name>dfs.replication</name>
                    <value>1</value>
            </property>
    
            <property>
                    <name>dfs.namenode.secondary.http-address</name>
                    <value>beifeng-hadoop-02:50090</value>
            </property>
    
            <property>
                    <name>dfs.permissions.enabled</name>
                    <value>false</value>
            </property>
            
    </configuration>

    6) Edit slaves

    beifeng-hadoop-02

    7) Edit yarn-site.xml

    <configuration>
    
    <!-- Site specific YARN configuration properties -->
            <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
            </property>
    
            <property>
                    <name>yarn.resourcemanager.hostname</name>
                    <value>beifeng-hadoop-02</value>
            </property>
    
            <!-- Whether to enable log aggregation -->
            <property>
                    <name>yarn.log-aggregation-enable</name>
                    <value>true</value>
            </property>
    
            <!-- Log retention time, in seconds -->
            <property>
                    <name>yarn.log-aggregation.retain-seconds</name>
                    <value>106800</value>
            </property>
    </configuration>

    8) Edit mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
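
    In a fresh extraction this file may ship only as mapred-site.xml.template; if so, copy it before editing:

    cp mapred-site.xml.template mapred-site.xml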

    9) Start the services

    (1) Format HDFS

    bin/hdfs namenode -format
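
    If formatting succeeds, the namenode metadata appears under hadoop.tmp.dir (dfs/name is the default layout):

    ls /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/data/tmp/dfs/name/current/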

    (2) Start the namenode and datanode

    sbin/hadoop-daemon.sh start namenode
    sbin/hadoop-daemon.sh start datanode

    Use the jps command, or the web UI, to check whether the namenode and datanode have started successfully.

    [beifeng@beifeng-hadoop-02 hadoop-2.5.0-cdh5.3.6]$ jps
    82334 DataNode
    82383 Jps
    82248 NameNode

    HDFS web UI: http://beifeng-hadoop-02:50070/dfshealth.html#tab-overview
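
    If the page does not open from your workstation, make sure the hostname resolves there as well, e.g. with an /etc/hosts entry (the IP below is a placeholder; use the VM's actual address):

    echo "192.168.1.102 beifeng-hadoop-02" | sudo tee -a /etc/hosts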

    (3) Start the resourcemanager and nodemanager

    sbin/yarn-daemon.sh start resourcemanager
    sbin/yarn-daemon.sh start nodemanager

    Use the jps command, or the web UI, to check whether the resourcemanager and nodemanager have started successfully:

    [beifeng@beifeng-hadoop-02 hadoop-2.5.0-cdh5.3.6]$ jps
    82334 DataNode
    82757 NodeManager
    82874 Jps
    82248 NameNode
    82507 ResourceManager

    YARN web UI: http://beifeng-hadoop-02:8088/cluster

    (4) Start the job history server

    sbin/mr-jobhistory-daemon.sh start historyserver

    Check that it started successfully:
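
    jps should now show a JobHistoryServer process:

    jps | grep JobHistoryServer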

    Job history server web UI: http://beifeng-hadoop-02:19888/

    (5) Start the secondarynamenode

    sbin/hadoop-daemon.sh start secondarynamenode

    Check that it started successfully:
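
    jps should now show a SecondaryNameNode process:

    jps | grep SecondaryNameNode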

    Secondarynamenode web UI: http://beifeng-hadoop-02:50090/status.html

    (6) Commands to stop all related services

    sbin/hadoop-daemon.sh stop namenode
    sbin/hadoop-daemon.sh stop datanode
    sbin/yarn-daemon.sh stop resourcemanager
    sbin/yarn-daemon.sh stop nodemanager
    sbin/mr-jobhistory-daemon.sh stop historyserver
    sbin/hadoop-daemon.sh stop secondarynamenode
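
    Equivalently, the bundled scripts stop the HDFS and YARN daemons in one go (the history server still needs its own stop command):

    sbin/stop-dfs.sh
    sbin/stop-yarn.sh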

    10) Run a wordcount job to verify the setup

    FileSystem shell reference: http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.6/hadoop-project-dist/hadoop-common/FileSystemShell.html

    hdfs dfs -mkdir -p /user/beifeng/input
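
    The job needs input data; upload any local text file into the input directory first (/etc/profile is just a convenient example):

    hdfs dfs -put /etc/profile /user/beifeng/input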
    
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount /user/beifeng/input /user/beifeng/output 
    
    hdfs dfs -cat /user/beifeng/output/part-r-00000

    IV. Add the Snappy Compression Library to Hadoop 2.x

    1. Modify the configuration

    1) Edit core-site.xml

         <!-- SNAPPY compress -->
         <property> 
             <name>io.compression.codecs</name> 
             <value>org.apache.hadoop.io.compress.GzipCodec,
                     org.apache.hadoop.io.compress.DefaultCodec, 
                     org.apache.hadoop.io.compress.BZip2Codec, 
                     org.apache.hadoop.io.compress.SnappyCodec
            </value>
            <description>A comma-separated list of the compression codec classes that can
                be used for compression/decompression. In addition to any classes
                specified with this property (which take precedence), codec classes on the classpath are discovered
                using a Java ServiceLoader. 
            </description>
        </property>

    2)修改mapred-site.xml

        <!-- Enable compression of MapReduce map output -->
        <property>
            <name>mapreduce.map.output.compress</name>
            <value>true</value>
        </property>
        <property>
            <name>mapreduce.map.output.compress.codec</name>
            <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        </property>

    2. Install Snappy

    1) Extract

    tar -zxvf snappy-1.1.2.tar.gz -C /opt/modules/cdh/
    
    cd /opt/modules/cdh/snappy-1.1.2

    2) Run configure

    ./configure

    3) Compile and install

    sudo make && sudo make install

    4) After a successful build, check the installation directory

    cd /usr/local/lib && ls

    3. Install hadoop-snappy

    1) Extract

    tar -zxvf hadoop-snappy.tar.gz -C /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/

    2) Build the package

    cd /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/hadoop-snappy
    
    mvn package -Dsnappy.prefix=/usr/local

    A common error when building Hadoop components on Ubuntu is that libjvm.so cannot be found; if the build fails with that error, symlink it from the JDK:

    sudo ln -s /opt/modules/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so /usr/local/lib

    3) Copy the built jar into the Hadoop lib directory

    cp /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT.jar /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/lib

    4) Edit hadoop-env.sh

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/lib/native/Linux-amd64-64/

    5) Copy the compiled native libraries into $HADOOP_HOME/lib/native/

    cd /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib
    cp -r native/Linux-amd64-64 /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/lib/native/

    6) Copy the files under the Linux-amd64-64 directory up into /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/lib/native/ itself

    cd /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/lib/native/Linux-amd64-64/
    
    cp -r ./* ../
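
    With the native libraries in place, hadoop checknative should report snappy as true after restarting the daemons (if it still shows false, the source build in the next section replaces the full set of native libraries):

    hadoop checknative -a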

    4. Build hadoop-2.5.0-cdh5.3.6-src from source

    Note: use Maven's stock .m2/settings.xml configuration; with a customized settings.xml the POM may fail to load.
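
    The native build also assumes the usual toolchain (gcc, cmake, zlib headers) and protobuf 2.5.0; a quick pre-flight check:

    protoc --version   # should print libprotoc 2.5.0
    cmake --version
    gcc --version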

    mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy

    The build ran out of disk space partway through. Related references:

    http://os.51cto.com/art/201012/240726_all.htm

    http://www.cnblogs.com/chenmh/p/5096592.html

    http://www.linuxfly.org/post/243/

    1) Replace the native library files under lib/native in the Hadoop installation directory

    cd /opt/modules/hadoop-2.5.0-src/hadoop-dist/target/hadoop-2.5.0/lib/native

    cp ./* /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/lib/native/

    5. Verify

    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar pi 2 100
    
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount /user/beifeng/input /user/beifeng/output03 
    
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.6.jar wordcount -Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec /user/beifeng/input /user/beifeng/output02