• Hadoop Pseudo-Distributed Mode Configuration


    Please follow the previous article, Hadoop Standalone Mode Configuration, to install Java and Hadoop before configuring pseudo-distributed mode.

     Hadoop pseudo-distributed mode: a single machine on which each Hadoop daemon runs as a separate JVM process.

    Install the SSH server

    sudo apt-get install openssh-server

    (If the SSH server is not installed, you will see the following:

    manhua@manhua-Aspire-4741 ~/.ssh $ ssh localhost
    ssh: connect to host localhost port 22: Connection refused
    )

    If you interrupted the update midway because it was taking too long (e.g. with Ctrl+Z), the next update attempt may fail with: "Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?" In that case, remove the stale lock files:
    sudo rm /var/lib/dpkg/lock
    sudo rm /var/cache/apt/archives/lock

    Set up passwordless SSH login to the local machine

    ssh-keygen  [press Enter three times]
    ssh-copy-id user@localhost
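The two commands above can also be run non-interactively. The sketch below is one way to do it (it assumes OpenSSH's ssh-keygen is available and only generates a key if none exists yet):

```shell
# Create the .ssh directory with the permissions sshd expects
KEYDIR="$HOME/.ssh"
mkdir -p "$KEYDIR" && chmod 700 "$KEYDIR"

# Generate an RSA key pair with an empty passphrase (-N "") unless one already exists
[ -f "$KEYDIR/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$KEYDIR/id_rsa" -q

# For localhost, ssh-copy-id boils down to appending the public key to authorized_keys
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
echo "key installed"
```

The empty passphrase is what makes the login passwordless; the chmod calls matter because sshd refuses keys in world-readable locations.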

    Log in to localhost

    ssh localhost

    The first connection asks you to type yes to confirm.
    To leave the SSH session, type exit or press Ctrl+D.

    Configure Hadoop

    Enter the Hadoop configuration directory

    cd ~/hadoop/hadoop/etc/hadoop

    core-site.xml  (in this and the following files, the <property> blocks go inside the root <configuration> element)

    sudo gedit core-site.xml
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/kevin/hadoop</value>
        <description>temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
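For reference, a complete core-site.xml wraps the two properties above in a <configuration> root element. The sketch below writes such a file, to /tmp purely for illustration; the real file is the one under the configuration directory entered earlier:

```shell
# Write a complete core-site.xml; in a real setup, edit the copy under etc/hadoop instead
cat > /tmp/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/kevin/hadoop</value>
        <description>temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
grep -c '<property>' /tmp/core-site.xml   # prints 2
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the XML; the same pattern works for the other configuration files below.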

    hdfs-site.xml

    sudo gedit hdfs-site.xml
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    yarn-site.xml

    sudo gedit yarn-site.xml
    <property>
        <name>yarn.nodemanager.aux-services</name> 
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8031</value>
    </property>

    mapred-site.xml  (this file does not exist by default; create it from the template)

    sudo cp mapred-site.xml.template mapred-site.xml
    sudo gedit mapred-site.xml
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

     Create the slaves file (if it does not already exist) and put in:

    localhost
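The slaves file is just a list of worker hostnames, one per line; in pseudo-distributed mode the only worker is the local machine. A minimal sketch (writing to a /tmp directory for illustration; the real file belongs in the configuration directory above):

```shell
# Demo directory standing in for $HADOOP_HOME/etc/hadoop
CONF_DEMO=/tmp/hadoop-conf-demo
mkdir -p "$CONF_DEMO"

echo localhost > "$CONF_DEMO/slaves"   # one worker hostname per line
cat "$CONF_DEMO/slaves"                # prints: localhost
```

In a real multi-node cluster you would list one hostname per worker instead of localhost.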

    Testing

    Format the NameNode

    hdfs namenode -format

    If the last ten or so lines contain a line like common.Storage: Storage directory /opt/hadoop/hadoop_tmp/dfs/name has been successfully formatted. (the exact path follows your hadoop.tmp.dir setting), the format succeeded.
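To check for the success marker automatically, you can pipe the format output through grep. The sketch below runs against a canned sample line; on a real machine substitute the actual `hdfs namenode -format 2>&1` output:

```shell
# Sample of the line that a successful format prints (stand-in for the real command output)
log='INFO common.Storage: Storage directory /opt/hadoop/hadoop_tmp/dfs/name has been successfully formatted.'

if echo "$log" | grep -q 'successfully formatted'; then
    echo 'format OK'
else
    echo 'format FAILED'
fi
```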

    Start the daemons

    start-all.sh

    List all Java processes

    jps

    You should now see the following (jps also prints a PID before each name):
    Jps
    ResourceManager
    NodeManager
    DataNode
    NameNode
    SecondaryNameNode
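A quick way to verify the startup is to compare the jps output against the list of expected daemons. The sketch below uses a canned sample of the output; on a real machine replace the sample with `jps_out="$(jps)"`:

```shell
# Sample jps output (on a real machine: jps_out="$(jps)")
jps_out='2101 Jps
1804 ResourceManager
1930 NodeManager
1501 DataNode
1402 NameNode
1690 SecondaryNameNode'

missing=0
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$jps_out" | grep -q "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo 'all daemons up'
```

If any daemon is missing, its log file under the Hadoop logs directory is the first place to look.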

    You should also be able to reach the following web pages:

    http://localhost:8088/cluster
    http://localhost:50070/dfshealth.jsp

    Create a test folder under your home directory, create a few text files in it, and type some words into them.

    hdfs dfs -copyFromLocal ~/setupEnv/Test1-WordCount/in /in
    hdfs dfs -ls /in
    hadoop jar ~/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /in /out

    To specify the block size when uploading: hadoop fs -D fs.local.block.size=134217728 -put local_name remote_location
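The magic number 134217728 is simply 128 MiB expressed in bytes (note that on Hadoop 2.x the HDFS block size property is more commonly written as dfs.blocksize, so `-D dfs.blocksize=134217728` is the usual spelling):

```shell
# 134217728 bytes = 128 * 1024 * 1024 = 128 MiB
echo $((128 * 1024 * 1024))    # prints 134217728
```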

    View the results (either of the following works)

    hadoop fs -cat /out/part-r-00000
    hdfs dfs -cat /out/part-r-00000

     Stop the daemons

    stop-all.sh

    =====================

    Extra: Test 2 (compile & jar)

    cd ~/setupEnv/Test2-Dedup
    javac -cp /opt/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar:/opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.2.0.jar:/opt/hadoop-2.2.0/share/hadoop/common/lib/* Dedup.java 
    jar -cvf Dedup.jar ./*.class

     Delete the output folder before running again

    hadoop fs -rm -r /out
    hdfs dfs -copyFromLocal ~/setupEnv/Test2-Dedup/in2 /in
    hadoop fs -ls /in
    hadoop jar ~/setupEnv/Test2-Dedup/Dedup.jar Dedup /in /out

    Troubleshooting

    If any startup exception occurs, stop the daemons and then clear the data directory (the one configured as hadoop.tmp.dir):

    rm -rf ~/hadoop/tmp

    Then run the format again:

    hdfs namenode -format  
  • Original post: https://www.cnblogs.com/manhua/p/3530065.html