• Steps to Set Up Hadoop on Linux


    Prerequisites: a virtual machine with Linux installed, with networking between the host and the VM working in both directions.

    The JDK must also already be installed on the Linux system.
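
    A quick sanity check that the JDK is ready (assuming java is already on the PATH):

    java -version    # should print the installed JDK version, e.g. 1.7.x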


    1: On Linux, run vi /etc/profile and add HADOOP_HOME (together with JAVA_HOME and PATH):

    export  JAVA_HOME=/home/hadoop/export/jdk
    export  HADOOP_HOME=/home/hadoop/export/hadoop
    export  PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
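
    After saving, reload the profile and confirm the variables took effect (a quick check, assuming the paths above match your actual install locations):

    source /etc/profile
    echo $JAVA_HOME $HADOOP_HOME    # should print the two paths set above
    hadoop version                  # should report Hadoop 1.2.1 if PATH is correct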

    2: In the hadoop/conf directory, edit line 9 of hadoop-env.sh to set JAVA_HOME:

    export JAVA_HOME=/home/hadoop/export/jdk

    3: In the hadoop/conf directory, edit core-site.xml:

    <configuration>
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/home/.../tmp</value>
            </property>
            <property>
                    <name>fs.default.name</name>
                    <value>hdfs://127.0.0.1:9000</value>
            </property>
    </configuration>

    4: In the hadoop/conf directory, edit hdfs-site.xml:

    <configuration>
            <property>
                <name>dfs.replication</name>
                <value>1</value>
            </property>
    </configuration>

    5: In the hadoop/conf directory, edit mapred-site.xml:

    <configuration>
            <property>
                    <name>mapred.job.tracker</name>
                    <value>127.0.0.1:9001</value>
            </property>
    </configuration>

    That completes the configuration.
    Change to the hadoop/bin directory and run hadoop namenode -format.
    Output like the following indicates success:
    Warning: $HADOOP_HOME is deprecated.
    
    14/07/15 16:06:27 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = ubuntu/127.0.1.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.2.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
    STARTUP_MSG:   java = 1.7.0_55
    ************************************************************/
    
    14/07/15 16:07:09 INFO util.GSet: Computing capacity for map BlocksMap
    14/07/15 16:07:09 INFO util.GSet: VM type       = 32-bit
    14/07/15 16:07:09 INFO util.GSet: 2.0% max memory = 1013645312
    14/07/15 16:07:09 INFO util.GSet: capacity      = 2^22 = 4194304 entries
    14/07/15 16:07:09 INFO util.GSet: recommended=4194304, actual=4194304
    14/07/15 16:07:10 INFO namenode.FSNamesystem: fsOwner=hadoop
    14/07/15 16:07:10 INFO namenode.FSNamesystem: supergroup=supergroup
    14/07/15 16:07:10 INFO namenode.FSNamesystem: isPermissionEnabled=true
    14/07/15 16:07:10 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    14/07/15 16:07:10 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    14/07/15 16:07:10 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
    14/07/15 16:07:10 INFO namenode.NameNode: Caching file names occuring more than 10 times 
    14/07/15 16:07:10 INFO common.Storage: Image file /home/hadoop/tmp/dfs/name/current/fsimage of size 118 bytes saved in 0 seconds.
    14/07/15 16:07:10 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/hadoop/tmp/dfs/name/current/edits
    14/07/15 16:07:10 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/hadoop/tmp/dfs/name/current/edits
    14/07/15 16:07:10 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
    14/07/15 16:07:10 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
    ************************************************************/

    Some people will hit a failure at this step. If you do, be sure to check the logs directory under the hadoop installation; the exceptions recorded there are very specific.


    If the first format attempt fails, remember to delete everything under the tmp directory before retrying, because leftover data from the failed attempt can be incompatible with the new format.
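
    A minimal sketch of a clean retry, assuming hadoop.tmp.dir points to /home/hadoop/tmp as configured above (this wipes everything under it):

    rm -rf /home/hadoop/tmp/*     # clear leftovers from the failed attempt
    hadoop namenode -format       # run the format again from hadoop/bin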


    Then run start-all.sh:

    Warning: $HADOOP_HOME is deprecated.
    
    
    starting namenode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-namenode-ubuntu.out
    localhost: starting datanode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-datanode-ubuntu.out
    localhost: starting secondarynamenode, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-ubuntu.out
    starting jobtracker, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-ubuntu.out
    localhost: starting tasktracker, logging to /home/hadoop/export/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-ubuntu.out

    During the step above you may be prompted repeatedly for a password; you can set up passwordless SSH login to avoid this (covered in another post on my blog).
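
    A minimal sketch of passwordless SSH for this single-node setup, assuming OpenSSH is installed and you are logged in as the hadoop user:

    mkdir -p ~/.ssh && chmod 700 ~/.ssh
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa         # key pair with an empty passphrase
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # authorize the key for localhost
    chmod 600 ~/.ssh/authorized_keys
    ssh localhost                                    # should now log in without a password
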
    Run jps; it shows something like the following (note the DataNode is missing, because I deliberately introduced an error here):
    10666 NameNode
    11547 Jps
    11445 TaskTracker
    11130 SecondaryNameNode
    11218 JobTracker

    Check the logs:

    2014-07-15 16:13:43,032 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
    2014-07-15 16:13:43,094 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
    2014-07-15 16:13:43,098 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
    2014-07-15 16:13:43,118 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
    2014-07-15 16:13:43,999 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
    2014-07-15 16:13:44,044 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
    2014-07-15 16:13:45,484 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /home/hadoop/tmp/dfs/data: namenode namespaceID = 224603228; datanode namespaceID = 566757162
    	at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
    	at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
    	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
    	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
    	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
    	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
    	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
    	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
    	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)

    In this case, just delete the files under the tmp directory and the problem is solved.
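
    A narrower way to resolve the namespaceID mismatch, without reformatting the NameNode, is to remove only the DataNode's storage directory. A minimal sketch, assuming hadoop.tmp.dir is /home/hadoop/tmp as in the log above (this discards any HDFS block data already stored):

    stop-all.sh                          # stop all daemons first
    rm -rf /home/hadoop/tmp/dfs/data     # drop the DataNode storage holding the stale namespaceID
    start-all.sh                         # on restart the DataNode adopts the current namespaceID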

    Now you can run an example job. The steps are as follows:

    hadoop@ubuntu:~/export/hadoop$ ls
    bin          hadoop-ant-1.2.1.jar          ivy          README.txt
    build.xml    hadoop-client-1.2.1.jar       ivy.xml      sbin
    c++          hadoop-core-1.2.1.jar         lib          share
    CHANGES.txt  hadoop-examples-1.2.1.jar     libexec      src
    conf         hadoop-minicluster-1.2.1.jar  LICENSE.txt  webapps
    contrib      hadoop-test-1.2.1.jar         logs
    docs         hadoop-tools-1.2.1.jar        NOTICE.txt

    Upload a file to HDFS:
     hadoop@ubuntu:~/export/hadoop$ hadoop fs -put README.txt  /
    Warning: $HADOOP_HOME is deprecated.

    No errors above means the upload succeeded.
    Now run the wordcount example program against the README.txt file:

    hadoop@ubuntu:~/export/hadoop$ hadoop jar hadoop-examples-1.2.1.jar wordcount /README.txt /wordcountoutput
    Warning: $HADOOP_HOME is deprecated.
    
    14/07/15 15:23:01 INFO input.FileInputFormat: Total input paths to process : 1
    14/07/15 15:23:01 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/07/15 15:23:01 WARN snappy.LoadSnappy: Snappy native library not loaded
    14/07/15 15:23:02 INFO mapred.JobClient: Running job: job_201407141636_0001
    14/07/15 15:23:03 INFO mapred.JobClient:  map 0% reduce 0%
    14/07/15 15:23:15 INFO mapred.JobClient:  map 100% reduce 0%
    14/07/15 15:23:30 INFO mapred.JobClient:  map 100% reduce 100%
    14/07/15 15:23:32 INFO mapred.JobClient: Job complete: job_201407141636_0001
    14/07/15 15:23:32 INFO mapred.JobClient: Counters: 29
    14/07/15 15:23:32 INFO mapred.JobClient:   Job Counters 
    14/07/15 15:23:32 INFO mapred.JobClient:     Launched reduce tasks=1
    14/07/15 15:23:32 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12563
    14/07/15 15:23:32 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/07/15 15:23:32 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/07/15 15:23:32 INFO mapred.JobClient:     Launched map tasks=1
    14/07/15 15:23:32 INFO mapred.JobClient:     Data-local map tasks=1
    14/07/15 15:23:32 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14550
    14/07/15 15:23:32 INFO mapred.JobClient:   File Output Format Counters 
    14/07/15 15:23:32 INFO mapred.JobClient:     Bytes Written=1306
    14/07/15 15:23:32 INFO mapred.JobClient:   FileSystemCounters
    14/07/15 15:23:32 INFO mapred.JobClient:     FILE_BYTES_READ=1836
    14/07/15 15:23:32 INFO mapred.JobClient:     HDFS_BYTES_READ=1463
    14/07/15 15:23:32 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=120839
    14/07/15 15:23:32 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1306
    14/07/15 15:23:32 INFO mapred.JobClient:   File Input Format Counters 
    14/07/15 15:23:32 INFO mapred.JobClient:     Bytes Read=1366
    14/07/15 15:23:32 INFO mapred.JobClient:   Map-Reduce Framework
    14/07/15 15:23:32 INFO mapred.JobClient:     Map output materialized bytes=1836
    14/07/15 15:23:32 INFO mapred.JobClient:     Map input records=31
    14/07/15 15:23:32 INFO mapred.JobClient:     Reduce shuffle bytes=1836
    14/07/15 15:23:32 INFO mapred.JobClient:     Spilled Records=262
    14/07/15 15:23:32 INFO mapred.JobClient:     Map output bytes=2055
    14/07/15 15:23:32 INFO mapred.JobClient:     Total committed heap usage (bytes)=212611072
    14/07/15 15:23:32 INFO mapred.JobClient:     CPU time spent (ms)=2430
    14/07/15 15:23:32 INFO mapred.JobClient:     Combine input records=179
    14/07/15 15:23:32 INFO mapred.JobClient:     SPLIT_RAW_BYTES=97
    14/07/15 15:23:32 INFO mapred.JobClient:     Reduce input records=131
    14/07/15 15:23:32 INFO mapred.JobClient:     Reduce input groups=131
    14/07/15 15:23:32 INFO mapred.JobClient:     Combine output records=131
    14/07/15 15:23:32 INFO mapred.JobClient:     Physical memory (bytes) snapshot=177545216
    14/07/15 15:23:32 INFO mapred.JobClient:     Reduce output records=131
    14/07/15 15:23:32 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=695681024
    14/07/15 15:23:32 INFO mapred.JobClient:     Map output records=179
    

    hadoop@ubuntu:~/export/hadoop$ hadoop fs -ls /
    Warning: $HADOOP_HOME is deprecated.
    
    Found 3 items
    -rw-r--r--   1 hadoop supergroup       1366 2014-07-15 15:21 /README.txt
    drwxr-xr-x   - hadoop supergroup          0 2014-07-14 16:36 /home
    drwxr-xr-x   - hadoop supergroup          0 2014-07-15 15:23 /wordcountoutput
    hadoop@ubuntu:~/export/hadoop$ hadoop fs -get  /wordcountoutput  /home/hadoop/
    Warning: $HADOOP_HOME is deprecated.

    You can download the output and inspect the result file.
    It looks like the following:

    (see	1
    5D002.C.1,	1
    740.13)	1
    <http://www.wassenaar.org/>	1
    Administration	1
    Apache	1
    BEFORE	1
    BIS	1
    Bureau	1
    Commerce,	1
    Commodity	1
    Control	1
    Core	1
    Department	1
    ENC	1
    Exception	1
    Export	2
    For	1
    Foundation	1
    Government	1
    Hadoop	1
    Hadoop,	1
    Industry	1
    Jetty	1
    License	1
    Number	1
    Regulations,	1
    SSL	1
    Section	1
    Security	1
    See	1
    Software	2
    Technology	1
    The	4
    This	1
    U.S.	1
    Unrestricted	1
    about	1
    algorithms.	1
    and	6
    and/or	1
    another	1
    any	1
    as	1
    asymmetric	1
    at:	2
    both	1
    by	1
    check	1
    classified	1
    code	1
    code.	1
    concerning	1
    country	1
    country's	1
    country,	1
    cryptographic	3
    currently	1
    details	1
    distribution	2
    eligible	1
    encryption	3
    exception	1
    export	1
    following	1
    for	3
    form	1
    from	1
    functions	1
    has	1
    have	1

