• Installing Hadoop 3.1.0 on Ubuntu 16.04


    The installation mainly follows the official documentation:

    http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html

    Goal:

    Set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

    Software prerequisites:

    Ubuntu 16.04

    An Ubuntu 16.04 virtual machine built with VMware is used here.

    Hadoop 3.1.0

    Download the Hadoop release from one of the Apache Download Mirrors.

    The version downloaded here is hadoop-3.1.0.tar.gz.

    Unpack the downloaded archive:

    $ tar -zxvf hadoop-3.1.0.tar.gz

    OpenJDK 8

    Note that Hadoop 3.x itself requires Java 8, which is why OpenJDK 8 is installed below; the compatibility notes quoted here are from the older Hadoop wiki and refer to Hadoop 2.x and earlier:

    Version 2.7 and later of Apache Hadoop requires Java 7. It is built and tested on both OpenJDK and Oracle (HotSpot)'s JDK/JRE.

    Earlier versions (2.6 and earlier) support Java 6.

    Here are the known JDKs in use or which have been tested:

    Version                   Status       Reported By
    oracle 1.7.0_15           Good         Cloudera
    oracle 1.7.0_21           Good (4)     Hortonworks
    oracle 1.7.0_45           Good         Pivotal
    openjdk 1.7.0_09-icedtea  Good (5)     Hortonworks
    oracle 1.6.0_16           Avoid (1)    Cloudera
    oracle 1.6.0_18           Avoid        Many
    oracle 1.6.0_19           Avoid        Many
    oracle 1.6.0_20           Good (2)     LinkedIn, Cloudera
    oracle 1.6.0_21           Good (2)     Yahoo!, Cloudera
    oracle 1.6.0_24           Good         Cloudera
    oracle 1.6.0_26           Good (2)     Hortonworks, Cloudera
    oracle 1.6.0_28           Good         LinkedIn
    oracle 1.6.0_31           Good (3, 4)  Cloudera, Hortonworks

    $ sudo apt upgrade
    $ sudo apt install openjdk-8-jre openjdk-8-jdk

    Either Oracle's JDK or OpenJDK will work; OpenJDK 8 is installed here.

    Check the installed version with java -version.

    After installing Java, edit /etc/profile to configure the Java and Hadoop environment variables:

    $ sudo vi /etc/profile
    # set openjdk / hadoop environment
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
    export HADOOP_HOME=/home/wu/hadoop-3.1.0
    $ source /etc/profile
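Many setups also append the Hadoop bin and sbin directories to PATH so the commands below can be run without their bin/ and sbin/ prefixes. This is an optional addition, not part of the original profile edit, assuming the same install paths:

```shell
# Optional PATH addition for /etc/profile, assuming the JAVA_HOME and
# HADOOP_HOME values configured above
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_HOME=/home/wu/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

With this in place, `hadoop version` and `start-dfs.sh` work from any directory.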

    Check the values with:

    $ echo $JAVA_HOME
    /usr/lib/jvm/java-1.8.0-openjdk-amd64
    
    $ echo $HADOOP_HOME
    /home/wu/hadoop-3.1.0

    ssh

    $ sudo apt install ssh

    Installation steps:

    Prepare to Start the Hadoop Cluster

    In the Hadoop installation directory, edit the etc/hadoop/hadoop-env.sh file and set:

    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

    You can test the hadoop command:

    $ bin/hadoop

    or:

    $ bin/hadoop version
    Hadoop 3.1.0
    Source code repository https://github.com/apache/hadoop -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d
    Compiled by centos on 2018-03-30T00:00Z
    Compiled with protoc 2.5.0
    From source with checksum 14182d20c972b3e2105580a1ad6990
    This command was run using /home/wu/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar

     Starting Hadoop

     A Hadoop cluster can be started in one of three modes:

    1. Local (Standalone) Mode
    2. Pseudo-Distributed Mode
    3. Fully-Distributed Mode

    1. Standalone Operation:

    By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

    The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

    The following example gives hadoop a quick first run:

      $ mkdir input
      $ cp etc/hadoop/*.xml input
      $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z.]+'
      $ cat output/*
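To see what the regular expression in that job matches, you can try it locally with ordinary grep. This is a throwaway illustration on a hypothetical sample file, not a Hadoop command:

```shell
# Write a small sample file, then extract every dfs[a-z.]+ match with grep;
# this mirrors what the example job searches for in the copied *.xml configs
printf '<name>dfs.replication</name>\n<name>fs.defaultFS</name>\n' > /tmp/grep-demo.xml
grep -oE 'dfs[a-z.]+' /tmp/grep-demo.xml
# → dfs.replication
```

Only `dfs.replication` matches; `fs.defaultFS` contains no `dfs` followed by lowercase letters or dots.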

    2. Pseudo-Distributed Mode:

    Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

    Edit the configuration files:

      $ vi etc/hadoop/core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

        $ vi etc/hadoop/hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
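By default HDFS stores its data under /tmp/hadoop-${user.name}, which many systems clear on reboot. An optional extra property, not part of the original configuration and with an example path, can pin the data to a stable location; add it inside the existing <configuration> element of etc/hadoop/core-site.xml:

```xml
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/wu/hadoop-3.1.0/tmp</value>
    </property>
```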

    Setup passphraseless ssh:

    Now check that you can ssh to the localhost without a passphrase:

      $ ssh localhost

    If you cannot ssh to localhost without a passphrase, execute the following commands:

      $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
      $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      $ chmod 0600 ~/.ssh/authorized_keys

     Execution:

    The following instructions are to run a MapReduce job locally.

    Format the filesystem:

      $ bin/hdfs namenode -format

    Start the NameNode and DataNode daemons:

      $ sbin/start-dfs.sh

      Starting namenodes on [localhost]
      Starting datanodes
      Starting secondary namenodes [ubuntu]

    The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

    The NameNode web interface is now available at http://localhost:9870/ (Hadoop 3 moved it from port 50070 to 9870). It shows the NameNode's health, and one live DataNode should be listed.

     Run a MapReduce job

    # Create the user's home directory in HDFS (use your login name, e.g. wu)
    $ bin/hdfs dfs -mkdir /user
    $ bin/hdfs dfs -mkdir /user/<username>

    # Copy the input files into the distributed filesystem
    $ bin/hdfs dfs -mkdir -p input
    $ bin/hdfs dfs -put etc/hadoop/*.xml input

    # Run one of the MapReduce examples provided with Hadoop
    $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z.]+'
      
    # Copy the output files from the distributed filesystem to the local filesystem
    $ bin/hdfs dfs -get output output
    $ cat output/*

      
    # Or view the output files directly on the distributed filesystem
    # $ bin/hdfs dfs -cat output/*
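Conceptually, the grep example is map-then-reduce: the map phase emits every regex match, and the reduce phase counts and ranks the occurrences. The same shape can be mimicked locally with a shell pipeline; this is an illustration on hypothetical input only, while the real job runs distributed over the files in HDFS:

```shell
# map: emit every dfs[a-z.]+ match; reduce: count per distinct match;
# finally sort by count descending, as the example job's output is sorted
printf '%s\n' '<name>dfs.replication</name>' 'dfs.replication dfs.hosts' \
  | grep -oE 'dfs[a-z.]+' \
  | sort | uniq -c | sort -rn
```

Here `dfs.replication` appears twice and `dfs.hosts` once, so the pipeline prints `dfs.replication` with count 2 first.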

     Finally, stop the daemons:

     $ sbin/stop-dfs.sh
  • Original article: https://www.cnblogs.com/sylar5/p/9156660.html