• Installing a Hadoop Cluster with Docker


    Resources: JDK 1.8 and Hadoop 2.7.3

    Link: https://pan.baidu.com/s/1x8t1t2iY46jKkvNUBHZlGQ
    Extraction code: g1gm

    1. Pull the Ubuntu image

    docker pull ubuntu:14.04

    2. Run the Ubuntu container

    # Start an Ubuntu container (pin the tag; otherwise Docker pulls ubuntu:latest instead of the 14.04 image we just downloaded)
    docker run -ti ubuntu:14.04
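
    If you intend to commit this container as a base image later (see step 6), it helps to give it a recognizable name; hadoop-base below is an illustrative choice, not from the original post:

    # Optional: run the container with an explicit name for easy reference later
    docker run -ti --name hadoop-base ubuntu:14.04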

    3. Install the base environment in Ubuntu

      Install ssh, vim, and other common tools.
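
      A minimal sketch of this step; openssh-server is needed later so the cluster start scripts can reach each node over SSH:

    # Refresh the package index and install SSH plus an editor
    apt-get update
    apt-get install -y openssh-server openssh-client vim
    # sshd expects this directory to exist inside the container
    mkdir -p /var/run/sshd
    # Start the SSH service
    service ssh start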

    4. Install the JDK in Ubuntu

      Install a JDK 1.8 environment on the Linux system.
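
      A sketch of one way to do this with the JDK 1.8 tarball from the resources above, unpacked to match the JAVA_HOME used in the configs below; the tarball and extracted directory names are assumptions:

    # On the host: copy the tarball into the container (container name is an assumption)
    # docker cp jdk-8u161-linux-x64.tar.gz hadoop-base:/root/

    # Inside the container: unpack the JDK to /soft, matching JAVA_HOME below
    mkdir -p /soft
    tar xvzf jdk-8u161-linux-x64.tar.gz -C /soft
    mv /soft/jdk1.8.0_161 /soft/jdk-8u161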

    5. Configure Hadoop in Ubuntu

    # Unpack the Hadoop tarball
    tar xvzf hadoop-2.7.3.tar.gz
    
    Edit the ~/.bashrc file and append the following configuration at the end:
    
    export JAVA_HOME=/soft/jdk-8u161
    export HADOOP_HOME=/root/soft/apache/hadoop/hadoop-2.7.3
    export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
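
    After saving, reload the file so the new variables take effect, and sanity-check the install:

    # Apply the environment variables in the current shell
    source ~/.bashrc
    # Verify that the hadoop command is on the PATH
    hadoop version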

    Three directories are created here (see the commands sketched after this list); they will be used in later configuration:

    1. tmp: Hadoop's temporary directory
    2. namenode: the NameNode storage directory
    3. datanode: the DataNode storage directory
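
    A minimal sketch of creating them, assuming they live under the Hadoop installation directory as the configuration files below indicate:

    # Create the three working directories under the Hadoop install root
    cd $HADOOP_HOME
    mkdir tmp namenode datanode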

      1). core-site.xml configuration

      Edit the core-site.xml file under $HADOOP_CONFIG_HOME, e.g. with nano core-site.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/root/soft/apache/hadoop/hadoop-2.7.3/tmp</value>
                <description>A base for other temporary directories.</description>
        </property>
    
        <property>
                <name>fs.default.name</name>
                <value>hdfs://master:9000</value>
                <final>true</final>
                <description>The name of the default file system.  A URI whose
                scheme and authority determine the FileSystem implementation.  The
                uri's scheme determines the config property (fs.SCHEME.impl) naming
                the FileSystem implementation class.  The uri's authority is used to
                determine the host, port, etc. for a filesystem.</description>
        </property>
    </configuration>

    Note:

    • The value of hadoop.tmp.dir is the path of the temporary directory created earlier.
    • fs.default.name is set to hdfs://master:9000, pointing to the Master node's host (we will configure that node when we build the cluster later; for now it is just written here).

      2). hdfs-site.xml configuration

      Edit the hdfs-site.xml file with the command nano hdfs-site.xml:

      

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
            <final>true</final>
            <description>Default block replication.
            The actual number of replications can be specified when the file is created.
            The default is used if replication is not specified in create time.
            </description>
        </property>
    
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/root/soft/apache/hadoop/hadoop-2.7.3/namenode</value>
            <final>true</final>
        </property>
    
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/root/soft/apache/hadoop/hadoop-2.7.3/datanode</value>
            <final>true</final>
        </property>
    </configuration>

    Note:

    • When we build the cluster later there will be one Master node and two Slave nodes, so dfs.replication is set to 2.
    • dfs.namenode.name.dir and dfs.datanode.data.dir are set to the NameNode and DataNode directory paths created earlier.

      3). mapred-site.xml configuration

      The Hadoop distribution provides a mapred-site.xml.template, so we previously created a mapred-site.xml file with the command cp mapred-site.xml.template mapred-site.xml.

      Now edit the file with the command nano mapred-site.xml:

      

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>master:9001</value>
            <description>The host and port that the MapReduce job tracker runs
            at.  If "local", then jobs are run in-process as a single map
            and reduce task.
            </description>
        </property>
    </configuration>

    There is only one property here, mapred.job.tracker, which points at the master node. Note that this is a legacy MRv1 property; on Hadoop 2.x, MapReduce jobs normally run on YARN, selected via mapreduce.framework.name.

      4) Set the JAVA_HOME environment variable

      Add the JDK configuration to hadoop-env.sh, mapred-env.sh, and yarn-env.sh by editing the following line in each file:

    # The java implementation to use.
    export JAVA_HOME=/soft/jdk-8u161

      5) Format the NameNode

      This is an important step: run the command hadoop namenode -format.
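
      A quick sketch of this step; on Hadoop 2.x, hdfs namenode -format is the non-deprecated equivalent of the command above:

    # Format the HDFS metadata before the first start (wipes any existing NameNode data)
    hdfs namenode -format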

    6. Start the Hadoop cluster
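
    A rough sketch of how this step typically proceeds with the setup above; the container and image names (hadoop-base, ubuntu:hadoop, master, slave1, slave2) are illustrative assumptions:

    # On the host: save the configured container as a reusable image
    docker commit hadoop-base ubuntu:hadoop
    
    # Start one master and two slave containers from that image
    docker run -ti -h master --name master ubuntu:hadoop
    docker run -ti -h slave1 --name slave1 ubuntu:hadoop
    docker run -ti -h slave2 --name slave2 ubuntu:hadoop
    
    # Inside the master container: start HDFS and YARN
    $HADOOP_HOME/sbin/start-dfs.sh
    $HADOOP_HOME/sbin/start-yarn.sh

    For the start scripts to work, each container's /etc/hosts must map the master and slave hostnames to their container IPs, the slave hostnames must be listed in $HADOOP_CONFIG_HOME/slaves, and passwordless SSH from master to the slaves must be configured.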
