• How To Set Up Apache Hadoop On CentOS


    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

    The project includes these modules:

    • Hadoop Common: The common utilities that support the other Hadoop modules.
    • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
    • Hadoop YARN: A framework for job scheduling and cluster resource management.
    • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

    This article walks you step by step through installing and configuring a single-node Hadoop cluster on CentOS.

    Install Java

    Before installing Hadoop, make sure Java is installed on your system. Use this command to check the installed Java version:

    java -version
    java version "1.7.0_75"
    Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
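If you want to script this check, the sketch below extracts the major version from that output. It is a minimal example that assumes the first line carries the version in double quotes, as in the sample above; `java_major_version` is a hypothetical helper name, not a standard command. Note that `java -version` writes to stderr, hence the `2>&1` in the usage line.

```shell
# Hypothetical helper: print the Java major version from `java -version` output on stdin.
java_major_version() {
    # First line looks like: java version "1.7.0_75"  (or: openjdk version "11.0.2")
    ver=$(head -n 1 | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) echo "$ver" | cut -d. -f2 ;;   # old 1.x scheme: 1.7.0_75 -> 7
        *)   echo "$ver" | cut -d. -f1 ;;   # Java 9+ scheme: 11.0.2 -> 11
    esac
}

# Usage: java -version prints to stderr, so redirect it into the pipe.
#   java -version 2>&1 | java_major_version
```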

    To install or update Java, follow the step-by-step instructions below.

    The first step is to download the JDK from the official Oracle website (this guide uses JDK 7u79).

    cd /opt/
    wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz"
    tar xzf jdk-7u79-linux-x64.tar.gz

    Next, configure the system to use the new Java version with alternatives, using the following commands:

    cd /opt/jdk1.7.0_79/
    alternatives --install /usr/bin/java java /opt/jdk1.7.0_79/bin/java 2
    alternatives --config java
    
    There are 3 programs which provide 'java'.

      Selection    Command
    -----------------------------------------------
    *  1           /opt/jdk1.7.0_60/bin/java
     + 2           /opt/jdk1.7.0_72/bin/java
       3           /opt/jdk1.7.0_79/bin/java

    Enter to keep the current selection[+], or type selection number: 3 [Press Enter]

    You may also need to set up the javac and jar command paths with the alternatives command:

    alternatives --install /usr/bin/jar jar /opt/jdk1.7.0_79/bin/jar 2
    alternatives --install /usr/bin/javac javac /opt/jdk1.7.0_79/bin/javac 2
    alternatives --set jar /opt/jdk1.7.0_79/bin/jar
    alternatives --set javac /opt/jdk1.7.0_79/bin/javac
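The three `--install`/`--set` pairs differ only in the tool name, so they can be generated in a loop. This is a sketch assuming the JDK path used above; by default it only prints the commands (a dry run), and setting `RUN=1` executes them, which requires root. `register_jdk_alternatives` and `maybe_run` are hypothetical helper names.

```shell
# Print a command, or run it when RUN=1 (hypothetical dry-run switch).
maybe_run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Register java, javac and jar from one JDK and select them non-interactively.
register_jdk_alternatives() {
    jdk="$1"            # e.g. /opt/jdk1.7.0_79
    prio="${2:-2}"      # alternatives priority, 2 as in the commands above
    for tool in java javac jar; do
        maybe_run alternatives --install "/usr/bin/$tool" "$tool" "$jdk/bin/$tool" "$prio"
        maybe_run alternatives --set "$tool" "$jdk/bin/$tool"
    done
}
```

Using `--set` for `java` as well selects the new JDK without the interactive `--config` menu. Example: `register_jdk_alternatives /opt/jdk1.7.0_79` to review the commands, then `RUN=1 register_jdk_alternatives /opt/jdk1.7.0_79` as root.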

    The next step is to configure the environment variables. Use the following commands to set them (these exports apply only to the current shell; add them to a startup file such as ~/.bashrc or /etc/profile to make them permanent):

    • Set up the JAVA_HOME variable
    export JAVA_HOME=/opt/jdk1.7.0_79
    
    • Set up the JRE_HOME variable
    export JRE_HOME=/opt/jdk1.7.0_79/jre
    
    • Set up the PATH variable
    export PATH=$PATH:/opt/jdk1.7.0_79/bin:/opt/jdk1.7.0_79/jre/bin
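To keep these variables across logins, one common approach on CentOS is a script under /etc/profile.d, which /etc/profile sources for login shells. The sketch below writes such a file; `write_java_profile` is a hypothetical helper, and the target path /etc/profile.d/java.sh is an assumption (writing there requires root).

```shell
# Hypothetical helper: write the Java variables to a profile script.
write_java_profile() {
    target="$1"     # e.g. /etc/profile.d/java.sh
    jdk="$2"        # e.g. /opt/jdk1.7.0_79
    # The delimiter is unquoted, so $jdk is expanded now; \$PATH is escaped,
    # so it stays literal and is expanded later, when the file is sourced.
    cat > "$target" <<EOF
export JAVA_HOME=$jdk
export JRE_HOME=$jdk/jre
export PATH=\$PATH:$jdk/bin:$jdk/jre/bin
EOF
}
```

Example: `write_java_profile /etc/profile.d/java.sh /opt/jdk1.7.0_79`, then log in again (or `source` the file) to pick up the variables.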

    Installing Apache Hadoop

    With the Java environment set up, let's start installing Apache Hadoop.

    The first step is to create a system user account for the Hadoop installation.

    useradd hadoop
    passwd hadoop

    Now configure SSH keys for the hadoop user. Use the following commands to enable SSH login without a password:

    su - hadoop
    ssh-keygen -t rsa -P ''
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys
    exit
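Running the `cat ... >> ~/.ssh/authorized_keys` line a second time appends a duplicate key. A minimal idempotent sketch follows; `add_authorized_key` is a hypothetical helper, and it assumes the key pair was already created with `ssh-keygen` as above. Run it as the hadoop user.

```shell
# Hypothetical helper: add a public key to authorized_keys only once.
add_authorized_key() {
    pub="$1"                                  # e.g. ~/.ssh/id_rsa.pub
    auth="${2:-$HOME/.ssh/authorized_keys}"
    mkdir -p "$(dirname "$auth")"
    chmod 0700 "$(dirname "$auth")"           # sshd's StrictModes expects a private .ssh dir
    touch "$auth"
    # -x: match the whole line; -F: fixed string (keys contain '+' and '/')
    grep -qxF "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 0600 "$auth"
}
```

Example: `add_authorized_key ~/.ssh/id_rsa.pub` can be re-run safely.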

    Now switch back to the hadoop user and download the latest available Hadoop release from its official site, hadoop.apache.org (this guide uses 2.6.0):

    su - hadoop
    cd ~
    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
    tar xzf hadoop-2.6.0.tar.gz
    mv hadoop-2.6.0 hadoop

    The next step is to set the environment variables used by Hadoop.

    Edit the ~/.bashrc file and add the following lines at the end of the file:

     
    export HADOOP_HOME=/home/hadoop/hadoop
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

    Then apply the changes to the current shell:

    source ~/.bashrc
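Re-running the setup appends the same variables to ~/.bashrc again. Below is a sketch of an append-once helper, assuming a marker comment in ~/.bashrc is acceptable; `append_once` and the `hadoop-env` marker name are hypothetical.

```shell
# Hypothetical helper: append the block read from stdin to a file only once,
# using a marker comment to detect an earlier run.
append_once() {
    file="$1"
    marker="# >>> $2 >>>"
    touch "$file"
    if grep -qxF "$marker" "$file"; then
        cat > /dev/null                   # already present: discard the block
    else
        { echo "$marker"; cat; } >> "$file"
    fi
}

# Usage (the quoted 'EOF' keeps $HADOOP_HOME unexpanded until .bashrc is sourced):
# append_once ~/.bashrc hadoop-env <<'EOF'
# export HADOOP_HOME=/home/hadoop/hadoop
# export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
# EOF
```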

    Edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable:

    export JAVA_HOME=/opt/jdk1.7.0_79/

    Now continue with the basic configuration of a single-node Hadoop cluster.

    First, go to the Hadoop configuration directory and make the following changes:

    cd /home/hadoop/hadoop/etc/hadoop

    Let's start by editing core-site.xml:

    <configuration>
    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    </property>
    </configuration>
    

    Then edit hdfs-site.xml:

    <configuration>
    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>
    <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
    </property>
    <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
    </property>
    </configuration>
    

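    Next, edit mapred-site.xml. Hadoop 2.6 ships only mapred-site.xml.template, so create the file first with cp mapred-site.xml.template mapred-site.xml: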

    <configuration>
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>
    </configuration>
    

    Finally, edit yarn-site.xml:

    <configuration>
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    </configuration>
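The four files above can also be written from the shell. This is a minimal sketch, assuming you run it in /home/hadoop/hadoop/etc/hadoop; `write_hadoop_config` is a hypothetical helper, not part of Hadoop. It overwrites the whole file, replacing any settings already there, and writes values verbatim (no XML escaping of `&`, `<`, `>`).

```shell
# Hypothetical helper: write a Hadoop *-site.xml file from name=value pairs.
write_hadoop_config() {
    file="$1"; shift
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        for pair in "$@"; do
            name="${pair%%=*}"      # text before the first '='
            value="${pair#*=}"      # text after the first '='
            printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$name" "$value"
        done
        echo '</configuration>'
    } > "$file"
}
```

For example, `write_hadoop_config core-site.xml fs.default.name=hdfs://localhost:9000` recreates the core-site.xml shown above, and `write_hadoop_config yarn-site.xml yarn.nodemanager.aux-services=mapreduce_shuffle` the yarn-site.xml.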

    Now format the NameNode using the following command:

    hdfs namenode -format
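Formatting is a one-time step: reformatting an existing NameNode directory discards the HDFS metadata, and DataNodes registered with the old cluster ID will then refuse to join. A minimal guard follows, assuming the dfs.name.dir path from hdfs-site.xml above; it relies on a formatted NameNode directory containing current/VERSION. `namenode_needs_format` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if the NameNode directory is not formatted yet.
namenode_needs_format() {
    # A formatted NameNode storage directory contains current/VERSION.
    [ ! -f "${1:-/home/hadoop/hadoopdata/hdfs/namenode}/current/VERSION" ]
}

# Usage:
#   namenode_needs_format /home/hadoop/hadoopdata/hdfs/namenode && hdfs namenode -format
```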

    To start all Hadoop services, use the following commands:

    cd /home/hadoop/hadoop/sbin/
    start-dfs.sh
    start-yarn.sh

    To check that all services have started, use the jps command:

    jps

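    You should see output like this (with YARN, Hadoop 2.x runs ResourceManager and NodeManager instead of the Hadoop 1 JobTracker and TaskTracker; process IDs will differ):

    26049 SecondaryNameNode
    25929 DataNode
    26399 Jps
    26129 ResourceManager
    26249 NodeManager
    25807 NameNode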
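To automate this check, the sketch below reads `jps` output and reports any missing daemon. It assumes the Hadoop 2.x/YARN daemon set started by start-dfs.sh and start-yarn.sh (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager), not the Hadoop 1 JobTracker/TaskTracker; `check_hadoop_daemons` is a hypothetical helper.

```shell
# Hypothetical helper: check `jps` output (on stdin) for the expected daemons.
check_hadoop_daemons() {
    running=$(awk '{print $2}')      # jps prints "<pid> <MainClass>"
    missing=0
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x: whole-line match, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qx "$d"; then
            echo "missing: $d"
            missing=1
        fi
    done
    return $missing
}

# Usage: jps | check_hadoop_daemons && echo "all daemons running"
```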

    Now you can open the web interfaces in your browser:

    Verify all applications for the cluster (YARN ResourceManager): http://your-ip-address:8088/

    Verify all Hadoop services (HDFS NameNode): http://your-ip-address:50070/

    Thanks!!!

    Reference: http://www.unixmen.com/setup-apache-hadoop-centos

     

     

     

     

    ----------------------------

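    After installing Hadoop, the following warning often appears:

    WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    Many articles I found say this is related to whether the system is 32-bit or 64-bit; I am using CentOS 6.5 64-bit.
    A couple of days ago, while building a Docker image, I came across a step that fixes this problem. I tried it myself, and the warning no longer appears.
    First, download hadoop-native-64-2.4.0.tar from http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.4.0.tar. If you are on Hadoop 2.6, download this one instead: http://dl.bintray.com/sequenceiq/sequenceiq-bin/hadoop-native-64-2.6.0.tar

    After downloading, extract it into Hadoop's native directory, overwriting the existing files (use the 2.6.0 file name if you downloaded that one):
    tar -xf hadoop-native-64-2.4.0.tar -C hadoop/lib/native/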


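    1. jps: command not found on Linux

    After starting Hadoop, running jps fails because the command cannot be found:

    -bash: jps: command not found

    jps is an executable in the bin directory of the JDK. Checking my JDK directory, I found the jps executable there; it simply was not on the PATH. You can inspect the PATH with the echo $PATH command.

    So add it yourself: as root, open /etc/profile with vi and add the following line at the end, where /usr/java/jdk160_05 is the path and folder name of your JDK installation (/opt/jdk1.7.0_79 for the JDK installed above). Save and exit.

    export PATH="/usr/java/jdk160_05/bin:$PATH"

    Then run source /etc/profile. If no error is reported, the change worked, and jps will now run.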
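Each `source /etc/profile` with a plain `export PATH=...:$PATH` line adds the directory again. A sketch of a guarded version for /etc/profile follows; `path_prepend` is a hypothetical helper, and the JDK path in the usage comment assumes the installation from the first part of this article.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Usage in /etc/profile:
#   path_prepend /opt/jdk1.7.0_79/bin
```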

     

    Start SSHD:

    service sshd start

    Turn off the firewall:

    systemctl stop firewalld.service       # stop firewalld
    systemctl disable firewalld.service    # prevent firewalld from starting at boot
    firewall-cmd --state                   # show the firewall state: "not running" when stopped, "running" when active

    How To Install HBase

    Download and unpack the latest release.

    1. Choose a download site from the list of Apache Download Mirrors and download the latest stable release, hbase-0.96.1.1-hadoop2-bin.tar.gz.
    2. Extract HBase and rename the folder, as you did for Hadoop. Run the following commands from the location where you saved the HBase archive (the commands use the 0.96.0 release, which matches the shell output later in this section; adjust the file names to the version you downloaded):

      $tar xvfz hbase-0.96.0-hadoop2-bin.tar.gz
      $mv hbase-0.96.0-hadoop2 hbase
      • Note: if you are using HBase 0.96
        HBase 0.96 bundles older, beta versions of the Hadoop common jar files, which may not be compatible with the Hadoop release installed above. If you keep using them, you will get errors like the following in the HBase log files:
        org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): Unknown out of band call #xxxxxx 
        To fix the problem, remove the beta jar files from HBase and use the jars from your Hadoop installation instead. Start by removing the incompatible files:
        $cd /home/hdtest/hbase/lib
        $rm -rf hadoop*.jar

        Once the files are removed, copy the correct files from the Hadoop installation:

        $cd /home/hdtest/hbase/lib
        $cp $HADOOP_HOME/share/hadoop/common/hadoop*.jar .
        $cp $HADOOP_HOME/share/hadoop/hdfs/hadoop*.jar .
        $cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop*.jar .
        $cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop*.jar .
        $cp $HADOOP_HOME/share/hadoop/yarn/hadoop*.jar .
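The same swap can be done with a backup instead of `rm -rf`, so the bundled jars can be restored if something goes wrong. This is a sketch assuming the directory layout used above; `replace_hbase_hadoop_jars` and the `hadoop-jars-backup` directory name are hypothetical.

```shell
# Hypothetical helper: replace HBase's bundled Hadoop jars with the installed ones,
# moving the old jars to a backup directory next to lib/ instead of deleting them.
replace_hbase_hadoop_jars() {
    hbase_lib="$1"                                  # e.g. /home/hdtest/hbase/lib
    hadoop_home="${2:-$HADOOP_HOME}"
    backup="$(dirname "$hbase_lib")/hadoop-jars-backup"
    mkdir -p "$backup"
    for jar in "$hbase_lib"/hadoop*.jar; do
        [ -e "$jar" ] && mv "$jar" "$backup/"       # skip if the glob matched nothing
    done
    # Same source directories as the cp commands above.
    for dir in common hdfs mapreduce tools/lib yarn; do
        for jar in "$hadoop_home/share/hadoop/$dir"/hadoop*.jar; do
            [ -e "$jar" ] && cp "$jar" "$hbase_lib/"
        done
    done
    return 0
}
```

Example: `replace_hbase_hadoop_jars /home/hdtest/hbase/lib "$HADOOP_HOME"`.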

      Update the HBase configuration file hbase/conf/hbase-site.xml and add the following properties between the <configuration> tags. The host and port in hbase.rootdir must match fs.default.name in core-site.xml (hdfs://localhost:9000 in the Hadoop setup above), and hbase.zookeeper.property.dataDir must be a directory on the local file system (for example /home/hdtest/zookeeper), not an HDFS URL:

        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://127.0.0.1:9000/sample</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/home/hdtest/zookeeper</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.clientPort</name>
          <value>2181</value>
          <description>Property from ZooKeeper's config zoo.cfg.
                       The port at which the clients will connect.</description>
        </property>

      If you want to use a local file system instead of HDFS, replace the hbase.rootdir URL with file:///your/preferred/path/. Now let's start the HBase instance by running the following command:

      $cd /home/hdtest/hbase/bin
      $./start-hbase.sh

      HBase is not started automatically in this configuration; you have to run this command again after each reboot. Once the command finishes and you are back at the shell prompt, check the log files if something went wrong. You can find them in the /home/hdtest/hbase/logs folder. If you don't see any issues, try using HBase: open the HBase shell and run a simple list command.

      $./hbase shell
      HBase Shell; enter 'help<RETURN>' for list of supported commands.
      Type "exit<RETURN>" to leave the HBase Shell
      Version 0.96.0-hadoop2, r1531434, Fri Oct 11 15:28:08 PDT 2013
      
      hbase(main):001:0> list
      TABLE                                                                                                                                              
      0 row(s) in 3.1580 seconds
      
      => []
      hbase(main):002:0>

      There are no tables in HBase yet, which is why the output is []. Now let's create a table and run the list command again.

      hbase(main):003:0> create 'sample', 'r'
      0 row(s) in 0.5550 seconds
      
      => Hbase::Table - sample
      hbase(main):004:0> list
      TABLE                                                                                                                                              
      sample                                                                                                                                             
      1 row(s) in 0.0600 seconds
      
      => ["sample"]
      hbase(main):005:0>

      Congratulations! You now have Hadoop and HBase running.


       

    How to Install Hive

    This section covers installing Hive on a standalone system as well as on a system that is a node in a cluster.

    INTRODUCTION

    Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Apache Hive supports analysis of large datasets stored in Hadoop’s HDFS and compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL (Hive Query Language) while maintaining full support for MapReduce.

    Installing HIVE:

    • Browse to the link: http://apache.claz.org/hive/stable/

    • Download apache-hive-0.13.0-bin.tar.gz

    • Save and extract it

      Commands

      user@ubuntu:~$  sudo mkdir /usr/lib/hive
      user@ubuntu:~$  cd ~/Downloads
      user@ubuntu:~/Downloads$  tar xzf apache-hive-0.13.0-bin.tar.gz
      user@ubuntu:~/Downloads$  sudo mv apache-hive-0.13.0-bin /usr/lib/hive
      

    Set the Hive environment variables:

    Commands

    user@ubuntu:~$  cd
    user@ubuntu:~$  sudo gedit  ~/.bashrc
    

    Copy and paste the following lines at the end of the file:

    # Set HIVE_HOME
    export HIVE_HOME="/usr/lib/hive/apache-hive-0.13.0-bin"
    PATH=$PATH:$HIVE_HOME/bin
    export PATH
    

    Set HADOOP_HOME in hive-config.sh

    Commands

    user@ubuntu:~$ cd  /usr/lib/hive/apache-hive-0.13.0-bin/bin
    user@ubuntu:~$ sudo gedit hive-config.sh
    

    Go to the lines containing the following statements:

    # Allow alternate conf dir location.
    HIVE_CONF_DIR="${HIVE_CONF_DIR:-$HIVE_HOME/conf}"
    export HIVE_CONF_DIR=$HIVE_CONF_DIR
    export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH
    

    Below them, add the following line, replacing /usr/local/hadoop with the path of your Hadoop installation (/home/hadoop/hadoop in the setup above):

    export HADOOP_HOME=/usr/local/hadoop
    

    Create Hive directories within HDFS

    Command

    user@ubuntu:~$   hadoop fs -mkdir -p /usr/hive/warehouse
    

    Set read/write permission on the warehouse directory

    Command

    user@ubuntu:~$  hadoop fs -chmod g+w /usr/hive/warehouse
    

    Launch Hive

    Command

    user@ubuntu:~$  hive
    

    The Hive shell prompt will appear:

    OUTPUT

    Logging initialized using configuration in jar:file:/usr/lib/hive/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
    hive> show databases;
    

    Creating a database

    Command

    hive> create database mydb;
    

    OUTPUT

    OK
    Time taken: 0.369 seconds
    hive>
    

    Configuring hive-site.xml:

    Create $HIVE_HOME/conf/hive-site.xml (Hive 0.13 ships only conf/hive-default.xml.template, which you can copy as a starting point), open it in a text editor, and set the following properties. The MySQL settings assume a MySQL server on localhost at the default port 3306, the MySQL JDBC driver (Connector/J) JAR copied into $HIVE_HOME/lib, and your MySQL account set in javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword:

    <property>
        <name>hive.metastore.local</name>
        <value>TRUE</value>
        <description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/usr/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    

    Writing a Script

    Open a new terminal (CTRL+ALT+T)

    user@ubuntu:~$      sudo gedit sample.sql
    
    create database sample;
    use sample;
    create table product(product int, productname string, price float) row format delimited fields terminated by ',';
    describe product;
    

    load data local inpath '/home/hduser/input_to_product.txt' into table product;

    select * from product;
    

    Save and close the file. Then create the input file, with one comma-separated record (product, productname, price) per line, and run the script:

    user@ubuntu:~$ sudo gedit input_to_product.txt
    user@ubuntu:~$ cd /usr/lib/hive/apache-hive-0.13.0-bin/
    user@ubuntu:/usr/lib/hive/apache-hive-0.13.0-bin$ bin/hive -f /home/hduser/sample.sql
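Hive does not reject malformed rows when loading delimited text: missing or non-numeric values are simply read as NULL. A quick field-count check before `load data` can catch such rows. This is a sketch assuming the three-column comma-delimited layout of the product table; `check_csv_fields` is a hypothetical helper, and it does not handle quoted fields containing commas.

```shell
# Hypothetical helper: report lines whose comma-separated field count differs from $2.
check_csv_fields() {
    awk -F',' -v n="$2" '
        NF != n { printf "line %d: %d fields\n", NR, NF; bad = 1 }
        END     { exit bad }' "$1"
}

# Usage: check_csv_fields /home/hduser/input_to_product.txt 3 && echo "ok to load"
```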

    When starting Hive, I ran into the following error:

    Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
    [ERROR] Terminal initialization failed; falling back to unsupported
    java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected

    Solution:

    The old jline 0.9.94 JAR in Hadoop's YARN libraries conflicts with the newer jline that Hive uses. Delete the following file:

    $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar
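If you prefer to keep the old jar around in case another tool needs it, you can move it out of the YARN lib directory instead of deleting it. This is a sketch; `move_old_jline` is a hypothetical helper, and the default backup location `$HOME/jline-backup` is an assumption.

```shell
# Hypothetical helper: move jline 0.9.x jars out of Hadoop's YARN lib directory.
move_old_jline() {
    yarn_lib="${1:-$HADOOP_HOME/share/hadoop/yarn/lib}"
    backup="${2:-$HOME/jline-backup}"
    mkdir -p "$backup"
    for jar in "$yarn_lib"/jline-0.9*.jar; do
        [ -e "$jar" ] || continue        # glob matched nothing
        mv "$jar" "$backup/" && echo "moved $(basename "$jar") to $backup"
    done
    return 0
}
```

Example: `move_old_jline "$HADOOP_HOME/share/hadoop/yarn/lib"`, then start Hive again.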

  • Original article: https://www.cnblogs.com/haoliansheng/p/5116996.html
Copyright © 2020-2023  润新知