• Hadoop Ecosystem


    There are many open-source Hadoop-related projects that are widely used by companies. This article walks through installing them.

    Install JDK

    Install Hadoop

    Install Hbase

    Install Hive

    Install Spark

    Install Impala

    Install Sqoop

    Install Alluxio

     

    Install JDK

    Step 1: download package from official site, and choose appropriate version.

    Step 2: unzip the package and copy to destination folder

    tar zxf jdk-8u111-linux-x64.tar.gz

    cp -R jdk1.8.0_111 /usr/share

    Step 3: set PATH and JAVA_HOME

    vi ~/.bashrc

    export JAVA_HOME=/usr/share/jdk1.8.0_111
    export PATH=$JAVA_HOME/bin:$PATH
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

    source ~/.bashrc

    Step 4: open a new shell (or log out and back in) so the changes take effect everywhere

    Step 5: check java version

    java -version

    javac -version
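
    As an optional sanity check, confirm that the java on your PATH is the one under JAVA_HOME (paths assume the locations used above):

    which java        # expected: /usr/share/jdk1.8.0_111/bin/java
    echo $JAVA_HOME   # expected: /usr/share/jdk1.8.0_111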

    Install Hadoop

    Follow the steps below to install Hadoop in pseudo-distributed (single-node) mode.

    Step 1: download package from apache site

    Step 2: unzip the package and copy to destination folder

    tar zxf hadoop-2.7.3.tar.gz

    mkdir -p /usr/share/hadoop
    cp -R hadoop-2.7.3/* /usr/share/hadoop

    Step 3: create a 'hadoop' folder under '/home'

    mkdir /home/hadoop

    Step 4: set PATH and HADOOP_HOME

    vi ~/.bashrc

    export HADOOP_HOME=/usr/share/hadoop
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    export HADOOP_INSTALL=$HADOOP_HOME

    source ~/.bashrc

    Step 5: check hadoop version

    hadoop version

    Step 6: configure Hadoop HDFS, core-site, YARN, and MapReduce

    cd $HADOOP_HOME/etc/hadoop

    vi hadoop-env.sh

    export JAVA_HOME=/usr/share/jdk1.8.0_111

    vi core-site.xml

    <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000</value>
    </property>

    vi hdfs-site.xml

    <property>
          <name>dfs.replication</name>
          <value>1</value>
    </property>

    <property>
          <name>dfs.name.dir</name>
          <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
    </property>

    <property>
          <name>dfs.data.dir</name>
          <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
    </property>

    vi yarn-site.xml

    <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
    </property>

    cp mapred-site.xml.template mapred-site.xml

    vi mapred-site.xml

    <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
    </property>

    Step 7: initialize hadoop namenode

    hdfs namenode -format

    Step 8: start hadoop

    start-dfs.sh

    start-yarn.sh
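
    Optionally, run jps (shipped with the JDK) to confirm the daemons are up; for this single-node setup you should see processes such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.

    jps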

    Step 9: check the Hadoop web UIs to see if everything works

    http://localhost:50070/

    http://localhost:8088/
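
    You can also run a quick HDFS smoke test from the command line (the paths below are just examples):

    hdfs dfs -mkdir -p /user/hadoop
    hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/hadoop/
    hdfs dfs -ls /user/hadoop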

    Install HBase

    Follow the steps below to install HBase in standalone mode.

    Step 1: check that Hadoop is installed

    hadoop version

    Step 2: download version 1.2.4 of hbase from apache site

    Step 3: unzip package and copy to destination folder

    tar zxf hbase-1.2.4-bin.tar.gz

    mkdir -p /usr/share/hbase
    cp -R hbase-1.2.4/* /usr/share/hbase

    Step 4: configure hbase env

    cd /usr/share/hbase/conf

    vi hbase-env.sh

    export JAVA_HOME=/usr/share/jdk1.8.0_111

    Step 5: modify hbase-site.xml

    vi hbase-site.xml

    <configuration>
       <!-- Set the path where you want HBase to store its files. -->
       <property>
          <name>hbase.rootdir</name>
          <value>file:/home/hadoop/HBase/HFiles</value>
       </property>

       <!-- Set the path where you want HBase to store its built-in ZooKeeper files. -->
       <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/home/hadoop/zookeeper</value>
       </property>
    </configuration>

    Step 6: start hbase and check that its root directory has been created

    cd /usr/share/hbase/bin

    ./start-hbase.sh

    ls /home/hadoop/HBase/HFiles

    Step 7: check hbase via web interface

    http://localhost:16010
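
    Another quick check is to create and drop a throwaway table from the HBase shell (the 'test' table and 'cf' column family below are just example names):

    /usr/share/hbase/bin/hbase shell

    create 'test', 'cf'
    put 'test', 'row1', 'cf:a', 'value1'
    scan 'test'
    disable 'test'
    drop 'test'
    exit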

    Install Hive

    Step 1: download version 1.2.1 of hive from apache site

    Step 2: unzip the package and copy to destination folder

    tar zxf apache-hive-1.2.1-bin.tar.gz

    mkdir -p /usr/share/hive
    cp -R apache-hive-1.2.1-bin/* /usr/share/hive

    Step 3: set HIVE_HOME

    vi ~/.bashrc

    export HIVE_HOME=/usr/share/hive
    export PATH=$PATH:$HIVE_HOME/bin
    export CLASSPATH=$CLASSPATH:/usr/share/hadoop/lib/*:.
    export CLASSPATH=$CLASSPATH:/usr/share/hive/lib/*:.

    source ~/.bashrc

    Step 4: configure env for hive

    cd $HIVE_HOME/conf

    cp hive-env.sh.template hive-env.sh

    vi hive-env.sh

    export HADOOP_HOME=/usr/share/hadoop

    Step 5: download version 10.12.1.1 of Apache Derby from apache site

    Step 6: unzip derby package and copy to destination folder

    tar zxf db-derby-10.12.1.1-bin.tar.gz

    mkdir -p /usr/share/derby
    cp -R db-derby-10.12.1.1-bin/* /usr/share/derby

    Step 7: set up DERBY_HOME

    vi ~/.bashrc

    export DERBY_HOME=/usr/share/derby
    export PATH=$PATH:$DERBY_HOME/bin
    export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar

    source ~/.bashrc

    Step 8: create a directory to store metastore

    mkdir $DERBY_HOME/data

    Step 9: configure metastore of hive

    cd $HIVE_HOME/conf

    cp hive-default.xml.template hive-site.xml

    vi hive-site.xml

    <property>
       <name>javax.jdo.option.ConnectionURL</name>
       <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
       <description>JDBC connect string for a JDBC metastore</description>
    </property>
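
    Because the ConnectionURL above uses the client/server form (port 1527), the Derby Network Server must be running before Hive starts; one way to launch it in the background:

    nohup $DERBY_HOME/bin/startNetworkServer -h 0.0.0.0 &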

    Step 10: create a file named jpox.properties and add the following content into it

    touch jpox.properties

    vi jpox.properties

    javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
    org.jpox.autoCreateSchema = false
    org.jpox.validateTables = false
    org.jpox.validateColumns = false
    org.jpox.validateConstraints = false
    org.jpox.storeManagerType = rdbms
    org.jpox.autoCreateSchema = true
    org.jpox.autoStartMechanismMode = checked
    org.jpox.transactionIsolation = read_committed
    javax.jdo.option.DetachAllOnCommit = true
    javax.jdo.option.NontransactionalRead = true
    javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
    javax.jdo.option.ConnectionURL = jdbc:derby://hadoop1:1527/metastore_db;create = true
    javax.jdo.option.ConnectionUserName = APP
    javax.jdo.option.ConnectionPassword = mine

    Step 11: enter the hive shell and execute the command 'show tables'

    cd $HIVE_HOME/bin

    hive

    hive> show tables;
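
    As a further sanity check, you can create and drop a small table (the table name and columns below are just examples):

    hive> create table demo (id int, name string);
    hive> show tables;
    hive> drop table demo;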

    Install Spark 

    Step 1: download version 2.12.0 of scala from scala site

    Step 2: unzip the package and copy to destination folder

    tar zxf scala-2.12.0.tgz

    mkdir -p /usr/share/scala
    cp -R scala-2.12.0/* /usr/share/scala

    Step 3: set PATH for scala

    vi ~/.bashrc

    export PATH=$PATH:/usr/share/scala/bin

    source ~/.bashrc

    Step 4: check scala version

    scala -version

    Step 5: download version 2.0.2 of spark from apache site

    Step 6: unzip the package and copy to destination folder

    tar zxf spark-2.0.2-bin-hadoop2.7.tgz

    mkdir -p /usr/share/spark
    cp -R spark-2.0.2-bin-hadoop2.7/* /usr/share/spark

    Step 7: set up PATH

    vi ~/.bashrc

    export PATH=$PATH:/usr/share/spark/bin

    source ~/.bashrc

    Step 8: launch spark-shell to see if spark is installed successfully

    spark-shell
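
    Inside the shell, a couple of one-liners are enough to confirm that Spark works (the numbers below are arbitrary examples):

    scala> sc.parallelize(1 to 100).sum()
    scala> spark.range(1000).count()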

    Install Impala

    Step 1: download version 2.7.0 of impala from impala site

    Step 2: unzip the package and copy to destination folder

    tar zxf apache-impala-incubating-2.7.0.tar.gz

    mkdir -p /usr/share/impala
    cp -R apache-impala-incubating-2.7.0/* /usr/share/impala

    Step 3: set PATH and IMPALA_HOME

    vi ~/.bashrc

    export IMPALA_HOME=/usr/share/impala
    export PATH=$PATH:/usr/share/impala

    source ~/.bashrc

    Step 4: to be continued...

    Install Sqoop

    Prerequisite: Hadoop (HDFS and MapReduce) must already be installed

    Step 1: download version 1.4.6 of sqoop from apache site

    Step 2: unzip the package and copy to destination folder

    tar zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

    mkdir -p /usr/share/sqoop
    cp -R sqoop-1.4.6.bin__hadoop-2.0.4-alpha/* /usr/share/sqoop

    Step 3: set SQOOP_HOME and PATH

    vi ~/.bashrc

    export SQOOP_HOME=/usr/share/sqoop
    export PATH=$PATH:$SQOOP_HOME/bin

    source ~/.bashrc

    Step 4: configure sqoop

    cd $SQOOP_HOME/conf

    mv sqoop-env-template.sh sqoop-env.sh

    vi sqoop-env.sh

    export HADOOP_COMMON_HOME=/usr/share/hadoop 
    export HADOOP_MAPRED_HOME=/usr/share/hadoop

    Step 5: download version 5.1.40 of mysql-connector-java from site

    Step 6: unzip the package and move related jar file into destination folder

    tar -zxf mysql-connector-java-5.1.40.tar.gz
    cd mysql-connector-java-5.1.40
    mv mysql-connector-java-5.1.40-bin.jar $SQOOP_HOME/lib

    Step 7: verify if sqoop is installed successfully

    cd $SQOOP_HOME/bin

    sqoop-version
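
    Once Sqoop reports its version, a typical next step is importing a table from MySQL into HDFS. The sketch below is only an illustration; the host, database, table, user and target directory are placeholders that must be replaced with real values:

    sqoop import \
      --connect jdbc:mysql://localhost:3306/testdb \
      --username dbuser -P \
      --table employees \
      --target-dir /user/hadoop/employees \
      -m 1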

    Install Alluxio

    Step 1: download version 1.3.0 of alluxio from site

    Step 2: unzip the package and move it to destination folder

    tar zxf alluxio-1.3.0-hadoop2.7-bin.tar.gz

    mkdir -p /usr/share/alluxio
    cp -R alluxio-1.3.0-hadoop2.7-bin/* /usr/share/alluxio

    Step 3: create alluxio-env

    cd /usr/share/alluxio

    bin/alluxio bootstrapConf localhost local

    vi conf/alluxio-env.sh

    export ALLUXIO_UNDERFS_ADDRESS=/tmp

    Step 4: format alluxio file system and start alluxio

    cd /usr/share/alluxio

    bin/alluxio format

    bin/alluxio-start.sh local

    Step 5: verify if alluxio is running by visiting http://localhost:19999

    Step 6: run predefined tests

    cd /usr/share/alluxio

    bin/alluxio runTests
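
    To exercise the file system interface directly, you can copy a local file into Alluxio and list it (the source and destination paths are just examples):

    bin/alluxio fs copyFromLocal /etc/hosts /hosts

    bin/alluxio fs ls /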
