[Original] Installing Hadoop CDH 4.6 on CentOS 6.5


    1. Cluster layout:
         namenode 10.0.0.2     
         secondnamenode 10.0.0.3
         datanode1 10.0.0.4
         datanode2 10.0.0.6
         datanode3 10.0.0.11
    2. Installation user: cloud-user
    3. [namenode] Set up passwordless SSH from the namenode to every other node:
         ssh-keygen     (press Enter at every prompt)
         ssh-copy-id cloud-user@10.0.0.3
         ssh-copy-id cloud-user@10.0.0.4
         ssh-copy-id cloud-user@10.0.0.6
         ssh-copy-id cloud-user@10.0.0.11
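       An optional sanity check: a short loop like the one below should print each node's hostname from the namenode without asking for a password (IPs are the ones from step 1; adjust to your own layout):
         for host in 10.0.0.3 10.0.0.4 10.0.0.6 10.0.0.11; do
             ssh -o BatchMode=yes cloud-user@$host hostname   # fails fast instead of prompting if keys are wrong
         done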
    4. [ALL] Prepare the CDH yum repository:
         sudo rpm --import RPM-GPG-KEY-cloudera
         sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
         sudo yum clean all && sudo yum makecache
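       Before installing anything, the repository can be verified; the exact repo id shown depends on the .repo file shipped in the cloudera-cdh RPM:
         sudo yum repolist | grep -i cloudera         # the CDH repository should be listed
         yum list available 'hadoop-*' | head         # CDH4 hadoop packages should show up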
    5. [ALL] Install Java; jdk-6u45-linux-x64-rpm.bin is used here:
          chmod +x  jdk-6u45-linux-x64-rpm.bin && sudo ./jdk-6u45-linux-x64-rpm.bin
    6. [ALL] Add the Java environment variables:
        Append to /etc/profile:
    export JAVA_HOME=/usr/java/jdk1.6.0_45
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
    export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
       Apply the changes: source /etc/profile
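       A quick check that the variables took effect, assuming the RPM installed to the default /usr/java/jdk1.6.0_45 path used above:
         source /etc/profile
         java -version          # should report java version "1.6.0_45"
         echo $JAVA_HOME        # should print /usr/java/jdk1.6.0_45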
    7. Install the CDH packages:
         sudo yum install -y hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver hadoop-hdfs-namenode    # on the namenode only
         sudo yum install -y hadoop-hdfs-namenode    # on the secondnamenode (standby NameNode) only
         sudo yum install -y hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce    # on datanode1, datanode2 and datanode3
         sudo yum install -y hadoop-lzo-cdh4    # on all nodes
    8. [ALL] Configure Hadoop:
         sudo service iptables stop && sudo service ip6tables stop
         echo -e "namenode\nsecondnamenode" | sudo tee -a /etc/hadoop/conf/masters && echo -e "datanode1\ndatanode2\ndatanode3" | sudo tee -a /etc/hadoop/conf/slaves
         sudo cp -r /etc/hadoop/conf.dist /etc/hadoop/conf.my_cluster &&  sudo alternatives --verbose --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50 &&   sudo alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
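       The configuration below refers to the nodes by hostname (namenode, secondnamenode, datanode1-3), so every host must be able to resolve those names. If there is no DNS entry for them, one option is to append the mappings from step 1 to /etc/hosts on every node (a sketch; adapt it to your own name resolution):
         printf '%s\n' \
             '10.0.0.2   namenode' \
             '10.0.0.3   secondnamenode' \
             '10.0.0.4   datanode1' \
             '10.0.0.6   datanode2' \
             '10.0.0.11  datanode3' | sudo tee -a /etc/hosts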
       core-site.xml:
    <property>
         <name>fs.defaultFS</name>
         <value>hdfs://sdc</value>
    </property>
    <property>
         <name>ha.zookeeper.quorum</name>
         <value>datanode1:2181,datanode2:2181,datanode3:2181</value>
    </property>
    <property>
         <name>fs.trash.interval</name>
         <value>10080</value>
    </property>
    <property>
         <name>fs.trash.checkpoint.interval</name>
         <value>10080</value>
    </property>
    <property>
         <name>hadoop.native.lib</name>
         <value>true</value>
    </property>
    <property>
         <name>hadoop.proxyuser.mapred.groups</name>
         <value>*</value>
    </property>
    <property>
         <name>hadoop.proxyuser.mapred.hosts</name>
         <value>*</value>
    </property>
    <property> 
       <name>hadoop.proxyuser.oozie.hosts</name> 
       <value>10.0.0.2</value> 
    </property> 
    <property> 
       <name>hadoop.proxyuser.oozie.groups</name> 
       <value>*</value> 
    </property> 
    <property>
       <name>io.compression.codecs</name>
       <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
      hdfs-site.xml:
    <property>
         <name>dfs.replication</name>
         <value>3</value>
    </property>
    <property>
         <name>dfs.nameservices</name>
         <value>sdc</value>
    </property>
    <property>
         <name>dfs.ha.namenodes.sdc</name>
         <value>nn1,nn2</value>
    </property>
    <!-- RPC address of each NameNode: dfs.namenode.rpc-address.[nameservice ID].[namenode ID] -->
    <property>
         <name>dfs.namenode.rpc-address.sdc.nn1</name>
         <value>namenode:8020</value>
    </property>
    <property>
         <name>dfs.namenode.rpc-address.sdc.nn2</name>
         <value>secondnamenode:8020</value>
    </property>
    <!-- HTTP address of each NameNode: dfs.namenode.http-address.[nameservice ID].[namenode ID] -->
    <property>
         <name>dfs.namenode.http-address.sdc.nn1</name>
         <value>namenode:50070</value>
    </property>
    <property>
         <name>dfs.namenode.http-address.sdc.nn2</name>
         <value>secondnamenode:50070</value>
    </property>
    <property>
         <name>dfs.namenode.shared.edits.dir</name>
         <value>qjournal://datanode1:8485;datanode2:8485;datanode3:8485/sdc</value>
    </property>
    <property>
         <name>dfs.permissions.superusergroup</name>
         <value>hadoop</value>
    </property>
    <property> 
         <name>dfs.permissions</name> 
         <value>false</value> 
    </property> 
    <property> 
         <name>dfs.permissions.enabled</name> 
         <value>false</value> 
    </property>
    <property>
         <name>dfs.journalnode.edits.dir</name>
         <value>/data/1/dfs/jn</value>
    </property>
    <!-- Client failover: the proxy provider class HDFS clients use to find the currently active NameNode -->
    <property>
         <name>dfs.client.failover.proxy.provider.sdc</name> 
         <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
         <name>dfs.namenode.name.dir</name>
         <value>file:///data/1/dfs/nn,/nfsmount/dfs/nn</value>
    </property>
    <!-- Fencing configuration.
    dfs.ha.fencing.methods supports two built-in methods, sshfence and shell; sshfence is used here, and dfs.ha.fencing.ssh.private-key-files points to the SSH private key it uses. Fencing is the safety mechanism around an active-NameNode switchover: it ensures that only one NameNode is active at any time. During a failover, haadmin makes sure the previously active node is in standby state, or that its process has been killed, before transitioning the other NameNode to active.
    At least one fencing method must be configured; there is no default, so without one the HA mechanism will not work.
    To implement a custom fencing mechanism, see org.apache.hadoop.ha.NodeFencer.
    -->
    <property>
         <name>dfs.ha.fencing.methods</name>
         <value>sshfence</value>
    </property>
    <property>
         <name>dfs.ha.fencing.ssh.private-key-files</name>
         <value>/home/cloud-user/.ssh/id_rsa</value>
    </property>
    <!-- Enable automatic failover -->
    <property>
         <name>dfs.ha.automatic-failover.enabled</name>
         <value>true</value>
    </property>
    <!-- ZooKeeper quorum used for automatic failover -->
    <property>
         <name>ha.zookeeper.quorum</name>
         <value>datanode1:2181,datanode2:2181,datanode3:2181</value>
    </property>
    <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:///data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
    </property>
    <property>
         <name>dfs.webhdfs.enabled</name>
         <value>true</value>
    </property>
    <property> 
         <name>dfs.ha.fencing.ssh.connect-timeout</name> 
         <value>10000</value> 
    </property>

      mapred-site.xml:

    <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
    </property>
    <property>
         <name>yarn.app.mapreduce.am.staging-dir</name>
         <value>/user</value>
    </property>
    <property>
         <name>mapreduce.jobhistory.address</name>
         <value>namenode:10020</value>
    </property>
    <property>
         <name>mapreduce.jobhistory.webapp.address</name>
         <value>namenode:19888</value>
    </property>

      yarn-site.xml:

    <property>
         <name>yarn.resourcemanager.resource-tracker.address</name>
         <value>namenode:8031</value>
    </property>
    <property>
         <name>yarn.resourcemanager.address</name>
         <value>namenode:8032</value>
    </property>
    <property>
         <name>yarn.resourcemanager.scheduler.address</name>
         <value>namenode:8030</value>
    </property>
    <property>
         <name>yarn.resourcemanager.admin.address</name>
         <value>namenode:8033</value>
    </property>
    <property>
         <name>yarn.resourcemanager.webapp.address</name>
         <value>namenode:8088</value>
    </property>
    <property>
         <name>yarn.web-proxy.address</name>
         <value>namenode:8100</value>
    </property>
    <property>
         <description>Classpath for typical applications.</description>
         <name>yarn.application.classpath</name>
         <value>
         $HADOOP_CONF_DIR,
         $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
         $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
         $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
         $YARN_HOME/*,$YARN_HOME/lib/*
         </value>
    </property>
    <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce.shuffle</value>
    </property>
    <property>
         <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
         <name>yarn.nodemanager.local-dirs</name>
         <value>file:///data/1/yarn/local,file:///data/2/yarn/local,file:///data/3/yarn/local</value>
    </property>
    <property>
         <name>yarn.nodemanager.log-dirs</name>
         <value>file:///data/1/yarn/logs,file:///data/2/yarn/logs,file:///data/3/yarn/logs</value>
    </property>
    <property>
         <description>Where to aggregate logs</description>
         <name>yarn.nodemanager.remote-app-log-dir</name>
         <value>hdfs://var/log/hadoop-yarn/apps</value>
    </property>
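       Step 8 applies to [ALL] nodes: the finished conf.my_cluster directory (and the alternatives switch above) must be identical on every host. Instead of editing five copies by hand, the files can be pushed from the namenode over the step 3 SSH keys; a sketch, staging through /tmp because /etc/hadoop is root-owned:
         for host in 10.0.0.3 10.0.0.4 10.0.0.6 10.0.0.11; do
             scp /etc/hadoop/conf.my_cluster/*-site.xml cloud-user@$host:/tmp/
             ssh -t cloud-user@$host 'sudo cp /tmp/*-site.xml /etc/hadoop/conf.my_cluster/'
         done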
    9. Create the local directories required by the configuration:
    sudo mkdir -p /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local && sudo mkdir -p /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs && sudo chown -R yarn:yarn /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local && sudo chown -R yarn:yarn /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
    Datanodes (JournalNode edits directory):
    sudo mkdir -p /data/1/dfs/jn && sudo chown -R hdfs:hdfs /data/1/dfs/jn
    Namenodes:
    sudo mkdir -p /data/1/dfs/nn /nfsmount/dfs/nn && sudo chown -R hdfs:hdfs /data/1/dfs/nn /nfsmount/dfs/nn && sudo chmod 700 /data/1/dfs/nn /nfsmount/dfs/nn
    Datanodes:
    sudo mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn && sudo chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
    10. [datanode] Install ZooKeeper on the datanodes:
      sudo yum install -y zookeeper
        Configure /etc/zookeeper/conf/zoo.cfg:
    tickTime=2000
    dataDir=/var/lib/zookeeper/
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=datanode1:2888:3888
    server.2=datanode2:2888:3888
    server.3=datanode3:2888:3888

       

      Set each server's id:
      echo 1 | sudo tee /var/lib/zookeeper/myid    # on datanode1 only
      echo 2 | sudo tee /var/lib/zookeeper/myid    # on datanode2 only
      echo 3 | sudo tee /var/lib/zookeeper/myid    # on datanode3 only
      sudo chown -R zookeeper:zookeeper /var/lib/zookeeper    # on all datanodes

      Start ZooKeeper:
      sudo /usr/lib/zookeeper/bin/zkServer.sh start
      Check its status:
      sudo /usr/lib/zookeeper/bin/zkServer.sh status
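      Besides zkServer.sh status, the standard ZooKeeper four-letter commands can be used from any node to confirm the ensemble is answering (assuming nc is installed):
      echo ruok | nc datanode1 2181               # a healthy server answers "imok"
      echo stat | nc datanode2 2181 | grep Mode   # shows leader or follower once the quorum has formed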

     
    11. [namenode] Initialize the HA state in ZooKeeper:
         sudo -u hdfs hdfs zkfc -formatZK
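         Afterwards the sdc nameservice should have its znode under /hadoop-ha; one way to check, from any ZooKeeper node with the stock client:
         /usr/lib/zookeeper/bin/zkCli.sh -server datanode1:2181 ls /hadoop-ha    # should list [sdc]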
    12. [datanode] Install and start the JournalNode cluster:
         sudo yum install -y hadoop-hdfs-journalnode
         sudo service hadoop-hdfs-journalnode start
    13. [namenode] Format the NameNode and start it:
         sudo -u hdfs hdfs namenode -format
         sudo service hadoop-hdfs-namenode start
    14. [secondnamenode] Bootstrap the standby with the primary NameNode's metadata and start it:
         sudo -u hdfs hdfs namenode -bootstrapStandby
         sudo service hadoop-hdfs-namenode start
    15. [datanode] Start all the DataNodes:
         sudo service hadoop-hdfs-datanode start   
    16. [namenode] Start the YARN services:
         sudo service hadoop-yarn-resourcemanager start
         sudo service hadoop-mapreduce-historyserver start
         sudo service hadoop-yarn-proxyserver start
    17. [datanode] Start YARN on the datanodes:
         sudo service hadoop-yarn-nodemanager start
    18. [namenode + secondnamenode] Install and start the ZooKeeper Failover Controller (ZKFC):
       sudo yum install -y hadoop-hdfs-zkfc
         sudo service hadoop-hdfs-zkfc start
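         Once both failover controllers are running, one NameNode should be elected active and the other standby; this can be verified from either namenode with haadmin:
         sudo -u hdfs hdfs haadmin -getServiceState nn1    # active or standby
         sudo -u hdfs hdfs haadmin -getServiceState nn2    # the opposite state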
    19. [ALL] Check the running daemons:
         sudo /usr/java/jdk1.6.0_45/bin/jps
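       Roughly: the namenode should show NameNode, DFSZKFailoverController, ResourceManager, JobHistoryServer and WebAppProxyServer; the secondnamenode NameNode and DFSZKFailoverController; each datanode DataNode, NodeManager, JournalNode and QuorumPeerMain. To check all five hosts from the namenode in one go (a sketch, reusing the step 3 SSH keys and the JDK path from step 5):
         for host in 10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.6 10.0.0.11; do
             echo "== $host =="
             ssh -t cloud-user@$host sudo /usr/java/jdk1.6.0_45/bin/jps
         done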
    20. Create the HDFS directories (some are for components used later):
         sudo -u hdfs hadoop fs -mkdir /user
         sudo -u hdfs hadoop fs -chmod 777 /user
         sudo -u hdfs hadoop fs -mkdir /user/history
         sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
         sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history
         sudo -u hdfs hadoop fs -mkdir /var/log/hadoop-yarn
         sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn

         sudo -u hdfs hadoop fs -mkdir /tmp
         sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
         sudo -u hdfs hadoop fs -mkdir /user/hive
         sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
         sudo -u hdfs hadoop fs -chown -R hive /user/hive
         sudo -u hdfs hadoop fs -chmod -R 1777 /user/hive/warehouse

         sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-mapred
         sudo -u hdfs hadoop fs -mkdir /tmp/hive-hive

         sudo -u hdfs hadoop fs -chmod -R 777 /tmp/hadoop-mapred
         sudo -u hdfs hadoop fs -chmod -R 777 /tmp/hive-hive

         sudo -u hdfs hadoop fs -mkdir /user/cloud-user
         sudo -u hdfs hadoop fs -chown cloud-user:cloud-user /user/cloud-user

    21. Test MapReduce:
        cd /usr/lib/hadoop-mapreduce
         hadoop jar hadoop-mapreduce-examples-2.0.0-cdh4.6.0.jar pi 2 10
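        If the pi job finishes, a second smoke test is the wordcount example from the same jar, which also exercises HDFS reads and writes under the /user/cloud-user directory created in step 20 (run as cloud-user; the input and output paths are only examples):
         hadoop fs -mkdir /user/cloud-user/wc-in
         hadoop fs -put /etc/hadoop/conf/core-site.xml /user/cloud-user/wc-in/
         hadoop jar hadoop-mapreduce-examples-2.0.0-cdh4.6.0.jar wordcount /user/cloud-user/wc-in /user/cloud-user/wc-out
         hadoop fs -cat /user/cloud-user/wc-out/part-r-00000 | head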
Original article: https://www.cnblogs.com/yuandianliws/p/3716602.html