Hadoop 2.4.0 + ZooKeeper 3.4.6 + HBase 0.98.3 Distributed Cluster Setup
IP | Hostname | Software | Processes
192.168.137.11 | h1 | JDK, Hadoop, HBase | NameNode, DFSZKFailoverController, HMaster
192.168.137.12 | h2 | JDK, Hadoop, HBase | NameNode, DFSZKFailoverController, HMaster
192.168.137.13 | h3 | JDK, Hadoop | ResourceManager
192.168.137.14 | h4 | JDK, Hadoop, ZooKeeper, HBase | DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
192.168.137.15 | h5 | JDK, Hadoop, ZooKeeper, HBase | DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
192.168.137.16 | h6 | JDK, Hadoop, ZooKeeper, HBase | DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
Preparation
- Change the Linux hostname
vim /etc/sysconfig/network
add HOSTNAME=h1
- Change the IP address
vim /etc/sysconfig/network-scripts/ifcfg-eth0
set IPADDR=192.168.137.11
- Map hostnames to IP addresses
vim /etc/hosts
add 192.168.137.11 h1 (one entry per node)
- Disable the firewall
service iptables stop
- Set up passwordless SSH
ssh-keygen -t rsa //generate the public/private key pair
Copy the public key to the other machines (h2 is a hostname); a loop sketch follows below
ssh-copy-id -i h2
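A minimal sketch for pushing the key to every node in the table above (run on h1 after ssh-keygen; adjust the host list to your own cluster):
for host in h2 h3 h4 h5 h6; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub $host   # appends the public key to that host's authorized_keys
done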
- Install the JDK and configure the environment variables
This can be done on one machine and then copied to the others; the profile entries are sketched below
scp -r /home/jdk/ h2:/home/
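A sketch of the corresponding profile entries, assuming the JDK ends up in /home/jdk as in the scp command above (append to /etc/profile on every node, then run source /etc/profile; adjust the path to wherever your JDK actually lives):
export JAVA_HOME=/home/jdk
export PATH=$PATH:$JAVA_HOME/bin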
Once all of this is done, it is a good idea to reboot the machines.
//-----------------------------------------------------------------------------------
//----------------------------ZooKeeper cluster installation--------------------------
//-----------------------------------------------------------------------------------
Installing ZooKeeper
Extract it: tar -zxvf zookeeper-3.4.6.tar.gz
1. Rename conf/zoo_sample.cfg to zoo.cfg
mv zoo_sample.cfg zoo.cfg
Open it and set:
dataDir=/home/gj/zookeeper-3.4.6/data //data directory, any path will do
Append at the end:
server.1=h4:2888:3888
server.2=h5:2888:3888
server.3=h6:2888:3888
// server.X=A:B:C
where X is a number identifying which server this is,
A is the hostname or IP address of that server,
B is the port that server uses to exchange messages with the cluster leader,
C is the port used for leader election.
Note: create the data directory here,
then create a file named myid inside it containing 1.
The 1 identifies which server this is and corresponds to the X in server.X=A:B:C (a shell sketch follows).
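A minimal shell sketch, run on h4 inside the ZooKeeper install directory (the dataDir path matches zoo.cfg above):
mkdir -p /home/gj/zookeeper-3.4.6/data
echo 1 > /home/gj/zookeeper-3.4.6/data/myid   # use 2 on h5 and 3 on h6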
2. Copy the configured ZooKeeper directory to the other ZooKeeper nodes (h5, h6)
using scp -r,
then change the myid contents to 2 and 3 respectively.
- On each of the three nodes run ./zkServer.sh start from the bin directory.
ZooKeeper can also be added to the environment variables; a sketch follows.
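A sketch of the corresponding environment variables, assuming ZooKeeper was extracted to /home/gj/zookeeper-3.4.6 as above (append to /etc/profile on h4, h5 and h6, then run source /etc/profile):
export ZOOKEEPER_HOME=/home/gj/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin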
//-----------------------------------------------------------------------------------
//----------------------------ZooKeeper cluster installation--------------------------
//-----------------------------------------------------------------------------------
//-----------------------------------------------------------------------------------
//----------------------------Hadoop cluster installation-----------------------------
//-----------------------------------------------------------------------------------
Installing Hadoop
Edit the following files:
1. hadoop-env.sh
export JAVA_HOME=/usr/hadoop/jdk //set the Java environment
2. core-site.xml
<configuration>
<!-- set the HDFS nameservice to ns1 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns1</value>
</property>
<!-- Hadoop temporary/data directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/hadoop-2.4.0/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>h4:2181,h5:2181,h6:2181</value>
</property>
</configuration>
3. hdfs-site.xml
<configuration>
<!-- the HDFS nameservice is ns1; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns1</value>
</property>
<!-- ns1 has two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.ns1</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn1</name>
<value>h1:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.ns1.nn1</name>
<value>h1:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.ns1.nn2</name>
<value>h2:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.ns1.nn2</name>
<value>h2:50070</value>
</property>
<!-- where the NameNode shared edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://h4:8485;h5:8485;h6:8485/ns1</value>
</property>
<!-- where the JournalNodes keep their data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/root/hadoop/hadoop-2.4.0/journal</value>
</property>
<!-- enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- failover proxy provider used by clients -->
<property>
<name>dfs.client.failover.proxy.provider.ns1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- fencing method -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- sshfence requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
</configuration>
4. Rename mapred-site.xml.template to mapred-site.xml
<configuration>
<!-- run MapReduce on the YARN framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5. yarn-site.xml
<configuration>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>h3</value>
</property>
<!-- auxiliary service loaded by the NodeManager: the MapReduce shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6. slaves
h4
h5
h6
Copy the Hadoop directory configured on this machine to all the other nodes; a sketch follows.
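A minimal sketch of that copy, with the install path taken from the configuration above (assumes /root/hadoop already exists on each target node):
for host in h2 h3 h4 h5 h6; do
    scp -r /root/hadoop/hadoop-2.4.0 $host:/root/hadoop/
done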
Starting Hadoop and ZooKeeper (both already added to the environment variables)
1. First start ZooKeeper on h4, h5 and h6:
zkServer.sh start
Check with zkServer.sh status (you should see one leader and two followers)
2. Start the JournalNodes (run on h1):
hadoop-daemons.sh start journalnode
3. Format HDFS (run on h1):
hadoop namenode -format
This creates a tmp directory under the Hadoop directory; copy it to h2 (see the command below).
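A sketch of that copy, with the path taken from hadoop.tmp.dir above:
scp -r /root/hadoop/hadoop-2.4.0/tmp h2:/root/hadoop/hadoop-2.4.0/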
4. Format the failover state in ZooKeeper (run on h1):
hdfs zkfc -formatZK
5. Start Hadoop (run on h1):
start-all.sh
The ResourceManager on h3 may not be started by this; if so, log in to h3 and run start-yarn.sh.
You can now check the state of the cluster through the web UIs, or check the processes on each node with jps; see below.
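The addresses below follow from the configuration above (50070 is set in hdfs-site.xml; 8088 is YARN's default ResourceManager web port), so for example:
http://h1:50070 //one NameNode (Active or Standby)
http://h2:50070 //the other NameNode
http://h3:8088 //ResourceManager
jps //run on each node and compare against the process table at the top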
//-----------------------------------------------------------------------------------
//----------------------------Hadoop cluster installation-----------------------------
//-----------------------------------------------------------------------------------
//-----------------------------------------------------------------------------------
//----------------------------HBase cluster installation------------------------------
//-----------------------------------------------------------------------------------
HBase cluster configuration
1. conf/hbase-env.sh
export JAVA_HOME=<path to the JDK>
export HBASE_MANAGES_ZK=false
When using an external ZooKeeper, HBASE_MANAGES_ZK must be set to false so that HBase does not manage its own ZooKeeper instance.
2. conf/hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://h1:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>h1:60000</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
<description>The port master should bind to.</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>h4,h5,h6</value>
</property>
3. conf/regionservers
h4
h5
h6
Starting HBase
On h1:
start-hbase.sh
On h2 (to bring up the standby HMaster):
start-hbase.sh
You can now check HBase through its web UI; as with the NameNodes, there is one HMaster in Active state and one in Standby state. A quick shell check is sketched below.
At this point the cluster setup is complete.
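A quick sanity check from the HBase shell on h1 (status and list are built-in shell commands):
hbase shell
status   # shows the number of live region servers
list     # lists the tables currently defined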
//-----------------------------------------------------------------------------------
//----------------------------HBase cluster installation------------------------------
//-----------------------------------------------------------------------------------
//-----------------------------------------------------------------------------------
//----------------------------Storm cluster installation------------------------------
//-----------------------------------------------------------------------------------
Installing and configuring the Storm cluster
Machines:
192.168.180.101
192.168.187.16
Required software:
zookeeper (zookeeper-3.4.4.tar.gz), storm (storm-0.8.1.zip), JDK
1. Configure ZooKeeper
Extract ZooKeeper and rename conf/zoo_sample.cfg to zoo.cfg
The modified file looks like this:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/log
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=192.168.187.16:2888:3888
server.2=192.168.180.101:2888:3888
For the full configuration reference see:
http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_configuration
Note the last two lines:
the format is server.id=host:port:port
id must be a number from 1 to 255, and you also need to create a file named myid in the dataDir directory whose only content is the single line "id".
Every machine that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. You accomplish this with the series of lines of the form server.id=host:port:port. The parameters host and port are straightforward. You attribute the server id to each machine by creating a file named myid, one for each server, which resides in that server's data directory, as specified by the configuration file parameter dataDir.
You also need to add an environment variable:
export ZOOKEEPER_HOME=/home/zhxia/apps/db/zookeeper
The configuration is identical on both machines, except for the id value inside the myid file; a sketch of creating it follows.
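A sketch of creating the myid files, using the dataDir from zoo.cfg above:
# on 192.168.187.16 (server.1)
mkdir -p /data/zookeeper/data && echo 1 > /data/zookeeper/data/myid
# on 192.168.180.101 (server.2)
mkdir -p /data/zookeeper/data && echo 2 > /data/zookeeper/data/myid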
2. Configure Storm
Extract Storm,
then go into the conf directory and edit storm.yaml:
########## These MUST be filled in for a storm configuration
storm.zookeeper.servers:
    - "192.168.187.16"
    - "192.168.180.101"
nimbus.host: "192.168.187.16"
storm.local.dir: "/data/storm/data"

##### These may optionally be filled in:
# List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
# Locations of the drpc servers
# drpc.servers:
#     - "127.0.0.1"
#     - "server2"
## to nimbus
# nimbus.childopts: "-Xmx1024m"
#
## to supervisor
# supervisor.childopts: "-Xmx1024m"
#
## to worker
# worker.childopts: "-Xmx768m"
Once the configuration is done, start ZooKeeper and Storm.
Start ZooKeeper:
bin/zkServer.sh start
Start Storm:
bin/storm nimbus
bin/storm supervisor
bin/storm ui
Open http://localhost:8080 in a browser to check the state of the cluster; a topology can then be submitted as sketched below.
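Topologies are submitted with the storm jar command; a usage sketch (the jar path and main class here are placeholders, not part of this setup):
bin/storm jar /path/to/your-topology.jar com.example.MyTopology my-topology-name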