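I. Prerequisites
1. Like Hadoop, HBase can be installed in three modes: standalone, pseudo-distributed, and fully distributed.
- Standalone: does not depend on Hadoop and uses the local file system. All daemons run in a single JVM. This mode is normally used for testing and is not demonstrated here.
- Pseudo-distributed: also normally used for testing. Each daemon runs in its own JVM on one host, but the data is stored in Hadoop, so Hadoop must be installed first (either distributed or pseudo-distributed will do).
- Fully distributed: used in production. The daemons run on different nodes, and the data is stored in Hadoop.
2. Every mode needs ZooKeeper. You can use the ZooKeeper bundled with HBase or an external one. The bundled one is fine for testing; production should use an external ZooKeeper cluster, which normally has an odd number of nodes.
3. Java is required, preferably Java 8.
4. Operating system: CentOS 7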
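The Java requirement can be checked before installing anything. The sketch below extracts the major version from the text printed by `java -version`. The function name `java_major` is a hypothetical helper chosen for illustration, not something shipped with HBase or the JDK.

```bash
#!/bin/bash
# java_major TEXT: print the Java major version found in the output of `java -version`.
# Hypothetical helper for illustration; it is not part of HBase or the JDK.
java_major() {
    local ver
    ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' <<<"$1" | head -n 1)
    [[ -n $ver ]] || return 1
    # Java 8 and earlier report 1.x (for example 1.8.0_191); later releases report 9, 11, 17, ...
    if [[ $ver == 1.* ]]; then
        ver=${ver#1.}
    fi
    echo "${ver%%[._-]*}"
}
```

A typical call is `java_major "$(java -version 2>&1)"`, since `java -version` writes to standard error.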
II. Pseudo-distributed installation
1. Download the HBase package and the JDK
https://mirrors.bfsu.edu.cn/apache/hbase/2.2.6/hbase-2.2.6-bin.tar.gz
http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64.tar.gz
Run the following steps as a regular (non-root) user.
2. Extract the archives
$ tar zxvf hbase-2.2.6-bin.tar.gz -C /data1/hadoop
$ cd /data1/hadoop && mv hbase-2.2.6 hbase
$ tar zxvf jdk-8u191-linux-x64.tar.gz -C /data1/hadoop
$ cd /data1/hadoop && mv jdk1.8.0_191 jdk
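The directory inside an archive does not always match the archive file name, so the rename step is easy to get wrong. As a minimal sketch, the helper below reads the top-level directory name from the archive itself before renaming it. `extract_as` is a hypothetical name, and the code assumes the archive contains exactly one top-level directory.

```bash
#!/bin/bash
# extract_as ARCHIVE DEST_DIR NAME
# Unpack a .tar.gz into DEST_DIR and rename its top-level directory to NAME.
# Hypothetical helper; assumes the archive has exactly one top-level directory.
extract_as() {
    local archive=$1 dest=$2 name=$3 top
    # The first path component of the first entry is the top-level directory
    top=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1) || return 1
    [[ -n $top ]] || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    if [[ $top != "$name" ]]; then
        mv "$dest/$top" "$dest/$name"
    fi
}
```

For example, `extract_as hbase-2.2.6-bin.tar.gz /data1/hadoop hbase` would leave the files in /data1/hadoop/hbase.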
3. Configure environment variables
$ cat ~/.bashrc
export HADOOP_HOME=/data1/hadoop/hadoop
export HBASE_HOME=/data1/hadoop/hbase
export JAVA_HOME=/data1/hadoop/jdk
export ZOOKEEPER_HOME=/data1/hadoop/zookeeper
export PATH=${PATH}:${HADOOP_HOME}/bin:${HBASE_HOME}/bin:${HADOOP_HOME}/sbin:${JAVA_HOME}/bin
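After reloading the shell configuration, it can be worth confirming that each of these variables points at a real directory. The following is a small sketch; `check_homes` is a hypothetical helper name.

```bash
#!/bin/bash
# check_homes NAME...: verify that each named variable is set and points at a directory.
# Hypothetical helper for illustration.
check_homes() {
    local name rc=0
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable called $name
        if [[ -z ${!name:-} ]]; then
            echo "$name is not set" >&2
            rc=1
        elif [[ ! -d ${!name} ]]; then
            echo "$name=${!name} is not a directory" >&2
            rc=1
        fi
    done
    return $rc
}
```

Usage: `source ~/.bashrc && check_homes JAVA_HOME HADOOP_HOME HBASE_HOME`. The function returns a non-zero status if any variable fails the check.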
4. Configure HBase
$ cd /data1/hadoop/hbase/conf/
Edit hbase-site.xml:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://HOSTNAME:8020/hbase</value> <!-- HBase data directory; replace HOSTNAME with your host name -->
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/data1/hadoop/hbase/tmp</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name> <!-- uses the ZooKeeper bundled with HBase -->
  <value>HOSTNAME</value>
</property>
5. Edit hbase-env.sh and set the Java environment variable
export JAVA_HOME=/data1/hadoop/jdk
6. Symlink the Hadoop configuration files
$ cd /data1/hadoop/hbase/conf/ && ln -s /data1/hadoop/hadoop/etc/hadoop/core-site.xml core-site.xml
$ cd /data1/hadoop/hbase/conf/ && ln -s /data1/hadoop/hadoop/etc/hadoop/hdfs-site.xml hdfs-site.xml
7. Start HBase
$ start-hbase.sh
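Once the start script returns, `jps` should list HMaster and HRegionServer, plus HQuorumPeer when the bundled ZooKeeper is in use. A minimal sketch for checking this from the `jps` output follows; `missing_daemons` is a hypothetical helper name.

```bash
#!/bin/bash
# missing_daemons NAME...: read `jps` output on stdin and print every NAME that is absent.
# Hypothetical helper for illustration; returns non-zero if anything is missing.
missing_daemons() {
    local listing name rc=0
    listing=$(cat)
    for name in "$@"; do
        # -w matches whole words, so HMaster does not match a longer class name
        if ! grep -qw -- "$name" <<<"$listing"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}
```

Usage: `jps | missing_daemons HMaster HRegionServer HQuorumPeer`. No output means all three processes were found.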
III. Fully distributed installation
1. Node layout:
Hostname | Processes |
---|---|
hadoop1 | HMaster |
hadoop2 | HMaster, RegionServer |
hadoop3 | RegionServer |
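2. ZooKeeper must be installed before HBase
Below is a script that starts ZooKeeper on several nodes. It has not been optimized and repeats the same for loop in every branch, so feel free to refactor it.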
[hduser@hadoop1 script]$ cat startZookeeper.sh
#!/bin/bash
# Control ZooKeeper on every host in the list over SSH.
host=(hadoop1 hadoop2 hadoop3)
case $1 in
"start")
    for i in "${host[@]}";do
        ssh -T "$i" /data1/hadoop/zookeeper/bin/zkServer.sh start /data1/hadoop/zookeeper/conf/zoo.cfg > /dev/null 2>&1
        if [ $? -eq 0 ];then
            echo "$i start succeeded"
        else
            echo "$i start failed"
        fi
    done
    ;;
"jps")
    for i in "${host[@]}";do
        echo "------$i------"
        ssh -T "$i" jps
        echo
    done
    ;;
"stop")
    for i in "${host[@]}";do
        ssh -T "$i" /data1/hadoop/zookeeper/bin/zkServer.sh stop > /dev/null 2>&1
        if [ $? -eq 0 ];then
            echo "$i stop succeeded"
        else
            echo "$i stop failed"
        fi
    done
    ;;
"restart")
    for i in "${host[@]}";do
        ssh -T "$i" /data1/hadoop/zookeeper/bin/zkServer.sh restart > /dev/null 2>&1
        if [ $? -eq 0 ];then
            echo "$i restart succeeded"
        else
            echo "$i restart failed"
        fi
    done
    ;;
"status")
    for i in "${host[@]}";do
        # Keep only the value of the "Mode:" line printed by zkServer.sh status
        status=$(ssh -T "$i" /data1/hadoop/zookeeper/bin/zkServer.sh status /data1/hadoop/zookeeper/conf/zoo.cfg | grep Mode | awk -F : '{print $NF}')
        echo "$i status : $status"
        echo
    done
    ;;
*)
    echo "usage: $0 start|stop|restart|status|jps"
    ;;
esac
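One way to reduce the repeated loops is to move the loop into a single function that takes the action as a parameter. The sketch below covers start, stop, and restart. The host list and install path are copied from the script, `zk_all` is a hypothetical name, and the sketch relies on zkServer.sh finding its default configuration, as the stop and restart branches already do.

```bash
#!/bin/bash
# Sketch: one shared loop for the start/stop/restart branches.
# Host names and the install path come from the original script; zk_all is a hypothetical name.
hosts=(hadoop1 hadoop2 hadoop3)
zk_home=/data1/hadoop/zookeeper

# zk_all ACTION: run "zkServer.sh ACTION" on every host and report the result per host.
zk_all() {
    local action=$1 h rc=0
    for h in "${hosts[@]}"; do
        if ssh -T "$h" "$zk_home/bin/zkServer.sh" "$action" > /dev/null 2>&1; then
            echo "$h: $action succeeded"
        else
            echo "$h: $action failed"
            rc=1
        fi
    done
    return $rc
}
```

Calling `zk_all start`, `zk_all stop`, or `zk_all restart` then replaces the three near-identical branches, and the return status shows whether any host failed.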
3. Configure HBase
- Edit hbase-env.sh and add the following settings
export JAVA_HOME=/data1/hadoop/jdk
export HBASE_MANAGES_ZK=false
- Edit hbase-site.xml
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/data1/hadoop/hbase/tmp/</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://yjt/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop1,hadoop2,hadoop3</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/data1/hadoop/data/zookeeper/hbasedata/</value>
</property>
<property>
<name>hbase.zookeeper.property.tickTime</name>
<value>10000</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>
<property>
<name>data.tx.timeout</name>
<value>1800</value>
</property>
<property>
<name>ipc.socket.timeout</name>
<value>18000000</value>
</property>
<property>
<name>hbase.master.maxclockskew</name>
<value>300000</value>
</property>
<property>
<name>hbase.client.scanner.timeout.period</name>
<value>1800000</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>1800000</value>
</property>
<property>
<name>hbase.client.operation.timeout</name>
<value>1800000</value>
</property>
<property>
<name>hbase.lease.recovery.timeout</name>
<value>3600000</value>
</property>
<property>
<name>hbase.lease.recovery.dfs.timeout</name>
<value>1800000</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>1000</value>
</property>
<property>
<name>hbase.htable.threads.max</name>
<value>7000</value>
</property>
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
<name>hbase.region.server.rpc.scheduler.factory.class</name>
<value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
</property>
<property>
<name>hbase.rpc.controllerfactory.class</name>
<value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
</property>
<property>
<name>phoenix.query.timeoutMs</name>
<value>1800000</value>
</property>
<property>
<name>phoenix.rpc.timeout</name>
<value>1800000</value>
</property>
<property>
<name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
<value>1800000</value>
</property>
<property>
<name>phoenix.query.keepAliveMs</name>
<value>1800000</value>
</property>
<property>
<name>hbase.master.distributed.log.splitting</name>
<value>false</value>
</property>
<property>
<name>hbase.coprocessor.user.region.classes</name>
<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
</property>
<property>
<name>hbase.regionserver.region.split.policy</name>
<value>org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy</value>
</property>
<property>
<name>phoenix.upsert.batch.size</name>
<value>800</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>200</value>
</property>
<property>
<name>hbase.hlog.asyncer.number</name>
<value>10</value>
</property>
<property>
<name>hbase.rest.threads.max</name>
<value>200</value>
</property>
<property>
<name>hbase.ipc.server.listen.queue.size</name>
<value>200</value>
</property>
<property>
<name>hbase.client.ipc.pool.size</name>
<value>10</value>
</property>
<property>
<name>hfile.block.cache.size</name>
<value>0.4</value>
</property>
<property>
<name>hbase.hstore.flusher.count</name>
<value>10</value>
</property>
<property>
<name>hbase.hstore.blockingStoreFiles</name>
<value>30</value>
</property>
<property>
<name>hbase.regionserver.thread.compaction.small</name>
<value>5</value>
</property>
<property>
<name>hbase.regionserver.thread.compaction.large</name>
<value>5</value>
</property>
<property>
<name>hbase.ipc.server.max.callqueue.size</name>
<value>2140000000</value>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>21474836480</value>
</property>
<property>
<name>hbase.hstore.compactionThreshold</name>
<value>8</value>
</property>
<property>
<name>hbase.ipc.server.callqueue.handler.factor</name>
<value>0.4</value>
</property>
<property>
<name>hbase.ipc.server.callqueue.read.ratio</name>
<value>0.4</value>
</property>
<property>
<name>hbase.ipc.server.call.queue.scan.ratio</name>
<value>0.4</value>
</property>
<property>
<name>hbase.regionserver.executor.openregion.threads</name>
<value>100</value>
</property>
<property>
<name>hbase.hstore.blockingWaitTime</name>
<value>30000</value>
</property>
<property>
<name>hbase.client.write.buffer</name>
<value>8388608</value>
</property>
<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>8</value>
</property>
<property>
<name>hbase.hstore.compaction.min</name>
<value>10</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>
<property>
<name>phoenix.schema.mapSystemTablesToNamespace</name>
<value>true</value>
</property>
</configuration>
The configuration above includes settings for Phoenix.
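In a configuration file of this length it is easy to list the same property name twice. With Hadoop-style configuration loading the later entry generally takes effect, which can silently hide the earlier value. The sketch below prints every property name that occurs more than once. `dup_props` is a hypothetical helper, and it assumes each `<name>` element sits on a single line, as in the file shown here.

```bash
#!/bin/bash
# dup_props FILE: print each property name that appears more than once in FILE.
# Hypothetical helper; assumes every <name>...</name> element is on one line.
dup_props() {
    grep -o '<name>[^<]*</name>' "$1" \
        | sed -e 's/<name>//' -e 's/<\/name>//' \
        | sort | uniq -d
}
```

Usage: `dup_props /data1/hadoop/hbase/conf/hbase-site.xml`. An empty result means no name is repeated.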
- Configure regionservers
$ cat regionservers
hadoop2
hadoop3
- Configure backup-masters (this file has to be created manually)
$ cat backup-masters
hadoop2
- Symlink the Hadoop configuration files
$ cd /data1/hadoop/hbase/conf/ && ln -s /data1/hadoop/hadoop/etc/hadoop/core-site.xml core-site.xml
$ cd /data1/hadoop/hbase/conf/ && ln -s /data1/hadoop/hadoop/etc/hadoop/hdfs-site.xml hdfs-site.xml
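The host lists and the symlinks can also be created by a script, which helps when the setup has to be repeated. This is a minimal sketch; `write_hosts` and `link_conf` are hypothetical helper names, and `ln -sfn` is used instead of `ln -s` so that running it a second time replaces the existing links rather than failing.

```bash
#!/bin/bash
# write_hosts FILE HOST...: write one host name per line to FILE.
# Hypothetical helper for illustration.
write_hosts() {
    local file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# link_conf HADOOP_CONF_DIR HBASE_CONF_DIR: symlink core-site.xml and hdfs-site.xml.
# ln -sfn replaces an existing link, so the function can be run repeatedly.
link_conf() {
    local src=$1 dest=$2 f
    for f in core-site.xml hdfs-site.xml; do
        ln -sfn "$src/$f" "$dest/$f" || return 1
    done
}
```

With the paths used in this guide, the calls would be `write_hosts /data1/hadoop/hbase/conf/regionservers hadoop2 hadoop3`, `write_hosts /data1/hadoop/hbase/conf/backup-masters hadoop2`, and `link_conf /data1/hadoop/hadoop/etc/hadoop /data1/hadoop/hbase/conf`.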
4. Install and configure Phoenix
$ tar zxvf apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz -C /data1/hadoop
$ cd /data1/hadoop && mv apache-phoenix-5.0.0-HBase-2.0-bin phoenix
- Symlink the HBase configuration
$ cd /data1/hadoop/phoenix/bin/ && ln -s /data1/hadoop/hbase/conf/hbase-site.xml hbase-site.xml
- Copy the Phoenix core jars into the HBase lib directory
$ cd /data1/hadoop/phoenix && cp phoenix-5.0.0-HBase-2.0-server.jar phoenix-core-5.0.0-HBase-2.0.jar /data1/hadoop/hbase/lib/
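A plain `cp` reports a missing jar but still copies the others, which can leave the lib directory half updated. As a sketch, the helper below stops at the first missing file. `install_jars` is a hypothetical name and takes the jar names as arguments.

```bash
#!/bin/bash
# install_jars SRC_DIR DEST_DIR JAR...: copy the named jars, stopping at the first missing one.
# Hypothetical helper for illustration.
install_jars() {
    local src=$1 dest=$2 jar
    shift 2
    for jar in "$@"; do
        if [[ ! -f $src/$jar ]]; then
            echo "missing: $src/$jar" >&2
            return 1
        fi
        cp "$src/$jar" "$dest/" || return 1
    done
}
```

Usage: `install_jars /data1/hadoop/phoenix /data1/hadoop/hbase/lib phoenix-5.0.0-HBase-2.0-server.jar phoenix-core-5.0.0-HBase-2.0.jar`.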
5. Copy the current hbase directory to the other two nodes. A script for this follows:
[hduser@hadoop1 prallel]$ cat parallelScp.sh
#!/bin/bash
source conf.properties
# Create a named pipe (FIFO) that holds one token per allowed concurrent job
PIPE_FILE=./$$.fifo
mkfifo "$PIPE_FILE"
# Open the pipe for reading and writing on file descriptor 4
exec 4<> "$PIPE_FILE"
# The open descriptor keeps the pipe alive, so the file itself can be removed
rm -f "$PIPE_FILE"
# Write one empty line (token) per allowed concurrent job
for ((i=0;i<THREAD_NUM;i++));do
    echo ""
done >&4
for host in $(cat "$HOSTS_FILE");do
    # Take a token; this blocks while THREAD_NUM copies are already running
    read -u4
    {
        scp -r "$SOURCE_DATA_DIR" "$host:$TARGET_DATA_DIR"
        sleep 3
        # Return the token so the next copy can start
        echo "" >&4
    }&
done
# Wait for all background copies to finish
wait
# Close file descriptor 4
exec 4>&-
exit 0
Configuration files:
(base) [hduser@hadoop1 prallel]$ cat ip.txt
hadoop2
hadoop3
(base) [hduser@hadoop1 prallel]$ cat conf.properties
# Source directory
SOURCE_DATA_DIR=/data1/hadoop/hbase
# Target directory
TARGET_DATA_DIR=/data1/hadoop
# Number of concurrent copies
THREAD_NUM=2
# Hosts file
HOSTS_FILE=./ip.txt
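The script limits concurrency with a FIFO used as a pool of tokens: a job must read a line before it starts and writes one back when it finishes. The same pattern can be wrapped in a reusable function, sketched below. `parallel_each` is a hypothetical name, and the sketch assumes `mkfifo` and `mktemp` are available and that file descriptor 4 is free.

```bash
#!/bin/bash
# parallel_each LIMIT CMD ITEM...: run "CMD ITEM" for every item, at most LIMIT at a time.
# Hypothetical helper showing the FIFO token pattern; uses file descriptor 4.
parallel_each() {
    local limit=$1 cmd=$2 fifo i item
    shift 2
    fifo=$(mktemp -u) || return 1
    mkfifo "$fifo" || return 1
    exec 4<>"$fifo"      # open the FIFO for reading and writing on fd 4
    rm -f "$fifo"        # the open descriptor keeps the pipe alive
    for ((i = 0; i < limit; i++)); do
        echo >&4         # one token per allowed concurrent job
    done
    for item in "$@"; do
        read -r -u 4     # take a token; blocks while all slots are busy
        {
            "$cmd" "$item"
            echo >&4     # hand the token back
        } &
    done
    wait                 # wait for every background job
    exec 4>&-            # close the descriptor
}
```

For example, `parallel_each 2 echo a b c` prints the three items, running at most two jobs at once; the output order is not guaranteed.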
6. Start HBase
Start it from the hadoop1 node:
$ start-hbase.sh
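To confirm that the daemons came up where the node table says they should, `jps` can be run on each host over SSH. The sketch below hard-codes the expected layout from the table; `expected_daemons` and `check_cluster` are hypothetical names, and the sketch assumes passwordless SSH and that `jps` is on the remote PATH.

```bash
#!/bin/bash
# expected_daemons HOST: print the JVM names expected on HOST, following the node table.
# Hypothetical helper for illustration.
expected_daemons() {
    case $1 in
        hadoop1) echo "HMaster" ;;
        hadoop2) echo "HMaster HRegionServer" ;;
        hadoop3) echo "HRegionServer" ;;
    esac
}

# check_cluster HOST...: run jps on each host over SSH and report missing daemons.
check_cluster() {
    local h listing name rc=0
    for h in "$@"; do
        listing=$(ssh -T "$h" jps 2>/dev/null)
        for name in $(expected_daemons "$h"); do
            if ! grep -qw -- "$name" <<<"$listing"; then
                echo "$h: $name is not running"
                rc=1
            fi
        done
    done
    return $rc
}
```

Usage: `check_cluster hadoop1 hadoop2 hadoop3`. No output and a zero exit status mean every expected process was found.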