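5 HBase (NoSQL): architecture and basic operations, Flume, Pig
Open-source implementation of Google Bigtable
Column-oriented database
Can be clustered
Accessible in several ways: shell, web, API
Suited to high-speed read/write workloads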
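No SQL-style query language of its own (HQL is Hive's query language); data is read and written through the shell and the client APIs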
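A typical representative of NoSQL
Logical model
Data is stored in tables
A table is made up of rows and columns; every column belongs to a column family, and the storage unit identified by a row and a column is called a cell
Each cell keeps multiple versions of the same piece of data, distinguished by timestamp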
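The logical model can be pictured as nested sorted maps: row key, then column (family:qualifier), then timestamp, then value. The following is a minimal in-memory sketch of that idea using only the Java standard library. The class and method names are hypothetical and it is not how HBase itself is implemented.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a table: row key -> column -> timestamp -> value.
class LogicalTableSketch {
    // Rows are kept sorted by row key.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>>();

    // Store one version of a cell; "column" is written as family:qualifier.
    void put(String rowKey, String column, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, byte[]>>())
            // Versions are sorted newest first.
            .computeIfAbsent(column, k -> new TreeMap<Long, byte[]>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Return the newest version of a cell, or null if the cell does not exist.
    byte[] getLatest(String rowKey, String column) {
        NavigableMap<Long, byte[]> versions = versionsOf(rowKey, column);
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Return the version written at exactly this timestamp, or null.
    byte[] getAt(String rowKey, String column, long timestamp) {
        return versionsOf(rowKey, column).get(timestamp);
    }

    private NavigableMap<Long, byte[]> versionsOf(String rowKey, String column) {
        Map<String, NavigableMap<Long, byte[]>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(column)) {
            return new TreeMap<Long, byte[]>();
        }
        return row.get(column);
    }

    public static void main(String[] args) {
        LogicalTableSketch table = new LogicalTableSketch();
        table.put("retacn", "info:age", 1L, "32".getBytes());
        table.put("retacn", "info:age", 2L, "31".getBytes());
        System.out.println(new String(table.getLatest("retacn", "info:age"))); // prints 31
        System.out.println(new String(table.getAt("retacn", "info:age", 1L))); // prints 32
    }
}
```

Values are kept as byte arrays here because cells carry no type information; interpreting the bytes is left to the caller.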
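Row key
The unique identifier of a row in a table, used as the primary key for retrieving records
Ways to access the rows of a table:
By a single row key
By a range of row keys
By a full table scan
A row key can be any string up to 64 KB long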
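The three access paths all follow from rows being kept sorted by row key. A rough sketch with a sorted map is shown here. The class name and sample data are hypothetical, and Java string ordering stands in for the byte ordering HBase uses (the two agree for plain ASCII keys).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: rows sorted by row key, accessed in three ways.
class RowKeyAccessSketch {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // 1. Access by a single row key.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // 2. Access by a row key range: startKey inclusive, stopKey exclusive.
    List<String> scan(String startKey, String stopKey) {
        return new ArrayList<>(rows.subMap(startKey, true, stopKey, false).keySet());
    }

    // 3. Full table scan: every row, in row key order.
    List<String> scanAll() {
        return new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        RowKeyAccessSketch table = new RowKeyAccessSketch();
        table.put("retacn", "row 1");
        table.put("zhansan", "row 2");
        table.put("lisi", "row 3");
        System.out.println(table.get("retacn"));   // prints row 1
        System.out.println(table.scan("a", "s"));  // prints [lisi, retacn]
        System.out.println(table.scanAll());       // prints [lisi, retacn, zhansan]
    }
}
```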
Column families and columns
Column families must be specified when the table is defined
Columns are created dynamically when records are inserted
Column notation: <column family>:<qualifier>
HBase stores data on disk by column family
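Timestamps
A timestamp corresponds to the time of each data operation
HBase supports two ways of reclaiming old data versions:
Keep only a specified number of the newest versions of each cell
Keep versions for a specified length of time
Querying by time: latest data or all versions
A cell is identified by row key, column family:qualifier, and timestamp
Cells are stored as raw bytes and carry no data type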
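The two version reclamation strategies can be illustrated on the version list of a single cell. This is only a sketch with hypothetical names, assuming versions are held newest first in a sorted map; HBase itself applies these rules through the VERSIONS and TTL attributes of a column family.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the two ways old cell versions can be reclaimed.
class VersionRetentionSketch {
    // Strategy 1: keep only the newest maxVersions entries.
    // The map must be sorted newest first (descending timestamp).
    static void keepNewest(NavigableMap<Long, String> versions, int maxVersions) {
        Iterator<Long> it = versions.keySet().iterator();
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            if (++seen > maxVersions) {
                it.remove();
            }
        }
    }

    // Strategy 2: drop every version older than ttlMillis relative to "now".
    static void dropExpired(NavigableMap<Long, String> versions, long now, long ttlMillis) {
        versions.keySet().removeIf(timestamp -> now - timestamp > ttlMillis);
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> versions = new TreeMap<Long, String>(Comparator.reverseOrder());
        versions.put(100L, "a");
        versions.put(200L, "b");
        versions.put(300L, "c");
        versions.put(400L, "d");

        keepNewest(versions, 3);
        System.out.println(versions); // prints {400=d, 300=c, 200=b}

        dropExpired(versions, 450L, 200L);
        System.out.println(versions); // prints {400=d, 300=c}
    }
}
```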
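Physical model
Suited to second-level queries over massive data sets
The records of a table are split by row key into regions, each covering a (startkey, endkey) range
Regions are hosted on region servers (separate physical machines)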
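hbase-default.xml: the maximum size of a column family's store files within one region is given as 10 GB (hbase.hregion.max.filesize); the default value differs between HBase releases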
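Because each region covers a contiguous row key range, finding the region for a row amounts to finding the greatest region start key that is not larger than the row key. The sketch below shows that lookup with a sorted map. The class name, start keys and server names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: regions indexed by start key, each served by one region server.
class RegionLookupSketch {
    // Start key -> region server. A region covers [startKey, next startKey).
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionServer) {
        regionsByStartKey.put(startKey, regionServer);
    }

    // Find the server whose region contains the given row key.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "rs1");   // first region starts at the empty key
        table.addRegion("m", "rs2");  // second region starts at "m"
        System.out.println(table.serverFor("lisi"));   // prints rs1
        System.out.println(table.serverFor("retacn")); // prints rs2
    }
}
```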
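Architecture
Master/slave structure made up of HMaster and HRegionServer
ZooKeeper's master election mechanism keeps an HMaster running
HBase has two special tables
-ROOT- records the region information of the .META. table
.META. records the region information of user tables
To reach data, a client first asks ZooKeeper for -ROOT-, then looks up .META., and from there finds the location of the user data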
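The three-hop lookup (ZooKeeper, then -ROOT-, then .META.) can be sketched with plain maps. Everything here is a simplification for illustration: the class name, the server names and the "table,startKey" key format are assumptions, and a real client also caches the locations it has found.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of how a client locates the region server for a row.
class CatalogLookupSketch {
    // Hop 1: ZooKeeper records which server hosts -ROOT-.
    final Map<String, String> zooKeeper = new HashMap<>();
    // Hop 2: -ROOT- records which server hosts .META.
    final Map<String, String> rootTable = new HashMap<>();
    // Hop 3: .META. maps "table,startKey" to the server hosting that user region.
    final TreeMap<String, String> metaTable = new TreeMap<>();

    // Returns the servers visited: [-ROOT- host, .META. host, user region host].
    List<String> locate(String table, String rowKey) {
        String rootServer = zooKeeper.get("-ROOT-");
        String metaServer = rootTable.get(".META.");
        // The region with the greatest start key not larger than the row key.
        Map.Entry<String, String> region = metaTable.floorEntry(table + "," + rowKey);
        if (rootServer == null || metaServer == null
                || region == null || !region.getKey().startsWith(table + ",")) {
            throw new IllegalStateException("cannot locate " + table + "/" + rowKey);
        }
        return Arrays.asList(rootServer, metaServer, region.getValue());
    }

    public static void main(String[] args) {
        CatalogLookupSketch cluster = new CatalogLookupSketch();
        cluster.zooKeeper.put("-ROOT-", "rs1");
        cluster.rootTable.put(".META.", "rs2");
        cluster.metaTable.put("users,", "rs3");
        cluster.metaTable.put("users,m", "rs4");
        System.out.println(cluster.locate("users", "retacn")); // prints [rs1, rs2, rs4]
    }
}
```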
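HBase pseudo-distributed installation
HBase installation and configuration: find the HBase version that matches Hadoop 0.20.2
Standalone installation
Download URL: http://mirror.bjtu.edu.cn/apache/hbase/hbase-0.90.5
Extract it to the target directory
Optionally add HBase to the environment variables in /etc/profile
export HBASE_HOME=<extraction path>
Apply the change: source /etc/profile
Edit /software/hbase/hbase-0.90.5/conf/hbase-env.sh and set JAVA_HOME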
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/sdk/jdk1.6.0_34
Configure hbase-site.xml by adding the following:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///software/hbase/hbase-0.90.5/data</value>
  </property>
</configuration>
Start HBase and verify
Run bin/start-hbase.sh from the installation directory
Check the running processes with $JAVA_HOME/bin/jps; the output looks like this:
root@vm:/sdk/jdk1.6.0_34/bin# jps
4131 DataNode
5761 TaskTracker
3375 NameNode
4894 SecondaryNameNode
4955 JobTracker
6079 Jps
5973 HMaster
Run bin/hbase shell from the installation directory; it prints:
root@vm:/software/hbase/hbase-0.90.5# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
hbase(main):001:0> quit
root@vm:/software/hbase/hbase-0.90.5#
Pseudo-distributed mode
Builds on the standalone setup
1 Edit hbase-env.sh and add the HBASE_CLASSPATH environment variable:
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/software/hadoop/hadoop-0.20.2/conf
# Uncomment the setting at the end of the file so that HBase manages its own ZooKeeper instance
export HBASE_MANAGES_ZK=true
2 Edit hbase-site.xml to turn on distributed mode
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!--
    <value>file:///software/hbase/hbase-0.90.5/data</value>
    -->
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>vm</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
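Optional: in the regionservers file, list the host names of the nodes that run region servers
Replace the bundled Hadoop core jar
//First disable the bundled jar file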
root@vm:/software/hbase/hbase-0.90.5/lib# mv hadoop-core-0.20-append-r1056497.jar hadoop-core-0.20-append-r1056497.old
//Copy the core jar from the Hadoop installation directory into lib
root@vm:/software/hbase/hbase-0.90.5/lib# cp /software/hadoop/hadoop-0.20.2/hadoop-0.20.2-core.jar .
root@vm:/software/hbase/hbase-0.90.5/lib# ls
...
guava-r06.jar log4j-1.2.16.jar
hadoop-0.20.2-core.jar protobuf-java-2.3.0.jar
hadoop-core-0.20-append-r1056497.old ruby
jackson-core-asl-1.5.5.jar servlet-api-2.5-6.1.14.jar
...
Start HBase as before; jps now shows:
root@vm:/sdk/jdk1.6.0_34/bin# jps
10477 Jps
5145 JobTracker
5049 SecondaryNameNode
5898 TaskTracker
4285 DataNode
9562 HMaster
10320 HRegionServer
3496 NameNode
9509 HQuorumPeer
root@vm:/sdk/jdk1.6.0_34/bin#
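Verify startup
Fully distributed mode
Configure hosts so that every host name resolves to an IP address
Edit hbase-env.sh
Edit hbase-site.xml
Edit the regionservers file
Copy HBase to the other nodes
Start HBase
Verify startup
The master status page can also be opened in a browser at http://localhost:60010/master.jsp
Shell operations
NotAllMetaRegionsOnlineException problem
Edit /etc/hosts and add the following: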
127.0.0.1 localhost
127.0.0.1 vm
help: show help
status: show the database status
hbase(main):003:0> status
1 servers, 0 dead, 0.0000 average load
version: show the database version
hbase(main):004:0> version
0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
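Create a table
create 'table name','column family 1','column family 2'
Add a record
put 'table name','row key','column family:qualifier','value'
View a record
get 'table name','row key'
Count the records in a table
count 'table name'
Delete a record
delete 'table name','row key','column family:qualifier'
Delete a table (disable it first)
disable 'table name'
drop 'table name'
View all records
scan 'table name'
View all data in one column of a table
scan 'table name',{COLUMNS=>'column family:qualifier'}
Update a record
Write the cell again to overwrite it
The concrete steps are as follows:
Create a table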
hbase(main):001:0> create 'user','user_id','address','info'
0 row(s) in 2.2520 seconds
List the tables
hbase(main):002:0> list
TABLE
user
1 row(s) in 0.0300 seconds
Show the description of the table's column families
hbase(main):003:0> describe 'user'
DESCRIPTION ENABLED
{NAME => 'user', FAMILIES => [{NAME => 'address', B true
LOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COM
PRESSION => 'NONE', VERSIONS => '3', TTL => '214748
3647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}, {NAME => 'info', BLOOMFILTER
=> 'NONE', REPLICATION_SCOPE => '0', COMPRESSION =
> 'NONE', VERSIONS => '3', TTL => '2147483647', BLO
CKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
=> 'true'}, {NAME => 'user_id', BLOOMFILTER => 'NO
NE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE
', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'tr
ue'}]}
1 row(s) in 0.0340 seconds
Delete the table
disable 'user'
drop 'user'
Add records (the examples below use a table named 'users' with the same column families)
put 'users','retacn','info:age','32';
put 'users','retacn','info:birthday','1984-0*-0*';
put 'users','retacn','info:company','none';
put 'users','retacn','address:contry','china';
put 'users','retacn','address:province','shandong';
put 'users','retacn','address:city','zibo';
put 'users','zhansan','info:age','30';
put 'users','zhansan','info:birthday','1984-08-08';
put 'users','zhansan','info:company','hony';
put 'users','zhansan','address:contry','china';
put 'users','zhansan','address:province','shandong';
put 'users','zhansan','address:city','zibo'
Get one row by its row key
hbase(main):020:0> get 'users','retacn'
COLUMN CELL
address:city timestamp=1449211197068, value=zibo
address:contry timestamp=1449211197048, value=china
address:province timestamp=1449211197061, value=shandong
info:age timestamp=1449211197012, value=32
info:birthday timestamp=1449211197025, value=1984-09-04
info:company timestamp=1449211197038, value=none
Get all data of one column family in a row
hbase(main):022:0> get 'users','retacn','address'
COLUMN CELL
address:city timestamp=1449211197068, value=zibo
address:contry timestamp=1449211197048, value=china
address:province timestamp=1449211197061, value=shandong
Get a single column of a column family in a row
hbase(main):023:0> get 'users','retacn','info:age'
COLUMN CELL
info:age timestamp=1449211197012, value=32
Update a record
Putting the same cell again overwrites it: the new value is stored as a newer version
hbase(main):024:0> put 'users','retacn','info:age','31'
0 row(s) in 0.0220 seconds
hbase(main):025:0> get 'users','retacn','info:age'
COLUMN CELL
info:age timestamp=1449211528248, value=31
Get a given number of versions of a cell
hbase(main):026:0> get 'users','retacn',{COLUMN=>'info:age',VERSIONS=>1}
COLUMN CELL
info:age timestamp=1449211528248, value=31
1 row(s) in 0.0360 seconds
hbase(main):027:0> get 'users','retacn',{COLUMN=>'info:age',VERSIONS=>2}
COLUMN CELL
info:age timestamp=1449211528248, value=31
info:age timestamp=1449211197012, value=32
Number of versions kept (the VERSIONS attribute of each column family)
hbase(main):030:0> describe 'users'
DESCRIPTION ENABLED
{NAME => 'users', FAMILIES => [{NAME => 'address', true
BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', CO
MPRESSION => 'NONE', VERSIONS => '3', TTL => '21474
83647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}, {NAME => 'info', BLOOMFILTE
R => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION
=> 'NONE', VERSIONS => '3', TTL => '2147483647', BL
OCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACH
E => 'true'}, {NAME => 'user_id', BLOOMFILTER => 'N
ONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NON
E', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 't
rue'}]}
Get one specific version of a cell by timestamp
hbase(main):029:0> get 'users','retacn',{COLUMN=>'info:age',TIMESTAMP=>1449211528248}
COLUMN CELL
info:age timestamp=1449211528248, value=31
Full table scan
hbase(main):031:0> scan 'users'
ROW COLUMN+CELL
retacn column=address:city, timestamp=1449211197068, value=zibo
retacn column=address:contry, timestamp=1449211197048, value=chin
a
retacn column=address:province, timestamp=1449211197061, value=sh
andong
retacn column=info:age, timestamp=1449211528248, value=31
retacn column=info:birthday, timestamp=1449211197025, value=1984-
09-04
retacn column=info:company, timestamp=1449211197038, value=none
zhansan column=address:city, timestamp=1449211208677, value=zibo
zhansan column=address:contry, timestamp=1449211208664, value=chin
a
zhansan column=address:province, timestamp=1449211208670, value=sh
andong
zhansan column=info:age, timestamp=1449211208579, value=30
zhansan column=info:birthday, timestamp=1449211208596, value=1984-
09-03
zhansan column=info:company, timestamp=1449211208605, value=hony
2 row(s) in 0.0720 seconds
Delete the 'info:age' column of a row
delete 'users','retacn','info:age'
Delete an entire row
deleteall 'users','retacn'
Count the rows in the table
count 'users'
Empty the table
truncate 'users'
Exit the HBase shell
quit
HBase Java API operations
Sample code:
/**
 * Copyright (C) 2015
 *
 * FileName:HbaseApiTest.java
 *
 * Author:<a href="mailto:zhenhuayue@sina.com">Retacn</a>
 *
 * CreateTime: 2015-12-4
 */
// Package Information
package cn.yue.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

/**
 * Tests the HBase Java API
 *
 * @author <a href="mailto:zhenhuayue@sina.com">Retacn</a>
 *
 * @since 2015-12-4
 */
public class HbaseApiTest {
    public static final String TABLE_NAME = "employee";
    public static final String FAMILY_NAME = "id";
    public static final String ROW_KEY = "retacn";

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "127.0.0.1");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // Used to create and delete tables
        final HBaseAdmin hbaseAdmin = new HBaseAdmin(conf);
        // Create the table
        createTable(hbaseAdmin);
    }

    /**
     * Creates the table
     *
     * @param hbaseAdmin
     * @throws IOException
     */
    private static void createTable(final HBaseAdmin hbaseAdmin) throws IOException {
        // Check existence rather than enabled state: a disabled table still exists
        if (!hbaseAdmin.tableExists(TABLE_NAME)) {
            HTableDescriptor tableDescriptor = new HTableDescriptor(TABLE_NAME);
            HColumnDescriptor family = new HColumnDescriptor(FAMILY_NAME);
            tableDescriptor.addFamily(family);
            hbaseAdmin.createTable(tableDescriptor);
        }
    }
}
After the table is created, the shell shows:
hbase(main):001:0> list
TABLE
employee
users
2 row(s) in 0.7510 seconds
hbase(main):002:0> describe 'employee'
DESCRIPTION ENABLED
{NAME => 'employee', FAMILIES => [{NAME => 'id', BL true
OOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMP
RESSION => 'NONE', VERSIONS => '3', TTL => '2147483
647', BLOCKSIZE => '65536', IN_MEMORY => 'false', B
LOCKCACHE => 'true'}]}
Add a record
/**
 * Copyright (C) 2015
 *
 * FileName:HbaseApiTest.java
 *
 * Author:<a href="mailto:zhenhuayue@sina.com">Retacn</a>
 *
 * CreateTime: 2015-12-4
 */
// Package Information
package cn.yue.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

/**
 * Tests the HBase Java API
 *
 * @author <a href="mailto:zhenhuayue@sina.com">Retacn</a>
 *
 * @since 2015-12-4
 */
public class HbaseApiTest {
    public static final String TABLE_NAME = "employee";
    public static final String FAMILY_NAME = "id";
    public static final String ROW_KEY = "retacn";

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        conf.set("hbase.zookeeper.quorum", "127.0.0.1");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // Used to create and delete tables
        final HBaseAdmin hbaseAdmin = new HBaseAdmin(conf);
        // Create the table
        createTable(hbaseAdmin);
        // Add one record: row "retacn", column id:age, value 32
        final HTable hTable = new HTable(conf, TABLE_NAME);
        Put put = new Put(ROW_KEY.getBytes());
        put.add(FAMILY_NAME.getBytes(), "age".getBytes(), "32".getBytes());
        hTable.put(put);
        // Release the table's client-side resources
        hTable.close();
        // Delete the table (it has to be disabled first)
        //hbaseAdmin.disableTable(TABLE_NAME);
        //hbaseAdmin.deleteTable(TABLE_NAME);
    }

    /**
     * Creates the table
     *
     * @param hbaseAdmin
     * @throws IOException
     */
    private static void createTable(final HBaseAdmin hbaseAdmin) throws IOException {
        // Check existence rather than enabled state: a disabled table still exists
        if (!hbaseAdmin.tableExists(TABLE_NAME)) {
            HTableDescriptor tableDescriptor = new HTableDescriptor(TABLE_NAME);
            HColumnDescriptor family = new HColumnDescriptor(FAMILY_NAME);
            tableDescriptor.addFamily(family);
            hbaseAdmin.createTable(tableDescriptor);
        }
    }
}
After the record is added, the shell shows:
hbase(main):003:0> get 'employee','retacn'
COLUMN CELL
id:age timestamp=1449369554428, value=32
If the following error message appears, the ZooKeeper client port has to be set explicitly in the configuration
15/12/06 10:37:31 ERROR zookeeper.ZKConfig: no clientPort found in zoo.cfg
Sample code:
conf.set("hbase.zookeeper.property.clientPort","2181");
Query one record; sample code:
/**
 * Queries one record
 *
 * @param hTable
 * @throws IOException
 */
// Needs imports: org.apache.hadoop.hbase.client.Get, org.apache.hadoop.hbase.client.Result
private static void getRecord(final HTable hTable) throws IOException {
    Get get = new Get(ROW_KEY.getBytes());
    final Result result = hTable.get(get);
    // Read the value of the id:age cell
    final byte[] value = result.getValue(FAMILY_NAME.getBytes(), "age".getBytes());
    System.out.println(result + " " + new String(value));
}
Query result:
keyvalues={retacn/id:age/1449369554428/Put/vlen=2} 32
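Query all records; sample code:
/**
 * Queries all records
 *
 * @param hTable
 * @throws IOException
 */
// Needs imports: org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.client.ResultScanner
private static void getAll(final HTable hTable) throws IOException {
    Scan scan = new Scan();
    final ResultScanner scanner = hTable.getScanner(scan);
    try {
        for (Result result : scanner) {
            final byte[] value = result.getValue(FAMILY_NAME.getBytes(), "age".getBytes());
            System.out.println(result + " " + new String(value));
        }
    } finally {
        // Release the scanner's resources
        scanner.close();
    }
}
Query result: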
keyvalues={retacn/id:age/1449369554428/Put/vlen=2} 32