What is HBase

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

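HBase is the Hadoop database: a distributed (its files live on HDFS), scalable (data is split into regions and stored across the cluster) big data store, used to store and retrieve massive amounts of data.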
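"Split into regions" has a specific meaning. Each region covers a contiguous range of row keys, and a lookup for a row key goes to the region whose range contains it. The following is a minimal in-memory sketch of that routing. It assumes three hypothetical regions with made-up start keys. `REGION_START_KEYS`, `REGION_NAMES` and `locate_region` are illustrative names and are not part of the HBase client API.

```python
import bisect
from typing import List

# Hypothetical layout: region i covers [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]).
# The first region starts at the empty key, so every row key falls into some region.
REGION_START_KEYS: List[bytes] = [b"", b"g", b"p"]
REGION_NAMES: List[str] = ["region-0", "region-1", "region-2"]


def locate_region(rowkey: bytes) -> str:
    """Return the name of the region whose key range contains rowkey."""
    # bisect_right counts the start keys <= rowkey; the last of them owns the row.
    idx = bisect.bisect_right(REGION_START_KEYS, rowkey) - 1
    return REGION_NAMES[idx]


print(locate_region(b"apple"))  # region-0
print(locate_region(b"hbase"))  # region-1
print(locate_region(b"zebra"))  # region-2
```

Because the start keys are kept sorted, finding the owning region is a binary search. The real client looks region locations up in the `hbase:meta` table and caches them instead of hard-coding them.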

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned (a piece of data can hold values for multiple versions), non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

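Compared with traditional RDBMS databases, HBase has a clear speed advantage when querying and retrieving massive amounts of data, particularly for random reads and writes by row key.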

Table in HBase

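Schema: only the TableName and the Column Family Names are defined. Individual columns are not declared in advance.
This means the rows of an HBase table do not have to share the same columns. A column that a row does not have takes up no space, whereas in an RDBMS even a NULL value occupies space.
Both values and names (row keys and column names) are stored as byte[] arrays in HDFS.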
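This schema rule can be pictured in code. Only the column families are fixed, each row stores just the columns it actually has, and keys and values are raw bytes. The following is a hypothetical in-memory model, not the HBase client API. The names `ALLOWED_FAMILIES`, `table` and `put` are made up for illustration.

```python
from typing import Dict, Tuple

# Schema: only the column family names are declared (hypothetical family "basic").
ALLOWED_FAMILIES = {b"basic"}

# rowkey -> {(family, qualifier): value}; everything is raw bytes.
Row = Dict[Tuple[bytes, bytes], bytes]
table: Dict[bytes, Row] = {}


def put(rowkey: bytes, family: bytes, qualifier: bytes, value: bytes) -> None:
    """Store one value; new qualifiers need no schema change, unknown families are rejected."""
    if family not in ALLOWED_FAMILIES:
        raise ValueError(f"unknown column family: {family!r}")
    table.setdefault(rowkey, {})[(family, qualifier)] = value


put(b"1001", b"basic", b"name", b"Alice")
put(b"1001", b"basic", b"age", b"30")
put(b"1002", b"basic", b"name", b"Bob")  # row 1002 simply has no "age" entry

print(len(table[b"1001"]), len(table[b"1002"]))  # 2 1
```

Row 1002 has no placeholder for `age` at all. This mirrors the point above that a missing column costs nothing.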

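HBase is a column-oriented database: data is stored by column. More precisely, it is stored by column family, and each column family is kept in its own set of files.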
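Here is a minimal sketch of per-family storage. It assumes a hypothetical second family `contact` alongside `basic`. Cells are split into one sorted store per column family, so a read that only needs `basic` never has to touch the `contact` data. `split_by_family` is an illustrative helper, not an HBase function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (rowkey, family, qualifier, value); "contact" is a hypothetical second family.
Cell = Tuple[bytes, bytes, bytes, bytes]


def split_by_family(cells: List[Cell]) -> Dict[bytes, List[Cell]]:
    """Group cells into one store per column family, each sorted by rowkey then qualifier."""
    stores: Dict[bytes, List[Cell]] = defaultdict(list)
    for cell in cells:
        stores[cell[1]].append(cell)
    for family_cells in stores.values():
        family_cells.sort(key=lambda c: (c[0], c[2]))
    return dict(stores)


cells: List[Cell] = [
    (b"1001", b"basic", b"name", b"Alice"),
    (b"1001", b"contact", b"email", b"alice@example.com"),
    (b"1000", b"basic", b"age", b"41"),
]
stores = split_by_family(cells)
print(stores[b"basic"])  # only "basic" cells, row 1000 before row 1001
```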

The basic data unit, a Cell, consists of: rowkey + columnfamily + [column] + timestamp : value. The timestamp distinguishes the versions of the same cell.

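columnfamily (column family): a category of fields, e.g. basic contains (name, age, birthday, ...)
rowkey (row key): like the primary key in an RDBMS, it uniquely identifies a row. Every cell carries it, and it is the key to fast lookups, e.g. ID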
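The cell coordinates and versioning can be sketched as a map from (rowkey, columnfamily, column) to a list of timestamped values, with the newest first. This illustrative model makes several assumptions. `VersionedStore` is hypothetical. `max_versions` loosely mirrors the `VERSIONS` setting of a column family. `get(..., ts=...)` here returns the newest version at or before `ts`.

```python
from typing import Dict, List, Optional, Tuple

CellKey = Tuple[bytes, bytes, bytes]  # (rowkey, columnfamily, column)


class VersionedStore:
    """Hypothetical model: each cell key maps to (timestamp, value) pairs, newest first."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        self.cells: Dict[CellKey, List[Tuple[int, bytes]]] = {}

    def put(self, rowkey: bytes, family: bytes, column: bytes, value: bytes, ts: int) -> None:
        key = (rowkey, family, column)
        # A write with an existing timestamp replaces that version.
        versions = [v for v in self.cells.get(key, []) if v[0] != ts]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        self.cells[key] = versions[: self.max_versions]  # drop the oldest extras

    def get(self, rowkey: bytes, family: bytes, column: bytes,
            ts: Optional[int] = None) -> Optional[bytes]:
        """Newest value, or the newest one at or before ts if ts is given."""
        for version_ts, value in self.cells.get((rowkey, family, column), []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def row_keys(self) -> List[bytes]:
        """Row keys in byte-lexicographic order, the order HBase keeps rows in."""
        return sorted({key[0] for key in self.cells})


store = VersionedStore(max_versions=2)
store.put(b"1001", b"basic", b"age", b"29", ts=100)
store.put(b"1001", b"basic", b"age", b"30", ts=200)
store.put(b"1001", b"basic", b"age", b"31", ts=300)  # the ts=100 version is evicted
store.put(b"9", b"basic", b"name", b"Bob", ts=100)

print(store.get(b"1001", b"basic", b"age"))          # b'31'
print(store.get(b"1001", b"basic", b"age", ts=250))  # b'30'
print(store.row_keys())                              # [b'1001', b'9']
```

Row keys sort as raw bytes, so `b"1001"` comes before `b"9"`. This is why numeric IDs used as row keys are often zero-padded.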

Example

![](http://images2017.cnblogs.com/blog/1047249/201707/1047249-20170731172419911-340741011.png)

Original article: https://www.cnblogs.com/cenzhongman/p/7263710.html