1. Counting with HBase's built-in MapReduce program (RowCounter), as follows:
/usr/hdp/2.3.0.0-2557/hbase/bin [root@node1 bin]# ./hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'testtable0728'
The job output is as follows:
2016-07-28 16:52:31,924 INFO [main] mapreduce.Job: map 0% reduce 0%
2016-07-28 16:53:41,616 INFO [main] mapreduce.Job: map 25% reduce 0%
2016-07-28 16:54:21,630 INFO [main] mapreduce.Job: map 50% reduce 0%
2016-07-28 16:55:11,663 INFO [main] mapreduce.Job: map 75% reduce 0%
2016-07-28 16:55:15,750 INFO [main] mapreduce.Job: map 100% reduce 0%
2016-07-28 16:55:16,798 INFO [main] mapreduce.Job: Job job_1469584130491_0002 completed successfully
2016-07-28 16:55:17,263 INFO [main] mapreduce.Job: Counters: 31
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=620452
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=816
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters
		Launched map tasks=4
		Data-local map tasks=4
		Total time spent by all maps in occupied slots (ms)=987150
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=493575
		Total vcore-seconds taken by all map tasks=493575
		Total megabyte-seconds taken by all map tasks=758131200
	Map-Reduce Framework
		Map input records=10103174
		Map output records=0
		Input split bytes=816
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=1687
		CPU time spent (ms)=153920
		Physical memory (bytes) snapshot=775319552
		Virtual memory (bytes) snapshot=8299081728
		Total committed heap usage (bytes)=1178075136
	org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
		ROWS=10103174
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=0
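Note that RowCounter prints nothing friendly at the end: the answer is buried in the job counters, under the ROWS counter of RowCounterMapper$Counters. A minimal sketch (assuming the job output above has been captured as text, e.g. redirected to a file) to pull the count out:

```python
import re

def extract_row_count(counters_text):
    """Find the ROWS=<n> counter emitted by RowCounter in captured job output."""
    match = re.search(r"\bROWS=(\d+)", counters_text)
    return int(match.group(1)) if match else None

# Example: the relevant counter lines from the job output above
log = ("org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters\n"
       "    ROWS=10103174\n")
print(extract_row_count(log))  # -> 10103174
```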
2. Increasing the CACHE of the count command
As follows:
hbase(main):001:0> count 'testtable0728', INTERVAL => 100000, CACHE => 100000
Result: even with the larger CACHE, counting a big table is still slow, less than half the speed of RowCounter, which counted the ten million rows in about three minutes.
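The CACHE option corresponds to scanner caching: how many rows the shell fetches from the region server per RPC round trip, so raising it mainly cuts the number of round trips rather than the per-row cost. A back-of-the-envelope sketch (my own approximation, assuming one RPC per batch and ignoring region boundaries) for the 10,103,174-row table above:

```python
import math

def scan_rpc_count(total_rows, caching):
    """Approximate scanner RPC round trips: one per batch of `caching` rows."""
    return math.ceil(total_rows / caching)

total = 10103174  # ROWS reported by RowCounter above
print(scan_rpc_count(total, 100000))  # CACHE => 100000: 102 round trips
print(scan_rpc_count(total, 10))      # a small caching value: 1010318 round trips
```

Even with the round trips reduced to ~102, count still scans every row through a single client, while RowCounter spreads the scan across map tasks (four in the job above), which is why it stays ahead on large tables.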