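I. Types of coprocessors
1. observer: similar to a trigger; its callback functions are executed when specific events occur. The main interfaces are RegionObserver, MasterObserver, and WALObserver.
2. endpoint: similar to a stored procedure; it dynamically extends the RPC protocol with custom remote procedure calls.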
II. The Coprocessor interface, CoprocessorEnvironment, and CoprocessorHost
Coprocessor is the interface shared by all observers and endpoints.
    public interface Coprocessor {
        public void start(CoprocessorEnvironment ce) throws IOException;
        public void stop(CoprocessorEnvironment ce) throws IOException;
    }
CoprocessorEnvironment holds the environment a coprocessor runs in. It is defined as:
    public interface CoprocessorEnvironment {
        public int getVersion();
        public String getHBaseVersion();
        public Coprocessor getInstance();
        public int getPriority();
        public int getLoadSequence();
        public Configuration getConfiguration();
        public HTableInterface getTable(TableName tn) throws IOException;
        public HTableInterface getTable(TableName tn, ExecutorService es) throws IOException;
        public ClassLoader getClassLoader();
    }
Coprocessor states (Coprocessor.State):
UNINSTALLED, INSTALLED, STARTING, ACTIVE, STOPPING, STOPPED
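The state names suggest a simple lifecycle around start() and stop(). The following is a minimal sketch in plain Java, not HBase source code: the names CoprocessorStateSketch and LifecycleSketch and the exact transition rules are assumptions made for illustration.

```java
// Simplified model of the coprocessor lifecycle; not HBase source code.
enum CoprocessorStateSketch { UNINSTALLED, INSTALLED, STARTING, ACTIVE, STOPPING, STOPPED }

class LifecycleSketch {
    private CoprocessorStateSketch state = CoprocessorStateSketch.INSTALLED;

    CoprocessorStateSketch getState() {
        return state;
    }

    // Assumed rule: only an installed or stopped instance may be started.
    boolean startup(Runnable startHook) {
        if (state != CoprocessorStateSketch.INSTALLED
                && state != CoprocessorStateSketch.STOPPED) {
            return false;
        }
        state = CoprocessorStateSketch.STARTING;
        startHook.run(); // stands in for Coprocessor.start(env)
        state = CoprocessorStateSketch.ACTIVE;
        return true;
    }

    // Assumed rule: only an active instance may be stopped.
    boolean shutdown(Runnable stopHook) {
        if (state != CoprocessorStateSketch.ACTIVE) {
            return false;
        }
        state = CoprocessorStateSketch.STOPPING;
        stopHook.run(); // stands in for Coprocessor.stop(env)
        state = CoprocessorStateSketch.STOPPED;
        return true;
    }
}
```

In this model a second startup() on an active instance is refused, which is one plausible way to keep start() and stop() calls paired.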
CoprocessorHost maintains all Coprocessor instances and their CoprocessorEnvironment objects.
    public abstract class CoprocessorHost<E extends CoprocessorEnvironment> {
        public static class Environment implements CoprocessorEnvironment { }
        public CoprocessorHost(Abortable abortable) { }
        public static Set<String> getLoadedCoprocessors() { }
        public Set<String> getCoprocessors() { }
        protected void loadSystemCoprocessors(Configuration conf, String confKey) { }
        public E load(Path path, String className, int priority, Configuration conf) throws IOException { }
        public void load(Class<?> implClass, int priority, Configuration conf) throws IOException { }
        public E loadInstance(Class<?> implClass, int priority, Configuration conf) throws IOException { }
        public abstract E createEnvironment(Class<?> type, Coprocessor cprcsr, int i, int i1, Configuration c);
        public void shutdown(CoprocessorEnvironment ce) { }
        public Coprocessor findCoprocessor(String string) { }
        public CoprocessorEnvironment findCoprocessorEnvironment(String string) { }
        Set<ClassLoader> getExternalClassLoaders() { }
        protected void abortServer(CoprocessorEnvironment ce, Throwable thrwbl) { }
        protected void abortServer(String string, Throwable thrwbl) { }
        protected void handleCoprocessorThrowable(CoprocessorEnvironment ce, Throwable thrwbl) throws IOException { }
    }
Together, the Coprocessor interface, CoprocessorEnvironment, and CoprocessorHost form the foundation of the coprocessor framework.
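To make the role of the host more concrete, here is a small plain-Java model of a host that keeps one environment per loaded coprocessor. It is only a sketch: HostSketch and Env are hypothetical names, and the ordering rule (lower priority value first, ties broken by load sequence) is an assumption suggested by getPriority() and getLoadSequence().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of a coprocessor host; all names are hypothetical.
class HostSketch {
    // One loaded coprocessor: its name, priority, and load sequence number.
    static final class Env {
        final String name;
        final int priority;
        final int seq;

        Env(String name, int priority, int seq) {
            this.name = name;
            this.priority = priority;
            this.seq = seq;
        }
    }

    private final List<Env> envs = new ArrayList<>();
    private int loadSequence = 0;

    // Register a coprocessor and keep the list in invocation order.
    // Assumption: a lower priority value runs first; ties keep load order.
    void load(String name, int priority) {
        envs.add(new Env(name, priority, loadSequence++));
        envs.sort(Comparator.comparingInt((Env e) -> e.priority)
                .thenComparingInt(e -> e.seq));
    }

    // Look up an environment by coprocessor name; null if not loaded.
    Env find(String name) {
        for (Env e : envs) {
            if (e.name.equals(name)) {
                return e;
            }
        }
        return null;
    }

    // Names in the order their hooks would be invoked.
    List<String> invocationOrder() {
        List<String> names = new ArrayList<>();
        for (Env e : envs) {
            names.add(e.name);
        }
        return names;
    }
}
```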
III. Loading coprocessors
Coprocessors can be loaded through properties in hbase-site.xml, for example:
    <property>
      <name>hbase.coprocessor.master.classes</name>
      <value>class1,class2</value>
    </property>
    <property>
      <name>hbase.coprocessor.region.classes</name>
      <value>class3,class4</value>
    </property>
    <property>
      <name>hbase.coprocessor.wal.classes</name>
      <value>class5,class6</value>
    </property>
They can also be loaded from code.
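One way to load a coprocessor from code is to attach it to a table descriptor, where it is recorded as a table attribute of the form path|class|priority|args. The sketch below only parses such a string in plain Java so the four fields are easy to see. CoprocessorSpecSketch, its DEFAULT_PRIORITY constant, and the sample jar path and class name are hypothetical and not part of HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a "path|class|priority|k1=v1,k2=v2" coprocessor spec.
class CoprocessorSpecSketch {
    // Assumed default used when the priority field is left empty.
    static final int DEFAULT_PRIORITY = Integer.MAX_VALUE / 2;

    final String jarPath;   // empty when the class is already on the classpath
    final String className;
    final int priority;
    final Map<String, String> args;

    private CoprocessorSpecSketch(String jarPath, String className,
                                  int priority, Map<String, String> args) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.args = args;
    }

    static CoprocessorSpecSketch parse(String spec) {
        // The -1 limit keeps empty fields such as a missing jar path.
        String[] parts = spec.split("\\|", -1);
        if (parts.length < 2 || parts[1].trim().isEmpty()) {
            throw new IllegalArgumentException("class name is required: " + spec);
        }
        int priority = DEFAULT_PRIORITY;
        if (parts.length > 2 && !parts[2].trim().isEmpty()) {
            priority = Integer.parseInt(parts[2].trim());
        }
        Map<String, String> args = new LinkedHashMap<>();
        if (parts.length > 3 && !parts[3].trim().isEmpty()) {
            for (String kv : parts[3].split(",")) {
                String[] pair = kv.split("=", 2);
                args.put(pair[0].trim(), pair.length > 1 ? pair[1].trim() : "");
            }
        }
        return new CoprocessorSpecSketch(parts[0].trim(), parts[1].trim(), priority, args);
    }
}
```

For example, parse("hdfs:///jars/demo.jar|com.example.RowCountEndPoint|1001|a=1,b=2") yields the jar path, the class name, priority 1001, and two arguments.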
IV. RegionObserver (observer type)
1. RegionObserver handles region lifecycle events
    public interface RegionObserver extends Coprocessor {
        public void preOpen(ObserverContext<RegionCoprocessorEnvironment> oc);
        public void postOpen(ObserverContext<RegionCoprocessorEnvironment> oc);
        public void preFlush(ObserverContext<RegionCoprocessorEnvironment> oc);
        public void postFlush(ObserverContext<RegionCoprocessorEnvironment> oc);
        public InternalScanner preFlush(ObserverContext<RegionCoprocessorEnvironment> oc, Store store, InternalScanner is);
        public void postFlush(ObserverContext<RegionCoprocessorEnvironment> oc, Store store, StoreFile sf);
        public void preCompactSelection(ObserverContext<RegionCoprocessorEnvironment> oc, Store store, List<StoreFile> list, CompactionRequest cr);
        public void postCompactSelection(ObserverContext<RegionCoprocessorEnvironment> oc, Store store, ImmutableList<StoreFile> il, CompactionRequest cr);
        public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> oc, Store store, InternalScanner is, ScanType st, CompactionRequest cr);
        public void postCompact(ObserverContext<RegionCoprocessorEnvironment> oc, Store store, StoreFile sf, CompactionRequest cr);
        public void preSplit(ObserverContext<RegionCoprocessorEnvironment> oc);
        public void preSplit(ObserverContext<RegionCoprocessorEnvironment> oc, byte[] bytes);
        public void postSplit(ObserverContext<RegionCoprocessorEnvironment> oc, HRegion hr, HRegion hr1);
        public void preSplitBeforePONR(ObserverContext<RegionCoprocessorEnvironment> oc, byte[] bytes, List<Mutation> list);
        public void preSplitAfterPONR(ObserverContext<RegionCoprocessorEnvironment> oc) throws IOException;
        public void preRollBackSplit(ObserverContext<RegionCoprocessorEnvironment> oc);
        public void postRollBackSplit(ObserverContext<RegionCoprocessorEnvironment> oc);
        public void postCompleteSplit(ObserverContext<RegionCoprocessorEnvironment> oc);
        public void preClose(ObserverContext<RegionCoprocessorEnvironment> oc, boolean bln);
        public void postClose(ObserverContext<RegionCoprocessorEnvironment> oc, boolean bln);
        // ...
    }
2. RegionObserver handles client API events
    public interface RegionObserver extends Coprocessor {
        // ...
        public void preGetClosestRowBefore(ObserverContext<RegionCoprocessorEnvironment> oc, byte[] bytes, byte[] bytes1, Result result);
        public void postGetClosestRowBefore(ObserverContext<RegionCoprocessorEnvironment> oc, byte[] bytes, byte[] bytes1, Result result);
        public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> oc, Get get, List<Cell> list);
        public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> oc, Get get, List<Cell> list);
        public boolean preExists(ObserverContext<RegionCoprocessorEnvironment> oc, Get get, boolean bln);
        public boolean postExists(ObserverContext<RegionCoprocessorEnvironment> oc, Get get, boolean bln);
        public void prePut(ObserverContext<RegionCoprocessorEnvironment> oc, Put put, WALEdit wale, Durability drblt);
        public void postPut(ObserverContext<RegionCoprocessorEnvironment> oc, Put put, WALEdit wale, Durability drblt);
        public void preDelete(ObserverContext<RegionCoprocessorEnvironment> oc, Delete delete, WALEdit wale, Durability drblt);
        public void postDelete(ObserverContext<RegionCoprocessorEnvironment> oc, Delete delete, WALEdit wale, Durability drblt);
        public void preBatchMutate(ObserverContext<RegionCoprocessorEnvironment> oc, MiniBatchOperationInProgress<Mutation> mboip);
        public void postBatchMutate(ObserverContext<RegionCoprocessorEnvironment> oc, MiniBatchOperationInProgress<Mutation> mboip);
        public Result preAppend(ObserverContext<RegionCoprocessorEnvironment> oc, Append append);
        public Result preAppendAfterRowLock(ObserverContext<RegionCoprocessorEnvironment> oc, Append append);
        public Result postAppend(ObserverContext<RegionCoprocessorEnvironment> oc, Append append, Result result);
        public Result preIncrement(ObserverContext<RegionCoprocessorEnvironment> oc, Increment i);
        public Result preIncrementAfterRowLock(ObserverContext<RegionCoprocessorEnvironment> oc, Increment i);
        public Result postIncrement(ObserverContext<RegionCoprocessorEnvironment> oc, Increment i, Result result);
        public void preScannerClose(ObserverContext<RegionCoprocessorEnvironment> oc, InternalScanner is);
        public void postScannerClose(ObserverContext<RegionCoprocessorEnvironment> oc, InternalScanner is);
        public void preWALRestore(ObserverContext<RegionCoprocessorEnvironment> oc, HRegionInfo hri, HLogKey hlk, WALEdit wale);
        public void postWALRestore(ObserverContext<RegionCoprocessorEnvironment> oc, HRegionInfo hri, HLogKey hlk, WALEdit wale);
        public void preBulkLoadHFile(ObserverContext<RegionCoprocessorEnvironment> oc, List<Pair<byte[], String>> lists);
        public boolean postBulkLoadHFile(ObserverContext<RegionCoprocessorEnvironment> oc, List<Pair<byte[], String>> lists, boolean bln);
    }
3. RegionCoprocessorEnvironment, the environment of a region coprocessor
    public interface RegionCoprocessorEnvironment extends CoprocessorEnvironment {
        public HRegion getRegion();
        public RegionServerServices getRegionServerServices();
        public ConcurrentMap<String, Object> getSharedData();
    }
RegionServerServices contains the following:
    public interface RegionServerServices extends OnlineRegions, FavoredNodesForRegion, PriorityFunction {
        public boolean isStopping();
        public HLog getWAL(HRegionInfo hri) throws IOException;
        public CompactionRequestor getCompactionRequester();
        public FlushRequester getFlushRequester();
        public RegionServerAccounting getRegionServerAccounting();
        public TableLockManager getTableLockManager();
        public void postOpenDeployTasks(HRegion hr, CatalogTracker ct) throws KeeperException, IOException;
        public boolean reportRegionTransition(RegionServerStatusProtos.RegionTransition.TransitionCode tc, long l, HRegionInfo[] hris);
        public boolean reportRegionTransition(RegionServerStatusProtos.RegionTransition.TransitionCode tc, HRegionInfo[] hris);
        public RpcServerInterface getRpcServer();
        public ConcurrentMap<byte[], Boolean> getRegionsInTransitionInRS();
        public FileSystem getFileSystem();
        public Leases getLeases();
        public ExecutorService getExecutorService();
        public CatalogTracker getCatalogTracker();
        public Map<String, HRegion> getRecoveringRegions();
        public ServerNonceManager getNonceManager();
    }
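4. ObserverContext, the context passed to region coprocessor callbacks
It provides an entry point to the current system environment and adds a few key functions that tell the coprocessor framework what to do once the callback has finished.
    public class ObserverContext<E extends CoprocessorEnvironment> {
        public E getEnvironment() { }
        public void prepare(E env) { }
        public void bypass() { }
        public void complete() { }
        public boolean shouldBypass() { }
        public static <T extends CoprocessorEnvironment> ObserverContext<T> createAndPrepare(T env, ObserverContext<T> context) { }
    }
bypass(): use the value supplied by the user instead of the original one, i.e. skip the default processing of the operation.
complete(): mark processing as finished, skipping further processing by the remaining coprocessors.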
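The effect of bypass() and complete() is easier to see in a toy model. The sketch below is plain Java and does not use HBase classes: ContextSketch, GetHookSketch, and RegionSketch are hypothetical names, and a single shared context per operation is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for ObserverContext: just the two flags.
class ContextSketch {
    private boolean bypass;
    private boolean complete;

    void bypass() { bypass = true; }
    void complete() { complete = true; }
    boolean shouldBypass() { return bypass; }
    boolean shouldComplete() { return complete; }
}

// Toy stand-in for a pre-get hook of an observer.
interface GetHookSketch {
    void preGet(ContextSketch ctx, String row, List<String> result);
}

class RegionSketch {
    private final List<GetHookSketch> hooks = new ArrayList<>();

    void addHook(GetHookSketch hook) {
        hooks.add(hook);
    }

    List<String> get(String row) {
        List<String> result = new ArrayList<>();
        ContextSketch ctx = new ContextSketch();
        for (GetHookSketch hook : hooks) {
            hook.preGet(ctx, row, result);
            if (ctx.shouldComplete()) {
                break; // complete(): remaining coprocessors are skipped
            }
        }
        if (!ctx.shouldBypass()) {
            result.add("stored:" + row); // default processing of the operation
        }
        return result;
    }
}
```

A hook that fills the result and calls both methods therefore answers the request by itself: later hooks are not invoked and the default read does not run.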
5. The no-op class BaseRegionObserver
By default it does nothing; a user-defined coprocessor extends it and overrides only the methods it needs.
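The benefit of a no-op base class can be shown with a reduced plain-Java model. The three-method interface below is a hypothetical stand-in for the much larger RegionObserver, and AuditObserverSketch is an invented example, not an HBase class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily reduced stand-in for RegionObserver.
interface ObserverSketch {
    void preOpen();
    void prePut(String row);
    void preDelete(String row);
}

// Plays the role of BaseRegionObserver: every hook is an empty method.
class BaseObserverSketch implements ObserverSketch {
    @Override public void preOpen() { }
    @Override public void prePut(String row) { }
    @Override public void preDelete(String row) { }
}

// A custom observer overrides only the hook it cares about.
class AuditObserverSketch extends BaseObserverSketch {
    final List<String> audited = new ArrayList<>();

    @Override
    public void prePut(String row) {
        audited.add(row); // record every row that is about to be written
    }
}
```

Without the base class, AuditObserverSketch would have to implement every method of the interface even though it reacts to only one event.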
V. MasterObserver (observer type)
1. MasterObserver handles DDL events
    public interface MasterObserver extends Coprocessor {
        void preCreateTable(final ObserverContext<MasterCoprocessorEnvironment> ctx, HTableDescriptor desc, HRegionInfo[] regions) throws IOException;
        void postCreateTable(final ObserverContext<MasterCoprocessorEnvironment> ctx, HTableDescriptor desc, HRegionInfo[] regions) throws IOException;
        void preCreateTableHandler(final ObserverContext<MasterCoprocessorEnvironment> ctx, HTableDescriptor desc, HRegionInfo[] regions) throws IOException;
        void postCreateTableHandler(final ObserverContext<MasterCoprocessorEnvironment> ctx, HTableDescriptor desc, HRegionInfo[] regions) throws IOException;
        void preDeleteTable(final ObserverContext<MasterCoprocessorEnvironment> ctx, TableName tableName) throws IOException;
        void postDeleteTable(final ObserverContext<MasterCoprocessorEnvironment> ctx, TableName tableName) throws IOException;
        void preDeleteTableHandler(final ObserverContext<MasterCoprocessorEnvironment> ctx, TableName tableName) throws IOException;
        void postDeleteTableHandler(final ObserverContext<MasterCoprocessorEnvironment> ctx, TableName tableName) throws IOException;
        void preModifyTable(final ObserverContext<MasterCoprocessorEnvironment> ctx, final TableName tableName, HTableDescriptor htd) throws IOException;
        void postModifyTable(final ObserverContext<MasterCoprocessorEnvironment> ctx, final TableName tableName, HTableDescriptor htd) throws IOException;
        // ...
    }
2. MasterCoprocessorEnvironment, the environment of a master coprocessor
    public interface MasterCoprocessorEnvironment extends CoprocessorEnvironment {
        MasterServices getMasterServices();
    }
MasterServices contains the following:
    public interface MasterServices extends Server {
        AssignmentManager getAssignmentManager();
        MasterFileSystem getMasterFileSystem();
        ServerManager getServerManager();
        ExecutorService getExecutorService();
        TableLockManager getTableLockManager();
        MasterCoprocessorHost getCoprocessorHost();
        void checkTableModifiable(final TableName tableName);
        void createTable(HTableDescriptor desc, byte[][] splitKeys);
        void deleteTable(final TableName tableName);
        void modifyTable(final TableName tableName, final HTableDescriptor descriptor);
        void enableTable(final TableName tableName) throws IOException;
        void disableTable(final TableName tableName) throws IOException;
        void addColumn(final TableName tableName, final HColumnDescriptor column);
        void modifyColumn(TableName tableName, HColumnDescriptor descriptor);
        void deleteColumn(final TableName tableName, final byte[] columnName);
        TableDescriptors getTableDescriptors();
        boolean isServerShutdownHandlerEnabled();
        boolean registerService(Service instance);
        // ...
    }
3. The empty class BaseMasterObserver
By default it does nothing; users override the methods they need.
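Because the pre-hooks declare IOException, a master observer can reject a DDL operation by throwing from the pre-hook. The following plain-Java model sketches that idea only. MasterHookSketch, MasterSketch, and the naming policy in the usage note are hypothetical, and the real master does considerably more work around these calls.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the create-table hooks of MasterObserver.
interface MasterHookSketch {
    void preCreateTable(String tableName) throws IOException;

    default void postCreateTable(String tableName) { }
}

// Toy master that calls the hooks around a table creation.
class MasterSketch {
    private final Set<String> tables = new LinkedHashSet<>();
    private final MasterHookSketch hook;

    MasterSketch(MasterHookSketch hook) {
        this.hook = hook;
    }

    // Returns true when the table was created, false when the hook vetoed it.
    boolean createTable(String name) {
        try {
            hook.preCreateTable(name); // may veto by throwing
        } catch (IOException e) {
            return false;
        }
        tables.add(name);
        hook.postCreateTable(name);
        return true;
    }

    boolean exists(String name) {
        return tables.contains(name);
    }
}
```

A hook such as `name -> { if (name.startsWith("tmp_")) throw new IOException("rejected"); }` would then keep tables with that prefix from being created.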
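VI. endpoint
Aggregate functions cannot be implemented with observer coprocessors alone.
The row key determines which region handles a request, so a computation request reaches only that one region.
An aggregate, however, requires sending the request to all regions, collecting the result from each region, and merging the results.
An endpoint makes this possible.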
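The scatter-and-merge idea can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not HBase code: RegionModel and ScatterGatherSketch are hypothetical, keys are compared as strings, an empty key means "unbounded", and each selected region simply reports its own row count.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one region and the number of rows it holds.
class RegionModel {
    final String startKey; // inclusive; "" means unbounded
    final String endKey;   // exclusive; "" means unbounded
    final long rowCount;

    RegionModel(String startKey, String endKey, long rowCount) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.rowCount = rowCount;
    }

    // True when the region's key range intersects [from, to].
    boolean overlaps(String from, String to) {
        boolean startsBeforeRegionEnd = endKey.isEmpty() || from.compareTo(endKey) < 0;
        boolean endsAfterRegionStart = to.compareTo(startKey) >= 0;
        return startsBeforeRegionEnd && endsAfterRegionStart;
    }
}

class ScatterGatherSketch {
    // "Scatter": ask every overlapping region for its partial result.
    static Map<String, Long> countPerRegion(List<RegionModel> regions, String from, String to) {
        Map<String, Long> partial = new LinkedHashMap<>();
        for (RegionModel r : regions) {
            if (r.overlaps(from, to)) {
                partial.put(r.startKey, r.rowCount);
            }
        }
        return partial;
    }

    // "Gather": merge the partial results on the caller's side.
    static long total(Map<String, Long> partial) {
        long sum = 0;
        for (long c : partial.values()) {
            sum += c;
        }
        return sum;
    }
}
```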
1. The CoprocessorProtocol interface
CoprocessorProtocol is defined as follows:
    public interface CoprocessorProtocol extends VersionedProtocol {
        public static final long VERSION = 1L;
    }
where
    public interface VersionedProtocol {
        public ProtocolSignature getProtocolSignature(String string, long l, int i) throws IOException;
    }
2. The BaseEndpointCoprocessor class
    public abstract class BaseEndpointCoprocessor implements Coprocessor, CoprocessorProtocol, VersionedProtocol {
        public CoprocessorEnvironment getEnvironment() { }
        public void start(CoprocessorEnvironment env) { }
        public void stop(CoprocessorEnvironment env) { }
        public ProtocolSignature getProtocolSignature(String protocol, long version, int clientMethodsHashCode) throws IOException { }
        public long getProtocolVersion(String protocol, long clientVersion) throws IOException { }
    }
3. Implementing a custom Endpoint
Steps:
(1) Create an interface A that extends the CoprocessorProtocol interface
(2) Create an Endpoint class B that implements interface A and extends BaseEndpointCoprocessor
Example: an aggregate function that counts rows:
(1) Define the interface RowCountProtocal
    public interface RowCountProtocal extends CoprocessorProtocol {
        long getRowCount() throws IOException;
        long getRowCount(Filter filter) throws IOException;
    }
(2) Create the Endpoint class RowCountEndPoint
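    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.filter.Filter;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    import org.apache.hadoop.hbase.regionserver.HRegion;
    import org.apache.hadoop.hbase.regionserver.RegionScanner;

    public class RowCountEndPoint extends BaseEndpointCoprocessor implements RowCountProtocal {
        @Override
        public long getRowCount() throws IOException {
            // Only the first KeyValue of each row is needed to count rows.
            return this.getRowCount(new FirstKeyOnlyFilter());
        }

        @Override
        public long getRowCount(Filter filter) throws IOException {
            Scan scan = new Scan();
            scan.setMaxVersions(1);
            if (filter != null) {
                scan.setFilter(filter); // the original passed null here, discarding the filter
            }
            RegionCoprocessorEnvironment environment = (RegionCoprocessorEnvironment) this.getEnvironment();
            HRegion region = environment.getRegion();
            RegionScanner scanner = region.getScanner(scan);
            long result = 0;
            List<KeyValue> curVals = new ArrayList<KeyValue>();
            try {
                boolean hasNext;
                do {
                    curVals.clear();
                    hasNext = scanner.next(curVals);
                    // Count a row only if the scanner returned data, so an empty region yields 0.
                    if (!curVals.isEmpty()) {
                        result++;
                    }
                } while (hasNext);
            } finally {
                scanner.close();
            }
            return result;
        }
    }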
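The counting loop depends on the scanner contract: next() appends one row's cells to the list and returns whether more rows remain, so the final call can return false while still delivering a row, and an empty region returns false with nothing. The plain-Java sketch below reproduces that contract with a fake scanner. FakeScannerSketch and RowCounterSketch are hypothetical names, and rows are reduced to strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Fake scanner that mimics the "fill the list, report whether more remain" contract.
class FakeScannerSketch {
    private final Iterator<String> rows;

    FakeScannerSketch(List<String> rows) {
        this.rows = rows.iterator();
    }

    // Appends the next row (if any) to out; returns true while more rows remain.
    boolean next(List<String> out) {
        if (rows.hasNext()) {
            out.add(rows.next());
        }
        return rows.hasNext();
    }
}

class RowCounterSketch {
    static long count(FakeScannerSketch scanner) {
        long result = 0;
        List<String> current = new ArrayList<>();
        boolean hasNext;
        do {
            current.clear();
            hasNext = scanner.next(current);
            // Count only calls that actually delivered a row.
            if (!current.isEmpty()) {
                result++;
            }
        } while (hasNext);
        return result;
    }
}
```

Incrementing unconditionally on every call would report one row for an empty input, which is why the counter checks that the list is not empty.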
(3) Register the class in hbase-site.xml
    <property>
      <name>hbase.coprocessor.region.classes</name>
      <value>package.RowCountEndPoint</value>
    </property>
(4) Invoke it
Use the coprocessorExec method of HTable:
    HTable mytable = new HTable(conf, "tablename");
    byte[] startKey = Bytes.toBytes("100000");
    byte[] endKey = Bytes.toBytes("999999");
    // Call getRowCount on every region in the key range; one map entry per region.
    Map<byte[], Object> map = mytable.coprocessorExec(
        RowCountProtocal.class, startKey, endKey,
        Batch.forMethod(RowCountProtocal.class, "getRowCount",
            new Object[]{new FirstKeyOnlyFilter()}));
    long total = 0;
    for (Map.Entry<byte[], Object> en : map.entrySet()) {
        System.out.println("Region " + Bytes.toString(en.getKey()) + " Count:" + en.getValue());
        total += (Long) en.getValue();
    }
    System.out.println("Total Count:" + total);
The computation resembles MapReduce: the Map phase runs on the RegionServers, and the Reduce phase runs on the client.