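Prerequisites: HBase and Python 2.7 are already installed.
Side note: it is best to set up your own virtual environment; all of the steps below were run inside one.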
(ma) hadoop@master:/usr/local/pycharm/bin$ sudo pip install thrift
[sudo] password for hadoop:
The directory '/home/hadoop/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/hadoop/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting thrift
Downloading thrift-0.10.0.zip (87kB)
100% |████████████████████████████████| 92kB 415kB/s
Requirement already satisfied: six>=1.7.2 in /usr/local/lib/python2.7/dist-packages (from thrift)
Installing collected packages: thrift
Running setup.py install for thrift ... done
Successfully installed thrift-0.10.0
(ma) hadoop@master:/usr/local/pycharm/bin$ sudo pip install hbase-thrift
[sudo] password for hadoop:
The directory '/home/hadoop/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/hadoop/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting hbase-thrift
Downloading hbase-thrift-0.20.4.tar.gz
Requirement already satisfied: Thrift in /usr/local/lib/python2.7/dist-packages (from hbase-thrift)
Requirement already satisfied: six>=1.7.2 in /usr/local/lib/python2.7/dist-packages (from Thrift->hbase-thrift)
Installing collected packages: hbase-thrift
Running setup.py install for hbase-thrift ... done
Successfully installed hbase-thrift-0.20.4
Start the Thrift server from HBase's bin directory with ./hbase-daemon.sh start thrift:
hadoop@master:/opt/Hadoop/hbase-1.3.1/bin$ ./hbase-daemon.sh start thrift
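Before running any client code it can help to confirm that the Thrift server is actually listening. The following is a minimal sketch, assuming the server uses port 9090 on localhost (the values used by the scripts in this post); the helper name thrift_port_is_open is made up for illustration. It only checks that something accepts TCP connections on that port, not that it speaks Thrift.

```python
import socket


def thrift_port_is_open(host='localhost', port=9090, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat all of these as "not up".
        return False
    conn.close()
    return True
```

If thrift_port_is_open() returns False, check the Thrift log under HBase's logs directory before debugging the Python side.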
Start PyCharm.
Note: launch it from inside the virtual environment; in other environments the programs may fail to run.
(ma) hadoop@master:/usr/local/pycharm/bin$ ./pycharm.sh
Reference: http://www.cnblogs.com/hitandrew/archive/2013/01/21/2870419.html (some of the examples in that document do not run correctly).
Create an HBase table:
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import ColumnDescriptor

# Connect to the HBase Thrift server started above (port 9090).
transport = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

# Create table 'test' with a single column family 'cf' that keeps one version per cell.
contents = ColumnDescriptor(name='cf:', maxVersions=1)
client.createTable('test', [contents])
print client.getTableNames()
transport.close()
Output:
/usr/bin/python2.7 /home/py/PycharmProjects/ThirdTest/testThrift.py
['member', 'test']
Process finished with exit code 0
Running list in the hbase shell shows the newly created test table.
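Running the creation script a second time fails, because the table already exists. One way around that is to check getTableNames() first. This is only a sketch: ensure_table is a hypothetical helper name, and client is assumed to be an Hbase.Client connected as in the script above.

```python
def ensure_table(client, table_name, column_descriptors):
    """Create table_name unless it already exists; return True if it was created."""
    if table_name in client.getTableNames():
        return False
    client.createTable(table_name, column_descriptors)
    return True
```

With the objects from the script above, the call would be ensure_table(client, 'test', [contents]).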
Insert data:
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import Mutation

transport = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

# Write value "1" to column cf:a of row 'row-key1' in table 'test'.
row = 'row-key1'
mutations = [Mutation(column="cf:a", value="1")]
client.mutateRow('test', row, mutations)
transport.close()
Running scan 'test' in the hbase shell shows the row that was just inserted:
hbase(main):001:0> scan 'test'
ROW COLUMN+CELL
row-key1 column=cf:a, timestamp=1506406128150, value=1
1 row(s) in 0.3570 seconds
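A close() call placed at the end of a script is skipped when an earlier Thrift call raises, which leaves the connection open. A small context manager avoids that. This is a sketch, not part of the thrift package: opened is a hypothetical name, and it relies only on the transport having open() and close() methods, which the TBufferedTransport used here does.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a Thrift transport and close it when the with-block exits."""
    transport.open()
    try:
        yield transport
    finally:
        # Runs on normal exit and when the block raises.
        transport.close()
```

The insert above would then read: with opened(transport): client.mutateRow('test', row, mutations)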
Get one row:
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

transport = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

tableName = 'test'
rowKey = 'row-key1'
# getRow returns a list of TRowResult objects (empty if the row does not exist).
result = client.getRow(tableName, rowKey)
print result
for r in result:
    print 'the row is ' , r.row
    print 'the values is ' , r.columns.get('cf:a').value
transport.close()
Output:
/usr/bin/python2.7 /home/py/PycharmProjects/ThirdTest/getOneRow.py
[TRowResult(columns={'cf:a': TCell(timestamp=1506406612641, value='2')}, row='row-key1')]
the row is row-key1
the values is 2
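Note that r.columns.get('cf:a').value raises AttributeError for a row that has no cf:a cell, because get returns None. Copying the result into plain dictionaries makes missing columns easier to handle. A minimal sketch follows; rows_to_dict is a hypothetical helper, and it assumes only the row, columns and value attributes visible in the output above.

```python
def rows_to_dict(results):
    """Turn a list of TRowResult-like objects into {row: {column: value}}."""
    table = {}
    for r in results:
        table[r.row] = dict((col, cell.value) for col, cell in r.columns.items())
    return table
```

For the result printed above this gives {'row-key1': {'cf:a': '2'}}, so a missing column can be read safely with rows_to_dict(result)['row-key1'].get('cf:b').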
Query multiple rows:
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

transport = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

tableName = 'test'
# Empty start and stop rows scan the whole table; an empty column list selects every column.
scanner_id = client.scannerOpenWithStop(tableName, '', '', [])
# Fetch at most 10 rows from the scanner.
result2 = client.scannerGetList(scanner_id, 10)
print result2
client.scannerClose(scanner_id)
transport.close()
Output:
/usr/bin/python2.7 /home/py/PycharmProjects/ThirdTest/getMultiRow.py
[TRowResult(columns={'cf:a': TCell(timestamp=1506406612641, value='2')}, row='row-key1'), TRowResult(columns={'cf:a': TCell(timestamp=1506406650902, value='2')}, row='row-key2')]
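A single scannerGetList call with a limit of 10 returns at most 10 rows, so a larger table needs repeated calls until an empty list comes back. The generator below is a sketch of that loop, assuming the four-argument scannerOpenWithStop used in this post; scan_rows is a hypothetical helper name. It also closes the scanner when iteration ends or fails.

```python
def scan_rows(client, table_name, batch_size=10, start_row='', stop_row=''):
    """Yield every row from start_row up to stop_row, fetching batch_size rows per call."""
    scanner_id = client.scannerOpenWithStop(table_name, start_row, stop_row, [])
    try:
        while True:
            batch = client.scannerGetList(scanner_id, batch_size)
            if not batch:
                # An empty list means the scanner is exhausted.
                break
            for row in batch:
                yield row
    finally:
        client.scannerClose(scanner_id)
```

Usage would look like: for r in scan_rows(client, 'test'): print r.row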