最近因工作需要,研究了一下boto3中dynamoDB部分,略有心得,在此总结一下。
首先是boto3的安装,在装有python和pip的机器上,运行
sudo pip install boto3
官网文档里,boto3提供的与dynamoDB交互的接口有以下几种:
batch_get_item()
batch_write_item()
can_paginate()
create_table()
delete_item()
delete_table()
describe_limits()
describe_table()
describe_time_to_live()
generate_presigned_url()
get_item()
get_paginator()
get_waiter()
list_tables()
list_tags_of_resource()
put_item()
query()
scan()
tag_resource()
untag_resource()
update_item()
update_table()
update_time_to_live()
说白了,就是对表和记录的增、删、查、改。本文主要描述我最近使用的那几个接口。
要在python中使用boto3,就得先import boto3。当然,这是废话。为了使用方便,我先写了一个json格式的配置文件,如下:
{ "region_name":"xxx", "aws_access_key_id":"xxx", "aws_secret_access_key":"xxx" }
然后封装了一个专门用于操作dynamoDB的类,目前什么都没有
class dynamodb_operation():
它需要一个读取json文件的方法:
def load_json(self,path): try: with open(path) as json_file: data = json.load(json_file) except Exception as e: print 'ERROR: no such file like ' + path exit(-1) else: return data
由于读进来的文件可能不是json格式,我这里就是想让他报个错,然后退出。如果不想让它退出,在except里改改就好了。
然后,我希望这个类有一个私有成员client,在我实例化对象的时候就建立好连接,于是,有了以下初始化方法:
def __init__(self,path): conf = self.load_json(path) self.client = boto3.client('dynamodb',region_name=conf['region_name'],aws_access_key_id=conf['aws_access_key_id'], aws_secret_access_key=conf['aws_secret_access_key'])
与之前的配置文件是对应的。
有了这个基础,就可以封装自己想要使用的方法了。各方法的在官网上的说明就不照搬过来了。
1、列出dynamoDB中的所有的表
def list_all_table(self): page=1 LastEvaluationTableName = "" while True: if page == 1: response = self.client.list_tables() else: response = self.client.list_tables( ExclusiveStartTableName=LastEvaluationTableName ) TableNames = response['TableNames'] for table in TableNames: print table if response.has_key('LastEvaluatedTableName'): LastEvaluationTableName = response["LastEvaluatedTableName"] else: break page += 1
list_table()方法一次最多只能获取100张表的表名,并且在每次返回的时候,key为"LastEvaluatedTableName"的值为最后一张表的表名,可以做为下次请求的时候的参数。这样循环调用,即可获取所有的表名。如果后面没有表了,response里将不会有LastEvaluatedTableName。此处我只是想把表名打印到终端,如果想保存起来,也是可以的。
2、获取某张表的信息 describe_table()
def get_table_desc_only(self,table): try: response = self.client.describe_table(TableName=table) except Exception as e: print 'ERROR: no such table like ' + table exit(-1) else: return response["Table"]
此处只是将response["Table"]原原本本地返回,没有做其它处理。
如果我想知道一张表的大小,可以:
def get_table_size(self,table): response = self.get_table_desc_only(table) stastic = {} stastic['TableSizeBytes'] = response['TableSizeBytes'] stastic['ItemCount'] = response['ItemCount'] return stastic
如果想知道其它信息,而且是只想知道那些信息的话,也可以写出对应的方法。
3、创建一张表
def create_table(self,tablename,keySchema,attributeDefinitions,provisionedThroughput): table = self.client.create_table( TableName=tablename, KeySchema=keySchema, AttributeDefinitions=attributeDefinitions, ProvisionedThroughput=provisionedThroughput ) # Wait until the table exists. self.client.get_waiter('table_exists').wait(TableName=tablename) response = self.client.describe_table(TableName=tablename) print response
这是在创建一张没有索引的表。创表需要时间,所以使用了get_waiter()方法。
4、插入数据
def put_item(self,tableName,item): try: self.client.put_item( TableName=tableName, Item=item ) except Exception as e: print 'ERROR: put item fail. msg: ' + str(e) exit(-1) else: return
封装的此方法需要传入的是一个格式正确的json,并且key要与表对应。比如:
{'uid':{'N':'999'},'aid':{'N':'999'},'sid':{'N':'999'},'ksid':{'N':'999'}}
5、删表
def delete_table(self,table): try: self.client.delete_table( TableName=table ) except Exception as e: print 'ERROR: delete table ' + table + ' fail. msg: ' + str(e) else: print 'delete table ' + table + ' succ'
其它方法不多说了。接下来就是表的备份与恢复。要做到什么程度呢,备份的时候,保存好表的结构,大小,以及所有条目,包括索引,恢复的时候,要能建一张一模一样的表,并把数据灌进去。
首先是备份表的结构。为了方便恢复表,对describe_table()方法的response进行了处理,同时对init方法进行修改:
def __init__(self,path): conf = self.load_json(path) self.client = boto3.client('dynamodb',region_name=conf['region_name'],aws_access_key_id=conf['aws_access_key_id'], aws_secret_access_key=conf['aws_secret_access_key']) self.conf_path = path self.items = ['TableName','AttributeDefinitions','KeySchema','LocalSecondaryIndexes','GlobalSecondaryIndexes','ProvisionedThroughput','StreamSpecification']
items里为创表时create_table()方法中的所有参数,其中,TableName,AttributeDefinitions,KeySchema,ProvisionedThroughput四项是创表时必传参数,另外三项为选传。同样地,describe_table()方法的response中,这四项也是一定存在的。故:
def get_SecondaryIndexes_desc(self,content): result = [] for sub_item in content: sub_content = {} sub_content['IndexName'] = sub_item['IndexName'] sub_content['KeySchema'] = sub_item['KeySchema'] sub_content['Projection'] = sub_item['Projection'] result.append(sub_content) return result
LocalSecondaryIndexes与GlobalSecondaryIndexes都是列表,所以默认不存在的项直接赋值一个空列表。
def get_table_desc_for_create_table(self,table): response = self.get_table_desc_only(table) result = {} for item in self.items: try: content = response[item] except Exception as e: continue else: if item == 'TableName': if content != table: print 'ERROR: dynamoDB get table desc error' exit(-1) result[item] = content elif item == 'LocalSecondaryIndexes' or item == 'GlobalSecondaryIndexes': result[item] = self.get_SecondaryIndexes_desc(content) continue elif item == 'ProvisionedThroughput': continue else: result[item] = content continue return json.dumps(result)
由于表的阈值不是固定的,所以不做保存。在创表的时候直接设置成一个固定的值即可。
对应地,恢复表的方法们:
def get_item_desc(self,item,content): try: result = content[item] except Exception as e: result = [] return result def create_table_from_desc(self,path): table_desc = self.load_json(path) provisionedThroughput={ 'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5 } tableName = self.get_item_desc('TableName',table_desc) attributeDefinitions = self.get_item_desc('AttributeDefinitions',table_desc) keySchema = self.get_item_desc('KeySchema',table_desc) localSecondaryIndexes = self.get_item_desc('LocalSecondaryIndexes',table_desc) globalSecondaryIndexes = self.get_item_desc('GlobalSecondaryIndexes',table_desc) streamSpecification = self.get_item_desc('StreamSpecification',table_desc) if len(globalSecondaryIndexes): for item in globalSecondaryIndexes: item['ProvisionedThroughput'] = provisionedThroughput try: if len(localSecondaryIndexes): if len(globalSecondaryIndexes): table = self.client.create_table( TableName=tableName, KeySchema=keySchema, AttributeDefinitions=attributeDefinitions, ProvisionedThroughput=provisionedThroughput, LocalSecondaryIndexes=localSecondaryIndexes, GlobalSecondaryIndexes=globalSecondaryIndexes ) else: table = self.client.create_table( TableName=tableName, KeySchema=keySchema, AttributeDefinitions=attributeDefinitions, ProvisionedThroughput=provisionedThroughput, LocalSecondaryIndexes=localSecondaryIndexes ) else: if len(globalSecondaryIndexes): table = self.client.create_table( TableName=tableName, KeySchema=keySchema, AttributeDefinitions=attributeDefinitions, ProvisionedThroughput=provisionedThroughput, GlobalSecondaryIndexes=globalSecondaryIndexes ) else: table = self.client.create_table( TableName=tableName, KeySchema=keySchema, AttributeDefinitions=attributeDefinitions, ProvisionedThroughput=provisionedThroughput ) except Exception as e: print 'ERROR: error desc like file: ' + path + ' msg: ' + str(e) exit(-1) else: # Wait until the table exists. self.client.get_waiter('table_exists').wait(TableName=tableName) response = self.client.describe_table(TableName=tableName) print response
传入的path为之前保存好的表的结构。由于两个索引非必传,所以写了四个创表的方法。之前尝试过,不存在索引的时候传空列表[],None,或者()之类,都会报错,只能用这么一样比较笨的方法。由于StreamSpecification我没有用到,所以只是写在这里。
由于时间关系,dump和灌表工具直接就使用git_hub中的 dynamo-archive了。不然自己实现一个也是极好的。
在考虑备份表的时候,最近是否使用到成为一个判断因素。此处要使用到cloudwatch。
def get_stastic(self,dimension): conf = self.load_json(self.conf_path) cw = boto3.client('cloudwatch',region_name=conf['region_name'],aws_access_key_id=conf['aws_access_key_id'], aws_secret_access_key=conf['aws_secret_access_key']) stastic={} stastic['Write'] = 0 stastic['Read'] = 0 # write table_stastic = cw.get_metric_statistics(Namespace='AWS/DynamoDB', MetricName='ConsumedWriteCapacityUnits', Dimensions=dimension, StartTime=datetime.utcnow()-timedelta(days=9), EndTime=datetime.utcnow(), Period=900, Statistics=['Sum', 'Maximum'], Unit='Count')['Datapoints'] if len(table_stastic) > 1: for item in table_stastic: stastic['Write'] += int(item['Sum']) #read table_stastic = cw.get_metric_statistics(Namespace='AWS/DynamoDB', MetricName='ConsumedReadCapacityUnits', Dimensions=dimension, StartTime=datetime.utcnow()-timedelta(days=9), EndTime=datetime.utcnow(), Period=900, Statistics=['Sum', 'Maximum'], Unit='Count')['Datapoints'] if len(table_stastic) > 1: for item in table_stastic: stastic['Read'] += int(item['Sum']) return stastic def get_table_use(self,table_name): dimension = [{'Name': 'TableName', 'Value': table_name}] stastic = self.get_stastic(dimension) table = self.get_table_desc_only(table_name) for index in table.get('GlobalSecondaryIndexes', []): if index['IndexStatus'] != 'ACTIVE': return dimension = [{'Name': 'TableName', 'Value': table_name}, {'Name': 'GlobalSecondaryIndexName', 'Value': index['IndexName']}] tmp = self.get_stastic(dimension) stastic['Write'] += tmp['Write'] stastic['Read'] += tmp['Read'] return stastic def check_table_is_use(self,stastic): read = stastic['Write'] write = stastic['Read'] if read == 0 and write == 0: return False else: return True
直接获取了最近9天的总使用量。
当然,如果想知道这张表是不是存在的,也是可以的
def check_table_is_exist(self,table): try: response = self.client.describe_table(TableName=table) except Exception as e: return 0 else: return 1
总体写得挺龊的QAQ
参考资料:http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html
原始文件 dynamodb_operation.py
1 #!/usr/bin/python 2 #-*- encoding: utf-8 -*- 3 4 import boto3 5 import json 6 import sys 7 from datetime import datetime, timedelta 8 9 class dynamodb_operation(): 10 def __init__(self,path): 11 conf = self.load_json(path) 12 self.client = boto3.client('dynamodb',region_name=conf['region_name'],aws_access_key_id=conf['aws_access_key_id'], aws_secret_access_key=conf['aws_secret_access_key']) 13 self.conf_path = path 14 self.items = ['TableName','AttributeDefinitions','KeySchema','LocalSecondaryIndexes','GlobalSecondaryIndexes','ProvisionedThroughput','StreamSpecification'] 15 16 def load_json(self,path): 17 try: 18 with open(path) as json_file: 19 data = json.load(json_file) 20 except Exception as e: 21 print 'ERROR: no such file like ' + path 22 exit(-1) 23 else: 24 return data 25 26 def create_table(self,tablename,keySchema,attributeDefinitions,provisionedThroughput): 27 table = self.client.create_table( 28 TableName=tablename, 29 KeySchema=keySchema, 30 AttributeDefinitions=attributeDefinitions, 31 ProvisionedThroughput=provisionedThroughput 32 ) 33 34 # Wait until the table exists. 35 self.client.get_waiter('table_exists').wait(TableName=tablename) 36 37 response = self.client.describe_table(TableName=tablename) 38 print response 39 40 def get_item_desc(self,item,content): 41 try: 42 result = content[item] 43 except Exception as e: 44 result = [] 45 return result 46 47 def create_table_from_desc(self,path): 48 table_desc = self.load_json(path) 49 provisionedThroughput={ 50 'ReadCapacityUnits': 5, 51 'WriteCapacityUnits': 5 52 } 53 tableName = self.get_item_desc('TableName',table_desc) 54 attributeDefinitions = self.get_item_desc('AttributeDefinitions',table_desc) 55 keySchema = self.get_item_desc('KeySchema',table_desc) 56 localSecondaryIndexes = self.get_item_desc('LocalSecondaryIndexes',table_desc) 57 globalSecondaryIndexes = self.get_item_desc('GlobalSecondaryIndexes',table_desc) 58 streamSpecification = self.get_item_desc('StreamSpecification',table_desc) 59 60 if len(globalSecondaryIndexes): 61 for item in globalSecondaryIndexes: 62 item['ProvisionedThroughput'] = provisionedThroughput 63 64 try: 65 if len(localSecondaryIndexes): 66 if len(globalSecondaryIndexes): 67 table = self.client.create_table( 68 TableName=tableName, 69 KeySchema=keySchema, 70 AttributeDefinitions=attributeDefinitions, 71 ProvisionedThroughput=provisionedThroughput, 72 LocalSecondaryIndexes=localSecondaryIndexes, 73 GlobalSecondaryIndexes=globalSecondaryIndexes 74 ) 75 else: 76 table = self.client.create_table( 77 TableName=tableName, 78 KeySchema=keySchema, 79 AttributeDefinitions=attributeDefinitions, 80 ProvisionedThroughput=provisionedThroughput, 81 LocalSecondaryIndexes=localSecondaryIndexes 82 ) 83 else: 84 if len(globalSecondaryIndexes): 85 table = self.client.create_table( 86 TableName=tableName, 87 KeySchema=keySchema, 88 AttributeDefinitions=attributeDefinitions, 89 ProvisionedThroughput=provisionedThroughput, 90 GlobalSecondaryIndexes=globalSecondaryIndexes 91 ) 92 else: 93 table = self.client.create_table( 94 TableName=tableName, 95 KeySchema=keySchema, 96 AttributeDefinitions=attributeDefinitions, 97 ProvisionedThroughput=provisionedThroughput 98 ) 99 100 except Exception as e: 101 print 'ERROR: error desc like file: ' + path + ' msg: ' + str(e) 102 exit(-1) 103 104 else: 105 # Wait until the table exists. 106 self.client.get_waiter('table_exists').wait(TableName=tableName) 107 108 response = self.client.describe_table(TableName=tableName) 109 print response 110 111 def get_table_desc_only(self,table): 112 try: 113 response = self.client.describe_table(TableName=table) 114 except Exception as e: 115 print 'ERROR: no such table like ' + table 116 exit(-1) 117 else: 118 return response["Table"] 119 120 def check_table_is_exist(self,table): 121 try: 122 response = self.client.describe_table(TableName=table) 123 except Exception as e: 124 return 0 125 else: 126 return 1 127 128 def get_SecondaryIndexes_desc(self,content): 129 result = [] 130 for sub_item in content: 131 sub_content = {} 132 sub_content['IndexName'] = sub_item['IndexName'] 133 sub_content['KeySchema'] = sub_item['KeySchema'] 134 sub_content['Projection'] = sub_item['Projection'] 135 result.append(sub_content) 136 return result 137 138 def get_table_desc_for_create_table(self,table): 139 response = self.get_table_desc_only(table) 140 result = {} 141 for item in self.items: 142 try: 143 content = response[item] 144 except Exception as e: 145 continue 146 else: 147 if item == 'TableName': 148 if content != table: 149 print 'ERROR: dynamoDB get table desc error' 150 exit(-1) 151 result[item] = content 152 153 elif item == 'LocalSecondaryIndexes' or item == 'GlobalSecondaryIndexes': 154 result[item] = self.get_SecondaryIndexes_desc(content) 155 continue 156 157 elif item == 'ProvisionedThroughput': 158 continue 159 160 else: 161 result[item] = content 162 continue 163 164 return json.dumps(result) 165 166 def get_table_size(self,table): 167 response = self.get_table_desc_only(table) 168 stastic = {} 169 stastic['TableSizeBytes'] = response['TableSizeBytes'] 170 stastic['ItemCount'] = response['ItemCount'] 171 return stastic 172 173 def list_all_table(self): 174 page=1 175 LastEvaluationTableName = "" 176 while True: 177 if page == 1: 178 response = self.client.list_tables() 179 else: 180 response = self.client.list_tables( 181 ExclusiveStartTableName=LastEvaluationTableName 182 ) 183 TableNames = response['TableNames'] 184 for table in TableNames: 185 print table 186 if response.has_key('LastEvaluatedTableName'): 187 LastEvaluationTableName = response["LastEvaluatedTableName"] 188 else: 189 break 190 page += 1 191 192 def get_stastic(self,dimension): 193 conf = self.load_json(self.conf_path) 194 cw = boto3.client('cloudwatch',region_name=conf['region_name'],aws_access_key_id=conf['aws_access_key_id'], aws_secret_access_key=conf['aws_secret_access_key']) 195 196 stastic={} 197 stastic['Write'] = 0 198 stastic['Read'] = 0 199 200 # write 201 table_stastic = cw.get_metric_statistics(Namespace='AWS/DynamoDB', MetricName='ConsumedWriteCapacityUnits', 202 Dimensions=dimension, 203 StartTime=datetime.utcnow()-timedelta(days=9), EndTime=datetime.utcnow(), 204 Period=900, Statistics=['Sum', 'Maximum'], Unit='Count')['Datapoints'] 205 206 if len(table_stastic) > 1: 207 for item in table_stastic: 208 stastic['Write'] += int(item['Sum']) 209 210 #read 211 table_stastic = cw.get_metric_statistics(Namespace='AWS/DynamoDB', MetricName='ConsumedReadCapacityUnits', 212 Dimensions=dimension, 213 StartTime=datetime.utcnow()-timedelta(days=9), EndTime=datetime.utcnow(), 214 Period=900, Statistics=['Sum', 'Maximum'], Unit='Count')['Datapoints'] 215 216 if len(table_stastic) > 1: 217 for item in table_stastic: 218 stastic['Read'] += int(item['Sum']) 219 220 return stastic 221 222 def get_table_use(self,table_name): 223 dimension = [{'Name': 'TableName', 'Value': table_name}] 224 stastic = self.get_stastic(dimension) 225 table = self.get_table_desc_only(table_name) 226 for index in table.get('GlobalSecondaryIndexes', []): 227 if index['IndexStatus'] != 'ACTIVE': 228 return 229 dimension = [{'Name': 'TableName', 'Value': table_name}, {'Name': 'GlobalSecondaryIndexName', 'Value': index['IndexName']}] 230 tmp = self.get_stastic(dimension) 231 stastic['Write'] += tmp['Write'] 232 stastic['Read'] += tmp['Read'] 233 234 return stastic 235 236 def check_table_is_use(self,stastic): 237 read = stastic['Write'] 238 write = stastic['Read'] 239 if read == 0 and write == 0: 240 return False 241 else: 242 return True 243 244 def delete_table(self,table): 245 try: 246 self.client.delete_table( 247 TableName=table 248 ) 249 except Exception as e: 250 print 'ERROR: delete table ' + table + ' fail. msg: ' + str(e) 251 else: 252 print 'delete table ' + table + ' succ' 253 254 def list_dynamodb_conf(self): 255 conf = self.load_json(self.conf_path) 256 print 'region_name=' + '"' + conf['region_name'] + '"' 257 print 'aws_access_key_id=' + '"' + conf['aws_access_key_id'] + '"' 258 print 'aws_secret_access_key=' + '"' + conf['aws_secret_access_key'] + '"' 259 260 def put_item(self,tableName,item): 261 try: 262 self.client.put_item( 263 TableName=tableName, 264 Item=item 265 ) 266 except Exception as e: 267 print 'ERROR: put item fail. msg: ' + str(e) 268 exit(-1) 269 else: 270 return 271 272 def put_items(self,tableName,item_path): 273 for item in open(item_path): 274 self.put_item(tableName,eval(item)) 275 276 if __name__ == "__main__": 277 if len(sys.argv) < 2: 278 print "cmd args" 279 print "list_all_table" 280 print "list_dynamodb_conf" 281 print "get_table_desc_for_create_table table" 282 print "get_table_desc_only table" 283 print "get_table_size table" 284 print "create_table_from_desc table_desc_file" 285 print "check_table_is_exist table" 286 print "get_table_use table" 287 print "delete_table table password" 288 print "put_item table item(json)" 289 print "put_items table item_file_path" 290 exit(-1) 291 292 db = dynamodb_operation('../conf/dynamoDB.conf') 293 294 cmd = str(sys.argv[1]) 295 if len(sys.argv) == 2: 296 if cmd == 'list_all_table': 297 db.list_all_table() 298 if cmd == 'list_dynamodb_conf': 299 db.list_dynamodb_conf() 300 301 if len(sys.argv) == 3: 302 if cmd == 'get_table_desc_for_create_table': 303 table = str(sys.argv[2]) 304 print db.get_table_desc_for_create_table(table) 305 306 if cmd == 'get_table_desc_only': 307 table = str(sys.argv[2]) 308 print db.get_table_desc_only(table) 309 310 if cmd == 'check_table_is_exist': 311 table = str(sys.argv[2]) 312 print db.check_table_is_exist(table) 313 314 if cmd == 'get_table_size': 315 table = str(sys.argv[2]) 316 print db.get_table_size(table) 317 318 if cmd == 'create_table_from_desc': 319 desc_file_path = str(sys.argv[2]) 320 db.create_table_from_desc(desc_file_path) 321 322 if cmd == 'get_table_use': 323 table = str(sys.argv[2]) 324 stastic = db.get_table_use(table) 325 print stastic 326 print db.check_table_is_use(stastic) 327 328 329 if len(sys.argv) == 4: 330 if cmd == 'delete_table': 331 table = str(sys.argv[2]) 332 password = str(sys.argv[3]) 333 if password == 'password': 334 db.delete_table(table) 335 else: 336 print 'ERROR: password error!' 337 exit(-1) 338 339 if cmd == 'put_item': 340 table = str(sys.argv[2]) 341 tmp = str(sys.argv[3]) 342 item = eval(tmp) 343 db.put_item(table,item) 344 345 if cmd == 'put_items': 346 table = str(sys.argv[2]) 347 item_file_path = str(sys.argv[3]) 348 db.put_items(table,item_file_path)