data persistence
https://docs.python.org/3.7/library/persistence.html
Supports storing Python in-memory data on disk in a persistent form, and restoring that data from disk back into memory.
The modules described in this chapter support storing Python data in a persistent form on disk.
The pickle and marshal modules can turn many Python data types into a stream of bytes and then recreate the objects from the bytes. The various DBM-related modules support a family of hash-based file formats that store a mapping of strings to other strings.
Serialization tool -- pickle
https://docs.python.org/3.7/library/pickle.html
Converts Python objects in memory into a binary byte stream, or restores such a byte stream back into objects in memory.
The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.
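The same round trip can also be done entirely in memory: pickle.dumps returns a bytes object and pickle.loads rebuilds objects from it, with no file involved. A minimal sketch; the variable names are illustrative. It also shows that an object referenced twice is still shared after unpickling.

```python
import pickle

# A nested structure in which two keys refer to the same list object.
shared = [1, 2, 3]
original = {"first": shared, "second": shared, "label": "demo"}

# pickle.dumps returns the serialized form as bytes instead of writing a file.
blob = pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL)

# pickle.loads rebuilds an equal but independent object hierarchy.
restored = pickle.loads(blob)

print(type(blob).__name__)                      # bytes
print(restored == original)                     # True
print(restored is original)                     # False
# Shared sub-objects stay shared after unpickling.
print(restored["first"] is restored["second"])  # True
```

dumps/loads are handy for sending pickled data over a socket or storing it in a database column; dump/load are the file-object versions used below.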
Saving
```python
import pickle

# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
```
Restoring
```python
import pickle

with open('data.pickle', 'rb') as f:
    # The protocol version used is detected automatically, so we do not
    # have to specify it.
    data = pickle.load(f)
```
Application-level tool --- shelve
https://docs.python.org/3.7/library/shelve.html
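pickle mainly provides the serialization and deserialization methods; writing the resulting bytes to disk is left to the developer's own code.
shelve provides a direct, dictionary-like interface, so the application only needs to care about changes to the data.
Unlike dbm, it can store values that are arbitrary (picklable) Python objects; under the hood it uses pickle for serialization.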
A “shelf” is a persistent, dictionary-like object.
The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects: anything that the pickle module can handle. This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys are ordinary strings.
```python
import shelve

# Placeholder values (hypothetical) so the snippet runs as-is.
filename, key, data = 'example_shelf', 'some_key', {'any': 'picklable object'}

d = shelve.open(filename)  # open -- file may get suffix added by low-level
                           # library

d[key] = data              # store data at key (overwrites old data if
                           # using an existing key)
data = d[key]              # retrieve a COPY of data at key (raise KeyError
                           # if no such key)
del d[key]                 # delete data stored at key (raises KeyError
                           # if no such key)

flag = key in d            # true if the key exists
klist = list(d.keys())     # a list of all existing keys (slow!)

# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2]        # this works as expected, but...
d['xx'].append(3)          # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!

# having opened d without writeback=True, you need to code carefully:
temp = d['xx']             # extracts the copy
temp.append(5)             # mutates the copy
d['xx'] = temp             # stores the copy right back, to persist it

# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.

d.close()                  # close it
```
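The writeback caveat in the comments above can be checked directly. A minimal sketch, assuming a throwaway temporary directory; the file name demo_shelf is hypothetical. Without writeback, each d["xx"] returns a fresh unpickled copy, so in-place mutation is lost; with writeback=True, accessed entries are cached and written back on close().

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_shelf")  # hypothetical file name

    # Default (writeback=False): d["xx"] is a new copy on every access.
    with shelve.open(path) as d:
        d["xx"] = [0, 1, 2]
        d["xx"].append(3)          # mutates a temporary copy only
        without_writeback = d["xx"]

    # writeback=True: the accessed object is cached and persisted on close().
    with shelve.open(path, writeback=True) as d:
        d["xx"].append(3)          # mutates the cached object
    with shelve.open(path) as d:
        with_writeback = d["xx"]

print(without_writeback)  # [0, 1, 2]
print(with_writeback)     # [0, 1, 2, 3]
```

The trade-off is the one noted above: writeback keeps every accessed entry in memory until sync() or close().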
Low-level storage tool -- dbm
https://docs.python.org/3.7/library/dbm.html
dbm is a generic interface to variants of the DBM database: dbm.gnu or dbm.ndbm. If none of these modules is installed, the slow-but-simple implementation in module dbm.dumb will be used. There is a third party interface to the Oracle Berkeley DB.
https://en.wikipedia.org/wiki/DBM_(computing)
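A key-value database and an early NoSQL system, with the advantage of fast lookups, because it uses the hash of the key as the index.
The same design also brings a drawback: updates are slow.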
In computing, a DBM is a library and file format providing fast, single-keyed access to data. A key-value database from the original Unix, dbm is an early example of a NoSQL system.[1][2][3]
The original dbm library and file format was a simple database engine, originally written by Ken Thompson and released by AT&T in 1979. The name is a three letter acronym for DataBase Manager, and can also refer to the family of database engines with APIs and features derived from the original dbm.
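To make "the hash of the key is the index" concrete, here is a toy in-memory sketch. The class ToyHashIndex and its methods are hypothetical and do not reproduce the file format of any real dbm module; they only show why a lookup is fast: the key's hash picks a single bucket, and only that bucket is scanned.

```python
import zlib


class ToyHashIndex:
    """Toy key-value store whose keys are hashed into a fixed set of buckets.

    Illustration only: NOT the on-disk format of dbm.gnu, dbm.ndbm or dbm.dumb.
    """

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key: bytes) -> list:
        # crc32 is deterministic across runs, unlike the salted built-in hash().
        return self.buckets[zlib.crc32(key) % len(self.buckets)]

    def put(self, key: bytes, value: bytes) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key: bytes):
        # Only one bucket is scanned, however many keys are stored overall.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None


idx = ToyHashIndex()
idx.put(b"hello", b"there")
idx.put(b"hello", b"again")
print(idx.get(b"hello"))    # b'again'
print(idx.get(b"missing"))  # None
```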
https://pymotw.com/3/dbm/index.html
Roughly, the data flows like this:
shelve --> pickle (object to bytes) --> dbm (bytes to file) --> DBM database file
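One way to see this layering is to give shelve.Shelf a plain dict as its backing store instead of a dbm file. This is a sketch for inspection only; in normal use shelve.open() supplies the dbm database. The dict then shows exactly what shelve hands to the storage layer: keys encoded to bytes and values as pickled bytes.

```python
import pickle
import shelve

# A plain dict stands in for the dbm file that shelve.open() would normally use.
backing = {}
shelf = shelve.Shelf(backing)

shelf["point"] = {"x": 1, "y": [2, 3]}  # any picklable value

# Keys are encoded to bytes (UTF-8 by default); values are stored pickled.
raw = backing[b"point"]
print(type(raw).__name__)   # bytes
print(pickle.loads(raw))    # {'x': 1, 'y': [2, 3]}
print(shelf["point"]["y"])  # [2, 3]

shelf.close()
```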
dbm is a front-end for DBM-style databases that use simple string values as keys to access records containing strings. It uses whichdb() to identify databases, then opens them with the appropriate module. It is used as a back-end for shelve, which stores objects in a DBM database using pickle.
```python
import dbm

# Open database, creating it if necessary.
with dbm.open('cache', 'c') as db:

    # Record some values
    db[b'hello'] = b'there'
    db['www.python.org'] = 'Python Website'
    db['www.cnn.com'] = 'Cable News Network'

    # Note that the keys are considered bytes now.
    assert db[b'www.python.org'] == b'Python Website'
    # Notice how the value is now in bytes.
    assert db['www.cnn.com'] == b'Cable News Network'

    # Often-used methods of the dict interface work too.
    print(db.get('python.org', b'not present'))

    # Storing a non-string key or value will raise an exception (most
    # likely a TypeError); it is caught here so the example runs to the end.
    try:
        db['www.yahoo.com'] = 4
    except TypeError as exc:
        print('rejected:', exc)

# db is automatically closed when leaving the with statement.
```
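The whichdb() function mentioned above can be tried on a freshly created database. A minimal sketch, assuming a temporary directory and a hypothetical database name cache; which backend name comes back depends on what is installed (for example 'dbm.gnu', 'dbm.ndbm' or 'dbm.dumb'), so the second result is printed without a fixed expected value.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache")  # hypothetical database name

    # Before anything exists, whichdb() returns None (missing or unreadable).
    before = dbm.whichdb(path)

    # dbm.open() picks an available backend and creates the database ('c').
    with dbm.open(path, "c") as db:
        db[b"hello"] = b"there"

    # Afterwards whichdb() names the module that can open the file.
    after = dbm.whichdb(path)

print(before)  # None
print(after)
```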
Relational database tool -- sqlite3
https://docs.python.org/3.7/library/sqlite3.html
A lightweight disk-based database that does not need a separate server process.
It can be queried with SQL.
SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage. It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle.
The sqlite3 module was written by Gerhard Häring. It provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249.
```python
import sqlite3

persons = [
    ("Hugo", "Boss"),
    ("Calvin", "Klein")
]

con = sqlite3.connect(":memory:")

# Create the table
con.execute("create table person(firstname, lastname)")

# Fill the table
con.executemany("insert into person(firstname, lastname) values (?, ?)", persons)

# Print the table contents
for row in con.execute("select firstname, lastname from person"):
    print(row)

print("I just deleted", con.execute("delete from person").rowcount, "rows")

# close is not a shortcut method and it's not called automatically,
# so the connection object should be closed manually
con.close()
```
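The example above uses an in-memory database, which disappears when the connection closes. For actual persistence the connection points at a file, and using the connection as a context manager (with con:) commits the pending transaction on success or rolls it back if an exception escapes. A minimal sketch; the file name people.db and the simulated failure are illustrative.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.db")  # hypothetical file name

    con = sqlite3.connect(path)
    con.execute("create table person(firstname, lastname)")

    # "with con:" commits when the block finishes normally...
    with con:
        con.execute("insert into person values (?, ?)", ("Hugo", "Boss"))

    # ...and rolls back when an exception leaves the block.
    try:
        with con:
            con.execute("insert into person values (?, ?)", ("Calvin", "Klein"))
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    con.close()

    # Reopen the file: only the committed row was persisted.
    con = sqlite3.connect(path)
    rows = con.execute("select firstname, lastname from person").fetchall()
    con.close()

print(rows)  # [('Hugo', 'Boss')]
```

Note that the with block does not close the connection; close() is still called explicitly, as in the example above.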
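Code-compilation cache tool for .pyc files --- marshal
Its binary format is specific to Python and may change between Python versions; it is not tied to the machine architecture.
It exists mainly to read and write the cached code in .pyc files.
It should not be used to exchange data over RPC; use pickle instead, whose byte stream can be exchanged between machines.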
This module contains functions that can read and write Python values in a binary format. The format is specific to Python, but independent of machine architecture issues (e.g., you can write a Python value to a file on a PC, transport the file to a Sun, and read it back there). Details of the format are undocumented on purpose; it may change between Python versions (although it rarely does).
This is not a general “persistence” module. For general persistence and transfer of Python objects through RPC calls, see the modules pickle and shelve. The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you’re serializing and de-serializing Python objects, use the pickle module instead: the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal.
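A small sketch of where marshal fits, using only marshal.dumps and marshal.loads; the sample values and the Point class are illustrative. Core built-in types and code objects (what a .pyc file holds) round-trip, while an instance of a user-defined class is rejected with ValueError, which is one reason pickle is the general-purpose choice.

```python
import marshal

# Core built-in types round-trip through marshal.
values = {"ints": [1, 2, 3], "text": "hi", "pair": (None, True, 2.5)}
round_tripped = marshal.loads(marshal.dumps(values))
print(round_tripped == values)  # True

# Code objects are marshal's main use case (.pyc files store them this way).
code = compile("result = 6 * 7", "<example>", "exec")
namespace = {}
exec(marshal.loads(marshal.dumps(code)), namespace)
print(namespace["result"])  # 42


class Point:
    pass


# Instances of user-defined classes are not supported by marshal.
refused = False
try:
    marshal.dumps(Point())
except ValueError:
    refused = True
print(refused)  # True
```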