A Guide to Data Serialization (1) [JSON]

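Data serialization is the process of converting an object's state into a form that can be stored or transmitted. So why serialize at all?

First, to make data easy to store;
Second, to make data easy to transmit.

During serialization, the current state of an object is written to temporary or persistent storage. The serialized data can then be sent elsewhere (for example, over a network) and deserialized to re-create the object. The current state of an object X running on node A can be thought of as a structure held in node A's memory. To pass X's state to node B, that state has to be dumped from memory and serialized. Common formats that support data serialization include XML, JSON, and YAML. This series starts with JSON.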

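1. What is JSON?

JSON stands for JavaScript Object Notation. In short, JSON is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate. It is derived from JavaScript and is based on a subset of the ECMAScript standard. JSON is a text format that is completely language independent, yet it uses conventions familiar to programmers of the C family of languages. These properties make JSON an ideal data-interchange format.

Important: JSON strings must be enclosed in double quotes. (This matters later in the series: YAML strings have no such restriction.)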
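
The double-quote rule is easy to check with Python's standard json module. The following is a minimal sketch; the helper name is_valid_json is made up for this illustration and is not part of any library.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

double_quoted = '{"name": "Grace"}'
single_quoted = "{'name': 'Grace'}"

print(is_valid_json(double_quoted))  # True
print(is_valid_json(single_quoted))  # False: single quotes are not valid JSON
```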

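2. The building blocks of JSON

JSON is built on two structures:

A collection of name/value pairs, i.e. an object, also known as a dictionary (dict). It is written with { } and resembles a Python dict.
An ordered list of values, i.e. an array. It is written with [ ] and resembles a Python list.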

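2.1 Object

In other languages, an object is also called a dictionary (dict), record, struct, hash table, keyed list, or associative array. In JSON, a collection of key/value pairs is called an object (P.S. I am used to calling it a dictionary). An object is an unordered set of key/value pairs. It begins with "{" and ends with "}". Each key is followed by ":" (colon), and key/value pairs are separated by "," (comma). For example: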

var Goddess = {
    "FirstName" : "Grace",
    "LastName" : "Liu",
    "Age" : "18"
};
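
To connect this with Python, here is a small sketch that parses the same key/value pairs as JSON text (without the JavaScript variable assignment) and reads them back from the resulting dict. The variable names are chosen for this illustration only.

```python
import json

# The object literal from above, as plain JSON text.
text = '{"FirstName": "Grace", "LastName": "Liu", "Age": "18"}'
goddess = json.loads(text)

print(type(goddess).__name__)   # dict
print(goddess["FirstName"])     # Grace
print(sorted(goddess.keys()))   # ['Age', 'FirstName', 'LastName']
```

Note that "Age" is a string here because the value is written in double quotes.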

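2.2 Array

An array is an ordered collection of values, much like an array in C and just like a Python list (the values need not share a type). An array begins with "[" (left bracket) and ends with "]" (right bracket). Values are separated by "," (comma). For example: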

var Students = [
    {"name":"John", "age":"23", "city":"Agra"},
    {"name":"Steve", "age":"28", "city":"Delhi"},
    {"name":"Peter", "age":"32", "city":"Chennai"},
    {"name":"Chaitanya", "age":"28", "city":"Bangalore"}
];
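
As a rough illustration of the "ordered" part, the sketch below parses a shortened version of the array above with Python's json module and shows that the elements come back as a list in their original order.

```python
import json

# A shortened version of the Students array, as plain JSON text.
text = '''[
    {"name": "John", "age": "23", "city": "Agra"},
    {"name": "Steve", "age": "28", "city": "Delhi"}
]'''
students = json.loads(text)

print(type(students).__name__)          # list
print(len(students))                    # 2
print([s["name"] for s in students])    # ['John', 'Steve']
```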

3. Value types

string
number
object (i.e. a dictionary)
array
boolean
null

3.1 String

A string is a sequence of zero or more Unicode characters wrapped in double quotes, using backslash escapes.
A single character is represented as a string of length one.
Strings are very similar to C strings.
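
A short sketch of backslash escaping, using Python's json module: double quotes and newlines inside a string are escaped on output and restored on input. The ensure_ascii parameter of json.dumps() controls whether non-ASCII characters are written as \uXXXX escapes.

```python
import json

original = 'say "hi"\n'           # contains double quotes and a newline
encoded = json.dumps(original)    # JSON text with backslash escapes
print(encoded)                    # "say \"hi\"\n"

# Decoding restores the original string exactly.
print(json.loads(encoded) == original)       # True

# Non-ASCII characters are escaped by default; ensure_ascii=False keeps them.
print(json.dumps("é"))                       # "\u00e9"
print(json.dumps("é", ensure_ascii=False))   # "é"
```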

3.2 Number

Numbers are also very similar to C numbers.
The octal and hexadecimal formats are not used.
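
The sketch below shows how Python's json module treats a few number spellings. The helper name parses is an assumption made for this example.

```python
import json

print(json.loads("42"))      # 42 (int)
print(json.loads("-3.5"))    # -3.5 (float)
print(json.loads("1e3"))     # 1000.0 (exponent notation yields a float)

def parses(text):
    """Return True if text is accepted as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses("0x1F"))   # False: hexadecimal is not allowed
print(parses("017"))    # False: a leading zero (octal style) is not allowed
```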

3.3 Object

An object is a dictionary (dict); see 2.1.

3.4 Array

An array is a list; see 2.2.

3.5 Boolean

Either true or false, corresponding to True/False in Python. Note that Python capitalizes the first letter, whereas JSON uses all lowercase.

3.6 Null

JSON's null value is written null, similar to NULL in C, None in Python, and nil in Go.

P.S. As 3.5 and 3.6 show, JSON's keywords are all lowercase.

4. Using JSON in Python

4.1 JSON value types vs. Python value types
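
Python's json module maps JSON types to Python types as follows: object to dict, array to list, string to str, number to int or float, true/false to True/False, and null to None. The sketch below confirms this by decoding one value of each kind; the one-letter keys are arbitrary names chosen for the example.

```python
import json

text = ('{"o": {}, "a": [], "s": "x", "i": 1, "r": 1.5, '
        '"t": true, "f": false, "n": null}')
obj = json.loads(text)

# Collect the Python type name of every decoded value.
types = {key: type(value).__name__ for key, value in obj.items()}
for key, name in types.items():
    print(key, name)

# In the other direction, a tuple is also encoded as a JSON array.
print(json.dumps((1, 2)))   # [1, 2]
```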

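4.2 Serializing a Python object to JSON text

Python provides a dedicated module, json. json.dump() writes a Python object to a file object as JSON text, and json.dumps() returns the JSON text as a string. For details, see the documentation of the json module.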
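
The difference between the two functions can be sketched in a few lines. Here io.StringIO stands in for an open text file, which is an assumption made to keep the example self-contained.

```python
import io
import json

obj = {"name": "John", "married": False, "spouse": None}

text = json.dumps(obj)     # returns the JSON text as a str

buf = io.StringIO()        # stands in for an open text file
json.dump(obj, buf)        # writes the JSON text to the file-like object

print(text)                     # {"name": "John", "married": false, "spouse": null}
print(buf.getvalue() == text)   # True
```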

foo_python2json.py

#!/usr/bin/python3

""" Serialize a Python Object by using json.dumps() """

import sys
import json

# The Python object to serialize: a dict holding a list of dicts.
obj = {
    "students": [
        {
            "name": "John",
            "age": 23,
            "city": "Agra",
            "married": False,
            "spouse": None
        },
        {
            "name": "Steve",
            "age": 28,
            "city": "Delhi",
            "married": True,
            "spouse": "Grace"
        },
        {
            "name": "Peter",
            "age": 32,
            "city": "Chennai",
            "married": True,
            "spouse": "Rachel"
        }
    ]
}

def main(argc, argv):
    if argc != 2:
        sys.stderr.write("Usage: %s <json file to save obj>\n" % argv[0])
        return 1

    with open(argv[1], 'w') as f:
        # Serialize obj to a JSON string, indented by 4 spaces.
        txt = json.dumps(obj, indent=4)
        print("DEBUG> " + str(type(obj)))
        print("DEBUG> " + str(obj))
        print("DEBUG> " + str(type(txt)))
        print("DEBUG> " + txt)
        f.write(txt + '\n')

    return 0

if __name__ == '__main__':
    sys.exit(main(len(sys.argv), sys.argv))

Run foo_python2json.py

huanli$ rm -f /tmp/foo.json
huanli$ ./foo_python2json.py /tmp/foo.json
DEBUG> <class 'dict'>
DEBUG> {'students': [{'spouse': None, 'age': 23, 'city': 'Agra', 'name': 'John', 'married': False}, {'spouse': 'Grace', 'age': 28, 'city': 'Delhi', 'name': 'Steve', 'married': True}, {'spouse': 'Rachel', 'age': 32, 'city': 'Chennai', 'name': 'Peter', 'married': True}]}
DEBUG> <class 'str'>
DEBUG> {
    "students": [
        {
            "spouse": null,
            "age": 23,
            "city": "Agra",
            "name": "John",
            "married": false
        },
        {
            "spouse": "Grace",
            "age": 28,
            "city": "Delhi",
            "name": "Steve",
            "married": true
        },
        {
            "spouse": "Rachel",
            "age": 32,
            "city": "Chennai",
            "name": "Peter",
            "married": true
        }
    ]
}
huanli$
huanli$ cat -n /tmp/foo.json
     1  {
     2      "students": [
     3          {
     4              "spouse": null,
     5              "age": 23,
     6              "city": "Agra",
     7              "name": "John",
     8              "married": false
     9          },
    10          {
    11              "spouse": "Grace",
    12              "age": 28,
    13              "city": "Delhi",
    14              "name": "Steve",
    15              "married": true
    16          },
    17          {
    18              "spouse": "Rachel",
    19              "age": 32,
    20              "city": "Chennai",
    21              "name": "Peter",
    22              "married": true
    23          }
    24      ]
    25  }
huanli$

4.3 Deserializing JSON text into a Python object

json.load() reads JSON text from a file object and json.loads() parses a JSON string; both return the corresponding Python object.

foo_json2python.py

#!/usr/bin/python3

""" Deserialize JSON text to a Python Object by using json.loads() """

import sys
import json

def main(argc, argv):
    if argc != 2:
        sys.stderr.write("Usage: %s <json file>\n" % argv[0])
        return 1

    with open(argv[1], 'r') as f:
        # Read the whole file as one string, then parse it.
        txt = f.read()
        obj = json.loads(txt)
        print("DEBUG> " + str(type(txt)))
        print("DEBUG> " + txt)
        print("DEBUG> " + str(type(obj)))
        print("DEBUG> " + str(obj))

    return 0

if __name__ == '__main__':
    sys.exit(main(len(sys.argv), sys.argv))

Run foo_json2python.py

huanli$ cat -n /tmp/foo.json
     1  {
     2      "students": [
     3          {
     4              "spouse": null,
     5              "age": 23,
     6              "city": "Agra",
     7              "name": "John",
     8              "married": false
     9          },
    10          {
    11              "spouse": "Grace",
    12              "age": 28,
    13              "city": "Delhi",
    14              "name": "Steve",
    15              "married": true
    16          },
    17          {
    18              "spouse": "Rachel",
    19              "age": 32,
    20              "city": "Chennai",
    21              "name": "Peter",
    22              "married": true
    23          }
    24      ]
    25  }
huanli$
huanli$ ./foo_json2python.py /tmp/foo.json
DEBUG> <class 'str'>
DEBUG> {
    "students": [
        {
            "spouse": null,
            "age": 23,
            "city": "Agra",
            "name": "John",
            "married": false
        },
        {
            "spouse": "Grace",
            "age": 28,
            "city": "Delhi",
            "name": "Steve",
            "married": true
        },
        {
            "spouse": "Rachel",
            "age": 32,
            "city": "Chennai",
            "name": "Peter",
            "married": true
        }
    ]
}

DEBUG> <class 'dict'>
DEBUG> {'students': [{'city': 'Agra', 'name': 'John', 'married': False, 'spouse': None, 'age': 23}, {'city': 'Delhi', 'name': 'Steve', 'married': True, 'spouse': 'Grace', 'age': 28}, {'city': 'Chennai', 'name': 'Peter', 'married': True, 'spouse': 'Rachel', 'age': 32}]}
huanli$

Using json.load() directly also works, for example:

huanli$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
...<snip>....................................
>>> import json
>>> fd = open("/tmp/foo.json", "r")
>>> obj = json.load(fd)
>>> type(obj)
<class 'dict'>
>>> obj
{'students': [{'name': 'John', 'married': False, 'age': 23, 'city': 'Agra', 'spouse': None}, {'name': 'Steve', 'married': True, 'age': 28, 'city': 'Delhi', 'spouse': 'Grace'}, {'name': 'Peter', 'married': True, 'age': 32, 'city': 'Chennai', 'spouse': 'Rachel'}]}
>>>
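
One caveat worth sketching: serializing and then deserializing does not always give back an identical Python object, because JSON has fewer types than Python. In the example below (the sample data is made up), a tuple comes back as a list and an integer key comes back as a string.

```python
import json

obj = {"point": (1, 2), 1: "one"}

# Serialize, then deserialize again.
back = json.loads(json.dumps(obj))

print(back)          # {'point': [1, 2], '1': 'one'}
print(back == obj)   # False: the tuple became a list, the int key a str
```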

4.4 Serializing/deserializing custom Python objects

Python has a module, pickle, that can serialize most Python objects. For example:

>>> import pickle
>>> 
>>> a = 1 + 2j
>>> s = pickle.dumps(a)
>>> s
b'\x80\x03cbuiltins\ncomplex\nq\x00G?\xf0\x00\x00\x00\x00\x00\x00G@\x00\x00\x00\x00\x00\x00\x00\x86q\x01Rq\x02.'
>>> b = pickle.loads(s)
>>> b
(1+2j)
>>> b == a
True
>>>

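However, serializing a Python object that has no JSON counterpart (a complex number here, or an instance of a user-defined class) to JSON text is not as easy. See for yourself: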

>>> import json
>>> a = 1 + 2j
>>> type(a)
<class 'complex'>
>>> s = json.dumps(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib64/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'complex' is not JSON serializable
>>>

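What to do?

Implement the serialization/deserialization hooks yourself;
then pass them to json.dumps() (the default parameter) and json.loads() (the object_hook parameter).

4.4.1 Serializing a custom Python object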

foo_encode.py

#!/usr/bin/python3

import sys
import json

def encode_complex(z):
    # Called by json.dumps() for objects it cannot serialize by itself.
    d_out = {}
    if isinstance(z, complex):
        d_out['__complex__'] = True
        d_out['real'] = z.real
        d_out['imag'] = z.imag
        return d_out
    else:
        type_name = z.__class__.__name__
        raise TypeError(f"Object of type '{type_name}' is not JSON serializable")

def main(argc, argv):
    if argc != 3:
        sys.stderr.write("Usage: %s <complex> <json file>\n" % argv[0])
        return 1

    z = complex(argv[1])
    f = argv[2]
    with open(f, 'w') as fd:
        txt = json.dumps(z, indent=4, default=encode_complex)
        fd.write(txt + '\n')

    return 0

if __name__ == '__main__':
    sys.exit(main(len(sys.argv), sys.argv))

Run foo_encode.py

huanli$ rm -f /tmp/foo.json 
huanli$ ./foo_encode.py '20+1.8j' /tmp/foo.json
huanli$ cat -n /tmp/foo.json 
     1    {
     2        "__complex__": true,
     3        "real": 20.0,
     4        "imag": 1.8
     5    }
huanli$

4.4.2 Deserializing a custom Python object

foo_decode.py

#!/usr/bin/python3

import sys
import json

def decode_complex(dct):
    # Called by json.loads() for every JSON object (dict) it decodes.
    if ('__complex__' in dct) and (dct['__complex__'] is True):
        return complex(dct['real'], dct['imag'])
    return dct

def main(argc, argv):
    if argc != 2:
        sys.stderr.write("Usage: %s <json file>\n" % argv[0])
        return 1

    f = argv[1]
    with open(f, 'r') as fd:
        txt = fd.read()
        z = json.loads(txt, object_hook=decode_complex)
        print(type(z))
        print(z)

    return 0

if __name__ == '__main__':
    sys.exit(main(len(sys.argv), sys.argv))

Run foo_decode.py

huanli$ cat -n /tmp/foo.json
     1  {
     2      "__complex__": true,
     3      "real": 20.0,
     4      "imag": 1.8
     5  }
huanli$ ./foo_decode.py /tmp/foo.json
<class 'complex'>
(20+1.8j)
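
The two hooks also work on values nested inside a larger structure. The following is a compact in-memory sketch, assuming the same "__complex__" marker convention as the scripts above; the sample data is made up for the illustration.

```python
import json

def encode_complex(z):
    """default hook: turn a complex number into a tagged dict."""
    if isinstance(z, complex):
        return {"__complex__": True, "real": z.real, "imag": z.imag}
    raise TypeError(f"Object of type '{type(z).__name__}' is not JSON serializable")

def decode_complex(dct):
    """object_hook: turn a tagged dict back into a complex number."""
    if dct.get("__complex__") is True:
        return complex(dct["real"], dct["imag"])
    return dct

data = {"id": 7, "values": [1 + 2j, 3.5]}

text = json.dumps(data, default=encode_complex)
back = json.loads(text, object_hook=decode_complex)

print(back)           # {'id': 7, 'values': [(1+2j), 3.5]}
print(back == data)   # True
```

The default function is only called for objects the encoder cannot handle by itself, while object_hook is called for every decoded JSON object, which is why decode_complex has to return unrelated dicts unchanged.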

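5. Comments in JSON

JSON itself does not support comments: you cannot annotate a JSON file with #, //, or /* ... */. If you really must, there is a workaround. When an object contains the same key more than once, many parsers, including Python's json module, keep only the last value. The JSON specification does not guarantee this behavior, so the trick is not portable across parsers. For example: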

qian.json

{
    "a": "# comments for field a: this is a string",
    "a": "qian",

    "b": "# comments for field b: this is a number",
    "b": 35,

    "c": "# comments for field c: this is a boolean",
    "c": true,

    "d": "# comments for field d: this is a null",
    "d": null,

    "e": "# comments for field e: this is an array",
    "e": [1, "abc", false, null],

    "f": "# comments for filed f: this is an object",
    "f": {"name": "qian", "age": 35}
}

Parsing it with foo_json2python.py from 4.3 gives:

$ ./foo_json2python.py /tmp/qian.json
DEBUG> <class 'str'>
DEBUG> {
        "a": "# comments for field a: this is a string",
        "a": "qian",

        "b": "# comments for field b: this is a number",
        "b": 35,

        "c": "# comments for field c: this is a boolean",
        "c": true,

        "d": "# comments for field d: this is a null",
        "d": null,

        "e": "# comments for field e: this is an array",
        "e": [1, "abc", false, null],

        "f": "# comments for filed f: this is an object",
        "f": {"name": "qian", "age": 35}
}

DEBUG> <class 'dict'>
DEBUG> {'a': 'qian', 'b': 35, 'c': True, 'd': None, 'e': [1, 'abc', False, None], 'f': {'name': 'qian', 'age': 35}}
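
Because the trick relies on duplicate keys, it can be useful to detect them. The sketch below uses the object_pairs_hook parameter of json.loads(), which receives every key/value pair in document order; the function name reject_duplicates is an assumption for this example.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of keeping the last value."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

text = '{"a": "# comment for field a", "a": "qian"}'

# Default behavior of Python's json module: the last value wins.
print(json.loads(text))   # {'a': 'qian'}

# With the hook, the duplicate is reported instead.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)            # duplicate key: 'a'
```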

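Summary:

As a text format for data serialization, JSON is simple and easy to use. In short:

It is lightweight (compared with XML)
It is language independent
It is easy to read and write
It is a text-based, human-readable data exchange format

Most languages support JSON, so serializing and deserializing data is easy. This article used Python to give sample code for both. By default, json.dump()/json.dumps() and json.load()/json.loads() are all you need; for custom object types, you additionally pass a default function to json.dump()/json.dumps() and an object_hook function to json.load()/json.loads().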

Postscript:

If a JSON file is not written cleanly, try formatting it with jsonfmt.py. The command-line JSON processor jq is also recommended, e.g.

$ jq -r <<< '{"name":"foo", "id": 123}'
{
  "name": "foo",
  "id": 123
}

$ jq -r .id <<< '{"name":"foo", "id": 123}'
123

$ jq -r .id,.name <<< '{"name":"foo", "id": 123}'
123
foo
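
Pretty-printing can also be done in a few lines of Python. This is only a minimal sketch of the idea, not the jsonfmt.py tool mentioned above; the function name reformat_json is hypothetical.

```python
import json

def reformat_json(text, indent=4):
    """Parse JSON text and re-emit it with consistent indentation (hypothetical helper)."""
    return json.dumps(json.loads(text), indent=indent, ensure_ascii=False)

pretty = reformat_json('{"name":"foo", "id": 123}', indent=2)
print(pretty)
```

The standard library offers the same service on the command line via python3 -m json.tool.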

Original article: https://www.cnblogs.com/idorax/p/9100154.html