Handling Chinese Text Encoding in Python 2

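Today I wrote a few scripts, and every one of them had to deal with mixed Chinese and English text. The requirement was to replace the Chinese punctuation in that text with the English equivalents.
For example: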
```
tags = '你好，good, 国语'
```
The full-width Chinese comma needs to be replaced with an English comma to make later processing easier.
The obvious attempt is:
```
tags = tags.replace('，', ',')
```
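If tags is actually a unicode string (for example, a value read from a web form such as request.forms.tags), this raises the following exception: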
UnicodeDecodeError: 'ascii' codec can't decode byte ...

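Python 2 has two kinds of strings: byte strings and unicode strings.
Generally, once the source file declares #coding=utf-8, every string literal that contains Chinese and has no u prefix is a UTF-8 encoded byte string.
Strings produced by functions, however, are often unicode strings (form values returned by a web framework, for instance).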

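When a byte string and a unicode string are mixed, Python 2 first tries to decode the byte string with the default ASCII codec. Chinese bytes are not valid ASCII, so a UnicodeDecodeError is raised.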
```
byte_str = 'hello, this is byte string'
unicode_str = u'hello, this is unicode string'
```
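The error can be reproduced without mixing types at all, because the failure comes from the ASCII decode that Python 2 performs implicitly. The sketch below makes that decode explicit. ascii_decodable is a hypothetical helper name, and the code is written so that it should behave the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-

# UTF-8 bytes of the full-width comma, i.e. what a byte string literal holds.
comma_bytes = u'，'.encode('utf-8')


def ascii_decodable(data):
    """Hypothetical helper: can this byte string be decoded as ASCII?

    This is the same decode Python 2 attempts on its own whenever a byte
    string meets a unicode string.
    """
    try:
        data.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


plain_ok = ascii_decodable(b'good')       # pure ASCII bytes decode fine
comma_ok = ascii_decodable(comma_bytes)   # non-ASCII bytes do not
```

This is why mixing works by accident for English-only data and only breaks once Chinese text shows up.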
So there are three possible solutions:
1. Convert everything to byte strings
2. Convert everything to unicode strings
3. Set the system default encoding

1. Convert everything to byte strings
```
'你好' + request.forms.tags.encode('utf-8')
```
2. Convert everything to unicode strings
```
u'你好' + request.forms.tags
```
Converting between byte strings and unicode strings:
```
b_s = 'test'                       # byte string
u_s = unicode(b_s, 'utf-8')        # decode the bytes into a unicode string
back_to_b_s = u_s.encode('utf-8')  # encode back into UTF-8 bytes
```
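In practice it is often unclear which of the two types a value has, so the conversion can be wrapped in small helpers that accept either. This is only a sketch: to_unicode and to_bytes are hypothetical names, UTF-8 is assumed as the default encoding, and bytes is used for the type check because it is an alias of str on Python 2.6+ and a real type on Python 3.

```python
# -*- coding: utf-8 -*-


def to_unicode(value, encoding='utf-8'):
    """Return value as a unicode string, decoding it if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding it if it is a unicode string."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


greeting = to_unicode(b'\xe4\xbd\xa0\xe5\xa5\xbd')  # UTF-8 bytes of u'你好'
raw = to_bytes(u'你好')
```

Calling these at the edges of a script (input and output) lets the rest of the code work with a single string type.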
3. Set the system default encoding
```
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
```
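After this, byte strings and unicode strings can be mixed freely, because implicit conversions now use UTF-8 instead of ASCII. Keep in mind that this changes behavior for the whole process and is widely discouraged.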

So the original problem now has a solution:
```
tags = tags.replace(unicode('，','utf-8'), ',')
```
or
```
tags = tags.encode('utf-8').replace('，', ',')
```
or
call setdefaultencoding to set the system default encoding.
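For more than one punctuation mark, the same idea can be sketched as a small function that decodes to unicode first and then applies a replacement table. PUNCTUATION_MAP and normalize_punctuation are hypothetical names, the table only lists a few marks as an illustration, and UTF-8 input is assumed for byte strings.

```python
# -*- coding: utf-8 -*-

# Hypothetical mapping; extend it with whatever punctuation the data contains.
PUNCTUATION_MAP = {
    u'，': u',',
    u'；': u';',
    u'：': u':',
}


def normalize_punctuation(text, encoding='utf-8'):
    """Replace full-width punctuation with ASCII equivalents.

    Accepts a byte string or a unicode string and always returns unicode.
    """
    if isinstance(text, bytes):
        text = text.decode(encoding)
    for full_width, ascii_char in PUNCTUATION_MAP.items():
        text = text.replace(full_width, ascii_char)
    return text


tags = normalize_punctuation(u'你好，good, 国语')
```

Because both the text and the table entries are unicode, no implicit ASCII decode takes place inside replace.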

Reading UTF-8 files is another common case.
The codecs module can be used for this:
```
import codecs
handler = codecs.open('test', 'r', 'utf-8')
u = handler.read()  # returns a unicode string from the UTF-8 bytes in the file
```
codecs also encodes any unicode string passed to write, using whatever encoding the file was opened with.
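A round trip through a file illustrates the write side. This is a minimal sketch: the file name tags.txt, the temporary directory, and the choice of GBK as the target encoding are assumptions made for the example.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

text = u'你好，good, 国语'

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'tags.txt')

# The unicode string passed to write() is encoded as GBK on its way to disk.
with codecs.open(path, 'w', 'gbk') as handler:
    handler.write(text)

# Reading with the same encoding returns the original unicode string.
with codecs.open(path, 'r', 'gbk') as handler:
    round_trip = handler.read()

# Reading in binary mode shows the encoded bytes that were actually stored.
with open(path, 'rb') as raw_file:
    raw_bytes = raw_file.read()
```

The rest of the program only ever sees unicode; the encoding is decided once, where the file is opened.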

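By default, Python 2 expects source code to be ASCII. To write Chinese in a source file, Python has to be told that the file is not ASCII-encoded.
Below the line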
```
#!/usr/bin/env python
```
add
```
# -*- coding: utf-8 -*-
```
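All of the above applies to Python 2. Python 3 keeps unicode strings (str) and byte strings (bytes) as clearly separate types, and its default encoding is no longer ASCII.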

References
http://www.evanjones.ca/python-utf8.html
Original post: https://www.cnblogs.com/jiangu66/p/3186881.html