对于mysql数据库而言,解决中文乱码,可以从两个方向考虑,一个是通过修改mysql服务器端的配置文件/etc/mysql/my.cnf来支持中文,比如:
... [mysql] default-character-set=utf8 ...
但是,修改配置文件需要重启服务,尤其是对于已经在线上运行的数据库或者“老”的数据库实例(有可能是多实例或集群)而言,显然通过修改配置文件来操作是不适合的,甚至是不被DBA允许的,通常能想到的办法是在客户端或者JDBC连接时定制支持中文的编码格式(通常使用UTF-8),这样插入数据的时候,让mysql为自动为我们转码,可行的办法有两种:
1、如果是通过DriverManager.getConnection(url)编码方式操作JDBC,可以在JDBC的url中追加useUnicode=true&characterEncoding=UTF-8解决乱码问题。
jdbc.url=jdbc:mysql://127.0.0.1:3306/mydb?useUnicode=true&characterEncoding=UTF-8
2、如果是通过其它数据源,比如DBCP、tomcat-jdbc、c3p0、spring-jdbc、hibernate读取配置文件,在url中追加useUnicode=true&characterEncoding=UTF-8是不起作用的,而是通过数据源自身的配置生效,比如下列配置:
<!-- Tomcat data source --> <bean id="dataSource" class="org.apache.tomcat.jdbc.pool.DataSource"> <property name="driverClassName" value="${jdbc.driverClassName}" /> <property name="url" value="${jdbc.url}" /> <property name="username" value="${jdbc.username}" /> <property name="password" value="${jdbc.password}" /> <property name="dbProperties"> <props> <prop key="useUnicode">yes</prop> <prop key="characterEncoding">utf8</prop> </props> </property> <!-- Configuration refer to optimizing connection performance --> <property name="initialSize" value="10" /> <property name="maxActive" value="100" /> <property name="maxIdle" value="50" /> <property name="minIdle" value="10" /> <property name="suspectTimeout" value="60" /> <property name="timeBetweenEvictionRunsMillis" value="30000" /> <property name="minEvictableIdleTimeMillis" value="60000" /> <property name="testOnBorrow" value="true" /> <property name="validationQuery" value="SELECT 1" /> <property name="validationInterval" value="30000" /> <!-- End Configuration refer to optimizing connection performance --> </bean>
其中:
<props> <prop key="useUnicode">yes</prop> <prop key="characterEncoding">utf8</prop> </props>
等价于url中的useUnicode=true&characterEncoding=UTF-8
查看当前数据库的字符集(切换到某个数据库前后比较):
mysql> show variables like 'char_%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec) mysql> use robot_classifymark; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> show variables like 'char_%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec)
可以看到数据库robot_classifymark采用的是utf8编码的,正好对应创建数据库时指定的编码,对应建库SQL:
CREATE DATABASE IF NOT EXISTS robot_classifymark DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
如果想在命令行下插入中文,可以执行set names utf8,此命令会影响三个参数:
mysql> show variables like 'char_%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec) mysql> set names utf8 -> ; Query OK, 0 rows affected (0.00 sec) mysql> show variables like 'char_%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+ 8 rows in set (0.00 sec)
可见:执行命令set names utf8,影响了三个参数:character_set_client,character_set_connection,character_set_results,如此,可以在当前命令行下插入中文了,但是,此命令只对当前命令行有效(*),如果想每次登入mysql命令行都生效,同样需要修改mysql服务器端的配置文件/etc/mysql/my.cnf
... [client] default-character-set=utf8 ...
所以,如果JDBC连接时,如果url未指定useUnicode=true&characterEncoding=UTF-8,可以变相的通过每次执行insert或select语句之前先执行set names utf8来插入或查询中文,显然,此方式没有在url中指定useUnicode=true&characterEncoding=UTF-8或者通过上面的数据源配置形式简洁!