Preface
MySQL use character set & collation to organize the different charater.It provides a flexible way in setting individual character set on a database,a table even on a single column of table.Each character set has a series of collations with one default collation.We can generally see the character set in MySQL as the combination of code page & character encoding.
In the early version of MySQL(eg. version 5.5) ,latin1 is the default character set which does not support Chinese characters.There're some other commonly used character set such as GBK,UTF-8.MySQL 5.7 chooses UTF-8 as default character set nowadays in order to support more characters of different languages.
Messy code is a general issue about character set.It always occur in our MySQL databases if we do not use character rules appropriately.Worse,it leads to data loss in some cases what is really a big trouble we should avoid.
Introduce
MySQL provides a lot of parameters to specify character set in various dimentionality.We should know clearly about the principle and function of each parameter to avert underlying messy code issue.Let's see details about it.
Procedure
Check the character set supported by MySQL(version 5.7).
1 (root@localhost mysql3306.sock)[(none)]>show character set; 2 +----------+---------------------------------+---------------------+--------+ 3 | Charset | Description | Default collation | Maxlen | 4 +----------+---------------------------------+---------------------+--------+ 5 | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | 6 | dec8 | DEC West European | dec8_swedish_ci | 1 | 7 | cp850 | DOS West European | cp850_general_ci | 1 | 8 | hp8 | HP West European | hp8_english_ci | 1 | 9 | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | 10 | latin1 | cp1252 West European | latin1_swedish_ci | 1 | 11 | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | 12 | swe7 | 7bit Swedish | swe7_swedish_ci | 1 | 13 | ascii | US ASCII | ascii_general_ci | 1 | 14 | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 | 15 | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 | 16 | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | 17 | tis620 | TIS620 Thai | tis620_thai_ci | 1 | 18 | euckr | EUC-KR Korean | euckr_korean_ci | 2 | 19 | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 | 20 | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | 21 | greek | ISO 8859-7 Greek | greek_general_ci | 1 | 22 | cp1250 | Windows Central European | cp1250_general_ci | 1 | 23 | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 | 24 | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | 25 | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 | 26 | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | 27 | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 | 28 | cp866 | DOS Russian | cp866_general_ci | 1 | 29 | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 | 30 | macce | Mac Central European | macce_general_ci | 1 | 31 | macroman | Mac West European | macroman_general_ci | 1 | 32 | cp852 | DOS Central European | cp852_general_ci | 1 | 33 | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | 34 | utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 | 35 | cp1251 | Windows Cyrillic | cp1251_general_ci | 1 | 36 | utf16 | UTF-16 Unicode | utf16_general_ci | 4 | 37 | utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 | 38 | cp1256 | Windows Arabic | cp1256_general_ci | 1 | 39 | cp1257 | Windows Baltic | cp1257_general_ci | 1 | 40 | utf32 | UTF-32 Unicode | utf32_general_ci | 4 | 41 | binary | Binary pseudo charset | binary | 1 | 42 | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 | 43 | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 | 44 | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 | 45 | gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 | 46 +----------+---------------------------------+---------------------+--------+ 47 41 rows in set (0.00 sec) 48 49 //There're 41 results of the supported character set. 50 //Each character set has a default collation. 51 //Maxlen is the max bytes of corresponding character set(eg. utf8mb4 supports 4 bytes).
Check character set parameters of current MySQL server.
1 (root@localhost mysql3306.sock)[(none)]>show variables like 'character%'; 2 +--------------------------+----------------------------------------------------------------+ 3 | Variable_name | Value | 4 +--------------------------+----------------------------------------------------------------+ 5 | character_set_client | utf8 | 6 | character_set_connection | utf8 | 7 | character_set_database | utf8 | 8 | character_set_filesystem | binary | 9 | character_set_results | utf8 | 10 | character_set_server | utf8 | 11 | character_set_system | utf8 | 12 | character_sets_dir | /usr/local/mysql-5.7.21-linux-glibc2.12-x86_64/share/charsets/ | 13 +--------------------------+----------------------------------------------------------------+ 14 8 rows in set (0.00 sec) 15 16 //character_set_client:It's used by client when connect to servers for requesting data. 17 //character_set_connection:It's used for those literals not have a character set introducer for conversion. 18 //character_set_database:It's used by default database.The value of "character_set_server" will be inherited if it is not specified. 19 //character_set_filesystem:It's used to interpret string literals refer to file names. 20 //character_set_results:It's used to return query results to the client. 21 //character_set_server:It's the default character set of server. 22 //character_set_system:It's used by server for storing identifiers,the value is utf8 forever. 23 //character_sets_dir:It's the directory where contains the xml files of installed character set.
The relationship of above character set parameters shows below.
Check collation parameters of current MySQL server.
1 (root@localhost mysql3306.sock)[(none)]>show variables like 'collation%'; 2 +----------------------+-----------------+ 3 | Variable_name | Value | 4 +----------------------+-----------------+ 5 | collation_connection | utf8_general_ci | 6 | collation_database | utf8_general_ci | 7 | collation_server | utf8_general_ci | 8 +----------------------+-----------------+ 9 3 rows in set (0.01 sec) 10 11 //collation_connection:The collation of connection character set. 12 //collation_database:The collation of default database.It will inherite the value of "collation_server" if not specified. 13 //collation_server:The default collation of server. 14 //"ci" means Case Insensitive.
Case of change character set from utf8 to latin1 with "set names ...;".
1 (root@localhost mysql3306.sock)[(none)]>s 2 -------------- 3 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 4 5 Connection id: 7 6 Current database: 7 Current user: root@localhost 8 SSL: Not in use 9 Current pager: stdout 10 Using outfile: '' 11 Using delimiter: ; 12 Server version: 5.7.21-log MySQL Community Server (GPL) 13 Protocol version: 10 14 Connection: Localhost via UNIX socket 15 Server characterset: utf8 16 Db characterset: utf8 17 Client characterset: utf8 18 Conn. characterset: utf8 19 UNIX socket: /tmp/mysql3306.sock 20 Uptime: 4 hours 33 min 11 sec 21 22 Threads: 1 Questions: 52 Slow queries: 0 Opens: 110 Flush tables: 1 Open tables: 103 Queries per second avg: 0.003 23 -------------- 24 25 (root@localhost mysql3306.sock)[(none)]>set names latin1; 26 Query OK, 0 rows affected (0.00 sec) 27 28 (root@localhost mysql3306.sock)[(none)]>s 29 -------------- 30 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 31 32 Connection id: 7 33 Current database: 34 Current user: root@localhost 35 SSL: Not in use 36 Current pager: stdout 37 Using outfile: '' 38 Using delimiter: ; 39 Server version: 5.7.21-log MySQL Community Server (GPL) 40 Protocol version: 10 41 Connection: Localhost via UNIX socket 42 Server characterset: utf8 43 Db characterset: utf8 44 Client characterset: latin1 45 Conn. characterset: latin1 46 UNIX socket: /tmp/mysql3306.sock 47 Uptime: 4 hours 33 min 18 sec 48 49 Threads: 1 Questions: 56 Slow queries: 0 Opens: 110 Flush tables: 1 Open tables: 103 Queries per second avg: 0.003 50 -------------- 51 52 (root@localhost mysql3306.sock)[(none)]>select @@character_set_client; 53 +------------------------+ 54 | @@character_set_client | 55 +------------------------+ 56 | latin1 | 57 +------------------------+ 58 1 row in set (0.00 sec) 59 60 (root@localhost mysql3306.sock)[(none)]>select @@character_set_connection; 61 +----------------------------+ 62 | @@character_set_connection | 63 +----------------------------+ 64 | latin1 | 65 +----------------------------+ 66 1 row in set (0.00 sec) 67 68 (root@localhost mysql3306.sock)[(none)]>select @@character_set_results; 69 +-------------------------+ 70 | @@character_set_results | 71 +-------------------------+ 72 | latin1 | 73 +-------------------------+ 74 1 row in set (0.00 sec) 75 76 (root@localhost mysql3306.sock)[(none)]>show variables like '%collation%'; 77 +----------------------+-------------------+ 78 | Variable_name | Value | 79 +----------------------+-------------------+ 80 | collation_connection | latin1_swedish_ci | 81 | collation_database | utf8_general_ci | 82 | collation_server | utf8_general_ci | 83 +----------------------+-------------------+ 84 3 rows in set (0.00 sec) 85 86 //The influence of command "set names latin1" to character set is to change "character_set_client","character_set_connection","character_set_results" into latin1. 87 //The influence of command "set names latin1" to collation is to change "collation_connection" into latin1.
Case of change character set from utf8 to latin1 with "set character set ...;".
1 (root@localhost mysql3306.sock)[(none)]>s 2 -------------- 3 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 4 5 Connection id: 7 6 Current database: 7 Current user: root@localhost 8 SSL: Not in use 9 Current pager: stdout 10 Using outfile: '' 11 Using delimiter: ; 12 Server version: 5.7.21-log MySQL Community Server (GPL) 13 Protocol version: 10 14 Connection: Localhost via UNIX socket 15 Server characterset: utf8 16 Db characterset: utf8 17 Client characterset: latin1 18 Conn. characterset: utf8 19 UNIX socket: /tmp/mysql3306.sock 20 Uptime: 4 hours 44 min 8 sec 21 22 Threads: 1 Questions: 72 Slow queries: 0 Opens: 111 Flush tables: 1 Open tables: 104 Queries per second avg: 0.004 23 -------------- 24 25 (root@localhost mysql3306.sock)[(none)]>select @@character_set_client; 26 +------------------------+ 27 | @@character_set_client | 28 +------------------------+ 29 | latin1 | 30 +------------------------+ 31 1 row in set (0.00 sec) 32 33 (root@localhost mysql3306.sock)[(none)]>select @@character_set_connection; 34 +----------------------------+ 35 | @@character_set_connection | 36 +----------------------------+ 37 | utf8 | 38 +----------------------------+ 39 1 row in set (0.00 sec) 40 41 (root@localhost mysql3306.sock)[(none)]>select @@character_set_results; 42 +-------------------------+ 43 | @@character_set_results | 44 +-------------------------+ 45 | latin1 | 46 +-------------------------+ 47 1 row in set (0.00 sec) 48 49 (root@localhost mysql3306.sock)[(none)]>show variables like '%collation%'; 50 +----------------------+-----------------+ 51 | Variable_name | Value | 52 +----------------------+-----------------+ 53 | collation_connection | utf8_general_ci | 54 | collation_database | utf8_general_ci | 55 | collation_server | utf8_general_ci | 56 +----------------------+-----------------+ 57 3 rows in set (0.00 sec) 58 59 //The two variables about "connection" was not altered this time.They were still utf8 relevant.
Case of change character set of server & database from utf8 to latin1.
1 (root@localhost mysql3306.sock)[(none)]>s 2 -------------- 3 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 4 5 Connection id: 7 6 Current database: 7 Current user: root@localhost 8 SSL: Not in use 9 Current pager: stdout 10 Using outfile: '' 11 Using delimiter: ; 12 Server version: 5.7.21-log MySQL Community Server (GPL) 13 Protocol version: 10 14 Connection: Localhost via UNIX socket 15 Server characterset: utf8 16 Db characterset: utf8 17 Client characterset: utf8 18 Conn. characterset: utf8 19 UNIX socket: /tmp/mysql3306.sock 20 Uptime: 4 hours 50 min 33 sec 21 22 Threads: 1 Questions: 92 Slow queries: 0 Opens: 111 Flush tables: 1 Open tables: 104 Queries per second avg: 0.005 23 -------------- 24 25 (root@localhost mysql3306.sock)[(none)]>set character_set_server=latin1; 26 Query OK, 0 rows affected (0.00 sec) 27 28 (root@localhost mysql3306.sock)[(none)]>set character_set_database=latin1; 29 Query OK, 0 rows affected, 1 warning (0.00 sec) 30 31 (root@localhost mysql3306.sock)[(none)]>show warnings; 32 +---------+------+-------------------------------------------------------------------------------------------------+ 33 | Level | Code | Message | 34 +---------+------+-------------------------------------------------------------------------------------------------+ 35 | Warning | 1681 | Updating 'character_set_database' is deprecated. It will be made read-only in a future release. | 36 +---------+------+-------------------------------------------------------------------------------------------------+ 37 1 row in set (0.00 sec) 38 39 (root@localhost mysql3306.sock)[(none)]>s 40 -------------- 41 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 42 43 Connection id: 7 44 Current database: 45 Current user: root@localhost 46 SSL: Not in use 47 Current pager: stdout 48 Using outfile: '' 49 Using delimiter: ; 50 Server version: 5.7.21-log MySQL Community Server (GPL) 51 Protocol version: 10 52 Connection: Localhost via UNIX socket 53 Server characterset: latin1 54 Db characterset: latin1 55 Client characterset: utf8 56 Conn. characterset: utf8 57 UNIX socket: /tmp/mysql3306.sock 58 Uptime: 4 hours 51 min 0 sec 59 60 Threads: 1 Questions: 98 Slow queries: 0 Opens: 111 Flush tables: 1 Open tables: 104 Queries per second avg: 0.005 61 -------------- 62 63 //It shows that change "character_set_server" online is not supported in future release because of safety concern. 64 //Change character set of database may bring about risk of data loss if your client program using the supersetwhile database using subset.(eg. client->utf8mb4,database->utf8,will lost emoji data.)
Example of messy code.
1 (root@localhost mysql3306.sock)[zlm]>create table test_charset( 2 -> s1 char(10) character set latin1 not null, 3 -> s2 char(10) char set gbk, 4 -> s3 varchar(10) charset utf8, 5 -> s4 varchar(10)) character set=utf8mb4 engine=innodb; 6 Query OK, 0 rows affected (0.01 sec) 7 8 (root@localhost mysql3306.sock)[zlm]>show create table test_charset; 9 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 10 | Table | Create Table | 11 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 12 | test_charset | CREATE TABLE `test_charset` ( 13 `s1` char(10) CHARACTER SET latin1 NOT NULL, 14 `s2` char(10) CHARACTER SET gbk DEFAULT NULL, 15 `s3` varchar(10) CHARACTER SET utf8 DEFAULT NULL, 16 `s4` varchar(10) DEFAULT NULL 17 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 | 18 +--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 19 1 row in set (0.00 sec) 20 21 (root@localhost mysql3306.sock)[zlm]>insert into test_charset values('ASCII','国标','万国','表情'); 22 Query OK, 1 row affected (0.00 sec) 23 24 (root@localhost mysql3306.sock)[zlm]>select * from test_charset; 25 +-------+--------+--------+--------+ 26 | s1 | s2 | s3 | s4 | 27 +-------+--------+--------+--------+ 28 | ASCII | 国标 | 万国 | 表情 | 29 +-------+--------+--------+--------+ 30 1 row in set (0.00 sec) 31 32 (root@localhost mysql3306.sock)[zlm]>s 33 -------------- 34 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 35 36 Connection id: 9 37 Current database: zlm 38 Current user: root@localhost 39 SSL: Not in use 40 Current pager: stdout 41 Using outfile: '' 42 Using delimiter: ; 43 Server version: 5.7.21-log MySQL Community Server (GPL) 44 Protocol version: 10 45 Connection: Localhost via UNIX socket 46 Server characterset: utf8 47 Db characterset: utf8 48 Client characterset: utf8 49 Conn. characterset: utf8 50 UNIX socket: /tmp/mysql3306.sock 51 Uptime: 5 hours 26 min 54 sec 52 53 Threads: 1 Questions: 123 Slow queries: 0 Opens: 118 Flush tables: 1 Open tables: 111 Queries per second avg: 0.006 54 -------------- 55 56 (root@localhost mysql3306.sock)[zlm]>set names latin1; 57 Query OK, 0 rows affected (0.00 sec) 58 59 (root@localhost mysql3306.sock)[zlm]>s 60 -------------- 61 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 62 63 Connection id: 9 64 Current database: zlm 65 Current user: root@localhost 66 SSL: Not in use 67 Current pager: stdout 68 Using outfile: '' 69 Using delimiter: ; 70 Server version: 5.7.21-log MySQL Community Server (GPL) 71 Protocol version: 10 72 Connection: Localhost via UNIX socket 73 Server characterset: utf8 74 Db characterset: utf8 75 Client characterset: latin1 76 Conn. characterset: latin1 77 UNIX socket: /tmp/mysql3306.sock 78 Uptime: 5 hours 18 min 0 sec 79 80 Threads: 1 Questions: 114 Slow queries: 0 Opens: 118 Flush tables: 1 Open tables: 111 Queries per second avg: 0.008 81 -------------- 82 83 (root@localhost mysql3306.sock)[zlm]> 84 85 (root@localhost mysql3306.sock)[zlm]>select * from test_charset; 86 +-------+------+------+------+ 87 | s1 | s2 | s3 | s4 | 88 +-------+------+------+------+ 89 | ASCII | ?? | ?? | ?? | 90 +-------+------+------+------+ 91 1 row in set (0.00 sec) 92 93 (root@localhost mysql3306.sock)[zlm]>set names gbk; 94 Query OK, 0 rows affected (0.00 sec) 95 96 (root@localhost mysql3306.sock)[zlm]>s 97 -------------- 98 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 99 100 Connection id: 9 101 Current database: zlm 102 Current user: root@localhost 103 SSL: Not in use 104 Current pager: stdout 105 Using outfile: '' 106 Using delimiter: ; 107 Server version: 5.7.21-log MySQL Community Server (GPL) 108 Protocol version: 10 109 Connection: Localhost via UNIX socket 110 Server characterset: utf8 111 Db characterset: utf8 112 Client characterset: gbk 113 Conn. characterset: gbk 114 UNIX socket: /tmp/mysql3306.sock 115 Uptime: 5 hours 29 min 22 sec 116 117 Threads: 1 Questions: 129 Slow queries: 0 Opens: 118 Flush tables: 1 Open tables: 111 Queries per second avg: 0.006 118 -------------- 119 120 (root@localhost mysql3306.sock)[zlm]>select * from test_charset; 121 +-------+------+------+------+ 122 | s1 | s2 | s3 | s4 | 123 +-------+------+------+------+ 124 | ASCII | ¹螠 | β¹� | ±� | 125 +-------+------+------+------+ 126 1 row in set (0.00 sec) 127 128 (root@localhost mysql3306.sock)[zlm]>set names utf8mb4; 129 Query OK, 0 rows affected (0.00 sec) 130 131 (root@localhost mysql3306.sock)[zlm]>s 132 -------------- 133 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 134 135 Connection id: 9 136 Current database: zlm 137 Current user: root@localhost 138 SSL: Not in use 139 Current pager: stdout 140 Using outfile: '' 141 Using delimiter: ; 142 Server version: 5.7.21-log MySQL Community Server (GPL) 143 Protocol version: 10 144 Connection: Localhost via UNIX socket 145 Server characterset: utf8 146 Db characterset: utf8 147 Client characterset: utf8mb4 148 Conn. characterset: utf8mb4 149 UNIX socket: /tmp/mysql3306.sock 150 Uptime: 5 hours 30 min 15 sec 151 152 Threads: 1 Questions: 134 Slow queries: 0 Opens: 118 Flush tables: 1 Open tables: 111 Queries per second avg: 0.006 153 -------------- 154 155 (root@localhost mysql3306.sock)[zlm]>select * from test_charset; 156 +-------+--------+--------+--------+ 157 | s1 | s2 | s3 | s4 | 158 +-------+--------+--------+--------+ 159 | ASCII | 国标 | 万国 | 表情 | 160 +-------+--------+--------+--------+ 161 1 row in set (0.00 sec) 162 163 //MySQL support define character set on database,table even on a single column. 164 //Messy code will occur when "character_set_result" is subset of the value of character set which has been stored only if it turns back to the value equal or bigger than the stored value.
Example of losting data.
1 (root@localhost mysql3306.sock)[(none)]>s 2 -------------- 3 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 4 5 Connection id: 4 6 Current database: 7 Current user: root@localhost 8 SSL: Not in use 9 Current pager: stdout 10 Using outfile: '' 11 Using delimiter: ; 12 Server version: 5.7.21-log MySQL Community Server (GPL) 13 Protocol version: 10 14 Connection: Localhost via UNIX socket 15 Server characterset: utf8 16 Db characterset: utf8 17 Client characterset: utf8 18 Conn. characterset: utf8 19 UNIX socket: /tmp/mysql3306.sock 20 Uptime: 1 min 45 sec 21 22 Threads: 2 Questions: 23 Slow queries: 0 Opens: 108 Flush tables: 1 Open tables: 101 Queries per second avg: 0.219 23 -------------- 24 25 (root@localhost mysql3306.sock)[(none)]>set @@character_set_server=latin1; 26 Query OK, 0 rows affected (0.00 sec) 27 28 (root@localhost mysql3306.sock)[(none)]>set @@character_set_database=latin1; 29 Query OK, 0 rows affected, 1 warning (0.01 sec) 30 31 (root@localhost mysql3306.sock)[(none)]>set @@character_set_connection=latin1; 32 Query OK, 0 rows affected (0.00 sec) 33 34 (root@localhost mysql3306.sock)[(none)]>s 35 -------------- 36 mysql Ver 14.14 Distrib 5.7.21, for linux-glibc2.12 (x86_64) using EditLine wrapper 37 38 Connection id: 4 39 Current database: 40 Current user: root@localhost 41 SSL: Not in use 42 Current pager: stdout 43 Using outfile: '' 44 Using delimiter: ; 45 Server version: 5.7.21-log MySQL Community Server (GPL) 46 Protocol version: 10 47 Connection: Localhost via UNIX socket 48 Server characterset: latin1 49 Db characterset: latin1 50 Client characterset: utf8 51 Conn. characterset: latin1 52 UNIX socket: /tmp/mysql3306.sock 53 Uptime: 2 min 16 sec 54 55 Threads: 2 Questions: 29 Slow queries: 0 Opens: 108 Flush tables: 1 Open tables: 101 Queries per second avg: 0.213 56 -------------- 57 58 (root@localhost mysql3306.sock)[(none)]>insert into test_charset values('ASCII','国标','万国','表情'); 59 ERROR 1046 (3D000): No database selected 60 (root@localhost mysql3306.sock)[(none)]>use zlm 61 Reading table information for completion of table and column names 62 You can turn off this feature to get a quicker startup with -A 63 64 Database changed 65 (root@localhost mysql3306.sock)[zlm]>insert into test_charset values('ASCII','国标','万国','表情'); 66 Query OK, 1 row affected, 3 warnings (0.01 sec) 67 68 (root@localhost mysql3306.sock)[zlm]>show warnings; 69 +---------+------+-----------------------------------------------------------+ 70 | Level | Code | Message | 71 +---------+------+-----------------------------------------------------------+ 72 | Warning | 1300 | Invalid utf8 character string: 'xE5x9BxBDxE6xA0x87' | 73 | Warning | 1300 | Invalid utf8 character string: 'xE4xB8x87xE5x9BxBD' | 74 | Warning | 1300 | Invalid utf8 character string: 'xE8xA1xA8xE6x83x85' | 75 +---------+------+-----------------------------------------------------------+ 76 3 rows in set (0.00 sec) 77 78 (root@localhost mysql3306.sock)[zlm]>select @@character_set_results; 79 +-------------------------+ 80 | @@character_set_results | 81 +-------------------------+ 82 | utf8 | 83 +-------------------------+ 84 1 row in set (0.00 sec) 85 86 (root@localhost mysql3306.sock)[zlm]>select * from test_charset; 87 +-------+--------+--------+--------+ 88 | s1 | s2 | s3 | s4 | 89 +-------+--------+--------+--------+ 90 | ASCII | 国标 | 万国 | 表情 | 91 | ASCII | ?? | ?? | ?? | 92 +-------+--------+--------+--------+ 93 2 rows in set (0.00 sec) 94 95 (root@localhost mysql3306.sock)[zlm]>set @@character_set_results=latin1; 96 Query OK, 0 rows affected (0.00 sec) 97 98 (root@localhost mysql3306.sock)[zlm]>select @@character_set_results; 99 +-------------------------+ 100 | @@character_set_results | 101 +-------------------------+ 102 | latin1 | 103 +-------------------------+ 104 1 row in set (0.00 sec) 105 106 (root@localhost mysql3306.sock)[zlm]>select * from test_charset; 107 +-------+------+------+------+ 108 | s1 | s2 | s3 | s4 | 109 +-------+------+------+------+ 110 | ASCII | ?? | ?? | ?? | 111 | ASCII | ?? | ?? | ?? | 112 +-------+------+------+------+ 113 2 rows in set (0.00 sec) 114 115 //The data of first row has been correctly resored in database. 116 //The data of second row has lost the Chinese character data. 117 //The value of "character_set_result" only influence the screen output. 118 //There's a data loss risk while value of character set of client is superset of the one of database.
Summary
- MySQL character set is flexible and various,be more careful when modify data.
- The parameter "default_character_set" only affect original mysql client not for the other client tools.
- Make sure your character set of client is a subset but superset of the value of database when modifying data.Meanwhile it needs to be small than character set of connection to avoid data loss.
- It's recommended to set character set of database to a big set such as utf8 even utf8mb4 to be compatible with most characters of various languages.