• 涂抹mysql笔记-mysql字符集


    字符集:查看mysql数据库当前都支持哪些字符集:
    system@(none)>show character set;
    +----------+-----------------------------+---------------------+--------+
    | Charset | Description | Default collation | Maxlen |
    +----------+-----------------------------+---------------------+--------+
    | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
    | dec8 | DEC West European | dec8_swedish_ci | 1 |
    | cp850 | DOS West European | cp850_general_ci | 1 |
    | hp8 | HP West European | hp8_english_ci | 1 |
    | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
    | latin1 | cp1252 West European | latin1_swedish_ci | 1 |
    | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
    | swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
    | ascii | US ASCII | ascii_general_ci | 1 |
    | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
    | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
    | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
    | tis620 | TIS620 Thai | tis620_thai_ci | 1 |
    | euckr | EUC-KR Korean | euckr_korean_ci | 2 |
    | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
    | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
    | greek | ISO 8859-7 Greek | greek_general_ci | 1 |
    | cp1250 | Windows Central European | cp1250_general_ci | 1 |
    | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
    | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
    | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
    | utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
    | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
    | cp866 | DOS Russian | cp866_general_ci | 1 |
    | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
    | macce | Mac Central European | macce_general_ci | 1 |
    | macroman | Mac West European | macroman_general_ci | 1 |
    | cp852 | DOS Central European | cp852_general_ci | 1 |
    | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
    | utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
    | cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
    | utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
    | utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
    | cp1256 | Windows Arabic | cp1256_general_ci | 1 |
    | cp1257 | Windows Baltic | cp1257_general_ci | 1 |
    | utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
    | binary | Binary pseudo charset | binary | 1 |
    | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
    | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
    | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
    +----------+-----------------------------+---------------------+--------+
    40 rows in set (0.00 sec)

    如何确定某个字符集支持哪些校对规则:
    一个字符集至少会拥有一个校对规则,查看latin1字符集所拥有的校队规则:
    system@(none)> show collation like 'latin1%';
    +-------------------+---------+----+---------+----------+---------+
    | Collation | Charset | Id | Default | Compiled | Sortlen |
    +-------------------+---------+----+---------+----------+---------+
    | latin1_german1_ci | latin1 | 5 | | Yes | 1 |
    | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
    | latin1_danish_ci | latin1 | 15 | | Yes | 1 |
    | latin1_german2_ci | latin1 | 31 | | Yes | 2 |
    | latin1_bin | latin1 | 47 | | Yes | 1 |
    | latin1_general_ci | latin1 | 48 | | Yes | 1 |
    | latin1_general_cs | latin1 | 49 | | Yes | 1 |
    | latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
    +-------------------+---------+----+---------+----------+---------+
    8 rows in set (0.01 sec)

    --character_set_server 指定全局粒度的默认字符集。
    --collation_server指定全局粒度的默认校队规则。

    查看字符集和校对规则:show global variables like '%server';

    system@(none)>show global variables like '%server';
    +----------------------+-----------------+
    | Variable_name | Value |
    +----------------------+-----------------+
    | character_set_server | utf8 |
    | collation_server | utf8_general_ci |
    +----------------------+-----------------+
    2 rows in set (0.00 sec)

    将全局粒度默认的字符集改为gbk:
    system@(none)>set global character_set_server=gbk;
    Query OK, 0 rows affected (0.00 sec)
    system@(none)>show global variables like '%server';
    +----------------------+----------------+
    | Variable_name | Value |
    +----------------------+----------------+
    | character_set_server | gbk |
    | collation_server | gbk_chinese_ci |
    +----------------------+----------------+
    2 rows in set (0.00 sec)

    小知识:全局粒度是什么意思?
    在mysql服务运行期间,修改系统变量的值也有作用域的。mysql中的系统变量的作用域分为全局(global)和当前会话(session)两类。对于全局的修改,作用域修改成功后新创建的会话,但对当前执行修改的会话无效,如果是会话级的修改(执行set命令并且未指定global选项就是会话级修改)则只作用于当前会话,本次会话结束后所做的修改也自动结束。此外需要注意mysql中即使是全局的参数修改,并作用域最多也只在当前mysql服务的生命周期内。mysql服务一旦重启那么值钱的设置也全部无效(不管是全局还是会话)因此要是希望所做的设置永久生效,那么除了在全局粒度修改外,还需要手动修改初始化参数文件,或者是在启动mysql服务时,在命令行中显式指定相关选项值。很多情况下,显示的字符出现乱码,也有可能是因为客户端当前的字符集设置与mysql服务端保存字符时所用的字符集不相符所致
    system@jason>show variables like 'character_set_client';
    +----------------------+-------+
    | Variable_name | Value |
    +----------------------+-------+
    | character_set_client | gbk |
    +----------------------+-------+
    1 row in set (0.00 sec)
    修改客户端所使用的字符集:set character_set_results=gbk;执行后与server端的字符集一致就不会出现乱码了。

    ~客户端发出的sql语句所使用的字符集由系统变量character_set_client来指定
    ~mysql服务端接收到语句后会用到character_set_connection和collation_connection两个系统变量中的设置。
    并且会将客户端发送的语句字符集由character_set_client转换到character_set_connection(除非用户执行语句时已对字符列明确指定了字符集)对于语句中指定的字符串的比较或排序,还需要应用collation_connection中指定的校对规则处理而对于语句中指定的列的比较则无关collation_connection的设置了,因为对象的表列拥有自己的校对规则,他们拥有更高的优先级。
    ~mysql服务端执行完语句后会按照character_set_results系统变量设定的字符集返回结果集(或错误信息)到客户端。

    system@jason>show global variables like 'character_set\_%';
    +--------------------------+--------+
    | Variable_name | Value |
    +--------------------------+--------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | utf8 |
    | character_set_server | gbk |
    | character_set_system | utf8 |
    +--------------------------+--------+
    7 rows in set (0.00 sec)
    这几个系统变量的值,默认继承自服务端启动时默认的字符集设置,也就是我们编译时指定的utf8.不过有一项例外,那是因为前面我们将character_set_server的值设置成了gbk

    system@jason> show variables like 'character_set\_%';
    +--------------------------+--------+
    | Variable_name | Value |
    +--------------------------+--------+
    | character_set_client | gbk |
    | character_set_connection | gbk |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | gbk |
    | character_set_server | utf8 |
    | character_set_system | utf8 |
    +--------------------------+--------+
    7 rows in set (0.00 sec)

    一次性修改所有与连接相关的字符集变量设置,例如设置当前会话字符集为utf8执行命令如下:
    system@jason> set names utf8;
    Query OK, 0 rows affected (0.00 sec)
    system@jason>show variables like 'character_set\_%';
    +--------------------------+--------+
    | Variable_name | Value |
    +--------------------------+--------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | utf8 |
    | character_set_server | utf8 |
    | character_set_system | utf8 |
    +--------------------------+--------+
    7 rows in set (0.00 sec)
    system@jason>set names gbk;
    Query OK, 0 rows affected (0.00 sec)

    system@jason>show variables like 'character_set\_%';
    +--------------------------+--------+
    | Variable_name | Value |
    +--------------------------+--------+
    | character_set_client | gbk |
    | character_set_connection | gbk |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | gbk |
    | character_set_server | utf8 |
    | character_set_system | utf8 |
    +--------------------------+--------+
    7 rows in set (0.00 sec)


    system@jason>set names utf8;
    Query OK, 0 rows affected (0.00 sec)

    system@jason>show variables like 'character_set\_%';
    +--------------------------+--------+
    | Variable_name | Value |
    +--------------------------+--------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | utf8 |
    | character_set_server | utf8 |
    | character_set_system | utf8 |
    +--------------------------+--------+
    7 rows in set (0.00 sec)
    set character set latin1;


    system@jason>set character set latin1;
    Query OK, 0 rows affected (0.00 sec)

    system@jason>show variables like 'character_set\_%';
    +--------------------------+--------+
    | Variable_name | Value |
    +--------------------------+--------+
    | character_set_client | latin1 |
    | character_set_connection | utf8 |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | latin1 |
    | character_set_server | utf8 |
    | character_set_system | utf8 |
    +--------------------------+--------+
    7 rows in set (0.00 sec)

    system@jason>

    创建5ienet数据库并指定该库的默认字符集为latin1
    system@jason>create database 5ienet charset latin1;
    Query OK, 1 row affected (0.01 sec)
    修改5ienet数据库的字符集为utf8
    system@(none)>alter database 5ienet charset utf8;
    Query OK, 1 row affected (0.00 sec)

    数据库级的字符集在数据库同名操作系统目录下的db.opt文件中。
    [mysql@linux01 5ienet]$ cat db.opt
    default-character-set=utf8
    default-collation=utf8_general_ci

    system@(none)>show create database 5ienet;
    +----------+-----------------------------------------------------------------+
    | Database | Create Database |
    +----------+-----------------------------------------------------------------+
    | 5ienet | CREATE DATABASE `5ienet` /*!40100 DEFAULT CHARACTER SET utf8 */ |
    +----------+-----------------------------------------------------------------+
    1 row in set (0.00 sec)

    表级指定创建一个t1表现不指定字符集:
    system@5ienet>create table t1 (id int);
    Query OK, 0 rows affected (0.02 sec)
    再重建一个t2表指定该表的默认字符集为latin1

    system@5ienet>create table t2 (id int) charset latin1;
    Query OK, 0 rows affected (0.02 sec)

    system@5ienet>show create table t1 G
    *************************** 1. row ***************************
    Table: t1
    Create Table: CREATE TABLE `t1` (
    `id` int(11) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)

    system@5ienet>show create table t2 G
    *************************** 1. row ***************************
    Table: t2
    Create Table: CREATE TABLE `t2` (
    `id` int(11) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)
    所以创建对象时没有指定字符集那么它会继承数据库粒度的字符集,如果明确指定了字符集那么该表的字符集就是我们指定的字符集。

    操作示例:
    创建表t3,拥有三个列 每个列指定不同的字符集
    create table t3(
    v1 varchar(20) charset latin1,
    v2 varchar(20) charset gbk,
    v3 varchar(20) charset utf8
    );
    设置当前客户端连接会话的字符集为utf8
    system@5ienet>set names utf8;
    Query OK, 0 rows affected (0.00 sec)

    system@5ienet>show variables like 'character%';
    +--------------------------+------------------------+
    | Variable_name | Value |
    +--------------------------+------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | utf8 |
    | character_set_server | gbk |
    | character_set_system | utf8 |
    | character_sets_dir | /mysql/share/charsets/ |
    +--------------------------+------------------------+
    8 rows in set (0.00 sec)

    system@5ienet> insert into t3 values('cn中国','cn中国','cn中国');
    ERROR 1366 (HY000): Incorrect string value: 'xD6xD0xB9xFA' for column 'v1' at row 1
    报错提示v1列指定了错误的字符串值,导致插入失败。这是因为latin1是西文字符集,并不支持中文这样的多字节字符。
    1366在mysql5.6之前是警告级别,可以插入数据,在5.6之后改为错误级别了。由sql_mode控制。

    system@5ienet>insert into t3 values('china','cn中国','cn中国');
    ERROR 1366 (HY000): Incorrect string value: 'xD6xD0xB9xFA' for column 'v2' at row 1
    这次提示v2有错误
    system@5ienet>insert into t3 values('china','china','cn中国');
    ERROR 1366 (HY000): Incorrect string value: 'xD6xD0xB9xFA' for column 'v3' at row 1
    提示v3错误
    插入时报错是因为客户端的字符集和存储所用的字符集不匹配所致。
    system@5ienet> show variables like 'character%';
    +--------------------------+------------------------+
    | Variable_name | Value |
    +--------------------------+------------------------+
    | character_set_client | utf8 |
    | character_set_connection | utf8 |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | utf8 |
    | character_set_server | gbk |
    | character_set_system | utf8 |
    | character_sets_dir | /mysql/share/charsets/ |
    +--------------------------+------------------------+
    8 rows in set (0.00 sec)

    system@5ienet>set names gbk;
    Query OK, 0 rows affected (0.00 sec)

    system@5ienet> show variables like 'character%';
    +--------------------------+------------------------+
    | Variable_name | Value |
    +--------------------------+------------------------+
    | character_set_client | gbk |
    | character_set_connection | gbk |
    | character_set_database | utf8 |
    | character_set_filesystem | binary |
    | character_set_results | gbk |
    | character_set_server | gbk |
    | character_set_system | utf8 |
    | character_sets_dir | /mysql/share/charsets/ |
    +--------------------------+------------------------+
    8 rows in set (0.00 sec)

    system@5ienet>insert into t3 values('china','cn中国','cn中国');
    Query OK, 1 row affected (0.00 sec)
    成功插入。
    system@5ienet>select * from t3;
    +-------+--------+--------+
    | v1 | v2 | v3 |
    +-------+--------+--------+
    | china | cn中国 | cn中国 |
    +-------+--------+--------+
    1 row in set (0.00 sec)
    所以连接时的字符集和存储时的字符集一致结果才能正常显示。那为啥utf8的v3列为什么也能正常显示,这是因为character_result被转换成了gbk字符集。把当前会话字符集改为utf8再做查询。
    system@5ienet> set names utf8;
    Query OK, 0 rows affected (0.00 sec)

    system@5ienet>select v1,v2,v3,length(v1),length(v2),length(v3) from t3;
    +-------+----------+----------+------------+------------+------------+
    | v1 | v2 | v3 | length(v1) | length(v2) | length(v3) |
    +-------+----------+----------+------------+------------+------------+
    | china | cn涓噁浗 | cn涓噁浗 | 5 | 6 | 8 |
    +-------+----------+----------+------------+------------+------------+
    1 row in set (0.00 sec)

    system@5ienet>set names gbk;
    Query OK, 0 rows affected (0.00 sec)

    system@5ienet>select * from t3;
    +-------+--------+--------+
    | v1 | v2 | v3 |
    +-------+--------+--------+
    | china | cn中国 | cn中国 |
    +-------+--------+--------+
    1 row in set (0.00 sec)

    system@5ienet> alter table t3 modify v3 varchar(20) charset gb2312;
    Query OK, 1 row affected (0.04 sec)
    Records: 1 Duplicates: 0 Warnings: 0

    system@5ienet>select * from t3;
    +-------+--------+--------+
    | v1 | v2 | v3 |
    +-------+--------+--------+
    | china | cn中国 | cn中国 |
    +-------+--------+--------+
    1 row in set (0.00 sec)

    system@5ienet>alter table t3 modify v3 varchar(20) charset latin1;
    ERROR 1366 (HY000): Incorrect string value: 'xD6xD0xB9xFA' for column 'v3' at row 1
    把v3改为latin1报错。

  • 相关阅读:
    OI中的小智慧
    洛谷 P2335 SDOI 2005 毒瘤 位图(也补上注释了)
    洛谷P4779 Dijkstra 模板
    洛谷 P1156 垃圾陷阱 谈论剪枝,非满分
    8/14考试 JWG
    一个好消息 JWG
    刷水题(一) JWG
    C语言运算符优先级从没像现在这样深刻体会
    cron 备忘
    CentOS
  • 原文地址:https://www.cnblogs.com/datalife/p/6723403.html
Copyright © 2020-2023  润新知