转 mysql 的推荐字符集和排序规则的使用介绍

转 mysql 的推荐字符集和排序规则的使用介绍

##sample 1

https://www.percona.com/blog/migrating-to-utf8mb4-things-to-consider/

解释0：
utf8mb4 是推荐的字符集和，但是在从utf8迁移到utf8mb4 字符集的时候，需要注意几个事项

Conclusion
Converting to the utf8mb4 character set brings you the advantages of better performance, a larger range of characters that you can use, including emojis and new collations (sorting rules). This conversion comes at almost no price, and it can be done smoothly.

Ensure:
但是在从utf8迁移到utf8mb4 字符集的时候，需要注意几个事项，比如字段最大长度，索引最大长度等等，具体sql可以查询
https://www.percona.com/blog/migrating-to-utf8mb4-things-to-consider/

You converted all VARCHAR columns that could store more than 16383 characters to the TEXT or LONGTEXT data type
You adjusted index definitions that could take more than 767 bytes for the REDUNDANT and COMPACT row formats, and 3072 bytes for DYNAMIC and COMPRESSED row formats after migration.
You optimized your queries so that they should not start using internal disk-based temporary tables

##sample 2

https://www.percona.com/blog/2017/03/28/troubleshooting-issues-with-mysql-character-sets-q-a/

解释1 ：每个级别，table ,database 都可以配置排序规则COLLATE， CHARSET，
column 级别可以配置 CHARSET，COLLATE

--table 配置 COLLATE

CREATE TABLE `his_config_info` (
`id` bigint unsigned NOT NULL,
`tenant_id` varchar(128) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '' COMMENT 'ç§Ÿæˆ·å—æ®µ',
PRIMARY KEY (`nid`) USING BTREE,
KEY `idx_gmt_create` (`gmt_create`) USING BTREE,
KEY `idx_gmt_modified` (`gmt_modified`) USING BTREE,
KEY `idx_did` (`data_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=196 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC COMMENT='å¤šç§Ÿæˆ·æ”¹é€ ' |

--database 配置 COLLATE
CREATE DATABASE `account` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */ /*!80016 DEFAULT ENCRYPTION='N' */ |

解释2 ： --table 的字段名字配置 character
注意的是如果是在字段级别配置的排序规则，那么所有字段的character规则必须是一致的，不然的话有在执行排序的时候，有可能可能导致报错
"Illegal mix of collations for operation 'concat_ws' ":

set names utf8是设置插入的客户的环境变量。类似nls_lang

如下为测试案例：表的cp1251的一个字段字符集为 latin1，一个字段字符集为cp1251，虽然可以插入成功，但是执行排序的时候会报错。

mysql> create table cp1251(f1 varchar(100) character set latin1, f2 varchar(100) character set cp1251) engine=innodb;
Query OK, 0 rows affected (0,32 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0,00 sec)

mysql> insert into cp1251 values('Sveta', 'Света');
Query OK, 1 row affected (0,07 sec)

sveta@Thinkie:~/build/mysql-8.0/mysql-test$ ~/build/percona-toolkit/bin/pt-table-checksum h=127.0.0.1,P=13000,u=root,D=test
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
03-18T03:51:58 Error executing EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*explain checksum table*/: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*explain checksum table*/"] at /home/sveta/build/percona-toolkit/bin/pt-table-checksum line 11351.

03-18T03:51:58 Error checksumming table db1.cp1251: Error executing checksum query: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `f1`, `f2`, CONCAT(ISNULL(`f1`), ISNULL(`f2`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db1`.`cp1251` /*checksum table*/" with ParamValues: 0='db1', 1='cp1251', 2=1, 3=undef, 4=undef, 5=undef] at /home/sveta/build/percona-toolkit/bin/pt-table-checksum line 10741.

A: With this combination, you won’t have any issues: pt-table-checksum uses a complicated set of functions that joins columns and calculates a crc32 checksum on them. In your case, all data will be converted to utf8mb4 and no conflicts will happen.

However, if you use incompatible character sets in a single table, you may get the error "Illegal mix of collations for operation 'concat_ws' ":

解释3：关于排序规则的使用范围，是指sql 在排序时候使用的，

如下为案例：yo
<charset name="utf8mb4">
<collation name="utf8mb4_russian_ci" id="1033">
<rules>
<reset>u0415</reset><p>u0451</p><t>u0401</t>
</rules>
</collaiton>
</charset>

mysql> show collation like 'utf8mb4_russian_ci';
+--------------------+---------+------+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------+---------+------+---------+----------+---------+
| utf8mb4_russian_ci | utf8mb4 | 1033 | | | 8 |
+--------------------+---------+------+---------+----------+---------+
1 row in set (0,03 sec)

mysql> create table test_yo(gen varchar(100) CHARACTER SET utf8mb4, yo varchar(100) CHARACTER SET utf8mb4 collate utf8mb4_russian_ci) engine=innodb default character set=utf8mb4;
Query OK, 0 rows affected (0,25 sec)

mysql> set names utf8mb4;
Query OK, 0 rows affected (0,02 sec)

mysql> insert into test_yo values('ел', 'ел'), ('ель', 'ель'), ('ёлка', 'ёлка');
Query OK, 3 rows affected (0,05 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> insert into test_yo values('Ел', 'Ел'), ('Ель', 'Ель'), ('Ёлка', 'Ёлка');
Query OK, 3 rows affected (0,06 sec)
Records: 3 Duplicates: 0 Warnings: 0

插入2个字段同样的记录，使用不同的字段作为排序字段，可以发现结果输出的不同之处， gen字段是默认的排序规则
mysql> select * from test_yo order by gen;
+----------+----------+
| gen | yo |
+----------+----------+
| ел | ел |
| Ел | Ел |
| ёлка | ёлка | <-
| Ёлка | Ёлка | <-
| ель | ель |
| Ель | Ель |
+----------+----------+
6 rows in set (0,00 sec)

插入2个字段同样的记录，使用不同的字段作为排序字段，可以发现结果输出的不同之处， yo字段是自定义的排序规则 utf8mb4_russian_ci
mysql> select * from test_yo order by yo;
+----------+----------+
| gen | yo |
+----------+----------+
| ел | ел |
| Ел | Ел |
| ель | ель | <-
| Ель | Ель | <-
| ёлка | ёлка |
| Ёлка | Ёлка |
+----------+----------+
6 rows in set (0,00 sec)

炊烟起了；夕阳下了；细雨来了多调试,交互式编程体验记录，独立思考，对比感谢转载作者修车国产化 read and connect 匍匐前进, 讲故事
相关阅读:
Ansible配置管理Windows主机
 Docker中安装rabbitmq并启用插件
 解决Ubuntu不能直接root用户连接ssh
Error in invoking target ‘agent nmhs’ of makefile ‘/home/dong/tools/oracle11g/product/11.2.0/dbhome_1/sysman/lib/ins_emagent.mk’
Ubuntu16.04安装sqlserver-快速入门
 记录一次归档日志爆满事件
 Linux下Mysql忘记root密码
 Ubuntu16.04安装Rabbitmq
Oracle查看执行计划
 Ubuntu16.04随笔
原文地址：https://www.cnblogs.com/feiyun8616/p/16134205.html

转 mysql 的推荐字符集 和 排序规则的使用介绍

转 mysql 的推荐字符集和排序规则的使用介绍