• Testing utf8mb4 case sensitivity and how to change it



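     Choosing between utf8mb4_unicode_ci and utf8mb4_general_ci
    Characters need to be stored, but they also need to be sorted and compared, and that is governed by the collation that goes with the character set. The two collations most commonly used with utf8mb4 are utf8mb4_unicode_ci and utf8mb4_general_ci. Which one to pick is discussed on Stack Overflow under "What’s the difference between utf8_general_ci and utf8_unicode_ci".
    The choice mainly comes down to two things, sorting accuracy and performance: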
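        Accuracy
        utf8mb4_unicode_ci sorts and compares according to the standard Unicode collation rules, so it orders text correctly across a wide range of languages.
        utf8mb4_general_ci does not implement the Unicode collation rules, so for certain languages or characters the sort order may not be what you expect.
        In the vast majority of cases, though, does the order of such special characters really need to be that exact? For example, unicode_ci treats ß and Œ as ss and OE, whereas general_ci treats them as s and e; likewise, accented letters such as à, á, and ā each compare equal to A.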
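        Performance
        utf8mb4_general_ci is faster at comparison and sorting.
        utf8mb4_unicode_ci uses a somewhat more complex algorithm so that it can handle those special characters.
        In the vast majority of cases, however, no such complex comparison takes place. general_ci may in theory be a little faster than unicode_ci, but on modern CPUs the difference is far too small to be a performance factor; index design and SQL design are what count. My personal recommendation is utf8mb4_unicode_ci. At the time of writing it looked very likely to become the default in 8.0; as released, MySQL 8.0 defaults utf8mb4 to the Unicode-based utf8mb4_0900_ai_ci.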
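    The accuracy difference between the two collations can be checked directly. The following is a minimal sketch, assuming a MySQL server with both collations available and a client that sends UTF-8; the column aliases are made up for illustration. According to the MySQL manual, ß compares equal to s under general_ci and to ss under unicode_ci, so both expressions would be expected to evaluate to true.

    ```sql
    -- Make string literals utf8mb4 so the COLLATE clauses below are valid
    SET NAMES utf8mb4;

    -- COLLATE binds tighter than '=', so each comparison uses the named collation
    SELECT
      'ß' = 's'  COLLATE utf8mb4_general_ci AS general_treats_as_s,
      'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_treats_as_ss;
    ```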

    # Testing utf8mb4 case sensitivity and how to change it
    
    -- The following make utf8mb4 case-insensitive
    # Alter the database:
    ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci;
    # Alter the table:
    ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
    # Alter a column:
    ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL;
    
    -- The following make utf8mb4 case-sensitive
    # Alter the database:
    ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin;
    # Alter the table:
    ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
    # Alter a column:
    ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
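
    Before running any of the ALTER statements above, it can help to see which collation is currently in effect at each level. This is a sketch using the standard information_schema views; database_name and table_name are the same placeholders used in the statements above and need to be replaced with real names.

    ```sql
    -- Default character set and collation of the database
    SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
    FROM information_schema.SCHEMATA
    WHERE SCHEMA_NAME = 'database_name';

    -- Character set and collation of each character column in the table
    SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA = 'database_name'
      AND TABLE_NAME = 'table_name'
      AND COLLATION_NAME IS NOT NULL;
    ```
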
    -- 1. Drop the database if it exists
    mysql> drop database if exists db2020; 
    Query OK, 0 rows affected, 1 warning (0.00 sec) 
    -- 2. Create a database with the utf8mb4 character set
    mysql> create database db2020 DEFAULT CHARACTER SET utf8mb4; 
    Query OK, 1 row affected (0.00 sec) 
    -- 3. View the CREATE DATABASE statement
    mysql> show create database db2020; 
    +----------+--------------------------------------------------------------------+ 
    | Database | Create Database                                                     | 
    +----------+--------------------------------------------------------------------+ 
    | db2020   | CREATE DATABASE `db2020` /*!40100 DEFAULT CHARACTER SET utf8mb4 */ | 
    +----------+--------------------------------------------------------------------+ 
    1 row in set (0.00 sec) 
    -- 4. Create the test table and data
    use db2020;
    -- drop table if exists tbl_test;
    create table tbl_test (
      id bigint(20) NOT NULL AUTO_INCREMENT,
      name varchar(20) NOT NULL,
      PRIMARY KEY (id),
      KEY idx_name (name)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    
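    -- 5. View the CREATE TABLE statement
    -- use db2020;
    -- Note: the ';' after \G sends an extra empty statement, which is why the harmless "No query specified" message follows the output
    mysql> show create table tbl_test\G;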
    *************************** 1. row ***************************
           Table: tbl_test
    Create Table: CREATE TABLE `tbl_test` (
      `id` bigint(20) NOT NULL AUTO_INCREMENT,
      `name` varchar(20) NOT NULL,
      PRIMARY KEY (`id`),
      KEY `idx_name` (`name`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
    1 row in set (0.05 sec)
    
    ERROR: 
    No query specified
    -- 6. Check the default character sets
    -- Method 1: show variables like '%character%';
    mysql> show variables like '%character%';
    +--------------------------+----------------------------------------------------------------+
    | Variable_name            | Value                                                          |
    +--------------------------+----------------------------------------------------------------+
    | character_set_client     | utf8                                                           |
    | character_set_connection | utf8                                                           |
    | character_set_database   | utf8mb4                                                        |
    | character_set_filesystem | binary                                                         |
    | character_set_results    | utf8                                                           |
    | character_set_server     | utf8mb4                                                        |
    | character_set_system     | utf8                                                           |
    | character_sets_dir       | /opt/mysql/mysql-5.6.43-linux-glibc2.12-x86_64/share/charsets/ |
    +--------------------------+----------------------------------------------------------------+
    8 rows in set (0.00 sec)
    
    -- Method 2: show variables like 'collation%';
    mysql> show variables like 'collation%';
    +----------------------+--------------------+
    | Variable_name        | Value              |
    +----------------------+--------------------+
    | collation_connection | utf8_general_ci    |
    | collation_database   | utf8mb4_general_ci |
    | collation_server     | utf8mb4_general_ci |
    +----------------------+--------------------+
    3 rows in set (0.00 sec)
    
    -- 8. View the default collation (the row with Default = Yes): show collation like 'utf8mb4%';
    mysql> show collation like 'utf8mb4%';
    +------------------------+---------+-----+---------+----------+---------+
    | Collation              | Charset | Id  | Default | Compiled | Sortlen |
    +------------------------+---------+-----+---------+----------+---------+
    | utf8mb4_general_ci     | utf8mb4 |  45 | Yes     | Yes      |       1 |
    | utf8mb4_bin            | utf8mb4 |  46 |         | Yes      |       1 |
    | utf8mb4_unicode_ci     | utf8mb4 | 224 |         | Yes      |       8 |
    ......
    ......
    +------------------------+---------+-----+---------+----------+---------+
    26 rows in set (0.52 sec)
    -- 9. Insert test data
    -- use db2020;
    insert into tbl_test(name) values('aaa'); 
    insert into tbl_test(name) values('bbb'); 
    insert into tbl_test(name) values('AAA'); 
    insert into tbl_test(name) values('BBB'); 
    mysql> select * from tbl_test; 
    +----+------+
    | id | name |
    +----+------+
    |  1 | aaa  |
    |  3 | AAA  |
    |  2 | bbb  |
    |  4 | BBB  |
    +----+------+
    4 rows in set (0.08 sec)
    
    mysql>  select * from tbl_test where name='aaa'; 
    +----+------+
    | id | name |
    +----+------+
    |  1 | aaa  |
    |  3 | AAA  |
    +----+------+
    2 rows in set (0.04 sec)
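
    If only an occasional query needs to distinguish case, the comparison can be made case-sensitive per query instead of altering the table. A minimal sketch against the tbl_test table above; the COLLATE clause is applied to the column (which is utf8mb4) rather than to the literal, because this session's connection character set is utf8. Overriding the column collation this way may keep the optimizer from using idx_name, so treat it as a spot check rather than a general solution.

    ```sql
    -- Case-sensitive match for this query only; the table definition is unchanged
    SELECT * FROM tbl_test WHERE name COLLATE utf8mb4_bin = 'aaa';
    ```
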
    
    -- 10. Comparisons are case-insensitive by default; change them to case-sensitive
    -- alter database db2020 character set=utf8mb4;
    alter database db2020 character set=utf8mb4 collate=utf8mb4_bin;
    -- alter table tbl_test convert to character set utf8mb4;
    alter table tbl_test convert to character set utf8mb4 collate utf8mb4_bin;
    -- Changing only the column, as below, is enough to make comparisons on it case-sensitive
    -- (CHANGE and MODIFY are equivalent here; run either one)
    -- alter table tbl_test change name name varchar(20) character set utf8mb4 collate utf8mb4_general_ci not null;
    -- alter table tbl_test modify name varchar(20) character set utf8mb4 collate utf8mb4_general_ci not null;
    alter table tbl_test change name name varchar(20) character set utf8mb4 collate utf8mb4_bin not null;
    alter table tbl_test modify name varchar(20) character set utf8mb4 collate utf8mb4_bin not null;
    mysql> alter database db2020 character set=utf8mb4 collate=utf8mb4_bin; 
    Query OK, 1 row affected (0.00 sec) 
    mysql> show create database db2020; 
    +----------+----------------------------------------------------------------------------------------+
    | Database | Create Database                                                                        |
    +----------+----------------------------------------------------------------------------------------+
    | db2020   | CREATE DATABASE `db2020` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin */ |
    +----------+----------------------------------------------------------------------------------------+
    1 row in set (0.00 sec) 
    mysql> select * from tbl_test where name='aaa'; 
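    +----+------+
    | id | name |
    +----+------+
    |  1 | aaa  |
    |  3 | AAA  |
    +----+------+
    2 rows in set (0.00 sec)
    -- Changing only the database-level collation is not enough, because it only sets the default for tables created afterwards; the existing table must be converted as well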
    mysql> alter table tbl_test convert to character set utf8mb4 collate utf8mb4_bin; 
    Query OK, 4 rows affected (0.08 sec)
    Records: 4  Duplicates: 0  Warnings: 0
    mysql> select * from tbl_test where name='aaa';
    +----+------+
    | id | name |
    +----+------+
    |  1 | aaa  |
    +----+------+
    1 row in set (0.00 sec) 
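
    To confirm that the conversion reached the column, and to show that a case-insensitive lookup is still available afterwards, something like the following could be run. This is a sketch against the same tbl_test table, assuming the CONVERT TO statement above has been applied.

    ```sql
    -- The Collation column should now show utf8mb4_bin for `name`
    SHOW FULL COLUMNS FROM tbl_test;

    -- A case-insensitive lookup is still possible per query on the converted column
    SELECT * FROM tbl_test WHERE name COLLATE utf8mb4_general_ci = 'aaa';
    ```
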
    -- Appendix: edit the MySQL configuration file and add the following settings:
    [client] 
    default-character-set = utf8mb4 
    
    [mysql] 
    default-character-set = utf8mb4 
    
    [mysqld] 
    character-set-client-handshake = FALSE 
    character-set-server = utf8mb4 
    collation-server = utf8mb4_unicode_ci 
    init_connect='SET NAMES utf8mb4'
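
    The configuration above only takes effect after the server is restarted and clients reconnect. For a quick trial in a single session, roughly the same client-side settings can be applied with SET NAMES. This is a sketch, not a replacement for the configuration file; it affects only the current connection.

    ```sql
    -- Session-level equivalent of the client settings; lasts until the connection closes
    SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;

    -- Confirm what the session is actually using
    SHOW VARIABLES LIKE 'character\_set\_%';
    SHOW VARIABLES LIKE 'collation%';
    ```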
    
     
    
     
  • Original article: https://www.cnblogs.com/bjx2020/p/10224985.html