一、位图索引
我将使用一个例子,来描述位图索引的存储,并分析它的优点。
Table :Loans 放贷信息
ID | userId | 行业投向 | 币种 | 证件类型 | 还本付息方式 | 状态 |
1 | 1 | 农业 | 人民币 | 身份证 | 等额本息还款法 | 已上报 |
2 | 2 | 农业 | 人民币 | 身份证 | 等本还款法 | 未上报 |
3 | 1 | 工业 | 人民币 | 护照 | 按季计息到期还本法 | 已上报 |
4 | 2 | 个体 | 人民币 | 身份证 | 等本还款法 | 已上报 |
5 | 5 | 其他 | 人民币 | 身份证 | 按月计息到期还本法 | 未上报 |
我对行业投向,和还本付息方式添加了位图索引
create bitmap index index_投向 on loans(行业投向);
create bitmap index index_还本付息方式 on loans(还本付息方式);
那么它会这么对位图索引进行存储:当前列的每一种值,存放在一个块中,通过0和1来标示改rownumber是否存在改值。
行业投向位图索引/还本付息方式
值/行 | 第一行 | 第二行 | 第三行 | 第四行 |
农业 | 1 | 1 | 0 | 0 |
工业 | 0 | 0 | 1 | 0 |
个体 | 0 | 0 | 0 | 1 |
其他 | 0 | 0 | 0 | 0 |
值/行 | 第一行 | 第二行 | 第三行 | 第四行 |
等额本息还款法 | 1 | 0 | 0 | 0 |
等本还款法 | 0 | 1 | 0 | 0 |
按季计息到期还本法 | 0 | 0 | 1 | 0 |
按月计息到期还本法 | 0 | 0 | 0 | 1 |
有图可以看出, 农业、工业、个体都各以一个块来存放 所有列“自己是否为真”。
所以暂时可以得出:
1、位图索引,必须创建在“仅仅几种值的情况”。
如果在低重复度的列上创建位图索引是很恐怖的,他将创建N多个块来存储。不论创建,还是查询,都是不聪明的。
2、位图索引,不适合放在常修改的字段列(如状态列)容易发生死锁。
位图索引死锁情况举例
--SESSION 1(持有者) DELETE FROM LOANS WHERE 行业投向='农业' AND status=1; ---SESSION 2(其他会话) 插入带'农业'的记录就立即被阻挡,以下三条语句都会被阻止 insert into loans(Id,投向.....) values (1,'农业',....); update t set 投向='工业' WHERE id=25; delete from loans WHERE 行业投向='农业'; --以下是可以进行不受阻碍的 insert into loans(Id,投向.....) values (1,'工业',....); delete from t where gender='工业' ; UPDATE T SET status='aa' WHERE ROWID NOT IN ( SELECT ROWID FROM T WHERE 投向='工业' ) ; --update只要不更新位图索引所在的列即可
3、索引通过 比特位 存储01,来标示真假,占用内存很小,检索效率极高。
count(*) where 行业投向 = 农业,效率是很高的,
当采集平台完成这些金融数据采集后,金融监管部门要对信息进行分析、统计,形成报表。有位图索引效率是很好的。
具体案例
1 /* 2 总结:本质原因:其实就是位图索引存放的是0,1的比特位,占字节数特别少。 3 */ 4 5 --位图索引跟踪前准备 6 drop table t purge; 7 set autotrace off 8 create table t as select * from dba_objects; 9 insert into t select * from t; 10 insert into t select * from t; 11 insert into t select * from t; 12 insert into t select * from t; 13 insert into t select * from t; 14 insert into t select * from t; 15 update t set object_id=rownum; 16 commit; 17 18 --观察COUNT(*)全表扫描的代价 19 set autotrace on 20 set linesize 1000 21 select count(*) from t; 22 23 24 25 ------------------------------------------ 26 COUNT(*) 27 ---------- 28 4684992 29 执行计划 30 ---------------------------------------------------------- 31 Plan hash value: 2966233522 32 33 ------------------------------------------------------------------- 34 | Id | Operation | Name | Rows | Cost (%CPU)| Time | 35 ------------------------------------------------------------------- 36 | 0 | SELECT STATEMENT | | 1 | 20420 (11)| 00:04:06 | 37 | 1 | SORT AGGREGATE | | 1 | | | 38 | 2 | TABLE ACCESS FULL| T | 294M| 20420 (11)| 00:04:06 | 39 ------------------------------------------------------------------- 40 统计信息 41 ---------------------------------------------------------- 42 0 recursive calls 43 0 db block gets 44 66731 consistent gets 45 0 physical reads 46 0 redo size 47 426 bytes sent via SQL*Net to client 48 415 bytes received via SQL*Net from client 49 2 SQL*Net roundtrips to/from client 50 0 sorts (memory) 51 0 sorts (disk) 52 1 rows processed 53 54 55 56 57 58 --观察COUNT(*)用普通索引的代价 59 create index idx_t_obj on t(object_id); 60 alter table T modify object_id not null; 61 set autotrace on 62 select count(*) from t; 63 64 65 66 67 68 COUNT(*) 69 ---------- 70 4684992 71 普通索引的执行计划 72 --------------------------------------------------------------------------- 73 | Id | Operation | Name | Rows | Cost (%CPU)| Time | 74 --------------------------------------------------------------------------- 75 | 0 | SELECT STATEMENT | | 1 | 3047 (2)| 00:00:37 | 76 | 1 | SORT AGGREGATE | | 1 | | | 77 | 2 | INDEX FAST FULL SCAN| IDX_T_OBJ | 4620K| 3047 (2)| 00:00:37 | 78 --------------------------------------------------------------------------- 79 普通索引的统计信息 80 ---------------------------------------------------------- 81 0 recursive calls 82 0 db block gets 83 10998 consistent gets 84 0 physical reads 85 0 redo size 86 426 bytes sent via SQL*Net to client 87 415 bytes received via SQL*Net from client 88 2 SQL*Net roundtrips to/from client 89 0 sorts (memory) 90 0 sorts (disk) 91 1 rows processed 92 93 94 95 96 --观察COUNT(*)用位图索引的代价(注意,这里我们特意取了status这个重复度很高的列做索引) 97 create bitmap index idx_bitm_t_status on t(status); 98 select count(*) from t; 99 100 SQL> select count(*) from t; 101 102 103 104 105 106 107 COUNT(*) 108 ---------- 109 4684992 110 111 位图索引的执行计划 112 ------------------------------------------------------------------------------------------- 113 | Id | Operation | Name | Rows | Cost (%CPU)| Time | 114 ------------------------------------------------------------------------------------------- 115 | 0 | SELECT STATEMENT | | 1 | 115 (0)| 00:00:02 | 116 | 1 | SORT AGGREGATE | | 1 | | | 117 | 2 | BITMAP CONVERSION COUNT | | 4620K| 115 (0)| 00:00:02 | 118 | 3 | BITMAP INDEX FAST FULL SCAN| IDX_BITM_T_STATUS | | | | 119 ------------------------------------------------------------------------------------------- 120 位图索引的统计信息 121 ---------------------------------------------------------- 122 0 recursive calls 123 0 db block gets 124 125 consistent gets 125 0 physical reads 126 0 redo size 127 426 bytes sent via SQL*Net to client 128 415 bytes received via SQL*Net from client 129 2 SQL*Net roundtrips to/from client 130 0 sorts (memory) 131 0 sorts (disk) 132 1 rows processed 133 134
二、反向索引
假如 我现在有些ID 100001,100002,100003,100004,100005 ,那么反向索引 ,他的索引创建的就是 100001,200001,300001,400001,500001。 由于序列本身有序,会根据范围放在不同的叶子块中
详见:索引,组合索引篇 那么索引就被放在不同的快中,有效的减少了热快争用。
再看一下这张图, 最下面就是叶子块 ,100001 和200001 和300001 会放在不同的块中,而一般常常会频繁的访问近期的数据,那么由于他们在不同的块中,在索引进行检索的时候,能够有效的减少资源竞争。
创建反向索引的sql
---反向索引 create index rev_index on t(column) reverse; ---将反向索引转换成普通索引。 alter index rev_index rebuild noreverse;
2、反向索引,在进行范围查询的时候无效,
3、反向索引无序了,所以无法走索引排序,
三、函数索引
我们现在,有一个场景:有一列数据是有大小写的,但是查询的时候,不需要区分大小写。
那么语句只能这么写 select * from t where upper(object_name)='T' ;
首先有一个常识,就是 走了函数查询,不会走索引。 就像有些查询 列的类型与值类型不匹,会进行值类型函数转换,然后无法进行索引查询。
eg: id为varchar类型 而查询语句为:select * from t where id = 1。 由于数据字段为varchar类型,而参数为number 类型,故会进行值类型转换。检索就走了全表扫描。
那么如何实现场景需求呢? 只能让函数索引一展身手:
create index idx_func_ojbnam on t( upper(object_name) );
upper()是Oracle内部函数
现在 select * from t where upper(object_name)='T' ; 这条语句就能走上索引。
请看具体案例
1 --测函数索引前准备 2 drop table t purge; 3 create table t as select * from dba_objects; 4 create index idx_object_id on t(object_id); 5 create index idx_object_name on t(object_name); 6 create index idx_created on t(created); 7 8 9 10 11 --对列做UPPER操作,无法用到索引 12 set autotrace traceonly 13 set linesize 1000 14 ---以下语句由于列运算,所以走的是全表扫描 15 select * from t where upper(object_name)='T' ; 16 执行计划 17 -------------------------------------------------------------------------- 18 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 19 -------------------------------------------------------------------------- 20 | 0 | SELECT STATEMENT | | 12 | 2484 | 293 (1)| 00:00:04 | 21 |* 1 | TABLE ACCESS FULL| T | 12 | 2484 | 293 (1)| 00:00:04 | 22 -------------------------------------------------------------------------- 23 统计信息 24 ---------------------------------------------------------- 25 0 recursive calls 26 0 db block gets 27 1049 consistent gets 28 0 physical reads 29 0 redo size 30 1500 bytes sent via SQL*Net to client 31 415 bytes received via SQL*Net from client 32 2 SQL*Net roundtrips to/from client 33 0 sorts (memory) 34 0 sorts (disk) 35 2 rows processed 36 37 --去掉列的UPPER操作后立即用索引 38 select * from t where object_name='T' ; 39 执行计划 40 ---------------------------------------------------------- 41 Plan hash value: 1138138579 42 43 ----------------------------------------------------------------------------------------------- 44 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 45 ----------------------------------------------------------------------------------------------- 46 | 0 | SELECT STATEMENT | | 2 | 414 | 4 (0)| 00:00:01 | 47 | 1 | TABLE ACCESS BY INDEX ROWID| T | 2 | 414 | 4 (0)| 00:00:01 | 48 |* 2 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 2 | | 3 (0)| 00:00:01 | 49 ----------------------------------------------------------------------------------------------- 50 统计信息 51 ---------------------------------------------------------- 52 0 recursive calls 53 0 db block gets 54 6 consistent gets 55 0 physical reads 56 0 redo size 57 1506 bytes sent via SQL*Net to client 58 415 bytes received via SQL*Net from client 59 2 SQL*Net roundtrips to/from client 60 0 sorts (memory) 61 0 sorts (disk) 62 2 rows processed 63 64 65 --如果必须用upper的条件,那你想用到索引,就得去建函数索引 66 create index idx_func_ojbnam on t(upper(object_name)); 67 --继续执行,终于走索引了。 68 69 select * from t where upper(object_name)='T' ; 70 执行计划 71 ----------------------------------------------------------------------------------------------- 72 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 73 ----------------------------------------------------------------------------------------------- 74 | 0 | SELECT STATEMENT | | 775 | 206K| 152 (0)| 00:00:02 | 75 | 1 | TABLE ACCESS BY INDEX ROWID| T | 775 | 206K| 152 (0)| 00:00:02 | 76 |* 2 | INDEX RANGE SCAN | IDX_FUNC_OJBNAM | 310 | | 3 (0)| 00:00:01 | 77 ----------------------------------------------------------------------------------------------- 78 统计信息 79 ---------------------------------------------------------- 80 0 recursive calls 81 0 db block gets 82 6 consistent gets 83 0 physical reads 84 0 redo size 85 1500 bytes sent via SQL*Net to client 86 415 bytes received via SQL*Net from client 87 2 SQL*Net roundtrips to/from client 88 0 sorts (memory) 89 0 sorts (disk) 90 2 rows processed 91 92 93 94 95 96 97
1 /* 2 结论:什么类型就放什么值,否则会发生类型转换,导致系能问题! 3 (是存放字符的字段就设varchar2类型,是存放数值的字段就设置number类型,是存放日期的字段就设置date类型) 4 这里的案例宏中 5 select * from t_col_type where id=6; 用不到索引,要改成select * from t_col_type where id='6'; 6 如果送来的参数无法保证是'6',只能写成select * from t_col_type where to_number(id)=6;并且建to_number(id)的函数索引 7 方可,这是很无奈的事。 8 9 */ 10 11 12 --举例说明: 13 drop table t_col_type purge; 14 create table t_col_type(id varchar2(20),col2 varchar2(20),col3 varchar2(20)); 15 insert into t_col_type select rownum,'abc','efg' from dual connect by level<=10000; 16 commit; 17 create index idx_id on t_col_type(id); 18 set linesize 1000 19 set autotrace traceonly 20 21 select * from t_col_type where id=6; 22 23 执行计划 24 -------------------------------------------------------------------------------- 25 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 26 -------------------------------------------------------------------------------- 27 | 0 | SELECT STATEMENT | | 1 | 36 | 9 (0)| 00:00:01 | 28 |* 1 | TABLE ACCESS FULL| T_COL_TYPE | 1 | 36 | 9 (0)| 00:00:01 | 29 -------------------------------------------------------------------------------- 30 1 - filter(TO_NUMBER("ID")=6) 31 统计信息 32 ---------------------------------------------------------- 33 0 recursive calls 34 0 db block gets 35 32 consistent gets 36 0 physical reads 37 0 redo size 38 540 bytes sent via SQL*Net to client 39 415 bytes received via SQL*Net from client 40 2 SQL*Net roundtrips to/from client 41 0 sorts (memory) 42 0 sorts (disk) 43 1 rows processed 44 45 --实际上只有如下写法才可以用到索引,这个很不应该,是什么类型的取值就设置什么样的字段。 46 47 select * from t_col_type where id='6'; 48 执行计划 49 ------------------------------------------------------------------------------------------ 50 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 51 ------------------------------------------------------------------------------------------ 52 | 0 | SELECT STATEMENT | | 1 | 36 | 2 (0)| 00:00:01 | 53 | 1 | TABLE ACCESS BY INDEX ROWID| T_COL_TYPE | 1 | 36 | 2 (0)| 00:00:01 | 54 |* 2 | INDEX RANGE SCAN | IDX_ID | 1 | | 1 (0)| 00:00:01 | 55 ------------------------------------------------------------------------------------------ 56 2 - access("ID"='6') 57 统计信息 58 ---------------------------------------------------------- 59 0 recursive calls 60 0 db block gets 61 4 consistent gets 62 0 physical reads 63 0544 bytes sent via SQL*Net to client 65 415 bytes received via SQL*Net from client 66 2 SQL*Net roundtrips to/to/from client 67 0 sorts (memory) 68 0 sorts (disk) 69 1 rows processed 70 71 create index idx_func_tonumber_id on t_col_type(to_number(id)); 72 select * from t_col_type where to_number(id)=6; 73lt;span style="color: #008080;">75 ---------------------------------------------------------------------------------------------------- 76 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 77t;>| Time | 77 ---------------------------------------------------------------------------------------------------- 78 | 0 | SELECT STATEMENT | | 100 | 4900 | 2 (0)| 00:00:01 | 79 | 1 | TABLE ACCESS BY INDEX ROWID| T_COL_TYPE | 100 | 4900 | 2 (0)| 00:00:01 | 80 |* 2 | INDEX RANGE SCAN | IDX_FUNC_TONUMBER_ID | 40 | | 1 (0)| 00:00:01 | 81 ---------------------------------------------------------------------------------------------------- 82 2 - access(TO_NUMBER("ID")=6) 83 统计信息 84 ---------------------------------------------------------- 85 0 recursive calls 86 0 db block gets 87 4 consistent gets 88 0 physical reads 89 0 redo size 90 540 bytes sent via SQL*Net to client 91 416 bytes received via SQL*Net from client 92 2 SQL*Net roundtrips to/from client 93 0 sorts (memory) 94 0 sorts (disk) 95 1 rows processed
2、自定义函数索引
首先创建一个自定义函数让 id-1 的形式创建序列,当然没什么意义。
create or replace function f_minus1(i int) return int DETERMINISTIC is begin return(i-1); end;
create index idx_test on test (f_minus1(object_id));
DETERMINISTIC关键字很重要
四、全文检索
干、、写了四个小时,Google 浏览器崩溃了。