• PostgreSQL index types and index bloating


    warehouse_db=# create table item (item_id integer not null,item_name text,item_price numeric,item_data text);
    CREATE TABLE
    warehouse_db=# create index item_idx on item(item_id);
    CREATE INDEX

    warehouse_db=# di item_idx
    List of relations
    Schema | Name | Type | Owner | Table
    --------+----------+-------+----------+-------
    public | item_idx | index | postgres | item
    (1 row)
    warehouse_db=# h create index
    Command: CREATE INDEX
    Description: define a new index
    Syntax:
    CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]
    warehouse_db=# di
    List of relations
    Schema | Name | Type | Owner | Table
    --------+--------------------------------+-------+----------+---------------
    public | PRIM_KEY | index | postgres | warehouse_tb1
    public | PRM_KEY | index | postgres | history
    public | cards_card_id_owner_number_key | index | postgres | cards
    public | item_idx | index | postgres | item
    public | item_item_id_idx | index | postgres | item
    public | movies_title_copies_excl | index | postgres | movies
    public | tools_pkey | index | postgres | tools
    (7 rows)

    warehouse_db=# di item_item_id_idx
    List of relations
    Schema | Name | Type | Owner | Table
    --------+------------------+-------+----------+-------
    public | item_item_id_idx | index | postgres | item
    (1 row)

    warehouse_db=# drop index item_item_id_idx ;
    DROP INDEX
    http://www.postgresql.org/docs/9.4/static/sql-createindex.html.
    types of index
    single index
    create index index_name on table_name(column);
    warehouse_db=# create index item_single_index on item (item_id);
    CREATE INDEX
    warehouse_db=# create index item_multi_index on item (item_id,item_price);
    CREATE INDEX
    partial index:creating an index on the subset of the table

    CREATE INDEX index_name ON table_name (column) WHERE (condition);

    warehouse_db=# CREATE INDEX item_partial_index ON item (item_id) WHERE
    (item_id < 106);
    warehouse_db=# d item;
    Table "item"
    Column | Type | Modifiers
    ------------+--------------------+-----------
    item_id | integer | not null
    item_name | text |
    item_price | numeric |
    item_data | text |
    Indexes:
    "item_index" btree (item_id)
    "item_multi_index" btree (item_id, item_price)
    "item_partial_index" btree (item_id) WHERE item_id < 106

    the unique index
    a unique index can be created on any column;it not only creates an index ,but also
    enforces uniqueness of the column.

    warehouse_db=# CREATE UNIQUE INDEX item_unique_idx ON item (item_id);
    CREATE INDEX
    Time: 485.644 ms
    warehouse_db=# d item_unique_idx;
    List of relations
    Schema | Name | Type | Owner | Table
    --------+-----------------+-------+----------+-------
    public | item_unique_idx | index | postgres | item
    (1 row)

    we can create a unique index explicitly using the CREATE UNIQUE
    INDEX command and that it can be created implicitly by declaring a primary key on a table.
    warehouse_db=# create table item
    warehouse_db-# (
    warehouse_db(# item_unique integer primary key,
    warehouse_db(# item_name text,
    warehouse_db(# item_price numeric,
    warehouse_db(# item_data text);
    CREATE TABLE
    warehouse_db=# d item
    Table "public.item"
    Column | Type | Modifiers
    -------------+---------+-----------
    item_unique | integer | not null
    item_name | text |
    item_price | numeric |
    item_data | text |
    Indexes:
    "item_pkey" PRIMARY KEY, btree (item_unique), tablespace "tbs_yl"
    Tablespace: "tbs_yl"

    Here is an example of an implicit creation of a unique index by defining unique
    constraints:
    warehouse_db=# alter table item add constraint primary_key unique(item_unique);
    ALTER TABLE
    warehouse_db=# d item;
    Table "public.item"
    Column | Type | Modifiers
    -------------+---------+-----------
    item_unique | integer | not null
    item_name | text |
    item_price | numeric |
    item_data | text |
    Indexes:
    "item_pkey" PRIMARY KEY, btree (item_unique), tablespace "tbs_yl"
    "primary_key" UNIQUE CONSTRAINT, btree (item_unique), tablespace "tbs_yl"
    Tablespace: "tbs_yl"
    The ALTER command adds a unique constraint to the item_id column and can be used as
    the primary key.

    explicitlycreate a unique index explicitly using the already discussed CREATE INDEX
    command as follows:
    warehouse_db=# create table item(
    warehouse_db(# item_id integer primary key,
    warehouse_db(# item_name text,
    warehouse_db(# item_price numeric,
    warehouse_db(# item_data text);
    CREATE TABLE
    warehouse_db=#
    warehouse_db=# d item
    Table "public.item"
    Column | Type | Modifiers
    ------------+---------+-----------
    item_id | integer | not null
    item_name | text |
    item_price | numeric |
    item_data | text |
    Indexes:
    "item_pkey" PRIMARY KEY, btree (item_id), tablespace "tbs_yl"
    Tablespace: "tbs_yl"
    warehouse_db=# create unique index idx_unique_id on item(item_id);
    CREATE INDEX
    warehouse_db=# d item;
    Table "public.item"
    Column | Type | Modifiers
    ------------+---------+-----------
    item_id | integer | not null
    item_name | text |
    item_price | numeric |
    item_data | text |
    Indexes:
    "item_pkey" PRIMARY KEY, btree (item_id), tablespace "tbs_yl"
    "idx_unique_id" UNIQUE, btree (item_id), tablespace "tbs_yl"
    Tablespace: "tbs_yl"

    warehouse_db=# insert into item values (1,'boxing',200,'glaves');
    INSERT 0 1
    warehouse_db=# insert into item values (1,'hockey',300,'shoes');
    ERROR: duplicate key value violates unique constraint "item_pkey"
    DETAIL: Key (item_id)=(1) already exists.
    warehouse_db=# insert into item values (2,'hockey',300,'shoes');
    INSERT 0 1

    the expression index
    For example, if we want to search for a case-insensitive item name,
    then the normal way of doing this is as follows:
    warehouse_db=# SELECT * FROM item WHERE UPPER(item_name) LIKE 'COFFEE';
    The preceding query will scan each row or table and convert item_name to uppercase and
    compare it with COFFEE; this is really expensive. The following is the command to create
    an expression index on the item_name column:
    warehouse_db=# create index item_expression_index on item(upper(item_name));
    CREATE INDEX
    warehouse_db=# d item;
    Table "public.item"
    Column | Type | Modifiers
    ------------+---------+-----------
    item_id | integer | not null
    item_name | text |
    item_price | numeric |
    item_data | text |
    Indexes:
    "item_pkey" PRIMARY KEY, btree (item_id), tablespace "tbs_yl"
    "idx_unique_id" UNIQUE, btree (item_id), tablespace "tbs_yl"
    "item_expression_index" btree (upper(item_name)), tablespace "tbs_yl"
    Tablespace: "tbs_yl"

    the implicit index
    An index that is created automatically by the database is called an implicit index. The
    primary key or unique constraint implicitly creates an index on that column.

    index
    creation on a table is a very expensive operation, and on a sizeably huge table, it can take
    hours to build an index. This can cause difficulty in regards to performing any write
    operations. To solve this issue, PostgreSQL has the concurrent index, which is useful
    when you need to add indexes in a live database.

    The syntax of a concurrent index is as follows:
    CREATE INDEX CONCURRENTLY index_name ON table_name using btree(column);

    The concurrent index is slower than the normal index because it completes index building
    in two parts. This can be explained with the help of the following example:

    warehouse_db=# CREATE INDEX idx_id ON item (item_id);
    Time: 8265.473 ms
    Time taken in creating a concurrent index idx_id using CREATE INDEX CONCURRENTLY:
    warehouse_db=# CREATE INDEX CONCURRENTLY idx_id ON item (item_id);
    Time: 51887.942 ms

    index types
    PostgreSQL supports the B-tree, hash, GiST, and GIN index methods. The index method
    or type can be selected via the USING method. Different types of indexes have different
    purposes, for example, the B-tree index is effectively used when a query involves the
    range and equality operators and the hash index is effectively used when the equality
    operator is used in a query.
    Here is a simple example of how to use the index types:
    warehouse_db=# CREATE INDEX index_name ON table_name USING btree(column);
    the B-tree index
    The B-tree index is effectively used when a query involves the equality operator (=) and
    range operators (<, <=, >, >=, BETWEEN, and IN).

    the hash index
    Hash indexes are utilized when a query involves simple equivalent operators only. Here,
    we create a hash index on the item table. You can see in the following example that the
    planner chooses the hash index in the case of an equivalent operator and does not utilize
    the hash index in the case of the range operator:
    the hash index is the best for queries that have equivalent operators in the
    WHERE clause. This can be explained with the help of the following example:
    warehouse_db=# EXPLAIN SELECT COUNT(*) FROM item WHERE item_id = 100;
    QUERY PLAN
    ------------------------------------------------------------------
    Aggregate (cost=8.02..8.03 rows=1 width=0)
    -> Index Scan using item_hash_index on item (cost=0.00..8.02 rows=1
    width=0)
    Index Cond: (item_id = 100)
    (3 rows)
    The hash index method is not suitable for range operators, so the planner will not select a
    hash index for range queries:
    warehouse_db=# EXPLAIN SELECT COUNT(*) FROM item WHERE item_id > 100;
    QUERY PLAN
    ------------------------------------------------------------------
    Aggregate (cost=25258.75..25258.76 rows=1 width=0)
    -> Seq Scan on item (cost=0.00..22759.00 rows=999900 width=0)
    Filter: (item_id > 100)
    (3 rows)

    To get the size of a table and an index, we can use the following:
    SELECT pg_relation_size('table_name')
    AS table_size,pg_relation_size('index_name') index_size
    FROM pg_tables WHERE table_name like 'table_name';
    the GiST index
    The Generalized Search Tree (GiST) index provides the possibility to create custom
    data types with indexed access methods. It additionally provides an extensive set of
    queries.
    It can be utilized for operations beyond equivalent and range comparisons. The GiST
    index is lossy, which means that it can create incorrect matches.
    The syntax of the GiST index is as follows:
    warehouse_db=# CREATE INDEX index_name ON table_name USING
    gist(column_name);

    the GIN index
    “GIN stands for Generalized Inverted Index. GIN is designed for handling cases
    where the items to be indexed are composite values, and the queries to be handled by
    the index need to search for element values that appear within the composite items.
    For example, the items could be documents, and the queries could be searches for
    documents containing specific words”

    Here is the syntax for the creation of a GIN index:
    warehouse_db=# CREATE INDEX index_name ON table_name USING
    gin(column_name);
    The GIN index requires three times more space than GiST, but is three times faster than
    GiST.
    warehouse_db=# create extension pg_trgm ;
    CREATE EXTENSION
    Time: 117.645 ms
    warehouse_db=# create table words(lineno int,simple_words text,special_words text);
    CREATE TABLE
    Time: 32.913 ms
    warehouse_db=# insert into words values (generate_series(1,2000000),md5(random()::text),md5(random()::text));
    INSERT 0 2000000
    Time: 18268.619 ms
    warehouse_db=# select count(*) from words where simple_words like '%a31%' and special_words like '%a31%';
    count
    -------
    115
    (1 row)

    Time: 669.342 ms
    warehouse_db=# create index words_idx on words (simple_words,special_words);
    CREATE INDEX
    Time: 22136.229 ms
    warehouse_db=# select count(*) from words where simple_words like '%a31%' and special_words like '%a31%';
    count
    -------
    115
    (1 row)

    Time: 658.988 ms
    warehouse_db=# create index words_idx on words using gin(simple_words gin_trgm_ops,special_words gin_trgm_ops);
    ERROR: relation "words_idx" already exists
    Time: 0.952 ms
    warehouse_db=# drop index words_idx ;
    DROP INDEX
    Time: 75.698 ms
    warehouse_db=# create index words_idx on words using gin(simple_words gin_trgm_ops,special_words gin_trgm_ops);
    CREATE INDEX
    Time: 271499.350 ms
    warehouse_db=# select count(*) from words where simple_words like '%a31%' and special_words like '%a31%';
    count
    -------
    115
    (1 row)

    Time: 10.260 ms

    http://www.sai.msu.su/~megera/wiki/Gin
    http://www.postgresql.org/docs/9.4/static/pgtrgm.html

    index bloating

    As the architecture of PostgreSQL is based on MVCC, tables have the difficulty of dead
    rows. Rows that are not visible to any transaction are considered dead rows. In a
    continuous table, some rows are deleted or updated. These operations cause dead space in
    a table. Dead space can potentially be reused when new data is inserted. Due to a lot of
    dead rows, bloating occurs. There are various reasons for index bloating, and it needs to
    be fixed to achieve more performance, because it hurts the performance of the database.
    AUTO VACUUM is the best obviation from bloating, but it is a configurable parameter and can
    be incapacitated or erroneously configured. There are multiple ways to fix index bloating;
    To know more about MVCC, check out
    http://www.postgresql.org/docs/current/static/mvcc-intro.html

    dump and restore
    In the case of bloating, the simplest way of prevention is to back up the table utilizing
    pg_dump, drop the table, and reload the data into the initial table. This is an expensive
    operation and sometimes seems too restrictive.

    VCUUM
    Vacuuming the table using the VACUUM command is another solution that can be used to fix
    the bloat. The VACUUM command reshuffles the rows to ensure that the page is as full as
    possible, but database file shrinking only happens when there are 100 percent empty pages
    at the end of the file. This is the only case where VACUUM is useful to reduce the bloat. Its
    syntax is as follows:
    VACUUM table_name
    The following example shows the usage of VACUUM on the item table:
    warehouse_db=# VACUUM item;
    The other way of using VACUUM is as follows:
    warehouse_db=# VACUUM FULL item;


    CLUSTER
    As we discussed previously, rewriting and reordering of rows can fix the issue that can be
    indirectly achieved using dump/restore, but this is an expensive operation. The other way
    to do this is the CLUSTER command, which is used to physically reorder rows based on the
    index. The CLUSTER command is used to create a whole initial copy of the table and the old
    copy of the data is dropped. The CLUSTER command requires enough space, virtually twice
    the disk space, to hold the initial organized copy of the data. Its syntax is as follows:
    CLUSTER table_name USING index_name
    As we discussed previously, rewriting and reordering of rows can fix the issue that can be
    indirectly achieved using dump/restore, but this is an expensive operation. The other way
    to do this is the CLUSTER command, which is used to physically reorder rows based on the
    index. The CLUSTER command is used to create a whole initial copy of the table and the old
    copy of the data is dropped. The CLUSTER command requires enough space, virtually twice
    the disk space, to hold the initial organized copy of the data. Its syntax is as follows:
    CLUSTER table_name USING index_name

    Reindexing
    If an index becomes inefficient due to bloating or data becomes randomly scattered, then
    reindexing is required to get the maximum performance from the index. Its syntax is as
    follows:
    warehouse_db=# REINDEX TABLE item;

    points to ponder
    When using an index, you need to keep in mind the following things:
    It will make sense to index a table column when you have a handsome number of
    rows in a table.
    When retrieving data, you need to make sure that good candidates for an index are
    foreign keys and keys where min() and max () can be used when retrieving data.
    This means column selectivity is very important to index effectively.
    Don’t forget to remove unused indexes for better performance. Also, perform
    REINDEX on all indexes once a month to clean up the dead tuples.
    Use table partitioning along with an index if you have large amounts of data.
    When you are indexing columns with null values, consider using a conditional index
    with WHERE column_name IS NOT NULL.

  • 相关阅读:
    第一、二章读书笔记
    # 学号 20191221 《Python程序设计》实验一报告
    20191221实验四实验报告
    学号:20191221,《python实验设计》实验报告三
    20191221实验二报告
    快速浏览教材遇到
    何应霆 20191221
    2019-2020-1 20191319 《信息安全专业导论》第3周学习总结
    2019-2020-1 20191319《信息安全专业导论》第二周学习总结
    师生关系
  • 原文地址:https://www.cnblogs.com/songyuejie/p/4885435.html
Copyright © 2020-2023  润新知