Notes on <High Performance MySQL> Ch3: Schema Optimization and Indexing

Notes on <High Performance MySQL> Ch3: Schema Optimization and Indexing

Choosing Optimal Data Types

Smaller is usually better.

Simple is good.

For example, integers are cheaper to compare than characters.

Avoid NULL is possible.

You should define fields as NOT NULL whenever you can. A nullable column uses more storage space and requires special processing inside MySQL. When a nulla ble column is indexed, it requires an extra byte per entry and can even cause a fixed-size index (such as an index on a single integer column) to be converted to a variable-sized one in MyISAM. If you’re planning to index columns, avoid making them nullable if possible.

Whole Numbers

TINYINT(8 bits), SMALLINT(16 bits), MEDIUMINT(24 bits), INT(32 bits), or BIGINT(64 bits)

Signed and unsigned types use the same amount of storage space and have the same performance, so use whatever’s best for your data range.

Real Numbers

The FLOAT and DOUBLE types support approximate calculations with standard floating math.

The DECIMAL type is for storing exact fractional numbers. In MySQL 5.0 and newer, the DECIMAL type supports exact math.

The server itself performs DECIMAL math in MySQL 5.0 and newer, because CPUs don’t support the computations directly. Floating-point math is somewhat faster, because the CPU performs the computations natively.

Because of the additional space requirements and computational cost, you should use DECIMAL only when you need exact results for fractional numbers -- for example, when storing financial data.

String Types

Since MySQL 4.1, each string column can have its own character set and set of sorting rules for that character set, or collation. This can impact performance greatly.

VARCHAR and CHAR

VARCHAR

                VARCHAR uses 1 or 2 extra bytes to record the value’s length.

                In version 5.0 and newer, MySQL preserves trailing spaces when you store and retrieve values. In version 4.1 and older, MySQL strips trailing spaces.

CHAR

                CHAR is fixed-length: MySQL always allocates enough space for the specified number of characters. When storing a CHAR value, MySQL removes any trailing spaces. Values are padded with spaces as needed for comparisons.

                For very short columns, CHAR is more efficient than VARCHAR; a CHAR(1) designed to hold only Y and N values will use only one byte in a single character set, but a VARCHAR(1) would use two bytes because of the length byte.

The sibling types for CHAR and VARCHAR are BINARY and VARBINARY, which store binary strings. These types are useful when you need to store binary data and want MySQL to compare the values as bytes instead of characters. MySQL literally compares BINARY strings one byte at a time, according to the numeric value of each byte. As a result, binary comparisons can be much simpler than character compilations, so they are faster.

Notes: Generosity Can Be Unwise

Storing the value 'hello' requires the same amount of space in a VARCHAR(5) and a VARCHAR(200) column. Is there any advantage to using the shorter column? As it turns out, there is a big advantage. The larger column can use much more memory, because MySQL often allocates fixed-size chunks of memory to hold values internally. This is especially bad for sorting or operations that use in-memory temporary tables. The same thing happens with filesorts that use on-disk temporary tables. The best strategy is to allocate only as much space as you really need.

BLOB and TEXT

TINYTEXT, SMALLTEXT, TEXT, MEDIUMTEXT, and LONGTEXT

TINYBLOB, SMALLBLOB, BLOB, MEDIUMBLOB, and LONGBLOB

BLOB is a synonym for SMALLBLOB, and TEXT is a synonym for SMALLTEXT.

MySQL handles each BLOB and TEXT value as an object within its own identity. The only difference between BLOB and TEXT families is that BLOB types store binary data with no collation or character set, but TEXT types have a character set and collation.

MySQL sorts BLOB and TEXT columns differently from other types: instead of sorting the full length of the string, it sorts only the first max_sort_length bytes of such columns. MySQL cannot index the full length of these data types and can’t use the indexes for sorting.

Using ENUM instead of a string type

An ENUM column can store up to 65,535 distinct values. MySQL stores them very compactly, packed into one or two bytes depending on the number of values in the list. It stores each value internally as an integer representing its position in the field definition list, and it keeps the “lookup table” that defines the number-to-string correspondence in the table’s .frm file.

ENUM field sorts by the internal integer values, not by the string themselves:

You can work around this by specifying ENUM members in the order in which you want them to sort. You can also use FIELD() to specify a sort order explicitly in your queries, but this prevents MySQL from using the index for sorting:

The biggest downside of ENUM is that the list of strings is fixed, and adding or removing strings requires the use of ALTER TABLE. In particular, it can be slower to join a CHAR or VARCHAR column to an ENUM column than to another CHAR or VARCHAR column.

Date and Time Types

The finest granularity of time MySQL can store is one second. However, it can do temporal computations with microsecond granularity.

DATETIME

DATETIME uses 8 bytes of storage space.
TIMESTAMP

TIMESTAMP uses only 4 bytes of storage, so it has a much smaller range than DATETIME: from the year 1970 to partway through the year 2038. TIMESTAMP columns are NOT NULL by default. By default, MySQL will set the first TIMESTAMP column to the current time when you insert a row without specifying a value for the column.

In general if you can use TIMESTAMP you should, as it is more space-efficient than DATETIME.

What if you need to store a date and time value with subsecond resolution? You can use the BIGINT data type and store the value as a timestamp in micorseconds, or you can use a DOUBLE and store the fractional part of the second after the decimal point.

Bit-Packed Data Types

BIT

                The maximum length of a BIT column is 64 bits.

                MySQL treats BIT as a string type, not a numeric type. When you retrieve a BIT(1) value, the result is a string but the content are the binary value 0 or 1, not the ASCII value “0” or “1”.

              If you want to store a true/false value in a single bit of storage space, another option is to create a nullable CHAR(0) column. This column is capable of storing either the absence of a value (NULL) or a zero-length value (the empty string).

SET

                If you need to store many true/false values, consider combining many columns into one with MySQL’s native SET data type, which MySQL represents interally as a packed set of bits. It uses storage efficiently, and MySQL has functions such as FIND_IN_SET() and FIELD() that make it easy to use in queries.

Bitwise operations on integer columns

Choosing Identifiers

MySQL stores ENUM and SET types internally as integers but converts them to strings when doing comparisons in a string context.

Avoid string types for identifiers if possible, as they take up a lot of space and are generally slower than integer types. Be especially cautious when using string identifiers with MyISAM tables. MyISAM uses packed indexes for strings by default, which makes lookups much slower.

You should also be very careful with completely “random” strings, such as those produced by MD5(), SHA1(), or UUID(). Each new value you generate with them will be distributed in arbitrary ways over a large space, which can slow INSERT and some types of SELECT queries.

Special Types of Data

One example, you should store IP address as unsigned integers intead of VARCHAR(15). MySQL provides the INET_ATON() and INET_NTOA() functions to convert between the two representations.

Indexing Basics

Indexes are implemented in the storage engine layer, not the server layer.

B-Tree Indexes

                In general, if a B-Tree can help you find a row in a particular way, it can help you sort rows by the same criteria.

                Some limitations of B-Tree indexes:

-          They are not useful if the lookup does not start from the leftmost side of the indexed columns.

-          You cannot skip columns in the index.

-          The storage engine can’t optimize access with any columns to the right of the first range condition.

Hash Indexes

A hash index is built on a hash table and is useful only for exact lookups that use every column in the index.

It stores the hash codes in the index and stores a pointer to each row in a hash table. In MySQL, only the Memory storage engine supports explicit hash indexes.

Because the indexes themselves store only short hash values, hash indexes are very compact.

Some limitation:

-          MySQL can’t use hash indexes for sorting because they don’t store rows in sorted order.

-          Hash indexes support only equality comparisons that use the =, IN(), and <=> operators. They can’t speed up range queries.

-          Accessing data in a hash index is very quick, unless there are many collisions (multiple values with the same hash). When there are collisions, the storage engine must follow each row pointer in the linked list and compare their values to lookup value to find the right row(s).

URL example – adding a column storing the hash value (number value) of the long URL, and index that column instead of the URL column. (like function index)

Spatial (R-Tree) indexes

Full-Text indexes

Indexing Strategies for High Performance

Isolate the Column

Prefix Indexes and Index Selectivity

Clustered Indexes

Each leaf node in the clustered index contains the primary key value, the transaction ID and rollback pointer InnodDB uses for transactional and MVCC purposes, and the rest of the columns. If the primary key is on a column prefix, InnoDB includes the full column value with the rest of the columns.

Inserting rows in primary key order with InnoDB

It is best to avoid random (nonsequential) clustered keys. For example, using UUID values is a poor choice from a performance standpoint: it makes clustered index insertion random, which is a worst case scenario, and does not give you any helpful data clustering.

You should strive to insert data in primary key order when using InnoDB, and you should try to use a clustering key that will give a monotonically increasing value for each new row.

Covering Indexes

An index that contains (or “covers”) all the data needed to satisfy a query is called a covering index.

Most storage engines cache indexes better than data. Some store engines, such as MyISAM, cache only the index in MySQL’s memory. Because the operating system caches the data for MyISAM, accessing it typically requires a system call. This may cause a huge performance impact, especially for ached workloads where the system call is the most expensive part of data access.

Covering indexes are especially helpful for InnoDB tables – secondary inex (hold the row’s primary key)

The index must store the values from the columns it contains. Hash, spatial, and full-text indexes don’t store these values, so MySQL can use only B-Tree indexes to cover queries.

Using Index Scans for Sorts

MySQL has two ways to produce ordered results: it can use a filesort, or it can scan an index in order.

If the query joins multiple tables, it works only when all columns in the ORDER BY clause refer to the first table.

Packed (Prefix-Compressed) Indexes

MyISAM uses prefix compression to reduce index size, allowing more of the index to fit in memory and dramatically improving performance in some cases. It packs string values by default, but you can even tell it to compress integer values.

Redundant and Duplicate Indexes

MySQL allows you to create multiple indexes on the same column.

The main reason to use a redundant index is when extending an existing index, the redundant index will make it much larger.

Indexes and Locking

Indexes play a very important role for InnoDB, because they let queries lock fewer rows.

InnoDB locks rows only when it accesses them, and an index can reduce the number of rows InnoDB accesses and therefore locks. However, this works only if InnoDB can filter out the undesired rows at the storage engine level. If the index doesn’t permit InnoDB to do that, the MySQL server will have to apply a WHERE clause after InnoDB retrieves the rows and returns then to the server level. At this point, it’s too late to avoid locking the rows: InnoDB will already have locked then, and the server won’t be able to unlock them.

InnoDB can lock rows it doesn’t really need even when it uses an index. The problem is even worse when it can’t use an index to find and lock the rows: if there’s no index for the query, MySQL will do a full table scan and lock every row, whether it “needs” it or not.

InnoDB can place shared (read) locks on secondary indexes, but exclusive (write) locks require access to the primary key. That eliminates the possibility of using a covering index and can make SELECT FOR UPDATE much slower than LOCK IN SHARE MODE or a nonlocking query.

Index Case Study

Keep the range criterion at the end of the index, so the optimizer will use as much of the index as possible.

Index and Table Maintenance

Finding and Repairing Table Corruption

Run CHECK TABLE to see if the table is corrupt

You can fix corrupt tables with REPAIR TABLE command, but not all storage engines support this. In these cases you can do a “no-op” ALTER, such as altering a table to use the same storage engine it currently uses.

Updating Index Statistics

The MySQL query optimizer uses two API calls to ask the storage engines how index values are distributed when deciding how to use indexes. The first is the records_in_range() call, which accepts range end points and return the (possibly estimated) number of records in that range. The second is info(), which can return various types of data, including index cardinality.

Run ANALYZE TABLE to gather index statistics.

-          The Memory storage engine does not store index statistics at all.

-          MyISAM stores statistics on disk, and ANALYZE TABLE performs a full index scan to compute cardinality. The entire table is locked during this process.

-          InnoDB does not store statistics on disk, but rather estimates them with random index dives the first time a table is opened. ANALYZE TABLE uses random dives for InnoDB, so InnoDB statistics are less accurate, but they may not need manual updates unless you keep your server running for a very long time. Also, ANALYZE TABLE is nonblocking and relatively inexpensive in InnoDB, so you can update the statistics online without affecting the server much.

Reducing Index and Data Fragmentation

There are two types of data fragmentation:

-          Row fragmentation

-          Intra-row fragmentation

To defragment data, you can either run OPTIMIZE TABLE or dump and reload the data.

For storage engines that don’t support OPTIMIZE TABLE, you can rebuild the table with a no-op ALTER TABLE. Just alter the table to have the same engine it currently uses.

Normalization and Denormalization

A person with two watches never knows what time it is. J

Pros and Cons of a Normalized Schema

Pros and Cons of a Denormalized Schema

A Mixture of Normalized and Denormalized

Cache and Summary Tables

-          Counter Tables

You can get higher concurrency by keeping more than one row and updating a random row.

Prepopulate the table by adding 100 rows to it. Now the query can just choose a random slot and update it:

UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = RAND() * 100;

Speeding Up Alter Table

MySQL performs mot alterations by making an empty table with desired new structure, inserting all the data from the old table into the new one, and deleting the old table.

Change the column default value from 3 to 5 --

ALTER TABLE film MODIFY COLUMN rental_duration TINYINT(3) NOT NULL DEFAULT 5;

This is expensive as it will drop and create new table and insert data to new table. Any MODIFY COLUMN will cause a table rebuild.

You can change a column’s default with ALTER COLUMN –

ALTER TABLE film ALTER COLUMN rental_duration SET DEFAULT 5.

This statement modifies the .frm file and leaves the table alone. As a result, it is very fast.

Modifying Only the .frm File

The basic technique is to create a .frm file for the desired table structure and copy it into the place of the existing table’s .frm file, as follows:

-          Create an empty table with exactly the same layout, except for the desired modification

-          Execute FLUSH TABLES WITH READ LOCK. This will close all tables in use and prevent any tables from being opened.

-          Swap the .frm files.

-          Execute UNLOCK TABLES to release the read lock.

Building MyISAM Indexes Quickly

ALTER TABLE test.load_data DISABLE KEYS;

ALTER TABLE test.load_data ENABLE KEYS;

DISABLE KEYS applies only to nonunique indexes. MyISAM builds unqiue indexes in memory and checks the uniqueness as it loads each row. Loading becomes extremely slow as soon as the index’s size exceeds the available memory.

Notes on Storage Engines

--------------------------------------
Regards,
FangwenYu
相关阅读:
mac鼠标滚动方向自然问题
 Android学习笔记(9):使用XML文件和Java代码控制UI界面
 新版Eclipse找不到Java EE Module Dependencies选项
 JAR包放在WEB-INF/lib子目录报ClassNotFoundException解决方案
 Eclipse中將Java项目转变为Java Web项目
 java.lang.NoClassDefFoundError: org/springframework/dao/support/PersistenceE解决方法
 MyBatis数据持久化（九）动态sql
MyBatis数据持久化（八）sql复用
 MyBatis数据持久化（七）多表连接查询
 JQuery实现的弹窗效果
原文地址：https://www.cnblogs.com/fangwenyu/p/2581295.html

Notes on <High Performance MySQL> Ch3: Schema Optimization and Indexing

Choosing Optimal Data Types

Whole Numbers

Real Numbers

String Types

VARCHAR and CHAR

BLOB and TEXT

Using ENUM instead of a string type

Date and Time Types

Bit-Packed Data Types

Choosing Identifiers

Special Types of Data

Indexing Basics

Indexing Strategies for High Performance

Isolate the Column

Prefix Indexes and Index Selectivity

Clustered Indexes

Covering Indexes

Using Index Scans for Sorts

Packed (Prefix-Compressed) Indexes

Redundant and Duplicate Indexes

Indexes and Locking

Index Case Study

Index and Table Maintenance

Finding and Repairing Table Corruption

Updating Index Statistics

Reducing Index and Data Fragmentation

Normalization and Denormalization

Pros and Cons of a Normalized Schema

Pros and Cons of a Denormalized Schema

A Mixture of Normalized and Denormalized

Cache and Summary Tables

Speeding Up Alter Table

Modifying Only the .frm File

Building MyISAM Indexes Quickly

Notes on Storage Engines