Redis source code series - data structures (adlist/ziplist/dict)

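This series is based on redis-2.8.18 and mainly records my own understanding and thoughts. Redis has attracted a large following with the rich set of data structures it can store, leaving memcached behind. This post starts with the simple, basic data structures.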

Doubly linked list - adlist
```
typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct listIter {
    listNode *next;
    int direction;
} listIter;

typedef struct list {
    listNode *head;
    listNode *tail;
    void *(*dup)(void *ptr);//function pointers keep the list extensible: each caller supplies its own implementation
    void (*free)(void *ptr);
    int (*match)(void *ptr, void *key);
    unsigned long len;
} list;
```
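A minimal sketch of how these structures are used. The helpers `sketch_list_append` and `sketch_list_sum` are hypothetical names for illustration, not the adlist.c API, and the sketch assumes plain `malloc` rather than Redis's own allocator. The structs are repeated in trimmed form so the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as the adlist structs above, trimmed to what the sketch needs. */
typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head;
    listNode *tail;
    unsigned long len;
} list;

/* Hypothetical helper: append a value at the tail in O(1).
 * Returns 0 on success, -1 if allocation fails. */
int sketch_list_append(list *l, void *value) {
    assert(l != NULL);
    listNode *node = malloc(sizeof(*node));
    if (node == NULL) return -1;
    node->value = value;
    node->next = NULL;
    node->prev = l->tail;
    if (l->tail != NULL) l->tail->next = node;
    else l->head = node;            /* first node: it is also the head */
    l->tail = node;
    l->len++;
    return 0;
}

/* Hypothetical helper: walk from head to tail, summing int values. */
long sketch_list_sum(const list *l) {
    long sum = 0;
    for (const listNode *n = l->head; n != NULL; n = n->next)
        sum += *(const int *)n->value;
    return sum;
}
```

Keeping both `head` and `tail` in `list` is what makes appending at either end constant time, and the `prev`/`next` pair lets an iterator walk in either direction.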
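Compressed list - ziplist

ziplist is an implementation of a doubly linked list that uses a few encoding tricks to make efficient use of one contiguous block of memory. Each node can hold a string or an integer. ziplist is commonly used as the underlying implementation of other data structures.

The memory layout is as follows:

[Figure: ziplist memory layout, in order: zlbytes, zltail, zllen, entry ... entry, zlend]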

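zlbytes: the total number of bytes the ziplist occupies, stored as an unsigned 32-bit integer. The count includes the zlbytes field itself, so the minimum (an empty ziplist) is 11: 4 (zlbytes) + 4 (zltail) + 2 (zllen) + 1 (zlend). A ziplist can therefore hold at most 2**32-1 bytes, roughly 4 GB.

zltail: the offset in bytes from the start of the ziplist to the start of the last entry, also an unsigned 32-bit integer. For an empty list it takes its minimum value of 10, i.e. len(zlbytes)+len(zltail)+len(zllen). It gives O(1) access to the tail without traversing the whole list.

zllen: the number of nodes in the ziplist, 2 bytes wide. It can represent at most 2**16-2=65534 nodes. Why minus 2? Because 2**16-1, the largest 2-byte value, is reserved to mean that the count no longer fits and zllen is invalid; in that case the list has to be traversed to get the real node count.

zlend: a single byte with all bits set (255) that marks the end of the ziplist. Since zltail already locates the last entry, could zlend be dropped? Without it, a traversal would have to track its current offset and stop only once it reaches the entry at zltail.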
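
The header fields can be read as in the sketch below. It assumes the layout used by ziplist.c (little-endian header fields, an empty ziplist being 11 bytes with zlbytes = 11 and zltail = 10); all `sketch_` names are hypothetical and not part of Redis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ZLEND 0xFF   /* end-of-list marker byte */

/* Read a little-endian 32-bit field byte by byte, so the sketch
 * does not depend on the host's endianness. */
uint32_t sketch_read_u32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

uint32_t sketch_zlbytes(const unsigned char *zl) { return sketch_read_u32(zl); }
uint32_t sketch_zltail(const unsigned char *zl)  { return sketch_read_u32(zl + 4); }
uint16_t sketch_zllen(const unsigned char *zl) {
    return (uint16_t)(zl[8] | (zl[9] << 8));
}

/* O(1) access to the last entry. Returns NULL for an empty list,
 * where the tail offset points straight at the zlend byte. */
const unsigned char *sketch_tail_entry(const unsigned char *zl) {
    assert(zl != NULL);
    const unsigned char *p = zl + sketch_zltail(zl);
    return (*p == SKETCH_ZLEND) ? NULL : p;
}
```

The empty-list check works because the first byte of an entry is its pre-entry-len field, which never takes the value 255.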

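entry: a single node, with the internal structure below.

pre-entry-len: the length of the previous node in bytes, which makes it possible to step backwards.

encoding: the type of data stored in the node, string or integer.

entry-len: the size of the node's value. The number of bits entry-len occupies depends on the stored type, as follows.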

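String types:
encoding=00, strlen <= 63, entry-len takes 6 bits
encoding=01, strlen <= 16383, entry-len takes 14 bits
encoding=10, strlen >= 16384, entry-len takes the 4 bytes (32 bits) after the first byte; the remaining 6 bits of the first byte are unused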
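
The three string encodings can be decoded as in the sketch below. It assumes the multi-byte lengths are stored big-endian, as in ziplist.c; `sketch_decode_strlen` is a hypothetical name, not a Redis function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the string length held in an entry's encoding field.
 * `p` points at the first encoding byte. The number of bytes the
 * field occupies is written to *lensize and the string length is
 * returned. For an integer encoding (top bits 11) both are 0. */
uint32_t sketch_decode_strlen(const unsigned char *p, unsigned int *lensize) {
    assert(p != NULL && lensize != NULL);
    switch (p[0] & 0xC0) {
    case 0x00:                      /* 00pppppp: 6-bit length */
        *lensize = 1;
        return p[0] & 0x3F;
    case 0x40:                      /* 01pppppp qqqqqqqq: 14-bit length */
        *lensize = 2;
        return ((uint32_t)(p[0] & 0x3F) << 8) | p[1];
    case 0x80:                      /* 10______ plus 4 bytes: 32-bit length */
        *lensize = 5;
        return ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 8) | (uint32_t)p[4];
    default:                        /* 11______: integer, not a string */
        *lensize = 0;
        return 0;
    }
}
```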

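Integer types:

encoding and entry-len together take 1 byte: encoding=11, and the following 6 bits identify the integer type (int16_t/int32_t/int64_t, etc.).

11 000000 means int16_t
11 010000 means int32_t
11 100000 means int64_t
11 110000 means a 24-bit integer
11 111110 means int8_t
11 11xxxx with 0001 <= xxxx <= 1101 stores the integer in xxxx itself. 0000 and 1110 are already taken by the encodings above, and 1111 would collide with zlend (255). The stored value is 1 to 13, but it is interpreted as 0 to 12.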
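
The integer encodings listed above map onto a small lookup, sketched here. The function names are hypothetical; the byte values are simply the bit patterns from the list written in hexadecimal.

```c
#include <assert.h>

/* Number of payload bytes that follow an integer encoding byte,
 * or -1 if `enc` is not one of the integer encodings. */
int sketch_int_payload_size(unsigned char enc) {
    switch (enc) {
    case 0xC0: return 2;   /* 11 000000: int16_t */
    case 0xD0: return 4;   /* 11 010000: int32_t */
    case 0xE0: return 8;   /* 11 100000: int64_t */
    case 0xF0: return 3;   /* 11 110000: 24-bit integer */
    case 0xFE: return 1;   /* 11 111110: int8_t */
    default:
        /* 0xF1..0xFD: the value lives in the low 4 bits, no payload */
        return (enc >= 0xF1 && enc <= 0xFD) ? 0 : -1;
    }
}

/* Value of an immediate encoding: the stored 1..13 maps to 0..12. */
int sketch_immediate_value(unsigned char enc) {
    assert(enc >= 0xF1 && enc <= 0xFD);
    return (enc & 0x0F) - 1;
}
```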

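Notes:

A ziplist is one contiguous block of memory, so every insertion reallocates it, and every deletion compacts it by shifting the following entries forward with void * memmove ( void * destination, const void * source, size_t num ).

Converting between values and their byte representation relies on void * memcpy ( void * destination, const void * source, size_t num ); integers additionally need casts and bit shifts.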
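
The compaction on delete can be sketched on a plain byte buffer. `sketch_delete_range` is a hypothetical helper: it only shows the memmove-then-shrink step and leaves out everything else a real ziplist delete would have to do, such as updating zlbytes, zltail, zllen and the pre-entry-len of the following entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Remove `count` bytes starting at `offset` from a heap buffer of
 * `*len` bytes. The tail is shifted forward with memmove (source and
 * destination overlap, so memcpy would not be safe here), then the
 * buffer is shrunk with realloc. Returns the possibly moved buffer. */
unsigned char *sketch_delete_range(unsigned char *buf, size_t *len,
                                   size_t offset, size_t count) {
    assert(offset <= *len && count <= *len - offset);
    memmove(buf + offset, buf + offset + count, *len - offset - count);
    *len -= count;
    if (*len == 0) return buf;      /* avoid realloc(buf, 0) */
    unsigned char *shrunk = realloc(buf, *len);
    return shrunk != NULL ? shrunk : buf;  /* old block stays valid on failure */
}
```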

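Hash table - dict

A hash table that resolves collisions by separate chaining. The interesting part is that it grows through incremental rehashing.

When does it resize? When an element is added, dict checks ht[0]: if used >= size, and either resizing is currently allowed or used/size exceeds the forced-resize ratio (5 by default), it allocates ht[1] with the smallest power-of-two size that is at least 2*used. used is the number of stored elements and size is the number of buckets. Incremental rehashing then begins, gradually moving the elements of ht[0] into ht[1].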
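
The growth check and the size computation can be sketched as below. This is modeled on my reading of `_dictExpandIfNeeded` and `_dictNextPower` in dict.c, not copied from it; the `sketch_` names and the `can_resize` parameter (standing in for the global resize switch) are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_INITIAL_SIZE 4   /* smallest table size used by this sketch */
#define SKETCH_FORCE_RATIO  5   /* forced-resize ratio */

/* Should the table grow before inserting one more element? */
bool sketch_needs_expand(unsigned long used, unsigned long size,
                         bool can_resize) {
    if (size == 0) return true;     /* no table yet: allocate the first one */
    return used >= size &&
           (can_resize || used / size > SKETCH_FORCE_RATIO);
}

/* Smallest power of two that is >= size, never below the initial size. */
unsigned long sketch_next_power(unsigned long size) {
    unsigned long i = SKETCH_INITIAL_SIZE;
    if (size >= LONG_MAX) return LONG_MAX;
    while (i < size) i *= 2;
    return i;
}

/* Bucket count of the new table for a table holding `used` elements. */
unsigned long sketch_expand_target(unsigned long used) {
    assert(used <= ULONG_MAX / 2);
    return sketch_next_power(used * 2);
}
```

Keeping the size a power of two lets the bucket index be computed as `hash & sizemask` instead of a modulo.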

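What does incremental mean? Whenever an element is added, deleted, or looked up while the table is rehashing (rehashidx != -1), one bucket (the chain of entries that share a bucket index) is rehashed into ht[1], until the rehash is complete.

What to watch for during the process? New elements added during a rehash all go into ht[1], and lookups have to search both tables. If any safe iterator exists, rehashing is paused so that iterator-based operations stay safe.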
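
One rehash step, together with the lookup across both tables, can be sketched as follows. This is a simplified model under stated assumptions, not the dict.c code: keys are plain `unsigned long` values that double as their own hash, and every `sketch` name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct sketchEntry {
    unsigned long key;
    struct sketchEntry *next;
} sketchEntry;

typedef struct sketchTable {
    sketchEntry **table;
    unsigned long size;       /* bucket count, a power of two */
    unsigned long sizemask;   /* size - 1 */
    unsigned long used;
} sketchTable;

typedef struct sketchDict {
    sketchTable ht[2];
    long rehashidx;           /* -1 when no rehash is in progress */
    int iterators;            /* safe iterators currently running */
} sketchDict;

/* Insert into one table; during a rehash callers pass &d->ht[1]. */
int sketch_table_insert(sketchTable *t, unsigned long key) {
    sketchEntry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long idx = key & t->sizemask;
    e->key = key;
    e->next = t->table[idx];
    t->table[idx] = e;
    t->used++;
    return 0;
}

/* Lookup searches ht[0], and ht[1] as well while rehashing. */
int sketch_contains(const sketchDict *d, unsigned long key) {
    for (int i = 0; i < 2; i++) {
        const sketchTable *t = &d->ht[i];
        if (t->size != 0) {
            const sketchEntry *e = t->table[key & t->sizemask];
            for (; e != NULL; e = e->next)
                if (e->key == key) return 1;
        }
        if (d->rehashidx == -1) break;   /* ht[1] is unused outside a rehash */
    }
    return 0;
}

/* Move one non-empty bucket from ht[0] to ht[1].
 * Returns 1 if the dict is still rehashing afterwards, 0 otherwise. */
int sketch_rehash_step(sketchDict *d) {
    assert(d != NULL);
    if (d->rehashidx == -1) return 0;    /* not rehashing */
    if (d->iterators != 0) return 1;     /* paused by a safe iterator */

    if (d->ht[0].used > 0) {
        /* Buckets before rehashidx are already empty, so a non-empty
         * bucket must exist at or after it. */
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        sketchEntry *e = d->ht[0].table[d->rehashidx];
        while (e != NULL) {
            sketchEntry *next = e->next;
            unsigned long idx = e->key & d->ht[1].sizemask;
            e->next = d->ht[1].table[idx];
            d->ht[1].table[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    if (d->ht[0].used == 0) {            /* done: ht[1] becomes ht[0] */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1] = (sketchTable){NULL, 0, 0, 0};
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Sending new elements only to ht[1] is what keeps the buckets before rehashidx empty, so each step can resume where the previous one stopped.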

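Pros and cons? The hash table stays usable throughout the resize. The downside is that the old and new tables coexist, so extra memory is held for the whole duration of the resize.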
```
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;
/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];//ht[0] is used by default; on resize the entries of ht[0] are rehashed one by one into ht[1], and at the end ht[1] is assigned to ht[0]
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running; counts safe iterators, and rehashing pauses while it is non-zero */
} dict;
/* If safe is set to 1 this is a safe iterator, that means, you can call
 * dictAdd, dictFind, and other functions against the dictionary even while
 * iterating. Otherwise it is a non safe iterator, and only dictNext()
 * should be called while iterating. */
typedef struct dictIterator {
    dict *d;
    long index;
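    int table, safe;//safe: 0 = unsafe iterator, 1 = safe iterator
    dictEntry *entry, *nextEntry;
    /* unsafe iterator fingerprint for misuse detection. */
    long long fingerprint;//a 64-bit value combining the hash table's properties
} dictIterator;
//dictType: the operations a hash table delegates to, such as key comparison and the hash function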
typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);//hash function
    void *(*keyDup)(void *privdata, const void *key);//duplicate key
    void *(*valDup)(void *privdata, const void *obj);//duplicate value
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);//key compare
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;
```
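Function pointers are a nice touch: much like inheritance in object-oriented programming, they provide polymorphism.

[Figure: hash table structure]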
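
The polymorphism that a table of function pointers gives can be sketched with a trimmed-down type descriptor. `sketchType` is a hypothetical, simplified stand-in for dictType (no privdata, only the hash and compare hooks), and the djb2 hash is just an illustrative choice, not the hash Redis uses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for dictType. */
typedef struct sketchType {
    unsigned int (*hashFunction)(const void *key);
    int (*keyCompare)(const void *key1, const void *key2); /* non-zero if equal */
} sketchType;

/* Hooks for NUL-terminated string keys. */
unsigned int sketch_hash_cstr(const void *key) {
    unsigned int h = 5381;                       /* djb2 */
    for (const unsigned char *p = key; *p != '\0'; p++) h = h * 33 + *p;
    return h;
}
int sketch_cmp_cstr(const void *a, const void *b) { return strcmp(a, b) == 0; }

/* Hooks for int keys. */
unsigned int sketch_hash_int(const void *key) {
    return (unsigned int)*(const int *)key;
}
int sketch_cmp_int(const void *a, const void *b) {
    return *(const int *)a == *(const int *)b;
}

const sketchType sketch_cstr_type = { sketch_hash_cstr, sketch_cmp_cstr };
const sketchType sketch_int_type  = { sketch_hash_int,  sketch_cmp_int  };

/* Generic code sees only the hook table, never the concrete key type. */
int sketch_keys_equal(const sketchType *t, const void *k1, const void *k2) {
    assert(t != NULL);
    return t->hashFunction(k1) == t->hashFunction(k2) &&
           t->keyCompare(k1, k2);
}
```

Swapping `&sketch_cstr_type` for `&sketch_int_type` changes the behavior of `sketch_keys_equal` without touching its code, which is the same effect a virtual method gives in an object-oriented language.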
Original post: https://www.cnblogs.com/whuqin/p/4981967.html
