并发安全 sync.Map

原创歪鼻子歪鼻子 2020-12-27

深入理解并发安全的 sync.Map https://mp.weixin.qq.com/s/s-JbeVsymFojtv3owaGILg

olang中内置了map关键字，但是它是非线程安全的。从go 1.9开始，标准库加入了sync.Map，提供用于并发安全的map。

普通map的并发问题

map的并发读写代码

func main() {  m := make(map[int]int)
  go func() {    for {      _ = m[1] // 读    }  }()
  go func() {    for {      m[2] = 2 // 写    }  }()
  select {} // 维持主goroutine}

以上是一段并发读写map的代码, 其中一个goroutine一直读，另外一个goroutine一直写。即使读写的map键不相同，且不存在"扩容"等操作，代码还是会报错。

fatal error: concurrent map read and map write

锁+map

那普通map有没有能力实现并发安全呢？答案是肯定的。通过给map额外绑定一个锁（sync.Mutex或sync.RWMutex），封装成一个新的struct，实现并发安全。

定义带有锁的对象M

type M struct {  sync.RWMutex  Map map[int]int}

执行并发读写

func main() {  m := M{Map: make(map[int]int)}
  go func() {    for {      m.RLock()      v := m.Map[2] // 读      fmt.Println(v)      m.RUnlock()    }  }()
  go func() {    for i := 1; i > 0; i++ {      m.Lock()      m.Map[2] = i // 写      m.Unlock()    }  }()
  select {}}

在读goroutine读数据时，通过读锁锁定，在写goroutine写数据时，写锁锁定，程序就能并发安全的运行，运行结果示意如下。

...112311241125...

sync.Map

既然通过加锁的方式就能解决map的并发问题，实现方式简洁，并且利用读写锁而不是Mutex可以进一步减少读写的时候因为锁而带来的性能损耗。那么为什么还会有sync.Map的出现？

当map的数据量非常大时，会引发并发的大量goroutine争夺同一把锁，这种现象将直接导致性能的急剧下降。在java中有类似于map的hashMap，它同样是并发不安全，但是JDK提供了线程安全的ConcurrentHashMap，它在面对上述场景时，其核心解决方案是锁分段技术，即内部使用多个锁，每个区间共享一把锁，当多线程访问map中的不同数据段的数据时，线程间就不会存在锁竞争，从而提高了并发访问效率。那sync.Map采取的是什么策略来提升并发性能的呢？

sync.Map的源码结构（基于1.14.1）

type Map struct {  // 此锁是为了保护Map中的dirty数据  mu Mutex  // 用来存读的数据，只读类型，不会造成读写冲突  read atomic.Value // readOnly  // dirty包含最新的写入数据（entry），在写的同时，将read中未删除的数据拷贝到dirty中  // 因为是go中内置的普通map类型，且涉及写操作，所以需要通过mu加锁  dirty map[interface{}]*entry  // 当读数据时,该字段不在read中，尝试从dirty中读取，不管是否在dirty中读取到数据，misses+1  // 当累计到len（dirty）时，会将dirty拷贝到read中，并将dirty清空，以此提升读性能。  misses int}

（左右滑动查看完整代码图片）

在sync.Map中用到了两个冗余数据结构read、dirty。其中read的类型为atomic.Value，它会通过atomic.Value的Load方法将其断言为readOnly对象。

read, _ := m.read.Load().(readOnly) // m为sync.Map

因此，read的真实类型即是readOnly，其数据类型如下。

type readOnly struct {  // read 中的go内置map类型，但是它不需要锁。  m       map[interface{}]*entry  // 当sync.Map.diry中的包含了某些不在m中的key时，amended的值为true.  amended bool}

（左右滑动查看完整代码图片）

amended属性的作用是指明dirty中是否有readOnly.m中未包含的数据，因此当对sync.Map的读操作在read中找不到数据时，将进一步到dirty中查找。

readOnly.m和Map.dirty中map存储的值类型是*entry,它包含一个指针p，指向用户存储的value值。

type entry struct {  p unsafe.Pointer // *interface{}}

entry.p的值有三种类型：

nil：entry已经被删除，且m.dirty为nil
expunged：entry被删除，m.dirty不为nil，但entry不存在m.dirty中
其他：entry有效，记录在m.read中，若dirty不为空，也会记录在dirty中。

虽然read和dirty存在冗余数据，但是这些数据entry是通过指针指向的，因此，尽管Map的value可能会很大，但是空间存储还是足够的。

以上是sync.Map的数据结构，下面着重看看它的四个方法实现：Load、Store、Delete和Range。

Load

加载方法，通过提供的键key，查找对应的值value。

func (m *Map) Load(key interface{}) (value interface{}, ok bool) {  // 首先从m.read中通过Load方法得到readOnly  read, _ := m.read.Load().(readOnly)  // 从read中的map中查找key  e, ok := read.m[key]  // 如果在read中没有找到，且表明有新数据在diry中（read.amended为true）  // 那么，就需要在dirty中查找，这时需要加锁。  if !ok && read.amended {    m.mu.Lock()    // 双重检查:避免在本次加锁的时候，有其他goroutine正好将Map中的dirty数据复制到了read中。    // 能发生上述可能的原因是以下两行代码语句，并不是原子操作。     // if !ok && read.amended {    //   m.mu.Lock()    // }    // 而Map.read其并发安全性的保障就在于它的修改是通过原子操作进行的。    // 因此需要再检查一次read.    read, _ = m.read.Load().(readOnly)    e, ok = read.m[key]    // 如果m.read中key还是不存在，且dirty中有新数据，则检查dirty中的数据。    if !ok && read.amended {      e, ok = m.dirty[key]      // 不管是否从dirty中得到了数据，都会将misses的计数+1      m.missLocked()    }    m.mu.Unlock()  }  if !ok {    return nil, false  }
  // 通过Map的load方法，将entry.p加载为对应指针，再返回指针指向的值  return e.load()}

（左右滑动查看完整代码图片）

Map.missLocked函数是保证sync.Map性能的重要函数，它的目的是将存在有锁的dirty中的数据，转移到只读线程安全的read中去。

func (m *Map) missLocked() {  m.misses++ // 计数+1  if m.misses < len(m.dirty) {    return  }  m.read.Store(readOnly{m: m.dirty}) // 将dirty数据复制到read中去  m.dirty = nil // dirty清空  m.misses = 0 // misses重置为0}

（左右滑动查看完整代码图片）

Store

该方法更新或新增键值对key-value。

func (m *Map) Store(key, value interface{}) {  // 如果m.read中存在该键，且该键没有被标记删除（expunged）  // 则尝试直接存储（见entry的tryStore方法）  // 注意： 如果m.dirty中也有该键（key对应的entry），由于都是通过指针指向，所有m.dirty中也会保持最新entry值。  read, _ := m.read.Load().(readOnly)  if e, ok := read.m[key]; ok && e.tryStore(&value) {    return  }  // 如果不满足上述条件，即m.read不存在或者已经被标记删除  m.mu.Lock()  read, _ = m.read.Load().(readOnly)  if e, ok := read.m[key]; ok { // 如果read中有该键    if e.unexpungeLocked() { // 判断entry是否被标记删除      // 如果entry被标记删除，则将entry添加进m.dirty中      m.dirty[key] = e    }    // 更新entry指向value地址    e.storeLocked(&value)  } else if e, ok := m.dirty[key]; ok { //dirty中有该键：更新    e.storeLocked(&value)  } else { // dirty和read中均无该键：新增    if !read.amended { // 表明dirty中没有新数据，在dirty中增加第一个新键      m.dirtyLocked() // 从m.read中复制未删除的数据到dirty中      m.read.Store(readOnly{m: read.m, amended: true})    }    m.dirty[key] = newEntry(value) // 将entry增加到dirty中  }  m.mu.Unlock()}

（左右滑动查看完整代码图片）

Store的每次操作都是先从read开始，当不满足条件时，才加锁操作dirty。但是由于存在从read中复制数据的情况（例如dirty刚复制完数据给m.read，又来了一个新键），当m.read中数据量很大时，可能对性能造成影响。

Delete

删除某键值。

func (m *Map) Delete(key interface{}) {  read, _ := m.read.Load().(readOnly)  e, ok := read.m[key]  if !ok && read.amended {    m.mu.Lock()    read, _ = m.read.Load().(readOnly)    e, ok = read.m[key]    if !ok && read.amended {      delete(m.dirty, key)    }    m.mu.Unlock()  }  if ok {    e.delete()  }}
// 如果read中有该键，则从read中删除，其删除方式是通过原子操作func (e *entry) delete() (hadValue bool) {  for {    p := atomic.LoadPointer(&e.p)    // 如果p指针为空，或者被标记清除    if p == nil || p == expunged {      return false    }    // 通过原子操作，将e.p标记为nil.    if atomic.CompareAndSwapPointer(&e.p, p, nil) {      return true    }  }}

（左右滑动查看完整代码图片）

Delete中的逻辑和Store逻辑相似，都是从read开始，如果这个key（也即是entry）不在read中，且dirty中有新数据，则加锁从dirty中删除。注意，和Load与Store方法一样，也是需要双检查。

Range

想要遍历sync.Map，不能通过for range的形式，因此，它自身提供了Range方法，通过回调的方式遍历。

func (m *Map) Range(f func(key, value interface{}) bool) {  read, _ := m.read.Load().(readOnly)  // 判断dirty中是否有新的数据  if read.amended {    m.mu.Lock()    // 双检查    read, _ = m.read.Load().(readOnly)    if read.amended {      // 将dirty中的数据复制到read中      read = readOnly{m: m.dirty}      m.read.Store(read)      m.dirty = nil      m.misses = 0    }    m.mu.Unlock()  }
  // 遍历已经整合过dirty的read  for k, e := range read.m {    v, ok := e.load()    if !ok {      continue    }    if !f(k, v) {      break    }  }}

（左右滑动查看完整代码图片）

sync.Map的优化总结

1. 空间换时间：通过两个冗余的数据结构（read、write），减小锁对性能的影响。

2. 读操作使用read，避免读写冲突。

3. 动态调整：通过misses值，避免dirty数据过多。

4. 双检查机制：避免在非原子操作时产生数据错误。

5. 延迟删除机制：删除一个键值只是先打标记，只有等提升dirty（复制到read中，并清空自身）时才清理删除的数据。

6. 优先从read中读、改和删除，因为对read的操作不用加锁，大大提升性能。

sync.Map的使用例子

func main() {  var sm sync.Map  // 注意：同一个sync.Map，和map不一样，每个item的key或value可以和其他的数据类型不一样  // 只要满足key能hash即可  sm.Store(1, "a")  sm.Store("b", 2)  sm.Store("c", 3)  // 和map获取key值类似  if v, ok := sm.Load("b"); ok {    fmt.Println(v)  }
 // 删除某个key的键值对  sm.Delete(1)
  // 参数fun中的参数是遍历获得的key和value，返回一个bool值  // 返回true时，继续遍历  // 返回false，遍历结束  sm.Range(func(key, value interface{}) bool {    fmt.Println(key,value)    return true  })}

（左右滑动查看完整代码图片）

输出

2b 2c 3

sync.Map的性能

在Go源码$GOROOT/src/sync中，提供了测试代码。

map_reference_test.go: 定义了测试用的mapInterface接口，sync.Map、RwMutexMap和DeepCopyMap对象实现该接口方法。
map_test.go: 三个对象的方法测试代码。
map_bench_test.go: 三个对象的benchmark性能对比测试代码。

在小菜刀的机器上，运行性能测试结果如下。

$ go test -bench=LoadMostlyHits -benchmemBenchmarkLoadMostlyHits/*sync_test.DeepCopyMap-8           80252629          13.5 ns/op         7 B/op         0 allocs/opBenchmarkLoadMostlyHits/*sync_test.RWMutexMap-8            23025050          51.8 ns/op         7 B/op         0 allocs/opBenchmarkLoadMostlyHits/*sync.Map-8                        67718686          14.9 ns/op         7 B/op         0 allocs/op
$ go test -bench=LoadMostlyMisses -benchmemBenchmarkLoadMostlyMisses/*sync_test.DeepCopyMap-8           128480215          11.2 ns/op         7 B/op         0 allocs/opBenchmarkLoadMostlyMisses/*sync_test.RWMutexMap-8            23989224          47.4 ns/op         7 B/op         0 allocs/opBenchmarkLoadMostlyMisses/*sync.Map-8                        132403878           9.30 ns/op         7 B/op         0 allocs/op
$ go test -bench=LoadOrStoreBalanced -benchmemBenchmarkLoadOrStoreBalanced/*sync_test.RWMutexMap-8             3909409         553 ns/op        99 B/op         2 allocs/opBenchmarkLoadOrStoreBalanced/*sync.Map-8                         3574923         368 ns/op        97 B/op         3 allocs/op
$ go test -bench=LoadOrStoreUnique -benchmemBenchmarkLoadOrStoreUnique/*sync_test.RWMutexMap-8             2053806         647 ns/op       174 B/op         2 allocs/opBenchmarkLoadOrStoreUnique/*sync.Map-8                         2456720         577 ns/op       140 B/op         4 allocs/op
$ go test -bench=LoadOrStoreCollision -benchmemBenchmarkLoadOrStoreCollision/*sync_test.DeepCopyMap-8           153679003           8.18 ns/op         0 B/op         0 allocs/opBenchmarkLoadOrStoreCollision/*sync_test.RWMutexMap-8            13718534          87.9 ns/op         0 B/op         0 allocs/opBenchmarkLoadOrStoreCollision/*sync.Map-8                        175620835           7.08 ns/op         0 B/op         0 allocs/op
$ go test -bench=Range -benchmemBenchmarkRange/*sync_test.DeepCopyMap-8             416906        2947 ns/op         0 B/op         0 allocs/opBenchmarkRange/*sync_test.RWMutexMap-8               22784       52370 ns/op     16384 B/op         1 allocs/opBenchmarkRange/*sync.Map-8                          369955        3194 ns/op         0 B/op         0 allocs/op
$ go test -bench=AdversarialAlloc -benchmemBenchmarkAdversarialAlloc/*sync_test.DeepCopyMap-8            1875109         646 ns/op       539 B/op         1 allocs/opBenchmarkAdversarialAlloc/*sync_test.RWMutexMap-8            19454866          61.6 ns/op         8 B/op         1 allocs/opBenchmarkAdversarialAlloc/*sync.Map-8                         3712470         320 ns/op        51 B/op         1 allocs/op
$ go test -bench=AdversarialDelete -benchmemBenchmarkAdversarialDelete/*sync_test.DeepCopyMap-8            6749067         215 ns/op       168 B/op         1 allocs/opBenchmarkAdversarialDelete/*sync_test.RWMutexMap-8            16046545          76.9 ns/op        25 B/op         1 allocs/opBenchmarkAdversarialDelete/*sync.Map-8                        18678104          64.2 ns/op        18 B/op         1 allocs/op

（左右滑动查看完整代码图片）

如何选择Map

从性能测试结果可以看出，sync.Map并不是为了代替锁+map的组合。它的设计，是为了在某些并发场景下，相对前者能有较小的性能损耗。

源码文档中（$GOROOT/src/sync/map.go）已经给出了sync.Map的合适场景。

// The Map type is specialized. Most code should use a plain Go map instead,// with separate locking or coordination, for better type safety and to make it// easier to maintain other invariants along with the map content.//// The Map type is optimized for two common use cases: (1) when the entry for a given// key is only ever written once but read many times, as in caches that only grow,// or (2) when multiple goroutines read, write, and overwrite entries for disjoint// sets of keys. In these two cases, use of a Map may significantly reduce lock// contention compared to a Go map paired with a separate Mutex or RWMutex.

（左右滑动查看完整代码图片）

两种情况应该选择sync.Map

key值一次写入，多次读取（即写少读多场景）。
多个goroutine的读取、写入和覆盖在不相交的key集。

ConcurrentHashMap (Java Platform SE 8 ) https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html

java.lang.Object
- java.util.AbstractMap<K,V>
- - java.util.concurrent.ConcurrentHashMap<K,V>

Type Parameters:
K - the type of keys maintained by this map
V - the type of mapped values
All Implemented Interfaces:
Serializable, ConcurrentMap<K,V>, Map<K,V>
```
public class ConcurrentHashMap<K,V>
extends AbstractMap<K,V>
implements ConcurrentMap<K,V>, Serializable
```
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.

The table is dynamically expanded when there are too many collisions (i.e., keys that have distinct hash codes but fall into the same slot modulo the table size), with the expected average effect of maintaining roughly two bins per mapping (corresponding to a 0.75 load factor threshold for resizing). There may be much variance around this average as mappings are added and removed, but overall, this maintains a commonly accepted time/space tradeoff for hash tables. However, resizing this or any other kind of hash table may be a relatively slow operation. When possible, it is a good idea to provide a size estimate as an optional initialCapacity constructor argument. An additional optional loadFactor constructor argument provides a further means of customizing initial table capacity by specifying the table density to be used in calculating the amount of space to allocate for the given number of elements. Also, for compatibility with previous versions of this class, constructors may optionally specify an expected concurrencyLevel as an additional hint for internal sizing. Note that using many keys with exactly the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

A Set projection of a ConcurrentHashMap may be created (using newKeySet() or newKeySet(int)), or viewed (using keySet(Object) when only keys are of interest, and the mapped values are (perhaps transiently) not used or all take the same mapping value.

A ConcurrentHashMap can be used as scalable frequency map (a form of histogram or multiset) by using LongAdder values and initializing via computeIfAbsent. For example, to add a count to a ConcurrentHashMap<String,LongAdder> freqs, you can use freqs.computeIfAbsent(k -> new LongAdder()).increment();

This class and its views and iterators implement all of the optional methods of the Map and Iterator interfaces.

Like Hashtable but unlike HashMap, this class does not allow null to be used as a key or value.

ConcurrentHashMaps support a set of sequential and parallel bulk operations that, unlike most Stream methods, are designed to be safely, and often sensibly, applied even with maps that are being concurrently updated by other threads; for example, when computing a snapshot summary of the values in a shared registry. There are three kinds of operation, each with four forms, accepting functions with Keys, Values, Entries, and (Key, Value) arguments and/or return values. Because the elements of a ConcurrentHashMap are not ordered in any particular way, and may be processed in different orders in different parallel executions, the correctness of supplied functions should not depend on any ordering, or on any other objects or values that may transiently change while computation is in progress; and except for forEach actions, should ideally be side-effect-free. Bulk operations on Map.Entry objects do not support method setValue.
- forEach: Perform a given action on each element. A variant form applies a given transformation on each element before performing the action.
- search: Return the first available non-null result of applying a given function on each element; skipping further search when a result is found.
- reduce: Accumulate each element. The supplied reduction function cannot rely on ordering (more formally, it should be both associative and commutative). There are five variants:
  - Plain reductions. (There is not a form of this method for (key, value) function arguments since there is no corresponding return type.)
  - Mapped reductions that accumulate the results of a given function applied to each element.
  - Reductions to scalar doubles, longs, and ints, using a given basis value.
These bulk operations accept a parallelismThreshold argument. Methods proceed sequentially if the current map size is estimated to be less than the given threshold. Using a value of Long.MAX_VALUE suppresses all parallelism. Using a value of 1 results in maximal parallelism by partitioning into enough subtasks to fully utilize the ForkJoinPool.commonPool() that is used for all parallel computations. Normally, you would initially choose one of these extreme values, and then measure performance of using in-between values that trade off overhead versus throughput.

The concurrency properties of bulk operations follow from those of ConcurrentHashMap: Any non-null result returned from get(key) and related access methods bears a happens-before relation with the associated insertion or update. The result of any bulk operation reflects the composition of these per-element relations (but is not necessarily atomic with respect to the map as a whole unless it is somehow known to be quiescent). Conversely, because keys and values in the map are never null, null serves as a reliable atomic indicator of the current lack of any result. To maintain this property, null serves as an implicit basis for all non-scalar reduction operations. For the double, long, and int versions, the basis should be one that, when combined with any other value, returns that other value (more formally, it should be the identity element for the reduction). Most common reductions have these properties; for example, computing a sum with basis 0 or a minimum with basis MAX_VALUE.

Search and transformation functions provided as arguments should similarly return null to indicate the lack of any result (in which case it is not used). In the case of mapped reductions, this also enables transformations to serve as filters, returning null (or, in the case of primitive specializations, the identity basis) if the element should not be combined. You can create compound transformations and filterings by composing them yourself under this "null means there is nothing there now" rule before using them in search or reduce operations.

Methods accepting and/or returning Entry arguments maintain key-value associations. They may be useful for example when finding the key for the greatest value. Note that "plain" Entry arguments can be supplied using new AbstractMap.SimpleEntry(k,v).

Bulk operations may complete abruptly, throwing an exception encountered in the application of a supplied function. Bear in mind when handling such exceptions that other concurrently executing functions could also have thrown exceptions, or would have done so if the first exception had not occurred.

Speedups for parallel compared to sequential forms are common but not guaranteed. Parallel operations involving brief functions on small maps may execute more slowly than sequential forms if the underlying work to parallelize the computation is more expensive than the computation itself. Similarly, parallelization may not lead to much actual parallelism if all processors are busy performing unrelated tasks.

All arguments to all task methods must be non-null.

This class is a member of the Java Collections Framework.

并发安全 sync.Map

谈谈Go的并发安全相关

sync.Map源码分析

概述

原理

源码分析

特别提醒

适用场景

Class ConcurrentHashMap<K,V>

Class ConcurrentHashMap<K,V>