[Reading the Source] absl SpinLock / Mutex


    SpinLock and Mutex are two different kinds of locks. Both aim to provide mutually exclusive access to a critical section. Setting optimizations aside, a SpinLock makes the current thread poll for the lock within its own time slice until acquisition succeeds, while a Mutex is provided by the operating system: if the lock cannot be acquired, the thread gives up its time slice to other threads until the lock is obtained.

    To better understand the differences between the two and how they are built, this post walks through both implementations in the absl library.
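Before diving into absl's version, a textbook test-and-set spinlock makes the contrast concrete. This is a minimal sketch built on standard C++ atomics, not absl's implementation (which adds wait-time tracking and scheduling hooks on top of this idea):

```cpp
#include <atomic>
#include <thread>

// A minimal test-and-set spinlock -- a textbook sketch, not absl's
// implementation. The holder keeps other threads busy-polling.
class NaiveSpinLock {
 public:
  void Lock() {
    // Spin until we observe the flag clear and set it ourselves.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Busy-wait: burn the rest of our time slice polling.
    }
  }
  void Unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

The acquire/release pair here is the same ordering discipline absl's lock word uses, as we will see below.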

    SpinLock

    First, SpinLock. On top of ordinary lock functionality, absl also accounts for thread scheduling and wait time. The state is stored in a single atomic<uint32_t>; let's see how that word is encoded:

    // Description of lock-word:
    //  31..00: [............................3][2][1][0]
    //
    //     [0]: kSpinLockHeld
    //     [1]: kSpinLockCooperative
    //     [2]: kSpinLockDisabledScheduling
    // [31..3]: ONLY kSpinLockSleeper OR
    //          Wait time in cycles >> PROFILE_TIMESTAMP_SHIFT
    //
    // Detailed descriptions:
    //
    // Bit [0]: The lock is considered held iff kSpinLockHeld is set.
    //
    // Bit [1]: Eligible waiters (e.g. Fibers) may co-operatively reschedule when
    //          contended iff kSpinLockCooperative is set.
    //
    // Bit [2]: This bit is exclusive from bit [1].  It is used only by a
    //          non-cooperative lock.  When set, indicates that scheduling was
    //          successfully disabled when the lock was acquired.  May be unset,
    //          even if non-cooperative, if a ThreadIdentity did not yet exist at
    //          time of acquisition.
    //
    // Bit [3]: If this is the only upper bit ([31..3]) set then this lock was
    //          acquired without contention, however, at least one waiter exists.
    //
    //          Otherwise, bits [31..3] represent the time spent by the current lock
    //          holder to acquire the lock.  There may be outstanding waiter(s).
    static constexpr uint32_t kSpinLockHeld = 1;
    static constexpr uint32_t kSpinLockCooperative = 2;
    static constexpr uint32_t kSpinLockDisabledScheduling = 4;
    static constexpr uint32_t kSpinLockSleeper = 8;
    // Includes kSpinLockSleeper.
    static constexpr uint32_t kWaitTimeMask =
        ~(kSpinLockHeld | kSpinLockCooperative | kSpinLockDisabledScheduling);
    
    

    Bit 0 records whether the lock is held, bit 1 records whether a waiter may be cooperatively rescheduled on contention, bit 2 records whether scheduling was disabled, and the remaining bits record the time the current holder spent waiting to acquire the lock.

    The design is elegant. Unfortunately, bits 1 and 2 involve thread scheduling, which falls outside absl's scope; the scheduling hooks are presumably used by other Google projects but were not released as part of absl.
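The encoding can be exercised directly. The constants below are copied from the description above; the two helper predicates are my own illustrations of how SpinLock inspects the word, not absl API:

```cpp
#include <cstdint>

// The flag constants from the lock-word description above.
constexpr uint32_t kSpinLockHeld = 1;
constexpr uint32_t kSpinLockCooperative = 2;
constexpr uint32_t kSpinLockDisabledScheduling = 4;
constexpr uint32_t kSpinLockSleeper = 8;
// Everything above the three flag bits: either kSpinLockSleeper alone,
// or the encoded wait time.
constexpr uint32_t kWaitTimeMask =
    ~(kSpinLockHeld | kSpinLockCooperative | kSpinLockDisabledScheduling);

// Illustrative helpers (not absl's interface) showing how the word is read.
constexpr bool IsHeld(uint32_t w) { return (w & kSpinLockHeld) != 0; }
constexpr bool HasWaiterMark(uint32_t w) { return (w & kWaitTimeMask) != 0; }
```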

    Before analyzing the implementation, one more thing worth mentioning is tsan_mutex_interface. It adds a number of TSan macro annotations for thread-safety analysis, which can also help us understand the behavior of the code; however, for brevity, I strip these annotations in the analysis below.

    Thread Safety Analysis

    Lock

    Let's start with the implementations of Lock and TryLock.

    
    inline bool TryLockImpl() {
      uint32_t lock_value = lockword_.load(std::memory_order_relaxed);
      return (TryLockInternal(lock_value, 0) & kSpinLockHeld) == 0;
    }
    
    inline void Lock() {
      if (!TryLockImpl()) {
        SlowLock();
      }
    }
    
    inline bool TryLock() {
      bool res = TryLockImpl();
      return res;
    }
    
    inline bool IsHeld() const {
      return (lockword_.load(std::memory_order_relaxed) & kSpinLockHeld) != 0;
    }
    

    The Lock function embodies the futex idea: try to acquire the lock first, and only fall into the busy-waiting slow path on failure.

    Next, let's look at TryLockInternal and SlowLock.

    // If (result & kSpinLockHeld) == 0, then *this was successfully locked.
    // Otherwise, returns last observed value for lockword_.
    inline uint32_t SpinLock::TryLockInternal(uint32_t lock_value,
                                              uint32_t wait_cycles) {
      if ((lock_value & kSpinLockHeld) != 0) {
        return lock_value;
      }
    
      uint32_t sched_disabled_bit = 0;
      if ((lock_value & kSpinLockCooperative) == 0) {
        // For non-cooperative locks we must make sure we mark ourselves as
        // non-reschedulable before we attempt to CompareAndSwap.
        if (base_internal::SchedulingGuard::DisableRescheduling()) {
          sched_disabled_bit = kSpinLockDisabledScheduling;
        }
      }
    
      if (!lockword_.compare_exchange_strong(
              lock_value,
              kSpinLockHeld | lock_value | wait_cycles | sched_disabled_bit,
              std::memory_order_acquire, std::memory_order_relaxed)) {
        base_internal::SchedulingGuard::EnableRescheduling(sched_disabled_bit != 0);
      }
    
      return lock_value;
    }
    

    TryLockInternal takes two parameters: lock_value, the value loaded from lockword_, and wait_cycles, the new wait time.

    It first tests whether the lock is already held; if so, it is owned by someone else, acquisition fails, and we return immediately.

    Otherwise the lock can be taken. The scheduling logic is handled first: the constructor only ever sets kSpinLockCooperative, so if kSpinLockCooperative == 0 and rescheduling is indeed currently disallowed, we must set kSpinLockDisabledScheduling. (Since a lock is always acquired before it is released, this matches the kSpinLockDisabledScheduling logic on the unlock side.)

    Then a single CAS sets the held bit, the wait time, and the scheduling-disabled bit all at once, and the function returns the swapped-out lock_value. (If the exchange succeeded, lock_value is the unheld value we observed; if another thread won the race and acquisition failed, lock_value is the held value that thread wrote.)

    Next is SlowLock. It returns only once the lock has been acquired, and the actual acquisition is always performed by TryLockInternal.
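The key property relied on here is that compare_exchange_strong, on failure, overwrites its first argument with the value it actually observed. A standalone demonstration (the function name is made up for illustration):

```cpp
#include <atomic>
#include <cstdint>

// Demonstrates the compare_exchange_strong contract TryLockInternal relies
// on: on failure, `expected` is overwritten with the value another writer
// actually stored, so the caller ends up returning an up-to-date lock word.
uint32_t TryAcquireBit(std::atomic<uint32_t>& word, uint32_t expected) {
  // Try to set the "held" bit, assuming the word still equals `expected`.
  word.compare_exchange_strong(expected, expected | 1u,
                               std::memory_order_acquire,
                               std::memory_order_relaxed);
  return expected;  // unchanged on success, refreshed on failure
}
```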

    void SpinLock::SlowLock() {
      uint32_t lock_value = SpinLoop();
      lock_value = TryLockInternal(lock_value, 0);
      if ((lock_value & kSpinLockHeld) == 0) {
        return;
      }
    
      base_internal::SchedulingMode scheduling_mode;
      if ((lock_value & kSpinLockCooperative) != 0) {
        scheduling_mode = base_internal::SCHEDULE_COOPERATIVE_AND_KERNEL;
      } else {
        scheduling_mode = base_internal::SCHEDULE_KERNEL_ONLY;
      }
    
      // The lock was not obtained initially, so this thread needs to wait for
      // it.  Record the current timestamp in the local variable wait_start_time
      // so the total wait time can be stored in the lockword once this thread
      // obtains the lock.
      int64_t wait_start_time = CycleClock::Now();
      uint32_t wait_cycles = 0;
      int lock_wait_call_count = 0;
      while ((lock_value & kSpinLockHeld) != 0) {
        // If the lock is currently held, but not marked as having a sleeper, mark
        // it as having a sleeper.
        if ((lock_value & kWaitTimeMask) == 0) {
          // Here, just "mark" that the thread is going to sleep.  Don't store the
          // lock wait time in the lock -- the lock word stores the amount of time
          // that the current holder waited before acquiring the lock, not the wait
          // time of any thread currently waiting to acquire it.
          if (lockword_.compare_exchange_strong(
                  lock_value, lock_value | kSpinLockSleeper,
                  std::memory_order_relaxed, std::memory_order_relaxed)) {
            // Successfully transitioned to kSpinLockSleeper.  Pass
            // kSpinLockSleeper to the SpinLockWait routine to properly indicate
            // the last lock_value observed.
            lock_value |= kSpinLockSleeper;
          } else if ((lock_value & kSpinLockHeld) == 0) {
            // Lock is free again, so try and acquire it before sleeping.  The
            // new lock state will be the number of cycles this thread waited if
            // this thread obtains the lock.
            lock_value = TryLockInternal(lock_value, wait_cycles);
            continue;   // Skip the delay at the end of the loop.
          } else if ((lock_value & kWaitTimeMask) == 0) {
            // The lock is still held, without a waiter being marked, but something
            // else about the lock word changed, causing our CAS to fail. For
            // example, a new lock holder may have acquired the lock with
            // kSpinLockDisabledScheduling set, whereas the previous holder had not
            // set that flag. In this case, attempt again to mark ourselves as a
            // waiter.
            continue;
          }
        }
    
        // SpinLockDelay() calls into fiber scheduler, we need to see
        // synchronization there to avoid false positives.
        // Wait for an OS specific delay.
        base_internal::SpinLockDelay(&lockword_, lock_value, ++lock_wait_call_count,
                                     scheduling_mode);
        // Spin again after returning from the wait routine to give this thread
        // some chance of obtaining the lock.
        lock_value = SpinLoop();
        wait_cycles = EncodeWaitCycles(wait_start_time, CycleClock::Now());
        lock_value = TryLockInternal(lock_value, wait_cycles);
      }
    }
    
    // Monitor the lock to see if its value changes within some time period
    // (adaptive_spin_count loop iterations). The last value read from the lock
    // is returned from the method.
    uint32_t SpinLock::SpinLoop() {
      // We are already in the slow path of SpinLock, initialize the
      // adaptive_spin_count here.
      ABSL_CONST_INIT static absl::once_flag init_adaptive_spin_count;
      ABSL_CONST_INIT static int adaptive_spin_count = 0;
      base_internal::LowLevelCallOnce(&init_adaptive_spin_count, []() {
        adaptive_spin_count = base_internal::NumCPUs() > 1 ? 1000 : 1;
      });
    
      int c = adaptive_spin_count;
      uint32_t lock_value;
      do {
        lock_value = lockword_.load(std::memory_order_relaxed);
      } while ((lock_value & kSpinLockHeld) != 0 && --c > 0);
      return lock_value;
    }
    

    The comments here are clear: SpinLoop is one round of busy waiting, exiting only when the lock is released or a fixed number of iterations have elapsed, and it returns the state of lockword_.

    The code is fairly long, though, so here it is condensed into pseudocode, with my understanding added.

    void SpinLock::SlowLock() {
      // First spin for one round, then try to acquire.
      old_value = SpinLoop();
      old_value = TryLockInternal(old_value, 0);
      if (old_value is not held)
        return;
      // Acquisition failed: wait via a system call instead of spinning on.
      record the start time
      wait_cycles = 0;
      while (old_value is still held)
        // Before the system call we must set the wait-time mark, i.e. declare
        // that a thread is waiting; otherwise the sleeper might never be woken.
        // When is the mark cleared? Look at the two earlier TryLockInternal
        // calls: if a thread acquires the lock without ever blocking, no other
        // thread is waiting, so the mark can be dropped.
        if (old_value carries no wait-time mark)
          // Set the initial wait-time mark.
          success = compare_and_swap(lockword_, old_value, old_value | kSpinLockSleeper);
          if (success)
            old_value |= kSpinLockSleeper;
          else if (old_value is not held)
            // The lock was released in the meantime; try to acquire it.
            old_value = TryLockInternal(old_value, wait_cycles);
            continue;
          else if (old_value still carries no wait-time mark)
            // The CAS failed for an unrelated reason; retry setting the mark.
            continue;
        // System call: sleep / futex.
        SpinLockDelay(&lockword_, old_value);
        old_value = SpinLoop();
        wait_cycles = time elapsed so far
        old_value = TryLockInternal(old_value, wait_cycles);
    }
    

    That completes the locking code. To summarize:

    1. Bit 0 indicates whether the lock is held; bits 3-31 encode the current wait time.
    2. Acquisition proceeds in three stages: (1) a direct try-lock, (2) spinning, (3) a loop of system-call waits.
    3. The final acquisition is always performed by TryLockInternal.
    4. Whether any waiter exists is judged by the wait-time mark.

    Unlock

    Next up is unlocking.

    
    inline void Unlock() {
      uint32_t lock_value = lockword_.load(std::memory_order_relaxed);
      lock_value = lockword_.exchange(lock_value & kSpinLockCooperative,
                                      std::memory_order_release);
    
      if ((lock_value & kSpinLockDisabledScheduling) != 0) {
        base_internal::SchedulingGuard::EnableRescheduling(true);
      }
      if ((lock_value & kWaitTimeMask) != 0) {
        // Collect contentionz profile info, and speed the wakeup of any waiter.
        // The wait_cycles value indicates how long this thread spent waiting
        // for the lock.
        SlowUnlock(lock_value);
      }
    }
    

    What Unlock does may not be obvious at first glance. The first two lines swap out the old lockword_, with the new value keeping only the kSpinLockCooperative bit. memory_order_release is used here, pairing with the memory_order_acquire on the TryLock side.

    Then the old lockword_ value is checked against kSpinLockDisabledScheduling to run the rescheduling logic. (A question to ponder: if there are multiple SpinLocks, one allowing rescheduling and another not, how is that handled?)

    Finally, the & kWaitTimeMask test: if it is non-zero, another thread is waiting on this lock, so SlowUnlock is called to wake it.

    Next is SlowUnlock.

    void SpinLock::SlowUnlock(uint32_t lock_value) {
      base_internal::SpinLockWake(&lockword_,
                                  false);  // wake waiter if necessary
    
      // If our acquisition was contended, collect contentionz profile info.  We
      // reserve a unitary wait time to represent that a waiter exists without our
      // own acquisition having been contended.
      if ((lock_value & kWaitTimeMask) != kSpinLockSleeper) {
        const uint64_t wait_cycles = DecodeWaitCycles(lock_value);
        submit_profile_data(this, wait_cycles);
      }
    }
    

    It wakes the waiting thread via a system call, and if the recorded wait time exceeds the initial sentinel value, it reports one profile sample.
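EncodeWaitCycles and DecodeWaitCycles are not shown in this post. The sketch below captures the idea under assumed constants (the real absl versions additionally clamp against a minimum and saturate at a maximum; the shift value here is illustrative, not absl's):

```cpp
#include <cstdint>

// Simplified sketch of the wait-cycle encoding idea: scale the cycle count
// down, shift it above the three flag bits, and never store zero for a real
// wait, because bit 3 doubles as the "a sleeper exists" mark.
constexpr int kProfileTimestampShift = 7;  // assumed scaling, for illustration
constexpr int kLockwordReservedShift = 3;  // bits [2..0] hold the flags
constexpr uint32_t kSpinLockSleeper = 8;

uint32_t EncodeWaitCyclesSketch(int64_t wait_start, int64_t wait_end) {
  int64_t scaled = (wait_end - wait_start) >> kProfileTimestampShift;
  uint32_t cycles = static_cast<uint32_t>(scaled) << kLockwordReservedShift;
  // A zero result would erase the sleeper mark; keep at least that bit.
  return cycles < kSpinLockSleeper ? kSpinLockSleeper : cycles;
}

uint64_t DecodeWaitCyclesSketch(uint32_t lock_value) {
  // Drop the flag bits, then undo the scaling.
  return static_cast<uint64_t>(lock_value >> kLockwordReservedShift)
         << kProfileTimestampShift;
}
```

This also explains the `!= kSpinLockSleeper` test in SlowUnlock: a word equal to exactly kSpinLockSleeper means "a waiter exists, but my own acquisition was uncontended", so no profile sample is submitted.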

    Mutex

    Mutex is far more complex than SpinLock. Compared with std::mutex, it also adds the following features:

    //   * Conditional predicates intrinsic to the `Mutex` object
    //   * Shared/reader locks, in addition to standard exclusive/writer locks
    //   * Deadlock detection and debug support.
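To get a feel for the first feature: absl::Mutex can block directly on a predicate (e.g. mu.LockWhen(absl::Condition(&ready))), whereas the std::mutex equivalent needs an explicit condition variable and notify calls. A sketch of that equivalent using only standard-library types:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// What absl::Mutex's intrinsic predicates replace: with std::mutex, waiting
// on a condition requires a separate std::condition_variable, and the writer
// must remember to notify.
std::mutex mu;
std::condition_variable cv;
bool ready = false;

void WaitUntilReady() {
  std::unique_lock<std::mutex> lk(mu);
  cv.wait(lk, [] { return ready; });  // predicate re-checked on every wakeup
}

void SetReady() {
  {
    std::lock_guard<std::mutex> lk(mu);
    ready = true;
  }
  cv.notify_all();
}
```

With absl::Mutex the condition variable and the notify call disappear: the mutex itself re-evaluates the condition at unlock time.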
    

    TODO

    Related Reading:

    design doc
    a detailed write-up on Jianshu

    std::mutex

    The libcxx I use implements it on top of pthread_mutex / mtx_t.

    Performance Benchmark

    The figure (from mutex_benchmark:BM_Contended) shows the time each kind of lock takes as the thread count grows, for several critical-section lengths.

    As it shows, std::mutex's cost rises noticeably while the thread count is small, yet after a certain threshold the time per operation actually drops, and in the end the three locks differ little.

    As for absl::Mutex, even with very short critical sections its performance is no worse than absl::base_internal::SpinLock, and overall it performs best; the optimization work really shows. In ordinary use, absl::Mutex should be the first choice.
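The shape of such a benchmark can be sketched as follows. This is a minimal stand-in for the BM_Contended setup, not the actual mutex_benchmark code: every thread repeatedly takes the lock around a short critical section, and the total wall time is what the benchmark plots.

```cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Minimal contended-lock harness: launches num_threads threads that each
// acquire the lock iters_per_thread times around a tiny critical section.
// Works for any Lock usable with std::lock_guard (std::mutex, absl::Mutex
// via an adapter, a spinlock, ...).
template <typename Lock>
double ContendedSeconds(int num_threads, int iters_per_thread,
                        long long* total_increments) {
  Lock lock;
  long long shared = 0;
  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < iters_per_thread; ++i) {
        std::lock_guard<Lock> guard(lock);  // the contended acquisition
        ++shared;                           // short critical section
      }
    });
  }
  for (auto& th : threads) th.join();
  *total_increments = shared;
  return std::chrono::duration<double>(std::chrono::steady_clock::now() -
                                       start)
      .count();
}
```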

Original post: https://www.cnblogs.com/xxrlz/p/15981913.html