• Review and Learn - Multithreading - Verifying That Cache Lines Exist




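    Overview

    This article verifies the cache misses that the CPU cache coherence protocol can cause, as described in "An In-Depth Analysis of the volatile Keyword" (《深入剖析volatile关键词》).

    Cache Line

    A cache is made up of cache lines, and one line is typically 64 bytes. The CPU operates on its cache one line at a time. The line size can be checked with the following command: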

    [root@yangsc-01 ~]# cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
    64
    [root@yangsc-01 ~]#
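
    The same value can also be read from inside a Java program. The following is a minimal sketch, assuming a Linux machine that exposes the sysfs file used in the command above; the class name CacheLineSize and the fallback value of 64 are illustrative choices, not part of the original article.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CacheLineSize {
    // Same sysfs file as the shell command above (Linux only).
    static final Path SYSFS_FILE =
            Paths.get("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size");

    // Parses the file content, for example "64\n", into an int.
    static int parse(String content) {
        return Integer.parseInt(content.trim());
    }

    // Returns the reported line size, or the fallback (an assumption)
    // when the file is missing, unreadable, or not a number.
    static int readOrDefault(int fallback) {
        try {
            return parse(new String(Files.readAllBytes(SYSFS_FILE)));
        } catch (IOException | NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println("cache line size = " + readOrDefault(64) + " bytes");
    }
}
```

    On systems without this file (for example macOS or Windows) the sketch simply reports the fallback, so the printed number is only meaningful on Linux.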
    

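    The CPU always reads and writes its cache in units of whole lines. A long is 64 bits, i.e. 8 bytes, and array elements are stored at consecutive addresses, so loading the first element of an array also pulls the following elements into the same cache line. A long array of length 8 holds exactly 64 bytes of data, so it might seem that the CPU would fit the whole array into a single cache line. It does not, because in Java an object's in-memory structure also includes an object header, which takes up part of the line. The object memory layout section of "An In-Depth Analysis of the synchronized Keyword" (《深入剖析synchronized关键词》) covers this in more detail.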
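
    The arithmetic behind this can be sketched in a few lines. The numbers below are assumptions rather than measurements: a 64-byte line, a 16-byte long[] header (mark word + compressed class pointer + length, as on a typical 64-bit HotSpot JVM), and a hypothetical array address that happens to start on a line boundary, which the JVM does not guarantee.

```java
final class ArrayCacheLines {
    static final int LINE_SIZE = 64;    // assumed cache line size in bytes
    static final int ARRAY_HEADER = 16; // assumed long[] header: mark word + class pointer + length
    static final int LONG_SIZE = 8;     // a long is 8 bytes

    // Index of the cache line holding element i, counted from the line
    // that contains the start of the array object.
    static long lineOf(long arrayStart, int i) {
        long address = arrayStart + ARRAY_HEADER + (long) i * LONG_SIZE;
        return address / LINE_SIZE - arrayStart / LINE_SIZE;
    }

    public static void main(String[] args) {
        long arrayStart = 0; // hypothetical address, aligned to a line boundary
        for (int i = 0; i < 8; i++) {
            System.out.println("element " + i + " -> line " + lineOf(arrayStart, i));
        }
    }
}
```

    Under these assumptions elements 0 to 5 share the first line with the header, while elements 6 and 7 spill over into the next line.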

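    A classic cache line diagram

    [image: image-20200602163634628]

    A thread running on core 1 wants to update variable X while another thread running on core 2 wants to update variable Y, but these two frequently modified variables sit on the same cache line. The two threads take turns sending RFO (Request For Ownership) messages to claim ownership of the line. When core 1 gains ownership and updates X, the copy of the line in core 2's cache must be set to the I (Invalid) state; when core 2 gains ownership and updates Y, the copy in core 1's cache must be set to I. This tug-of-war not only generates a large number of RFO messages: when a thread needs to read the line, the copies in L1 and L2 are invalid and only L3 holds up-to-date data, and L3 is much slower to access than L1 or L2.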
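
    The ownership ping-pong described above can be illustrated with a toy model. This is only a sketch under strong simplifications: it tracks just the Modified and Invalid states for one line per core and counts how often a write has to request ownership first. The names RfoPingPong, Line, sharedLineRfos and separateLineRfos are made up for this illustration and do not model real hardware timing.

```java
import java.util.Arrays;

final class RfoPingPong {
    enum State { MODIFIED, INVALID }

    // One cache line as seen by each core: state[c] is core c's copy.
    static final class Line {
        final State[] state;
        int rfoCount = 0;

        Line(int cores) {
            state = new State[cores];
            Arrays.fill(state, State.INVALID);
        }

        // A write needs ownership: if this core does not hold the line in
        // MODIFIED state it issues an RFO and every other copy becomes INVALID.
        void write(int core) {
            if (state[core] != State.MODIFIED) {
                rfoCount++;
                Arrays.fill(state, State.INVALID);
                state[core] = State.MODIFIED;
            }
        }
    }

    // X and Y share one line: core 0 writes X, core 1 writes Y, alternating.
    static int sharedLineRfos(int rounds) {
        Line line = new Line(2);
        for (int r = 0; r < rounds; r++) {
            line.write(0);
            line.write(1);
        }
        return line.rfoCount;
    }

    // X and Y live on separate lines, each written by a single core.
    static int separateLineRfos(int rounds) {
        Line x = new Line(2);
        Line y = new Line(2);
        for (int r = 0; r < rounds; r++) {
            x.write(0);
            y.write(1);
        }
        return x.rfoCount + y.rfoCount;
    }

    public static void main(String[] args) {
        System.out.println("shared line:    " + sharedLineRfos(1000) + " RFOs");
        System.out.println("separate lines: " + separateLineRfos(1000) + " RFOs");
    }
}
```

    In this model every write to the shared line costs an RFO, whereas with separate lines each core requests ownership once and then keeps it.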

    Verifying that cache lines exist

    Results first

    • VolatileLong elapsed time: 31028 ms

    private static VolatileLong[] longs = new VolatileLong[NUM_THREADS]; // 31028

    • VolatileLong2 elapsed time: 7650 ms

    private static VolatileLong2[] longs = new VolatileLong2[NUM_THREADS]; // 7650

    • VolatileLong3 elapsed time: 7385 ms

    private static VolatileLong3[] longs = new VolatileLong3[NUM_THREADS]; // 7385

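    import org.openjdk.jol.info.ClassLayout; // JOL (Java Object Layout), prints the object layouts below

    public class FalseSharing implements Runnable {
        public final static int NUM_THREADS = 4; // number of writer threads; change to experiment
        public final static long ITERATIONS = 500L * 1000L * 1000L;
        private final int arrayIndex;
        // Enable exactly one of the three declarations below and construct the same
        // type in the static block; the trailing number is the measured time in ms.
        //    private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];          // 31028
        private static VolatileLong2[] longs = new VolatileLong2[NUM_THREADS];              // 7650
        //    private static VolatileLong3[] longs = new VolatileLong3[NUM_THREADS];        // 7385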
        static {
            for (int i = 0; i < longs.length; i++) {
                longs[i] = new VolatileLong2();
            }
            VolatileLong volatileLong = new VolatileLong();
            VolatileLong2 volatileLong2 = new VolatileLong2();
            VolatileLong3 volatileLong3 = new VolatileLong3();
    
            System.out.println(ClassLayout.parseInstance(volatileLong).toPrintable());
            System.out.println(ClassLayout.parseInstance(volatileLong2).toPrintable());
            System.out.println(ClassLayout.parseInstance(volatileLong3).toPrintable());
        }
    
        public FalseSharing(final int arrayIndex) {
            this.arrayIndex = arrayIndex;
        }
    
        public static void main(final String[] args) throws Exception {
            long start = System.currentTimeMillis();
            runTest();
            System.out.println("duration = " + (System.currentTimeMillis() - start));
        }
    
        private static void runTest() throws InterruptedException {
            Thread[] threads = new Thread[NUM_THREADS];
            for (int i = 0; i < threads.length; i++) {
                threads[i] = new Thread(new FalseSharing(i));
            }
            for (Thread t : threads) {
                t.start();
            }
            for (Thread t : threads) {
                t.join();
            }
        }
    
        @Override
        public void run() {
            long i = ITERATIONS + 1;
            while (0 != --i) {
                longs[arrayIndex].value = i;
            }
        }
    
        public final static class VolatileLong {
            public volatile long value = 0L;
        }
    
        // Pad with long fields on both sides of value to avoid false sharing
        public final static class VolatileLong2 {
            volatile long p0, p1, p2, p3, p4, p5, p6;
            public volatile long value = 0L;
            volatile long q0, q1, q2, q3, q4, q5, q6;
        }
    
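        /**
         * Since JDK 8: the @Contended annotation avoids false sharing.
         * Must be run with the JVM flag -XX:-RestrictContended.
         * (On JDK 9 and later the annotation lives in jdk.internal.vm.annotation.)
         */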
        @sun.misc.Contended
        public final static class VolatileLong3 {
            public volatile long value = 0L;
        }
    }
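
    Switching the declaration by hand and rerunning is how the numbers above were obtained. As a convenience, the padded and unpadded variants can also be timed in one run. The following is a rough sketch with made-up names (FalseSharingCompare, Cell, Plain, Padded, timeMillis) that avoids the JOL dependency. It is not a rigorous benchmark: JIT warm-up, the interface call, and thread scheduling all affect the numbers, and the JVM is free to lay out fields and objects differently from what the sketch assumes.

```java
import java.util.function.Supplier;

final class FalseSharingCompare {
    interface Cell {
        void set(long v);
        long get();
    }

    // Unpadded: several instances can end up on the same cache line.
    static final class Plain implements Cell {
        volatile long value;
        public void set(long v) { value = v; }
        public long get() { return value; }
    }

    // Manually padded, same idea as VolatileLong2.
    static final class Padded implements Cell {
        volatile long p0, p1, p2, p3, p4, p5, p6;
        volatile long value;
        volatile long q0, q1, q2, q3, q4, q5, q6;
        public void set(long v) { value = v; }
        public long get() { return value; }
    }

    // Runs one writer thread per cell and returns the wall-clock time in ms.
    // iterations must be at least 1.
    static long timeMillis(Supplier<Cell> factory, int threads, long iterations) {
        Cell[] cells = new Cell[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final Cell cell = cells[t] = factory.get();
            workers[t] = new Thread(() -> {
                long i = iterations + 1;
                while (0 != --i) {
                    cell.set(i); // same write loop as FalseSharing.run()
                }
            });
        }
        long start = System.nanoTime();
        for (Thread w : workers) {
            w.start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        for (Cell c : cells) {
            if (c.get() != 1L) { // the last value written by the loop is 1
                throw new IllegalStateException("unexpected final value " + c.get());
            }
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long iterations = 50L * 1000L * 1000L;
        System.out.println("plain  = " + timeMillis(Plain::new, 4, iterations) + " ms");
        System.out.println("padded = " + timeMillis(Padded::new, 4, iterations) + " ms");
    }
}
```

    On a multi-core machine the padded variant would be expected to finish faster, in line with the measurements above, but the exact ratio depends on the hardware and JVM.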
    
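    • ClassLayout analysis of VolatileLong

    With compressed pointers enabled, a VolatileLong instance occupies 24 bytes (mark word + class pointer + padding + the 8-byte value), well short of the 64-byte cache line found on most machines, so the value fields of several instances can end up on the same cache line. Because value is volatile, a write by one thread must become visible to the other threads, so every write invalidates that line in the other cores' caches even though each thread only touches its own instance.

    • ClassLayout analysis of VolatileLong2

    Mark word + class pointer + padding + the hand-written p and q padding fields occupy 136 bytes. value sits at offset 72 with 56 bytes of padding on either side, so the value fields of different instances fall on different cache lines.

    • ClassLayout analysis of VolatileLong3

    Mark word + class pointer + the padding inserted by the JVM for @Contended (a 132-byte gap before value and 128 bytes after it) occupy 280 bytes, so the value fields of different instances again fall on different cache lines.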

    com.yangsc.juc.FalseSharing$VolatileLong object internals:
     OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
          0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
          4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
          8     4        (object header)                           c1 c1 00 f8 (11000001 11000001 00000000 11111000) (-134168127)
         12     4        (alignment/padding gap)                  
         16     8   long VolatileLong.value                        0
    Instance size: 24 bytes
    Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
    
    com.yangsc.juc.FalseSharing$VolatileLong2 object internals:
     OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
          0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
          4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
          8     4        (object header)                           47 c1 00 f8 (01000111 11000001 00000000 11111000) (-134168249)
         12     4        (alignment/padding gap)                  
         16     8   long VolatileLong2.p0                          0
         24     8   long VolatileLong2.p1                          0
         32     8   long VolatileLong2.p2                          0
         40     8   long VolatileLong2.p3                          0
         48     8   long VolatileLong2.p4                          0
         56     8   long VolatileLong2.p5                          0
         64     8   long VolatileLong2.p6                          0
         72     8   long VolatileLong2.value                       0
         80     8   long VolatileLong2.q0                          0
         88     8   long VolatileLong2.q1                          0
         96     8   long VolatileLong2.q2                          0
        104     8   long VolatileLong2.q3                          0
        112     8   long VolatileLong2.q4                          0
        120     8   long VolatileLong2.q5                          0
        128     8   long VolatileLong2.q6                          0
    Instance size: 136 bytes
    Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
    
    com.yangsc.juc.FalseSharing$VolatileLong3 object internals:
     OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
          0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
          4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
          8     4        (object header)                           05 c2 00 f8 (00000101 11000010 00000000 11111000) (-134168059)
         12   132        (alignment/padding gap)                  
        144     8   long VolatileLong3.value                       0
        152   128        (loss due to the next object alignment)
    Instance size: 280 bytes
    Space losses: 132 bytes internal + 128 bytes external = 260 bytes total
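
    The instance sizes and field offsets in the JOL output above are enough to estimate how many value fields can land on one cache line. The sketch below is a back-of-the-envelope check, assuming a 64-byte line, 8-byte object alignment, and instances allocated back to back, which the JVM does not guarantee. The class and method names are illustrative.

```java
final class ValuesPerLine {
    static final int LINE_SIZE = 64;       // assumed cache line size in bytes
    static final int OBJECT_ALIGNMENT = 8; // HotSpot default object alignment

    // Lays out 16 instances back to back for every possible start alignment
    // and returns the largest number of value fields found on a single line.
    static int maxValuesPerLine(int instanceSize, int valueOffset) {
        int objects = 16;
        int best = 0;
        for (int start = 0; start < LINE_SIZE; start += OBJECT_ALIGNMENT) {
            int[] perLine = new int[(start + objects * instanceSize) / LINE_SIZE + 1];
            for (int k = 0; k < objects; k++) {
                int address = start + k * instanceSize + valueOffset;
                perLine[address / LINE_SIZE]++;
            }
            for (int count : perLine) {
                best = Math.max(best, count);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sizes and offsets taken from the JOL output above.
        System.out.println("VolatileLong:  " + maxValuesPerLine(24, 16));
        System.out.println("VolatileLong2: " + maxValuesPerLine(136, 72));
        System.out.println("VolatileLong3: " + maxValuesPerLine(280, 144));
    }
}
```

    Under these assumptions up to three VolatileLong value fields can share a line, while VolatileLong2 and VolatileLong3 never place more than one value field on a line.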
    

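    The reference articles explain this better than I do; to learn more, please read the articles listed below.

    References

    Understanding False Sharing from a Java Perspective (从Java视角理解伪共享(False Sharing))

    Understanding the CPU Cache from a Java Perspective (从Java视角理解CPU缓存(CPU Cache))

    Understanding the CPU Cache (理解CPU-Cache)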



  • Original article: https://www.cnblogs.com/yangsanchao/p/13062898.html
Copyright © 2020-2023  润新知