• Analysis of the string functions written in assembly in the kernel


    ***************************************************************
    Digging into the string functions written in assembly:
    ***************************************************************
    ---------------------------------------------------------------
    The _I386_STRING_H_ macro
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #ifndef _I386_STRING_H_ 
    #define _I386_STRING_H_ 
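    This is the include guard: the macro is defined the first time this header of assembly string routines is included, so that repeated inclusions are skipped.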
    --------------------------------------------------------------- 
    The __KERNEL__ macro
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #ifdef __KERNEL__ 
    #include <linux/config.h> 
    Note:
    config.h is included only when the __KERNEL__ macro is defined.
    /* 
    * On a 486 or Pentium, we are better off not using the 
    * byte string operations. But on a 386 or a PPro the 
    * byte string ops are faster than doing it by hand 
    * (MUCH faster on a Pentium). 
    */ 
    The following comment is important and worth reading:
    /*
    * This string-include defines all string functions as inline
    * functions. Use gcc. It also assumes ds=es=data space, this should be
    * normal. Most of the string-functions are rather heavily hand-optimized,
    * see especially strsep,strstr,str[c]spn. They should work, but are not
    * very easy to understand. Everything is done entirely within the register
    * set, making the functions fast and clean. String instructions have been
    * used through-out, making for "slightly" unclear code :-)
    *
    * NO Copyright (C) 1991, 1992 Linus Torvalds,
    * consider these trivial functions to be PD.
    */

    /* AK: in fact I bet it would be better to move this stuff all out of line. */ 
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRCPY strcpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRCPY 
    static inline char * strcpy(char * dest,const char *src)
    {
    int d0, d1, d2;
    __asm__ __volatile__(
    "1:\tlodsb\n\t"
    "stosb\n\t"
    "testb %%al,%%al\n\t"
    "jne 1b"
    : "=&S" (d0), "=&D" (d1), "=&a" (d2)
    :"0" (src),"1" (dest)
    : "memory");
    return dest;
    }

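    Analysis:
    1. The instructions rewritten in a more explicit form:
    1:                 --->  1:
    lodsb              --->  mov al, ds:[esi]
                             inc esi
    stosb              --->  mov es:[edi], al
                             inc edi
    testb %al,%al      --->  test al, al
    jne 1b             --->  jne 1
    The loop is terminated by the zero byte: once the final '\0' has been loaded and stored, testb sets ZF and the loop ends.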

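    2. Operand constraints:
    S: si/esi
    &: By default gcc may place an input operand and an output operand in the same register, because it assumes that all inputs are consumed before any output is produced. Putting "&" (early clobber) in front of an output operand guarantees that the output does not overwrite an input: gcc gives this output a register that no input is using, unless an input is tied to it explicitly (with a digit 0-9, see below).

    0-9: A matching constraint. It names an operand that serves as both input and output, with the input and the output occupying the same location (register). Digits may only appear on inputs; digit I means "same location as output operand I".

    int d0, d1, d2;
    "=&S" (d0), "=&D" (d1), "=&a" (d2)
    "0" (src),"1" (dest)
    Code analysis:
    src and dest are input operands only. The matching constraints "0" and "1" make gcc load src into the register of output operand 0 (d0, i.e. esi) and dest into the register of output operand 1 (d1, i.e. edi) before the asm statement runs. While the loop executes, lodsb and stosb advance esi and edi and al is overwritten, so afterwards these registers no longer hold their input values. The dummy outputs d0, d1 and d2 exist only to tell the compiler this: the final register contents are written to these throwaway locals, so gcc does not assume that esi and edi still contain src and dest. src and dest themselves are never modified, which is why the function can simply return dest. The "&" on each output keeps gcc from placing any other input in these registers; only the explicitly matched input shares the register.

    D: di/edi
    a: ax/eax
    "memory": This is the clobber list. It tells the compiler that memory is modified in a way it cannot predict, so it must not keep memory values cached in registers across the asm statement.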

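    3. Instruction notes:
    lodsb:  == mov al, ds:[esi]
               inc esi / dec esi   (inc when DF == 0, dec when DF == 1)
    stosb:  == mov es:[edi], al
               inc edi / dec edi   (same rule)
    testb:  == test oprd1, oprd2
               Computes oprd1 & oprd2, discards the result, and sets ZF, PF and SF accordingly (CF and OF are cleared).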
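
    As a cross-check of the loop analysed above, here is a minimal portable C sketch of the lodsb/stosb/testb/jne sequence. The name sketch_strcpy and the local variable names are made up for this illustration; this is a model of what the asm does, not the kernel's code.

```c
#include <stddef.h>

/* Hypothetical C model of the i386 strcpy loop (illustration only). */
char *sketch_strcpy(char *dest, const char *src)
{
    const char *s = src;   /* plays the role of esi */
    char *d = dest;        /* plays the role of edi */
    char al;               /* plays the role of al  */

    do {
        al = *s++;         /* lodsb: load a byte, advance the source      */
        *d++ = al;         /* stosb: store it, advance the destination    */
    } while (al != 0);     /* testb %al,%al ; jne 1b                      */

    return dest;           /* dest itself was never modified              */
}
```

    Note that the terminating '\0' is copied before the loop exits, exactly as in the assembly version.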

    --------------------------------------------------------------- 
    __HAVE_ARCH_STRNCPY strncpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRNCPY 
    static inline char * strncpy(char * dest,const char *src,size_t count)
    {
    int d0, d1, d2, d3;
    __asm__ __volatile__(
    "1:\tdecl %2\n\t"
    "js 2f\n\t"
    "lodsb\n\t"
    "stosb\n\t"
    "testb %%al,%%al\n\t"
    "jne 1b\n\t"
    "rep\n\t"
    "stosb\n"
    "2:"
    : "=&S" (d0), "=&D" (d1), "=&c" (d2), "=&a" (d3)
    :"0" (src),"1" (dest),"2" (count)
    : "memory");
    return dest;
    }

    Instructions rewritten:
    1: decl ecx        ===>  1: dec ecx
    js 2               ===>  js 2
    lodsb              ===>  mov al, ds:[esi]
                             inc esi / dec esi
    stosb              ===>  mov es:[edi], al
                             inc edi / dec edi
    testb al,al        ===>  test al, al
    jne 1              ===>  jne 1

    rep                ===>  rep
    stosb              ===>  mov es:[edi], al
                             inc edi / dec edi
    2:                 ===>  2:

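    Analysis:
    The code is examined case by case (six cases in all).
    Suppose memory holds: abcde\0
    1) Copy 3 characters:
    (1) Initially ecx == 3.
    On each pass ecx is decremented and one character is copied; the copied character is then tested for 0.
    3-->2: copy a
    2-->1: copy b
    1-->0: copy c
    0-->-1 js 2
    (2) So 3 characters are copied and no terminating '\0' is written.

    2) Copy 5 characters:
    (1) Initially ecx == 5.
    On each pass ecx is decremented and one character is copied; the copied character is then tested for 0.
    5-->4: copy a
    4-->3: copy b
    3-->2: copy c
    2-->1: copy d
    1-->0: copy e
    0-->-1 js 2
    (2) So 5 characters are copied, again without a terminating '\0'.

    3) Copy 6 characters:
    (1) Initially ecx == 6.
    On each pass ecx is decremented and one character is copied; the copied character is then tested for 0.
    6-->5: copy a
    5-->4: copy b
    4-->3: copy c
    3-->2: copy d
    2-->1: copy e
    1-->0: copy \0
    test al,al ===> al == \0, so ZF == 1.
    jne 1 ===> not taken.

    Execution falls through with ecx == 0 and al == \0.
    rep: ecx is tested first; since ecx == 0, stosb is not executed at all.
    (2) So 6 bytes are copied: 5 characters plus one '\0'.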
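    4) Copy 10 characters:
    (1) Initially ecx == 10.
    On each pass ecx is decremented and one character is copied; the copied character is then tested for 0.
    10-->9: copy a
    9-->8: copy b
    8-->7: copy c
    7-->6: copy d
    6-->5: copy e
    5-->4: copy \0
    test al,al ===> al == \0, so ZF == 1.
    jne 1 ===> not taken.

    Execution falls through with ecx == 4 and al == \0.
    rep : ecx==4, not 0: store al == \0, ecx becomes 3
    rep : ecx==3, not 0: store al == \0, ecx becomes 2
    rep : ecx==2, not 0: store al == \0, ecx becomes 1
    rep : ecx==1, not 0: store al == \0, ecx becomes 0
    rep : ecx==0, the repetition ends.
    (2) So for a count of 10, 6 bytes are copied first (5 characters plus one '\0') and then 4 more '\0' bytes are written as padding.

    5) Copy 0 characters:
    (1) Initially ecx == 0.
    0-->-1 js 2
    (2) So 0 characters are copied.

    6) Copy -1 characters:
    (1) Initially ecx == -1 (0xffffffff).
    -1-->-2 js 2
    (2) So 0 characters are copied.
    Note:
    In static inline char * strncpy(char * dest,const char *src,size_t count), count is loaded into ecx and tested with decl/js, that is, as a signed 32-bit value. Any count whose decremented value has the sign bit set looks negative and nothing is copied, so in practice a string of at most about 2G bytes can be copied.
    This raised a question at first: since nothing is copied whenever ecx <= 0, why not treat the counter as unsigned and extend the range to 4G? The next function suggests an answer. Concatenating two strings also uses ecx as the counter, and ecx is 32 bits wide with a maximum of 4G-1, so each of the two strings gets half of that range, 2G-1.
    About the rep prefix:
    It repeats the string instruction that follows it. Before each repetition ecx is tested: if it is 0 the repetition ends, otherwise the string instruction is executed and ecx is decremented by 1.
    It resembles the loop instruction, except that loop decrements ecx first and tests for 0 afterwards.
    Note that the decrement performed during the repetition does not affect any flags.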
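
    The six cases above can be reproduced with a small C model of the loop. This is only a sketch under stated assumptions: the function name sketch_strncpy is invented for the illustration, ecx is modelled as a 32-bit unsigned value whose top bit stands for the sign flag tested by js, and count is truncated to 32 bits as it would be on i386.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the i386 strncpy loop (illustration only). */
char *sketch_strncpy(char *dest, const char *src, size_t count)
{
    const char *s = src;              /* esi */
    char *d = dest;                   /* edi */
    uint32_t ecx = (uint32_t)count;   /* ecx, 32 bits as on i386 */
    char al;

    for (;;) {
        ecx--;                        /* decl %2 */
        if (ecx & 0x80000000u)        /* js 2f: result is "negative" */
            return dest;
        al = *s++;                    /* lodsb */
        *d++ = al;                    /* stosb */
        if (al == 0)                  /* testb %al,%al ; jne 1b not taken */
            break;
    }
    while (ecx != 0) {                /* rep stosb: al is 0 here, pad the rest */
        *d++ = al;
        ecx--;
    }
    return dest;
}
```

    With the source "abcde", a count of 3 writes only "abc" with no terminator, while a count of 10 writes "abcde", one '\0', and four further '\0' padding bytes, matching cases 1) and 4).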
    --------------------------------------------------------------- 
    strcat() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRCAT 
    static inline char * strcat(char * dest,const char * src)
    {
    int d0, d1, d2, d3;
    __asm__ __volatile__(
    "repne\n\t"
    "scasb\n\t"
    "decl %1\n"
    "1:\tlodsb\n\t"
    "stosb\n\t"
    "testb %%al,%%al\n\t"
    "jne 1b"
    : "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
    : "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
    :"memory");
    return dest;
    }
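    Instructions rewritten:
    repne              ===>  while (ecx != 0 && ZF != 1)
    scasb              ===>  {
                                 if ((al - es:[edi]) == 0)
                                     ZF = 1;
                                 edi++;
                                 ecx--;
                             }
    (ZF is only examined after each comparison, so the first iteration always runs as long as ecx != 0.)

    decl %1            ===>  dec edi
    1:                 ===>  1:
    lodsb              ===>  mov al, ds:[esi]
                             inc esi
    stosb              ===>  mov es:[edi], al
                             inc edi
    testb %%al,%%al    ===>  test al, al
    jne 1              ===>  jne 1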

    Initial operand values:
    : "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
    : "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
    src          ==> si/esi, here: esi
    dest         ==> di/edi, here: edi
    0            ==> ax/eax, here: eax (scasb only looks at al)
    0xffffffffu  ==> ecx
    So esi and edi point to the start of the two strings, eax == 0 and ecx == 0xffffffff.

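    The general case:
    Initial values:
    esi--->'abc\0' (src)
    edi--->'123\0' (dest)
    al == 0
    ecx == 0xffffffffu
    while (ecx != 0 && ZF != 1)
    {
        if ((al - es:[edi]) == 0)
            ZF = 1;
        edi++;
        ecx--;
    }
    The scan runs through the string at edi until it reaches the terminating '\0'. scasb still advances edi and decrements ecx on the matching byte, so when the loop ends edi points one byte past the '\0': edi = edi + 4 and ecx = ecx - 4.

    Note: the loop ends either because a '\0' is found in the string at es:[edi], or because that string is at least 0xffffffff (4G-1) bytes long (not counting a terminating '\0'), so that ecx reaches 0.

    dec edi
    edi = edi - 1; edi now points at the terminating '\0' of the string at es:[edi].

    Register contents at this point:
    esi--->'abc\0' (src)
    edi--->the terminating '\0' of '123\0' (dest)
    al == 0
    ecx == 0xfffffffbu

    1:
    mov al, ds:[esi]
    inc esi
    mov es:[edi], al
    inc edi
    test al, al
    jne 1
    The string at ds:[esi] is copied to the end of the string at es:[edi], starting at that string's '\0', which is overwritten.

    esi--->the '?' in 'abc\0?' (src)
    edi--->the trailing '?' in '123abc\0?' (dest)
    al == 0; note that this 0 is the terminator fetched from the string at esi, not the initial 0.

    Function: strcat(char * dest,const char * src) appends the string at src to the string at dest. The '\0' of dest is overwritten, and the '\0' of src is copied last so that the concatenated string is terminated.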

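    Algorithm:
    1. Scan the string at dest to find its '\0'.
    2. Copy the bytes of the string at src one by one to dest, starting at that '\0'. Copying continues up to and including the final '\0' of src, and then the routine ends.
    Clearly the function requires both src and dest to point to '\0'-terminated strings.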
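
    The two steps of the algorithm can be sketched in portable C as follows. The name sketch_strcat is invented for the illustration, and the 0xffffffff limit on the scan (the ecx counter) is deliberately left out; the sketch assumes dest is properly terminated and large enough.

```c
#include <stddef.h>

/* Hypothetical C model of the i386 strcat (illustration only). */
char *sketch_strcat(char *dest, const char *src)
{
    char *d = dest;        /* edi */
    const char *s = src;   /* esi */
    char al;

    /* Step 1: repne scasb followed by decl %1 leaves edi on dest's '\0'. */
    while (*d != 0)
        d++;

    /* Step 2: the same lodsb/stosb loop as strcpy, terminator included. */
    do {
        al = *s++;
        *d++ = al;
    } while (al != 0);

    return dest;
}
```

    Appending "abc" to a buffer holding "123" gives "123abc", with the old terminator overwritten by 'a'.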

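    Special case 1:
    Initial values:
    esi--->'abc\0' (src)
    edi--->'123456789... ...YX', a string of at least 0xffffffff bytes (dest)
    Assume edi points to the start of the es segment, which has base 0.
    That is, edi[0]=='1' and edi[0xffffffff]=='X'. Since edi is only 32 bits wide it covers 0x0--->0xffffffff, 4G characters in all. Even if the string had more than 4G characters, edi could not address them, so the string at edi ends at edi[0xffffffff]=='X'. One more edi++ wraps edi around to 0.
    The same reasoning applies to esi.
    al == 0
    ecx == 0xffffffffu
    while (ecx != 0 && ZF != 1)
    {
        if ((al - es:[edi]) == 0)
            ZF = 1;
        edi++;
        ecx--;
    }
    The loop body runs 0xffffffff times.
    Because the string at edi is at least 0xffffffff bytes long, the search for its '\0' terminator runs ecx down to 0, which ends the loop with edi pointing at the byte at 0xffffffff (segment-limit violations are ignored here).
    On exit, ecx == 0 and edi == 0xffffffff.

    dec edi
    edi = edi - 1; edi == 0xffffffff-1, that is, edi[0xffffffff-1]=='Y'.

    Register contents at this point:
    esi--->'abc\0' (src)
    edi--->'123456......YX', edi==0xffffffff-1, so edi points at the byte 'Y' at 0xffffffff-1 (dest)
    al == 0
    ecx == 0x00000000u

    1:
    mov al, ds:[esi]
    inc esi
    mov es:[edi], al
    inc edi
    test al, al
    jne 1
    esi[0]=='a' of the string 'abc\0' at ds:[esi] is copied to es:[edi]==es:edi[0xffffffff-1]=='Y'. The byte 'Y' at es:[0xffffffff-1] is overwritten with 'a'. That is: esi[0]=='a'--->edi[0xffffffff-1]=='Y'
    edi--->'123456......aX'.
    Then esi++, esi[1]=='b'; edi++, edi[0xffffffff]=='X'.

    Next, esi[1]=='b' is copied from ds:[esi] to edi[0xffffffff]=='X'.
    edi--->'123.....ab'; edi++ wraps to edi==0x00000000 and points at the byte edi[0]=='1'.
    esi++, esi[2]=='c'; esi--->the 'c' in 'abc\0?' (src)

    Next, esi[2]=='c' is copied to edi[0x00000000]=='1'.
    esi++, esi[3]=='\0'; esi--->the '\0' in 'abc\0?' (src)
    edi--->'c23.....ab'; edi++, edi==0x00000001 and points at the byte edi[0x00000001]=='2'.

    Next, esi[3]=='\0' is copied to edi[0x00000001]=='2'.
    esi++, esi[4]=='?'; esi--->the '?' in 'abc\0?' (src)
    edi--->'c\03.....ab'; edi++, edi==0x00000002 and points at the byte edi[0x00000002]=='3'.

    So the string seen at the start of dest after the concatenation is "c\0".

    Similarly, the same happens when src holds 4G characters, and also when both src and dest hold 4G characters.
    As long as the characters of src and dest together do not exceed 4G-1, leaving one byte for the '\0', everything works.

    When one or both of src and dest are empty, things are simple:
    dest empty, src not empty: the string at src is copied to dest together with its '\0'.
    src empty, dest not empty: dest is left unchanged; only the '\0' of src is copied over the final '\0' of dest.
    src empty, dest empty: only the '\0' of src is copied over the single '\0' of dest.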

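    Reference:
    S:si/esi
    D:di/edi
    a:ax/eax
    c:cx/ecx
    &: By default gcc may place an input operand and an output operand in the same register, because it assumes that all inputs are consumed before any output is produced. Putting "&" (early clobber) in front of an output operand guarantees that the output does not overwrite an input: gcc gives this output a register that no input is using, unless an input is tied to it explicitly (with a digit 0-9, see below).

    0-9: A matching constraint. It names an operand that serves as both input and output, with the input and the output occupying the same location (register). Digits may only appear on inputs; digit I means "same location as output operand I".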
    --------------------------------------------------------------- 
    strncat() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRNCAT 
    static inline char * strncat(char * dest,const char * src,size_t count)
    {
    int d0, d1, d2, d3;
    __asm__ __volatile__(
    "repne\n\t"
    "scasb\n\t"
    "decl %1\n\t"
    "movl %8,%3\n"
    "1:\tdecl %3\n\t"
    "js 2f\n\t"
    "lodsb\n\t"
    "stosb\n\t"
    "testb %%al,%%al\n\t"
    "jne 1b\n"
    "2:\txorl %2,%2\n\t"
    "stosb"
    : "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
    : "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
    : "memory");
    return dest;
    }
    Instructions rewritten:
    repne              ===>  while (ecx != 0 && ZF != 1)
    scasb              ===>  {
                                 if ((al - es:[edi]) == 0)
                                     ZF = 1;
                                 edi++;
                                 ecx--;
                             }
    decl %1            ===>  decl edi
    movl %8,%3         ===>  movl count,ecx
    1:                 ===>  1:
    decl %3            ===>  decl ecx
    js 2               ===>  js 2
    lodsb              ===>  mov al,ds:[esi]
                             inc esi
    stosb              ===>  mov es:[edi],al
                             inc edi
    testb %%al,%%al    ===>  test al,al
    jne 1              ===>  jne 1
    2:                 ===>  2:
    xorl %2,%2         ===>  xor eax,eax
    stosb              ===>  mov es:[edi],al
                             inc edi
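    Initial operand values:
    : "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
    : "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
    esi: esi = src
    edi: edi = dest
    eax: eax = 0
    ecx: ecx = 0xffffffff
    "g": the compiler decides how to supply the operand (any general register, memory or immediate). It is operand %8 in the template.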

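    Code analysis:
    while (ecx != 0 && ZF != 1)
    {
        if ((al - es:[edi]) == 0)
            ZF = 1;
        edi++;
        ecx--;
    }
    decl edi
    This looks for the '\0' in the string at es:[edi] and then steps edi back so that it points at that '\0'.
    If the string is at most 4G-1 bytes long, the scan ends normally on the '\0'. If the string is 4G bytes long, the loop ends with ecx==0 and, after stepping back, edi points at edi[0xffffffff-1]. A string longer than 4G is impossible.

    movl count,ecx
    1:
    decl ecx
    js 2
    mov al,ds:[esi]
    inc esi
    mov es:[edi],al
    inc edi
    test al,al
    jne 1
    2:
    xor eax,eax
    mov es:[edi],al
    inc edi

    1: starts copying the string at esi to edi.
    2: after the copy, appends one more '\0' at the end.
    The cases are:
    1) count is larger than the number of characters in the string at ds:[esi]. The string at esi is copied together with its '\0', which ends the loop at 1:. At 2: a second '\0' is written right after the first one, edi is incremented, and the routine ends.

    2) count is smaller than the number of characters in the string at ds:[esi]. Only count characters are copied; ecx then drops to -1 and js 2 leaves the loop at 1:. At 2: a '\0' is written after the copied characters, edi is incremented, and the routine ends.

    3) count equals the number of characters in the string at ds:[esi]. After count characters have been copied ecx is 0; the decl at the top of the loop makes it -1 and js 2 leaves the loop at 1:. At 2: a '\0' is written after the copied characters, edi is incremented, and the routine ends.

    4) count is negative as a signed value. The first decl leaves ecx negative and js 2 leaves the loop at 1: immediately. At 2: a '\0' is written, which simply rewrites the existing '\0' of the string at edi with another '\0'; edi is incremented and the routine ends.

    Although ecx could count up to 4G bytes, decl/js treat count as a signed value, so at most about 2G bytes (excluding the '\0') can be appended. This presumably reflects an assumption that the string at es:[edi] may itself be up to 2G long: the author of the code cannot know how long that string is. In practice it will be far shorter than 2G, but the code handles the most general case.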
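
    The four cases can be checked against a small C model. This is a sketch with invented names (sketch_strncat), not the kernel's implementation; ecx is modelled as a 32-bit unsigned value whose top bit stands for the sign flag, and the 0xffffffff bound on the initial scan is omitted. It keeps the detail from case 1): when the source terminator has already been copied, one extra '\0' is still stored after it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the i386 strncat (illustration only). */
char *sketch_strncat(char *dest, const char *src, size_t count)
{
    char *d = dest;                   /* edi */
    const char *s = src;              /* esi */
    uint32_t ecx = (uint32_t)count;   /* movl %8,%3 */
    char al;

    while (*d != 0)                   /* repne scasb ; decl %1 */
        d++;

    for (;;) {
        ecx--;                        /* decl %3 */
        if (ecx & 0x80000000u)        /* js 2f */
            break;
        al = *s++;                    /* lodsb */
        *d++ = al;                    /* stosb */
        if (al == 0)                  /* testb ; jne 1b not taken */
            break;
    }
    *d = 0;                           /* 2: xorl %2,%2 ; stosb */
    return dest;
}
```

    A consequence worth noting: in case 1) the destination buffer needs room for one byte more than the concatenated string and its terminator.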
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRCMP strcmp() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRCMP 
    static inline int strcmp(const char * cs,const char * ct)
    {
    int d0, d1;
    register int __res;
    __asm__ __volatile__(
    "1:\tlodsb\n\t"
    "scasb\n\t"
    "jne 2f\n\t"
    "testb %%al,%%al\n\t"
    "jne 1b\n\t"
    "xorl %%eax,%%eax\n\t"
    "jmp 3f\n"
    "2:\tsbbl %%eax,%%eax\n\t"
    "orb $1,%%al\n"
    "3:"
    :"=a" (__res), "=&S" (d0), "=&D" (d1)
    :"1" (cs),"2" (ct)
    :"memory");
    return __res;
    }

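    Initial values:
    ax/eax: register int __res (output only)
    si/esi: const char * cs
    di/edi: const char * ct
    The flags are not initialized on entry; the code does not depend on their initial state.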

    Instructions rewritten:
    1: lodsb           ===>  1: mov al,ds:[esi]
                             inc esi
    scasb              ===>  if ((al - es:[edi]) == 0)
                                 ZF = 1;
                             edi++;
    jne 2              ===>  jne 2
    testb %%al,%%al    ===>  testb al,al
    jne 1              ===>  jne 1
    xorl %%eax,%%eax   ===>  xorl eax,eax
    jmp 3              ===>  jmp 3
    2: sbbl %%eax,%%eax ===> 2: sbbl eax,eax
    orb $1,%%al        ===>  orb al,1
    3:                 ===>  3:

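    1) Walk-through:
    The code compares the two strings at ds:[esi] and es:[edi], both of which should be '\0'-terminated. The return value is left in eax. Each character of ds:[esi] is loaded into al and compared with the corresponding character of es:[edi]. If they are equal, ZF is set and al is then tested for '\0'. If it is not '\0', the next character is compared. If it is '\0', eax is cleared to 0 and the comparison ends, with eax as the return value.

    2) Cases:
    1. The strings at ds:[esi] and es:[edi] are equal: as above, eax returns 0.
    2. The strings at ds:[esi] and es:[edi] differ:
    (1) The character from ds:[esi] has the smaller ASCII value:
    ds:[esi]=="abc\0"
    es:[edi]=="xyz\0"
    if((al-es:[edi])==0) ===> if( ('a'-'x')==0 )
    ZF = 1;              ===> condition false; the subtraction borrows, so CF == 1
    edi++;               ===> edi++; edi points to 'y'
    jne 2                ===> jne 2 is taken
    2: sbbl eax,eax      ===> eax = eax-eax-CF = -1 = 0xffffffff
    orb al ,1            ===> al = 0xff | 1 = 0xff

    Conclusion:
    If the first differing character of the string at cs has a smaller ASCII value than the first differing character of the string at ct, the return value is eax==0xffffffff==-1.

    (2) The character from ds:[esi] has the larger ASCII value:
    ds:[esi]=="xyz\0"
    es:[edi]=="abc\0"
    if((al-es:[edi])==0) ===> if( ('x'-'a')==0 )
    ZF = 1;              ===> condition false; no borrow, so CF == 0
    edi++;               ===> edi++; edi points to 'b'
    jne 2                ===> jne 2 is taken
    2: sbbl eax,eax      ===> eax = eax-eax-CF = 0
    orb al ,1            ===> al = 0|1 = 1, so eax = 0x00000001
    Output: eax==0x00000001
    Conclusion:
    If the first differing character of the string at cs has a larger ASCII value than the first differing character of the string at ct, the return value is eax==0x00000001==1.

    (3) One string is a prefix of the other:
    ds:[esi]=="abc\0"
    es:[edi]=="abc123\0"
    The loop ends when '\0' is compared with '1', and -1 is returned.
    In the opposite situation:
    ds:[esi]=="abc123\0"
    es:[edi]=="abc\0"
    the loop ends when '1' is compared with '\0', and 1 is returned.

    (4) One string is unbounded (has no terminator) and the other is finite:
    Either they differ at some position and the loop exits as analysed above, or the finite one is effectively a prefix of the other, which is handled as in (3).
    So as long as one of the strings obeys the '\0'-termination rule, the function terminates normally even if the other string has no '\0'.

    (5) Both strings are unbounded:
    If they differ somewhere, the loop exits as analysed above.
    If they are identical and unbounded, the comparison simply goes on. esi and edi increase up to 0xffffffff and then wrap to 0x00000000. If both strings start at 0x00000000, the comparison repeats from the start, an endless loop. If they start somewhere in the middle and the memory at or after 0x00000000 contains a differing character, the function terminates.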
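
    The return-value logic (sbbl followed by orb $1) is easier to see in C. The following is a sketch with an invented name, sketch_strcmp; it assumes, as scasb does, that the bytes are compared as unsigned values, so that the carry flag corresponds to "al is below the byte at edi".

```c
#include <stddef.h>

/* Hypothetical C model of the i386 strcmp (illustration only). */
int sketch_strcmp(const char *cs, const char *ct)
{
    for (;;) {
        unsigned char al = (unsigned char)*cs++;   /* lodsb */
        unsigned char m  = (unsigned char)*ct++;   /* byte read by scasb */

        if (al != m)                    /* jne 2f */
            return al < m ? -1 : 1;     /* sbbl gives -1 or 0, orb $1 makes it -1 or 1 */
        if (al == 0)                    /* testb ; jne 1b not taken */
            return 0;                   /* xorl %eax,%eax */
    }
}
```

    Unlike some C library implementations, the result is always exactly -1, 0 or 1, never the raw difference of the two characters.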
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRNCMP strncmp() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRNCMP 
    static inline int strncmp(const char * cs,const char * ct,size_t count)
    {
    register int __res;
    int d0, d1, d2;
    __asm__ __volatile__(
    "1:\tdecl %3\n\t"
    "js 2f\n\t"
    "lodsb\n\t"
    "scasb\n\t"
    "jne 3f\n\t"
    "testb %%al,%%al\n\t"
    "jne 1b\n"
    "2:\txorl %%eax,%%eax\n\t"
    "jmp 4f\n"
    "3:\tsbbl %%eax,%%eax\n\t"
    "orb $1,%%al\n"
    "4:"
    :"=a" (__res), "=&S" (d0), "=&D" (d1), "=&c" (d2)
    :"1" (cs),"2" (ct),"3" (count)
    :"memory");
    return __res;
    }
    Initial values:
    ax/eax:__res 
    si/esi:const char * cs 
    di/edi:const char * ct 
    cx/ecx:count 

    Instructions rewritten:
    1: decl %3         ===>  1: decl ecx
    js 2               ===>  js 2
    lodsb              ===>  mov al,ds:[esi]
                             inc esi
    scasb              ===>  if ((al - es:[edi]) == 0)
                                 ZF = 1;
                             edi++;
    jne 3              ===>  jne 3
    testb %%al,%%al    ===>  testb al,al
    jne 1              ===>  jne 1
    2: xorl %%eax,%%eax ===> 2: xorl eax,eax
    jmp 4              ===>  jmp 4
    3: sbbl %%eax,%%eax ===> 3: sbbl eax,eax
    orb $1,%%al        ===>  orb al,1
    4:                 ===>  4:

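    The analysis follows the one above:
    1) count is smaller than the length of both strings:
    a: the first count characters are equal: ecx becomes -1, js 2 leaves the loop, and xorl eax,eax clears eax, which is returned as the function result.
    b: the strings differ: the loop is left through jne 3:
    b-1: if the first differing character of the string at cs has a larger ASCII value than that of the string at ct, the return value is eax==0x00000001==1;
    b-2: if the first differing character of the string at cs has a smaller ASCII value than that of the string at ct, the return value is eax==0xffffffff==-1.

    2) count equals the length of both strings (not counting the '\0'):
    a: the strings are equal:
    After the last character has been compared, decl takes ecx to -1 and js 2 leaves the loop before the '\0' is reached. xorl eax,eax clears eax, 0 is returned and the function ends.
    b: the strings differ:
    Same analysis as above.

    3) count is larger than the length of both strings:
    a: the strings are equal:
    When the '\0' is compared, testb al,al ends the loop (jne 1 is not taken). xorl eax,eax clears eax, 0 is returned and the function ends.
    b: the strings differ:
    Same analysis as above.

    4) count <= 0 (as a signed value):
    No comparison is made at all; 0 is returned immediately and the function ends.
    The flow is:
    1: decl %3         ===>  1: decl ecx
    js 2               ===>  js 2
    ... ...
    2: xorl %%eax,%%eax ===> 2: xorl eax,eax
    jmp 4              ===>  jmp 4
    ... ...
    4:                 ===>  4: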
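
    The interaction between the counter and the two exits (js 2 versus testb) can be modelled in C as below. This is a sketch with an invented name, sketch_strncmp; ecx is a 32-bit unsigned value whose top bit stands for the sign flag, and count is truncated to 32 bits as on i386.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the i386 strncmp (illustration only). */
int sketch_strncmp(const char *cs, const char *ct, size_t count)
{
    uint32_t ecx = (uint32_t)count;

    for (;;) {
        unsigned char al, m;

        ecx--;                          /* decl %3 */
        if (ecx & 0x80000000u)          /* js 2f: counter exhausted */
            return 0;                   /* 2: xorl %eax,%eax */
        al = (unsigned char)*cs++;      /* lodsb */
        m  = (unsigned char)*ct++;      /* scasb */
        if (al != m)                    /* jne 3f */
            return al < m ? -1 : 1;     /* sbbl ; orb $1 */
        if (al == 0)                    /* testb ; falls through to 2: */
            return 0;
    }
}
```

    Comparing "abcX" with "abcY" returns 0 for a count of 3 (exit through js 2) and -1 for a count of 4 (exit through jne 3).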

    --------------------------------------------------------------- 
    __HAVE_ARCH_STRCHR strchr() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRCHR 
    static inline char * strchr(const char * s, int c)
    {
    int d0;
    register char * __res;
    __asm__ __volatile__(
    "movb %%al,%%ah\n"
    "1:\tlodsb\n\t"
    "cmpb %%ah,%%al\n\t"
    "je 2f\n\t"
    "testb %%al,%%al\n\t"
    "jne 1b\n\t"
    "movl $1,%1\n"
    "2:\tmovl %1,%0\n\t"
    "decl %0"
    :"=a" (__res), "=&S" (d0)
    :"1" (s),"0" (c)
    :"memory");
    return __res;
    }

    Initial values:
    ax/eax:int c
    si/esi:const char *s

    Instructions rewritten:
    movb %%al,%%ah     ===>  movb al,ah
    1: lodsb           ===>  1: mov al,ds:[esi]
                             inc esi
    cmpb %%ah,%%al     ===>  cmpb ah,al
    je 2               ===>  je 2
    testb %%al,%%al    ===>  testb al,al
    jne 1              ===>  jne 1
    movl $1,%1         ===>  movl 1,esi
    2: movl %1,%0      ===>  2: movl esi,eax
    decl %0            ===>  decl eax

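    Function:
    The string at ds:[esi] is '\0'-terminated and is searched from front to back for the character c. If c is found, the address of its first occurrence in the string is returned. If it is not found, 0 (NULL) is returned.

    Rewritten in C-like form:
    al == the character c to look for;
    esi == starting address of the string;
    int eax;
    char ah;
    ah = al;
    1:
    al = *(char *)esi;   /* flat address space: the ds base is 0 */
    esi++;
    if( al == ah )
    goto 2;
    if( al != 0 )
    goto 1;
    esi = 1;             /* not found: esi - 1 will be 0 */
    2:
    eax = esi;
    eax--;               /* lodsb already advanced esi, so step back by one */
    return eax;

    Extreme case:
    If the string at ds:[esi] is not '\0'-terminated, esi keeps incrementing until it reaches 0xffffffff and then wraps to 0x00000000, and the search continues from the beginning of memory. If neither the character c nor a '\0' is found between address 0 and ds:[esi], the function is stuck in an endless loop.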
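
    The "esi = 1, then decrement" trick for returning NULL is easier to follow in plain C. The following is a sketch with an invented name, sketch_strchr, not the kernel's code.

```c
#include <stddef.h>

/* Hypothetical C model of the i386 strchr (illustration only). */
char *sketch_strchr(const char *s, int c)
{
    char ah = (char)c;                 /* movb %al,%ah */

    for (;;) {
        char al = *s++;                /* lodsb: s is now one past the byte read */
        if (al == ah)                  /* cmpb %ah,%al ; je 2f */
            return (char *)(s - 1);    /* movl %esi,%eax ; decl %eax */
        if (al == 0)                   /* testb ; jne 1b not taken */
            return NULL;               /* movl $1,%esi ; then 1 - 1 == 0 */
    }
}
```

    Because the comparison with c is made before the test for '\0', searching for '\0' itself returns a pointer to the terminator rather than NULL.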
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRRCHR strrchr() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRRCHR 
    static inline char * strrchr(const char * s, int c)
    {
    int d0, d1;
    register char * __res;
    __asm__ __volatile__(
    "movb %%al,%%ah\n"
    "1:\tlodsb\n\t"
    "cmpb %%ah,%%al\n\t"
    "jne 2f\n\t"
    "leal -1(%%esi),%0\n"
    "2:\ttestb %%al,%%al\n\t"
    "jne 1b"
    :"=g" (__res), "=&S" (d0), "=&a" (d1)
    :"0" (0),"1" (s),"2" (c)
    :"memory");
    return __res;
    }


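    Initial values:
    __res : 0
    si/esi : const char * s
    ax/eax : c

    Instructions rewritten:
    movb %%al,%%ah     ===>  movb al,ah
    1: lodsb           ===>  1: mov al,ds:[esi]
                             inc esi
    cmpb %%ah,%%al     ===>  cmpb ah,al
    jne 2              ===>  jne 2
    leal -1(%%esi),%0  ===>  leal [esi-1],__res(g)
    2: testb %%al,%%al ===>  2: testb al,al
    jne 1              ===>  jne 1
    This function works much like strchr() above, except that it finds the last position at which c occurs in the string at const char *s: every match overwrites __res, and the scan always runs to the terminating '\0'. If c is found, its address is returned; if not, 0 is returned. The rest of the analysis is the same as for strchr() and is not repeated.
    strrchr - Find the last occurrence of a character in a string.

    If s is a null pointer, the behaviour is unpredictable.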
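
    For comparison with the strchr() analysis, here is a minimal C sketch of the same loop. The name sketch_strrchr is invented for this illustration; the sketch assumes a valid, '\0'-terminated s.

```c
#include <stddef.h>

/* Hypothetical C model of the i386 strrchr (illustration only). */
char *sketch_strrchr(const char *s, int c)
{
    char ah = (char)c;          /* movb %al,%ah */
    char *res = NULL;           /* __res starts as 0 */
    char al;

    do {
        al = *s++;              /* lodsb */
        if (al == ah)           /* cmpb %ah,%al ; jne 2f not taken */
            res = (char *)(s - 1);   /* leal -1(%esi),%0 */
    } while (al != 0);          /* 2: testb %al,%al ; jne 1b */

    return res;
}
```

    The loop never exits early on a match, which is what makes the result the last occurrence rather than the first.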
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRLEN strlen() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRLEN 
    static inline size_t strlen(const char * s)
    {
    int d0;
    register int __res;
    __asm__ __volatile__(
    "repne\n\t"
    "scasb\n\t"
    "notl %0\n\t"
    "decl %0"
    :"=c" (__res), "=&D" (d0)
    :"1" (s),"a" (0), "0" (0xffffffffu)
    :"memory");
    return __res;
    }

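    Initial operand values:
    di/edi:const char * s
    ax/eax:0
    cx/ecx:0xffffffff
    Instructions rewritten:
                             size_t ecx = 0xffffffff;
                             char * edi = s;
                             eax = 0;
    repne              ===>  while (ecx != 0 && ZF == 0)
    scasb              ===>  {
                                 if ((al - es:[edi]) == 0)
                                     ZF = 1;
                                 edi++;
                                 ecx--;
                             }
    notl %0            ===>  ecx = ~ecx;
    decl %0            ===>  ecx--;

    The key step is ecx = ~ecx (a bitwise NOT, not a logical one). ecx counts down from 0xffffffff (-1), so after k bytes have been scanned ecx == -1 - k and ~ecx == k. A down-count and an up-count carry the same information, and a bitwise NOT converts one into the other. The scan also counts the terminating '\0', so the converted value is one too large and is decremented once: the length is k - 1.

    The individual cases are simple and follow the earlier analyses, so they are not repeated.
    In the extreme case edi++ and ecx-- run until edi wraps from 0xffffffff to 0x00000000, as discussed before.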
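
    The arithmetic behind notl and decl can be checked with a short C model. This is a sketch with an invented name, sketch_strlen; ecx is modelled as a 32-bit unsigned value and ZF is folded into the loop's break.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the i386 strlen (illustration only). */
size_t sketch_strlen(const char *s)
{
    uint32_t ecx = 0xffffffffu;   /* "0" (0xffffffffu) */
    const char *edi = s;          /* "1" (s) */

    while (ecx != 0) {            /* repne scasb */
        ecx--;                    /* every scanned byte costs one count */
        if (*edi++ == 0)          /* al == 0 matched: ZF set, stop */
            break;
    }
    ecx = ~ecx;                   /* notl %0: number of bytes scanned */
    ecx--;                        /* decl %0: do not count the '\0'   */
    return ecx;
}
```

    For "abc" four bytes are scanned, ecx ends at 0xfffffffb, the NOT gives 4, and the decrement gives 3.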

    Reference:
    typedef unsigned int __kernel_size_t; 
    typedef __kernel_size_t size_t; 
    --------------------------------------------------------------- 
    __memcpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    static inline void * __memcpy(void * to, const void * from, size_t n)
    {
    int d0, d1, d2;
    __asm__ __volatile__(
    "rep ; movsl\n\t"
    "movl %4,%%ecx\n\t"
    "andl $3,%%ecx\n\t"
    #if 1 /* want to pay 2 byte penalty for a chance to skip microcoded rep? */
    "jz 1f\n\t"
    #endif
    "rep ; movsb\n\t"
    "1:"
    : "=&c" (d0), "=&D" (d1), "=&S" (d2)
    : "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
    : "memory");
    return (to);
    }

    Initial operand values:
    cx/ecx:n/4
    di/edi:to
    si/esi:from

    Instructions rewritten:
                             ecx = n/4;
    rep                ===>  while (ecx != 0) {
    movsl              ===>      *(long *)es:[edi] = *(long *)ds:[esi]; esi += 4; edi += 4; ecx--;
                             }
    movl %4,%%ecx      ===>  ecx = n;
    andl $3,%%ecx      ===>  ecx = ecx & 0x00000003; ZF = (ecx == 0);
    #if 1
    jz 1               ===>  if (ZF == 1) goto 1;
    #endif
    rep                ===>  while (ecx != 0) {
    movsb              ===>      *(char *)es:[edi] = *(char *)ds:[esi]; esi++; edi++; ecx--;
                             }
    1:                 ===>  1:

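    Analysis:
    1. First the copy is done in units of 4 bytes:
    ecx = n/4, then the copy starts.
    2. Then ecx = n % 4 (computed as n & 3) and the remaining bytes, fewer than 4, are copied one at a time.
    If n & 3 == 0, ZF is set and jz skips the byte copy.
    That is the general case.

    3. If 0 < n < 4:
    ecx = n/4 == 0;
    the first rep finds ecx == 0, so no 4-byte copy takes place and execution goes straight to the byte-by-byte copy.

    4. If n == 0:
    Neither loop runs; nothing is copied at all.

    5. If n < 0:
    size_t is unsigned, so a negative argument is converted to a very large count. The function still runs, but it copies far beyond the intended buffers and the outcome is unpredictable.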
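
    The two-phase copy (n/4 dwords, then n & 3 bytes) can be sketched in portable C. The name sketch_memcpy is invented for this illustration, and a 4-byte memcpy() call stands in for a single movsl so that the sketch does not depend on pointer alignment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the i386 __memcpy (illustration only). */
void *sketch_memcpy(void *to, const void *from, size_t n)
{
    unsigned char *d = (unsigned char *)to;               /* edi */
    const unsigned char *s = (const unsigned char *)from; /* esi */
    size_t ecx;

    for (ecx = n / 4; ecx != 0; ecx--) {   /* rep ; movsl */
        memcpy(d, s, 4);                   /* stands in for one movsl */
        d += 4;
        s += 4;
    }

    ecx = n & 3;                           /* movl %4,%ecx ; andl $3,%ecx */
    if (ecx != 0) {                        /* jz 1f skips the tail when it is empty */
        for (; ecx != 0; ecx--)            /* rep ; movsb */
            *d++ = *s++;
    }
    return to;
}
```

    For n == 11 the first loop copies 8 bytes in two steps and the tail loop copies the remaining 3.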

    Reference:
    typedef unsigned int __kernel_size_t; 
    typedef __kernel_size_t size_t; 
    --------------------------------------------------------------- 
    __constant_memcpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* 
    * This looks ugly, but the compiler can optimize it totally, 
    * as the count is constant. 
    */ 
    static inline void * __constant_memcpy(void * to, const void * from, size_t n) 

    long esi, edi; 
    if (!n) return to; 
    #if 1 /* want to do small copies with non-string ops? */ 
    switch (n)
    {
    case 1: *(char*)to = *(char*)from; return to; 
    case 2: *(short*)to = *(short*)from; return to; 
    case 4: *(int*)to = *(int*)from; return to; 
    #if 1 /* including those doable with two moves? */ 
    case 3: *(short*)to = *(short*)from; 
    *((char*)to+2) = *((char*)from+2); return to; 
    case 5: *(int*)to = *(int*)from; 
    *((char*)to+4) = *((char*)from+4); return to; 
    case 6: *(int*)to = *(int*)from; 
    *((short*)to+2) = *((short*)from+2); return to; 
    case 8: *(int*)to = *(int*)from; 
    *((int*)to+1) = *((int*)from+1); return to; 
    #endif/* 1 */ 
    }/* switch */ 
    #endif/* 1 */ 
    esi = (long) from; 
    edi = (long) to; 
    if (n >= 5*4)
    {
    /* large block: use rep prefix */ 
    int ecx; 
    __asm__ __volatile__( 
    "rep ; movsl" 
    : "=&c" (ecx), "=&D" (edi), "=&S" (esi) 
    : "0" (n/4), "1" (edi),"2" (esi) 
    : "memory" 
    ); 
    }/* if */ 

    else
    {
    /* small block: don't clobber ecx + smaller code */ 
    if (n >= 4*4) __asm__ __volatile__( 
    "movsl" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 

    if (n >= 3*4) __asm__ __volatile__( 
    "movsl" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 

    if (n >= 2*4) __asm__ __volatile__( 
    "movsl" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 

    if (n >= 1*4) __asm__ __volatile__( 
    "movsl" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 
    }/* else */ 

    switch (n % 4)
    {
    /* tail */ 
    case 0: return to; 

    case 1: __asm__ __volatile__( 
    "movsb" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 
    return to; 

    case 2: __asm__ __volatile__( 
    "movsw" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 
    return to; 

    default: __asm__ __volatile__( 
    "movsw\n\tmovsb" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 
    return to; 
    }/* switch */ 


    Code analysis:
    1. Copies of 1 to 8 bytes (7 excluded) are done with plain assignments through pointers of different types:
    #if 1 /* want to do small copies with non-string ops? */
    switch (n)
    {
    case 1: *(char*)to = *(char*)from; return to; 
    case 2: *(short*)to = *(short*)from; return to; 
    case 4: *(int*)to = *(int*)from; return to; 
    #if 1 /* including those doable with two moves? */ 
    case 3: *(short*)to = *(short*)from; 
    *((char*)to+2) = *((char*)from+2); return to; 
    case 5: *(int*)to = *(int*)from; 
    *((char*)to+4) = *((char*)from+4); return to; 
    case 6: *(int*)to = *(int*)from; 
    *((short*)to+2) = *((short*)from+2); return to; 
    case 8: *(int*)to = *(int*)from; 
    *((int*)to+1) = *((int*)from+1); return to; 
    #endif/* 1 */ 
    }/* switch */ 
    #endif/* 1 */ 
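    This code runs when the number of bytes to copy is between 1 and 8 (7 is not handled here and falls through to the code below). For a count of:
    1 byte:  a char * is used
    2 bytes: a short * is used
    4 bytes: an int * is used
    3, 5, 6 or 8 bytes: two moves are combined (short+char, int+char, int+short, int+int)

    2. Byte counts in [20, ...), [16,19], [12,15], [8,11], [4,7]:
    if (n >= 5*4) // when the byte count is 20 or more:
    {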
    /* large block: use rep prefix */ 
    int ecx; 
    __asm__ __volatile__( 
    "rep ; movsl" 
    : "=&c" (ecx), "=&D" (edi), "=&S" (esi) 
    : "0" (n/4), "1" (edi),"2" (esi) 
    : "memory" 
    ); 
    }/* if */ 

    Analysis:
                             esi = (long) from;
                             edi = (long) to;
                             ecx = n/4;
    rep                ===>  while (ecx != 0)
    movsl              ===>  {
                                 *(unsigned long *)es:[edi] = *(unsigned long *)ds:[esi];
                                 esi += 4; edi += 4; ecx--;
                             }
    Execution then continues with the following switch statement:
    switch (n % 4)
    {
    /* tail */ 
    case 0: return to; 

    case 1: __asm__ __volatile__( 
    "movsb" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 
    return to; 

    case 2: __asm__ __volatile__( 
    "movsw" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 
    return to; 

    default: __asm__ __volatile__( 
    "movsw\n\tmovsb" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 
    return to; 
    }/* switch */ 
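    The code is simple and needs little comment: it copies the remaining bytes, fewer than 4.
    default covers n % 4 == 3: one word is copied first, then one byte, 3 bytes in all.
    --------------------------------------------------------------
    else // when the byte count satisfies 4 <= n <= 19 (for n < 4 none of the tests below fires):
    {
    /* small block: don't clobber ecx + smaller code */
    // executed when n >= 16, i.e. n in [16,19]: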
    if (n >= 4*4) __asm__ __volatile__( 
    "movsl" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 

    // executed when n >= 12, i.e. n in [12,19]:
    if (n >= 3*4) __asm__ __volatile__( 
    "movsl" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 

    // executed when n >= 8, i.e. n in [8,19]:
    if (n >= 2*4) __asm__ __volatile__( 
    "movsl" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 

    // executed when n >= 4, i.e. n in [4,19]:
    if (n >= 1*4) __asm__ __volatile__( 
    "movsl" 
    :"=&D"(edi),"=&S"(esi) 
    :"0"(edi),"1"(esi) 
    :"memory"); 
    }/* else */ 

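    Analysis:
    At first sight ecx seems to be left uninitialized here, but it is not needed: these movsl instructions carry no rep prefix, so each one copies exactly 4 bytes and advances esi and edi by 4 without touching ecx. The four tests are not mutually exclusive; they accumulate. For n in [16,19] all four movsl run (16 bytes), for n in [12,15] three, for n in [8,11] two, and for n in [4,7] one.
    The block could be merged into a single rep form, at the cost of clobbering ecx, which is exactly what the original comment says it avoids:
    if( n >= 1*4 )//7,[9,19]
    {
    int ecx;
    __asm__ __volatile__(
    "rep ; movsl"
    : "=&c" (ecx), "=&D" (edi), "=&S" (esi)
    : "0" (n/4), "1" (edi),"2" (esi)
    : "memory");
    }

    Note:
    __constant_memcpy() and __memcpy() are very similar: they take the same number and types of parameters and perform the same function.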
    --------------------------------------------------------------- 
    __constant_memcpy3d() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_MEMCPY 
    #ifdef CONFIG_X86_USE_3DNOW /* applies to __constant_memcpy3d(), __memcpy3d() and memcpy() below */
    #include <asm/mmx.h> 
    /* 
    * This CPU favours 3DNow strongly (eg AMD Athlon) 
    */ 
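    static inline void * __constant_memcpy3d(void * to, const void * from, size_t len)
    {
    if (len < 512)
    return __constant_memcpy(to, from, len);
    return _mmx_memcpy(to, from, len);
    }
    The definition of _mmx_memcpy() is not in this header (only its declaration is pulled in through <asm/mmx.h>) and could not be located, so the analysis stops here.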
    --------------------------------------------------------------- 
    __memcpy3d() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    static __inline__ void *__memcpy3d(void *to, const void *from, size_t len)
    {
    if (len < 512)
    return __memcpy(to, from, len);
    return _mmx_memcpy(to, from, len);
    }
    As above, the definition of _mmx_memcpy() could not be located, so the analysis stops here.
    --------------------------------------------------------------- 
    memcpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define memcpy(t, f, n) \ 
    (__builtin_constant_p(n) ? \ 
    __constant_memcpy3d((t),(f),(n)) : \ 
    __memcpy3d((t),(f),(n))) 
    #else/* CONFIG_X86_USE_3DNOW */ 
    /* 
    * No 3D Now! 
    */ 
    #define memcpy(t, f, n) \ 
    (__builtin_constant_p(n) ? \ 
    __constant_memcpy((t),(f),(n)) : \ 
    __memcpy((t),(f),(n))) 
    #endif/* CONFIG_X86_USE_3DNOW */ 

    Notes on int __builtin_constant_p(exp):
    You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the ‘-O’ option.
    You would typically use this function in an embedded application where memory was a critical resource. If you have some complex calculation, you may want it to be folded if it involves constants, but need to call a function if it does not. For example: 

    #define Scale_Value(X) \ 
    (__builtin_constant_p (X) \ 
    ? ((X) * SCALE + OFFSET) : Scale (X)) 

    You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC will never return 1 when you call the inline function with a string constant or compound literal and will not return 1 when you pass a constant numeric value to the inline function unless you specify the ‘-O’ option. 

    In short: inside an inline function, __builtin_constant_p() is only useful when GCC's -O option is enabled. 

    You may also use __builtin_constant_p in initializers for static data. For instance, you can write 
    static const int table[] = { 
    __builtin_constant_p (EXPRESSION) ? (EXPRESSION) : -1, 
    /* . . . */ 
    }; 
    This is an acceptable initializer even if EXPRESSION is not a constant expression. 
    GCC must be more conservative about evaluating the built-in in this case, because it has no opportunity to perform optimization. Previous versions of GCC did not accept this built-in in data initializers. The earliest version where it is completely safe is 3.0.1. 
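    The memcpy() macro above and the Scale_Value example share one pattern: a macro asks __builtin_constant_p() about its argument and picks a compile-time branch or a runtime helper. The following is a minimal sketch of that pattern, assuming GCC or Clang; SCALE, OFFSET, scale_runtime(), scale_value() and scale_value_demo() are names made up for illustration and do not come from the kernel.

```c
#include <assert.h>

#define SCALE  3
#define OFFSET 7

/* Out-of-line fallback, used when the argument is not a known constant. */
static int scale_runtime(int x)
{
    return x * SCALE + OFFSET;
}

/* Same shape as the kernel's memcpy()/memset() macros: choose a branch
 * depending on whether the compiler can prove the argument constant. */
#define scale_value(x) \
    (__builtin_constant_p(x) ? ((x) * SCALE + OFFSET) : scale_runtime(x))

/* Both branches must give the same result; only the code path differs. */
static void scale_value_demo(int runtime_x)
{
    assert(scale_value(5) == 22);   /* literal argument: 5 * 3 + 7 */
    assert(scale_value(runtime_x) == runtime_x * SCALE + OFFSET);
}
```

    Which branch a non-literal argument takes depends on the optimization level, which is why both branches have to be functionally equivalent.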

    --------------------------------------------------------------- 
    __HAVE_ARCH_MEMMOVE 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_MEMMOVE 
    void *memmove(void * dest,const void * src, size_t n); 
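    memmove() is only declared here. Defining __HAVE_ARCH_MEMMOVE compiles out the generic version in lib/string.c, so the implementation is supplied out of line by architecture-specific code. 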

    #define memcmp __builtin_memcmp 
    --------------------------------------------------------------- 
    __HAVE_ARCH_MEMCHR memchr() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_MEMCHR 
    static inline void * memchr(const void * cs,int c,size_t count) 
    { 
        int d0; 
        register void * __res; 
        if (!count) 
            return NULL; 
        __asm__ __volatile__( 
            "repne\n\t" 
            "scasb\n\t" 
            "je 1f\n\t" 
            "movl $1,%0\n" 
            "1:\tdecl %0" 
            :"=D" (__res), "=&c" (d0) 
            :"a" (c),"0" (cs),"1" (count) 
            :"memory"); 
        return __res; 
    } 

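    Function: cs is the start of the memory area, count is the number of bytes to search, and c is the byte value to look for. The range [cs, cs+count) is scanned for c. If it is found, a pointer to the first match is returned; otherwise NULL (0) is returned. 

    Initial register values: 
    al/eax: c 
    edi:    cs 
    ecx:    count 

    Instruction walk-through: 
    repne scasb ===> while (ecx != 0) { 
                         ecx--; 
                         ZF = (al == es:[edi]); 
                         edi++; 
                         if (ZF) break; 
                     } 
    je 1f       ===> if (ZF == 1) goto 1;   /* found */ 
    movl $1,%0  ===> edi = 1;               /* not found */ 
    1: decl %0  ===> 1: edi--; 
                     return edi; 
    Return value: on a match, scasb has already advanced edi one byte past the match, so decl brings it back onto the matching byte. Without a match, edi is set to 1 and then decremented to 0, i.e. NULL. 
    The general case is simple, so no more is said about it. 

    Special cases: 
    1. count == 0: the C check "if (!count) return NULL;" returns before the asm runs. The check is needed because with ecx == 0 repne scasb executes zero times and leaves ZF at whatever value it had before. 
    2. count == 0xffffffff (a huge value): either a byte matching c is found and its address is returned, or ecx counts down to 0 and NULL is returned. rep tests ecx before each iteration, so it stops at 0 and does not wrap around. 
    3. count is a size_t, so it is unsigned and a negative count cannot occur. Because this is a memory function, '\0' is compared like any other byte. 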
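    To make the repne/scasb control flow concrete, here is a portable C rendering of the same logic. It is only a sketch for reading along, not the kernel's code; memchr_sketch is a hypothetical name chosen to avoid clashing with the C library.

```c
#include <stddef.h>

/* Portable rendering of the repne/scasb sequence in memchr(). */
static void *memchr_sketch(const void *cs, int c, size_t count)
{
    const unsigned char *p = cs;          /* plays the role of edi */

    while (count != 0) {                  /* rep: test ecx before each step */
        count--;
        if (*p++ == (unsigned char)c)     /* scasb: compare, then advance */
            return (void *)(p - 1);       /* decl %0: step back onto the match */
    }
    return NULL;                          /* movl $1,%0 ; decl %0 gives 0 */
}
```

    The post-increment in the comparison mirrors scasb, which always advances edi, and the p - 1 in the return mirrors the final decl.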
    --------------------------------------------------------------- 
    __memset_generic() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    static inline void * __memset_generic(void * s, char c,size_t count) 
    { 
        int d0, d1; 
        __asm__ __volatile__( 
            "rep\n\t" 
            "stosb" 
            : "=&c" (d0), "=&D" (d1) 
            :"a" (c),"1" (s),"0" (count) 
            :"memory"); 
        return s; 
    } 

    Instruction walk-through: 
    al  = c; 
    edi = s; 
    ecx = count; 
    rep stosb ====> while (ecx != 0) { es:[edi] = al; edi++; ecx--; } 
    return s; 
    --------------------------------------------------------------- 
    __constant_count_memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* we might want to write optimized versions of these later */ 
    #define __constant_count_memset(s,c,count) __memset_generic((s),(c),(count)) 
    --------------------------------------------------------------- 
    __constant_c_memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* 
    * memset(x,0,y) is a reasonably common thing to do, so we want to fill 
    * things 32 bits at a time even when we don't know the size of the 
    * area at compile-time.. 
    */ 
    static inline void * __constant_c_memset(void * s, unsigned long c, size_t count) 
    { 
        int d0, d1; 
        __asm__ __volatile__( 
            "rep ; stosl\n\t" 
            "testb $2,%b3\n\t" 
            "je 1f\n\t" 
            "stosw\n" 
            "1:\ttestb $1,%b3\n\t" 
            "je 2f\n\t" 
            "stosb\n" 
            "2:" 
            :"=&c" (d0), "=&D" (d1) 
            :"a" (c), "q" (count), "0" (count/4), "1" ((long) s) 
            :"memory"); 
        return (s); 
    } 

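    Initial register values: 
    eax: c (the fill byte replicated into all four bytes) 
    ecx: count/4 
    edi: void *s 

    Instruction walk-through: 
    rep ; stosl    ====> while (ecx != 0) { 
                             (long)es:[edi] = eax; 
                             edi += 4; 
                             ecx--; 
                         } 
    testb $2,%b3   ====> if ((0x02 & (char)count) == 0) 
    je 1f          ====>     goto 1; 
    stosw          ====> (short)es:[edi] = ax; 
                         edi += 2; 
    1: testb $1,%b3 ====> 1: if ((0x01 & (char)count) == 0) 
    je 2f          ====>     goto 2; 
    stosb          ====> (char)es:[edi] = al; 
    2:             ====> 2: 
    Analysis: 
    The area is first filled 4 bytes at a time (count/4 dwords). Bit 1 and bit 0 of count are then tested to find out whether 3, 2, 1 or 0 bytes remain. If bit 1 is set (2 or 3 bytes left), stosw stores two bytes, leaving 0 or 1. If bit 0 is set (1 or 3 bytes left originally), stosb stores the final byte. 

    Special case: 
    If count == 0, rep executes zero times and both bit tests fail, so nothing is written. 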
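    The dword loop followed by the two bit tests can be sketched in portable C as below. This is an illustration under assumptions, not the kernel's implementation: fill_dwords_then_tail is a hypothetical name, the pattern is assumed to hold the fill byte replicated four times, and memcpy() stands in for the stos instructions to avoid alignment concerns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the rep stosl / testb / stosw / stosb flow.
 * pattern is assumed to be one byte value repeated four times. */
static void *fill_dwords_then_tail(void *s, uint32_t pattern, size_t count)
{
    unsigned char *p = s;                 /* plays the role of edi */
    size_t n;

    for (n = count / 4; n != 0; n--) {    /* rep ; stosl */
        memcpy(p, &pattern, 4);
        p += 4;
    }
    if (count & 2) {                      /* testb $2,%b3 ; stosw */
        memcpy(p, &pattern, 2);
        p += 2;
    }
    if (count & 1)                        /* testb $1,%b3 ; stosb */
        memcpy(p, &pattern, 1);
    return s;
}
```

    Testing the two low bits of count avoids computing count % 4 at run time, which fits the case this function serves: a constant fill byte with a size known only at run time.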
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRNLEN strnlen() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* Added by Gertjan van Wingerde to make minix and sysv module work */ 
    #define __HAVE_ARCH_STRNLEN 
    static inline size_t strnlen(const char * s, size_t count) 
    { 
        int d0; 
        register int __res; 
        __asm__ __volatile__( 
            "movl %2,%0\n\t" 
            "jmp 2f\n" 
            "1:\tcmpb $0,(%0)\n\t" 
            "je 3f\n\t" 
            "incl %0\n" 
            "2:\tdecl %1\n\t" 
            "cmpl $-1,%1\n\t" 
            "jne 1b\n" 
            "3:\tsubl %2,%0" 
            :"=a" (__res), "=&d" (d0) 
            :"c" (s),"1" (count) 
            :"memory"); 
        return __res; 
    } 
    /* end of additional stuff */ 

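    Initial register values: 
    ecx: const char * s 
    edx: count 
    eax: __res 

    Instruction walk-through: 
    movl %2,%0      ====> eax = ecx;                /* eax = s */ 
    jmp 2f          ====> goto 2; 

    1: cmpb $0,(%0) ====> 1: if (*(char *)eax == 0) 
    je 3f           ====>     goto 3; 
    incl %0         ====> eax++; 

    2: decl %1      ====> 2: edx--; 
    cmpl $-1,%1     ====> if (edx != 0xffffffff)    /* compares edx with -1, not with 0 */ 
    jne 1b          ====>     goto 1; 

    3: subl %2,%0   ====> 3: eax -= ecx; 
                          return eax; 

    Because edx is decremented first and then compared with -1, the byte test at label 1 runs at most count times, so up to count characters are examined. 

    Case analysis (s ==> "abcd\0?" unless stated otherwise): 
    1. String length (excluding '\0') < count: 
    count == 5: after four passes eax points at '\0' and edx == 1. The next decl makes edx == 0, which is not -1, so the byte is tested, found to be '\0', and the loop exits. eax -= ecx gives 4, the string length. 
    count == 6: same path with edx one larger; the '\0' test ends the loop and 4 is returned. 

    2. String length (excluding '\0') == count: 
    count == 4: 'a' through 'd' are each tested and skipped, leaving eax at '\0' and edx == 0. The next decl makes edx == -1, so the loop exits without reading the '\0'. The result is 4, i.e. count. 

    3. String length (excluding '\0') > count: 
    count == 3: 'a', 'b' and 'c' are tested and skipped, then edx reaches -1 and the loop exits with eax pointing at 'd'. The result is 3, i.e. count. 

    4. String length (excluding '\0') == 0: 
    s ==> "\0?", count == 4: the first test sees '\0' and the result is 0. 

    5. count == 1: 
    'a' is tested and skipped, then edx reaches -1. The result is 1. 

    6. count == 0: 
    The first decl makes edx == -1 (0xffffffff), so the loop exits immediately without reading any byte. The result is 0. 

    Summary: 
    s points to a string and count is an upper bound. The function returns the string length (excluding '\0') if that is less than count, and count otherwise; in other words min(strlen(s), count). It never reads more than count bytes, and count == 0 is well defined and returns 0. 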
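    The same loop can be written in portable C to check the cases above by hand. This is a sketch only; strnlen_sketch is a hypothetical name, and the comments map each statement to the instruction it stands for.

```c
#include <stddef.h>

/* Portable rendering of the strnlen() loop: stop after count bytes
 * or at the first '\0', whichever comes first. */
static size_t strnlen_sketch(const char *s, size_t count)
{
    const char *p = s;                    /* plays the role of eax */

    for (;;) {
        if (count-- == 0)                 /* decl %1 ; cmpl $-1,%1 */
            break;
        if (*p == '\0')                   /* cmpb $0,(%0) */
            break;
        p++;                              /* incl %0 */
    }
    return (size_t)(p - s);               /* subl %2,%0 */
}
```

    Testing count before reading the byte is what keeps the function from touching memory beyond the first count bytes.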

    --------------------------------------------------------------- 
    __HAVE_ARCH_STRSTR strstr() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRSTR 
    extern char *strstr(const char *cs, const char *ct); 
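    Only a prototype is given here. Because __HAVE_ARCH_STRSTR is defined, the generic strstr() in lib/string.c is compiled out, so the implementation has to come from out-of-line architecture-specific code. 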
    --------------------------------------------------------------- 
    __constant_c_and_count_memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 
    /* 
    * This looks horribly ugly, but the compiler can optimize it totally, 
    * as we by now know that both pattern and count is constant.. 
    */ 
    static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count) 
    { 
        switch (count) { 
            case 0: 
                return s; 
            case 1: 
                *(unsigned char *)s = pattern; 
                return s; 
            case 2: 
                *(unsigned short *)s = pattern; 
                return s; 
            case 3: 
                *(unsigned short *)s = pattern; 
                *(2+(unsigned char *)s) = pattern; 
                return s; 
            case 4: 
                *(unsigned long *)s = pattern; 
                return s; 
        } 
    #define COMMON(x) \ 
    __asm__ __volatile__( \ 
        "rep ; stosl" \ 
        x \ 
        : "=&c" (d0), "=&D" (d1) \ 
        : "a" (pattern),"0" (count/4),"1" ((long) s) \ 
        : "memory") 
    { 
        int d0, d1; 
        switch (count % 4) { 
            case 0: COMMON(""); return s; 
            case 1: COMMON("\n\tstosb"); return s; 
            case 2: COMMON("\n\tstosw"); return s; 
            default: COMMON("\n\tstosw\n\tstosb"); return s; 
        } 
    } 

    #undef COMMON 
    } 

    Analysis: 
    1. count in [0,4]: the first switch handles these sizes with at most two plain C stores and returns immediately, so no string instruction is emitted. 

    2. count > 4: the second block applies. COMMON() stores count/4 dwords with rep ; stosl, and switch (count % 4) appends the stosb/stosw instructions needed for the remaining 0 to 3 bytes. Since count is a compile-time constant, the compiler keeps only the matching case. 

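    a): Note the technique of using a macro inside a function body: 
    1) define the macro with #define; 
    2) enclose the code that uses it in a pair of {}; 
    3) remove the macro afterwards with #undef. 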

    b):#define COMMON(x) \ 
    __asm__ __volatile__( \ 
    "rep ; stosl" \ 
    x \ 
    : "=&c" (d0), "=&D" (d1) \ 
    : "a" (pattern),"0" (count/4),"1" ((long) s) \ 
    : "memory") 

    Initial register values: 
    eax: pattern 
    ecx: count/4 
    edi: s 

    Instruction walk-through: 
    COMMON("") expands to: 
    eax = pattern; 
    edi = s; 
    ecx = count/4; 
    rep ===> while( ecx-- != 0 ) 

    stosl ===> es:[edi] = eax; 
    edi += 4; 

    return s; 

    COMMON("\n\tstosb") expands to: 
    eax = pattern; 
    edi = s; 
    ecx = count/4; 
    rep ===> while( ecx-- != 0 ) 

    stosl ===> es:[edi] = eax; 
    edi += 4; 

    x ===> stosb ===> es:[edi] = al; 
    edi += 1; 
    return s; 

    COMMON("\n\tstosw") expands to: 
    eax = pattern; 
    edi = s; 
    ecx = count/4; 
    rep ===> while( ecx-- != 0 ) 

    stosl ===> es:[edi] = eax; 
    edi += 4; 

    x ===> stosw ===> es:[edi] = ax; 
    edi += 2; 
    return s; 

    COMMON("\n\tstosw\n\tstosb") expands to: 
    eax = pattern; 
    edi = s; 
    ecx = count/4; 
    rep ===> while( ecx-- != 0 ) 

    stosl ===> es:[edi] = eax; 
    edi += 4; 

    x => stosw;stosb=> es:[edi] = ax; 
    edi += 2; 
    es:[edi] = al; 
    edi += 1; 

    return s; 

    c): Further analysis: 

    int d0, d1; 
    switch (count % 4) 

    case 0: COMMON(""); return s; 
    case 1: COMMON("\n\tstosb"); return s; 
    case 2: COMMON("\n\tstosw"); return s; 
    default: COMMON("\n\tstosw\n\tstosb"); return s; 


    The switch stores the count % 4 bytes that are left over after the dword loop. 
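    The two ideas in this section, a helper macro that exists only inside one function and a switch on count % 4 for the tail, can be combined in a small portable sketch. Everything here is illustrative: fill_const_sketch and STORE are hypothetical names, memcpy() stands in for stosl/stosw/stosb, and the pattern is assumed to be one byte value replicated four times, so it does not matter which bytes of it a partial copy takes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void *fill_const_sketch(void *s, uint32_t pattern, size_t count)
{
    unsigned char *p = s;
    size_t i;

/* Local helper macro: defined right before use, removed right after. */
#define STORE(width)                    \
    do {                                \
        memcpy(p, &pattern, (width));   \
        p += (width);                   \
    } while (0)

    for (i = 0; i < count / 4; i++)
        STORE(4);                             /* rep ; stosl */

    switch (count % 4) {                      /* leftover 0..3 bytes */
    case 0:  break;
    case 1:  STORE(1); break;                 /* stosb */
    case 2:  STORE(2); break;                 /* stosw */
    default: STORE(2); STORE(1); break;       /* stosw ; stosb */
    }
#undef STORE
    return s;
}
```

    Undefining STORE at the end keeps the name from leaking into the rest of the translation unit, which is the same reason the kernel header undefines COMMON.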

    --------------------------------------------------------------- 
    __constant_c_x_memset()
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __constant_c_x_memset(s, c, count) \ 
    (__builtin_constant_p(count) ? \ 
    __constant_c_and_count_memset((s),(c),(count)) : \ 
    __constant_c_memset((s),(c),(count))) 

    Function: fills count bytes of the memory at s with the 32-bit pattern c, which is a byte value replicated four times. When count is a compile-time constant the call resolves to __constant_c_and_count_memset(); otherwise it resolves to __constant_c_memset(). 

    Reference: both functions are listed in full in their own sections above. 

    --------------------------------------------------------------- 
    __memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __memset(s, c, count) \ 
    (__builtin_constant_p(count) ? \ 
    __constant_count_memset((s),(c),(count)) : \ 
    __memset_generic((s),(c),(count))) 

    Function: fills count bytes of the memory area at s with the byte c. 

    Reference: both branches end up in __memset_generic(), because __constant_count_memset() is currently just an alias for it. Both are listed in their own sections above. 

    --------------------------------------------------------------- 
    __HAVE_ARCH_MEMSET memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_MEMSET 
    #define memset(s, c, count) \ 
    (__builtin_constant_p(c) ? \ 
    __constant_c_x_memset((s),(0x01010101UL*(unsigned char)(c)),(count)) : \ 
    __memset((s),(c),(count))) 

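    Function: same as above, filling count bytes at s with the byte c. If c is a compile-time constant, the call goes to __constant_c_x_memset() with the byte replicated into a 32-bit pattern; otherwise it goes to __memset(). Both macros are covered in their own sections above. 

    About (0x01010101UL*(unsigned char)(c)): multiplying the byte by 0x01010101 copies it into all four bytes of a 32-bit word, for example 0xAB becomes 0xABABABAB. stosl can then store four fill bytes per iteration. 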
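    The multiplication in the memset() macro can be tried in isolation. The helper below is a minimal sketch; replicate_byte is a hypothetical name, and uint32_t is used instead of unsigned long so that the result is 32 bits wide on any platform.

```c
#include <stdint.h>

/* Copy one byte value into all four bytes of a 32-bit word.
 * 0x01010101 * c = c + (c << 8) + (c << 16) + (c << 24), and the four
 * terms never overlap because c fits in 8 bits. */
static uint32_t replicate_byte(unsigned char c)
{
    return (uint32_t)0x01010101u * c;
}
```

    The cast to unsigned char in the kernel macro matters for the same reason: without it, a value wider than 8 bits would make the four partial products overlap.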
    --------------------------------------------------------------- 
    __HAVE_ARCH_MEMSCAN memscan() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* 
    * find the first occurrence of byte 'c', or 1 past the area if none 
    */ 
    #define __HAVE_ARCH_MEMSCAN 
    static inline void * memscan(void * addr, int c, size_t size) 
    { 
        if (!size) 
            return addr; 
        __asm__("repnz; scasb\n\t" 
            "jnz 1f\n\t" 
            "dec %%edi\n" 
            "1:" 
            : "=D" (addr), "=c" (size) 
            : "0" (addr), "1" (size), "a" (c) 
            : "memory"); 
        return addr; 
    } 

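    Instruction walk-through: 
    edi = addr; 
    ecx = size; 
    al  = c; 
    repnz; scasb ====> while (ecx != 0) { 
                           ecx--; 
                           ZF = (al == es:[edi]); 
                           edi++; 
                           if (ZF) break; 
                       } 
    jnz 1f       ====> if (ZF == 0) goto 1;   /* not found: keep edi */ 
    dec %%edi    ====> edi--;                 /* found: step back onto the match */ 
    1:           ====> 1: return edi; 

    The assembly here is very simple, so it needs no further comment. 
    The function scans memory linearly. If the byte c is found, the address of its first occurrence is returned. If it is not found, edi has advanced over the whole area and the function returns addr + size, one byte past the end of the area. 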
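    The difference from memchr() is only in the not-found result, which a portable sketch makes easy to see. memscan_sketch is a hypothetical name and the code is an illustration, not the kernel's implementation.

```c
#include <stddef.h>

/* Return the first occurrence of byte c in [addr, addr + size),
 * or addr + size (one past the area) if there is none. */
static void *memscan_sketch(void *addr, int c, size_t size)
{
    unsigned char *p = addr;              /* plays the role of edi */

    while (size != 0) {                   /* repnz: test ecx first */
        size--;
        if (*p == (unsigned char)c)       /* scasb found a match */
            return p;
        p++;
    }
    return p;                             /* no match: one past the area */
}
```

    Returning the end address instead of NULL lets a caller compare the result against the end of the buffer without a separate NULL check.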
    #endif /* __KERNEL__ */ 

    #endif /* !_I386_STRING_H_ */ 
    *************************************************************** 
    That finally finishes the string functions written in assembly! 

  • Original article: https://www.cnblogs.com/taek/p/2338939.html