1) Background
Sometimes, when we meet a problem that is very hard to explain or understand, we can turn to more powerful tools, e.g. math or assembly code. Recently I ran into a problem at work, which can be simplified as follows.
I thought a function was inefficient, so I decided to improve it.
From
#include <vector>
std::vector<int> f()
{
    std::vector<int> v;
    v.push_back(2);
    return v;
}
To
int g(std::vector<int> &v)
{
    v.push_back(2);
    return 1;
}
I thought f would use more memory and cause memory fragmentation, so I believed g was better than f. However, after testing both functions with get_memory_usage and with my customized allocator, I found that they used the same amount of memory. This was surprising, so I decided to dig into how the code is compiled to assembly.
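The post does not state what explained this result, so the following is an assumption rather than the author's conclusion. In f, the std::vector object itself (a few pointers) lives on the stack; only its element buffer lives on the heap, and both f and g allocate that buffer once, inside push_back, so the heap traffic is the same. Returning the vector also need not copy it: the compiler can apply the named return value optimization and build v directly in the caller's storage. The C sketch below, with hypothetical names make_big and fill_big, illustrates the ABI side of this: on x86-64 System V, a struct too large for registers is returned through a hidden pointer to storage the caller provides, so comparing the gcc -O2 -S output of the two functions should typically show both writing through a caller-supplied pointer.

```c
#include <stddef.h>

/* Hypothetical type: 256 bytes, too large to be returned in registers on x86-64. */
struct big {
    int data[64];
};

/* Like f(): build a local object and return it by value. The caller passes a
   hidden pointer (in %rdi) to storage it reserved for the result; with
   optimization enabled gcc can usually build b directly in that storage. */
struct big make_big(void)
{
    struct big b;
    for (size_t i = 0; i < 64; i++)
        b.data[i] = (int)i;
    return b;
}

/* Like g(): the caller passes the storage explicitly. */
void fill_big(struct big *out)
{
    for (size_t i = 0; i < 64; i++)
        out->data[i] = (int)i;
}
```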
2) Background knowledge for understanding assembly code
1. Machine language is what the computer (CPU) executes. Every instruction the CPU sees is a sequence of numbers. It is binary, but is normally shown in hex for readability, for example 83 ec 08.
Assembly language is a human-readable form of machine language: the opcodes and operands are replaced by mnemonics that are easier to read and memorize. For example, 83 ec 08 -> sub $0x8,%esp
High-level languages such as C/C++ make programming easier. Code written in a high-level language is typically compiled to machine code.
2. Some general rules for x86 instructions such as mov are listed below:
- The source can be in memory, a register, or a constant
- The destination can be in memory or a non-segment register
- Only one of the source and destination can be in memory
- The source and destination must be the same size
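Two of these rules show up directly in compiler output. The functions below (hypothetical names) are a small sketch: copying an int between two memory locations cannot be a single memory-to-memory mov, and mixing an int with a long forces a size conversion first. The assembly in the comments is what gcc -O2 on x86-64 typically emits; exact output varies with compiler version and flags.

```c
/* Only one operand may be in memory, so the copy goes through a register.
   gcc -O2 on x86-64 typically emits:
       movl (%rsi),%eax    ; load *src (source in memory)
       movl %eax,(%rdi)    ; store to *dst (destination in memory)
       ret
*/
void copy_int(int *dst, const int *src)
{
    *dst = *src;
}

/* Operands must be the same size: the 32-bit b is sign-extended to 64 bits
   (typically with movslq) before the 64-bit addition. */
long widen_add(long a, int b)
{
    return a + b;
}
```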
3. Compiler, assembler, linker and loader
a. Preprocessing handles #include files, conditional compilation directives and macros: gcc -E test.c
b. Compilation takes the output of the preprocessor and generates assembly source code: gcc -S test.c produces test.s
c. Assembly is the third stage. It takes the assembly source code and generates an object file: gcc -c test.c produces test.o, a relocatable ELF file.
d. Linking is the final stage. It takes one or more object files or libraries as input and combines them into a single executable file, e.g. a.out, which is also an ELF file. In doing so, it resolves references to external symbols. There are two types of linker: the static linker and the dynamic linker.
e. Loading copies the executable file into memory so that the program can run.
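A small hypothetical test.c to try these stages on. Running gcc -E test.c should show SQUARE expanded and the LOG branch chosen by #ifdef; gcc -S and gcc -c then produce test.s and test.o. When built with -DVERBOSE, the call to puts remains an unresolved external symbol in test.o until the linker resolves it against libc.

```c
#include <stdio.h>

/* After preprocessing (gcc -E) the macro disappears and each use is expanded textually. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: with -DVERBOSE the first branch is kept,
   otherwise the preprocessor removes it before the compiler sees it. */
#ifdef VERBOSE
#define LOG(msg) puts(msg)
#else
#define LOG(msg) ((void)0)
#endif

int area_of_square(int side)
{
    LOG("computing area");   /* with -DVERBOSE: a reference to the external symbol puts */
    return SQUARE(side);     /* becomes ((side) * (side)) after preprocessing */
}
```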
4. ELF sections and segments
Figure 3: Simplified object file format: linking view and execution view.
Use readelf or objdump to get more information about an ELF file.
[torstan]$ more simple.c
void main()
{
printf("hello world\n");
}
[torstan]$ gcc simple.c -o simple
simple.c: In function `main':
simple.c:2: warning: return type of 'main' is not `int'
[torstan]$ readelf -d simple
Dynamic section at offset 0x6a0 contains 20 entries:
  Tag               Type                        Name/Value
 0x0000000000000001 (NEEDED)                    Shared library: [libc.so.6]
 0x000000000000000c (INIT)                      0x4003a8
 0x000000000000000d (FINI)                      0x400598
 0x0000000000000004 (HASH)                      0x400240
 0x0000000000000005 (STRTAB)                    0x4002e0
 0x0000000000000006 (SYMTAB)                    0x400268
 0x000000000000000a (STRSZ)                     83 (bytes)
 0x000000000000000b (SYMENT)                    24 (bytes)
 0x0000000000000015 (DEBUG)                     0x0
 0x0000000000000003 (PLTGOT)                    0x500838
 0x0000000000000002 (PLTRELSZ)                  48 (bytes)
 0x0000000000000014 (PLTREL)                    RELA
 0x0000000000000017 (JMPREL)                    0x400378
 0x0000000000000007 (RELA)                      0x400360
 0x0000000000000008 (RELASZ)                    24 (bytes)
 0x0000000000000009 (RELAENT)                   24 (bytes)
 0x000000006ffffffe (VERNEED)                   0x400340
 0x000000006fffffff (VERNEEDNUM)                1
 0x000000006ffffff0 (VERSYM)                    0x400334
 0x0000000000000000 (NULL)                      0x0
[torstan]$ readelf -l simple
Elf file type is EXEC (Executable file)
Entry point 0x4003f0
There are 8 program headers, starting at offset 64
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000674 0x0000000000000674  R E    100000
  LOAD           0x0000000000000678 0x0000000000500678 0x0000000000500678
                 0x0000000000000200 0x0000000000000208  RW     100000
  DYNAMIC        0x00000000000006a0 0x00000000005006a0 0x00000000005006a0
                 0x0000000000000190 0x0000000000000190  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000020 0x0000000000000020  R      4
  GNU_EH_FRAME   0x00000000000005bc 0x00000000004005bc 0x00000000004005bc
                 0x0000000000000024 0x0000000000000024  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8
 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07
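The header fields that readelf reports above (file type, entry point, number and offset of program headers) can also be read directly. The sketch below uses the Elf64_Ehdr type and constants from <elf.h> (provided by glibc on Linux); the function names are hypothetical, and /proc/self/exe, used in the usage check, is Linux-specific and assumes a 64-bit ELF binary.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Reads the ELF header of a 64-bit ELF file into *out.
   Returns 0 on success, -1 if the file can't be read or isn't 64-bit ELF. */
int read_elf64_header(const char *path, Elf64_Ehdr *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    size_t n = fread(out, sizeof *out, 1, fp);
    fclose(fp);
    if (n != 1)
        return -1;
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)   /* "\177ELF" magic bytes */
        return -1;
    if (out->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Prints the same facts readelf -l shows at the top of its output. */
void print_elf_summary(const Elf64_Ehdr *h)
{
    printf("type: %s\n", h->e_type == ET_EXEC ? "EXEC" :
                         h->e_type == ET_DYN  ? "DYN"  : "other");
    printf("entry point: 0x%lx\n", (unsigned long)h->e_entry);
    printf("%u program headers, starting at offset %lu\n",
           (unsigned)h->e_phnum, (unsigned long)h->e_phoff);
}
```

Newer toolchains often build position-independent executables by default, which report type DYN rather than the EXEC shown above.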
5. process memory layout
Figure 5: Illustration of C’s process memory layout on an x86.
The process loads its segments (corresponding to “text” and “data” in the diagram) at the process’s base address. The main stack is located just below and grows downwards. Any additional threads that are created get their own stacks, located below the main stack. Each thread's stack is separated from the next by a guard page, so that a stack overflow triggers a fault instead of silently running into another thread's stack.
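A quick way to see the regions of a real process is to print one address from each. This is a sketch with hypothetical names; on a typical x86-64 Linux process one would expect code < initialized data < heap < stack, but address space layout randomization changes the exact values on every run, and the arrangement differs between operating systems.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;   /* lives in .data */

struct layout {
    uintptr_t text;    /* address of a function (.text) */
    uintptr_t data;    /* address of an initialized global (.data) */
    uintptr_t heap;    /* address returned by malloc */
    uintptr_t stack;   /* address of a local variable */
};

struct layout sample_layout(void)
{
    int local = 0;
    void *p = malloc(16);
    struct layout l;
    l.text  = (uintptr_t)&sample_layout;
    l.data  = (uintptr_t)&initialized_global;
    l.heap  = (uintptr_t)p;
    l.stack = (uintptr_t)&local;
    free(p);
    return l;
}
```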
6. c function and stack frame
In high-level languages, one of the most important tools for structuring programs is the function. Programmers use functions to break a program into routines, each with a specific task, that can be developed, tested and reused independently. From the memory point of view, the function abstraction is implemented with the help of the stack. A stack frame is the portion of the stack allocated for one function call. When a function is called, a stack frame is allocated to hold the return address, the caller's saved frame pointer, some saved registers, and the function's local variables. When the function returns to its caller, the frame is dismantled (cleaned up).
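Because every active call has its own frame, a recursive function has one copy of each local variable per call in progress. The sketch below (hypothetical names) records the address of a local at each recursion depth; on x86 and x86-64, where the stack grows downwards, deeper calls should show lower addresses. noinline and volatile are there as an assumption-driven precaution so that gcc keeps a real frame per call.

```c
#include <stdint.h>

#define MAX_DEPTH 8

/* Address of `local` in each active call, indexed by recursion depth. */
uintptr_t frame_local_addr[MAX_DEPTH];

__attribute__((noinline))
int sum_down(int n, int depth)
{
    volatile int local = n;   /* one copy of this variable per stack frame */
    if (depth < MAX_DEPTH)
        frame_local_addr[depth] = (uintptr_t)&local;
    if (n == 0)
        return 0;
    int rest = sum_down(n - 1, depth + 1);
    return rest + local;      /* this frame's `local` is still intact after the callee returns */
}
```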
7. The C function calling convention
A calling convention is an agreement between caller and callee about how a function call is carried out. For example, the C/C++ calling convention tells the compiler things such as:
- The order in which function arguments are pushed onto the stack
- Whether the caller or the callee is responsible for removing the arguments from the stack at the end of the call (the stack cleanup process)
- The name decoration convention the compiler uses to identify individual functions
The three common conventions for 32-bit x86 in Microsoft Visual C++ are __cdecl, __stdcall and __fastcall. The default is __cdecl.
void __cdecl TestFunc(float a, char b, char c); // Borland and Microsoft
void TestFunc (float a, char b, char c) __attribute__((cdecl)); //gnu gcc
For __cdecl, parameters are pushed onto the stack in reverse order (right to left), and the caller cleans up stack.
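Caller cleanup is what makes variadic functions such as printf possible: only the caller knows how many arguments it pushed, so only the caller can remove them. The sketch below (hypothetical name sum_ints) is an ordinary variadic function using <stdarg.h>. Note that on x86-64, gcc ignores the cdecl attribute and follows the System V ABI, which passes the first six integer or pointer arguments in registers; this is why the disassembly later in this post loads rdi and rsi instead of pushing arguments.

```c
#include <stdarg.h>

/* Sums `count` int arguments. The callee cannot know how many arguments were
   passed except through `count`, so it could not clean them up itself. */
int sum_ints(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```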
8. stack layout during function call
When a function call takes place, data is pushed onto the stack in the following order:
- The function parameters, from right to left
- The return address, i.e. the address of the instruction after the call (the value EIP holds when the call executes)
- The caller's EBP; the callee then makes EBP point to this saved value
- If the function contains try/catch or another exception-handling construct such as SEH (Structured Exception Handling, Microsoft's implementation), the compiler places exception-handling information on the stack
- The callee-saved registers such as ESI, EDI and EBX, if the callee uses them
- The local variables declared in the callee
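For the 32-bit __cdecl case, the part of the frame at and above EBP can be modelled as a C struct laid out in increasing address order. This is a hypothetical model for TestFunc(float a, char b, char c) from the previous section, not a type any compiler defines; it assumes each argument occupies one 4-byte stack slot, as on 32-bit x86, and it does not model the optional exception-handling data.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame of TestFunc(float a, char b, char c) from EBP upward (32-bit __cdecl). */
struct testfunc_frame_above_ebp {
    uint32_t saved_ebp;    /* [ebp+0]  caller's EBP, pushed by the callee prologue */
    uint32_t return_addr;  /* [ebp+4]  pushed by the call instruction */
    float    a;            /* [ebp+8]  leftmost argument, pushed last */
    int32_t  b;            /* [ebp+12] char b, stored in a 4-byte slot */
    int32_t  c;            /* [ebp+16] char c, pushed first */
};
/* Saved callee-save registers and locals sit below EBP at negative offsets
   such as [ebp-4]; their exact order is compiler-dependent. */
```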
9. EBP and ESP
EBP and ESP are the two key registers for managing the stack frame. EBP and ESP are their names on 32-bit x86; on x86-64 they are RBP and RSP. EBP is called the frame pointer: after a new function is called, it points to the base of the callee's stack frame (its highest address, since the stack grows downwards), while ESP points to the top of the stack (its lowest address). The callee's stack frame spans from EBP down to ESP, and data in it can be referenced relative to either register. Because EBP stays fixed while the function runs and ESP moves with every push and pop, local variables and parameters are usually referenced with an offset from EBP: parameters at positive offsets such as [ebp+8], locals at negative offsets such as [ebp-4].
EBP is modified by instructions such as mov %esp,%ebp, pop %ebp and leave.
ESP is modified by instructions such as push, pop, sub $0xc,%esp, call, leave and ret.
10. Control instructions
i. call myfunc
a) push the return address (the address of the instruction after the call) onto the stack
b) jump to the starting address of myfunc
ii. leave
a) mov %rbp,%rsp ; discard the callee's stack frame
b) pop %rbp ; restore rbp to the caller's rbp
iii. ret
a) pop the top element of the stack (the return address) into eip (rip on x86-64)
b) continue execution at that address
11. The caller is responsible for allocating stack space for the parameters that the callee uses, and for cleaning up those parameters after the call completes.
The callee is responsible for allocating and freeing the elements it places on the stack: the old EBP, the exception-handling frame if any, saved registers if any, and local variables. So before returning, the callee pops any saved registers, releases its locals and restores the old EBP with the leave instruction, and then jumps back to the next instruction in the caller with ret.
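The division of work between caller and callee can be checked with a toy simulation. The sketch below models a 32-bit stack as an array of 4-byte slots with esp and ebp as indices (all names and the "addresses" 0x1005 and 0x2000 are hypothetical). It performs a __cdecl call of a three-argument function: the caller pushes arguments right to left and calls; the callee runs its prologue, reads its first parameter at [ebp+8], and returns with leave and ret; finally the caller removes the arguments. If both sides do their part, esp and ebp end up exactly where they started.

```c
#include <stdint.h>

#define STACK_SLOTS 64   /* toy stack; each slot models one 4-byte word */

struct machine {
    uint32_t mem[STACK_SLOTS];
    int esp;        /* index of the current top of stack */
    int ebp;        /* index of the current frame base */
    uint32_t eip;   /* "address" of the next instruction */
};

static void push(struct machine *m, uint32_t v) { m->mem[--m->esp] = v; }   /* grows toward index 0 */
static uint32_t pop(struct machine *m) { return m->mem[m->esp++]; }

static void call(struct machine *m, uint32_t ret_addr, uint32_t target)
{
    push(m, ret_addr);   /* push the address of the instruction after call */
    m->eip = target;     /* jump to the callee */
}

static void leave(struct machine *m)
{
    m->esp = m->ebp;        /* mov %ebp,%esp: drop the callee's locals */
    m->ebp = (int)pop(m);   /* pop %ebp: restore the caller's frame pointer */
}

static void ret(struct machine *m) { m->eip = pop(m); }   /* pop return address into eip */

/* Returns the value the callee reads as its first parameter. */
uint32_t simulate_cdecl_call(struct machine *m, uint32_t a, uint32_t b, uint32_t c)
{
    push(m, c);                 /* caller: arguments pushed right to left */
    push(m, b);
    push(m, a);
    call(m, 0x1005, 0x2000);

    push(m, (uint32_t)m->ebp);  /* callee prologue: push %ebp */
    m->ebp = m->esp;            /* mov %esp,%ebp */
    m->esp -= 2;                /* sub $8,%esp: room for two 4-byte locals */
    uint32_t first_param = m->mem[m->ebp + 2];   /* [ebp+8] is two slots above ebp */

    leave(m);                   /* callee frees its own frame */
    ret(m);
    m->esp += 3;                /* caller cleanup: add $12,%esp */
    return first_param;
}
```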
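12. gdb commands for debugging assembly code
a) disassemble function_name; or disassemble address ; show the machine instructions of a function
b) nexti ; execute one machine instruction, stepping over calls (ni for short)
c) stepi ; execute one machine instruction, stepping into calls (si for short)
d) info registers; or info registers rbp rsp ; show all registers, or only the named ones
e) display /3i $pc ; show the next 3 instructions every time the program stops
f) x /fmt address ; examine memory at address in the given format, e.g. x /4xw $rsp
g) print ; evaluate and print an expression, e.g. print $rsp
h) bt ; print a backtrace of the call stack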
3) Some examples
1. A simple case showing stack frames
[torstan]$ more ass2.c
#include <stdio.h>
int swap_add(int *xp, int *yp)
{
    int x = *xp;
    int y = *yp;
    *xp = y;
    *yp = x;
    return x + y;
}
int caller()
{
    int arg1 = 534;
    int arg2 = 1057;
    int sum = swap_add(&arg1, &arg2);
    int diff = arg1 - arg2;
    return sum * diff;
}
int main()
{
    int res = 0;
    res = caller();
    printf("result is %d\n", res);
    return 0;
}
[torstan]$ gcc ass2.c -o ass
[torstan]$ gdb ass
….
Copyright … Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
……
(gdb) b swap_add
Breakpoint 1 at 0x4004ac
(gdb) r
Starting program: /home/torstan /ass
(no debugging symbols found)
(no debugging symbols found)
Breakpoint 1, 0x00000000004004ac in swap_add ()
(gdb) disassemble swa
swab swap_add swapcontext swapoff swapon
(gdb) disassemble swap_add
Dump of assembler code for function swap_add:
0x00000000004004a8 <swap_add+0>: push %rbp
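0x00000000004004a9 <swap_add+1>: mov %rsp,%rbp ; no sub from %rsp follows: swap_add makes no calls, so its locals can live in the 128-byte red zone below %rsp (x86-64 System V ABI)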
0x00000000004004ac <swap_add+4>: mov %rdi,0xfffffffffffffff8(%rbp) ; save rdi to [rbp-8]
0x00000000004004b0 <swap_add+8>: mov %rsi,0xfffffffffffffff0(%rbp) ; save rsi to [rbp-16]; the arguments are pointers, so on a 64-bit system each needs 8 bytes
0x00000000004004b4 <swap_add+12>: mov 0xfffffffffffffff8(%rbp),%rax
0x00000000004004b8 <swap_add+16>: mov (%rax),%eax
0x00000000004004ba <swap_add+18>: mov %eax,0xffffffffffffffec(%rbp) ;save *[rbp-8] to [rbp-20]
0x00000000004004bd <swap_add+21>: mov 0xfffffffffffffff0(%rbp),%rax
0x00000000004004c1 <swap_add+25>: mov (%rax),%eax
0x00000000004004c3 <swap_add+27>: mov %eax,0xffffffffffffffe8(%rbp) ; save *[rbp-16] to [rbp-24]
0x00000000004004c6 <swap_add+30>: mov 0xfffffffffffffff8(%rbp),%rdx ;
0x00000000004004ca <swap_add+34>: mov 0xffffffffffffffe8(%rbp),%eax
0x00000000004004cd <swap_add+37>: mov %eax,(%rdx) ; save [rbp-24] to *[rbp-8]
0x00000000004004cf <swap_add+39>: mov 0xfffffffffffffff0(%rbp),%rdx
0x00000000004004d3 <swap_add+43>: mov 0xffffffffffffffec(%rbp),%eax
0x00000000004004d6 <swap_add+46>: mov %eax,(%rdx) ; save [rbp-20] to *[rbp-16]
0x00000000004004d8 <swap_add+48>: mov 0xffffffffffffffe8(%rbp),%eax
0x00000000004004db <swap_add+51>: add 0xffffffffffffffec(%rbp),%eax ; eax = [rbp-24] + [rbp-20], i.e. y + x; the return value is left in eax
0x00000000004004de <swap_add+54>: leaveq
0x00000000004004df <swap_add+55>: retq
End of assembler dump.
(gdb) disassemble caller
Dump of assembler code for function caller:
0x00000000004004e0 <caller+0>: push %rbp ; save main's rbp (the frame pointer of caller's own caller) on the stack
0x00000000004004e1 <caller+1>: mov %rsp,%rbp ; rbp now points to that saved rbp, the base of caller's frame
0x00000000004004e4 <caller+4>: sub $0x10,%rsp ; allocate 16 bytes for local variables
0x00000000004004e8 <caller+8>: movl $0x216,0xfffffffffffffffc(%rbp) ; store 534 to [rbp-4]
0x00000000004004ef <caller+15>: movl $0x421,0xfffffffffffffff8(%rbp) ; store 1057 to [rbp-8]
0x00000000004004f6 <caller+22>: lea 0xfffffffffffffff8(%rbp),%rsi ; store the address of [rbp-8] to rsi
0x00000000004004fa <caller+26>: lea 0xfffffffffffffffc(%rbp),%rdi ; store the address of [rbp-4] to rdi
0x00000000004004fe <caller+30>: callq 0x4004a8 <swap_add> ; on x86-64 the first two arguments are passed in rdi and rsi, not pushed onto the stack
0x0000000000400503 <caller+35>: mov %eax,0xfffffffffffffff4(%rbp) ; save return value to [rbp-12]
0x0000000000400506 <caller+38>: mov 0xfffffffffffffff8(%rbp),%edx ; move [rbp-8] to edx
0x0000000000400509 <caller+41>: mov 0xfffffffffffffffc(%rbp),%eax ; move [rbp-4] to eax
0x000000000040050c <caller+44>: sub %edx,%eax
0x000000000040050e <caller+46>: mov %eax,0xfffffffffffffff0(%rbp) ; save the diff to [rbp-16]
0x0000000000400511 <caller+49>: mov 0xfffffffffffffff4(%rbp),%eax ;move [rbp-12] to eax
0x0000000000400514 <caller+52>: imul 0xfffffffffffffff0(%rbp),%eax ; multiply by [rbp-16]; the result in eax is the return value
0x0000000000400518 <caller+56>: leaveq ; move rbp to rsp, and pop rbp to restore rbp with caller’s rbp
0x0000000000400519 <caller+57>: retq ; pop the return address into rip and continue execution there
End of assembler dump.
(gdb) disassemble main
Dump of assembler code for function main:
0x000000000040051a <main+0>: push %rbp
0x000000000040051b <main+1>: mov %rsp,%rbp
0x000000000040051e <main+4>: sub $0x10,%rsp ; allocate 16 bytes for local variables
0x0000000000400522 <main+8>: movl $0x0,0xfffffffffffffffc(%rbp) ; move 0 to [rbp-4]
0x0000000000400529 <main+15>: mov $0x0,%eax ; caller has no prototype (int caller()), so gcc treats the call as possibly variadic and sets al, the count of vector registers used, to 0
0x000000000040052e <main+20>: callq 0x4004e0 <caller> ; call function caller, (push 0x0000000000400533 onto the stack, and jump to the first instruction of caller)
0x0000000000400533 <main+25>: mov %eax,0xfffffffffffffffc(%rbp) ; save function return value to [rbp-4]
0x0000000000400536 <main+28>: mov 0xfffffffffffffffc(%rbp),%esi
0x0000000000400539 <main+31>: mov $0x40063c,%edi
0x000000000040053e <main+36>: mov $0x0,%eax
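0x0000000000400543 <main+41>: callq 0x4003e0 ; presumably printf via its PLT stub: edi = address of the format string, esi = res, eax = 0 because printf is variadic and al counts the vector registers used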
0x0000000000400548 <main+46>: mov $0x0,%eax ; return value is 0
0x000000000040054d <main+51>: leaveq ; equal to mov %rbp, %rsp; pop %rbp
0x000000000040054e <main+52>: retq ; pop out the return address to rip, and jump to the address to keep execution
0x000000000040054f <main+53>: nop
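Tracing the values through the disassembly: swap_add swaps arg1 and arg2, so afterwards arg1 = 1057 and arg2 = 534, and it returns 534 + 1057 = 1591. Then diff = 1057 - 534 = 523, and caller returns 1591 * 523 = 832093, so the program should print "result is 832093". The C rendering below (hypothetical names) mirrors the unoptimized code, with each temporary named after the rbp-relative slot gcc gave it in the dumps above.

```c
/* Unoptimized swap_add, one variable per stack slot. */
int swap_add_slots(int *xp, int *yp)
{
    int *rbp_minus_8  = xp;              /* mov %rdi,-0x8(%rbp) */
    int *rbp_minus_16 = yp;              /* mov %rsi,-0x10(%rbp) */
    int rbp_minus_20 = *rbp_minus_8;     /* x = *xp */
    int rbp_minus_24 = *rbp_minus_16;    /* y = *yp */
    *rbp_minus_8  = rbp_minus_24;        /* *xp = y */
    *rbp_minus_16 = rbp_minus_20;        /* *yp = x */
    return rbp_minus_24 + rbp_minus_20;  /* result left in %eax */
}

/* Unoptimized caller, one variable per stack slot. */
int caller_slots(void)
{
    int rbp_minus_4 = 534;    /* arg1 */
    int rbp_minus_8 = 1057;   /* arg2 */
    int rbp_minus_12 = swap_add_slots(&rbp_minus_4, &rbp_minus_8);  /* sum = 1591 */
    int rbp_minus_16 = rbp_minus_4 - rbp_minus_8;                   /* diff = 1057 - 534 = 523 */
    return rbp_minus_12 * rbp_minus_16;                             /* 1591 * 523 = 832093 */
}
```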
4) Reference
a) Randal E. Bryant and David R. O’Hallaron, Computer Systems: A Programmer’s Perspective.