Linking(5)
Executable Object Files

The format of an executable object file is similar to that of a relocatable object file.

The ELF header describes the overall format of the file.

It also includes the program’s entry point, which is the address of the first instruction to execute when the program runs.

The .text, .rodata, and .data sections are similar to those in a relocatable object file, except that these sections have been relocated to their eventual run-time memory addresses.

The .init section defines a small function, called _init, that will be called by the program’s initialization code. Since the executable is fully linked (relocated), it needs no .rel sections.

ELF executables are designed to be easy to load into memory, with contiguous chunks of the executable file mapped to contiguous memory segments. This mapping is described by the segment header table.

Loading Executable Object Files

unix> ./p

Since p does not correspond to a built-in shell command, the shell assumes that p is an executable object file, which it runs for us by invoking some memory-resident operating system code known as the loader. Any Unix program can invoke the loader by calling the execve function. The loader copies the code and data in the executable object file from disk into memory, and then runs the program by jumping to its first instruction, or entry point. This process of copying the program into memory and then running it is known as loading.

On 32-bit Linux systems, the code segment starts at address 0x08048000.

The data segment follows at the next 4 KB aligned address.

The run-time heap follows on the first 4 KB aligned address past the read/write segment and grows up via calls to the malloc library.

There is also a segment that is reserved for shared libraries. The user stack always starts at the largest legal user address and grows down (toward lower memory addresses). The segment starting above the stack is reserved for the code and data in the memory-resident part of the operating system known as the kernel.

When the loader runs, it creates the memory image shown in Figure 7.13.

Guided by the segment header table in the executable, it copies chunks of the executable into the code and data segments.

Next, the loader jumps to the program’s entry point, which is always the address of the _start symbol.

The startup code at the _start address is defined in the object file crt1.o and is the same for all C programs. Figure 7.14 shows the specific sequence of calls in the startup code.

After calling initialization routines from the .text and .init sections, the startup code calls the atexit routine, which appends a list of routines that should be called when the application terminates normally. The exit function runs the functions registered by atexit, and then returns control to the operating system by calling _exit.

Next, the startup code calls the application’s main routine, which begins executing our C code.

After the application returns, the startup code calls the exit routine with main’s return value; exit runs the functions registered by atexit and then calls _exit, which returns control to the operating system.

brief procedure for loading:

Each program in a Unix system runs in the context of a process with its own virtual address space.

When the shell runs a program, the parent shell process forks a child process that is a duplicate of the parent.

The child process invokes the loader via the execve system call.

The loader deletes the child’s existing virtual memory segments, and creates a new set of code, data, heap, and stack segments.

The new stack and heap segments are initialized to zero.

The new code and data segments are initialized to the contents of the executable file by mapping pages in the virtual address space to page-sized chunks of the executable file.

Finally, the loader jumps to the _start address, which eventually calls the application’s main routine.

Aside from some header information, there is no copying of data from disk to memory during loading.

The copying is deferred until the CPU references a mapped virtual page, at which point the operating system automatically transfers the page from disk to memory using its paging mechanism.

Dynamic Linking with Shared Libraries

disadvantages of using static libraries:

(1) Static libraries, like all software, need to be maintained and updated periodically.

If application programmers want to use the most recent version of a library, they must somehow become aware that the library has changed, and then explicitly relink their programs against the updated library.

(2) Another issue is that almost every C program uses standard I/O functions such as printf and scanf.

At run time, the code for these functions is duplicated in the text segment of each running process.

This can be a significant waste of scarce memory system resources.

shared libraries:

A shared library is an object module that, at run time, can be loaded at an arbitrary memory address and linked with a program in memory.

This process is known as dynamic linking and is performed by a program called a dynamic linker.

Shared libraries are also referred to as shared objects, and on Unix systems are typically denoted by the .so suffix.

Microsoft operating systems make heavy use of shared libraries, which they refer to as DLLs (dynamic link libraries).

Shared libraries are “shared” in two different ways.

First, in any given file system, there is exactly one .so file for a particular library.

The code and data in this .so file are shared by all of the executable object files that reference the library, as opposed to the contents of static libraries, which are copied and embedded in the executables that reference them.

Second, a single copy of the .text section of a shared library in memory can be shared by different running processes.

We will explore this in more detail when we study virtual memory in Chapter 9.

To build a shared library libvector.so of our example vector arithmetic routines in Figure 7.5:

unix> gcc -shared -fPIC -o libvector.so addvec.c multvec.c

The -fPIC flag directs the compiler to generate position-independent code.

The -shared flag directs the linker to create a shared object file.

Once we have created the library, we would then link it into our example program:

unix> gcc -o p2 main2.c ./libvector.so

This creates an executable object file p2 in a form that can be linked with libvector.so at run time.

The basic idea is to do some of the linking statically when the executable file is created, and then complete the linking process dynamically when the program is loaded.

It is important to realize that none of the code or data sections from libvector.so are actually copied into the executable p2 at this point.

Instead, the linker copies some relocation and symbol table information that will allow references to code and data in libvector.so to be resolved at run time.

To run p2, the loader first loads the partially linked executable into memory.

Next, it notices that p2 contains a .interp section, which contains the path name of the dynamic linker, which is itself a shared object (e.g., ld-linux.so on Linux systems).

Instead of passing control to the application, as it would normally do, the loader loads and runs the dynamic linker.

The dynamic linker then finishes the linking task by performing the following relocations:

- Relocating the text and data of libc.so into some memory segment.

- Relocating the text and data of libvector.so into another memory segment.

- Relocating any references in p2 to symbols defined by libc.so and libvector.so.

Finally, the dynamic linker passes control to the application.

From this point on, the locations of the shared libraries are fixed and do not change during execution of the program.

Original article: https://www.cnblogs.com/geeklove01/p/9226283.html