Introduction and Classification
Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.
Linking can be performed at compile time, when the source code is translated into machine code;
at load time, when the program is loaded into memory and executed by the loader;
and even at run time, by application programs.
On early computer systems, linking was performed manually. On modern systems, linking is performed automatically by programs called linkers.
advantages:
Linkers play a crucial role in software development because they enable separate compilation.
Instead of organizing a large application as one monolithic source file, we can decompose it into smaller, more manageable modules that can be modified and compiled separately. When we change one of these modules, we simply recompile it and relink the application, without having to recompile the other files.
Compiler Drivers
example code :
We might invoke the gcc driver by typing the following command to the shell:
unix> gcc -O2 -g -o p main.c swap.c
procedure:
explanation:
The driver first runs the C preprocessor (cpp), which translates the C source file main.c into an ASCII intermediate file main.i:
cpp [other arguments] main.c /tmp/main.i
Next, the driver runs the C compiler (cc1), which translates main.i into an ASCII assembly language file main.s.
cc1 /tmp/main.i main.c -O2 [other arguments] -o /tmp/main.s
Then, the driver runs the assembler (as), which translates main.s into a relocatable object file main.o:
as [other arguments] -o /tmp/main.o /tmp/main.s
The driver goes through the same process to generate swap.o.
Finally, it runs the linker program ld, which combines main.o and swap.o, along with the necessary system object files, to create the executable object file p:
ld -o p [system object files and args] /tmp/main.o /tmp/swap.o
Static Linking
Static linkers such as the Unix ld program take as input a collection of relocatable object files and command-line arguments and generate as output a fully linked executable object file that can be loaded and run.
The input relocatable object files consist of various code and data sections.
Instructions are in one section, initialized global variables are in another section, and uninitialized variables are in yet another section.
A linker concatenates blocks together, decides on run-time locations for the concatenated blocks, and modifies various locations within the code and data blocks.
Linkers have minimal understanding of the target machine. The compilers and assemblers that generate the object files have already done most of the work.
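The concatenate-and-place step described above can be sketched as a toy model. This is only an illustration: the function name layout_blocks and its parameters are hypothetical, and a real linker also handles alignment, section types, and symbol resolution.

```c
#include <stddef.h>

/* Hypothetical toy model: lay out n blocks back to back, starting at base.
   Stores each block's start address in start[] and returns the first
   address past the last block. */
size_t layout_blocks(const size_t size[], size_t n, size_t base, size_t start[]) {
    size_t next = base;
    for (size_t i = 0; i < n; i++) {
        start[i] = next;   /* run-time location chosen for block i */
        next += size[i];   /* the next block follows immediately */
    }
    return next;
}
```

Once every block has a start address, the remaining work is to patch each location that refers to a symbol with the address the symbol ended up at.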
Object Files
Relocatable object file.
Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file.
Executable object file.
Contains binary code and data in a form that can be copied directly into memory and executed.
Shared object file.
A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.
Compilers and assemblers generate relocatable object files (including shared object files).
Linkers generate executable object files.
Object file formats vary from system to system. Modern Unix systems—such as Linux, later versions of System V Unix, BSD Unix variants, and Sun Solaris—use the Unix Executable and Linkable Format (ELF).
Relocatable Object Files
ELF diagram:
Explanation:
The ELF header begins with a 16-byte sequence that describes the word size and byte ordering of the system that generated the file.
The rest of the ELF header contains information that allows a linker to parse and interpret the object file.
This includes the size of the ELF header, the object file type (e.g., relocatable, executable, or shared), the machine type (e.g., IA32),
the file offset of the section header table, and the size and number of entries in the section header table.
The locations and sizes of the various sections are described by the section header table, which contains a fixed-size entry for each section in the object file.
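As a minimal sketch of how the leading 16-byte sequence can be interpreted, the helpers below inspect the magic bytes, the class byte (index 4), and the data-encoding byte (index 5). The function names are hypothetical. The code reads raw bytes rather than relying on a system header, so it assumes the caller has already read the first 16 bytes of the file into a buffer.

```c
#include <string.h>

/* Returns 1 if the buffer starts with the ELF magic bytes 0x7f 'E' 'L' 'F'. */
int is_elf(const unsigned char ident[16]) {
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    return memcmp(ident, magic, sizeof magic) == 0;
}

/* Byte 4 is the file class: 1 means 32-bit objects, 2 means 64-bit objects.
   Returns 0 for any other value. */
int elf_word_bits(const unsigned char ident[16]) {
    switch (ident[4]) {
    case 1:  return 32;
    case 2:  return 64;
    default: return 0;
    }
}

/* Byte 5 is the data encoding: 1 means little-endian, 2 means big-endian. */
const char *elf_byte_order(const unsigned char ident[16]) {
    switch (ident[5]) {
    case 1:  return "little-endian";
    case 2:  return "big-endian";
    default: return "unknown";
    }
}
```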
.text: The machine code of the compiled program.
.rodata: Read-only data such as the format strings in printf statements, and jump tables for switch statements.
.data: Initialized global C variables. Local C variables are maintained at run time on the stack, and do not appear in either the .data or .bss sections.
.bss: Uninitialized global C variables. This section occupies no actual space in the object file; it is merely a place holder.
Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file.
The use of the term .bss to denote uninitialized data is universal.
.symtab: A symbol table with information about functions and global variables that are defined and referenced in the program.
However, unlike the symbol table inside a compiler, the .symtab symbol table does not contain entries for local variables.
.rel.text:
A list of locations in the .text section that will need to be modified when the linker combines this object file with others.
In general, any instruction that calls an external function or references a global variable will need to be modified.
On the other hand, instructions that call local functions do not need to be modified.
Note that relocation information is not needed in executable object files, and is usually omitted unless the user explicitly instructs the linker to include it.
.rel.data: Relocation information for any global variables that are referenced or defined by the module.
In general, any initialized global variable whose initial value is the address of a global variable or externally defined function will need to be modified.
.debug: A debugging symbol table with entries for local variables and typedefs defined in the program, global variables defined and referenced in the program, and the original C source file.
It is only present if the compiler driver is invoked with the -g option.
.line: A mapping between line numbers in the original C source program and machine code instructions in the .text section. It is only present if the compiler driver is invoked with the -g option.
.strtab: A string table for the symbol tables in the .symtab and .debug sections, and for the section names in the section headers. A string table is a sequence of null-terminated character strings.
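To connect the section list above to source code, here is a small hypothetical module annotated with where each piece would typically end up. The identifiers are invented for illustration. Exact placement depends on the compiler, its flags, and the target; for example, some targets name the relocation sections .rela.text and .rela.data.

```c
#include <stdio.h>

int scale = 3;             /* initialized global: .data */
int total;                 /* uninitialized global: .bss (some toolchains use
                              a COMMON symbol in the relocatable file) */
int *total_ptr = &total;   /* initial value is an address: stored in .data,
                              with a relocation entry in .rel.data */

/* The machine code for this function goes in .text. */
int accumulate(int n) {
    int scaled = n * scale;   /* local variable: stack or register only */
    *total_ptr += scaled;
    return total;
}

/* The format string is read-only data, typically placed in .rodata.
   printf is defined elsewhere, so the call site needs a relocation
   entry in .rel.text. */
void report(void) {
    printf("total = %d\n", total);
}
```

Assuming GNU binutils is available, compiling this file with gcc -c and running readelf -S and readelf -r on the resulting object file should list the sections and relocation entries for comparison.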
Symbols and Symbol Tables
Each relocatable object module, m, has a symbol table that contains information about the symbols that are defined and referenced by m.
In the context of a linker, there are three different kinds of symbols:
- Global symbols that are defined by module m and that can be referenced by other modules. Global linker symbols correspond to nonstatic C functions and global variables that are defined without the C static attribute.
- Global symbols that are referenced by module m but defined by some other module. Such symbols are called externals and correspond to C functions and variables that are defined in other modules.
- Local symbols that are defined and referenced exclusively by module m. Some local linker symbols correspond to C functions and global variables that are defined with the static attribute. These symbols are visible anywhere within module m, but cannot be referenced by other modules. The sections in an object file and the name of the source file that corresponds to module m also get local symbols.
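The three kinds of symbols can be seen together in one small hypothetical module. All identifiers other than strlen are invented for illustration. The comments assume an unoptimized build, since an optimizing compiler may inline a static function or replace the strlen call with built-in code.

```c
#include <string.h>   /* declares strlen, which is defined in the C library */

int name_count = 0;        /* global symbol defined by this module */
static int longest = 0;    /* local linker symbol: static global variable */

/* Local linker symbol: a static function, invisible to other modules. */
static int max_int(int a, int b) {
    return a > b ? a : b;
}

/* Global symbol defined by this module; other modules may call it. */
int record_name(const char *name) {
    int len = (int)strlen(name);   /* strlen is an external symbol here */
    name_count++;
    longest = max_int(longest, len);
    return longest;
}
```

Assuming GNU binutils, running nm on the compiled object file marks undefined (external) symbols with U, and shows local symbols with lowercase type letters and global ones with uppercase letters.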
It is important to realize that local linker symbols are not the same as local program variables.
The symbol table in .symtab does not contain any symbols that correspond to local nonstatic program variables. These are managed at run time on the stack and are not of interest to the linker.
Interestingly, local procedure variables that are defined with the C static attribute are not managed on the stack.
Instead, the compiler allocates space in .data or .bss for each definition and creates a local linker symbol in the symbol table with a unique name.
For example, suppose a pair of functions in the same module each define a static local variable x.
In this case, the compiler allocates space for two integers in .data and exports a pair of unique local linker symbols to the assembler.
For example, it might use x.1 for the definition in function f and x.2 for the definition in function g.
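A minimal sketch of the situation just described is shown below. The initial values and the increments are assumptions added to make the two variables observably distinct. The names the compiler actually generates for the two symbols vary between compilers and versions.

```c
/* Each function has its own static local x. Both variables live for the
   whole run of the program and are stored outside the stack. */
int f(void) {
    static int x = 10;   /* one local linker symbol, for example x.1 */
    return x++;
}

int g(void) {
    static int x = 20;   /* a different symbol, for example x.2 */
    return x++;
}
```

Because the two variables are separate objects, calls to f do not affect the value seen by g, and each value persists across calls.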