• Compiler


    http://sei.pku.edu.cn/~yaoguo/ACT11/DragonBook-2v2.pdf


    Introduction

    Programming languages are notations for describing computations to people and to machines. The world as we know it depends on programming languages, because all the software running on all the computers was written in some programming language. But, before a program can be run, it first must be translated into a form in which it can be executed by a computer. The software systems that do this translation are called compilers.

    This book is about how to design and implement compilers. We shall discover that a few basic ideas can be used to construct translators for a wide variety of languages and machines. Besides compilers, the principles and techniques for compiler design are applicable to so many other domains that they are likely to be reused many times in the career of a computer scientist. The study of compiler writing touches upon programming languages, machine architecture, language theory, algorithms, and software engineering.

    In this preliminary chapter, we introduce the different forms of language translators, give a high-level overview of the structure of a typical compiler, and discuss the trends in programming languages and machine architecture that are shaping compilers. We include some observations on the relationship between compiler design and computer-science theory and an outline of the applications of compiler technology that go beyond compilation. We end with a brief outline of key programming-language concepts that will be needed for our study of compilers.

    1.1 Language Processors
    Simply stated, a compiler is a program that can read a program in one language - the source language - and translate it into an equivalent program in another language - the target language; see Fig. 1.1. An important role of the compiler is to report any errors in the source program that it detects during the translation process.

    If the target program is an executable machine-language program, it can
    then be called by the user to process inputs and produce outputs; see Fig. 1.2.

    An interpreter is another common kind of language processor. Instead of
    producing a target program as a translation, an interpreter appears to directly
    execute the operations specified in the source program on inputs supplied by
    the user, as shown in Fig. 1.3.

    The machine-language target program produced by a compiler is usually
    much faster than an interpreter at mapping inputs to outputs. An interpreter,
    however, can usually give better error diagnostics than a compiler, because it
    executes the source program statement by statement.

    Example 1.1: Java language processors combine compilation and interpretation,
    as shown in Fig. 1.4. A Java source program may first be compiled into
    an intermediate form called bytecodes. The bytecodes are then interpreted by a
    virtual machine. A benefit of this arrangement is that bytecodes compiled on
    one machine can be interpreted on another machine, perhaps across a network.
    In order to achieve faster processing of inputs to outputs, some Java compilers,
    called just-in-time compilers, translate the bytecodes into machine language
    immediately before they run the intermediate program to process the input.

    In addition to a compiler, several other programs may be required to create
    an executable target program, as shown in Fig. 1.5. A source program may be
    divided into modules stored in separate files. The task of collecting the source
    program is sometimes entrusted to a separate program, called a preprocessor.
    The preprocessor may also expand shorthands, called macros, into source
    language statements.
    The modified source program is then fed to a compiler. The compiler may
    produce an assembly-language program as its output, because assembly
    language is easier to produce as output and is easier to debug. The assembly
    language is then processed by a program called an assembler that produces
    relocatable machine code as its output.
    Large programs are often compiled in pieces, so the relocatable machine
    code may have to be linked together with other relocatable object files and
    library files into the code that actually runs on the machine. The linker resolves
    external memory addresses, where the code in one file may refer to a location
    in another file. The loader then puts together all of the executable object files
    into memory for execution.


    Up to this point we have treated a compiler as a single box that maps a source
    program into a semantically equivalent target program. If we open up this box
    a little, we see that there are two parts to this mapping: analysis and synthesis.
    The analysis part breaks up the source program into constituent pieces and
    imposes a grammatical structure on them. It then uses this structure to create
    an intermediate representation of the source program. If the analysis part
    detects that the source program is either syntactically ill formed or
    semantically unsound, then it must provide informative messages, so the user can take
    corrective action. The analysis part also collects information about the source

    and groups the characters into meaningful sequences called lexemes. For each
    lexeme, the lexical analyzer produces as output a token of the form
    (token-name, attribute-value)
    that it passes on to the subsequent phase, syntax analysis. In the token, the
    first component token-name is an abstract symbol that is used during syntax
    analysis, and the second component attribute-value points to an entry in the
    symbol table for this token. Information from the symbol-table entry is needed
    for semantic analysis and code generation.
    For example, suppose a source program contains the assignment statement
    position = initial + rate * 60 (1.1)
    The characters in this assignment could be grouped into the following lexemes
    and mapped into the following tokens passed on to the syntax analyzer:
    1. position is a lexeme that would be mapped into a token (id, 1), where id
    is an abstract symbol standing for identifier and 1 points to the symbol-table
    entry for position. The symbol-table entry for an identifier holds
    information about the identifier, such as its name and type.
    2. The assignment symbol = is a lexeme that is mapped into the token (=).
    Since this token needs no attribute-value, we have omitted the second
    component. We could have used any abstract symbol such as assign for
    the token-name, but for notational convenience we have chosen to use the
    lexeme itself as the name of the abstract symbol.
    3. initial is a lexeme that is mapped into the token (id, 2), where 2 points
    to the symbol-table entry for initial.
    4. + is a lexeme that is mapped into the token (+).
    5. rate is a lexeme that is mapped into the token (id, 3), where 3 points to
    the symbol-table entry for rate.
    6. * is a lexeme that is mapped into the token (*).
    7. 60 is a lexeme that is mapped into the token (60).
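
The grouping of statement (1.1) into lexemes and tokens can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names TOKEN_SPEC and tokenize are hypothetical, the patterns cover just the lexemes of this one statement, and the symbol table is reduced to a list of identifier names.

```python
import re

# Assumed token patterns, sufficient only for statement (1.1).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []  # position i-1 holds the name of identifier number i
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # blanks separate lexemes and produce no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            # Operators and the literal 60 use the lexeme itself as token name.
            tokens.append((lexeme,))
    return tokens, symbol_table


tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
print(table)
```

The identifiers receive the symbol-table positions 1, 2, and 3 in order of first appearance, which mirrors the numbering used in the list above.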

    1.2.2 Syntax Analysis
    The second phase of the compiler is syntax analysis or parsing. The parser uses
    the first components of the tokens produced by the lexical analyzer to create
    a tree-like intermediate representation that depicts the grammatical structure
    of the token stream. A typical representation is a syntax tree in which each
    interior node represents an operation and the children of the node represent the
    arguments of the operation. A syntax tree for the token stream (1.2) is shown
    as the output of the syntactic analyzer in Fig. 1.7.
    This tree shows the order in which the operations in the assignment
    position = initial + rate * 60
    are to be performed. The tree has an interior node labeled * with (id, 3) as
    its left child and the integer 60 as its right child. The node (id, 3) represents
    the identifier rate. The node labeled * makes it explicit that we must first
    multiply the value of rate by 60. The node labeled + indicates that we must
    add the result of this multiplication to the value of initial. The root of the
    tree, labeled =, indicates that we must store the result of this addition into the
    location for the identifier position. This ordering of operations is consistent
    with the usual conventions of arithmetic which tell us that multiplication has
    higher precedence than addition, and hence that the multiplication is to be
    performed before the addition.
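
One way to obtain such a tree is a small recursive-descent parser in which each precedence level has its own function. The sketch below is an assumption-laden illustration, not the parsing method developed later in the book: the function name parse_assignment and the tuple encoding of tree nodes are hypothetical, and the grammar handles only +, *, identifiers, and integer literals.

```python
# Tokens for statement (1.1); ("id", n) refers to symbol-table entry n.
TOKENS = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("60",)]


def parse_assignment(tokens):
    """Parse  id = expr  and return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # A factor is an identifier or an integer literal (a leaf of the tree).
        tok = advance()
        if tok[0] == "id" or tok[0].isdigit():
            return tok
        raise SyntaxError(f"unexpected token {tok}")

    def term():
        # term -> factor { * factor }; binds tighter than +.
        node = factor()
        while peek() == ("*",):
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # expr -> term { + term }
        node = term()
        while peek() == ("+",):
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target[0] != "id" or advance() != ("=",):
        raise SyntaxError("expected: id = expression")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError("trailing tokens after expression")
    return tree


print(parse_assignment(TOKENS))
```

Because term is called from inside expr, the * node ends up below the + node, which is how the tree records that the multiplication is performed first.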
    The subsequent phases of the compiler use the grammatical structure to help
    analyze the source program and generate the target program. In Chapter 4
    we shall use context-free grammars to specify the grammatical structure of
    programming languages and discuss algorithms for constructing efficient syntax
    analyzers automatically from certain classes of grammars. In Chapters 2 and 5
    we shall see that syntax-directed definitions can help specify the translation of
    programming language constructs.


    1.2.3 Semantic Analysis
    The semantic analyzer uses the syntax tree and the information in the symbol
    table to check the source program for semantic consistency with the language
    definition. It also gathers type information and saves it in either the syntax tree
    or the symbol table, for subsequent use during intermediate-code generation.
    An important part of semantic analysis is type checking, where the compiler
    checks that each operator has matching operands. For example, many programming
    language definitions require an array index to be an integer; the compiler
    must report an error if a floating-point number is used to index an array.
    The language specification may permit some type conversions called coercions.
    For example, a binary arithmetic operator may be applied to either a
    pair of integers or to a pair of floating-point numbers. If the operator is applied
    to a floating-point number and an integer, the compiler may convert or coerce
    the integer into a floating-point number.

    Such a coercion appears in Fig. 1.7. Suppose that position, initial, and
    rate have been declared to be floating-point numbers, and that the lexeme 60
    by itself forms an integer. The type checker in the semantic analyzer in Fig. 1.7
    discovers that the operator * is applied to a floating-point number rate and
    an integer 60. In this case, the integer may be converted into a floating-point
    number. In Fig. 1.7, notice that the output of the semantic analyzer has an
    extra node for the operator inttofloat, which explicitly converts its integer
    argument into a floating-point number. Type checking and semantic analysis
    are discussed in Chapter 6.
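
The coercion step can be illustrated with a short type-annotation pass over the syntax tree. Everything named here is an assumption made for the sketch: the nested-tuple tree encoding, the SYMBOL_TYPES table standing in for the symbol table, and the rule that an assignment may widen an integer to floating point but not narrow in the other direction.

```python
# Assumed declarations: position, initial, and rate are floating-point.
SYMBOL_TYPES = {1: "float", 2: "float", 3: "float"}

# Syntax tree for statement (1.1) as nested tuples (hypothetical encoding).
TREE = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("60",))))


def annotate(node):
    """Return (checked_node, type), inserting inttofloat where int meets float."""
    if node[0] == "id":
        return node, SYMBOL_TYPES[node[1]]
    if len(node) == 1:
        return node, "int"  # an integer literal leaf such as ("60",)
    op, left, right = node
    left, left_type = annotate(left)
    right, right_type = annotate(right)
    if op == "=":
        # Assumed rule: the right side may be widened, the target never changes.
        if left_type == "float" and right_type == "int":
            right = ("inttofloat", right)
        elif left_type != right_type:
            raise TypeError("cannot assign a float to an int variable")
        return (op, left, right), left_type
    if left_type != right_type:
        # Mixed int/float arithmetic: coerce the integer operand.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), left_type


checked, result_type = annotate(TREE)
print(checked)
```

The only change to the tree is the extra inttofloat node wrapped around the literal 60, as described for the output of the semantic analyzer.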


    1.2.4 Intermediate Code Generation


    In the process of translating a source program into target code, a compiler may
    construct one or more intermediate representations, which can have a variety
    of forms. Syntax trees are a form of intermediate representation; they are
    commonly used during syntax and semantic analysis.
    After syntax and semantic analysis of the source program, many compilers
    generate an explicit low-level or machine-like intermediate representation,
    which we can think of as a program for an abstract machine. This intermediate
    representation should have two important properties: it should be easy to
    produce and it should be easy to translate into the target machine.
    In Chapter 6, we consider an intermediate form called three-address code,
    which consists of a sequence of assembly-like instructions with three operands
    per instruction. Each operand can act like a register. The output of the
    intermediate code generator in Fig. 1.7 consists of the three-address code sequence
    t1 = inttofloat(60)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3 (1.3)

    There are several points worth noting about three-address instructions.
    First, each three-address assignment instruction has at most one operator on the
    right side. Thus, these instructions fix the order in which operations are to be
    done; the multiplication precedes the addition in the source program (1.1).
    Second, the compiler must generate a temporary name to hold the value computed
    by a three-address instruction. Third, some "three-address instructions" like
    the first and last in the sequence (1.3), above, have fewer than three operands.
    In Chapter 6, we cover the principal intermediate representations used in
    compilers. Chapter 5 introduces techniques for syntax-directed translation
    that are applied in Chapter 6 to type checking and intermediate-code generation
    for typical programming language constructs such as expressions, flow-of-control
    constructs, and procedure calls.
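
A straightforward way to produce three-address code is a postorder walk of the syntax tree that emits one instruction per operator and invents a fresh temporary for each result. The sketch below assumes a nested-tuple encoding of the tree from the semantic analyzer; the function name gen_three_address and the t1, t2, ... naming scheme are illustrative choices, not a prescribed algorithm.

```python
# Tree for statement (1.1) after semantic analysis (hypothetical encoding).
TREE = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("inttofloat", ("60",)))))


def gen_three_address(tree):
    """Return a list of three-address instructions for an assignment tree."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def operand(node):
        """Emit code for node; return the name that holds its value."""
        if node[0] == "id":
            return f"id{node[1]}"
        if len(node) == 1:
            return node[0]  # literal leaf, used directly as an operand
        if node[0] == "inttofloat":
            arg = operand(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        left_name = operand(left)
        right_name = operand(right)
        temp = new_temp()  # one temporary per operator
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expression = tree
    value = operand(expression)
    code.append(f"{operand(target)} = {value}")
    return code


for line in gen_three_address(TREE):
    print(line)
```

Each emitted instruction has at most one operator on its right side, and the order of the list fixes the order of evaluation: the conversion, then the multiplication, then the addition, then the final copy into id1.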

    1.2.5 Code Optimization
    The machine-independent code-optimization phase attempts to improve the
    intermediate code so that better target code will result. Usually better means
    faster, but other objectives may be desired, such as shorter code, or target code
    that consumes less power. For example, a straightforward algorithm generates
    the intermediate code (1.3), using an instruction for each operator in the tree
    representation that comes from the semantic analyzer.
    A simple intermediate code generation algorithm followed by code optimization
    is a reasonable way to generate good target code. The optimizer can deduce
    that the conversion of 60 from integer to floating point can be done once and for
    all at compile time, so the inttofloat operation can be eliminated by replacing
    the integer 60 by the floating-point number 60.0. Moreover, t3 is used only
    once to transmit its value to id1 so the optimizer can transform (1.3) into the
    shorter sequence
    t1 = id3 * 60.0
    id1 = id2 + t1 (1.4)
    There is a great variation in the amount of code optimization different
    compilers perform. In those that do the most, the so-called "optimizing compilers,"
    a significant amount of time is spent on this phase. There are simple
    optimizations that significantly improve the running time of the target program
    without slowing down compilation too much. The chapters from 8 on discuss
    machine-independent and machine-dependent optimizations in detail.
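
The two improvements described for this example, folding the inttofloat conversion at compile time and removing the copy through t3, can be sketched as small passes over a list of instructions. The tuple format (dest, op, arg1, arg2), the pseudo-operator "copy", and the function name optimize are assumptions made for this illustration; the copy-removal test is deliberately narrow and is not a general data-flow analysis.

```python
# Sequence (1.3) as (dest, op, arg1, arg2) tuples (hypothetical encoding).
CODE_1_3 = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]


def optimize(code):
    # Pass 1: fold inttofloat of an integer literal at compile time and
    # substitute the resulting constant into later instructions.
    constants, folded = {}, []
    for dest, op, a, b in code:
        a, b = constants.get(a, a), constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            constants[dest] = str(float(a))  # "60" becomes "60.0"
        else:
            folded.append((dest, op, a, b))

    # Pass 2: drop a copy "x = t" when t is computed by the instruction just
    # before it and is used nowhere else; that instruction writes x directly.
    result = []
    for instr in folded:
        dest, op, a, b = instr
        if op == "copy" and result and result[-1][0] == a:
            uses = sum(a in (i[2], i[3]) for i in folded)
            if uses == 1:
                previous = result.pop()
                result.append((dest,) + previous[1:])
                continue
        result.append(instr)

    # Pass 3: renumber surviving temporaries t1, t2, ... in definition order.
    names = {}
    for dest, *_ in result:
        if dest.startswith("t") and dest[1:].isdigit():
            names[dest] = f"t{len(names) + 1}"
    return [tuple(names.get(x, x) for x in instr) for instr in result]


for dest, op, a, b in optimize(CODE_1_3):
    print(f"{dest} = {a} {op} {b}")
```

On this input the passes leave two instructions, a multiplication by the constant 60.0 followed by an addition that stores directly into id1, which corresponds to the shorter sequence (1.4).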

    1.2.6 Code Generation
    The code generator takes as input an intermediate representation of the source
    program and maps it into the target language. If the target language is machine
    code, registers or memory locations are selected for each of the variables used by
    the program. Then, the intermediate instructions are translated into sequences
    of machine instructions that perform the same task. A crucial aspect of code
    generation is the judicious assignment of registers to hold variables.
    For example, using registers R1 and R2, the intermediate code in (1.4) might
    get translated into the machine code
    LDF R2, id3
    MULF R2, R2, #60.0
    LDF R1, id2 (1.5)
    ADDF R1, R1, R2
    STF id1, R1
    The first operand of each instruction specifies a destination. The F in each
    instruction tells us that it deals with floating-point numbers. The code in
    (1.5) loads the contents of address id3 into register R2, then multiplies it
    with floating-point constant 60.0. The # signifies that 60.0 is to be treated
    as an immediate constant. The third instruction moves id2 into register R1 and
    the fourth adds to it the value previously computed in register R2. Finally,
    the value in register R1 is stored into the address of id1, so the code
    correctly implements the assignment statement (1.1). Chapter 8 covers code
    generation.
    This discussion of code generation has ignored the important issue of storage
    allocation for the identifiers in the source program. As we shall see in
    Chapter 7, the organization of storage at run-time depends on the language
    being compiled. Storage-allocation decisions are made either during
    intermediate code generation or during code generation.

    1.2.7 Symbol-Table Management
    An essential function of a compiler is to record the variable names used in the
    source program and collect information about various attributes of each name.
    These attributes may provide information about the storage allocated for a
    name, its type, its scope (where in the program its value may be used), and
    in the case of procedure names, such things as the number and types of its
    arguments, the method of passing each argument (for example, by value or by
    reference), and the type returned.
    The symbol table is a data structure containing a record for each variable
    name, with fields for the attributes of the name. The data structure should be
    designed to allow the compiler to find the record for each name quickly and to
    store or retrieve data from that record quickly. Symbol tables are discussed
    in Chapter 2.

    1.2.8 The Grouping of Phases into Passes
    The discussion of phases deals with the logical organization of a compiler. In
    an implementation, activities from several phases may be grouped together into
    a pass that reads an input file and writes an output file.
    For example, the front-end phases of lexical analysis, syntax analysis, semantic
    analysis, and intermediate code generation might be grouped together into one
    pass. Code optimization might be an optional pass. Then there could be a
    back-end pass consisting of code generation for a particular target machine.
    Some compiler collections have been created around carefully designed
    intermediate representations that allow the front end for a particular language
    to interface with the back end for a certain target machine. With these
    collections, we can produce compilers for different source languages for one
    target machine by combining different front ends with the back end for that
    target machine. Similarly, we can produce compilers for different target
    machines, by combining a front end with back ends for different target machines.
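
As an illustration of a back-end pass that consumes only the intermediate representation, the sketch below turns the optimized sequence (1.4) into machine code in the style of (1.5). It rests on several assumptions: instructions are (dest, op, arg1, arg2) tuples, names of the form t1, t2, ... are temporaries used exactly once, the left operand is never a constant, and only two registers exist with no spilling. The register pool is listed as R2, R1 purely so that the output uses the same registers as (1.5).

```python
# Sequence (1.4) as (dest, op, arg1, arg2) tuples (hypothetical encoding).
CODE_1_4 = [("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")]

OPCODES = {"*": "MULF", "+": "ADDF"}


def generate(code, registers=("R2", "R1")):
    """Translate three-address tuples into a list of assembly-like strings."""
    free = list(registers)
    reg_of = {}  # temporary name -> register currently holding it
    asm = []

    def is_constant(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    def in_register(name):
        """Return a register holding name, loading it from memory if needed."""
        if name in reg_of:
            return reg_of.pop(name)  # temporaries are used once in this sketch
        if not free:
            raise RuntimeError("out of registers; spilling is not modeled")
        reg = free.pop(0)
        asm.append(f"LDF {reg}, {name}")
        return reg

    for dest, op, a, b in code:
        left_reg = in_register(a)
        if is_constant(b):
            source = f"#{b}"  # immediate operand
        else:
            source = in_register(b)
            free.append(source)  # right operand's register is free afterwards
        # The result overwrites the left operand's register.
        asm.append(f"{OPCODES[op]} {left_reg}, {left_reg}, {source}")
        if dest.startswith("t"):
            reg_of[dest] = left_reg  # keep the temporary in its register
        else:
            asm.append(f"STF {dest}, {left_reg}")
            free.append(left_reg)
    return asm


for line in generate(CODE_1_4):
    print(line)
```

Because the function depends only on the tuple format, a different front end that produces the same tuples could reuse it unchanged, and a second function with a different instruction table could serve as a back end for another machine.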

    1.2.9 Compiler-Construction Tools
    The compiler writer, like any software developer, can profitably use modern
    software development environments containing tools such as language editors,
    debuggers, version managers, profilers, test harnesses, and so on. In addition
    to these general software-development tools, other more specialized tools have
    been created to help implement various phases of a compiler.
    These tools use specialized languages for specifying and implementing specific
    components, and many use quite sophisticated algorithms. The most successful
    tools are those that hide the details of the generation algorithm and
    produce components that can be easily integrated into the remainder of the
    compiler. Some commonly used compiler-construction tools include
    1. Parser generators that automatically produce syntax analyzers from a
    grammatical description of a programming language.
    2. Scanner generators that produce lexical analyzers from a regular-expression
    description of the tokens of a language.
    3. Syntax-directed translation engines that produce collections of routines
    for walking a parse tree and generating intermediate code.
    4. Code-generator generators that produce a code generator from a collection
    of rules for translating each operation of the intermediate language into
    the machine language for a target machine.
    5. Data-flow analysis engines that facilitate the gathering of information
    about how values are transmitted from one part of a program to each
    other part. Data-flow analysis is a key part of code optimization.
    6. Compiler-construction toolkits that provide an integrated set of routines
    for constructing various phases of a compiler.
    We shall describe many of these tools throughout this book.

    1.3 The Evolution of Programming Languages
    The first electronic computers appeared in the 1940's and were programmed in
    machine language by sequences of 0's and 1's that explicitly told the computer
    what operations to execute and in what order. The operations themselves
    were very low level: move data from one location to another, add the contents
    of two registers, compare two values, and so on. Needless to say, this kind
    of programming was slow, tedious, and error prone. And once written, the
    programs were hard to understand and modify.

    1.3.1 The Move to Higher-level Languages
    The first step towards more people-friendly programming languages was the
    development of mnemonic assembly languages in the early 1950's. Initially,
    the instructions in an assembly language were just mnemonic representations
    of machine instructions. Later, macro instructions were added to assembly
    languages so that a programmer could define parameterized shorthands for
    frequently used sequences of machine instructions.
    A major step towards higher-level languages was made in the latter half of
    the 1950's with the development of Fortran for scientific computation, Cobol
    for business data processing, and Lisp for symbolic computation. The philosophy
    behind these languages was to create higher-level notations with which
    programmers could more easily write numerical computations, business
    applications, and symbolic programs. These languages were so successful that they
    are still in use today.
    In the following decades, many more languages were created with innovative
    features to help make programming easier, more natural, and more robust.
    Later in this chapter, we shall discuss some key features that are common to
    many modern programming languages.
    Today, there are thousands of programming languages. They can be classified
    in a variety of ways. One classification is by generation. First-generation
    languages are the machine languages, second-generation the assembly languages,
    and third-generation the higher-level languages like Fortran, Cobol, Lisp, C,
    C++, C#, and Java. Fourth-generation languages are languages designed
    for specific applications like NOMAD for report generation, SQL for database
    queries, and Postscript for text formatting. The term fifth-generation language
    has been applied to logic- and constraint-based languages like Prolog and OPS5.
    Another classification of languages uses the term imperative for languages
    in which a program specifies how a computation is to be done and declarative
    for languages in which a program specifies what computation is to be done.
    Languages such as C, C++, C#, and Java are imperative languages. In imperative
    languages there is a notion of program state and statements that change
    the state. Functional languages such as ML and Haskell and constraint logic
    languages such as Prolog are often considered to be declarative languages.
    The term von Neumann language is applied to programming languages
    whose computational model is based on the von Neumann computer architecture.
    Many of today's languages, such as Fortran and C, are von Neumann
    languages.
    An object-oriented language is one that supports object-oriented programming,
    a programming style in which a program consists of a collection of objects
    that interact with one another. Simula 67 and Smalltalk are the earliest major
    object-oriented languages. Languages such as C++, C#, Java, and Ruby are
    more recent object-oriented languages.
    Scripting languages are interpreted languages with high-level operators designed
    for "gluing together" computations. These computations were originally
    called "scripts." Awk, JavaScript, Perl, PHP, Python, Ruby, and Tcl are popular
    examples of scripting languages. Programs written in scripting languages
    are often much shorter than equivalent programs written in languages like C.

    1.3.2 Impacts on Compilers
    Since the design of programming languages and compilers are intimately related,
    the advances in programming languages placed new demands on compiler writers.
    They had to devise algorithms and representations to translate and support
    the new language features. Since the 1940's, computer architecture has evolved
    as well. Not only did the compiler writers have to track new language features,
    they also had to devise translation algorithms that would take maximal
    advantage of the new hardware capabilities.
    Compilers can help promote the use of high-level languages by minimizing
    the execution overhead of the programs written in these languages. Compilers
    are also critical in making high-performance computer architectures effective
    on users' applications. In fact, the performance of a computer system is so
    dependent on compiler technology that compilers are used as a tool in evaluating
    architectural concepts before a computer is built.
    Compiler writing is challenging. A compiler by itself is a large program.
    Moreover, many modern language-processing systems handle several source
    languages and target machines within the same framework; that is, they serve as
    collections of compilers, possibly consisting of millions of lines of code.
    Consequently, good software-engineering techniques are essential for creating and
    evolving modern language processors.
    A compiler must translate correctly the potentially infinite set of programs
    that could be written in the source language. The problem of generating the
    optimal target code from a source program is undecidable in general; thus,
    compiler writers must evaluate tradeoffs about what problems to tackle and
    what heuristics to use to approach the problem of generating efficient code.
    A study of compilers is also a study of how theory meets practice, as we
    shall see in Section 1.4.
    The purpose of this text is to teach the methodology and fundamental ideas
    used in compiler design. It is not the intention of this text to teach all the
    algorithms and techniques that could be used for building a state-of-the-art
    language-processing system. However, readers of this text will acquire the basic
    knowledge and understanding to learn how to build a compiler relatively easily.

    1.4 The Science of Building a Compiler
    Compiler design is full of beautiful examples where complicated real-world
    problems are solved by abstracting the essence of the problem mathematically. These
    serve as excellent illustrations of how abstractions can be used to solve
    problems: take a problem, formulate a mathematical abstraction that captures the
    key characteristics, and solve it using mathematical techniques. The problem
    formulation must be grounded in a solid understanding of the characteristics of
    computer programs, and the solution must be validated and refined empirically.
    A compiler must accept all source programs that conform to the specification
    of the language; the set of source programs is infinite and any program can be
    very large, consisting of possibly millions of lines of code. Any transformation
    performed by the compiler while translating a source program must preserve the
    meaning of the program being compiled. Compiler writers thus have influence
    over not just the compilers they create, but all the programs that their
    compilers compile. This leverage makes writing compilers particularly rewarding;
    however, it also makes compiler development challenging.
    1.4.1 Modeling in Compiler Design and Implementation
    The study of compilers is mainly a study of how we design the right
    mathematical models and choose the right algorithms, while balancing the need for
    generality and power against simplicity and efficiency.
    Some of the most fundamental models are finite-state machines and regular
    expressions, which we shall meet in Chapter 3. These models are useful for
    describing the lexical units of programs (keywords, identifiers, and such) and for
    describing the algorithms used by the compiler to recognize those units. Also
    among the most fundamental models are context-free grammars, used to
    describe the syntactic structure of programming languages such as the nesting of
    parentheses or control constructs. We shall study grammars in Chapter 4.
    Similarly, trees are an important model for representing the structure of programs
    and their translation into object code, as we shall see in Chapter 5.
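
To make the finite-state model concrete, here is a minimal sketch of a deterministic finite-state machine that recognizes two lexical units, identifiers and unsigned integers. The state names, the character classes, and the function classify are assumptions chosen for the illustration; they are not taken from Chapter 3.

```python
# Transition table: (state, character class) -> next state.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}

# Accepting states and the kind of lexical unit each one recognizes.
ACCEPTING = {"in_id": "id", "in_num": "number"}


def char_class(ch):
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def classify(lexeme):
    """Run the machine on lexeme; return 'id', 'number', or None if rejected."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None  # no transition defined: the input is rejected
    return ACCEPTING.get(state)


for word in ["rate", "60", "rate2", "6rate"]:
    print(word, classify(word))
```

The same language of identifiers could be described by a regular expression such as a letter followed by any number of letters or digits; the table above is one machine that recognizes it.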

    1.4.2 The Science of Code Optimization
    The term "optimization" in compiler design refers to the attempts that a
    compiler makes to produce code that is more efficient than the obvious code.
    "Optimization" is thus a misnomer, since there is no way that the code produced
    by a compiler can be guaranteed to be as fast or faster than any other code
    that performs the same task.

    In modern times, the optimization of code that a compiler performs has
    become both more important and more complex. It is more complex because
    processor architectures have become more complex, yielding more opportunities
    to improve the way code executes. It is more important because massively
    parallel computers require substantial optimization, or their performance suffers by
    orders of magnitude. With the likely prevalence of multicore machines
    (computers with chips that have large numbers of processors on them), all compilers
    will have to face the problem of taking advantage of multiprocessor machines.
    It is hard, if not impossible, to build a robust compiler out of "hacks."
    Thus, an extensive and useful theory has been built up around the problem of
    optimizing code. The use of a rigorous mathematical foundation allows us to
    show that an optimization is correct and that it produces the desirable effect
    for all possible inputs. We shall see, starting in Chapter 9, how models such
    as graphs, matrices, and linear programs are necessary if the compiler is to
    produce well optimized code.
    On the other hand, pure theory alone is insufficient. Like many real-world
    problems, there are no perfect answers. In fact, most of the questions that
    we ask in compiler optimization are undecidable. One of the most important
    skills in compiler design is the ability to formulate the right problem to solve.
    We need a good understanding of the behavior of programs to start with and
    thorough experimentation and evaluation to validate our intuitions.
    Compiler optimizations must meet the following design objectives:
    • The optimization must be correct, that is, preserve the meaning of the
    compiled program,
    • The optimization must improve the performance of many programs,
    • The compilation time must be kept reasonable, and
    • The engineering effort required must be manageable.
    It is impossible to overemphasize the importance of correctness. It is trivial
    to write a compiler that generates fast code if the generated code need not
    be correct! Optimizing compilers are so difficult to get right that we dare say
    that no optimizing compiler is completely error-free! Thus, the most important
    objective in writing a compiler is that it is correct.
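
    To make the correctness objective concrete, the following minimal Python
    sketch shows one rewrite that looks like a harmless algebraic simplification
    but can change a program's result: reassociating floating-point addition.
    The function names and sample values are chosen purely for illustration and
    do not come from the text.

```python
# Illustrative sketch: a rewrite that is valid over the real numbers but does
# not preserve meaning for IEEE 754 floating-point arithmetic.


def original(a: float, b: float, c: float) -> float:
    """The expression as the programmer wrote it: (a + b) + c."""
    return (a + b) + c


def reassociated(a: float, b: float, c: float) -> float:
    """A tempting rewrite, a + (b + c), that regroups the additions."""
    return a + (b + c)


if __name__ == "__main__":
    x, y, z = 1e16, -1e16, 1.0
    # In (x + y) + z the large terms cancel first, so z survives.
    # In x + (y + z) the small term is rounded away before the cancellation.
    print(original(x, y, z), reassociated(x, y, z))
```

    For these inputs the two functions return different values, so a compiler
    that applied this regrouping unconditionally would violate the first design
    objective, even though the rewrite is harmless for many other inputs.
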
    The second goal is that the compiler must be effective in improving the
    performance of many input programs. Normally, performance means the speed of
    the program execution. Especially in embedded applications, we may also wish
    to minimize the size of the generated code. And in the case of mobile devices,
    it is also desirable that the code minimizes power consumption. Typically, the
    same optimizations that speed up execution time also conserve power. Besides
    performance, usability aspects such as error reporting and debugging are also
    important.

    Third, we need to keep the compilation time short to support a rapid
    development and debugging cycle. This requirement has become easier to meet as
    machines get faster. Often, a program is first developed and debugged without
    program optimizations. Not only is the compilation time reduced, but more
    importantly, unoptimized programs are easier to debug, because the
    optimizations introduced by a compiler often obscure the relationship between
    the source code and the object code. Turning on optimizations in the compiler
    sometimes exposes new problems in the source program; thus testing must again
    be performed on the optimized code. The need for additional testing sometimes
    deters the use of optimizations in applications, especially if their
    performance is not critical.

    Finally, a compiler is a complex system; we must keep the system simple
    to ensure that the engineering and maintenance costs of the compiler are
    manageable. There is an infinite number of program optimizations that we
    could implement, and it takes a nontrivial amount of effort to create a correct
    and effective optimization. We must prioritize the optimizations, implementing
    only those that lead to the greatest benefits on source programs encountered in
    practice.

    Thus, in studying compilers, we learn not only how to build a compiler, but
    also the general methodology of solving complex and open-ended problems. The
    approach used in compiler development involves both theory and
    experimentation. We normally start by formulating the problem based on our
    intuitions about what the important issues are.

    1.5 Applications of Compiler Technology
    Compiler design is not only about compilers, and many people use the
    technology learned by studying compilers in school, yet have never, strictly
    speaking, written (even part of) a compiler for a major programming language.
    Compiler technology has other important uses as well. Additionally, compiler
    design impacts several other areas of computer science. In this section, we
    review the most important interactions and applications of the technology.

    1.5.1 Implementation of High-Level Programming Languages
    A high-level programming language defines a programming abstraction: the
    programmer expresses an algorithm using the language, and the compiler must
    translate that program to the target language. Generally, higher-level
    programming languages are easier to program in, but are less efficient, that
    is, the target programs run more slowly. Programmers using a low-level language
    have more control over a computation and can, in principle, produce more
    efficient code. Unfortunately, lower-level programs are harder to write and,
    worse still, less portable, more prone to errors, and harder to maintain.
    Optimizing compilers include techniques to improve the performance of
    generated code, thus offsetting the inefficiency introduced by high-level
    abstractions.

    Example 1.2 : The register keyword in the C programming language is an
    early example of the interaction between compiler technology and language
    evolution. When the C language was created in the early 1970s, it was
    considered necessary to let a programmer control which program variables reside
    in registers. This control became unnecessary as effective register-allocation
    techniques were developed, and most modern programs no longer use this language
    feature. In fact, programs that use the register keyword may lose efficiency,
    because programmers often are not the best judge of very low-level matters like
    register allocation. The optimal choice of register allocation depends greatly
    on the specifics of a machine architecture. Hardwiring low-level
    resource-management decisions like register allocation may in fact hurt
    performance, especially if the program is run on machines other than the one
    for which it was written.

    The many shifts in the popular choice of programming languages have been
    in the direction of increased levels of abstraction. C was the predominant
    systems programming language of the 80's; many of the new projects started
    in the 90's chose C++; Java, introduced in 1995, gained popularity quickly
    in the late 90's. The new programming-language features introduced in each
    round spurred new research in compiler optimization. In the following, we give
    an overview of the main language features that have stimulated significant
    advances in compiler technology.
    Practically all common programming languages, including C, Fortran and
    Cobol, support user-defined aggregate data types, such as arrays and structures,
    and high-level control flow, such as loops and procedure invocations. If we just
    take each high-level construct or data-access operation and translate it directly
    to machine code, the result would be very inefficient. A body of compiler
    optimizations, known as data-flow optimizations, has been developed to analyze
    the flow of data through the program and remove redundancies across these
    constructs. These optimizations are effective in generating code that resembles
    code written by a skilled programmer at a lower level.
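
    As a rough illustration of redundancy removal, the Python sketch below
    eliminates repeated computations inside a single straight-line block of
    three-address instructions, using a simple form of value numbering. It is a
    deliberately small, local stand-in: real data-flow optimizations also reason
    across branches and loops. The tuple-based instruction format, the "copy"
    pseudo-operation, and all names are assumptions made for this example.

```python
from itertools import count
from typing import Dict, List, Tuple

# One three-address instruction: dest = lhs op rhs (hypothetical toy format).
Instr = Tuple[str, str, str, str]  # (dest, op, lhs, rhs)


def eliminate_common_subexpressions(block: List[Instr]) -> List[Instr]:
    """Replace recomputed expressions in one basic block with copies."""
    fresh = count()
    value_of: Dict[str, int] = {}  # name -> number of the value it holds now
    # (op, value number, value number) -> (result value number, holding name)
    available: Dict[Tuple[str, int, int], Tuple[int, str]] = {}

    def number_of(name: str) -> int:
        # Names and constants seen for the first time get a fresh number.
        if name not in value_of:
            value_of[name] = next(fresh)
        return value_of[name]

    optimized: List[Instr] = []
    for dest, op, lhs, rhs in block:
        key = (op, number_of(lhs), number_of(rhs))
        hit = available.get(key)
        # Reuse only if the earlier result is still stored in its holder.
        if hit is not None and value_of.get(hit[1]) == hit[0]:
            vn, holder = hit
            optimized.append((dest, "copy", holder, ""))
        else:
            vn = next(fresh)
            available[key] = (vn, dest)
            optimized.append((dest, op, lhs, rhs))
        value_of[dest] = vn  # dest now holds this value
    return optimized


if __name__ == "__main__":
    # Address arithmetic for two accesses to the same array element a[i].
    block = [
        ("t1", "*", "i", "4"),
        ("t2", "+", "a", "t1"),
        ("t3", "*", "i", "4"),
        ("t4", "+", "a", "t3"),
    ]
    for instr in eliminate_common_subexpressions(block):
        print(instr)
```

    In the sample block, the second multiplication and the second addition are
    replaced by copies of t1 and t2. If an operand is reassigned between two
    textually identical expressions, its value number changes and the later
    expression is left alone, which is what keeps the rewrite correct.
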
    Object orientation was first introduced in Simula in 1967, and has been
    incorporated in languages such as Smalltalk, C++, C#, and Java. The key
    ideas behind object orientation are
    1. Data abstraction and
    2. Inheritance of properties,
    both of which have been found to make programs more modular and easier to
    maintain. Object-oriented programs are different from those written in many
    other languages, in that they consist of many more, but smaller, procedures
    (called methods in object-oriented terms). Thus, compiler optimizations must
    be able to perform well across the procedural boundaries of the source program.
    Procedure inlining, which is the replacement of a procedure call by the body
    of the procedure, is particularly useful here. Optimizations to speed up virtual
    method dispatches have also been developed.
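
    Procedure inlining can be sketched on a toy expression language. The Python
    example below replaces each call node with the callee's body, substituting
    the argument expressions for the parameters. It assumes procedures are
    non-recursive, free of side effects, and refer only to their own parameters;
    under those assumptions duplicating an argument expression is safe. The
    node classes and function names are hypothetical, not taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Num, Var, BinOp, Call]


@dataclass(frozen=True)
class Procedure:
    params: Tuple[str, ...]
    body: Expr


def substitute(expr: Expr, bindings: Dict[str, Expr]) -> Expr:
    """Replace parameter names in expr with the bound argument expressions."""
    if isinstance(expr, Var):
        return bindings.get(expr.name, expr)
    if isinstance(expr, BinOp):
        return BinOp(expr.op, substitute(expr.left, bindings),
                     substitute(expr.right, bindings))
    if isinstance(expr, Call):
        return Call(expr.func, tuple(substitute(a, bindings) for a in expr.args))
    return expr


def inline(expr: Expr, procs: Dict[str, Procedure]) -> Expr:
    """Replace every call in expr by the (recursively inlined) callee body."""
    if isinstance(expr, BinOp):
        return BinOp(expr.op, inline(expr.left, procs), inline(expr.right, procs))
    if isinstance(expr, Call):
        proc = procs[expr.func]
        args = [inline(a, procs) for a in expr.args]
        body = inline(proc.body, procs)  # assumes no recursive procedures
        return substitute(body, dict(zip(proc.params, args)))
    return expr


def evaluate(expr: Expr, env: Dict[str, int], procs: Dict[str, Procedure]) -> int:
    """Reference interpreter, used to check that inlining preserves meaning."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, BinOp):
        left = evaluate(expr.left, env, procs)
        right = evaluate(expr.right, env, procs)
        return left + right if expr.op == "+" else left * right
    proc = procs[expr.func]
    args = [evaluate(a, env, procs) for a in expr.args]
    return evaluate(proc.body, dict(zip(proc.params, args)), procs)
```

    For instance, with square(x) = x * x and sum_sq(a, b) = square(a) +
    square(b), inlining the call sum_sq(p, 3) yields the call-free expression
    (p * p) + (3 * 3), which evaluates to the same value as the original and
    exposes the constant product to further optimization.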

  • Original article: https://www.cnblogs.com/threef/p/3238328.html