
L53:Directed Acyclic Graph

Directed Acyclic Graph

A directed acyclic graph (DAG) is a directed graph that contains no cycles. A rooted tree is a special kind of DAG, and a DAG is a special kind of directed graph. For example, a DAG may be used to represent common subexpressions in an optimising compiler.

        +                        +
       / \                      / \
      *   f( )                 *   f( )
     / \    |                 / \    |
    a   b   *                a   b   |
           / \               ^       |
          a   b              |---<---|
       Tree                     DAG

expression: a*b + f(a*b)
Example of a common subexpression.
The common subexpression a*b need only be compiled once, but its value can be used twice.

A DAG can be used to represent prerequisites in a university course, constraints on operations to be carried out in building construction, or in fact any partial order '<'. An edge is drawn from a to b whenever a<b. A partial order '<' satisfies:

(i) transitivity: a<b and b<c implies a<c
(ii) irreflexivity: not(a<a)
Standard graph terminology is used throughout this section. In addition,
for a directed edge e we denote by src(e) and dst(e) the source vertex and
the destination vertex of e. Until Section 3.11, when we speak of the subject DAG we mean the expression DAG without the primary inputs, the
primary outputs, and the edges emanating from or incident to these
vertices; these vertices and edges are treated specially in Section 3.11. For instance, Figure 1 shows a C-code basic block and its corresponding DAG, in which primary inputs are shown as squares and primary outputs as double circles.
‘Optimizations’ of Basic Blocks
Equivalent transformations: Two basic blocks are equivalent if they compute the same set of expressions.
            - Expressions here are the values of the live variables at the exit of the block.
Two important classes of local transformations:
            - structure-preserving transformations:
                • common subexpression elimination
                • dead code elimination
                • renaming of temporary variables
                • interchange of two independent adjacent statements
            - algebraic transformations (countless many):
                • simplifying expressions
                • replacing expensive operations with cheaper ones
The DAG Representation of Basic Blocks
Directed acyclic graphs (DAGs) give a picture of how the value computed by each statement in the basic block is used in the subsequent statements of the block.
Definition: a DAG for a basic block is a directed acyclic graph with the following labels on nodes:
        Leaves are labeled with either variable names or constants.
            • They are unique identifiers.
            • From the operators we determine whether the l-value or the r-value is meant.
            • They represent initial values of names; subscript these with 0.
        Interior nodes are labeled by an operator symbol.
        Nodes are also (optionally) given a sequence of identifiers as labels.
            – an interior node represents a computed value
            – the identifiers in the sequence have that value.
Utility: Constructing a DAG from three-address statements is a good way of determining:
        common subexpressions (expressions computed more than once),
        which names are used inside the block but evaluated outside it,
        which statements of the block could have their computed value used outside the block.
Constructing a DAG
Input: a basic block.  Statements: (i) x:= y op z  (ii) x:= op y (iii) x:= y
Output: a dag for the basic block containing:
            – a label for each node. For leaves an identifier – constants are permitted. For interior nodes an operator symbol.
            – for each node a (possibly empty) list of attached identifiers – constants not permitted.
Method: Initially assume there are no nodes, and node() is undefined for all arguments.
(1)  If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In case (i), if node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
(2)  In case (i), determine whether there is a node labeled op whose left child is node(y) and right child is node(z). If not, create such a node. Call it n. Cases (ii) and (iii) are similar.
(3)  Delete x from the list of identifiers attached to node(x), if defined. Append x to the list of identifiers for node n and set node(x) to n.
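A hedged Python sketch of steps (1)–(3) for case (i) statements; the Node class, the tuple encoding of statements, and the helper names are illustrative, not from the text:

```python
# Minimal DAG construction for a basic block of three-address statements
# of the form x := y op z, following the three steps above.

class Node:
    def __init__(self, label, left=None, right=None):
        self.label = label            # operator, or variable/constant for a leaf
        self.left, self.right = left, right
        self.ids = []                 # list of attached identifiers

def build_dag(block):
    node_of = {}                      # node(name) for each name
    nodes = []                        # every node created so far

    def leaf(name):                   # step 1: create a leaf if node(name) is undefined
        if name not in node_of:
            n = Node(name)
            nodes.append(n)
            node_of[name] = n
        return node_of[name]

    for x, op, y, z in block:         # statement: x := y op z
        ny, nz = leaf(y), leaf(z)
        # step 2: reuse an existing node labeled op with children node(y), node(z)
        n = next((m for m in nodes
                  if m.label == op and m.left is ny and m.right is nz), None)
        if n is None:
            n = Node(op, ny, nz)
            nodes.append(n)
        # step 3: delete x from its old node's list, attach it to n
        if x in node_of and x in node_of[x].ids:
            node_of[x].ids.remove(x)
        n.ids.append(x)
        node_of[x] = n
    return node_of, nodes

# t1 := a*b ; t2 := a*b  -- the second statement reuses the first '*' node
_, ns = build_dag([("t1", "*", "a", "b"), ("t2", "*", "a", "b")])
interior = [n for n in ns if n.left is not None]
print(len(interior))                  # only one '*' node is created
```

Because step 2 searches for an existing node before creating one, the common subexpression a*b is represented exactly once, which is the point of the DAG representation.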
Algorithm DAG: Constructing a DAG
Input: A basic block of three-address statements. No pointers or array references.
Output: A DAG where each node n has a value, VALUE(n), which is an operator in the case of an interior node or a variable name if the node is a leaf. Also, each node n has a (possibly empty) list of identifiers attached, ID(n).
The previous algorithm aims at improving the quality of the target code, but only with respect to register utilization.
There are a number of other issues in the generation of efficient code. One of them is the elimination of redundant computation. Thus, in the sequence
x := b*c*d + b*c*2.0
b := b*c*3.0
y := b*c + 1.0
the same computation of b*c is done three times (the fourth occurrence is not the same because b is reassigned).
The following algorithm identifies and removes common subexpressions using a DAG as an intermediate representation. It assumes that there are no array elements or pointers in the basic block. Traditional compiler optimizations do not deal naturally with array references and pointers.
Two special operators
The [ ] operator is used to index a (one-dimensional) array. a := b[i] can be translated as:
(1) if i is in register R:
        l R,b(R)
(2) if i is in memory location M:
        l R,M
        l R,b(R)
(3) if i is in stack offset S:
        l R,S(A)
        l R,b(R)
The * operator is similar. For example, (2) above is replaced by:
        l R,M
        l R,*R
getreg(y,I)
    if there is a register R such that RD(R) = {y} and NEXTUSE(y,I) is empty then
        return(R)
    endif
    if there is R in REGISTERS such that RD(R) is empty then
        return(R)
    endif
    R := getanyregister()
    forall v in RD(R) do
        AD(v) := AD(v) - {R}
        if SYMTAB.loc(v) is not in AD(v) then
            generate(st R,SYMTAB.loc(v))
            AD(v) := AD(v) + {SYMTAB.loc(v)}
        endif
    endfor
    return(R)
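A Python restatement of the getreg logic, with RD, AD, NEXTUSE, and SYMTAB.loc simplified to dictionaries; these data structures, and picking the first register as the spill victim, are illustrative stand-ins for the versions in the text:

```python
# Register choice for a value y: (1) reuse a register holding only y when y
# has no next use, (2) take any empty register, (3) otherwise spill.

def getreg(y, RD, AD, next_use, registers, loc, code):
    # 1. a register holding exactly {y}, with no next use of y: reuse it
    for r in registers:
        if RD.get(r) == {y} and not next_use.get(y):
            return r
    # 2. any empty register
    for r in registers:
        if not RD.get(r):
            return r
    # 3. spill: pick a register and store its values back to memory
    r = registers[0]
    for v in list(RD[r]):
        AD[v].discard(r)
        if loc[v] not in AD[v]:
            code.append(f"st {r},{loc[v]}")   # generate the store for v
            AD[v].add(loc[v])
    RD[r] = set()
    return r

code = []
RD = {"R0": {"a"}, "R1": {"b"}}
AD = {"a": {"R0"}, "b": {"R1"}}
r = getreg("c", RD, AD, next_use={"a": 1, "b": 1},
           registers=["R0", "R1"], loc={"a": "Ma", "b": "Mb"}, code=code)
print(r, code)        # R0 ['st R0,Ma']  -- a is spilled to make room
```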

L52:Loop Optimization


Loop Optimization

Most programs run as a loop in the system. It becomes necessary to optimize the loops in order to save CPU cycles and memory. Loops can be optimized by the following techniques:
  • Invariant code : A fragment of code that resides in the loop and computes the same value at each iteration is called a loop-invariant code. This code can be moved out of the loop by saving it to be computed only once, rather than with each iteration.
  • Induction analysis : A variable is called an induction variable if its value is altered within the loop by a loop-invariant value.
  • Strength reduction : There are expressions that consume more CPU cycles, time, and memory. These expressions should be replaced with cheaper expressions without compromising their output. For example, multiplication (x * 2) is more expensive in terms of CPU cycles than a shift (x << 1), which yields the same result.
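Since the replacement is only valid when both forms agree, a quick Python check of the multiply-versus-shift equivalence:

```python
# Strength reduction: multiplication by a power of two and the corresponding
# left shift compute the same value for non-negative integers.
for x in range(100):
    assert x * 2 == x << 1       # multiply by 2    ->  shift left by 1
    assert x * 8 == x << 3       # multiply by 2**k ->  shift left by k
print("strength-reduced forms are equivalent")
```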

L51:Code Optimization


Code Optimization is done in the following different ways :
  1. Compile Time Evaluation :
    (i)  A = 2*(22.0/7.0)*r
         Perform 2*(22.0/7.0)*r at compile time.
    (ii)  x = 12.4
          y = x/2.3
          Evaluate x/2.3 as 12.4/2.3 at compile time.
  2. Variable Propagation :
    //Before Optimization
    c = a * b                                              
    x = a                                                 
    d = x * b + 4
    //After Optimization
    c = a * b 
    x = a
    d = a * b + 4
    Hence, after variable propagation, a*b and x*b are identified as the same common sub-expression.
  3. Dead code elimination : Variable propagation often turns an assignment statement into dead code
    c = a * b                                               
    x = a                                               
    d = a * b + 4  
    //After elimination :
    c = a * b
    d = a * b + 4
  4. Code Motion :
    • Reduce the evaluation frequency of expression.
    • Bring loop invariant statements out of the loop. 
  5. Code motion example :
     a = 200;
     while (a > 0)
     {
         b = x + y;
         if (a % b == 0)
             printf("%d", a);
     }
    //This code can be further optimized as
    a = 200;
    b = x + y;
    while (a > 0)
    {
         if (a % b == 0)
             printf("%d", a);
    }
    Since b = x + y is loop-invariant, it is moved out of the loop.
  6. Induction Variable and Strength Reduction :
    • An induction variable is used in the loop for the following kind of assignment: i = i + constant.
    • Strength reduction means replacing a high-strength operator by a low-strength one.
    i = 1;
    while (i < 10)
    {
        y = i * 4;
        i = i + 1;
    }
    //After Reduction
    i = 1;
    t = 4;
    while (t < 40)
    {
        y = t;
        t = t + 4;
    }
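Variable propagation (item 2) and dead code elimination (item 3) can be sketched together over a small three-address representation; the tuple format, function name, and live_out parameter here are illustrative, not from the text:

```python
# Statements are tuples (target, operand1, operator, operand2);
# a plain copy "x := a" is encoded as (x, a, None, None).
# (Redefinition of a copy's source is not handled in this sketch.)

def propagate_and_eliminate(stmts, live_out):
    # copy propagation: after x := y, replace later uses of x by y
    copies, out = {}, []
    for dst, a, op, b in stmts:
        a, b = copies.get(a, a), copies.get(b, b)
        if op is None and b is None:      # a plain copy dst := a
            copies[dst] = a
        else:
            copies.pop(dst, None)
        out.append((dst, a, op, b))
    # dead code elimination: walk backwards, keep only assignments
    # whose target is live (used later or live at block exit)
    used, kept = set(live_out), []
    for dst, a, op, b in reversed(out):
        if dst in used:
            kept.append((dst, a, op, b))
            used.discard(dst)
            used.update(v for v in (a, b) if v is not None)
    return list(reversed(kept))

block = [("c", "a", "*", "b"),
         ("x", "a", None, None),          # x := a
         ("d", "x", "*", "b")]            # becomes d := a * b
print(propagate_and_eliminate(block, live_out={"c", "d"}))
```

Propagation rewrites x*b as a*b, which turns x := a into dead code; the backward pass then drops it, matching the "after elimination" block in item 3.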

L49:Code Optimization in Compiler Design


Optimization is a program transformation technique, which tries to improve the code by making it consume less resources (i.e. CPU, Memory) and deliver high speed.
In optimization, high-level general programming constructs are replaced by very efficient low-level programming codes. A code optimizing process must follow the three rules given below:
  • The output code must not, in any way, change the meaning of the program.
  • Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.
  • Optimization should itself be fast and should not delay the overall compiling process.
Efforts for an optimized code can be made at various levels of compiling the process.
  • At the beginning, users can change/rearrange the code or use better algorithms to write the code.
  • After generating intermediate code, the compiler can modify the intermediate code by address calculations and improving loops.
  • While producing the target machine code, the compiler can make use of memory hierarchy and CPU registers.
Optimization can be categorized broadly into two types : machine independent and machine dependent.

Machine-independent Optimization

In this optimization, the compiler takes in the intermediate code and transforms a part of the code that does not involve any CPU registers and/or absolute memory locations. For example:
do
{
   item = 10;
   value = value + item;
} while(value<100);
This code involves repeated assignment of the identifier item, which if we put this way:
item = 10;
do
{
   value = value + item;
} while(value<100);
should not only save the CPU cycles, but can be used on any processor.
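Restated as Python while loops (the do-while entry condition does not matter here since value starts below the bound), the two versions can be checked to leave value in the same final state:

```python
# Loop-invariant code motion: item = 10 is recomputed every iteration in
# the first version and hoisted out of the loop in the second.
def before():
    value = 0
    while value < 100:
        item = 10            # invariant, yet recomputed every iteration
        value = value + item
    return value

def after():
    item = 10                # computed once, outside the loop
    value = 0
    while value < 100:
        value = value + item
    return value

print(before(), after())     # both reach 100
```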

Machine-dependent Optimization

Machine-dependent optimization is done after the target code has been generated and when the code is transformed according to the target machine architecture. It involves CPU registers and may have absolute memory references rather than relative references. Machine-dependent optimizers put efforts to take maximum advantage of memory hierarchy.

Basic Blocks

Source code generally has a number of instructions which are always executed in sequence and are considered the basic blocks of the code. These basic blocks do not have any jump statements among them, i.e., when the first instruction is executed, all the instructions in the same basic block will be executed in their sequence of appearance without losing the flow control of the program.
A program's control constructs, such as IF-THEN-ELSE and SWITCH-CASE conditional statements and loops such as DO-WHILE, FOR, and REPEAT-UNTIL, are built out of basic blocks joined by branches.

Basic block identification

We may use the following algorithm to find the basic blocks in a program:
  • Search header statements of all the basic blocks from where a basic block starts:
    • First statement of a program.
    • Statements that are target of any branch (conditional/unconditional).
    • Statements that follow any branch statement.
  • Header statements and the statements following them form a basic block.
  • A basic block does not include any header statement of any other basic block.
Basic blocks are important concepts from both code generation and optimization point of view.
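The header (leader) rules above can be sketched as follows; the instruction encoding, with jump targets given as instruction indices, is illustrative:

```python
# Partition a list of instructions into basic blocks using the three rules:
# (1) the first statement is a leader, (2) any branch target is a leader,
# (3) any statement following a branch is a leader.

def basic_blocks(instrs):
    # instrs: list of (opcode, target_index_or_None)
    leaders = {0}                           # rule 1
    for idx, (op, target) in enumerate(instrs):
        if op in ("goto", "if_goto"):
            if target is not None:
                leaders.add(target)         # rule 2
            if idx + 1 < len(instrs):
                leaders.add(idx + 1)        # rule 3
    order = sorted(leaders)
    # each block runs from its leader up to (not including) the next leader
    return [list(range(l, order[i + 1] if i + 1 < len(order) else len(instrs)))
            for i, l in enumerate(order)]

prog = [("assign", None),     # 0: i := 1
        ("assign", None),     # 1: t := i * 4   (branch target -> leader)
        ("assign", None),     # 2: i := i + 1
        ("if_goto", 1),       # 3: if i < 10 goto 1
        ("assign", None)]     # 4: follows a branch -> leader
print(basic_blocks(prog))     # [[0], [1, 2, 3], [4]]
```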

Basic Blocks

Basic blocks play an important role in identifying variables, which are being used more than once in a single basic block. If any variable is being used more than once, the register memory allocated to that variable need not be emptied unless the block finishes execution.

Control Flow Graph

Basic blocks in a program can be represented by means of control flow graphs. A control flow graph depicts how the program control is passed among the blocks. It is a useful tool that helps in optimization by helping to locate any unwanted loops in the program.

[Figure: Control Flow Graph]

L48:Error Detection And Recovery, Lexical Phase Error,syntactic phase errors,Semantic Phase Error


Syntactic phase errors
These errors are detected during syntax analysis phase. Typical syntax errors are
  • Errors in structure
  • Missing operator
  • Misspelled keywords
  • Unbalanced parenthesis
Example : swicth(ch)
The keyword switch is incorrectly written as swicth. Hence, “Unidentified keyword/identifier” error occurs.
Error recovery:
  1. Panic Mode Recovery
    • In this method, successive characters from the input are removed one at a time until a designated set of synchronizing tokens is found. Synchronizing tokens are delimiters such as ; or }.
    • Advantage: it is easy to implement and guaranteed not to go into an infinite loop.
    • Disadvantage: a considerable amount of input is skipped without checking it for additional errors.
  2. Statement Mode recovery
    • In this method, when a parser encounters an error, it performs the necessary corrections on the remaining input so that the rest of the statement allows the parser to parse ahead.
    • The correction can be deleting an extra semicolon, replacing a comma by a semicolon, or inserting a missing semicolon.
    • While performing correction, utmost care should be taken not to go into an infinite loop.
    • Disadvantage: it is difficult to handle situations where the actual error occurred before the point of detection.
  3. Error production
    • If the user has knowledge of common errors that may be encountered, then these can be anticipated by augmenting the grammar with error productions that generate the erroneous constructs.
    • If this is used, then during parsing appropriate error messages can be generated and parsing can continue.
    • Disadvantage: it is difficult to maintain.
  4. Global Correction
    • The parser examines the whole program and tries to find the closest error-free match for it.
    • The closest match requires the fewest insertions, deletions, and changes of tokens to recover from the erroneous input.
    • Due to its high time and space complexity, this method is not implemented in practice.
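Panic-mode recovery (method 1 above) can be sketched as follows; parse_stmt, the toy one-statement grammar, and the token encoding are illustrative stand-ins, not from the text:

```python
# On an error, discard tokens until a synchronizing token (';' or '}')
# is reached, consume it, and resume parsing. Skipping always advances,
# so the loop cannot run forever.

SYNC = {";", "}"}

def parse_with_recovery(tokens, parse_stmt):
    i, errors = 0, 0
    while i < len(tokens):
        ok, i2 = parse_stmt(tokens, i)
        if ok:
            i = i2
        else:
            errors += 1
            while i < len(tokens) and tokens[i] not in SYNC:
                i += 1            # skip until a synchronizing token
            i += 1                # consume the synchronizing token itself
    return errors

def toy_stmt(tokens, i):
    # toy grammar: a statement is exactly  id = id ;
    if tokens[i:i + 4] == ["id", "=", "id", ";"]:
        return True, i + 4
    return False, i

good = ["id", "=", "id", ";", "id", "=", "id", ";"]
bad  = ["id", "=", "=", ";", "id", "=", "id", ";"]   # error in first statement
print(parse_with_recovery(good, toy_stmt),
      parse_with_recovery(bad, toy_stmt))            # 0 1
```

On the bad input, the malformed first statement is skipped up to the ';', and the second statement still parses, illustrating both the advantage (no infinite loop) and the disadvantage (the skipped tokens are never checked).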
Semantic errors
These errors are detected during semantic analysis phase. Typical semantic errors are
  • Incompatible type of operands
  • Undeclared variables
  • Not matching of actual arguments with formal one
Example : int a[10], b;
a = b;
It generates a semantic error because of the incompatible types of a and b.
Error recovery
  • If an “Undeclared Identifier” error is encountered, then to recover from it a symbol table entry is made for the corresponding identifier.
  • If the data types of two operands are incompatible, then automatic type conversion is done by the compiler.

L46:Compiler Design Tutorial,Run Time Storage Administration,Stack,Block Structured Language hindi


A programming language that permits the creation of blocks, including blocks nested within other blocks, is called a block-structured programming language. 
A block consists of a sequence of statements and/or blocks, preceded by declarations of variables.
like in any programming language in which sections of source code contained within pairs of matching delimiters such as “{” and “}” (e.g. in C) or “begin” and “end” (e.g. Algol) are executed as a single unit. 
A block of code may be the body of a subroutine or function, or it may be controlled by conditional execution (if statement) or repeated execution (while statement, for statement, etc.). 

{ int a, b, c;
  a = 0; b = 0; c = 0;     // this is block 1
  { int c, d;
    c = a + 1;             // this is block 2
    d = c;
  }
  { string a; int c;       // this is block 3
    print c;
    print d;
  }
}
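Name resolution in nested blocks like the one above is typically implemented with a stack of scopes searched innermost-first; this Python sketch (the class and method names are illustrative) shows block 3's declarations shadowing block 1's:

```python
# A stack of scopes: entering a block pushes a fresh scope, leaving pops it,
# and lookup searches from the innermost scope outward.

class Scopes:
    def __init__(self):
        self.stack = [{}]

    def enter(self):
        self.stack.append({})

    def leave(self):
        self.stack.pop()

    def declare(self, name, typ):
        self.stack[-1][name] = typ

    def lookup(self, name):
        for scope in reversed(self.stack):   # innermost declaration wins
            if name in scope:
                return scope[name]
        raise NameError(name)

s = Scopes()
s.declare("a", "int"); s.declare("b", "int"); s.declare("c", "int")  # block 1
s.enter(); s.declare("c", "int"); s.declare("d", "int")              # block 2
s.leave()                                   # block 2's c and d go out of scope
s.enter(); s.declare("a", "string"); s.declare("c", "int")           # block 3
print(s.lookup("a"))   # string -- block 3's declaration shadows block 1's
print(s.lookup("b"))   # int    -- found in the enclosing block
```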

L45:Compiler Design Tutorial,Run Time Storage Administration,Simple stack allocation scheme in hindi


A program as a source code is merely a collection of text (code, statements etc.) and to make it alive, it requires actions to be performed on the target machine. A program needs memory resources to execute instructions. A program contains names for procedures, identifiers etc., that require mapping with the actual memory location at runtime.
By runtime, we mean a program in execution. Runtime environment is a state of the target machine, which may include software libraries, environment variables, etc., to provide services to the processes running in the system.
Runtime support system is a package, mostly generated with the executable program itself, that facilitates communication between the process and the runtime environment. It takes care of memory allocation and de-allocation while the program is being executed.

Activation Trees

A program is a sequence of instructions combined into a number of procedures. Instructions in a procedure are executed sequentially. A procedure has a start and an end delimiter and everything inside it is called the body of the procedure. The procedure identifier and the sequence of finite instructions inside it make up the body of the procedure.
The execution of a procedure is called its activation. An activation record contains all the necessary information required to call a procedure. An activation record may contain the following units (depending upon the source language used).
Temporaries: stores temporary and intermediate values of an expression.
Local Data: stores local data of the called procedure.
Machine Status: stores the machine status, such as registers and the program counter, before the procedure is called.
Control Link: stores the address of the activation record of the caller procedure.
Access Link: stores information about data that is outside the local scope.
Actual Parameters: stores actual parameters, i.e., parameters which are used to send input to the called procedure.
Return Value: stores return values.
Whenever a procedure is executed, its activation record is stored on the stack, also known as control stack. When a procedure calls another procedure, the execution of the caller is suspended until the called procedure finishes execution. At this time, the activation record of the called procedure is stored on the stack.
We assume that the program control flows in a sequential manner and when a procedure is called, its control is transferred to the called procedure. When a called procedure is executed, it returns the control back to the caller. This type of control flow makes it easier to represent a series of activations in the form of a tree, known as the activation tree.
To understand this concept, we take a piece of code as an example:
. . .
printf("Enter Your Name: ");
scanf("%s", username);
show_data(username);
printf("Press any key to continue...");
. . .
int show_data(char *user)
{
   printf("Your name is %s", username);
   return 0;
}
. . .
Below is the activation tree of the code given.

[Figure: Activation Tree]

Now we understand that procedures are executed in a depth-first manner; thus stack allocation is the most suitable form of storage for procedure activations.
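The depth-first activation order can be sketched by recording calls and returns on an explicit stack; the procedure names and the dictionary-of-children encoding of the tree are illustrative:

```python
# Each call pushes an activation record and registers the callee as a child
# of the current (topmost) activation; each return pops the record, giving
# the LIFO discipline that makes stack allocation fit procedure activations.

tree = {"main": []}       # activation tree: procedure -> children, in call order
stack = ["main"]          # control stack of live activations

def call(proc):
    tree.setdefault(proc, [])
    tree[stack[-1]].append(proc)   # callee is a child of the caller
    stack.append(proc)             # push the activation record

def ret():
    stack.pop()                    # activation ends; its record is popped

call("read_input"); ret()
call("show_data"); ret()
print(tree["main"])                # ['read_input', 'show_data']
print(stack)                       # ['main'] -- only main is still live
```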

Storage Allocation

Runtime environment manages runtime memory requirements for the following entities:
  • Code : It is known as the text part of a program that does not change at runtime. Its memory requirements are known at the compile time.
  • Procedures : Their text part is static but they are called in a random manner. That is why, stack storage is used to manage procedure calls and activations.
  • Variables : Variables are known at the runtime only, unless they are global or constant. Heap memory allocation scheme is used for managing allocation and de-allocation of memory for variables in runtime.

Static Allocation

In this allocation scheme, the compilation data is bound to a fixed location in the memory and it does not change when the program executes. As the memory requirement and storage locations are known in advance, runtime support package for memory allocation and de-allocation is not required.

Stack Allocation

Procedure calls and their activations are managed by means of stack memory allocation. It works in last-in-first-out (LIFO) method and this allocation strategy is very useful for recursive procedure calls.

Heap Allocation

Variables local to a procedure are allocated and de-allocated only at runtime. Heap allocation is used to dynamically allocate memory to the variables and claim it back when the variables are no longer required.
Unlike the statically allocated memory area, both stack and heap memory can grow and shrink dynamically and unpredictably. Therefore, they cannot be given a fixed amount of memory in the system.

[Figure: Memory layout — code, stack, and heap]

As shown in the image above, the text part of the code is allocated a fixed amount of memory. Stack and heap memory are arranged at the extremes of total memory allocated to the program. Both shrink and grow against each other.

Parameter Passing

The communication medium among procedures is known as parameter passing. The values of the variables from a calling procedure are transferred to the called procedure by some mechanism. Before moving ahead, first go through some basic terminologies pertaining to the values in a program.


r-value

The value of an expression is called its r-value. The value contained in a single variable also becomes an r-value if it appears on the right-hand side of the assignment operator. An r-value can always be assigned to some other variable.


l-value

The location of memory (address) where an expression is stored is known as the l-value of that expression. It always appears on the left-hand side of an assignment operator.
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7, 12, and variables like day, week, month and year, all have r-values. Only variables have l-values as they also represent the memory location assigned to them.
For example:
7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.

Formal Parameters

Variables that take the information passed by the caller procedure are called formal parameters. These variables are declared in the definition of the called function.

Actual Parameters

Variables whose values or addresses are being passed to the called procedure are called actual parameters. These variables are specified in the function call as arguments.
fun_one()
{
   int actual_parameter = 10;
   call fun_two(int actual_parameter);
}

fun_two(int formal_parameter)
{
   print formal_parameter;
}
Formal parameters hold the information of the actual parameter, depending upon the parameter passing technique used. It may be a value or an address.

Pass by Value

In pass by value mechanism, the calling procedure passes the r-value of actual parameters and the compiler puts that into the called procedure’s activation record. Formal parameters then hold the values passed by the calling procedure. If the values held by the formal parameters are changed, it should have no impact on the actual parameters.

Pass by Reference

In pass by reference mechanism, the l-value of the actual parameter is copied to the activation record of the called procedure. This way, the called procedure now has the address (memory location) of the actual parameter and the formal parameter refers to the same memory location. Therefore, if the value pointed by the formal parameter is changed, the impact should be seen on the actual parameter as they should also point to the same value.

Pass by Copy-restore

This parameter passing mechanism works similar to ‘pass-by-reference’ except that the changes to actual parameters are made when the called procedure ends. Upon function call, the values of actual parameters are copied in the activation record of the called procedure. Formal parameters if manipulated have no real-time effect on actual parameters (as l-values are passed), but when the called procedure ends, the l-values of formal parameters are copied to the l-values of actual parameters.
int y;
calling_procedure()
{
   y = 10;
   copy_restore(y);   //l-value of y is passed
   printf y;          //prints 99
}
copy_restore(int x)
{
   x = 99;            // y still has value 10 (unaffected)
   y = 0;             // y is now 0
}
When this function ends, the value of the formal parameter x is copied back to the actual parameter y. Even though the value of y was changed before the procedure ended, the copy-out of x overwrites it, making the mechanism behave like call by reference.
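The three mechanisms can be contrasted in a small Python sketch; the mutable env dictionary stands in for the caller's memory, and the function names are illustrative. Each callee sets the parameter to 99, as in the example above:

```python
# Pass-by-value: only a copy changes. Pass-by-reference: the caller's
# variable changes immediately. Copy-restore: the parameter is copied in,
# and copied back out when the procedure ends.

def by_value(env):
    x = env["y"]          # copy the r-value in
    x = 99                # changes only the local copy
    return env["y"]

def by_reference(env):
    env["y"] = 99         # writes go straight to the caller's variable
    return env["y"]

def copy_restore(env):
    x = env["y"]          # copy in at the call
    x = 99
    env["y"] = 0          # direct write to y inside the callee (y is now 0)
    env["y"] = x          # copy x back out as the procedure ends
    return env["y"]

print(by_value({"y": 10}),      # 10: caller unaffected
      by_reference({"y": 10}),  # 99: caller sees the write immediately
      copy_restore({"y": 10}))  # 99: the restore overwrites y = 0
```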

Pass by Name

Languages like Algol provide a kind of parameter passing mechanism that works like the preprocessor in the C language. In the pass-by-name mechanism, a call to the procedure is replaced by the procedure's actual body. Pass-by-name textually substitutes the argument expressions in a procedure call for the corresponding parameters in the body of the procedure, so that the body now works on actual parameters, much like pass-by-reference.

L43:Compiler Design Tutorial,Symbol Table, operation on Symbol Table HIndi by University Academy


What are the different operations provided by symbol table?

The following operations are provided by the linear or hash symbol table:


insert()

During the analysis phase this operation is used, when tokens are identified and names are stored in the table. This operation adds information to the symbol table; the format in which names are stored depends on the compiler at hand.
The information associated with a symbol is its set of attributes: value, state, scope, and type. The insert() function takes the symbol along with its attributes as arguments and stores the information in the symbol table.
For instance:
int a;
should be processed by the compiler as:
insert(a, int);


lookup()

A name in the symbol table is searched by this operation, to determine:
  • The existence of symbol in the table.
  • The declaration of the symbol before it is used.
  • Checking whether the name is used in the scope.
  • Initialization of the symbol.
  • Check whether the symbol is declared multiple times.
In accordance with the programming language, the format of the lookup() function varies. The basic format is as follows:
lookup(symbol)
Zero is returned if the symbol does not exist in the symbol table, and the attributes stored in the table are returned if it does.
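A minimal dictionary-based sketch of the two operations described above; the particular attribute set stored is illustrative:

```python
# insert() stores a symbol with its attributes; lookup() returns the stored
# attributes, or 0 when the symbol is absent, following the convention above.

symtab = {}

def insert(symbol, **attributes):
    # e.g. the declaration "int a;" is processed as insert("a", type="int")
    symtab[symbol] = attributes

def lookup(symbol):
    return symtab.get(symbol, 0)   # 0 if the symbol does not exist

insert("a", type="int", scope="global")
print(lookup("a"))        # {'type': 'int', 'scope': 'global'}
print(lookup("b"))        # 0
```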