SUDS User Guide

 

Eric Larson

elarson@seattleu.edu

Seattle University

 

Table of Contents

 

PART I: Using SUDS

1. Installing and Using SUDS. 1

2. Phases of SUDS. 4

3. Instrumentation. 9

 

PART II: Internals of SUDS

4. Simplified C Grammar 19

5. Simplification. 23

6. Data Flow Analysis. 25

7. Data Structures. 27

 

PART III: Issues

8. Language Restrictions. 35

9. Other Known Problems. 37

 

References. 39

 

 

 

1.    Installing and Using SUDS

 

1.1         Installing SUDS

 

To install SUDS:

 

1. Switch to the top level directory of suds.

2. Type: configure

3. Type: make

 

Notes:

- SUDS uses the dynamic bitset library from boost.  This dependency is not checked by the configure script.  You may need to install this library which I believe is included in most Linux distributions.

- Running 'make install' merely copies the executables into the bin directory provided in the distribution.

 

1.2         Running SUDS

 

When you make SUDS, you will notice that seven executables are created.  Each is built using a different instrumentation model:

 

suds-inref       Input reference checker

suds-strchk     String checker

suds-array       Array checker

suds-arith    Input arithmetic checker

suds-null     Null dereference checker

suds-trace       Traces function calls and returns

suds-none        No instrumentation model

 

SUDS only works with preprocessed source code files (.i).  Initially you will need to create the .i files using your compiler. (For gcc, use the –E switch).

 

The most basic command line for SUDS is this:

 

suds-null sample.i

 

This will create an instrumented file sample.suds.c.

 

You can control the suffix of the file using the suffix switch like this:

 

suds-null –suffix null sample.i

 

This will create the instrumented file sample.null.c.

 

Since SUDS can work on the whole program, you can specify multiple files in a separate list file.  Each line of the file contains the name of a file without the .c or .i suffix.  Then use the –f switch on the command line to specify the name of the file like this:

 

suds-null –f file_list.txt

 

This command will create a .suds.c file for every file in the list.

 

Once an instrumented file is created, it can be compiled using gcc or another C compiler.  During the linking process it is necessary to link in the proper model that contains the instrumentation code.  The models are located in the models directory:

 

libinref.a      Input reference checker          

libstrchk.a    String checker

libarray.a      Array checker

libarith.a      Input arithmetic checker

libnull.a        Null dereference checker

libtrace.a      Traces function calls and returns        

 

Here is an overview of the process:

 

image001.gif

 

Once the instrumented executable is created, run the program normally.  The only difference is that the instrumented executable will print additional output related to the instrumentation model.

 

To run the static analysis phases of SUDS, you must use the –dfa switch to run the basic data flow analysis options.  Then you specify if you want to run the –taint analysis or one of the five program slicing modes (see section 2.10 for more information).  This command line runs taint analysis and slices based on dangerous statements in the function foo:

 

suds-inref –dfa –taint –slice_fn_danger foo sample.i

 

Note: By default, the definitions of tainted and dangerous are tied to the input reference model.  Using these modes for other models will have no effect on the instrumented code.  However, it is possible to modify SUDS to change these definitions when creating your own models.


1.3         Command Line Options in SUDS

 

-help, -h

show command line options

-version, -V  

show the version number

-file <file>, -f <file>

file that contains a list of files to process

-suffix <string>

suffix of the output file

-base

emit the code as parsed (no simplification)

-simple

emit the code and exit after simplification

-dfa 

run data flow analyses

-taint

run tainted propagation algorithm

-no_output

do not create output file

-slice_all_danger

run program slicing (all dangerous statements)

-slice_fn_any <func>

run program slicing (all statements in function)

-slice_fn_danger <func>

run program slicing (all dangerous statements in function)

-slice_stmt_any <stmt>

run program slicing (individual statement, all uses)

-slice_stmt_danger <stmt>

run program slicing (individual statement, danger uses)

-ptr_flow_insens

use flow insensitive pointer analysis

-slice_ignore_ctrl

use ignore indirect control slicing

-instr_mode

mode for instrumentation

-debug <mode>, -d <mode>

print debugging information

-stats <mode> <tag> <file>

print stats

-dump_cfg <file>

dump control flow graph to file

-dump_dfa <file>

dump data flow analysis to file

-dump_dstmt <file>

dump dangerous statements to file

-dump_dfunc <file>

dump dangerous functions to file

-dump_slice <file>

dump program slicing information to file

-dump_syms <file>

dump symbol tables to file

-dump_vars <file>

dump variable information to file

-dump_defs <file>

dump definitions to file

 

 

2.    Phases of SUDS

 

2.1         Parsing (gram.y, lexer.l, parse.cpp)

 

The parser only works with preprocessed source code (.i file from gcc).  This phase creates the AST and creates the symbol tables.  Multiple files can be parsed with one invocation of SUDS creating a large AST that spans across files.

 

While almost all C programs can be parsed, the parser is not as mature as the parser in gcc or other compilers.  In particular, it does not perform robust error checking or designed to give feedback on different types of errors.  In addition, some legitimate C++ programs may not compile successfully.  Section 8 describes the language restrictions.

 

The AST that is created cannot be directly used by the static analysis or instrumentation phases.  The code must pass through the simplification phase first.  However, it is possible to directly emit the parsed code using the "-base" option.  This is useful if you want to modify the parser and test your changes.

 

The parsing and symbol table code is a modified version of the parser and symbol table found in cTool [1].

 

2.2         Create initializer functions (init.cpp)

 

This phase creates special functions – one for each file that is a placeholder for simplified code and instrumented code that is related to initializers of global variables (not part of any function).  In addition, a special function, called suds_init, is created for the file that contains main that calls all of these initializer functions.  Finally, a call to suds_init is added to the start of main.

 

2.3         Simplification (simplify.cpp)

 

The simplification phase converts the program into a simpler format.  The output of this phase is a program that is equivalent to the original program.  The simpler format allows subsequent analysis phases to be less complex and permits the insertion of instrumentation anywhere in the code.  Some highlights of the simplification process:

 

·         All right-hand side expressions of an assignment operator only have one operator.  For example, the expression a + b – ( c * d) is broken down into multiple statements.

·         All side-effects (such as '++') or short-circuited operators are removed and replaced by equivalent statements.

·         All loop and control conditions are replaced by a single variable or constant.

·         Address of operations involving arrays (such as '&a[5]') are replaced by equivalent addition operations (such as 'a + 5' in this case).

 

The entire simplified grammar is shown in Section 4.  More implementation notes are found in Section 5.

 

2.4         Finalize AST (ast.cpp)

 

This phase completes the creation of the AST and other key data structures.  It does the following:

 

1.      Creates variable objects and attaches them to when they are used in expressions and statements.

2.      Creates special statements that mark the end of a control statement.

3.      Replaces function call expression with two statements: a call statement and a return from statement. This separates the functionality of making the call and the assignment of the return value after the call is complete.

4.      Global variables (those declared outside of a function) are marked as such.

5.      Extra “dummy” variables are created for each call to malloc (or other dynamic allocation functions).  Dynamic memory is modeled by call site.  All variables allocated at a particular call site are lumped together.  Since dynamic memory could be used to create a structure containing an array, an array variable is created for each dynamic memory variable.

6.      The simplified statements are numbered (unique id).  This allows the user to specify statements as slicing criteria.

7.      A set of local variables used in created in each function.  During data flow analysis, a set of global variables is created.

 

2.5         CFG Creation (cfg.cpp)

 

The control flow graph is created by dividing each function into a set of basic blocks and computing the predecessor and successor sets for each basic block.  This is done intraprocedurally.  Function calls are treated as basic block ending points with the lone successor being the basic block immediately following the function call. Useless basic blocks are removed: any block that has no backwards path to the first basic block of the function.

 

2.6         Call Graph Creation (call.cpp)

 

To handle interprocedural control flow, a call graph is created.  The call graph is not a separate structure, instead additional information is tracked with the following statements:

 

 

The set of basic blocks, predecessors and successors in particular,  remain unchanged.

 

Since creating the call graph requires knowledge of where function pointers point to, the call graph is created in tandem with pointer analysis. 

 

2.7         Pointer Analysis (pointer.cpp)

 

The interprocedural pointer analysis developed by Hind et. al. [4] is used.  The information is stored in a PointsToInfo class which maps variables (represented by their id number) to the variables it can point to.

 

The algorithm can either be run in a flow-sensitive or flow-insensitive manner via the command line option –ptr_flow_insens.  In the flow sensitive version, the points-to information is stored for each basic block.  Due to space issues, the points-to information is recomputed for each statement when necessary.  The flow-insensitive algorithm stores all of the points-to relationships into one global instance of the PointsToInfo class.

 

2.8         Data Flow Analysis (data.cpp)

 

The data flow analysis phase is used to create these sets:

 

·         defsout:     The set of definitions generated by a statement or basic block.

·         killed:        The set of definitions no longer active after the statement or basic block.

·         reaches:     The set of active definitions at the beginning of a basic block or statement.

·         used:          The set of definitions that reach the statement and are used as input to the statement.

 

There are four steps to determining these sets:

1.      Create the defsout set for each statement.

2.      Create the killed set for each statement.

3.      Create the defsout and killed sets for each basic block.

4.      Create the set of reaching definitions for each basic block.

 

Due to memory space issues, the used sets are recreated for statements when necessary.

 

The analysis is done intraprocedurally.  For each call site, live definitions are propagated from the caller to the callee function on a parameter by parameter basis. At the end of the function, definitions that refer to global variables are propagated to all return points. Other definitions are only propagated back to the caller if the definition refers to a variable that had at least one live definition prior to the call. Though the algorithm is not truly context-sensitive, this last rule restricts the flow of definitions from a call site to a completely different return point. Any definitions live in the caller just prior to the call are also considered live just after the call.

 

A summary of data flow analysis is found in Section 6.

 

2.9         Dangerous Operation Identification (danger.cpp)

 

This is not actually a phase.  The code in danger.cpp implements the isDangerous member function for statements.  This can be used either as a criterion in program slicing and/or to further focus the instrumentation.

 

Currently, the set of dangerous operations includes array references and pointer dereferences.  This can easily be modified by controlling which uses are considered dangerous.

 

2.10     Program Slicing (slice.cpp)

 

This will identify statements that are part of a slice based on a criterion.   There are five ways for specifying the slicing criterion on the command line:

 

-slice_all_danger                  includes all dangerous statements in the program

-slice_fn_any <func>             includes all statements in the given function

-slice_fn_danger <func>        includes all dangerous statements in the given function

-slice_stmt_any <stmt>     includes the given statement only (all uses)

-slice_stmt_danger <stmt>    includes the given statement only (dangerous uses only)

 

A backwards slicing algorithm is employed starting with the slicing criterion (the operation under investigation). All definitions that are used in the statement(s) in the criterion are added to the slice. Then, each statement is analyzed. If any definition generated by a statement is in the slice, then all definitions used by the statement are added to the slice. Indirect uses from control statements are also added to the slice by default: if any statement in a control statement is in the slice, then the control statement and its uses are added to the slice. The user can exclude indirect uses by use of a command-line switch (-slice_ignore_ctrl). The algorithm is interprocedural and context-insensitive. Once a definition is added to the slice, it remains in the slice for the duration of the algorithm. The algorithm continues to iterate until steady state (no definitions enter the slice).

 

Function calls are handled as follows:

·         If the definition associated with the return value (in the ReturnFromStmt) is in the slice, then uses corresponding to any of the callee return statements are added to the slice.  Both the ReturnFromStmt and ReturnStmt are added to the slice.

·         Any variable that is pointed to by an argument in the function call or global variable will have a definition in defsout of the CallStmt.  If any of these definitions are in the slice, definitions of the corresponding variables that reach the end of the function are added to the slice.  This is done on a parameter by parameter basis.

·         If an initial parameter definition is in the slice, then the corresponding function call argument is added to the slice.  This is done on a parameter by parameter basis.

·         If an initial definition of a global variable is in the slice, then any definitions of the global variable that reach the function call are added to the slice.

·         For system calls, if any of the generated definitions of the CallStmt or ReturnFromStmt are in the slice, then all of the uses (arguments and the variables they point to) are added to the slice.

·         The algorithm is context-insensitive but the method for adding functions calls is restricted, preventing propagation of the slice into areas of the code that are clearly not part of the slice.  A function is only added to the slice if one of the following are true:

o   The function contains a statement in the slicing criterion.

o   The function can be reached on a backwards traversal from a function containing a statement in the slicing criterion back to main AND there is at least one use in the slice is active at the end of the function.  This is tracked by the propToSlice set.

o   The function is called from a function that meets one of the two above criteria AND the slice propagates into that function in some fashion (return value, global variables, output parameters, etc.) 

 

After definitions and statements are marked in the slice, variables are marked in the slice if at least one definition is in the slice.  Then, functions are marked in the slice if they have at least one statement in the slice.

 

2.11     Tainted Propagation (taint.cpp, taint-init.cpp)

 

This algorithm identifies which variables and definitions are tainted directly and indirectly.  The initial set of tainted definitions and statements is specified in the file taint-init.cpp.   Currently, variables that come from input are marked as tainted.

 

The tainted propagation algorithm works in reverse of the program slicing algorithm.  It propagates tainted information from the uses to the definitions.  For simple assignment statements (no arithmetic calculations), if any use is in a given state, all definitions are now updated to be in that state.  Again, parameters are handled on an individual basis.

 

For statements involving arithmetic operations, the taint state is computer based on the operation and the state of the operands.

 

2.12        Instrumentation (instr.cpp)

 

The instrumentation phases traverses the AST and adds instrumentation code based on the instrumentation model.  This phase is described in more detail in Section 3.

 

2.13        Output Printing (print.cpp)

 

The output phase traverses the simplified statements in program order and prints them to a file, including any instrumentation statements.  The output file can be subsequently compiled by a C compiler.

 


3.    Instrumentation

 

To add instrumentation, it is necessary to either use or create an instrumentation director.  This section describes how to create an instrumentation director.

 

The first step to creating a model is to provide an implementation for the InstrDirector class listed located in instr.h.  Each of the member functions refers to a different code construct or event in the program.  A list of functions is provided in Section 3.1.  In the corresponding implementation file, simply fill in the functions for the constructs that you want to add instrumentation.  Leave the function empty for constructs that do not need instrumentation.

 

Pre-existing instances of instrumentation directors are available in the src/directors directory in SUDS.  You can either use any of the existing instrumentation directors as a starting point or use instr-none.cpp which starts with all of the member functions as empty.

 

The instrumentation director interface has several data members that can be used within the instrumentation routine to determine what instrumentation is necessary and/or to pass into instrumentation routines as parameters.

 

bool inLhs;                  // true if processing the lhs of an assignment

bool isGlobal;               // true if processing global variable

class Expr *currLhs;         // current value of lhs if processing rhs

class Expr *currRhs;         // current value of rhs if processing lhs

class Location *currLoc;     // current location

class FunctionDef *currFunc; // current function

class Stmt *currStmt;        // current statement

unsigned int uniqueId;       // unique identifier

 

Each member function has one and/or two statement lists as pass-by-reference parameter(s).  Instrumented code is to be added to this list using a helper function more later).  Any instrumented statements added to the before list are inserted before the corresponding statement.  Similarly, any statements added to the after list are inserted after the corresponding.  Instrumentation is always added at the boundary of simplified statements.  In functions that have both a before and after list as parameters, you can choose which list to add to (or both) based on the model you are trying to create.  However, many constructs force you to either add instrumentation before or after the corresponding statement.

 

Most member functions contain a pointer to the expression, statement, or variable in question.  This allows you to apply different instrumentation based on the type of an operation.  For instance, an addition operation involving integers is likely to have different instrumentation that an addition operation involving a pointer and an integer.  Expressions that can appear on the left-hand of an assignment have a Boolean parameter isLhs that is true if the expression appears on the left-hand side of an assignment expression and false if it appears on the right-hand side.  Expressions that do not have this parameter cannot appear on the left-hand side after simplification.

 

Member functions relating to variable declarations, function parameters, and return values use an offset mechanism to distinguish between different parameters in the same parameter list and for individual fields with a variable / parameter / return value is a struct.  The offset parameter into these member functions allow you to link up function call arguments to function parameters.  For instance, instrumentCallExprArg with offset 3 can be paired up with instrumentBirthParm with offset 3.

 

Another unique feature with member functions that use the offset mechanism is that they use a string to store the expression opposed to using an instance of an expression class.  The reason for this is two-fold: 1. Some of these events don't correspond to expressions (like death of a variable).  2. To avoid creating temporary expression classes for each member of a struct.  Since the string does not convey type information, the type of the expression is passed separately.  The function instrumentFieldCopy also employs strings in a similar fashion.

 

There are also two functions beginInstrumentation and endInstrumentation that are called before and after the instrumentation phase respectively.  The beginInstrumentation function can be used to initialize or allocate any variables needed by the instrumentation model.  The endInstrumentation function can be used to perform any post-processing such as gather statistics or performing sanity checks.

 

Once you determine which constructs the need instrumentation, you need to add code that inserts the instrumented code into the proper list.  There are several helper functions in instr-call.cpp to assist in this process.  Here is the most common flow which adds a function call to one of the instrumentation routines:

 

1. Initialize the call by calling initInstrCall. It takes one optional Boolean parameter which if set to true will automatically add parameters for the file name, line number, and unique identifier associated with this instrumentation call.

2. Add parameters to the function. The table below outlines the different routines are available depending on the type and use of parameter.

3. Complete the call by calling addInstrCall. This function takes two parameters: a list to add the instrumentation statement to (either the before or after list) and the name (string) of the instrumentation routine to call.

 

Routine

If x is ___

Then,

add ___

Used For

addStringParm(string x);

foo

foo

Any expression in string form. Useful for constructing your own expressions by concatenating different stings.

addStringConstantParm(string x);               

foo

“foo”

Any string used as a constant within the instrumentation routine.

addIntegerParm(int x);

23

23

Any integer constant.

addExprParm(Expr *x);            

a[i]

a[i]

The value of an expression or variable.

addStringExprParm(Expr *x);

a[i]

“a[i]”

The expression as a string - useful for debugging and diagnostics.

addAddrOfExprParm(Expr *x);

a[i]

&(a[i])

The address of an expression (assuming it is addressable); useful for models that track variables by address.

addDeclParm(Decl *x);

a

a

The variable associated with a declaration.             

addSizeParm(Type *x);

int

sizeof(int)

The size of a type.

addObjSizeParm(Type *x);      

int *

sizeof(int)

The size of the type pointed to (or the size of the element for arrays).

addFileNameParm();

-

-

The current file name.

addLineNumParm();

-

-

The current line number.

addUniqueIdParm();

-

-

A unique integer that is incremented after each call to an instrumentation routine. Used to distinguish between multiple calls that occur on the same source code line.

 

  

3.1         Instrumentation Model Member Functions

 

Begin/End Functions

before

list

after

list

isLhs

parameter

offset

parameter

instrumentBeginProgram

Beginning of a program (start of main)

No

Yes

No

No

instrumentEndProgram

End of the program

Yes

No

No

No

instrumentBeginFunction

Beginning of a function

No

Yes

No

No

instrumentEndFunction

End of a function

Yes

No

No

No

instrumentBeginBlock

Beginning of a statement block '{'

No

Yes

No

No

instrumentEndBlock

End of a statement block '}'

Yes

No

No

No

Declaration instrumentation functions

before

list

after

list

isLhs

parameter

offset

parameter

instrumentBirthVariable

Declaration of a variable

No

Yes

No

Yes

instrumentDeathVariable

Lifetime of variable ends

Yes

No

No

Yes

Parameter / return value instrumentation functions

before

list

after

list

isLhs

parameter

offset

parameter

instrumentCallExprArg

Function call - called for each argument

Yes

No

No

Yes

instrumentBirthParm

Parameter declaration - called for each parameter

No

Yes

No

Yes

instrumentBirthMainParm

Parameter declaration in main - called for each parameter

No

Yes

No

Yes

instrumentDeathParm

Lifetime of parameter ends - called for each parameter

Yes

No

No

Yes

instrumentDeathMainParm

Lifetime of parameter in main ends - called for each parameter

Yes

No

No

Yes

instrumentReturnVal

A value is returned (callee side) - called for each data member if a struct is returned

Yes

No

No

Yes

instrumentReturnFromVal

A value is returned (caller side) - called for each data member if a struct is returned

No

Yes

No

Yes

Stmt instrumentation functions

before

list

after

list

isLhs

parameter

offset

parameter

instrumentExprStmt

expression statement

Yes

Yes

No

No

instrumentIfStmt

if statement

Yes

No

No

No

instrumentElseStmt

else statement

No

Yes

No

No

instrumentSwitchStmt

switch statement

Yes

No

No

No

instrumentForStmt

for loop

Yes

No

No

No

instrumentWhileStmt

while loop

Yes

No

No

No

instrumentDoStartStmt

start of a do-while loop

Yes

No

No

No

instrumentDoEndStmt

end of a do-while loop

No

Yes

No

No

instrumentGotoStmt

goto statement

Yes

No

No

No

instrumentBreakStmt

break statement

Yes

No

No

No

instrumentContinueStmt

continue statement

Yes

No

No

No

instrumentCallStmt

function call

Yes

No

No

No

instrumentSystemCall

function call to system function or any function where source code is not present

Yes

Yes

No

No

instrumentReturnStmt

return stmt

Yes

No

No

No

instrumentReturnFromStmt

returning from a function call

No

Yes

No

No

instrumentVaStartStmt

va_start statement

Yes

Yes

No

No

instrumentVaEndStmt

va_end statement

Yes

Yes

No

No

instrumentLabelStmt

label

No

Yes

No

No

Expr instrumentation functions

before

list

after

list

isLhs

parameter

offset

parameter

 

instrumentConstant

constant

Yes

Yes

No

No

instrumentStringConstant

string constant

Yes

Yes

No

No

instrumentRhsVar

lone variable expression

Yes

Yes

Yes

No

instrumentRhsUnaryExpr

unary expression

Yes

Yes

No

No

instrumentRhsBinaryExpr

binary expression

Yes

Yes

No

No

instrumentRhsRelExpr

relational expression

Yes

Yes

No

No

instrumentRhsCastExpr

cast expression

Yes

Yes

No

No

instrumentRhsSizeofExpr

sizeof expression

Yes

Yes

No

No

instrumentRhsAddrOfExpr

address of expression

Yes

Yes

No

No

instrumentRhsAddrArrowExpr

address of expression involving member selection via pointer

Yes

Yes

No

No

instrumentRhsDerefExpr

pointer dereference

Yes

Yes

Yes

No

instrumentRhsMemberExpr

member selection

Yes

Yes

Yes

No

instrumentRhsPtrMemberExpr

member selection via pointer

Yes

Yes

Yes

No

instrumentRhsArrayExpr

array reference

Yes

Yes

Yes

No

instrumentRhsVaArgExpr

va_arg expression

Yes

Yes

No

No

instrumentRhsBuiltinExpr

built-in expression to compiler

Yes

Yes

No

No

instrumentCastBirth

reassignment of a variable to a new type – called for each data member if new type is a struct

No

Yes

No

No

instrumentFieldCopy

field assignment when assigning one struct to another - called for each data member of the struct

Yes

Yes

No

No

 

 

4.    Simplified C Grammar

 

This section describes the grammar of a program that is the output of the simplification process. Simplifying the pro­gram significantly reduces the complexity of future data analysis and instrumentation phases. The grammar is based on formats used in GCC [2] and Hendren et. al. [3].

 

4.1         Statements

 

stmt

: comp_stmt

| assign_expr ';'

| IF '(' bool_val ')' comp_stmt

| IF '(' bool_val ')' comp_stmt ELSE comp_stmt

| SWITCH '(' bool_val ')' comp_stmt

| FOR '(' ',' bool_val ',' ')' comp_stmt

| WHILE '(' bool_val ')' comp_stmt

| DO comp_stmt WHILE '(' bool_val ')'

| BREAK ';'

| CONTINUE ';'

| GOTO IDENT ';'

| IDENT ':'                /* Label – treated as separate statement */

| CASE INTEGER_CST ':'

| function_call ';'

| RETURN ';'

| RETURN val ';'

| VA_START '(' IDENT ',' IDENT ')' ';'

| VA_END '(' IDENT ')' ';'

| decl_stmt ';'

 

comp_stmt

: '{' stmt_list '}'

| '{' '}'

 

stmt_list

: stmt_list stmt

| stmt

 

assign_expr  

       : lhs '=' rhs

 

function_call

       : IDENT '(' arg_list ')'

| IDENT '(' ')'

| lhs = IDENT '(' arg_list ')'

| lhs = IDENT '(' ')'

 

arg_list

       : arg_list ',' val

       | val

 

Notes:

 

·         Control constructs are simplified by having a test condition of a single variable or constant.  This variable or constant must be a Boolean (integer) type.  Pointers are replaced by comparisons to 0.

·         After simplification, only two types of expression statements are allowed: assignments and function calls (which can optionally have an assignment).  Function calls can only have argument lists that consist of a single variable or constant.

·         Scopes have been added (using ‘{‘ and ‘}’) for loop bodies, then clauses, else clauses, and switch bodies. This allows for statements and declara­tions to be added at any point in the program.


4.2         Expressions

 

lhs

       : IDENT

| deref_expr

| member_expr

| ptr_member_expr

       | array_ref

| ind_array_ref

      

rhs

       : constant

       | STRING_CONSTANT

       | IDENT

| unary_expr

| binary_expr

| rel_expr

| cast_expr

| sizeof_expr

| addrof_expr

| addr_arrow_expr

| deref_expr

| member_expr

| ptr_member_expr

       | array_expr

| va_arg_expr

 

constant:            INTEGER_CST | CHAR_CST | FLOAT_CST

unary_expr:          UNARY_OP val

binary_expr:         val BINARY_OP val

rel_expr:            val REL_OP val

cast_expr:           '(' type ')' val

sizeof_expr:         SIZEOF '(' IDENT ')' | SIZEOF '(' type ')'

addrof_expr:         '&' IDENT | '&' member_expr

addr_arrow_expr:     '&' ptr_member_expr

deref_expr:          '*' IDENT

member_expr:         IDENT '.' IDENT

ptr_member_expr:     IDENT "->" IDENT

array_expr:          array_expr '[' val ']' | IDENT '[' val ']'

va_arg_expr:         VA_ARG '(' IDENT ',' type ')'

 

val: id | constant

bool_val: id | constant           /* must be integer type */

 

Notes:

 

·         Expressions that do nothing (such as ‘a+b;’) are eliminated during the simplification process.

·         The following operators are converted into a functionally equivalent expression: ‘,’ , ‘++’, ‘--’,     ‘->’, ‘?:’.  The gcc extension of statement expressions are supported by the parser but replaced during the simplification process.

·         The short-circuited operators ‘&&’ and ‘||’ are converted into if-then-else statements.

·         Temporary variables for integers and floats are all created with type long and double respectively. Per the C standard, values of temporary expressions can use larger types of the same family.  

·         For variables declared arrays, array reference may include as many dimensions are allowed as needed. However, if an array reference uses a pointer variables, only one level of indirection (set of braces []) is permitted.


4.3         Declarations

 

Declarations are handled in the same manner as C but initializers are not permitted. Initializers are created as separated statements after the declaration.   For global variables, the simplified code is placed into special initializer functions – one for each file in the program.

 

To ensure compatibility, initializers are permitted and remain on local static variables and variables declared as constants.  However, the remaining analyses ignore these initializers.

 


5.    Simplification

 

To determine when statements and expressions are properly simplified, the enumeration smplMode defines the following five constants.  Here is what each constant is used for:

 

SM_expr: expression of an expression statement

 

SM_ rhs: right hand side of an assignment expression

 

SM_lhs: left hand side of an assignment expression

 

SM_addr: operand for address-of operator (&)

 

SM_boolean:   condition for if/else, switch, and loops – must be a lone integer variable or constant

 

SM_var: anywhere a lone variable is needed, specifically:

operands for dereference (*), member (.), and indirect member operations (->)

            pointer for indirect array references

            function for function call (a variable is created for each function)

            replacement expression when simplifying increment (++) or decrement (--)

 

SM_var_const:            anywhere a lone variable or constant is needed, specifically:

condition for if/else, switch, and loops

            return value

            operands for arithmetic, relational, and cast operations

operands for va_start, va_arg, and va_end

            index for array references

            each argument in a function call

 

SM_var_array: anywhere a lone variable or array reference is needed, specifically:

            array for array references

 

A table describing how each expression responds to these constants is on the next page.

 

There are two key functions that drive the simplification:

 

simplifyBlock:  This function simplifies all statements in a block.  It is responsible for making sure the additional statements produced by the simplification process are inserted properly.  At the end of the phase, a loop will traverse through all statements removing any statements that have no side effects whatsoever (no assignments or control).

 

simplifyExpr: This function is the starting point for all expressions.  It will call a helper function based on the expression type.  If the expression is not simplified enough, it is replaced with a temporary variable or a dereference if an lvalue is needed.

 

Expression

SM_expr

SM_rhs

SM_lhs

SM_addr

SM_boolean

SM_var

SM_var_const

SM_var_array

Constant (string)

Yes

Yes

No

No

No

No

No

No

Constant (other)

Yes

Yes

No

No

1

No

Yes

Yes

Variable (lone)

Yes

Yes

Yes

Yes

1

Yes

Yes

Yes

Unary Operators

2

2

No

No

No

No

No

No

Arithmetic Operators

2

2

No

No

No

No

No

No

Relational Operators

2

2

No

No

No

No

No

No

Casts

2

2

No

No

No

No

No

No

va_arg Expression

2

2

No

No

No

No

No

No

Sizeof Operator

Yes

Yes

No

No

No

No

No

No

Addr Of Operator (&)

3

3

No

No

No

No

No

No

Addr Arrow

(&a->b)

4

4

No

No

No

No

No

No

Incr (++) / Decr (--)

No

No

No

No

No

No

No

No

And (&&) / Or (||)

No

No

No

No

No

No

No

No

Comma (,)

No

No

No

No

No

No

No

No

Conditional (?:)

No

No

No

No

No

No

No

No

Statement Expression

No

No

No

No

No

No

No

No

Dereference (*)

4

4

4

No

No

No

No

No

Member (.)

4

4

4

4

No

No

No

No

Arrow (->)

4

4

4

No

No

No

No

No

Array Reference

5

5

5

No

No

No

No

6

Assign

7

No

No

No

No

No

No

No

Assign w/Arith

No

No

No

No

No

No

No

No

Function Call

8

8

No

No

No

No

No

No

 

1          Yes, if operand has integer or Boolean type.

2          Yes, if operand(s) are SM_var_const.

3          Yes, if operand is SM_addr.

4          Yes, if operands are SM_var.

5          Yes, if array (or array pointer) is SM_var and array index is SM_var_const.

6          Yes, if array reference is for an array variable (not a pointer).

7          Yes, if left hand side is SM_lhs and right hand side is SM_rhs.

8          Yes, if function is SM_var and each argument is SM_var_const.

 


6.     Data Flow Analysis

 

6.1         Statements

 

BeginBlockStmt

defsout:

A definition is created for each parameter.

killed:

none

used:

none

ExprStmt

defsout:

Based on left-hand side.

killed:

Based on left-hand side.

used:

All definitions that reach the statement and are used as input on either side of the assignment.

IfStmt, SwitchStmt, ForStmt, WhileStmt, DoWhileStmt

defsout:

none

killed:

none

used:

Active definitions corresponding to the control condition.

CallStmt

defsout:

A definition is created for any variable that could be pointed to by any argument using any level of indirection.

killed:

none

used:

All definitions that reach the statement and correspond to a variable used as an argument.  For system calls, variables pointed to by an argument (using any level of indirection) are also considered used.

ReturnStmt

defsout:

none

killed:

none

used:

All active definitions of the return value (if statement has return value).

ReturnFromStmt

defsout:

Based on left-hand side.

killed:

Based on left-hand side.

used:

Any variable used on the left-hand side as an input.

VaStartStmt, VaEndStmt

defsout:

A definition is created for the list variable.

killed:

All other definitions associated with the list variable.

used:

none

all other statements

defsout:

none

killed:

none

used:

none

 


6.2         Left Hand Side Expressions

 

VarExpr

a

defsout:

A definition for the variable.

killed:

All definitions corresponding to the variable except for the definition generated this statement.

used:

none

DerefExpr

*a

defsout:

A definition for every variable that can be pointed by the variable.

killed:

none

used:

Reaching definitions for the pointer used to make the dereference.

MemberExpr

a.b

defsout:

A definition for the struct variable.

killed:

none

used:

none

PtrMemberExpr

a->b

defsout:

A definition for every variable that can be pointed by the variable.

killed:

none

used:

Reaching definitions for the pointer used to make the dereference.

ArrayExpr

a[b]

defsout:

A definition for the any variable that can be pointed by the array/pointer.

killed:

none

used:

Reaching definitions for variables that represent the array/pointer and index.

 


7.    Data Structures

 

This section describes the key data structures in SUDS.  Here is a summary of the AST components:

 

A project encompasses all the source code.  There is only one project variable when SUDS is running.  A project contains a list of translation units.

 

A translation unit refers to a single source code file.  Each file contains a list of statements.  The list of statements is restricted to declarations (whether it be types, variables, or functions).

 

A function definition is a special type of statement that refers to a function definition (not a function prototype).  In addition to information about the function, a function contains a list of statements that comprise the function body.

 

A statement refers to a C statement.  For simplicity, there are also statements that refer to C constructs (such as case label) that are not necessarily considered statements in the C programming language.  Statements reference other statements, expressions, functions, and variables.

 

An expression refers to a C expression.  Expressions can refer to other expressions and variables.

 

A variable refers to a declared variable in C.  It hold various attributes about the variable and a pointer to its type.

 

A declaration is a special statement that declares either a variable or a type.

 

A type refers to a type in C. 

 

Other data structures in SUDS are listed in Section 7.9.

 

One important note about the data structures.  SUDS violates many principles of good object-oriented design.  For instance, most, if not all of the data members are public.  The program was designed in a more C-like procedural fashion using inheritance as syntactic sugar.  From my point of view it was better than using alternatives such as unions, void pointers, or large switch statements.  At some point in the future, I may consider a redesign of SUDS.

 

7.1         Project

 

The Project class (project.h) encompasses all the source code for the program.  There is only one project variable in SUDS.  It stores a list of translation units – one for each source code file.  It also stores a pointer to a special initializer function that is added to the beginning of main.

 

Most tasks in SUDS that requires traversing the source code will start at the project level.  The project member functions merely call the same function for each of its translation units.

 

Data members:

 

vector<class TransUnit*> units;  // list of translation units

class FunctionDef *mainInit;     // main initializer function


7.2         Translation Unit 

 

A TransUnit (transunit.h) is a class that represents a single source code file in a project.  A TransUnit stores a list of statements. These statements can only be declarations of types, variables, or functions.  Code within a function is associated with that function.  It also stores a pointer to the initialization function associated with this file.  Each instrumented source code file contains a special initialization function that is called when the program starts.  This function is used by the simplification and instrumentation phases as a place to add code when processing global variable declarations.

 

As with the project, the member functions are largely used for tasks that traverse the source code.

 

Data members:

 

class Stmt *head;            // The first statement in the list.

class Stmt *tail;            // The last statement in the list.

const string filename;       // The file we parsed this from.

class FunctionDef *initFunc; // The initializer function.

 

7.3         Function Definition

 

The FunctionDef class (func.h) refers to a function definition in C.  It is derived from the Stmt base class (described later in this section) because it is considered a top-level variable declaration statement for the function.  Since a function definition cannot appear within a function, it really does not behave like the other statement classes that are defined later.

 

The function really has two distinct parts: the declaration and the function body.  The declaration stores all of the information about the function such as the parameters and return type (in reality, most of this is stored in a function type variable within the declaration).  The function body is stored as a linked list of statements, starting at head.  The other data members are used to store information from various phases in SUDS.

 

Data members:

 

class Decl *decl;                        // function declaration

string name;                             // name of function

class Stmt *head;                        // first statement in function

class Stmt *tail;                        // last statement in function

int startBbl;                            // id of first bbl

int endBbl;                              // id of last bbl

bool usefulFunc;                         // set if function is reachable from main

set<int> returnBbls;                     // set of bbls that end the function

map<string, class Stmt *> labelTable;    // mapping of labels to stmts

class PointsToInfo ptiEntry;             // points to information at beginning

class PointsToInfo ptiExit;              // points to information at end

set<class FunctionDef *> funcsCalled;    // list of functions called

set<int> lvars;                          // list of local variables declared

set<int> funcDefs;                       // defs in the function

dynamic_bitset<> endReaches;             // set of defs live at end of function

vector< set<int> > parmDefs;             // defs for each parameter

set<class Stmt *> callerStmts;           // set of statements that call this function

set<class Stmt *> propToSlice;           // set of call statements to propagate to

bool inSlice;                            // true if function is in slice


7.4         Statement

 

A Stmt (stmt.h) refers to a C statement.  To make analysis easier, there are some additional statements that refer to C constructs not normally considered as statements.

 

Each statement has a code that represents the type of statement and a unique id.  Statements are connected together using a doubly linked list.  Functions in stmtlist.cpp can be used to manipulate linked lists of statements.

 

Data members:

 

StmtCode code;                           // kind of statement

Location location;                       // location in source code

unsigned long id;                        // unique statement id

bool isInitStmt;                         // true if stmt used to initialize a variable

bool bblStart;                           // true if stmt is a bblock starting point

bool bblEnd;                             // true if stmt is a bblock ending point

class Bblock *bbl;                       // pointer to basic block node that statement resides in

class FunctionDef *fn;                   // function in which stmt resides

class PointsToInfo pti;                  // points-to information before statement

set<int> defsout;                        // defs created by this statement

set<int> killed;                         // defs killed by this statement

bool inSlice;                            // set if in slice

bool sliceProcessed;                    // set if variable has been processed during slicing

bool isTainted;                          // set if at least one def is tainted

Stmt *prev;                              // For making a list of statements.

Stmt *next;                              // For making a list of statements.

 

There are many derived classes of statements described in the following table:

 

Class Name

Description

BeginBlockStmt

The beginning of a block of code starting with a '{'.  Contains a list of declaration statements to any variables declared in this score.

EndBlockStmt

The end of a block of code ending with a '}'.

ExprStmt

An expression statement, typically assigns a value to a variable.

IfStmt

if statement

ElseStmt

The else part of an if statement.

SwitchStmt

switch statement

ForStmt

for loop

WhileStmt

while loop

DoStartStmt

Start of a do-while loop (the word "do")

DoEndStmt

End of a do-while loop (the test condition)

EndControlStmt

Dummy statement that marks the end of a control statement (such as if) or loop.

BreakStmt

break statement

ContinueStmt

continue statement

GotoStmt

goto statement

LabelStmt

A label in C.  This could be a case label, default label, or a label that you can goto to.

CallStmt

The first of two instructions used to make a function call. This class stores the function(s) being called and the arguments used to make the function call.

ReturnStmt

return statement

ReturnFromStmt

The second of two instructions used to make a function call.  This calls keeps track where the return value, if any, is stored when the function returns.

NullStmt

An empty statement

VaStartStmt

Refers to a va_start call, used in functions with variable-length parameter lists.

VaEndStmt

Refers to a va_end call, used in functions with variable-length parameter lists.

DeclStmt

Declaration of a variable or type.

FunctionDef

Function definition.

InstrumentedStmt

A statement added by the instrumentation engine.  The instrumented statement is stored in a string.

 

7.5         Expression

 

An Expr (expr.h) refers to a C expression.  All expressions have a code that designates the kind of expression and have a data type which can be used by the instrumentation engine.

 

Data members:

 

ExprCode code;              // variety of expression

Location location;          // location in program

const class Type *type;     // type of expr (such as int)

Expr *next;                 // linked list connector

 

There are many derived classes of statements described in the following table:

 

Class Name

Example

Description

Constant

5

The beginning of a block of code starting with a '{'.  Contains a list of declaration statements to any variables declared in this score.

VarExpr

x

A single variable

UnaryExpr

-x

A unary expression (see list of operators below)

BinaryExpr

x + y

A binary expression (see list of operators below)

RelExpr

x < y

A relational expression (see list of operators below)

CastExpr

(int) x

Cast to a different type

SizeofExpr

sizeof(x)

sizeof expression

AddrOfExpr

&x

address of expression

AddrArrowExpr

&(x->y)

address of expression involving a member selection deference

DerefExpr

*x

pointer dereference

MemberExpr

x.y

member selection

PtrMemberExpr

x->y

member selection via pointer dereference

ArrayExpr

x[y]

array reference

VaArgExpr

 

va_arg expression, used in variable-length argument lists

BuiltinExpr

 

builtin expression to the compiler

AssignExpr

x = y

assignment expression, also includes variants such +=.

CallExpr

 

function call

TrinaryExpr

x ? y : z

trinary logical choice expression

StmtExpr

 

gcc extension that allows a statement to be embedded into an expression

 

The last three expressions CallExpr, TrinaryExpr, and StmtExpr are not present in the AST after simplification.


Here is a list of operators:

 

enum UnaryOp

 

enum BinaryOp

UO_Plus

+

 

BO_Plus

+

UO_Minus

-

 

BO_Minus

-

UO_BitNot

~

 

BO_Mult

*

UO_Not

!

 

BO_Div

/

UO_PreInc

++

 

BO_Mod

%

UO_PreDec

--

 

BO_Shl

<< 

UO_PostInc

++

 

BO_Shr

>> 

UO_PostDec

--

 

BO_BitAnd

&

 

 

 

BO_BitXor

^

enum AssignOp

 

BO_BitOr

|

AO_Equal

=

 

BO_And

&&

AO_PlusEql

+=

 

BO_Or

||

AO_MinusEql

-=

 

BO_Comma

,

AO_MultEql

*=

 

 

 

AO_DivEql

/=

 

enum RelOp

AO_ModEql

%=

 

RO_Equal

==

AO_ShlEql

<<=

 

RO_NotEqual

!=

AO_ShrEql

>>=

 

RO_Less

AO_BitAndEql

&=

 

RO_LessEql

<=

AO_BitXorEql

^=

 

RO_Grtr

AO_BitOrEql

|=

 

RO_GrtrEql

>=

 

The increment, decrement, (logical) and, (logical) or, and comma operators are not present in the AST after simplification.  AO_Equal  (=) is the only variant of the AssignExpr that is permitted after simplification.  Most of the analysis takes advantage of this fact.

 

7.6         Variable

 

A Var (var.h) represents a variable.  Each variable is assigned a unique identifier that is used in many of the analyses.  The Var class stores a pointer to the variable's declaration, its type, and its entry in the symbol table.

 

Data members:

 

int id;                     // unique id

string name;                // variable name

class Decl *decl;           // variable declaration

class Type *type;           // variable type (from decl)

class SymEntry *st_entry;   // symbol table entry

set<int> def_set;           // list of definitions of the variable

bool isGlobal;              // set if not declared in function

bool isHeap;                // set if refers to heap variable

bool inSlice;               // set if variable is in slice

bool isTainted;             // set if tainted

 

7.7         Declaration

 

A Decl (decl.h) represents a variables declaration and is contained within a DeclStmt.  It contains information about its name, type, and various properties.  Declarations are not used during the analysis portion of SUDS – the Var class is used to keep track of the various states of variables.

 

Data members:

 

StorageCode storage;        // static or typedef or none

class Symbol *name;         // symbol being declared

class Type *type;           // type of the declaration

class GccAttrib *attrib;    // optional gcc attribute

class Expr *initializer;    // initial value

class Var *var;             // pointer to variable object

class Decl *next;           // For linking into lists

 

7.8         Type

 

A Type (type.h) refers to a data type in C.  The basic type (such as int, array, pointer) is represented by a code (see list below).  Any storage indicator and/or type qualifiers are also stored (see list below).  Many types use a subtype (pointers, arrays, and functions to name a few).  This is also captured in the base type class.  Additional types store additional information.

 

For user-defined types created using typedef, the type class stores the basic type information along with the type name.

 

Names for structs, unions, and enums are called tags.  The derived classes for these types store the tag names.


Data members:

 

TypeCode code;              // basic type

StorageCode storage;        // storage of type

TypeQual qualifier;         // type qualifier(s)

class GccAttrib *attrib;    // gcc attribute (optional)

class Symbol *typeName;     // NULL if not typedef

class Type *subType;        // subtype for pts, arrays, funcs, etc.

bool useSubType;            // set if type uses subtype

 

Here is a table of the derived type classes:

 

Class Name

Description

NoSpecType

No specified type – used to represent data on the heap

VoidType

void

IntType

integers, represents all variants and sizes of integers

FloatType

floating point numbers, represents all sizes of floating point numbers

PointerType

pointers

ArrayType

arrays

BitFieldType

bit fields

FunctionType

functions

StructType

structs and unions

EnumType

enumerated types

EllipsisType

The type of the "…" parameter used in variable length parameter lists

VaListType

The type of a va_list, used in variable length argument lists

 

Here is are lists of storage code and type qualifiers:

 

StorageCode

 

TypeQual

ST_None

 

TQ_None

ST_Auto

 

TQ_Const

ST_Extern

 

TQ_Volatile

ST_Register

 

TQ_Inline

ST_Static

 

ST_Typedef

 

 

7.9         Other Data Structures

 

Here is a list of other data structures used in SUDS:

 

Class Name

File Name

Description

Bblock

bblock.h

basic block

Def

def.h

definition for data analysis

GccAttrib

gccattrib.h

gcc attribute extension

InstrDirector

instr.h

instrumentation director interface

Label

label.h

label

Location

parse.h

location in code

ParseEnv

parse.h

parsing environment

ParseEnvCtxt

parse.h

parsing environment context

ParseCtxt

parse.h

parsing context

PointsToInfo

pointer.h

points-to information

Symbol

symbol.h

symbol from lexer

SymEntry

symbol.h

entry in symbol table

ScopeTbl

symbol.h

stores symbol table entries at a  particular scope

SymTbl

symbol.h

symbol table

TopSymTbl

symbol.h

top level symbol table that contains three symbol tables (labels, tags, all others); there is only one top level symbol table (global variable symTables)

 


8.    Language Restrictions

 

This section outlines restrictions in the C language.  You must either alter your program or fix these issues in SUDS.

 

Problem:

Cannot use default type (int) for variables (such as "static x = 3;")

Detected:

Yes - parser error (unconfirmed)

Workaround:

Add the proper type to the declaration

Possible fix:

Add 'int' automatically if no type appears

Likely to fix:

No (doesn't occur often enough)

 

Problem:

Old style function declarations are unsupported

Detected:

Yes - parser error

Workaround:

Convert function to standard style function declarations. Run CIL [5] ahead on source code file ahead of time.

Possible fix:

Alter grammar and symbol processing to work with old style functions;

argument ordering is problematic.

Likely to fix:

No (too many obstacles to overcome)

 

Problem:

Default declarations for functions unsupported

Detected:

Yes - parser error (unconfirmed)

Workaround:

Add the proper type to the declaration

Possible fix:

Add 'int' automatically if no type appears

Likely to fix:

No (doesn't occur often enough)

 

Problem:

Parsing of initializer lists outside of initializers

Detected:

Yes - parser error (unconfirmed)

Workaround:

Rewrite code to not use an initializer list

Possible fix:

Leverage simplification of initializer lists for declarations

Likely to fix:

No (doesn't occur often enough)

 

Problem:

Parsing problems with variables declared with unusual orders of storage

classes, qualifiers, etc. (see K+R for proper rules)

Detected:

Yes - parser error

Workaround:

In most cases, it should be possible to rewrite the code.

Possible fix:

Redesign parsing of storage classes and qualifiers

Likely to fix:

No (doesn't occur often enough)

 

Problem:

Possible parsing problems with gcc attributes

Detected:

Yes - parser error

Workaround:

Eliminate or reposition the attribute

Possible fix:

The main problem here is understanding how to parse the gcc attributes.  The documentation is confusing and "subject to change"

Likely to fix:

Maybe (if there is better information)

  

Problem:

Multiple struct variables declared on the same statement that also define

the struct are not allowed

Detected:

Not detected by SUDS, code emitted from SUDS will not compile

Workaround:

Rewrite using non-generic structs

Possible fix:

Alter the type of all but the first declaration such that fields are not

printed out.

Likely to fix:

No (doesn't occur often enough)

 

Problem:

Generic structs declared in structs that are included in multiple

files are not allowed.

Detected:

Not detected by SUDS, code emitted from SUDS will not compile

Workaround:

Rewrite using non-generic structs if possible, may need to alter system include files.

Possible fix:

Change symbol table management.

Likely to fix:

Maybe (symbol table is fragile)

 

Problem:

Const / static variables can cause problems because initialization is

separated from declaration

Detected:

Not detected by SUDS, code emitted from SUDS will not compile

Workaround:

Rewrite not using const variables

Possible fix:

Two possibilities:

(1) Redo remaining phases to work with declarations that have initializers.

(2) Automatically remove const modifier (does not work with static).

Likely to fix:

Maybe (occurs often, second fix is not hard but could cause other problems)

 

Problem:

Complex expressions in address-of expressions can cause SUDS to fail

Detected:

Yes – simplification error

Workaround:

In most cases, it should be possible to rewrite the code

Possible fix:

 

Likely to fix:

No (doesn't occur often enough, no easy fix)

 

Problem:

Arrays declared static will not have sizes computed

Detected:

?

Workaround:

In most cases, it should be possible to remove the static designation.

Possible fix:

Compute the size despite the static designation

Likely to fix:

Yes

 

Problem:

Cast operations are separated from the expressions creating temporary variables of invalid types.

Detected:

Not detected by SUDS, code emitted from SUDS will produce warnings and/or errors when compiled.

Workaround:

Most warnings can be ignored, may or may not be possible to rewrite

the code to avoid any warnings or errors.

Possible fix:

Could allow each expression to optionally have a cast instead of

breaking it into two pieces

Likely to fix:

Not likely (non-ignorable errors don't occur often enough)

 

Problem:

Static local variables with initializers are problematic for instrumentation

Detected:

No

Workaround:

Try to rewrite code if possible

Possible fix:

Instrumentation would have to support non-simplified initializers

Likely to fix:

No (too much work)

 

Problem:

Variables declared using 'register' will lose 'register' designation since instrumentation models often take the addresses of variables.

Detected:

Losing the register designation should not cause a functionality problem.

Workaround:

not possible to workaround

Possible fix:

Add a command line switch to allow users to indicate how the register designation should be handled.

Likely to fix:

Maybe (not that hard but not that useful)

 

Problem:

Arrays declared extern with unknown sizes and no initializer

Detected:

No

Workaround:

Rewrite using a 'char *'.

Possible fix:

Use the size when it is declared without extern

Likely to fix:

No (requires modification to the fragile symbol table)

 

Problem:

Variable or field with same name as typedef name

Detected:

Yes – parser error

Workaround:

Rename one or other

Possible fix:

Have a separate table for typedef names.  Another issue is to more strictly regulate the possible Type field

Likely to fix:

No (requires modification to the fragile symbol table)

 

 

9.  Other Known Problems

 

Problem:

Investigate the scope table as only struct/enum definitions go into external scope. Should others go to external scope or should the global struct/enum definitions go into level 2?

Result:

No problem results from this, this symbol table dump appears off.

Possible fix:

Redo symbol table.

Likely to fix:

No (unless problem occurs)

 

Problem:

Loss of quantifiers (const/inline/volatile) for temporary variables – needed because temp variables are often not constant and cannot be declared inline

Result:

Some code emitted from SUDS may not compile.

Possible fix:

?

Likely to fix:

No (can't think of a reasonable fix)

 

Problem:

May not model all system functions that return pointers to arbitrary arrays.

Result:

Static analysis may miss some pointer relationships causing the data flow analysis to be too aggressive.

Possible fix:

Update the list of system functions that return pointers to arbitrary arrays.

Likely to fix:

Case by case basis when detected

 

Problem:

System calls that alter the points-to information of its parameters.

Result:

Static analysis may miss some pointer relationships causing the data flow analysis to be too aggressive.

Possible fix:

Make a concerted effort to recognize these functions and alter the points-to information.

Likely to fix:

Maybe

 

Problem:

List of input functions is incomplete

Result:

Missed opportunity to catch bugs

Possible fix:

Add missing functions to list.

Likely to fix:

Fix when missing input function is discovered

 

Problem:

Instrumented statements do not have a location.

Result:

Some output messages are awkward

Possible fix:

Have the instrumented statement grab a location (not always easy)

Likely to fix:

Maybe

 

Problem:

Handling of variable-arg function calls in instrumentation engine (including va_start and va_end)

Result:

Instrumentation capabilities for variable-arg function calls is limited.

Possible fix:

Depends on what kind of instrumentation, if any, would be added in these cases.

Likely to fix:

Not likely

 

Problem:

Processing of unary plus operator

Result:

?

Possible fix:

Convert unary plus to varExpr

Likely to fix:

No (doesn't occur often enough)

 

Problem:

derefUses is a deviation from the current strategy of recomputing the uses when necessary.

Result:

Could be a performance and space issue.

Possible fix:

Require recomputing the deref uses along with other uses.

Likely to fix:

No (unless performance and space become an issue)

 

Problem:

More precise type for array constants.

Result:

Missed opportunity to introduce instrumentation because of missing type.

Possible fix:

Create the array type for the constant.

Likely to fix:

No (doesn't occur often enough)

 

Problem:

More precise types for binary expressions (size, unsigned, etc.)

Result:

Missed opportunity to introduce instrumentation because of missing type.

Potential issues for instrumentation relying on the precise size of variables.

Possible fix:

Redo computation of binary expressions.

Likely to fix:

No (unless it becomes a problem; may revisit if type system is overhauled)

 

Problem:

Sizes of variables not accurately tracked (esp. large constants)

Result:

Potential issues for instrumentation relying on the precise size of variables.

Possible fix:

Fix SUDS to strictly follow the rules of converting types in C.

Likely to fix:

No (unless it becomes a problem; may revisit if type system is overhauled)

 

Problem:

Types organization is messy (needs overhaul)

Result:

Hard to understand and modify code related to types.

Possible fix:

Overhaul type system

Likely to fix:

No (unless SUDS is redesigned)

 

Problem:

Printing of declarations is messy.

Result:

Hard to understand and modify code related to printing declarations.

Possible fix:

Redesign code that prints declarations

Likely to fix:

No (unless SUDS is redesigned)

 

Problem:

Duplication of statements: dup seems broken for begin block and call stmt

Result:

No known problems at the problem

Possible fix:

Avoid using dup but provide copy list functions.

Likely to fix:

No (unless it becomes a problem; would revisit in a redesign of SUDS)

 

Problem:

Memory management: constructors, destructors, duplicators

Result:

Memory is wasted, especially for types

Possible fix:

Redo memory management

Likely to fix:

No (unless it becomes a problem; would revisit in a redesign of SUDS)

 

Problem:

Get strange messages (such as warn_unused_result) when using include files

and libraries from latest Linux release.

Result:

Get strange messages (such as warn_unused_result) when running SUDS.

Possible fix:

Need to understand what is causing the messages

Likely to fix:

Maybe (if I can figure out why this is happening)

 

9.1     Enhancements

 

Enhancement:

Embed the preprocessor command inside suds

Likely to add:

Maybe

 

Enhancement:

Add routine to compare types for use in sanity or other checks.

Likely to add:

No (unless SUDS is redesigned)

 

 

References

 

[1]     cTool. http://sourceforge.net/projects/ctool/

[2]     Free Software Foundation. SSA for Trees. http://www.gnu.org/software/gcc/projects/tree-ssa

[3]     L. Hendren, C. Donawa, M. Emami, G. Gao, Justiani, and B. Sridha­ran. Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations. Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, August 1992.

[4]     M. Hind, M. Burke, P. Carini, and J. Choi. Interprocedural Pointer Alias Analysis. ACM Transactions on Programming Languages and Systems. July 1999.

[5]     G. Necula, S. McPeak, S. P. Rahul, W.Weimer. CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. International Conference on Compiler Construction, Apr. 2002

 

This is a personal WEB site developed and maintained by an individual and not by Seattle University. The content and link(s) provided on this site do not represent or reflect the view(s) of Seattle University. The individual who authored this site is solely responsible for the site's content. This site and its author are subject to applicable University policies including the Computer Acceptable Use Policy (www.seattleu.edu/policies).