- SUDS uses the dynamic bitset library from boost. This dependency is not checked by the configure script. You may need to install this library which I believe is included in most Linux distributions.

- Running 'make install' merely copies the executables into the bin directory provided in the distribution.

1.2 Running SUDS

When you make SUDS, you will notice that seven executables are created. Each is built using a different instrumentation model:

suds-inref Input reference checker

suds-strchk String checker

suds-array Array checker

suds-arith Input arithmetic checker

suds-null Null dereference checker

suds-trace Traces function calls and returns

suds-none No instrumentation model

SUDS only works with preprocessed source code files (.i). Initially you will need to create the .i files using your compiler. (For gcc, use the –E switch).

The most basic command line for SUDS is this:

suds-null sample.i

This will create an instrumented file sample.suds.c.

You can control the suffix of the file using the suffix switch like this:

suds-null –suffix null sample.i

This will create the instrumented file sample.null.c.

Since SUDS can work on the whole program, you can specify multiple files in a separate list file. Each line of the file contains the name of a file without the .c or .i suffix. Then use the –f switch on the command line to specify the name of the file like this:

suds-null –f file_list.txt

This command will create a .suds.c file for every file in the list.

Once an instrumented file is created, it can be compiled using gcc or another C compiler. During the linking process it is necessary to link in the proper model that contains the instrumentation code. The models are located in the models directory:

libinref.a Input reference checker

libstrchk.a String checker

libarray.a Array checker

libarith.a Input arithmetic checker

libnull.a Null dereference checker

libtrace.a Traces function calls and returns

Here is an overview of the process:

Once the instrumented executable is created, run the program normally. The only difference is that the instrumented executable will print additional output related to the instrumentation model.

To run the static analysis phases of SUDS, you must use the –dfa switch to run the basic data flow analysis options. Then you specify if you want to run the –taint analysis or one of the five program slicing modes (see section 2.10 for more information). This command line runs taint analysis and slices based on dangerous statements in the function foo:

suds-inref –dfa –taint –slice_fn_danger foo sample.i

Note: By default, the definitions of tainted and dangerous are tied to the input reference model. Using these modes for other models will have no effect on the instrumented code. However, it is possible to modify SUDS to change these definitions when creating your own models.

1.3 Command Line Options in SUDS

-help, -h	show command line options
-version, -V	show the version number
-file <file>, -f <file>	file that contains a list of files to process
-suffix <string>	suffix of the output file
-base	emit the code as parsed (no simplification)
-simple	emit the code and exit after simplification
-dfa	run data flow analyses
-taint	run tainted propagation algorithm
-no_output	do not create output file
-slice_all_danger	run program slicing (all dangerous statements)
-slice_fn_any <func>	run program slicing (all statements in function)
-slice_fn_danger <func>	run program slicing (all dangerous statements in function)
-slice_stmt_any <stmt>	run program slicing (individual statement, all uses)
-slice_stmt_danger <stmt>	run program slicing (individual statement, danger uses)
-ptr_flow_insens	use flow insensitive pointer analysis
-slice_ignore_ctrl	use ignore indirect control slicing
-instr_mode	mode for instrumentation
-debug <mode>, -d <mode>	print debugging information
-stats <mode> <tag> <file>	print stats
-dump_cfg <file>	dump control flow graph to file
-dump_dfa <file>	dump data flow analysis to file
-dump_dstmt <file>	dump dangerous statements to file
-dump_dfunc <file>	dump dangerous functions to file
-dump_slice <file>	dump program slicing information to file
-dump_syms <file>	dump symbol tables to file
-dump_vars <file>	dump variable information to file
-dump_defs <file>	dump definitions to file

2. Phases of SUDS

2.1 Parsing (gram.y, lexer.l, parse.cpp)

The parser only works with preprocessed source code (.i file from gcc). This phase creates the AST and creates the symbol tables. Multiple files can be parsed with one invocation of SUDS creating a large AST that spans across files.

While almost all C programs can be parsed, the parser is not as mature as the parser in gcc or other compilers. In particular, it does not perform robust error checking or designed to give feedback on different types of errors. In addition, some legitimate C++ programs may not compile successfully. Section 8 describes the language restrictions.

The AST that is created cannot be directly used by the static analysis or instrumentation phases. The code must pass through the simplification phase first. However, it is possible to directly emit the parsed code using the "-base" option. This is useful if you want to modify the parser and test your changes.

The parsing and symbol table code is a modified version of the parser and symbol table found in cTool [1].

2.2 Create initializer functions (init.cpp)

This phase creates special functions – one for each file that is a placeholder for simplified code and instrumented code that is related to initializers of global variables (not part of any function). In addition, a special function, called suds_init, is created for the file that contains main that calls all of these initializer functions. Finally, a call to suds_init is added to the start of main.

2.3 Simplification (simplify.cpp)

The simplification phase converts the program into a simpler format. The output of this phase is a program that is equivalent to the original program. The simpler format allows subsequent analysis phases to be less complex and permits the insertion of instrumentation anywhere in the code. Some highlights of the simplification process:

· All right-hand side expressions of an assignment operator only have one operator. For example, the expression a + b – ( c * d) is broken down into multiple statements.

· All side-effects (such as '++') or short-circuited operators are removed and replaced by equivalent statements.

· All loop and control conditions are replaced by a single variable or constant.

· Address of operations involving arrays (such as '&a[5]') are replaced by equivalent addition operations (such as 'a + 5' in this case).

The entire simplified grammar is shown in Section 4. More implementation notes are found in Section 5.

2.4 Finalize AST (ast.cpp)

This phase completes the creation of the AST and other key data structures. It does the following:

1. Creates variable objects and attaches them to when they are used in expressions and statements.

2. Creates special statements that mark the end of a control statement.

3. Replaces function call expression with two statements: a call statement and a return from statement. This separates the functionality of making the call and the assignment of the return value after the call is complete.

4. Global variables (those declared outside of a function) are marked as such.

5. Extra “dummy” variables are created for each call to malloc (or other dynamic allocation functions). Dynamic memory is modeled by call site. All variables allocated at a particular call site are lumped together. Since dynamic memory could be used to create a structure containing an array, an array variable is created for each dynamic memory variable.

6. The simplified statements are numbered (unique id). This allows the user to specify statements as slicing criteria.

7. A set of local variables used in created in each function. During data flow analysis, a set of global variables is created.

2.5 CFG Creation (cfg.cpp)

The control flow graph is created by dividing each function into a set of basic blocks and computing the predecessor and successor sets for each basic block. This is done intraprocedurally. Function calls are treated as basic block ending points with the lone successor being the basic block immediately following the function call. Useless basic blocks are removed: any block that has no backwards path to the first basic block of the function.

2.6 Call Graph Creation (call.cpp)

To handle interprocedural control flow, a call graph is created. The call graph is not a separate structure, instead additional information is tracked with the following statements:

FunctionDef: A list of statements that call this function.
CallStmt: A list of functions that can be called from this statement.
ReturnStmt: A list of statements that could be returned to.
ReturnFromStmt: A list of return statements that could precede this statement.

The set of basic blocks, predecessors and successors in particular, remain unchanged.

Since creating the call graph requires knowledge of where function pointers point to, the call graph is created in tandem with pointer analysis.

2.7 Pointer Analysis (pointer.cpp)

The interprocedural pointer analysis developed by Hind et. al. [4] is used. The information is stored in a PointsToInfo class which maps variables (represented by their id number) to the variables it can point to.

The algorithm can either be run in a flow-sensitive or flow-insensitive manner via the command line option –ptr_flow_insens. In the flow sensitive version, the points-to information is stored for each basic block. Due to space issues, the points-to information is recomputed for each statement when necessary. The flow-insensitive algorithm stores all of the points-to relationships into one global instance of the PointsToInfo class.

2.8 Data Flow Analysis (data.cpp)

The data flow analysis phase is used to create these sets:

· defsout: The set of definitions generated by a statement or basic block.

· killed: The set of definitions no longer active after the statement or basic block.

· reaches: The set of active definitions at the beginning of a basic block or statement.

· used: The set of definitions that reach the statement and are used as input to the statement.

There are four steps to determining these sets:

1. Create the defsout set for each statement.

2. Create the killed set for each statement.

3. Create the defsout and killed sets for each basic block.

4. Create the set of reaching definitions for each basic block.

Due to memory space issues, the used sets are recreated for statements when necessary.

The analysis is done intraprocedurally. For each call site, live definitions are propagated from the caller to the callee function on a parameter by parameter basis. At the end of the function, definitions that refer to global variables are propagated to all return points. Other definitions are only propagated back to the caller if the definition refers to a variable that had at least one live definition prior to the call. Though the algorithm is not truly context-sensitive, this last rule restricts the flow of definitions from a call site to a completely different return point. Any definitions live in the caller just prior to the call are also considered live just after the call.

A summary of data flow analysis is found in Section 6.

2.9 Dangerous Operation Identification (danger.cpp)

This is not actually a phase. The code in danger.cpp implements the isDangerous member function for statements. This can be used either as a criterion in program slicing and/or to further focus the instrumentation.

Currently, the set of dangerous operations includes array references and pointer dereferences. This can easily be modified by controlling which uses are considered dangerous.

2.10 Program Slicing (slice.cpp)

This will identify statements that are part of a slice based on a criterion. There are five ways for specifying the slicing criterion on the command line:

-slice_all_danger includes all dangerous statements in the program

-slice_fn_any <func> includes all statements in the given function

-slice_fn_danger <func> includes all dangerous statements in the given function

-slice_stmt_any <stmt> includes the given statement only (all uses)

-slice_stmt_danger <stmt> includes the given statement only (dangerous uses only)

A backwards slicing algorithm is employed starting with the slicing criterion (the operation under investigation). All definitions that are used in the statement(s) in the criterion are added to the slice. Then, each statement is analyzed. If any definition generated by a statement is in the slice, then all definitions used by the statement are added to the slice. Indirect uses from control statements are also added to the slice by default: if any statement in a control statement is in the slice, then the control statement and its uses are added to the slice. The user can exclude indirect uses by use of a command-line switch (-slice_ignore_ctrl). The algorithm is interprocedural and context-insensitive. Once a definition is added to the slice, it remains in the slice for the duration of the algorithm. The algorithm continues to iterate until steady state (no definitions enter the slice).

Function calls are handled as follows:

· If the definition associated with the return value (in the ReturnFromStmt) is in the slice, then uses corresponding to any of the callee return statements are added to the slice. Both the ReturnFromStmt and ReturnStmt are added to the slice.

· Any variable that is pointed to by an argument in the function call or global variable will have a definition in defsout of the CallStmt. If any of these definitions are in the slice, definitions of the corresponding variables that reach the end of the function are added to the slice. This is done on a parameter by parameter basis.

· If an initial parameter definition is in the slice, then the corresponding function call argument is added to the slice. This is done on a parameter by parameter basis.

· If an initial definition of a global variable is in the slice, then any definitions of the global variable that reach the function call are added to the slice.

· For system calls, if any of the generated definitions of the CallStmt or ReturnFromStmt are in the slice, then all of the uses (arguments and the variables they point to) are added to the slice.

· The algorithm is context-insensitive but the method for adding functions calls is restricted, preventing propagation of the slice into areas of the code that are clearly not part of the slice. A function is only added to the slice if one of the following are true:

o The function contains a statement in the slicing criterion.

o The function can be reached on a backwards traversal from a function containing a statement in the slicing criterion back to main AND there is at least one use in the slice is active at the end of the function. This is tracked by the propToSlice set.

o The function is called from a function that meets one of the two above criteria AND the slice propagates into that function in some fashion (return value, global variables, output parameters, etc.)

After definitions and statements are marked in the slice, variables are marked in the slice if at least one definition is in the slice. Then, functions are marked in the slice if they have at least one statement in the slice.

2.11 Tainted Propagation (taint.cpp, taint-init.cpp)

This algorithm identifies which variables and definitions are tainted directly and indirectly. The initial set of tainted definitions and statements is specified in the file taint-init.cpp. Currently, variables that come from input are marked as tainted.

The tainted propagation algorithm works in reverse of the program slicing algorithm. It propagates tainted information from the uses to the definitions. For simple assignment statements (no arithmetic calculations), if any use is in a given state, all definitions are now updated to be in that state. Again, parameters are handled on an individual basis.

For statements involving arithmetic operations, the taint state is computer based on the operation and the state of the operands.

2.12 Instrumentation (instr.cpp)

The instrumentation phases traverses the AST and adds instrumentation code based on the instrumentation model. This phase is described in more detail in Section 3.

2.13 Output Printing (print.cpp)

The output phase traverses the simplified statements in program order and prints them to a file, including any instrumentation statements. The output file can be subsequently compiled by a C compiler.

3. Instrumentation

To add instrumentation, it is necessary to either use or create an instrumentation director. This section describes how to create an instrumentation director.

The first step to creating a model is to provide an implementation for the InstrDirector class listed located in instr.h. Each of the member functions refers to a different code construct or event in the program. A list of functions is provided in Section 3.1. In the corresponding implementation file, simply fill in the functions for the constructs that you want to add instrumentation. Leave the function empty for constructs that do not need instrumentation.

Pre-existing instances of instrumentation directors are available in the src/directors directory in SUDS. You can either use any of the existing instrumentation directors as a starting point or use instr-none.cpp which starts with all of the member functions as empty.

The instrumentation director interface has several data members that can be used within the instrumentation routine to determine what instrumentation is necessary and/or to pass into instrumentation routines as parameters.

bool inLhs; // true if processing the lhs of an assignment

bool isGlobal; // true if processing global variable

class Expr *currLhs; // current value of lhs if processing rhs

class Expr *currRhs; // current value of rhs if processing lhs

class Location *currLoc; // current location

class FunctionDef *currFunc; // current function

class Stmt *currStmt; // current statement

unsigned int uniqueId; // unique identifier

Each member function has one and/or two statement lists as pass-by-reference parameter(s). Instrumented code is to be added to this list using a helper function more later). Any instrumented statements added to the before list are inserted before the corresponding statement. Similarly, any statements added to the after list are inserted after the corresponding. Instrumentation is always added at the boundary of simplified statements. In functions that have both a before and after list as parameters, you can choose which list to add to (or both) based on the model you are trying to create. However, many constructs force you to either add instrumentation before or after the corresponding statement.

Most member functions contain a pointer to the expression, statement, or variable in question. This allows you to apply different instrumentation based on the type of an operation. For instance, an addition operation involving integers is likely to have different instrumentation that an addition operation involving a pointer and an integer. Expressions that can appear on the left-hand of an assignment have a Boolean parameter isLhs that is true if the expression appears on the left-hand side of an assignment expression and false if it appears on the right-hand side. Expressions that do not have this parameter cannot appear on the left-hand side after simplification.

Member functions relating to variable declarations, function parameters, and return values use an offset mechanism to distinguish between different parameters in the same parameter list and for individual fields with a variable / parameter / return value is a struct. The offset parameter into these member functions allow you to link up function call arguments to function parameters. For instance, instrumentCallExprArg with offset 3 can be paired up with instrumentBirthParm with offset 3.

Another unique feature with member functions that use the offset mechanism is that they use a string to store the expression opposed to using an instance of an expression class. The reason for this is two-fold: 1. Some of these events don't correspond to expressions (like death of a variable). 2. To avoid creating temporary expression classes for each member of a struct. Since the string does not convey type information, the type of the expression is passed separately. The function instrumentFieldCopy also employs strings in a similar fashion.

There are also two functions beginInstrumentation and endInstrumentation that are called before and after the instrumentation phase respectively. The beginInstrumentation function can be used to initialize or allocate any variables needed by the instrumentation model. The endInstrumentation function can be used to perform any post-processing such as gather statistics or performing sanity checks.

Once you determine which constructs the need instrumentation, you need to add code that inserts the instrumented code into the proper list. There are several helper functions in instr-call.cpp to assist in this process. Here is the most common flow which adds a function call to one of the instrumentation routines:

1. Initialize the call by calling initInstrCall. It takes one optional Boolean parameter which if set to true will automatically add parameters for the file name, line number, and unique identifier associated with this instrumentation call.

2. Add parameters to the function. The table below outlines the different routines are available depending on the type and use of parameter.

3. Complete the call by calling addInstrCall. This function takes two parameters: a list to add the instrumentation statement to (either the before or after list) and the name (string) of the instrumentation routine to call.

Routine	If x is ___	Then, add ___	Used For
addStringParm(string x);	foo	foo	Any expression in string form. Useful for constructing your own expressions by concatenating different stings.
addStringConstantParm(string x);	foo	“foo”	Any string used as a constant within the instrumentation routine.
addIntegerParm(int x);	23	23	Any integer constant.
addExprParm(Expr *x);	a[i]	a[i]	The value of an expression or variable.
addStringExprParm(Expr *x);	a[i]	“a[i]”	The expression as a string - useful for debugging and diagnostics.
addAddrOfExprParm(Expr *x);	a[i]	&(a[i])	The address of an expression (assuming it is addressable); useful for models that track variables by address.
addDeclParm(Decl *x);	a	a	The variable associated with a declaration.
addSizeParm(Type *x);	int	sizeof(int)	The size of a type.
addObjSizeParm(Type *x);	int *	sizeof(int)	The size of the type pointed to (or the size of the element for arrays).
addFileNameParm();	-	-	The current file name.
addLineNumParm();	-	-	The current line number.
addUniqueIdParm();	-	-	A unique integer that is incremented after each call to an instrumentation routine. Used to distinguish between multiple calls that occur on the same source code line.

3.1 Instrumentation Model Member Functions

*Begin/End Functions*		*before* *list*	*after* *list*	*isLhs* *parameter*	*offset* *parameter*
instrumentBeginProgram	Beginning of a program (start of main)	No	Yes	No	No
instrumentEndProgram	End of the program	Yes	No	No	No
instrumentBeginFunction	Beginning of a function	No	Yes	No	No
instrumentEndFunction	End of a function	Yes	No	No	No
instrumentBeginBlock	Beginning of a statement block '{'	No	Yes	No	No
instrumentEndBlock	End of a statement block '}'	Yes	No	No	No
*Declaration instrumentation functions*		*before* *list*	*after* *list*	*isLhs* *parameter*	*offset* *parameter*
instrumentBirthVariable	Declaration of a variable	No	Yes	No	Yes
instrumentDeathVariable	Lifetime of variable ends	Yes	No	No	Yes
*Parameter / return value instrumentation functions*		*before* *list*	*after* *list*	*isLhs* *parameter*	*offset* *parameter*
instrumentCallExprArg	Function call - called for each argument	Yes	No	No	Yes
instrumentBirthParm	Parameter declaration - called for each parameter	No	Yes	No	Yes
instrumentBirthMainParm	Parameter declaration in main - called for each parameter	No	Yes	No	Yes
instrumentDeathParm	Lifetime of parameter ends - called for each parameter	Yes	No	No	Yes
instrumentDeathMainParm	Lifetime of parameter in main ends - called for each parameter	Yes	No	No	Yes
instrumentReturnVal	A value is returned (callee side) - called for each data member if a struct is returned	Yes	No	No	Yes
instrumentReturnFromVal	A value is returned (caller side) - called for each data member if a struct is returned	No	Yes	No	Yes
*Stmt instrumentation functions*		*before* *list*	*after* *list*	*isLhs* *parameter*	*offset* *parameter*
instrumentExprStmt	expression statement	Yes	Yes	No	No
instrumentIfStmt	if statement	Yes	No	No	No
instrumentElseStmt	else statement	No	Yes	No	No
instrumentSwitchStmt	switch statement	Yes	No	No	No
instrumentForStmt	for loop	Yes	No	No	No
instrumentWhileStmt	while loop	Yes	No	No	No
instrumentDoStartStmt	start of a do-while loop	Yes	No	No	No
instrumentDoEndStmt	end of a do-while loop	No	Yes	No	No
instrumentGotoStmt	goto statement	Yes	No	No	No
instrumentBreakStmt	break statement	Yes	No	No	No
instrumentContinueStmt	continue statement	Yes	No	No	No
instrumentCallStmt	function call	Yes	No	No	No
instrumentSystemCall	function call to system function or any function where source code is not present	Yes	Yes	No	No
instrumentReturnStmt	return stmt	Yes	No	No	No
instrumentReturnFromStmt	returning from a function call	No	Yes	No	No
instrumentVaStartStmt	va_start statement	Yes	Yes	No	No
instrumentVaEndStmt	va_end statement	Yes	Yes	No	No
instrumentLabelStmt	label	No	Yes	No	No
*Expr instrumentation functions*		*before* *list*	*after* *list*	*isLhs* *parameter*	*offset* *parameter*

instrumentConstant	constant	Yes	Yes	No	No
instrumentStringConstant	string constant	Yes	Yes	No	No
instrumentRhsVar	lone variable expression	Yes	Yes	Yes	No
instrumentRhsUnaryExpr	unary expression	Yes	Yes	No	No
instrumentRhsBinaryExpr	binary expression	Yes	Yes	No	No
instrumentRhsRelExpr	relational expression	Yes	Yes	No	No
instrumentRhsCastExpr	cast expression	Yes	Yes	No	No
instrumentRhsSizeofExpr	sizeof expression	Yes	Yes	No	No
instrumentRhsAddrOfExpr	address of expression	Yes	Yes	No	No
instrumentRhsAddrArrowExpr	address of expression involving member selection via pointer	Yes	Yes	No	No
instrumentRhsDerefExpr	pointer dereference	Yes	Yes	Yes	No
instrumentRhsMemberExpr	member selection	Yes	Yes	Yes	No
instrumentRhsPtrMemberExpr	member selection via pointer	Yes	Yes	Yes	No
instrumentRhsArrayExpr	array reference	Yes	Yes	Yes	No
instrumentRhsVaArgExpr	va_arg expression	Yes	Yes	No	No
instrumentRhsBuiltinExpr	built-in expression to compiler	Yes	Yes	No	No
instrumentCastBirth	reassignment of a variable to a new type – called for each data member if new type is a struct	No	Yes	No	No
instrumentFieldCopy	field assignment when assigning one struct to another - called for each data member of the struct	Yes	Yes	No	No

4. Simplified C Grammar

This section describes the grammar of a program that is the output of the simplification process. Simplifying the program significantly reduces the complexity of future data analysis and instrumentation phases. The grammar is based on formats used in GCC [2] and Hendren et. al. [3].

4.1 Statements

stmt

: comp_stmt

| assign_expr ';'

| IF '(' bool_val ')' comp_stmt

| IF '(' bool_val ')' comp_stmt ELSE comp_stmt

| SWITCH '(' bool_val ')' comp_stmt

| FOR '(' ',' bool_val ',' ')' comp_stmt

| WHILE '(' bool_val ')' comp_stmt

| DO comp_stmt WHILE '(' bool_val ')'

| BREAK ';'

| CONTINUE ';'

| GOTO IDENT ';'

| IDENT ':' /* Label – treated as separate statement */

| CASE INTEGER_CST ':'

| function_call ';'

| RETURN ';'

| RETURN val ';'

| VA_START '(' IDENT ',' IDENT ')' ';'

| VA_END '(' IDENT ')' ';'

| decl_stmt ';'

comp_stmt

: '{' stmt_list '}'

| '{' '}'

stmt_list

: stmt_list stmt

| stmt

assign_expr

: lhs '=' rhs

function_call

: IDENT '(' arg_list ')'

| IDENT '(' ')'

| lhs = IDENT '(' arg_list ')'

| lhs = IDENT '(' ')'

arg_list

: arg_list ',' val

| val

Notes:

· Control constructs are simplified by having a test condition of a single variable or constant. This variable or constant must be a Boolean (integer) type. Pointers are replaced by comparisons to 0.

· After simplification, only two types of expression statements are allowed: assignments and function calls (which can optionally have an assignment). Function calls can only have argument lists that consist of a single variable or constant.

· Scopes have been added (using ‘{‘ and ‘}’) for loop bodies, then clauses, else clauses, and switch bodies. This allows for statements and declarations to be added at any point in the program.

4.2 Expressions

lhs

: IDENT

| deref_expr

| member_expr

| ptr_member_expr

| array_ref

| ind_array_ref

rhs

: constant

| STRING_CONSTANT

| IDENT

| unary_expr

| binary_expr

| rel_expr

| cast_expr

| sizeof_expr

| addrof_expr

| addr_arrow_expr

| deref_expr

| member_expr

| ptr_member_expr

| array_expr

| va_arg_expr

constant: INTEGER_CST | CHAR_CST | FLOAT_CST

unary_expr: UNARY_OP val

binary_expr: val BINARY_OP val

rel_expr: val REL_OP val

cast_expr: '(' type ')' val

sizeof_expr: SIZEOF '(' IDENT ')' | SIZEOF '(' type ')'

addrof_expr: '&' IDENT | '&' member_expr

addr_arrow_expr: '&' ptr_member_expr

deref_expr: '*' IDENT

member_expr: IDENT '.' IDENT

ptr_member_expr: IDENT "->" IDENT

array_expr: array_expr '[' val ']' | IDENT '[' val ']'

va_arg_expr: VA_ARG '(' IDENT ',' type ')'

val: id | constant

bool_val: id | constant /* must be integer type */

Notes:

· Expressions that do nothing (such as ‘a+b;’) are eliminated during the simplification process.

· The following operators are converted into a functionally equivalent expression: ‘,’ , ‘++’, ‘--’, ‘->’, ‘?:’. The gcc extension of statement expressions are supported by the parser but replaced during the simplification process.

· The short-circuited operators ‘&&’ and ‘||’ are converted into if-then-else statements.

· Temporary variables for integers and floats are all created with type long and double respectively. Per the C standard, values of temporary expressions can use larger types of the same family.

· For variables declared arrays, array reference may include as many dimensions are allowed as needed. However, if an array reference uses a pointer variables, only one level of indirection (set of braces []) is permitted.

4.3 Declarations

Declarations are handled in the same manner as C but initializers are not permitted. Initializers are created as separated statements after the declaration. For global variables, the simplified code is placed into special initializer functions – one for each file in the program.

To ensure compatibility, initializers are permitted and remain on local static variables and variables declared as constants. However, the remaining analyses ignore these initializers.

5. Simplification

To determine when statements and expressions are properly simplified, the enumeration smplMode defines the following five constants. Here is what each constant is used for:

SM_expr: expression of an expression statement

SM_ rhs: right hand side of an assignment expression

SM_lhs: left hand side of an assignment expression

SM_addr: operand for address-of operator (&)

SM_boolean: condition for if/else, switch, and loops – must be a lone integer variable or constant

SM_var: anywhere a lone variable is needed, specifically:

operands for dereference (*), member (.), and indirect member operations (->)

pointer for indirect array references

function for function call (a variable is created for each function)

replacement expression when simplifying increment (++) or decrement (--)

SM_var_const: anywhere a lone variable or constant is needed, specifically:

condition for if/else, switch, and loops

return value

operands for arithmetic, relational, and cast operations

operands for va_start, va_arg, and va_end

index for array references

each argument in a function call

SM_var_array: anywhere a lone variable or array reference is needed, specifically:

array for array references

A table describing how each expression responds to these constants is on the next page.

There are two key functions that drive the simplification:

simplifyBlock: This function simplifies all statements in a block. It is responsible for making sure the additional statements produced by the simplification process are inserted properly. At the end of the phase, a loop will traverse through all statements removing any statements that have no side effects whatsoever (no assignments or control).

simplifyExpr: This function is the starting point for all expressions. It will call a helper function based on the expression type. If the expression is not simplified enough, it is replaced with a temporary variable or a dereference if an lvalue is needed.

Expression	SM_expr	SM_rhs	SM_lhs	SM_addr	SM_boolean	SM_var	SM_var_const	SM_var_array
Constant (string)	Yes	Yes	No	No	No	No	No	No
Constant (other)	Yes	Yes	No	No	1	No	Yes	Yes
Variable (lone)	Yes	Yes	Yes	Yes	1	Yes	Yes	Yes
Unary Operators	2	2	No	No	No	No	No	No
Arithmetic Operators	2	2	No	No	No	No	No	No
Relational Operators	2	2	No	No	No	No	No	No
Casts	2	2	No	No	No	No	No	No
va_arg Expression	2	2	No	No	No	No	No	No
Sizeof Operator	Yes	Yes	No	No	No	No	No	No
Addr Of Operator (&)	3	3	No	No	No	No	No	No
Addr Arrow (&a->b)	4	4	No	No	No	No	No	No
Incr (++) / Decr (--)	No	No	No	No	No	No	No	No
And (&&) / Or (\|\|)	No	No	No	No	No	No	No	No
Comma (,)	No	No	No	No	No	No	No	No
Conditional (?:)	No	No	No	No	No	No	No	No
Statement Expression	No	No	No	No	No	No	No	No
Dereference (*)	4	4	4	No	No	No	No	No
Member (.)	4	4	4	4	No	No	No	No
Arrow (->)	4	4	4	No	No	No	No	No
Array Reference	5	5	5	No	No	No	No	6
Assign	7	No	No	No	No	No	No	No
Assign w/Arith	No	No	No	No	No	No	No	No
Function Call	8	8	No	No	No	No	No	No

1 Yes, if operand has integer or Boolean type.

2 Yes, if operand(s) are SM_var_const.

3 Yes, if operand is SM_addr.

4 Yes, if operands are SM_var.

5 Yes, if array (or array pointer) is SM_var and array index is SM_var_const.

6 Yes, if array reference is for an array variable (not a pointer).

7 Yes, if left hand side is SM_lhs and right hand side is SM_rhs.

8 Yes, if function is SM_var and each argument is SM_var_const.

6. Data Flow Analysis

6.1 Statements

BeginBlockStmt	defsout:	A definition is created for each parameter.
	killed:	none
	used:	none
ExprStmt	defsout:	Based on left-hand side.
	killed:	Based on left-hand side.
	used:	All definitions that reach the statement and are used as input on either side of the assignment.
IfStmt, SwitchStmt, ForStmt, WhileStmt, DoWhileStmt	defsout:	none
	killed:	none
	used:	Active definitions corresponding to the control condition.
CallStmt	defsout:	A definition is created for any variable that could be pointed to by any argument using any level of indirection.
	killed:	none
	used:	All definitions that reach the statement and correspond to a variable used as an argument. For system calls, variables pointed to by an argument (using any level of indirection) are also considered used.
ReturnStmt	defsout:	none
	killed:	none
	used:	All active definitions of the return value (if statement has return value).
ReturnFromStmt	defsout:	Based on left-hand side.
	killed:	Based on left-hand side.
	used:	Any variable used on the left-hand side as an input.
VaStartStmt, VaEndStmt	defsout:	A definition is created for the list variable.
	killed:	All other definitions associated with the list variable.
	used:	none
all other statements	defsout:	none
	killed:	none
	used:	none

6.2 Left Hand Side Expressions

VarExpr a	defsout:	A definition for the variable.
	killed:	All definitions corresponding to the variable except for the definition generated this statement.
	used:	none
DerefExpr *a	defsout:	A definition for every variable that can be pointed by the variable.
	killed:	none
	used:	Reaching definitions for the pointer used to make the dereference.
MemberExpr a.b	defsout:	A definition for the struct variable.
	killed:	none
	used:	none
PtrMemberExpr a->b	defsout:	A definition for every variable that can be pointed by the variable.
	killed:	none
	used:	Reaching definitions for the pointer used to make the dereference.
ArrayExpr a[b]	defsout:	A definition for the any variable that can be pointed by the array/pointer.
	killed:	none
	used:	Reaching definitions for variables that represent the array/pointer and index.

7. Data Structures

This section describes the key data structures in SUDS. Here is a summary of the AST components:

A project encompasses all the source code. There is only one project variable when SUDS is running. A project contains a list of translation units.

A translation unit refers to a single source code file. Each file contains a list of statements. The list of statements is restricted to declarations (whether it be types, variables, or functions).

A function definition is a special type of statement that refers to a function definition (not a function prototype). In addition to information about the function, a function contains a list of statements that comprise the function body.

A statement refers to a C statement. For simplicity, there are also statements that refer to C constructs (such as case label) that are not necessarily considered statements in the C programming language. Statements reference other statements, expressions, functions, and variables.

An expression refers to a C expression. Expressions can refer to other expressions and variables.

A variable refers to a declared variable in C. It hold various attributes about the variable and a pointer to its type.

A declaration is a special statement that declares either a variable or a type.

A type refers to a type in C.

Other data structures in SUDS are listed in Section 7.9.

One important note about the data structures. SUDS violates many principles of good object-oriented design. For instance, most, if not all of the data members are public. The program was designed in a more C-like procedural fashion using inheritance as syntactic sugar. From my point of view it was better than using alternatives such as unions, void pointers, or large switch statements. At some point in the future, I may consider a redesign of SUDS.

7.1 Project

The Project class (project.h) encompasses all the source code for the program. There is only one project variable in SUDS. It stores a list of translation units – one for each source code file. It also stores a pointer to a special initializer function that is added to the beginning of main.

Most tasks in SUDS that requires traversing the source code will start at the project level. The project member functions merely call the same function for each of its translation units.

Data members:

vector<class TransUnit*> units; // list of translation units

class FunctionDef *mainInit; // main initializer function

7.2 Translation Unit

A TransUnit (transunit.h) is a class that represents a single source code file in a project. A TransUnit stores a list of statements. These statements can only be declarations of types, variables, or functions. Code within a function is associated with that function. It also stores a pointer to the initialization function associated with this file. Each instrumented source code file contains a special initialization function that is called when the program starts. This function is used by the simplification and instrumentation phases as a place to add code when processing global variable declarations.

As with the project, the member functions are largely used for tasks that traverse the source code.

Data members:

class Stmt *head; // The first statement in the list.

class Stmt *tail; // The last statement in the list.

const string filename; // The file we parsed this from.

class FunctionDef *initFunc; // The initializer function.

7.3 Function Definition

The FunctionDef class (func.h) refers to a function definition in C. It is derived from the Stmt base class (described later in this section) because it is considered a top-level variable declaration statement for the function. Since a function definition cannot appear within a function, it really does not behave like the other statement classes that are defined later.

The function really has two distinct parts: the declaration and the function body. The declaration stores all of the information about the function such as the parameters and return type (in reality, most of this is stored in a function type variable within the declaration). The function body is stored as a linked list of statements, starting at head. The other data members are used to store information from various phases in SUDS.

Data members:

class Decl *decl; // function declaration

string name; // name of function

class Stmt *head; // first statement in function

class Stmt *tail; // last statement in function

int startBbl; // id of first bbl

int endBbl; // id of last bbl

bool usefulFunc; // set if function is reachable from main

set<int> returnBbls; // set of bbls that end the function

map<string, class Stmt *> labelTable; // mapping of labels to stmts

class PointsToInfo ptiEntry; // points to information at beginning

class PointsToInfo ptiExit; // points to information at end

set<class FunctionDef *> funcsCalled; // list of functions called

set<int> lvars; // list of local variables declared

set<int> funcDefs; // defs in the function

dynamic_bitset<> endReaches; // set of defs live at end of function

vector< set<int> > parmDefs; // defs for each parameter

set<class Stmt *> callerStmts; // set of statements that call this function

set<class Stmt *> propToSlice; // set of call statements to propagate to

bool inSlice; // true if function is in slice

7.4 Statement

A Stmt (stmt.h) refers to a C statement. To make analysis easier, there are some additional statements that refer to C constructs not normally considered as statements.

Each statement has a code that represents the type of statement and a unique id. Statements are connected together using a doubly linked list. Functions in stmtlist.cpp can be used to manipulate linked lists of statements.

Data members:

StmtCode code; // kind of statement

Location location; // location in source code

unsigned long id; // unique statement id

bool isInitStmt; // true if stmt used to initialize a variable

bool bblStart; // true if stmt is a bblock starting point

bool bblEnd; // true if stmt is a bblock ending point

class Bblock *bbl; // pointer to basic block node that statement resides in

class FunctionDef *fn; // function in which stmt resides

class PointsToInfo pti; // points-to information before statement

set<int> defsout; // defs created by this statement

set<int> killed; // defs killed by this statement

bool inSlice; // set if in slice

bool sliceProcessed; // set if variable has been processed during slicing

bool isTainted; // set if at least one def is tainted

Stmt *prev; // For making a list of statements.

Stmt *next; // For making a list of statements.

There are many derived classes of statements described in the following table:

*Class Name*	*Description*
BeginBlockStmt	The beginning of a block of code starting with a '{'. Contains a list of declaration statements to any variables declared in this score.
EndBlockStmt	The end of a block of code ending with a '}'.
ExprStmt	An expression statement, typically assigns a value to a variable.
IfStmt	if statement
ElseStmt	The else part of an if statement.
SwitchStmt	switch statement
ForStmt	for loop
WhileStmt	while loop
DoStartStmt	Start of a do-while loop (the word "do")
DoEndStmt	End of a do-while loop (the test condition)
EndControlStmt	Dummy statement that marks the end of a control statement (such as if) or loop.
BreakStmt	break statement
ContinueStmt	continue statement
GotoStmt	goto statement
LabelStmt	A label in C. This could be a case label, default label, or a label that you can goto to.
CallStmt	The first of two instructions used to make a function call. This class stores the function(s) being called and the arguments used to make the function call.
ReturnStmt	return statement
ReturnFromStmt	The second of two instructions used to make a function call. This calls keeps track where the return value, if any, is stored when the function returns.
NullStmt	An empty statement
VaStartStmt	Refers to a va_start call, used in functions with variable-length parameter lists.
VaEndStmt	Refers to a va_end call, used in functions with variable-length parameter lists.
DeclStmt	Declaration of a variable or type.
FunctionDef	Function definition.
InstrumentedStmt	A statement added by the instrumentation engine. The instrumented statement is stored in a string.

7.5 Expression

An Expr (expr.h) refers to a C expression. All expressions have a code that designates the kind of expression and have a data type which can be used by the instrumentation engine.

Data members:

ExprCode code; // variety of expression

Location location; // location in program

const class Type *type; // type of expr (such as int)

Expr *next; // linked list connector

There are many derived classes of statements described in the following table:

*Class Name*	*Example*	*Description*
Constant	5	The beginning of a block of code starting with a '{'. Contains a list of declaration statements to any variables declared in this score.
VarExpr	x	A single variable
UnaryExpr	-x	A unary expression (see list of operators below)
BinaryExpr	x + y	A binary expression (see list of operators below)
RelExpr	x < y	A relational expression (see list of operators below)
CastExpr	(int) x	Cast to a different type
SizeofExpr	sizeof(x)	sizeof expression
AddrOfExpr	&x	address of expression
AddrArrowExpr	&(x->y)	address of expression involving a member selection deference
DerefExpr	*x	pointer dereference
MemberExpr	x.y	member selection
PtrMemberExpr	x->y	member selection via pointer dereference
ArrayExpr	x[y]	array reference
VaArgExpr		va_arg expression, used in variable-length argument lists
BuiltinExpr		builtin expression to the compiler
AssignExpr	x = y	assignment expression, also includes variants such +=.
CallExpr		function call
TrinaryExpr	x ? y : z	trinary logical choice expression
StmtExpr		gcc extension that allows a statement to be embedded into an expression

The last three expressions CallExpr, TrinaryExpr, and StmtExpr are not present in the AST after simplification.

Here is a list of operators:

enum UnaryOp		enum BinaryOp
UO_Plus	+	BO_Plus	+
UO_Minus	-	BO_Minus	-
UO_BitNot	~	BO_Mult	*
UO_Not	!	BO_Div	/
UO_PreInc	++	BO_Mod	%
UO_PreDec	--	BO_Shl	<<
UO_PostInc	++	BO_Shr	>>
UO_PostDec	--	BO_BitAnd	&
		BO_BitXor	^
enum AssignOp		BO_BitOr	\|
AO_Equal	=	BO_And	&&
AO_PlusEql	+=	BO_Or	\|\|
AO_MinusEql	-=	BO_Comma	,
AO_MultEql	*=
AO_DivEql	/=	enum RelOp
AO_ModEql	%=	RO_Equal	==
AO_ShlEql	<<=	RO_NotEqual	!=
AO_ShrEql	>>=	RO_Less	<
AO_BitAndEql	&=	RO_LessEql	<=
AO_BitXorEql	^=	RO_Grtr	>
AO_BitOrEql	\|=	RO_GrtrEql	>=

The increment, decrement, (logical) and, (logical) or, and comma operators are not present in the AST after simplification. AO_Equal (=) is the only variant of the AssignExpr that is permitted after simplification. Most of the analysis takes advantage of this fact.

7.6 Variable

A Var (var.h) represents a variable. Each variable is assigned a unique identifier that is used in many of the analyses. The Var class stores a pointer to the variable's declaration, its type, and its entry in the symbol table.

Data members:

int id; // unique id

string name; // variable name

class Decl *decl; // variable declaration

class Type *type; // variable type (from decl)

class SymEntry *st_entry; // symbol table entry

set<int> def_set; // list of definitions of the variable

bool isGlobal; // set if not declared in function

bool isHeap; // set if refers to heap variable

bool inSlice; // set if variable is in slice

bool isTainted; // set if tainted

7.7 Declaration

A Decl (decl.h) represents a variables declaration and is contained within a DeclStmt. It contains information about its name, type, and various properties. Declarations are not used during the analysis portion of SUDS – the Var class is used to keep track of the various states of variables.

Data members:

StorageCode storage; // static or typedef or none

class Symbol *name; // symbol being declared

class Type *type; // type of the declaration

class GccAttrib *attrib; // optional gcc attribute

class Expr *initializer; // initial value

class Var *var; // pointer to variable object

class Decl *next; // For linking into lists

7.8 Type

A Type (type.h) refers to a data type in C. The basic type (such as int, array, pointer) is represented by a code (see list below). Any storage indicator and/or type qualifiers are also stored (see list below). Many types use a subtype (pointers, arrays, and functions to name a few). This is also captured in the base type class. Additional types store additional information.

For user-defined types created using typedef, the type class stores the basic type information along with the type name.

Names for structs, unions, and enums are called tags. The derived classes for these types store the tag names.

Data members:

TypeCode code; // basic type

StorageCode storage; // storage of type

TypeQual qualifier; // type qualifier(s)

class GccAttrib *attrib; // gcc attribute (optional)

class Symbol *typeName; // NULL if not typedef

class Type *subType; // subtype for pts, arrays, funcs, etc.

bool useSubType; // set if type uses subtype

Here is a table of the derived type classes:

*Class Name*	*Description*
NoSpecType	No specified type – used to represent data on the heap
VoidType	void
IntType	integers, represents all variants and sizes of integers
FloatType	floating point numbers, represents all sizes of floating point numbers
PointerType	pointers
ArrayType	arrays
BitFieldType	bit fields
FunctionType	functions
StructType	structs and unions
EnumType	enumerated types
EllipsisType	The type of the "…" parameter used in variable length parameter lists
VaListType	The type of a va_list, used in variable length argument lists

Here is are lists of storage code and type qualifiers:

StorageCode		TypeQual
ST_None		TQ_None
ST_Auto		TQ_Const
ST_Extern		TQ_Volatile
ST_Register		TQ_Inline
ST_Static
ST_Typedef

7.9 Other Data Structures

Here is a list of other data structures used in SUDS:

*Class Name*	*File Name*	*Description*
Bblock	bblock.h	basic block
Def	def.h	definition for data analysis
GccAttrib	gccattrib.h	gcc attribute extension
InstrDirector	instr.h	instrumentation director interface
Label	label.h	label
Location	parse.h	location in code
ParseEnv	parse.h	parsing environment
ParseEnvCtxt	parse.h	parsing environment context
ParseCtxt	parse.h	parsing context
PointsToInfo	pointer.h	points-to information
Symbol	symbol.h	symbol from lexer
SymEntry	symbol.h	entry in symbol table
ScopeTbl	symbol.h	stores symbol table entries at a particular scope
SymTbl	symbol.h	symbol table
TopSymTbl	symbol.h	top level symbol table that contains three symbol tables (labels, tags, all others); there is only one top level symbol table (global variable symTables)

8. Language Restrictions

This section outlines restrictions in the C language. You must either alter your program or fix these issues in SUDS.

Problem:	Cannot use default type (int) for variables (such as "static x = 3;")
Detected:	Yes - parser error (unconfirmed)
Workaround:	Add the proper type to the declaration
Possible fix:	Add 'int' automatically if no type appears
Likely to fix:	No (doesn't occur often enough)

Problem:	Old style function declarations are unsupported
Detected:	Yes - parser error
Workaround:	Convert function to standard style function declarations. Run CIL [5] ahead on source code file ahead of time.
Possible fix:	Alter grammar and symbol processing to work with old style functions; argument ordering is problematic.
Likely to fix:	No (too many obstacles to overcome)

Problem:	Default declarations for functions unsupported
Detected:	Yes - parser error (unconfirmed)
Workaround:	Add the proper type to the declaration
Possible fix:	Add 'int' automatically if no type appears
Likely to fix:	No (doesn't occur often enough)

Problem:	Parsing of initializer lists outside of initializers
Detected:	Yes - parser error (unconfirmed)
Workaround:	Rewrite code to not use an initializer list
Possible fix:	Leverage simplification of initializer lists for declarations
Likely to fix:	No (doesn't occur often enough)

Problem:	Parsing problems with variables declared with unusual orders of storage classes, qualifiers, etc. (see K+R for proper rules)
Detected:	Yes - parser error
Workaround:	In most cases, it should be possible to rewrite the code.
Possible fix:	Redesign parsing of storage classes and qualifiers
Likely to fix:	No (doesn't occur often enough)

Problem:	Possible parsing problems with gcc attributes
Detected:	Yes - parser error
Workaround:	Eliminate or reposition the attribute
Possible fix:	The main problem here is understanding how to parse the gcc attributes. The documentation is confusing and "subject to change"
Likely to fix:	Maybe (if there is better information)

Problem:	Multiple struct variables declared on the same statement that also define the struct are not allowed
Detected:	Not detected by SUDS, code emitted from SUDS will not compile
Workaround:	Rewrite using non-generic structs
Possible fix:	Alter the type of all but the first declaration such that fields are not printed out.
Likely to fix:	No (doesn't occur often enough)

Problem:	Generic structs declared in structs that are included in multiple files are not allowed.
Detected:	Not detected by SUDS, code emitted from SUDS will not compile
Workaround:	Rewrite using non-generic structs if possible, may need to alter system include files.
Possible fix:	Change symbol table management.
Likely to fix:	Maybe (symbol table is fragile)

Problem:	Const / static variables can cause problems because initialization is separated from declaration
Detected:	Not detected by SUDS, code emitted from SUDS will not compile
Workaround:	Rewrite not using const variables
Possible fix:	Two possibilities: (1) Redo remaining phases to work with declarations that have initializers. (2) Automatically remove const modifier (does not work with static).
Likely to fix:	Maybe (occurs often, second fix is not hard but could cause other problems)

Problem:	Complex expressions in address-of expressions can cause SUDS to fail
Detected:	Yes – simplification error
Workaround:	In most cases, it should be possible to rewrite the code
Possible fix:
Likely to fix:	No (doesn't occur often enough, no easy fix)

Problem:	Arrays declared static will not have sizes computed
Detected:	?
Workaround:	In most cases, it should be possible to remove the static designation.
Possible fix:	Compute the size despite the static designation
Likely to fix:	Yes

Problem:	Cast operations are separated from the expressions creating temporary variables of invalid types.
Detected:	Not detected by SUDS, code emitted from SUDS will produce warnings and/or errors when compiled.
Workaround:	Most warnings can be ignored, may or may not be possible to rewrite the code to avoid any warnings or errors.
Possible fix:	Could allow each expression to optionally have a cast instead of breaking it into two pieces
Likely to fix:	Not likely (non-ignorable errors don't occur often enough)

Problem:	Static local variables with initializers are problematic for instrumentation
Detected:	No
Workaround:	Try to rewrite code if possible
Possible fix:	Instrumentation would have to support non-simplified initializers
Likely to fix:	No (too much work)

Problem:	Variables declared using 'register' will lose 'register' designation since instrumentation models often take the addresses of variables.
Detected:	Losing the register designation should not cause a functionality problem.
Workaround:	not possible to workaround
Possible fix:	Add a command line switch to allow users to indicate how the register designation should be handled.
Likely to fix:	Maybe (not that hard but not that useful)

Problem:	Arrays declared extern with unknown sizes and no initializer
Detected:	No
Workaround:	Rewrite using a 'char *'.
Possible fix:	Use the size when it is declared without extern
Likely to fix:	No (requires modification to the fragile symbol table)

Problem:	Variable or field with same name as typedef name
Detected:	Yes – parser error
Workaround:	Rename one or other
Possible fix:	Have a separate table for typedef names. Another issue is to more strictly regulate the possible Type field
Likely to fix:	No (requires modification to the fragile symbol table)

9. Other Known Problems

Problem:	Investigate the scope table as only struct/enum definitions go into external scope. Should others go to external scope or should the global struct/enum definitions go into level 2?
Result:	No problem results from this, this symbol table dump appears off.
Possible fix:	Redo symbol table.
Likely to fix:	No (unless problem occurs)

Problem:	Loss of quantifiers (const/inline/volatile) for temporary variables – needed because temp variables are often not constant and cannot be declared inline
Result:	Some code emitted from SUDS may not compile.
Possible fix:	?
Likely to fix:	No (can't think of a reasonable fix)

Problem:	May not model all system functions that return pointers to arbitrary arrays.
Result:	Static analysis may miss some pointer relationships causing the data flow analysis to be too aggressive.
Possible fix:	Update the list of system functions that return pointers to arbitrary arrays.
Likely to fix:	Case by case basis when detected

Problem:	System calls that alter the points-to information of its parameters.
Result:	Static analysis may miss some pointer relationships causing the data flow analysis to be too aggressive.
Possible fix:	Make a concerted effort to recognize these functions and alter the points-to information.
Likely to fix:	Maybe

Problem:	List of input functions is incomplete
Result:	Missed opportunity to catch bugs
Possible fix:	Add missing functions to list.
Likely to fix:	Fix when missing input function is discovered

Problem:	Instrumented statements do not have a location.
Result:	Some output messages are awkward
Possible fix:	Have the instrumented statement grab a location (not always easy)
Likely to fix:	Maybe

Problem:	Handling of variable-arg function calls in instrumentation engine (including va_start and va_end)
Result:	Instrumentation capabilities for variable-arg function calls is limited.
Possible fix:	Depends on what kind of instrumentation, if any, would be added in these cases.
Likely to fix:	Not likely

Problem:	Processing of unary plus operator
Result:	?
Possible fix:	Convert unary plus to varExpr
Likely to fix:	No (doesn't occur often enough)

Problem:	derefUses is a deviation from the current strategy of recomputing the uses when necessary.
Result:	Could be a performance and space issue.
Possible fix:	Require recomputing the deref uses along with other uses.
Likely to fix:	No (unless performance and space become an issue)

Problem:	More precise type for array constants.
Result:	Missed opportunity to introduce instrumentation because of missing type.
Possible fix:	Create the array type for the constant.
Likely to fix:	No (doesn't occur often enough)

Problem:	More precise types for binary expressions (size, unsigned, etc.)
Result:	Missed opportunity to introduce instrumentation because of missing type. Potential issues for instrumentation relying on the precise size of variables.
Possible fix:	Redo computation of binary expressions.
Likely to fix:	No (unless it becomes a problem; may revisit if type system is overhauled)

Problem:	Sizes of variables not accurately tracked (esp. large constants)
Result:	Potential issues for instrumentation relying on the precise size of variables.
Possible fix:	Fix SUDS to strictly follow the rules of converting types in C.
Likely to fix:	No (unless it becomes a problem; may revisit if type system is overhauled)

Problem:	Types organization is messy (needs overhaul)
Result:	Hard to understand and modify code related to types.
Possible fix:	Overhaul type system
Likely to fix:	No (unless SUDS is redesigned)

Problem:	Printing of declarations is messy.
Result:	Hard to understand and modify code related to printing declarations.
Possible fix:	Redesign code that prints declarations
Likely to fix:	No (unless SUDS is redesigned)

Problem:	Duplication of statements: dup seems broken for begin block and call stmt
Result:	No known problems at the problem
Possible fix:	Avoid using dup but provide copy list functions.
Likely to fix:	No (unless it becomes a problem; would revisit in a redesign of SUDS)

Problem:	Memory management: constructors, destructors, duplicators
Result:	Memory is wasted, especially for types
Possible fix:	Redo memory management
Likely to fix:	No (unless it becomes a problem; would revisit in a redesign of SUDS)

Problem:	Get strange messages (such as warn_unused_result) when using include files and libraries from latest Linux release.
Result:	Get strange messages (such as warn_unused_result) when running SUDS.
Possible fix:	Need to understand what is causing the messages
Likely to fix:	Maybe (if I can figure out why this is happening)

9.1 Enhancements

Enhancement:	Embed the preprocessor command inside suds
Likely to add:	Maybe

Enhancement:	Add routine to compare types for use in sanity or other checks.
Likely to add:	No (unless SUDS is redesigned)

References

[1] cTool. http://sourceforge.net/projects/ctool/

[2] Free Software Foundation. SSA for Trees. http://www.gnu.org/software/gcc/projects/tree-ssa

[3] L. Hendren, C. Donawa, M. Emami, G. Gao, Justiani, and B. Sridharan. Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations. Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, August 1992.

[4] M. Hind, M. Burke, P. Carini, and J. Choi. Interprocedural Pointer Alias Analysis. ACM Transactions on Programming Languages and Systems. July 1999.

[5] G. Necula, S. McPeak, S. P. Rahul, W.Weimer. CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. International Conference on Compiler Construction, Apr. 2002