SUDS User Guide
Eric Larson
elarson@seattleu.edu
Seattle University
Table of Contents
PART I: Using SUDS
PART II: Internals of SUDS
PART III: Issues
To install SUDS:
1. Switch to the top level directory of suds.
2. Type: configure
3. Type: make
Notes:
- SUDS uses the dynamic bitset library from boost. This dependency is not checked by the configure script. You may need to install this library which I believe is included in most Linux distributions.
- Running 'make install' merely copies the executables into the bin directory provided in the distribution.
When you make SUDS, you will notice that seven executables are created. Each is built using a different instrumentation model:
suds-inref Input reference checker
suds-strchk String checker
suds-array Array checker
suds-arith Input arithmetic checker
suds-null Null dereference checker
suds-trace Traces function calls and returns
suds-none No instrumentation model
SUDS only works with preprocessed source code files (.i). Initially you will need to create the .i files using your compiler. (For gcc, use the –E switch).
The most basic command line for SUDS is this:
suds-null sample.i
This will create an instrumented file sample.suds.c.
You can control the suffix of the file using the suffix switch like this:
suds-null –suffix null sample.i
This will create the instrumented file sample.null.c.
Since SUDS can work on the whole program, you can specify multiple files in a separate list file. Each line of the file contains the name of a file without the .c or .i suffix. Then use the –f switch on the command line to specify the name of the file like this:
suds-null –f file_list.txt
This command will create a .suds.c file for every file in the list.
Once an instrumented file is created, it can be compiled using gcc or another C compiler. During the linking process it is necessary to link in the proper model that contains the instrumentation code. The models are located in the models directory:
libinref.a Input reference checker
libstrchk.a String checker
libarray.a Array checker
libarith.a Input arithmetic checker
libnull.a Null dereference checker
libtrace.a Traces function calls and returns
Here is an overview of the process:
Once the instrumented executable is created, run the program normally. The only difference is that the instrumented executable will print additional output related to the instrumentation model.
To run the static analysis phases of SUDS, you must use the –dfa switch to run the basic data flow analysis options. Then you specify if you want to run the –taint analysis or one of the five program slicing modes (see section 2.10 for more information). This command line runs taint analysis and slices based on dangerous statements in the function foo:
suds-inref –dfa –taint –slice_fn_danger foo sample.i
Note: By default, the definitions of tainted and dangerous are tied to the input reference model. Using these modes for other models will have no effect on the instrumented code. However, it is possible to modify SUDS to change these definitions when creating your own models.
-help, -h |
show command line options |
-version, -V |
show the version number |
-file <file>, -f <file> |
file that contains a list of files to process |
-suffix <string> |
suffix of the output file |
-base |
emit the code as parsed (no simplification) |
-simple |
emit the code and exit after simplification |
-dfa |
run data flow analyses |
-taint |
run tainted propagation algorithm |
-no_output |
do not create output file |
-slice_all_danger |
run program slicing (all dangerous statements) |
-slice_fn_any <func> |
run program slicing (all statements in function) |
-slice_fn_danger <func> |
run program slicing (all dangerous statements in function) |
-slice_stmt_any <stmt> |
run program slicing (individual statement, all uses) |
-slice_stmt_danger <stmt> |
run program slicing (individual statement, danger uses) |
-ptr_flow_insens |
use flow insensitive pointer analysis |
-slice_ignore_ctrl |
use ignore indirect control slicing |
-instr_mode |
mode for instrumentation |
-debug <mode>, -d <mode> |
print debugging information |
-stats <mode> <tag> <file> |
print stats |
-dump_cfg <file> |
dump control flow graph to file |
-dump_dfa <file> |
dump data flow analysis to file |
-dump_dstmt <file> |
dump dangerous statements to file |
-dump_dfunc <file> |
dump dangerous functions to file |
-dump_slice <file> |
dump program slicing information to file |
-dump_syms <file> |
dump symbol tables to file |
-dump_vars <file> |
dump variable information to file |
-dump_defs <file> |
dump definitions to file |
The parser only works with preprocessed source code (.i file from gcc). This phase creates the AST and creates the symbol tables. Multiple files can be parsed with one invocation of SUDS creating a large AST that spans across files.
While almost all C programs can be parsed, the parser is not as mature as the parser in gcc or other compilers. In particular, it does not perform robust error checking or designed to give feedback on different types of errors. In addition, some legitimate C++ programs may not compile successfully. Section 8 describes the language restrictions.
The AST that is created cannot be directly used by the static analysis or instrumentation phases. The code must pass through the simplification phase first. However, it is possible to directly emit the parsed code using the "-base" option. This is useful if you want to modify the parser and test your changes.
The parsing and symbol table code is a modified version of the parser and symbol table found in cTool [1].
This phase creates special functions – one for each file that is a placeholder for simplified code and instrumented code that is related to initializers of global variables (not part of any function). In addition, a special function, called suds_init, is created for the file that contains main that calls all of these initializer functions. Finally, a call to suds_init is added to the start of main.
The simplification phase converts the program into a simpler format. The output of this phase is a program that is equivalent to the original program. The simpler format allows subsequent analysis phases to be less complex and permits the insertion of instrumentation anywhere in the code. Some highlights of the simplification process:
· All right-hand side expressions of an assignment operator only have one operator. For example, the expression a + b – ( c * d) is broken down into multiple statements.
· All side-effects (such as '++') or short-circuited operators are removed and replaced by equivalent statements.
· All loop and control conditions are replaced by a single variable or constant.
· Address of operations involving arrays (such as '&a[5]') are replaced by equivalent addition operations (such as 'a + 5' in this case).
The entire simplified grammar is shown in Section 4. More implementation notes are found in Section 5.
This phase completes the creation of the AST and other key data structures. It does the following:
1. Creates variable objects and attaches them to when they are used in expressions and statements.
2. Creates special statements that mark the end of a control statement.
3. Replaces function call expression with two statements: a call statement and a return from statement. This separates the functionality of making the call and the assignment of the return value after the call is complete.
4. Global variables (those declared outside of a function) are marked as such.
5. Extra “dummy” variables are created for each call to malloc (or other dynamic allocation functions). Dynamic memory is modeled by call site. All variables allocated at a particular call site are lumped together. Since dynamic memory could be used to create a structure containing an array, an array variable is created for each dynamic memory variable.
6. The simplified statements are numbered (unique id). This allows the user to specify statements as slicing criteria.
7. A set of local variables used in created in each function. During data flow analysis, a set of global variables is created.
The control flow graph is created by dividing each function into a set of basic blocks and computing the predecessor and successor sets for each basic block. This is done intraprocedurally. Function calls are treated as basic block ending points with the lone successor being the basic block immediately following the function call. Useless basic blocks are removed: any block that has no backwards path to the first basic block of the function.
To handle interprocedural control flow, a call graph is created. The call graph is not a separate structure, instead additional information is tracked with the following statements:
The set of basic blocks, predecessors and successors in particular, remain unchanged.
Since creating the call graph requires knowledge of where function pointers point to, the call graph is created in tandem with pointer analysis.
The interprocedural pointer analysis developed by Hind et. al. [4] is used. The information is stored in a PointsToInfo class which maps variables (represented by their id number) to the variables it can point to.
The algorithm can either be run in a flow-sensitive or flow-insensitive manner via the command line option –ptr_flow_insens. In the flow sensitive version, the points-to information is stored for each basic block. Due to space issues, the points-to information is recomputed for each statement when necessary. The flow-insensitive algorithm stores all of the points-to relationships into one global instance of the PointsToInfo class.
The data flow analysis phase is used to create these sets:
· defsout: The set of definitions generated by a statement or basic block.
· killed: The set of definitions no longer active after the statement or basic block.
· reaches: The set of active definitions at the beginning of a basic block or statement.
· used: The set of definitions that reach the statement and are used as input to the statement.
There are four steps to determining these sets:
1. Create the defsout set for each statement.
2. Create the killed set for each statement.
3. Create the defsout and killed sets for each basic block.
4. Create the set of reaching definitions for each basic block.
Due to memory space issues, the used sets are recreated for statements when necessary.
The analysis is done intraprocedurally. For each call site, live definitions are propagated from the caller to the callee function on a parameter by parameter basis. At the end of the function, definitions that refer to global variables are propagated to all return points. Other definitions are only propagated back to the caller if the definition refers to a variable that had at least one live definition prior to the call. Though the algorithm is not truly context-sensitive, this last rule restricts the flow of definitions from a call site to a completely different return point. Any definitions live in the caller just prior to the call are also considered live just after the call.
A summary of data flow analysis is found in Section 6.
This is not actually a phase. The code in danger.cpp implements the isDangerous member function for statements. This can be used either as a criterion in program slicing and/or to further focus the instrumentation.
Currently, the set of dangerous operations includes array references and pointer dereferences. This can easily be modified by controlling which uses are considered dangerous.
This will identify statements that are part of a slice based on a criterion. There are five ways for specifying the slicing criterion on the command line:
-slice_all_danger includes all dangerous statements in the program
-slice_fn_any <func> includes all statements in the given function
-slice_fn_danger <func> includes all dangerous statements in the given function
-slice_stmt_any <stmt> includes the given statement only (all uses)
-slice_stmt_danger <stmt> includes the given statement only (dangerous uses only)
A backwards slicing algorithm is employed starting with the slicing criterion (the operation under investigation). All definitions that are used in the statement(s) in the criterion are added to the slice. Then, each statement is analyzed. If any definition generated by a statement is in the slice, then all definitions used by the statement are added to the slice. Indirect uses from control statements are also added to the slice by default: if any statement in a control statement is in the slice, then the control statement and its uses are added to the slice. The user can exclude indirect uses by use of a command-line switch (-slice_ignore_ctrl). The algorithm is interprocedural and context-insensitive. Once a definition is added to the slice, it remains in the slice for the duration of the algorithm. The algorithm continues to iterate until steady state (no definitions enter the slice).
Function calls are handled as follows:
· If the definition associated with the return value (in the ReturnFromStmt) is in the slice, then uses corresponding to any of the callee return statements are added to the slice. Both the ReturnFromStmt and ReturnStmt are added to the slice.
· Any variable that is pointed to by an argument in the function call or global variable will have a definition in defsout of the CallStmt. If any of these definitions are in the slice, definitions of the corresponding variables that reach the end of the function are added to the slice. This is done on a parameter by parameter basis.
· If an initial parameter definition is in the slice, then the corresponding function call argument is added to the slice. This is done on a parameter by parameter basis.
· If an initial definition of a global variable is in the slice, then any definitions of the global variable that reach the function call are added to the slice.
· For system calls, if any of the generated definitions of the CallStmt or ReturnFromStmt are in the slice, then all of the uses (arguments and the variables they point to) are added to the slice.
· The algorithm is context-insensitive but the method for adding functions calls is restricted, preventing propagation of the slice into areas of the code that are clearly not part of the slice. A function is only added to the slice if one of the following are true:
o The function contains a statement in the slicing criterion.
o The function can be reached on a backwards traversal from a function containing a statement in the slicing criterion back to main AND there is at least one use in the slice is active at the end of the function. This is tracked by the propToSlice set.
o The function is called from a function that meets one of the two above criteria AND the slice propagates into that function in some fashion (return value, global variables, output parameters, etc.)
After definitions and statements are marked in the slice, variables are marked in the slice if at least one definition is in the slice. Then, functions are marked in the slice if they have at least one statement in the slice.
This algorithm identifies which variables and definitions are tainted directly and indirectly. The initial set of tainted definitions and statements is specified in the file taint-init.cpp. Currently, variables that come from input are marked as tainted.
The tainted propagation algorithm works in reverse of the program slicing algorithm. It propagates tainted information from the uses to the definitions. For simple assignment statements (no arithmetic calculations), if any use is in a given state, all definitions are now updated to be in that state. Again, parameters are handled on an individual basis.
For statements involving arithmetic operations, the taint state is computer based on the operation and the state of the operands.
The instrumentation phases traverses the AST and adds instrumentation code based on the instrumentation model. This phase is described in more detail in Section 3.
The output phase traverses the simplified statements in program order and prints them to a file, including any instrumentation statements. The output file can be subsequently compiled by a C compiler.
To add instrumentation, it is necessary to either use or create an instrumentation director. This section describes how to create an instrumentation director.
The first step to creating a model is to provide an implementation for the InstrDirector class listed located in instr.h. Each of the member functions refers to a different code construct or event in the program. A list of functions is provided in Section 3.1. In the corresponding implementation file, simply fill in the functions for the constructs that you want to add instrumentation. Leave the function empty for constructs that do not need instrumentation.
Pre-existing instances of instrumentation directors are available in the src/directors directory in SUDS. You can either use any of the existing instrumentation directors as a starting point or use instr-none.cpp which starts with all of the member functions as empty.
The instrumentation director interface has several data members that can be used within the instrumentation routine to determine what instrumentation is necessary and/or to pass into instrumentation routines as parameters.
bool inLhs; // true if processing the lhs of an assignment
bool isGlobal; // true if processing global variable
class Expr *currLhs; // current value of lhs if processing rhs
class Expr *currRhs; // current value of rhs if processing lhs
class Location *currLoc; // current location
class FunctionDef *currFunc; // current function
class Stmt *currStmt; // current statement
unsigned int uniqueId; // unique identifier
Each member function has one and/or two statement lists as
pass-by-reference parameter(s). Instrumented
code is to be added to this list using a helper function more later). Any
instrumented statements added to the before list are inserted before the
corresponding statement. Similarly, any statements added to the after
list are inserted after the corresponding. Instrumentation is always
added at the boundary of simplified statements. In functions that have
both a before and after list as parameters, you can choose which list to add to
(or both) based on the model you are trying to create. However, many
constructs force you to either add instrumentation before or after the
corresponding statement.
Most member functions contain a pointer to the expression, statement, or variable in question. This allows you to apply different instrumentation based on the type of an operation. For instance, an addition operation involving integers is likely to have different instrumentation that an addition operation involving a pointer and an integer. Expressions that can appear on the left-hand of an assignment have a Boolean parameter isLhs that is true if the expression appears on the left-hand side of an assignment expression and false if it appears on the right-hand side. Expressions that do not have this parameter cannot appear on the left-hand side after simplification.
Member functions relating to variable declarations, function parameters, and return values use an offset mechanism to distinguish between different parameters in the same parameter list and for individual fields with a variable / parameter / return value is a struct. The offset parameter into these member functions allow you to link up function call arguments to function parameters. For instance, instrumentCallExprArg with offset 3 can be paired up with instrumentBirthParm with offset 3.
Another unique feature with member
functions that use the offset mechanism is that they use a string to store the
expression opposed to using an instance of an expression class. The
reason for this is two-fold: 1. Some of these events don't correspond to
expressions (like death of a variable). 2. To avoid creating temporary
expression classes for each member of a struct. Since the string does not
convey type information, the type of the expression is passed separately.
The function instrumentFieldCopy also employs strings in a similar fashion.
There are also two functions beginInstrumentation and endInstrumentation that are called before and after the instrumentation phase respectively. The beginInstrumentation function can be used to initialize or allocate any variables needed by the instrumentation model. The endInstrumentation function can be used to perform any post-processing such as gather statistics or performing sanity checks.
Once you determine which constructs the need instrumentation, you need to add code that inserts the instrumented code into the proper list. There are several helper functions in instr-call.cpp to assist in this process. Here is the most common flow which adds a function call to one of the instrumentation routines:
1. Initialize the call by calling initInstrCall. It takes one optional Boolean parameter which if set to true will automatically add parameters for the file name, line number, and unique identifier associated with this instrumentation call.
2. Add parameters to the function. The table below outlines the different routines are available depending on the type and use of parameter.
3. Complete the call by calling addInstrCall. This function takes two parameters: a list to add the instrumentation statement to (either the before or after list) and the name (string) of the instrumentation routine to call.
Routine |
If x is ___ |
Then, add ___ |
Used For |
addStringParm(string
x); |
foo |
foo |
Any expression in string form. Useful for constructing your own expressions by concatenating different stings. |
addStringConstantParm(string
x); |
foo |
“foo” |
Any string used as a constant within the instrumentation routine. |
addIntegerParm(int
x); |
23 |
23 |
Any integer constant. |
addExprParm(Expr
*x); |
a[i] |
a[i] |
The value of an expression or variable. |
addStringExprParm(Expr
*x); |
a[i] |
“a[i]” |
The expression as a string - useful for debugging and diagnostics. |
addAddrOfExprParm(Expr
*x); |
a[i] |
&(a[i]) |
The address of an expression (assuming it is addressable); useful for models that track variables by address. |
addDeclParm(Decl
*x); |
a |
a |
The variable associated with a declaration. |
addSizeParm(Type
*x); |
int |
sizeof(int) |
The size of a type. |
addObjSizeParm(Type
*x); |
int * |
sizeof(int) |
The size of the type pointed to (or the size of the element for arrays). |
addFileNameParm(); |
- |
- |
The current file name. |
addLineNumParm(); |
- |
- |
The current line number. |
addUniqueIdParm(); |
- |
- |
A unique integer that is incremented after each call to an instrumentation routine. Used to distinguish between multiple calls that occur on the same source code line. |
Begin/End Functions |
before list |
after list |
isLhs parameter |
offset parameter |
|
instrumentBeginProgram |
Beginning of a program (start of main) |
No |
Yes |
No |
No |
instrumentEndProgram |
End of the program |
Yes |
No |
No |
No |
instrumentBeginFunction |
Beginning of a function |
No |
Yes |
No |
No |
instrumentEndFunction |
End of a function |
Yes |
No |
No |
No |
instrumentBeginBlock |
Beginning of a statement block '{' |
No |
Yes |
No |
No |
instrumentEndBlock |
End of a statement block '}' |
Yes |
No |
No |
No |
Declaration instrumentation functions |
before list |
after list |
isLhs parameter |
offset parameter |
|
instrumentBirthVariable |
Declaration of a variable |
No |
Yes |
No |
Yes |
instrumentDeathVariable |
Lifetime of variable ends |
Yes |
No |
No |
Yes |
Parameter / return value instrumentation functions |
before list |
after list |
isLhs parameter |
offset parameter |
|
instrumentCallExprArg |
Function call - called for each argument |
Yes |
No |
No |
Yes |
instrumentBirthParm |
Parameter declaration - called for each parameter |
No |
Yes |
No |
Yes |
instrumentBirthMainParm |
Parameter declaration in main - called for each parameter |
No |
Yes |
No |
Yes |
instrumentDeathParm |
Lifetime of parameter ends - called for each parameter |
Yes |
No |
No |
Yes |
instrumentDeathMainParm |
Lifetime of parameter in main ends - called for each parameter |
Yes |
No |
No |
Yes |
instrumentReturnVal |
A value is returned (callee side) - called for each data member if a struct is returned |
Yes |
No |
No |
Yes |
instrumentReturnFromVal |
A value is returned (caller side) - called for each data member if a struct is returned |
No |
Yes |
No |
Yes |
Stmt instrumentation functions |
before list |
after list |
isLhs parameter |
offset parameter |
|
instrumentExprStmt |
expression statement |
Yes |
Yes |
No |
No |
instrumentIfStmt |
if statement |
Yes |
No |
No |
No |
instrumentElseStmt |
else statement |
No |
Yes |
No |
No |
instrumentSwitchStmt |
switch statement |
Yes |
No |
No |
No |
instrumentForStmt |
for loop |
Yes |
No |
No |
No |
instrumentWhileStmt |
while loop |
Yes |
No |
No |
No |
instrumentDoStartStmt |
start of a do-while loop |
Yes |
No |
No |
No |
instrumentDoEndStmt |
end of a do-while loop |
No |
Yes |
No |
No |
instrumentGotoStmt |
goto statement |
Yes |
No |
No |
No |
instrumentBreakStmt |
break statement |
Yes |
No |
No |
No |
instrumentContinueStmt |
continue statement |
Yes |
No |
No |
No |
instrumentCallStmt |
function call |
Yes |
No |
No |
No |
instrumentSystemCall |
function call to system function or any function where source code is not present |
Yes |
Yes |
No |
No |
instrumentReturnStmt |
return stmt |
Yes |
No |
No |
No |
instrumentReturnFromStmt |
returning from a function call |
No |
Yes |
No |
No |
instrumentVaStartStmt |
va_start statement |
Yes |
Yes |
No |
No |
instrumentVaEndStmt |
va_end statement |
Yes |
Yes |
No |
No |
instrumentLabelStmt |
label |
No |
Yes |
No |
No |
Expr instrumentation functions |
before list |
after list |
isLhs parameter |
offset parameter |
instrumentConstant |
constant |
Yes |
Yes |
No |
No |
instrumentStringConstant |
string constant |
Yes |
Yes |
No |
No |
instrumentRhsVar |
lone variable expression |
Yes |
Yes |
Yes |
No |
instrumentRhsUnaryExpr |
unary expression |
Yes |
Yes |
No |
No |
instrumentRhsBinaryExpr |
binary expression |
Yes |
Yes |
No |
No |
instrumentRhsRelExpr |
relational expression |
Yes |
Yes |
No |
No |
instrumentRhsCastExpr |
cast expression |
Yes |
Yes |
No |
No |
instrumentRhsSizeofExpr |
sizeof expression |
Yes |
Yes |
No |
No |
instrumentRhsAddrOfExpr |
address of expression |
Yes |
Yes |
No |
No |
instrumentRhsAddrArrowExpr |
address of expression involving member selection via pointer |
Yes |
Yes |
No |
No |
instrumentRhsDerefExpr |
pointer dereference |
Yes |
Yes |
Yes |
No |
instrumentRhsMemberExpr |
member selection |
Yes |
Yes |
Yes |
No |
instrumentRhsPtrMemberExpr |
member selection via pointer |
Yes |
Yes |
Yes |
No |
instrumentRhsArrayExpr |
array reference |
Yes |
Yes |
Yes |
No |
instrumentRhsVaArgExpr |
va_arg expression |
Yes |
Yes |
No |
No |
instrumentRhsBuiltinExpr |
built-in expression to compiler |
Yes |
Yes |
No |
No |
instrumentCastBirth |
reassignment of a variable to a new type – called for each data member
if new type is a struct |
No |
Yes |
No |
No |
instrumentFieldCopy |
field assignment when assigning one struct to another - called for each data member of the struct |
Yes |
Yes |
No |
No |
This section describes the grammar of a program that is the output of the simplification process. Simplifying the program significantly reduces the complexity of future data analysis and instrumentation phases. The grammar is based on formats used in GCC [2] and Hendren et. al. [3].
stmt
: comp_stmt
| assign_expr ';'
| IF '(' bool_val ')' comp_stmt
| IF '(' bool_val ')' comp_stmt ELSE comp_stmt
| SWITCH '(' bool_val ')' comp_stmt
| FOR '(' ',' bool_val ',' ')' comp_stmt
| WHILE '(' bool_val ')' comp_stmt
| DO comp_stmt WHILE '(' bool_val ')'
| BREAK ';'
| CONTINUE ';'
| GOTO IDENT ';'
| IDENT ':' /* Label – treated as separate statement */
| CASE INTEGER_CST ':'
| function_call ';'
| RETURN ';'
| RETURN val ';'
| VA_START '(' IDENT ',' IDENT ')' ';'
| VA_END '(' IDENT ')' ';'
| decl_stmt ';'
comp_stmt
: '{' stmt_list '}'
| '{' '}'
stmt_list
: stmt_list stmt
| stmt
assign_expr
: lhs '=' rhs
function_call
: IDENT '(' arg_list ')'
| IDENT '(' ')'
| lhs = IDENT '(' arg_list ')'
| lhs = IDENT '(' ')'
arg_list
: arg_list ',' val
| val
Notes:
· Control constructs are simplified by having a test condition of a single variable or constant. This variable or constant must be a Boolean (integer) type. Pointers are replaced by comparisons to 0.
· After simplification, only two types of expression statements are allowed: assignments and function calls (which can optionally have an assignment). Function calls can only have argument lists that consist of a single variable or constant.
· Scopes have been added (using ‘{‘ and ‘}’) for loop bodies, then clauses, else clauses, and switch bodies. This allows for statements and declarations to be added at any point in the program.
lhs
: IDENT
| deref_expr
| member_expr
| ptr_member_expr
| array_ref
| ind_array_ref
rhs
: constant
| STRING_CONSTANT
| IDENT
| unary_expr
| binary_expr
| rel_expr
| cast_expr
| sizeof_expr
| addrof_expr
| addr_arrow_expr
| deref_expr
| member_expr
| ptr_member_expr
| array_expr
| va_arg_expr
constant: INTEGER_CST | CHAR_CST | FLOAT_CST
unary_expr: UNARY_OP val
binary_expr: val BINARY_OP val
rel_expr: val REL_OP val
cast_expr: '(' type ')' val
sizeof_expr: SIZEOF '(' IDENT ')' | SIZEOF '(' type ')'
addrof_expr: '&' IDENT | '&' member_expr
addr_arrow_expr: '&' ptr_member_expr
deref_expr: '*' IDENT
member_expr: IDENT '.' IDENT
ptr_member_expr: IDENT "->" IDENT
array_expr: array_expr '[' val ']' | IDENT '[' val ']'
va_arg_expr: VA_ARG '(' IDENT ',' type ')'
val: id | constant
bool_val: id | constant /* must be integer type */
Notes:
· Expressions that do nothing (such as ‘a+b;’) are eliminated during the simplification process.
· The following operators are converted into a functionally equivalent expression: ‘,’ , ‘++’, ‘--’, ‘->’, ‘?:’. The gcc extension of statement expressions are supported by the parser but replaced during the simplification process.
· The short-circuited operators ‘&&’ and ‘||’ are converted into if-then-else statements.
· Temporary variables for integers and floats are all created with type long and double respectively. Per the C standard, values of temporary expressions can use larger types of the same family.
· For variables declared arrays, array reference may include as many dimensions are allowed as needed. However, if an array reference uses a pointer variables, only one level of indirection (set of braces []) is permitted.
Declarations are handled in the same manner as C but initializers are not permitted. Initializers are created as separated statements after the declaration. For global variables, the simplified code is placed into special initializer functions – one for each file in the program.
To ensure compatibility, initializers are permitted and remain on local static variables and variables declared as constants. However, the remaining analyses ignore these initializers.
To determine when statements and expressions are properly simplified, the enumeration smplMode defines the following five constants. Here is what each constant is used for:
SM_expr: expression of an expression statement
SM_ rhs: right hand side of an assignment expression
SM_lhs: left hand side of an assignment expression
SM_addr: operand for address-of operator (&)
SM_boolean: condition for if/else, switch, and loops – must be a lone integer variable or constant
SM_var: anywhere a lone variable is needed, specifically:
operands for dereference (*), member (.), and indirect member operations (->)
pointer for indirect array references
function for function call (a variable is created for each function)
replacement expression when simplifying increment (++) or decrement (--)
SM_var_const: anywhere a lone variable or constant is needed, specifically:
condition for if/else, switch, and loops
return value
operands for arithmetic, relational, and cast operations
operands for va_start, va_arg, and va_end
index for array references
each argument in a function call
SM_var_array: anywhere a lone variable or array reference is needed, specifically:
array for array references
A table describing how each expression responds to these constants is on the next page.
There are two key functions that drive the simplification:
simplifyBlock: This function simplifies all statements in a block. It is responsible for making sure the additional statements produced by the simplification process are inserted properly. At the end of the phase, a loop will traverse through all statements removing any statements that have no side effects whatsoever (no assignments or control).
simplifyExpr: This function is the starting point for all expressions. It will call a helper function based on the expression type. If the expression is not simplified enough, it is replaced with a temporary variable or a dereference if an lvalue is needed.
Expression |
SM_expr |
SM_rhs |
SM_lhs |
SM_addr |
SM_boolean |
SM_var |
SM_var_const |
SM_var_array |
Constant (string) |
Yes |
Yes |
No |
No |
No |
No |
No |
No |
Constant (other) |
Yes |
Yes |
No |
No |
1 |
No |
Yes |
Yes |
Variable (lone) |
Yes |
Yes |
Yes |
Yes |
1 |
Yes |
Yes |
Yes |
Unary Operators |
2 |
2 |
No |
No |
No |
No |
No |
No |
Arithmetic Operators |
2 |
2 |
No |
No |
No |
No |
No |
No |
Relational Operators |
2 |
2 |
No |
No |
No |
No |
No |
No |
Casts |
2 |
2 |
No |
No |
No |
No |
No |
No |
va_arg Expression |
2 |
2 |
No |
No |
No |
No |
No |
No |
Sizeof Operator |
Yes |
Yes |
No |
No |
No |
No |
No |
No |
Addr Of Operator (&) |
3 |
3 |
No |
No |
No |
No |
No |
No |
Addr Arrow (&a->b) |
4 |
4 |
No |
No |
No |
No |
No |
No |
Incr (++) / Decr (--) |
No |
No |
No |
No |
No |
No |
No |
No |
And (&&) / Or (||) |
No |
No |
No |
No |
No |
No |
No |
No |
Comma (,) |
No |
No |
No |
No |
No |
No |
No |
No |
Conditional (?:) |
No |
No |
No |
No |
No |
No |
No |
No |
Statement Expression |
No |
No |
No |
No |
No |
No |
No |
No |
Dereference (*) |
4 |
4 |
4 |
No |
No |
No |
No |
No |
Member (.) |
4 |
4 |
4 |
4 |
No |
No |
No |
No |
Arrow (->) |
4 |
4 |
4 |
No |
No |
No |
No |
No |
Array Reference |
5 |
5 |
5 |
No |
No |
No |
No |
6 |
Assign |
7 |
No |
No |
No |
No |
No |
No |
No |
Assign w/Arith |
No |
No |
No |
No |
No |
No |
No |
No |
Function Call |
8 |
8 |
No |
No |
No |
No |
No |
No |
1 Yes, if operand has integer or Boolean type.
2 Yes, if operand(s) are SM_var_const.
3 Yes, if operand is SM_addr.
4 Yes, if operands are SM_var.
5 Yes, if array (or array pointer) is SM_var and array index is SM_var_const.
6 Yes, if array reference is for an array variable (not a pointer).
7 Yes, if left hand side is SM_lhs and right hand side is SM_rhs.
8 Yes, if function is SM_var and each argument is SM_var_const.
BeginBlockStmt |
defsout: |
A definition is created for each parameter. |
killed: |
none |
|
used: |
none |
|
ExprStmt |
defsout: |
Based on left-hand side. |
killed: |
Based on left-hand side. |
|
used: |
All definitions that reach the statement and are used as input on either side of the assignment. |
|
IfStmt, SwitchStmt, ForStmt, WhileStmt, DoWhileStmt |
defsout: |
none |
killed: |
none |
|
used: |
Active definitions corresponding to the control condition. |
|
CallStmt |
defsout: |
A definition is created for any variable that could be pointed to by any argument using any level of indirection. |
killed: |
none |
|
used: |
All definitions that reach the statement and correspond to a variable used as an argument. For system calls, variables pointed to by an argument (using any level of indirection) are also considered used. |
|
ReturnStmt |
defsout: |
none |
killed: |
none |
|
used: |
All active definitions of the return value (if statement has return value). |
|
ReturnFromStmt |
defsout: |
Based on left-hand side. |
killed: |
Based on left-hand side. |
|
used: |
Any variable used on the left-hand side as an input. |
|
VaStartStmt, VaEndStmt |
defsout: |
A definition is created for the list variable. |
killed: |
All other definitions associated with the list variable. |
|
used: |
none |
|
all other statements |
defsout: |
none |
killed: |
none |
|
used: |
none |
VarExpr a |
defsout: |
A definition for the variable. |
killed: |
All definitions corresponding to the variable except for the definition generated this statement. |
|
used: |
none |
|
DerefExpr *a |
defsout: |
A definition for every variable that can be pointed by the variable. |
killed: |
none |
|
used: |
Reaching definitions for the pointer used to make the dereference. |
|
MemberExpr a.b |
defsout: |
A definition for the struct variable. |
killed: |
none |
|
used: |
none |
|
PtrMemberExpr a->b |
defsout: |
A definition for every variable that can be pointed by the variable. |
killed: |
none |
|
used: |
Reaching definitions for the pointer used to make the dereference. |
|
ArrayExpr a[b] |
defsout: |
A definition for the any variable that can be pointed by the array/pointer. |
killed: |
none |
|
used: |
Reaching definitions for variables that represent the array/pointer and index. |
This section describes the key data structures in SUDS. Here is a summary of the AST components:
A project encompasses all the source code. There is only one project variable when SUDS is running. A project contains a list of translation units.
A translation unit refers to a single source code file. Each file contains a list of statements. The list of statements is restricted to declarations (whether it be types, variables, or functions).
A function definition is a special type of statement that refers to a function definition (not a function prototype). In addition to information about the function, a function contains a list of statements that comprise the function body.
A statement refers to a C statement. For simplicity, there are also statements that refer to C constructs (such as case label) that are not necessarily considered statements in the C programming language. Statements reference other statements, expressions, functions, and variables.
An expression refers to a C expression. Expressions can refer to other expressions and variables.
A variable refers to a declared variable in C. It hold various attributes about the variable and a pointer to its type.
A declaration is a special statement that declares either a variable or a type.
A type refers to a type in C.
Other data structures in SUDS are listed in Section 7.9.
One important note about the data structures. SUDS violates many principles of good object-oriented design. For instance, most, if not all of the data members are public. The program was designed in a more C-like procedural fashion using inheritance as syntactic sugar. From my point of view it was better than using alternatives such as unions, void pointers, or large switch statements. At some point in the future, I may consider a redesign of SUDS.
The Project class (project.h) encompasses all the source code for the program. There is only one project variable in SUDS. It stores a list of translation units – one for each source code file. It also stores a pointer to a special initializer function that is added to the beginning of main.
Most tasks in SUDS that requires traversing the source code will start at the project level. The project member functions merely call the same function for each of its translation units.
Data members:
vector<class TransUnit*> units; // list of translation units
class FunctionDef *mainInit; // main initializer function
A TransUnit (transunit.h) is a class that represents a single source code file in a project. A TransUnit stores a list of statements. These statements can only be declarations of types, variables, or functions. Code within a function is associated with that function. It also stores a pointer to the initialization function associated with this file. Each instrumented source code file contains a special initialization function that is called when the program starts. This function is used by the simplification and instrumentation phases as a place to add code when processing global variable declarations.
As with the project, the member functions are largely used for tasks that traverse the source code.
Data members:
class Stmt *head; // The first statement in the list.
class Stmt *tail; // The last statement in the list.
const string filename; // The file we parsed this from.
class FunctionDef *initFunc; // The initializer function.
The FunctionDef class (func.h) refers to a function definition in C. It is derived from the Stmt base class (described later in this section) because it is considered a top-level variable declaration statement for the function. Since a function definition cannot appear within a function, it really does not behave like the other statement classes that are defined later.
The function really has two distinct parts: the declaration and the function body. The declaration stores all of the information about the function such as the parameters and return type (in reality, most of this is stored in a function type variable within the declaration). The function body is stored as a linked list of statements, starting at head. The other data members are used to store information from various phases in SUDS.
Data members:
class Decl *decl; // function declaration
string name; // name of function
class Stmt *head; // first statement in function
class Stmt *tail; // last statement in function
int startBbl; // id of first bbl
int endBbl; // id of last bbl
bool usefulFunc; // set if function is reachable from main
set<int> returnBbls; // set of bbls that end the function
map<string, class Stmt *> labelTable; // mapping of labels to stmts
class PointsToInfo ptiEntry; // points to information at beginning
class PointsToInfo ptiExit; // points to information at end
set<class FunctionDef *> funcsCalled; // list of functions called
set<int> lvars; // list of local variables declared
set<int> funcDefs; // defs in the function
dynamic_bitset<> endReaches; // set of defs live at end of function
vector< set<int> > parmDefs; // defs for each parameter
set<class Stmt *> callerStmts; // set of statements that call this function
set<class Stmt *> propToSlice; // set of call statements to propagate to
bool inSlice; // true if function is in slice
A Stmt (stmt.h) refers to a C statement. To make analysis easier, there are some additional statements that refer to C constructs not normally considered as statements.
Each statement has a code that represents the type of statement and a unique id. Statements are connected together using a doubly linked list. Functions in stmtlist.cpp can be used to manipulate linked lists of statements.
Data members:
StmtCode code; // kind of statement
Location location; // location in source code
unsigned long id; // unique statement id
bool isInitStmt; // true if stmt used to initialize a variable
bool bblStart; // true if stmt is a bblock starting point
bool bblEnd; // true if stmt is a bblock ending point
class Bblock *bbl; // pointer to basic block node that statement resides in
class FunctionDef *fn; // function in which stmt resides
class PointsToInfo pti; // points-to information before statement
set<int> defsout; // defs created by this statement
set<int> killed; // defs killed by this statement
bool inSlice; // set if in slice
bool sliceProcessed; // set if variable has been processed during slicing
bool isTainted; // set if at least one def is tainted
Stmt *prev; // For making a list of statements.
Stmt *next; // For making a list of statements.
There are many derived classes of statements described in the following table:
Class Name |
Description |
BeginBlockStmt |
The beginning of a block of code starting with a '{'. Contains a list of declaration statements to any variables declared in this score. |
EndBlockStmt |
The end of a block of code ending with a '}'. |
ExprStmt |
An expression statement, typically assigns a value to a variable. |
IfStmt |
if statement |
ElseStmt |
The else part of an if statement. |
SwitchStmt |
switch statement |
ForStmt |
for loop |
WhileStmt |
while loop |
DoStartStmt |
Start of a do-while loop (the word "do") |
DoEndStmt |
End of a do-while loop (the test condition) |
EndControlStmt |
Dummy statement that marks the end of a control statement (such as if) or loop. |
BreakStmt |
break statement |
ContinueStmt |
continue statement |
GotoStmt |
goto statement |
LabelStmt |
A label in C. This could be a case label, default label, or a label that you can goto to. |
CallStmt |
The first of two instructions used to make a function call. This class stores the function(s) being called and the arguments used to make the function call. |
ReturnStmt |
return statement |
ReturnFromStmt |
The second of two instructions used to make a function call. This calls keeps track where the return value, if any, is stored when the function returns. |
NullStmt |
An empty statement |
VaStartStmt |
Refers to a va_start call, used in functions with variable-length parameter lists. |
VaEndStmt |
Refers to a va_end call, used in functions with variable-length parameter lists. |
DeclStmt |
Declaration of a variable or type. |
FunctionDef |
Function definition. |
InstrumentedStmt |
A statement added by the instrumentation engine. The instrumented statement is stored in a string. |
An Expr (expr.h) refers to a C expression. All expressions have a code that designates the kind of expression and have a data type which can be used by the instrumentation engine.
Data members:
ExprCode code; // variety of expression
Location location; // location in program
const class Type *type; // type of expr (such as int)
Expr *next; // linked list connector
There are many derived classes of statements described in the following table:
Class Name |
Example |
Description |
Constant |
5 |
The beginning of a block of code starting with a '{'. Contains a list of declaration statements to any variables declared in this score. |
VarExpr |
x |
A single variable |
UnaryExpr |
-x |
A unary expression (see list of operators below) |
BinaryExpr |
x + y |
A binary expression (see list of operators below) |
RelExpr |
x < y |
A relational expression (see list of operators below) |
CastExpr |
(int) x |
Cast to a different type |
SizeofExpr |
sizeof(x) |
sizeof expression |
AddrOfExpr |
&x |
address of expression |
AddrArrowExpr |
&(x->y) |
address of expression involving a member selection deference |
DerefExpr |
*x |
pointer dereference |
MemberExpr |
x.y |
member selection |
PtrMemberExpr |
x->y |
member selection via pointer dereference |
ArrayExpr |
x[y] |
array reference |
VaArgExpr |
|
va_arg expression, used in variable-length argument lists |
BuiltinExpr |
|
builtin expression to the compiler |
AssignExpr |
x = y |
assignment expression, also includes variants such +=. |
CallExpr |
|
function call |
TrinaryExpr |
x ? y : z |
trinary logical choice expression |
StmtExpr |
|
gcc extension that allows a statement to be embedded into an expression |
The last three expressions CallExpr, TrinaryExpr, and StmtExpr are not present in the AST after simplification.
Here is a list of operators:
enum UnaryOp |
|
enum BinaryOp |
||
UO_Plus |
+ |
|
BO_Plus |
+ |
UO_Minus |
- |
|
BO_Minus |
- |
UO_BitNot |
~ |
|
BO_Mult |
* |
UO_Not |
! |
|
BO_Div |
/ |
UO_PreInc |
++ |
|
BO_Mod |
% |
UO_PreDec |
-- |
|
BO_Shl |
<< |
UO_PostInc |
++ |
|
BO_Shr |
>> |
UO_PostDec |
-- |
|
BO_BitAnd |
& |
|
|
|
BO_BitXor |
^ |
enum AssignOp |
|
BO_BitOr |
| |
|
AO_Equal |
= |
|
BO_And |
&& |
AO_PlusEql |
+= |
|
BO_Or |
|| |
AO_MinusEql |
-= |
|
BO_Comma |
, |
AO_MultEql |
*= |
|
|
|
AO_DivEql |
/= |
|
enum RelOp |
|
AO_ModEql |
%= |
|
RO_Equal |
== |
AO_ShlEql |
<<= |
|
RO_NotEqual |
!= |
AO_ShrEql |
>>= |
|
RO_Less |
< |
AO_BitAndEql |
&= |
|
RO_LessEql |
<= |
AO_BitXorEql |
^= |
|
RO_Grtr |
> |
AO_BitOrEql |
|= |
|
RO_GrtrEql |
>= |
The increment, decrement, (logical) and, (logical) or, and comma operators are not present in the AST after simplification. AO_Equal (=) is the only variant of the AssignExpr that is permitted after simplification. Most of the analysis takes advantage of this fact.
A Var (var.h) represents a variable. Each variable is assigned a unique identifier that is used in many of the analyses. The Var class stores a pointer to the variable's declaration, its type, and its entry in the symbol table.
Data members:
int id; // unique id
string name; // variable name
class Decl *decl; // variable declaration
class Type *type; // variable type (from decl)
class SymEntry *st_entry; // symbol table entry
set<int> def_set; // list of definitions of the variable
bool isGlobal; // set if not declared in function
bool isHeap; // set if refers to heap variable
bool inSlice; // set if variable is in slice
bool isTainted; // set if tainted
A Decl (decl.h) represents a variables declaration and is contained within a DeclStmt. It contains information about its name, type, and various properties. Declarations are not used during the analysis portion of SUDS – the Var class is used to keep track of the various states of variables.
Data members:
StorageCode storage; // static or typedef or none
class Symbol *name; // symbol being declared
class Type *type; // type of the declaration
class GccAttrib *attrib; // optional gcc attribute
class Expr *initializer; // initial value
class Var *var; // pointer to variable object
class Decl *next; // For linking into lists
A Type (type.h) refers to a data type in C. The basic type (such as int, array, pointer) is represented by a code (see list below). Any storage indicator and/or type qualifiers are also stored (see list below). Many types use a subtype (pointers, arrays, and functions to name a few). This is also captured in the base type class. Additional types store additional information.
For user-defined types created using typedef, the type class stores the basic type information along with the type name.
Names for structs, unions, and enums are called tags. The derived classes for these types store the tag names.
Data members:
TypeCode code; // basic type
StorageCode storage; // storage of type
TypeQual qualifier; // type qualifier(s)
class GccAttrib *attrib; // gcc attribute (optional)
class Symbol *typeName; // NULL if not typedef
class Type *subType; // subtype for pts, arrays, funcs, etc.
bool useSubType; // set if type uses subtype
Here is a table of the derived type classes:
Class Name |
Description |
NoSpecType |
No specified type – used to represent data on the heap |
VoidType |
void |
IntType |
integers, represents all variants and sizes of integers |
FloatType |
floating point numbers, represents all sizes of floating point numbers |
PointerType |
pointers |
ArrayType |
arrays |
BitFieldType |
bit fields |
FunctionType |
functions |
StructType |
structs and unions |
EnumType |
enumerated types |
EllipsisType |
The type of the "…" parameter used in variable length parameter lists |
VaListType |
The type of a va_list, used in variable length argument lists |
Here is are lists of storage code and type qualifiers:
StorageCode |
|
TypeQual |
ST_None |
|
TQ_None |
ST_Auto |
|
TQ_Const |
ST_Extern |
|
TQ_Volatile |
ST_Register |
|
TQ_Inline |
ST_Static |
|
|
ST_Typedef |
|
Here is a list of other data structures used in SUDS:
Class Name |
File Name |
Description |
Bblock |
bblock.h |
basic block |
Def |
def.h |
definition for data analysis |
GccAttrib |
gccattrib.h |
gcc attribute extension |
InstrDirector |
instr.h |
instrumentation director interface |
Label |
label.h |
label |
Location |
parse.h |
location in code |
ParseEnv |
parse.h |
parsing environment |
ParseEnvCtxt |
parse.h |
parsing environment context |
ParseCtxt |
parse.h |
parsing context |
PointsToInfo |
pointer.h |
points-to information |
Symbol |
symbol.h |
symbol from lexer |
SymEntry |
symbol.h |
entry in symbol table |
ScopeTbl |
symbol.h |
stores symbol table entries at a particular scope |
SymTbl |
symbol.h |
symbol table |
TopSymTbl |
symbol.h |
top level symbol table that contains three symbol tables (labels, tags, all others); there is only one top level symbol table (global variable symTables) |
This section outlines restrictions in the C language. You must either alter your program or fix these issues in SUDS.
Problem: |
Cannot use default type (int) for variables (such as "static x = 3;") |
Detected: |
Yes - parser error (unconfirmed) |
Workaround: |
Add the proper type to the declaration |
Possible fix: |
Add 'int' automatically if no type appears |
Likely to fix: |
No (doesn't occur often enough) |
Problem: |
Old style function declarations are unsupported |
Detected: |
Yes - parser error |
Workaround: |
Convert function to standard style function declarations. Run CIL [5] ahead on source code file ahead of time. |
Possible fix: |
Alter grammar and symbol processing to work with old style functions; argument ordering is problematic. |
Likely to fix: |
No (too many obstacles to overcome) |
Problem: |
Default declarations for functions unsupported |
Detected: |
Yes - parser error (unconfirmed) |
Workaround: |
Add the proper type to the declaration |
Possible fix: |
Add 'int' automatically if no type appears |
Likely to fix: |
No (doesn't occur often enough) |
Problem: |
Parsing of initializer lists outside of initializers |
Detected: |
Yes - parser error (unconfirmed) |
Workaround: |
Rewrite code to not use an initializer list |
Possible fix: |
Leverage simplification of initializer lists for declarations |
Likely to fix: |
No (doesn't occur often enough) |
Problem: |
Parsing problems with variables declared with unusual orders of storage classes, qualifiers, etc. (see K+R for proper rules) |
Detected: |
Yes - parser error |
Workaround: |
In most cases, it should be possible to rewrite the code. |
Possible fix: |
Redesign parsing of storage classes and qualifiers |
Likely to fix: |
No (doesn't occur often enough) |
Problem: |
Possible parsing problems with gcc attributes |
Detected: |
Yes - parser error |
Workaround: |
Eliminate or reposition the attribute |
Possible fix: |
The main problem here is understanding how to parse the gcc attributes. The documentation is confusing and "subject to change" |
Likely to fix: |
Maybe (if there is better information) |
Problem: |
Multiple struct variables declared on the same statement that also define the struct are not allowed |
Detected: |
Not detected by SUDS, code emitted from SUDS will not compile |
Workaround: |
Rewrite using non-generic structs |
Possible fix: |
Alter the type of all but the first declaration such that fields are not printed out. |
Likely to fix: |
No (doesn't occur often enough) |
Problem: |
Generic structs declared in structs that are included in multiple files are not allowed. |
Detected: |
Not detected by SUDS, code emitted from SUDS will not compile |
Workaround: |
Rewrite using non-generic structs if possible, may need to alter system include files. |
Possible fix: |
Change symbol table management. |
Likely to fix: |
Maybe (symbol table is fragile) |
Problem: |
Const / static variables can cause problems because initialization is separated from declaration |
Detected: |
Not detected by SUDS, code emitted from SUDS will not compile |
Workaround: |
Rewrite not using const variables |
Possible fix: |
Two possibilities: (1) Redo remaining phases to work with declarations that have initializers. (2) Automatically remove const modifier (does not work with static). |
Likely to fix: |
Maybe (occurs often, second fix is not hard but could cause other problems) |
Problem: |
Complex expressions in address-of expressions can cause SUDS to fail |
Detected: |
Yes – simplification error |
Workaround: |
In most cases, it should be possible to rewrite the code |
Possible fix: |
|
Likely to fix: |
No (doesn't occur often enough, no easy fix) |
Problem: |
Arrays declared static will not have sizes computed |
Detected: |
? |
Workaround: |
In most cases, it should be possible to remove the static designation. |
Possible fix: |
Compute the size despite the static designation |
Likely to fix: |
Yes |
Problem: |
Cast operations are separated from the expressions creating temporary variables of invalid types. |
Detected: |
Not detected by SUDS, code emitted from SUDS will produce warnings and/or errors when compiled. |
Workaround: |
Most warnings can be ignored, may or may not be possible to rewrite the code to avoid any warnings or errors. |
Possible fix: |
Could allow each expression to optionally have a cast instead of breaking it into two pieces |
Likely to fix: |
Not likely (non-ignorable errors don't occur often enough) |
Problem: |
Static local variables with initializers are problematic for instrumentation |
Detected: |
No |
Workaround: |
Try to rewrite code if possible |
Possible fix: |
Instrumentation would have to support non-simplified initializers |
Likely to fix: |
No (too much work) |
Problem: |
Variables declared using 'register' will lose 'register' designation since instrumentation models often take the addresses of variables. |
Detected: |
Losing the register designation should not cause a functionality problem. |
Workaround: |
not possible to workaround |
Possible fix: |
Add a command line switch to allow users to indicate how the register designation should be handled. |
Likely to fix: |
Maybe (not that hard but not that useful) |
Problem: |
Arrays declared extern with unknown sizes and no initializer |
Detected: |
No |
Workaround: |
Rewrite using a 'char *'. |
Possible fix: |
Use the size when it is declared without extern |
Likely to fix: |
No (requires modification to the fragile symbol table) |
Problem: |
Variable or field with same name as typedef name |
Detected: |
Yes – parser error |
Workaround: |
Rename one or other |
Possible fix: |
Have a separate table for typedef names. Another issue is to more strictly regulate the possible Type field |
Likely to fix: |
No (requires modification to the fragile symbol table) |
Problem: |
Investigate the scope table as only struct/enum definitions go into external scope. Should others go to external scope or should the global struct/enum definitions go into level 2? |
Result: |
No problem results from this, this symbol table dump appears off. |
Possible fix: |
Redo symbol table. |
Likely to fix: |
No (unless problem occurs) |
Problem: |
Loss of quantifiers (const/inline/volatile) for temporary variables – needed because temp variables are often not constant and cannot be declared inline |
Result: |
Some code emitted from SUDS may not compile. |
Possible fix: |
? |
Likely to fix: |
No (can't think of a reasonable fix) |
Problem: |
May not model all system functions that return pointers to arbitrary arrays. |
Result: |
Static analysis may miss some pointer relationships causing the data flow analysis to be too aggressive. |
Possible fix: |
Update the list of system functions that return pointers to arbitrary arrays. |
Likely to fix: |
Case by case basis when detected |
Problem: |
System calls that alter the points-to information of its parameters. |
Result: |
Static analysis may miss some pointer relationships causing the data flow analysis to be too aggressive. |
Possible fix: |
Make a concerted effort to recognize these functions and alter the points-to information. |
Likely to fix: |
Maybe |
Problem: |
List of input functions is incomplete |
Result: |
Missed opportunity to catch bugs |
Possible fix: |
Add missing functions to list. |
Likely to fix: |
Fix when missing input function is discovered |
Problem: |
Instrumented statements do not have a location. |
Result: |
Some output messages are awkward |
Possible fix: |
Have the instrumented statement grab a location (not always easy) |
Likely to fix: |
Maybe |
Problem: |
Handling of variable-arg function calls in instrumentation engine (including va_start and va_end) |
Result: |
Instrumentation capabilities for variable-arg function calls is limited. |
Possible fix: |
Depends on what kind of instrumentation, if any, would be added in these cases. |
Likely to fix: |
Not likely |
Problem: |
Processing of unary plus operator |
Result: |
? |
Possible fix: |
Convert unary plus to varExpr |
Likely to fix: |
No (doesn't occur often enough) |
Problem: |
derefUses is a deviation from the current strategy of recomputing the uses when necessary. |
Result: |
Could be a performance and space issue. |
Possible fix: |
Require recomputing the deref uses along with other uses. |
Likely to fix: |
No (unless performance and space become an issue) |
Problem: |
More precise type for array constants. |
Result: |
Missed opportunity to introduce instrumentation because of missing type. |
Possible fix: |
Create the array type for the constant. |
Likely to fix: |
No (doesn't occur often enough) |
Problem: |
More precise types for binary expressions (size, unsigned, etc.) |
Result: |
Missed opportunity to introduce instrumentation because of missing type. Potential issues for instrumentation relying on the precise size of variables. |
Possible fix: |
Redo computation of binary expressions. |
Likely to fix: |
No (unless it becomes a problem; may revisit if type system is overhauled) |
Problem: |
Sizes of variables not accurately tracked (esp. large constants) |
Result: |
Potential issues for instrumentation relying on the precise size of variables. |
Possible fix: |
Fix SUDS to strictly follow the rules of converting types in C. |
Likely to fix: |
No (unless it becomes a problem; may revisit if type system is overhauled) |
Problem: |
Types organization is messy (needs overhaul) |
Result: |
Hard to understand and modify code related to types. |
Possible fix: |
Overhaul type system |
Likely to fix: |
No (unless SUDS is redesigned) |
Problem: |
Printing of declarations is messy. |
Result: |
Hard to understand and modify code related to printing declarations. |
Possible fix: |
Redesign code that prints declarations |
Likely to fix: |
No (unless SUDS is redesigned) |
Problem: |
Duplication of statements: dup seems broken for begin block and call stmt |
Result: |
No known problems at the problem |
Possible fix: |
Avoid using dup but provide copy list functions. |
Likely to fix: |
No (unless it becomes a problem; would revisit in a redesign of SUDS) |
Problem: |
Memory management: constructors, destructors, duplicators |
Result: |
Memory is wasted, especially for types |
Possible fix: |
Redo memory management |
Likely to fix: |
No (unless it becomes a problem; would revisit in a redesign of SUDS) |
Problem: |
Get strange messages (such as warn_unused_result) when using include files and libraries from latest Linux release. |
Result: |
Get strange messages (such as warn_unused_result) when running SUDS. |
Possible fix: |
Need to understand what is causing the messages |
Likely to fix: |
Maybe (if I can figure out why this is happening) |
Enhancement: |
Embed the preprocessor command inside suds |
Likely to add: |
Maybe |
Enhancement: |
Add routine to compare types for use in sanity or other checks. |
Likely to add: |
No (unless SUDS is redesigned) |
[1] cTool. http://sourceforge.net/projects/ctool/
[2] Free Software Foundation. SSA for Trees. http://www.gnu.org/software/gcc/projects/tree-ssa
[3] L. Hendren, C. Donawa, M. Emami, G. Gao, Justiani, and B. Sridharan. Designing the McCAT Compiler Based on a Family of Structured Intermediate Representations. Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, August 1992.
[4] M. Hind, M. Burke, P. Carini, and J. Choi. Interprocedural Pointer Alias Analysis. ACM Transactions on Programming Languages and Systems. July 1999.
[5] G. Necula, S. McPeak, S. P. Rahul,
W.Weimer. CIL: Intermediate Language and Tools for Analysis and Transformation
of C Programs. International Conference on Compiler Construction, Apr. 2002