Lab 6: Dataflow Analysis

1. Objective

In this lab, you will build a static analyzer to automatically detect potential divide-by-zero errors in C programs at compile-time using LLVM. The lab is divided into two parts:

Part 1: Implement the core logic for analyzing individual instructions (transfer functions and error checking)
Part 2: Build a dataflow analysis framework using the chaotic iteration algorithm to analyze entire functions

By the end of this lab, your analyzer will be able to detect divide-by-zero errors like the one in this simple C program:

// test01.c
int main() {
  int a = 0;
  int b = 1;
  int c = a + 1;
  int d = b / a;  // Divide by zero - your analyzer will catch this!
  if (c) {
    int a = 1;
  }
  return 0;
}

2. Conceptual Background

For a deeper exploration of these concepts, see A Menagerie of Program Abstractions.

Static Analysis & Abstract Interpretation

Static analysis analyzes software without executing it. Instead of tracking the concrete value of each variable (e.g., a = 0), which is often impossible to determine precisely, we approximate values using an abstract domain.

The Abstract Domain

For this lab, our goal is to determine if a variable could be zero. We use the following abstract domain (defined in include/Domain.h):

Zero: The variable is definitely zero (e.g., int a = 0;)
NonZero: The variable is definitely not zero (e.g., int b = 1;)
MaybeZero: The variable could be zero or non-zero (uncertain, e.g., user input)
Uninit: The variable has not been initialized

These values form a lattice, where MaybeZero is the “top” (most general/conservative) and Uninit is the “bottom” (least informative).

Transfer Functions

A transfer function models how a single instruction affects our abstract state. For example, in the instruction c = a + 1;:

If a has abstract value Domain::Zero
And the constant 1 is Domain::NonZero
Then Domain::add(Domain::Zero, Domain::NonZero) returns Domain::NonZero
So c gets the abstract value Domain::NonZero

Dataflow Analysis & Chaotic Iteration

To analyze a whole function, we use dataflow analysis with a chaotic iteration algorithm:

Initialize a worklist with all instructions
While the worklist is not empty:
- Pull an instruction I from the worklist
- Flow-in: Join the Out states from all of I’s predecessors to compute its In state
- Transfer: Apply the transfer function to produce a new Out state
- Flow-out: If I’s Out state changed, add all successors to the worklist
When the worklist is empty, we’ve reached a fixed point

For each instruction, we maintain:

InMap[I]: Abstract memory state before instruction I
OutMap[I]: Abstract memory state after instruction I

An abstract memory state (Memory) is a map from LLVM variables (as strings) to abstract domain values.

3. Lab Structure and Workflow Overview

Input Program Constraints

Your analysis only needs to handle a subset of C:

All values are integers (ignore other types)
Arithmetic operations: +, -, *, /
Comparisons: <, ==, etc.
Control flow: if-statements and loops
User input is introduced via functions where isInput() returns True
Other instructions can be treated as no-ops

Two-Part Implementation

Part 1: Instruction-Level Analysis

Files to implement: src/DivZeroAnalysis.cpp and src/Transfer.cpp
Implement DivZeroAnalysis::check() to detect potential divide-by-zero errors
Implement transfer functions (eval) for binary operators, comparisons, and casts
Build with the reference solution for Part 2 (use -DUSE_REFERENCE=ON)

Part 2: Function-Level Analysis

Files to implement: src/ChaoticIteration.cpp
Implement join() and equal() for merging and comparing memory states
Implement flowIn() and flowOut() for dataflow propagation
Implement doAnalysis() with the chaotic iteration worklist algorithm
Build without the reference solution

Utility Functions

The file src/Utils.cpp provides helpful functions:

variable(Value*): Converts an LLVM value to string representation
getOrExtract(Memory*, Value*): Gets abstract domain value for an LLVM value
printMemory(), printInstructionTransfer(), printMap(): Debugging helpers

4. Part 1: Instruction-Level Analysis

Implement the logic for analyzing individual instructions. Use the pre-built reference solution for Part 2 to test your transfer functions.

Memory Abstraction

Each Instruction has an associated abstract state:

InMap[I]: Abstract memory before instruction I
OutMap[I]: Abstract memory after instruction I

Memory is defined as std::map<std::string, Domain *>.

Example:

ID	Instruction	InMap (Before)	OutMap (After)
`I1`	`%x = call i32 @input()`	`{}`	`{%x: MaybeZero}`
`I2`	`%y = add i32 %x, 1`	`{%x: MaybeZero}`	`{%x: MaybeZero, %y: MaybeZero}`

Framework Entry Point

The run() method in src/DivZeroAnalysis.cpp is the entry point for your pass:

PreservedAnalyses DivZeroAnalysis::run(Function &F, FunctionAnalysisManager &) {
  outs() << "Running " << getAnalysisName() << " on " << F.getName() << "\n";

  // Initialize InMap and OutMap for all instructions
  for (inst_iterator Iter = inst_begin(F), End = inst_end(F); Iter != End; ++Iter) {
    auto Inst = &(*Iter);
    InMap[Inst] = new Memory;
    OutMap[Inst] = new Memory;
  }

  // Run the chaotic iteration algorithm (Part 2)
  doAnalysis(F);

  // Check each instruction for potential divide-by-zero (Part 1)
  for (inst_iterator Iter = inst_begin(F), End = inst_end(F); Iter != End; ++Iter) {
    auto Inst = &(*Iter);
    if (check(Inst))
      ErrorInsts.insert(Inst);
  }
  // ...
}

Implement `DivZeroAnalysis::check()`

File: src/DivZeroAnalysis.cpp

Implement the check() function to determine if a division instruction could result in divide-by-zero. Example: For %d = sdiv i32 1, 0, the divisor is 0 with abstract value Domain::Zero, so check() should return true.

Implement Transfer Functions

File: src/Transfer.cpp

The main transfer() function dispatches to specific eval() functions based on instruction type. Implement:

eval(BinaryOperator *BinOp, const Memory *In) - Handle arithmetic operations (+, -, *, /). Get abstract domains of both operands, call the appropriate Domain method (e.g., Domain::add(), Domain::sub()), and return the resulting domain.
eval(CmpInst *Cmp, const Memory *In) - Handle comparison operations (==, !=, <, >). Get abstract domains of both operands, call Domain::cmp() with the appropriate predicate, and return the resulting domain.
eval(CastInst *Cast, const Memory *In) - Handle type casting operations. Get the abstract domain of the operand being cast. For integer casts, return the domain of the operand.

Note: PHINode instructions merge values from different control-flow paths. The implementation for eval(PHINode*, ...) is provided. Study how it joins domains from all incoming values to produce a single output domain. For a detailed explanation of PHI nodes, see Appendix: Working with LLVM PHI Nodes.

Build and Test Part 1

Build with the reference solution for Part 2:

lab6$ mkdir build && cd build
lab6/build$ cmake -DUSE_REFERENCE=ON ..
lab6/build$ make

This compiles your Part 1 implementation and links it with the reference solution for Part 2, generating DivZeroPass.so.

Testing Workflow

Generate LLVM IR from the test program:

lab6/test$ clang -emit-llvm -S -fno-discard-value-names -Xclang -disable-O0-optnone -c -o test01.ll test01.c

This produces test01.ll with explicit memory operations (alloca, store, load):

; test01.ll (excerpt)
define dso_local i32 @main() {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  ; 'a = 0'
  store i32 0, i32* %2, align 4
  ; 'b = 1'
  store i32 1, i32* %3, align 4
  ; 'c = a + 1'
  %7 = load i32, i32* %2, align 4
  %8 = add nsw i32 %7, 1
  store i32 %8, i32* %4, align 4
  ; 'd = b / a'
  %9 = load i32, i32* %3, align 4
  %10 = load i32, i32* %2, align 4
  %11 = sdiv i32 %9, %10
  store i32 %11, i32* %5, align 4
  ...
}

Run mem2reg optimization to promote local variables to SSA registers:

lab6/test$ opt -mem2reg -S test01.ll -o test01.opt.ll

This produces cleaner IR:

; test01.opt.ll (excerpt)
define dso_local i32 @main() {
  %c = add nsw i32 0, 1
  %d = sdiv i32 1, 0
  %1 = icmp ne i32 %c, 0
  br i1 %1, label %2, label %4

2:
  br label %4

4:
  ret i32 0
}

Run your DivZero pass:

lab6/test$ opt -load-pass-plugin=../build/DivZeroPass.so -passes="DivZero" -disable-output test01.opt.ll > test01.out 2> test01.err

Output files:

test01.out: Error report (stdout)
test01.err: Debug output showing InMap and OutMap for each instruction (stderr)

See Section 6 for expected output.

Debugging Part 1

If your output doesn’t match expected results:

Check test01.err for the InMap and OutMap of each instruction
Verify transfer functions return the correct abstract domain
Use utility functions (printMemory(), printInstructionTransfer()) for debugging
Test incrementally with simple programs before moving to complex ones
Verify Domain::add(), Domain::sub(), etc. behave correctly

5. Part 2: Function-Level Analysis

Implement the chaotic iteration algorithm to analyze entire functions.

Rebuild Without Reference Solution

Reconfigure CMake to build your own implementation:

lab6/build$ rm CMakeCache.txt
lab6/build$ cmake ..
lab6/build$ make