CIS 5470: Software Analysis
Fall 2025 • University of Pennsylvania
📋 Course Information
Instructor
Prof. Mayur Naik
📍 AGH 642
🕐 Office Hours: TBA
📧 mhnaik@seas.upenn.edu
Teaching Assistants
Mayank Keoliya
📧 mkeoliya@seas.upenn.edu
Zain Aamer
📧 zaamer@seas.upenn.edu
📍 TA Office: AGH 642
🕐 TA Hours: By Appointment
📅 Course Schedule
Week | Dates | Topic | Lab | Due |
---|---|---|---|---|
1 | Aug 27 | Introduction to Software Analysis | Lab 1: Introduction to Software Analysis | - |
2 | Sep 3 | The LLVM Framework | Lab 2: The LLVM Framework | Lab 1 |
3 | Sep 8, 10 | Random Input Generation | Lab 3: Random Input Generation | Lab 2 |
4 | Sep 15, 17 | Automated Test Generation | Lab 4: Delta Debugging | Lab 3 |
5 | Sep 22, 24 | Delta Debugging | Lab 5: Statistical Debugging | Lab 4 |
6 | Sep 29, Oct 1 | Statistical Debugging | Lab 6: Dataflow Analysis | Lab 5 |
7 | Oct 6, 8 | Dataflow Analysis I | - | Lab 6 |
8 | Oct 13, 15 | Fall Break (Oct 9-12) / Dataflow Analysis II | Lab 7: Pointer Analysis | - |
9 | Oct 20, 22 | Pointer Analysis | Lab 8: Constraint-Based Analysis | Lab 7 |
10 | Oct 27, 29 | Constraint-Based Analysis | Lab 9: Dynamic Symbolic Execution | Lab 8 |
11 | Nov 3, 5 | Type Inference | - | Lab 9 |
12 | Nov 10, 12 | Symbolic Execution | Group Project | - |
13 | Nov 17, 19 | Advanced Topics | - | - |
14 | Nov 24 | Thanksgiving Break (Nov 27-30) | - | - |
15 | Dec 1, 3 | Course Review & Project Presentations | - | Group Project |
16 | Dec 8 | Last Day of Classes | - | - |
Finals | Dec 11-18 | Final Exam Period | - | - |
📚 Course Description
Your 500-line vibe-coded class project works perfectly. Google’s 100+ million-line codebase? That’s a different universe.
At scale, software is complex, buggy, and insecure. Enter software analysis: a suite of techniques to automatically analyze code, uncover bugs, and ensure reliability. And this has real-world impact: when Google deploys to billions of devices, a single divide-by-zero error can drain millions of batteries worldwide, or worse, crash the propulsion system of a warship or rocket. Software analysis tools are already running in production: Meta’s Infer has prevented thousands of crashes, while Google’s Tricorder fixes 5000+ bugs daily, to name a few.
This course provides a rigorous and hands-on introduction to the field of software analysis — a body of powerful techniques and tools for analyzing modern software, with applications to:
- 🐛 Systematically uncover insidious bugs
- 🔒 Prevent security vulnerabilities
- ⚙️ Automate testing and debugging
- ✅ Improve confidence in software behavior, even with mathematical guarantees
⚠️ New: Starting this semester, we’ll also address the trillion-parameter elephant in the room: Large Language Models (LLMs). With LLMs writing more vibe-code than ever, it’s important to devise automatic ways of ensuring code doesn’t blow up in production. We’ll explore how LLMs can assist with software analysis tasks, and where they fall short. Our team is re-working the labs as we go, so bear with us!
Topics Covered
Dynamic Analysis
- Random testing & fuzzing
- Delta debugging
- Statistical debugging
- Runtime monitoring
Static Analysis
- Dataflow analysis
- Pointer analysis
- Type systems
- Constraint-based analysis
The labs provide hands-on implementation of these techniques using the LLVM compiler infrastructure. LLVM, created by Chris Lattner during his graduate studies at UIUC, powers modern compiler technology. His work caught Apple’s attention and led to Clang and, later, Swift. Today LLVM underlies Apple’s toolchain, Google’s optimizations, and Meta’s production tools, making it ideal for understanding real-world analysis.
🎯 Learning Objectives
Upon completion of this course, you will be able to:
✓ Understand fundamental methods for analyzing, testing, and verifying software
✓ Analyze trade-offs between different techniques (scalability vs. precision)
✓ Implement analysis algorithms using LLVM
✓ Apply appropriate techniques to real-world problems
✓ Evaluate the effectiveness of different approaches
📋 Prerequisites
- CIS 240/CIT 595: Systems Programming (C/C++ required)
- CIS 120/CIT 594: Data Structures and Algorithms
- CIS 160/CIT 592: Mathematical Foundations
⚠️ Note: Labs involve substantial C++ programming with LLVM
📊 Grading
Component | Weight |
---|---|
Labs (up to 9) | 54% |
Quizzes | 3% |
Group Project | 23% |
Final Exam | 20% |
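The weights in the table add up to 100%, so a course score can be read as a weighted average of the component scores. The sketch below is a hypothetical illustration, not the official grading procedure: it assumes each component is scored out of 100, that all labs count equally within the 54% lab share (6% each if all 9 labs are assigned), and that no curve or late-day adjustment applies. The names `WEIGHTS`, `final_score`, and `lab_average`, and all sample scores, are made up for the example.

```python
# Hypothetical sketch: weights come from the grading table above;
# the sample scores are invented for illustration only.
WEIGHTS = {
    "labs": 0.54,
    "quizzes": 0.03,
    "group_project": 0.23,
    "final_exam": 0.20,
}


def lab_average(lab_scores: list[float]) -> float:
    """Assumption: every assigned lab counts equally within the lab share."""
    return sum(lab_scores) / len(lab_scores)


def final_score(scores: dict[str, float]) -> float:
    """Return the weighted course score (0-100) from component scores (0-100)."""
    # Sanity check: the weights should cover the whole grade.
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


labs = [90, 85, 100, 95, 80, 88, 92, 97, 93]  # nine hypothetical lab scores
scores = {
    "labs": lab_average(labs),
    "quizzes": 100,
    "group_project": 85,
    "final_exam": 78,
}
print(round(final_score(scores), 2))
```

With these sample scores the labs average about 91.1 and contribute 49.2 points, and the script prints 87.35.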
Late Policy: 6 late days total
📖 Resources
Textbooks
- No required textbook - All materials provided
- Recommended: Static Program Analysis (free online)
- Reference: Principles of Program Analysis (Nielson et al.)
⚖️ Academic Integrity
All submitted work must be your own. You may discuss concepts, but code must be written independently.
AI Policy: ChatGPT/Copilot allowed for understanding concepts only - no direct code generation. Must disclose usage.
Violations → Failing grade + referral to OSC