Weekly Programming Topics
Testing and Debugging
Week 10 – Testing and Debugging
Reminder:
- We will have an upcoming in-class test during lab time.
- You will be given 2 hours to complete the test.
- Please submit your work (Python files and screenshots) before you leave the lab.
- We will mark your submission and provide feedback on whether you:
- Pass
- Need-to-fix
- Failed
Table of contents
- Testing
- Test Harnesses
- Defensive programming versus validation
- Pytest – generating test cases
- The Golden Rule for testing
- Two approaches for testing/debugging
- Bottom-up approach
- Top-down approach
1.1 Test Harnesses
“A test harness is a group of software and test data designed to test a program element by operating it under different situations and supervising its practices and results.” [1]
Or: A test harness is a small, dedicated program written specifically to test a module. [2]
Steps to create one:
- Define the module prototype (inputs and outputs)
- Brainstorm test cases BEFORE writing the code
- Create test data covering normal and edge cases
- Write the harness to run each test
- Compare actual vs expected output
[1] GeeksforGeeks. (2025). Software testing – test harness. https://www.geeksforgeeks.org/software-testing/software-testing-test-harness/
[2] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.
1.1 Test Harnesses (cont.)
Step 1. Define the module prototype (inputs and outputs)
- The module prototype is display_gem(gem)
- The module takes a GemRecord object and displays its fields
- The data types are expected to be as follows: [1]

| Field | Data Type | Description | Example |
| --- | --- | --- | --- |
| description | String | Text describing the type of gem | "Emerald" |
| weight | Float | Weight in grams | 3.4 |
| price | Float | Price in AUD | 560.45 |

[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.
Step 2. Brainstorm test cases BEFORE writing the code
- Test 1 – normal input:
- id = 1
- description = "Emerald"
- weight = 4
- price = 580.99
- Test 2 – edge case:
- id = 2.9
- description = None
- weight = 4.568
- price = 580.99
Q: Looking at Test 2, what is "wrong" with each input value? Why are these called edge cases? (Hint: take a look at their data types.)
Q: Can you come up with a few more normal inputs and edge cases for the GemRecord module?
Step 3: Create test data covering normal and edge cases
- Consider a variety of possible inputs and expected outputs for the module above: [1]

Test 1 (normal input):

| Field | Input | Expected Output |
| --- | --- | --- |
| description | Emerald | "Emerald" |
| weight | 4 | 4.00 |
| price | 580.99 | $580.99 |

Test 2 (edge case):

| Field | Input | Expected Output |
| --- | --- | --- |
| id | 2.9 | 2 |
| description | None | "Unknown" |
| weight | 4.568 | 4.57 |
| price | 580.99 | $580.99 |

Q: Can you create test data for your newly created inputs from the previous step?
[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.
Step 4: Write the harness to run each test
- Based on the expected output, complete the display_gem(gem) procedure.
- Finish with the two test cases: Test 1 for normal inputs, Test 2 for edge cases.
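A harness for this step might look like the sketch below. The GemRecord dataclass, the output layout of display_gem() and the run_harness() helper are assumptions made for illustration; the course files may define them differently. display_gem() is deliberately left in its naive form here, so running the harness exposes the mismatches that Step 5 looks for.

```python
from dataclasses import dataclass


@dataclass
class GemRecord:
    # Assumed record layout; field names follow the tables in Steps 1-3.
    id: object
    description: object
    weight: object
    price: object


def display_gem(gem):
    # Naive version: prints every field exactly as it was given.
    print(f"ID: {gem.id}")
    print(f"Description: {gem.description}")
    print(f"Weight: {gem.weight}")
    print(f"Price: {gem.price}")


def run_harness():
    # Test data taken from Step 2: one normal input, one edge case.
    test_gems = [
        ("Test 1 - normal input", GemRecord(1, "Emerald", 4, 580.99)),
        ("Test 2 - edge case", GemRecord(2.9, None, 4.568, 580.99)),
    ]
    for label, gem in test_gems:
        print(f"--- {label} ---")
        display_gem(gem)


if __name__ == "__main__":
    run_harness()
```

The harness only runs the module and shows its output; deciding whether each line matches the expected value is still done by the person reading it.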
Step 5: Compare actual vs expected output
- Does the output match what was expected? [1]

Test 1 (normal input):

| Field | Input | Expected | Actual | Match? |
| --- | --- | --- | --- | --- |
| id | 1 | 1 | 1 | Yes |
| description | Emerald | "Emerald" | Emerald | Yes |
| weight | 4 | 4.00 | 4 | No |
| price | 580.99 | $580.99 | 580.99 | No |

Test 2 (edge case):

| Field | Input | Expected | Actual | Match? |
| --- | --- | --- | --- | --- |
| id | 2.9 | 2 | 2.9 | No |
| description | None | "Unknown" | None | No |
| weight | 4.568 | 4.57 | 4.568 | No |
| price | 580.99 | $580.99 | 580.99 | No |

Q: Run display_gem(gem) on your previous inputs. What do you observe?
[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.
1.2 Defensive programming versus validation
Q: Imagine a cashier who receives a $2.9 note.
- Should the cashier figure out what to do with it?
- Should the bank never have issued it in the first place?
1.2 Defensive programming versus validation (cont.)
- The cashier = display_gem() – the module receiving the data
- The $2.9 note = the bad input – float instead of int, None instead of string
- The bank = the input stage – where data enters the system
So …
- If you think the cashier should deal with the bad note => display_gem() must handle unexpected input itself => Defensive Programming
- If you think the bank should never issue bad notes => validate input before it reaches display_gem() => Validation
1.2 Defensive programming versus validation (cont.)
- Defensive Programming: "A software development practice in which the programmer assumes that undetected faults or inconsistencies may exist in code and implements measures to detect and safely handle such issues to improve software robustness and reliability." [1]
- Input Validation: "The process of ensuring that the data provided to a program meets specific criteria before it is processed. This process helps prevent errors, security vulnerabilities, and unexpected behavior by verifying that user input is both appropriate and safe." [2]
[1] ScienceDirect Topics. (2016). Defensive programming. Elsevier. https://www.sciencedirect.com/topics/computer-science/defensive-programming
[2] Fiveable. (n.d.). Input validation. https://fiveable.me/key-terms/introduction-engineering/input-validation
1.2 Defensive programming versus validation (cont.)
- To put it in plain terms:

| | Defensive Programming | Input validation |
| --- | --- | --- |
| Where? | Inside the module | At the point of input |
| Approach | Handle bad data when it arrives | Reject bad data before it enters |

- Now let's go back to the Step 5 results, where several actual values did not match the expected ones, and look at how each approach would deal with them.
1.2 Defensive programming versus validation (cont.)
Defensive programming:
- The original:
- The fix
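The two versions are not reproduced in this text, so the following is only a sketch of what they might look like. The GemRecord dataclass, the name display_gem_original() and the exact print layout are assumptions; what matters is the set of defensive steps in the fixed version, which are chosen to produce the expected values from Step 3.

```python
from dataclasses import dataclass


@dataclass
class GemRecord:
    # Assumed record layout; field names follow the tables in Steps 1-3.
    id: object
    description: object
    weight: object
    price: object


def display_gem_original(gem):
    # Trusts its input completely and prints whatever it receives.
    print(f"ID: {gem.id}")
    print(f"Description: {gem.description}")
    print(f"Weight: {gem.weight}")
    print(f"Price: {gem.price}")


def display_gem(gem):
    # Defensive version: repair unexpected values before printing them.
    gem_id = int(gem.id)  # 2.9 -> 2, because int() drops the fraction
    if gem.description is None:
        description = "Unknown"  # fallback default for a missing value
    else:
        description = str(gem.description)
    weight = float(gem.weight)  # explicit conversion, so 4 becomes 4.0
    price = float(gem.price)
    print(f"ID: {gem_id}")
    print(f"Description: {description}")
    print(f"Weight: {weight:.2f}")  # always two decimal places
    print(f"Price: ${price:.2f}")


display_gem(GemRecord(2.9, None, 4.568, 580.99))
```

With this version the edge-case record prints 2, Unknown, 4.57 and $580.99, which are the expected values in the Test 2 table.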
1.2 Defensive programming versus validation (cont.)
Defensive programming:
Q: What are the key changes that you notice?

Test 1 (normal input) after the fix:

| Field | Input | Expected | Actual | Match? |
| --- | --- | --- | --- | --- |
| id | 1 | 1 | 1 | Yes |
| description | Emerald | "Emerald" | Emerald | Yes |
| weight | 4 | 4.00 | 4.00 | Yes |
| price | 580.99 | $580.99 | $580.99 | Yes |

Test 2 (edge case) after the fix:

| Field | Input | Expected | Actual | Match? |
| --- | --- | --- | --- | --- |
| id | 2.9 | 2 | 2 | Yes |
| description | None | "Unknown" | "Unknown" | Yes |
| weight | 4.568 | 4.57 | 4.57 | Yes |
| price | 580.99 | $580.99 | $580.99 | Yes |

Q: It fixed the program. But what if there are 10 modules that all receive gem data? Do we fix all 10?
1.2 Defensive programming versus validation (cont.)
Validation:
- The problem with defensive programming alone
- def display_gem(gem): #fixed
- def calculate_tax(gem): #nah
- def save_to_file(gem): #nah
- def print_receipt(gem): #nah
- Every module has to individually defend itself against the same bad data. It leads to repeated, complex code across the entire program.
- But … where does bad data come from in the very first place?
- Input! – the moment id, description, weight and price are collected from the user
- Q: Is there an approach we are already familiar with that guards against bad input?
- input_functions.py file!
1.2 Defensive programming versus validation (cont.)
- Q: In the input_functions.py file, do you think the original read_integer() is resilient enough? Probably not …
- The fix:
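The original read_integer() and its fix are not shown in this text, so the version below is only one possible sketch of validation at the point of input. The signature, the min_value and max_value parameters, the error messages and the ask parameter (which lets a test supply answers instead of typing them) are all assumptions.

```python
def read_integer(prompt, min_value, max_value, ask=input):
    """Keep asking until the user types a whole number inside the range."""
    while True:
        text = ask(prompt).strip()
        # isdigit() rejects "abc", "2.9", "" and, in this simple check,
        # a leading minus sign as well; isascii() rules out odd characters
        # such as superscript digits that int() cannot convert.
        if not (text.isascii() and text.isdigit()):
            print("Please enter a whole number.")
            continue
        value = int(text)
        if min_value <= value <= max_value:
            return value
        print(f"Please enter a number between {min_value} and {max_value}.")
```

A call such as read_integer("Enter gem id: ", 1, 100) then returns only once the value is a valid integer in range, so modules that receive the id later no longer need to check it again.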
1.2 Defensive programming versus validation (cont.)
- Similarly, there is a function read_float(), but it’s currently not good enough
- The fix:
- How it works:
- "3.14".replace(".", "", 1) removes the first "." so that isdigit() can check the rest. For example: "3.14" -> "314" -> True
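The replace-then-isdigit idea described above can be sketched as follows. The helper name looks_like_float(), the error message and the ask parameter (used so a test can supply answers) are assumptions, not the contents of input_functions.py.

```python
def looks_like_float(text):
    """Return True for text such as "3.14" or "4", False otherwise."""
    # Remove at most one "." so that isdigit() can check what is left.
    digits_only = text.replace(".", "", 1)
    # A second "." survives the replace, so "1.2.3" fails the check.
    return digits_only.isascii() and digits_only.isdigit()


def read_float(prompt, ask=input):
    """Keep asking until the user types something float() can convert."""
    while True:
        text = ask(prompt).strip()
        if looks_like_float(text):
            return float(text)
        print("Please enter a number, for example 3.14.")
```

This simple check also rejects negative numbers, which is acceptable for fields such as weight and price that should never be negative.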
1.2 Defensive Programming vs Validation (cont.)
TL;DR:
- Defensive Programming
- handle any unexpected input inside the module:
- Check for None before using a value
- Convert types explicitly: int(), float()
- Provide fallback defaults
- Makes later modules safer but code grows complex
- Validation at Input
- reject bad input before it enters the system:
- Loop until user enters a valid value
- Enforce ranges (min <= value <= max)
- Simplifies later code – modules can trust their inputs
- Regular expressions (regex) can also validate input, but they can be hard to read at first, so use them with care for now!
1.3 Pytest
- Generate test case
- In Step 5, we compared actual vs expected output by eye.
- What if there are 20 modules, each with 10 test cases?
- That is 200 comparisons. Do you want to check them manually?
1.3 Pytest (cont.)
- Generate test case (cont.): pytest may come to save the day!
- pytest is an open-source testing framework for Python that automates the process of running test cases and checking their results.
- Rather than printing output and checking it manually, pytest allows you to write assertions — statements that describe what the output should be. pytest then runs all tests automatically and reports which ones passed and which ones failed.
- To get started, run: pip install pytest
1.3 Pytest (cont.)
- Generate test case (cont.): let's apply this to our GemRecord example:
- test_gem.py
- How does it work?
- def test_...(): pytest automatically finds any function whose name starts with test_
- capsys: built-in pytest fixture that captures what print() outputs
- capsys.readouterr(): retrieves the captured output so we can check it
- assert: checks that something is True – if not, the test fails
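The contents of test_gem.py are not reproduced in this text. A test file built from the four pieces above might look like the sketch below; GemRecord, the defensive display_gem() and the test names are assumptions, and they are defined inline only to keep the example self-contained. In a real project the test file would import them from the module under test.

```python
from dataclasses import dataclass


@dataclass
class GemRecord:
    # Assumed record layout; field names follow the tables in Steps 1-3.
    id: object
    description: object
    weight: object
    price: object


def display_gem(gem):
    # Assumed defensive version of the module under test.
    description = "Unknown" if gem.description is None else gem.description
    print(f"ID: {int(gem.id)}")
    print(f"Description: {description}")
    print(f"Weight: {float(gem.weight):.2f}")
    print(f"Price: ${float(gem.price):.2f}")


def test_display_gem_normal_input(capsys):
    display_gem(GemRecord(1, "Emerald", 4, 580.99))
    captured = capsys.readouterr()  # everything print() wrote so far
    assert "Emerald" in captured.out
    assert "4.00" in captured.out
    assert "$580.99" in captured.out


def test_display_gem_edge_case(capsys):
    display_gem(GemRecord(2.9, None, 4.568, 580.99))
    captured = capsys.readouterr()
    assert "ID: 2\n" in captured.out  # 2.9 was converted to 2
    assert "Unknown" in captured.out  # None was replaced by the default
    assert "4.57" in captured.out
```

Each assert replaces one row of the hand-checked table: if the actual output does not contain the expected text, pytest reports that test as failed.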
1.3 Pytest (cont.)
- Generate test case (cont.)
- Now let's run the test case with: pytest test_gem.py -v
- Put simply, it is equivalent to:

| Field | Input | Expected | Actual | Match? |
| --- | --- | --- | --- | --- |
| id | 1 | 1 | 1 | Yes |
| description | Emerald | "Emerald" | Emerald | Yes |
| weight | 4 | 4.00 | 4.00 | Yes |
| price | 580.99 | $580.99 | $580.99 | Yes |
1.3 Pytest (cont.)
- Generate test case (cont.)
- You can continue to explore the file test_gem.py. There are multiple test cases there already.
- Pytest can run all of your test cases at once. Less time wasted on manually checking, more time to do real work!
- It becomes even more powerful later when you modify your code, as it instantly catches anything that stops working.
The Golden Rule for testing
- When testing any module that processes a list, always test with exactly 0, 1 and 3 elements [1]
- 0 elements – an empty list. Does the program crash or handle it gracefully?
- 1 element – a single item. The boundary between empty and multiple.
- 3 elements – multiple items. Confirms the loop processes all items correctly.
- Run: pytest test_golden_rule.py -v for more details
[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.
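Applied to a small list-processing module, the 0, 1 and 3 element rule might look like this. The total_price() function and the test names are hypothetical and are not taken from test_golden_rule.py.

```python
def total_price(prices):
    """Add up a list of gem prices (hypothetical module under test)."""
    total = 0.0
    for price in prices:
        total += price
    return total


def test_zero_elements():
    # Empty list: the loop body never runs and nothing should crash.
    assert total_price([]) == 0.0


def test_one_element():
    # Single item: the boundary between empty and multiple.
    assert total_price([580.99]) == 580.99


def test_three_elements():
    # Multiple items: confirms the loop visits every element.
    assert total_price([100.0, 200.0, 300.5]) == 600.5
```

The prices in the last test were chosen so that the floating-point sum is exact; for arbitrary values, pytest.approx() is the usual way to compare floats.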
3.1 Bottom-Up
- When modules are tested together for the first time, errors are hard to isolate.
- Is the bug in read_gem()?
- Is it in display_gem()?
- Or is it in read_integer()?
- Bottom-up testing solves this by testing each module independently before combining them.
- Start from the lowest level modules first and work up:
- Level 1 (test first): read_integer(), read_float(), read_string()
- Level 2 (test next): read_gem()
- Level 3 (test last): display_gem(), display_gems()
3.1 Bottom-Up (cont.)
- The philosophy behind this is: if the small things work, the big things work
- When each module is tested independently:
- Errors are isolated because if display_gem() fails, the bug is inside display_gem(), not somewhere else
- Bugs in small modules are simpler to find
- Each module is confirmed working before the next one is built on top of it
- pytest makes this practical: you can write a separate test file for each module and run them all with one command.
- For more information on bottom-up programming, please refer to:
https://www.youtube.com/watch?v=8dXfEADEZv0
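As a small illustration of the bottom-up order, the sketch below tests a low-level helper before the function built on top of it. The helper names format_price() and format_gem_line() are hypothetical and stand in for the course modules.

```python
# Level 1: lowest-level helper, tested first.
def format_price(price):
    return f"${float(price):.2f}"


# Level 2: built on top of Level 1, tested once Level 1 passes.
def format_gem_line(description, price):
    return f"{description}: {format_price(price)}"


def test_level_1_format_price():
    assert format_price(580.99) == "$580.99"
    assert format_price(4) == "$4.00"


def test_level_2_format_gem_line():
    # If this fails while the Level 1 test passes, the bug is in
    # format_gem_line() itself and not in format_price().
    assert format_gem_line("Emerald", 580.99) == "Emerald: $580.99"
```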
3.2 Top-Down
- Top-down debugging is the opposite approach
- You write all modules first, run the whole program and fix errors as they appear.
- This is what most programmers do in practice – but it is less reliable than bottom-up testing because errors are harder to isolate
- The workflow is: read_integer() -> read_gem() -> display_gem() -> display_gems(). When we run display_gems() and something breaks, we need to backtrack to find out where.
3.2 Top-Down (cont.)
- When something breaks in top-down debugging, we have two strategies to find where it is:
- Tracking variable:
- Add print() statements at key points to see what the data looks like as it moves through the program.
- Binary chop
- Let’s play binary_chop_game.py
- Q: What is the optimal way to win this game?
- It is the same with programming – start at the beginning of the code and the end, and print out the state of the variables. [1]
- This may find the problem sooner [1]
[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.
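Both strategies can be combined in code. The sketch below runs a hypothetical four-step price calculation (one step contains a deliberate bug) and uses binary chop to find the first step that breaks the value, printing the tracked variable at each midpoint. All function names are assumptions, and the search relies on the value being fine at the start, broken at the end, and staying broken once it has gone wrong.

```python
def add_tax(price):
    return price * 1.10


def add_shipping(price):
    return price + 5.0


def apply_discount(price):
    # Deliberate bug for the demo: subtracts far too much.
    return price - 1000.0


def round_price(price):
    return round(price, 2)


def state_after(steps, value, count):
    """Run only the first `count` steps and return the resulting value."""
    for step in steps[:count]:
        value = step(value)
    return value


def find_first_bad_step(steps, start_value, is_ok):
    """Binary chop: return the first step after which is_ok() fails."""
    good, bad = 0, len(steps)  # fine after `good` steps, broken after `bad`
    while bad - good > 1:
        middle = (good + bad) // 2
        value = state_after(steps, start_value, middle)
        print(f"after {middle} step(s): {value}")  # tracking variable
        if is_ok(value):
            good = middle  # the bug is in the later half
        else:
            bad = middle  # the bug is in the earlier half
    return steps[bad - 1]


steps = [add_tax, add_shipping, apply_discount, round_price]
culprit = find_first_bad_step(steps, 580.99, lambda price: price >= 0)
print(f"First bad step: {culprit.__name__}")
```

Here the search reports apply_discount after checking only two midpoints, rather than inspecting every step in sequence.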
Two approaches for testing/debugging
TL;DR:
Bottom-Up Testing (preferred):
- Test each module independently before combining
- Write a test harness per function
- Most reliable – errors are isolated
- Python tools: pytest, unittest
Top-Down Debugging (common in practice, less thorough):
- Write all modules, then find errors as they appear
- Add print() statements to track variable state
- Use the tracking variables or binary chop strategy
- Python debugger may be useful
Summary
- A test harness runs a module against predefined inputs and expected outputs
- Design test cases before writing the module code
- Defensive programming handles unexpected input inside a module; validation blocks it at input
- Golden Rule: always test with 0, 1, and 3 elements for any loop or list operation
- Bottom-up testing (unittest/pytest) is more reliable than top-down
- Two debugging strategies: tracking variables in sequence, or binary chop