Weekly Programming Topics

Testing and Debugging

Week 10 – Testing and Debugging

Reminder:

We will have an upcoming in-class test during lab time.
You will be given 2 hours to complete the test.
Please submit your work (Python files and screenshots) before you leave the lab.
We will mark your submission and provide feedback on whether you:
Pass
Need-to-fix
Failed

Table of contents

Testing
Test Harnesses
Defensive programming versus validation
Pytest – generating test cases
The Golden Rule for testing
Two approaches for testing/debugging
Bottom-up approach
Top-down approach

1.1 Test Harnesses

“A test harness is a group of software and test data designed to test a program element by operating it under different situations and supervising its practices and results.” [1]

Or: A test harness is a small, dedicated program written specifically to test a module. [2]

Steps to create one:

Define the module prototype (inputs and outputs)
Brainstorm test cases BEFORE writing the code
Create test data covering normal and edge cases
Write the harness to run each test
Compare actual vs expected output

[1] GeeksforGeeks. (2025). Software testing – test harness. https://www.geeksforgeeks.org/software-testing/software-testing-test-harness/
[2] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.

1.1 Test Harnesses (cont.)

Step 1: Define the module prototype (inputs and outputs)

The module prototype is display_gem(gem)
The module takes a GemRecord object and displays its fields
The data types are expected to be as follows: [1]

Field        Data Type  Description                      Example
description  String     Text describing the type of gem  “Emerald”
weight       Float      Weight in grams                  3.4
price        Float      Price in AUD                     560.45

[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.

Step 2: Brainstorm test cases BEFORE writing the code

Test 1 – normal input:
id = 1
description = "Emerald"
weight = 4
price = 580.99
Test 2 – edge case:
id = 2.9
description = None
weight = 4.568
price = 580.99

Q: Looking at Test 2, what is "wrong" with each input value, and why are these called edge cases? (Hint: look at their data types.)
Q: Can you come up with a few more normal inputs and edge cases for the GemRecord module?

Step 3: Create test data covering normal and edge cases

Consider a variety of possible inputs and expected outputs for the module above: [1]

Test 1 – normal input:

Field        Input    Expected Output
description  Emerald  “Emerald”
weight       4        4.00
price        580.99   $580.99

Test 2 – edge case:

Field        Input    Expected Output
id           2.9      2
description  None     “Unknown”
weight       4.568    4.57
price        580.99   $580.99

Q: Can you create test data for your newly created inputs from the previous step?

[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.

Step 4: Write the harness to run each test

Based on the expected output, complete the display_gem(gem) procedure.
Then finish the harness with the two test cases: Test 1 for normal input and Test 2 for edge cases.
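A harness for these two tests could be sketched as follows. This is only an illustration: the GemRecord class, the "label: value" output format and the run_test helper are assumptions, not the lecture's actual code. display_gem here is a deliberately naive first draft that prints every field exactly as stored.

```python
import io
from contextlib import redirect_stdout
from dataclasses import dataclass


@dataclass
class GemRecord:
    """Hypothetical record type; field names follow the Step 1 table."""
    id: object
    description: object
    weight: object
    price: object


def display_gem(gem):
    """Naive first draft: prints each field exactly as it was stored."""
    print(f"id: {gem.id}")
    print(f"description: {gem.description}")
    print(f"weight: {gem.weight}")
    print(f"price: {gem.price}")


def run_test(name, gem, expected_lines):
    """Run display_gem on one record and compare each printed line."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):          # capture what display_gem prints
        display_gem(gem)
    actual_lines = buffer.getvalue().splitlines()

    print(name)
    results = []
    for expected, actual in zip(expected_lines, actual_lines):
        match = expected == actual
        results.append(match)
        print(f"  expected {expected!r:24} actual {actual!r:24} match: {match}")
    return results


normal = run_test(
    "Test 1 - normal input",
    GemRecord(1, "Emerald", 4, 580.99),
    ["id: 1", "description: Emerald", "weight: 4.00", "price: $580.99"],
)
edge = run_test(
    "Test 2 - edge case",
    GemRecord(2.9, None, 4.568, 580.99),
    ["id: 2", "description: Unknown", "weight: 4.57", "price: $580.99"],
)
```

With this naive draft, only the id and description lines of Test 1 match. Every other line differs from what was expected.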

Step 5: Compare actual vs expected output

Does the output match what was expected? [1]

Test 1 – normal input:

Field        Input    Expected   Actual   Match?
id           1        1          1        Yes
description  Emerald  “Emerald”  Emerald  Yes
weight       4        4.00       4        No
price        580.99   $580.99    580.99   No

Test 2 – edge case:

Field        Input    Expected   Actual   Match?
id           2.9      2          2.9      No
description  None     “Unknown”  None     No
weight       4.568    4.57       4.568    No
price        580.99   $580.99    580.99   No

Q: Run display_gem(gem) on your previous inputs. What do you observe?

[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.

1.2 Defensive programming versus validation

Q: Imagine a cashier who receives a $2.9 note

Should the cashier figure out what to do with it?
Should the bank never have issued it in the first place?

[Figure: AI-generated image]

1.2 Defensive programming versus validation (cont.)

The cashier = display_gem() – the module receiving the data
The $2.9 note = the bad input – a float instead of an int, None instead of a string
The bank = the input stage – where data enters the system

So …
If you think the cashier should deal with the bad note => display_gem() must handle unexpected input itself => Defensive Programming
If you think the bank should never issue bad notes => validate input before it reaches display_gem() => Validation

1.2 Defensive programming versus validation (cont.)

Defensive Programming: “A software development practice in which the programmer assumes that undetected faults or inconsistencies may exist in code and implements measures to detect and safely handle such issues to improve software robustness and reliability.” [1]
Input Validation: “The process of ensuring that the data provided to a program meets specific criteria before it is processed. This process helps prevent errors, security vulnerabilities, and unexpected behavior by verifying that user input is both appropriate and safe.” [2]

[1] ScienceDirect Topics. (2016). Defensive programming. Elsevier. https://www.sciencedirect.com/topics/computer-science/defensive-programming
[2] Fiveable. (n.d.). Input validation. https://fiveable.me/key-terms/introduction-engineering/input-validation

1.2 Defensive programming versus validation (cont.)

To put it in plain terms:

            Defensive Programming            Input Validation
Where?      Inside the module                At the point of input
Approach    Handle bad data when it arrives  Reject bad data before it enters

Now let’s go back to our current example at Step 5, where most actual outputs did not match the expected ones. How would the two approaches apply?

1.2 Defensive programming versus validation (cont.)

Defensive programming:

The original:
The fix:
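As a minimal sketch of what the two versions might look like, the block below puts an undefended formatter next to a defensive one. The GemRecord class, the "label: value" output format and the fallback defaults of 0 and 0.0 are assumptions made for illustration. Only the "Unknown" default for a missing description comes from the expected-output table.

```python
from dataclasses import dataclass


@dataclass
class GemRecord:
    """Hypothetical record type; not taken from the lecture code."""
    id: object
    description: object
    weight: object
    price: object


def format_gem_original(gem):
    """Original: trusts every field and applies no formatting."""
    return [
        f"id: {gem.id}",
        f"description: {gem.description}",
        f"weight: {gem.weight}",
        f"price: {gem.price}",
    ]


def format_gem_defensive(gem):
    """Defensive: checks for None, converts types, supplies defaults."""
    # Assumed defaults: 0 for a missing id, 0.0 for a missing weight or price.
    gem_id = int(gem.id) if gem.id is not None else 0
    description = gem.description if gem.description is not None else "Unknown"
    weight = float(gem.weight) if gem.weight is not None else 0.0
    price = float(gem.price) if gem.price is not None else 0.0
    return [
        f"id: {gem_id}",                    # int(2.9) truncates to 2
        f"description: {description}",
        f"weight: {weight:.2f}",            # always two decimal places
        f"price: ${price:.2f}",
    ]


def display_gem(gem):
    """Print the defensively formatted fields, one per line."""
    for line in format_gem_defensive(gem):
        print(line)
```

The defensive version does three things the original does not: it tests for None, it converts each value to the type it expects, and it formats the numbers explicitly.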

1.2 Defensive programming versus validation (cont.)

Defensive programming:

Test 1 – normal input:

Field        Input    Expected   Actual     Match?
id           1        1          1          Yes
description  Emerald  “Emerald”  Emerald    Yes
weight       4        4.00       4.00       Yes
price        580.99   $580.99    $580.99    Yes

Test 2 – edge case:

Field        Input    Expected   Actual     Match?
id           2.9      2          2          Yes
description  None     “Unknown”  “Unknown”  Yes
weight       4.568    4.57       4.57       Yes
price        580.99   $580.99    $580.99    Yes

Q: What are the key changes that you notice?
Q: It fixed the program. But what if there are 10 modules that all receive gem data? Do we fix all 10?

1.2 Defensive programming versus validation (cont.)

Validation:

The problem with defensive programming alone:
def display_gem(gem):    # fixed
def calculate_tax(gem):  # not fixed yet
def save_to_file(gem):   # not fixed yet
def print_receipt(gem):  # not fixed yet
Every module has to defend itself individually against the same bad data. This leads to repeated, complex code across the entire program.
But … where does bad data come from in the first place?
Input! – the moment id, description, weight and price are collected from the user
Q: Is there an approach we are already familiar with that defends against bad input data?
The input_functions.py file!

1.2 Defensive programming versus validation (cont.)

Q: In the input_functions.py file, do you think the original read_integer() is resilient enough? Probably not …
The fix:
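A more resilient read_integer() might look like the sketch below. The real signature in input_functions.py is not shown here, so the prompt argument and the extra reader parameter are assumptions. reader exists only so that a test can supply scripted answers.

```python
def read_integer(prompt, reader=input):
    """Keep asking until the user types a whole number.

    `reader` is a hypothetical parameter (it defaults to input) that lets
    a test feed in scripted answers instead of typing them.
    """
    while True:
        text = reader(prompt).strip()
        try:
            return int(text)                # raises ValueError for "abc", "2.9", ""
        except ValueError:
            print(f"'{text}' is not a whole number. Please try again.")
```

Because the loop only exits through return int(text), any module that calls read_integer() can rely on receiving an int.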

1.2 Defensive programming versus validation (cont.)

Similarly, there is a function read_float(), but it is currently not robust enough.
The fix:
How it works:
"3.14".replace(".", "", 1) removes the first "." so that isdigit() can check the rest. For example: "3.14" -> "314" -> True
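The replace-then-isdigit idea can be sketched as follows. The helper name looks_like_float and the reader parameter are hypothetical, and the isascii() guard is an addition to the technique described above. Note that this check also rejects negative numbers such as "-2.5".

```python
def looks_like_float(text):
    """True for ASCII digit strings with at most one decimal point."""
    # Removing only the FIRST "." means "3.1.4" still contains a "." and fails.
    # isascii() is an extra guard: characters such as "²" pass isdigit()
    # but cannot be converted by float().
    return text.isascii() and text.replace(".", "", 1).isdigit()


def read_float(prompt, reader=input):
    """Keep asking until the user types a non-negative decimal number.

    `reader` is a hypothetical parameter (it defaults to input) used so that
    tests can supply scripted answers.
    """
    while True:
        text = reader(prompt).strip()
        if looks_like_float(text):
            return float(text)
        print(f"'{text}' is not a valid number. Please try again.")
```

A try/except around float(text), in the style of read_integer(), would also accept negative values and scientific notation. Which behaviour is wanted depends on the field being read.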

1.2 Defensive programming versus validation (cont.)

TL;DR:

Defensive Programming: handle any unexpected input inside the module:
‒ Check for None before using a value
‒ Convert types explicitly: int(), float()
‒ Provide fallback defaults
‒ Makes later modules safer, but the code grows complex
Validation at Input: reject bad input before it enters the system:
‒ Loop until the user enters a valid value
‒ Enforce ranges (min <= value <= max)
‒ Simplifies later code – modules can trust their inputs
‒ Regular expressions (regex) – be careful, they may bend your mind a bit for now!
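The "loop until valid" and "enforce ranges" points can be combined in one input function. The sketch below is illustrative only: the name read_integer_in_range and the reader parameter are hypothetical and are not part of input_functions.py as described here.

```python
def read_integer_in_range(prompt, minimum, maximum, reader=input):
    """Keep asking until the user enters a whole number within [minimum, maximum]."""
    while True:
        text = reader(prompt).strip()
        try:
            value = int(text)
        except ValueError:
            print("Please enter a whole number.")
            continue                         # ask again: not a number at all
        if minimum <= value <= maximum:
            return value                     # valid type AND valid range
        print(f"Please enter a value between {minimum} and {maximum}.")
```

For example, read_integer_in_range("Quantity: ", 1, 10) would keep prompting until the answer is a whole number from 1 to 10.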

1.3 Pytest: Generate test case

‒ In Step 5, we compared actual vs expected output by eye.
‒ What if there are 20 modules, each with 10 test cases?
‒ That is 200 comparisons. Do you want to check them manually?

1.3 Pytest: Generate test case (cont.)

Pytest may come to save the day!

pytest is an open-source testing framework for Python that automates the process of running test cases and checking their results.
Rather than printing output and checking it manually, pytest allows you to write assertions — statements that describe what the output should be. pytest then runs all tests automatically and reports which ones passed and which ones failed.
To get started, run: pip install pytest

1.3 Pytest: Generate test case (cont.)

Let’s apply this to our GemRecord example:

test_gem.py
How does it work?
def test_...(): pytest automatically finds any function whose name starts with test_
capsys: a built-in pytest fixture that captures what print() outputs
capsys.readouterr(): retrieves the captured output so we can check it
assert: checks whether something is True – if not, the test fails
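A test file built from the pieces listed above might look like this sketch. It is not the actual test_gem.py: the GemRecord class, the "label: value" output format and the stand-in display_gem are assumptions, and in a real project display_gem would be imported from its own module.

```python
# test_gem.py (sketch) - run with: pytest test_gem.py -v
from dataclasses import dataclass


@dataclass
class GemRecord:
    """Hypothetical record type used only for this example."""
    id: object
    description: object
    weight: object
    price: object


def display_gem(gem):
    """Stand-in for the fixed module; normally imported, not defined here."""
    description = gem.description if gem.description is not None else "Unknown"
    print(f"id: {int(gem.id)}")
    print(f"description: {description}")
    print(f"weight: {float(gem.weight):.2f}")
    print(f"price: ${float(gem.price):.2f}")


def test_display_gem_normal(capsys):
    display_gem(GemRecord(1, "Emerald", 4, 580.99))
    captured = capsys.readouterr()           # everything printed so far
    assert "id: 1" in captured.out
    assert "description: Emerald" in captured.out
    assert "weight: 4.00" in captured.out
    assert "price: $580.99" in captured.out


def test_display_gem_edge(capsys):
    display_gem(GemRecord(2.9, None, 4.568, 580.99))
    captured = capsys.readouterr()
    assert "id: 2" in captured.out
    assert "description: Unknown" in captured.out
    assert "weight: 4.57" in captured.out
    assert "price: $580.99" in captured.out
```

Each assert replaces one row of the manual "Match?" table, so a failing row shows up as a failing test.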

1.3 Pytest: Generate test case (cont.)

Now let’s run the test case by running: pytest test_gem.py -v
To put it in simple terms, it is equivalent to:

Field        Input    Expected   Actual   Match?
id           1        1          1        Yes
description  Emerald  “Emerald”  Emerald  Yes
weight       4        4.00       4.00     Yes
price        580.99   $580.99    $580.99  Yes

1.3 Pytest: Generate test case (cont.)

You can continue to explore the file test_gem.py. There are multiple test cases there already.
Pytest can run all of your test cases at once. Less time wasted on manually checking, more time to do real work!
It becomes even more powerful later when you modify your code, as it instantly catches anything that stops working.

The Golden Rule for testing

When testing any module that processes a list, always test with exactly 0, 1 and 3 elements [1]
0 elements – an empty list. Does the program crash or handle it gracefully?
1 element – a single item. The boundary between empty and multiple.
3 elements – multiple items. Confirms the loop processes all items correctly.
Run: pytest test_golden_rule.py -v for more details
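The rule can be illustrated with a small list-processing function and three tests. This is a sketch only: number_gems is a hypothetical function and the tests are not the contents of test_golden_rule.py.

```python
def number_gems(descriptions):
    """Return one numbered line per gem description."""
    lines = []
    for position, description in enumerate(descriptions, start=1):
        lines.append(f"{position}. {description}")
    return lines


def test_zero_elements():
    # Empty list: the loop body never runs and nothing should crash.
    assert number_gems([]) == []


def test_one_element():
    # Single item: the boundary between empty and multiple.
    assert number_gems(["Emerald"]) == ["1. Emerald"]


def test_three_elements():
    # Multiple items: every element is processed, in order.
    assert number_gems(["Emerald", "Ruby", "Opal"]) == [
        "1. Emerald",
        "2. Ruby",
        "3. Opal",
    ]
```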

[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.

3.1 Bottom-Up

When modules are tested together for the first time, errors are hard to isolate.
Is the bug in read_gem()?
Is it in display_gem()?
Or is it in read_integer()?
Bottom-up testing solves this by testing each module independently before combining them.
Start from the lowest level modules first and work up:
Level 1 (test first): read_integer(), read_float(), read_string()
Level 2 (test next): read_gem()
Level 3 (test last): display_gem(), display_gems()

3.1 Bottom-Up (cont.)

The philosophy behind this is: if the small things work, the big things work
When each module is tested independently:
Errors are isolated: if display_gem() fails after its lower-level modules have passed, the bug is inside display_gem(), not somewhere else
Bugs in small modules are simpler to find
Each module is confirmed working before the next one is built on top of it
pytest makes this practical: you write a separate test file for each module and run them all with one command.
For more information on bottom-up programming, please refer to:

https://www.youtube.com/watch?v=8dXfEADEZv0

3.2 Top-Down

Top-down debugging is the opposite approach.
You write all modules first, run the whole program and fix errors as they appear.
This is what most programmers do in practice – but it is less reliable than bottom-up testing because errors are harder to isolate.
The workflow is: read_integer() -> read_gem() -> display_gem() -> display_gems(). When we run display_gems() and something breaks, we need to backtrack to find out where.

3.2 Top-Down (cont.)

When something breaks in top-down debugging, we have two strategies to find where it is:
Tracking variables:
Add print() statements at key points to see what the data looks like as it moves through the program.
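As a small illustration of tracking variables, the sketch below prints the data on entry to and exit from each step. The functions parse_price and price_with_tax and the 10% tax rate are hypothetical examples, not part of the lecture code.

```python
def parse_price(text):
    """Convert a price string such as "$100" to a float."""
    print(f"[trace] parse_price received: {text!r}")
    price = float(text.replace("$", ""))
    print(f"[trace] parse_price returns: {price!r}")
    return price


def price_with_tax(text, tax_rate=0.10):
    """Return the price including tax, rounded to two decimal places."""
    price = parse_price(text)
    total = round(price * (1 + tax_rate), 2)
    # Printing all intermediate values shows where a wrong result first appears.
    print(f"[trace] price={price!r} tax_rate={tax_rate!r} total={total!r}")
    return total
```

Using {value!r} in the trace prints the repr of each value, which makes it easy to tell the string "100" from the number 100 or to spot a None.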

Binary chop:
Let’s play binary_chop_game.py
Q: What is the optimal way to win this game?
It is the same with programming: start at the beginning and the end of the code, and print out the state of the variables. [1] Then check the midpoint and keep halving the section that contains the fault.
This may find the problem sooner. [1]

[1] Mitchell, M. (2024). Week 10 – Testing and debugging [Lecture slides]. Swinburne University of Technology.
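The halving idea can be sketched as a number-guessing routine. This assumes binary_chop_game.py is a guess-the-number game, which is not confirmed here. The function name binary_chop and its compare callback are hypothetical.

```python
def binary_chop(compare, low, high):
    """Find a hidden number in [low, high] by halving the range each guess.

    compare(guess) must return a negative number if the guess is too low,
    a positive number if it is too high, and 0 if it is correct.
    Returns (number_found, guesses_used).
    """
    guesses = 0
    while low <= high:
        middle = (low + high) // 2
        guesses += 1
        result = compare(middle)
        if result == 0:
            return middle, guesses
        if result < 0:
            low = middle + 1                 # too low: discard the lower half
        else:
            high = middle - 1                # too high: discard the upper half
    raise ValueError("the hidden number is not in the given range")


secret = 73
found, guesses = binary_chop(lambda g: (g > secret) - (g < secret), 1, 100)
print(found, guesses)
```

For a range of 1 to 100, this strategy never needs more than 7 guesses. When debugging, the "range" is the stretch of code between a point where the data is known to be good and a point where it is known to be bad.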
Two approaches for testing/debugging

TL;DR:

Bottom-Up Testing (preferred):
Test each module independently before combining
Write a test harness per function
Most reliable – errors are isolated
Python tools: pytest, unittest

Top-Down Debugging (common in practice, less thorough):
Write all modules, then find errors as they appear
Add print() statements to track variable state
Use the tracking-variables or binary chop strategy
The Python debugger may be useful

Summary

A test harness tests a module with predefined inputs and expected outputs

Design test cases before writing the module code
Defensive programming handles unexpected input inside a module; validation blocks it at input
Golden Rule: always test with 0, 1, and 3 elements for any loop or list operation
Bottom-up testing (unittest/pytest) is more reliable than top-down
Two debugging strategies: tracking variables in sequence, or binary chop