It is only a one-line fix
We had a problem report that had been outstanding for a long time: a simple little problem. When a user gets a message in their inbox, a yellow marker shows up on their screen to indicate that a message is waiting. Some messages are important ones that the user has to action, and for those they get a red marker instead. Some users receive the important messages only as an FYI. They can't action the message, so they should get neither a red nor a yellow marker, but they were still getting the red one. A low severity, low priority bug.
Fix comes in from Development. The fix is to remove the marker for these users, which is a simple one-line fix.
We test it. It is a low severity/low priority bug among all the other bug fixes we have to retest, and doing a lot of testing in this area requires a lot of data set-up, so we just do a quick check. All is well.
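To make the intended behaviour concrete, here is a minimal sketch of the marker rule as described above. The names (`Marker`, `marker_for`, and the three flags) are hypothetical and chosen for illustration; the real system's code is not shown in this post.

```python
from enum import Enum


class Marker(Enum):
    NONE = "none"
    YELLOW = "yellow"
    RED = "red"


def marker_for(has_message: bool, is_important: bool, can_action: bool) -> Marker:
    """Return the marker a user should see (hypothetical model of the rule)."""
    if not has_message:
        return Marker.NONE
    if is_important:
        # Important messages get red only if the user can action them;
        # FYI recipients should see no marker at all.
        return Marker.RED if can_action else Marker.NONE
    # Ordinary waiting message.
    return Marker.YELLOW


# Each kind of user from the description, with the marker they should get.
cases = {
    "ordinary message": (marker_for(True, False, True), Marker.YELLOW),
    "important, actionable": (marker_for(True, True, True), Marker.RED),
    "important, FYI only": (marker_for(True, True, False), Marker.NONE),
    "no message": (marker_for(False, False, False), Marker.NONE),
}

for name, (actual, expected) in cases.items():
    print(f"{name}: {actual.value} (expected {expected.value})")
```

Written this way, the rule has only a handful of distinct cases, and the FYI case is just one of them.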
The release goes to Live.
Everyone reading this knows what is going to happen next, otherwise there would be no point to this post! Yes, something was broken. True, the users who were not supposed to get their marker didn't get it. However, no other users were getting any marker either, and this marker is relied on for important messages because the users need to take action: no action means the user is prevented from using the system. Not getting the marker meant some users were not actioning their messages and were getting locked out of the system.
Big “oops”.
OK, so all test groups make this type of error sooner or later and we learn from them (sort of, sometimes we don't). But the whole episode made me think about Michael Bolton's distinction between “checking” and “testing”. What we tend to do on low severity bug retests is “checks”. Heck, we have already tested this area once, let's just check the fix. We might do some regression testing, maybe by running an automated suite, but we rarely test a bug fix that is low severity. Maybe other test groups do more testing on this type of bug fix, but mostly my group doesn't.
So, I asked myself, SHOULD we have done more testing? In order to answer that, I looked back at all the low severity bugs we had found, over 3000 of them. I looked at how many of these fixes had failed retest, caused a problem in Live, or both. Few had failed retest: less than 1 percent. I could find only one other low severity fix that had caused a problem in Live, and that problem was also low severity. I also estimated what it would have taken to test these fixes rather than check them. A lot was the answer. The rate of return for putting testing effort into these bug fixes would have been incredibly low.
So, my conclusion was that, yes, it was bad that the bug got through, but I don't want to start testing low severity bug fixes. It would cost too much effort that would be better targeted at more important areas, finding more important bugs. I will just have to take the risk that another low severity bug fix will cause a problem in Live.
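The back-of-envelope sum behind that conclusion can be sketched as below. The 3000 fixes and the two Live problems (this one plus the one other) come from the figures above; the extra hours per fix is a made-up placeholder, not a measured number, and the sketch optimistically assumes fuller testing would have caught every escape.

```python
def hours_per_escape_prevented(fixes: int, live_escapes: int,
                               extra_hours_per_fix: float) -> float:
    """Extra testing hours spent for each Live problem avoided.

    Assumes (optimistically) that testing instead of checking would have
    caught every one of the escapes.
    """
    total_extra_hours = fixes * extra_hours_per_fix
    return total_extra_hours / live_escapes


# 3000 low severity fixes (a lower bound), 2 problems in Live,
# and a HYPOTHETICAL 2 extra hours of testing per fix.
cost = hours_per_escape_prevented(fixes=3000, live_escapes=2,
                                  extra_hours_per_fix=2.0)
print(cost)  # 3000.0
```

Plug in your own estimate for the extra effort per fix. Unless it is tiny, the hours spent per problem avoided stay very large, which is the low rate of return described above.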
Testing is all about assessing risks and acting accordingly.