Peter Nairn

Lesson to learn - Part 2

Posted on Tue 11 Apr 2006 at 01:10 in Stories

As people seemed to like my last story, here is my favourite story of a user doing something to break a system, which also shows how testing techniques can fail.

We were building a large, complex system for a customer and spent 2 years on development and testing. I was the Test Manager, and by the end of the project I was very pleased with how it had gone from a testing perspective. It was one of those rare projects where the customer had specified the requirements very well, the specifications were well written, the unit testing was good and System Testing had been given enough time to do a good job. This was an extremely important application, and a failure would have had serious consequences.

When the project went live, I was asked to be the Support Manager on the live site. My role was to train the Operations staff to run the system and to provide on-site support for any problems that arose.

The system ran fine, with some small hiccups, for about 3 months, and both we and the customer were very pleased with how it was running (apart from one “problem” that is the subject of another user story).

Then, one day, the system crashed. Big time. Corrupt data, users couldn’t do anything, and the machine had its legs in the air doing a good impression of a heap of junk. The part of the system that had failed was an interim in-memory “database” used to capture user input before it was written to the database on disk and then to optical disks. I tried to diagnose the fault but had no idea how the problem had arisen. The fault logs were of no use whatsoever. I cleared the in-memory database, reset the database and started everything up again. This took about 2 hours, which was a disaster at a critical time of the day, and the users then had to re-enter all their data. I increased the logging level and left it running.

All went fine for a couple of days, and then it happened again. The customer was now livid and threatening all sorts of legal action, so the pressure was on. Again, I got the machine running after copying off the logs. It took me a couple of hours to plough through the logs, but I found the problem. The main user input to the system was names and addresses, and the data held in interim memory was just a stream of fields separated by double quotes. One user, who had only just joined the organisation, had seen the name O’Donnell and typed O"Donnell, with a double quote in place of the apostrophe. The system took that double quote to be a field separator, and it threw all of the following database updates out by one field. One of the misaligned fields then trampled all over system memory and the machine just died.
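
The failure is a classic delimiter collision. Below is a minimal sketch, assuming a buffer where every field is simply followed by a double quote with no escaping; the field names, the sample address and the backslash escaping scheme are all hypothetical, since the story doesn't describe the actual record layout or how it was fixed. It shows the naive parser shifting every field after the surname, and one possible repair that escapes the separator inside field values.

```python
# Hypothetical reconstruction of the interim buffer: each field is followed
# by a double quote, with no escaping. Field names are assumptions.
SEP = '"'
ESC = "\\"  # hypothetical escape character for the repaired version
FIELDS = ["surname", "first_name", "street", "town"]


def encode(record: dict) -> str:
    """Naive serialiser: writes each field followed by the separator."""
    return "".join(record[name] + SEP for name in FIELDS)


def decode(stream: str) -> dict:
    """Naive parser: splits on every separator and maps fields by position."""
    return dict(zip(FIELDS, stream.split(SEP)))


def encode_safe(record: dict) -> str:
    """Escape the escape character first, then the separator, in each field."""
    parts = []
    for name in FIELDS:
        value = record[name].replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)
        parts.append(value + SEP)
    return "".join(parts)


def decode_safe(stream: str) -> dict:
    """Scan character by character, treating an escaped character as data."""
    values, current, i = [], [], 0
    while i < len(stream):
        ch = stream[i]
        if ch == ESC and i + 1 < len(stream):
            current.append(stream[i + 1])  # escaped char is literal data
            i += 2
            continue
        if ch == SEP:
            values.append("".join(current))  # unescaped separator ends a field
            current = []
        else:
            current.append(ch)
        i += 1
    return dict(zip(FIELDS, values))


# Hypothetical sample record containing the mistyped surname.
record = {"surname": 'O"Donnell', "first_name": "Mary",
          "street": "1 High St", "town": "Leeds"}
print(decode(encode(record)))
# {'surname': 'O', 'first_name': 'Donnell', 'street': 'Mary', 'town': '1 High St'}
print(decode_safe(encode_safe(record)) == record)  # True
```

Escaping is only one option; rejecting the separator character at the input screen, or storing a length prefix per field instead of a terminator, would avoid the collision too.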

We, as the test team, had tested this input field, we thought, to destruction, but one of the characters that hadn’t been tried was a double quote. The tester had used equivalence partitioning on the field and hadn’t chosen a double quote as an invalid input. As a black box tester, he didn’t know that the internals of the system used a double quote as a field separator.
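
Equivalence partitioning assumes every member of a class is handled the same way, so one representative per class is enough. That assumption broke here, because the double quote behaved differently from other punctuation. Below is a rough sketch with entirely hypothetical class boundaries (the story doesn't say which partitions or representatives the tester used), and with representative selection simplified to "first member of the class". It shows how white-box knowledge of the separator splits it out into its own class, so it finally gets a test case.

```python
import string

# Hypothetical black-box partitions for a surname field, derived only from
# a specification. One representative per class gets tested.
BLACK_BOX_CLASSES = {
    "letters (valid)": string.ascii_letters,
    "name punctuation (valid)": "'- ",
    "digits (invalid)": string.digits,
    "other punctuation (invalid)": "!#$%&()*+,./:;<=>?@[]^_`{|}~\"",
}


def representatives(classes: dict) -> dict:
    """Pick one member per class (simplified here to the first member)."""
    return {name: members[0] for name, members in classes.items()}


def white_box_classes(classes: dict, internal_specials: str) -> dict:
    """Move characters with a special internal meaning into their own classes."""
    refined = {}
    for name, members in classes.items():
        ordinary = "".join(c for c in members if c not in internal_specials)
        if ordinary:
            refined[name] = ordinary
    for ch in internal_specials:
        refined[f"internal separator {ch!r} (invalid)"] = ch
    return refined


# Black-box: the double quote hides inside "other punctuation" and is never picked.
print(representatives(BLACK_BOX_CLASSES))
# White-box: knowing the internal separator gives the double quote its own class.
print(representatives(white_box_classes(BLACK_BOX_CLASSES, '"')))
```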

Morals of this story:

- Test techniques won’t catch every bug. I know this is not much of a moral, since every tester already knows it, but it is sometimes worth reminding ourselves.

- Knowing something about the internals of the system, such as which character it uses as a field separator, can help you test more effectively.

- Having at least two levels of logging (brief and detailed) in a live system can help you diagnose problems. This system had three levels: level 1 was very brief, just a record of function calls, with very little impact on performance; level 2 was less brief, recording module calls, with more impact on performance; and level 3 was detailed, showing all the user input at every screen, with even more impact on performance.

- Don’t agree to be the Support Manager. I hated it; I much prefer Test Management.
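
The three logging levels in the morals above could be approximated with Python's standard `logging` module. This is a minimal sketch, not the original system's design: the custom level numbers, the level names, the logger name and the `save_name` handler are all hypothetical. Because a lower number means more verbose in `logging`, the most detailed trace (level 3, user input) gets the lowest number.

```python
import logging

# Hypothetical custom levels standing in for the three trace levels.
LEVEL_1_FUNCTION_CALLS = 25
LEVEL_2_MODULE_CALLS = 15
LEVEL_3_USER_INPUT = 5

logging.addLevelName(LEVEL_1_FUNCTION_CALLS, "L1-FUNC")
logging.addLevelName(LEVEL_2_MODULE_CALLS, "L2-MODULE")
logging.addLevelName(LEVEL_3_USER_INPUT, "L3-INPUT")

THRESHOLDS = {1: LEVEL_1_FUNCTION_CALLS, 2: LEVEL_2_MODULE_CALLS, 3: LEVEL_3_USER_INPUT}
log = logging.getLogger("live_system")


def set_trace_level(level: int) -> None:
    """Change the trace level at runtime: 1 is brief, 3 is most detailed."""
    log.setLevel(THRESHOLDS[level])


def save_name(surname: str) -> None:
    """Hypothetical input handler instrumented at all three levels."""
    log.log(LEVEL_1_FUNCTION_CALLS, "save_name called")
    log.log(LEVEL_2_MODULE_CALLS, "entering module name_capture")
    # The %r argument is only formatted if the record is actually emitted.
    log.log(LEVEL_3_USER_INPUT, "user input: surname=%r", surname)


if __name__ == "__main__":
    logging.basicConfig(level=logging.NOTSET, format="%(levelname)s %(message)s")
    set_trace_level(3)
    save_name('O"Donnell')
```

At level 3 the raw surname, double quote included, ends up in the log, which is presumably the kind of detail that let the cause be found after the logging level was raised. Switching the level at runtime rather than restarting is an assumption here; the story doesn't say how the level was changed.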

Coincidence

Posted on Tue 11 Apr 2006 at 01:46 by philk10
Another good story - and I'm pretty sure we had problems with an O'Donnell as well !!!

maybe it should be called The Paddy Test...
