Peter Nairn

Some FACTS about testing

Posted on Thu 9 Apr 2009 at 04:42 in Stories

FACTS:

- The customer is always right

- Automated testing always gives the right answer

- I don’t believe in coincidence

 

Maybe these facts aren’t correct?

 

Here is a cautionary tale of something that happened to us over the last two weeks.

 

We have a process with our customer to update some reference data on the database periodically.  This data gets updated maybe four or five times a year; data is amended or added to (never deleted).  This is their data, specialised information that, quite frankly, we don’t claim to understand nor do we need to (hold that thought!).  Over the years we have made this a pretty slick process; the customer provides a spreadsheet with the additions and the amendments, which gets folded into the data on the database.  We, the test team, have an automated script which compares what the customer wanted with what we have on the database.  The automated script takes minutes to run (for those interested, this is a VB script in Excel).  If all matches, we sign it off.  Up until last week, we had had no problems with this in over 4 years.

 

Last week, we found a discrepancy when the automated script was run.  Two values did not match.  No big deal, a bug report was written and the database team corrected the data in the database, we closed off the bug report as the automated script now said that the two matched.

 

End of story…..

 

Well, not exactly.  The changes went into the UAT environment, the users started using it and the system started behaving very strangely in a key application.  By coincidence, there was a fault raised in Live at the same time on this same area.  Aha!  Duplicate fault from UAT in Live – not a problem, not too serious, don’t panic chaps!  Upon investigation, the Live fault was found as being a user error so that’s OK then….

 

Well, not exactly.  To their credit, the UAT team stuck to their guns and insisted that the problem in their environment was analysed.  We sighed, grumbled about picky customers, you know the sort of thing, but we started the investigation.  It quickly became apparent that the problem that the automated script had seen was not a problem in the database at all, but a problem with the spreadsheet that the customer had given us.  The database had, therefore, been incorrectly “fixed” to be equally incorrect, causing the problems in the UAT environment.  Also, this showed that the problem was not the same as in Live.
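To illustrate the trap, here is a minimal Python sketch of this kind of comparison.  The real script was VB in Excel and is not shown here, so the function name, the dictionary layout and the sample values are all hypothetical.  The point is that a mismatch only says the two sources disagree; it does not say which one is wrong.

```python
def find_discrepancies(requested, stored):
    """Compare customer-requested values with database values.

    Both arguments are dicts mapping a reference key to its value
    (a hypothetical layout; the real data format is not described).
    Returns a list of (key, requested_value, stored_value) tuples.
    """
    discrepancies = []
    for key in sorted(requested):
        # A key missing from the database shows up as stored value None.
        stored_value = stored.get(key)
        if stored_value != requested[key]:
            discrepancies.append((key, requested[key], stored_value))
    return discrepancies


# Hypothetical sample data: the spreadsheet holds a mistyped value for "B2".
spreadsheet = {"A1": 10, "B2": 99, "C3": 30}
database = {"A1": 10, "B2": 20, "C3": 30}

for key, wanted, actual in find_discrepancies(spreadsheet, database):
    # Report both sides neutrally: either one could be the wrong one.
    print(f"{key}: spreadsheet={wanted} database={actual} (source of error unknown)")
```

A report worded like this at least prompts the question "which side is wrong?" before anyone changes the database.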

 

No problem!  Uncorrect the database and everything will be fine…

 

Well, not exactly.  The change was backed out of the database and the UAT environment was still behaving strangely.  There was much scratching of heads, tut-tutting, sage experts poring over complicated SQL and even more complicated COBOL programs.  After some considerable, stressful, hours, the problem was found.  The misbehaviour due to the incorrect values in the database had caused flags to be reset so that when the correct values were restored, these flags were still set incorrectly.  We corrected these and everything was OK – with 5 minutes to spare before the customer pulled the plug on an important release.

 

Debunking the facts!

 

- The customer is always right.  Not really, the customer can make mistakes too.  The real fault is believing that the customer has NOT made a mistake and jumping to the conclusion that the system is at fault.  More thought required in diagnosis.

 

- Automated testing always gives the right answer.  I never, ever, believed this one.  Automation without thinking about the problem is not testing.  It is, however, very easy to get complacent about an automated test that keeps giving you answers that look correct (discrepancy or no discrepancy).  We got complacent about this process of updating data.  We won’t make that mistake again AND we need to understand more about what this data does so that we don’t rely on UAT to determine whether it is right or not!

 

- I don’t believe in coincidence.  Here, there was a real coincidence that threw us off the scent for some time.  Coincidences DO happen!

 

Some interesting lessons learned (or, maybe, re-learned).

 

 

A consultant's tale

Posted on Fri 15 Jun 2007 at 09:38 in Stories

A few years ago, I went into a large multi-national bank to perform a consultancy task to review their testing practices.  As a consultant, you get one of three types of reactions when you enter the building:

 

  1. A Groan – This is often because they have had bad experiences with consultants before and think this one will be no different.  I don’t mind a groan, this often tells me that they have decent practices already and you won’t find much to worry about.  This type of assignment is generally praising what they have done and recommending improvements in some areas.
  2. A Hurrah – This is the juicy assignment.  A hurrah usually means that the workers know that what they are doing is insufficient and are hoping that the consultant will finally tell the bosses that they need to get their act together.  This assignment often means that there are no (or very bad) practices and you are starting from ground zero.
  3. No reaction – These are the worrying and most hateful assignments.  This often means that someone high up has called in a consultant with no buy-in from the staff and they are not going to co-operate with a single thing you do.  You have an up-hill battle to even find a desk or get told where the toilets are.  Your best bet is to do what you can, write the report and get out.

 

Anyway, back to this bank.  I got the “Hurrah” reaction when I walked in.  I soon found out why.  In the section I had been called into, there was a development team of about 80 and 2 testers, yes, count them, 1, 2.  These 2 testers were from one of the high street branches and had been called in a couple of years previously to give the system a quick once over before it went live.  2 years later, the system was still a mess.  Some releases had gone to live and some of it even worked, but the system had some chronic problems.

 

I started doing the usual things for gathering information and it quickly became clear that the main problem was not only the testing (surprise!) but the Project Managers.  One Project Manager, during my discussion with him stated “We incentivise our staff with a bonus which gets more the fewer bugs are found in the code”.  I asked him how the tester was incentivised and he replied with a puzzled look on his face “The same, of course”.  So, the testers got a greater bonus the fewer bugs they found!

 

The testers had been given no training in testing, they were both intelligent people but they were not IT people, they were business people.  I asked them for their test plans and test scripts and it came as no surprise to me that I got a 1 page test plan and 2 pages of test script.  

 

The reaction to the presentation I gave to the senior bosses was extremely positive.  You could almost hear the sighs of relief around the room that the problems had been identified and that an action plan had been proposed.  This came as a surprise to me as I was expecting some hostility (I had pulled no punches).  It turned out that the Director had not been with the bank for very long, which I knew, and he had been haranguing the senior managers about the lack of quality and they just didn’t know what to do about it – now they had a way forward.

 

That 4 week assignment to review their testing practices turned into a 4 year stint with them as they asked me to put the action plan in place.  And a very happy 4 years they were for me.  And I kept on the two business people, trained them and they became very effective business testers.

 

The lessons I learnt from this experience were:

 

·        Just because it is a large company, it doesn’t mean they have good practices

·        Senior IT managers do not always understand how to improve quality, they need telling.

·        The situation of lack of testing in an organisation is an opportunity, not a reason to be depressed.

·        “Shooting from the hip” sometimes works!

 

 

Don't take people on trust

Posted on Mon 5 Mar 2007 at 09:30 in Stories

 

I, and others, have written a couple of blog entries on how activities outside of software testing can help understand testing (I refer to my entries on Dancing and Gardening).  Here is a story that has happened to me over the last couple of weeks that shows the reverse, i.e. where testing skills have helped in my non-work life.

 

My wife is very interested in alternative therapies, herbal remedies, Reiki, crystal healing and the like.  This in turn has made her interested in the more mystical aspects that seem to go along with these alternative therapies, such as reading and understanding Auras, Angels and so on.  Their overriding philosophy is one of openness and honesty in all dealings – remember this for later!

 

Because my wife has this interest, I have also now got a (mild) interest although I am much more sceptical than her.  It is interesting, though, how a lot of the written works in this area have a direct correlation with some of the management books I have read, particularly on managing people, motivation and intuition.

 

However, I digress.  There is a company local to us that has a shop selling things like crystals, incense sticks, books, etc.  They also run a number of courses on the types of healing and how to become more in tune with yourself.  My wife is a frequent visitor to the shop and knows the owners quite well.  

 

A couple of weeks ago the company sent out an email asking for an investor/business partner and were offering a third of the business in return for the investment as well as asking for the investor to participate in the management of the business.  My wife was interested, so we asked for information.  They told us that they wanted £x for the third of the business and that the investor could expect to make about 6% return.  I thought £x was a bit high, so after some thought, we asked the following, basic, questions:

 

·                What was the investment to be used for

·                How was the business value of £3x calculated

·                What was the percentage return on investment for the previous year.

 

I also told them that depending on their answers I would then want my accountant to get involved.

 

All went quiet for almost a week.

 

We then got an email saying that they had decided to put the business up for sale.  The price?  £x/2.  So they had valued the business for an investor at £3x, but were prepared to sell it for a sixth of that value.  It smelled very fishy.  We suspect that the £x investment was to clear debts and the business is in trouble.  So much for openness and honesty, it feels like they were trying to con us.

 

So, by asking simple questions that you would expect a tester to ask (or, indeed any potential investor) we avoided a potentially damaging financial loss.

 

The lesson from this story is that even though someone would appear to have the highest values, you still cannot fully trust them.  This can be true in Testing.  Even the most honest appearing Project Manager can try to hide things from you to get what they want.

 

Bottom line:  If it looks like a duck, walks like a duck and quacks like a duck, you still need to do a DNA test to show it really is a duck!

Computers more important than human life?

Posted on Thu 1 Feb 2007 at 08:58 in Stories

Computers are more important than human life?

 

Way back in the early 1980s I worked on a large mainframe computer in London.  This beast ran at an amazing speed (for then) and generated a significant amount of heat.  The computer room was the size of a football pitch and contained the latest state-of-the-art hard disk drives, tape drives, printers, etc.

 

The computers were cooled using halon, so there were massive tanks of the stuff circulating round the hardware trying to keep it cool.  The management’s main concern was a fire as there was no offsite backup of the data and so they installed halon gas extinguishers into the ceiling of the computer room which would go off in the event of smoke being detected.  They calculated that the data was safe for about 10 seconds in the event of fire, so set the halon to go off 7 seconds after detecting the fire.  

 

For those of you who don’t know, halon has the property of removing oxygen so the fire would go out.  One tiny detail was overlooked – humans need oxygen to live.  If you were at the far end of the computer room, i.e. furthest away from the door, we calculated we could not run to the door in less than 20 seconds, even at panic speed.  Given the efficiency of halon, we reckoned we would be dead in less than 15 seconds.  But, it got worse, 10 seconds after the fire alarm went off the lead lined exit door was locked automatically to prevent any fire spreading, so even if you were half way down the room you wouldn’t get out in time.
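The arithmetic above can be sketched in a few lines of Python.  The 20 second run and the 10 second door lock come from the story; the constant running speed and the names used are simplifying assumptions.

```python
RUN_TIME_FULL_ROOM = 20.0  # seconds from the far end to the door, per the story
DOOR_LOCK_DELAY = 10.0     # seconds after the alarm before the door locks


def reaches_door_in_time(fraction_of_room):
    """True if someone this far from the door gets there before it locks.

    fraction_of_room is 1.0 at the far end and 0.0 at the door. Assumes a
    constant running speed, which is a simplification.
    """
    return fraction_of_room * RUN_TIME_FULL_ROOM < DOOR_LOCK_DELAY


print(reaches_door_in_time(1.0))   # far end of the room
print(reaches_door_in_time(0.5))   # half way down the room
print(reaches_door_in_time(0.25))  # a quarter of the way in from the door
```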

 

Where were the system consoles positioned?  You’ve guessed it, right at the far end of the computer room.

 

Of course the management were very concerned when we pointed this out to them; they immediately sprang into action and moved the system consoles to just near the door – 2 years after we notified them of the problem.

 

Recruiting your successor

Posted on Thu 12 Oct 2006 at 01:34 in Stories

The project is being outsourced, hey-ho, never mind, not sure it is a good idea, in fact I am sure it isn’t but that is what is going to happen and nothing I do or say is going to change that.  But, this blog entry isn’t about outsourcing and the pluses and minuses, this entry is about recruiting your successor.  

It is a strange experience trying to recruit someone who will be doing your job and when they do you will be out of a job.  Now, now, put those hankies away, I have been expecting this for over a year, I am a contract test manager so expect to have periods of unemployment, it goes with the territory.

I have been trying to recruit my replacement for some months now and it is proving to be extremely difficult.  Here are the difficulties I have had:

·          The quality of test managers in the favoured country for outsourcing has been incredibly poor.  

·          The problem I have with rejecting people is that it could be seen as me finding fault so that I keep my job.  This is simply not true as I am a professional and will behave professionally at all times, regardless of personal thoughts.  Besides, I know I am leaving so I want to have the best person I can find to take over from me.  I have made sure that for every interview I have had a senior permanent member of staff in the interview who can verify that I have been fair in my questions and answers and that my decision is a purely professional one.

·          It is very difficult to interview someone over the phone, have that be the only interview you get, and then make a decision based on it, but that is the process.

·          Despite being professional, see second point, I have a real affection for this project and the test team, most of whom I recruited and I want someone who has the drive, intelligence, testing skills and people skills that will enable the team to progress and not regress.  I have to ask myself “Am I too picky?”  I don’t think so, but I keep asking myself that question.

So why are the test managers in the favoured country so poor?  Obviously, I have not interviewed EVERY test manager in the country, so maybe my sample was just bad.  Here are some of the problems I encountered with them:

·          Poor grasp of English.  As the customer is English and does not speak the favoured country’s language, English has to be good, not just adequate.  This role requires considerable customer facing skills.

·          Poor grasp of testing.  I expect a test manager to at least understand the concepts of testing and I want someone who has a good understanding of testing.  Not everyone agrees with me, some people say if you can manage one thing, then you can manage anything.  There is something in that, but on this project the test manager NEEDS to know a lot about testing.

·          Poor people skills.  Maybe it’s the culture of favoured country, although having managed a number of people from the country for some time I don’t think so.  I expect any manager to have decent people skills, it is part of the job.

·          A lack of understanding of what the test management role is.  These people are being put forward to me as experienced test managers and they do not understand some of the basics of what a test manager does, e.g. managing bug statistics.  I find that incredible.

So, recruiting your successor is not easy, I have found one good candidate so far out of many put forward, hopefully he will turn out OK.

And to sign off, this is the priceless statement one interviewee made “System Testing is a dumb task, you need no skill for it.  The real skills are in User Acceptance testing.”  My fellow interviewer said afterwards “Fortunate it was a telephone interview, I think you would have killed him had it been face-to-face”. 

Oh, and just on the off-chance that anyone was wondering where I have been for the last 3 months since my last blog entry, the project has been taking all of my time.  I am still up to my neck in muck and bullets, but I will try to keep on blogging when I get a spare few minutes.

Developers - don't you just love 'em?

Posted on Tue 20 Jun 2006 at 12:48 in Stories

Developers – don’t you just love ‘em?

 

As part of releasing a new version of the software to Live, we perform a sanity test on the release day to check the release has gone Live OK and/or to be aware of potential problems when the users start using it after the release.  One of my testers found a problem with the release during this sanity check that was not evident in the test environment.

 

Conversations went something like this:

 

Tester:  I have found a problem with our Master/Slave installation

 

Me: Well, raise a bug report, high priority

 

Some few minutes pass as Tester tells Developer a bug report will be raised……..

 

The sound of heavy footsteps coming to my desk….

 

Developer (clearly upset):  Why do you want a bug report raised on this, it is not a bug!

 

Me:  Is there a problem with the installation script?

 

Developer: Yes

 

Me: Then we should raise a bug report

 

Developer:  But it is a very rare occurrence.

 

Me:  But not impossible?

 

Developer:  No, but it would be very unusual

 

Me:  So, it is possible and is a problem, therefore we raise a bug report.

 

Developer:  Well it isn’t a high priority problem.

 

Me:  What is the workaround then?  (Note:  High priority means there is no acceptable workaround)

 

Developer (after thinking for a few seconds):  We would send an engineer to the site with a memory stick

 

Me:  And would that be done within our SLA for this type of problem?

 

Developer:  It would only take a few minutes to do the installation

 

Me:  Even if the engineer had to drive 300 miles to the site?

 

Developer:  Well, I suppose not

 

Me:  So it is high priority problem

 

Developer:  It won’t get fixed.

 

Me:  That is not your call, nor mine – that is a decision made at the bug review meeting that I chair.

 

Developer:  To fix it we would have to redesign the system, do you want to redesign the system?

 

Me: I don’t do the design, I report the problems with the design.

 

Developer (now getting desperate):  Well, you won’t be popular for raising this.

 

Me:  I didn’t go into Test Management to be popular.

 

Developer storms off to get a doll in my image so he can stick pins in it (I suspect).

 

Some more time passes……

 

An email comes out that some Master/Slave devices in Live have had an installation problem…

 

The one statement that almost had me in fits of laughter was the “you won’t be popular for raising this”.  Did he really think that every time we find a problem that I sit down and determine whether to raise it or not depending on how it would affect my popularity? Sheesh.

Lesson to Learn - Part 3

Posted on Fri 9 Jun 2006 at 04:56 in Stories

In my second lessons learned, I mentioned there was a customer “problem” on a previously working system.  Here is the story.

 

The system had been sized to cope with X number of input forms per day and performance and load testing showed that the system could more than cope with twice that number, so we were confident that there would be no performance problems.  After a few weeks of live running, the customer complained that the throughput of forms was below what was expected and we needed to improve the performance of the system.  I was surprised as the system performance monitors showed that the system was never busy and never ran out of any resources.  I checked the logs and all looked fine, there was plenty of spare capacity.  I checked the networks and they were running smoothly with no bottlenecks.  I didn’t know what to do next.

 

The only conclusion I could come to was that there was something in the way the customer was using the system that was reducing the throughput of forms, so I went to the data input room and watched them input data for a couple of hours at the busiest time of the day.  All seemed fine, the input clerks were going as fast as they could and they were keeping up with the system and the system was keeping up with them.

 

What next?  I then went to the machines that were used for analysing the data on the forms, again at the busiest time of the day and all the operators there were going flat out, but again they were keeping up with the system and the system was keeping up with them.  I was completely foxed, everything seemed fine everywhere, so why was the throughput down?

 

As a last resort, I decided to follow the forms from delivery to the building right through to final analysis.  This is what I found.

 

The forms arrived in a van on the ground floor at the Goods-in area.  The forms were off loaded onto trolleys and sent in the lift to the 12th floor.  There, the forms were taken off the trolleys, each one was booked in to a log, loaded back onto trolleys and the trolleys were then sent down a long corridor to the data input room where they were off loaded into piles for the data input clerks to process. 

 

During the busy time of the day, the trolleys were working full blast, the forms were sent to the data input room as fast as they arrived.  During the non-busy time, however, the porters would wait for a full trolley before sending it to the data input room.  This could cause a delay of half an hour, an hour or even longer between trolley loads.  The data input people were happy because it gave them a break.

 

All this meant that the throughput was delayed at certain times of the day and everything was idle.  The customer had complained that the average throughput was too low, however it wasn’t, the throughput was below average at certain times of the day.  
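A small Python sketch shows why looking only at the busy period hid the problem.  The hourly figures are invented purely for illustration; no real numbers from the project are available.

```python
def forms_per_hour(hourly_counts):
    """Return the average number of forms processed per hour."""
    return sum(hourly_counts) / len(hourly_counts)


def slow_hours(hourly_counts, threshold):
    """Return the indexes of hours whose count falls below the threshold."""
    return [hour for hour, count in enumerate(hourly_counts) if count < threshold]


# Hypothetical forms reaching data input in each hour of an 8-hour day.
# Hours 2 to 5 are the busy period; trolleys wait to fill up in the others.
hourly = [40, 20, 120, 120, 120, 120, 20, 40]

print(forms_per_hour(hourly))   # daily average per hour
print(slow_hours(hourly, 60))   # quiet hours dragging the average down
```

With these made-up figures the busy hours run at full rate, yet the daily average sits well below it because of the quiet hours either side.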

 

Lessons learned:

·         The customer may not tell you the whole story

·         The problem may be with the customer’s processes, not with the system

·         Don’t always assume that the busy part of the day is the cause of the problem (yes, I know that is counter-intuitive)

 

How do you know you are getting it right with your customer?

Posted on Fri 2 Jun 2006 at 09:16 in Stories

How do you know when you are “getting it right” with your customer?

 

I have dealt with customers as both Test Manager and Project Manager for some years now and I used to worry about whether I was “getting it right”.  I came to the conclusion some time ago that if you aren’t getting complaints and your management are not getting complaints about how you work with them, you must be doing alright.  Here is a little story about something that happened this week that made me feel really good about my relationship with the current customer.

 

Background:

 

I started on this project 3.5 years ago and the Test Group relationship with the customer was one of the worst I have ever come across.  They didn’t trust us, accused us of lying, hiding facts and the customer meetings I had were extremely difficult.  I had to work really hard to improve that relationship, which I think I managed to do.  One of the things I did which helped was to instigate a weekly bug review with the customer.  At that bug review, we review all the bugs that have been raised in the week (from System Test and Live) and all the bugs that have changed to a terminal state (i.e. closed or set to “not a fault”).  The purpose of this meeting was to make sure we all agreed whether it is a bug, that the severity is correct and the schedule for fixing is reasonable.  This meeting has been running for just under 3 years and has been very successful in improving openness and mutual understanding.  

 

One of the things that happens to us in software development is that a fix or enhancement can break something else or that a bug re-appears for whatever reason.  The UAT Manager at these meetings christened such bugs as coming from the “Change Fairy”.  A number of bugs over the years have been classified as having occurred due to the Change Fairy.

 

The story:

 

This week we held a “Celebration of Success” dinner which was a dinner for a number of the project team, customer and suppliers, about 100 people in all.  It was a posh do, dinner jackets, very formal.  After we all had eaten, there were presentations to a few people and speeches.  The UAT Manager presented to me a “Change Fairy”.  This was something he had made himself out of bits he had got from an old clock, a fairy that he had put on some wings, put into a cage and put on a plaque with the motto “Change Fairy – Do not Release”.

 

I think this shows how much our relationship has improved and is now very good, that a customer who had totally distrusted us can now pull a comical stunt like that at a public function.  I was struck dumb and so pleased. 

 

Yes, I think I am “getting it right” with this customer.

 

 

Lesson to learn - Part 2

Posted on Tue 11 Apr 2006 at 01:10 in Stories

As people seemed to like my last story, here is my favourite story of a user doing something to break a system, which also shows how testing techniques can fail.

 

We were building a large, complex system for a customer and spent 2 years on the development and testing.  I was the Test Manager and by the end of the project I was very pleased with the way it had gone from a testing perspective.  It was one of those rare projects where the requirements had been very well specified by the customer, the specifications had been well produced, the unit testing had been good and the System Testing had been given sufficient time to do a good job.  This was an extremely important application that had serious consequences if it failed.

 

When the project went live, I was asked to be the Support Manager on the live site.  My role was to train the Operations staff in how to run the system and to provide on-site support for any problems that came about.  

 

The system ran fine, with some small hiccups, for about 3 months and we and the customer were very pleased with the way it was running (apart from one “problem” that is a subject of another user story).

 

Then, one day, the system crashed.  Big time.  Corrupt data, users couldn’t do anything, machine had its legs in the air doing a good impression of a heap of junk.  The part of the system that had failed was an interim in-memory “database” that was used to capture user input before being written to the database on disk and then onto Optical disks.  I attempted to diagnose the fault and had no idea how the problem had arisen.  The fault logs were of no use whatsoever.  I cleared the in-memory database, reset the database and started everything up again.  This took about 2 hours which was a disaster at a critical time of the day.  The users then had to re-input all their data.  I increased the logging level and left it running. 

 

All went fine for a couple of days and then it happened again.  The customer was now livid and threatening all sorts of legal action, so the pressure was on.  Again, I got the machine running after copying off the logs.  It took me a couple of hours to plough through the logs and I found the problem.  The main user input to the system was names and addresses and the data held in interim memory was just a stream of data with the field separator of a double quote.  One user, who had only just joined the organisation, had seen the name O’Donnell and input O”Donnell.  The system thought that was a field separator and it threw all of the following database updates out by one field.  One of the following fields then trampled all over system memory and the machine just died.

 

We, as the test team, had tested this input field, we thought, to destruction and one of the characters that hadn’t been tried was a double quote.  The tester had used equivalence partitioning on the field and he didn’t choose double quote as an invalid input.  As a black box tester, he didn’t know that the internals of the system used double quote as a field separator.
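The failure is easy to reproduce in miniature.  The Python sketch below assumes the interim stream was simply the fields joined by a double quote with no escaping, which is what the story suggests; the function names, the sample address and the input check are hypothetical.

```python
SEPARATOR = '"'  # the story says a double quote separated the fields


def pack(fields):
    """Join fields into one stream with no escaping (the flaw in the story)."""
    return SEPARATOR.join(fields)


def unpack(stream):
    """Split the stream back into fields."""
    return stream.split(SEPARATOR)


def is_safe(field):
    """A hypothetical input check: reject any field containing the separator."""
    return SEPARATOR not in field


good = ["O'Donnell", "1 High Street", "London"]
bad = ['O"Donnell', "1 High Street", "London"]

print(unpack(pack(good)))  # three fields, as entered
print(unpack(pack(bad)))   # four fields: everything after the name shifts by one
print(is_safe(bad[0]))     # the check would have rejected the mistyped name
```

A tester who knew about the separator would have put the double quote in its own invalid partition for this field.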

 

Morals of this story:

 

- Test techniques won’t catch every bug. I know that this is not much of a moral, every tester already knows this, but it is sometimes worth reminding ourselves.

- Knowing something about the internals of the system can help you test more effectively

- Having at least two levels of logging (brief and detailed) in a live system can help you diagnose problems.  There were three levels on this system, level 1 was very brief, just a record of function calls, very small impact on performance, level 2 was less brief, recorded module calls, more impact on performance and level 3 was detailed showing all the user input at every screen, with even more impact on performance.

- Don’t agree to be the Support Manager.  I hated it, I much prefer Test Management.
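The three logging levels mentioned in the morals above can be sketched roughly as follows in Python.  This is not the system's actual logging code; the class, method and message names are made up for illustration.

```python
class LeveledLog:
    """A tiny log with three verbosity levels, as in the story.

    Level 1 records function calls, level 2 adds module calls and
    level 3 adds user input. The class and method names are hypothetical.
    """

    def __init__(self, level=1):
        self.level = level
        self.entries = []

    def record(self, detail_level, message):
        # Keep the entry only if the log is running at this level or higher.
        if detail_level <= self.level:
            self.entries.append(message)


log = LeveledLog(level=1)
log.record(1, "function: save_address")
log.record(3, 'input: name=O"Donnell')   # dropped at level 1

log.level = 3                            # raised after the first crash
log.record(3, 'input: name=O"Donnell')   # now captured
print(log.entries)
```

Running at the brief level keeps the performance cost down day to day; the detailed level is what actually exposed the offending input.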

A lesson to learn

Posted on Fri 7 Apr 2006 at 01:54 in Stories

You know, sometimes you get tripped up by being too narrow minded and believing what you see or hear is the truth.  Here is a little story that happened to me this week that I thought I would share – maybe others will learn the lesson, maybe I will next time!

 

We have had a problem in live running with one user (out of many thousands) who was complaining that on one particular screen the system kept freezing.  The user had complained a number of times and no-one could reproduce the problem.  His machine was swapped out, his mouse changed, keyboard changed, all to no avail; he continued to report it was freezing.  The help desk had repeatedly spoken to the guy to determine what he was doing and he seemed to be doing everything correctly.

We in the test team tried, and failed, to recreate the problem.  In the end I got one of my team to write an automated script that repeatedly went into the screen, did something (varying combinations of input) and came out.  We ran that script for hours and it never froze.  I told the tester to give up, it wasn’t reproducible.

Then the customer started to get letters from the user complaining that the system was unusable and, not surprisingly, the customer put pressure on us.  Exasperated, I looked at all of the calls that this user had made to the helpdesk, and he had made a lot of calls.  The calls to do with freezing screens all looked the same, with no extra information; then on one call he mentioned a blank screen appearing.  My test expert in this area immediately said “I know what he is doing”, ran off to her test machine and recreated the problem immediately.

The problem was not that the screen was freezing; the screen was very much alive, which is what had fooled us all.  The cause was that the user, instead of clicking on a hyperlink to select an item, was clicking and dragging the item to another part of the screen (frames are wonderful!) and the browser had gone berserk.  Depending on what he clicked and where he dragged it to there were different results: a blank screen, a screen with hyperlinks on that wouldn’t work or the wrong screen.

My tester had raised a bug report on this over 2 years ago and it had been rejected as “will never happen in live” and the customer had agreed with that assessment and I had gone along with the decision.

 

Morals of the story. 

 

Don’t assume you know what a user means when recording a problem.

Don’t rely on a tool to recreate an intermittent problem. 

If the tool doesn’t show up the problem, don’t assume it doesn’t exist.

Bad decisions may come back to haunt you!

