Lesson to Learn - Part 3
In my second lessons learned post, I mentioned a customer “problem” on a previously working system. Here is the story.
The system had been sized to cope with X input forms per day, and performance and load testing showed that it could more than cope with twice that number, so we were confident that there would be no performance problems. After a few weeks of live running, the customer complained that the throughput of forms was below what was expected and that we needed to improve the performance of the system. I was surprised, as the performance monitors showed that the system was never busy and never ran out of any resources. I checked the logs and all looked fine; there was plenty of spare capacity. I checked the networks and they were running smoothly with no bottlenecks. I didn’t know what to do next.
The only conclusion I could come to was that something in the way the customer was using the system was reducing the throughput of forms, so I went to the data input room and watched them input data for a couple of hours at the busiest time of the day. All seemed fine: the input clerks were going as fast as they could, they were keeping up with the system and the system was keeping up with them.
What next? I then went to the machines that were used for analysing the data on the forms, again at the busiest time of the day. All the operators there were going flat out, but again they were keeping up with the system and the system was keeping up with them. I was completely foxed. Everything seemed fine everywhere, so why was the throughput down?
As a last resort, I decided to follow the forms from delivery to the building right through to final analysis. This is what I found.
The forms arrived in a van on the ground floor at the Goods-in area. The forms were offloaded onto trolleys and sent up in the lift to the 12th floor. There, the forms were taken off the trolleys, each one was booked in to a log, and they were loaded back onto trolleys, which were then sent down a long corridor to the data input room, where they were offloaded into piles for the data input clerks to process.
During the busy time of the day, the trolleys were working full blast and the forms were sent to the data input room as fast as they arrived. During the non-busy time, however, the porters would wait for a full trolley before sending it to the data input room. This could cause a delay of half an hour, an hour or even longer between trolley loads. The data input people were happy because it gave them a break.
All this meant that at certain times of the day the forms were held up and everything downstream sat idle. The customer had complained that the average throughput was too low. It wasn’t; the throughput was only below average at certain times of the day.
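To show how a daily average can hide this kind of gap, here is a minimal Python sketch that breaks throughput down by hour. The arrival figures, the 9-to-16 working day, the "half the average" threshold and the function name forms_per_hour are all made up for illustration; none of them come from the customer's system.

```python
from collections import Counter
from statistics import mean


def forms_per_hour(arrival_hours, first_hour, last_hour):
    """Count the forms reaching the data input room in each hour of the day."""
    counts = Counter(arrival_hours)
    return {hour: counts.get(hour, 0) for hour in range(first_hour, last_hour + 1)}


# Hypothetical data: the hour of day at which each form reached data input.
arrivals = [9] * 40 + [10] * 45 + [11] * 5 + [13] * 10 + [14] * 50 + [15] * 42

hourly = forms_per_hour(arrivals, 9, 16)
daily_average = mean(hourly.values())

# Flag the hours that fall well below the daily average (assumed threshold: half).
quiet_hours = [hour for hour, count in hourly.items() if count < 0.5 * daily_average]

for hour, count in hourly.items():
    marker = "  <-- quiet" if hour in quiet_hours else ""
    print(f"{hour:02d}:00  {count:3d} forms{marker}")
print(f"Average per hour: {daily_average}")
```

With figures like these, the busy hours look healthy and the shortfall shows up only in the quiet hours, which is where I eventually found the waiting trolleys.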
Lessons learned:
· The customer may not tell you the whole story
· The problem may be with the customer’s processes, not with the system
· Don’t always assume that the busy part of the day is the cause of the problem (yes, I know that is counter-intuitive)