Testing and Telephony
• Wednesday, September 9, 2009 - it might be time to change the way we do things here ...
The product I've been working on for the last 3 years has been a fairly small thing until very recently, and we've gotten pretty good at testing it. But all of a sudden we're adding Stuff that increases the testing load - the GUI can now be used with multiple browsers, not just IE on Windows. We've added some functionality and ease-of-use features that (1) have to be tested, (2) have to behave consistently between the GUI and more-limited TUI interfaces, (3) multiply the ways various other features can be invoked, which all need to produce the same result, and (4) make it very attractive to use a telephony feature that used to get very little use, and so got fairly light testing.
The Sales people are delighted.
I foresee much angst over test duration estimates, the next time we do a major release and have to test the *entire* product.
I'm currently updating some test documents, with no urgency on the task, so I'm taking some time to think about whether some test cases are actually providing useful information. Anything that's there now certainly seemed like a good idea at the time it got written, but sometimes a later, more detailed understanding of what some area of the product is really doing makes me realize that a certain test isn't actually useful. If it isn't useful, it needs to go away.
I hope I can get my coworkers to do likewise as they update test docs, but I'm thinking that probably won't bring down the complete test execution time as much as we'll need to. I think we'll need to add some exploratory testing of the GUI (guided by knowledge of what sorts of things work in one browser and break in another) and some explicitly tracked combination testing, so that we aren't going numb and missing problems because we've run the same test so often.
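Explicitly tracked combination testing can start from a plan that covers every pair of factor values at least once (pairwise, or all-pairs, testing) instead of the full cross product. Below is a minimal greedy sketch; the factors and their values (the browsers other than IE, the invocation methods) are hypothetical placeholders, not this product's real test matrix.

```python
from itertools import combinations, product

# Hypothetical factors: placeholders, not the real product's test matrix.
FACTORS = {
    "browser": ["IE", "Firefox", "Safari"],
    "interface": ["GUI", "TUI"],
    "invocation": ["menu", "shortcut", "phone keypad"],
}


def pairs_of(config):
    """All pairs of (factor, value) settings exercised by one configuration."""
    items = sorted(config.items())
    return {frozenset(p) for p in combinations(items, 2)}


def greedy_pairwise(factors):
    """Pick configurations until every pair of factor values is covered at least once."""
    names = sorted(factors)
    all_configs = [dict(zip(names, values))
                   for values in product(*(factors[n] for n in names))]
    uncovered = set()
    for config in all_configs:
        uncovered |= pairs_of(config)
    chosen = []
    while uncovered:
        # Greedy step: take the configuration covering the most still-uncovered pairs.
        best = max(all_configs, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


if __name__ == "__main__":
    plan = greedy_pairwise(FACTORS)
    full = 1
    for values in FACTORS.values():
        full *= len(values)
    print(f"{len(plan)} configurations instead of {full}")
    for config in plan:
        print(config)
```

With these placeholder factors the full cross product is 18 runs, while any plan that covers all pairs needs at least 9 (one per browser/invocation pair); the greedy plan lands at or near that lower bound. Keeping the generated list and ticking off each configuration as it is run is what makes the coverage tracked rather than remembered. Greedy selection is not guaranteed to be minimal, and pairwise coverage won't catch a bug that needs three factors to line up.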
I don't immediately see any way for additional automation to help ... must think about that some more, though.
• Tuesday, April 14, 2009 - Bugs are herd animals
Where there's one, there's likely to be more.
Especially if there's a state machine involved.
Especially if the state machine in question has been worked over and added to by many hands over several years.
• Friday, November 14, 2008 - Risk Assessment Out of Whack
Like many testers, I have a target product that works pretty well most of the time. Whether it's a new release, or getting the product to work with new hardware, or adding new features, normal transactions start working pretty quickly and don't often break. (Not "never" break, though: start believing *that* and you'll let something *really* embarrassing get out to the field.) The interesting work is in the product corners, where legal but unusual interactions and assorted error conditions, both internal and external, produce all sorts of bad things. Error messages played to end-users, dropped calls, hung channels, process crashes, resource leaks that will eventually produce those other symptoms - I spend my days looking for them. I spend my creative energy thinking of new ways to drive my SUT into situations where something bad might happen. When a telephony problem gets reported from the field, I attempt to reproduce it, and if I'm able to repro it on demand (or even "run this scenario and it will probably hit within an hour"), I have both immediate help for the fix-and-verify effort, and another item for the arsenal for beating up on my SUT.
But of course, most real-world use of my product has it running through the normal paths - that work just fine - over and over again. In the real world, it just doesn't get driven into the corners very often. Hundreds, thousands, hundreds of thousands of phone calls may work just fine before circumstances and timing combine "just so" to hit a bad place in the product. So I spend my working days hunting for low-probability events, and I *find* them. (It's getting harder - I've found, and my developers have fixed, all the easy problems. But that leaves the fun ones ...)
Anyway, it distorts my gut-feel for how often a low-probability event is going to bite. I put (say) 40 thousand calls into a SUT for an overnight load run, and got 1 hit on some channel problem? That's a pretty low probability. OK, well, that hypothetical 40K call scenario might equate to a couple of months of run time for a customer site, but I'm experiencing it as happening once a day. I am aware of the issue, and try to adjust for it when evaluating intermittent bugs. I've noticed that the effect does spill over into other parts of my life, most notably in trying to decide if I should take some health symptom to a doctor or not. I don't want to be a hypochondriac, but neither do I want to ignore something I shouldn't ... I have a tendency to err on the side of paranoia, not helped by the fact that my husband's swollen throat last year turned out to be cancer. (He *didn't* blow it off, and as such things go, he got off pretty lightly, and is healthy again.)
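To put rough numbers on that distortion: one hit in 40,000 calls suggests a per-call probability of about p = 1/40,000 = 2.5e-5, and the chance of at least one hit in n calls is 1 - (1 - p)^n. The customer call rate below is an assumption derived only from the "couple of months" guess above (40,000 calls over about 60 days), and a single overnight hit is a very rough estimate of p.

```python
import math


def p_at_least_one(p_per_call: float, calls: int) -> float:
    """Probability that an event with per-call probability p shows up at least once in `calls` calls."""
    return 1.0 - (1.0 - p_per_call) ** calls


def calls_for_confidence(p_per_call: float, confidence: float) -> int:
    """Smallest number of calls that gives at least `confidence` chance of seeing the event."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_call))


# One hit in an overnight load run of 40,000 calls (a single observation, so a rough estimate).
p = 1 / 40_000

# Assumed customer site: the same 40,000 calls spread over roughly 60 days.
site_calls_per_day = 40_000 / 60

print(f"per-call probability ~ {p:.1e}")
print(f"P(>=1 hit in one 40,000-call overnight run) = {p_at_least_one(p, 40_000):.2f}")
print(f"P(>=1 hit in one day at the site) = {p_at_least_one(p, round(site_calls_per_day)):.3f}")
print(f"calls needed for a 95% chance of a repro: {calls_for_confidence(p, 0.95):,}")
```

The overnight-run figure comes out near 0.63 (close to 1 - e^-1), while a day at the assumed site is under 2%: same bug, very different experience. The last line is also a reminder of how long a verification run has to be (roughly 120,000 calls, about three overnight runs at this rate) before "it didn't happen" means much.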
As a tester, I sometimes describe myself as a professional pessimist ... both professionally and personally, sometimes I need to watch out for not going overboard on that.
• Wednesday, July 23, 2008 - Software Security is changing
A while ago I went to a security testing presentation, sponsored by the local (Greater Boston) ASQ chapter. I've been to several such over the past few years (starting around 2002, I think), and they all muddle together in my mind as "you really ought to do security testing for your software, here are some cool and frightening demos of what will happen if you don't". Well, the more recent ones include some suggestions for how you might go about it. (Plus the implication of "if you want it done right, hire my company, or at least buy my book" :)
This presentation was different. The presenter talked about projects to gather data on security flaws (weaknesses in architecture, design, and implementation) and on software attacks and attack patterns, as well as work on vulnerability theory and protection schemes.
See http://makingsecuritymeasurable.mitre.org/ - there's a *lot* of stuff there.
Software security is growing up. When I first became aware that there was such a thing as security testing, it was pretty clearly something I did not know how to do - it needed a special mindset, and my mind just doesn't twist that way. But now, the people whose minds do twist that way seem to be starting to make sense of it, and write about it, in ways that will let people like me do useful security testing if/when I need to. Cool.
And there are tools - here's a list from April 2007, that should still be mostly useful: http://www.networksecurityjournal.com/features/open-source-security-tools-applications-resources-041007/
Looks like fun ...
• Tuesday, May 13, 2008 - Adventures with one of those elusive intermittent bugs
We took to saying the process had vaporized, because it exited unexpectedly leaving no useful information behind. It was repeatable, but not on demand - I would fire the load scenario and wait. There was nothing useful in the log files, even if I turned the log level up. There was no Dr Watson file (this is Windows Server 2003), and there should have been. And I couldn't make it die if I ran the process with a debugger attached.
We had an awful time with this one - my developers were pulling their hair out, and my developers are *good*. They finally had to go to Microsoft support for advice, which eventually did the trick.
My secret identity is "Mostly-clueless-with-a-PC Girl", so I'd better make notes about this for future use. (So what am I doing working in a Microsoft shop? Well, the product is a conference bridge, and I do the telephony testing.)
(1) Make sure your entire process is covered by an exception handler, and that there's an exception handler of last resort.
(2) The normal exception handlers are stack-based, so if your bug is trashing the stack, you may not get a Dr Watson file.
(3) Check your string handling and similar functions - might you be overwriting a buffer somewhere? That's a prime candidate for stack overwrite.
(4) Might you be throwing an exception within an exception handler? Repeatedly?
The advice from Microsoft support was to add a "vectored exception handler", which wrote the exception code and address to a file, then returned so that the normal, stack-based exception handling could proceed.
Quote from explanation that got passed around: "The idea is that the vectored exception handler gives us a chance to log some data before the complex SEH unwind begins and possibly goes haywire. This API was added in XP/2K3, likely to provide a means to troubleshoot the scenario we are encountering. Since SEH is stack based it is susceptible to severe malfunctions."
This exposed the information that trouble starts with an access violation from a bad read (from still-under-development proprietary hardware), then additional exceptions get thrown during the attempt to handle the first one.
Whew! Useful information at last! Developers added defensive code for the bad read(s), and are investigating why it happens in the first place.
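The Windows mechanisms involved (SEH, the AddVectoredExceptionHandler API, Dr Watson) have no direct Python equivalent, but the lesson behind the fix carries over: record the *first* fault before the handling code can bury it under later ones. What follows is a rough Python analog, not the Windows API. An exception raised inside an `except` block keeps the original in `__context__`, and a last-resort hook installed through `sys.excepthook` can log the root cause first, then the whole chain. The device-read function and the error messages are hypothetical stand-ins for the bad hardware read.

```python
import sys
import traceback


def exception_chain(exc):
    """Return the chain of linked exceptions, root cause first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        # __cause__ is set by "raise ... from ...", __context__ by raising inside an except block.
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__
    return list(reversed(chain))


def format_root_cause(exc):
    """One-line summary of the first exception in the chain."""
    root = exception_chain(exc)[0]
    return f"ROOT CAUSE: {type(root).__name__}: {root}"


def last_resort_hook(exc_type, exc, tb):
    """Handler of last resort: log the root cause first, then the full chained traceback."""
    print(format_root_cause(exc), file=sys.stderr)
    traceback.print_exception(exc_type, exc, tb, file=sys.stderr)


def install_last_resort_handler():
    """Route any otherwise-unhandled exception through last_resort_hook."""
    sys.excepthook = last_resort_hook


def read_device_register():
    # Hypothetical stand-in for the bad read from the under-development hardware.
    raise OSError("bad read from device register")


def handle_call():
    try:
        read_device_register()
    except OSError:
        # A handler that fails in turn; Python records the original in __context__.
        raise RuntimeError("cleanup failed while handling the read error")


if __name__ == "__main__":
    install_last_resort_handler()
    try:
        handle_call()
    except RuntimeError as err:
        # Call the hook directly so the demo exits cleanly instead of crashing.
        last_resort_hook(type(err), err, err.__traceback__)
```

Python's default traceback already prints chained exceptions; the point of the hook is to put the root cause on the first line of the log, which is the part that was missing in the vaporizing-process case.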
• Monday, May 5, 2008 - About time something like this showed up
Noticed in my local (Boston MA area) IEEE newsletter: a class offering, "Technical Writing for Technical Professionals". This is not "how to be a tech writer", but is intended to help working engineers, scientists, and other techies improve their writing of technical documents. Great idea: if it reaches even a few of the people who need it, the world will be a better place.
If this, or something similar, has been offered before, I haven't noticed it. I've waded through a lot of really poorly written technical prose over the years, so I'm all for it. A distressing number of my various developer colleagues seem to have believed that if they knew how to program, they didn't need to know how to write - and some of my tester colleagues seem to have believed that no one else was ever going to try to make sense of their test documents, so they didn't need to bother with writing stuff other people could read. Some of the worst offenders were native speakers of English.
A few years ago, I taught an in-house class for a previous employer that covered some of this type of thing - it was focused on reviewing specifications, but that's just the flip side of writing them. No way of knowing if it helped much, unfortunately, as my class pretty much coincided with the beginning of the "let's move real development work away from this campus" thing for that employer, and the numbers of specs being produced dropped off a lot.
• Tuesday, April 15, 2008 - Intermittent bugs: the really elusive kind
Once in a while, there's a bug that's really difficult to isolate.
Reproducible? Oh yes - but not on demand. The effect is really obvious (i.e., a process crashes), but it doesn't leave any useful traces behind - the core dump or Dr Watson file is consistently non-existent or corrupt, and the log files don't show anything significant. Turning up the log level either changes the timing so the bug doesn't hit, or doesn't add any useful information.
I haven't yet found a good way of dealing with these - the best I've been able to do is a slow and rather painful accumulation of peripheral information. What's going on on the SUT when the bug hits? What sort of problems can cause the process to exit without leaving tracks? (Buffer overflows are a possibility ... maybe down in very low-level software. And though BIOS or hardware problems are not the way to bet, they aren't out of the question.)
Some years ago, chasing one of these for a previous employer, the vital clue was provided by a coworker who noticed that the bug happened a specific time after a process failover - and that specific time delta matched a TCP timeout, which was an indication that some connection wasn't being cleaned up after the failover, which in turn gave the developer an idea of where to look.
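The coworker's clue can be turned into a quick, repeatable check: measure each crash against the most recent failover (or any other logged event) and see whether the deltas cluster around a known timeout. A minimal sketch; the timestamps in the demo are invented for illustration, and the 240-second candidate and 5-second tolerance are placeholders, not a claim about which TCP timeout was involved.

```python
from bisect import bisect_right


def deltas_since_last_event(events, crashes):
    """For each crash time, seconds since the most recent event at or before it (None if none)."""
    events = sorted(events)
    result = []
    for t in crashes:
        i = bisect_right(events, t)
        result.append(t - events[i - 1] if i > 0 else None)
    return result


def matches_timeout(deltas, timeout_s, tolerance_s):
    """Fraction of crashes (with a preceding event) whose delta is within tolerance of a candidate timeout."""
    known = [d for d in deltas if d is not None]
    if not known:
        return 0.0
    hits = sum(1 for d in known if abs(d - timeout_s) <= tolerance_s)
    return hits / len(known)


if __name__ == "__main__":
    # Invented timestamps (seconds since the start of a load run), for illustration only.
    failovers = [100, 1000, 5000]
    crashes = [342, 1241, 5239]
    deltas = deltas_since_last_event(failovers, crashes)
    print(deltas)
    print(matches_timeout(deltas, timeout_s=240, tolerance_s=5))
```

Running the same deltas against a handful of candidate timeouts (TCP, application keepalives, watchdogs) is cheap, and a tight cluster is the kind of peripheral information that tells a developer where to look.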
So far, I've dealt with these only while a release was under development, never as a problem reported by a customer. Good thing - I don't think there's any way to solve them quickly, and mistakes due to working under pressure can send you down false trails for a long time.
• Thursday, April 10, 2008 - new toys (I mean tools!) are so much fun
While the Hammers pound the new SUTs, looking for those intermittent bugs, I had occasion to help deal with a SIP problem that turned out to involve a large message overflowing a buffer. So I got to play with SIPp, which I'd downloaded a while ago but hadn't yet used for anything. Fun! It lets you build anything you like! (Which, of course, means you'd better build it correctly :)
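SIPp scenarios are written in its own XML format, which isn't reproduced here. As a standalone illustration of the same idea, here is a short Python sketch that builds a syntactically valid SIP INVITE padded to an exact size with a filler header, for probing a parser's buffer limits. The URIs use documentation addresses (192.0.2.x), and the `X-Padding` header name is a hypothetical choice; unknown extension headers are legal in SIP, so a well-behaved parser should simply ignore it.

```python
import uuid


def build_invite(target_size: int,
                 to_uri: str = "sip:bridge@192.0.2.10",
                 from_uri: str = "sip:tester@192.0.2.20") -> bytes:
    """Build a body-less SIP INVITE padded with a filler header to exactly target_size bytes."""
    call_id = uuid.uuid4().hex
    lines = [
        f"INVITE {to_uri} SIP/2.0",
        "Via: SIP/2.0/UDP 192.0.2.20:5060;branch=z9hG4bK" + call_id[:16],
        "Max-Forwards: 70",
        f"To: <{to_uri}>",
        f"From: <{from_uri}>;tag={call_id[16:24]}",
        f"Call-ID: {call_id}@192.0.2.20",
        "CSeq: 1 INVITE",
        f"Contact: <{from_uri}>",
        "Content-Length: 0",
    ]
    head = "\r\n".join(lines) + "\r\n"
    prefix = "X-Padding: "      # hypothetical extension header used only as filler
    terminator = "\r\n\r\n"     # ends the filler line, then the blank line ending the headers
    filler_len = target_size - len(head) - len(prefix) - len(terminator)
    if filler_len < 1:
        raise ValueError("target_size is too small for the base message")
    message = head + prefix + "a" * filler_len + terminator
    return message.encode("ascii")  # all-ASCII, so byte length equals character length


if __name__ == "__main__":
    for size in (1024, 1500, 4096, 65000):
        print(size, len(build_invite(size)))
```

Stepping `target_size` across suspect boundaries (powers of two, the receiver's buffer size, the path MTU) and sending each message from a test client is a cheap way to look for the kind of overflow described above. Per RFC 3261 (section 18.1.1), a request larger than 1300 bytes with an unknown path MTU is supposed to go over a congestion-controlled transport such as TCP, so large messages may exercise a transport switch as well.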