Testing and Telephony
• Monday, October 10, 2011 - Carrier-grade, and not
Carrier-grade equipment is high reliability and high availability, and you just assume it is also high capacity, with a high price to pay for all that. But there's another thing that goes along with carrier-grade equipment, and that's the customer staff that deals with it - typically experienced people with a lot of understanding of the technology and associated issues for that kind of gear.
Back in the early 90s, when I was getting my first introduction to systems being deployed into a telco's central offices, the provisioning interface for my product was none too pretty, and could have some problems if you handled it cluelessly. But that was OK, because the people using it were either technicians strictly following documented procedures, or senior staff who knew what they were doing. There would be a lot of detailed requirements for new products, but "easy to administer" just wasn't on the list. It didn't need to be, and customers wanted us to be spending engineering resources on making things good for the end-users. When we took a new product into customer labs for preliminary testing, they didn't just kick the tires, they gave it a good bashing, and gave it back with a detailed problem list - and then did the same for the next iteration. It was painful, but they helped us a lot. (There was one guy in particular who beat us up regularly; we feared him, and were grateful for the tough love.)
So in the way of technology development, systems originally developed for service providers and very large enterprises get downsized and made available for smaller and smaller enterprise networks, and that's a different ecosystem. The smaller the system, the more likely it is that the customer staff looking after it really needs it to be easy to administer. They may be very knowledgeable, but their attention is split among a lot of different equipment, and they don't have time to remember the quirks for any one thing. Or the whatever-it-is might be new technology to them, bought in hopes of fixing some specific local problem, so they don't know exactly how it needs to be set up.
When a company that has always made carrier-grade equipment starts going after that smaller-enterprise market, that's a cultural shift for Engineering. Now, some ease-of-use issues that Dev used to dismiss as "it's supposed to do that" become things that Test, and Support and Marketing, have to push back on as "no, really, guys, you gotta do something about that". Dev wants to do the right thing, but they're under pressure to get that code done, and they've become accustomed to thinking that ease-of-use doesn't matter all that much, so sometimes they need the reminder that "done" now includes a little more than it used to.
The in-progress cultural shift does make fitting in at a new company a little more complicated. I hope that being aware of it will help.
• Thursday, September 22, 2011 - Time for another look (not now, but soon)
The SQGNE talk last week was Capers Jones presenting his current findings on the state of software quality practice. I've heard several versions of this talk over the years, and I'm always interested in hearing the latest one: this guy has data to back up his statements about what works and what doesn't. I love data! Here's the link: http://sqgne.org/presentations/2011-12/Jones-Sep-2011.pdf
Important reminders from that talk: "no single quality method is adequate by itself", and "Inspections + static analysis + testing > 97% efficient" - that's the combination with the biggest bang for the buck.
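One way to see why a combination beats any single method is serial-removal arithmetic: each stage removes some fraction of the defects that reach it, and what escapes one stage is the input to the next. The sketch below assumes the stages act independently, which is a simplification, and the per-stage efficiencies are made-up illustrative numbers, not figures taken from the slides.

```python
from math import prod


def combined_removal_efficiency(stage_efficiencies):
    """Fraction of defects removed by a chain of removal stages.

    Assumes each stage removes its stated fraction of whatever defects
    reach it, independently of the other stages (a simplification).
    """
    escaped = prod(1.0 - e for e in stage_efficiencies)
    return 1.0 - escaped


# Hypothetical per-stage efficiencies, chosen only for illustration.
ASSUMED_STAGES = {"inspections": 0.85, "static analysis": 0.55, "testing": 0.60}

if __name__ == "__main__":
    total = combined_removal_efficiency(ASSUMED_STAGES.values())
    print(f"combined efficiency: {total:.1%}")
```

With these assumed inputs no single stage gets above 85%, yet the chain as a whole comes out a little over 97%.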
What it made me think about this year, helped along by a question from a coworker, was certification. One of the items on the "Poor Quality Results" slide was "Informal testing and uncertified test personnel" - Jones thinks certification for testers is a good thing.
OK, some people think certifications (for testers) are good, and other people think they are bad. Some 7 or 8 years ago I looked into getting an ASQ certification. I abandoned the idea when I read through the "body of knowledge" material for the first module, and found myself going "I disagree with that ... and with that ... and with that ...". But hey, things change over time, and we hope they change for the better. Sometime soon I should take another look at the ASQ certification and see if I'm happier with it now. It will be a while though, right now I'm getting acquainted with my new SUT, new tools, new company.
• Monday, August 22, 2011 - Back to the carrier-grade world
I started my new job last week, and my new SUT is a Session Border Controller. I'm very pleased with this - I like grubbing around in network stuff. So now I'm getting re-acquainted with life in the carrier-grade world - ISO 9000 certification, with documented procedures for everything (that's needed and I'm ok with it), and a *big* lab with modern equipment and dedicated Lab Support - wheee!!!!
The testing mindset will be easier, I think - in my previous job, sometimes I had to remind myself that yes, that area is fragile, but customers don't push on that area so it isn't worthwhile expending more test resources on it, the product is explicitly not carrier-grade! Here, I can just assume that if I can show something is broken, it needs to get fixed (subject to priorities of course).
• Wednesday, June 22, 2011 - Names Have Power
Recently I came across a reference to "BDD", and context indicated similarity to Test Driven Development, so I looked it up. It is Behavior Driven Development, and the short-form description appears to be "it is TDD with different naming conventions".
http://dannorth.net/introducing-bdd/
I can definitely see how this concept (and tools to support it) can be useful. I know how the names you use change the way you think about things, and the effects may be subtle, but subtle effects can add up. If your organization charges different prices to members and non-members, whether you call the difference "membership discount" or "non-member surcharge" reflects and reinforces different attitudes about membership. I used to study aikido (I was terrible at it, but I enjoyed it), and at my dojo, the person we practiced with was our "partner". If a new person or visitor said "opponent", they were corrected - politely, but automatically. I think it helped keep the practices safe.
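To make the naming point concrete, here is a small sketch using Python's unittest and the member pricing example. The function `ticket_price`, the amounts, and the test names are all hypothetical. The only thing being illustrated is the difference between a name that says which function is under test and a sentence-style name beginning with "should", in the style the linked article describes.

```python
import unittest

BASE_PRICE = 20       # hypothetical amounts, for illustration only
MEMBER_DISCOUNT = 5


def ticket_price(is_member):
    """Hypothetical pricing rule: members pay the base price less a discount."""
    return BASE_PRICE - MEMBER_DISCOUNT if is_member else BASE_PRICE


class TestTicketPrice(unittest.TestCase):
    # Conventional naming: says which function is tested, not what is expected.
    def test_ticket_price_member(self):
        self.assertEqual(ticket_price(is_member=True), 15)


class TicketPriceBehavior(unittest.TestCase):
    # Behavior-style naming: each name reads as a sentence about the behavior.
    def test_should_give_members_a_discount(self):
        self.assertLess(ticket_price(is_member=True), ticket_price(is_member=False))

    def test_should_charge_non_members_the_base_price(self):
        self.assertEqual(ticket_price(is_member=False), BASE_PRICE)


if __name__ == "__main__":
    unittest.main()
```

Both classes check the same code. When one of the second group fails, the test report itself states which expected behavior was broken.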
I'm not going to do anything with this for a while, as I'm still getting acquainted with modern programming tools (like Eclipse) and want to continue with what I've started. But eventually I do want to try this other toolset.
• Tuesday, May 10, 2011 - WHEEE that was fun!
I just did a technical audition for a possible new position. I got a toy program in Java plus a single JUnit test case, and the assignment was to get this up in Eclipse, build some JUnit unit tests with 80% or better code coverage, and report bugs found.
I've never done anything at all with Java, let alone JUnit, or Eclipse. I haven't been a developer in over 20 years, and I think IDEs were getting started then, but I've never used one. (Well, I suppose Hammer Visual Basic could be considered an IDE, but it is *so* special-purpose ...) I downloaded Eclipse to my little Ubuntu laptop at home, and though getting acquainted with it and Java and JUnit on a too-small screen was kind of aggravating - WOW. I'm hooked. Programmer tools are dramatically better than they used to be (duh!), which I already knew in a general kind of way, plus I'd gotten a clue from the bit of getting acquainted with Python that I've done recently. All this support stuff done for me that I used to have to do myself, or do without (like code coverage information - I found EclEmma the Eclipse code coverage plugin), and it's Free Open Source Software! FOSS rules!
I had a blast with that assignment. I'd forgotten how much *fun* it is to sling code. I've missed it, and I want it back. Fortunately, just about all of the QA/test job openings I've seen want automation/programming skills.
I've been writing scripts for my telephony test tools all along, so why hasn't that satisfied my desire to code? Well, it does when I'm figuring out how to do something new, but I haven't gotten much of that lately. Most of the script work I do now consists of making minor tweaks to an existing script. The results can be satisfying, especially when I succeed in reproducing a problem found at a customer site that has my developer baffled, but they rarely scratch the coding itch.
• Monday, January 10, 2011 - Getting my developer chops back, part 1
Devoted though I am to the joy of watching a SUT I'm targeting go smash ... building things is fun too, and building Hammer and SIPp scripts is just not very interesting any more, as it's been a long time since I needed to do anything challenging with either of them. (That could change any time for SIPp, but so far it hasn't.)
So I've decided that I ought to work on my long-neglected programming skills. I took a discarded laptop from the junk pile at work (offered to pay for it, which offer was declined), put Ubuntu Lucid Lynx on it, put some more memory in it, bought a couple of Python books, and started getting acquainted with Python.
Discoveries so far:
Four and a half years of working in a Windows shop have degraded my Unix command-line skills, though I find that my fingers on the keys remember more than my conscious mind does. It's coming back ...
I'm still getting used to its idioms, but already I like Python a lot. Not surprising, since it can be used for either functional or object-oriented programming, and though I've never done object-oriented before: (1) Modula-2 is sort of halfway there (it has information hiding but not inheritance), and I was on the losing side of the C versus Modula-2 holy war, and (2) in the early 90s I did a 1-day tutorial at a testing conference, meant to explain object-oriented programming concepts to testers, and by the midmorning break I was going "oh, it's frame theory, that's pretty cool". I was taught about frame theory when I was studying Artificial Intelligence as an MIT undergraduate, and at the time I thought it was interesting but had no idea it was going to turn out to be practical. (But I don't think that long-ago problem set counts as having done object-oriented programming.)
Programming tools have improved a lot since I was a developer. Python offers a lot of modules to provide functions I would have had to write myself back then, and editing code with an IDE is much easier than using vi for that purpose.
Fun!
• Thursday, December 23, 2010 - Another point about load testing
If normal operation for your application is handling a lot of transactions at the same time (few dozen? many thousands?), at least some functional testing ought to get done while that is happening. Not all of it, and maybe even not most of it, it depends on how likely you think it might be that there'd be problems with stuff like resource contention. But at some point you probably need to evaluate the user experience in as realistic a setting as possible, and that includes the system workload.
For me at present, this is mostly covered by means of the "conference party". The product is a conference bridge with both audio and web support, and when we think the functionality in a release is pretty much working, we schedule some sessions where everyone available gets into a conference on a test system, and we do the sorts of things people do in a conference - talk, look at PowerPoint presentations, do polls, mute lines, all that stuff. We're a very small company, so even if most employees are in on the conference, it is less than a span's worth of ports. I connect some Hammer spans and have a call load running during the conference party, and everyone is alert for voice quality and response time issues, as well as for anything that doesn't work as expected. It helps us know that everything really is all working together.
• Friday, December 17, 2010 - Load Testing: start as early as you can
I get the impression that a lot of companies think that load testing is something that gets done fairly late in the development cycle. I'm a longtime believer in load testing early and often. If your product's normal operating mode is carrying dozens or thousands or millions of transactions at a time, you really ought to make it do that as early as possible. Some or many things might not work, but at least a few basic transactions ought to be working end-to-end very early on. (And if not, why not?) Over the years, I've gotten a lot of useful results from some very simplistic load scenarios.
I particularly remember one incident in my past when I connected a few spans of bulk call generator to a voicemail server very early in the release cycle and fired up a simple incoming-call load scenario at, um, I think somewhere between 8,000 and 12,000 calls per hour. It was more than a maintenance window load but still pretty light, well below the 60,000 calls per hour the system was advertised to support (a quite respectable call volume for its day). I watched the logs for a little while, and then let out a virtual scream to Development for help, because that nearly-trivial scenario was running just fine EXCEPT that the system was in load limiting (dropping calls due to lack of resources). An experienced senior developer took a look and discovered that the application was making vastly more database accesses than it should have - several recently-hired developers had been working on the application, and hadn't realized what data needed by a call would already be in cache by the time the call reached their code, so they were making database accesses they didn't need. The code got cleaned up, the hole in those developers' knowledge got repaired, and we continued with the release with *that* complication out of the way.
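For a sense of scale, the arithmetic behind "pretty light" can be sketched as below. The call rates come from the story (and the offered rate was itself only a recollection); the function and constant names are just for illustration.

```python
RATED_CALLS_PER_HOUR = 60_000             # advertised capacity, from the text
OFFERED_CALLS_PER_HOUR = (8_000, 12_000)  # remembered range, from the text


def calls_per_second(calls_per_hour):
    """Convert an hourly call rate into calls per second."""
    return calls_per_hour / 3600


def fraction_of_rated(calls_per_hour, rated=RATED_CALLS_PER_HOUR):
    """Offered load as a fraction of the advertised capacity."""
    return calls_per_hour / rated


if __name__ == "__main__":
    for rate in OFFERED_CALLS_PER_HOUR:
        print(
            f"{rate} calls/hour = {calls_per_second(rate):.1f} calls/s, "
            f"{fraction_of_rated(rate):.0%} of rated capacity"
        )
```

That works out to roughly 2 to 3 calls per second, or about 13% to 20% of the advertised capacity, which is why load limiting at that level was worth screaming about.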
• Tuesday, November 16, 2010 - interesting talk last week
Last week I went to the SQGNE talk, which was Capers Jones presenting his stats on software quality. He's been collecting data since 1984, and he's got a lot of it. The slides from the presentation are here: http://www.sqgne.org/presentations/2010-11/Jones-Nov-2010.pdf
There's a lot of interesting stuff in there, but 2 things in particular stuck in my mind.
First, while discussing static analysis as a defect removal tool, he said it was very effective, and generally finds different errors than testers do (which I knew), "and it's free". Huh? When I was last paying attention, which admittedly was quite a few years ago, the tools were expensive. Well, sometimes it's free - apparently it is used extensively in the open source communities, so there are open source static analysis tools for the popular open source development languages. Yay for open source!
Second, he noted "bad test cases" as a "common and troublesome" originator of software defects. Not on what I think of as the classic list (requirements, design, coding errors, bad fixes), but oh yes, I can see that. Test cases written by someone who doesn't understand what they're testing, that don't actually do anything to the feature they're supposed to exercise, that were originally good but didn't change when the target functionality did, that actually do expose a problem but the tester doesn't know how to look for it ... As a tester I'm supposed to be part of the solution, and it is good to be reminded that I can also contribute to the problem.
• Tuesday, November 9, 2010 - Guessing is no substitute for real information
I have been puzzled by the observation that my load scenarios indicate that my product has some problems in a particular area, but Support says they almost never get any complaints related to that - and the few complaints that do appear are almost always traceable to something fixable in the customer environment. Now, my load scenarios drive my lab systems much, much harder than the load any real customer puts on their systems, even at busy hour. I was finally able to run some experiments with the load backed off to a realistic busy hour, and hey! those problems disappeared. So though that area is Not Right, it probably hasn't/won't clobber us in the field. That is actually what I thought was going on, but I was quite relieved to have some data to back it up.
I don't like it that I see those problems, but it is pretty clear that going after them is not the right thing to do. Fixing that stuff would be difficult and expensive, it is apparently not actually affecting anyone but me, and our product is explicitly not carrier-grade. (You need carrier-grade? We'll be happy to refer you to a reseller of our parent company's equipment ...)
There is, of course, that "apparently". For the most part our customers seem pretty pleased with our product - they pay for annual maintenance, they buy more ports, they buy more systems, etc. But like every vendor, occasionally we lose a customer, and very occasionally we don't find out why. So there's that occasional nagging thought that these problems *might* actually be hitting someone in the field.
Problems reported by customers are indications of what parts of the product need attention. When those complaints come in, we can fix the problems as reported, and maybe do more or different testing in that area to shake out other problems before customers find them. But for whatever reason, some customers don't complain when stuff doesn't work to their satisfaction. Please complain! We need that guidance!
• Monday, November 1, 2010 - That took too long
A customer reported a problem with their system, which (1) uses SIP and (2) connects to a PBX that does something legal but a bit obscure. So I needed a SIPp script to reproduce it ... I hadn't done an outdial catcher with SIPp before, but that shouldn't take too long, right?
Wrong. Our system doesn't listen for SIP on the same port it sends SIP from, which is perfectly legal, but means that SIPp has to be told what port to send on. No problem, there's an action for that. Only it doesn't work in either of the 2 Windows ports of SIPp that are available.
OK fine, am I a geek or not? IT found me an old but working laptop in the junk pile. I put Ubuntu server on it (server so it would fit on the small disk, and besides, I'm quite happy to use the Unix/Linux command line) and got SIPp installed. This let me get my script working and repro the problem with both our logging turned up and a Wireshark trace going. And once we could see the problem, my developer gave me a fix in 5 minutes.
It took too long, but the results were satisfying.
• Saturday, August 28, 2010 - I love test tools that don't hold my hand
Sometimes I just want to run a simple call quickly, and then Hammer's Testbuilder interface is great - just click on the actions needed, and it takes care of all the details, correctly, for you. Other times, I need to control the signalling details - the exact millisecond when a bit gets asserted on a CAS line, particular values in some of the more obscure ISDN SETUP IEs, forcing a release collision, that sort of thing. The low-level scripting features of Hammer's HVB rarely let me down, but every once in a while, I can get pretty close to what I want, but not exactly. The Hammer is a splendid voice-app test tool, it is not a protocol tester. But it's what I have, and generally I can do a pretty good job of making it fake being a protocol tester when I have to.
SIP and related protocols are a lot more complicated, and the SIP world being what it is, every so often one has to deal with interop problems. We've been trying to figure out and fix a problem our SIP implementation has in an environment (equipment plus configuration) we've never encountered before. So, as my developer fed me information about exactly what mattered in the relevant messages, I was able to use SIPp to turn my desktop system into a simulation of that new environment. The Record-Route headers need to look like *that*? We don't have any equipment in-house that supports that, but it doesn't matter, with SIPp I can fake it, SIPp doesn't care that those IP addresses don't exist. So my developer can test his fixes in house instead of having to do experiments in the customer's network, and we are all much happier because of it. I can build any message I like with SIPp because the scripts are XML: if the syntax is legal, it doesn't care about the content. Making a message your target system can do something useful with is your lookout.
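As a rough illustration of how little SIPp cares about message content, the sketch below uses Python to generate a small scenario file whose INVITE carries whatever Record-Route headers it is handed. This is not the script from the story: the header values, the addresses (taken from the 192.0.2.0/24 documentation range), and the function name are assumptions, and the scenario stops at the 200 OK rather than completing the call. The bracketed words such as [remote_ip] are SIPp keywords that SIPp fills in at run time.

```python
import xml.etree.ElementTree as ET


def build_invite_scenario(record_routes):
    """Return SIPp scenario XML for an INVITE with the given Record-Route URIs.

    The Record-Route values are passed through untouched; nothing checks
    that the addresses exist or that the target will accept the message.
    """
    rr_lines = "\n".join(f"      Record-Route: <{uri}>" for uri in record_routes)
    return f"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<scenario name="INVITE with hand-built Record-Route">
  <send>
    <![CDATA[
      INVITE sip:[service]@[remote_ip]:[remote_port] SIP/2.0
      Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
      From: <sip:sipp@[local_ip]:[local_port]>;tag=[call_number]
      To: <sip:[service]@[remote_ip]:[remote_port]>
      Call-ID: [call_id]
      CSeq: 1 INVITE
      Contact: <sip:sipp@[local_ip]:[local_port]>
{rr_lines}
      Max-Forwards: 70
      Content-Length: 0

    ]]>
  </send>
  <recv response="100" optional="true" />
  <recv response="180" optional="true" />
  <recv response="200" />
</scenario>
"""


if __name__ == "__main__":
    xml_text = build_invite_scenario(["sip:192.0.2.10;lr", "sip:192.0.2.20;lr"])
    # Raises ParseError if the generated text is not well-formed XML.
    ET.fromstring(xml_text.encode("iso-8859-1"))
    print(xml_text)
```

Saved to a file, output like this could be handed to SIPp with its `-sf` scenario-file option. Whether the system under test does anything useful with the message is, as noted above, a separate question.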
SIPp is my friend.
It doesn't hurt that it's free.
• Tuesday, July 13, 2010 - It does WHAT? My SUT can still surprise me
I configured my SUT for SIP, and made sure that it still registers correctly with my lab SIP proxy - it's one of those things that I don't test all that often because that area of code just doesn't get messed with, and having it on or off really doesn't affect my test scenarios, but I do test it occasionally because some customers do use it. And then I also turned on a mode in the product that we support, but no customers use, because after all, some day some customer might want it, and after we went to all that trouble to implement it, it is worth the occasional brief test just to make sure nothing has died horribly. And I discovered ... that combination causes some additional resources to be registered. I was not expecting the SUT to do that, but OK, that actually is proper behavior, only the registrations are not expiring correctly, and not all of those additional resources that should be, are actually being registered. So I went looking into the source code, and discovered that the implementation (which happened before I came to work here) had not been finished - there were nice prominent TO DO labels on comments near the relevant code, discussing the missing functionality. No one has ever cared about this, for several years now ... well, some day someone might care, so I put in a bug for it, explaining what I'd found. I'm pretty sure I'll get some warning if we ever do get a customer wanting to use this mode, and a good thing too, because it really ought to have some extra testing before it gets used for real.
• Tuesday, July 6, 2010 - Here we go again
Regression testing is more interesting when I know I'm going to find problems.
We're working on a new release, and there are several sources of destabilization: a new version of the operating system potentially affects timing-sensitive stuff, a new platform with a faster processor definitely puts timing-sensitive stuff at risk, and we've integrated a big chunk of new functionality that was off on its own branch for a long time, with the expected merge difficulties. The basic happy-path functionalities are mostly working just fine, and the exceptions have pretty straightforward problems with straightforward fixes.
Around the edges, well, there are some problems. In particular, I've had a couple of hits on a very-intermittent D channel outage that I thought was fixed, many months ago. This is worrying. Is this the same problem, and it wasn't really all the way fixed before? Or something else that just happens to produce the same symptom? So far I've been testing mostly on the new platform - is the problem on the new platform only, or will it also hit the older hardware? Also, I've uncovered one of those annoying "the channels don't all clean up properly" problems when I tear down a conference with a lot of connections and a particular combination of features in use, which my developer says is going to be a major pain to fix. I'm sure I haven't found all the weirdnesses yet, and so far I've barely touched that new mode of behavior that got added, because we need to make sure the standard product will be shippable first.
• Friday, January 8, 2010 - Not *that* Hammer :)
At work, my default binding for the word "hammer" is "telephony test tool made by Empirix", and if I say something like "let me hit that with a hammer", then "it" is a telephony product of some sort and I mean that I'm going to run some test calls into it.
But yesterday, I was changing some hardware in a rack-mounted unit, and the slider got stuck. So I picked up a hammer (the hand tool) and whacked the slider until it moved again.
I thought it was funny. My husband would say that means I ought to get out more.
• Wednesday, December 23, 2009 - it does WHAT?
My company is currently working on making our product compliant with some US government specifications. This is an interesting exercise ... there are pages and pages and pages of specifications, and some of them are explicit and unambiguous. But lots of others don't apply to our product, and others do, but seem to assume a different architecture.
So I was very happy when my developers came back from the preliminary network integration testing with an actual acceptance test document, with test cases and expected results. Yay! Of course, the terminology is a little confusing, so I started rewriting it to make a parallel document that follows our own conventions, that will make it easier for all of us here to follow.
And I was rewriting one particular test case, and came crashing to a halt, just staring at the expected results ... it's supposed to do WHAT? I'd been mildly wondering why a certain possible call scenario hadn't been mentioned before, and there it was - but "refuse the call" was not a result I was expecting. That's not what our product currently does with that situation. Clearly, the target environment has some very specific assumptions about how this feature is going to be used, and in this case at least, it's not what we were expecting. We don't understand that target environment. I wonder what other assumptions we're missing?
• Monday, December 7, 2009 - a good new load scenario
A while ago, we had a new customer move into the "most intensive known user of our product" position. To help make sure they'd be happy with their purchase, I made a new load scenario. This one models the new customer's usage patterns, with test compression so that (1) I push the SUT up near its resource limits, so I drive the system far harder than the customer ever will, and (2) 2 hours of my load scenario is approximately the same amount of traffic as a normal working day for this customer. So, a weekend run of this scenario gives me the equivalent of about 6 weeks' worth of traffic on this customer's installation.
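The compression arithmetic is simple enough to sketch. The 2-hours-per-working-day ratio is from the description above; the 5-day working week and the 60-hour weekend run (roughly Friday evening to Monday morning) are assumptions made for the example.

```python
HOURS_OF_LOAD_PER_CUSTOMER_DAY = 2  # from the text: 2 hours of load ~ 1 working day
WORKING_DAYS_PER_WEEK = 5           # assumption


def equivalent_customer_weeks(run_hours):
    """Weeks of customer traffic represented by a load run of the given length."""
    customer_days = run_hours / HOURS_OF_LOAD_PER_CUSTOMER_DAY
    return customer_days / WORKING_DAYS_PER_WEEK


if __name__ == "__main__":
    weekend_hours = 60  # assumption: roughly Friday evening to Monday morning
    print(equivalent_customer_weeks(weekend_hours))  # prints 6.0
```

Under those assumptions a weekend run covers 30 customer working days, which is where the "about 6 weeks" figure comes from.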
I won't rerun this scenario too often - this customer's usage patterns are a bit unusual - but I'm glad that I have it canned, and that it appears to be a reasonable model of this customer's usage. I hope to build more scenarios that I can reasonably name for particular customers (or classes of customers). I've got no objection to generic tests, but there's a certain warm feeling I get from knowing that a particular scenario covers behavior that I *know* specific users value.
• Wednesday, October 14, 2009 - Intermittent bug NAILED!
Hah! Mysterious intermittent D channel failure was finally traced to a driver problem and FIXED!
I couldn't make it happen very often, even when driving the most likely scenario pretty much continuously for several weeks of a fortuitous lull in other testing needs for the SUT (of course, it needed the *big* SUT) and the Hammer. And it took a while to convince my developer that no, it wasn't a problem with the test equipment - quite understandably, the developer has more confidence in what's under his control, and less confidence in my test equipment, than I do. But I finally collected enough clues to let him know where to look, and he found a problem that would plausibly account for both the behavior and its low probability of happening.
This gives me great satisfaction.
• Tuesday, September 22, 2009 - not my cup of tea
It’s a bit slow in telephony-testing-land, I’ve got long-duration load tests in progress but don’t have anything going on that needs interaction from me. And there’s some GUI stuff that’s overloading the people who normally do it, and more pressure than usual to make the currently promised ship date. So I’m helping with some GUI testing. Fortunately I have both a test plan to follow (so I know what sorts of things to try) and a test oracle to consult (I’m checking something that got re-implemented, so I can compare behavior to the original).
Pro:
I have stuff to do, which is good, because “bored Meredith” is a bad thing.
I’m getting exposed to parts of our product that normally I have very little contact with. I’ve already had a few “that’s kinda cool, I didn’t know it could do that” moments.
Con:
My usefulness is limited. I can and have found discrepancies, such as features that work fine with one browser, but show no data if tried with a different one. But I don’t know what sorts of things developers of this stuff tend to get wrong, so I don’t know where to go exploring for problems. And sometimes I retest something, and I think I’m running the same test, but I get different results – I’m sure I’m doing something different, but I don’t know what it is, or where to look for clues, so my bug descriptions aren’t as good as they might be.
Well, the telephony work should pick up again soon.
• Wednesday, September 9, 2009 - it might be time to change the way we do things here ...
The product I've been working on for the last 3 years has been a fairly small thing until very recently, and we've gotten pretty good at testing it. But all of a sudden we're adding Stuff that increases the testing load - the GUI can now be used with multiple browsers, not just IE on Windows. We've added some functionality and ease-of-use features that (1) have to be tested (2) have to behave consistently between the GUI and more-limited TUI interfaces (3) multiply the ways various other features can be invoked, which all need to produce the same result, and (4) make it very attractive to use a telephony feature that used to get very little use, so used to get fairly light testing.
The Sales people are delighted.
I foresee much angst over test duration estimates, the next time we do a major release and have to test the *entire* product.
I'm currently updating some test documents, with no urgency on the task, so I'm taking some time to think about whether some test cases are actually providing useful information. Anything that's there now certainly seemed like a good idea at the time it got written, but sometimes, later and more detailed understanding of what some area of the product is really doing makes me realize that a certain test isn't actually useful. If it isn't useful, it needs to go away.
I hope I can get my coworkers to do likewise as they update test docs, but I'm thinking that probably won't bring down the complete test execution time as much as we'll need to. I think we'll need to add some exploratory testing of the GUI (guided by knowledge of what sorts of things work in one browser and break in another) and some explicitly tracked combination testing, so that we aren't going numb and missing problems because we've run the same test so often.
I don't immediately see any way for additional automation to help ... must think about that some more, though.
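"Explicitly tracked combination testing" could be as simple as enumerating the combinations and recording which ones have been run. The sketch below is one hedged way to do that in Python. The dimension names and values are placeholders (only "IE" comes from the posts above), and a real tracker would presumably live in the test documents rather than in a script.

```python
from itertools import product

# Hypothetical test dimensions; the values are placeholders, not the real product's.
DIMENSIONS = {
    "browser": ["IE", "browser B", "browser C"],
    "feature": ["feature X", "feature Y"],
}


def all_combinations(dimensions):
    """Every combination of one value per dimension, as tuples in dimension order."""
    return list(product(*dimensions.values()))


def remaining(dimensions, already_run):
    """Combinations that have not been recorded as run yet."""
    return [combo for combo in all_combinations(dimensions) if combo not in already_run]


if __name__ == "__main__":
    already_run = {("IE", "feature X"), ("IE", "feature Y")}
    for combo in remaining(DIMENSIONS, already_run):
        print("not yet run:", combo)
```

Even this toy case shows how quickly the count grows: adding one more dimension with three values would triple the list, which is the estimating problem described in the post.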