Systems & Software Talk 

Case Study of Early (actually, really late) Application Performance Issue

04:41, 2007-Feb-8  ..  Posted in Performance Testing

CONTEXT and SENSITIZATION

Let us figure out where we are in the development life cycle and get sensitized:

  • The application is purported to be fully developed and is being tested by those who operate with black boxes.
  • Those who typically operate with glass or white boxes were way too busy trying to survive the transformation of the original 1.5-year project estimate into a marketing-driven ship date a full year earlier.
  • The application has been through several test evolutions and is being tested on a staging or pre-production system.
  • You haven’t yet thought about retaining an attorney.
  • The National Guard has not yet been called out.
  • A certain "famous" tart who uses her toddler as an automobile’s steering-wheel mounted safety airbag, still has not learned to sing.
  • The project team has no clue about how loud and often the customer will scream when this application transforms their otherwise screaming computers into molasses at minus 40° F, or minus 40° C.
  • The project manager apologetically approached you and asked you if you would conduct performance testing on this application as it was due to go live in one week.
  • Much like the coaches budgeting and using timeouts in an NFL game, the executives have by now used up all their typical statements:
    1. This is not rocket science.
    2. Any monkey can do this.
    3. Why didn’t we think of this sooner? (It is a good sign that they are yet using "we" instead of "you".)
  • Project team members still talk cordially to each other and look at each other as team members even though a bit of skepticism is blurring their vision and their tongues have a mild coating of cynicism.
  • Management has not yet sharpened or tuned one of these [image: dart] and visually overlaid team members with this:

[Image: dartboard]

Courtesy of: www.dartboards.com Winmau Blade III™ – a trademark of Winmau Dartboard Company, LTD.

We now know who you are and we have an approximation of where we are (below) in the development life cycle. An old Carole King song is repeating in your head as you listen to the pleas of the project manager.

You just happen to be in the ATF Zone (All-Too-Familiar)! While this is not about the real ATF - Alcohol, Tobacco, and Firearms, this situation and your location in the development life cycle can be just as dramatic and intriguing, and – any one or more of those items might come in handy!

Would it be appropriate at this time to call your attorney?

If you are not yet sobbing uncontrollably, grab some Kleenex and read on. (Kleenex is a trademark of the Kimberly-Clark Corporation)

 I (I is you!) Agreed To Help The Project Manager (Oh Dear)

Of course you did! That is your job. You are a performer! What you work on may not be a performer, but you are!

What You Agreed To

While you are not an ordained minister, you agreed to bless the application – within the required timelines. (What were you thinking???) You also mumbled that blessings might not be appropriate. The PM’s eyeball rolling was an acknowledgement to your mumbling.

Ready, Set, Go!

  1. Risk notebook ready for input.
  2. Caveat recipes access – enabled.
  3. Reflecting sunglasses ready for encounters with project non-friendlies where eye contact with the host’s searing laser-emitting pupils may blind you.
  4. Fact-finding map at the ready.
  5. Popular IT-clichés/questions counters (below) reset:
    • Kick-the-tires counter = 0
    • Circle-the-wagons counter = 0
    • "Well, what are the industry-standard page load specs?" counter = 0

Your Research Revealed

  • That the only capacity planning information available on the project was on napkins – napkins marked "Dimpy’s Bar & Grille". Perhaps the napkin user was testing his or her capacity for alcohol. There were no dates or version markings. The ink had coalesced with the cheeseburger grease wiped from the consumer’s mouth. You could still make out the partial words "pe..orm" and "b.tt.r."
  • That there were no performance specifications. You dig deep into bio-RAM and find an appropriate melodyric, and alter it to fit the occasion. "I don’t care what they say, I won’t stay in a world without specs." Thank you Peter & Gordon.

Melodyric (měl'ə-dĭr'ĭk) Combination word that means melody and lyric. Remember! IT people are notorious for inventing new words. Should you or I be any different?

  • That there were no service-level agreements with the customer base.
  • That swimming upstream into the development life cycle to get any information about performance proved futile. Architects, BAs, SAs, DBAs, and Developers – all looked at you as if you had asked them to join your multi-level marketing company down-line.
  • There were no performance measurements made anywhere upstream, therefore there was no factual evidence to indicate how this baby may or may not perform. No one seemed to be the least bit concerned except the PM.
  • That the people within the company trained in CPR were truly trained as they had to resuscitate you from near drowning in Lake Sorrow at least four times.

What Have You Accomplished Thus Far – In Parallel With The Above Research?

You have:

  1. Provided a performance test specification capture document to the requestor.
  2. Requested and received an approximate concurrently executing user count that clearly indicates the need for a simulation tool such as QALoad or LoadRunner.
  3. Discovered that all simulated users will need to be authenticated against an LDAP server.
  4. Arranged for a Proof-Of-Concept recording session with the appropriate team member to determine if your performance test tool is a fit.
  5. Acquainted yourself with the application a bit after you acquired power-user credentials. This experience made you wonder if this app was running on a 750 kHz 8-bit microprocessor with 4 KB RAM. ← Big clue!
  6. Studied system and network architectural diagrams.
  7. Made the appropriate requests to have your monitoring account set up on servers so your tool could pull server health counters.
  8. Discovered that the database will be sized as production and populated with production data and that you need not be concerned with SOX or any other data confidentiality issues. (Boy are we making this easy!)
  9. Updated and circulated your résumé.

Your Performance Testing Toolkit Consists of:

  1. LoadRunner/PerformanceCenter
  2. More than one controller
  3. Many thousands of Virtual User Licenses
  4. A load farm of tens of servers
  5. OpenSTA
  6. Opnet’s IT-Guru/ACE
  7. VBScript utilities:
    • Reusable code for making ODBC connections and queries
    • LoadRunner Vuser log file parsers
  8. Access to HP-OpenView

What Will You Do?

Reminder:

  • You need to make a statement about the performance of this application and architecture. You have less than a week to accomplish something that ordinarily takes about three weeks at a minimum.

Oh Wait! Some Other Interesting Developments Or Wrenches As It Were

  1. Your Proof-of-Concept recording of the application with LoadRunner’s VuGen exposed a proprietary protocol bundled in octet streams. You will not be able to correlate or parameterize data. The third-party developer of this application protocol is rightfully unwilling to expose the decode methods.
  2. You even tried Winsock. You were unsuccessful.
  3. OpenSTA will not work either, as a result of the above developments.

What would you do?

Your thoughts will be compiled into a checklist along with the methods/techniques actually used in this case. In March 2007, I will blog the techniques used to discover a major bottleneck.



Is Your Load-Generating Capability A Bottleneck?

03:16, 2006-Apr-20  ..  Posted in Performance Testing

How often do performance tests occur where the tool itself, the load-generating equipment, or the network upon which the equipment resides is in fact the bottleneck? I think if anyone knew the answer, we might be surprised by it. I would suggest that once is too much. I would also suggest that designing a load farm requires sizing and capacity planning. One never hopes to deliver false performance results to a customer. False results can create a huge expense, sending people scrambling to chase production performance issues that went undetected in performance testing because the farm could not impart proper load. That alone underscores the importance of load farm design.

How does one go about designing a load farm? I offer up general ideas only, given the amount of text required to deliver a comprehensive and robust answer. Many ideas can be gathered via a web search for sizing, capacity planning, and/or any other qualifying terms needed to narrow the result set. Additionally, one can get some good ideas right here at QA forums. Here is a link to a rather fruitful discussion about this topic:

http://www.qaforums.com/ultimatebb.php?ubb=get_topic;f=6;t=000920

The tool vendors can also be helpful. If a tool purchase from a vendor is imminent, they can be really really really really helpful!
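To give a feel for the sizing arithmetic, here is a minimal Python sketch. Every number in it (memory and CPU cost per virtual user, the headroom percentage, the host RAM) is a hypothetical placeholder, not a vendor figure, and the function name is made up for this illustration. Measure your own scripts before trusting any of it.

```python
import math

def generators_needed(vusers, mb_per_vuser, cpu_pct_per_vuser,
                      ram_mb_per_host, headroom_pct=70):
    """Estimate how many load-generator hosts are needed so that no host
    uses more than headroom_pct of its RAM or CPU.

    All inputs are assumptions to be replaced with measured values.
    """
    usable_ram_mb = ram_mb_per_host * headroom_pct / 100
    by_ram = math.floor(usable_ram_mb / mb_per_vuser)       # RAM-limited vusers per host
    by_cpu = math.floor(headroom_pct / cpu_pct_per_vuser)   # CPU-limited vusers per host
    per_host = min(by_ram, by_cpu)                          # the tighter limit wins
    if per_host < 1:
        raise ValueError("a single virtual user does not fit on one host")
    return math.ceil(vusers / per_host), per_host

# Hypothetical example: 5,000 virtual users, 4 MB and 0.25% CPU each,
# on hosts with 2,048 MB of RAM.
hosts, per_host = generators_needed(
    vusers=5000, mb_per_vuser=4.0, cpu_pct_per_vuser=0.25,
    ram_mb_per_host=2048)
print(f"{per_host} virtual users per host, {hosts} hosts")
```

The sketch deliberately ignores the network and the controller, either of which can be the real bottleneck, so a calibration run that watches the generators themselves is still needed.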

Now comes the hard part! Telling your boss you need ten dual-core ZigaGigaTriga Hertz-clock speedy Intel ** SeePeeYous with 20 PetaBytes of RAM and quad-GB NICs - on a SuperDuper DS3 network with liquid nitro-cooled wrap, will probably be career-limiting and earn you a pink slip. If you are a person of the right gender or a cross-dresser, you might be okay with a pink slip. If not:

  1. Take a sales course at your local community college before you go to the boss.
  2. Have your résumé current and in circulation.

** Why did they not ever spell "intel" out completely? 



Performance Test Estimation

07:30, 2006-Apr-11  ..  Posted in Performance Testing

This entry is adapted from a QA Forums discussion at http://www.qaforums.com/ultimatebb.php?ubb=get_topic;f=2;t=002092

Estimation for performance testing is, at best, imprecise and challenging.

Assumptions made for the estimating ideas below:

1) Performance Testing is being added late in the development cycle. Performance Testing in this context consists of developing simulated user scripts using simulation tools, where the scripts are intended to apply a defined load against a complete system architecture.
2) The testing is not a repeat using existing test assets.
3) Other roles will be required to either complete or assist with tasks derived from the outline below.
4) The system-under-test exists - or will by the time one is ready to test, and/or an environment is available for simulated user script development.

It is best to have a project planning tool like MS-Project to create an adjustable/adaptable template. This template should address some basics:

1) Project Management / Administration of the performance test project.
  1.1) Status reporting, issues reporting
  1.2) Scheduling, availability issue management, etc.
2) Performance Test Requirements/Specifications development and reviews.
3) Architectural Analyses of the system-under-test.
4) Security/Firewall/Proxy hurdles handling. Certificates, 3rd party components, etc.
5) Monitoring requirements capture
6) Performance Test Plan development - if applicable. If your specification process is robust enough, a plan can perhaps be omitted.
7) Test data needs identification and creation, or readiness of existing data. (THIS IS USUALLY THE MOST TIME-CONSUMING TASK)
  7.1) Data management, security, backup, restore, etc.
  NOTES! Some projects may find it less costly to use the performance test tool to create data. Some projects require data scrubbing.
8) Script Development * n scripts
9) Scenario Development * n scenarios
10) Test execution * n executions
11) Results Analyses and Reporting * n test executions
12) Post-project administration

(again - the above is high-level and does not cover all)

Notes:

Protocol complexity, script complexity, skill-sets, unforeseen tools issues**, and so on - can all impact the schedule duration. I haven't covered all, but this should get you going.

** I know of no tool without defects. Some of them are crippling.

Guidelines:

If one estimates a project at less than 120 project hours, and that project requires at least four to eight scripts of moderate complexity - one should look over those numbers again!
Are there ways to compromise? Yes. The compromises go in the RISKS section.
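
The outline above can be turned into a small estimating template. The Python sketch below is only an illustration: the task names follow the outline, but every hour figure is a hypothetical placeholder to be replaced with your own historical data, and the function names are invented here. The only number taken from the text is the 120-hour sanity check, read as "four or more scripts".

```python
# Hypothetical fixed-effort tasks, in hours (placeholders, not recommendations).
FIXED_TASK_HOURS = {
    "project management / administration": 16,
    "requirements / specifications": 16,
    "architectural analyses": 8,
    "security / firewall / proxy hurdles": 8,
    "monitoring requirements capture": 4,
    "test data identification and creation": 40,
    "post-project administration": 4,
}

# Hypothetical per-unit effort, in hours, for the "* n" items in the outline.
HOURS_PER_SCRIPT = 8
HOURS_PER_SCENARIO = 4
HOURS_PER_EXECUTION = 4
HOURS_PER_ANALYSIS = 6   # results analysis and reporting, once per execution

def estimate_hours(scripts, scenarios, executions):
    """Sum the fixed tasks plus the items that scale with n."""
    fixed = sum(FIXED_TASK_HOURS.values())
    variable = (scripts * HOURS_PER_SCRIPT
                + scenarios * HOURS_PER_SCENARIO
                + executions * (HOURS_PER_EXECUTION + HOURS_PER_ANALYSIS))
    return fixed + variable

def looks_too_low(total_hours, scripts):
    """Guideline from the text: under 120 hours with four or more
    moderately complex scripts deserves a second look."""
    return total_hours < 120 and scripts >= 4

total = estimate_hours(scripts=6, scenarios=3, executions=5)
print(total, "hours; look again?", looks_too_low(total, scripts=6))
```

Keeping the per-unit figures separate from the fixed tasks makes it easy to show a requestor what a compromise actually buys, for example dropping two scripts versus skipping a test execution.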



No Customer Will Ever Do That!

09:20, 2006-Mar-25  ..  Posted in Performance Testing

Probable MPAA Rating of PG-13 because a street synonym for excrement is used.

I have to believe that this blog-title provides common ground for many of us in this business. I think it binds us all together in uncontrollable ROTFL.

Who are the people uttering such a statement? Are these people smarter than we are? Is Lex Luthor secretly populating the planet with a breed of marketing people who have been taught to contribute one and only one thing to a design review meeting?

People at QA Forums inquire often about Tester-to-Developer ratio. I often wonder about the ratio of Title-Utterance  TO  Customer-Actually-Did-Do-That.  My own experience suggests it is 1 : >10. What does your own data tell you?

ABOUT THIS ENTRY

This explores a missed opportunity at design time to prevent a production performance issue. It also illustrates the potential cost savings against the background of actual costs in terms of immediate losses.

CREDITS

Much credit has to go to those people who fought long and hard for cultural change, some of which resulted in getting me into a design review meeting to begin with, where simian representatives were not normally included. "Help me Dr. Zira!"

SETTING

Come join me in a past design review of a motherboard for a factory-hardened Programmable Logic Controller (PLC). The seven characters were indeed characters. Their eyes extruded resentment at me, "What is a lowly tester doing here in the midst of some of the greatest brains this company has to offer?" "Shouldn’t you be off eating bananas?" "If there isn’t duct tape across your labrum, there should be."

CONTEXT MOTHER

… board that is. This design review was intended to approve a motherboard pilot product. Of major concern was the Zilog Z80 allocated to handle all five communications channels. Two channels were intended to support Ethernet. One channel was intended to drive a proprietary high-speed communications channel. One channel was for RS-232 and the last channel was intended to support RS-488. Someone indicated that the shortage of real estate on the motherboard was the primary reason for selection of the smaller 4-MHz Z80, over the large space-gobbling 25-MHz 68020. It was at that moment that I closed my eyes to see lots of little flashes of blue light resulting from increased neural transmitter activity. Or was that glow the glow of a waning Vodka presence from the night before? Anyway, I opened my eyes and the duct tape loosened its grip on my lips. I stammered a bit and said, "I don’t think a Z80 will handle all these channels at full-demand." I walked to the white-board with some of my scratchy hand calculations, thinking, "Hey, I could be a doctor writing prescriptions!" All the eyes in the room tried to laser pain to my legs in order to prevent me from arriving at the white-board. I made it! I transferred my papered prescription for performance disaster to the white-board and walked those skeptical sets of eyes through the calculations. Two techie designer heads bobbed while trying to remain inconspicuous to the person with the power – Mr. Magisterial Marketer!  Referring to driving all comm. channels full-bore and to no one's surprise, Mr. Magisterial Marketer exclaimed, "No customer will ever do that!"

LACUNA TO CALCULATE

  • Average cost burden (not to be confused with pay) to company = $80/hour
  • Duration of meeting = 1.5 hours
  • Attendees = 7
  • Cost to change board = $3,500.00, estimated.
  • Not counted: the part cost of the 68020 vs. the Z80 (the 68020 = 150% of the Z80). This change would have cost roughly $4,400.00. One of course would need to calculate long term costs of the 68020 and reconcile that against projected sales; and blah blah blah. It might be that one would discover that to be a moot point after reading this.
  • Other costs and data:
    • Duct tape = $1.50
    • Number of people wishing I was off eating bananas = 6.
    • Cost of me eating a pound of bananas in those days = $0.17.

MONTHS LATER

A customer from down under was feeling down under apparently. This customer called one of our executives and gave some background information and then said (as the story goes), "Get this shit out of here and get me something that works!" Do you need to know what this customer was talking about? Probably not! Mr. Magisterial Marketer obviously forgot to include release notes that instructed the customer that despite having five comm channels, one could not rely on them to perform when most needed.

I will abandon the story at this point and move on to costs.

PAUSE TO CALCULATE

  • Palliate-the-Customer Entourage Travel Expenses = > $100,000.00
  • Repair/Replacement/workaround costs = ~$20,000.00

COST SUMMARY (Concept courtesy of Master Card)

  • Cost to repair the performance issue at design time = $4,400.00
  • Cost of post-production performance issue (not counting negative word-of-mouth long-term impacts) = > $120,000.00
  • The look on Mr. Magisterial Marketer’s face when I would later encounter him and say to him, "Sixty eight Oh Twenty!" = Priceless!
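
For anyone who wants to rerun the arithmetic, here is a small Python sketch using the figures quoted in this entry. The text does not say how the $4,400.00 was built up; adding the meeting cost to the board change is an assumption made here, and it lands close to that figure. The post-production total uses the stated lower bounds, so the real ratio would be higher.

```python
# Figures quoted in the entry.
BURDEN_PER_HOUR = 80.00     # average cost burden per attendee-hour
MEETING_HOURS = 1.5
ATTENDEES = 7
BOARD_CHANGE = 3500.00      # estimated cost to change the board

# Assumption: design-time cost = board change + cost of the review meeting.
meeting_cost = BURDEN_PER_HOUR * MEETING_HOURS * ATTENDEES
design_time_cost = BOARD_CHANGE + meeting_cost

# Post-production costs (lower bound and approximation from the entry).
TRAVEL = 100_000.00
REPAIR = 20_000.00
post_production_cost = TRAVEL + REPAIR

ratio = post_production_cost / design_time_cost
print(f"meeting: ${meeting_cost:,.2f}")
print(f"design-time fix: ${design_time_cost:,.2f}")
print(f"post-production: ${post_production_cost:,.2f} (about {ratio:.0f}x)")
```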

LESSONS LEARNED

  • Despite having factual evidence, improper decisions can still prevail.
  • The almighty dollar prevails over common sense and factual data.
  • The statement "No Customer Will Ever Do That!" carries clout when spoken by a marketing person.
  • One can do "Performance Testing" early on. In this case, "Performance Testing" was executed on paper.

JAKE YOU TALK ABOUT CALCULATIONS AND DON’T TELL US HOW

What’s up with that?

I could spend the next four pages summarizing. I will keep it well under that with an overview, and within context. Some of this of course is telling you what time it is with your own watch! Bare-the-facts with me!

  • A CPU uses clock cycles to fetch and execute instructions.
  • Not all instructions carry equal weight in terms of clock cycles required.
  • The instructions I identified used from one to four clock cycles per.

In this case, I had done some prep-work in advance. I acquired some open source Z80 assembler code that was used to empty an inbound communications buffer. I walked one byte through the code in order to identify both the count and the instructions used. From this I added up the clock cycles. I made an assumption that the outbound transmit logic would be roughly the same. My numbers were thus doubled. I calculated these out for each communication channel and arrived at a figure in a specific unit of time (required clock-cycles per second) that suggested at least four Z80s at 20-MHz or a screaming 68020 at 25-MHz. The party was not over at this point as I had to determine the communication interrupts impact on the main engines. I no longer remember or care to remember those details. Anyway, I concluded that the main engines were capable of handling the potential interrupt rate.
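
The shape of that paper calculation can be sketched in a few lines of Python. This is not the original worksheet: the channel byte rates and the cycles-per-byte count are invented placeholders, and the names are hypothetical. Only the method follows the description above: count cycles to receive one byte, double it for transmit, multiply by each channel's full-demand byte rate, and compare the total to the CPU clock.

```python
import math
from dataclasses import dataclass

@dataclass
class Channel:
    name: str
    bytes_per_sec: int   # hypothetical full-demand byte rate

# Placeholder rates for the five channels named in the entry.
CHANNELS = [
    Channel("Ethernet 1", 1_250_000),
    Channel("Ethernet 2", 1_250_000),
    Channel("proprietary high-speed", 250_000),
    Channel("RS-232", 1_920),
    Channel("RS-488", 10_000),
]

# Hypothetical result of walking one byte through the receive code.
CYCLES_PER_BYTE_RX = 12

def required_cycles_per_sec(channels, cycles_per_byte_rx):
    """Total clock cycles per second needed at full demand.

    Transmit is assumed to cost about the same as receive, so the
    per-byte figure is doubled, as described in the text.
    """
    per_byte = 2 * cycles_per_byte_rx
    return sum(ch.bytes_per_sec * per_byte for ch in channels)

def cpus_needed(required, clock_hz):
    """How many CPUs of a given clock it takes just to move the bytes."""
    return math.ceil(required / clock_hz)

required = required_cycles_per_sec(CHANNELS, CYCLES_PER_BYTE_RX)
print(f"required: {required:,} cycles/s")
print("4 MHz CPUs needed: ", cpus_needed(required, 4_000_000))
print("20 MHz CPUs needed:", cpus_needed(required, 20_000_000))
```

The sketch leaves out interrupt entry and exit overhead, bus contention, and everything else the CPU has to do, so a real budget would be tighter still.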

RELEVANT DATA

Not all assemblers are equal in terms of how many instructions are actually assembled at "assemble" time.

Bus speeds and on-board caching may have an impact on similar calculations or create latency. To what extent if any, I am no longer in tune with this low-level world – by choice.

WHY WAS THE CUSTOMER MAD?

The PLC could not keep up with the network demand interrupts while processing things like items flying by on a conveyor belt. OUCH!

DID YOU EVER TEST THIS PRIOR TO RELEASE?

Did I at some point actually test this prior to it going out the door? Yes. It behaved as I expected - poorly. I filed the appropriate high-severity defects.

FOR A VISUAL DEMONSTRATION OF INTERRUPTS

http://www.sqablogs.com/JakeBrake/178/Easy+Interrupt-Handling+Demo.html



Performance "Testing" When Can it Start?

06:48, 2006-Mar-13  ..  Posted in Performance Testing

A long long time ago at a DoD Contractor far away…

I was initially assigned to a project as QA Manager. Initially my role was to fulfill the contractual obligations for both QA and Testing. This is when I learned about the differences. The USAF Program Office (PCO) told me I could not be both and then proceeded to educate me about why. That education about the differences is something I hold onto when opining in relevant QAF threads. The PCO asked me to choose the domain I wished to operate in and asked me to staff the vacant role I would leave behind. I conferred with the PM and found out that I would be responsible for staffing a team regardless of which role I chose. I chose Testing since the System Specification (SSS, extra S for Segment) was loaded with all kinds of techie, geeky stuff that I couldn’t wait to tackle. I was strongly urged to handle the System & Systems Integration aspect, which by USAF definition included Performance Testing. "Bitchin!" ("Bitchin" had just superseded "boss" had just superseded "Heavy" had just superseded "Far Out" (recycled))

I initially grew the test team to a total of four out of ultimately nine capable engineers and we set out to do our thing. We had documentation that would make most QAF members drool with envy. The governing document from which all else flowed pertinent to testing was the SSS. One key spec that caught my eye immediately was, "The system shall never exceed 65% CPU utilization during normal operations." Bitchin! (I will let you know when I mean bitching** as opposed to bitchin’) A performance-related spec! I was in binary heaven. And there were more specs of this category!

The design and development staffs consisted of technical leads only. They had to find 12 more designers/developers yet. We the Test Team were already planning! The contract had been awarded only two months prior.

The entire system (hardware and software) was to be built by this defense contractor. The system was intended to be the first of a new generation of Air-Intercept Control facilities. Ultimately, the system would be integrated with existing RADAR and telemetry systems. The system was to have the following as its processing "plant": Two DEC-VAX-11/780s, one with 2 MB RAM, the other with 4 MB RAM and a shared 4 MB RAM solid-state unit. The main mission besides all the display bells and whistles was to track aircraft of course. In addition, the system was to provide on-demand intercept solutions for up to 35 controllers. Here is where a key performance spec came into play. This spec stated that, "The system shall provide real-time collision avoidance processing (CA) for up to 300 aircraft being tracked by the system." My immediate thought was "No Way!" Another key spec that this was to play along with was, "The system shall factor in winds aloft data (WAD) into all intercept solutions." By now, some of you might be thinking this is a lot of work for two 7.8 MHz systems. You are correct. I should note that all display processing was to be offloaded to dual-68020 display controllers interfaced to each control console. That would help.

I was due to present my System Test Plan at a formal Preliminary Design Review (PDR) two months after first seeing those specs. This plan, according to PCO, was to address how we would test this system in-house through all phases. The plan also had to address operational testing and evaluation (OPTEVAL). I felt I was in my element – for a while. The CA spec gnawed at me. I conferred with several key people who advised me to get busy with making a case for PDR. I ran some desk calculations and kept coming up with data that showed there was no way this system could do the CA processing along with three RADAR and two high-speed telemetry interfaces.
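
The desk calculations themselves are not reproduced here. As a hedged illustration of why a spec like this can blow up, the Python sketch below assumes a brute-force approach in which every tracked aircraft is compared with every other one on each update. That is an assumption for illustration only, not a statement about how the actual system was designed, and the update rate is a placeholder.

```python
def pair_checks(aircraft):
    """Distinct aircraft pairs a brute-force collision check must compare."""
    return aircraft * (aircraft - 1) // 2

def checks_per_second(aircraft, updates_per_sec):
    """Pair comparisons per second at a given (hypothetical) update rate."""
    return pair_checks(aircraft) * updates_per_sec

for n in (100, 300):
    print(f"{n} aircraft: {pair_checks(n):,} pairs per update, "
          f"{checks_per_second(n, updates_per_sec=1):,} checks/s at 1 update/s")
```

Under that assumption, tripling the aircraft count from 100 to 300 multiplies the pair count by roughly nine, before any winds-aloft or intercept processing is added.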

PDR was here. In a boardroom full of 30 USAF personnel, their technical advisors, and about 20 of "us", the PDR progressed. My turn! I was a shy person at that age. I began talking while sitting. The PM next to me kicked my ankle and gestured me to stand. I was beet red and lacked any confidence in what I was about to present. As I talked, I waited for someone to stop me and ask me to leave. I was putting my entire career on the line. I was telling the USAF and their high-caliber technical advisors that the CA and WAD processing were too much and the system could handle WAD without CA for up to 100 aircraft, or – WAD and no CA. I presented the evidence and my confidence grew as I saw spec doubts dripping from several little splinter discussions. My blush was gone. One small group of advisors said they would take my analysis for their own and get back to us in a week. I had one minor faux pas a few minutes later when I presented the strategy for checking loss of secondary RADAR from aircraft during OPTEVAL. I said that we would have the aircraft bank steeply, wing down toward the RADAR in order to hide the aircraft transponder antenna. One USAF rep popped up and asked, "Why don’t you just have the pilot turn his transponder off?" I said, "Good idea! I like that better than my own." My face lit up RGB-red (255,0,0) in just under 0.65 seconds on that one. My anti-perspirant failed and the meeting adjourned.

As promised, the USAF delivered the response to my analysis. They approved a change to drop the CA spec. I felt relieved. I felt great! I had my job! Independent analyses agreed with my own! Bitchin'! ("Awesome" was still a decade away) This project however, was part of the project majority since it was a year late and had the usual cost-overruns! Keeping the CA would have added an order of magnitude to both those parameters. CA is found in most Air Traffic Systems today. Many question the reliability of it however. The processing logic is found both on the ground and aboard aircraft. It is still far from perfect and requires mucho compute horsepower. When it ultimately works in a robust fashion, it will be bitchin’ and the controllers now bitching** about CA today will exclaim, "Bitchin’!"

To me, this demonstrates:

  1. System Testing or any testing, can start on day one,
  2. "Testing" is not necessarily the act of "testing" per se, but involves examination and analyses, and ultimately – test and demonstration, and
  3. The MIL-STD-483 "waterfall" was not really! (a friendly jab at the Agilistism-istic-ators! )
  4. Bitchin' did not use to equal Bitching.

