Application Upgrade Testing: Three Reasons Why IT Operations Needs to be Involved
Posted August 28th, 2008 by Joe Pendry
As we mentioned in the past, one of the leading causes of downtime among critical business applications is change. Application upgrades in particular can cause problems for organizations – especially when the company performs quality assurance and testing before systems reach the production environment, but without sufficient IT operations testing.
In an upcoming best practice guide, StackSafe worked with the IT Process Institute to examine this problem from a number of angles. One interesting finding is that problems can arise when testing and QA activities are handled predominantly during the application development process – without sufficient pre-production testing.
We pulled some information from the guide (which we will post a link to when it is released); specifically, the three key problems that result from over-reliance on QA testing of application upgrades:
1) Responsibility alignment. Those who are responsible for the availability, proper function, and security of business-critical systems don’t necessarily control the upstream activities that have a big impact on performance in the production environment.
Typically, production management has no way to warrant what other groups are doing. It is often the case that, when IT operations tries to influence an upstream function, it runs into a core conflict. On one side of the conflict, developers responsible for new applications are measured on throughput and speed. As a result, they improve their business value by “doing more faster,” such as using Agile development techniques and SOA. On the other side of the conflict, operations is responsible for managing what is in place and is measured on availability, security, and cost containment. As a result, operations teams improve their business value by doing more with less, such as using standard architectures and process frameworks to reduce variance and management overhead. With different sets of objectives, each group may feel that the other is preventing it from achieving its objectives.
2) Limited view. Quality and test groups, which are often in the development organization, sometimes have a limited view of the production change and release process. They focus primarily on functional testing, which is designed to verify that changes meet functional requirements. However, application and infrastructure upgrades also require integration testing or fit-for-purpose testing, which is focused on verifying that changes work with all the components of the computing stack.
There may be many different configurations of a computing stack that need to be tested once functional testing is complete. This post-development, pre-production testing and verification responsibility often falls on the production team, which has access to and responsibility for the quality of production systems. Building and maintaining staging environments that are representative of different production environments is difficult, and IT operations groups may not have the testing expertise or process rigor typically found in the development, QA, and functional testing teams. Unfortunately, the core conflict between doing things faster in development and doing more with less in production often means less pre-production integration testing.
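To give a rough sense of how quickly the number of stack configurations can grow, here is a minimal Python sketch that enumerates every combination of one option per layer. The layer names and version labels are hypothetical placeholders, not taken from the guide; a real inventory would come from your own production environments.

```python
from itertools import product

# Hypothetical stack layers and options; illustrative names only.
STACK_LAYERS = {
    "os": ["os-a", "os-b"],
    "database": ["db-1", "db-2"],
    "app_server": ["srv-x"],
}


def enumerate_configurations(layers):
    """Return every combination of one option per layer as a list of dicts."""
    names = list(layers)
    option_lists = [layers[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*option_lists)]


configs = enumerate_configurations(STACK_LAYERS)
for config in configs:
    print(config)
print(f"{len(configs)} configurations to cover in integration testing")
```

Even this small example produces four configurations; each added layer or version multiplies the total, which is one reason representative staging environments are hard to maintain.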
3) Increasing complexity. The modern computing environment is complex and has numerous dependencies between the layers of the computing stack. As we touched on in an earlier research report, 65% of companies indicated that vendor-developed and in-house developed patches, taken together, were a leading cause of failed changes. Other complexity issues were also reported:
- 54% indicated interaction between multi-tiered systems was a leading cause
- 43% felt insufficient pre-production testing was a leading cause
The net effect of this complexity is that it is difficult to gauge the risk of an application or system modification without pre-production integration testing. Overall, the combination of the need for speed, cross-functional processes, and complex computing environments makes pushing quality upstream a challenging but necessary endeavor for those responsible for the proper function and availability of production IT systems.
Filed Under: Change Management, Downtime, IT Operations, Testing















August 29th, 2008 at 3:03 pm
Joe,
Nice post. I think all too often IT Ops winds up doing production troubleshooting following application upgrades only to realize that if extra effort had been put into the final testing stages (with their help), the fires would never have cropped up in the first place.
In a prior life at OPNET Technologies, we used to talk about Application Network Reviews where the network team would take samples of traffic flows and packet streams to predict the impact of these changes to capacity and end user response time.
Following this type of analysis, you can use the results to fine-tune the end user monitoring tools that Ops should have deployed. In addition, any new elements that are added into the mix (new content servers, authentication systems, etc.) should be added to the service definitions in a BSM implementation to ensure that they are part of the monitored ecosystem when the upgrades go live in production. I talked about some of the technology and considerations that should go into this side of the equation in a recent post on our blog - http://www.wearebsm.com/managed_objects/2008/08/end-user-monitoring-experience.html
I look forward to reading the best practice guide when it’s released.
Abbas.
September 24th, 2008 at 3:37 pm
[...] have blogged in the past about the difficulty with testing application upgrades. When upgrading applications, for example, how can IT groups be certain the implementation of these [...]