Links List for 1.16.09

Posted January 16th, 2009 by Joe Pendry

We previously reported on Salesforce.com’s recent outage, but this guy at ServerWatch doesn’t seem to understand what all the commotion was about. After all, downtime is a common occurrence for enterprise applications in general; the article even adds that Microsoft’s website went down last week when users raced to download the beta of its Windows 7 operating system. Salesforce.com’s uptime is pretty impressive and worth noting. Why is it that when good things happen, we have to work hard to spread the news, but when something unwanted occurs, we can’t get a word in edgewise? Welcome to the world of IT, folks.

What would a Links List be without an article about a list? EMA recently released a report that highlights what it believes to be 12 top IT management trends that will expand in popularity this year. Some of the topics worth noting include virtualization, IT security, cloud computing and green IT.

Trying to squeeze the idea of cloud computing into a box is nearly impossible. It’s a complicated idea that is hard to define with one simple term. Here, one person tries to sum it up by creating 10 “as-a-service” categories in which it can find a home: Storage, Database, Information, Process, Application, Platform, Integration, Security, Management/Governance and Testing.

Filed Under: Cloud Computing, Downtime, IT Operations, IT Operations Research

To Stage or not to Stage?

Posted January 12th, 2009 by Dennis Powell

While StackSafe was exhibiting at the recent Gartner Data Center conference, we surveyed about 50 attendees on the types of projects that IT groups are planning for 2009. I was recently looking through the results of that survey and thought that some of the responses would be of interest to IT’s About Uptime readers – especially the data about staging environments.

The survey interviewed IT professionals from companies of 1000+ employees, with participants answering questions about their priority projects for the coming year.  The seven projects covered in the survey ranged from upgrades of existing systems to merging of data centers, and each was selected because it would introduce a large volume of changes to existing systems.  As we have discussed on this blog, changes (especially in the most complex environments) often lead to problems.

It was somewhat encouraging, therefore, to find that more than 50% of IT shops expect to deploy a formal staging environment to support their top project in 2009. Formal staging environments are test labs that are built to represent the technical complexity of the production environments. Customers make pre-deployment changes and test the changes in the formal staging environment prior to rejecting the change or certifying the change for production deployment.

Now for the bad news: for four of the seven types of projects covered in the survey, fewer than 60% of upgrades, patches or other configuration changes are tested. This means that, even though the capabilities are there, a large number of changes are not tested. Risky business.

Other findings:

  • Formal staging platforms were favored for projects with longer estimated completion schedules
  • Participants who relied on QA groups to test their project favored formal staging platforms slightly more than participants who expected non-QA groups to perform project testing
  • 72% of all projects have been delayed by 20% or more beyond their original completion date
  • 43% of participants reported that at least 20% of their production changes are emergency changes
  • And finally, 10% of all changes introduced into systems need to be rolled back

How do these findings match your IT organization’s reliance on formal staging environments?

Filed Under: IT Operations Research, Testing

Links List for 1.9.09

Posted January 9th, 2009 by Joe Pendry

New Year, New Outages:  AT&T was one of the first victims of major downtime this year, suffering its second wide-scale service outage in as many weeks.  While the outage appears to have only affected the East Coast, it disrupted the lives of many users seeking to go mobile for coverage and details of the MacWorld Expo.  No details have been made available as to why the service went down, other than that it was a “network glitch.”

Salesforce Suffers: Adding to the New Year’s downtime woes, Salesforce.com also experienced major downtime this week – the site was completely down for as much as 40 minutes, affecting thousands of customers. As Salesforce’s SaaS applications become more and more important to the enterprise (with dozens of applications/services running off the Salesforce backbone), any downtime from the giant provider can cripple customer operations and cost millions in lost revenue. Further proof that, sometimes, 99 percent availability just isn’t enough.
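
To put the 99 percent figure in perspective, here is a rough arithmetic sketch of how much downtime a given availability target permits over a year. The function name and the list of targets are illustrative assumptions, not figures published by Salesforce.com.

```python
# Illustrative arithmetic only: yearly downtime permitted by an availability target.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Hypothetical targets chosen for comparison.
for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows about {hours:.1f} hours of downtime per year")
```

At 99 percent, the allowance works out to roughly 87.6 hours a year, which is why a target that sounds strict can still leave plenty of room for painful outages.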

Movement in the Cloud: The cloud also saw some major shakeups this week, with EMC buying pieces of SourceLabs and Sun purchasing Q-layer. This is more proof that the cloud is here to stay, and that testing applications in and against cloud environments will only become more important in 2009.

Management Must-Dos: Network World’s Denise Dubie published an article this week looking at the “Management Must-Dos in 2009,” with ITIL, specifically process, being one of the top concerns.  We couldn’t agree more.

Filed Under: Cloud Computing, Downtime, ITIL

Final New Year’s Resolution: Focus on Process

Posted January 7th, 2009 by Joe Pendry

So here we are in the first full week of January and you are still looking for a New Year’s Resolution that can help reduce production problems and downtime.

If our first three suggestions about controlling access to production systems, managing configuration drift, or linking change requests to business processes were not what you needed, perhaps our final suggestion will work. Like our earlier suggestions, this one comes from our report with the IT Process Institute (ITPI) about how the best organizations handle changes to minimize problems.

Best of all, this suggestion doesn’t require any technical work or management – just good old-fashioned willpower like “old school” resolutions do. With a little organizational discipline, you can gain significant IT benefits.

The resolution is “I/we will build a process-oriented culture and respond quickly when the process isn’t followed.”  You see?  It is all about discipline.  Once you have determined your normal process for access, configuration, release and change practices, the key is to make sure everyone follows those processes.

But it takes more than sending out a memo to the IT team.  Unfortunately, the culture might not reinforce the process at some organizations.  As we identified in the report:

“IT professionals are smart and want to do a good job. But they, like everyone else, respond to environmental forces that shape their behavior. They adjust their expectations and work style to fit their organization. Unfortunately, many IT organizations have a history of rewarding brilliant technologists and star firefighters, but do not have a process culture. Compounding the problem is that CIOs and IT managers often don’t naturally have the skills needed to assess and shape organizational culture.”

Three steps can get you most of the way toward a culture where process is an important part of the IT team’s day-to-day job.

  1. Build process consensus and identify causes of the most frequent exceptions – Working with human resources to tie process adherence to performance reviews can go a long way toward greater acceptance of processes.  Also, some investigation can help.  Who makes the most emergency changes?  What applications and upgrades have the lowest success rate?
  2. Standardize exception response – Exceptions are a part of life.  The key is planning for them and reviewing the reason for them after the fact.  It may be that an exception needs to become a part of the normal process if it can minimize production problems.
  3. Integrate development and production processes – This is the trickiest for large organizations. Make sure that the processes cross the “wall” that sometimes exists between development/QA and operations. If changes follow a standard procedure all the way back to development, there are fewer opportunities for disruption.

And what does all this discipline get you? Much like a New Year’s diet makes you a healthier and (potentially) happier person, IT process discipline makes for a healthier and happier IT team. Some of the benefits we noticed for top performers in the ITPI report included:

  • Significantly (60% to 70%) lower downtime
  • Less overall production process
  • Improved change success rate (percentage of changes that met functional objectives and were completed during the planned time, and actions exactly followed the build instructions)
  • Fewer releases that cause a service outage or incident (from 48% up to 74%)
  • Greater capacity to fix incidents within SLA limits—Top performers fix 93% of incidents within SLA limits, which is 16% higher than medium performers and 43% higher than low performers.

In a time when people are trying to be more disciplined in their personal lives (working out, dieting and watching less TV) why not transfer that mindset to the IT team?

Filed Under: Change Management, IT Operations, IT Operations Research

Links List 1.2.09

Posted January 2nd, 2009 by Joe Pendry

Not surprisingly, it’s been a slow news week. Since we are working on our own list of IT Resolutions, we thought it would only be fair to highlight lists from some of the leading IT media outlets.

InformationWeek lists the Top 10 CIO issues for 2009. Attacking the 80/20 ratio is listed at number two, mirroring our resolution to link change requests to business processes.

InfoWorld has its own resolutions list. Number two caught our eye – “slay sacred cows.” By this they mean look at “replacing Microsoft Office, swap out Microsoft Exchange, or replace Oracle Database as part of an effort to reduce long-term costs.” If you do undertake one of these major migrations, pre-production testing will be key to ensuring a smooth transition.

Network World has an interesting list of web sites IT pros should master in 2009. From large social networks like LinkedIn and Twitter to sites that make sense of the latest buzz like enterprise mobility and cloud computing to online communities for sharing best practices, this list covers a wide variety of sites for surfing.

And completing our trip around the “worlds” of IT media, Computerworld provides six budget tips for surviving 2009. Tips five and six both speak to change that may be demanded due to budget pressure. They say IT teams will be asked to trim back “dead wood” and look for cost savings within existing infrastructure. IT should also be prepared to answer questions about how virtualization and cloud computing could be the “silver bullet” to rising IT costs.

Filed Under: IT Operations, Testing

New Year’s Resolution: Link Change Requests to Business Process and Infrastructure

Posted December 30th, 2008 by Jonah Paransky

Over the last couple weeks, we have published a series of New Year’s resolutions for IT organizations that want to minimize downtime in 2009.

The purpose of these recommended resolutions is to convey some of the activities that top-performing IT organizations demonstrate. We recently worked with the IT Process Institute (ITPI) to look at the activities that the best IT organizations do to minimize downtime and disruption that result from changes including upgrades and patches.

Our first resolution for 2009 is “I/we will do a better job of controlling access to my production system.”

Our second resolution is “I/we will better manage my production system to make sure it matches my target configuration.”

The topic for today (our third resolution) is “I will strive to link change requests to business processes (vital business functions) and underlying infrastructure”.

Regular readers of this blog are likely familiar with ITIL-recommended change practices. However, as described in the application and infrastructure upgrade best practice guide, the standard change management best practices are necessary but not sufficient to achieve top levels of performance.

As described in the report, linking change requests to the business process and infrastructure means requiring answers to the following questions for every change.

  • What is the overall complexity and severity of the proposed change?
  • What is the context of the change; specifically, what business cycles or processes could be affected by application or infrastructure upgrades?
  • How successful have we been at these kinds of changes in the past?
  • What is the likelihood that our test environment accurately reflects the production environment?
  • Can rollback plans be accurately tested before the proposed changes hit production?
  • Can the changes be fully vetted, and can the outcomes be understood?
  • Who are the main stakeholders in the business and in IT development and operations?
  • Are we dependent upon outside suppliers or contractors?
  • Show me the documentation

To be able to answer these questions, organizations need to build a detailed understanding of how changes link back to the underlying infrastructure and the business processes that rely upon that infrastructure. This includes developing both bottom-up mapping among applications and underlying systems as well as top-down mappings of business processes to the applications themselves.
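
The two mappings described above can be sketched as a pair of lookup tables. This is a minimal illustration under stated assumptions: the process, application and system names are hypothetical placeholders, and `affected_processes` is a helper invented for this example rather than anything taken from the best practice guide.

```python
# Hypothetical mappings; the names are placeholders, not taken from the guide.
PROCESS_TO_APPS = {  # top-down: business process -> applications it relies on
    "order-to-cash": ["billing", "crm"],
    "payroll": ["hr-suite"],
}
APP_TO_SYSTEMS = {  # bottom-up: application -> underlying systems
    "billing": ["db-01", "app-03"],
    "crm": ["db-01", "web-02"],
    "hr-suite": ["db-02"],
}


def affected_processes(changed_system: str) -> list[str]:
    """Return business processes that depend on the system named in a change request."""
    # Walk bottom-up from the system to the applications that run on it.
    apps = {app for app, systems in APP_TO_SYSTEMS.items() if changed_system in systems}
    # Then walk top-down to find the processes that use any of those applications.
    return sorted(p for p, p_apps in PROCESS_TO_APPS.items() if apps.intersection(p_apps))


print(affected_processes("db-01"))  # ['order-to-cash']
print(affected_processes("db-02"))  # ['payroll']
```

With both directions recorded, a change request against a single server can be traced to the business functions it might disrupt before the change is approved.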

Of course, as with all New Year’s resolutions, we have to ask if the results are worth the potential effort. As seen in the best practice guide, top levels of change management performance come with clear advantages including reducing the number of releases that are rolled back (3.3% of releases rolled back versus 8.5% for low performers in the study, an improvement of over 50% in rollback rate) and minimizing configuration drift. In 2009, reducing rollback rates is another way to improve operational efficiency, a key focus for many IT executives trying to break out of the 80%/20% (or 70%/30%) trap and reduce operational costs.
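
For readers who want to check the rollback comparison, here is the arithmetic as a short sketch. The two percentages are the ones quoted from the study; the function name is an assumption made for this example.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


low_performer_rollback = 8.5  # percent of releases rolled back (quoted from the study)
top_performer_rollback = 3.3  # percent of releases rolled back (quoted from the study)

reduction = relative_reduction(low_performer_rollback, top_performer_rollback)
print(f"Relative reduction in rollback rate: {reduction:.0%}")
```

The result is a little over 60 percent, which is consistent with the “over 50%” improvement cited.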

We look forward to completing this series of resolutions during the first week of 2009.

Have a Happy, Healthy and downtime-free New Year!

Filed Under: Change Management, Downtime, IT Operations

New Year’s Resolution: Better Manage Configuration Drift

Posted December 24th, 2008 by Joe Pendry

Last week we kicked off our New Year’s resolutions for IT teams that want to minimize downtime in 2009. These resolutions are the result of some best practices uncovered in a best practice guide we co-developed with the IT Process Institute. Our first resolution was “I/we will do a better job of controlling access to my production system.”

Because we think you should have an option when it comes to resolutions, today we will focus on another suggestion. This time we want to focus less on people and more on the environment. Therefore, our second resolution is “I/we will better manage my production system to make sure it matches my target configuration.” Down with configuration drift!

Why is this important? Because even slight variations between what you think your environment looks like and what it actually looks like can cause the best deployment plans to fail. Not to mention all of the time and effort that must be spent validating that systems are configured the way they ought to be each time a new change is planned for introduction.

This resolution is only for the brave-hearted, of course. Kind of like a resolution to count the calories you eat – you may be disgusted by what is actually going on once you decide to measure. Or, as the best practice guide states:

“When IT executives first start monitoring configuration drift, they are often surprised by the amount of unauthorized or out-of-process change activity that occurs on systems they thought were locked up and stable. If due to out-of-process changes, a system is in an unapproved and unknown state, the level of security and operational risk is also unknown.”

The top-performing IT organizations handle configuration drift through two important steps. First, they approve a “golden” build that becomes the only system to which updates or other changes are made. Second, they monitor systems for unauthorized changes and configuration drift. It is all about control and protection.
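
The second step, monitoring for drift against the golden build, can be sketched in a few lines. This is a simplified illustration, not the method from the best practice guide: the configuration items are hypothetical, and a real monitor would read settings from files or an inventory tool rather than from in-memory dictionaries.

```python
import hashlib


def fingerprint(settings: dict[str, str]) -> dict[str, str]:
    """Hash each configuration item so baselines can be stored and compared compactly."""
    return {name: hashlib.sha256(value.encode("utf-8")).hexdigest()
            for name, value in settings.items()}


def detect_drift(golden: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare a system's fingerprints with the golden build's fingerprints."""
    return {
        "changed": sorted(k for k in golden.keys() & current.keys() if golden[k] != current[k]),
        "missing": sorted(golden.keys() - current.keys()),
        "unexpected": sorted(current.keys() - golden.keys()),
    }


# Hypothetical configuration items for the approved build and a production system.
golden_build = fingerprint({"max_connections": "200", "tls": "on", "log_level": "warn"})
production = fingerprint({"max_connections": "500", "tls": "on", "debug_port": "9000"})

print(detect_drift(golden_build, production))
# {'changed': ['max_connections'], 'missing': ['log_level'], 'unexpected': ['debug_port']}
```

Any non-empty list in the report points to a change that did not go through the golden build and deserves a follow-up.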

And what do you get if you make this resolution and hold steady? Glad you asked. Some of the more impactful benefits include improved auto-detection of security breaches and shorter release preparation time. Over time, you can also expect lower administration, maintenance and support costs. (Not a bad benefit given the tighter IT budgets projected for 2009.)

Filed Under: Change Management, IT Operations

Links List 12.19.08

Posted December 19th, 2008 by Joe Pendry

Two interesting takes on the challenges to the growth and acceptance of the cloud caught our eye this week. First, reliability – Dana Gardner writes about the rising importance of IT Systems Analytics in light of increased use of SaaS and cloud. He suggests a shift in IT ops thinking from, “’It’s broken, so I won’t use it’, to a more mature attitude, which says, ‘It will be up most of the time, but when it does break, how do I make sure that I remain accountable, as the IT manager, the IT Director, or the CIO. How do I remain accountable for those services to my organization, and how do I make sure that I can pinpoint the cause of the problem, and get it rectified as quickly as possible?’”

Second, Tarry Singh highlights a post from Alistair Croll on GigaOm that takes a real-world look at the security risks of the cloud. He points out that most computer breaches (and we could add downtime) are a result of human error. Cloud operations tend to take humans out of the equation, potentially equating to fewer security breaches and less downtime.

Similar to the New Year’s resolution series we’re running here on IT’s About Uptime, InfoWorld published the 7 deadly sins of IT management. A case study in number seven, “Pride creep,” got our attention. The article details a multibillion-dollar manufacturer that built its own ERP solution and then rolled it out without any testing. The lack of testing and subsequent incompatibility with other systems was one of the many reasons the system failed and was ultimately scrapped.

Finally, with all of the doom and gloom about the economy and IT spending, some interesting news came out of a report from McKinsey – Executives (IT and non-IT) would like to spend 40 percent of their IT spend on new projects but currently only allocate 20 percent (on average). Feedback from executives showed that they understood the value in increasing IT spend on solutions that will make them more competitive in a difficult market.

Filed Under: Cloud Computing, Testing

Controlling Access to Production Systems in the New Year

Posted December 18th, 2008 by Joe Pendry

As we wind down the year at IT’s About Uptime, we thought we would kick off our series of recommended New Year’s resolutions. These resolutions are recommended specifically for our readers from IT organizations. Our own actual resolutions include the usual: cutting out the doughnuts, watching less TV, improving pre-deployment testing capabilities for organizations with changes to their mission-critical applications, etc.

The purpose of these recommended resolutions is to convey some of the activities that top-performing IT organizations demonstrate.  We recently worked with the folks at the IT Process Institute (ITPI) to take a look at the activities that the best IT organizations do to minimize downtime and disruption that result from changes like upgrades and patches.  Why not pass along these findings in the name of the holiday season, right?

Our first resolution for 2009 is: “I/we will do a better job of controlling access to my production system.”

Seems logical, right?  But it turns out this is much harder to accomplish than you might think.  Much like laying off the doughnuts.

The top-performing IT organizations are often paranoid about access.  To meet service level commitments, these groups obsess about the responsibility and authority to manage the people and the process used for application and infrastructure upgrades.  They implement and optimize controls that ensure that only people in specific roles make modifications to the production environment.  This approach involves two key principles:

  • Access is controlled by production and granted on the basis of job function.
  • As a prerequisite, roles are defined and duties are segregated.

As stated in our Best Practice Guide with ITPI:

“Organizations should manage roles-based access so they can help prevent out-of-process modifications to key systems. Developers shouldn’t have access to production systems. Production shouldn’t have access to source code. However, to facilitate second- and third level troubleshooting, many organizations have a “break glass” process to grant temporary development access to production systems. The duration of access should be limited, privileges should be limited, and modifications to production systems should be monitored and documented for audit purposes.”
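
The role-based access and “break glass” ideas in the quote can be sketched as follows. This is a minimal illustration under stated assumptions: the role names, permission strings, user name and the `AccessController` class are all hypothetical and are not drawn from the guide or from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical role-to-permission mapping; real roles would come from your own policy.
ROLE_PERMISSIONS = {
    "operations": {"production:modify"},
    "developer": {"source:modify"},
}


@dataclass
class BreakGlassGrant:
    user: str
    permission: str
    expires_at: datetime


class AccessController:
    def __init__(self) -> None:
        self.grants: list[BreakGlassGrant] = []
        self.audit_log: list[str] = []

    def break_glass(self, user: str, permission: str, now: datetime, minutes: int = 60) -> None:
        """Grant one temporary permission and record the grant for audit."""
        self.grants.append(BreakGlassGrant(user, permission, now + timedelta(minutes=minutes)))
        self.audit_log.append(f"{now.isoformat()} break-glass {permission} granted to {user}")

    def is_allowed(self, user: str, role: str, permission: str, now: datetime) -> bool:
        """Allow access by role, or by an unexpired break-glass grant."""
        if permission in ROLE_PERMISSIONS.get(role, set()):
            return True
        return any(g.user == user and g.permission == permission and now < g.expires_at
                   for g in self.grants)


controller = AccessController()
start = datetime(2009, 1, 5, 9, 0)
print(controller.is_allowed("dana", "developer", "production:modify", start))  # False
controller.break_glass("dana", "production:modify", start, minutes=30)
print(controller.is_allowed("dana", "developer", "production:modify", start + timedelta(minutes=10)))  # True
print(controller.is_allowed("dana", "developer", "production:modify", start + timedelta(minutes=45)))  # False
```

The grant is limited in both scope (one permission) and duration, and every grant leaves an audit entry, mirroring the three limits the guide recommends.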

And why should you follow this resolution?  What is the benefit for being disciplined about controlling access?  Some fairly significant benefits came to light in the study.

  • Improved rate of auto-detected security breaches
  • Reduced process variability
  • Reduced emergency change rate—Emergency changes are changes that are tracked, but that do not receive a standard review before they are implemented (for example, changes implemented before the next weekly change meeting).

So while you are having fun at the office holiday party this week, just remember that you might have to come down on your co-worker with regard to their production system access. After all, it is (or should be) your resolution.

Filed Under: Change Management, Downtime, IT Operations, IT Operations Research

StackSafe Wins Start-Up Competition

Posted December 18th, 2008 by Joe Pendry

We don’t normally talk about company topics on this blog, but we received some good news at StackSafe this week.  We were named the winner of the National Council of Entrepreneurial Tech Transfer’s 2008 National University Start-Up Competition.  As discussed in a press release we recently distributed, “After various presentations to VC, angel investor and university judges, StackSafe emerged from 400 competing university startups as an elite investment-worthy company.”

We are working hard to bring new technologies to the market that help IT teams work more effectively and with better results.  It is quite an honor to be recognized for this work and for the potential that others see in our solutions.

Filed Under: StackSafe Corporate
