Top 4 Questions to Ask Your Outsourced Infrastructure Provider about Uptime and Availability
Posted February 12th, 2008 by Jonah Paransky
Infrastructure outsourcing is becoming a more popular option for companies of all sizes, from 100-person organizations to 100,000-employee behemoths. The infrastructure that e-mail, websites, CRM systems, and even financial processing systems run on is being outsourced to third-party infrastructure providers at a furious rate. This is often because these providers bring scale, expertise, and guarantees that promise a clear ROI to the business and IT organizations that make these outsourcing decisions.
However, when looking to outsource infrastructure management, companies must understand the implications for availability and uptime and weigh those implications as part of the decision process. Don’t forget that your tolerance for downtime will change depending on the end-to-end IT service you will be offering on the infrastructure. Also, don’t think that Web 2.0 startups don’t have to worry about these issues; Web 2.0 often demands more availability.
At the same time, beware of “over-engineering”. Joel Spolsky makes the important point that you have to find the right balance between availability and cost for your organization’s needs.
Question 1: What kinds of guarantees are provided by the infrastructure outsourcer around availability and uptime?
When uptime guarantees come into play, the devil is in the details. Downtime issues have been known to destroy vendor relationships and cause significant public pain. There are even current lawsuits over the use of “Always On” by service providers who then have unplanned downtime.
Several areas to be concerned about include:
- What is the provider guaranteeing? Is the guaranteed uptime a measure of the connection of their facility to the internet? Of a specific infrastructure component? Or of the complete end-to-end availability of your IT service? How is the guarantee measured? Don’t expect miracles: if you can’t figure out a good way to measure the end-to-end uptime of your IT service, they probably can’t either.
- Will the vendor report downtime to you – or are you forced to report downtime to the vendor to receive any deserved credits?
- While you may agree on 99.9% uptime – is it measured daily? Monthly? On a 3-month sliding scale? Annually? This can be the difference between an outage of 1.4 minutes and an outage of roughly 8.8 hours being considered within your contracted downtime allowance. As Pingdom points out, there are many hosting companies out there with poor or odd uptime guarantees.
- What about planned downtime? If your application needs to be up and running 99.9% of the time, and the vendor allows for an 8-hour maintenance window each week that doesn’t count toward your downtime statistics, you are not going to be happy with the results.
- Do the penalties associated with the guarantees offer any real value? On the flipside, there are those that argue that there are more effective vendor control models than service level agreement (SLA) penalties.
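To make the measurement-window and planned-downtime points concrete, here is a minimal sketch of the arithmetic. The function names, the 30-, 91-, and 365-day window lengths, and the assumption that the vendor uses its full maintenance window every week are illustrative choices, not terms from any particular provider's SLA.

```python
def allowed_downtime_minutes(uptime_pct: float, window_days: float) -> float:
    """Minutes of downtime an uptime guarantee permits over one measurement window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - uptime_pct / 100)


def effective_availability_pct(uptime_pct: float, maintenance_hours_per_week: float) -> float:
    """Availability users actually see when a weekly maintenance window is
    excluded from the statistics (assumes the whole window is used)."""
    week_hours = 7 * 24
    measured_hours = week_hours - maintenance_hours_per_week
    up_hours = measured_hours * uptime_pct / 100
    return 100 * up_hours / week_hours


if __name__ == "__main__":
    windows = [("daily", 1), ("monthly", 30), ("quarterly", 91), ("annual", 365)]
    for label, days in windows:
        minutes = allowed_downtime_minutes(99.9, days)
        print(f"{label:>10}: {minutes:7.1f} min ({minutes / 60:.2f} h) in a single outage")

    # 99.9% measured outside an 8-hour weekly maintenance window
    print(f"effective: {effective_availability_pct(99.9, 8):.2f}%")
```

The same 99.9% figure permits under a minute and a half of continuous downtime when measured daily, but most of a working day when measured annually. With an excluded 8-hour weekly window that is fully used, the availability your users experience drops to about 95%.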
Question 2: How does the infrastructure provider architect their infrastructure for availability?
Explore in detail the baseline architecture being used to support your end-to-end IT service. Key areas to explore include:
- Are there single points of failure in the infrastructure design? At the network and hardware level, what levels of redundancy exist?
- Is the infrastructure distributed across multiple locations?
- What redundancy is built into the physical plant? Generators, power, connectivity, and location all factor into the likely actual availability of your infrastructure.
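One rough way to reason about single points of failure is the standard availability arithmetic: components that all must work multiply their availabilities, while a redundant set fails only when every copy fails at once. The sketch below assumes failures are independent, which shared power or a shared location can easily violate, and the component figures are invented for illustration.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant(availability: float, copies: int) -> float:
    """Availability of N independent copies where any one copy is enough."""
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    # Hypothetical stack: network link, server, power feed, each 99.9% available.
    single = series(0.999, 0.999, 0.999)
    doubled = series(redundant(0.999, 2), redundant(0.999, 2), redundant(0.999, 2))
    print(f"no redundancy:     {single:.4%}")
    print(f"every layer paired: {doubled:.4%}")
```

Three 99.9% components in a row already fall below 99.9% end to end, which is why it is worth asking which layers are duplicated and whether the duplicates truly fail independently.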
Question 3: How does the vendor plan for Disaster Recovery and Business Continuity issues?
It is an IT reality: data centers fail. Even with multiple generators, battery backup and multiple inbound connections, failures do happen. The important point is that the outsourcer has taken this into account, by performing regular disaster recovery and business continuity testing. Questions to ask include:
- Do you maintain a formal disaster recovery/business continuity plan?
- How often are disaster recovery/business continuity plan tests conducted? You should be able to see the results of these tests, even if only on an “eyes only” confidential basis.
- Are the disaster recovery tests conducted on a component basis, or across the entire end-to-end service, across multiple sites and data centers?
- Has the provider had a continuity event in the last three years, and what happened?
Question 4: How does the outsourcer test the impact of changes prior to release into production?
Changes are the leading cause of downtime. It is critically important to understand how your proposed outsourcer introduces change into the infrastructure environment. Issues to investigate include:
- Do they automate Change Management? Or is it a manual process?
- Do they have a high level of Change Management Maturity?
- Do they have a high level of IT Operations focused Testing Maturity?
- Since our research shows that emergency changes often cause problems in production, what percentage of all of their changes are “emergency changes”?
- Do they “Patch and Pray”, or do they have a formal process for understanding the impact of changes prior to production release?
Filed Under: Business Continuity, Downtime, IT Operations