- It is crucial for capacity managers to provide capacity in advance of need to maximize availability.
- In an effort to ensure maximum uptime, organizations are overprovisioning (an average of 59% for compute, and 48% for storage). With budget pressure mounting (especially on the capital side), the cost of this approach can’t be ignored.
- Half of organizations have experienced capacity-related downtime, and almost 60% wait more than three months for additional capacity.
Our Advice
Critical Insight
- All too often capacity management is left as an afterthought. The best capacity managers bake capacity management into their organization’s business processes, becoming drivers of value.
- Communication is key. Build bridges between your organization’s silos, and involve business stakeholders in a dialog about capacity requirements.
Impact and Result
- Map business metrics to infrastructure component usage, and use your organization’s own data to forecast demand.
- Project future needs in line with your hardware lifecycle. Never suffer availability issues as a result of a lack of capacity again.
- Establish infrastructure as a driver of business value, not a “black hole” cost center.
Member Testimonials
After each Info-Tech experience, we ask our members to quantify the real-time savings, monetary impact, and project improvements our research helped them achieve. See our top member experiences for this blueprint and what our clients have to say.
8.0/10
Overall Impact
$2,840
Average $ Saved
10
Average Days Saved
Client | Experience | Impact | $ Saved | Days Saved |
---|---|---|---|---|
Cork County Council | Guided Implementation | 8/10 | $2,840 | 10 |
BlueAlly Technology Solutions, LLC | Guided Implementation | 10/10 | $10,000 | 5 |
Randolph Brooks Federal Credit Union | Guided Implementation | 8/10 | $2,231 | 2 |
Availability & Capacity Management
Please note: This course will be updated in July 2023.
Maximize the benefits of infrastructure monitoring investments by diagnosing & assessing transaction performance, from network to server to end-user interface.
This course makes up part of the Infrastructure & Operations Certificate.
- Course Modules: 4
- Estimated Completion Time: 2-2.5 hours
- Featured Analysts:
- Darin Stahl, Sr. Research Director, Infrastructure & Operations Practice
- Gord Harrison, SVP of Research and Advisory
Workshop: Develop an Availability and Capacity Management Plan
Workshops offer an easy way to accelerate your project. If you are unable to do the project yourself, and a Guided Implementation isn't enough, we offer low-cost delivery of our project workshops. We take you through every phase of your project and ensure that you have a roadmap in place to complete your project successfully.
Module 1: Conduct a Business Impact Analysis
The Purpose
- Determine the most important IT services for the business.
Key Benefits Achieved
- Understand which services to prioritize for ensuring availability.
Activities | Outputs |
---|---|
Create a scale to measure different levels of impact. | RTOs/RPOs |
Evaluate each service by its potential impact. | List of gold systems |
Assign a criticality rating based on the costs of downtime. | Criticality matrix |
Module 2: Establish Visibility Into Core Systems
The Purpose
- Monitor and measure usage metrics of key systems.
Key Benefits Achieved
- Capture and correlate data on business activity with infrastructure capacity usage.
Activities | Outputs |
---|---|
Define your monitoring strategy. | RACI chart |
Implement your monitoring tool/aggregator. | Capacity/availability monitoring strategy |
Module 3: Develop a Plan to Project Future Needs
The Purpose
- Determine how to project future capacity usage needs for your organization.
Key Benefits Achieved
- Data-based, systematic projection of future capacity usage needs.
Activities | Outputs |
---|---|
Analyze historical usage trends. | Plan for soliciting future needs |
Interface with the business to determine needs. | |
Develop a plan to combine these two sources of truth. | Future needs |
Module 4: Identify and Mitigate Risks
The Purpose
- Identify potential risks to capacity and availability.
- Develop strategies to ameliorate potential risks.
Key Benefits Achieved
- Proactive approach to capacity that addresses potential risks before they impact availability.
Activities | Outputs |
---|---|
Identify capacity and availability risks. | List of risks |
Determine strategies to address risks. | List of strategies to address risks |
Populate and review completed capacity plan. | Completed capacity plan |
Develop an Availability and Capacity Management Plan
Manage capacity to increase uptime and reduce costs.
ANALYST PERSPECTIVE
The cloud changes the capacity manager’s job, but it doesn’t eliminate it.
"Nobody doubts the cloud’s transformative power. But will its ascent render “capacity manager” an archaic term to be carved into the walls of datacenters everywhere for future archaeologists to puzzle over? No. While it is true that the cloud has fundamentally changed how capacity managers do their jobs, the process is more important than ever. Managing capacity – and, by extension, availability – means minimizing costs while maximizing uptime. The cloud era is the era of unlimited capacity – and of infinite potential costs. If you put the infinity symbol on a purchase order… well, it’s probably not a good idea. Manage demand. Manage your capacity. Manage your availability. And, most importantly, keep your stakeholders happy. You won’t regret it."
Jeremy Roberts,
Consulting Analyst, Infrastructure Practice
Info-Tech Research Group
Availability and capacity management transcend IT
This Research Is Designed For:
✓ CIOs who want to increase uptime and reduce costs
✓ Infrastructure managers who want to deliver increased value to the business
✓ Enterprise architects who want to ensure stability of core IT services
✓ Dedicated capacity managers
This Research Will Help You:
✓ Develop a list of core services
✓ Establish visibility into your system
✓ Solicit business needs
✓ Project future demand
✓ Set SLAs
✓ Increase uptime
✓ Optimize spend
This Research Will Also Assist:
✓ Project managers
✓ Service desk staff
This Research Will Help Them:
✓ Plan IT projects
✓ Better manage availability incidents caused by lack of capacity
Executive summary
Situation
- IT infrastructure leaders are responsible for ensuring that the business has access to the technology needed to keep the organization humming along. This requires managing capacity and availability.
- Dependencies go undocumented. Services are provided on an ad hoc basis, and capacity/availability are managed reactively.
Complication
- Organizations are overprovisioning an average of 59% for compute, and 48% for storage. This is expensive. With budget pressure mounting, the cost of this approach can’t be ignored.
- Lead time to respond to demand is long. Half of organizations have experienced capacity-related downtime, and almost 60% wait 3+ months for additional capacity. (451 Research, 3)
Resolution
- Conduct a business impact analysis to determine which of your services are most critical and warrant active capacity management, where the benefits will outweigh the costs.
- Establish visibility into your system. You can’t track what you can’t see, and you can’t see when you don’t have proper monitoring tools in place.
- Develop an understanding of business needs. Use a combination of historical trend analyses and consultation with line of business and project managers to separate wants from needs. Overprovisioning used to be necessary, but is no longer required.
- Project future needs in line with your hardware lifecycle. Never suffer availability issues as a result of a lack of capacity again.
Info-Tech Insight
- Components are critical. The business doesn’t care about components. You, however, are not so lucky…
- Ask what the business is working on, not what they need. If you ask them what they need, they’ll tell you – and it won’t be cheap. Find out what they’re going to do, and use your expertise to service those needs.
- Cloud shmoud. The role of the capacity manager is changing with the cloud, but capacity management is as important as ever.
Save money and drive efficiency with an effective availability and capacity management plan
Overprovisioning happens because of the old style of infrastructure provisioning (hardware refresh cycles) and because capacity managers don’t know how much they need (as a result of either inaccurate or nonexistent information).
According to 451 Research, 59% of enterprises have had to wait 3+ months for new capacity. It is little wonder, then, that so many opt to overprovision. Capacity management is about ensuring that IT services are available, and with lead times like that, overprovisioning can be more attractive than the alternative. Fortunately, there is hope. An effective availability and capacity management plan can help you:
- Identify your gold systems
- Establish visibility into them
- Project your future capacity needs
Balancing overprovisioning and spending is the capacity manager’s struggle.
Availability and capacity management go together like boots and feet
Availability and capacity are not the same, but they are related and can be effectively managed together as part of a single process.
If an IT department is unable to meet demand due to insufficient capacity, users will experience downtime or a degradation in service. To be clear, capacity is not the only factor in availability – reliability, serviceability, etc. are significant as well. But no organization can effectively manage availability without paying sufficient attention to capacity.
"Availability Management is concerned with the design, implementation, measurement and management of IT services to ensure that the stated business requirements for availability are consistently met."
– OGC, Best Practice for Service Delivery, 12
"Capacity management aims to balance supply and demand [of IT storage and computing services] cost-effectively…"
– OGC, Business Perspective, 90
Integrate the three levels of capacity management
Successful capacity management involves a holistic approach that incorporates all three levels.
Level | Description | Example |
---|---|---|
Business | The highest level of capacity management, business capacity management, involves predicting changes in the business’ needs and developing requirements in order to make it possible for IT to adapt to those needs. | Influx of new clients from a failed competitor. |
Service | Service capacity management focuses on ensuring that IT services are monitored to determine if they are meeting pre-determined SLAs. The data gathered here can be used for incident and problem management. | Increased website traffic. |
Component | Component capacity management involves tracking the functionality, utilization, and performance of specific components (servers, hard drives, etc.) and making predictions about future concerns. | Insufficient web server compute. |
The C-suite cares about business capacity as part of the organization’s strategic planning. Service leads care about their assigned services. IT infrastructure is concerned with components, but not for their own sake. Components underpin services, which are ultimately designed to facilitate business.
A healthcare organization practiced poor capacity management and suffered availability issues as a result
CASE STUDY
Industry: Healthcare
Source: Interview
New functionalities require new infrastructure
There was a project to implement an Elasticsearch-based search feature. This had to correlate all the organization’s member data from an Oracle data source and their own data warehouse, and pool them all into an Elasticsearch index so that it could be used by the provider portal search function. In estimating the amount of space needed, the infrastructure team assumed that all the data would be stored in a single place. They didn’t account for the architecture of Elasticsearch, in which indexes are divided into shards that are spread across multiple nodes.
Beware underestimating demand and hardware sourcing lead times
As a result, they vastly underestimated the amount of space that was needed and ended up short by a terabyte. The infrastructure team frantically sourced more hardware, but the rush hardware order arrived physically damaged and had to be returned to the vendor.
Sufficient budget won’t ensure success without capacity planning
The project’s budget had been more than sufficient to pay for the extra necessary capacity, but because a lack of understanding of the infrastructure impact resulted in improper forecasting, the project ended up at a standstill.
Manage availability and keep your stakeholders happy
If you run out of capacity, you will inevitably encounter availability issues like downtime and performance degradation. End users do not like downtime, and neither do their managers.
There are three variables that are monitored, measured, and analyzed as part of availability management more generally (Valentic).
- Uptime: The availability of a system is the percentage of time the system is “up” (and not degraded), which can be calculated using the following formula: uptime/(uptime + downtime) x 100%. The more components there are in a system, the lower the availability, as a rule.
- Reliability: The length of time a component/service can go before there is an outage that brings it down, typically measured in hours.
- Maintainability: The amount of time it takes for a component/service to be restored in the event of an outage, also typically measured in hours.
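The availability formula can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names are hypothetical, the series calculation assumes components fail independently with no redundancy, and the MTBF/MTTR form is the standard steady-state approximation that relates the reliability and maintainability measures described here.

```python
from __future__ import annotations


def availability_pct(uptime_hours: float, downtime_hours: float) -> float:
    """Availability as a percentage: uptime / (uptime + downtime) x 100."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total * 100.0


def availability_from_mtbf_mttr(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from reliability (mean time between
    failures) and maintainability (mean time to restore)."""
    return availability_pct(mtbf_hours, mttr_hours)


def serial_availability_pct(component_pcts: list[float]) -> float:
    """Availability of a service that needs every component to be up.

    Assumption: independent failures and no redundancy, so the
    per-component availability fractions multiply.
    """
    result = 1.0
    for pct in component_pcts:
        result *= pct / 100.0
    return result * 100.0
```

Under these assumptions, a service that depends on three components at 99.9% each comes out at roughly 99.7%, which is why availability tends to fall as the number of components in a system grows.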
Enter the cloud: changes in the capacity manager role
There can be no doubt – the rise of the public cloud has fundamentally changed the nature of capacity management.
Features of the public cloud | Implications for capacity management |
---|---|
Instant, or near-instant, instantiation | Lead times drop; capacity management is less about ensuring equipment arrives on time. |
Pay-as-you go services | Capacity no longer needs to be purchased in bulk. Pay only for what you use and shut down instances that are no longer necessary. |
Essentially unlimited scalability | Potential capacity is infinite, but so are potential costs. |
Offsite hosting | Redundancy, but at the price of the increasing importance of your internet connection. |
Vendors will sell you the cloud as a solution to your capacity/availability problems
Traditionally, increases in capacity have come in bursts as a reaction to availability issues. This model inevitably results in overprovisioning, driving up costs. Access to the cloud changes the equation. On-demand capacity means that, ideally, nobody should pay for unused capacity.
Reality check: even in the cloud era, capacity management is necessary
Vendors will likely nurture a gap between your expectations and reality. That can be damaging.
The cloud reality does not look like the cloud ideal. Even with the ostensibly elastic cloud, vendors like the consistency that longer-term contracts offer. Enter reserved instances: vendors offer the option to pay a fee for a reserved instance in exchange for lower hourly rates. Usage beyond the reserved amount will be billed at a higher hourly rate. In order to determine where that line should be drawn, you should engage in detailed capacity planning. Unfortunately, even when done right, this process will result in some overprovisioning, though it does provide convenience from an accounting perspective. The key is to use spot instances where demand is exceptional and bounded. Example: A university registration server that experiences exceptional demand at the start of term but at no other time.
Use best practices to optimize your cloud resources
Even in the era of elasticity, capacity planning is crucial. Spot instances – the spikes in the graph above – are more expensive, but if your capacity needs vary substantially, reserving instances for all of the space you need can cost even more money. Efficiently planning capacity will help you draw this line.
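One way to picture where that line falls is to price a demand history at different reservation levels. The sketch below is illustrative only: the rates, the demand series, and the function names are hypothetical, and it assumes a flat hourly rate for reserved instances and a higher flat hourly rate for any usage above the reservation. Real vendor pricing has more moving parts.

```python
def total_cost(demand, reserved, reserved_rate, burst_rate):
    """Cost of serving an hourly demand series (instances needed per hour).

    Reserved instances are paid for every hour whether used or not;
    demand above the reservation is billed at the higher burst rate.
    """
    baseline = reserved * reserved_rate * len(demand)
    overflow_hours = sum(max(0, d - reserved) for d in demand)
    return baseline + overflow_hours * burst_rate


def best_reservation(demand, reserved_rate, burst_rate):
    """Brute-force search for the cheapest number of reserved instances."""
    if not demand:
        raise ValueError("demand history must not be empty")
    candidates = range(0, max(demand) + 1)
    return min(
        candidates,
        key=lambda r: total_cost(demand, r, reserved_rate, burst_rate),
    )


# Hypothetical history: steady load of 2 instances with a short spike to 10.
demand = [2] * 10 + [10] * 2
reserved = best_reservation(demand, reserved_rate=1.0, burst_rate=2.0)
```

With this made-up history, reserving for the steady load and paying the higher rate for the short spike is cheaper than reserving for the peak. The more time demand spends near its peak, the higher the best reservation level climbs.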
Evaluate business impact; not all systems are created equal
Limited resources are a reality. Detailed visibility into every single system is often not feasible and could be too much information.
Simple and effective. Sometimes a simple display can convey all of the information necessary to manage critical systems. In cars it is important to know your speed, how much fuel is in the tank, and whether or not you need to change your oil/check your engine.
Where to begin?! Specialized information is sometimes necessary, but it can be difficult to navigate.
Take advantage of a business impact analysis to define and understand your critical services
Ideally, downtime would be minimal. In reality, though, downtime is a part of IT life. It is important to have realistic expectations about its nature and likelihood.
Step | Task | Description |
---|---|---|
STEP 1 | Record applications and dependencies | Utilize your asset management records and document the applications and systems that IT is responsible for managing and recovering during a disaster. |
STEP 2 | Define impact scoring scale | Ensure an objective analysis of application criticality by establishing a business impact scale that applies to all applications. |
STEP 3 | Estimate impact of downtime | Leverage the scoring criteria from the previous step and establish an estimated impact of downtime for each application. |
STEP 4 | Identify desired RTO and RPO | Define what the RTOs/RPOs should be based on the impact of a business interruption and the tolerance for downtime and data loss. |
STEP 5 | Determine current RTO/RPO | Conduct tabletop planning and create a flowchart of your current capabilities. Compare your current state to the desired state from the previous step. |
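The scoring steps of a business impact analysis can be sketched as a weighted score per service. Everything specific here is an assumption made for illustration: the criteria, the weights, the 0 to 4 scale, the tier thresholds, and the "silver" and "bronze" labels are hypothetical. Only the idea of a common impact scale and a set of gold systems comes from the material above.

```python
from dataclasses import dataclass, field

# Hypothetical impact criteria and weights (must sum to 1.0).
WEIGHTS = {"revenue": 0.5, "reputation": 0.3, "compliance": 0.2}


@dataclass
class Service:
    name: str
    # Impact of downtime per criterion, scored 0 (none) to 4 (severe).
    scores: dict = field(default_factory=dict)


def impact_score(service: Service) -> float:
    """Weighted impact of downtime on the shared 0-4 scale."""
    return sum(w * service.scores.get(c, 0) for c, w in WEIGHTS.items())


def criticality_tier(score: float) -> str:
    """Map a score to a tier; the thresholds are illustrative only."""
    if score >= 3.0:
        return "gold"
    if score >= 1.5:
        return "silver"
    return "bronze"


portal = Service("Provider portal", {"revenue": 4, "reputation": 4, "compliance": 3})
wiki = Service("Internal wiki", {"revenue": 1, "reputation": 0, "compliance": 1})
tiers = {s.name: criticality_tier(impact_score(s)) for s in (portal, wiki)}
```

Applying one scale to every service keeps the analysis objective and makes it easier to have the conversation about which systems truly warrant gold treatment.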
Info-Tech Insight
According to end users, every system is critical and downtime is intolerable. Of course, once they see how much totally eliminating downtime can cost, they might change their tune. It is important to have this discussion to separate the critical from the less critical – but still important – services.
Establish visibility into critical systems
You may have seen “If you can’t measure it, you can’t manage it” or a variation thereof floating around the internet. This adage is consumable and makes sense…doesn’t it?
"It is wrong to suppose that if you can’t measure it, you can’t manage it – a costly myth."
– W. Edwards Deming, statistician and management consultant, author of The New Economics
While it is true that total monitoring is not absolutely necessary for management, when it comes to availability and capacity – objectively quantifiable service characteristics – a monitoring strategy is unavoidable. Capturing fluctuations in demand, and adjusting for those fluctuations, is among the most important functions of a capacity manager, even if hovering over employees with a stopwatch is poor management.
Solicit needs from line of business managers
Unless you head the world’s most involved IT department (kudos if you do), you’re going to have to determine your needs from the business.
Do | Do not |
---|---|
✓ Develop a positive relationship with business leaders responsible for making decisions. | X Be reactive. |
✓ Make yourself aware of ongoing and upcoming projects. | X Accept capacity/availability demands uncritically. |
✓ Develop expertise in organization-specific technology. | X Ask line of business managers for specific computing requirements unless they have the technical expertise to make informed judgments. |
✓ Make the business aware of your expenses through chargebacks or showbacks. | X Treat IT as an opaque entity where requests go in and services come out (this can lead to irresponsible requests). |
✓ Use your understanding of business projects to predict business needs; do not rely on business leaders’ technical requests alone. | |
Demand: manage or be managed
You might think you can get away with uncritically accepting your users’ demands, but this is not best practice. If you provide it, they will use it.
The company meeting
“I don’t need this much RAM,” the application developer said, implausibly. Titters wafted above the assembled crowd as her IT colleagues muttered their surprise. Heads shook, eyes widened. In fact, as she sat pondering her utterance, the developer wasn’t so sure she believed it herself. Noticing her consternation, the infrastructure manager cut in and offered the RAM anyway, forestalling the inevitable crisis that occurs when seismic internal shifts rock fragile self-conceptions. Until next time, he thought.
"Work expands so as to fill the resources available for its completion…"
– C. Northcote Parkinson, quoted in Klimek et al.
Combine historical data with the needs you’ve solicited to holistically project your future needs
Predicting the future is difficult, but when it comes to capacity management, foresight is necessary.
Critical inputs
In order to project your future needs, the following inputs are necessary.
- Usage trends: While it is true that past performance is no indication of future demand, trends are still a good way to validate requests from the business.
- Line of business requests: An understanding of the projects the business has in the pipeline is important for projecting future demand.
- Institutional knowledge: Read between the lines. As experts on information technology, the IT department is well-equipped to translate needs into requirements.
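As a rough illustration of how these inputs might be combined, the sketch below fits a straight-line trend to historical usage, adds demand from known business projects, and applies a headroom factor. It is a sketch under stated assumptions, not a forecasting method prescribed by this research: the linear model, the function names, and the `planned_additions` and `headroom` parameters are all hypothetical, and real usage is often seasonal or nonlinear.

```python
def linear_trend(history):
    """Least-squares slope and intercept for usage sampled at equal intervals."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_demand(history, periods_ahead, planned_additions=0.0, headroom=0.0):
    """Trend-based projection plus capacity for known business projects.

    planned_additions: extra demand from line of business requests,
    translated into the same unit as the history (institutional knowledge).
    headroom: fractional safety margin, e.g. 0.1 for 10%.
    """
    slope, intercept = linear_trend(history)
    future_x = len(history) - 1 + periods_ahead
    trend = intercept + slope * future_x
    return (trend + planned_additions) * (1 + headroom)


# Hypothetical quarterly storage usage in TB, projected three quarters ahead.
usage_tb = [10, 12, 14, 16]
needed_tb = project_demand(usage_tb, periods_ahead=3, planned_additions=5, headroom=0.1)
```

Keeping the trend and the business requests as separate inputs makes it easier to check one against the other, which is the point of using trends to validate what the business asks for.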