Photo: “Turning disc” by Toshiyuki IMAI.
With most enterprise arrays now providing the ability to allocate more storage than is physically installed, and with data moving transparently between low-cost and high-performance tiers with no human intervention – what could possibly go wrong?
For a long time now, SAN infrastructure has been at the core of the Data Centre, satisfying a need to hoard as much data as can possibly be collected. With required capacity growing at an exponential rate, it has become common to predict massive data growth when designing services, often calculating growth for two or more years ahead. This leads to large amounts of storage being purchased in anticipation of things to come. By the time two years have passed, some of the attached systems will have become legacy as they are superseded by shiny new solutions, and some will simply not have grown as expected. Even the systems that have behaved as designed are probably using no more than 70% of their storage, as likely as not because of a “fudge factor” arbitrarily thrown into the mix by the architect.
Virtualised storage arrays (each in its own way, it would seem) allow us, within reason, to allocate as many devices of whatever size we like to as many hosts as we see fit, with only a small pool of physical disk. This lets us satisfy our storage-hungry analysts whilst buying the disk in a phased manner, delaying spend until it is necessary. Add in the falling cost per Gigabyte over time and the benefits mount up.
At the same time, array architecture has developed in such a way that it is harder to accommodate small amounts of physical growth. With larger RAID groups becoming more common and larger disk sizes compounding the issue, traditional SAN allocations have become inflexible and, for a small requirement, potentially expensive. The smallest building block is a RAID group, so a one-Terabyte requirement on a traditional storage array might require the business to grow the array by thirty Terabytes or more. With virtualisation comes the ability to create more devices for allocation without having matching physical storage installed. There is no longer a need to grow a system until the written data approaches thresholds, and the larger building blocks are far more palatable when they are shared between all attached systems.
As with everything in life, there is always a flip side. The balance in this case is increased risk: what if the physical pool fills up? At the very least, writes will have to stop, but in some cases I/O stops altogether; neither scenario is one that a business will want to contemplate. This is why careful planning and monitoring are essential.
In a simple virtualised environment, physical storage is configured into storage pools and a number of virtual devices are created and attached to these pools. As data is written to a logical device, it is spread across the physical disks that make up the storage pool.
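As an illustration, here is a minimal sketch in Python of how the numbers in such a pool relate; all figures are made up:

# Minimal sketch of a thin-provisioned pool (illustrative figures only).
installed_tb = 100.0                       # physical disk in the pool
virtual_allocations_tb = [40, 60, 30, 50]  # thin devices presented to hosts
written_tb = 65.0                          # data actually written so far

subscription = sum(virtual_allocations_tb) / installed_tb  # 1.8 = 180% subscribed
utilisation = written_tb / installed_tb                    # 0.65 = 65% of physical used
print(f"Subscription: {subscription:.0%}, physical utilisation: {utilisation:.0%}")

# The pool is only in trouble when written data approaches the installed
# capacity, regardless of how much virtual capacity has been promised.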
Considerations when designing virtual solutions:
- The number of servers that will be connected – to provision sufficient connectivity and I/O capability. Detailed I/O profiling for the attached systems is often not available, but where it exists it will also be useful.
- The average server allocation size and utilisation – to calculate the space required for allocations and physical writes.
- The maximum allocation size – a large allocation could give a single server the ability to fill all available pool space (illustrated in the sketch after this list).
- Maturity of the service – mature systems will require larger initial space with slower growth; new systems may start small and grow into the allocated space over an extended period.
- Performance – is there sufficient throughput for the attached servers?
- Criticality – there is a risk associated with over-provisioning. There are ways to mitigate it, but some systems will be too critical to the business to accept any additional risk.
- Mixed Production, Test and Development environments – will the different environments share the same pools, or should they be ring-fenced?
- Alerting – what level of alerting is configured? At what level of utilisation do warnings start? Can unexpected write bursts from a single application be identified and highlighted?
- Capacity for growth – can the pool be quickly expanded?
- Time to deploy – how quickly can the vendor react to increase the pool size?
- Plan B – can space be redeployed to production systems?
- Cost reduction per GB – flexibility may be cited as a benefit, but the main driver for virtualised storage is to drive the cost per GB down; for the most part this is achieved by deferred purchase and tiered storage.
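To make the single-server and alerting concerns above concrete, here is a hedged sketch of the kind of check that might run against each pool; the thresholds and figures are assumptions, not recommendations:

# Sketch: flag pools where one host could exhaust the remaining physical
# space, and raise utilisation warnings. Thresholds are assumed values.
WARN_AT = 0.70      # start warning at 70% physical utilisation
CRITICAL_AT = 0.85  # escalate at 85%

def pool_risks(installed_tb, written_tb, host_allocations_tb):
    free_tb = installed_tb - written_tb
    risks = []
    # Conservative test: a host whose full allocation exceeds the free
    # physical space could, on its own, fill the pool.
    for host, alloc in host_allocations_tb.items():
        if alloc > free_tb:
            risks.append(f"{host}: {alloc} TB allocation exceeds {free_tb:.1f} TB free")
    utilisation = written_tb / installed_tb
    if utilisation >= CRITICAL_AT:
        risks.append(f"CRITICAL: pool at {utilisation:.0%} of physical capacity")
    elif utilisation >= WARN_AT:
        risks.append(f"WARNING: pool at {utilisation:.0%} of physical capacity")
    return risks

for msg in pool_risks(100.0, 72.0, {"db01": 40.0, "app02": 20.0, "dev03": 35.0}):
    print(msg)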
With thin provisioning, monitoring is essential to maintaining a low-risk environment.
As a minimum, capacities must be monitored on a regular basis, with the frequency depending on the activity in the environment, the attitude to risk, and the subscription and utilisation figures. In most large environments daily monitoring should be sufficient, or even weekly.
We have found the following capacity metrics to be the most useful; we collect them each weekday so that we can produce trend analysis to support future forecasts.
For each pool on each array we collect:
- Installed storage
- Virtual TB
- Nett subscription
Experience shows that the larger environments have very linear growth in both allocated and written storage, and the trending figures provide quite accurate forecast estimates.
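Because the growth is so close to linear, even a simple least-squares fit over the daily samples gives a usable forecast of when a pool will fill. A minimal sketch, assuming one written-capacity figure is collected per day:

# Sketch: forecast the day a pool fills by fitting a straight line to
# daily written-capacity samples. Figures are illustrative.
import numpy as np

installed_tb = 100.0
written_tb = np.array([61.2, 61.5, 61.9, 62.1, 62.6, 62.8, 63.3])  # recent samples
days = np.arange(len(written_tb))

slope, intercept = np.polyfit(days, written_tb, 1)  # TB written per day
if slope > 0:
    days_until_full = (installed_tb - written_tb[-1]) / slope
    print(f"Growing at {slope:.2f} TB/day; pool full in roughly {days_until_full:.0f} days")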
Performance figures should also be reviewed, but a less regular basis is sufficient – weekly or even monthly.
With virtualisation improving overall utilisation and ever larger physical disks being installed, the I/O profile per disk is also increasing. These performance trends should be monitored and reviewed to anticipate thresholds being reached.
As a result of the increased utilisation and larger physical disks, we are also seeing both the number of host connections per front-end port (the fan-out ratio) and the amount of storage allocated to each front-end port increasing. The host ports should be monitored in the same way as the disks to anticipate performance thresholds being reached.
There is no point increasing the size of the pools to accommodate further allocations if the path to the data is already fully loaded.
This performance monitoring is specific to capacity planning and does not replace the normal daily performance monitoring and alerting on the storage arrays.
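A sketch of the kind of threshold check this implies; the limits are assumptions and will vary with the array, disk types and workload in use:

# Sketch: compare per-disk IOPS and front-end fan-out against assumed
# ceilings so that rising trends are caught before limits are hit.
MAX_DISK_IOPS = 180  # assumed ceiling for one spindle
MAX_FAN_OUT = 40     # assumed host connections per front-end port
REVIEW_AT = 0.8      # review anything beyond 80% of a ceiling

def check_disks(disk_iops):
    for disk, iops in disk_iops.items():
        if iops > REVIEW_AT * MAX_DISK_IOPS:
            print(f"{disk}: {iops} IOPS is approaching the assumed limit of {MAX_DISK_IOPS}")

def check_ports(port_fan_out):
    for port, hosts in port_fan_out.items():
        if hosts > REVIEW_AT * MAX_FAN_OUT:
            print(f"{port}: fan-out of {hosts} is approaching the assumed limit of {MAX_FAN_OUT}")

check_disks({"disk_0a": 150, "disk_0b": 90})
check_ports({"fa_1a": 36, "fa_2b": 12})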
What can be done to prepare for emergencies?
Most virtualised storage solutions have the functionality to preallocate storage. For systems that cannot afford any outage, the storage can be fully preallocated in advance, so that their devices are not exposed to the pool filling up.
Understand the process with the vendors for increasing the installed physical capacity:
- How long does the vendor take to install new storage, from the initial request?
- Can elements (such as procurement) be prepared in advance of the need, or handled post-deployment, to avoid delaying the process?
- Can hardware be left on site to reduce hardware deployment timescales?
- Are sufficient resources available to accept additional disk?
  - Free drive bays
  - Floor space
Understand the environment:
- What systems are attached that could be sacrificed to maintain critical services?
- Are there device replicas in the pool for non-production purposes that could be released?
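The arithmetic behind both questions can be prepared ahead of time. A small sketch, with hypothetical device names, tiers and sizes:

# Sketch: record which attached devices are sacrificial so that the space
# reclaimable for critical services in an emergency is already known.
devices = [
    {"name": "prod_db01",   "tb": 20.0, "tier": "production"},
    {"name": "test_db01",   "tb": 20.0, "tier": "test"},
    {"name": "dev_clone02", "tb": 10.0, "tier": "development"},
]

sacrificial = [d for d in devices if d["tier"] != "production"]
reclaimable_tb = sum(d["tb"] for d in sacrificial)
print(f"Reclaimable in an emergency: {reclaimable_tb} TB "
      f"from {[d['name'] for d in sacrificial]}")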
These observations are based on simple virtualised pools. Most hold true for tiered storage environments too, but tiering brings its own set of concerns and even more metrics to follow.