On Wednesday last week I attended this cloud computing event in London – it was a very long but interesting day. A few key observations for me:

  • The focus was mainly on SaaS with some discussion of PaaS – much less discussion of IaaS which surprised me, but I guess was partially due to the business-focus of the event. None of the speakers or panel contributors were from Amazon, Rackspace etc for example.
  • Consulting companies were also pretty invisible (except Smart421 of course!) – and as one of the panel discussions was about consultancy in a cloud computing era, I was surprised that the panel was made up of vendors rather than consulting companies. Having said that, the event sponsors were primarily SaaS providers (Salesforce.com, NetSuite), so this probably skewed things somewhat.
  • The general view on the classic “data security” challenge from the vendors was that the concern has been addressed, but that convincing customers of this still had some way to go.
  • A key recurring theme was that the success or otherwise of cloud-based implementations was largely determined by the effectiveness of the user community engagement/change management process – and that this was true before cloud computing and just as true now. My summary – regardless of the technology du jour, get the basics right.
  • Disappointingly, there was no mention of Enterprise Architecture (EA) at all during the day – and yet really many of the sessions were about how cloud computing brings new architectural choices to the CIO/CTO’s table, and so for me this is a classic EA opportunity. This just shows how much mind-share the discipline of EA has in the wider IT and business communities – not very much I fear.

The day started with a classic demonstration of poor elasticity and scalability – when the registration process crashed and burned under the pressure of the number of delegates, as you can see from the photo. I gave up on my free bag of goodies. This meant that things started a bit behind schedule, but the organisers recovered the timing well and got things back on track during the morning sessions.

From a PaaS perspective, the only platform that really got any airtime was Force.com and from this event you’d think it was the only credible game in town.

There was a really interesting presentation on the G-cloud from Martin Bellamy (from the CIO of the Cabinet Office) – the ambition of the G-cloud project is quite breathtaking in its scale, but the fact that data centre consolidation in the public sector is so rare was pretty scandalous to me (as a taxpayer!). Martin explained that current public sector data centre utilisation was around 10% and gave a good justification of why this was so low (e.g. they target a 60% max utilisation for “head room”, plus the 10% figure includes DR/dev/test environments etc) – so the business case for just data centre consolidation using a cloud architecture is very compelling. One thing that encouraged me was that he was proposing incentives to the various departments for reuse, e.g. if you add a reusable asset to the base set of assets (which he referred to as an “app store”) then the department receives some kind of discount. I must admit to wondering how well a centrally funded project can deliver the G-cloud, however…given the breadth of the undertaking.

I’m going to be attending the Business Cloud Summit tomorrow at the Lancaster London Hotel – at the moment you could probably spend your entire life flitting between cloud-related events like this, but I picked this one as it looked like it had good sponsorship (and therefore hopefully good attendance), good speakers and also as it had a business emphasis. I wanted to avoid another “this is how you do it on Amazon Web Services” event as these seem to be ten-a-penny and also I can Google that, can’t I…

I’ll report back how it goes via this blog…

I’ve been hearing more about web traffic management products recently, and they’ve been of increasing interest to me as they seem to fit nicely with a cloud deployment model – e.g. if you’ve got your SaaS app deployed to your favourite cloud provider so that you can scale out quickly without large upfront capital costs, then you are probably going to need something in front of that deployment architecture to manage the incoming traffic. Hopefully (and this will become more common) your cloud deployment might auto-scale to increase its own capacity on demand, and so your environment will always be able to cope with demand, so you don’t need to manage the incoming traffic – right? Well, I’m not so sure – I can see some barriers to this:

  1. Cloud auto-scaling implies unpredictable costs, and whilst maybe extra traffic = extra cost = extra revenue to offset it, I’d expect most SaaS providers would want to set a cap on costs and so any traffic exceeding that ‘cap’ still needs to be managed gracefully.
  2. Traffic management allows finer grained control over where you deploy your valuable £/$, e.g. favouring response times for customers on the ‘buy’ pages of your app over those looking at ‘about you’. So in times of high traffic you don’t just keep throwing virtual servers (and therefore cash) at the problem – you get more selective instead.
  3. Finally, web traffic management products give some comfort as a ‘last resort’ in case auto-scaling etc goes wrong or is incorrectly set up etc – or your cloud vendor doesn’t meet their SLA :)
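
To make the first two points a bit more concrete, here is a minimal sketch of a cost-capped, priority-aware admission policy. It is purely illustrative and is not how any particular product works: the page classes, the per-server price and the hourly cost cap are all made-up assumptions.

```python
from dataclasses import dataclass

# Hypothetical page classes: a lower number means more valuable traffic.
PAGE_PRIORITY = {"buy": 0, "browse": 1, "about": 2}


@dataclass
class Capacity:
    servers: int                 # virtual servers currently running
    requests_per_server: int     # requests one server can handle per interval
    cost_per_server_hour: float  # assumed price per server per hour
    cost_cap_per_hour: float     # the most you are prepared to spend per hour

    def max_servers(self) -> int:
        """Largest server count that stays within the cost cap."""
        return int(self.cost_cap_per_hour // self.cost_per_server_hour)


def admit(requests: list[str], cap: Capacity) -> tuple[list[str], list[str], int]:
    """Return (served, shed, servers) for one interval of traffic.

    Scale out until the cost cap is hit, then serve the highest-priority
    pages first and shed (queue, or show a holding page for) the rest.
    """
    wanted = -(-len(requests) // cap.requests_per_server)  # ceiling division
    servers = min(max(wanted, cap.servers), cap.max_servers())
    budget = servers * cap.requests_per_server
    # sorted() is stable, so arrival order is kept within each page class.
    ranked = sorted(requests, key=lambda page: PAGE_PRIORITY.get(page, 99))
    return ranked[:budget], ranked[budget:], servers
```

With a cap that only allows two servers, six incoming requests get trimmed to the four most valuable ones rather than triggering a third server.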

We’ve come across the following products in this space:

  • Zeus ZXTM software appliance – I intend to delve into Zeus ZXTM in a little more detail in another blog post at some point…as it is a software appliance (as opposed to the next two products) it fits a public cloud model better.
  • F5 BIG-IP
  • IBM WebSphere DataPower – this is not in exactly the same market as the other two products, but is capable of fulfilling this role (plus other things) and we have worked with customers to deploy it for this kind of ‘use case’

Maybe one day auto-scaling will be mainstream enough and incremental costs will be low enough that traffic management won’t be required – but we’re not there yet I suspect.

Do the service level agreements (SLAs) offered by public cloud providers hold water? Or are they useless to a customer and not worth the cyberspace they are written on? We decided to pick the cloud fraternity poster child – Amazon EC2 – review their current SLA offering (which is defined here), and then compare it with ‘traditional’ hosting vendors’ offerings.

First let’s review the Amazon EC2 SLA. We’re not offering a legal perspective here, but several things strike us as interesting about this:

  • The SLA does provide for some punitive damages in the sense that should the service availability be between 90% and 99.95%, Amazon would still have to pay out 10% of your payments to them.
  • However, if EC2 was only available 60% of the time, their liability would still be limited to that same 10%.
  • In no way are the damages that Amazon pay out linked to your potential losses. If Amazon had a major outage that could cost you millions, they might only pay out a few thousand in compensation.
  • The definition of ‘unavailable’ that they use here could potentially allow you to claim service credits even when your application is functioning perfectly well, if we’re reading the SLA correctly, because ‘region unavailable’ appears to mean that one (not all) of the availability zones within a region is down. For example, if there are two availability zones within your region, your application could be running in both of these zones, such that it fails over between them. You’d get credit if a zone went down even in the event of the application failing over to the backup zone – even though your app is still ‘up’.
  • Availability is measured in 5 minute blocks, which implies that they exclude all 4 minute downtimes. 4 minutes is a long time for a critical system to be down.
  • The other angle on this is that the advantage of using EC2 is that it gives you much easier ways to recover from any outage that does occur (provided it doesn’t bring down the whole region, when things get a little trickier).
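
A quick sketch of the arithmetic behind these points, based only on our reading of the SLA as described above (a 99.95% threshold, a flat 10% credit, 5-minute measurement blocks). The function names are our own, and the block model assumes every outage starts on a block boundary, which is a simplification.

```python
ANNUAL_TARGET = 0.9995  # availability threshold quoted above
CREDIT_RATE = 0.10      # flat 10% credit, however bad the outage
BLOCK_MINUTES = 5       # availability is measured in 5-minute blocks


def service_credit(availability: float, fees_paid: float) -> float:
    """Credit owed under a flat-rate SLA: all or nothing at the threshold."""
    return fees_paid * CREDIT_RATE if availability < ANNUAL_TARGET else 0.0


def counted_downtime(outages_minutes: list[int]) -> int:
    """Minutes of downtime that register if only whole blocks count.

    Simplifying assumption: each outage starts on a block boundary, so an
    outage shorter than one block is not counted at all.
    """
    return sum((m // BLOCK_MINUTES) * BLOCK_MINUTES for m in outages_minutes)


def allowed_downtime_hours(target: float = ANNUAL_TARGET) -> float:
    """Downtime per 365-day year that still meets the target."""
    return (1 - target) * 365 * 24
```

On these assumptions a 99.95% annual target allows roughly 4.4 hours of downtime a year, three separate 4-minute outages count as zero, and the credit is the same whether availability was 99.9% or 60%.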

Having said all of that, how different is this arrangement from what traditional hosting providers offer?

The view from our Head of Service Management is that it looks fairly standard – the hosting providers we work with have similar offerings. One of them offers up to 50% off the monthly fee for poor availability which sounds great but when you read the small print it’s measured in quarterly periods so they can get away with a bad month so long as they have two good months in the same quarter. Amazon’s small print doesn’t look too pleasant from an admin point of view as you must claim within 30 days of the last outage, and have to provide evidence of the outage etc.
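
The quarterly small print is easy to illustrate with some made-up numbers. This is just a sketch: the availability figures and the 99.5% target are invented for the example, and it assumes three equal-length months so that the quarterly figure is a plain average.

```python
def breaches(monthly_availability: list[float], target: float) -> dict[str, bool]:
    """Compare month-by-month measurement with a single quarterly average.

    Assumes equal-length months, so the quarterly figure is a plain mean.
    """
    quarterly = sum(monthly_availability) / len(monthly_availability)
    return {
        "any_month_breached": any(m < target for m in monthly_availability),
        "quarter_breached": quarterly < target,
    }


# One bad month (99.0%) hidden by two good ones against a 99.5% target.
result = breaches([99.99, 99.0, 99.99], target=99.5)
```

Here the middle month misses the target on its own, but the quarter as a whole averages about 99.66% and so no credit would be due.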

Service credits are things that procurement and legal departments like, but IT departments can’t be bothered with them, as they cause too much hassle to claim back a small sum that doesn’t really hurt the supplier and doesn’t compensate the customer for the real loss. We have come across some companies who offer a service whereby they only take 100% of the service fee if they meet service levels, i.e. you don’t have to prove it or claim it, which is obviously more attractive.

Our conclusion is that Amazon have paid lip service to service credits, and have probably done enough to satisfy procurement/legal requirements but nothing to shout about.

Thanks are due to my colleagues Paul Russell and Andy Budd for their input on this material…


Following on from the previous post, here are some more random notes about CloudBurst – pros, cons, features, limitations etc.

One thing that occurred to me was “Why is CloudBurst a hardware appliance?” – it could just be a software appliance. Well, the key reason, it seems to me, is that it holds lots of OS/WAS admin passwords etc, and so the ‘hardening’ of a hardware appliance with tamper-resistance etc is a key feature.

Deployment patterns and deploying

WAS hypervisor edition is actually an OVA image with 4 virtual disks, with multiple WAS profiles set up already – so the hypervisor edition is actually a VM image, rather than a customised WAS code base it seems.

There are patterns for a single server deployment, small cluster (3 VMs), large cluster (15 VMs) etc. You can modify a standard pre-packaged VM (e.g. add a fixpack etc) and then ‘capture’ back into CloudBurst as a standard catalogue VM for use in new patterns.

Control is available over whether certain pattern properties (passwords, memory size etc) can be overridden for each instance of that pattern or not.

A key point – keep track of any changes made to your VM patterns (e.g. any tuning done) and then ‘bake’ them into the pattern in CloudBurst so that any future deployments get the changes – otherwise they’ll be lost when you redeploy the pattern.

The first image transfer to each hypervisor can take up to 1 hour (obviously this is environment dependent)!

IP addresses are allocated on the fly when deploying, i.e. it pulls them from a pool of available IP addresses that the admin user sets up.
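
Conceptually this is just a pool that hands out addresses and takes them back. The sketch below is my own illustration of that idea, not CloudBurst’s implementation; the class name and the address range are hypothetical.

```python
import ipaddress


class IpPool:
    """A fixed pool of addresses handed out one at a time at deployment."""

    def __init__(self, first: str, last: str) -> None:
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        self._free = [ipaddress.IPv4Address(n) for n in range(start, end + 1)]
        self._in_use: dict[str, ipaddress.IPv4Address] = {}

    def allocate(self, vm_name: str) -> str:
        """Take the next free address for a VM, or fail if none are left."""
        if not self._free:
            raise RuntimeError("IP pool exhausted")
        address = self._free.pop(0)
        self._in_use[vm_name] = address
        return str(address)

    def release(self, vm_name: str) -> None:
        """Return a VM's address to the pool when it is torn down."""
        self._free.append(self._in_use.pop(vm_name))
```

The practical consequence is that the pool needs to be at least as big as the largest pattern you intend to deploy.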

Script packages

Post deployment, CloudBurst can run any script, not just wsadmin scripts – essentially it ssh’s over to the VM and uses a zip file and an executable name (e.g. <some_path>/wsadmin.sh) with some arguments (e.g. what JACL file to run). ‘wsadmin’ scripts can be used against the deployment manager VM to install an application (EAR file) into the cloud cluster. Some “wsadmin” scripts are provided out of the box for common tasks – setting up global security etc.
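
As a rough mental model, a script package boils down to three things: an archive, an executable and some arguments. The sketch below models just that. The class, the file names and the path are hypothetical placeholders of mine; only the `-lang` and `-f` options are real wsadmin options.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ScriptPackage:
    """Illustrative model of a script package: archive, executable, arguments."""

    archive: str
    executable: str
    arguments: list[str] = field(default_factory=list)

    def command_line(self) -> str:
        """The shell command that would be run on the VM after unpacking."""
        parts = [self.executable, *self.arguments]
        return " ".join(shlex.quote(part) for part in parts)


install_app = ScriptPackage(
    archive="install_app.zip",            # hypothetical archive name
    executable="/path/to/wsadmin.sh",     # placeholder path, not a real default
    arguments=["-lang", "jacl", "-f", "install_app.jacl"],  # hypothetical script
)
```

Calling `install_app.command_line()` gives the string that would be executed over ssh on the target VM in this model.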

Management facilities
CloudBurst provides a centralised access point to the consoles for each VM/WAS instance.

You can control separate permissions for each user (with LDAP integration), so you can have multiple users using a single CloudBurst box at one time, creating their own ‘private’ patterns etc.

You can use it to control the hypervisors to create snapshots of all the VMs in a particular deployment, so for example you can run some tests and then quickly recover the entire virtual deployment (i.e. all the VMs).

License management/metering etc – seems a pretty limited offering, it relies on admin REST APIs exposed by CloudBurst that are called by something like Tivoli etc.

CloudBurst admin console interface seems v..e..r..y.. slow to respond sometimes.

We’ve got a bit more under the skin of CloudBurst now and I wanted to post some info that’s not been written by anyone in Marketing…about the realities of the product (good or bad) rather than the salesman’s spin.

So what does it do? Well, in a nutshell it holds VMware ESX and ESXi VM images of WebSphere Application Server (WAS) on disk and can install them to some IP addresses of your choice at your command, and follows them up with some automated WAS admin scripting of your choosing. Some pre-created WAS Hypervisor Edition VM images exist (based upon Novell SLES 10 SP2) or you can create your own and package them up using OVF (Open Virtualisation Format). There’s no runtime component to the product other than its VM management/deployment role, i.e. it relies on WAS XD if you want load balancing etc. There’s more to it than that but that’s the bare bones of it.

So what are the key use cases for CloudBurst – why would someone want one when they can install VM images themselves? Well, the key reason for use is to take the deployment cost out of an IT organisation. The creation of the OVF VM images is still going to be just as easy/traumatic as it was before, but once you’ve got a “pre-canned” environment setup in CloudBurst then you can roll that out repeatedly and with confidence with very little manpower required.

Who would use it? Well, if you get benefit from being able to ‘can a topology’ rather than just putting a single machine image ‘in a can’ then there could be real cost savings and agility/reproducibility benefits from being able to roll out a defined topology to your private cloud very quickly and repeatedly. So if your organisation has many projects running throughout the year that need multiple dev, test, pre-prod, prod etc environments created and ripped down all the time, then you’d very quickly get a payback I suspect. It would also make you more likely to kill off an unused environment if you knew you could painlessly recreate it, reducing your overall capacity needs.

The immaturity of the Open Virtualisation Format (OVF v1.0 was only released in March 2009) is a key constraint at the moment and this is an industry-wide issue – it’s early days. A key impact relating to CloudBurst is that each VM image is a bit of a beast at 20GB minimum (not entirely clear why this is – maybe due to a WAS install being big anyway, due to the way virtual disks are managed in the OVF standard, and the images being uncompressed?). This directly impacts deployment times just due to the sheer volume of data to be shunted around, but it’s not immediately clear to me if this is an OVF issue (it does have some compression support) or an issue with the implementation/use of the standard. If deployed more than once to the same hypervisor then deployment times can be accelerated as all this data doesn’t need to be provided a second time. It can take something like 7-15 minutes to deploy a VM image.
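
To get a feel for why image size matters, here is a back-of-the-envelope transfer time calculation. It is only a sketch: it takes the image size figure above as 20 gigabytes, the `efficiency` factor is my own fudge for protocol overhead, and it ignores disk write speed and anything else the hypervisor does on import.

```python
def transfer_minutes(image_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Rough time to copy an image across a network link, in minutes.

    image_gb   : image size in gigabytes (10**9 bytes)
    link_mbps  : nominal link speed in megabits per second
    efficiency : assumed fraction of the nominal speed actually achieved
    """
    bits = image_gb * 8 * 1000**3
    seconds = bits / (link_mbps * 1000**2 * efficiency)
    return seconds / 60
```

At full line rate a 20GB image needs about 2.7 minutes on a 1 Gbit/s link and about 27 minutes on a 100 Mbit/s link, so on a shared or slower network the first transfer taking the best part of an hour is not that surprising.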

There are two key design approaches when creating your VM images (the best approach is probably a mixture of the two):

  • use a ‘stock’ WAS image and then customise it with your config/settings and your EAR files installed etc, and create a new OVF image for deployment
  • use a ‘stock’ WAS image and then do the customisation post-deployment using WAS admin scripting

So where’s it going from here…? Well support for Xen-based VM images must be likely as this is crucial for anyone who is going to ultimately deploy into Amazon EC2. Portal Server is already available on EC2 on a pay-as-you-go basis and WAS is coming. Also, it’ll be interesting to see if IBM support non-WAS images in later versions of the product.

One trend that we’ve noticed is that organisations are typically very poor at organising themselves to create good cross-charging schemes for the supply of internal IT infrastructure or IT shared services, and often only have a very coarse-grained view of what the provision of these services really costs. At an individual change project level, this makes it impossible to make educated judgements about the likely ‘run’ cost of a solution, and so the architectural trade-offs that have to be made must be sub-optimal.

An explanation I’ve heard in the past for this is that the internal accountants just don’t “get it”, but that’s never rung quite true with me. I read an interesting suggestion on the cloud computing Google group for another reason – that if internal IT managers did accurately define the costs of internal shared IT service provision then they would be opening themselves up for direct comparison with external providers of equivalent services, so it’s a defensive mechanism basically. They don’t want a CxO coming to them saying “OK – you charge me £x per service call, but I can get them for £y from [substitute your favourite cloud/SaaS etc offering here]”…

This must be one of the most over-blogged topics in the known universe – but this subject keeps coming up in discussions both internally within Smart421 and externally, so I thought I’d post a summary for reference purposes…

Pros

  • Agility, e.g. don’t have to wait for your infrastructure team to try something out on a dev server, or get a trial of a SaaS app up and running etc
  • Elastic – can have 20 servers up and running inside an hour and then turn them off 2 hours later after a performance run
  • Scale – scale as your business/demand grows without having to plan in detail
  • Linear cost – Pay for what you use, no upfront investment at all – can even pay by the hour for WebSphere Portal server now on Amazon (WAS coming soon)
  • A way of having a DR/business continuity strategy cost effectively
  • Flexible – want to host in the US? Then change your mind and want to host in Europe? Or both for resilience? Fine with Amazon for example…
  • Access – have access from more locations without putting additional infrastructure in place
  • Control – control your entire deployment from a single browser console (e.g. with Amazon)
  • Greener – maybe…fewer bigger and more efficient data centres rather than everyone having their own
  • Stick to the knitting – do what you do (insurance etc) not running data centres

Cons

  • Loss of control – security etc is in your service provider’s hands; do you really know they are backing up your data OK? Data separation in a multi-tenancy architecture? SLAs defined? Is the availability really good enough etc? Can you export your data for backup and/or reporting purposes?
  • Regulatory – e.g. the Data Protection Act requires that data is secure and “not transferred to other countries without adequate protection”
  • Maybe more costly if you know your demand (i.e. if elasticity is not a big requirement for you – as you potentially pay a premium for it – someone has to build the data centre…)
  • Limited choice of environments – cannot have a non-standard deployment so easily, best to stick to LAMP etc
  • Very hard/impossible to migrate from one cloud provider to another I suspect at the mo – so a new form of vendor lock-in basically
  • Real risk of business users buying their own SaaS apps without any governance – and creating another generation of silo’d business apps (just this time hosted outside rather than inside!)
  • Performance/latency – not as quick if your components are hosted “not on your doorstep”
  • Extra bandwidth costs incurred
  • Is your cloud provider’s business stable – will they be there in 2 years’ time?

With all the excitement surrounding cloud computing, I’ve been chewing over what the realities are in an enterprise of deploying into a private cloud environment. Whilst there are obviously great cloud ‘use cases’ like the ability to scale up rapidly to create a performance test environment etc – the rather more mundane fact of corporate life is that many of the deployments are relatively small scale, maybe a few servers at most to support specific and specialist business operations. But we still want to have them hosted inside our corporate private cloud – we want them deployed to a virtualised environment, with fail-over support, higher resource utilisation and managed in the same way as the rest of our IT estate and so on.

But some relatively specialised, industry-specific software packages still require the use of hardware dongles plugged into a USB port in order to enforce license agreements. So then what do you do? Well, you can ‘virtualise away’ the dongle to the extent that there is technology out there to allow VMs to connect over TCP/IP to USB ports that are actually hosted elsewhere, but that still leaves you with a hardware affinity in the data centre, e.g. you couldn’t transparently move a dongle-dependent VM from one data centre to another in the cloud. And it also gives you an additional single point of failure in the infrastructure design that needs to be addressed. Obviously you would hope that vendors would be moving away from using dongles – but some still will for the foreseeable future.

It’s a silly but real example, and the point is that the vision of everything moving to being deployed to private clouds is subject to these kinds of practical considerations, and the application vendors have a role to play in enabling their products for these kinds of deployment models…