Yesterday at AWS re:Invent, Andy Jassy delivered the main keynote. As you can see from the photo below, the event was immense – the day before I was at the APN Summit, which was AWS partners only, and that felt big.

But this was 9,000 attendees from 57 countries in a room. The photo doesn’t really capture the epic scale – which struck me as kinda like a metaphor for AWS itself, i.e. the scale of the administrative operation was off the chart, it was all very efficiently managed, and it gets bigger every year!

I thought it was interesting that they didn’t even “save up” the recent 10% price reduction for M3 EC2 instances, announced on 5th November, for re:Invent. To me, this just shows how baked into the business model these regular price reductions have become.

In content terms, the three main new announcements were:

  • Amazon CloudTrail – the ability to log all AWS API calls to S3 for audit and compliance purposes (a minimal log-parsing sketch follows this list). This is a nice feature that we’ve asked for before, although its absence hasn’t actually been much of a barrier to customer adoption previously, probably because we are typically managing the entire AWS layer for a customer anyway.
  • Amazon WorkSpaces – virtual desktops-as-a-service. Interestingly, desktop “state” is maintained as you move between access devices, e.g. from laptop to tablet. We’ve deployed virtual desktops in AWS for a number of customer projects – either desktops for key users in a Disaster Recovery scenario, or for developers who are located around the world and need a consistent desktop with known applications installed etc in order to access AWS-hosted dev and test environments. So I can see us using this new feature in future projects, as I suspect the cost model will stack up once you factor in the saved installation/build/ongoing patching effort of putting in a bunch of Windows Remote Desktop servers.
  • Amazon AppStream – HD quality video generation and streaming across multiple device types. This is related to another announcement that was made on 5th Nov – the new g2.2xlarge instance type, which has the GPU grunt to enable the creation of 3D applications that run in the cloud and deliver high performance 3D graphics to mobile devices, TVs etc.
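CloudTrail delivers its logs to S3 as gzipped JSON files, each containing a Records array describing who called which API. As a rough illustration of what that audit trail looks like, here is a minimal sketch written with boto3 for convenience (which postdates this announcement); the bucket name and prefix are placeholders.

```python
import gzip
import json

import boto3

s3 = boto3.client("s3")

# Placeholder bucket/prefix - CloudTrail writes under AWSLogs/<account-id>/CloudTrail/<region>/...
BUCKET = "my-cloudtrail-bucket"
PREFIX = "AWSLogs/123456789012/CloudTrail/us-east-1/"

resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=20)
for obj in resp.get("Contents", []):
    body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
    records = json.loads(gzip.decompress(body))["Records"]
    for r in records:
        # Each record captures the service, the API call and the caller identity
        print(r["eventTime"], r["eventSource"], r["eventName"],
              r["userIdentity"].get("arn", "unknown"))
```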

Weirdly, being at the event you get less time to look into these new product announcements, so you probably end up with less detail than if you were just reading about them on the web – after the keynote it was straight into a bunch of technical sessions.

I mainly focused on the data analytics sessions. First off, I got to hear about what NASA have been doing with data visualisation – I think all attendees expected to hear about exciting interstellar data visualisations, but it was actually about much more mundane visualisations of skills management, recruitment trends etc – and this in fact made it much more applicable to the audience’s typical use cases as well. There were some great takeaways about how to maximise your chance of success which I need to write up at some point…

I then attended an excellent deep dive on Amazon Elastic MapReduce (EMR) – this covered Hadoop tuning and optimisation, architecture choices and how they impact costs, dynamically scaling clusters, when to use S3 and when to use HDFS for storage, which instance sizes to use, and how to design the cluster size for a specific workload.

This was followed by some customer technical overviews of their use of RedShift. They had all migrated to RedShift from either a SQL or NoSQL architecture. For example, Desk.com have deployed two RedShift clusters in order to isolate read from write workloads, but I felt that they had been forced to put considerable effort into building a proxy in front of RedShift to optimise performance – fundamentally because RedShift is limited to 15 concurrent queries and, for their reporting workload, they are not in control of the peaks in their users’ demand for reports. So they’ve implemented their own query queuing and throttling mechanism, which sounds like a whole heap of technical and tricky non-differentiating work to me. A key takeaway from this session for me, though, was that the price-performance characteristic of RedShift had really worked for these customers, and given them the ability to scale at a cost that they just could not before. They were all achieving very high data ingress rates by batching up their data inserts and loading direct from S3.
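For context on that last point, the usual pattern is to stage batched, compressed files in S3 and load each batch with a single COPY rather than row-by-row INSERTs, letting the cluster pull the files in parallel. A minimal sketch, assuming a hypothetical table, bucket and cluster endpoint, using psycopg2 against the RedShift endpoint:

```python
import psycopg2

# Hypothetical cluster endpoint, database and credentials
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="loader", password="...")

# One COPY ingests every file under the S3 prefix in parallel across the cluster
copy_sql = """
    COPY page_views
    FROM 's3://example-bucket/batches/2013-11-14/'
    CREDENTIALS 'aws_access_key_id=AKIA...;aws_secret_access_key=...'
    GZIP DELIMITER '|'
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```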

The final session I attended was about a Mechanical Turk use case from InfoScout. Mechanical Turk is an intriguing service as it’s so different to the other AWS offerings – in fact it’s not a service at all really, although it exposes a bunch of APIs – it’s a marketplace. Classic Mechanical Turk use cases include translation, transcription, sentiment analysis, search engine algorithm validation etc, but InfoScout’s need was for data cleaning and capture following an automated but fallible OCR process – capturing the data from pictures of shopping receipts taken on smart phones. The main takeaway for me was about how they manage quality control – i.e. how do you know, and therefore tune and optimise, the quality of the results you get from the workers executing your HITs? InfoScout use two quality control strategies:

  • Known answers – in a batch of receipt images that is handled by a Mechanical Turk worker, they inject a “known” receipt and compare the data captured with the known data on that receipt. This technique is good for clear yes/no quality checks, e.g. is this receipt from Walmart? This allows them to compute a metric for each worker as to how likely it is that their other receipts have been accurately processed.
  • Plurality – send an unprocessed receipt to more than one worker and see how consistent the returned results are. InfoScout use this to build a confidence score, based upon this and other factors such as worker tenure etc (see the sketch after this list).
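To make those two strategies concrete, here is a small illustrative sketch (not InfoScout’s actual implementation) that scores a worker against injected known answers and checks plurality agreement across several workers’ answers for the same receipt field:

```python
from collections import Counter


def known_answer_accuracy(responses, gold):
    """Fraction of injected 'known' receipts a worker answered correctly.

    responses/gold: dicts mapping receipt_id -> extracted value."""
    checked = [rid for rid in gold if rid in responses]
    if not checked:
        return None
    correct = sum(responses[rid] == gold[rid] for rid in checked)
    return correct / len(checked)


def plurality_vote(answers, min_agreement=0.6):
    """Return (winning value, agreement ratio) across workers' answers,
    or (None, ratio) if no value reaches the agreement threshold."""
    value, votes = Counter(answers).most_common(1)[0]
    ratio = votes / len(answers)
    return (value if ratio >= min_agreement else None), ratio


# Example: one worker scored on two injected receipts, three workers on one field
print(known_answer_accuracy({"r1": "Walmart", "r2": "Target"},
                            {"r1": "Walmart", "r2": "Tesco"}))   # 0.5
print(plurality_vote(["$12.99", "$12.99", "$12.49"]))            # ('$12.99', 0.666...)
```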

The final event of the day was the re:invent pub crawl around 16 of the coolest bars in The Venetian and The Palazzo hotels. I’m guessing I don’t need to tell you much about that event, other than it started with sangria… :)

Tough, but someone has to do it…

I think anyone working in technology or digital industries can’t have helped but notice the hype around the API economy – it’s everywhere! You only have to look at the success that Amazon has had building a multi-billion dollar business with Amazon Web Services by providing powerful API-based elements such as EC2.

SalesForce.com, Twitter and Google Maps have also benefited from opening up their APIs for others to consume, helping to enhance and grow their businesses.

“So what if everyone is doing it, why should I?” – I think the question to ask is “What do I miss if I don’t take part in the API explosion?”.

Regardless of whether you’re a new or small business or a multi-national corporation, the explosion of the API economy offers benefits for your business!

Whether that is by discovering and consuming APIs that have already been developed so you don’t have to, providing key capabilities to enable your business, or by exposing your own APIs and turning them into a new revenue stream – this is all within your reach!

Focussing on exposing and monetizing your APIs – how can you become part of this? Well, there are numerous API Management solutions available in the market today (3Scale, Apigee, APIfy, Layer7, Mashery to name but a few), each with its own unique benefits. The choice of which API Management solution to go with is one that anyone embarking on joining the API economy will face, and the options and combinations available are numerous. How will people discover your APIs? How will you control access to them? How will developers consume and try your APIs? What authentication is needed? Do you want an on-premise or cloud-based solution? How can you scale to meet the growing demand on your services?

These are just some of the questions that you will need to address when thinking about your API strategy (regardless of whether you are exposing the APIs internally within your business or to external consumers). At Smart421 we recognise that these are significant challenges and decisions that need to be made in order to implement the right solution; we have looked at many of the API Management solutions available and, given our expertise in systems integration, we are perfectly placed to help you choose and implement the right solution to join the API economy explosion!

Please share using the social icons below or by using the short URL http://bit.ly/168BV01

Please Rate and Like this blog. Our readers want to know YOUR opinion so please leave a Comment.

I was doing some Hadoop demo work last week for a customer and, mainly just because I could, I used spot instances to host my Hadoop/Pig cluster using AWS’s Elastic MapReduce (EMR) offering.  I thought I’d have a quick look at what the resulting costs were over the few hours I was using it.  I used a combination of small and large instances in the US-East region – m1.small for the master node and m1.large for the core nodes.  Note – these costs exclude the PaaS-cost uplift for using EMR (another 6 cents per hour for a large instance).

In summary – it’s dirt cheap….

AWS Spot Price Analysis

What is more revealing is to look at this in terms of the % of the on-demand price that this represents…

AWS Spot Price Analysis Saving

So in summary, an average saving of around 90% on the on-demand price!  This is probably influenced by the fact that I was running the cluster mainly during the time when the US was offline.  We tend to get a bit fixated on the headline EC2 cost reductions that have occurred frequently over the last few years, and the general “race to the bottom” of on-demand instance pricing between AWS, Google, Microsoft etc.  Obviously not all workloads are suitable for spot pricing, but what I did here was deliberately bid high (at the on-demand price for each instance type, in fact), knowing that this made it very unlikely I would get booted off the instances should anyone bid higher if capacity got short.  As EC2 instance costs are so low anyway, we tend not to worry too much about optimising costs by using spot pricing for many non-business-critical uses – which is a bit lazy really, and we could all exploit this more.  Let’s do that!
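For anyone who wants to repeat the analysis, here is a rough sketch of the idea using boto3 (which postdates this post): pull the recent spot price history for the instance types used and express the average as a percentage of the on-demand price. The on-demand figures below are placeholders to substitute with the published rates for your region.

```python
from datetime import datetime, timedelta

import boto3

# Placeholder on-demand prices ($/hr) - substitute the current published rates
ON_DEMAND = {"m1.small": 0.06, "m1.large": 0.24}

ec2 = boto3.client("ec2", region_name="us-east-1")
start = datetime.utcnow() - timedelta(hours=12)

for itype, od_price in ON_DEMAND.items():
    history = ec2.describe_spot_price_history(
        InstanceTypes=[itype],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=start,
    )["SpotPriceHistory"]
    if not history:
        continue
    spot_prices = [float(h["SpotPrice"]) for h in history]
    avg_spot = sum(spot_prices) / len(spot_prices)
    print("%s: avg spot $%.4f/hr = %.0f%% of on-demand ($%.2f/hr)"
          % (itype, avg_spot, 100 * avg_spot / od_price, od_price))
```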

If you have any experience of supporting large scale infrastructures, whether they are based on ‘old school’ tin and wires, virtual machines or cloud based technologies, you will know that it is important to be able to create consistently repeatable platform builds. This includes ensuring that the network infrastructure, ‘server hardware’, operating systems and applications are installed and configured the same way each time.

Historically this would have been achieved via the use of the same hardware, scripted operating system installs and, in the Windows application world of my past, the use of application packagers and installers such as Microsoft Systems Management Server.

With the advent of cloud computing the requirements for consistency are still present and just as relevant. However, the methods and tools used to create cloud infrastructures are now much more akin to application code than the shell script / batch job methods of the past (although some of those skills are still needed). The skills needed to support this are really a mix of both development and sys-ops, and have led to the creation of DevOps as a role in its own right.

Recently along with one of my colleagues I was asked to carry out some work to create a new AWS based environment for one of our customers. The requirements for the environment were that it needed to be:

  • Consistent
  • Repeatable and quick to provision
  • Scalable (the same base architecture needed to be used for development, test and production just with differing numbers of server instances)
  • Running Centos 6.3
  • Running Fuse ESB and MySQL

To create the environment we decided to use a combination of AWS CloudFormation to provision the infrastructure and Opscode Chef to carry out the installation of application software. I focussed primarily on the CloudFormation templates while my colleague pulled together the required Chef recipes.

Fortunately we had recently had a CloudFormation training day delivered by our AWS Partner Solutions Architect, so I wasn’t entering the creation of the scripts cold, as at first the JSON syntax and the number of things you can do with CloudFormation can be a little daunting.

To help with script creation and understanding I would recommend the following:

For the environment we were creating the infrastructure requirements were:

  • VPC based
  • 5 subnets
    • Public Web – To hold web server instances
    • Public Secure – To hold bastion instances for admin access
    • Public Access – To hold any NAT instances needed for private subnets
    • Private App – To hold application instances
    • Private Data – To hold database instances
  • ELB
    • External – Web server balancing
    • Internal – Application server balancing
  • Security
    • Port restrictions between all subnets (i.e. public secure can only see SSH on app servers)

To provision this I decided that rather than one large CloudFormation template I would split the environment into a number of smaller templates:

  • VPC Template – This created the VPC, Subnets, NAT and Bastion instances
  • Security Template – This created the Security Groups between the subnets
  • Instance Templates – These created the required instance types and numbers in each subnet

This then allowed us to swap out different Instance Templates depending on the environment we were creating for (i.e. development could have single instances in each subnet, whereas Test could have ELB-balanced pairs, or production could use features such as auto-scaling).

I won’t go into the details of the VPC and Security Templates here, suffice it to say that with the multiple template approach the outputs from the creation of one stack were used as the inputs to the next.
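As an illustration of that chaining (not the actual scripts we used), here is a minimal boto3 sketch that creates the VPC stack, waits for it to complete, then feeds its outputs in as parameters to the security stack; the stack, template file and output/parameter names are hypothetical.

```python
import boto3

cfn = boto3.client("cloudformation", region_name="eu-west-1")


def create_and_wait(name, template_body, parameters=None):
    """Create a stack and block until it reaches CREATE_COMPLETE, returning its outputs."""
    cfn.create_stack(StackName=name, TemplateBody=template_body,
                     Parameters=parameters or [])
    cfn.get_waiter("stack_create_complete").wait(StackName=name)
    outputs = cfn.describe_stacks(StackName=name)["Stacks"][0].get("Outputs", [])
    return {o["OutputKey"]: o["OutputValue"] for o in outputs}


# Hypothetical template files and output/parameter names
vpc_outputs = create_and_wait("dev-vpc", open("vpc-template.json").read())

create_and_wait("dev-security", open("security-template.json").read(), [
    {"ParameterKey": "ParVpcId", "ParameterValue": vpc_outputs["VpcId"]},
    {"ParameterKey": "ParAppSubnetId", "ParameterValue": vpc_outputs["PrivateAppSubnetId"]},
])
```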

For the Instance Templates, the requirement was that the instances would be running Centos 6.3 and that we would use Chef to deploy the required application components onto them. When I started looking into how we would set the instances up to do this, I found that the examples available for Centos and CloudFormation were extremely limited compared to Ubuntu or Windows. Given that, I would recommend working from a combination of the Opscode guide to installing Chef on Centos and AWS’s documentation on Integrating AWS with Opscode Chef.

Along the way to producing the finished script there were a number of lessons which I will share to help with your installation. The first of these was the need to use a Centos.org AMI from the AWS Marketplace. After identifying the required AMI, I tried running up a test template to see what would happen before signing up for it in the Marketplace; in CloudFormation this failed with an error of ‘AccessDenied. User doesn’t have permission to call ec2::RunInstances’, which was slightly misleading. Once I’d signed our account up for the AMI, this was cured.

The next problem I encountered was really one of my own making / understanding. When looking at AMIs to use I made sure that we had picked one that was Cloud-Init enabled; in my simplistic view I thought this meant that commands such as cfn-init, which are used within CloudFormation to carry out CloudFormation-specific tasks, would already be present. This wasn’t the case, as the cfn- commands are part of a separate bootstrap installer that needs to be included in the UserData section of the template (see below):

"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
 "#!/bin/bash -v\n",
 "function error_exit\n",
 "{\n",
 " cfn-signal -e 1 -r \"$1\" '", { "Ref" : "ResFuseClientWaitHandle" }, "'\n",
 " exit 1\n",
 "}\n",<br /> "# Install the CloudFormation tools and call init\n",
 "# Note do not remove this bit\n",<br /> "easy_install https://s3.amazonaws.com/cloudformation-examples/aws-cfn-bootstrap-latest.tar.gz\n",
 "cfn-init --region ", { "Ref" : "AWS::Region" },
 " -s ", { "Ref" : "AWS::StackName" }, " -r ResInstanceFuse ",
 " --access-key ", { "Ref" : "ResAccessKey" },
 " --secret-key ", { "Fn::GetAtt" : [ "ResAccessKey", "SecretAccessKey" ]},
 " -c set1",
 " || error_exit 'Failed to run cfn-init'\n",
 "# End of CloudFormation Install and init\n", 
 "# Make the Chef log folder\n",
 "mkdir /etc/chef/logs\n",
 "# Try starting the Chef client\n",
 "chef-client -j /etc/chef/roles.json --logfile /etc/chef/logs/chef.log &gt; /tmp/initialize_chef_client.log 2&gt;&amp;1 || error_exit 'Failed to initialise chef client' \n",
 "# Signal success\n",
 "cfn-signal -e $? -r 'Fuse Server configuration' '", { "Ref" : "ResFuseClientWaitHandle" }, "'\n"
]]}}

As the cfn-signal command which comes as part of the bootstrap installer is used for messaging to any wait handlers defined in the template, if it is not present this can lead to long breaks at the coffee machine before any feedback is received – the stack simply sits there until the wait condition times out.

The final lesson was how to deploy the Chef client and configuration to the instances. Chef is a rubygems package, so needs this and supporting packages present on the instance before it can be installed. Within CloudFormation, packages can be installed via the packages configuration sections of AWS::CloudFormation::Init, which for Linux supports rpm, yum and rubygems installers. Unfortunately, for the AMI we chose the available repositories didn’t contain all the packages necessary for our build; to get around this I had to install the rbel repository definitions via rpm before using a combination of yum and rubygems to install Chef:

"packages" : {
 "rpm" : {
 "rbel" : "http://rbel.frameos.org/rbel6"
 },
 "yum" : {
 "ruby" : [],
 "ruby-devel" : [],
 "ruby-ri" : [],
 "ruby-rdoc" : [],
 "gcc" : [],
 "gcc-c++" : [],
 "automake" : [],
 "autoconf" : [],
 "make" : [],
 "curl" : [],
 "dmidecode" : [],
 "rubygems" : []
 },
 "rubygems" : {
 "chef" : [] 
 }
}

Once Chef was installed the next job was to create the Chef configuration files and validation key on the instance. This was carried out using the “files” options within AWS::CloudFormation::Init:

"files" : {
 "/etc/chef/client.rb" : 
 "content" : { "Fn::Join" : ["", [
 "log_level :info", "\n", "log_location STDOUT", "\n",
 "chef_server_url '", { "Ref" : "ParChefServerUrl" }, "'", "\n",
 "validation_key \"/etc/chef/chef-validator.pem\n",
 "validation_client_name '", { "Ref" : "ParChefValidatorName" }, "'", "\n"
 ]]}, 
 "mode" : "000644",
 "owner" : "root",
 "group" : "root"
 },
 "/etc/chef/roles.json" : {
 "content" : { 
 "run_list" : [ "role[esb]" ]
 },
 "mode" : "000644",
 "owner" : "root",
 "group" : "root"
 },
 "/etc/chef/chef-validator.pem" : {
 "source" : { "Fn::Join" : ["", [{ "Ref" : "ParChefKeyBucket" }, { "Ref" : "ParChefValidatorName" }, ".pem"]]},
 "mode" : "000644",
 "owner" : "root",
 "group" : "root",
 "authentication" : "S3Access"
 }
}

The hardest part of this was the validation key: as we had multiple instances wanting to use the same key, we decided to place it within an S3 bucket and pull it down. During the script creation I tried multiple ways of doing this, such as using s3cmd (which needed another repository and set of configuration to run), but found that using the files section worked best.

Once Chef was installed, the client was started via the UserData section (basically a shell script); this then handed control of what additional software and configuration is installed on the instance to the Chef server. How much Chef does at this stage is a bit of a balancing act, as the wait handler within the template will fail the stack creation if its timeout period is exceeded.

As you can probably tell if you have got this far, the creation of the templates took quite a few iterations to get right as I learnt more about CloudFormation. When debugging, it is worth remembering that you should always set the stack not to roll back on failure. This allows you to access the instances created to find out how far they got within the install; as the UserData section is basically a shell script with some CloudFormation hooks, more often than not the faults are likely to be the same as you would see on a standard non-AWS Linux install. Also, for a Centos install remember that the contents of /var/log are your friend, as both cloud-init and cfn-init create log files there for debugging purposes.
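Alongside logging onto the instance, it can also help to pull the stack’s event history to see which resource failed and why. A rough boto3 sketch (the stack name is a placeholder):

```python
import boto3

cfn = boto3.client("cloudformation", region_name="eu-west-1")

# Placeholder stack name
events = cfn.describe_stack_events(StackName="dev-instances")["StackEvents"]

# Events come back newest first; show anything that failed and the reason given
for e in events:
    if e["ResourceStatus"].endswith("FAILED"):
        print(e["Timestamp"], e["LogicalResourceId"],
              e["ResourceStatus"], e.get("ResourceStatusReason", ""))
```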

After watching Werner Vogels’ keynote speech from AWS re:Invent, it’s clear that treating infrastructure as a programmable resource (i.e. using technologies such as CloudFormation and Chef) is somewhere organisations need to be moving towards, and based on my experience so far I will be recommending this approach on all future AWS environments we get involved with, even the small ones.

Whilst there is a bit of a learning curve, the benefits of repeatable builds, known configuration and the ability to source-control infrastructure far outweigh any shortcomings, such as the lack of granular template validation, which I’m sure will come with time.

If you have any comments or want to know more please get in touch.

AWS have been busy launching new regions around the world at an amazing rate recently – Sydney, Sao Paulo etc – and there’s still gossip of a second European region to come. It’s fair to say that Smart421’s business in South America is still in its growth phase :), so I hadn’t paid a huge amount of attention to the instance costs for the South America region. As you’d expect, Smart421 tend to use EU-West and US-East for all our customer and internal AWS work. I happened to be checking out the costs today for Elastic MapReduce (EMR – AWS’s Hadoop-as-a-service offering) and had a quick snout around to compare EMR costs across regions, which is how I stumbled across the Sao Paulo EC2 pricing.

In short – it’s significantly more than all the other regions. A standard on-demand large instance is $0.26/hr in US-East but a whopping $0.46/hr in Sao Paulo – that’s 77% more. Now I’m used to regional price variations as power, tax etc differ between territories (when will the EU-West region drop to match US-East prices, eh?), but that’s a lot. That creates a pretty significant incentive to still use the US services, latency and other similar considerations put to one side of course. I also wonder whether it broadens the opportunity for 3rd parties to offer cloud brokerage services (a market I’ve been rather skeptical about up until now due to the barriers to workload mobility) that automatically port compute workloads between regions for a percentage of any cost savings made.

Looks like cost harmonisation via the globalisation of IT still has some way to go then. Ouch!

Thursday was a good day! I was heading off to the AWS Tech Summit for Developers & Architects in London with a few of my colleagues from Smart421. I was looking forward to it, especially as I have a keen interest in Cloud Computing and, given Smart421’s gaining of AWS Solution Provider status in 2010, attending was a real win-win for both myself and for Smart – and to top it all off there was the promise of free Guinness at the end to celebrate St Patrick’s Day!

Iain Gavin, AWS

Iain Gavin, UK Country Manager at AWS. Projected on screen from left: Richard Holland – Operations and Delivery Director, Eagle Genomics; Richard Churchill – Technical Director, Servicetick; Francis Barton – IT Development Manager, Costcutter.

Doors opened at the conference at 12pm and it was clear to see how popular the event was, as the entrance hall was packed. Iain Gavin (AWS UK Country Manager) confirmed that there were over 380 attendees at the conference, which was much higher than expected (usually events see a 50% drop-out rate) and which I think demonstrates the industry’s growing interest in and adoption of Cloud Computing.

Whilst we were all trying to find seats in the hall, I couldn’t help but notice a couple of key points rolling across the screen: that “262 billion objects are hosted in S3” and that “S3 handles 200,000 transactions per second”. Both took me by surprise a little, if truth be told, as whilst a lot of people are “nearing the water’s edge” I hadn’t appreciated that there were that many “dipping their toes in the water”.

Anyhow, first up of the speakers was Matt Wood, one of AWS’s evangelists (the UK version of Jeff Barr), who covered off a number of the recent changes and releases to AWS, such as VPC supporting NAT addressing, the launch of CloudFormation (which allows users to spin up an entire stack – something I can see being very useful for launching environments), and S3 websites for hosting static sites with the great SLAs and resilience of S3 (though you may need to use CloudBerry Explorer or Cyberduck to help with this, as not all of these features are available via the AWS Management Console). Despite being told that they weren’t going to repeat what we had heard before, unfortunately that was what we got!

Following on from Matt Wood was Francis Barton (IT Development Manager of Costcutter), an AWS customer who talked about their experience of using AWS after quite literally stumbling across it whilst looking at Elastic Bamboo as their CI tool! After looking at what it had to offer, they could see the potential it had to give their company – which works on fine margins – a highly scalable, resilient solution for a reasonable cost. They have managed to move their system from Oracle Application Server onto AWS EC2 instances running Apache Tomcat with relative ease, and have made great use of SQS and SNS as part of the solution.  That said, he did mention they had hit a couple of issues (it was nice and refreshing to hear about some real-world experience) around JPA caching and connection pool failovers with RDS, but these are all things they are working to resolve.

Next up was Richard Churchill (Technical Director of ServiceTick), who has developed a software solution, SessionCam, that records a customer’s interaction with your website, capturing loads of information such as the actual images the customer saw and their behaviour and navigation around the site. As you can imagine, SessionCam captures masses of data and sends it back asynchronously, so it is easy to see why they have over 450 EC2 instances running! They too are using SQS at the core of the application and taking advantage of the auto-scaling offered, and have found the stats you can get for SQS very useful. The other key point (I thought it was key at least) was that they had found that utilising more Micro instances yielded a far better return on investment than using larger instances (it would be great to get some stats on this from AWS’s perspective), but I guess it all comes down to your application design and architecture in the end.

The final AWS customer speaker was Richard Holland (Operations and Delivery Director of Eagle Genomics, a bio-informatics company based in Cambridgeshire that uses AWS to expose data to their clients for analysis). His session ended up being rushed due to the overrun from the previous sessions, but he touched on how they had used Zeus to obtain better and more intelligent load-balancing. The item that got most people’s attention, though, was that, given the sensitivity of the data they hold for their clients, they had engaged AT&T and Cognizant to complete ethical hacks on AWS, both of whom failed in their attacks – something that I will be looking into more deeply, as it comes up repeatedly when discussing the Cloud and security within it (my colleague Joseph picked up on this very point in his blog posted earlier today). See slide 12 of 13 of Richard’s slide deck, available on Slideshare.

After a quick break, we all reconvened and started the afternoon technical sessions with Matt Wood giving a presentation on “High-Availability Websites: Build Better Websites with a Concert of AWS Products”, covering things such as patterns for availability, utilising S3 for asset hosting, and using S3 websites for hosting dynamic (client-side) websites. He also covered using CloudFront for global edge caching to enhance your web site, and touched upon the extended support now available for languages such as Ruby on Rails, Scala etc.

Carlos Conde was next, delivering his presentation on “Running Databases with AWS – How to Make the Right Choice for Your Use Case”. He was very insightful and offered up some architectural principles and patterns for using RDS, as well as using backups to create new test instances for more “real world” testing. It was good to see what RDS brings to the table, but in my view, whilst it is still limited to MySQL, I think most will stick to hosting their database servers on EC2 instances – well, until Oracle instances are available on RDS – still no date on this from AWS!

Finally, Ianni Vamdelis delivered his talk on “Deploying Java Applications in the AWS Cloud: A Closer Look at Amazon Elastic Beanstalk”, which came with the great tag line of “Easy to begin, impossible to outgrow!”. I think this was the highlight of the day for me, as it is in my current arena of work and I can see the masses of potential it offers for deploying your applications so easily: setting up your logging automatically in S3 for you, configuring Elastic IPs, health checks and load-balancing, all in an easily repeatable way – plus the plugins for Eclipse to support this are great! This surely is such a godsend! The only downside is that it is not yet available in VPC :-(

All in all a good day with some great food for thought, but as seems to be the case, you can’t help but feel that whilst we have come a long way we are not quite there yet, still waiting for that next release / feature to become available. That said, I have come away more impressed than before and with a strong belief that more and more of the work I do will be in the Cloud.

AWS continued their expansion the other day by announcing a new Japan region, hosted in Tokyo.

What I don’t quite understand is some of the pricing differences. I can understand that bandwidth might be priced differently in different territories, and maybe the hardware too (local taxes? different shipping and local labour costs?), but if you compare the EC2 EU region with the new APAC-Tokyo region, you can see that whilst the Windows costs are the same, the Linux costs are higher in Tokyo.

AWSPricingComparison

As there should be no software license cost for the Linux instances, this seems a bit weird. All I can think of is that the Microsoft SPLA (Services Provider License Agreement) pricing that AWS have managed to negotiate with Microsoft happens to be cheaper than in the EU region and exactly offsets the other higher costs.

Do the service level agreements (SLAs) offered by public cloud providers hold water? Or are they useless to a customer – and not worth the cyberspace they are written on? We decided to pick the cloud fraternity poster child – Amazon EC2 – review their current SLA offering (which is defined here), and then compare it with ‘traditional’ hosting vendors’ offerings.

First let’s review the Amazon EC2 SLA. We’re not offering a legal perspective here, but several things strike us as interesting about this:

  • The SLA does provide for some punitive damages, in the sense that should service availability fall between 90% and 99.95%, Amazon would still have to pay out 10% of your payments to them (a short worked example of the sums follows this list).
  • Even if EC2 was only available 60% of the time, their liability would still be limited to that same 10%.
  • In no way are the damages that Amazon pay out linked to your potential losses. If Amazon had a major outage that could cost you millions, they might only pay out a few thousand in compensation.
  • The definition of ‘unavailable’ that they use here could potentially allow you to claim service credits even when your application is functioning perfectly well – if we’re reading the SLA correctly – because ‘region unavailable’ appears to mean that one (not all) of the availability zones within a region is down. For example, if there are two availability zones within your region, your application could be running in both of them such that it fails over between them. You’d get credit if a zone went down even though the application had failed over to the backup zone and your app was still ‘up’.
  • Availability is measured in five-minute blocks, which implies that they exclude any downtime shorter than five minutes – and four minutes is a long time for a critical system to be down.
  • The other angle on this is that the advantage of using EC2 is that it gives you much easier ways to recover from any outage that does occur (provided it doesn’t bring down the whole region, when things get a little trickier).
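To put some rough numbers on the credit tiers mentioned above, here is a small illustrative calculation (our reading of the SLA, not legal advice) of how much downtime each availability level represents over a year and what credit it would attract:

```python
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability):
    """Downtime implied by an availability percentage over a year."""
    return HOURS_PER_YEAR * (1 - availability / 100.0)


def ec2_service_credit(availability):
    """Credit tiers as we read the current EC2 SLA: 10% below 99.95%, nothing above."""
    return 0.10 if availability < 99.95 else 0.0


for availability in (99.95, 99.0, 90.0, 60.0):
    print("%.2f%% uptime = ~%.1f hours down/year, credit: %.0f%% of fees"
          % (availability, allowed_downtime_hours(availability),
             100 * ec2_service_credit(availability)))
```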

Having said all of that, how different is this arrangement from what traditional hosting providers offer?

The view from our Head of Service Management is that it looks fairly standard – the hosting providers we work with have similar offerings. One of them offers up to 50% off the monthly fee for poor availability, which sounds great, but when you read the small print it’s measured in quarterly periods, so they can get away with a bad month so long as they have two good months in the same quarter. Amazon’s small print doesn’t look too pleasant from an admin point of view, as you must claim within 30 days of the last outage and have to provide evidence of the outage etc.

Service credits are things that procurement and legal departments like, but IT departments can’t be bothered with them, as they cause too much hassle to claim back a small sum that doesn’t really hurt the supplier and doesn’t compensate the customer for the real loss. We have come across some companies who only take 100% of the service fee if they meet their service levels – i.e. the credit is applied automatically, so you don’t have to prove it or claim it – which is obviously more attractive.

Our conclusion is that Amazon have paid lip service to service credits, and have probably done enough to satisfy procurement/legal requirements but nothing to shout about.

Thanks are due to my colleagues Paul Russell and Andy Budd for their input on this material…

Further reading on this subject…

We’ve got a bit more under the skin of CloudBurst now and I wanted to post some info that’s not been written by anyone in Marketing… about the realities of the product (good or bad) rather than the salesman’s spin.

So what does it do? Well, in a nutshell it holds VMware ESX and ESXi VM images of WebSphere Application Server (WAS) on disk and can install them to some IP addresses of your choice at your command, following them up with some automated WAS admin scripting of your choosing. Some pre-created WAS Hypervisor Edition VM images exist (based upon Novell SLES 10 SP2), or you can create your own and package them up using OVF (Open Virtualisation Format). There’s no runtime component to the product other than its VM management/deployment role, i.e. it relies on WAS XD if you want load balancing etc. There’s more to it than that, but that’s the bare bones of it.

So what are the key use cases for CloudBurst – why would someone want one when they can install VM images themselves? Well, the key reason is to take deployment cost out of an IT organisation. The creation of the OVF VM images is still going to be just as easy/traumatic as it was before, but once you’ve got a “pre-canned” environment set up in CloudBurst you can roll it out repeatedly and with confidence, with very little manpower required.

Who would use it? Well, if you get benefit from being able to ‘can a topology’ rather than just putting a single machine image ‘in a can’, then there could be real cost savings and agility/reproducibility benefits from being able to roll out a defined topology to your private cloud very quickly and repeatedly. So if your organisation has many projects running throughout the year that need multiple dev, test, pre-prod, prod etc environments created and ripped down all the time, then you’d very quickly get a payback, I suspect. It would also make you more likely to kill off an unused environment if you knew you could painlessly recreate it, allowing you to have less capacity overall.

The immaturity of the Open Virtualisation Format (OVF v1.0 was only released in March 2009) is a key constraint at the moment, and this is an industry-wide issue – it’s early days. A key impact for CloudBurst is that each VM image is a bit of a beast at 20GB minimum (it’s not entirely clear why – maybe due to a WAS install being big anyway, the way virtual disks are managed in the OVF standard, and the images being uncompressed?). This directly impacts deployment times just due to the sheer volume of data to be shunted around, but it’s not immediately clear to me whether this is an OVF issue (it does have some compression support) or an issue with the implementation/use of the standard. If deployed more than once to the same hypervisor, deployment times can be accelerated as all this data doesn’t need to be provided a second time. It can take something like 7–15 minutes to deploy a VM image.

There are two key design approaches when creating your VM images (the best approach is probably a mixture of the two):

  • use a ‘stock’ WAS image and then customise it with your config/settings and your EAR files installed etc, and create a new OVF image for deployment
  • use a ‘stock’ WAS image and then do the customisation post-deployment using WAS admin scripting

So where’s it going from here…? Well, support for Xen-based VM images must be likely, as this is crucial for anyone who is going to ultimately deploy into Amazon EC2. Portal Server is already available on EC2 on a pay-as-you-go basis and WAS is coming. Also, it’ll be interesting to see if IBM support non-WAS images in later versions of the product.
