An interesting announcement from Google at their recent I/O conference was per-minute billing for virtual machine capacity. As we work with AWS a lot, cloud billing is a subject close to my heart – and so this caught my attention. On the surface, per-minute billing is attractive as there is some inherent wastage in the per-hour model used by AWS, Microsoft Azure etc. When estimating likely AWS usage charges for customer engagements (using the excellent AWS online calculator, which says it is still in beta testing but is actually rock solid), we take great care with the assumptions made about the number of hours that instances will be running for – the classic example of this is dev/test environments, e.g. it’s quite easy to assume 5 days per week at 10 hours per day. What we’ve found over time is that because customers and their development staff have typically been brought up on a diet of inefficient server use (i.e. make a guess at what I need, add some capacity contingency on top, pay for it upfront; it’s a sunk cost, so I don’t care how efficiently I use it), there is not a strong culture of turning off environments when not in use etc. Also, we control dev/test environments using our SmartSentinel cloud management tooling – and you need to allow a few minutes for instances to start up/shut down to ensure you don’t fall into an additional hour of cost (especially Windows instances :) ).
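To make the “spill” into an extra hour concrete, here is a minimal Python sketch comparing what one instance would be charged under per-hour rounding (every started hour billed as a full hour) and under pure per-minute metering. The rate and session lengths are hypothetical illustrative numbers, and any minimum charge a real per-minute scheme might apply is ignored here.

```python
import math

# Hypothetical hourly rate for illustration only (not a quoted AWS or Google price).
HOURLY_RATE = 0.26  # USD per instance-hour


def billed_hours_hourly(minutes_running: int) -> int:
    """Per-hour model: any started hour is charged as a full hour."""
    return math.ceil(minutes_running / 60)


def cost_hourly(sessions: list[int], rate: float = HOURLY_RATE) -> float:
    """Each start/stop session (in minutes) is rounded up to whole hours separately."""
    return sum(billed_hours_hourly(m) for m in sessions) * rate


def cost_per_minute(sessions: list[int], rate: float = HOURLY_RATE) -> float:
    """Per-minute model: charge only for the minutes actually used."""
    return sum(sessions) * rate / 60


if __name__ == "__main__":
    # A working week of 10-hour dev/test days, each overrunning by 5 minutes
    # because the scheduled shutdown took a little longer than planned.
    week = [605] * 5
    print(f"per-hour billing:   ${cost_hourly(week):.2f}")
    print(f"per-minute billing: ${cost_per_minute(week):.2f}")
```

With these made-up numbers the week is billed as 55 hours ($14.30) under per-hour rounding, against roughly 50.4 hours ($13.11) under per-minute metering: five minutes of overrun a day costs five extra billed hours a week.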

So per-minute billing is attractive as it just cuts down on some over-billing when we “spill” into another hour of usage. But – and this is a big but – the logistics of IaaS billing are already complex enough that I don’t really want them to become any more complex. We manage cloud billing for a number of customers, and in a 30-day month we have 720 distinct hourly measurement points where virtual machine usage charges are accrued (to keep it simpler, I’m ignoring other usage-based charging, e.g. for storage etc, here). Even with this level of data, validation, reconciliation and invoicing of charges is already very complex. If that became 43,200 measurement points in a month, I think it would tip our finance team over the edge :) . The complexity stems from the fact that AWS have some really attractive sophistications to their charging model – we like these features and don’t want to lose them, e.g.

  • the ability to reserve instances over a 1 or 3 year time period, i.e. make a commitment and share the cost advantage with AWS
  • the ability to choose 3 different types of reservation based upon likely usage levels, e.g. 100% on all the time, or rarely on (e.g. for a DR scenario)
  • the ability for a customer to get the benefit of their reservations across their various AWS projects/deployments, e.g. if across your AWS estate on average you always have 5 m1.large instances running, but no individual project has them running all the time, you can still reserve the instances and get all the price advantage as the reduced per-hour cost is shared across the entire estate
  • volume discounts
  • ..and that’s before I even get to spot pricing!
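The estate-wide sharing in the third bullet is what makes reservations so useful, so here is a minimal Python sketch of the idea for a single instance type. It assumes, for simplicity, that in every hour the reserved rate covers up to `reserved_count` running instances anywhere in the estate, that reserved capacity is paid for every hour whether used or not, and that the upfront reservation fee has been folded into an effective hourly rate. The rates and project names are hypothetical, and AWS’s actual rules vary by reservation type, so treat this as an illustration only.

```python
from collections import Counter

# Hypothetical hourly rates (USD) for one instance type, e.g. m1.large.
ON_DEMAND_RATE = 0.26
RESERVED_EFFECTIVE_RATE = 0.12  # assumption: upfront fee amortised into an hourly figure


def estate_cost(usage: dict[str, set[int]], reserved_count: int, hours: int) -> float:
    """Cost of one instance type across the whole estate over `hours` hours.

    `usage` maps a project name to the hours (0 .. hours-1) in which that project
    had one instance of this type running. Each hour, up to `reserved_count`
    running instances anywhere in the estate are covered by reservations; any
    extra instances are billed on demand. Reservations are paid for every hour.
    """
    running = Counter()
    for project_hours in usage.values():
        for h in project_hours:
            running[h] += 1

    total = reserved_count * hours * RESERVED_EFFECTIVE_RATE
    for h in range(hours):
        total += max(0, running[h] - reserved_count) * ON_DEMAND_RATE
    return total


if __name__ == "__main__":
    # Two hypothetical projects that never run at the same time, so the estate
    # as a whole always has exactly one instance running.
    usage = {"project-a": set(range(0, 12)), "project-b": set(range(12, 24))}
    print(f"all on demand: ${estate_cost(usage, reserved_count=0, hours=24):.2f}")
    print(f"one shared RI: ${estate_cost(usage, reserved_count=1, hours=24):.2f}")
```

Neither project runs all day, yet a single reservation shared across the estate is fully used: $2.88 instead of $6.24 per day with these made-up rates.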

These pricing model sophistications are real differentiators and allow a much more tailored cost model for specific customer deployment scenarios – and I think they are more important than per-minute metering of usage. It’ll be interesting to see if AWS or Azure follow the Google lead (as tends to happen with IaaS pricing between the big boys). Cloud billing truly is becoming a big data problem – if this carries on we’ll need to run up an on-demand AWS EMR Hadoop cluster to do billing reconciliation :)

Last week Smart421 was once again a sponsor at the Gartner Application Architecture, Development and Integration (AADI) Summit – this is the primary event in the UK Gartner calendar that covers integration/SOA, cloud computing, mobile, big data etc. Our marketing manager kept me rather busy with customer meetings throughout the event (!), but I still managed to catch some of the Gartner analyst sessions.

In the keynote, Andy Kyte made an interesting point about the barriers to SaaS adoption: organisations traditionally organise themselves along application lines (i.e. the System A team, the System B team…) rather than along domain lines (the “customer domain” team, the “product domain” team…). This relates nicely to our view of how organisations should develop their integration strategy over time. Although Andy didn’t use the term, this seemed to me to be yet another realisation of Conway’s Law: the team structures, and the usual budget scraps over limited resources between those teams, put significant barriers in the way of adopting SaaS to replace existing legacy systems.

Ray Valdes presented on user experience design and gave a useful checklist of things to consider and the top 10 common mistakes to avoid. He made the distinction between design approaches led by intuition (“I know I’m right, trust me”) and those led by evidence (“I’ve tried this and measured the outcome so I know it improves things”). I felt that the remedy to many of the top 10 mistakes lay in applying the principles of the Lean Startup, e.g. test and learn, iterate quickly, validate your assumptions, collect metrics and feedback etc. Interestingly, though, the intuition-based approach is more likely to lead to an innovation leap, whereas the evidence-based approach is more likely to lead to refinement and tuning of existing approaches, with less chance of a game-changing innovation.

On the Thursday there was the usual evening reception, which was a lot of fun – this year we had some flair bartending on our stand from CocktailMaker – here’s an action shot… (Photo by Jim Templeton-Cross)

Gartner AADI 2013 stand

The cocktail guy did drop the bottle on the floor a few times following particularly ambitious throws but fortunately we didn’t burn the place down with any flammable liquids :) . As you can see from the photo below, our team (Neil Miles, the MD, and two colleagues from Business Services) are very smiley after knocking back a few cocktails…and Red Hat’s marketing approach at the event clearly worked!

Gartner AADI 2013 smiles all round

As the leading enterprise AWS Partner in the UK ;o), Smart421 had a big presence at the London AWS Summit last week (23 April).

Several of our customers also attended, and one of them, Steve Howes (Chief Executive of Rail Settlement Plan, part of the Association of Train Operating Companies – ATOC), presented as part of the second keynote and very kindly mentioned us.

In fact, we were referenced throughout the day, sometimes in unexpected ways. For example, within the opening five minutes of the first keynote, Smart421 was name-checked by Werner Vogels, CTO at Amazon.


Steve Howes of RSP takes to the stage in front of 1,200 delegates

As at the recent Las Vegas event, it was a bit rock’n’roll in the keynotes, with music by Foo Fighters playing over the PA whilst we waited for the queuing hordes to make their way through the registration bottleneck and into the venue. Vogels himself appeared on stage to the sound of Nirvana pumping out of the speakers.

What struck me was how big this event had become – a significant increase on the 2012 AWS Summit.

AWS Summit London 2013

In the afternoon, the event split into separate streams. I stuck to the more involved sessions that were digging into the specifics of particular service releases such as Amazon Redshift and Amazon DynamoDB.

As you can see from the photo above, there was barely sitting room only in some of these techy sessions, let alone standing room.

It made me think – would conference attendees be prepared to sit on the floor for any vendor, especially in the corner of a crowded room? There must be few other vendors, if any, that have that kind of pulling power right now.

These are exciting times. Opportunities for enterprises to benefit are enormous.

2013 AWS Summit, London 23 April

And this event confirmed my reflections that we’re reaching a tipping point in the market; adoption by the “early majority” in the technology adoption lifecycle is really visible and is happening – albeit with different market sectors arriving at very different stages.

Our conversations continued all day long over at the Smart421 stand, where we showcased three of our customer engagements:
  • Disaster Recovery on the AWS Cloud for Haven Power
  • Big Data Analytics on the AWS Cloud for Aviva / Quotemehappy.com
  • Service Transition to the AWS Cloud for ATOC

Frankly, we wanted to showcase even more but there just wasn’t the space.


When you lift the lid on what is going on in the big data analytics world, the pace of development and innovation is phenomenal. Yesterday I heard some more about the range of optimisations and ongoing development being progressed for Apache Hive and Hadoop itself, and it really reminds me of the pace of innovation and development of other technologies I’ve seen in the past, e.g. relational databases, application servers, web and mobile development platforms etc. Hadoop has only existed for seven years or so and the data analytics revolution that it kicked off is morphing further and further away from its origins – I guess that’s natural selection in operation. Anyone still using WML over WAP? :)

Stinger Roadmap

HortonWorks are a big contributor to the Hive project and the Stinger initiative they are involved in is driving a number of really interesting optimisations and enhancements:

  • Further optimising the already optimised RCFile data storage structure to allow queries to avoid I/O for blocks of data held in HDFS that contain no data relevant to the query, or where preaggregated results can avoid I/O (precalculated min, max values etc)
  • Optimising data retrieval to best exploit CPU on-chip memory buffers
  • Exploiting YARN (“Yet-Another-Resource-Negotiator” – a recent framework and sub-project of Apache Hadoop that facilitates writing arbitrary distributed processing frameworks and applications) and Tez (a new Apache incubator project) to avoid unnecessary HDFS writing and subsequent reading of intermediate results between sequential MapReduce jobs
  • Using “always on” Hadoop implementations (aka Tez service) to avoid the heavy penalty of JVM startup costs – which are very significant for otherwise relatively low cost queries
  • Enhancing query optimisation for common use cases, e.g. the loading of dimension tables (which are relatively small in comparison to the fact table) into memory on each node in the cluster to accelerate common queries against a data warehouse star schema
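The dimension-table optimisation in the last bullet is essentially a map-side (broadcast hash) join. Here is a single-process Python sketch of the general technique, not Hive’s implementation: each “node” holds one partition of the large fact table plus its own in-memory copy of the small dimension table, so fact rows are joined with a dictionary lookup and never need to be shuffled by join key. The table and column names are hypothetical.

```python
def map_side_join(fact_rows, dimension_lookup, key):
    """Join fact rows to an in-memory dimension table (inner join)."""
    for fact in fact_rows:
        dim = dimension_lookup.get(fact[key])
        if dim is not None:  # drop facts with no matching dimension row
            yield {**fact, **dim}


def run_on_cluster(fact_partitions, dimension_rows, key):
    """Simulate a cluster: each partition stands in for the fact data on one node."""
    results = []
    for partition in fact_partitions:
        # Every node builds its own hash table from the (small) dimension table.
        lookup = {row[key]: row for row in dimension_rows}
        results.extend(map_side_join(partition, lookup, key))
    return results


if __name__ == "__main__":
    products = [  # small dimension table
        {"product_id": 1, "name": "tea"},
        {"product_id": 2, "name": "milk"},
    ]
    sales_partitions = [  # large fact table, already spread across two "nodes"
        [{"product_id": 1, "qty": 3}],
        [{"product_id": 2, "qty": 1}, {"product_id": 3, "qty": 9}],
    ]
    for row in run_on_cluster(sales_partitions, products, "product_id"):
        print(row)
```

The design trade-off is memory for I/O: the approach only pays off while the dimension table comfortably fits in each node’s memory, which is exactly the star-schema case described above.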

Olivier Renault from HortonWorks presented some early data at yesterday evening’s London Hive meetup showing Hive query times falling by a factor of up to 35-40 using a combination of some of the above optimisations. So the Stinger initiative’s objective of a 100x performance improvement seems feasible, which really would be a transformational achievement. When most people run a Pig/Hive query on a Hadoop cluster for the first time there is a rather “oh – that was slow for a simple query…” reaction, and the strong drive to move Hadoop processing closer and closer to supporting interactive query use cases is causing some interesting overlaps. For example, Cloudera’s Impala takes a different approach to performance optimisation where all data is loaded into memory – so it can provide blindingly fast query performance but won’t ultimately scale as far as Hive (see here for more detail on this).

To be honest, for most of us mere mortals (who don’t work for Facebook, Yahoo! or Google), the tools are now already out there to handle 99% of the use cases we need – and we can see from the above work that they are improving at a rate such that supporting interactive queries for multiple users on very large corporate datasets is becoming a reality, so our customers will just expect this in the future without question.

Update – 10th April – A similar set of slides to the ones Olivier presented is now available here

Last night, I was one of 6 presenters who gave 5-minute lightning presentations at the monthly Front End Suffolk meeting in Ipswich.

I love the lightning presentation format as you get a really broad range of presentation topics and styles in a short burst – I first came across it at CloudCamp some years ago and we’ve used it internally in Smart421. Presentations covered Jasmine for JavaScript unit testing, the use of make, and accessibility considerations with links – a really good mix of thought-provoking topics. Thanks to Anders Fisher (@atleastimtrying) for organising the event and for the invitation to present. Great group, Anders.

At the previous month’s FESuffolk meeting Paul Hutson presented on KineticJS (see his presentation here), and that got me comparing it with the work I’d done with YUI, especially related to collision detection – so I foolishly stepped up and offered to present on the topic. Here’s my presentation on SlideShare…


I was doing some Hadoop demo work last week for a customer and, mainly just because I could, I used spot instances to host my Hadoop/Pig cluster using AWS’s Elastic MapReduce (EMR) offering. I thought I’d have a quick look at what the resulting costs were over the few hours I was using it. I used a combination of small and large instances in the US-East region – m1.small for the master node and m1.large for the core nodes. Note – these costs exclude the PaaS-cost uplift for using EMR (another 6 cents per hour for a large instance).

In summary – it’s dirt cheap….

AWS Spot Price Analysis

What is more revealing is to look at this in terms of the % of the on-demand price that this represents…

AWS Spot Price Analysis Saving

So, in summary, that’s an average saving of around 90% on the on-demand price! This is probably influenced by the fact that I was running the cluster mainly during the time when the US is offline. We tend to get a bit fixated on the headline EC2 cost reductions that have frequently occurred over the last few years, and the general “race to the bottom” of on-demand instance pricing between AWS, Google, Microsoft etc. Obviously not all workloads are suitable for spot pricing, but what I did here was deliberately bid high (at the on-demand price for each instance type, in fact), knowing that this made it very unlikely that I’d get booted off the instances by others bidding higher if capacity got short. As EC2 instance costs are so low anyway, we tend not to worry too much about optimising costs with spot pricing for many non-business-critical uses – which is a bit lazy really, and we could all exploit this more. Let’s do that!
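The saving figure is simply the spot cost as a fraction of the on-demand cost for the same hours. Here is a minimal Python sketch of that calculation, plus the bidding behaviour described above as I understand it for spot instances at the time: an instance runs while the market price stays at or below your bid, you pay the prevailing spot price rather than your bid, and the instance is reclaimed once the price rises above the bid. The hourly prices below are hypothetical, not the figures from my run.

```python
# Hypothetical hourly prices (USD) for illustration; real spot prices vary by
# region, instance type and time of day.
ON_DEMAND_LARGE = 0.26


def saving_vs_on_demand(spot_prices: list[float], on_demand_price: float) -> float:
    """Percentage saved by paying hourly spot prices instead of the on-demand rate."""
    spot_total = sum(spot_prices)  # you pay the market price each hour, not your bid
    on_demand_total = on_demand_price * len(spot_prices)
    return 100 * (1 - spot_total / on_demand_total)


def hours_before_reclaimed(spot_prices: list[float], bid: float) -> int:
    """Hours an instance keeps running: it is reclaimed when the price exceeds the bid."""
    hours = 0
    for price in spot_prices:
        if price > bid:
            break
        hours += 1
    return hours


if __name__ == "__main__":
    overnight = [0.026, 0.026, 0.028, 0.024]  # hypothetical quiet-period prices
    print(f"saving: {saving_vs_on_demand(overnight, ON_DEMAND_LARGE):.0f}%")
    # Bidding at the on-demand price means only a spike above it ends the run.
    spiky = [0.03, 0.05, 0.30, 0.04]
    print(hours_before_reclaimed(spiky, bid=ON_DEMAND_LARGE))
```

Bidding at the on-demand price costs nothing extra in the quiet hours, because the bid is only a ceiling; it simply means a price spike has to exceed the on-demand rate before the cluster is lost.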

Orthographic Projection of South America

AWS have been busy launching new regions around the world at an amazing rate recently – Sydney, Sao Paulo etc – and there’s still gossip of a second European region to come. It’s fair to say that Smart421’s business in South America is still in its growth phase :) , so I hadn’t paid a huge amount of attention to the instance costs for the South America region. As you’d expect, Smart421 tend to use EU-West and US-East for all our customer and internal AWS work. I happened to be checking out the costs today for Elastic MapReduce (EMR – AWS’s Hadoop-as-a-service offering) and had a quick snout around to compare EMR costs across regions, and so stumbled across the Sao Paulo EC2 pricing.

In short – it’s significantly more than all the other regions. A standard on-demand large instance is $0.26/hr in US-East but a whopping $0.46/hr in Sao Paulo – that’s 77% more. Now I’m used to regional price variations as power, tax etc are different in different territories (when will the EU-West region drop to match US-East prices eh?), but that’s a lot. That creates a pretty significant incentive to keep using the US services, latency and other similar considerations put to one side of course. I also wonder if it broadens the opportunity for 3rd parties to offer cloud brokerage services (a market I’ve been rather skeptical about up until now due to the barriers to workload mobility) that automatically port compute workloads between regions for a percentage of any cost savings made.
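As a quick check of the figures above, here is the arithmetic for one large on-demand instance, assuming (for illustration) that it runs around the clock for a 30-day month of 720 hours:

```latex
\begin{aligned}
\frac{0.46}{0.26} &\approx 1.77 \quad (\text{about } 77\% \text{ more per instance-hour}) \\
720 \times \$0.26 &= \$187.20 \text{ per month (US-East)} \\
720 \times \$0.46 &= \$331.20 \text{ per month (Sao Paulo)} \\
\$331.20 - \$187.20 &= \$144.00 \text{ extra per instance per month}
\end{aligned}
```

Multiplied across a fleet of instances, that monthly gap is the margin a cross-region brokerage service would be competing for.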

Looks like cost harmonisation via the globalisation of IT still has some way to go then. Ouch!

Las Vegas

Unfortunately I didn’t manage to make a strong enough case to travel to Las Vegas in person :( , so I did the next best thing and watched the live media stream yesterday evening – it was just like being there, but without Tom Jones or any showgirls. The two big announcements from Andy Jassy (the AWS SVP) were an approx 24% storage (S3) price reduction across all regions from 1st Dec, and the launch of a limited beta version of data-warehousing-as-a-service. On the second of these, AWS Redshift (which is discussed in more detail in Jeff Barr’s post here) is a direct challenge to the existing column-oriented database world – Teradata, IBM, Oracle etc. It looks really interesting and is a classic cloud use case, so it makes sense for AWS to tackle it – it requires large volumes of storage and compute power and is a traditionally high-CapEx market sector. I’m looking forward to playing with it.

As for the S3 price reduction…well, a 24% price reduction is a pretty amazing step change in pricing. In what other industry do you see such dramatic price changes? I wish it was happening to UK gas & electricity pricing :) . Having said that, Google storage costs currently start at $0.095 per GB per month, so it looks like AWS are price matching with Google. Microsoft Azure pricing was still at $0.125 per GB when I checked this morning, but presumably they will have to respond (to be precise, this is not quite an apples-to-apples comparison, as Azure replication is over a significant distance whereas AWS S3 replication is between AZs, which are separate but typically within an undisclosed number of kilometres of each other). As discussed before on the blog, I can’t see how the majority of smaller (and by that I still mean very big!) IaaS cloud players can possibly compete with this perfect storm of huge economies of scale and immensely deep pockets. Looking at our current AWS billing (which includes customers’ AWS accounts that we manage on their behalf), S3 storage costs account for less than 5% of the total, as the lion’s share of the cost relates to compute – so more price reductions on compute as well please!

[Update 30/11/12 - Since reading Jeff's post I've realised that these cost savings also apply to EBS Snapshots (der...of course you'd expect that), so this actually makes the cost saving from this one price reduction more significant, getting up to 8% or so]

Attribution: Phillip Perry

Tonight I attended a talk by Sainsbury’s IT Director Rob Fraser (hosted by the BCS ELITE group), who was voted Computing’s CIO of the Year 2011 no less! Whilst I do rub shoulders with the odd CIO, it’s usually on a very specific topic, so tonight was a great opportunity to hear some candid details of what it’s really been like driving through a three-year transformation programme at Sainsbury’s, what the gotchas were, what he’d do differently next time etc.

So here are some of his key pieces of advice and reasons for success:

  • They used an architecture-led approach. This was music to my ears to be honest – I tend to oscillate between “(enterprise) architecture is the answer” and despair at the state of the architecture profession in practice, so this gives me hope. When he landed they had an architecture team of about 5, and to drive the transformation programme they grew this up to about 40 (with approx 500 staff in IT, excluding external delivery resources, that’s moving from a ratio of 1:100 to 1:12.5).
  • He hired some key staff for the IT transformation from the retail side of the business, not from IT (but supported by strong technologists). The credibility of a business representative massively helped drive change through with the rest of the business, and enabled more educated push back on the inevitable attempts at scope creep. In fact, the whole transformation strategy was very people-focused.
  • The transformation programme was tracked over a 3 year time period, and the original “plan on a page” was still in use as the benchmark to measure progress against every year (i.e. they did not suffer from a “stretching” programme), and the programme had a definite end. It started, ran, and finished – rather than the original objectives bleeding into other change activities and giving a fuzzy back-end to the work.
  • “Time to value” for any delivery was limited to 18 months maximum – as most organisations just don’t have the attention span to concentrate on anything with a longer payback.

Camp, with beer!

Cloudcamp is still a great event, but it does feel like it’s running out of steam a little – the cloud hype has waned somewhat as part of the natural evolution of things, and attendance seemed marginally lower to me last night. They’ve also already tackled all the big crowd-pulling themes. But I’ll still be going, simply because the presentations are still really good, introducing you to topics and ideas you just wouldn’t stumble across any other way. And then there’s the sarcasm, the red cards (mainly a deterrent it seems, as I’ve never seen one deployed yet), beer and pizza.

In this post I’ll just point out some interesting (to me at least!) little snippets – partly so I remember them and partly to share with my Smartie colleagues.

Lock-in – Joe from VMWare made a great point about lock-in: as an architect, you can’t really avoid lock-in, even though this is the language that we tend to use. In reality, what you can do is choose what you are prepared to be locked in by and what you are not. E.g. you can implement your own private cloud using OpenStack to avoid being locked into a public cloud vendor, but then you are locked into the OpenStack community and its uncertain future, the hardware you’ve bought etc. Or you can use PaaS to avoid hypervisor-layer lock-in, but then you might need to lock in to a certain dev language etc. This resonated with me as Smart421 considered this topic on a recent client cloud consultancy engagement, and the feared lock-in was actually not as severe as it was perhaps perceived to be. It’s all choices, no rights and wrongs, just tools in the architect’s toolbox.

Cloud cost examples – Ali from PlanForCloud gave some cost examples for what it might cost to run TripAdvisor or Pinterest infrastructure on AWS, e.g. $1.7m p.a. using on-demand instances, reducing to $0.9m p.a. if reserved instances are used for TripAdvisor. What I was hoping to hear was the equivalent current on-premise (or at least non-AWS) costs, but he didn’t present those. My expectation is that at this kind of scale on-premise will still be way cheaper, but of course it’s more complex than that – capex, options to mix on-demand/spot/reserved, load distribution across AZs and regions, lower support staff costs etc. It’s a complex area.
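The two TripAdvisor figures quoted imply roughly the following saving from moving to reserved instances (simple arithmetic on the two quoted numbers only, not on any underlying instance mix):

```latex
\frac{\$1.7\text{m} - \$0.9\text{m}}{\$1.7\text{m}} = \frac{0.8}{1.7} \approx 0.47
```

In other words, reservations alone cut the modelled annual AWS bill by almost half, which is the baseline any on-premise comparison would need to beat.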

Lack of transparency with big data – Chris Swan made an interesting point about the increasing risk of very non-transparent machine-generated decisions as we move into a more big data-based world, e.g. credit card fraud detection systems that are so opaque that false positives cannot really be challenged; it’s just “what the system said”. My message to customers is that big data technologies offer an opportunity to gain new insights into their existing data assets at previously unheard-of processing costs – but ironically this trend could also lead to less insight, at least from an end user perspective.

There were also some great lightning talks from Kuan Hon on cloud contract negotiations (I’ve seen her speak several times – a cloud legal eagle), Phil on why we should be scared of European cloud standardisation initiatives, and James Mitchell on the likely trend towards trading cloud computing capacity futures/options/derivatives etc. In fact, there were no dud talks this time at all, and you’d expect at least one :)
