Photo: Industrial backdrop by Pilarts  Dreamstime Stock Photos & Stock Free Images

I’d like to propose a best practice for rolling out new features in a Service Oriented Architecture (SOA).

Traditionally, when we roll out a major new feature, we often end up causing a breaking change to the service. We’re then faced with a choice: (a) force all our consumers to upgrade to the new version, making all of them hate us, or (b) continue to support the old version of the service as well as the new, making only our own teams hate us. Suck it up: plan (b) is the better option, but try telling that to the guy having to patch fixes in three concurrent versions of a service.

Now, there are patterns that can help here (more on that another day), but they all still mean more work for everyone.

Also, when we first roll out a feature is exactly the moment we understand it least. We’ve got absolutely no idea how people will use it, nor whether it will even turn out to be useful. By baking the feature into a new major version of the service, we’re taking all our options away. The feature will be hard to remove if we decide it isn’t useful, and if we want to change how it works, we’re back into a major version upgrade again.

To my mind, good engineering is largely about keeping your options open. It’d be nice if we could try a new feature with a subset of consumers first, iterating quickly with just that subset, and gradually adding more consumers as we get more confident.

Enter the Feature Flags pattern. Feature flags allow you to turn features on and off at a moment’s notice. At its most basic, a feature flag just turns a feature on or off for everyone at once, but the idea is often extended to allow turning on features for specific users, or collections of users. This allows you to roll out a new feature to consumers gradually, over an extended period.

So, here’s the proposal:

  • Allow consumers to pass a set of feature flags dictating which features they’d like enabled in the service.
  • Whenever you build a major new feature that would otherwise cause a breaking change, only enable it when the feature flag is passed.
  • If appropriate to your environment, control access to feature flags like you would to any other resource – e.g. you might want to restrict access in the early days to just a single consumer, making it easier to iterate.
  • Once we’re comfortable with a feature, it becomes publicly available – i.e. anyone can toggle the flag.
  • Every so often (e.g. once every couple of years), create a new major version of the service, refactoring it to include popular, battle tested features by default. Also, take this as an opportunity to clean out the cupboard and abandon any features that aren’t well used.
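To make the proposal a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `FlagRegistry` class, the `structured-name` flag, the consumer names and the `get_customer` operation are hypothetical, and a real service would carry the flags in whatever request mechanism suits its protocol.

```python
from dataclasses import dataclass, field


@dataclass
class FlagRegistry:
    """Hypothetical access control for feature flags."""
    # Flag name -> consumers allowed to use it while it is still restricted.
    restricted: dict[str, set[str]] = field(default_factory=dict)
    # Flags that anyone may toggle.
    public: set[str] = field(default_factory=set)

    def enabled(self, consumer: str, requested: set[str]) -> set[str]:
        """Return the subset of requested flags this consumer may switch on."""
        granted = set()
        for flag in requested:
            allowed = self.restricted.get(flag, set())
            if flag in self.public or consumer in allowed:
                granted.add(flag)
        return granted


def get_customer(customer_id: str, flags: set[str]) -> dict:
    """Hypothetical service operation with one flag-gated feature."""
    response = {"id": customer_id, "name": "Ada Lovelace"}
    if "structured-name" in flags:
        # New behaviour that would otherwise be a breaking change.
        response["name"] = {"given": "Ada", "family": "Lovelace"}
    return response


# Early days: only one consumer may use the new feature.
registry = FlagRegistry(restricted={"structured-name": {"billing-team"}})


def handle(consumer: str, customer_id: str, requested_flags: set[str]) -> dict:
    """Entry point: filter the requested flags, then call the operation."""
    return get_customer(customer_id, registry.enabled(consumer, requested_flags))
```

A consumer that doesn’t pass the flag, or isn’t yet allowed to use it, keeps getting the old response shape. Making the feature publicly available is then a one-line change (`registry.public.add("structured-name")`) rather than a new major version.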

What do you think? Comments and thoughts very welcome…

 


Jeff Bezos Photo by John Keatley, Seattle's leading photographer keatleyphoto.com

Every time I hear this story, it makes me smile. From Kim Lane over at API Evangelist:

[…] one day Jeff Bezos issued a mandate, sometime back around 2002 (give or take a year):

  • All teams will henceforth expose their data and functionality through service interfaces.
  • Teams must communicate with each other through these interfaces.
  • There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
  • It doesn’t matter what technology they use.
  • All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

The mandate closed with:

Anyone who doesn’t do this will be fired. Thank you; have a nice day!

Assuming for the moment that this is true, the thing that makes me smile here isn’t the closing rhetoric. What Jeff described here is pretty well everything you need to know about successful SOA.

Look at the wording again. “All teams”. He didn’t say “all systems” or “all services”. Technology isn’t [the most] important. People are.

By focussing on teams rather than technology, Jeff ensured that Amazon’s embryonic SOA was business aligned. One simple decision was all it took. Well, that and ten years of concerted effort from one of the brightest engineering teams on the planet.

When it comes to adopting cloud computing, to my mind there are three types of company:

  • Early adopters who swallow the pill in a big way. They’ll get burned, almost without exception. But they’ll come out stronger, leaner, meaner and faster than the rest. (Netflix, I’m looking at you.)
  • Those who do their homework the day it’s set. They’ll either have or will shortly select non-mission critical applications and move them into the cloud, and at the same time start looking to create new apps in the cloud albeit in a low key way. These guys will be slow and steady, but they’ll get there in the end. (Most of the 2015 FTSE 100?)
  • Those who do their homework the night it’s due. They’ll wait for everyone else to ‘take the risk’ for them, and only then start a gradual, lumbering migration. Just like at school, these guys will get outpaced by the competition. For some of them, it’ll be a terminal mistake. (Most of the current FTSE 100?)

Make no mistake, all companies will end up in the cloud eventually. How (and if) you get there is up to you.

My advice? Don’t be last.

It’s pretty much a universal principle in a large enterprise: If you want something new, first you try to reuse, then you try to buy something, and only as a last resort do you build something yourself.

Sound enough, right? It ought to be cheaper to buy something from a specialist vendor than work it out and build it from scratch yourself. They get the economies of scale, you get a reduced price, everyone’s a winner. 

The only trouble is, sometimes buy before build sucks. 

The trouble is that usually when you buy, you’re making an often large up front financial commitment to something. Not only that, but often we buy something before we’ve had a chance to really work out what it is we need. So, we end up buying the uber product – something that delivers our every whim and desire. 

Very often, when it comes down to it, we buy 100%, use 20%, and wind up bespoking the living daylights out of the rest. The Vincent van Gogh of our vision becomes more like an H. R. Giger Alien. It costs as much to customise as it would have cost to build, and far from being a virtual utopia becomes the treacle holding us back.

So, how do you make sure this doesn’t happen to you? Simple: Only buy what you’re absolutely sure you need. 

How do you know what you really need? Simple: Build it and see how your users use it, rework, repeat. 

Sometimes life has its little ironies…

I am way out of date here, but I’ve just stumbled across an interview between Information Week and Amazon CTO Werner Vogels. The interview is from way back in 2008, but many of the things discussed are just as valid (and in some senses revolutionary) now as they were then. Touching on a number of interesting bits of information about Amazon’s architecture, the interview talks about how Amazon came to be the cloud computing ‘thought leader’ they are today.

Amazon have been doing SOA highly successfully for nearly a decade, as demonstrated by their dominant position in internet retail (2009 revenues topping $24 billion and a market cap of over $70 billion at time of writing).

What particularly caught my eye about this article is how it aligns with my own views about what makes good SOA tick:

“It’s not just an architectural model, it’s also organizational. Each service has a team associated with it that takes the reliability of that service and is responsible for the innovation of that service. So if you’re the team that’s responsible for that Listmania widget, then it’s your task to innovate and make that one better.” 

To me, making SOA work is more about people and organisation structures than it is about technology. Build the right teams and the technology will come. Focus on the technology and your organisation will just hold you back.

From there, Werner goes on to talk about how they evolved the AWS cloud computing platform out of their own need for highly resilient distributed infrastructure, and of course how they exposed this too as services. To put this in perspective, this ‘spin off’ is projected to earn Amazon over half a billion dollars in 2010. Not bad for a by-product.

Read the article. Even 2 years late, it’s well worth 15 minutes of your time.

On Tuesday I dropped in on IBM’s UK Impact 2010 conference in London. UK Impact is effectively a pocket-sized version of the 5-day, 6000 attendee Las Vegas Impact event held last month.

The Conference was well organised, held in the (rather swanky) Grange Hotel St Paul’s.

This was a one-day event, with the morning being single track, and the afternoon multi-track.

As you’d expect given IBM’s recent ‘Smart’ branding (imitation is the highest form of flattery), the Keynote was a startlingly on-message presentation entitled “How your Organization Can Work Smarter”.

There were some interesting gems in there: Did you know for example that certain electrical companies in the states give customers a circa $300 annual rebate in return for handing over the keys to their air conditioning to the electrical company? In times of peak demand (but not when it’s health-threateningly hot), rather than power up another turbine, they’ll start a rolling programme of AC shut-downs to reduce demand. That is smart.

A fun set of statistics for you: Among the businesses run by the top 500 CIOs, compared to other businesses there is:

  • double the usage of process modelling and automation technology.
  • 3.75 times greater usage of collaborative workspaces.
  • 9 times greater usage of SOA.

… of course, how you define ‘top’ CIOs or ‘greater usage of SOA’ is potentially subject to interpretation!

A key message, which I really buy into, is that ‘excellence is a moving target’. It’s easy to be complacent when you’re at the top of your game. What (arguably) separates the likes of Google and Apple from Microsoft is their ability to know what the customer wants, before the customer knows it themselves. Doing this, obviously, requires an ability to innovate at speed and change on a dime.

To me, the only way to achieve this is to keep everything as simple as possible at all times. If your IT is so complicated that your business can’t wrap their heads around it, is it any wonder you struggle to keep up with their demands? Sometimes the best investment you can make is one that leaves you with less than you started with, particularly if it makes your IT look more like the business it serves.

Robin Meehan just pointed me to an article in Information Week describing Doyenz Shadowcloud, an interesting cloud-based disaster recovery product.

Basically, the product allows SMEs to wire up the servers in their data centre so that they are incrementally backed up to the Doyenz cloud. When disaster strikes and your basement floods after a storm, you can restart your servers in the Doyenz cloud, getting back up and running as quickly as possible.

It’s an interesting proposition, particularly for tech savvy but lean SMEs out there who are maybe still relying on tape backups and insurance policies for their DR plans. How well it works, and how big this market is, remain to be seen…

I’ve been at the IBM WebSphere User Group meeting in Edinburgh today, and attended a couple of sessions about IBM’s shiny new WebSphere CloudBurst Appliance.

For the un-initiated, the CloudBurst appliance is a hardware appliance which provides an easy means to deploy WebSphere products to virtualised infrastructure. We’ve written about this a couple of times before (first on 6th and again on 17th of July), so I won’t repeat it all over again.

Firstly, a bit of an update on one particular area which was confusing us here at Smart421 Towers. IBM WebSphere Application Server Hypervisor Edition is a version of WAS which is tuned for virtualised environments and pre-packaged into a VM image. Each image is 20 GB. Each of these images is encrypted and stored on the CloudBurst’s built-in hard drive. These images can’t be stored on a LAN or SAN, so the maximum number of images you can store is limited by the size of the hard drive. So, let’s assume this thing has a hard drive that’s 500 GB; that means the maximum number of images is 500/20 = 25 images, right? Wrong.

CloudBurst takes every image it owns, and cuts it into little pieces (I’m not sure of the official terminology, but I’ll call them shards), and then builds a manifest that describes how to put them back together again to make the image. When you create a new image (usually based on an existing one), the appliance analyses this image to work out what’s different between this image and the one it was based on. It then only needs to store the differences between this image and its parent. This way, the CloudBurst can store a load more images than you’d otherwise expect. Almost certainly hundreds, I would suggest.
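To illustrate why this beats the naive 500/20 sum, here is a small Python sketch of the general idea. It is emphatically not IBM’s implementation: the `ImageStore` class, the fixed 4-byte shard size and the use of SHA-256 digests to spot identical shards are all my own assumptions, chosen only to keep the example short.

```python
import hashlib

SHARD_SIZE = 4  # bytes; unrealistically small, purely so the example is readable


class ImageStore:
    """Hypothetical store that keeps each distinct shard only once."""

    def __init__(self):
        self.shards = {}     # shard digest -> shard bytes
        self.manifests = {}  # image name -> ordered list of shard digests

    def add_image(self, name: str, data: bytes) -> None:
        """Cut the image into shards and record a manifest for it."""
        manifest = []
        for start in range(0, len(data), SHARD_SIZE):
            shard = data[start:start + SHARD_SIZE]
            digest = hashlib.sha256(shard).hexdigest()
            # A shard already held for another image is not stored again.
            self.shards.setdefault(digest, shard)
            manifest.append(digest)
        self.manifests[name] = manifest

    def read_image(self, name: str) -> bytes:
        """Reassemble an image by following its manifest."""
        return b"".join(self.shards[d] for d in self.manifests[name])

    def stored_bytes(self) -> int:
        """Total space actually used by shard data."""
        return sum(len(shard) for shard in self.shards.values())


store = ImageStore()
base = b"AAAABBBBCCCCDDDD"     # 16-byte 'parent' image
derived = b"AAAABBBBXXXXDDDD"  # child image that differs in one shard

store.add_image("base", base)
store.add_image("derived", derived)
```

Two 16-byte images would naively need 32 bytes, but here `store.stored_bytes()` is 20, because only the one shard that differs from the parent is stored a second time, and `store.read_image("derived")` still returns the full child image.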

Talking about the appliance got me thinking. What are the use cases for this beast? It’s a serious piece of kit, and it has a significant cost; certainly enough that it needs justification.

The most likely short term use for it seems to be self-service access for creating new WebSphere development and test environments. The machine cannot yet be clustered (although I understand that’s coming in a future firmware release), nor can it create elastic environments that scale up and down on request, which makes it unlikely to be attractive for managing production environments right now. Delete production environment, create larger production environment doesn’t feel like a viable workaround to me.

Whether this business case makes sense will depend on the size of your organisation and how many WAS environments you create/destroy. Many organisations don’t do this that often, but that’s often for all the wrong reasons: It’s hard to create a new environment, so we piggy back on an existing one. Even so, environment build can be a major stumbling block, and if you don’t create a fresh environment for the project, you’re putting yourself at the mercy of the last bunch of cowboys who used it (unless, of course, that was you).

To help navigate this minefield, IBM have developed an ROI model for CloudBurst which can help you work out how much benefit you’re likely to get. Ultimately though, the proof of this particular pudding will be in the eating – if the business case stacks up, then fairly soon we’ll see adoption rise significantly, particularly among the large blue chip companies that make up Smart421’s customer base. When this happens, we’ll be there to help.

There are currently only two CloudBursts in the UK. We get our grubby corporate hands on one for the first time sometime in early November, and you can too. Drop us a line and we’ll put you in touch with the right people in IBM.

P.S. If you’d like to see a CloudBurst in action, check out the YouTube video.

I finally got around to watching the Google Wave developer preview video last night. I’m a great fan of any tool that helps people work better together. If you’ve not heard of Wave, or not had time to investigate, it feels to me like a hybrid of e-mail, instant messaging, Wikis and SubEthaEdit. Users can create new waves (documents/conversations/communications), make them available to others, and work on them. Wave manages to (surprisingly elegantly) bridge the gap between e-mail, instant messaging and wikis. When you edit a wave, the other person can see your changes as you make them, one character at a time. On the other hand, if they aren’t online, the next time they come back online, they’ll see your wave waiting for them. This is pretty difficult to describe, but beautiful to watch, and it scales. Watch the video to see what I mean, but suffice to say something which starts off feeling like an e-mail can transparently become a discussion and the reverse is just as true.

There’s no doubt in my mind that the technology involved is amazing, but from my perspective, the most interesting thing about the video is that it makes the scale of Google’s ambition clear. Google are pretty openly hinting that this thing could become a rival to, or even replace, e-mail, IM, Wikis and a whole bunch of other collaboration approaches with a single unified solution. Read that sentence again. A replacement for e-mail; a protocol and metaphor for communication that’s been around in more or less its present form since 1982. That’s 27 years ago, and 7 years before Tim Berners-Lee wrote his first proposal outlining the workings of the World Wide Web. Google are either seriously confident, or seriously arrogant. Or both.

But. They might just succeed. Unlike many other Web 2.0 services such as Twitter, Google are (at least outwardly) trying hard to ensure that Wave doesn’t become a walled garden. Even services such as Google Sites, which offer integration with the outside world using standard protocols (in the case of sites through HTML linking and RSS) don’t provide the same level of integration seen in the standardised protocols that support e-mail, IRC and other ‘old school’ services.

So, what makes Wave different? Google have built, and more importantly released to the public a protocol that allows any old Tom, Dick and Harry to create and implement a Wave server. Moreover, because the protocol is not trivial, Google have open sourced reference implementations of the protocol, and in the video suggest that they’re intending to open source the majority of the code-base of Google Wave itself so that competitors can download, tweak and run their own competing Wave services. These services will all federate, and make the experience broadly seamless regardless of which provider you choose to use. Like E-mail, USENET and IRC, information is only sent to the servers supporting users actively involved in the wave, opening the possibility of the (perhaps justifiably) paranoid running their own organisational Wave servers to ensure that content only leaves the corporate network when it is actively shared with a third party. This approach potentially eliminates a major barrier to adoption in the commercial world. Lastly, Wave provides support for Robots (intelligent agents) that can accomplish a multitude of tasks. Google demonstrated Robots that did things like integrating with Google’s blogger service and it seems clear this technology could be extended to support integration with existing communication mechanisms, and in particular the big threat: e-mail.

How this all pans out remains to be seen. Google are not an academic organisation, and they must deliver value for their shareholders, but it’s fair to say that they have a history of taking relatively large risks by taking on large scale projects with no obvious revenue model that would scare your average VC witless. Despite this, they’re still here, and still profitable. I think it’s reasonable to say that there’s an excellent chance that Wave the product will be a success. I’m much more sceptical about Wave the global infrastructure, due in part to the complexity of the technology and consequent barriers to entry for competitors, but mainly due to something much more human: Inertia.

Regardless of the success of the Wave platform, the debate Wave is likely to stimulate can only be a good thing. The Wave preview opens its doors on September 30 2009 to the next 100,000 users. I have my fingers crossed.

In a recent post, David Linthicum asks “Can SOA governance technology be distracting?”. His answer is yes, and he offers the following sound advice:

First, only purchase SOA governance technology, if it’s indeed needed, after you have a complete semantic-, service-, and process-level understanding of the problem domain. Never before.

Amen to that. In my opinion, for all but the most mature and involved environments, the procurement of an SOA governance platform should be well down the list of priorities. I’d add to David’s list of things that need to be ‘worked out’ before you get that cheque book out:

  • What is your vision for governance itself? Do you want to adopt an ‘iron fist’ or ‘hand in glove’ approach? Is your registry going to be a mechanism for governing or a side effect of it?
  • Who’s going to populate it? Have you got your analysis, design and development processes sufficiently honed that your repository isn’t going to turn into a dumping ground of candidate services?
  • Have you actually got any services live yet? Governance is a whole lifecycle thing. Until you’ve worked out how you’re going to deploy and manage services in the production environment and demonstrated that this works, how do you know what capabilities your governance platform needs to offer?
  • Most importantly: What are the use cases for your governance platform? Can you demonstrate that these use cases can’t be addressed using your existing tooling (even if that’s Microsoft Excel)? Be honest with yourself about when you’re likely to implement these use cases. If the answer is more than a year away, then for the time being, you might be wise to forget them. There is little point in spending good money on runtime governance or automated deployment technology when in a year’s time you’ll be able to get more for less.

A lot of projects using SOA governance tools at the moment treat them as glorified databases. If that’s where you’re at, consider using something less specialised that allows you to evolve your ideas, understanding and schema before you commit to something that will make this innovation harder and more time consuming. When you’ve spent six to twelve months getting your ducks in a row, so to speak, you’ll be in a much better place to make decisions.

I’d really welcome stories from people about how they’ve implemented governance platforms in the past, whether they’re informal (e.g. Wikis, bugtrackers, spreadsheets) or formal (e.g. IBM WebSphere Registry and Repository, CentraSite from Software AG): What did you implement? What worked? What didn’t? What would you do differently next time?
