My current client has posed an interesting challenge: they have a service-oriented architecture (SOA) and want to leverage it in some of their batch activities. Now, while SOA and batch may not jump out as obvious bedfellows, it doesn’t strike me as unreasonable to want to take advantage of an existing SOA investment. How do you process hundreds of thousands of records whilst reusing those lovely enterprise services you’ve spent so much cash creating – and do it in a performant way?

As you might expect, some reasonable concerns have begun to surface. But my client is currently taking a mature approach to the problem: while some of the team think there may be a problem, we don’t have numbers to prove it. And so, we’re going to get the numbers and let that inform our response.

So the point of my post is not to say what we found out, but rather to explore the intellectual space. After all, while SOA isn’t perhaps a natural candidate for batch, I must admit to being a little disappointed by the alternatives. How can you do SOA batch without dropping SOA? Well, I think there are some options. (Which I’ll cover in just a second, below.)

Having been through some of them, I feel happiness mixed with a twinge of disappointment. On the positive side, there are a few options that offer benefit, with varying degrees of cost. On the negative side, there’s no switch that’s going to make SOA performant without investment. In some ways, perhaps I’m asking SOA to address a use case it really isn’t intended for, but I’m not so defeatist: clearly SOA batch is not out of the question. It’ll be interesting to see how this area develops – if enterprises are to extract their maximum value from SOA, then batch is a unique use case that cannot be avoided.

  1. Pare down the per-record process to a bare minimum. Rather than calling a single heavyweight service to do everything, perhaps part of the work can be carried out per-record, leaving the rest to be dealt with in the background. One could perhaps even take this to the extent of only performing validation in the per-record loop – a read-only SOA validation service isn’t completely out of the question. Validation is a necessary part of the implementation in any case.
  2. Make the services themselves a bit more batch oriented. Make the services accept 1..* records to work on, and supply them, perhaps 100 at a time. This really cuts down the total round-trip overhead, since 100 records per call means roughly a hundredth of the calls, at the expense of necessitating a bit of forethought in service design. But it’s an easy pattern to understand, and potentially one that could be retrofitted to an existing service layer if the ESB can be moved close enough to reduce the round-trip overhead, or the implementation and interfaces changed slightly.
  3. Have a two-stage process that validates the input before processing the content, and decouple the two stages. The idea would be to perform a quick first pass (perhaps even not leveraging SOA at all), and then load the known valid data into SOA in the background. Ideally, the validation step catches enough problems that the remainder that fail at run time are manageable operationally.
  4. Stick with SOA, but go for less heavyweight components. For example – in our case, we are using a BPEL engine to do the load and orchestration, but that could be switched out for an ESB-only orchestration. A bit more fiddly, but doable.
  5. Sometimes, things can be done in different places. (In our case, actually they can’t, but I’ve seen this enough times to mention it.) For example, if part of the job is aimed at ensuring data wasn’t corrupted or truncated in transit, there are approaches to dealing with this at the network or transport layer that mean the service layer can be freed from such a menial task to do the heavier lifting.
  6. Process things in parallel, and leverage the spare capacity in your system. So, this only applies if there is capacity that can be used. But if you have it, then perhaps more of it can be dedicated to the batch processing at certain times (overnight or in quiet times). This can require some deep reconfiguration of the platform, perhaps to leverage multiple queues with differently performant configurations, but it is only configuration.
  7. Partition your environment, so that no matter how much you throw at batch, the rest of the system remains responsive and available. This is more of an environmental deployment approach, but if you can do it, it’s another option that doesn’t require re-development.
  8. Make your services batch oriented, but also take advantage of SOAP with Attachments and stream your data. Not something that can be done without effort. But if your payload has a few hundred thousand records in it, and you can avoid the overhead of a request/reply for each record, the saving could be significant. However, I don’t know of many tools that could take advantage of this without some clever implementation.
  9. In some situations, it might be possible to redeploy components so that they are co-located. It is clearly not always going to be possible. But if it is, and if the overhead associated with the across-the-network trip is a significant contributor to the problem, then this could really help.
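
To make a few of these options concrete, here is a minimal sketch of how options 2, 3 and 6 might combine on the client side: a cheap local validation pass, then the surviving records grouped into batches of 100, then the batches dispatched in parallel. Everything named here (`chunked`, `split_valid`, `run_batch`, and the `is_valid` and `call_service` callables) is hypothetical and invented for illustration; `call_service` stands in for whatever batch-oriented service operation you actually expose, and none of this reflects my client's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Record = Any  # hypothetical: whatever shape a single input record takes


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records (option 2)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def split_valid(
    records: Iterable[Record], is_valid: Callable[[Record], bool]
) -> Tuple[List[Record], List[Record]]:
    """Cheap first pass that needs no service call (option 3)."""
    valid: List[Record] = []
    rejected: List[Record] = []
    for record in records:
        (valid if is_valid(record) else rejected).append(record)
    return valid, rejected


def run_batch(
    records: Iterable[Record],
    is_valid: Callable[[Record], bool],
    call_service: Callable[[List[Record]], Any],
    batch_size: int = 100,
    workers: int = 4,
) -> Tuple[List[Any], List[Record]]:
    """Validate locally, then send batches to the service in parallel.

    `call_service` is an assumed stand-in for one request to a service
    that accepts 1..* records. Results come back in batch order.
    """
    valid, rejected = split_valid(records, is_valid)
    batches = list(chunked(valid, batch_size))
    # Option 6: only worthwhile if the platform has spare capacity.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(call_service, batches))
    return results, rejected
```

With a batch size of 100, a file of 1,000 valid records costs 10 service calls rather than 1,000. The worker count is the knob that option 6 is really about: it should reflect measured spare capacity, not optimism.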

And if I find out what the answer is, I’ll come back and let you know!

An Agile climber

I managed to make time to attend a talk on Agile Development and it was a pleasant surprise to hear an Agile practitioner speaking from first-hand experience, advising us to tread carefully when implementing Agile (perhaps I paraphrase a little aggressively) [see http://eastanglia.bcs.org/, "Agile Development. What must go right, what can go wrong (and what you can do about it)" by Giovanni Asproni]. The presenter was not so much suggesting any difficulty with Agile, per se, but cautioning, rather, that making any change to the way people work can be challenging, and it is something that requires careful consideration. Consideration, not just of the change itself and the team that will work with it, but also of how it fits into the wider business context.

So – introduce stand-ups, user stories and continuous integration after careful consideration (there are many more practices, of course); don’t introduce them all at once; and don’t introduce them just because an Agile Coach says you have to. (I may have made up the last one.)

This is great advice: no matter how important software delivery might be in a business, it is rarely the only part. It resonates with me, and perhaps others of my ilk. I like the idea of being able to be Agile, without having to – checkbox-style – implement a prescribed set of practices, yet having the flexibility to implement those that make sense.

Yet on reflection, it leaves me feeling a little lost. I’ve found that Agile works when driving a pure software delivery: the tight feedback loop Agile offers is an incredible sight to behold. (At least, it is for compsci grads like me who grew up with university professors teaching the latest that Waterfall technology has to offer.)

But scaling out beyond the development phase of a project to encompass analysis, design, integration, etc., I have found Agile to be a very challenging proposition. For sure, it “works”. But it can be hard – very hard – work. In an environment where your software makes up just one system in the picture, where other system changes are not using Agile, and where your business representative perhaps sits on the other side of a contract and customer project management team, the need to consider the context the project operates in goes without saying. But more guidance is desperately needed: I wonder if checkbox-style templates are actually called for.

How does Agile “butt up” against those traditional aspects of project delivery such as requirements capture, integration and acceptance testing? What does Agile have to say about subcontracted deliverables, and how should Agile be used effectively in a bid scenario? What aspects are compatible, and what are not? Does, or should, Agile ever form part of a larger Waterfall-style (or PRINCE2, or …) project? These are questions that take Agile far beyond pure software delivery. But Agile is being adopted by many a large corporation, and not just for software development: whether it is ideal or not, those checkbox-style prescriptive templates will come out.

So how does Agile scale out? I don’t think we know. Many of us will get tied up in the checklist bureaucracy, and some of us will get tied up inventing it. As an Agile community, we need to start talking more about how Agile interacts with the world outside, and what we want those checklists to look like. If we don’t talk about it, those that “just use” Agile might reasonably expect we’ve talked about it and solved it. Unless I’m mistaken, that is not yet the case.
