My current client has posed an interesting challenge: they have an SOA architecture and want to leverage it in some of their batch activities. Now, while SOA and batch may not jump out as obvious bedfellows, it doesn’t strike me as unreasonable to want to take advantage of an existing SOA investment. How do you process hundreds of thousands of records whilst reusing those lovely enterprise services you’ve spent so much cash creating – and do it in a performant way?
As you might expect, some reasonable concerns have begun to surface. But my client is currently taking a mature approach to the problem: while some of the team think there may be a problem, we don’t have the numbers to prove it. And so, we’re going to get the numbers and let them inform our response.
So the point of my post is not to say what we found out, but rather to explore the intellectual space. After all, while SOA isn’t perhaps a natural candidate for batch, I must admit to being a little disappointed by the alternatives. How can you do SOA batch without dropping SOA? Well, I think there are some options, which I’ll cover below.
Having been through some of them, I feel a mixture of happiness and a twinge of disappointment. On the positive side, there are a few options that offer benefit, with varying degrees of cost. On the negative side, there’s no switch that’s going to make SOA performant without investment. In some ways, perhaps I’m asking SOA to address a use case it really isn’t intended for, but I’m not so defeatist – clearly SOA batch is not out of the question. It’ll be interesting to see how this area develops – if enterprises are to extract maximum value from SOA, then batch is a use case that cannot be avoided.
- Pare the per-record process down to a bare minimum. Rather than calling a single heavyweight service to do everything, perhaps part of the work can be carried out per record, leaving the rest to be dealt with in the background. One could even take this to the extent of only performing validation in the per-record loop – a read-only SOA validation service isn’t completely out of the question, and validation is a necessary part of the implementation in any case.
- Make the services themselves a bit more batch-oriented. Make them accept 1..* records to work on, and supply them perhaps 100 at a time. This really cuts down the round-trip time, at the cost of a little forethought in service design. But it’s an easy pattern to understand, and potentially one that could be retrofitted to an existing service layer if the ESB can be moved close enough to reduce the round-trip overhead, or the implementation and interfaces changed slightly.
- Have a two-stage process that validates the input prior to processing the content, decoupling the two. The idea would be to perform a quick first pass (perhaps not leveraging SOA at all), and then load the known-valid data into SOA in the background. Ideally, the validation step catches enough problems up front to make the remainder that fail at run time a manageable problem to deal with operationally.
- Stick with SOA, but go for less heavyweight components. For example, in our case we are using a BPEL engine to do the load and orchestration, but that could be switched out for an ESB-only orchestration. A bit more fiddly, but doable.
- Sometimes, things can be done in different places. (In our case, they actually can’t, but I’ve seen this often enough to mention it.) For example, if part of the job is aimed at ensuring data wasn’t corrupted or truncated in transit, there are approaches at the network or transport layer that free the service layer from such a menial task to do the heavier lifting.
- Process things in parallel, and leverage the spare capacity in your system. This only applies, of course, if there is capacity to be used. But if you have it, then perhaps more of it can be dedicated to batch processing at certain times (overnight or in quiet periods). This can require some deep reconfiguration of the platform, perhaps to leverage multiple queues with different performance characteristics, but it is only configuration.
- Partition your environment, so that no matter how much you throw at batch, the rest of the system remains responsive and available. This is more of an environmental deployment approach, but if you can do it, it’s another option that doesn’t require re-development.
- Make your services batch oriented, but also take advantage of SOAP with Attachments and stream your data. Not something that can be done without effort. But if your payload has a few hundred thousand records in it, and you can avoid the overhead of a request/reply for each record, the saving could be significant. However, I don’t know of many tools that could take advantage of this without some clever implementation.
- In some situations, it might be possible to redeploy components so that they are co-located. This is clearly not always going to be possible. But if it is, and if the overhead of the across-the-network trip is a significant contributor to the problem, then this could really help.
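To make the batch-oriented service idea above a little more concrete, here’s a minimal sketch of the client side. Everything here is hypothetical – `process_batch` stands in for whatever enterprise service you’d actually call – but it shows the shape of the trade: one round trip per chunk of 100 rather than one per record.

```python
def chunked(records, size=100):
    """Yield successive chunks of up to `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def process_batch(batch):
    # Stand-in for a single service call that handles the whole batch.
    # In reality this would be one SOAP/REST invocation over the ESB.
    return [{"id": r["id"], "status": "ok"} for r in batch]

def run_job(records, batch_size=100):
    """Process all records, counting how many service calls were made."""
    results, calls = [], 0
    for batch in chunked(records, batch_size):
        results.extend(process_batch(batch))
        calls += 1
    return results, calls
```

For, say, 250 records at a batch size of 100, this makes 3 service calls instead of 250 – which is the whole point of the pattern.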
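The two-stage idea can be sketched just as briefly. The validation rule below is invented purely for illustration (a record must carry a positive integer `amount`); the point is only the split – a cheap local first pass with no SOA round trip, after which only the known-valid set goes on to the (background) SOA load.

```python
def validate(record):
    # Cheap, local first pass -- no SOA round trip involved.
    # The rule itself is a made-up example.
    return isinstance(record.get("amount"), int) and record["amount"] > 0

def split_valid(records):
    """Partition records into (valid, rejected) in a single pass."""
    valid, rejected = [], []
    for r in records:
        (valid if validate(r) else rejected).append(r)
    return valid, rejected
```

The rejected pile becomes an operational report up front, and only `valid` is handed to the heavyweight processing stage.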
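On the parallelism option: if the service calls are blocking I/O, even a simple worker pool on the client side can soak up spare capacity without touching the services themselves. A sketch, with `call_service` again standing in for the real blocking SOA call:

```python
from concurrent.futures import ThreadPoolExecutor

def call_service(batch):
    # Stand-in for a blocking SOA call; here it just reports batch size.
    return len(batch)

def run_parallel(batches, workers=4):
    """Issue batch calls concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_service, batches))
```

Dialling `workers` up overnight and down during the day is exactly the “it’s only configuration” lever mentioned above – though the service side must of course have the headroom to absorb the concurrency.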
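Finally, on streaming a large payload (the SOAP with Attachments option): the essential trick, whatever the transport, is to consume the record stream incrementally and carve it into batches without ever materialising the whole payload in memory. A transport-agnostic sketch:

```python
from itertools import islice

def batches_from_stream(stream, size=100):
    """Yield lists of up to `size` items from any iterable/stream,
    without loading the whole stream into memory first."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Fed by a generator that parses records off the attachment as they arrive, this pairs naturally with the batch-oriented service interface: a few hundred thousand records become a few thousand service calls, at near-constant memory.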
And if I find out what the answer is, I’ll come back and let you know!