I have had discussions recently with various colleagues and clients about how business-level transactions can be implemented reliably while still supporting cancellation of the actions already performed if and when something goes wrong in the process.
At the technology level, XA-compliant transaction managers and two-phase commit provide much of what is required. However, those facilities are not always available, and even when present they only cover a particular set of steps or actions in an overall process.
It is important to take a higher-level view of the operations being performed by the associated business process (not only BPEL and BPM, but the real business process, involving people and actions).
The need to unwind or roll back an operation, or a sequence of operations, is not just a matter of technical updates in a database; it also has to take account of the implications for the business.
Where a plain rollback is not possible, there is a need to perform some form of ‘compensating transaction’ to correct the erroneous actions. This can be quite complex, depending on which update(s) went first, what auditing is required, how the updated data might already have been used, and what actions should follow the ‘corrective action’. In an SOA environment, this logic should also be applied to the definition and implementation of services: if an ‘update’ service is provided, consider whether there is a need to provide a related ‘revert update’ service, with the original update recording the information needed to revert it.
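As a rough illustration of pairing an ‘update’ service with a ‘revert update’ service, here is a minimal Python sketch. Everything in it is hypothetical (the `AddressService` class, its method names and the in-memory storage are my assumptions, not part of any product); the point is only that the update has to record enough information for a later revert, and that the revert should refuse to run when the data has moved on in the meantime.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UpdateRecord:
    """Information captured at update time so the change can be reverted later."""
    update_id: int
    customer_id: str
    old_value: Optional[str]  # None means there was no value before the update
    new_value: str
    reverted: bool = False


@dataclass
class AddressService:
    """Hypothetical service exposing an update and a matching revert operation."""
    addresses: Dict[str, str] = field(default_factory=dict)
    history: List[UpdateRecord] = field(default_factory=list)

    def update_address(self, customer_id: str, new_value: str) -> int:
        # Record the previous value before overwriting it.
        record = UpdateRecord(
            update_id=len(self.history) + 1,
            customer_id=customer_id,
            old_value=self.addresses.get(customer_id),
            new_value=new_value,
        )
        self.history.append(record)
        self.addresses[customer_id] = new_value
        return record.update_id

    def revert_update(self, update_id: int) -> None:
        if not 1 <= update_id <= len(self.history):
            raise KeyError(f"unknown update {update_id}")
        record = self.history[update_id - 1]
        if record.reverted:
            raise ValueError(f"update {update_id} has already been reverted")
        # Refuse to revert blindly if the data has changed since this update.
        if self.addresses.get(record.customer_id) != record.new_value:
            raise ValueError("address changed since this update; manual correction needed")
        if record.old_value is None:
            del self.addresses[record.customer_id]
        else:
            self.addresses[record.customer_id] = record.old_value
        # The history entry is kept and flagged, not deleted, for audit purposes.
        record.reverted = True
```

A real service would also need to decide what happens to anything downstream that has already consumed the updated value, which is exactly the business-level question raised above.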
An architectural principle may be defined to state that application updates should not require compensating transactions, but in reality there will often be a need for some form of corrective update and a change in the processing that follows.
Take care not to reverse the original transaction simply by overwriting or deleting the data it changed. In accountancy, a correction is usually recorded as a balancing transaction that negates the original entry, rather than by removing that older entry. Applications should correct data in the same way, making it easier to identify the sequence of events in the overall business process.
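The accountancy approach can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the `Ledger` and `Entry` names and the in-memory list are hypothetical): entries are never removed, and a correction is posted as a new entry that negates the original and points back to it.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    entry_id: int
    account: str
    amount: Decimal
    reverses: Optional[int] = None  # id of the entry this one negates, if any


class Ledger:
    """Append-only ledger: corrections are new entries, never deletions."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def post(self, account: str, amount: Decimal,
             reverses: Optional[int] = None) -> Entry:
        entry = Entry(len(self.entries) + 1, account, amount, reverses)
        self.entries.append(entry)
        return entry

    def reverse(self, entry_id: int) -> Entry:
        if not 1 <= entry_id <= len(self.entries):
            raise KeyError(f"unknown entry {entry_id}")
        if any(e.reverses == entry_id for e in self.entries):
            raise ValueError(f"entry {entry_id} has already been reversed")
        original = self.entries[entry_id - 1]
        # Post a balancing entry; the original stays in place for the audit trail.
        return self.post(original.account, -original.amount, reverses=entry_id)

    def balance(self, account: str) -> Decimal:
        return sum((e.amount for e in self.entries if e.account == account),
                   Decimal("0"))
```

After a reversal the balance is back where it started, but both the original entry and its correction remain visible, so the sequence of events can still be reconstructed.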
Looking again at the technical aspects, the implementation of multi-stage transactions begins with transaction handlers, which normally exist within the application framework or container (or the operating system itself). As more and more steps occur within a process before the point at which a failure may arise, additional error-handling routes are required to unwind the corresponding transactions and notifications.
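To make the unwinding concrete, here is a minimal hand-rolled sketch in Python. It is not how any particular container or BPM product does it; the `run_with_compensation` function and the shape of a step (a name, an action and a compensating action) are assumptions for illustration. Each step that completes is remembered, and when a later step fails the completed steps are compensated in reverse order.

```python
from typing import Callable, List, Tuple

# A step is (name, action, compensating action); all three are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns a log of what happened, so the sequence of events can be audited.
    """
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            # Unwind only what actually completed, most recent first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated: {done_name}")
            return log
        completed.append((name, compensate))
        log.append(f"done: {name}")
    return log
```

Even this small sketch leaves open what to do if a compensating action itself fails, and every new step means another pair of functions to keep in line with the process.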
This may be implemented using specific code in an application, but that can bring problems of its own in complexity and reliability, and maintenance becomes more difficult should any of the steps change. We have looked at how this error handling is implemented by the BPM tooling of WebSphere Process Server, and it does provide a clean way of applying compensating transactions based on the logic that ran before the error was identified. That seems to be a great benefit for complex business processes automated in this way, but it isn’t exactly cheap if considered just as a way to wrap up error handling.
Anyway, these discussions have come up with various thoughts and recommendations on how best to solve this in different situations. The most important thing to note is that analysis and design of business processes MUST address the approach taken to transaction handling in error conditions.