Using error topics in Camel to allow bad messages to escape

Mon, Jul 16, 2012

In the first version of Matrix Provisioning, the modules that handle the local systems, such as Active Directory and Blackboard, were responsible for dealing with messages they couldn’t process. For example, if a message contains an invalid location code for an account, that account can’t be provisioned in Active Directory because the module doesn’t know where to create it in the tree. It’s a classic example of an error that’s not going to go away: it needs manual intervention to sort out the bad location code, either by manually changing it in the message or by deleting the message and issuing a new one with the correct location code.

In this case the module would persist the message to a database and email someone to come and sort it out, but I’d rather keep the modules as simple and focussed as possible, more along the lines of the Single Responsibility Principle (although it’s quite a broad responsibility, in that a module creates, updates and deletes accounts in Active Directory based on messages coming from an ActiveMQ broker). What I’d rather do is allow the module to say to the broker ‘you’ve sent me a bum message, do something about it’ and then forget about it. So I had a read of Enterprise Integration Patterns and decided to use the Invalid Message Channel (IMC). The flow is summarised in the diagram.

Matrix Provisioning with error topics

It’s a steady flow of messages from the broker to the modules, with ‘eddy currents’ trapped in the error topics, which provide the IMC functionality. So what happens now is that when a module fails to process a message a set number of times, the message is parked and someone is notified by email. Initially the message goes from the broker to the module. If the module can’t process it, the module sends the message to its error topic, where Camel adds a retryCount header, delays the message and then sends it back to the module’s main message topic. There it comes in again as if it were a new message, but with a retryCount header. If it fails again it goes round the loop again and has its retryCount header incremented. When it comes into the error topic with a retryCount value that breaches a set limit, Camel punts it off to the module’s IMC and someone is notified to go and fix the message.

The Camel config for the error looping is:

<route>
  <from uri="activemq:topic:activedirectoryerror"/>
  <transacted />

  <filter>
    <simple>${header.retryCount} == null</simple>
    <setHeader headerName="retryCount">
      <simple>0</simple>
    </setHeader>
  </filter>

  <choice>
    <when>
      <simple>${header.retryCount} > 10</simple>
      <to uri="activemq:topic:activedirectoryimc"/>
    </when>
    <otherwise>
      <delay><constant>1000</constant></delay>
      <setHeader headerName="retryCount">
        <simple>${header.retryCount}++</simple>
      </setHeader>
      <to uri="activemq:topic:activedirectory"/>
      <stop/>
    </otherwise>
  </choice>
</route>
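To get a feel for how long a message survives in this loop, here is a rough plain-Java sketch of the route’s decision logic. It doesn’t use Camel at all, and the class and method names are made up for illustration. It assumes the module passes the retryCount header through unchanged when it sends a failed message back to the error topic, and it ignores broker and processing time.

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorTopicSketch {
    // Values copied from the route: the "> 10" check and the 1000 ms delay.
    public static final int RETRY_LIMIT = 10;
    public static final long DELAY_MILLIS = 1000;

    public enum Destination { MAIN_TOPIC, IMC }

    /** Mirrors one pass through the error-topic route, mutating the headers as the route would. */
    public static Destination routeFromErrorTopic(Map<String, Object> headers) {
        // <filter>: a message without a retryCount header starts at 0
        headers.putIfAbsent("retryCount", 0);
        int retryCount = (Integer) headers.get("retryCount");

        // <when>: over the limit, so park the message in the IMC
        if (retryCount > RETRY_LIMIT) {
            return Destination.IMC;
        }

        // <otherwise>: (after the delay) increment and send back to the main topic
        headers.put("retryCount", retryCount + 1);
        return Destination.MAIN_TOPIC;
    }

    /** Counts redeliveries for a message the module never manages to process. */
    public static int redeliveriesBeforeImc() {
        Map<String, Object> headers = new HashMap<>();
        int redeliveries = 0;
        while (routeFromErrorTopic(headers) == Destination.MAIN_TOPIC) {
            redeliveries++; // assumed: the module fails again and re-sends with the same headers
        }
        return redeliveries;
    }

    public static void main(String[] args) {
        int redeliveries = redeliveriesBeforeImc();
        long secondsInLoop = redeliveries * DELAY_MILLIS / 1000;
        System.out.println(redeliveries + " redeliveries, ~" + secondsInLoop + "s in the loop");
    }
}
```

Under those assumptions a message is redelivered 11 times, so it spends roughly 11 seconds in the loop before it lands in the IMC. That is why the delay and the limit need to match the sort of outage you expect.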

So when the target system is offline, the error topics provide lounges where messages can relax at their leisure while it remains down. When it comes back, the messages automatically flow out of the error topics and are processed as normal. Of course this relies on the delay and retryCount values being tuned for the expected downtime; otherwise, if the target system is down for an hour or so, all the messages could end up in the IMC. That shouldn’t really be a problem, though, as I’m working on helper scripts that trawl the IMC looking for messages to replay. I can do this because the modules set Matrix-specific headers on the messages when they send them to the error topics. These headers contain information about the error and quick-access details about the failed account. So it should be possible to pull all the ‘connection refused’ messages, for example, out of the IMC and send them back to the main topic with their retryCount header removed.
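The selection step of such a helper script might look something like the sketch below. It’s plain Java working on in-memory stand-ins, not the real scripts, and it doesn’t talk to ActiveMQ. The header name matrixError and the ParkedMessage class are hypothetical, since the actual Matrix-specific header names aren’t given here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ImcReplaySketch {
    // Hypothetical header name, used only for illustration.
    public static final String ERROR_HEADER = "matrixError";

    /** Minimal stand-in for a message parked in the IMC. */
    public static final class ParkedMessage {
        public final Map<String, String> headers;
        public final String body;

        public ParkedMessage(Map<String, String> headers, String body) {
            this.headers = new HashMap<>(headers); // defensive copy
            this.body = body;
        }
    }

    /** Returns copies of the messages whose error header mentions the given text, minus retryCount. */
    public static List<ParkedMessage> selectForReplay(List<ParkedMessage> imc, String errorFragment) {
        String wanted = errorFragment.toLowerCase(Locale.ROOT);
        List<ParkedMessage> replay = new ArrayList<>();
        for (ParkedMessage message : imc) {
            String error = message.headers.getOrDefault(ERROR_HEADER, "");
            if (error.toLowerCase(Locale.ROOT).contains(wanted)) {
                Map<String, String> headers = new HashMap<>(message.headers);
                headers.remove("retryCount"); // so the retry loop starts from scratch
                replay.add(new ParkedMessage(headers, message.body));
            }
        }
        return replay;
    }

    public static void main(String[] args) {
        List<ParkedMessage> imc = new ArrayList<>();
        imc.add(new ParkedMessage(
                Map.of(ERROR_HEADER, "Connection refused", "retryCount", "11"), "account A"));
        imc.add(new ParkedMessage(
                Map.of(ERROR_HEADER, "Invalid location code", "retryCount", "11"), "account B"));

        for (ParkedMessage message : selectForReplay(imc, "connection refused")) {
            System.out.println("replay: " + message.body + " " + message.headers);
        }
    }
}
```

Only the selected copies lose their retryCount header. The messages still sitting in the IMC are left untouched, so a failed replay run doesn’t lose anything.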
