on the philosophy of logging
Mon, Oct 17, 2011
When your system is out in the wild, more often than not logging is the only way you can tell what’s going on if things go a bit wonky. Logging is your app’s way of saying something went wrong and why. Although I use the word “why”, the semantics aren’t that simple but the logs are the first stop in tracing the lineage of a fault. So it’s important to get it right and after seeing many logging patterns, both good and bad, my thoughts on the subject might be of some help to folk new to application logging.
The first question to ask yourself is, who’s doing the logging? Is it the caller or the callee? A common problem I see is what I’ll call Dispersed Logging. This is where the caller and the callee each have their own log files and tracking down a problem involves analysing both files and trying to sync dates and times, especially if the calling is done over JMS.
In the above example, at some point in the workflow of AClass, it needs to call a helper function in the Utils class which has the method signature:
i.e. the method will do something and return a value that if greater than zero means it worked. Otherwise it’s going to return -1 and log the error in its own log file. At the same time, AClass is going to log the same error in its log file, so you end up with redundant log entries, with differing styles that may be out of sync with each other. If you want to know why the call to doit() failed you have two log files to look at. Obviously, UtilsLog is going to have the detail of the error while all AClass can say is, it failed with -1.
A better approach is to change the Utils method signature to let the caller handle the error and log it itself:
In the way of the unix, we don’t care what the return value is as long as it worked. If it didn’t work we want to know why and an error code isn’t going to cut the mustard. We want an Exception. This pattern I call Non Contextual Logging as in all probability it dumps the exception and moves on. I’ve seen stacktraces simply dumped to stdout with no explanation of why they’re being dumped. There is no context to the logging information. What was AClass trying to do when the stacktrace was dunped? This isn’t helped by systems that throw exceptions for no good reason, such as Blackboard. If you search for a user using the Blackboard API it will throw an exception if the user isn’t found. That’s not an exceptional situation. It’s a normal one. You can absolutely expect a user to not be in the system so why does Blackboard treat it as an exceptional situation and throw an Exception? I can’t answer that but it means the code for creating accounts in Blackboard looks very defensive indeed, having to catch exceptions that need to be eaten rather than passed up the chain. But that’s a different problem of design. If you design Utils.doit to only throw a UtilsException if something exceptional occurres then AClass can deal with that using contextual logging, such as shown below.
Utils no longer logs and AClass performs its work and if it catches a UtilsException it logs the error in a contextualised manner. What I mean by that is, AClass was doing something when the exception was thrown, so it says what it was doing in the log:
Everything required to track a fault is on one line in the log file. You definitely don’t want to be doing this:
This is what I call Interleaved Logging. If the code above is a JMS message handler, or indeed any form of reentrant code, you’re going to get different threads calling the code at the same time and the log lines will get mixed up. You’ll end up thinking you’ve created an account when you haven’t and matching up log lines with thread accesses will be nigh on impossible. Keep your log lines contexualised and single. Cram as much info into each one as possible in order to provide as much context as possible. This makes it much easier to track down faults.
I mentioned eating Blackboard exceptions rather than throwing them back up the chain. So when should you eat exceptions? Are they good for your digital digestion? Well it depends on what the exception is and what you’re trying to do. Let’s take the Blackboard example a bit further:
This is far too defensive code. It has to eat the first exception in the context of looking for an account in order to create it if it doesn’t exist. But to do so it has to eat the resulting exception. If the BB system simply blows up instead, the outer try/catch logs and throws instead. Very messy. Note the logging in the catch block. In this context, AClass is the authority for creating accounts so all logging in that contextual domain is delegated to it. Anything to do with account creation detail is logged by AClass. In MVC, we would expect AClass to be the Model and we would go there for specific logging details if things go wrong.
So to summarise, when you’re logging, keep it contextual and keep the details of each fault contained in its own log line. You’ll be grateful you did when you’re tracking down those faults.