Jan 12, 2008 / software
Tips for logging and error notifications for unattended applications

In this post, I want to share a few tricks I've learnt when implementing logging and notifications for applications that run "in the background" without human intervention.


I find that .NET Console applications are extremely useful for solving particular kinds of problems when developing software solutions. This is mainly because they can be scripted. Another reason is that you can schedule them to run at frequent intervals using Windows Scheduled Tasks.

For example:

DownloadNewDataByFTP.exe server= user=test pass=xxx

This little Windows script executes a console application with a few parameters; the application in turn knows how to contact a server and retrieve certain files, making sure I don't have them already.

To add scheduling, I can then create a Windows scheduled task to run this cmd file every minute, hour or day.

Tip: .NET Console applications combined with Windows Scheduled Tasks is a good solution for running frequent processes without requiring human intervention.
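
To make that concrete, the scheduled task itself can be created from the command line with schtasks (the task name, path and schedule below are just illustrative):

```bat
schtasks /create /tn "DownloadNewData" /tr "C:\jobs\DownloadNewData.cmd" /sc hourly
```

The .cmd file just wraps the console application call shown above, so the schedule can be changed without touching the program.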

Usually, I'd have several programs that perform different processes (download, load, transform, post-processing etc.). I can use scripts like this in conjunction with the scheduler to organise which processes run at which times and in which order. I can set different machines to run different tasks if I want to, which allows me to make full use of the available hardware.

Tip: Write small, cohesive applications that can be put onto different machines to distribute the workload when the system needs more throughput.

So, using console apps and the scheduler, you end up with a kind of loosely coupled, scalable, distributed architecture.

I've found this is useful when building systems such as these:

  • Gathering data from remote sources and loading it into a data warehouse.
  • Frequently running intensive transformations on data sets.
  • Having multiple web spiders on multiple machines collect data from various sites.
  • Building indexes of gathered data (with Lucene, for example)
  • Generating thousands of reports on a nightly basis (for example, using several machines to create PDF reports from customer data)
  • Any heavy-duty data processing tasks that must run on a schedule

For each program, you can vary the frequency of tasks, the computers it runs on, and the configuration for each task (such as which database to use, or where to store tmp files). This gives quite a lot of flexibility.

A typical example might be a web spidering application. You could have three low-end servers dedicated to collecting documents, another machine dedicated to running the database software, one more as a replicated database slave, and another machine to build Lucene indexes and run periodic searches. All these tasks are written as console applications that can be moved around or duplicated onto new machines as needed.

Tip: Keeping programs small and focused often allows you to use many smaller, cheaper servers dedicated to a few tasks. This lets businesses keep their scaling costs down.

Another approach that I often use is to write one program that performs lots of functions, where the function is chosen by the parameters passed in. For example, here are several different uses of the same program:

SpiderAdmin.exe cmd=spider site=www.tobinharris.com
SpiderAdmin.exe cmd=build_lucene_index
SpiderAdmin.exe cmd=compress_lucene_index 
SpiderAdmin.exe cmd=export_articles_by_search keywords=".NET, Console Applications"

Tip: One big program might be more manageable to develop; just make sure you can use parameters to run specific sub-tasks.
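
As a sketch of this dispatch-by-parameter style (in Python for brevity; the same shape maps directly onto a .NET console application's Main, and the behaviour of each command here is made up):

```python
import sys

def parse_args(argv):
    """Turn ["cmd=spider", "site=example.com"] into a dict."""
    return dict(arg.split("=", 1) for arg in argv if "=" in arg)

def spider(params):
    return "spidering " + params.get("site", "?")

def build_index(params):
    return "building index"

# Map each cmd= value onto the sub-task that handles it.
COMMANDS = {
    "spider": spider,
    "build_lucene_index": build_index,
}

def run(argv):
    params = parse_args(argv)
    task = COMMANDS.get(params.get("cmd"))
    if task is None:
        return "unknown command"
    return task(params)

if __name__ == "__main__":
    print(run(sys.argv[1:]))
```

Adding a new sub-task is then just a matter of writing a function and registering it in the command table.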

The need for monitoring

Once you're using this approach, the problem of monitoring rears its head. How do we actually know if the programs are running OK?

When I use this kind of architecture, I rarely watch the programs run (except during development). After all, the idea is that the programs just get on with their job without human intervention. However, at some point we need to know what's going on with these programs. In fact, I've found that we want to know quite a few things:

  • Did the scheduled task actually run?
  • How long did the programs execute for?
  • Did they do any significant work?
  • Did any of the programs encounter errors?
  • If a program did fail, how can I look at the conditions that led to the failure?

This also raises some additional questions:

  • Who needs to see this information?
  • Where should the information be stored: in a file, a database, emails?
  • How often do people need to see the information?

Tip: Think hard about the minimum information people need to be confident that things are running smoothly.

Step 1: Logging

The solution I'm liking at the moment involves a few parts, the first being logging. The idea here is that each of your console applications verbosely records lots of information about what it's been up to. For example, a program that frequently downloads data might output this:

DEBUG: Attempting to connect to ftp://www.somewhere.com/
DEBUG: Found 5 files for download
DEBUG: Downloading file 1 of 5
WARNING: Server disconnected. Will try again in 1 minute.
DEBUG: Downloading file 1 of 5
ERROR: Server disconnected more than once, someone might need to contact the server administrator

In this example, DEBUG entries simply contain information useful for debugging, and WARNINGs are things that might lead to an error at some point. I'd usually also log other information, such as a timestamp for each event and perhaps the location in the code that logged it.

During development, I print this information out to the console screen, so I can see what progress a program makes as I run it. Eventually, I'll put the program live, and I can then have the program dump this information to a log file that I can come back and review at a later date. This is particularly handy if there are problems and you want to see what might have caused them.

Tip: Log verbosely enough so that you can examine the logs and see where things are going wrong.


I actually use a ready-built open source tool to help with my logging needs. It's called Log4Net, and it is feature-rich, flexible, mature and well documented.

Tip: Use a ready-to-run logging framework. There is a learning curve, but in return you get mature, flexible features that make logging much easier.

Log4Net has lots of nifty features that help with logging, such as the ability to automatically start new log files each day so that you can easily look up any errors for a given date.

Tip: Configure your logging tool to rotate log files based on size and date. This keeps them small and manageable, and allows you to pinpoint problems without having to wade through reams of information.
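
For example, here's a minimal log4net configuration fragment (the file name and limits are illustrative) that starts a new log file each day and also caps the size of each file:

```xml
<log4net>
  <appender name="RollingFile" type="log4net.Appender.RollingFileAppender">
    <file value="logs\app.log" />
    <appendToFile value="true" />
    <!-- Composite = roll on both date and size -->
    <rollingStyle value="Composite" />
    <datePattern value=".yyyyMMdd" />
    <maxSizeRollBackups value="10" />
    <maximumFileSize value="1MB" />
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%date %-5level %logger - %message%newline" />
    </layout>
  </appender>
  <root>
    <level value="DEBUG" />
    <appender-ref ref="RollingFile" />
  </root>
</log4net>
```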

Log4Net gives us several "categories" or "levels" that we can use when outputting log information - DEBUG, INFO, WARN, ERROR and FATAL. You can also create your own.

Here's what I'm doing with these:

  • Use DEBUG for almost everything, giving verbose logging.
  • Use INFO for things that you want the user to see (notifications).
  • Use WARN for things that could indicate a problem, which is handy for seeing the lead-up to an ERROR.
  • Use ERROR for things that actually go wrong.
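
The same level scheme exists in most logging frameworks. Here's the idea sketched with Python's standard logging module (logger name borrowed from the report later in this post): everything from DEBUG up goes to the verbose log, while a second output only sees WARNING and above:

```python
import io
import logging

log = logging.getLogger("EtlLoader.Downloader")
log.setLevel(logging.DEBUG)

# The verbose "log file" captures everything from DEBUG up.
verbose = io.StringIO()
log.addHandler(logging.StreamHandler(verbose))

# A second handler only sees WARNING and above: the lead-up to an ERROR.
alerts = io.StringIO()
alert_handler = logging.StreamHandler(alerts)
alert_handler.setLevel(logging.WARNING)
log.addHandler(alert_handler)

log.debug("Downloading file 1 of 5")
log.warning("Server disconnected. Will try again in 1 minute.")
log.error("Server disconnected more than once")
```

One logging call, several destinations filtered by level - the same trick log4net's appenders give you.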

By using logging in your unattended applications, you are systematically recording their output in one log file per day, which you can go back and inspect at any time. This answers the question "How can I look at the conditions that led to failure?". But I haven't yet tackled the other questions, such as "Did the program actually run?" or "Did it do any significant work?".

Step 2: Notifications

It's all very well having your programs write tons of verbose information to a log file so that you can go and trace the cause of a problem three days after it happened. But how do you actually learn about problems happening? One option would be to just "notice" that things aren't working right, and then go inspect the log files every day. This is pretty low-tech, and often relies on ESP! There are better solutions.

We really want the bad (or good) news to come to us, rather than going to it. So, email to the rescue...

Tip: Make sure that you make the important log information come to you (or whoever needs to see it). This might mean using dashboards, emails or MSN etc.

Log4Net can be configured to output log messages to various places, including the console, a log file, a database table, and even an email. Furthermore, you can also control which kinds of information get emailed - DEBUG, INFO, WARN or ERROR.

A simple solution for getting notifications is to configure Log4Net to send you any INFO or ERROR messages. The INFO messages let you know that things are ticking over, and the ERROR messages alert you to any problems that might need manual inspection. When receiving the latter, you might want to go and open up an actual log file to see what led to the problem.

Tip: For small solutions, it's very simple to configure logging frameworks to send ERROR notifications on-the-fly.
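
Here's roughly what that looks like with log4net's SmtpAppender (the addresses and host are placeholders). With lossy set to true and a LevelEvaluator, the appender keeps a rolling buffer of recent messages and only sends a mail, including that buffered lead-up, when an ERROR arrives:

```xml
<appender name="ErrorEmail" type="log4net.Appender.SmtpAppender">
  <to value="ops@example.com" />
  <from value="jobs@example.com" />
  <subject value="ERROR in scheduled job" />
  <smtpHost value="mail.example.com" />
  <bufferSize value="50" />
  <lossy value="true" />
  <!-- Only flush the buffer (i.e. send the email) when an ERROR arrives -->
  <evaluator type="log4net.Core.LevelEvaluator">
    <threshold value="ERROR" />
  </evaluator>
  <layout type="log4net.Layout.PatternLayout">
    <conversionPattern value="%date %-5level %logger - %message%newline" />
  </layout>
</appender>
```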

This is all fine, but what if you have 100 applications outputting millions of INFO messages a day? You really don't want these programs sending you tons of emails.

Another option would be to receive only the ERROR notifications, but then how do you know that the programs are ticking over? The next step hopefully addresses all these issues.

Tip: Decide upfront how the logging information will actually be used. This will make you write much more manageable logging code.

Step 3: Better Notifications

For many applications, it's good to know about problems and progress (ERROR and INFO) on a daily basis. So, someone could simply check a daily email and decide if they need to take further action.

A nice way of doing this is as follows.

  • Have your programs log to log files as usual. In addition, log INFO and ERROR messages to a database table. With Log4Net, you can do this by just adding an extra few lines to the config file.
  • Write a program that reads these records and sends the day's entries in an email once a day. This "emailer" program can also run as a scheduled task.
  • Job done!

All this can be done in about 10 lines of code; I used a cool utility class for simply reading a DataSet, converting it to text, and dumping it into an email. I wrote that for SqlBuddy years ago, and the class works well for small datasets (< 500 records).
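
A sketch of the emailer, in Python with an in-memory SQLite table standing in for the real database (the table and column names are invented for illustration; the original used a small .NET utility class instead):

```python
import sqlite3
from email.message import EmailMessage

def build_daily_summary(conn):
    """Group the day's log records by level and location, as in the report below."""
    rows = conn.execute(
        "SELECT level, location, COUNT(*) FROM log_entries "
        "GROUP BY level, location ORDER BY level"
    ).fetchall()
    header = "%-6s %-25s %5s" % ("Type", "Location", "Count")
    lines = ["%-6s %-25s %5d" % row for row in rows]
    return "\n".join([header] + lines)

def build_email(body):
    # Addresses are placeholders; hand the message to smtplib to actually send it.
    msg = EmailMessage()
    msg["Subject"] = "Daily job summary"
    msg["From"] = "jobs@example.com"
    msg["To"] = "ops@example.com"
    msg.set_content(body)
    return msg
```

Scheduling this to run once a day, just like the other console applications, closes the loop.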

Tip: Logging summary information into a database table makes it easy to build reports or dashboards that staff can use to quickly assess if systems are running smoothly.

This works well for applications that don't generate millions of errors and info messages a minute. I don't have experience with those, but because the data is in the database, you could go a step further and write some simple code that groups the records by type and presents a summary report, perhaps only showing details of the 50 most recent errors.

**Daily Email**

Type    Location                Count
INFO    EtlLoader.Importer       5134
INFO    EtlLoader.Transformer    3256
ERROR   EtlLoader.Downloader        2

**Top 50 Recent Error Messages**
ERROR: 2008-01-01 08:22:12. Server kicked us off!
ERROR: 2008-01-01 08:52:17. Server kicked us off!

Tip: Consider writing your logs in such a way that they can be aggregated usefully, using SQL GROUP BY statements for example.

Regardless of whether you need to aggregate the information in this way, one benefit of daily notification emails is that you get to see whether the programs are running at all! If the programs aren't doing anything, there won't be anything logged in the database table or log files, but you will still get an (empty) email!


All this stuff isn't rocket science, but if you're new to logging and notifications then hopefully it will give you a starting point.

Logging information in the first place is a good thing to do for many types of application. Using some simple techniques, you can quickly add notifications into the mix without having to write tons of code. Some of the ideas above worked particularly well with Log4Net.
