I was recently faced with the question of how to ease log analysis for our production system. We run a cluster of sorts, with applications running on four Tomcat front ends. The site in question handles requests in the millions per day, so it is no surprise that there is a flood of logs. It gets really tough to monitor logs to see, for example, the highest-occurring WARN messages when you end up logging 400 MB of log statements per hour. So I came up with a script that might help you as well if you are looking for some quick filtering of logs. We have the following log format:
{DATE} {TIMESTAMP} {LOG_LEVEL} {MESSAGE} | {LOGGING COMPONENT} {TOMCAT-HTTP-PROCESSOR}
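A line in this format might look like the following (the values here are made up purely for illustration; note that the Tomcat HTTP processor is wrapped in square brackets, which is what the script's regex expects):

```
2012-08-14 10:32:07,451 WARN Connection pool exhausted | com.example.db.ConnectionPool [http-8080-12]
```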
Here’s the script:
use strict;
use warnings;
use Getopt::Long;

GetOptions("file=s" => \my $file, "level=s" => \my $level);

# Each log field is matched lazily up to the next separator
my $w = "(.+?)";

if ($file && $level) {
    my $distilled = "distilled-$level-$file";
    open(my $in, "<", $file) or die("Could not open log file: $!");
    open(my $out, ">", $distilled) or die("Could not create filtered log file: $!");
    while (my $line = <$in>) {
        # {DATE} {TIMESTAMP} {LOG_LEVEL} {MESSAGE} | {LOGGING COMPONENT} [{TOMCAT-HTTP-PROCESSOR}]
        next unless $line =~ m/^$w $w $w $w \| $w \[$w\]/;
        my ($date, $timeStamp, $logLevel, $message, $classLocation, $httpProcessor) =
            ($1, $2, $3, $4, $5, $6);
        if ($logLevel eq $level) {
            print $out "$logLevel\t$message\t$classLocation\n";
        }
    }
    close($in);
    close($out);
} else {
    print STDOUT "You didn't select a file!\n";
}
This is how to use it:
filter_logfile.pl -file someTomcatlogFile -level logLevel
What it does is pretty simple: it matches each line against a regex pattern and checks whether the line's log level is the same as the one you provided on the command line; if so, it copies the log level, the message, and the component logging the message to a new file named
distilled-LOGLEVEL-[YOUR-INPUT-LOG-FILE-NAME].
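Once you have the distilled file, finding the most frequent messages (those highest-occurring WARN messages I mentioned) is a quick pipeline away. This is just a sketch using standard coreutils; distilled-WARN-sample.log and its contents are made up for illustration:

```shell
# Create a tiny sample "distilled" file: LEVEL<TAB>MESSAGE<TAB>COMPONENT per line
printf 'WARN\tconnection pool exhausted\tDbPool\nWARN\tconnection pool exhausted\tDbPool\nWARN\tslow response\tHttpClient\n' > distilled-WARN-sample.log

# Extract the message column, then count and rank by frequency (highest first)
cut -f2 distilled-WARN-sample.log | sort | uniq -c | sort -rn
```

Because the script writes tab-separated columns, `cut -f2` pulls out just the message text, so identical messages group together regardless of which component logged them.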
Happy Log filtering!