The following sections give a high-level overview of the design of the BuzzSaw log processing framework. The implementation is based on the design philosophy described in the introductory section of the documentation.
The entire BuzzSaw system reduces to two core tasks: importing data and generating reports. The whole system revolves around a central database in which all necessary data is stored.
All events of interest are stored in the database. The decision was made to use PostgreSQL because of its excellent feature set, reliability and scalability. It was clear from the outset that there was the potential to eventually store a very large number of log messages (and associated derived data), so scalability and speed are of particular concern.
A full description of the database schema is given elsewhere. The high-level view is that each log message of interest is recorded as an event. Associated with each event is a set of zero or more tags and zero or more pieces of extra_info. An event is split into fields representing the date/time, hostname, user, program, process ID and the full message. Tags are simple labels applied to an event (e.g. auth_failure) whereas extra information entries have both an arbitrary name and a value (e.g. source_address). For speed, many of these fields and combinations of fields are indexed to improve query times.
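To make the shape of the schema concrete, the event table and its relationships might be modelled roughly as follows using DBIx::Class (the object-relational mapper used by BuzzSaw, described below). The column and relationship names here are illustrative assumptions; the schema documentation is authoritative.

    package BuzzSaw::DB::Schema::Result::Event;   # illustrative package name
    use strict;
    use warnings;
    use base 'DBIx::Class::Core';

    __PACKAGE__->table('event');
    __PACKAGE__->add_columns(
        id       => { data_type => 'integer', is_auto_increment => 1 },
        logtime  => { data_type => 'timestamp with time zone' },
        hostname => { data_type => 'text' },
        userid   => { data_type => 'text',    is_nullable => 1 },
        program  => { data_type => 'text',    is_nullable => 1 },
        pid      => { data_type => 'integer', is_nullable => 1 },
        message  => { data_type => 'text' },
    );
    __PACKAGE__->set_primary_key('id');

    # Each event has zero or more tags and zero or more extra_info entries.
    __PACKAGE__->has_many( tags       => 'BuzzSaw::DB::Schema::Result::Tag',       'event_id' );
    __PACKAGE__->has_many( extra_info => 'BuzzSaw::DB::Schema::Result::ExtraInfo', 'event_id' );

    1;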
The BuzzSaw interface to the database (see the BuzzSaw::DB module for full details) is built using the Perl DBIx::Class object-relational mapper. This is an excellent module which makes it very easy to handle complex queries. For speed, a few parts of the code base use raw SQL statements via the standard DBI module, but only where absolutely essential.
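As an illustration of why DBIx::Class is a good fit, a query for recent tagged events on a single host might look something like the following sketch (the resultset, column and relationship names are assumptions):

    my $schema = BuzzSaw::DB::Schema->connect('dbi:Pg:dbname=buzzsaw');

    # Find the 20 most recent auth_failure events for one host.
    my @events = $schema->resultset('Event')->search(
        {
            'me.hostname' => 'host1.example.org',
            'tags.name'   => 'auth_failure',
        },
        {
            join     => 'tags',
            order_by => { -desc => 'me.logtime' },
            rows     => 20,
        },
    );

    for my $event (@events) {
        printf "%s %s: %s\n", $event->logtime, $event->program, $event->message;
    }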
The implementation of various internal processes relies on PostgreSQL functions and triggers, which means that BuzzSaw currently only works with PostgreSQL. That said, rewriting those features in the procedural language of another database engine should not require a great deal of work if needed.
The import process is driven by the BuzzSaw::Importer Perl module. The importer reads through the log messages from each data source. If an event has not previously been stored in the database then it is parsed and the event data is passed through the stack of filters. If any filter declares an interest in an event then it is stored at the end of the process. Additionally, any filter can attach tags and associated extra information even if it does not declare an interest in the event being stored.
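In outline, the import loop looks something like the following sketch. The method names here are assumptions made for illustration; the real logic lives in BuzzSaw::Importer.

    for my $source (@sources) {
        while ( defined( my $entry = $source->next_entry ) ) {

            # Skip log messages which have already been stored.
            next if $db->seen_before($entry);

            my %event = $parser->parse_line($entry);

            my $store_event = 0;
            for my $filter (@filters) {
                # A filter votes on whether the event is interesting and
                # may attach tags even when it does not vote positively.
                my ( $interested, @tags ) = $filter->check( \%event );
                $store_event ||= $interested;
                push @{ $event{tags} }, @tags;
            }

            $db->store_event( \%event ) if $store_event;
        }
    }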
The importer process can have any number of data sources. A data source is any implementation of the BuzzSaw::DataSource Moose role and is required to deliver log messages one at a time to the importer process.
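As a minimal sketch, the role contract might look something like this (the required method name is an assumption):

    package BuzzSaw::DataSource;   # sketch of the role, not the real code
    use Moose::Role;

    # A consumer must hand log messages back one at a time, returning
    # undef when the source is exhausted (the method name is an assumption).
    requires 'next_entry';

    no Moose::Role;
    1;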
Currently there is only the BuzzSaw::DataSource::Files Perl module. This module can search through a hierarchy of directories and find files which match a POSIX or Perl regular expression. As well as standard text files, it supports opening files which are compressed with gzip or bzip2. When a file is opened, a lock is recorded in the database to avoid multiple processes working on the same data concurrently. When the reading of a file is complete, its name is recorded in the database along with the SHA-256 checksum of the file contents. This helps avoid reprocessing files which have been seen previously.
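The checksum step can be handled with the core Digest::SHA module; for example (the database call here is a hypothetical stand-in):

    use Digest::SHA;

    # Compute the SHA-256 checksum of the file contents.
    my $sha = Digest::SHA->new(256);
    $sha->addfile($filename);
    my $digest = $sha->hexdigest;

    # Skip the file entirely if that checksum has been seen before
    # (the method name here is purely illustrative).
    next if $db->file_already_seen($digest);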
Each data source requires a parser module which implements the BuzzSaw::Parser Moose role. The parser module is used to split a log entry into separate parts, e.g. date, program, PID and message. Mostly this is a matter of handling the particular date/time format used in the log entries. The parser module is called on every log message so it is expected to be fast.
Currently there is only the BuzzSaw::Parser::RFC3339 Perl module. This handles date/time stamps which are formatted according to the guidelines in RFC 3339 (e.g. 2013-03-28T11:57:30.025350+00:00).
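For a typical syslog-style line the split is roughly as follows. This regular expression is a simplified sketch, not the actual BuzzSaw::Parser::RFC3339 implementation:

    my $line = '2013-03-28T11:57:30.025350+00:00 host1 sshd[1234]: Accepted publickey for fred';

    if ( $line =~ m{^(\S+)            # RFC 3339 timestamp
                    \s+(\S+)          # hostname
                    \s+([^\[:\s]+)    # program
                    (?:\[(\d+)\])?:   # optional pid
                    \s*(.*)$          # the message itself
                   }x ) {
        my %event = (
            timestamp => $1,
            hostname  => $2,
            program   => $3,
            pid       => $4,
            message   => $5,
        );
    }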
After a log message has been parsed into various fields as an event it is passed through a stack of filters. All events will go through the filter stack in the same sequence. It is possible to make decisions in one filter based on the results of previous filters. If one or more filters declare an interest in an event it will be stored. It is not possible for a filter to overturn a positive vote from any previous filter.
A filter is an implementation of the BuzzSaw::Filter Moose role. Currently there are the following filters: Cosign, Kernel, Sleep, SSH and UserClassifier. Most of them are straightforward filters which examine events and, where appropriate, declare an interest along with some tags or other information. The UserClassifier module is slightly different in that it never declares an interest; it just adds extra details when the userid field has been set by a previous filter in the stack (e.g. Cosign or SSH). Typically this module is added last in the stack so that it can process the userid value from any previous filter.
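In outline, a filter might look something like the following sketch; the check method name and its return values are assumptions based on the behaviour described above:

    package BuzzSaw::Filter::Example;   # hypothetical filter
    use Moose;

    with 'BuzzSaw::Filter';

    sub check {
        my ( $self, $event ) = @_;

        my ( $interested, @tags ) = (0);

        # Vote to store the event and attach a tag when the message
        # matches something we care about.
        if ( $event->{message} =~ m/authentication failure/i ) {
            $interested = 1;
            push @tags, 'auth_failure';
        }

        return ( $interested, @tags );
    }

    no Moose;
    1;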
The reporting process is driven by the BuzzSaw::Reporter Perl module. This module has a record of reports which should be generated on an hourly, daily, weekly or monthly basis. It can be run in two modes: either it is limited to running a specific set of reports (e.g. only the hourly ones), or it runs all jobs of all types which have not been run recently enough. In the latter case, a weekly job which has not run for 8 days would be run immediately. A record is kept of when each report was last run.
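The two modes might be driven along the following lines; the method names are illustrative guesses rather than the real BuzzSaw::Reporter API:

    my $reporter = BuzzSaw::Reporter->new();

    # Mode 1: restrict the run to a specific set of reports.
    $reporter->generate_reports('hourly');

    # Mode 2: run every job, of any type, which is overdue. A weekly
    # report last run 8 days ago would be regenerated immediately.
    $reporter->generate_all_reports();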
A report selects all events which have certain tags and which occurred within a specified time period. The ordering of the retrieved event records can be controlled.
A report can be generated using the generic BuzzSaw::Report module or, more typically, by implementing a specific sub-class which specifies the names of the relevant tags, the time period of interest, the name of the template to be used, etc. For convenience, when using a sub-class most of these attributes have sensible defaults based on the name of the Perl module.
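A sub-class can therefore be very small. A hypothetical SSH report might need nothing more than the following, with the tag and template names coming from the name-based defaults (the attribute shown is invented for the example):

    package BuzzSaw::Report::SSH;   # hypothetical sub-class
    use Moose;

    extends 'BuzzSaw::Report';

    # The tags, template name, etc. would fall back to name-based
    # defaults; override an attribute only where the default does not
    # fit (this attribute name is invented for the example).
    has '+email_to' => ( default => 'logadmin@example.org' );

    no Moose;
    1;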
A sub-class of the BuzzSaw::Report module can override specific parts of the process to do additional complex processing beyond the straightforward selection of events and subsequent printing of the raw data. For example, the Kernel report carries out extra analysis of the kernel logs to collate events which are associated with particular types of problem (e.g. an out-of-memory error or a kernel panic).
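Such an override might look like this in outline; the process_events method name is an assumption made for illustration:

    package BuzzSaw::Report::Kernel;   # sketch only
    use Moose;

    extends 'BuzzSaw::Report';

    # Bucket the selected events by the type of problem they describe
    # before they are handed to the template (events are assumed to be
    # objects with a message accessor).
    override 'process_events' => sub {
        my ( $self, @events ) = @_;

        my %problems;
        for my $event (@events) {
            push @{ $problems{oom} },   $event if $event->message =~ m/out of memory/i;
            push @{ $problems{panic} }, $event if $event->message =~ m/kernel panic/i;
        }

        return \%problems;
    };

    no Moose;
    1;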
A report is generated by passing the events, along with any results from additional processing, to a template which is handled using the Perl Template Toolkit. A report can simply be printed to stdout or sent via email to multiple recipients.
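A minimal Template Toolkit rendering step looks something like this (the template and variable names are illustrative):

    use Template;

    my $tt = Template->new( INCLUDE_PATH => '/path/to/templates' );

    my $output;
    $tt->process( 'kernel.tt',
                  { events => \@events, analysis => $analysis },
                  \$output )
        or die $tt->error();

    # The finished report can go to stdout or into an email message.
    print $output;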