mail2clf - Convert mail into lines in a web server's common log file format
mail2clf [-v] [mail...]
mail2clf -H
Converts the important information from each mail into a line in the common log file format. Web servers use this format, and a number of programs create visualizations from it. See WEBALIZER for hints on a configuration file for webalizer.
If mail is given at least once, each must be a path to a file containing a
single mail. These mail files are converted. Use - to read a single mail
from STDIN.
Omitting mail allows mail2clf to be combined with mail2thread. In this case the output of mail2thread -l is expected on STDIN. The result gives a better picture of the threads in a collection of mail, since every mail belonging to a thread counts as a hit on a single file named after the thread. -l must be given to mail2thread to get long file names instead of just numbers.
The input is expected to be generated by mail2thread -l and thus must follow a simple format of two types of alternating line blocks.
One or more lines without leading whitespace give one or more cleaned subjects belonging to a single thread. If there is more than one subject line, the first one is used as the thread name. However, it is often useful to run mail2thread with -e so an exact match is done on the cleaned subjects and there is only one subject line for each thread.
After that, one or more lines with leading whitespace give the names and full subjects of the mail files belonging to that thread. The first word of each line is the file name; the rest of the line is ignored.
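The two alternating block types described above can be read with a small state machine. The following is a minimal sketch in Python, not the parser used by mail2clf (which is a Perl program). The function name parse_thread_listing is hypothetical, and skipping blank lines is an assumption, since the format description does not mention them.

```python
from typing import Dict, Iterable, List


def parse_thread_listing(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each thread name to the mail file names listed under it.

    Hypothetical helper that follows the format described above:
    subject lines have no leading whitespace, file lines have some.
    """
    threads: Dict[str, List[str]] = {}
    current = None          # name of the thread being read
    in_subjects = False     # True while inside a block of subject lines

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue        # assumption: blank lines carry no meaning
        if not line[0].isspace():
            # Subject block: only the first subject names the thread.
            if not in_subjects:
                current = line
                threads.setdefault(current, [])
                in_subjects = True
        else:
            # File block: first word is the file name, the rest is ignored.
            in_subjects = False
            if current is None:
                raise ValueError("file line found before any subject line")
            threads[current].append(line.split()[0])
    return threads


if __name__ == "__main__":
    sample = [
        "Hello world\n",
        "Hello world again\n",
        "  arc/1 [ox] Hello world\n",
        "  arc/2 Re: [ox] Hello world again\n",
        "Another topic\n",
        "\tarc/3 [ox] Another topic\n",
    ]
    for name, files in parse_thread_listing(sample).items():
        print(name, files)
```

With mail2thread -e every thread has exactly one subject line, so the "first subject wins" rule never has to choose between alternatives.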
-v
Operate verbosely.
-H
Generate the man page for this program on standard output.
If an unknown option such as -. is given, a short usage message is generated.
Since webalizer is a free tool to produce nice graphics, it may be used for the visualization.
The following are useful settings in a webalizer configuration file. Only the differences from the sample file found in the webalizer documentation are given.
Give no log file, so that webalizer reads from STDIN.
If a mailing list is visualized, the mail address of the list is a good value for this variable.
Since the subjects are mapped into file names without any extension, this must be given and it needs to be an empty string.
The notion of a visit doesn't make much sense for a mailing list, so this should be turned off.
Since the host names are mail addresses of the senders of the mail, this makes no sense.
So far neither the referrer field nor the client field of the combined log file format is used.
The notions used by webalizer or other visualization tools are meant to be used for web server statistics of course. However, mail2clf maps mail, so the following notion mapping applies.
Web          Mail
--------------------------------------
URL          (Thread) subject
Hits         Number of single mails
Files        ditto
Site         Address of mail author
KBytes       Size of mail body
Visits       Makes no sense
Entry pages  ditto
Exit pages   ditto
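To make the mapping concrete, the sketch below builds one common log file line from a single mail using Python's standard email module. It is only an illustration of the table above, not the output mail2clf actually produces: the function name mail_to_clf_line, the GET method, the HTTP/1.0 protocol string, the status code 200, the percent-encoding of the subject, and counting the body size in bytes are all assumptions.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime
from urllib.parse import quote


def mail_to_clf_line(raw_mail: str) -> str:
    """Return one common-log-format line for a single mail (hypothetical)."""
    msg = message_from_string(raw_mail)

    # Site -> address of the mail author ("-" if there is no From header).
    sender = parseaddr(msg.get("From", ""))[1] or "-"

    # The time of the hit is taken from the Date header, which must exist.
    when = parsedate_to_datetime(msg["Date"])
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")

    # URL -> subject, percent-encoded so that it contains no spaces.
    url = "/" + quote(msg.get("Subject", "").strip(), safe="")

    # Size -> size of the mail body (single-part mails only in this sketch).
    body = msg.get_payload()
    size = len(body.encode("utf-8")) if isinstance(body, str) else 0

    # The ident and user fields are unused, hence the two dashes.
    return f'{sender} - - [{stamp}] "GET {url} HTTP/1.0" 200 {size}'


if __name__ == "__main__":
    sample = (
        "From: Alice <alice@example.org>\n"
        "Date: Mon, 01 Jan 2001 12:00:00 +0100\n"
        "Subject: [ox] Hello world\n"
        "\n"
        "Hi there\n"
    )
    print(mail_to_clf_line(sample))
```

When threads are considered, the subject in the URL position would be replaced by the thread name, so that all mail in a thread counts as hits on the same URL.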
The following pipeline produces a visualization of the mail from a mailing list considering threads.
mail2thread -p '\[[a-z]?ox\]' -e -l ~/Mail/oekonux/arc/* |
mail2clf |
webalizer -i
The mail is stored in ~/Mail/oekonux/arc.
The mailing list adds one of several possible prefixes to the subjects, all
of which match \[[a-z]?ox\].
Subjects on this mailing list are used in a disciplined way, so -e can be
used.
webalizer must not use any history information, hence -i.
A file named webalizer.conf must exist in the current directory.
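The prefix pattern \[[a-z]?ox\] from the pipeline above can be checked against sample subjects. The sketch below only shows which bracketed prefixes the regular expression admits when used as written with Python's re module. How mail2thread itself applies the -p pattern (anchoring, case sensitivity) is not described here, so the unanchored, case-sensitive search is an assumption, as are the function name and the sample subjects.

```python
import re

# The pattern exactly as passed to mail2thread -p in the pipeline above:
# "[", an optional single lowercase letter, "ox", "]".
PREFIX = re.compile(r"\[[a-z]?ox\]")


def has_list_prefix(subject: str) -> bool:
    """Return True if the subject contains a matching list prefix."""
    return PREFIX.search(subject) is not None


if __name__ == "__main__":
    for subject in ("[ox] Hello", "[pox] Hello", "Re: [ox] Hello", "Hello"):
        print(subject, has_list_prefix(subject))
```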
Because this is a Perl program, Perl (>= V5.005) must be installed.
This program needs the great MailTools package installed. Try
http://search.cpan.org/search?dist=MailTools
This program needs the Time-modules package installed. Try
http://search.cpan.org/search?dist=Time-modules
mh
mail2thread
Stefan Merten <smerten@oekonux.de>
This program is licensed under the terms of the GPL. See
http://www.gnu.org/licenses/gpl.txt
See
http://www.merten-home.de/FreeSoftware/mail2clf/