mail2thread - Identifies threads in given mail files
mail2thread [option]... [mail] ...
mail2thread -i [option]... mail ...
mail2thread -H
Identifies threads in the given mail files and outputs them in various ways.
Several options are designed to work with MH, the best mail front end around.
If mail is not given, a scan listing is expected on STDIN. Each line of the listing must contain the message number and the corresponding subject, separated by white-space.
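The expected scan listing can be illustrated with a minimal Python sketch. This is not the program's own parser (mail2thread is written in Perl); the function name parse_scan_listing is hypothetical, and silently skipping lines that do not start with a number or that carry no subject is an assumption made for the illustration.

```python
def parse_scan_listing(lines):
    """Turn 'number subject' lines into (number, subject) pairs."""
    entries = []
    for line in lines:
        # Split off the first white-space separated word; the rest is the subject.
        parts = line.strip().split(None, 1)
        if not parts or not parts[0].isdigit():
            continue  # assumption: lines without a leading number are skipped
        subject = parts[1].strip() if len(parts) > 1 else ""
        if not subject:
            continue  # assumption: entries without a subject are skipped
        entries.append((int(parts[0]), subject))
    return entries


listing = ["  12 Re: hello world", "13", "", "x bogus", "14\tAnother subject"]
parsed = parse_scan_listing(listing)
```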
If mail starts with @, a white-space separated list of mail files is read
from the file named after the @. You may use @- to read the list from
STDIN.
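The @ convention can be sketched in Python as follows. This only illustrates the described behaviour and is not taken from the program; the name expand_mail_args and the stdin parameter are hypothetical.

```python
import io
import sys


def expand_mail_args(args, stdin=None):
    """Replace each '@file' argument by the white-space separated names it lists."""
    stdin = stdin if stdin is not None else sys.stdin
    mails = []
    for arg in args:
        if arg.startswith("@"):
            source = arg[1:]
            if source == "-":
                text = stdin.read()  # '@-' reads the list from standard input
            else:
                with open(source, encoding="utf-8") as handle:
                    text = handle.read()
            mails.extend(text.split())
        else:
            mails.append(arg)
    return mails


# Usage with a stand-in for STDIN:
expanded = expand_mail_args(["a", "@-", "b"], stdin=io.StringIO("x y\nz\n"))
```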
Thread identification is normally based on subject lines, and mails with empty or missing subject lines are ignored. Standard replies are handled properly.
However, with -i, thread identification is based on the message ids and the reference data in the headers of the mails. This method needs access to the actual mail files given as mail.
Normal output is a listing in which each thread is headed by one or more lines containing its subjects, followed by its contents indented by a tab.
Standard sorting tries to preserve a reasonable sort order derived from the ordering of the input files.
There are options for selecting a certain output style, options to control the handling of subjects and other options.
If this option is given, an additional line with a count is output for each thread just before the subject lines of that thread. Moreover, the sort order is changed so that the threads with the most posts come first.
If this option is given, long names are always output. Normally numbers are output if the file names refer to mails in an MH folder. If -f is not given, the long name is exactly the path given in mail. If -f is given, the long name is the absolute path returned by mhpath.
This option may not be used without giving mail arguments.
If this option is given, only the names or numbers of the mails found are output.
Similar to -n but outputs one line for each thread. The first word in the
line is a sequence name consisting of prefix and a running number. The
remaining words are the messages in this sequence. This can be used
to build sequences from the found threads. Threads with only one entry are put
into sequence prefix0.
This option may not be combined with other output options.
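The line format described for this option can be sketched in Python. The text does not say where the running number starts or where the prefix0 line appears in the output; starting at 1 and emitting prefix0 first are assumptions, and the function name sequence_lines is hypothetical.

```python
def sequence_lines(threads, prefix):
    """Build one 'name msg msg ...' line per thread from lists of message numbers."""
    singles = []   # messages from threads with only one entry
    lines = []
    counter = 0
    for members in threads:
        if len(members) == 1:
            singles.extend(members)
        else:
            counter += 1  # assumption: running numbers start at 1
            lines.append(f"{prefix}{counter} " + " ".join(str(m) for m in members))
    if singles:
        # Single-entry threads are collected in the sequence named prefix0.
        lines.insert(0, f"{prefix}0 " + " ".join(str(m) for m in singles))
    return lines


example = sequence_lines([[1, 4], [2], [3, 5, 6], [7]], "t")
```

Each resulting line has the shape that the xargs pipe in the examples hands to mark: a sequence name followed by message numbers.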
Output only the threads and omit the names of mail.
Use exact matching of cleaned subject lines.
Normally the subject of a given mail is cleaned in various ways and then matched against the threads existing at that time. A match is considered successful if either the subject is a starting substring of the thread name or vice versa. In the latter case the longer subject becomes the new name of that thread, so the longest name seen is always used as the thread name. This permits shortened subjects to be considered part of a thread.
This option changes this behaviour and only exact matches of the cleaned subject line are considered successful. Thus every thread has exactly one subject.
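The difference between the two matching modes can be illustrated with a short Python sketch. It assumes the subjects have already been cleaned, and it is not the program's actual code; the name assign_threads and the exact parameter are hypothetical.

```python
def assign_threads(subjects, exact=False):
    """Group cleaned subjects into threads; returns [name, member_indices] pairs."""
    threads = []
    for index, subject in enumerate(subjects):
        if not subject:
            continue  # mails with empty subjects are ignored
        for thread in threads:
            name = thread[0]
            if exact:
                matched = subject == name
            else:
                # Either string may be a starting substring of the other.
                matched = name.startswith(subject) or subject.startswith(name)
            if matched:
                thread[1].append(index)
                if len(subject) > len(name):
                    thread[0] = subject  # keep the longest name as thread name
                break
        else:
            threads.append([subject, [index]])  # no match: start a new thread
    return threads


cleaned = ["Perl threads", "Perl threads and more", "Perl", "Other"]
loose = assign_threads(cleaned)
strict = assign_threads(cleaned, exact=True)
```

With prefix matching the first three subjects end up in one thread named after the longest of them; with exact matching each distinct subject forms its own thread.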
Base identification of threads on the ids supplied in the mail instead of considering the subjects.
In this mode of operation the value of the Message-Id: header is used to
identify each mail. A thread is then a collection of ids. A given mail belongs
to a thread, if it relates to another mail in the collection of ids of that
thread by mentioning the id in a References: or In-Reply-To: header.
If this option is given, the mail arguments must be given, since a simple scan listing on the standard input cannot supply enough information in a comprehensible manner.
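Id-based threading as described here can be sketched in Python with the standard email module and a small union-find structure. The union-find is a technique chosen for this illustration, not necessarily what the program does internally, and the names thread_by_ids and ID_RE are hypothetical.

```python
import re
from email import message_from_string

ID_RE = re.compile(r"<[^<>\s]+>")  # matches ids of the form <...>


def thread_by_ids(raw_mails):
    """raw_mails maps a mail name to its raw text; returns lists of mail names."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    own_id = {}
    for name, raw in raw_mails.items():
        msg = message_from_string(raw)
        ids = ID_RE.findall(msg.get("Message-Id", ""))
        if not ids:
            continue  # assumption: mails without a Message-Id are skipped
        own_id[name] = ids[0]
        find(ids[0])
        for header in ("References", "In-Reply-To"):
            for ref in ID_RE.findall(msg.get(header, "")):
                union(ids[0], ref)  # a mentioned id joins the same thread

    threads = {}
    for name, mail_id in own_id.items():
        threads.setdefault(find(mail_id), []).append(name)
    return list(threads.values())


sample = {
    "a": "Message-Id: <1@x>\nSubject: start\n\nbody\n",
    "b": "Message-Id: <2@x>\nIn-Reply-To: <1@x>\n\nbody\n",
    "c": "Message-Id: <3@x>\nReferences: <1@x> <2@x>\n\nbody\n",
    "d": "Message-Id: <9@x>\nSubject: unrelated\n\nbody\n",
}
grouped = thread_by_ids(sample)
```

The sketch also shows why the mail files themselves are needed: the ids live in headers that a scan listing of numbers and subjects does not carry.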
If this option is given, only subjects matching the regular expression key-re are considered.
This option may be given more than once.
If this option is given, all prefixes matching a pfx-re are removed from the subject in addition to some standard prefixes. The match is done case-insensitively.
If you give an empty argument to this option, all the standard prefixes are removed and you are on your own with a useful definition.
This option may be given more than once.
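How user-supplied prefix expressions might combine with the standard ones can be sketched in Python. The standard prefixes are not listed in this text, so the single entry "re:" below is purely an assumption, as are the names build_patterns and strip_prefixes and the choice to strip prefixes repeatedly.

```python
import re

ASSUMED_STANDARD_PREFIXES = [r"re:"]  # assumption: the real defaults are not documented here


def build_patterns(option_values):
    """Combine the assumed standard prefixes with the values of the prefix option."""
    patterns = list(ASSUMED_STANDARD_PREFIXES)
    if "" in option_values:
        patterns = []  # an empty argument drops all standard prefixes
    patterns.extend(value for value in option_values if value)
    return patterns


def strip_prefixes(subject, patterns):
    """Remove leading matches of any pattern, ignoring case, until none is left."""
    compiled = [re.compile(r"\s*(?:" + p + r")\s*", re.IGNORECASE) for p in patterns]
    changed = True
    while changed:
        changed = False
        for regex in compiled:
            match = regex.match(subject)
            if match and match.end() > 0:  # guard against empty matches
                subject = subject[match.end():]
                changed = True
    return subject


with_defaults = strip_prefixes("Re: AW: re: Hello", build_patterns(["aw:"]))
without_defaults = strip_prefixes("Re: AW: Hello", build_patterns(["", "aw:"]))
```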
If this option is given, all suffixes matching a sfx-re are removed from the subject in addition to some standard suffixes. The match is done case-insensitively.
If you give an empty argument to this option, all the standard suffixes are removed and you are on your own with a useful definition.
This option may be given more than once.
This option defines a regular expression that introduces the part of a subject line referring to another subject from which this one was derived. This is useful when a thread forms a tree connected by special subjects using this construct. The match is done case-insensitively.
If you give an empty argument to this option, all the standard was-res are removed and you are on your own with a useful definition.
This option may be given more than once.
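A subject using the was: construct might be split as in the following Python sketch. Only the was: marker itself comes from this page; the handling of surrounding brackets, the return shape, and the name split_was are assumptions made for the illustration.

```python
import re


def split_was(subject, was_res=(r"was:",)):
    """Split 'New (was: Old)' into (current, previous); previous is None if absent."""
    for pattern in was_res:
        # Assumption: the marker may be preceded by an opening bracket.
        regex = re.compile(r"[\(\[]?\s*(?:" + pattern + r")\s*", re.IGNORECASE)
        match = regex.search(subject)
        if match:
            current = subject[:match.start()].strip()
            previous = subject[match.end():].strip().rstrip(")]").strip()
            return current, previous
    return subject.strip(), None


derived = split_was("New idea (was: Old idea)")
plain = split_was("Plain subject")
```

The second element names the subject the mail was derived from, which is what would link the new thread to the older one in a tree.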
Give the MH folder that contains mail. If this option is given, mail may consist of numbers only. The command mhpath is used to expand the numbers to the paths of the mails.
This can be used to give the output of a normal MH command to mail2thread.
Note: If -f is given, the order of mail arguments is not preserved. Instead mail names are sorted numerically in large chunks.
Operate verbosely.
Generate the man page for this program on standard output.
Some examples may illustrate the use of certain options and the interaction with MH commands.
The MH command
scan +folder -width 1000 -format '%(msg) %{Subject}'
will produce a valid input for folder folder. Just pipe it into mail2thread instead of giving mail arguments.
The pipe
mail2thread -s t folder/* |
xargs -L 1 mark +folder -zero -sequence
marks all threads found in folder as sequences with names starting with
t.
mail2thread -f folder `pick +folder -from someone`
displays the threads in folder to which someone has contributed.
The pipe
mail2thread -t mails |
tr ' \011' '\012\012' |
tr -d ',():' |
sort |
uniq -c |
sort -k1,1nr
lists all the words in the subject lines of mails with their frequency in the threads sorted by frequency with most frequent words first. Some annoying characters are stripped out first.
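For readers less familiar with tr and uniq, roughly the same counting can be sketched in Python. It takes the thread subject lines as input; unlike the pipe, it drops empty words and breaks ties alphabetically, and the name word_frequencies is hypothetical.

```python
from collections import Counter

STRIP_CHARS = str.maketrans("", "", ",():")  # same characters as 'tr -d'


def word_frequencies(subject_lines):
    """Count words in subject lines, most frequent first."""
    counts = Counter()
    for line in subject_lines:
        for word in line.split():  # split on blanks and tabs
            word = word.translate(STRIP_CHARS)
            if word:
                counts[word] += 1
    # Sort by descending count, then by word for a stable result.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


frequencies = word_frequencies(["Re: foo bar", "foo (baz)"])
```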
If you do not want threads combined into trees by the was: construct,
you should give -w ''.
Because this is a Perl program, Perl (>= V5.005) must be installed.
This program needs the Tie::IxHash module installed. Try
http://search.cpan.org/search?mode=module&query=Tie%3A%3AIxHash
This program needs the great MailTools package installed. Try
http://search.cpan.org/search?dist=MailTools
This program needs the MIME-tools package installed. Try
http://search.cpan.org/search?dist=MIME-tools
If -f is used, mail is accessed through an MH command, so the MH program suite must be installed. See
http://www1.ics.uci.edu/~mh/
mh
Stefan Merten <smerten@oekonux.de>
This program is licensed under the terms of the GPL. See
http://www.gnu.org/licenses/gpl.txt
See
http://www.merten-home.de/FreeSoftware/mail2thread/