NAME

mail2thread - Identifies threads in given mail files


SYNOPSIS

mail2thread [option]... [mail] ...

mail2thread -i [option]... mail ...

mail2thread -H


DESCRIPTION

Identifies threads in the given mail files and outputs them in various ways.

Several options are designed to work with MH, the best mail front end around.

If mail is not given, a scan listing is expected on STDIN. Each line of the listing must contain the message number and the corresponding subject, separated by white-space.
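To illustrate the expected STDIN format, the following Python sketch splits such a listing into message numbers and subjects. The function name parse_scan_listing is hypothetical and not part of mail2thread. How the program itself treats malformed lines is not documented here, so the sketch simply skips them.

```python
def parse_scan_listing(text):
    """Split 'number subject' lines into (number, subject) pairs.

    Assumption: lines that do not start with a number are skipped.
    """
    pairs = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of white-space
        if not parts or not parts[0].isdigit():
            continue
        subject = parts[1].strip() if len(parts) > 1 else ""
        pairs.append((int(parts[0]), subject))
    return pairs
```

A line such as "12 Re: Build fails" yields the pair (12, "Re: Build fails"); a line holding only a number yields an empty subject.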

If mail starts with @, a white-space separated list of mail files is read from the file named after the @. You may use @- to read the list from STDIN.
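The @ convention can be sketched in Python as follows. This is only an illustration of the rule described above; the function name expand_mail_args and its stdin parameter are hypothetical and do not reflect how mail2thread is implemented.

```python
import sys


def expand_mail_args(args, stdin=sys.stdin):
    """Expand '@file' and '@-' arguments into the file names they list."""
    names = []
    for arg in args:
        if arg == "@-":
            # '@-' reads the white-space separated list from standard input.
            names.extend(stdin.read().split())
        elif arg.startswith("@"):
            # '@file' reads the list from the named file.
            with open(arg[1:]) as handle:
                names.extend(handle.read().split())
        else:
            names.append(arg)
    return names
```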

Thread identification is normally based on subject lines, and mails with empty or missing subject lines are ignored. Standard replies are handled properly.

However, with -i, thread identification is based on the message ids and the reference data in the headers of the mails. This identification method needs access to the actual mail files given as mail.

Normal output is a listing in which each thread is headed by one or more lines containing its subjects, followed by its contents indented by a tab.

Standard sorting tries to preserve a reasonable sort order derived from the ordering of the input files.


OPTIONS

There are options for selecting a certain output style, options to control the handling of subjects and other options.

Output options

-c
--count

If this option is given, an additional line with a count is output for each thread just before the subject lines of that thread. Moreover, the sort order is changed so that the threads with the most posts come first.

-l
--long

If this option is given, long names are always output. Normally numbers are output if the file names refer to mails in an MH folder. If -f is not given, the long name is exactly the path given in mail. If -f is given, the long name is the absolute path returned by mhpath.

This option may not be used without giving mail arguments.

-n
--name

If this option is given, only the names or numbers of the mails found are output.

-s prefix
--sequence prefix

Similar to -n but outputs one line for each thread. The first word in the line is a sequence name consisting of prefix and a running number. The remaining words are the messages in this sequence. This can be used to build sequences from the threads found. Threads with only one entry are put into the sequence prefix0.

This option may not be combined with other output options.
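The line format produced by -s can be pictured with the following Python sketch. It is an assumption-laden illustration: the function name sequence_lines is hypothetical, and the choices that the running number starts at 1 and that the prefix0 line comes first are guesses, not documented behaviour.

```python
def sequence_lines(threads, prefix):
    """Format one line per thread as '<prefix><n> <msg> <msg> ...'.

    Assumptions: numbering starts at 1 and the <prefix>0 line is listed first.
    """
    singles, lines, number = [], [], 1
    for mails in threads:
        if len(mails) == 1:
            singles.extend(mails)  # single-entry threads share <prefix>0
        else:
            lines.append(f"{prefix}{number} " + " ".join(map(str, mails)))
            number += 1
    if singles:
        lines.insert(0, f"{prefix}0 " + " ".join(map(str, singles)))
    return lines
```

For the threads [1, 4], [2], [3, 5, 6] and [7] with prefix t, the sketch returns the lines "t0 2 7", "t1 1 4" and "t2 3 5 6".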

-t
--thread

Output only the threads and omit the names of the mails.

Thread identification options

-e
--exact

Use exact matching of cleaned subject lines.

Normally the subject of a given mail is cleaned in various ways and then matched against the threads existing at that time. A match is considered successful if either the subject is a starting substring of the thread name or vice versa. In the latter case the subject becomes the new name of that thread, so the longest name is always taken as the thread name. This permits shortened subjects to be considered part of a thread.

This option changes this behaviour and only exact matches of the cleaned subject line are considered successful. Thus every thread has exactly one subject.
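The difference between the normal matching and -e can be sketched in Python as follows. This is a simplified illustration, not the program's code: the function names find_thread and add_subject are hypothetical, subjects are assumed to be cleaned already and non-empty, and the first matching thread wins.

```python
def find_thread(threads, subject, exact=False):
    """Return the name of the thread that subject joins, or None."""
    for name in threads:
        if exact:
            if subject == name:
                return name
        elif name.startswith(subject) or subject.startswith(name):
            return name
    return None


def add_subject(threads, msg, subject, exact=False):
    """File msg under a matching thread, keeping the longest subject as name."""
    name = find_thread(threads, subject, exact)
    if name is None:
        threads[subject] = [msg]
    elif len(subject) > len(name):
        # The new subject extends the thread name, so it becomes the name.
        threads[subject] = threads.pop(name) + [msg]
    else:
        threads[name].append(msg)
    return threads
```

With the subjects "build fails on", "build fails on linux" and "build fails", the default mode produces a single thread named "build fails on linux", whereas exact=True keeps three separate threads.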

-i
--ids

Base identification of threads on the ids supplied in the mail instead of considering the subjects.

In this mode of operation the value of the Message-Id: header is used to identify each mail. A thread is then a collection of ids. A given mail belongs to a thread, if it relates to another mail in the collection of ids of that thread by mentioning the id in a References: or In-Reply-To: header.

If this option is given, the mail arguments must be given, since a simple scan listing on standard input cannot supply enough information in a comprehensible manner.
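A rough Python sketch of id-based grouping is shown below. It is an illustration under stated assumptions rather than the program's algorithm: the function name thread_by_ids is hypothetical, mails are passed as a mapping from name to raw text, and any id shared between two mails is treated as a link, which is slightly more permissive than the rule described above.

```python
import re
from email import message_from_string

ID_RE = re.compile(r"<[^<>\s]+>")  # matches ids such as <abc@example.org>


def thread_by_ids(mails):
    """Group mails (name -> raw text) into threads linked by message ids."""
    threads = []  # each thread: {"ids": set of ids, "mails": list of names}
    for name, text in mails.items():
        msg = message_from_string(text)
        ids = set()
        for header in ("Message-Id", "References", "In-Reply-To"):
            for value in msg.get_all(header, []):
                ids.update(ID_RE.findall(value))
        # Merge every existing thread that shares an id with this mail.
        merged_ids, merged_mails, remaining = set(ids), [], []
        for thread in threads:
            if thread["ids"] & ids:
                merged_ids |= thread["ids"]
                merged_mails += thread["mails"]
            else:
                remaining.append(thread)
        threads = remaining + [{"ids": merged_ids, "mails": merged_mails + [name]}]
    return [thread["mails"] for thread in threads]
```

Because a mail that references two previously separate threads joins them, the result does not depend on replies arriving after the mails they answer.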

-k key-re
--key key-re

If this option is given, only subjects matching the regular expression key-re are considered.

This option may be given more than once.

-p pfx-re
--prefix pfx-re

If this option is given, all prefixes matching pfx-re are removed from the subject in addition to some standard prefixes. The match is done case-insensitively.

If you give an empty argument to this option, all the standard prefixes are removed and you have to supply a useful definition yourself.

This option may be given more than once.
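The effect of prefix removal can be sketched in Python as follows. The function name strip_prefixes is hypothetical, and the patterns used in the usage note are examples chosen for illustration; the standard prefixes actually built into mail2thread are not listed in this manual.

```python
import re


def strip_prefixes(subject, patterns):
    """Repeatedly remove any leading, case-insensitive match of the regexes."""
    combined = re.compile(
        r"\s*(?:" + "|".join(patterns) + r")\s*", re.IGNORECASE
    )
    while True:
        match = combined.match(subject)
        if not match or match.end() == 0:
            # Nothing more to strip (or only an empty match): we are done.
            return subject.strip()
        subject = subject[match.end():]
```

With the example patterns re:, aw: and \[[^\]]*\], the subject "Re: [list] AW: RE: Build fails" is reduced to "Build fails".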

-P sfx-re
--suffix sfx-re

If this option is given, all suffixes matching sfx-re are removed from the subject in addition to some standard suffixes. The match is done case-insensitively.

If you give an empty argument to this option, all the standard suffixes are removed and you have to supply a useful definition yourself.

This option may be given more than once.

-w was-re
--was was-re

This option defines a regular expression that introduces the part of a subject line referring to another subject from which this one was derived. This is useful when a thread builds a tree connected by special subjects using this construct. The match is done case-insensitively.

If you give an empty argument to this option, all the standard was-res are removed and you have to supply a useful definition yourself.

This option may be given more than once.
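How such a construct might be taken apart is sketched below in Python. This is an assumption-heavy illustration: the function name split_was is hypothetical, the default pattern was: is only a plausible example of a was-re, and the handling of surrounding brackets is a guess rather than documented behaviour.

```python
import re


def split_was(subject, was_patterns=(r"was:",)):
    """Return (current, previous) subject; previous is None without a match."""
    if not was_patterns:
        # No was-res defined: the subject is never split.
        return subject.strip(), None
    was = re.compile(
        r"[\(\[]?\s*(?:" + "|".join(was_patterns) + r")\s*", re.IGNORECASE
    )
    match = was.search(subject)
    if not match:
        return subject.strip(), None
    current = subject[:match.start()].strip()
    previous = subject[match.end():].strip().rstrip(")]").strip()
    return current, previous
```

The subject "Release plan (was: Build fails)" is split into "Release plan" and "Build fails", so the new thread can be attached to the older one.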

Other options

-f folder
--folder folder

Specifies the MH folder that contains mail. If this option is given, mail may consist of numbers only. The command mhpath is used to expand the numbers to the paths of the mails.

This can be used to give the output of a normal MH command to mail2thread.

Note: If -f is given, the order of mail arguments is not preserved. Instead mail names are sorted numerically in large chunks.

-v
--verbose

Operate verbosely.

-H
--help

Generate the man page for this program on standard output.


EXAMPLES

Some examples may illustrate the use of certain options and the interaction with MH commands.

Generating input for STDIN

The MH command

scan +folder -width 1000 -format '%(msg) %{Subject}'

produces valid input for the folder named folder. Just pipe it into mail2thread instead of giving mail arguments.

Using -s

The pipe

mail2thread -s t folder/* |
xargs -L 1 mark +folder -zero -sequence

marks all threads found in folder as sequences with names starting with t.

Using -f

mail2thread -f folder `pick +folder -from someone`

displays the threads in folder to which someone has contributed.

Using -t

The pipe

mail2thread -t mails |
tr ' \011' '\012\012' |
tr -d ',():' |
sort |
uniq -c |
sort -k 1,1nr

lists all the words in the subject lines of mails with their frequency in the threads sorted by frequency with most frequent words first. Some annoying characters are stripped out first.
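For readers who prefer to post-process the output in a script, roughly the same count can be obtained with the following Python sketch. The function name word_frequencies is hypothetical. Unlike the pipe, the sketch never counts empty words, and the order among words with equal counts may differ.

```python
from collections import Counter


def word_frequencies(thread_subjects):
    """Count words across subject lines, most frequent first."""
    strip_chars = str.maketrans("", "", ",():")  # same characters as tr -d
    counts = Counter()
    for subject in thread_subjects:
        counts.update(subject.translate(strip_chars).split())
    return counts.most_common()
```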

Using -w

If you do not want threads combined into trees by the was: construct, you should give -w ''.


PREREQUISITES

Because this is a Perl program, Perl (>= V5.005) must be installed.

This program needs the Tie::IxHash module installed. Try

        http://search.cpan.org/search?mode=module&query=Tie%3A%3AIxHash

This program needs the great MailTools package installed. Try

        http://search.cpan.org/search?dist=MailTools

This program needs the MIME-tools package installed. Try

        http://search.cpan.org/search?dist=MIME-tools

If -f is used, mail is accessed by an MH command, so the MH program suite must be installed. See

        http://www1.ics.uci.edu/~mh/


SEE ALSO

mh


AUTHOR

Stefan Merten <smerten@oekonux.de>


LICENSE

This program is licensed under the terms of the GPL. See

        http://www.gnu.org/licenses/gpl.txt


AVAILABILITY

See

        http://www.merten-home.de/FreeSoftware/mail2thread/