filtergetmail
getmail is a great program for downloading mail from POP3 servers to local maildirs. It has one drawback: it fetches all messages without any filtering. This is extremely annoying when there are a lot of (big) viruses in your mailbox and you have a slow internet connection. It would be much better to look at the headers and filter before downloading whole mails. "Well, run mailfilter before executing getmail," one might say. However, that solution has three disadvantages:
The latter two problems are inherent to the approach of having mail filtering done by a separate program which is completely independent of the actual download program. So it might be a good idea to integrate the filtering feature into getmail:
As getmail is written in Python and as I like Python very much, I modified the source code, so that it performs three steps for every mail:
You will find more detailed documentation below.
To be honest, I didn't bother adding a switch to getmail's configuration file syntax to turn filtering on or off for specific mail accounts. When the patch is applied, getmail will filter every mail. I know that this is a serious drawback, but I think that Charles Cazabon, the author of getmail, could integrate the filter feature more quickly than I can, and I encourage him to do so. There is no need to send mail to him about it; I have already notified him of this web page.
If you do not receive many unwanted mails, the overhead of downloading the headers of wanted mails twice will be greater than the benefit of not downloading the bodies of unwanted mails. In this case, you had better use the original getmail and filter your mails after downloading.
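The trade-off above can be estimated with a rough cost model. The following is only a sketch: the function names are made up for illustration, and it assumes that every message has about the same header size and that headers of wanted mails are transferred twice (once via TOP, once as part of the full retrieval).

```python
def bytes_without_filter(wanted_sizes, unwanted_sizes):
    """Plain getmail: every message is downloaded in full."""
    return sum(wanted_sizes) + sum(unwanted_sizes)


def bytes_with_filter(wanted_sizes, unwanted_sizes, header_size):
    """Patched getmail: headers of all messages, plus wanted messages in full.

    header_size is an assumed average header size in bytes.
    """
    total_messages = len(wanted_sizes) + len(unwanted_sizes)
    return header_size * total_messages + sum(wanted_sizes)


def filtering_pays_off(wanted_sizes, unwanted_sizes, header_size):
    """True if the header overhead is smaller than the saved bodies."""
    return (bytes_with_filter(wanted_sizes, unwanted_sizes, header_size)
            < bytes_without_filter(wanted_sizes, unwanted_sizes))
```

In this model, filtering pays off exactly when the header size times the total number of messages is smaller than the combined size of the unwanted messages.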
The patch has to be applied to /usr/lib/getmail/getmail.py (or wherever you installed the Python scripts). It has been tested to work with version 3.2.1. It might work for other versions, too, but you will probably have to apply the patch manually (using an editor).
New in version 220.127.116.11: Skipped messages are now logged to the log file (when using --message-log).
Warning: When testing it, set the delete flag of getmail to zero, so that you will not lose mail in case anything goes wrong. You can set delete to 1 when you are convinced that everything works.
This is how it works in detail:
Instead of downloading a mail, filtergetmail (that's what I call the patched version of getmail, even if it's still named getmail.py) sends a TOP command to the POP3 server. I haven't been able to check what happens if the server doesn't support it; in any case, filtergetmail doesn't do any error handling for that. Then, filtergetmail opens the filter program (currently hardwired to ~/.getmail/filter) as a pipe, supplies the mail size in octets (i.e. bytes) as first parameter and writes the following things to the pipe's stdin:
For example, if you did by hand what filtergetmail does, it might look like this example invocation of ~/.getmail/filter. Please note that, as no body lines have been retrieved (which can be changed in the source), there are two newlines at the end: the newline of the last header line and the empty line that separates header and body.
If the filter program returns "1" as exit code, filtergetmail will log "skipped" as informational message instead of retrieving and delivering the mail. The patch doesn't interfere with the deletion process, i.e. filtered mails will be deleted as usual if you configured getmail to delete mails after retrieving.
If the filter program wasn't found or wasn't executable or if its exit code was not 1 (preferably 0 instead), filtergetmail will retrieve the mail as usual.
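The flow described above can be sketched roughly as follows. This is not the code from the patch; it is a minimal Python 3 illustration built on the standard poplib and subprocess modules. The function names are invented for this sketch, and `pop` is assumed to be an already connected and authenticated `poplib.POP3` object.

```python
import os
import subprocess

# The text says the filter path is hardwired to this location.
FILTER_PATH = os.path.expanduser("~/.getmail/filter")


def build_filter_input(header_lines):
    """Turn header lines (bytes, without line endings) into filter input.

    The result ends with two newlines: the one of the last header line
    and the empty line separating header and body.
    """
    lines = list(header_lines)
    while lines and lines[-1] == b"":
        lines.pop()  # drop a trailing separator so we add exactly one
    return b"".join(line + b"\n" for line in lines) + b"\n"


def should_skip(header_lines, size_octets, filter_cmd=(FILTER_PATH,)):
    """Run the filter; only exit code 1 means "skip this mail"."""
    try:
        result = subprocess.run(
            list(filter_cmd) + [str(size_octets)],  # size is the first parameter
            input=build_filter_input(header_lines),
        )
    except OSError:
        # Filter not found or not executable: retrieve the mail as usual.
        return False
    return result.returncode == 1


def fetch_unless_filtered(pop, msgnum, size_octets, filter_cmd=(FILTER_PATH,)):
    """Return the message lines, or None if the filter asked to skip."""
    _resp, header_lines, _octets = pop.top(msgnum, 0)  # headers only, no body lines
    if should_skip(header_lines, size_octets, filter_cmd):
        return None
    _resp, message_lines, _octets = pop.retr(msgnum)
    return message_lines
```

Unlike the patch, this sketch treats a missing filter explicitly, but a server that does not support TOP would still raise an error from `pop.top`.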
Note that the filter program is invoked for every single mail. If it takes too long to execute, it will slow down mail downloading, because filtergetmail waits for it to terminate before it continues talking to the server.
The only logging filtergetmail performs is the informational "skipped" message. You will probably want to implement your own logging in the filter program in order to be able to investigate why a specific mail has (not) been filtered.
You may want to read the (very basic) example filter script, save it as ~/.getmail/filter, make it executable and play around with it. But make sure that getmail's delete flag is set to zero as long as you are not completely sure that it works as intended.
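To give an idea of what such a filter program could look like, here is a hypothetical sketch in Python 3. It is not the example script mentioned above. The size limit, the blocked subject words and all function names are assumptions chosen for illustration only. It takes the mail size as first parameter, reads the headers from stdin, logs its decision, and returns 1 to skip a mail or 0 to retrieve it.

```python
import email
import sys

MAX_SIZE = 200_000                      # hypothetical size limit in octets
BLOCKED_SUBJECT_WORDS = ("viagra",)     # hypothetical keyword list


def decide(size_octets, header_text):
    """Return 1 to make filtergetmail skip the mail, 0 to retrieve it."""
    headers = email.message_from_string(header_text)
    subject = (headers.get("Subject") or "").lower()
    if size_octets > MAX_SIZE:
        return 1
    if any(word in subject for word in BLOCKED_SUBJECT_WORDS):
        return 1
    return 0


def main(argv, stdin, log=sys.stderr):
    """argv[1] is the mail size; stdin carries the header lines."""
    try:
        size_octets = int(argv[1])
    except (IndexError, ValueError):
        # Be conservative: when in doubt, let the mail through.
        log.write("bad or missing size argument, retrieve\n")
        return 0
    code = decide(size_octets, stdin.read())
    # Own logging, so you can later see why a mail was (not) filtered.
    log.write("size=%d -> %s\n" % (size_octets, "skip" if code == 1 else "retrieve"))
    return code

# To use this as ~/.getmail/filter, end the script with:
#     sys.exit(main(sys.argv, sys.stdin))
```

Returning 0 on unexpected input is a deliberate choice here: since skipped mails are deleted when getmail's delete flag is set, a filter that errs should rather let a mail through than lose it.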
By the way, you will have to write your own spam filter. I will not supply one here.
Finally I'd like to say again that I encourage the author of getmail to modify the filter feature as he sees fit and integrate it into the main getmail sources, because currently it's just too hackish (too many hardwired things).
Have a look at an example output of filtergetmail.
In that example, there were 255 mails in the mailbox: 253 of them were deleted because they were viruses or spam, and 2 of them were downloaded as usual. If I had not used filtergetmail, I would have downloaded all 255 mails. This would have taken approx. 40 minutes (for 18.7 MB with ISDN), whereas it took less than 4 minutes using filtergetmail.
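The 40-minute estimate can be checked with a quick calculation. The sketch below assumes a single ISDN B channel at 64 kbit/s, 1 MB = 1,000,000 bytes, and no protocol overhead; none of these details are stated in the text, so treat the result as a ballpark figure.

```python
ISDN_BITS_PER_SECOND = 64_000  # assumption: one B channel, no overhead


def transfer_minutes(size_bytes, bits_per_second=ISDN_BITS_PER_SECOND):
    """Ideal transfer time in minutes for size_bytes over the given link."""
    return size_bytes * 8 / bits_per_second / 60


full_download = transfer_minutes(18.7e6)  # all 255 mails, 18.7 MB
print("full download: about %.0f minutes" % full_download)
```

Under these assumptions the full download comes out at roughly 39 minutes, which matches the figure quoted above.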