Removing Duplicate E-mail Messages From A Mailbox

Occasionally your mail delivery scheme might hiccup, leaving you with duplicate copies of email messages in your mailboxes. I find this happens if something goes wrong with fetchmail: you kill the fetchmail process before it has expunged the deleted email from the remote POP3 server, so the next time you run fetchmail it downloads a second copy of each email. This is a simple process that I came up with to remove duplicate email messages from a maildir format mailbox.

As a bit of background, a maildir mailbox is a small directory tree:

$ du .boxes.xml-dev
4       .boxes.xml-dev/tmp
124     .boxes.xml-dev/new
52340   .boxes.xml-dev/cur
52920   .boxes.xml-dev
$

Hierarchy is represented by components of the mailbox name separated by dots, so the mailbox above is called xml-dev and it is in the boxes mailbox. Messages are files in either the new or cur directories. Transport agents place messages into the new directory. When a user agent opens a mailbox it moves all the messages from new to cur. If you’re accessing your mail through an IMAP server like Courier-IMAP the IMAP server will deal with this for you.
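
To see how a mailbox's messages are split across the three subdirectories, something like the following sketch may help. The function name maildir_counts is made up for this example, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
# Hypothetical helper: print how many message files sit in each
# subdirectory (tmp, new, cur) of the maildir given as the first argument.
maildir_counts() {
    box=$1
    for sub in tmp new cur; do
        # Count regular files directly inside the subdirectory.
        n=$(find "$box/$sub" -maxdepth 1 -type f | wc -l)
        # $((n)) strips the padding some wc implementations add.
        echo "$sub: $((n))"
    done
}
```

Calling maildir_counts .boxes.xml-dev prints one line per subdirectory; a non-zero count for new means a user agent has not opened the mailbox since those messages were delivered.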

  1. Make sure there’s nothing sitting in the new subdirectory.
    $ ls new
    $


    If there are messages in the new subdirectory, open the mailbox in a user agent to get it to move them into cur.
  2. See how many messages you have:
    $ ls cur | wc -l
    842
    $
  3. Check they all have Message-IDs:
    $ for i in cur/*; do reformail -x Message-ID: <"$i"; done | wc -l
    842
    $
  4. See how many you have if you filter out duplicate Message-IDs:
    $ for i in cur/*; do reformail -x Message-ID: <"$i"; done | sort -u | wc -l
    698
    $
  5. See how many we’re going to delete:
    $ rm -f /tmp/dups
    $ for i in cur/*; do reformail -D 20000 /tmp/dups <"$i" && echo "$i"; done | wc -l
    144
    $ expr 698 + 144
    842
    $

    If this total doesn’t match the count from step 2, you should increase the 20000; reformail isn’t remembering enough Message-IDs to spot all the duplicates.
  6. Delete the messages and check things look right afterwards:
    $ rm -f /tmp/dups
    $ for i in cur/*; do reformail -D 20000 /tmp/dups <"$i" && rm "$i"; done
    $ ls cur | wc -l
    698
    $
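
If you would rather review the duplicates before deleting anything, the loop from step 5 can be wrapped in a small function. This is only a sketch: the name list_dups and its arguments are invented here, it relies on the same reformail -D invocation used in the steps above, and it keeps the Message-ID cache in a throwaway directory instead of /tmp/dups.

```shell
# Hypothetical helper: print the path of every message in <maildir>/cur
# that reformail -D reports as a duplicate. Nothing is deleted.
list_dups() {
    box=$1
    cache_size=${2:-20000}             # same value as in the steps above
    dir=$(mktemp -d) || return 1       # private directory for the cache file
    cache=$dir/dups                    # does not exist yet, like a removed /tmp/dups
    for msg in "$box"/cur/*; do
        [ -f "$msg" ] || continue      # skip the literal glob if cur is empty
        # As in step 5: success means the Message-ID was already seen.
        reformail -D "$cache_size" "$cache" <"$msg" && echo "$msg"
    done
    rm -rf "$dir"                      # discard the cache afterwards
}
```

For example, list_dups .boxes.xml-dev | wc -l should give the same count as step 5, and redirecting the output to a file gives you a list to check by hand before removing anything.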
