Spam Protection

Automatic detection of address-harvesting risks

Mara 2.3 and later includes a simple but effective detection system for harvestable email addresses posted in site pages.

The detection mechanism is smart enough to distinguish between email addresses and similar but unrelated constructs such as Twitter account names. It finds email addresses whether or not they are enclosed in a 'mailto' construct. It does not check whether the email account or domain in question actually exists; it simply detects every textual construct that has the valid form of an email address.
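
The kind of matching described above can be approximated with a single regular expression. The Python sketch below is illustrative only: it is not Mara's own code, and the pattern and the function name find_addresses are assumptions made for this example. The pattern insists on a dotted domain after the '@', so a bare Twitter-style handle is ignored, and it takes no notice of whether 'mailto:' precedes the address.

```python
import re

# Hypothetical pattern (not taken from Mara): a local part, an '@', then a
# domain containing at least one dot and ending in an alphabetic label.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

def find_addresses(html):
    """Return every substring of html that has the form of an email address."""
    return EMAIL_RE.findall(html)

sample = (
    '<p>Follow @example_user on Twitter, write to '
    '<a href="mailto:jane@example.com">Jane</a> '
    'or to bob.smith@mail.example.org.</p>'
)

# The Twitter-style handle is skipped; both addresses are reported,
# with and without a surrounding mailto.
print(find_addresses(sample))
# -> ['jane@example.com', 'bob.smith@mail.example.org']
```

Like the mechanism it imitates, this checks form only. It makes no attempt to confirm that the account or domain exists.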

University studies show that the majority of spam arises from the posting of unprotected email addresses on webpages. Often this is done through the sheer naivety of the user, who does not appreciate that doing so is about as shortsighted as putting your credit card details on a webpage in the expectation that only honest merchants will use those details to debit your account. A spammer's robot then comes along, scans your entire site for anything that looks like an email address, includes these addresses in an 'Email Marketing' CD, and sells it to everyone on the planet who wants to annoy you with junk mail. Bingo: every morning you now have a thousand unwanted junk messages in your inbox.

To activate harvesting protection, the sitecfg/siteini.php variable harvesting_protection must be set to 1 or higher.


Currently, two values are recognised:

1 - Carry out a simple 'munging' of any email addresses found. This will defeat most spambots, but requires the human site visitor to remove the (fairly obvious) foreign characters from the address. Munging has two downsides: it is a small nuisance to the site visitor, and, when automatic, it seemingly legitimises the posting of email addresses, which might lead users to post them in other circumstances where there is no protection. Your call on that one.

2 - Block the publication of any page containing harvestable addresses. Instead, a warning message is displayed stating the reason for the publication ban. The obvious downside is that the page is taken offline completely until the harvesting problem is fixed. That said, it serves as a more stringent warning to the user not to do this.
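
The two settings above can be pictured with the following Python sketch. It is not how Mara implements them. The munging scheme (asterisks placed around the '@'), the wording of the warning, and the names munge and protect are all assumptions chosen for illustration.

```python
import re

# Hypothetical address pattern, repeated here so the sketch is self-contained.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

# Assumed wording; the text of Mara's real warning is not given in this page.
WARNING = "This page is withheld because it contains harvestable email addresses."

def munge(address):
    """Assumed munging scheme: put obvious foreign characters around the '@'."""
    local, domain = address.split("@", 1)
    return f"{local}*@*{domain}"

def protect(html, level):
    """Return (output, blocked) for a given harvesting_protection level."""
    if level == 0 or not EMAIL_RE.search(html):
        return html, False        # protection off, or nothing to protect
    if level == 1:
        # Value 1: munge each address in place and publish the page.
        return EMAIL_RE.sub(lambda m: munge(m.group(0)), html), False
    if level == 2:
        # Value 2: withhold the page and show a warning instead.
        return WARNING, True
    # How Mara treats other values is not stated, so this sketch refuses them.
    raise ValueError(f"unrecognised protection level: {level}")

print(protect("Mail jane@example.com for details.", 1))
# -> ('Mail jane*@*example.com for details.', False)
```

With this particular scheme the munged text no longer matches the pattern, so a robot looking for the same form finds nothing, while a human visitor only has to delete the two asterisks.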

Additional options could be coded. For example, a more sophisticated JavaScript protection could be added automatically, or all email addresses found could be automatically replaced with a link to the site's Contact Us page. Some thought on this is worthwhile, and suggestions are welcomed.

As of v2.4, anti-harvesting protection is ON by default, but only operates on pages created or modified with the online CKEditor. Although this means pages uploaded by FTP are not scanned for harvesting risks, this was felt to be a satisfactory arrangement, and one which involves less server overhead than scanning each page delivered by the webserver. The mechanism is therefore now that a page containing unprotected email addresses cannot be saved in the online editor until the problem has been fixed.

Since harvestable addresses can arrive through FTP uploads or other imports, in pages which might never be edited online, some care should be taken to check all such imported data for insecure mailtos.
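
One way to carry out such a check by hand is to run a small script over the uploaded files. The Python sketch below is a standalone helper, not part of Mara. The function name scan_tree, the list of file suffixes, and the address pattern are assumptions and should be adjusted to suit the site.

```python
import re
from pathlib import Path

# Hypothetical address pattern, repeated here so the sketch is self-contained.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

def scan_tree(root, suffixes=(".html", ".htm", ".php")):
    """Map each page file under root to the address-like strings found in it."""
    findings = {}
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the assumed page suffixes.
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = EMAIL_RE.findall(text)
            if hits:
                findings[str(path)] = hits
    return findings
```

Calling scan_tree on the folder that holds the imported pages returns a dictionary of file paths and the addresses found in each. An empty dictionary means nothing matching the pattern was found.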

This feature is still under development, and more sophisticated protection methods may be deployed in future releases. For the moment, it is effective enough to satisfy an observed need for harvesting protection.

