Real Web 2.0: Battling Web Spam

IBM takes a two-part march through the attack vectors of spam on “Web 2.0” sites.

‘Real Web 2.0 means harnessing the power of social groups to improve information systems. This will invariably attract nasty people who look for the cracks to take advantage of you and me. Spam on the Web is one of the biggest threats to a modern Web developer.’

Part 1 of this series shows you how to assess visitor behaviour and control workflow to reduce Web 2.0 spam. Part 2 shows you how to use the power of community against spam.

With thanks to the anonymous contributor who submitted this story.

Spammers do what they do because, at the end of the day, it earns money; it works.
The articles, however, largely miss the human element of the spammers themselves.

If you can think like the spammer, then you can better defend against the spammer.
How can a visitor to your site use it to make money? Where can they get their message across? How can they affect other users on the site?

Whilst the articles are designed to catalogue many different anti-spam techniques rather than justify them all, there are a couple of methods mentioned that I agree with and some that I disagree with. I’d like to bring in the human element by layering my own UX opinion on top of these methods.

Agree

Flood Control

Spammers have to get the message across quickly and widely before being spotted. Very rarely will I see a slow spam crawl aiming to duck under the radar by keeping volume low, because as soon as one person reports them they are removed before any real number of people have been spammed.

Flood control, though, needs good UI to avoid hurting regular users. Don’t tell the user about the flood time limit only after they’ve typed their message and pressed “Submit”: you find that incredibly annoying, so don’t do it to your users either!

If the user begins writing a message while they are still within the flood time limit, place a message above the text box to tell them, and provide a countdown using JavaScript. The user then knows ahead of time how long they are expected to wait, and may invest that time in proof-reading their comment.
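As a minimal sketch (in TypeScript), assuming the server renders the remaining wait into a data-flood-wait attribute on the comment form, the countdown might look something like this; the form id, notice element and attribute name are my own illustrative choices, not anything prescribed by the IBM articles:

```typescript
// Show the remaining flood-control wait *before* the user submits.
// Assumes the server renders the remaining seconds into data-flood-wait
// on the comment form; all ids and names here are illustrative.
const form = document.querySelector<HTMLFormElement>("#comment-form");
const notice = document.querySelector<HTMLElement>("#flood-notice");
const submitButton = form?.querySelector<HTMLButtonElement>("button[type=submit]");

let remaining = Number(form?.dataset.floodWait ?? 0);

function tick(): void {
  if (!notice || !submitButton) {
    return;
  }
  if (remaining > 0) {
    notice.textContent = `You can post again in ${remaining} second(s).`;
    submitButton.disabled = true;
    remaining -= 1;
    window.setTimeout(tick, 1000);
  } else {
    notice.textContent = "";
    submitButton.disabled = false;
  }
}

tick();
```

The script is purely a courtesy to the user; the server must still enforce the flood limit itself, since a spammer will simply ignore the countdown.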

Disagree

Detecting JavaScript

The first article does digress on this point to say that it has its flaws. I would add to that by simply saying: don’t ever require JavaScript to submit a form. If you’re anything like me, you use NoScript and arrive at a site with JavaScript disabled by default. I don’t want to have to fill in a form again after I’ve enabled JavaScript. It goes without saying that history is full of examples where public-service websites have only been accessible through one browser on one platform because of this kind of behaviour.

The user platform is more diverse than it was 10 years ago. You can no longer assume that the only browser you need to support is the most popular one on a desktop computer. Mobile browsers are ever more popular, and some may not support JavaScript at all, or may only support a subset.
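To make the point concrete, here is a minimal progressive-enhancement sketch (in TypeScript); the form id and the use of fetch are illustrative assumptions on my part. The form posts normally without JavaScript, and the script only enhances it when JavaScript happens to be available:

```typescript
// Progressive enhancement: the form works as a plain HTML POST, and this
// script only improves the experience when it actually runs.
const form = document.querySelector<HTMLFormElement>("#comment-form");

if (form) {
  form.addEventListener("submit", (event) => {
    event.preventDefault(); // only reached when JavaScript is running
    fetch(form.action, { method: "POST", body: new FormData(form) })
      .then((response) => {
        if (!response.ok) {
          form.submit(); // fall back to the browser's native submission
        }
        // On success you might update the page in place instead of reloading.
      })
      .catch(() => form.submit()); // network failure: fall back as well
  });
}
```

Either way, a visitor with JavaScript disabled still gets a working form.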

Blocking Proxies

A spammer using a proxy is only a technical detail of their process. Regular, legitimate users use proxies too, some living in areas that restrict and filter Internet access.

It doesn’t matter whether a spammer masks their IP or not; blocking an IP may be useless when the spam could be coming from a bot-net with a million different IPs available, all from legitimate users’ computers.


In missing the human element of spammers themselves, the articles also fail to acknowledge a tactic that is becoming increasingly common: paying humans to break CAPTCHAs and other anti-spam measures. A human can sit and sign up for e-mail accounts all day long without any difficulty, and once a spammer has even one e-mail address they can send potentially thousands of e-mails before that particular address gets blocked.

When you have a human doing the spamming, all of those technical details are irrelevant. It becomes less a matter of what method the spammer uses and more a matter of their behaviour and the content they create on your site. This is where moderation systems need to be in place, allowing the other users of the site to report spam.
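As a rough sketch of what such a moderation system might track (the threshold and data structure below are my own assumptions, not something taken from the articles), reports from distinct users can be counted and the content hidden automatically once enough people have flagged it, pending moderator review:

```typescript
// Community moderation sketch: count reports from distinct users and hide
// the content once a threshold is reached, pending moderator review.
interface Reportable {
  id: string;
  hidden: boolean;
  reporters: Set<string>; // ids of users who have reported this content
}

const HIDE_THRESHOLD = 3; // illustrative value

function reportSpam(content: Reportable, reporterId: string): void {
  content.reporters.add(reporterId); // a Set ignores duplicate reports
  if (!content.hidden && content.reporters.size >= HIDE_THRESHOLD) {
    content.hidden = true; // hide until a moderator has a look
  }
}

// Usage:
const comment: Reportable = { id: "c42", hidden: false, reporters: new Set() };
reportSpam(comment, "alice");
reportSpam(comment, "bob");
reportSpam(comment, "carol");
console.log(comment.hidden); // true
```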

Another solution is simply to think about what functionality your site really needs. Does your website need comments? The answer by default should be “no”, and you should have to justify to yourself why you need each feature. Making your site do less will not stop it from succeeding in its goals, so long as the features you have chosen meet those goals and you don’t add functionality you could do without just because “everybody else does it”.

Remember, spammers are people. They may use bots, scripts and other technical measures but they still follow the basic human emotion of greed.

Battling spam as we move more into the cloud will be less about the technical details of what spammers do and more about the technical details of determining a user’s intentions.