DerekMartin.ca

I'm a father, manager, programmer, scrum master, geek, & movie lover.

Just My Luck

:P at 2006-01-10 12:51:30

Castaway

Long Time No Update

FYI - It took me 3 hours to delete all the blog spam I’ve received lately, and 1 hour to write the code for this whitelist. During the 1 hour that I was writing the whitelist, 27 new spam comments popped up on my blog. Yes, it had gotten seriously out of control… BUT I think it’s fixed now :) at 2006-01-28 18:49:59

Vlad said: ..test of the whitelist system :) at 2006-01-29 07:03:47

Vlad said: woohoo! i feel special! thank you, thank you, thank you! at 2006-01-29 07:04:30

(andre)[http://www.andremolnar.com] said: The home system looks nifty cool. That’s some serious dedication to home entertainment.

p.s. I finally met Jeff f2f at Thursday’s newpath. I think that at some point I should invite you, Jeff, Alex, and anyone else you can think of out for a pint. at 2006-01-30 05:18:33

Anti-Spam Comment System

  1. Whenever you post a comment to my site, you are required to provide an email address. I don’t care if it’s a real one, but you should use the same one every time you comment.
  2. NOW when you comment it will be added instantly in an “unapproved” state, meaning that it will NOT be visible on the site (spammers do not get what they’re after).
  3. Then the code checks to see if there are any previous *approved* comments from the same email address.
  4. If there are, then I’ve approved comments by you before, and the system automatically approves your comment and emails me a link by which I may delete it if I find it inappropriate (I’ve never done this, except for spam).
  5. If there aren’t, then I’m sent an email link by which I may approve your comment, if I find it appropriate (i.e. it’s not spam). Subsequent comments from that email address will be auto-approved as above.

That’s it! Do you like it?

Anyway, the words I filtered for are: Alik, penis, enlargement, internet, slot, cheap, buy, xenical, levitra, meridia, gambling, xanex, ionamin, phentermine, cialis, poker, casino, blackjack, holdem, fioricet, bingo, bontril, adipex, discount, diet, prescription, degree, debt, credit, loan, duremar, viagra, diazepam, tramadol, generic, migraine, afrikanez, geocities, free, cigarette, roulette, craps, ringtone, tenuate, vaniqa, celebrex, antoxa, acyclovir, vicodin, pills, hydroc, bonsai, online

I would have also filtered for Texas (as in Texas Holdem), but a friend of mine uses the handle “TexasHotplate”.

Approximate Spam Stats by Type:

  Narcotic: 27/53 words = 51% = 1734 messages
  Gambling: 8/53 words = 15% = 510 messages
  Financial: 7/53 words = 13% = 442 messages
  Other: 7/53 words = 13% = 442 messages
  Self-Image: 4/53 words = 8% = 272 messages

NARCOTIC: cigarette, xenical, levitra, meridia, ionamin, phentermine, cialis, xanex, fioricet, bingo, prescription, duremar, viagra, diazepam, tramadol, generic, migraine, bontril, adipex, tenuate, vaniqa, celebrex, antoxa, acyclovir, vicodin, pills, hydroc

GAMBLING: gambling, slot, poker, casino, blackjack, holdem, roulette, craps

FINANCIAL: cheap, buy, discount, debt, credit, loan, free

OTHER: Alik, afrikanez, internet, online, geocities, ringtone, bonsai

SELF-IMAGE: degree, penis, enlargement, diet

THE QUERY

Below is the query I used. Yes, it took a while to write (1/2 hour?), but it was waaaay faster than manually going through all 4600 comments (1200 legit, 3400 spam).
SELECT * FROM comments WHERE
   name = 'Alik'
OR name LIKE '%penis%' OR url LIKE '%penis%' OR email LIKE '%penis%'
OR name LIKE '%enlargement%' OR url LIKE '%enlargement%' OR email LIKE '%enlargement%'
OR name LIKE '%internet%' OR url LIKE '%internet%' OR email LIKE '%internet%'
OR name LIKE '%slot%' OR url LIKE '%slot%' OR email LIKE '%slot%'
OR name LIKE '%cheap%' OR url LIKE '%cheap%' OR email LIKE '%cheap%'
OR name LIKE '%buy%' OR url LIKE '%buy%' OR email LIKE '%buy%'
OR name LIKE '%xenical%' OR url LIKE '%xenical%' OR email LIKE '%xenical%'
OR name LIKE '%levitra%' OR url LIKE '%levitra%' OR email LIKE '%levitra%'
OR name LIKE '%meridia%' OR url LIKE '%meridia%' OR email LIKE '%meridia%'
OR name LIKE '%gambling%' OR url LIKE '%gambling%' OR email LIKE '%gambling%'
OR name LIKE '%ionamin%' OR url LIKE '%ionamin%' OR email LIKE '%ionamin%'
OR name LIKE '%xanex%' OR url LIKE '%xanex%' OR email LIKE '%xanex%'
OR name LIKE '%hentermine%' OR url LIKE '%hentermine%' OR email LIKE '%hentermine%'
OR url LIKE '%cialis%' OR name LIKE '%cialis%' OR email LIKE '%cialis%'
OR name LIKE '%poker%' OR url LIKE '%poker%' OR email LIKE '%poker%'
OR name LIKE '%casino%' OR url LIKE '%casino%' OR email LIKE '%casino%'
OR url LIKE '%blackjack%' OR name LIKE '%blackjack%' OR email LIKE '%blackjack%'
OR name LIKE '%holdem%' OR url LIKE '%holdem%' OR email LIKE '%holdem%'
OR name LIKE '%fioricet%' OR url LIKE '%fioricet%' OR email LIKE '%fioricet%'
OR name LIKE '%bingo%' OR url LIKE '%bingo%' OR email LIKE '%bingo%'
OR name LIKE '%bontril%' OR url LIKE '%bontril%' OR email LIKE '%bontril%'
OR name LIKE '%adipex%' OR url LIKE '%adipex%' OR email LIKE '%adipex%'
OR name LIKE '%discount%' OR url LIKE '%discount%' OR email LIKE '%discount%'
OR name LIKE '%diet%' OR url LIKE '%diet%' OR email LIKE '%diet%'
OR name LIKE '%prescription%' OR url LIKE '%prescription%' OR email LIKE '%prescription%'
OR name LIKE '%degree%' OR url LIKE '%degree%' OR email LIKE '%degree%'
OR name LIKE '%debt%' OR url LIKE '%debt%' OR email LIKE '%debt%'
OR name LIKE '%credit%' OR url LIKE '%credit%' OR email LIKE '%credit%'
OR name LIKE '%loan%' OR url LIKE '%loan%' OR email LIKE '%loan%'
OR name LIKE '%duremar%' OR url LIKE '%duremar%' OR email LIKE '%duremar%'
OR name LIKE '%iagra%' OR url LIKE '%iagra%' OR email LIKE '%iagra%'
OR name LIKE '%iazepam%' OR url LIKE '%iazepam%' OR email LIKE '%iazepam%'
OR name LIKE '%ramadol%' OR url LIKE '%ramadol%' OR email LIKE '%ramadol%'
OR name LIKE '%eneric%' OR url LIKE '%eneric%' OR email LIKE '%eneric%'
OR name LIKE '%afrikanez%' OR url LIKE '%afrikanez%' OR email LIKE '%afrikanez%'
OR name LIKE '%migraine%' OR url LIKE '%migraine%' OR email LIKE '%migraine%'
OR name LIKE '%geocities%' OR url LIKE '%geocities%' OR email LIKE '%geocities%'
OR name LIKE '%free%' OR url LIKE '%free%' OR email LIKE '%free%'
OR name LIKE '%cigarette%' OR url LIKE '%cigarette%' OR email LIKE '%cigarette%'
OR name LIKE '%ringtone%' OR url LIKE '%ringtone%' OR email LIKE '%ringtone%'
OR name LIKE '%craps%' OR url LIKE '%craps%' OR email LIKE '%craps%'
OR name LIKE '%roulette%' OR url LIKE '%roulette%' OR email LIKE '%roulette%'
OR name LIKE '%tenuate%' OR email LIKE '%tenuate%' OR url LIKE '%tenuate%'
OR name LIKE '%vaniqa%' OR email LIKE '%vaniqa%' OR url LIKE '%vaniqa%'
OR name LIKE '%celebrex%' OR email LIKE '%celebrex%' OR url LIKE '%celebrex%'
OR name LIKE '%antoxa%' OR email LIKE '%antoxa%' OR url LIKE '%antoxa%'
OR name LIKE '%acyclovir%' OR email LIKE '%acyclovir%' OR url LIKE '%acyclovir%'
OR name LIKE '%vicodin%' OR email LIKE '%vicodin%' OR url LIKE '%vicodin%'
OR name LIKE '%pills%' OR email LIKE '%pills%' OR url LIKE '%pills%'
OR name LIKE '%hydroc%' OR email LIKE '%hydroc%' OR url LIKE '%hydroc%'
OR name LIKE '%bonsai%' OR email LIKE '%bonsai%' OR url LIKE '%bonsai%'
OR name LIKE '%online%' OR url LIKE '%online%' OR email LIKE '%online%'
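
The query above repeats the same three LIKE tests (name, url, email) for every keyword, so it could also be generated from the word list instead of typed out. A minimal sketch, assuming Python with its built-in sqlite3 module and `?` placeholders; the post doesn't say which database the original query ran on, and `build_spam_query`, `COLUMNS` and the shortened keyword list are made up for illustration.

```python
import sqlite3

# The three columns the hand-written query checks for each keyword.
COLUMNS = ("name", "url", "email")


def build_spam_query(keywords, exact_names=()):
    """Return (sql, params) matching any keyword in any checked column."""
    clauses = []
    params = []
    # Exact-match names, like the name = 'Alik' test in the original query.
    for name in exact_names:
        clauses.append("name = ?")
        params.append(name)
    # One substring test per keyword per column.
    for word in keywords:
        for column in COLUMNS:
            clauses.append(f"{column} LIKE ?")
            params.append(f"%{word}%")
    if not clauses:
        raise ValueError("need at least one keyword or name")
    return "SELECT * FROM comments WHERE " + " OR ".join(clauses), params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (name TEXT, url TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO comments VALUES (?, ?, ?)",
        [
            ("Vlad", "", "vlad@example.com"),
            ("bob", "http://best-poker.example.com", "bob@example.com"),
        ],
    )
    # Shortened list; the full 53-word list would be passed the same way.
    sql, params = build_spam_query(["poker", "casino"], exact_names=["Alik"])
    for row in conn.execute(sql, params):
        print(row)
```

Passing the keywords as bound parameters rather than pasting them into the SQL string also sidesteps any quoting problems if a keyword ever contains a quote character.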

Comments from my old blog:

(Alex)[http://www.newpathnetwork.org] said: Derek,

Looks like you’ve done your homework here… The thing is that your approach is not easily scalable… One thing you could do is prime the database with known good email addresses, but then what happens if a spammer decides to impersonate someone whose email address they know works? What is there to prevent them from seeing who is valid and posting as them? Furthermore, won’t this generate an enormous workload as your blog becomes more popular?

It seems like the approach most filters take is the 90/10 rule (the better ones go for 95/5): filter on the keywords you’ve identified. Don’t let people post with those words in the comment box, or if they really want to use those words, then have them be escaped somehow, pills or something like that.

Using a text filter will probably obviate the need for a serious whitelist… Then you deal with the small amount of spam using your query… Do you really need more work than you already have, is my question… at 2006-01-30 05:07:13

u.c. said: Good heavens, what a lot of work due to xxxxxx spammers. I guess I won’t be able to tell you the story about the man who after losing weight went to Las Vegas for the weekend, gambled it all away, fell in with a hooker but felt inferior, tried everything but still was so retired to his Japanese garden smoking himself to death whilst on his mobile. shame… at 2006-01-30 08:32:58

u.c. said: comma after but still was at 2006-01-30 08:37:08

dad said: I do love a good euphemism, lol. at 2006-01-30 14:32:36

derek said: Alex - Good questions. The main thing that will prevent them from using “known good email addresses” is that I don’t display the email addresses of people who comment — so they’d be totally guessing, which is hard to do. In the 2.0 version of this system, I plan to reduce my workload even further by copying Mailblocks a bit. When you comment, if you’re not on my whitelist you’ll get an email asking you to “click here to be added to the whitelist”…. I’ll only receive email notifying me of people who click that link, not of every comment that’s posted. Should be much more manageable. As for filtering people’s words — I don’t want to do that for the same reason I got upset with the choir I was in. People speak in distinct ways, and I don’t want to censor those nuances. Allowing people to say what they want will take some work — but not a lot. It’s worth it. at 2006-01-30 14:35:49

(Richard)[http://www.braeken.com] said: Funny that e-mail address caused trouble…. at 2006-01-31 21:50:27

(andre)[http://www.andremolnar.com] said: Time to upgrade to DM2.0 powered by Drupal.

1) Spam bots haven’t quite figured out how to hammer Drupal sites. 2) Even if they do, there is an anti-spam module that is a learning Bayesian filtering system. The occasional spam message I get is immediately trapped and dumped into a moderation queue (unpublished) until I can verify that it is or is not spam.

Say it with me… Drupal rules. I’ll even set up your first site ;-)

andre at 2006-02-03 08:42:44

Surprise!