Quis custodiet ipsos custodes? (Who watches the watchmen?) -- Juvenal
You bet. Spammers do it with comment spam, with articles they try to post, and with script attacks
that attempt SQL injection, among other shady techniques.
At the ittyurl.net site that I run for "social searchable short urls", I can't make everyone authenticate
just to submit a link, so I've developed a set of defenses that go
a long way toward stopping the "baddies" in their tracks. The solution
I present here is very "barebones"; it's just a starting point. I've
left out quite a bit of the exception checking in order to keep the code simpler
and more readable. But if it gives you some ammunition for keeping the quality
of your ASP.NET blog or website up to snuff, I'm glad to have been able to help.
I have, basically, three sets of "traps" that I use to stop spam links
and content: "Bad Words", "Banned Domains" and "Banned IPs".
I also check for things like malformed urls or urls with script links, redirects,
or other telltale signs that I'm not dealing with a "normal" link.
Most of this site-specific code is not included in this sample as it only makes
sense for a very specific website. When somebody submits a url to be shortened
and indexed, I spider the target web page to compile a list of "tag words"
- and in the process, the page gets run through my "Bad Words" list.
If there is offensive content, the link submission is automatically rejected.
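The url checks mentioned above are site-specific and are not in the download, so what follows is only a minimal sketch of the general idea, not the code that runs on the site. The class name LinkSanity, the method name LooksLikeNormalLink, and the particular rules (http/https only, no script markers, no second "://" that would suggest an embedded redirect target) are all assumptions made for illustration.

```csharp
using System;

public static class LinkSanity
{
    // Hypothetical helper, not part of the original sample.
    // Returns true only for absolute http/https urls that show no obvious
    // script payload and no embedded redirect target.
    public static bool LooksLikeNormalLink(string url)
    {
        if (string.IsNullOrEmpty(url) || url.Length > 2000)
            return false;

        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return false;

        // Rejects javascript:, data:, file: and similar schemes.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;

        // Decode once so that %3Cscript is caught as well as <script.
        string decoded = Uri.UnescapeDataString(url).ToLowerInvariant();
        if (decoded.Contains("<script") || decoded.Contains("javascript:"))
            return false;

        // A second "://" after the scheme suggests a url carried inside the url.
        int first = decoded.IndexOf("://", StringComparison.Ordinal);
        int last = decoded.LastIndexOf("://", StringComparison.Ordinal);
        return first == last;
    }
}
```

These rules are deliberately strict and will reject some legitimate links (for example, pages that take another url as a query parameter), so they would need tuning for any real site.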
The "banned domains" check doesn't just check for entire domains -
it can check for url "fragments" too, and reject them. For example,
a lot of spammers join some member site and put up porn advertisements in their
profile page. Then they set out to place as many links as they can at sites that
accept links. Typically these urls will contain the telltale "members.php"
fragment. There's no way that a link with that in it would represent content
that is of general interest to the developer community, so those script kiddies
never even get to my Page handler. The bad-words check works the same way: you could
not successfully submit this article page to ittyurl.net, because the page has an
offensive word in it. Of course, if a legitimate page contains an offending word or
phrase, I have engineered a way to override this behavior, but I'm not going to
disclose it here. You get the idea.
Here's a short synopsis of how I do the filtering:
1) In my database, I have three tables - BadWords, BannedIPs, and BannedDomains.
I have an admin-protected database page that lets me quickly enter free-form
SQL statements to add or delete items in any of these tables. I also have an
admin-protected "delete links" page that lets me visit any submitted link,
and by clicking a second link, I can quickly delete offensive content that's
made it past my "filters".
2) In the Application_Start event handler of my sites, I load each of the three database
tables into a generic List<string>.
3) In my global class, I have three static methods, each of which returns a boolean:
IsBadWord, IsBannedDomain, and IsBannedIP. These methods can be called from
any page on the site, but the first line of defense comes before anybody even gets
to a Page handler: I use the Application_PreRequestHandlerExecute handler to
do the checking. PreRequestHandlerExecute is a good place to do this sort of
filtering, as it fires before the request is handed off to a Page, WebService
or other handler. That saves you time and resources, because a request that
is denied has no chance to tie up a thread with a page handler. It looks like
this:
protected void Application_PreRequestHandlerExecute(object sender, EventArgs e)
{
    // Reject absurdly long request paths outright.
    if (HttpContext.Current.Request.Url.AbsolutePath.Length > 2000)
    {
        // I also use Regex here to check for valid urls.
        Exception ex = new Exception("Url over 2000 char");
        ex.Data.Add("Host", HttpContext.Current.Request.UserHostAddress);
        // log the data using your preferred mechanism:
        // PAB.ExceptionHandler.ExceptionLogger.HandleException(ex);
        HttpContext.Current.Response.StatusCode = 404;
        HttpContext.Current.Response.SuppressContent = true;
        HttpContext.Current.Response.End();
        return;
    }

    // Reject callers whose IP address is on the banned list.
    if (IsBannedIP(Request.UserHostAddress))
    {
        // PAB.ExceptionHandler.ExceptionLogger.HandleException(
        //     new Exception("Banned IP:" + Request.UserHostAddress + ": "
        //         + Request.RawUrl + ":UA=" + Request.UserAgent));
        HttpContext.Current.Response.StatusCode = 404;
        HttpContext.Current.Response.SuppressContent = true;
        HttpContext.Current.Response.End();
        return;
    }

    // Reverse-resolve the caller's address and check the host name.
    // Note that this does a DNS lookup on every request.
    string userHostName;
    try
    {
        userHostName = System.Net.Dns.GetHostEntry(Request.UserHostAddress).HostName;
    }
    catch (System.Net.Sockets.SocketException)
    {
        // The lookup failed, so there is no host name to check; let the request through.
        return;
    }

    if (IsBannedDomain(userHostName))
    {
        HttpContext.Current.Response.StatusCode = 404;
        HttpContext.Current.Response.SuppressContent = true;
        HttpContext.Current.Response.End();
        return;
    }
}
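The three lists and the three static check methods described in steps 2 and 3 are not shown in the article, so here is a minimal sketch of what they might look like. The class name SpamFilters, the field names, and the matching rules (case-insensitive exact match for words, case-insensitive substring match for domains and url fragments, exact match for IPs) are assumptions; the download may do this differently. In the real site the lists would be filled from the database in Application_Start rather than by hand.

```csharp
using System;
using System.Collections.Generic;

public static class SpamFilters
{
    // Filled once at application start and only read afterwards.
    public static readonly List<string> BadWords = new List<string>();
    public static readonly List<string> BannedDomains = new List<string>();
    public static readonly List<string> BannedIPs = new List<string>();

    // True if the word matches a list entry, ignoring case.
    public static bool IsBadWord(string word)
    {
        if (string.IsNullOrEmpty(word)) return false;
        return BadWords.Exists(w =>
            string.Equals(w, word, StringComparison.OrdinalIgnoreCase));
    }

    // True if any list entry occurs anywhere in the host name or url,
    // so "members.php" works as a fragment as well as "spam.example" as a domain.
    public static bool IsBannedDomain(string hostOrUrl)
    {
        if (string.IsNullOrEmpty(hostOrUrl)) return false;
        return BannedDomains.Exists(f =>
            !string.IsNullOrEmpty(f) &&   // an empty entry would match everything
            hostOrUrl.IndexOf(f, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // True if the address is on the list exactly as given.
    public static bool IsBannedIP(string ipAddress)
    {
        return !string.IsNullOrEmpty(ipAddress) && BannedIPs.Contains(ipAddress);
    }
}
```

Substring matching is what makes url fragments possible, but it also means a short entry can ban far more than intended, so entries are best kept specific.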
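In some cases, you may also want to deny requests that show a particular referrer
that is on your "baddies" list. In this case, you would use: IsBannedDomain(
Request.UrlReferrer.ToString() ), after first checking that Request.UrlReferrer
is not null, which it is whenever the request carries no Referer header.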
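Because many requests arrive with no referrer at all, the check needs a guard before it touches the value. The sketch below is one hedged way to do that; the class ReferrerCheck and the method IsBannedReferrer are hypothetical names, and the isBanned parameter stands in for the IsBannedDomain method so that the example compiles on its own. It works on the raw header string rather than on a parsed Uri, which is a design choice of this sketch and not something taken from the download.

```csharp
using System;

public static class ReferrerCheck
{
    // Hypothetical helper: returns true only when a referrer is present
    // and the supplied check (for example IsBannedDomain) flags it.
    public static bool IsBannedReferrer(string refererHeader, Predicate<string> isBanned)
    {
        // No Referer header was sent: nothing to check, so do not block.
        if (string.IsNullOrEmpty(refererHeader))
            return false;

        return isBanned(refererHeader);
    }
}
```

Inside Global.asax this could be called as ReferrerCheck.IsBannedReferrer(Request.Headers["Referer"], IsBannedDomain), assuming IsBannedDomain takes a string and returns a bool as described above.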
You can download the barebones Visual Studio 2008 Solution and experiment with it.
Just create a database named "BadGuys" and run the enclosed
SQL script to create your tables and populate each with some sample data.
Make sure the "badGuys" connection string in the web.config matches
your environment.
Since adding and "tuning" the above filters and some more customized filters
on my site, I've been able to stop 99% of spam link submissions in their
tracks, before they even get to a Page handler. Before doing this, I was having
to delete 10 to 15 spam links a day. Now, often a whole week goes by before I
have to take manual action to keep the site clean. Have fun, and practice "safe
computing". Remember, it's "us against them"! Die, spammers!