ASP.NET Search Engine Keyword Logging with an HttpModule

Logging the search terms from search engines like Google, Bing and Yahoo can be very useful in determining how your site content is found and how you can improve your search engine ranking. You can get this information from your Google Analytics or similar account, but often getting it in real time can be valuable. Knowing which search terms produce the most page views can be invaluable in "tuning" your site for maximum performance.

Here I present a simple search engine keyword logging facility built into a .NET HttpModule. It hooks the PreRequestHandlerExecute event, which occurs just before ASP.NET starts executing a request handler (for example, a page or an XML Web service). This is an ideal place to capture all kinds of information about the request, including the Session (if available), the HTTP Referer and its query string, browser information, etc. Here I am only interested in logging search terms when the request came from somebody clicking a link in a page of search results.

The first thing needed to "hook up" an HttpModule is to specify it in the web.config:

<?xml version="1.0"?>
<configuration>
  <connectionStrings>
    <add name="connectionString" connectionString="server=(local);database=keywordlog;Integrated Security=SSPI" providerName="SqlClient" />
  </connectionStrings>
  <system.web>
    <compilation debug="true" targetFramework="4.0" />
  </system.web>
  <!-- Note: the modules section below is for IIS Integrated mode (the IIS Express default). In Classic mode, use the httpModules section in system.web instead. -->
  <system.webServer>
    <modules>
      <add type="PAB.RequestLogger.KeywordLogger,PAB.RequestLogger" name="PAB" />
    </modules>
  </system.webServer>
</configuration>

The entry's type attribute must specify the fully qualified class name followed by the assembly name, and the entry must also carry a "name" attribute.
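
For an application pool running in Classic mode, the equivalent registration goes under system.web instead. A minimal sketch, reusing the same type and name as the entry above (this fragment is not part of the original download):

```xml
<system.web>
  <httpModules>
    <add name="PAB" type="PAB.RequestLogger.KeywordLogger,PAB.RequestLogger" />
  </httpModules>
</system.web>
```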

Let's switch over to the actual code for the Logger class:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

namespace PAB.RequestLogger
{
    public class KeywordLogger : IHttpModule
    {
        public void Dispose()
        {
        }

        public void Init(HttpApplication context)
        {
            context.PreRequestHandlerExecute += new EventHandler(context_PreRequestHandlerExecute);
        }

        void context_PreRequestHandlerExecute(object sender, EventArgs e)
        {
            if (sender is HttpApplication)
            {
                HttpApplication context = sender as HttpApplication;
                StatisticsManager.LogRequest(context);
            }
        }
    }

    public class StatisticsManager
    {
        public static void LogRequest(HttpApplication app)
        {
            string sessionId = String.Empty;
            if (app.Context.Session != null)
            {
                sessionId = app.Context.Session.SessionID;
            }

            Uri referrer = app.Context.Request.UrlReferrer;
            // Test only the host: a pattern such as "search" or "about" would otherwise
            // also match the path or query string of ordinary referring pages.
            if (referrer != null && IsSearchEngine(referrer.Host))
            {
                string engine = referrer.Host;
                string keyword = GetKeywords(referrer.ToString());
                // Skip referrers that carry no search term.
                if (!String.IsNullOrEmpty(keyword))
                {
                    // async fire & forget database call on background thread
                    PAB.RequestLogger.ThreadUtil.KeywordUpdate(engine, keyword, sessionId);
                }
            }
        }

        // Built once and reused. Add any additional search engines here;
        // localhost is for testing only.
        private static readonly Regex SearchEngineHosts = new Regex(
            "google|daum|msn|bing|search|yahoo|ask|altavista|alltheweb|live|aol|netscape|mamma|yandex|about|baidu|localhost",
            RegexOptions.IgnoreCase);

        private static bool IsSearchEngine(string host)
        {
            return SearchEngineHosts.IsMatch(host);
        }

        private static string GetKeywords(string urlReferrer)
        {
            var url = new Uri(urlReferrer);
            var query = HttpUtility.ParseQueryString(url.Query);

            // Uri.Host is a full host name such as "www.google.com", so it never equals a
            // bare engine name. Test each host label instead, from right to left, so that
            // "search.yahoo.com" resolves to "yahoo" before the generic "search".
            string[] labels = url.Host.ToLowerInvariant().Split('.');
            for (int i = labels.Length - 1; i >= 0; i--)
            {
                switch (labels[i])
                {
                    case "google":
                    case "daum":
                    case "msn":
                    case "bing":
                    case "ask":
                    case "altavista":
                    case "alltheweb":
                    case "live":
                    case "aol":
                    case "search":
                        return query["q"];
                    case "netscape":
                    case "mama":
                    case "mamma":
                    case "terra":
                    case "cnn":
                        return query["query"];
                    case "virgilio":
                    case "alice":
                        return query["qs"];
                    case "yahoo":
                        return query["p"];
                    case "onet":
                        return query["qt"];
                    case "eniro":
                        return query["search_word"];
                    case "about":
                        return query["terms"];
                    case "voila":
                        return query["rdata"];
                    case "baidu":
                        return query["wd"];
                    case "yandex":
                        return query["text"];
                    case "yam":
                        return query["k"];
                    case "rambler":
                        return query["words"];
                }
            }

            // Unknown engine (including localhost while testing): fall back to "q".
            return query["q"];
        }
    }
}

In the Init method, we subscribe an event handler to the PreRequestHandlerExecute event of the request's HttpApplication instance.

In that event handler, we check whether the sender is of type HttpApplication and, if so, call the LogRequest method of the StatisticsManager class, passing in the application instance.

The LogRequest method checks whether there is a Session and, if so, grabs the SessionID. Then it checks whether there is a UrlReferrer (meaning the user got here by clicking a link).

Next we want to check whether the request came from a search engine; the IsSearchEngine method provides this check. Then we parse the query string and get the keywords.
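
A pattern as broad as the one in IsSearchEngine is sensitive to what it is tested against. The sketch below compares matching the whole referrer URL with matching only its host. ReferrerMatchDemo, its shortened pattern and the sample URLs are illustrative assumptions, not part of the module.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReferrerMatchDemo
{
    // Shortened, illustrative pattern; not the full list used by the module.
    private static readonly Regex Engines =
        new Regex("google|bing|yahoo|search|about", RegexOptions.IgnoreCase);

    // Tests the pattern against the complete URL, path and query included.
    public static bool MatchesUrl(Uri referrer)
    {
        return Engines.IsMatch(referrer.ToString());
    }

    // Tests the pattern against the host name only.
    public static bool MatchesHost(Uri referrer)
    {
        return Engines.IsMatch(referrer.Host);
    }

    public static void Main()
    {
        var blogPost = new Uri("http://blog.example.com/about/search-tips.html");
        var results = new Uri("http://www.bing.com/search?q=httpmodule");

        Console.WriteLine(MatchesUrl(blogPost));  // True: "about" and "search" appear in the path
        Console.WriteLine(MatchesHost(blogPost)); // False: the host names no engine
        Console.WriteLine(MatchesHost(results));  // True
    }
}
```

Matching on the host keeps ordinary pages with "search" or "about" in their path out of the log, although short fragments such as "ask" or "live" can still match unrelated host names.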

Since each search engine uses a different query string parameter for the search terms, the GetKeywords method uses a switch statement to pick the correct parameter name and return its contents.
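
To see what GetKeywords has to work with, here is a small standalone sketch of how Uri and HttpUtility.ParseQueryString break a referrer apart. ReferrerParseDemo and the sample referrer URL are made up for illustration.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

public static class ReferrerParseDemo
{
    // Returns the decoded value of one query string parameter, or null if it is absent.
    public static string GetParameter(string referrer, string name)
    {
        var url = new Uri(referrer);
        NameValueCollection query = HttpUtility.ParseQueryString(url.Query);
        return query[name];
    }

    public static void Main()
    {
        string referrer = "http://www.google.com/search?q=asp.net+httpmodule&hl=en";

        Console.WriteLine(new Uri(referrer).Host);              // www.google.com
        Console.WriteLine(GetParameter(referrer, "q"));         // asp.net httpmodule
        Console.WriteLine(GetParameter(referrer, "p") == null); // True
    }
}
```

Two details matter for the logger: Host is the full host name rather than a bare engine name, and a parameter the engine did not send comes back as null rather than as an empty string.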

Finally, I call PAB.RequestLogger.ThreadUtil.KeywordUpdate(engine, keyword, sessionId), which makes a "Fire and Forget" async call to log the info into the database. I use the Fire and Forget technique for speed: the database work happens on a background thread, so the request is not held up. Because ThreadUtil itself calls EndInvoke on the AsyncResult and closes the WaitHandle, the caller does not need to supply a callback method. You just "drop it in" and your code keeps going. Here's the class:

using System;
using System.Collections.Specialized;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Text;
using System.Threading;



namespace PAB.RequestLogger
{
    /// <summary>
    /// Provides threadsafe, non-blocking methods using the Fire and Forget pattern
    /// </summary>
    public static class ThreadUtil
    {
        /// <summary>
        /// Callback used to call <code>EndInvoke</code> on the asynchronously
        /// invoked DelegateWrapper.
        /// </summary>
        private static AsyncCallback callback = EndWrapperInvoke;

        public delegate void KeywordUpdateDelegate(string engine, string keyword, string sessionId);

        /// <summary>
        /// An instance of DelegateWrapper which calls InvokeWrappedDelegate,
        /// which in turn calls the DynamicInvoke method of the wrapped
        /// delegate.
        /// </summary>
        private static DelegateWrapper wrapperInstance = new DelegateWrapper(InvokeWrappedDelegate);

        public static void KeywordUpdate(string engine, string keyword, string sessionId)
        {
            FireAndForget(new KeywordUpdateDelegate(AKeywordUpdate), new object[] { engine, keyword, sessionId });
        }


        private static void AKeywordUpdate(string engine, string keyword, string sessionId)
        {
            try
            {
                // The using blocks close the connection and dispose the command
                // even when Open or ExecuteNonQuery throws.
                using (SqlConnection conn =
                    new SqlConnection(ConfigurationManager.ConnectionStrings["connectionString"].ConnectionString))
                using (SqlCommand cmd = new SqlCommand("dbo.UpdateKeywords", conn))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.AddWithValue("@engine", engine);
                    cmd.Parameters.AddWithValue("@keyword", keyword);
                    cmd.Parameters.AddWithValue("@sessionId", sessionId);
                    conn.Open();
                    cmd.ExecuteNonQuery();
                }
            }
            catch (Exception ex)
            {
                // A logging failure should never disturb the site; report it and move on.
                System.Diagnostics.Debug.WriteLine(ex.Message);
            }
        }

        

        /// <summary>
        /// Executes the specified delegate with the specified arguments
        /// asynchronously on a thread pool thread.
        /// </summary>
        public static void FireAndForget(Delegate d, params object[] args)
        {
            // Invoke the wrapper asynchronously, which will then
            // execute the wrapped delegate synchronously (in the
            // thread pool thread)
            wrapperInstance.BeginInvoke(d, args, callback, null);
        }

        /// <summary>
        /// Invokes the wrapped delegate synchronously
        /// </summary>
        private static void InvokeWrappedDelegate(Delegate d, object[] args)
        {
            d.DynamicInvoke(args);
        }

        /// <summary>
        /// Calls EndInvoke on the wrapper and Close on the resulting WaitHandle
        /// to prevent resource leaks.
        /// </summary>
        private static void EndWrapperInvoke(IAsyncResult ar)
        {
            wrapperInstance.EndInvoke(ar);
            ar.AsyncWaitHandle.Close();
        }

        #region Nested type: DelegateWrapper

        /// <summary>
        /// Delegate to wrap another delegate and its arguments
        /// </summary>
        private delegate void DelegateWrapper(Delegate d, object[] args);

        #endregion
    }
}
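
ThreadUtil relies on delegate BeginInvoke/EndInvoke, which works on the .NET Framework but is not supported on .NET Core and later. If you need the same effect there, or simply want less plumbing, a comparable result can be sketched with ThreadPool.QueueUserWorkItem. This is a different technique from the class above, and SimpleFireAndForget is a hypothetical name used only for this example.

```csharp
using System;
using System.Threading;

public static class SimpleFireAndForget
{
    // Queues the work on a thread pool thread and returns immediately.
    public static void Run(Action work)
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // An unhandled exception on a pool thread would end the process,
                // so failures are reported and swallowed here.
                System.Diagnostics.Debug.WriteLine(ex.Message);
            }
        });
    }

    public static void Main()
    {
        using (var done = new ManualResetEvent(false))
        {
            Run(() =>
            {
                Console.WriteLine("logged on a pool thread");
                done.Set();
            });

            // Waiting is for the demo only; real callers would not block.
            done.WaitOne();
        }
    }
}
```

Inside ThreadUtil, a call along the lines of SimpleFireAndForget.Run(() => AKeywordUpdate(engine, keyword, sessionId)) would take the place of the delegate wrapper.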


The last part of the code is the SQL Server table and sproc to handle the logging:

USE [KEYWORDLOG]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[Keywords](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Engine] [varchar](50) NULL,
[Keyword] [varchar](250) NULL,
[SessionId] [varchar](50) NULL,
[Count] [int] NULL,
CONSTRAINT [PK_Keywords] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE PROC [dbo].[UpdateKeywords]
@engine varchar(50),
@keyword varchar(250),
@sessionId varchar(50)

AS

IF EXISTS (SELECT ID FROM KEYWORDS WHERE KEYWORD = @KEYWORD)
BEGIN
UPDATE KEYWORDS SET COUNT = COUNT+1 WHERE KEYWORD = @KEYWORD
END
ELSE
BEGIN
INSERT INTO KEYWORDS (ENGINE, KEYWORD, SESSIONID, COUNT)
VALUES ( @engine,@keyword, @sessionId, 1)

END
GO
SET ANSI_PADDING OFF
GO

The stored proc will update the count on a keyword if it has already been logged, or perform an insert if it is a new search phrase. The SessionId is used because it can be useful in tracking the activities of a user once they have landed on your site. In this simple implementation, I am not yet using this. There are many other items of data you can capture here - the last DateTime of the visit from the keyword, browser info, form fields, etc.

I've also included a small "Report.aspx" page that simply displays the contents of the Keywords table for convenience.

The test web application starts on a page like this:

Referrer.aspx?q=engine query&name=joe blow

This is similar to what the HTTP Referer would look like if the request came from a results page on a search engine. This page has a link that takes you to the Default.aspx page, which is where the logging occurs. Since you arrive at Default.aspx by clicking a link, the UrlReferrer is populated and the keyword gets logged.

You can do a lot with this concept; it provides the capability of real time logging of a potentially large number of items, and can be the basis of a much more sophisticated request logging framework.

You can download the complete test Visual Studio 2010 solution here. It includes a SQL Script to create the table and the sproc for logging.

By Peter Bromberg