XmlDocument Cache / SQL Server nText Speed Enhancement

By Robbe D. Morris

Printer Friendly Version

Robbe Morris
Robbe & Melisa Morris
For starters, it is Independence Day!  Take a few moments to reflect on the sacrifices made by those who came before you.  Without their efforts, you wouldn't be breathing free air right now.  You might be speaking German, Russian, Japanese or perhaps withering away under the heels of communism instead.  With that in mind, the next time you see a veteran, stop and just say thank you.  They've earned your respect and deserve your thanks.  Enjoy the holiday!
 


 
As you may have read in some of my other articles here on NullSkull.com (Web Service vs Web Workspace Part I), I spend a great deal of time writing software and websites that center around research and analysis methodologies for Gartner.  Much of this data is hierarchial in nature and seldom is relational across multiple analysis models.  Thus, one study doesn't really relate to another one from a database perspective.
In previous versions of our software, we often stored the data relationally in Access, SQL Server, or Oracle.  This forced us to have to execute expensive queries across large numbers of records that didn't really relate to one another, load the data in some form of hierarchial structure either with ADO or COM object trees, work with the data, and then run expensive updates across myriads of records with varying data types.  In a desktop environment, you can get away with the extra overhead.  Performing these same complex tasks via a web site is a whole different story.
As we began to move some of our complex desktop apps to the web, I opted to start storing the entire analysis study in a hierarchial XmlDocument to begin with (nText column in SQL Server) instead of incurring all the overhead of taking the relationally stored data and turning it into a hierarchial object.  Now, with every page load, we simply queried for one record to retrieve the XML string, load it into the XmlDocument, manipulate nodes based on a wide variety of business rules that affected all parts of the analysis study, then save off the XmlDocument's string back to the database.  In classic ASP, this provided a substantial boost to our performance and page load times.
Then along came Microsoft's .NET framework...
After reviewing .NET's caching capabilities via a few of the books reviewed here on NullSkull.com as well as discussing the matter with other developers, it was clear that .NET's ability to cache XmlDocuments could enhance our performance even more.  Here's how:
Currently, with each page load we would issue a query for a single row containing nText data (our XML strings), load it into an XmlDocument, manipulate the nodes, save the XmlDocument XML string to the database, then perform the query again for the next data entry page.  As I'm sure you know, nText data types in SQL Server do carry with them a performance penalty because the data is stored differently than other data types such as int or varchar.  This is where caching the XmlDocument comes in handy.
.NET provides you with the ability to instantiate the XmlDocument object and cache it in memory safely.  Its removal from cache automatically can be arranged under a wide variety of conditions such as a file content change, fixed expiration date/time, or time since last accessed (as well as others).  So, you can control just exactly how long the XmlDocument stays in memory as well as automatically react to it's removal and re-cache it.
Now, we can check to see if the XmlDocument has been queried, loaded, and cached from SQL Server once and never have to execute a query again unless the XmlDocument has been removed from cache.  In the scenario above, this removes two costly queries for each new page load and leaves only the single call to the database to save off the current XmlDocument's contents.  This drastically reduces the performance hit from using a SQL Server nText data type.  And, the page load time is greatly reduced giving you a lightning fast website.  Of course, with anything there are a few items you'll need to manage carefully.
 
1.Take great care to remove the XmlDocument from cache on any page that doesn't need it.  Your code will always just execute a query to the database if it trys to access the cached XmlDocument and it isn't there.
2.Set the cache experiation to a long enough timespan to avoid unnecessary queries.  If you know that there is a 95% change that the next page the user will visit requires the cached object, you may want to set the expiration to about 5 or 10 minutes.  You'll want to review this time suggestion based on your site's needs.  It may require a longer or shorter period.
3.You'll get the best bang for your buck if you store your nText data in an entirely separate table from that which describes your study.  Even though you may run queries that don't pull back the nText column, SQL Server will still perform better if the nText column doesn't exist at all.  Just create a separate table that only has the same primary key value and your nText columns.  That way, when your query your regular study table to display available studies (etc...), your performance will improve.
4.The benefits of this methodology are greatly reduced in a web farm environment.  While it will still work,you'll be running more queries to get the XML string because not every server in your cluster will have the object in cache.  In my situation, it made more sense to use sticky sessions to force the visitor to return to the same server during a session because our site is very data entry intensive and the user stays on one page and it's child iframe about 97% of the time during the session.  Depending on your situation, this may not be the best option.
5.If you ever need to perform data mining across multiple studies, you'll need to write a special program to extract the XML values and store it differently to better enable various reporting and data mining tools to work properly.  Typically, this sort of data mining is done quarterly or yearly so real time data access for this purpose is rarely needed or even useful.
 
I've included a little bit of sample code demonstrating on option for caching the XmlDocument.  Jesse Liberty's book Programming ASP.NET has a great section on the in's and out's of caching in .NET.  I'd suggest picking up a copy today if you weren't one of our winners from in the book giveaway in May.
 
Sample Code
 
 
using System;
using System.Xml;
using System.Data.SqlClient;


namespace MyXML
{
	 
   public class XmlTree  
   {
     public XmlDocument XMLDoc = new XmlDocument();

       public bool LoadMyXML()
      {
            string sXML="";
            string sCacheID="";
            bool fRet=false;
	    
            SqlConnection oConn = new SqlConnection();

	   try
	   {

                XmlDocument oXML;
                sCacheID = "XMLDOC"


                // Get an explicit reference to our cached XmlDocument.  In C#,
                //  cached objects must be cast to the proper data type.

                oXML = (XmlDocument)System.Web.HttpContext.Current.Cache[sCacheID];
                
                if (oXML == null)  //Not in cache
                {
                   try
                    {

                       // Run query to get XML string

                       oConn.ConnectionString = "your db connect string";
                       oConn.Open();

                       SqlCommand oCmd = new SqlCommand("EXEC mystoredprocedure ",oConn);
                       SqlDataReader oReader = oCmd.ExecuteReader();

                       while (oReader.Read())
                          {				 
                            sXML = oReader["XMLnTextColumnName"].ToString();
                          }

                       oReader.Close();
                       oConn.Close();

                       //load up an XmlDocument

                       XMLDoc.LoadXml(sXML);

                       // load the XmlDocument into Cache and keep it there until at least 5 
                       // minutes have passed since the last time the object was 
                       // referenced in Cache.  If that timespan exires, we'll remove it from 
                       // cache.  If you need to react when it is actually released from 
                       // Cache, look up the following items in your .NET help:  
                      // CacheItemRemovedReason, RemovedCallback, onRemove 

                       System.Web.HttpContext.Current.Cache.Insert(sCacheID,XMLDoc,null,DateTime.MaxValue,TimeSpan.FromMinutes(5));

                       fRet=true;
                     }
                     catch (Exception e) {}
                     finally { if (oConn.State == ConnectionState.Open) { oConn.Close(); } }
                 }
                 else
                 {

                   // The XmlDocument is alreay in Cache so let's set our public XMLDoc object
                   // to a reference to the cached XmlDocument.

                   try{XMLDoc = oXML;fRet=true;}
                   catch (Exception e){ }
				
                  }


             }
             catch (Exception e) {}
         
           return fRet;
          
         }

  }
}


 


Robbe has been a Microsoft MVP in C# since 2004.  He is also the co-founder of NullSkull.com which provides .NET articles, book reviews, software reviews, and software download and purchase advice.