A SQL Bulk Insert App Data Cache Class
by Peter A. Bromberg, Ph.D.


"There's never the intent to break the law when the guy is in the garage inventing the iPod"
--Supreme Court Justice David Souter.

Steve Smith of AspAlliance.com has put out some really high-quality stuff over the last few years. One of his recent articles about "Write Caching" sparked my interest, and made me think. And after all, that's what we as professional software developers are really paid to do, isn't it? We're not paid to write code - we're PAID TO THINK! Writing quality code should be the byproduct of this thinking.

The essence of Steve's article revolves around any kind of high-volume application that generates data which needs to be inserted into a database. Repetitive database inserts are time-consuming; they slow down your application, and if it is a web-based app, they restrict your ability to handle larger amounts of traffic.

Steve mentions several options, including using MSMQ (which I've already addressed here in several articles), and comes up with the idea of caching the data in a Hashtable, periodically serializing it into XML, then using OpenXML to handle the inserts.

While this is an excellent solution, I wonder if it isn't overkill. Here's my thinking:

  • OpenXML requires quite a bit of additional study for the majority of developers.
  • It's difficult to use because you have to feed it both the XML and a schema.
  • It's only efficient with a flattened-out XML table schema.
  • If you read the documentation carefully, the MS gurus say that it is not reliable with very large sets of data.
  • Finally, what if you need to do deletes and updates too? You can do it, but again, it takes some extra coding.

But wait a minute! Doesn't a DataAdapter have an Update method that does essentially the same thing as SQLXML? Sure it does. Under the hood, the ADO.NET DataAdapter simply walks the changed rows of the DataSet to be updated and executes the appropriate commands via 100% native .NET code; on a single connection, all the inserts, updates and deletes are handled in one fell swoop. Many developers don't use this simply because they'd prefer not to go through the pain of "RTFM" -- and that's too bad, because the mechanism is extremely efficient in this context.

So this led to a new idea: how about if we create an App Data Cache class that holds a DataTable instead of a Hashtable? We can have an AddItem method where we pass in either an object array for our data row, or even an actual DataRow - you could come up with four or five different overloads if you wanted.

Then, what we do is take advantage of the built-in RowChanged event of the DataTable. When our delegate target is invoked, we simply check to see whether the number of rows accumulated has reached our target level for a "write" - and if so, invoke the infrastructure to pass our DataTable into a DataAdapter and call its Update method. Then, we just clear our DataTable and allow it to continue receiving new rows of data.
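The flush-at-a-threshold idea is language-agnostic, so before the C# listings, here is a minimal Python sketch of the same pattern. This is an illustration only: sqlite3 and executemany stand in for SQL Server and the DataAdapter, and the WriteCache class and applog table are hypothetical names, not part of the download.

```python
import sqlite3

class WriteCache:
    """Buffers rows in memory and bulk-inserts them once a threshold is hit."""

    def __init__(self, conn, insert_sql, rows_per_flush=250):
        self.conn = conn
        self.insert_sql = insert_sql
        self.rows_per_flush = rows_per_flush
        self.buffer = []

    def add_item(self, row):
        # Mirrors CacheEngine.AddItem plus the Row_Changed threshold check.
        self.buffer.append(row)
        if len(self.buffer) >= self.rows_per_flush:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One connection, one transaction for the whole accumulated batch.
        with self.conn:
            self.conn.executemany(self.insert_sql, self.buffer)
        self.buffer.clear()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applog (page TEXT, hits INTEGER)")
cache = WriteCache(conn, "INSERT INTO applog VALUES (?, ?)", rows_per_flush=100)

for i in range(250):
    cache.add_item((f"/page{i}", 1))

# Two automatic flushes have happened (at rows 100 and 200); 50 rows
# stay buffered until the next threshold or an explicit flush.
print(conn.execute("SELECT COUNT(*) FROM applog").fetchone()[0])  # 200
cache.flush()
print(conn.execute("SELECT COUNT(*) FROM applog").fetchone()[0])  # 250
```

Notice that the caller never touches the database directly; it just keeps calling add_item, exactly as the web pages below keep calling CacheEngine.AddItem.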

Since we are already using a DataTable as the temporary "cache" repository, we don't need to do any manipulation of the data, nor do we need to create any XML - we just send the whole thing right into the DataAdapter!

What I've done here is create, essentially, two classes - the main CacheEngine class, and a helper class I call "DaHandler" that provides a SubmitChanges method. SubmitChanges creates and populates the Command objects needed by the DataAdapter for its updates, inserts and deletes dynamically, from the names of your stored procedures. An additional benefit is that it doesn't matter what your DataTable needs to look like for your particular business logic. As long as you have created the stored procedures, the SubmitChanges method in my DaHandler class will take care of the rest. You simply create your own DataTable exactly to your liking and pass it as a ctor parameter to the engine class, and DaHandler will be happy.
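The "build the commands from metadata so any table shape works" idea can also be illustrated outside .NET. Here is a rough Python analogue (again sqlite3 stands in for SQL Server, and submit_changes and the applog table are hypothetical names): it derives a parameterized INSERT from the target table's column list, the way SubmitChanges derives its commands from your stored procedures, and applies the whole batch in one transaction.

```python
import sqlite3

def submit_changes(conn, table, rows):
    """Derive a parameterized INSERT from the target table's column list,
    then apply all buffered rows in a single transaction."""
    cols = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    with conn:  # one transaction for the whole batch
        conn.executemany(sql, rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applog (page TEXT, username TEXT, hits INTEGER)")
submit_changes(conn, "applog", [("/home", "alice", 1), ("/faq", "bob", 2)])
print(conn.execute("SELECT COUNT(*) FROM applog").fetchone()[0])  # 2
```

Because the SQL is generated from the table's own metadata, the caller can hand in any table shape without touching submit_changes - the same decoupling DaHandler gives the CacheEngine.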

So now let's have a looky at the two classes. Surprisingly, it's not a lot of code, considering all that it does. First, the Engine class:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.Common;

namespace AppDataUpdateCache
{
    /// <summary>
    /// AppDataUpdateCache holds a generic DataTable and performs a SQL bulk insert
    /// of same at the specified number-of-rows interval.
    /// </summary>
    public sealed class CacheEngine
    {
        #region ctor
        public CacheEngine(DataTable dt, string selectStoredProc, string insertStoredProc,
            string deleteStoredProc, string updateStoredProc,
            string connectionString, int numRowsPerUpdate)
        {
            this._cacheTable = dt;
            this._connectionString = connectionString;
            this._selectStoredProc = selectStoredProc;
            this._insertStoredProc = insertStoredProc;
            this._deleteStoredProc = deleteStoredProc;
            this._updateStoredProc = updateStoredProc;
            this._numRowsPerUpdate = numRowsPerUpdate;
            Init();
        }
        #endregion

        #region public fields / properties
        private DataTable _cacheTable;
        private DataSet ds = new DataSet();
        public DataTable CacheTable
        {
            get { return _cacheTable; }
            set { _cacheTable = value; }
        }

        private string _connectionString;
        public string ConnectionString
        {
            get { return _connectionString; }
            set { _connectionString = value; }
        }

        private string _selectStoredProc;
        public string SelectStoredProc
        {
            get { return _selectStoredProc; }
            set { _selectStoredProc = value; }
        }

        private string _deleteStoredProc;
        public string DeleteStoredProc
        {
            get { return _deleteStoredProc; }
            set { _deleteStoredProc = value; }
        }

        private string _updateStoredProc;
        public string UpdateStoredProc
        {
            get { return _updateStoredProc; }
            set { _updateStoredProc = value; }
        }

        private string _insertStoredProc;
        public string InsertStoredProc
        {
            get { return _insertStoredProc; }
            set { _insertStoredProc = value; }
        }

        private int _numRowsPerUpdate = 250;
        public int NumRowsPerUpdate
        {
            get { return _numRowsPerUpdate; }
            set { _numRowsPerUpdate = value; }
        }

        public void AddItem(object[] objItemArray)
        {
            if (this._connectionString == null || this._connectionString == "")
                throw new InvalidOperationException("All properties must be set to add an item.");
            DataRow row = this._cacheTable.NewRow();
            row.ItemArray = objItemArray;
            this._cacheTable.Rows.Add(row);
        }
        #endregion

        #region private members
        private void Init()
        {
            DataSet newDs = new DataSet();
            this.ds = newDs;
            this._cacheTable.TableName = "AppData";
            this.ds.Tables.Add(_cacheTable);
            this._cacheTable.RowChanged += new DataRowChangeEventHandler(Row_Changed);
        }

        private void Row_Changed(object sender, DataRowChangeEventArgs e)
        {
            if (((DataTable)sender).Rows.Count > this._numRowsPerUpdate)
            {
                BulkInsertData();
            }
        }

        private void BulkInsertData()
        {
            // Detach the handler so the flush itself doesn't re-trigger it.
            this._cacheTable.RowChanged -= new DataRowChangeEventHandler(Row_Changed);
            try
            {
                DataSet ds2 = new DataSet();
                DataTable tbl;
                lock (_cacheTable)
                {
                    tbl = this._cacheTable.Copy();
                }
                tbl.TableName = "AppData";
                ds2.Tables.Add(tbl);
                DaHandler.SubmitChanges(ref ds2, this._connectionString, this._selectStoredProc,
                    this._updateStoredProc, this._insertStoredProc, this._deleteStoredProc);
            }
            catch (Exception Ex)
            {
                System.Diagnostics.Debug.Write(Ex.Message + Ex.StackTrace);
            }
            finally
            {
                _cacheTable.Clear();
                this._cacheTable.RowChanged += new DataRowChangeEventHandler(Row_Changed);
            }
        }
        #endregion
    }
}

And now, the DaHandler class:

namespace AppDataUpdateCache
{
    using System;
    using System.Collections;
    using System.Data;
    using System.Data.SqlClient;

    public class DaHandler
    {
        public DaHandler() { }

        private static SqlCommand CreateSelectViaSPCommand(string spName, string strConn)
        {
            SqlConnection connection1 = new SqlConnection(strConn);
            SqlCommand command1 = new SqlCommand(spName, connection1);
            command1.CommandType = CommandType.StoredProcedure;
            // Ask SQL Server for the sproc's parameter list, then map each
            // "@ParamName" parameter to the source column "ParamName".
            connection1.Open();
            SqlCommandBuilder.DeriveParameters(command1);
            connection1.Close();
            foreach (SqlParameter parameter1 in command1.Parameters)
            {
                parameter1.SourceVersion = DataRowVersion.Current;
                parameter1.SourceColumn = parameter1.ParameterName.Remove(0, 1);
            }
            return command1;
        }

        public static bool SubmitChanges(ref DataSet DS, string ConnectionString,
            string spSelectName, string spUpDateName, string spInsertName, string spDeleteName)
        {
            DataTable table1 = DS.Tables[0];
            return DaHandler.SubmitChanges(ref table1, ConnectionString, spSelectName,
                spUpDateName, spInsertName, spDeleteName);
        }

        public static bool SubmitChanges(ref DataTable tbl, string ConnectionString,
            string spSelectName, string spUpDateName, string spInsertName, string spDeleteName)
        {
            SqlTransaction trans = null;
            try
            {
                SqlDataAdapter adapter1 = new SqlDataAdapter();
                if (spSelectName != "")
                {
                    adapter1.SelectCommand = DaHandler.CreateSelectViaSPCommand(spSelectName, ConnectionString);
                }
                // With a SelectCommand that returns key column information,
                // the CommandBuilder can generate the other three commands.
                SqlCommandBuilder bldr1 = new SqlCommandBuilder(adapter1);
                adapter1.InsertCommand = bldr1.GetInsertCommand();
                adapter1.UpdateCommand = bldr1.GetUpdateCommand();
                adapter1.DeleteCommand = bldr1.GetDeleteCommand();
                adapter1.ContinueUpdateOnError = true;
                adapter1.InsertCommand.Connection.Open();
                trans = adapter1.InsertCommand.Connection.BeginTransaction();
                adapter1.InsertCommand.Transaction = trans;
                adapter1.UpdateCommand.Transaction = trans;
                adapter1.DeleteCommand.Transaction = trans;
                adapter1.Update(tbl);
                trans.Commit();
                if (adapter1.InsertCommand.Connection.State == ConnectionState.Open)
                    adapter1.InsertCommand.Connection.Close();
            }
            catch (Exception ex)
            {
                if (trans != null)
                    trans.Rollback();
                System.Diagnostics.Debug.WriteLine(ex.Message + ex.StackTrace);
                return false;
            }
            return true;
        }
    }
}

Note that in the above class, provided we have the Select command for the DataAdapter and it returns key column information (a primary key), we can get the Insert, Update and Delete commands we need automatically via the SqlCommandBuilder. Note also that if you wrap this update in a SqlTransaction, you will gain a very significant speed boost. The reason for this (thanks to Gert Drapers, SQL Server guru at Microsoft) is that the writes to the transaction log are deferred until the transaction is committed.
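The transaction effect is easy to demonstrate with any database engine that flushes its log on commit. A small Python benchmark (sqlite3 used purely as a stand-in here; the file name and row count are arbitrary) compares one-transaction-per-insert against a single wrapping transaction:

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE t (n INTEGER)")

N = 500

# One transaction per insert: the log is flushed to disk on every row.
t0 = time.perf_counter()
for i in range(N):
    conn.execute("INSERT INTO t VALUES (?)", (i,))
per_insert = time.perf_counter() - t0

# One explicit transaction: the log is flushed once, at COMMIT.
t0 = time.perf_counter()
conn.execute("BEGIN")
for i in range(N):
    conn.execute("INSERT INTO t VALUES (?)", (i,))
conn.execute("COMMIT")
one_txn = time.perf_counter() - t0

print(f"per-insert commits: {per_insert:.3f}s  single transaction: {one_txn:.3f}s")
```

On a typical disk the single-transaction loop wins by a wide margin, for exactly the reason Gert describes: the per-insert loop pays for a log flush on every row.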

Here is what the Winforms tester app looks like after a test run:

If I have sparked your interest by now, you may be wondering what kind of efficiencies this arrangement can generate. I did some informal tests with Homer (the Microsoft Web Application Stress tool) to see if there was a statistically significant improvement in throughput between a web page that did its own single logging SQL insert on each page load, and an identical page that instead wrote its single log record into the App Data Cache, allowing the cache class to decide when to do the multiple writes. Here are the initial results:

NO CACHING       9    0    1 MINUTE    16,866    281.12    18,863
WRITE CACHING    9    0    1 MINUTE    24,878    409.59    26,013

The write caching infrastructure was the clear winner, with roughly a 45 percent improvement in throughput (409.59 vs. 281.12 requests per second). I hasten to add that I didn't spend much time tuning this. However, I did take care to ensure that everything was "apples to apples" by restarting both IIS and SQL Server after each test to level the field.

I've included both the Windows Forms tester app and a web application that you can use to experiment with this creation. Take a look at Global.asax.cs in the web app, which is where I create a static member that holds the engine class, making it easily available from any page. In early testing, I've been getting up to 2,000 inserts per second with this. I bet readers who are exercising their noodles will come up with some really cool enhancement ideas. The complete SQL to create the test database and sprocs is in the download. And Steve, thanks for making me think!

Download the Visual Studio.NET solution that accompanies this article

P.S-- I've started a Google Group for Open Source .NET discussion. You're invited to join!


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET platform.