|
"There's never the intent to break
the law when the guy is in the garage inventing the iPod"
--Supreme Court Justice David Souter.
Steve Smith of AspAlliance.com has put out some really
quality stuff over the last few years. One of his
recent articles about "Write Caching" sparked my interest,
and made me think. And after all that's what we as professional software
developers are really paid to do, isn't it? We're not paid to write code
- we're PAID TO THINK! Writing quality code should be the byproduct of
this thinking.
The essence of Steve's article revolves around having any
kind of high - volume application that generates data which needs to
be inserted into a database. Repetitive database inserts are time-consuming;
they slow down your application, and if it is a web-based app, they restrict
your ability to handle larger amounts of traffic.
Steve mentions several options including using MSMQ (which
I"ve already addressed here in several articles) and comes up with
the idea of caching the data in a HashTable, periodically serializing
it into XML, then using OpenXml to handle the inserts.
While this is an excellent solution, I wonder if it isn't
overkill. Here's my thinking:
- OpenXml requires quite a bit of additional study for the majority
of developers.
- It's difficult to use because you have to feed it both
XML and Schema.
- It's only efficient with a flattened - out Xml table schema.
- If you read the documentation carefully, the MS Gurus say
that it is not reliable with very large sets of data.
- Finally, what if you need to do deletes and updates too
? You can do it, but again, some extra coding.
But wait a minute! Doesn't a DataAdapter have an update method that
does exactly the same thing as SQLXML? Sure it does. Under the hood,
the ADO.NET DataAdapters simply take the DataSet to be updated and create
diffgrams that are efficiently passed to the SQL Server engine via 100%
native .NET code, and on a single connection / command, all the inserts,
updates and deletes are handled in one fell swoop. Many developers don't
use this simply because they'd prefer not to go through the pain of "RTFM"
-- and that's too bad, because the mechanism is extremely
efficient in
this context.
So this led to a new idea: How about if we create an App
Data Cache class that holds -- instead of a HashTable, a DataTable! We can have
an AddItem method where we pass in either an object array of our data
row, or even an actual DataRow - you could come up with four or five
different overloads if you wanted.
Then, what we do is take advantage of the built in RowUpdated
event of the DataTable. When our delegate target is invoked,
we would simply check to see if the number of rows accumulated has
reached our target level for a "write" - and invoke the infrastructure
to pass our DataTable into a DataAdapter and call its Update method.
Then, we just clear our DataTable and allow it to continue receiving
new rows of data.
Since we are already using a DataTable as the temporary "cache" repository,
we don't need to do any manipulation of the data, nor do we need to create
any Xml - we just send the whole thing right into the DataAdapter!
What I've done here is create, essentially, two classes - the main CacheEngine class,
and a helper class I call "DaHandler" that
provides a SubmitChanges method. SubmitChanges
creates and populates the Command objects needed by the DataAdapter for
its updates, inserts and deletes dynamically from the names of your stored
procedures. An additional benefit of this is that it doesn't
matter what your DataTable needs to look like for your
particular business logic. As long as you have created Stored procedures,
the SubmitChanges method in my DaHandler class will take care of the
rest. You simply create your own DataTable exactly to your liking and
pass it in as a ctor parameter to the Engine class, and DaHandler will
be happy.
So now let's have a looky at the two classes. Surprisingly, it's not
a lot of code, considering all that it does. First, the Engine class:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.Common;
namespace AppDataUpdateCache
{
/// <summary>
/// AppDataUpdateCache holds generic DataTable and performs SQL Bulk insert
/// of same at the specified number of rows interval.
/// </summary>
public sealed class CacheEngine
{
#region ctor
public CacheEngine(DataTable dt, string selectStoredProc,string insertStoredProc, string deleteStoredProc, string updateStoredProc, string connectionString, int numRowsPerUpdate)
{
this._cacheTable =dt;
this._connectionString=connectionString;
this._selectStoredProc =selectStoredProc;
this._insertStoredProc =insertStoredProc;
this._deleteStoredProc=deleteStoredProc;
this._updateStoredProc =updateStoredProc;
this._numRowsPerUpdate =numRowsPerUpdate;
Init();
}
#endregion
#region public fields / properties
private DataTable _cacheTable;
private DataSet ds= new DataSet();
public DataTable CacheTable
{
get
{
return _cacheTable;
}
set
{
_cacheTable=value;
}
}
private string _connectionString;
public string ConnectionString
{
get
{
return _connectionString;
}
set
{
_connectionString=value;
}
}
private string _selectStoredProc;
public string SelectStoredProc
{
get
{
return _selectStoredProc;
}
set
{
_selectStoredProc=value;
}
}
private string _deleteStoredProc;
public string DeleteStoredProc
{
get
{
return _deleteStoredProc;
}
set
{
_deleteStoredProc=value;
}
}
private string _updateStoredProc;
public string UpdateStoredProc
{
get
{
return _updateStoredProc;
}
set
{
_updateStoredProc=value;
}
}
private string _insertStoredProc;
public string InsertStoredProc
{
get
{
return _insertStoredProc;
}
set
{
_insertStoredProc=value;
}
}
private int _numRowsPerUpdate=250;
public int NumRowsPerUpdate
{
get
{
return _numRowsPerUpdate;
}
set
{
_numRowsPerUpdate=value;
}
}
public void AddItem(object[] objItemArray)
{
if(this._connectionString==null || this._connectionString =="")
throw new InvalidOperationException("All properties must be set to add an item.");
DataRow row=this._cacheTable.NewRow();
row.ItemArray=objItemArray;
this._cacheTable.Rows.Add(row);
}
#endregion
#region private members
private void Init()
{
DataSet newDs = new DataSet();
this.ds=newDs;
this._cacheTable.TableName ="AppData";
this.ds.Tables.Add(_cacheTable);
this._cacheTable.RowChanged += new DataRowChangeEventHandler( Row_Changed );
}
private void Row_Changed( object sender, DataRowChangeEventArgs e )
{
if( ((DataTable)sender).Rows.Count >this._numRowsPerUpdate )
{
BulkInsertData();
}
}
private void BulkInsertData()
{
this._cacheTable.RowChanged -= new DataRowChangeEventHandler( Row_Changed );
try
{
DataSet ds2 = new DataSet();
DataTable tbl;
lock(_cacheTable)
{
tbl = this._cacheTable.Copy();
}
tbl.TableName ="AppData";
ds2.Tables.Add(tbl);
DaHandler.SubmitChanges(ref ds2,this._connectionString,
this._selectStoredProc ,this._updateStoredProc,
this._insertStoredProc,this._deleteStoredProc);
}
catch(Exception Ex)
{
System.Diagnostics.Debug.Write(Ex.Message+Ex.StackTrace ) ;
}
finally
{
_cacheTable.Clear() ;
this._cacheTable.RowChanged += new DataRowChangeEventHandler( Row_Changed );
}
}
#endregion
}
} |
And now, the DaHandler class:
namespace AppDataUpdateCache
{
using System;
using System.Collections;
using System.Data;
using System.Data.SqlClient;
public class DaHandler
{
// Methods
public DaHandler()
{
}
private static SqlCommand CreateSelectViaSPCommand(string spName, string strConn)
{
IEnumerator enumerator1=null;
SqlConnection connection1 = new SqlConnection(strConn);
SqlCommand command1 = new SqlCommand(spName, connection1);
command1.CommandType = CommandType.StoredProcedure;
SqlCommandBuilder builder1 = new SqlCommandBuilder();
connection1.Open();
SqlCommandBuilder.DeriveParameters(command1);
connection1.Close();
try
{
enumerator1 = command1.Parameters.GetEnumerator();
while (enumerator1.MoveNext())
{
SqlParameter parameter1 = (SqlParameter) enumerator1.Current;
parameter1.SourceVersion = DataRowVersion.Current;
parameter1.SourceColumn = parameter1.ParameterName.Remove(0, 1);
}
}
finally
{
if (enumerator1 is IDisposable)
{
((IDisposable) enumerator1).Dispose();
}
}
return command1;
}
public static bool SubmitChanges(ref DataSet DS, string ConnectionString, string spSelectName,string spUpDateName, string spInsertName, string spDeleteName)
{
DataTable table1 = DS.Tables[0];
DaHandler.SubmitChanges(ref table1, ConnectionString, spSelectName, spUpDateName, spInsertName, spDeleteName);
return true;
}
public static bool SubmitChanges(ref DataTable tbl, string ConnectionString, string spSelectName,string spUpDateName, string spInsertName, string spDeleteName)
{
SqlTransaction trans =null;
try
{
SqlDataAdapter adapter1 = new SqlDataAdapter();
if (spSelectName!="")
{
adapter1.SelectCommand = DaHandler.CreateSelectViaSPCommand(spSelectName, ConnectionString);
}
SqlCommandBuilder bldr1 = new SqlCommandBuilder(adapter1);
adapter1.InsertCommand =bldr1.GetInsertCommand();
adapter1.UpdateCommand=bldr1.GetUpdateCommand();
adapter1.DeleteCommand =bldr1.GetDeleteCommand();
adapter1.ContinueUpdateOnError = true;
adapter1.InsertCommand.Connection.Open();
trans = adapter1.InsertCommand.Connection.BeginTransaction();
adapter1.InsertCommand.Transaction =trans;
adapter1.UpdateCommand.Transaction =trans;
adapter1.DeleteCommand.Transaction =trans;
adapter1.Update(tbl);
trans.Commit();
if(adapter1.InsertCommand.Connection.State ==ConnectionState.Open)
adapter1.InsertCommand.Connection.Close();
}
catch (Exception ex)
{
trans.Rollback();
System.Diagnostics.Debug.WriteLine (ex.Message +ex.StackTrace);
bool flag1 = false;
return flag1;
}
return true;
}
}
} |
Note that in the above class, provided we have the Select Command for
the DataAdapter and it returns Key column information (a Primary
Key), we can get the Insert, Update and Delete commands we need
automatically via the SqlCommandBuilder. Note also that if you
wrap this update into a SqlTransaction, you will gain a significant
order of magnitude of speed. The reason for this (thanks to Gert
Drapers at Microsoft, SQL Server guru) is that the writes to the
trans log are delayed until the transaction is committed.
Here is what the Winforms tester app looks like after a test run:
If I have sparked your interest by now, you may be wondering what kind
of efficiencies this arrangement can generate. I did some informal
tests with Homer (Web Application Stress tool) to see if there
was a statistically significant improvement in throughput with
a web page that did it's own single logging SQL insert on each
page load, and another identical web page as described above which
instead wrote its single log record into the App Data Cache, allowing
the cache class to handle the decision about when to do the multiple
writes. Here are the initial results:
ONE MINUTE TEST - WRITE CACHING VS NO CACHING |
| TEST TYPE |
THREADS |
DELAY |
TEST LENGTH |
TOT REQUESTS |
REQ/SEC |
TOT ROWS INSERTED |
| NO CACHING |
9 |
0 |
1 MINUTE |
16,866 |
281.12 |
18863 |
| WRITE CACHING |
9 |
0 |
1 MINUTE |
24,878 |
409.59 |
26013 |
|
PERCENT DIFFERENCE: |
37.90% |
|
|
|
|
|
|
|
|
The write caching infrastructure was the clear winner, with nearly
a 40 percent improvement in traffic. I hasten to add that I didn't
spend much time tuning this. However, I did take care to ensure
that everything was "apples to apples" by restarting both IIS
and SQL Server after each test to "level the field".
I"ve included both the Windows Forms Tester app as well as a Web
Application that you can use to experiment with this creation.
Take a look at Global.asax.cs for the web app which is where I
create a static member that holds the engine class, making it
easily available from any page. In early testing, I've been getting
up to 2,000 inserts per second with this. I bet readers who are
exercising their noodles will come up with some really cool enhancement
ideas. There is complete SQL to create the test database and sprocs
in the download. And Steve, thanks for making me think!
Download the Visual Studio.NET solution that accompanies this article
P.S-- I've started a Google Group for Open Source .NET discussion. You're invited to join!
|