Speed Up Copy Operations with SqlBulkCopy

by Peter A. Bromberg, Ph.D.

"Nothing is more difficult, and therefore more precious, than to be able to decide." --Napoleon

I must confess that this little venture into the ADO.NET 2.0 SqlBulkCopy class had an ulterior motive: where I work, as in many other places, bosses tend to be reluctant to adopt new technologies early. Now, I don't consider Visual Studio .NET 2005 and ADO.NET 2.0 "new technology", but they certainly have the right to be cautious.

Of course, one can take the "high road" approach: create for your boss such a compelling reason for introducing an application written with the new technology that they relent and allow you to do it. And so it was with SqlBulkCopy. When you need to insert millions -- and I mean literally millions -- of rows of VoIP phone call records from the flat files generated by the platforms in a dozen different cities, and you need to do it programmatically because the data has to be pre-processed on the fly, the SqlBulkCopy class, new in .NET 2.0, is going to become your friend. Needless to say, the powers that be were impressed.

Now let's take a look at a test app that will not only demonstrate the basics of the SqlBulkCopy class but, unlike the hordes of "Hello World" sample apps we all run into, most of which accomplish absolutely nothing of value, will also provide you with a complete US Zipcode database, including all the Latitude and Longitude coordinates!

What I've done is exported my Zipcodes database into a standard CSV-style flat file. Each row is delimited with commas, and there are no double quote marks around the fields since they aren't needed. In this case everything is a Varchar field, since I want to "keep it simple" -- although the SqlBulkCopy class supports column mappings and datatypes. It even handles datatype conversions "on the fly" during the inserts.
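The demo below doesn't need any mappings because the DataTable columns line up one-for-one with the Zipcodes table, but if yours don't, the hookup looks something like this minimal sketch ("zip", "lat", and "lon" are made-up source column names, and connString and sourceTable are assumed to exist):

// Hypothetical sketch of SqlBulkCopy column mappings; the source column
// names here are invented, not taken from my CSV.
using (SqlBulkCopy bc = new SqlBulkCopy(connString))
{
    bc.DestinationTableName = "Zipcodes";
    // Each mapping pairs a source DataTable column with a destination
    // table column; compatible datatypes are converted during the insert.
    bc.ColumnMappings.Add("zip", "ZipCode");
    bc.ColumnMappings.Add("lat", "Latitude");
    bc.ColumnMappings.Add("lon", "Longitude");
    bc.WriteToServer(sourceTable);
}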



The sample app will have two buttons:

[Screenshot: SqlBulkCopy Test App]

The "Prepare Table" button drops the Zipcodes table in your chosen test Sql Server database (if it already exists), and creates a new Zipcodes table. This allow you to drop the table and recreate it before or after a test, so that you can experment with the various options and have a "level playing field" for testing purposes. (You need to press this button BEFORE you press the "Get Data" button, n00bs.)

The "Get Data - Insert" button will read the CSV file, chosen from an OpenfileDialog, parse it via a StreamReader and string Split operations, create a DataTable from it, and pass this into the SqlBulkCopy class, calling it's WriteToServer method.

SqlBulkCopy also supports progress events, which I don't use here: set the NotifyAfter property to the number of rows you want between notifications, and handle the SqlRowsCopied event just like any other event. I use it in production to generate SysLog UDP messages to interested listeners every 5,000 rows bulk copied when that feature is "turned on" in my Windows Service's config file.
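Here's a minimal sketch of that hookup, assuming you already have a SqlBulkCopy instance (bc) and a populated DataTable (dt) as in the code further down; the 5000-row interval is just the figure I use in production:

// NotifyAfter and SqlRowsCopied are the real SqlBulkCopy members;
// bc and dt are assumed to be set up as in the DoBulkCopy method below.
bc.NotifyAfter = 5000;  // raise SqlRowsCopied every 5,000 rows
bc.SqlRowsCopied += delegate(object sender, SqlRowsCopiedEventArgs e)
{
    // e.RowsCopied is the running total of rows written so far.
    Console.WriteLine("{0} rows copied so far...", e.RowsCopied);
};
bc.WriteToServer(dt);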

I use a Stopwatch instance to time the results, and the elapsed time is displayed (in milliseconds, including the reading and parsing of the CSV file) in the purple label band at the bottom of the small Form after each test run.

Let's take a look at the code behind the "Get Data - Insert" method:

private static TimeSpan DoBulkCopy(string filePath)
{
    // Requires: using System; using System.Data; using System.Data.SqlClient;
    //           using System.Diagnostics; using System.IO;
    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();

    // Read the whole CSV file into a single string.
    string fullFileStr;
    using (StreamReader sr = new StreamReader(filePath))
    {
        fullFileStr = sr.ReadToEnd();
    }

    // Split on '\n'; the leftover '\r' from each Windows line ending
    // is stripped inside the loop below.
    string[] lines = fullFileStr.Split('\n');

    // Use the first line to determine how many columns the DataTable needs.
    DataTable dt = new DataTable();
    string[] sArr = lines[0].Split(',');
    foreach (string s in sArr)
    {
        dt.Columns.Add(new DataColumn());
    }

    // One DataRow per CSV line; skip any empty trailing line.
    foreach (string line in lines)
    {
        string finalLine = line.Replace("\r", "");
        if (finalLine.Length == 0)
            continue;
        DataRow row = dt.NewRow();
        row.ItemArray = finalLine.Split(',');
        dt.Rows.Add(row);
    }

    // Bulk-insert the whole DataTable in a single batch with a table lock.
    string connString =
        System.Configuration.ConfigurationManager.AppSettings["connectionString"];
    using (SqlConnection cn = new SqlConnection(connString))
    using (SqlBulkCopy bc = new SqlBulkCopy(cn, SqlBulkCopyOptions.TableLock, null))
    {
        bc.BatchSize = dt.Rows.Count;
        bc.DestinationTableName = "Zipcodes";
        cn.Open();
        bc.WriteToServer(dt);
    }

    stopWatch.Stop();
    return stopWatch.Elapsed;
}

In a nutshell:

  • We select the import file via the OpenFileDialog.
  • We read it into a string in one "fell swoop" with a StreamReader.
  • We split the string on the newline character to get a string array of all the lines.
  • We split the first line on the comma delimiter to see how many columns we need, and add that many columns to our new DataTable.
  • We iterate over each line in the file, "chop off" the offending "\r" character left over from the Windows line ending, split the line into a string array, assign that to the ItemArray property of a new DataRow, and add the row to our table.
  • We create a new SqlConnection to our database and a new instance of the SqlBulkCopy class, set the target Sql Server table, and call its WriteToServer method, passing in our newly constructed DataTable of "stuff". (The button handler wiring this up is sketched below.)
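For completeness, here's a rough sketch of the button handler that ties it all together; openFileDialog1 and lblResults are assumed control names, not necessarily the ones in the download:

// Hypothetical "Get Data - Insert" click handler; control names are assumed.
private void btnGetData_Click(object sender, EventArgs e)
{
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        TimeSpan ts = DoBulkCopy(openFileDialog1.FileName);
        lblResults.Text = "Elapsed: " + ts.TotalMilliseconds + " ms";
    }
}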

Now, here's the big question: How long do you think it will take to read 43,000 lines of zip code data and SqlBulkCopy it into a Sql Server table?

On my home box, which is an AMD x64 3400+ with 2 GB of RAM, it takes about 1250 milliseconds for "everything" -- reading the file, parsing it, creating the DataTable, and inserting all the rows. And folks, that fact -- 1250 milliseconds -- is what convinced my boss to let us go with Visual Studio .NET 2005 and ADO.NET 2.0. It does it reliably. You may need to throttle your Sql Server by setting a maximum memory footprint when you're doing the kinds of continuous million-row inserts I do in production, but it will still work like a champ.
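If you do find yourself needing to throttle Sql Server that way, the knob is sp_configure's 'max server memory (MB)' option. A one-off setup snippet might look like this sketch; the 1024 MB figure is purely an example value:

// Hypothetical one-time setup: cap SQL Server's memory footprint.
// 1024 MB is an example value only; size it for your own box.
string throttleSql =
    "EXEC sp_configure 'show advanced options', 1; RECONFIGURE; " +
    "EXEC sp_configure 'max server memory (MB)', 1024; RECONFIGURE;";
using (SqlConnection cn = new SqlConnection(connString))
using (SqlCommand cmd = new SqlCommand(throttleSql, cn))
{
    cn.Open();
    cmd.ExecuteNonQuery();
}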

Have fun with the download! Be sure that the target database in the Sql connection string in the app.config matches your system if you want to get to "first base" with the demo code.

Download the Visual Studio 2005 Solution that accompanies this article


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform.