SQL Server Bulk Insert CSV Reader

Have you ever received CSV delimited text files that you need to import into SQL Server? You use the wizard, and everything seems to be fine during the import and then it stops dead in its tracks with an error message, usually about data being truncated. The SQL Server Import Data Wizard works fine for most files, but when you get one of these your development efforts come to a screeching halt. Yes, you can try to edit the SSIS package but often that still won't fix the problem.

What you really need is a CSV parser-reader that allows for settings that will overcome these read errors. I looked around and tried a few; the one I like the best is called Lumen.Works.Framework.IO. It has a lot of configuration options and by coupling this with the SqlBulkCopy class in .NET, you can very quickly create a utility that can solve most data import problems.

This Windows Forms Utility allows you to select multiple CSV files (assuming the schema is the same) in one go. It iterates over the files list, loading and parsing one file at a time. Every 5000 rows, it performs a SqlBulkCopy insert into your table and updates the UI so you can see the progress.

In order to use this, you need to:
1) Create the target table first in your SQL Server database.
2) Set the connection string in the App.Config file.
3) Supply the target table name in the form, and
4) Select the file(s) you want to import.

I've supplied a sample SQL Script you can use that will create a target table "TEST" in a database named "TEST", as well as a sample 10,000 row CSV file that is TAB delimited and has a header row with the column names. The data is simply some voter records that are publicly available in many states.

The utility is pretty fast - about as fast as the SQL Server Import Wizard.

Here's the class for the single Form. The code should be self-explanatory and has comments at important areas:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;
using System.Drawing;
using System.IO;
using System.Text;
using System.Windows.Forms;

using LumenWorks.Framework.IO.Csv;

using Microsoft.ApplicationBlocks.Data;

namespace CsvReaderDemo
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}


    private string tableName = String.Empty;
private void button1_Click(object sender, EventArgs e)
{
    tableName = txtTableName.Text;
             if (tableName.Length <1)
    {
         MessageBox.Show("Must provide Table Name");
        return;
    }
DialogResult res = openFileDialog1.ShowDialog();
   string[] files = openFileDialog1.FileNames;
            foreach (String file in files)
            {
                label2.Text = "INSERTING FROM " + file;
                StringReader rdr = new StringReader(file);
                DataTable dt = CreateDataTable(tableName );

                string cnString = ConfigurationManager.ConnectionStrings["test"].ConnectionString;
                int ctr = 0;

               // NOTE: there are several constructor options. This one specifies that there is a header line ( "true"), and the delimiter is a TAB character.
                 using (CsvReader csv = new CsvReader(new StreamReader(file), true, '\t'))
                 {
    // Note various behavioral properties that can be set:
                    csv.SupportsMultiline = false;
                    csv.DefaultParseErrorAction = ParseErrorAction.AdvanceToNextLine;
                    csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
                    string s;
                    string[] stuff = new string[dt.Columns.Count];

                     while (csv.ReadNextRecord())
                    {
                        csv.CopyCurrentRecordTo(stuff);
                        object[] parms = new object[stuff.Length];
                          for (int i = 0; i < parms.Length; i++)
                            parms[i] = stuff[i];
                    
                        DataRow row = dt.NewRow();
                        row.ItemArray = parms;
                         dt.Rows.Add(row);
                         ctr++;
                       
                          if (ctr % 5000  == 0)
                        {
                            InsertRows(dt,  tableName );
                              // inserted 5000 rows, clear the DataTable for the next batch
                              dt.Rows.Clear();
                              this.label1.Text = ctr.ToString();
                            Application.DoEvents();
                        }
                    }
                    label1.Text = "COMPLETED.";
                 }
             }
}


         private DataTable CreateDataTable( string tableName)
        {
            SqlConnection cn = new SqlConnection(ConfigurationManager.ConnectionStrings["test"].ConnectionString);
            DataTable dt = new DataTable();
            SqlDataAdapter da = new SqlDataAdapter("SELECT TOP 1 * FROM " + tableName, cn);
             da.Fill(dt);
            dt.Rows.Clear();
            return dt;
        }


        private void InsertRows( DataTable dt, string tableName)
         {
             using (SqlConnection connection =
             new SqlConnection(ConfigurationManager.ConnectionStrings["test"].ConnectionString))
            {
                SqlBulkCopy bulkCopy =
                     new SqlBulkCopy
                    (
                         connection,SqlBulkCopyOptions.Default ,null
                     );

                 // set the destination table name
                bulkCopy.DestinationTableName = tableName;
                 connection.Open();
                 // write the data in the "dataTable"
                 try
                {
                    bulkCopy.WriteToServer(dt);
                }
                catch (Exception ex)
                {

                }
            }
        }
}
}

You can download the complete Visual Studio 2010 solution here.

By Peter Bromberg   Popularity  (3147 Views)