Comparing Datasets using Linq

How to compare two datasets using Linq to DataSet

One of my recent projects included simple and I believe, commonly done tasks.  We were implementing project utilizing in Microsoft Visual Studio Professional edition, with C# 3.0 and the task was to compare two snapshots taken from the same table with particular time interval. The purpose of this task was to determine if any changes where done on the server side within given time interval.

The first way that came to my mind was to use the DataSet.Merge() method.  Ideally, this is how it should work:

At some point of the application initialization, we “grab” the first snapshot of data and load it into the original DataSet (let’s call it dsOriginal), then on the Timer event, when the given interval has passed, we take second snapshot (let’s call it dsChanged). The rest should be a fairly routine procedure:

1.       Create an empty DataSet (we will call it dsDifferences).
DataSet dsDifferences = new Dataset();

2.       Merge our dsOriginal with dsChanged
dsOriginal.Merge(dsChanged);

3.       The last step is to get the modified or added rows using the DataSet.GetChanges() method
dsDifferences = dsOriginal.GetChanges();

 

This is a very “stylish” and simple solution, but not so easy as in concept.
We were unpleasantly surprised when we found out that merge datasets do not change RowSatus property of the rows in the tables of the dataset.  Just to make sure that we did nothing wrong, I looked at MSDN library again and the MSDN materials confirmed that the Merge method combines rows (equivalent  to SQL Union statement) when no Primary Key present, or updates rows in target DataSet using source Dataset when Primary Key was defined. To resolve this issue we contacted the Microsoft Support team and after a few days of email exchange, we got the following message: I think the root of the issue is that Merge does not change row state.  If you create a dataset, then manually change a row value, then row state is tweaked and GetChanges() behaves as expected.”

But Merge does a “dumb” merge and does not set other flags from that we are seeing, but I am trying to find documentation to support that.  If that is the case, the only option is to do a manual compare.  I will keep you posted on what I find…

As you can see from the message that Microsoft suggestion was to go manually trough both DataSets comparing values row by row and manually update settings in the resulting Dataset using Row.SetAdded() or Row.SetModified()  methods according to the situation – a solution not as elegant. The fact that the method provided above is available for developers in Framework 2.0 and later for extra flexibility,  it is hard to imagine when these methods can be used in practical terms. After a day or two of additional research and conversations with Microsoft , we received another email from the support team, where we got the clue that the issue cannot be resolved: “I have raised the issue with appropriate personnel and I am waiting to hear back from them.  I will let you know something as soon as I find out more.

That’s when we turned our heads to LINQ.  I did not have extensive experience with this new feature, but in a few hours found it very easy to learn and apply. There are a few ways using LINQ to achieve the same result. For our purpose, we selected involvement of Union method. Therefore, here is the “moment of Zen”:

  1. During Initialization of our application, we still create an initial snapshot of the data
    in the sample below we will call it dsOriginal.
  2. On the Elapsed event of the timer we  create another snapshot of current data state
    we will call it dsChanged.
  3. To use LINQ we need to get the tables from both datasets into an enumerable row collection. This can be done using the AsEnumerable() method (see code sample below) and we are ready to use LINQ magic.

Below is the code, which completes the task:

 

var orig = dsOriginal.Tables[0].AsEnumerable();

var updated = dsChanged.Tables[0].AsEnumerable();

//First, getting new records if any           

var newRec =

            from u in dsChanged

            where !(from o in orig

                     select o.Field<decimal>("PRIMARY_KEY_FIELD"))

                 .Contains(u.Field<decimal>(" PRIMARY_KEY_FIELD"))

                        select new

                        {

                           prim_key = u.Field<decimal>("PRIMARY_KEY_FIELD"),

                           field1 = u.Field<decimal>("FIELD1"),

                           field2=u.Field<decimal>("FIELD2"),

                           field3 = u.Field<decimal>("FIELD3"),

                           field4 = u.Field<decimal>("FIELD4"),

                           rec_type="A"//Added

                        };

 

//Secondly, getting updated records

 var updRec = from u in updated

              join o in orig

              on u.Field<decimal>("PRIMARY_KEY_FIELD")

              equals o.Field<decimal>("PRIMARY_KEY_FIELD")
             where (u.Field<decimal>("FIELD1") !=
                  o.Field<decimal>("FIELD1")) ||
                  (u.Field<decimal>("FIELD2") !=
                  o.Field<decimal>("FIELD2"))

                         select new

                         {

                           prim_key = u.Field<decimal>("PRIMARY_KEY_FIELD"),

                           field1 = u.Field<decimal>("FIELD1"),

                           field2=u.Field<decimal>("FIELD2"),

                           field3 = u.Field<decimal>("FIELD3"),

                           field4 = u.Field<decimal>("FIELD4"),

                           rec_type = "M"//Mofified

                         };

var Union = newRec.Union(updRec);

 

 

The code snippet is simple and self-explanatory. In the first selected statement we are getting new records by using Contains() method  preceding  it with the “!” operator which gives us only new(Added) rows. In the second statement we get the rows with updated values using comparison in the Where clause, and finally, we compare the results using Union() method.  

 

You probably noticed the variable rec_type used in both queries. When you need to know which record was modified and which is new (added), the Union operator comes in handy because it allows you to create a custom flag on DataRow with this information, but if you only need to get differences without knowing if the row added or modified, you can get all the data in “one shot” and the code snippet below demonstrates how:

 

 

 

 

 

 

var AddedAndModif = from u in updated

                        where !(from o in orig

                                select o.Field<decimal>("PRIMARY_KEY"))

                               .Contains(u.Field<decimal>("PRIMARY_KEY"))

                        || !(from o in orig

                                select o.Field<decimal>("FIELD1"))

                               .Contains(u.Field<decimal>(“FIELD1"))

                        select new

                        {

                           prim_key = u.Field<decimal>("PRIMARY_KEY"),

                           field1 = u.Field<decimal>("FIELD1"),

                           field2=u.Field<decimal>("FIELD2"),

                           field3 = u.Field<decimal>("FIELD3"),

                           field4 = u.Field<decimal>("FIELD4"),

                        };

 

 

As you can see from this example, the Contains() method used twice solves the task and provides you with collection of both, modified and newly added rows.

 

 

In our opinion, among all other options, this solution appears to be fastest.  It is easy to read and modify as needed and showed about 20% better performance when comparing it to the same result achieved by the method suggested by MS Support professionals  provided at the top of this article (comparing row by row and manually changing RowSate using  Row.SetAdded() or Row.SetModified() methods).

 

Since the situation provided in this discussion is common, we decided to share our solution, thinking that it might make another developers life easier.

Aaron Katz - Certified PMP,
Yuri Kasan - MCAD

By Yuri Kasan   Popularity  (5820 Views)
Biography - Yuri Kasan
Yuri Kasan, is the lead developer of the corporate intranet at Mellon Investor Services, LLC. Yuri, who is an Microsoft Certified Developer specializes in the implementation of a wide variety of applications using Microsoft technologies. His expertise includes assisting his users in maximizing their investment in Microsoft desktop, server and development technologies. You can reach Yuri via email at ykasan@yahoo.com