"Everything" RSS / ATOM Feed Parser

A canonicalized, generic solution to the problem of parsing any kind of RSS or ATOM feed and returning a usable collection for databinding.

While looking over the SyndicationFeed class and its related classes, I found out something quite annoying: in .NET 3.5, Microsoft put together a wonderful class infrastructure for handling various kinds of syndication feeds, but it cannot handle the old-style ATOM 0.3 feed schema (xmlns="http://purl.org/atom/ns#"). If you attempt to use the SyndicationFeed.Load method on such a feed, you get this: "The element with name 'feed' and namespace 'http://purl.org/atom/ns#' is not an allowed feed format." Frankly, I think the error message should have been written more like "We decided we didn't want to bother with ATOM 0.3 feeds, so tough titsky on you!"

That's too bad, because a huge number of feeds, including most of Google's news, Gmail and other feeds, are still delivered in this format. I have no idea what the rationale for this omission was, nor do I care to speculate. The bottom line is that the .NET 3.5 SyndicationFeed classes cannot handle the format.

So, what should a developer do? Well, you can either spend a lot of time figuring out how to override the existing infrastructure, or you can just roll your own. In my case, since I was mostly interested in gathering and displaying feed items, all I needed was the <item> or <entry> collection from the respective feed. Since every valid feed is well-formed XML, I decided to start from that common denominator.

The code I present here is relatively simple: I start out with a GenericFeedItem class as a container for the Title, Link, Description and PubDate items. I use an XmlTextReader to walk the elements of the retrieved feed and collect the children of each <item> or <entry>, and then a switch block to test for the elements I want and canonicalize their names. The result is a simple, fast way to parse most feeds (adding additional switch tests as needed) and return a standardized List<GenericFeedItem> collection that always has the same shape and can be databound.

The XmlTextReader class is perfect for this scenario because it provides fast, non-cached, forward-only access to XML data, similar to the way a SqlDataReader handles data from a SQL Server query. The switch block can be easily modified to accommodate additional feed schemas.
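To make the forward-only behavior concrete, here is a minimal sketch that reads a small in-memory XML string instead of a URL. The ForwardOnlyDemo class and its ElementNames method are made up for this illustration; only XmlTextReader and its Read, NodeType and Name members come from the framework.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the article's downloadable solution.
public static class ForwardOnlyDemo
{
    // Returns the element names in the order the reader encounters them.
    public static List<string> ElementNames(string xml)
    {
        var names = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Each Read() advances exactly one node; the reader never steps back.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        return names;
    }
}
```

Because nothing is cached, anything you want to keep from a node has to be copied out while the reader is positioned on it, which is why the parser below stores names and values in a Dictionary as it goes.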

Here is the ultra-simple GenericFeedItem class:

using System;

namespace PAB.FeedParser
{
    [Serializable]
    public class GenericFeedItem
    {
        public string Title { get; set; }
        public string Link { get; set; }
        public string Description { get; set; }
        public DateTime PubDate { get; set; }
    }
}

And here is the GenericFeedParser class, with plenty of inline comments to explain what is happening:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

namespace PAB.FeedParser
{
    public class GenericFeedParser
    {
        public List<GenericFeedItem> ReadFeedItems(string url)
        {
            // One Dictionary<string, string> of element names and values per feed item
            var items = new List<Dictionary<string, string>>();
            // The item currently being read; null until the first <item> or <entry> is seen
            Dictionary<string, string> currentItem = null;

            // Wrap an XmlTextReader around the url of the feed and dispose it when done
            using (var reader = new XmlTextReader(url))
            {
                // Read each node with the reader
                while (reader.Read())
                {
                    // if it's an element, we want to process it
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        string name = reader.Name;
                        string lowerName = name.ToLowerInvariant();
                        if (lowerName == "item" || lowerName == "entry")
                        {
                            // Save previous item
                            if (currentItem != null)
                                items.Add(currentItem);

                            // Create new item
                            currentItem = new Dictionary<string, string>();
                        }
                        else if (currentItem != null)
                        {
                            string value;
                            if (reader.IsEmptyElement)
                            {
                                // Empty elements such as ATOM's <link href="..."/> carry
                                // their value in an attribute and have no content to read
                                value = reader.GetAttribute("href") ?? string.Empty;
                            }
                            else
                            {
                                // Advance to the element's content (text or CDATA)
                                reader.Read();
                                value = reader.Value;
                            }
                            // some feeds can have duplicate keys, so we keep the first and don't blow up here:
                            if (!currentItem.ContainsKey(name))
                                currentItem.Add(name, value);
                        }
                    }
                }
            }

            // The loop only saves an item when the next one starts, so save the last item here
            if (currentItem != null)
                items.Add(currentItem);

            // now create a List of type GenericFeedItem
            var itemList = new List<GenericFeedItem>();
            // iterate all our items from the reader
            foreach (var d in items)
            {
                var itm = new GenericFeedItem();
                // do a switch on each Key of the item's Dictionary<string, string>
                foreach (string k in d.Keys)
                {
                    switch (k)
                    {
                        case "title":
                            itm.Title = d[k];
                            break;
                        case "link":
                            itm.Link = d[k];
                            break;
                        case "published":
                        case "pubDate":
                        case "issued":
                            // fall back to the current time when the date cannot be parsed
                            DateTime dt;
                            itm.PubDate = DateTime.TryParse(d[k], out dt) ? dt : DateTime.Now;
                            break;
                        case "content":
                        case "description":
                            itm.Description = d[k];
                            break;
                        default:
                            break;
                    }
                }
                // add the created item to our List
                itemList.Add(itm);
            }
            return itemList;
        }
    }
}


In order to use this arrangement (say in a web page with a GridView) one would use code similar to this:


protected void DropDownList1_SelectedIndexChanged(object sender, EventArgs e)
{
    // Nothing selected, so there is no feed to load
    if (string.IsNullOrEmpty(DropDownList1.SelectedValue)) return;
    var parser = new GenericFeedParser();
    List<GenericFeedItem> items = parser.ReadFeedItems(DropDownList1.SelectedValue);
    GridView1.DataSource = items;
    GridView1.DataBind();
}

That's all it takes! You can throw virtually any kind of feed at this and it will happily return a List of type GenericFeedItem for you. If I have missed any of the common feed schemas in this exercise, it is a simple matter to modify the switch block as shown above in order to accommodate them.
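As a rough sketch of what such a modification might look like, the mapping from raw element names to the four GenericFeedItem properties can be pulled into one place. The FeedKeyMap class and its Canonicalize method are hypothetical names introduced here, not part of the code above. The extra cases cover elements that ATOM 1.0 (updated, summary), ATOM 0.3 (modified, summary) and many RSS feeds (dc:date, content:encoded) commonly use. The "dc:" and "content:" prefixes are an assumption: a feed is free to bind those namespaces to other prefixes.

```csharp
// Hypothetical helper: maps a raw element name to the GenericFeedItem
// property it should populate, or null when the element is not kept.
public static class FeedKeyMap
{
    public static string Canonicalize(string elementName)
    {
        switch (elementName)
        {
            case "title":
                return "Title";
            case "link":
                return "Link";
            case "pubDate":         // RSS 2.0
            case "published":       // ATOM 1.0
            case "updated":         // ATOM 1.0
            case "issued":          // ATOM 0.3
            case "modified":        // ATOM 0.3
            case "dc:date":         // Dublin Core, assuming the usual "dc" prefix
                return "PubDate";
            case "description":     // RSS
            case "summary":         // ATOM
            case "content":         // ATOM
            case "content:encoded": // RSS content module, assuming the usual prefix
                return "Description";
            default:
                return null;
        }
    }
}
```

One thing to decide if you add cases like these is which element wins when an entry carries several that map to the same property, for example both published and updated.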

In the downloadable solution, I've enhanced the arrangement to permit the addition of a SyndicationFormat class which identifies the feed type and provides the title of the feed as well. You can download the full Visual Studio 2008 Solution, which includes a web project with a page that will try each feed type and display the results, including the feed type and its title.
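The SyndicationFormat class itself is not listed in this article, so the following is only a sketch of one way feed-type identification could work, not the code from the download. It looks at the root element and its namespace. FeedTypeSniffer, Detect and the returned strings are all invented for this example, and it does not attempt to read the feed title.

```csharp
using System.IO;
using System.Xml;

// Hypothetical sketch; the downloadable SyndicationFormat class may work differently.
public static class FeedTypeSniffer
{
    public static string Detect(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Skip the XML declaration, comments and whitespace up to the root element
            reader.MoveToContent();
            if (reader.NodeType != XmlNodeType.Element)
                return "Unknown";

            switch (reader.LocalName)
            {
                case "rss":
                    return "RSS " + (reader.GetAttribute("version") ?? "(no version)");
                case "RDF":
                    return "RSS 1.0 (RDF)";
                case "feed":
                    if (reader.NamespaceURI == "http://purl.org/atom/ns#")
                        return "ATOM 0.3";
                    if (reader.NamespaceURI == "http://www.w3.org/2005/Atom")
                        return "ATOM 1.0";
                    return "ATOM (unrecognized namespace)";
                default:
                    return "Unknown";
            }
        }
    }
}
```

Only the root element is inspected, so the check costs almost nothing and could run before handing the feed to the parser.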

By Peter Bromberg