Fun With OPML in ASP.NET

by Peter A. Bromberg, Ph.D.

"There's no trick to being a humorist when you have
the whole government working for you."
  - Will Rogers

OPML ("Outline Processor Markup Language") is getting a lot of interest lately, mostly because everybody's using it as the de facto import/export mechanism for RSS reader subscriptions. OPML is what you export from BlogLines, FeedDemon, NewsGator or whatever reader you use, and OPML is what you import into it. I regularly export OPML from my BlogLines web page and save it somewhere safe. Heck, I read recently that somebody had their entire GMail account wiped out and lost a lot of important "stuff". Protect your ass!



The OPML 1.0 Spec is pretty simple. Its elements are:

<opml version="1.0">
This is the root element. It must contain the version attribute and one head and one body element.
<head>
Contains metadata. May include any of these optional elements: title, dateCreated, dateModified, ownerName, ownerEmail, expansionState, vertScrollState, windowTop, windowLeft, windowBottom, windowRight. Each element is a simple text element. dateCreated and dateModified contents conform to the date-time format specified in RFC 822. expansionState contains a comma-separated list of line numbers that should be expanded on display. The windowXXX elements define the position and size of the display window. An OPML processor may ignore all the head sub-elements. If the outline is opened inside another outline then the processor must ignore the window elements.
<body>
Contains the content of the outline. Must have one or more outline elements.
<outline>
Represents a line in the outline. May contain any number of arbitrary attributes. Common attributes include text and type. The outline element may contain any number of outline sub-elements.
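
To make the element list above concrete, here is a minimal sketch that writes a one-feed OPML 1.0 document with XmlWriter. The class name OpmlWriterSketch, the Build method and its parameters are my own illustration, not part of any OPML library, and the sketch assumes .NET 2.0 or later. The "r" format specifier produces the RFC-style date ("Mon, 31 Oct 2005 19:23:00 GMT"); it does not convert to UTC, so the caller is assumed to pass a UTC time.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

public static class OpmlWriterSketch
{
    // Builds <opml><head>...</head><body><outline .../></body></opml> for a single feed.
    public static string Build(string title, DateTime createdUtc, string feedText, string feedUrl)
    {
        StringWriter sw = new StringWriter();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter w = XmlWriter.Create(sw, settings))
        {
            w.WriteStartElement("opml");
            w.WriteAttributeString("version", "1.0"); // required on the root element

            w.WriteStartElement("head");              // optional metadata
            w.WriteElementString("title", title);
            w.WriteElementString("dateCreated",
                createdUtc.ToString("r", CultureInfo.InvariantCulture));
            w.WriteEndElement();

            w.WriteStartElement("body");              // must hold at least one outline
            w.WriteStartElement("outline");
            w.WriteAttributeString("text", feedText);
            w.WriteAttributeString("type", "rss");
            w.WriteAttributeString("xmlUrl", feedUrl);
            w.WriteEndElement();                      // outline
            w.WriteEndElement();                      // body

            w.WriteEndElement();                      // opml
        }
        return sw.ToString();
    }
}
```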

This is an example that uses the proposed category attribute from the upcoming OPML 2.0 Draft Specification:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<opml version="2.0">
<head>
<title>Illustrating the category attribute</title>
<dateCreated>Mon, 31 Oct 2005 19:23:00 GMT</dateCreated>
</head>
<body>
<outline text="The Mets are the best team in baseball." category="/Philosophy/Baseball/Mets,/Tourism/New York" created="Mon, 31 Oct 2005 18:21:33 GMT" />
</body>
</opml>

In particular, the proposed "category" attribute provides a way to do a DMOZ-style breadcrumb-outline tree, which I agree is most useful.
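
As a rough illustration of how such a category value could be turned into breadcrumbs, the sketch below splits it on commas and then on slashes, following the format used in the sample above. CategoryParser and Parse are hypothetical names of mine, and the code assumes .NET 2.0 or later for List<T>.

```csharp
using System;
using System.Collections.Generic;

public static class CategoryParser
{
    // Splits a value such as "/A/B,/C" into one string array per category path.
    public static List<string[]> Parse(string category)
    {
        List<string[]> paths = new List<string[]>();
        foreach (string path in category.Split(','))
        {
            // Drop the empty entry produced by the leading slash.
            string[] parts = path.Trim().Split(
                new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length > 0)
                paths.Add(parts);
        }
        return paths;
    }
}
```

Calling CategoryParser.Parse on the category from the sample gives two paths, the first with the segments Philosophy, Baseball and Mets, the second with Tourism and New York.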

So in a nutshell, you have a bunch of <outline> elements that can be nested and can carry a lot of attributes, the only commonly expected ones being text (or title), htmlUrl and xmlUrl. The text or title attribute is used as the visible subject line for display in the "outline" document, htmlUrl points to the target web page, and xmlUrl points to the feed document itself, in RSS, Atom or some other variant. The "type" attribute specifies the format ("rss", "atom" etc.).
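
Here is a small sketch of reading those attributes with XmlDocument. OutlineDump and ListFeeds are made-up names for illustration, and the code assumes .NET 2.0 or later. XmlElement.GetAttribute returns an empty string when an attribute is missing, which makes the text/title fallback easy.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class OutlineDump
{
    // Returns one "text | type | xmlUrl" line per outline element that carries an xmlUrl.
    public static List<string> ListFeeds(string opml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(opml);

        List<string> lines = new List<string>();
        foreach (XmlElement outline in doc.SelectNodes("//outline[@xmlUrl]"))
        {
            // "text" is the usual display attribute; fall back to "title" when it is absent.
            string text = outline.GetAttribute("text");
            if (text.Length == 0)
                text = outline.GetAttribute("title");

            string type = outline.GetAttribute("type");     // "rss", "atom", or empty if missing
            string xmlUrl = outline.GetAttribute("xmlUrl"); // the feed document
            lines.Add(text + " | " + type + " | " + xmlUrl);
        }
        return lines;
    }
}
```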

Now this is a handy way to move "Table of Contents" type data around; it's extremely portable. It also lacks a lot of features; Dave Winer has been working on the OPML 2.0 Draft specification, but there's a lot of static stuff in there that some people don't particularly agree with. Of note, the MS Internet Explorer crowd is coming out with RSS Extensions that blow away everything currently offered by the FireFox crowd. And it's out under the Creative Commons "ShareAlike" license - a sign of great maturity on the part of the MS people. The most useful of these is the set of Simple Sharing Extensions that Jack Ozzie, George Moromisato, and Paresh Suthar have put together; they provide a mechanism for using RSS as the basis for bidirectional outline and item sharing. This opens up a new world of syndication possibilities.

One thing I like to do with XML, and RSS in particular, is to take advantage of the ease of loading it directly into an ADO.NET DataSet with "ReadXml". However, because OPML permits nested "outline" elements, the DataSet is going to scream bloody hell and tell you that it can't give you duplicate table names.

One thing I really do not like about OPML is that the spec wasn't hardened to the point that RSS was (RSS wasn't much better), and so you can have outline titles that have the "title" attribute (like my sample) or which only contain the "text" attribute, which seems to be the norm. If you really want to see an example of HOW NOT to evolve a specification, read the Wikipedia page on RSS and you can get a flavor for the personalities that got involved. UGLY, unproductive, confusing! Standards are good; they do NOT impede innovation - provided there is a base, level playing field one can rely on.

Sam Ruby has an excellent validator here. There's nothing more frustrating than a perfectly good Xml specification that was never nailed down! If I had more time (or perhaps later when I really decide to do something with this) I'd write some specialized parsing code on each XmlNode to ensure that it had the text attribute if it only had a "title" attribute, so my DataColumn would always be correct. In the code here, I only have added a limited amount of this "attribute substitution processing". Also, one should be aware of the fact that not all OPML outline elements have both an xmlUrl and an htmlUrl attribute - some have only the xmlUrl to the RSS feed.
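
A fuller version of that text/title substitution might look something like the sketch below, which makes sure every outline element ends up with both attributes by copying whichever one is present. OutlineNormalizer and NormalizeTitles are hypothetical names; this is one possible approach, not the code from the downloadable solution.

```csharp
using System.Xml;

public static class OutlineNormalizer
{
    // Ensures every outline element has both "text" and "title",
    // copying whichever one is present into the missing one.
    public static void NormalizeTitles(XmlDocument doc)
    {
        foreach (XmlElement outline in doc.SelectNodes("//outline"))
        {
            string text = outline.GetAttribute("text");   // empty string when missing
            string title = outline.GetAttribute("title");

            if (text.Length == 0 && title.Length > 0)
                outline.SetAttribute("text", title);
            else if (title.Length == 0 && text.Length > 0)
                outline.SetAttribute("title", text);
        }
    }
}
```

Elements that have neither attribute are left alone, so a caller that needs a guaranteed column would still have to decide what to do with those.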

If you wanted to do this with a serializable set of classes, it would need to look something like this:

namespace OPML
{
    using System;
    using System.Xml;
    using System.Xml.Serialization;

    [XmlRoot("opml")]
    public class Opml
    {
        public OpmlBody body;
    }

    [XmlRoot("body")]
    public class OpmlBody
    {
        [XmlElement("outline")]
        public OpmlOutline[] outline;
    }

    [XmlRoot("outline")]
    public class OpmlOutline
    {
        [XmlAttribute]
        public string title;
        [XmlAttribute]
        public string description;
        [XmlAttribute]
        public string xmlUrl;
        [XmlAttribute]
        public string htmlUrl;
        [XmlAttribute]
        public string language;
    }
}
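
For anyone who wants to try the serializer route, here is a self-contained sketch of deserializing OPML with XmlSerializer. The types OpmlDoc, OpmlDocBody, OpmlDocOutline and OpmlReader are illustrative names of mine (deliberately different from the classes above), and I have assumed two additions: a "text" attribute and a recursive Children array so that nested outlines survive. Elements and attributes the classes don't declare, such as head or version, are simply skipped by the serializer.

```csharp
using System.IO;
using System.Xml.Serialization;

[XmlRoot("opml")]
public class OpmlDoc
{
    [XmlElement("body")]
    public OpmlDocBody Body;
}

public class OpmlDocBody
{
    [XmlElement("outline")]
    public OpmlDocOutline[] Outlines;
}

public class OpmlDocOutline
{
    [XmlAttribute("text")]
    public string Text;
    [XmlAttribute("title")]
    public string Title;
    [XmlAttribute("xmlUrl")]
    public string XmlUrl;
    [XmlAttribute("htmlUrl")]
    public string HtmlUrl;

    // Nested outline elements (folders) deserialize recursively into this array.
    [XmlElement("outline")]
    public OpmlDocOutline[] Children;
}

public static class OpmlReader
{
    // Deserializes an OPML string into the object graph above.
    public static OpmlDoc Parse(string opml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(OpmlDoc));
        using (StringReader reader = new StringReader(opml))
        {
            return (OpmlDoc)serializer.Deserialize(reader);
        }
    }
}
```

The trade-off versus the DataSet approach is that you keep the hierarchy but give up the instant databinding, sorting and filtering.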

So the first thing I did in my "Fun with OPML" exercise was to build a helper class with a couple of static methods that would take care of the ReadXml parsing problem and give me back a DataSet. Advantages? Instant databinding to things like an ASP.NET DataGrid or DataGridView, sorting, filtering (e.g., poor man's "Search" facility) and more. Here's the code for my utility class with two overloads of the same method, one accepting a URI, and the other accepting an XmlNodeList:

namespace OPML
{
    using System;
    using System.Data;
    using System.IO;
    using System.Text;
    using System.Xml;

    /// <summary>
    /// OPMLUtil: helper methods that flatten OPML outline elements into a DataSet.
    /// </summary>
    public class OPMLUtil
    {
        private OPMLUtil()
        {
            // private ctor so people don't try to create an instance;
            // all methods are static
        }

        // Returns the attribute's value, or an empty string if the attribute is missing.
        public static string GetSafeAttributeValue(XmlNode nod, string attributeName)
        {
            if (nod == null || nod.Attributes == null)
                return String.Empty;
            XmlAttribute att = nod.Attributes[attributeName];
            return (att == null) ? String.Empty : att.Value;
        }

        public static DataSet OPMLToDataSet(string OPMLUrl)
        {
            XmlDocument doc = new XmlDocument();
            // this Xml will never load directly into a DataSet because of the nested (duplicate) table names
            //    (e.g., "outline/outline") so we load into XmlDocument first
            doc.Load(OPMLUrl);
            // Now get the single level "outline" element stuff we want
            XmlNodeList nods = doc.SelectNodes("//outline[@xmlUrl!='']");
            // Create the target XmlDocument and give it a root element
            XmlDocument doc2 = new XmlDocument();
            doc2.LoadXml("<items></items>");
            // import the nodelist
            foreach (XmlNode nod in nods)
            {
                XmlNode importNode = doc2.ImportNode(nod, true);
                doc2.DocumentElement.AppendChild(importNode);
            }
            return ToDataSet(doc2);
        }

        public static DataSet OPMLToDataSet(XmlNodeList nods)
        {
            XmlDocument doc2 = new XmlDocument();
            // give it a root element
            doc2.LoadXml("<items></items>");
            // massage the nodes and do some substitution preprocessing
            foreach (XmlNode nod in nods)
            {
                // import the node so that it belongs to doc2
                XmlNode importNode = doc2.ImportNode(nod, true);

                // "text" becomes "title" so the DataSet always gets a "title" column
                string text = GetSafeAttributeValue(importNode, "text");
                if (text != "")
                {
                    XmlAttribute att = doc2.CreateAttribute("title");
                    att.Value = text;
                    importNode.Attributes.Append(att);
                    importNode.Attributes.Remove(importNode.Attributes["text"]);
                }

                // "url" becomes "xmlUrl" when xmlUrl is missing or empty
                string xmlUrl = GetSafeAttributeValue(importNode, "xmlUrl");
                if (xmlUrl == "")
                {
                    xmlUrl = GetSafeAttributeValue(importNode, "url");
                    XmlAttribute att2 = doc2.CreateAttribute("xmlUrl");
                    att2.Value = xmlUrl;
                    importNode.Attributes.Append(att2);
                    XmlAttribute urlAtt = importNode.Attributes["url"];
                    if (urlAtt != null)
                        importNode.Attributes.Remove(urlAtt);
                }

                // fall back to the feed URL when there is no htmlUrl
                string htmlUrl = GetSafeAttributeValue(importNode, "htmlUrl");
                if (htmlUrl == "")
                {
                    htmlUrl = GetSafeAttributeValue(importNode, "xmlUrl");
                    XmlAttribute att3 = doc2.CreateAttribute("htmlUrl");
                    att3.Value = htmlUrl;
                    importNode.Attributes.Append(att3);
                }
                doc2.DocumentElement.AppendChild(importNode);
            }
            return ToDataSet(doc2);
        }

        // Serializes the flattened document and lets the DataSet infer its schema from it.
        private static DataSet ToDataSet(XmlDocument doc)
        {
            DataSet ds = new DataSet();
            byte[] b = Encoding.UTF8.GetBytes(doc.OuterXml);
            using (MemoryStream ms = new MemoryStream(b))
            {
                ds.ReadXml(ms);
            }
            return ds;
        }
    }
}

I am sure there are more elegant ways to handle copying a node list from one XmlDocument to another; I must confess I am not an XML guru, so I took the easiest approach. The key thing here is that we want to "flatten out" that <outline> hierarchy so that we get an XmlNodeList that is linear: it has only one set of outline elements with no nesting. This is what will make the DataSet happy. So, by preprocessing our NodeList with

  XmlNodeList nods = doc.SelectNodes("//outline[@xmlUrl!='']");

we can be sure to get a "flattened" NodeList of XmlNodes that have at least a non-empty xmlUrl attribute. You must use the ImportNode method to copy them, in order to avoid the infamous "The node to be inserted is from a different document context" exception that XmlDocument throws when you append a node owned by another document. The code also does a limited amount of "attribute substitution processing" in order to ensure that the DataSet will get the column names that I expect. It's not perfect, but it enables me to handle most of the OPML variants that will get thrown my way.
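
For comparison, here is a compact sketch of the same flattening idea that skips the byte array and MemoryStream by handing the DataSet an XmlNodeReader. OpmlFlattener and Flatten are hypothetical names, and this is a variation of mine rather than the code in the download. It also imports with deep set to false, so each outline is copied with its attributes but without child outlines, which guarantees the result has no nesting.

```csharp
using System.Data;
using System.Xml;

public static class OpmlFlattener
{
    // Copies every outline with a non-empty xmlUrl under a single <items> root
    // and loads the result into a DataSet.
    public static DataSet Flatten(string opml)
    {
        XmlDocument source = new XmlDocument();
        source.LoadXml(opml);

        XmlDocument flat = new XmlDocument();
        flat.LoadXml("<items></items>");

        foreach (XmlNode node in source.SelectNodes("//outline[@xmlUrl!='']"))
        {
            // ImportNode creates a copy owned by "flat"; deep=false leaves child outlines behind.
            flat.DocumentElement.AppendChild(flat.ImportNode(node, false));
        }

        DataSet ds = new DataSet();
        using (XmlNodeReader reader = new XmlNodeReader(flat))
        {
            ds.ReadXml(reader); // infers one "outline" table, one column per attribute
        }
        return ds;
    }
}
```

If no outline matches the XPath, the DataSet comes back with no tables, so callers should check Tables.Count before indexing into it.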

Now I am ready to build a fun Web page. I want it to do two things:

1) Import an OPML XML file from wherever I point it to and display it in a pageable, filterable ("Search") DataGrid, and

2) Be able to Upload my custom OPML file to the web page and have it do the same to display that.

You can see that with these two goals, we already have the foundation for some sort of "OPML Sharing" web site, which sounds pretty cool to me.

Here's the codebehind for my web page:

namespace OPML
{
    using System;
    using System.Data;
    using System.Web.UI;
    using System.Web.UI.HtmlControls;
    using System.Web.UI.WebControls;
    using System.Xml;

    public class WebForm1 : Page
    {
        protected DataGrid DataGrid1;
        protected TextBox txtFilter;
        protected Label Label1;
        protected Button Button1;
        protected HtmlInputFile htmlinputfile;
        protected Label lblUpload;
        protected HtmlForm Form1;
        protected Button btnUpload;
        protected Label lblOpml;
        protected Label Label2;
        protected DataSet ds;

        private void Page_Load(object sender, EventArgs e)
        {
            DataView dvBind = GetDataSource(txtFilter.Text);
            DataGrid1.DataSource = dvBind;
            DataGrid1.DataBind();
        }

        private DataView GetDataSource(string filter)
        {
            // double any apostrophes so user input cannot break the filter expression
            filter = "title like '%" + filter.Replace("'", "''") + "%'";
            if (this.ds != null)
            {
                // a freshly uploaded DataSet takes precedence; cache its view in Session
                DataView dv1 = ds.Tables[0].DefaultView;
                dv1.Sort = "title";
                dv1.RowFilter = filter;
                Session["dvBind"] = dv1;
            }

            DataView dv;

            if (Session["dvBind"] != null)
            {
                dv = (DataView) (Session["dvBind"]);
                dv.Sort = "title";
                dv.RowFilter = filter;
            }
            else
            {
                // nothing cached yet: load the default OPML file
                ds = OPMLUtil.OPMLToDataSet(Server.MapPath("export.xml"));
                DataTable dt = ds.Tables[0];
                dv = dt.DefaultView;
                dv.Sort = "title";
                dv.RowFilter = filter;
                Session["dvBind"] = dv;
            }

            return dv;
        }

        #region Web Form Designer generated code

        protected override void OnInit(EventArgs e)
        {
            InitializeComponent();
            base.OnInit(e);
        }

        private void InitializeComponent()
        {
            this.DataGrid1.PageIndexChanged += new DataGridPageChangedEventHandler(this.DataGrid1_PageIndexChanged);
            this.btnUpload.Click += new EventHandler(this.btnUpload_Click);
            this.Button1.Click += new EventHandler(this.Button1_Click);
            this.Load += new EventHandler(this.Page_Load);
        }

        #endregion

        private void DataGrid1_PageIndexChanged(object source, DataGridPageChangedEventArgs e)
        {
            DataGrid1.CurrentPageIndex = e.NewPageIndex;
            // rebind so the grid renders the newly selected page
            DataGrid1.DataSource = GetDataSource(txtFilter.Text);
            DataGrid1.DataBind();
        }

        private void Button1_Click(object sender, EventArgs e)
        {
            // a new filter can shrink the result set, so go back to the first page
            DataGrid1.CurrentPageIndex = 0;
            DataGrid1.DataSource = GetDataSource(txtFilter.Text);
            DataGrid1.DataBind();
        }

        private void btnUpload_Click(object sender, EventArgs e)
        {
            XmlDocument doc = new XmlDocument();
            try
            {
                doc.Load(this.htmlinputfile.PostedFile.InputStream);
            }
            catch (Exception ex)
            {
                this.lblOpml.Text = ex.Message;
                return;
            }

            // prefer xmlUrl; fall back to outlines that only carry a "url" attribute
            XmlNodeList nods = doc.SelectNodes("//outline[@xmlUrl!='']");
            if (nods.Count == 0)
            {
                nods = doc.SelectNodes("//outline[@url!='']");
            }
            this.lblOpml.Text = "Your OPML Has " + nods.Count.ToString() + " Outline Elements";
            if (nods.Count == 0)
            {
                this.lblOpml.Text = "OPML Has no parseable outline nodes.";
                return;
            }

            DataSet dsret = OPMLUtil.OPMLToDataSet(nods);
            Session["dvBind"] = null;
            this.ds = dsret;
            DataView dvBind = this.GetDataSource(this.txtFilter.Text);
            DataGrid1.DataSource = dvBind;
            DataGrid1.DataBind();
        }
    }
}


What happens here is that when the Page loads, I call my GetDataSource method, which accepts a text "filter" string. This checks whether there is a class-level DataSet (from an upload operation); if so, it uses that. If not, it either gets the DataView that's in Session or, if that is null, loads the OPML and gets back the DataSet from my utility class method. It then stores the DataView in Session and binds the grid. Same thing with paging: we change the CurrentPageIndex and bind the grid to that same DataView.
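
One thing to watch with the "Search" filter is that the text box value goes straight into a RowFilter expression, so an apostrophe or a wildcard character typed by the user can break it. Below is a sketch of one way to escape the value first. FilterHelper, EscapeLikeValue and FilterByTitle are hypothetical names; the escaping follows the documented expression rules (double the single quote, wrap *, %, [ and ] in brackets).

```csharp
using System.Data;
using System.Text;

public static class FilterHelper
{
    // Escapes text for use inside a quoted LIKE pattern in a RowFilter expression.
    public static string EscapeLikeValue(string value)
    {
        StringBuilder sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            switch (c)
            {
                case '\'':
                    sb.Append("''");                      // a single quote is doubled
                    break;
                case '*':
                case '%':
                case '[':
                case ']':
                    sb.Append('[').Append(c).Append(']'); // wildcards and brackets are wrapped
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    // Returns a view of the table sorted by title and limited to titles containing userText.
    public static DataView FilterByTitle(DataTable table, string userText)
    {
        DataView view = new DataView(table);
        view.Sort = "title";
        view.RowFilter = "title LIKE '%" + EscapeLikeValue(userText) + "%'";
        return view;
    }
}
```

FilterByTitle builds a new DataView instead of reusing DefaultView, so a filter applied for one request does not linger on the shared table.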

You can see also that if the user has uploaded an OPML document, the method at the bottom handles it and again, it binds the grid.

Here is a live example for you to play with. None of the uploaded OPML gets saved, it's all done in memory:

http://www.nullskull.com/articles/OPML/WebForm1.aspx

I should caution readers that this is the simplest of implementations; it requires that your OPML have at least the attributes "title" or "text" and "xmlUrl" with the exact casing shown, or it won't parse. So, please pre-process anything you decide to upload, or just download the solution and revise the code to your liking.

Have fun with OPML! The downloadable solution includes a copy of my sample OPML with some pretty good links in it. If you are interested in working on a "Social OPML Sharing" website application and have some good ideas, let me know. It would be fun to have three or four core developers work on something "open source" like this. We would get a GotDotNet Workspace for it and probably be able to turn out something useful for the community.

Download the Visual Studio 2003 Solution that accompanies this article


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform.