As an ex-stockbroker,
I've always been fascinated with quote streamers and other web-based
financial tickers. One of the most interesting recent offerings is Yahoo's
real-time stock quote service. In this article, I'll show how you can
combine Regular Expressions and the Matches collection with the XmlDocument
class to "scrape" the important parts of the web page from Yahoo, reformat
them as XML, and show the result as a moving Marquee in Internet Explorer.
We will use the WebRequest class to make our call to
the Yahoo URL. Then we will iterate over the web page we received, stripping
out the elements we want from the one or more rows in the HTML table
using the Regex MatchCollection and named capture groups embedded
in our match string, like this: (?<symbol>[^<]+)
Finally, we will iteratively
build an XmlDocument from the match results using the XmlDocument class
and its methods, and return it to the caller. The XmlDocument can then be
used to populate a DataSet, perform an XSL transform, or serve any other
general purpose.
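To make the named-group idea concrete, here is a minimal sketch that runs a pattern with named groups over a small HTML fragment. The fragment, the pattern, and the class name NamedGroupDemo are assumptions made up for illustration; the real Yahoo markup is more involved and is handled by the full match string shown later in the article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical two-row fragment, NOT the actual Yahoo page markup.
    public const string SampleHtml =
        "<tr><td><a href=\"/q?s=MSFT\">MSFT</a></td><td><b>27.15</b></td></tr>" +
        "<tr><td><a href=\"/q?s=IBM\">IBM</a></td><td><b>91.02</b></td></tr>";

    // Each (?<name>...) group captures one field of a table row.
    public const string Pattern =
        "<a href=\"(?<href>[^\"]+)\">(?<symbol>[^<]+)</a></td><td><b>(?<price>[^<]+)</b>";

    public static void Main()
    {
        // Regex.Matches returns one Match per row that fits the pattern.
        foreach (Match match in Regex.Matches(SampleHtml, Pattern))
        {
            string symbol = match.Groups["symbol"].Value;
            string price = match.Groups["price"].Value;
            Console.WriteLine(symbol + " " + price);
        }
    }
}
```

Each Match exposes its captures by name through match.Groups["..."], which is what lets the scraping code pull out symbol, time, price, and change without any index arithmetic.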
First, let's take a look at the HTML (Aspx) UI portion
of the page:
<%@ Page language="c#" Codebehind="WebForm1.aspx.cs" AutoEventWireup="false" Inherits="RegexYahooXml.WebForm1" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<HTML>
  <HEAD>
    <title>Screenscraping With Regex and Xml.</title>
    <meta name="GENERATOR" Content="Microsoft Visual Studio .NET 7.1">
    <meta name="CODE_LANGUAGE" Content="C#">
    <meta name="vs_defaultClientScript" content="JavaScript">
    <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
  </HEAD>
  <body MS_POSITIONING="GridLayout">
    <form id="Form1" method="post" runat="server">
      <marquee id="Ticker" style="Z-INDEX: 101; LEFT: 72px; POSITION: absolute; TOP: 80px" runat="server"
        width="600" onmouseover="this.stop();" onmouseout="this.start();"></marquee>
      <asp:TextBox id="TextBox1" style="Z-INDEX: 102; LEFT: 168px; POSITION: absolute; TOP: 144px"
        runat="server" Width="256px" Height="24px"></asp:TextBox>
      <asp:Button id="Button1" style="Z-INDEX: 103; LEFT: 464px; POSITION: absolute; TOP: 144px" runat="server"
        Width="104px" Text="Get Stocks"></asp:Button>
      <asp:Label id="Label1" style="Z-INDEX: 104; LEFT: 176px; POSITION: absolute; TOP: 112px" runat="server"
        Width="376px" Font-Names="Verdana">Enter Symbols, separated by spaces.</asp:Label>
      <asp:Label id="Label2" style="Z-INDEX: 105; LEFT: 64px; POSITION: absolute; TOP: 8px" runat="server"
        Width="575px" Height="32px" Font-Names="Verdana">Yahoo Realtime Stock Quotes with Regular Expressions and Xml.</asp:Label>
    </form>
  </body>
</HTML>
You can see that I've added a Marquee control, set it to runat="server",
and added client-side event handlers to start and stop the scroll when
your mouse hovers over an item.
The "engine" of the process looks like this:
// Requires: using System.Collections; using System.IO; using System.Net;
// using System.Text; using System.Text.RegularExpressions; using System.Xml;
public XmlDocument GetXmlYahoo(string symbolList)
{
    string url = "http://finance.yahoo.com/q?s=";
    url += symbolList;
    url += "&d=e";
    WebRequest webRequest = WebRequest.Create(url);
    string beginStr = "";
    try
    {
        WebResponse webResponse = webRequest.GetResponse();
        beginStr = new StreamReader(webResponse.GetResponseStream(),
            Encoding.Default).ReadToEnd();
        webResponse.Close();
        // clean up some YHOO finance "junk" first so Regex matches won't fail
        beginStr = beginStr.Replace("\n", "");
        beginStr = beginStr.Substring(beginStr.IndexOf("Order Books"));
        beginStr = beginStr.Replace("<font color=ff0020>", "");
        beginStr = beginStr.Replace("</font></font>", "</font>");
    }
    catch (Exception)
    {
        // request failed or "Order Books" marker not found: return an empty document
        beginStr = "";
    }
    XmlDocument xmlDocument = new XmlDocument();
    XmlElement elemQuotes = xmlDocument.CreateElement("StockQuotes");
    xmlDocument.AppendChild(elemQuotes);
    // match string for our Regex Matches collection
    // (every double quote inside the pattern must be escaped in the C# literal)
    string mainStr =
        "<td nowrap align=\"left\"><font face=arial size=-1><a href=\"(?<href>[^\"]+)\">(?<symbol>[^<]+)</a> </font></td>" +
        "<td nowrap align=\"center\"><font face=arial size=-1><i>(?<time>[^<]+)</i> </font></td>" +
        "<td nowrap><font face=arial size=-1><b><i>(?<price>[^>]+)</i></b> </font></td>" +
        "<td nowrap><font face=arial size=-1><i>(?<change>[^<]+)</i></font></td>";
    // keep the compiled Regex and use it, rather than discarding it
    Regex regex = new Regex(mainStr, RegexOptions.Compiled);
    IEnumerator iEnumerator = regex.Matches(beginStr).GetEnumerator();
    //Response.Write("<textarea rows=100 cols=120>" +beginStr + "</textarea>");
    try
    {
        while (iEnumerator.MoveNext())
        {
            Match match = (Match)iEnumerator.Current;
            XmlElement elemQuote = xmlDocument.CreateElement("Quote");
            XmlElement elemSymbol = xmlDocument.CreateElement("Symbol");
            XmlElement elemTime = xmlDocument.CreateElement("Time");
            XmlElement elemPrice = xmlDocument.CreateElement("Price");
            XmlElement elemChange = xmlDocument.CreateElement("Change");
            elemSymbol.InnerText = match.Groups["symbol"].Value;
            elemPrice.InnerText = match.Groups["price"].Value.Replace(",", ".");
            elemTime.InnerText = match.Groups["time"].Value.Replace(",", ".");
            elemChange.InnerText = match.Groups["change"].Value.Replace(",", ".");
            // child order matters: FormatXML reads these by position
            elemQuote.AppendChild(elemSymbol);
            elemQuote.AppendChild(elemPrice);
            elemQuote.AppendChild(elemChange);
            elemQuote.AppendChild(elemTime);
            xmlDocument.DocumentElement.AppendChild(elemQuote);
        }
    }
    catch (Exception ex)
    {
        throw new Exception(ex.Message);
    }
    return xmlDocument;
}
What the above method does is as follows:
1) Accept the space-delimited list of stock symbols, and append it to
the URL.
2) Make the WebRequest to the Yahoo finance URL and read the response
text into "beginStr".
3) Chop off everything before "Order Books" in order to simplify
processing.
4) Remove all instances of "<font color=ff0020>" in order to be able to
handle both positive and negative price changes without writing a lot of
extra Regex code.
5) Create the main Regex match string that will isolate every row that has
stock information. Note the named placeholder for each item embedded in the
string, e.g. (?<time>[^<]+). The quotes are escaped as they appear in the
C# source:
<td nowrap align=\"left\"><font face=arial size=-1><a href=\"(?<href>[^\"]+)\">(?<symbol>[^<]+)</a>
</font></td><td nowrap align=\"center\"><font face=arial size=-1><i>(?<time>[^<]+)</i>
</font></td><td nowrap><font face=arial size=-1><b><i>(?<price>[^>]+)</i></b>
</font></td><td nowrap><font face=arial size=-1><i>(?<change>[^<]+)</i></font></td>
6) Get the Enumerator for the Matches object, and loop through the
collection.
7) Build an XmlDocument from the values returned, and return the
XmlDocument.
Note that the match string is written for the page format the Yahoo service
returns DURING MARKET HOURS ONLY.
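The document built in step 7 has a StockQuotes root with one Quote child per symbol, each holding Symbol, Price, Change, and Time elements. As a minimal sketch of the "populate a DataSet" use mentioned earlier, the following loads such a document through an XmlNodeReader. The class name QuoteDataSetDemo and the sample values are assumptions for illustration only, and the table and column names rely on DataSet's default schema inference.

```csharp
using System;
using System.Data;
using System.Xml;

public static class QuoteDataSetDemo
{
    // Loads an in-memory quotes document into a DataSet.
    public static DataSet ToDataSet(XmlDocument quotes)
    {
        DataSet dataSet = new DataSet();
        // XmlNodeReader lets ReadXml consume the XmlDocument without serializing it first.
        dataSet.ReadXml(new XmlNodeReader(quotes));
        return dataSet;
    }

    public static void Main()
    {
        // Made-up sample data in the same shape GetXmlYahoo produces.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<StockQuotes><Quote><Symbol>MSFT</Symbol><Price>27.15</Price>" +
            "<Change>+0.12</Change><Time>10:31AM</Time></Quote></StockQuotes>");

        DataSet dataSet = ToDataSet(doc);
        // Schema inference is expected to yield a "Quote" table with one column per child element.
        DataTable quoteTable = dataSet.Tables["Quote"];
        foreach (DataRow row in quoteTable.Rows)
        {
            Console.WriteLine(row["Symbol"] + " " + row["Price"]);
        }
    }
}
```

All inferred columns are strings, so a caller that wants numeric prices would still need to parse them.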
Now that we have our
XmlDocument, we will pass it to the "FormatXML" method, a utility method
that simply converts the XML element values to <span> elements suitable
for assigning to the innerHTML of our Marquee Control:
private string FormatXML(XmlDocument xmlDoc)
{
    string strResult = String.Empty;
    string strBegin = " <SPAN STYLE='COLOR:blue'>";
    IEnumerator iEnumerator = xmlDoc.DocumentElement.ChildNodes.GetEnumerator();
    try
    {
        while (iEnumerator.MoveNext())
        {
            XmlNode xmlNode = (XmlNode)iEnumerator.Current;
            string strUri = "http://finance.yahoo.com/q?s=";
            string strQuotes = xmlNode.ChildNodes[0].InnerText;
            string strEndQry = "&d=e";
            string strFullUri = strUri + strQuotes + strEndQry;
            string[] strs = new string[]{strBegin,
                "<a href='" + strFullUri + "' target='_blank'>"
                    + xmlNode.ChildNodes[0].InnerText + "</a>: ",
                xmlNode.ChildNodes[1].InnerText + " ",
                "[" + xmlNode.ChildNodes[2].InnerText + "] ",
                xmlNode.ChildNodes[3].InnerText + " </SPAN> "};
            strResult += String.Concat(strs);
        }
    }
    catch (Exception ex)
    {
        throw new Exception(ex.Message);
    }
    return strResult;
}
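FormatXML reads each Quote's children by position, so it depends on the order in which GetXmlYahoo appends them, and it writes the scraped text into the markup unescaped. As a hedged alternative sketch (not the article's method), the version below looks children up by element name and HTML-encodes the values. The class name TickerFormatter is hypothetical, WebUtility.HtmlEncode is swapped in as an assumption, and the hyperlink is left out to keep the example short.

```csharp
using System;
using System.Net;
using System.Text;
using System.Xml;

public static class TickerFormatter
{
    // Builds one <span> per Quote element, reading children by name rather than index.
    public static string Format(XmlDocument quotes)
    {
        StringBuilder html = new StringBuilder();
        foreach (XmlNode quote in quotes.DocumentElement.ChildNodes)
        {
            html.Append("<span style='color:blue'>");
            // Encode scraped text so stray markup characters cannot break the ticker.
            html.Append(WebUtility.HtmlEncode(quote["Symbol"].InnerText)).Append(": ");
            html.Append(WebUtility.HtmlEncode(quote["Price"].InnerText)).Append(" ");
            html.Append("[").Append(WebUtility.HtmlEncode(quote["Change"].InnerText)).Append("] ");
            html.Append(WebUtility.HtmlEncode(quote["Time"].InnerText));
            html.Append("</span> ");
        }
        return html.ToString();
    }

    public static void Main()
    {
        // Made-up sample data in the shape GetXmlYahoo produces.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<StockQuotes><Quote><Symbol>MSFT</Symbol><Price>27.15</Price>" +
            "<Change>+0.12</Change><Time>10:31AM</Time></Quote></StockQuotes>");
        Console.WriteLine(Format(doc));
    }
}
```

The name-based indexer returns null when a child element is missing, so production code would want a null check before reading InnerText.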
All this is kicked off and controlled by the Button Click event handler:
private void Button1_Click(object sender, System.EventArgs e)
{
    XmlDocument xmlDoc = GetXmlYahoo(TextBox1.Text);
    string strHTML = FormatXML(xmlDoc);
    Ticker.InnerHtml = strHTML;
}
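The handler passes the raw text box contents straight into the URL, spaces and all. A small sketch of tidying that input first might look like the following. The helper name BuildQuoteUrl and the choice to upper-case symbols and percent-encode the separator are assumptions for illustration, not something the Yahoo service is known to require.

```csharp
using System;

public static class SymbolInput
{
    // Collapses extra whitespace, upper-cases the symbols, and escapes the query value.
    public static string BuildQuoteUrl(string rawSymbols)
    {
        string[] symbols = rawSymbols.Split(
            new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        string joined = string.Join(" ", symbols).ToUpperInvariant();
        // Uri.EscapeDataString percent-encodes the space separators (and any other unsafe characters).
        return "http://finance.yahoo.com/q?s=" + Uri.EscapeDataString(joined) + "&d=e";
    }

    public static void Main()
    {
        Console.WriteLine(BuildQuoteUrl("  msft   ibm "));
    }
}
```

Escaping the user's input also keeps characters such as & or # from altering the rest of the query string.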
And there you have it: a utility method for extracting realtime quotes from
Yahoo and returning them in a generic XmlDocument. This could easily form the
basis of a ServerControl (all you would need are a few public properties and
a Designer class) or for a webservice or any of a number of other uses.
Download the Visual Studio .NET 2003 solution below. If you don't have
Visual Studio .NET 2003, just start a new blank WebForms project and add the
files from mine.
Download the code that accompanies this article
Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform.