Client-Side Real-Time Stock Quote Screen Scraping
By Peter A. Bromberg, Ph.D.

I recently built a C# Webservice to supply some stock quote information to a Tray Icon application I've been experimenting with. Whenever I am working on parsing HTML content from a remote page, I'll often prototype the initial work with client-side Javascript, because it's easy to test and because Javascript is so similar in syntax to C# that it's a simple matter to "upconvert" it.



I noticed Yahoo now offers real-time quotes and, being an ex-stockbroker, I just couldn't resist throwing together a little page that would scroll my favorite stocks and update them every minute. The cool thing about this page is that all the work is done on the client, and it uses a cookie to remember your list of favorite stocks. So, whenever you revisit the page, it's already going with your stock list for you!

"Client side" means, of course, you need Internet Explorer 5.0 or higher. The minimum requirement of version 3.0 of the MS XML Parser should be included in that installation.

First thing we need to do is have an input element and some sort of control to display our quotes. The Marquee control in IE is perfect for this:

<body onload="popSymbols();">
<table border="0" cellpadding="2" cellspacing="2" bgcolor="#ffcc66" valign="top">
  <tr>
    <td align="center">
      <input type="text" id="quotelist" />&nbsp;<a href="Javascript:getQuote(quotelist.value);">Get Quotes</a>
    </td>
  </tr>
  <tr><td align="center"><div style="font-size:9px;">List symbols, separated by spaces</div></td></tr>
  <tr><td><marquee id="showQuotes"></marquee></td></tr>
</table>
</body>

Most of the above should be self-explanatory. Note that when you are doing everything client-side, there is no need to post a form. Just put the event callers in the onclick, onblur, onchange or whatever event of your input controls, and tell them what method to call. So here, even though we have a hyperlink control, we are using a javascript: URL in its href to call the getQuote method and pass in the value of the quotelist input control as the required method parameter.

We also need a handy way to refresh the page so that the getQuote method runs again and refreshes our stock quotes. The easiest, most trouble-free way to do that is a META refresh tag, which in this case looks like the following:

<META HTTP-EQUIV=Refresh CONTENT="60; URL=http://www.nullskull.com/articles/YahooMultipleQuotes.htm">

Now we need our cookie methods to set and get the cookie holding the quotes list:

function setCookie(name, value, expires, path, domain, secure) {
  // build "name=value" plus whichever optional attributes were supplied
  var curCookie = name + "=" + escape(value) +
    ((expires) ? "; expires=" + expires.toGMTString() : "") +
    ((path) ? "; path=" + path : "") +
    ((domain) ? "; domain=" + domain : "") +
    ((secure) ? "; secure" : "");
  document.cookie = curCookie;
}
function getCookie(name) {
  var dc = document.cookie;
  var prefix = name + "=";
  var begin = dc.indexOf("; " + prefix);
  if (begin == -1) {
    // not found after a separator, so it has to be the first cookie
    begin = dc.indexOf(prefix);
    if (begin != 0) return null;
  } else {
    begin += 2;
  }
  var end = dc.indexOf(";", begin);
  if (end == -1)
    end = dc.length;
  return unescape(dc.substring(begin + prefix.length, end));
}
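
Because getCookie reads document.cookie directly, it can only be exercised inside a browser. As a rough sketch for testing the lookup logic elsewhere, the same steps can be applied to a cookie string passed in as a parameter. The name readCookieFrom and the sample strings are hypothetical and not part of the original page:

```javascript
// Hypothetical helper: the same lookup as getCookie, but the cookie string
// is a parameter, so the function also runs outside a browser.
function readCookieFrom(cookieString, name) {
  var prefix = name + "=";
  var begin = cookieString.indexOf("; " + prefix);
  if (begin == -1) {
    // Not found after a separator, so it has to be the very first cookie.
    begin = cookieString.indexOf(prefix);
    if (begin != 0) return null;
  } else {
    begin += 2; // skip the "; " separator
  }
  var end = cookieString.indexOf(";", begin);
  if (end == -1) end = cookieString.length;
  // unescape mirrors the escape call used when the cookie was written
  return unescape(cookieString.substring(begin + prefix.length, end));
}

// Made-up cookie strings for illustration
var fromMiddle = readCookieFrom("a=1; symbols=MSFT%20IBM; b=2", "symbols");
var missing = readCookieFrom("a=1", "symbols");
```

Here fromMiddle holds the decoded symbol list and missing is null, which is the case popSymbols has to guard against.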

And of course, we need our method "popSymbols" from the Body onload event which simply gets the cookie value and stuffs it into our input element:

function popSymbols(){
  var Symbols = getCookie("symbols");
  if(Symbols != null && Symbols != "" && Symbols != "undefined")
  {
    // a text input holds its content in the value property
    quotelist.value = Symbols;
    getQuote(Symbols);
  }
}

Note that we do some checking so that we don't end up entering "undefined" or some other useless junk in there. And now comes the "cool code" that does the actual work:

function getQuote(Symbols){
  // save the list in a cookie (good for one year) if the user changed it
  if(getCookie("symbols") != Symbols){
    var now = new Date();
    now.setTime(now.getTime() + 365 * 24 * 60 * 60 * 1000);
    setCookie("symbols", Symbols, now);
  }
  var x = new ActiveXObject("MSXML2.XMLHTTP");
  var arSymbols = Symbols.split(" ");
  // uri for realtime quotes
  x.open("GET", "http://finance.yahoo.com/q?s=" + Symbols + "&d=e", false);
  x.send();
  var res = x.responseText;
  var startpos = res.indexOf("Detailed");
  var endpos;
  var theQuotes = "";
  for(var i = 0; i < arSymbols.length; i++){
    // chop off everything already parsed, then find this symbol's row
    res = res.substring(startpos);
    startpos = res.indexOf("Order Books");
    res = res.substring(startpos);
    startpos = res.indexOf(arSymbols[i].toUpperCase());
    res = res.substring(startpos, res.length);
    startpos = res.indexOf("<font face=arial size=-1><b>");
    endpos = res.indexOf("</b>");
    theQuotes += arSymbols[i] + ":" + res.substring(startpos, endpos) + " ";
    // clean up the junk remaining in the string
    theQuotes = theQuotes.replace("<b>", "").replace("<font face=arial size=-1>", "");
    startpos = endpos + 1;
  }
  // time stamp so the user can see when the quotes were last refreshed
  var theTime = new Date();
  var hr = theTime.getHours();
  var mn = theTime.getMinutes();
  var sec = theTime.getSeconds();
  var strTime = hr + ":" + mn + ":" + sec;
  showQuotes.innerHTML = theQuotes + " at " + strTime;
}

Note that in the first line I'm checking if(getCookie("symbols") != Symbols) to see if the user has changed the list of symbols; if so, we write a new cookie to save the new list. Next, we convert our string list of symbols to an array and make the HTTP GET call to the Yahoo quote server. We get the result in the responseText property:

var x = new ActiveXObject("MSXML2.XMLHTTP");
var arSymbols = Symbols.split(" ");
// uri for realtime quotes
x.open("GET", "http://finance.yahoo.com/q?s=" + Symbols + "&d=e", false);
x.send();
var res = x.responseText;
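
Splitting on a single space produces empty array entries if the user types extra spaces between symbols. A minimal sketch of a more forgiving version follows. The helper names parseSymbols and buildQuoteUrl are hypothetical; the URL is the one used above, and whether the quote server accepts an encoded space in the symbol list is an assumption:

```javascript
// Hypothetical helper: turn the raw text-box value into a clean symbol array.
function parseSymbols(input) {
  // trim leading and trailing whitespace
  var trimmed = input.replace(/^\s+|\s+$/g, "");
  if (trimmed === "") return [];
  // split on runs of whitespace so extra spaces do not create empty entries
  return trimmed.toUpperCase().split(/\s+/);
}

// Hypothetical helper: build the request URL from the symbol array.
// Assumption: the server treats an encoded space like a literal one.
function buildQuoteUrl(symbols) {
  return "http://finance.yahoo.com/q?s=" +
    encodeURIComponent(symbols.join(" ")) + "&d=e";
}

var cleaned = parseSymbols("  msft   ibm ");
var url = buildQuoteUrl(cleaned);
```

Uppercasing once up front also removes the need to call toUpperCase on every symbol inside the parsing loop.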

Next, we need to iterate over our symbol list array, isolating the place in the returned string where the information we want is located. The way to do this is to start by retrieving the entire page and looking at the source, making notes of unique sequences of HTML that you can use to nail down a start and end position with the Javascript indexOf or lastIndexOf string methods:

var startpos = res.indexOf("Detailed");
var endpos;
var theQuotes = "";
for(var i = 0; i < arSymbols.length; i++){
  res = res.substring(startpos);
  startpos = res.indexOf("Order Books");
  res = res.substring(startpos);
  startpos = res.indexOf(arSymbols[i].toUpperCase());
  res = res.substring(startpos, res.length);
  startpos = res.indexOf("<font face=arial size=-1><b>");
  endpos = res.indexOf("</b>");
  theQuotes += arSymbols[i] + ":" + res.substring(startpos, endpos) + " ";
  // clean up the junk remaining in the string
  theQuotes = theQuotes.replace("<b>", "").replace("<font face=arial size=-1>", "");
  startpos = endpos + 1;
}

You can use Regex in many instances to do this kind of stuff. For example, in C# you could isolate all the content between two markers like so:

Regex regex = new Regex("<!-- BEGIN STUFF -->((.|\n)*?)<!-- END STUFF -->", RegexOptions.IgnoreCase);
Match match = regex.Match(strContent);
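
For prototyping on the client, roughly the same pattern can be written as a Javascript regular expression. This is only a sketch: the sample content is made up, and [\s\S] is used in place of (.|\n) as one common way to match across line breaks:

```javascript
// Made-up sample content for illustration
var strContent = "header<!-- BEGIN STUFF -->line one\nline two<!-- END STUFF -->footer";

// [\s\S] matches any character, newlines included; *? keeps the match lazy
// so it stops at the first END marker; the i flag ignores case.
var re = /<!-- BEGIN STUFF -->([\s\S]*?)<!-- END STUFF -->/i;
var match = re.exec(strContent);

// match is null when the markers are not found
var captured = match ? match[1] : null;
```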

However, in some cases it's better to do what I do here, which is basically to "walk" your way through the content: each time you have acquired a substring containing info that you need, just "chop off" the already parsed portion so that you have a new string that starts at a point where you can repeat your substring search for the next similar element.
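
The walk-and-chop idea can be sketched in isolation as a small function. The name extractAll and the sample markup are hypothetical; real pages need markers chosen by inspecting their source, as described above:

```javascript
// Hypothetical helper: collect every value that sits between startMarker and
// endMarker, chopping off the already parsed portion after each hit.
function extractAll(text, startMarker, endMarker) {
  var results = [];
  var rest = text;
  while (true) {
    var startpos = rest.indexOf(startMarker);
    if (startpos == -1) break; // no more start markers
    var valueStart = startpos + startMarker.length;
    var endpos = rest.indexOf(endMarker, valueStart);
    if (endpos == -1) break; // unterminated value, stop walking
    results.push(rest.substring(valueStart, endpos));
    // chop: the next search starts right after the end marker
    rest = rest.substring(endpos + endMarker.length);
  }
  return results;
}

// Made-up markup for illustration
var sample = "<td>MSFT</td><td><b>25.10</b></td><td>IBM</td><td><b>80.25</b></td>";
var prices = extractAll(sample, "<b>", "</b>");
```

Checking each indexOf result for -1 before using it keeps the walk from silently restarting at the top of the string when a marker is missing.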

Finally all we need to do at the end of our iterations is add in the time so all the stock jockeys with Obsessive Compulsive Disorder will see that the page is indeed refreshing once per minute:

var theTime = new Date();
var hr = theTime.getHours();
var mn = theTime.getMinutes();
var sec = theTime.getSeconds();
var strTime = hr + ":" + mn + ":" + sec;
showQuotes.innerHTML = theQuotes + " at " + strTime;
}
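
A time stamp is easier to read when every field is zero-padded and the last field holds seconds. A minimal sketch, with the hypothetical helper names pad2 and formatTime:

```javascript
// Hypothetical helper: left-pad a number below 10 with a zero.
function pad2(n) {
  return (n < 10 ? "0" : "") + n;
}

// Hypothetical helper: format a Date as HH:MM:SS in local time.
function formatTime(d) {
  return pad2(d.getHours()) + ":" + pad2(d.getMinutes()) + ":" + pad2(d.getSeconds());
}

// Fixed local date so the result does not depend on the clock
var stamp = formatTime(new Date(2020, 0, 1, 9, 5, 3));
```

On the page, formatTime(new Date()) could stand in for the hand-built strTime value.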

And, true to form, it would be a crime to let you get this far without a live working example:

Real-Time Stock Quotes

Need the source? It's yours: just view source on the window that comes up from the link above. (The source contains additional features not shown here, including the use of a Regex method to remove extraneous HTML tags and their contents.) NOTE: If you get errors running the above, it's most likely because you have IE locked down too tight. Put the site in your Trusted Sites list, and make sure that Trusted Sites has permissions to run ActiveX and handle cross-site scripting calls.
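
The tag-removal code from the live page is not reproduced in this article. As a rough sketch of the general idea only, a regular expression can strip tags from a scraped fragment. The name stripTags is hypothetical, the pattern assumes no ">" appears inside attribute values, and it removes the tags themselves, not the text between them:

```javascript
// Hypothetical helper: remove anything that looks like an HTML tag.
// Assumption: attribute values never contain a ">" character.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Fragment shaped like the markup searched for in getQuote
var cleanedQuote = stripTags("<font face=arial size=-1><b>25.10</b>");
```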

NOTE: Yahoo stock quote data is not for resale or display in commercial web sites. Please use good Netiquette when considering the uses for such code. And - visit the provider!

 

Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform.