SEO with Google, MSN and
Yahoo Site: and Link: Counts

by Peter A. Bromberg, Ph.D.

"People still choose complexity over simplicity when given a choice. This is a sad fact of life,
so get over it. Ideally the best model is complexity combined with ease of use."
-- John Dvorak

Experts will tell you that the more inbound links your site or page has, the better its search engine rank. While that's true, many other factors are involved as well. Lists of inbound links are typically obtained from a search engine by prefixing the site domain or URL with "link:", and the number of pages the engine has indexed for a particular site is typically obtained by prefixing the domain with "site:".
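As a quick sketch of how those two query forms are assembled, here is a hypothetical helper (the method name and base URL parameter are illustrative, not part of the class shown later); note that the colon in "link:" or "site:" gets URL-encoded to %3A before it is appended to the query string:

using System;

class QueryBuilder
{
    // Builds e.g. "http://www.google.com/search?hl=en&q=link%3Aexample.com"
    public static string Build(string engineBaseUrl, string prefix, string domain)
    {
        // EscapeDataString turns "link:" into "link%3A"
        return engineBaseUrl + Uri.EscapeDataString(prefix + ":") + domain;
    }

    static void Main()
    {
        Console.WriteLine(Build("http://www.google.com/search?hl=en&q=", "link", "example.com"));
        Console.WriteLine(Build("http://www.google.com/search?hl=en&q=", "site", "example.com"));
    }
}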



Yahoo, MSN, Google and other search engines all provide this feature. Google and Yahoo (and possibly now MSN as well) also offer APIs for it, although the developer needs to request a license key and is usually limited to some number of queries per day (5,000 last time I looked). But you don't need an API; you can simply make a WebRequest for the search results page, "web-scrape" the count out of the line near the top of the page, and convert it to an integer. Of course, if you abuse this and start bombarding the engines with repetitive requests, their traps will stop you. Google, for example, comes up with a nasty error page saying something like "It appears that this request is coming from some rogue virus or automated software...". Provided you don't abuse it, though, the approach is still useful.

This little exercise does exactly that, providing a static method, GetCounts, that works with either "link:" or "site:" searches against Google, MSN, or Yahoo. It's just a handy utility class that lets you check a page or a site and get some quick stats on it. First, here's the code:

using System;
using System.Collections.Generic;
using System.Text;
using System.Net;
using System.Configuration;

namespace SearchCounts
{
    public enum CountTypes
    {
        SiteCounts,
        LinkCounts
    }

    public enum SearchEngine
    {
        Google,
        MSN,
        Yahoo
    }

    public class Counts
    {
        private Counts() { }

        // Usage: int val = SearchCounts.Counts.GetCounts(url, CountTypes.LinkCounts, SearchEngine.Google);
        public static int GetCounts(string siteUrl, CountTypes countType, SearchEngine searchEngine)
        {
            int retval = 0;
            WebClient wc = new WebClient();

            // Optional proxy support, configured in appSettings
            if (ConfigurationSettings.AppSettings["proxy"] != "")
            {
                string proxyAddressAndPort = ConfigurationSettings.AppSettings["proxy"];
                string proxyUserName = ConfigurationSettings.AppSettings["proxyUserName"];
                string proxyPassword = ConfigurationSettings.AppSettings["proxyPassword"];
                ICredentials cred = new NetworkCredential(proxyUserName, proxyPassword);
                WebProxy p = new WebProxy(proxyAddressAndPort, true, null, cred);
                GlobalProxySelection.Select = p;
                wc.Proxy = p;
            }

            if (searchEngine == SearchEngine.Google)
            {
                try
                {
                    string searchUrl = "http://www.google.com/search?hl=en&q=";
                    string searchUrl2 = (countType == CountTypes.LinkCounts) ? "link%3A" : "site%3A";
                    string sFullSearchUrl = string.Concat(searchUrl, searchUrl2, siteUrl);
                    byte[] b = wc.DownloadData(sFullSearchUrl);
                    wc.Dispose();
                    string strContent = System.Text.Encoding.UTF8.GetString(b);
                    // Pull the total out of "... of about <b>1,234,567</b> ..."
                    int startPos = strContent.IndexOf("of about <b>");
                    strContent = strContent.Substring(startPos);
                    int endPos = strContent.IndexOf("</b>");
                    strContent = strContent.Substring(12, endPos - 12);
                    strContent = strContent.Replace(",", "");
                    retval = int.Parse(strContent);
                }
                catch (Exception ex)
                {
                    System.Diagnostics.Debug.WriteLine(ex.Message + ex.StackTrace);
                }
            }
            else if (searchEngine == SearchEngine.MSN)
            {
                try
                {
                    string searchUrl = "http://search.msn.com/results.aspx?q=";
                    string searchUrl2 = (countType == CountTypes.LinkCounts) ? "link%3A" : "site%3A";
                    string sFullSearchUrl = string.Concat(searchUrl, searchUrl2, siteUrl);
                    byte[] b = wc.DownloadData(sFullSearchUrl);
                    wc.Dispose();
                    string strContent = System.Text.Encoding.UTF8.GetString(b);
                    // Pull the total out of "<h5>Page 1 of 1,234 results"
                    int startPos = strContent.IndexOf("<h5>Page 1 of ") + 14;
                    strContent = strContent.Substring(startPos);
                    int endPos = strContent.IndexOf("results");
                    strContent = strContent.Substring(0, endPos);
                    strContent = strContent.Replace(",", "");
                    retval = int.Parse(strContent);
                }
                catch (Exception ex)
                {
                    System.Diagnostics.Debug.WriteLine(ex.Message + ex.StackTrace);
                }
            }
            else // Yahoo Site Explorer
            {
                try
                {
                    string searchUrl = "https://siteexplorer.search.yahoo.com/advsearch?p=";
                    // Site Explorer takes the bare URL; no "link:" or "site:" prefix is needed
                    string sFullSearchUrl = string.Concat(searchUrl, siteUrl);
                    byte[] b = wc.DownloadData(sFullSearchUrl);
                    wc.Dispose();
                    string strContent = System.Text.Encoding.UTF8.GetString(b);
                    if (countType == CountTypes.LinkCounts)
                    {
                        // Pull the total out of "Inlinks (1,234)</a>"
                        int startPos = strContent.IndexOf("Inlinks (") + 9;
                        strContent = strContent.Substring(startPos);
                        int endPos = strContent.IndexOf(")</a>");
                        strContent = strContent.Substring(0, endPos);
                        strContent = strContent.Replace(",", "");
                        retval = int.Parse(strContent);
                    }
                    else
                    {
                        // Pull the total out of "Pages (1,234)"
                        int startPos = strContent.IndexOf("Pages (") + 7;
                        strContent = strContent.Substring(startPos);
                        int endPos = strContent.IndexOf(")");
                        strContent = strContent.Substring(0, endPos);
                        strContent = strContent.Replace(",", "");
                        retval = int.Parse(strContent);
                    }
                }
                catch (Exception ex)
                {
                    System.Diagnostics.Debug.WriteLine(ex.Message + ex.StackTrace);
                }
            }
            return retval;
        }
    }
}

Now that wasn't difficult at all. I didn't use Regex here (although I could have) because it's quite easy in these cases to simply "substring out" what we need. Besides, if the search engines change their HTML, a Regex would likely break too.
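The same "substring out" technique can be captured in one reusable method: find a start marker, find an end marker after it, and parse the number in between. This is a hedged sketch, with the sample HTML fragment below invented for the demo rather than copied from any live results page:

using System;

class CountScraper
{
    // Returns the integer between startMarker and endMarker, or 0 if not found.
    public static int ExtractCount(string html, string startMarker, string endMarker)
    {
        int start = html.IndexOf(startMarker);
        if (start < 0) return 0;                 // start marker not found: bail out
        start += startMarker.Length;
        int end = html.IndexOf(endMarker, start);
        if (end < 0) return 0;                   // end marker not found: bail out
        string raw = html.Substring(start, end - start).Replace(",", "");
        int result;
        return int.TryParse(raw.Trim(), out result) ? result : 0;
    }

    static void Main()
    {
        string sample = "Results <b>1</b> - <b>10</b> of about <b>1,230,000</b> for";
        Console.WriteLine(ExtractCount(sample, "of about <b>", "</b>"));  // 1230000
    }
}

Centralizing the marker search this way also means a marker change on the results page fails gracefully (returns 0) instead of throwing from Substring.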

Now for an implementation this is useful for, beyond general SEO testing. Say you want to feature a list of blog RSS feeds on your web site, but only those with a high enough inbound link count to be "relevant". Just go through your list of blog URLs, get the "link:" counts from your favorite search engine, and rank them. You could throw away, say, all those with fewer than 10 inbound links, leaving a nice "hot list" of genuinely popular blogs to display.
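That filter-and-rank step might look something like the sketch below. The CountLookup delegate stands in for a call to Counts.GetCounts with CountTypes.LinkCounts; the feed URLs and the stand-in lookup in Main are placeholders for the demo:

using System;
using System.Collections.Generic;

class HotList
{
    // Stand-in for: Counts.GetCounts(url, CountTypes.LinkCounts, engine)
    public delegate int CountLookup(string url);

    // Keeps feeds with at least minLinks inbound links, most-linked first.
    public static List<string> Filter(IEnumerable<string> feedUrls, CountLookup lookup, int minLinks)
    {
        List<KeyValuePair<string, int>> ranked = new List<KeyValuePair<string, int>>();
        foreach (string url in feedUrls)
        {
            int links = lookup(url);
            if (links >= minLinks)
                ranked.Add(new KeyValuePair<string, int>(url, links));
        }
        // Sort descending by link count
        ranked.Sort(delegate(KeyValuePair<string, int> a, KeyValuePair<string, int> b)
            { return b.Value.CompareTo(a.Value); });
        List<string> result = new List<string>();
        foreach (KeyValuePair<string, int> kv in ranked) result.Add(kv.Key);
        return result;
    }

    static void Main()
    {
        // Demo lookup: use the URL's length as a fake link count
        CountLookup fake = delegate(string url) { return url.Length; };
        string[] feeds = { "http://a.example/feed", "http://bb.example/feed", "x" };
        List<string> hot = Filter(feeds, fake, 10);
        Console.WriteLine(string.Join(", ", hot.ToArray()));
    }
}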

I've put together a web page that does this with the Microsoft Bloggers OPML list, which has over 3,000 feeds. Since this is a large list, I did the work on the .NET ThreadPool, with a ManualResetEvent waiter that gets signaled when all the threads are done. I'm not going to show all that code here, but it's in the downloadable solution below. I've also thrown in some Anthem.Net controls so the page can update without a full reload. The Anthem.Net library is in the /bin folder of the website portion of the solution.
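The fan-out pattern just described can be reduced to a minimal sketch: queue one work item per feed on the ThreadPool, have each worker decrement a shared counter, and have the last one to finish signal the ManualResetEvent. The URLs and the no-op "work" are placeholders; real code would call Counts.GetCounts inside the worker:

using System;
using System.Threading;

class FanOut
{
    // Queues one work item per URL and blocks until the last one finishes.
    public static int Run(string[] urls)
    {
        if (urls.Length == 0) return 0;      // nothing queued: avoid waiting forever
        int pending = urls.Length;
        int processed = 0;
        ManualResetEvent allDone = new ManualResetEvent(false);
        foreach (string url in urls)
        {
            ThreadPool.QueueUserWorkItem(delegate(object state)
            {
                // Real code would call Counts.GetCounts((string)state, ...) here
                Interlocked.Increment(ref processed);
                if (Interlocked.Decrement(ref pending) == 0)
                    allDone.Set();           // last worker releases the waiter
            }, url);
        }
        allDone.WaitOne();                   // block until every work item has run
        return processed;
    }

    static void Main()
    {
        int n = Run(new string[] { "http://a.example", "http://b.example", "http://c.example" });
        Console.WriteLine(n + " feeds processed");   // 3 feeds processed
    }
}

Using Interlocked for the counters matters here: the workers run concurrently, and a plain decrement could let two threads both see a nonzero count and never set the event.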

Enjoy!

Download the Visual Studio 2005 solution that accompanies this article


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET platform.