YQL is a SQL-like query language that is highly customizable.
In fact, there is a vibrant community that maintains a GitHub repository of custom
tables here: https://github.com/yql/yql-tables. There is also a "helper" site here: http://www.datatables.org/.
These "Open Data Tables" give developers a single, uniform
way of querying any web service or data source, such as Amazon, iTunes, or Twitter.
The YQL (Yahoo! Query Language) platform enables developers to query, filter,
and combine data across the web through a single interface. It exposes a SQL-like
syntax that is both familiar to developers and expressive enough for getting
the right data.
Open Data Tables are XML files that can be "plugged" into the Yahoo! Query
Language open platform (YQL). These files describe how the YQL SQL-like language
can be mapped onto any web service or data source on the internet. Once mapped, these
data sources can be used by developers in many ways in YQL. You can even extract
elements from the HTML of a cross-domain URL with queries like the following:
select * from html where url="http://finance.yahoo.com/q?s=yhoo" and xpath='//div[@id="yfi_headlines"]/div[2]/ul/li/a'
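A statement like this has to be URL-encoded before it can be passed in the q parameter of the public endpoint that the methods later in this article call. The following is a minimal sketch of that step; the class name YqlUrlSketch and the helper BuildYqlUrl are hypothetical names chosen for illustration and are not part of any Yahoo! library.

```csharp
using System;

public static class YqlUrlSketch
{
    // Endpoint as used by the methods later in this article.
    private const string Endpoint = "http://query.yahooapis.com/v1/public/yql";

    // Hypothetical helper: percent-encodes the YQL statement and appends it as the q parameter.
    public static string BuildYqlUrl(string yql)
    {
        return Endpoint + "?q=" + Uri.EscapeDataString(yql);
    }

    public static void Main()
    {
        string yql = "select * from html where url=\"http://finance.yahoo.com/q?s=yhoo\" " +
                     "and xpath='//div[@id=\"yfi_headlines\"]/div[2]/ul/li/a'";

        // Prints the full request URL with spaces and quotes percent-encoded.
        Console.WriteLine(BuildYqlUrl(yql));
    }
}
```

Encoding the whole statement in one place keeps the quoting inside the YQL text separate from the quoting rules of the query string.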
In this sample, we'll use the contentanalysis.analyze table from Yahoo! to perform
"term extraction" (getting valid keywords and phrases) from either
a block of text or directly from the URL of a blog post or article. These
keywords can be used for search engine optimization (SEO). A typical query result set
looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
yahoo:count="1" yahoo:created="2012-01-15T13:57:50Z" yahoo:lang="en-US">
<diagnostics>
<publiclyCallable>true</publiclyCallable>
<user-time>140</user-time>
<service-time>114</service-time>
<build-version>24402</build-version>
</diagnostics>
<results>
<entities xmlns="urn:yahoo:cap">
<entity score="0.784327">
<text end="16" endchar="16" start="0" startchar="0">Italian sculptors</text>
</entity>
<entity score="0.764539">
<text end="72" endchar="72" start="58" startchar="58">the Virgin Mary</text>
<wiki_url>http://en.wikipedia.com/wiki/Mary_%28mother_of_Jesus%29</wiki_url>
<related_entities>
<wikipedia>
<wiki_url>http://en.wikipedia.com/wiki/Mary_MacKillop</wiki_url>
<wiki_url>http://en.wikipedia.com/wiki/S%c3%bcmela_Monastery</wiki_url>
<wiki_url>http://en.wikipedia.com/wiki/Canonization</wiki_url>
<wiki_url>http://en.wikipedia.com/wiki/Lourdes</wiki_url>
<wiki_url>http://en.wikipedia.com/wiki/Naval_warfare_of_World_War_I</wiki_url>
</wikipedia>
</related_entities>
</entity>
<entity score="0.509566">
<text end="29" endchar="29" start="22" startchar="22">painters</text>
</entity>
</entities>
</results>
</query>
You can see above that the XML returned contains Wikipedia entries ("related_entities")
as well as keywords ("entity/text"). In this example, I only use the
keywords. Other queries will also return Yahoo!'s Categories for the entered
content.
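The response structure described above can also be read entity by entity, which gives access to the score attribute as well as the keyword text. The following is a rough sketch using LINQ to XML against a trimmed copy of the sample response; the class name, the KeywordsAbove helper, and the 0.6 threshold are assumptions made for illustration, not something the API prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class EntityParseSketch
{
    // Trimmed copy of the sample response shown above.
    public const string SampleXml =
        "<query><results><entities xmlns=\"urn:yahoo:cap\">" +
        "<entity score=\"0.784327\"><text>Italian sculptors</text></entity>" +
        "<entity score=\"0.764539\"><text>the Virgin Mary</text></entity>" +
        "<entity score=\"0.509566\"><text>painters</text></entity>" +
        "</entities></results></query>";

    // Hypothetical helper: returns the distinct keywords whose score is at least minScore.
    public static List<string> KeywordsAbove(string xml, double minScore)
    {
        XNamespace cap = "urn:yahoo:cap";
        return XDocument.Parse(xml)
            .Descendants(cap + "entity")
            // The score attribute is unprefixed, so it lives in no namespace.
            .Where(e => double.Parse((string)e.Attribute("score"),
                                     CultureInfo.InvariantCulture) >= minScore)
            .Select(e => (string)e.Element(cap + "text"))
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        // With a threshold of 0.6, "painters" (score 0.509566) is filtered out.
        foreach (string keyword in KeywordsAbove(SampleXml, 0.6))
            Console.WriteLine(keyword);
    }
}
```

Parsing the score with the invariant culture avoids surprises on machines whose locale uses a comma as the decimal separator.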
Here is my method to return keywords from a specified URL:
// Get search terms from an entered URL.
// Requires: using System; using System.Collections.Generic; using System.Linq;
//           using System.Net; using System.Xml.Linq;
public static List<string> GetSearchTermsUrl(string url)
{
    // URL-encode the YQL statement so its spaces and quotes survive in the query string
    string yql = "select * from contentanalysis.analyze where url='" + url + "'";
    string query = "http://query.yahooapis.com/v1/public/yql?q=" + Uri.EscapeDataString(yql);
    string s;
    using (WebClient wc = new WebClient())
    {
        s = wc.DownloadString(query);
    }
    XDocument doc = XDocument.Parse(s);
    XNamespace x = "urn:yahoo:cap";
    // Each <text> element under an <entity> holds one keyword or phrase
    var results = doc.Descendants(x + "text");
    return results.Select(itm => itm.Value).Distinct().ToList();
}
And here is my method to return the same type of results from a block of entered
text (e.g. that you copied from a page):
// Requires (in addition to the usings above): using System.IO;
//           using System.Collections.Specialized;
public static List<string> YqlPost(string content)
{
    // Clean out various characters (e.g. from code samples) that could mess up the YQL select statement
    content = content.Replace(";", " ").Replace("{", " ").Replace("}", " ").Replace("@", " ").Replace("=", " ").Replace("'", " ");
    string query = "SELECT * FROM contentanalysis.analyze WHERE text='" + content + "'";
    List<string> items = new List<string>();
    WebClient wc = new WebClient();
    wc.Headers.Add(HttpRequestHeader.ContentType, "application/x-www-form-urlencoded");
    NameValueCollection nvc = new NameValueCollection();
    nvc.Add("q", query);
    try
    {
        // UploadValues form-encodes the collection and sends it as a POST
        byte[] b = wc.UploadValues("http://query.yahooapis.com/v1/public/yql", nvc);
        MemoryStream ms = new MemoryStream(b);
        XDocument doc = XDocument.Load(ms);
        XNamespace x = "urn:yahoo:cap";
        var results = doc.Descendants(x + "text");
        foreach (var itm in results)
            items.Add(itm.Value);
        items = items.Distinct().ToList();
    }
    catch (Exception ex)
    {
        // Log the failure; the method then returns whatever was collected (usually an empty list)
        System.Diagnostics.Debug.WriteLine(ex.ToString());
    }
    finally
    {
        wc.Dispose();
    }
    return items;
}
You'll notice that the YqlPost method has a line that replaces certain
characters with spaces, as these are known to break the YQL query, just like
"bad SQL". There may be additional ones to add, as I have only done
limited testing with this. Don't expect your YQL queries to come back with
keywords every time. For example, http://msdn.microsoft.com/en-us/library/system.text.aspx
doesn't return anything for me. I guess Yahoo! just isn't interested in .NET
namespaces.
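Because the list of problem characters may grow, it can help to keep it in one place rather than in a chain of Replace calls. The following is one possible sketch; the class name and the Sanitize helper are hypothetical, and the character set is simply the one YqlPost already replaces.

```csharp
using System;
using System.Text;

public static class YqlSanitizeSketch
{
    // The characters that YqlPost replaces; extend this list as testing turns up more.
    private static readonly char[] Unsafe = { ';', '{', '}', '@', '=', '\'' };

    // Hypothetical helper: replaces every listed character with a single space.
    public static string Sanitize(string content)
    {
        StringBuilder sb = new StringBuilder(content.Length);
        foreach (char c in content)
            sb.Append(Array.IndexOf(Unsafe, c) >= 0 ? ' ' : c);
        return sb.ToString();
    }

    public static void Main()
    {
        // Each listed character is replaced by a space; all others pass through unchanged.
        Console.WriteLine(Sanitize("a=b;c"));
    }
}
```

A single pass over the text also avoids allocating an intermediate string for each replaced character.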
You can view the complete documentation for the contentanalysis API here: http://developer.yahoo.com/search/content/V2/contentAnalysis.html
The downloadable Visual Studio 2010 solution has a class library containing the
two static methods above, along with a test harness ASP.NET web application providing
a form with a textarea for entered text, a textbox for an entered URL, and two
buttons - one to get the results and one to clear the controls. You can supply
either text in the textarea or a URL in the URL textbox, but not both.
These are the keywords that Yahoo! returned from the text in this article:
YQL
query language
string query
MemoryStream
Yahoo!
Results are listed in the textarea for easy copying.
You can download the sample solution here.