Topsy recently indexed its 5 billionth Tweet, so it is giving developers access to
a lot of data. The Otter API is a REST-based interface to the Topsy Search Engine.
Topsy, which claims to be the largest searchable index of content posted on Twitter,
is driven by an architecture that spans a cluster of 500 servers and a petabyte
of data. It ranks links, photos and tweets by the number and quality of the tweets.
The Otter API makes available to developers some fairly interesting data that has
been mined by Topsy. You can find users who have mentioned a term, or look for
experts on a particular topic. Topsy’s "author influence" is used to
sort results, which I believe makes them more valid.
The entire set of Otter API Resources provides a developer with some very interesting
indicia to search for. For example, if I wanted to find the experts on MongoDB,
I could use the /experts resource of the Otter API. There’s no API key required,
so you can dive right in. Here’s the MongoDB example:
http://otter.topsy.com/experts.json?query=mongodb
It's a nice way to find people to follow.
The Otter API uses a credit allocation system to ensure fair distribution of capacity.
Each IP is allocated 10,000 credits per hour.
The typical API call deducts 1 credit from your allocation. Search-based API calls
(/search, /searchcount, /authorsearch, /profilesearch) have a significantly higher
computational cost on the backend, and deduct 10 credits per call.
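To make the credit arithmetic concrete, here is a small sketch that estimates how many calls fit into one hour's allocation. The OtterCredits class and its member names are hypothetical helpers of my own, not part of the Otter API or of the accompanying library; the numbers come straight from the figures quoted above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Otter API): estimates credit usage
// from the allocation figures quoted in the documentation.
public static class OtterCredits
{
    public const int CreditsPerHour = 10000; // per-IP hourly allocation
    public const int DefaultCost = 1;        // typical API call
    public const int SearchCost = 10;        // search-based API call

    // Resources the documentation lists as search based.
    private static readonly HashSet<string> SearchResources =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "search", "searchcount", "authorsearch", "profilesearch"
        };

    // Credits deducted for one call to a resource such as "experts" or "/search".
    public static int CostOf(string resource)
    {
        string name = resource.TrimStart('/');
        return SearchResources.Contains(name) ? SearchCost : DefaultCost;
    }

    // Maximum number of calls to one resource that fit in a single hour's allocation.
    public static int MaxCallsPerHour(string resource)
    {
        return CreditsPerHour / CostOf(resource);
    }
}
```

In other words, an application that only calls /search can make about 1,000 requests per hour from one IP, while one that only calls /experts can make 10,000.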
No developer key or registration is required. You can read the documentation here:
http://code.google.com/p/otterapi/wiki/Resources
I needed some practice deserializing and working with C# types from JSON, so I figured
that working out a usable C# library to hit the major Otter API methods would
be useful, both now and in the future. I've included only what I consider the
most important methods:
AuthorInfo - returns Topsy's Custom Author description with their "Author Influence"
property.
Experts - provide a subject phrase like "mongodb norm" or "wcf" and
get a list of experts, ranked by influence and frequency of posts.
LinkPosts - provide a Twitter username and you get back a list of link Tweets by that author.
Related - List of related URLs. This list is derived by tracking other URLs that are mentioned
in the same tweet as the query URL.
Search - List of results for a query.
Search (site) - List of results for a query using the site: modifier.
Search (user) - List of results for a query using the from: modifier.
Trackbacks - List of tweets that mention the query URL, most recent first. Also accepts a "contains"
modifier and an "influential only" modifier.
Trending - List of trending terms.
My approach is very simple: I have a class called "Otter" with a series
of static methods named GetXXX (where XXX is the OtterApi operation name). Each
method uses JSON.NET to parse only the part of the returned JSON string that
I want to work with, via its JObject.Parse(string) method.
This is far more convenient than the JavaScriptSerializer or DataContractJsonSerializer
classes, which do not provide this kind of helper method. So, for example, if
we get back a JSON string like the following:
{
  "request": {
    "parameters": {
      "window": "d",
      "q": "bernanke",
      "type": "cited"
    },
    "response_type": "json",
    "resource": "search",
    "url": "http://otter.topsy.com/search.json?q=bernanke&type=cited&window=d"
  },
  "response": {
    "window": "d",
    "page": 1,
    "total": 91,
    "perpage": 10,
    "last_offset": 10,
    "hidden": 0,
    "list": [
      {
        "trackback_permalink": "http://twitter.com/paceset9999/status/8590644708638721",
        "trackback_author_url": "http://twitter.com/paceset9999",
        "content": "RT @jetts424: Bernanke Rolling the Dice: America's Financial Dilemma http://tinyurl.com/39px96f",
        "trackback_date": 1290883143,
        "topsy_author_img": "http://a3.twimg.com/profile_images/1129281907/vet01_normal.jpg",
        "hits": 10,
        "topsy_trackback_url": "http://topsy.com/www.marketoracle.co.uk/Article24599.html?utm_source=otter",
        "firstpost_date": 1290882191,
        "url": "http://www.marketoracle.co.uk/Article24599.html",
        "trackback_author_nick": "paceset9999",
        "highlight": "RT @jetts424: <span class=\"highlight-term\">Bernanke</span> Rolling the Dice: America's Financial Dilemma http://tinyurl.com/39px96f",
        "topsy_author_url": "http://topsy.com/twitter/paceset9999?utm_source=otter",
        "mytype": "link",
        "score": 12.836,
        "trackback_total": 8,
        "title": "Bernanke Rolling the Dice: America's Financial Dilemma :: The Market Oracle :: Financial Markets Analysis & Forecasting Free Website"
      },
      {
        "trackback_permalink": "http://twitter.com/dvolatility/status/8725998342242304",
. . . .
I would really only be interested in the "response" portion, and
of that, probably only its "list" portion. To do that, I could use
the following code:
// SearchResult is a placeholder name for the class that models one "list" entry.
JObject stuff = JObject.Parse(json);
var resp = JsonConvert.DeserializeObject<List<SearchResult>>(stuff["response"]["list"].ToString());
This keeps the typed model small, since only the list entries have to be mapped onto a C# class.
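As a slightly fuller sketch of the same idea, the snippet below pulls only the "list" entries out of a search response and maps them onto a small class. It assumes JSON.NET (Newtonsoft.Json) is referenced; the SearchResult and OtterParsing names are hypothetical, and the properties simply mirror a handful of fields from the sample response above. It uses JToken.ToObject<T>() rather than round-tripping through a string, which is my own variation and not necessarily how the accompanying solution does it.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// Hypothetical class: models a few fields of one entry in "response.list".
// JSON fields without a matching property are ignored by default.
public class SearchResult
{
    public string url { get; set; }
    public string title { get; set; }
    public string content { get; set; }
    public int hits { get; set; }
    public double score { get; set; }
    public int trackback_total { get; set; }
}

public static class OtterParsing
{
    // Returns the entries under response.list, or an empty list
    // when the response has no such node.
    public static List<SearchResult> ParseSearchList(string json)
    {
        JObject root = JObject.Parse(json);
        JToken list = root.SelectToken("response.list");
        return list == null
            ? new List<SearchResult>()
            : list.ToObject<List<SearchResult>>();
    }
}
```

The "request" portion of the response is never mapped to a type at all, so the model classes stay limited to the data you actually intend to display.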
Many of these API methods also accept a "page" parameter, so I've
included that option as well. Each JSON object is modeled with a strongly typed
C# class. For example:
using System.Collections.Generic;

namespace OtterApi
{
    // One page of trackback results, as returned in the "response" portion.
    public class Trackback
    {
        public int page { get; set; }
        public int total { get; set; }
        public int perpage { get; set; }
        public List<TrackbackList> list { get; set; }
    }

    // A single entry in the "list" array of a trackback response.
    public class TrackbackList
    {
        public string permalink_url { get; set; }
        public string date { get; set; }
        public string content { get; set; }
        public string type { get; set; }
        public AuthorInfo author { get; set; }
        public string date_alpha { get; set; }
    }
}
You can then take these various results and databind or display them. In my simplified
WinForms "test harness" I provide a DataGridView to which the various results
are databound.
NOTE: Where the documentation allows it, you can also append "&perpage=100" to the URL. I've tried values
up to 100, but nothing higher.
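A minimal sketch of how the "page" and "perpage" options could be folded into a request URL follows. The OtterUrls class and its Build method are hypothetical names of my own, and the "q" parameter name is taken from the sample search request shown earlier; other resources may expect different parameter names, so check the documentation for each one.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: composes an Otter API request URL with paging options.
public static class OtterUrls
{
    public const string BaseUrl = "http://otter.topsy.com/";

    // resource: e.g. "search" or "/search"; query: the raw, unescaped search text.
    public static string Build(string resource, string query, int page = 1, int perPage = 10)
    {
        if (page < 1) throw new ArgumentOutOfRangeException("page");
        if (perPage < 1) throw new ArgumentOutOfRangeException("perPage");

        return string.Format(
            CultureInfo.InvariantCulture,
            "{0}{1}.json?q={2}&page={3}&perpage={4}",
            BaseUrl,
            resource.Trim('/'),
            Uri.EscapeDataString(query), // escapes spaces and reserved characters
            page,
            perPage);
    }
}
```

For example, OtterUrls.Build("search", "bernanke", 2, 100) yields http://otter.topsy.com/search.json?q=bernanke&page=2&perpage=100.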
I hope the Otter API and the solution that accompanies this article are useful to
you. I'm already coming up with a few interesting ideas for using it. You can
download the C# Visual Studio 2010 Solution here.