Task Parallelism in C# 4.0 with System.Threading.Tasks

Use the new System.Threading.Tasks namespace to download multiple documents in parallel. In C# 4.0 (.NET Framework 4.0), task parallelism is the lowest-level approach to parallelization with PFX (the Parallel Framework). The classes for working at this level are defined in the System.Threading.Tasks namespace.

A task is a lightweight object for managing a parallelizable unit of work. A task avoids the overhead of starting a dedicated thread by using the CLR’s thread pool: this is the same thread pool used by ThreadPool.QueueUserWorkItem, but in CLR 4.0 it is enhanced to work more efficiently with Tasks.

Tasks can be used whenever you want to execute something in parallel; in .NET 3.0 / 3.5 we would have used the ThreadPool for this. However, the advantage of tasks is that they are tuned for leveraging multiple cores, unlike the ThreadPool by itself. In fact, the Parallel class and PLINQ, which I discussed briefly in another article, are internally built on the task parallelism constructs.
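As a rough illustration of the difference, the sketch below runs the same work both ways. The Square helper and the class and method names are invented for this example and are not part of the demo. The ThreadPool version has to count outstanding work items and signal completion by hand, while the Task version simply reads each task's Result, which blocks until that task is done.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class PoolVersusTaskSketch
{
    // Hypothetical unit of work: returns the square of its input.
    static int Square(int x) { return x * x; }

    // ThreadPool style: queue work items and signal completion manually.
    public static int SumSquaresWithThreadPool(int[] inputs)
    {
        if (inputs.Length == 0) return 0; // nothing would ever set the event
        int total = 0;
        int pending = inputs.Length;
        using (var done = new ManualResetEvent(false))
        {
            foreach (int input in inputs)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    Interlocked.Add(ref total, Square((int)state));
                    // The last work item to finish releases the waiting thread.
                    if (Interlocked.Decrement(ref pending) == 0) done.Set();
                }, input);
            }
            done.WaitOne();
        }
        return total;
    }

    // Task style: each task carries its own result and completion state.
    public static int SumSquaresWithTasks(int[] inputs)
    {
        var tasks = new Task<int>[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            int input = inputs[i]; // copy so each lambda captures its own value
            tasks[i] = Task.Factory.StartNew(() => Square(input));
        }
        int total = 0;
        foreach (Task<int> t in tasks) total += t.Result; // Result waits for the task
        return total;
    }

    static void Main()
    {
        int[] inputs = { 1, 2, 3, 4 };
        Console.WriteLine(SumSquaresWithThreadPool(inputs)); // prints 30
        Console.WriteLine(SumSquaresWithTasks(inputs));      // prints 30
    }
}
```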

Tasks do more than just provide an easy and efficient way into the thread pool. They also provide powerful features for managing units of work. With Task, you can:

• Tune a task’s scheduling
• Establish a parent/child relationship when one task is started from another
• Implement cooperative cancellation
• Wait on a set of tasks—without a signaling construct
• Attach “continuation” task(s)
• Schedule a continuation based on multiple antecedent tasks
• Propagate exceptions to parents, continuations, and task consumers
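Three of these features (continuations, cooperative cancellation, and exception propagation) can be sketched in a few lines. This is only a minimal illustration: the class, the SumTo helper, and the "boom" message are made up for the example, and it uses only APIs that ship with .NET 4.0.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class TaskFeaturesSketch
{
    // Hypothetical helper: sums 1..n, checking the token on every iteration.
    public static int SumTo(int n, CancellationToken token)
    {
        int total = 0;
        for (int i = 1; i <= n; i++)
        {
            token.ThrowIfCancellationRequested(); // cooperative cancellation point
            total += i;
        }
        return total;
    }

    // Attaches a continuation that formats the antecedent task's result.
    public static string SumThenFormat(int n)
    {
        var cts = new CancellationTokenSource();
        Task<int> sum = Task.Factory.StartNew(() => SumTo(n, cts.Token), cts.Token);
        Task<string> formatted = sum.ContinueWith(t => "sum=" + t.Result);
        return formatted.Result; // blocks until the continuation has finished
    }

    // The token is signaled before the task starts, so the task never runs
    // and ends up in the Canceled state.
    public static bool IsCanceledWhenTokenAlreadySignaled()
    {
        var cts = new CancellationTokenSource();
        cts.Cancel();
        Task<int> sum = Task.Factory.StartNew(() => SumTo(10, cts.Token), cts.Token);
        try { sum.Wait(); }
        catch (AggregateException) { /* wraps a TaskCanceledException */ }
        return sum.IsCanceled;
    }

    // An exception thrown inside a task reaches the waiter wrapped in
    // an AggregateException.
    public static string FaultMessage()
    {
        Action fail = () => { throw new InvalidOperationException("boom"); };
        Task failing = Task.Factory.StartNew(fail);
        try { failing.Wait(); return "no fault"; }
        catch (AggregateException ex) { return ex.InnerException.Message; }
    }

    static void Main()
    {
        Console.WriteLine(SumThenFormat(4));                      // prints sum=10
        Console.WriteLine(IsCanceledWhenTokenAlreadySignaled());  // prints True
        Console.WriteLine(FaultMessage());                        // prints boom
    }
}
```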

You can create a task by first instantiating a Task object and then calling Start. Task.Factory.StartNew, by contrast, creates and starts a task in one step; the demo later in this article uses it like this, where s is a forum short name:

object state = s;
var task = Task.Factory.StartNew(() => DoSomeWork(state), TaskCreationOptions.LongRunning);
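For comparison, the two-step form (instantiate a Task, then call Start) might look like the sketch below. The DoSomeWork stand-in here just returns a string so the example is self-contained; it is not the download method from the demo, and the class name is invented.

```csharp
using System;
using System.Threading.Tasks;

static class ColdTaskSketch
{
    // Hypothetical stand-in for the article's DoSomeWork method.
    public static string DoSomeWork(object state)
    {
        return "processed: " + (string)state;
    }

    public static string RunSeparately(string name)
    {
        object state = name;

        // Step 1: instantiate the task. Nothing runs yet.
        var task = new Task<string>(() => DoSomeWork(state), TaskCreationOptions.LongRunning);
        Console.WriteLine("before Start: " + task.Status); // prints before Start: Created

        // Step 2: hand the task to the scheduler.
        task.Start();

        // Result blocks until the task has finished.
        return task.Result;
    }

    static void Main()
    {
        Console.WriteLine(RunSeparately("csharpgeneral")); // prints processed: csharpgeneral
    }
}
```

Separating the two steps is mainly useful when a task has to be created in one place and started later, or by other code.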

You can wait on multiple tasks at once—via the static methods Task.WaitAll and Task.WaitAny (waits for just one task to finish).

WaitAll is similar to calling Wait() on each task in turn, but is more efficient in that it requires, at most, just one context switch.
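A small sketch of both methods follows. The sleeping tasks and the delay values are placeholders chosen for the example; WaitAny returns the index of a completed task in the array, and WaitAll returns once every task has completed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class WaitSketch
{
    // Starts one task per delay; each task simply sleeps for that long.
    public static Task[] StartSleepers(params int[] delaysMs)
    {
        var tasks = new Task[delaysMs.Length];
        for (int i = 0; i < delaysMs.Length; i++)
        {
            int delay = delaysMs[i]; // copy so each lambda captures its own value
            tasks[i] = Task.Factory.StartNew(() => Thread.Sleep(delay));
        }
        return tasks;
    }

    static void Main()
    {
        Task[] tasks = StartSleepers(300, 50, 150);

        // Blocks until at least one task has completed.
        int first = Task.WaitAny(tasks);
        Console.WriteLine("a task has finished: index " + first);

        // Blocks until all of them have completed.
        Task.WaitAll(tasks);
        Console.WriteLine("all tasks have finished");
    }
}
```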

The following demo uses the Task class to do what we would previously have done with the ThreadPool class. We load a string array of MSDN forum "short names" and, using an HTTP URL template, download the XML for each forum's threads to an XML file. The code should be more or less self-explanatory, so I'll present it first:

using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Threading.Tasks;

namespace TaskDemo
{
    class Program
    {
        // {0} is replaced with a forum's short name.
        private const string ForumTemplate = "http://social.msdn.microsoft.com/Forums/en-US/{0}/threads?outputas=xml";

        static void Main(string[] args)
        {
            // forums.txt holds one forum short name per line.
            string[] forums = File.ReadAllLines(Path.Combine(Environment.CurrentDirectory, "forums.txt"));
            Task[] tasks = new Task[forums.Length];
            int ctr = 0;
            Stopwatch timer = Stopwatch.StartNew();
            foreach (string s in forums)
            {
                // Copy the loop variable so each lambda captures its own value;
                // in C# 4.0 every closure would otherwise share the same s.
                object state = s;
                var task = Task.Factory.StartNew(() => DoSomeWork(state), TaskCreationOptions.LongRunning);
                tasks[ctr] = task;
                ctr++;
            }

            // Block until every download task has finished.
            Task.WaitAll(tasks);
            timer.Stop();
            Console.WriteLine("DONE in " + timer.ElapsedMilliseconds + " ms. Any Key to quit.");
            Console.ReadKey();
        }

        static void DoSomeWork(object state)
        {
            string forumShortName = (string)state;
            string url = string.Format(ForumTemplate, forumShortName);
            // The using block disposes the WebClient even if the download throws.
            using (WebClient wc = new WebClient())
            {
                try
                {
                    wc.DownloadFile(url, Path.Combine(Environment.CurrentDirectory, forumShortName + ".xml"));
                    Console.WriteLine("saved: " + forumShortName);
                }
                catch (WebException ex)
                {
                    // We probably timed out here; report it and move on.
                    Console.WriteLine("failed: " + forumShortName + " (" + ex.Message + ")");
                }
            }
        }
    }
}

Note that after loading the file into a string array, we create a Task array of the same length. Then, for each forum short name, we copy the name into a local state object and call Task.Factory.StartNew(), passing a lambda expression that calls DoSomeWork with that state, plus TaskCreationOptions.LongRunning. The local copy matters: in C# 4.0, a lambda that captured the foreach variable s directly would see whatever value s holds when the task actually runs. Each task is then stored in the tasks array for use in the WaitAll call.

The TaskCreationOptions.LongRunning option hints to the scheduler that a task will run for a long time; with the default scheduler, each such task gets a dedicated thread rather than a thread-pool thread, regardless of the number of cores. This is useful for long-running or blocking tasks (such as downloading an XML document here) because the downloads do not queue up behind the thread pool, which starts with roughly one thread per core and adds more only gradually. That throttling is not what we want here. On my box, which has 2 cores, all 442 MSDN forums downloaded in about 100 seconds. That's over 25MB of data. On a 4-core box it would likely be faster, although the work here is mostly waiting on the network. On our eggheadcafe.com database server, which has sixteen cores, it would sing! It would be much harder to get this kind of thread coordination and multi-core advantage using the "old style" ThreadPool.
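One way to observe what LongRunning changes is to ask each task which kind of thread it is running on. The sketch below is an illustration only, with invented class and method names; with the default scheduler it would typically report a thread-pool thread for TaskCreationOptions.None and a non-pool thread for LongRunning, but that is scheduler behavior rather than a documented guarantee.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LongRunningSketch
{
    // Runs a task with the given options and reports whether its body
    // executed on a thread-pool thread.
    public static bool RanOnPoolThread(TaskCreationOptions options)
    {
        Task<bool> probe = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread, options);
        return probe.Result; // blocks until the probe task has finished
    }

    static void Main()
    {
        Console.WriteLine("None:        pool thread = " + RanOnPoolThread(TaskCreationOptions.None));
        Console.WriteLine("LongRunning: pool thread = " + RanOnPoolThread(TaskCreationOptions.LongRunning));
    }
}
```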

There is much more to the Task class; this short demo is designed to present a "bite size" chunk of information that should be easy to understand and that you can use as a base to build from. In case you were wondering, at the time of writing System.Threading.Tasks is not available for Silverlight applications.

Microsoft has given us the tools for parallel programming. Take advantage of the hardware by learning how to use them correctly.

Additional Resources: Parallel Whitepaper by Stephen Toub

You can download the demo Visual Studio 2010 Solution here.



By Peter Bromberg