In C# 4.0 / .NET 4.0, task parallelism is the lowest-level approach to
parallelization with PFX (Parallel Framework). The classes for working at this
level are defined in the System.Threading.Tasks namespace.
A task is a lightweight object for managing a parallelizable unit of work. A task
avoids the overhead of starting a dedicated thread by using the CLR’s thread
pool: this is the same thread pool used by ThreadPool.QueueUserWorkItem, but
in CLR 4.0 it is enhanced to work more efficiently with Tasks.
Tasks can be used whenever you want to execute something in parallel; in .NET 3.0
/ 3.5 we would have used the ThreadPool for this. However, the advantage of tasks
is that they are tuned for leveraging multiple cores, unlike the ThreadPool by itself.
In fact, the Parallel class and PLINQ, which I discussed briefly in this article, are internally built on the task parallelism constructs.
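To make the comparison concrete, here is a minimal sketch of the same small job done both ways. The class and method names (PoolVersusTask, SumWithThreadPool, SumWithTask) are my own illustrations, not part of the demo. The point is only that the ThreadPool version needs a hand-rolled signal to know when the work is done, while the task carries its own completion state and result.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class PoolVersusTask
{
    // .NET 3.5 style: queue the work on the pool and signal completion by hand.
    public static int SumWithThreadPool(int n)
    {
        int result = 0;
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                for (int i = 1; i <= n; i++) result += i;
                done.Set();              // tell the caller we are finished
            });
            done.WaitOne();              // block until the work item signals
        }
        return result;
    }

    // .NET 4.0 style: the task tracks completion and hands back the result.
    public static int SumWithTask(int n)
    {
        Task<int> task = Task.Factory.StartNew(() =>
        {
            int sum = 0;
            for (int i = 1; i <= n; i++) sum += i;
            return sum;
        });
        return task.Result;              // blocks until the task has finished
    }

    public static void Main()
    {
        Console.WriteLine(SumWithThreadPool(100)); // 5050
        Console.WriteLine(SumWithTask(100));       // 5050
    }
}
```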
Tasks do more than just provide an easy and efficient way into the thread pool. They
also provide powerful features for managing units of work. With Task, you can:
• Tune a task’s scheduling
• Establish a parent/child relationship when one task is started from another
• Implement cooperative cancellation
• Wait on a set of tasks—without a signaling construct
• Attach “continuation” task(s)
• Schedule a continuation based on multiple antecedent tasks
• Propagate exceptions to parents, continuations, and task consumers
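Three of the features in the list above (continuations, cooperative cancellation, and exception propagation) can be sketched in a few lines each. This is only an illustration under my own assumptions: the class TaskFeatures and its method names are made up for this example, and the "work" is trivial.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class TaskFeatures
{
    // A continuation runs after its antecedent and can read the antecedent's result.
    public static int ContinuationDemo()
    {
        Task<int> first = Task.Factory.StartNew(() => 21);
        Task<int> second = first.ContinueWith(antecedent => antecedent.Result * 2);
        return second.Result;
    }

    // Cooperative cancellation: the task polls the token and ends itself when asked.
    public static bool CancellationDemo()
    {
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;
        Task looper = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 1000; i++)
            {
                token.ThrowIfCancellationRequested();
                Thread.Sleep(10);
            }
        }, token);

        cts.Cancel();
        try { looper.Wait(); }
        catch (AggregateException) { /* cancellation surfaces as an exception on Wait */ }
        return looper.IsCanceled;
    }

    // An exception thrown inside a task is rethrown to whoever waits on it,
    // wrapped in an AggregateException.
    public static string ExceptionDemo()
    {
        Action fail = () => { throw new InvalidOperationException("boom"); };
        Task faulty = Task.Factory.StartNew(fail);
        try { faulty.Wait(); return "no exception"; }
        catch (AggregateException ae) { return ae.InnerException.Message; }
    }

    public static void Main()
    {
        Console.WriteLine(ContinuationDemo());  // 42
        Console.WriteLine(CancellationDemo());  // True
        Console.WriteLine(ExceptionDemo());     // boom
    }
}
```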
Task.Factory.StartNew creates and starts a task in one step, as in this fragment
from the demo below (s is the current forum name):
object state = s;
var task = Task.Factory.StartNew(() => DoSomeWork(state), TaskCreationOptions.LongRunning);
You can also separate these operations by first instantiating a Task object, and
then calling Start on it.
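For the two-step form, a minimal sketch might look like the following. The DoSomeWork here is a stand-in I wrote for this example (it just prints its argument), and the forum name is a placeholder. The task does nothing until Start is called, which you can see from its Status.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class ColdTaskSketch
{
    // Stand-in for the demo's DoSomeWork method.
    static void DoSomeWork(object state)
    {
        Console.WriteLine("working on: " + (string)state);
    }

    public static TaskStatus[] CreateThenStart()
    {
        object state = "someforum";      // placeholder value
        var task = new Task(() => DoSomeWork(state), TaskCreationOptions.LongRunning);
        TaskStatus before = task.Status; // Created: nothing has been scheduled yet

        task.Start();                    // hand the task to the scheduler
        task.Wait();
        TaskStatus after = task.Status;  // RanToCompletion

        return new[] { before, after };
    }

    public static void Main()
    {
        TaskStatus[] statuses = CreateThenStart();
        Console.WriteLine(statuses[0] + " -> " + statuses[1]);
    }
}
```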
You can wait on multiple tasks at once via the static methods Task.WaitAll (waits
for every task to finish) and Task.WaitAny (waits for just one task to finish).
WaitAll is similar to calling Wait() on each task in turn, but is more efficient
in that it requires, at most, just one context switch.
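Here is a small sketch of both calls. The names (WaitSketch, IndexOfFirstFinished) are my own, and I use a ManualResetEventSlim as a gate purely so the "slow" task is guaranteed to still be running when WaitAny returns. WaitAny hands back the index of the task that finished; WaitAll then blocks until the rest are done.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class WaitSketch
{
    public static int IndexOfFirstFinished()
    {
        using (var gate = new ManualResetEventSlim(false))
        {
            Task[] tasks =
            {
                Task.Factory.StartNew(() => gate.Wait()),   // held back until the gate opens
                Task.Factory.StartNew(() => Console.WriteLine("quick task done"))
            };

            int first = Task.WaitAny(tasks);  // returns the index of a completed task
            gate.Set();                       // now let the held-back task finish
            Task.WaitAll(tasks);              // returns once every task has completed
            return first;
        }
    }

    public static void Main()
    {
        Console.WriteLine("first to finish: task " + IndexOfFirstFinished());
    }
}
```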
The following demo will use the Task class to do what we would previously have used
the ThreadPool class to do. We will load a string array of MSDN forums "short
names", and using an http url template, we'll download the XML for each
forum's threads to an xml file. The code should be more or less self-explanatory,
so I'll present it first:
using System;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Threading.Tasks;

namespace TaskDemo
{
    class Program
    {
        private static string ForumTemplate = "http://social.msdn.microsoft.com/Forums/en-US/{0}/threads?outputas=xml";

        static void Main(string[] args)
        {
            // forums.txt holds one forum "short name" per line.
            string[] forums = File.ReadAllLines(Path.Combine(Environment.CurrentDirectory, "forums.txt"));
            Task[] tasks = new Task[forums.Length];
            int ctr = 0;
            Stopwatch timer = Stopwatch.StartNew();
            foreach (string s in forums)
            {
                // Copy the loop variable so each lambda captures its own value.
                object state = s;
                var task = Task.Factory.StartNew(() => DoSomeWork(state), TaskCreationOptions.LongRunning);
                tasks[ctr] = task;
                ctr++;
            }
            // Block until every download task has finished.
            Task.WaitAll(tasks);
            timer.Stop();
            Console.WriteLine("DONE in " + timer.ElapsedMilliseconds + " ms. Any Key to quit.");
            Console.ReadKey();
        }

        static void DoSomeWork(object state)
        {
            string forumShortName = (string)state;
            string url = string.Format(ForumTemplate, forumShortName);
            WebClient wc = new WebClient();
            try
            {
                wc.DownloadFile(url, Path.Combine(Environment.CurrentDirectory, forumShortName + ".xml"));
                Console.WriteLine("saved: " + forumShortName);
            }
            catch (Exception ex)
            {
                // We probably timed out here; report it and let the other downloads continue.
                Console.WriteLine("failed: " + forumShortName + " (" + ex.Message + ")");
            }
            finally
            {
                wc.Dispose();
            }
        }
    }
}
Note that after loading the file into a string array, we create an array of type
Task of the same length. Then, for each forum short name, we copy the name into
the state object and call Task.Factory.StartNew(), passing in a lambda expression
that calls DoSomeWork with the state parameter, along with
TaskCreationOptions.LongRunning. The copy matters in C# 4.0: without it, every
lambda would capture the same shared loop variable. Each task is then added to
the tasks array for use in the WaitAll method.
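As an alternative to the lambda, StartNew also has an overload that takes an Action<object> plus a state argument, which fits the DoSomeWork(object state) signature directly. The sketch below is a variation I put together, not the demo's code: DoSomeWork is a stand-in that only records the name it receives, and the forum names are placeholders.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class StateOverloadSketch
{
    public static readonly ConcurrentBag<string> Processed = new ConcurrentBag<string>();

    // Stand-in for the demo's DoSomeWork; it only records the name it was given.
    static void DoSomeWork(object state)
    {
        Processed.Add((string)state);
    }

    public static void RunAll(string[] forums)
    {
        Task[] tasks = new Task[forums.Length];
        for (int i = 0; i < forums.Length; i++)
        {
            // The state value is passed straight to DoSomeWork,
            // so no lambda has to capture a loop variable.
            tasks[i] = Task.Factory.StartNew(DoSomeWork, forums[i], TaskCreationOptions.LongRunning);
        }
        Task.WaitAll(tasks);
    }

    public static void Main()
    {
        RunAll(new[] { "forumA", "forumB", "forumC" });   // placeholder names
        Console.WriteLine("processed: " + Processed.Count);
    }
}
```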
The TaskCreationOptions.LongRunning option hints to the task scheduler that each
task should get a dedicated thread, regardless of the number of cores it can use.
This is useful for long-running or blocking tasks (such as downloading an XML
document here) because you avoid having the thread pool run only about 2 of them
at a time on, say, a 2 core CPU, which is not what we want here. On my box, which
has 2 cores, all 442 MSDN forums downloaded in about 100 seconds. That's over
25MB of data. On a 4 core box, it would be even faster. On our eggheadcafe.com
database server, which has sixteen cores, it would sing! It would be extremely
difficult to get this kind of thread coordination and multi-core advantage using
the "old style" ThreadPool.
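If you want to see the effect of LongRunning for yourself, one rough way is to ask the running task whether it is on a thread-pool thread. This is just a probe I sketched (the LongRunningSketch name is mine), and it leans on how the default scheduler behaves rather than on any documented guarantee. On the default scheduler I would expect the plain task to report a pool thread and the LongRunning task to report a dedicated one.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper class, for illustration only.
public static class LongRunningSketch
{
    // Starts a task with the given options and reports whether it ran on a pool thread.
    public static bool RunsOnPoolThread(TaskCreationOptions options)
    {
        Task<bool> probe = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread, options);
        return probe.Result;
    }

    public static void Main()
    {
        Console.WriteLine("None:        pool thread? " + RunsOnPoolThread(TaskCreationOptions.None));
        Console.WriteLine("LongRunning: pool thread? " + RunsOnPoolThread(TaskCreationOptions.LongRunning));
    }
}
```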
There is much more to the Task class; this short demo is designed to present a
"bite-size" chunk of information that should be easy to understand and that you
can use as a base to build from. In case you were wondering, System.Threading.Tasks
is not available for Silverlight applications.
Microsoft has given us the tools to get going with parallel programming. Take
advantage of the hardware by learning how to do it right.
Additional Resources: Parallel Whitepaper by Stephen Toub
You can download the demo Visual Studio 2010 Solution here.