"It is the mind that makes the body rich" -- Shakespeare
What's an iterator?
The iterator pattern is a common one, and in .NET it's encapsulated in the
IEnumerator interfaces, and their generic counterparts. (Most of the time I'll use the nongeneric form for descriptions and the generic form in sample code. Obviously the generic form is strongly typed where the nongeneric form isn't, but the only other difference is that the generic form extends
The basic idea is that as a data consumer, you can ask an
IEnumerable for an
IEnumerator with the
GetEnumerator() call, and then iterate over the contents of the
IEnumerator (using the
MoveNext() method and
Current property) until either you no longer need any more data, or the iterator runs out of information to return.
For example, a simple
IEnumerator<string> iterator = dataSource.GetEnumerator();
string data = iterator.Current;
// Use data within the body of the loop
The above code doesn't call
on the iterator, for simplicity: the code generated for a
loop uses a
construct to make sure that the iterator is disposed when we've finished with it. This is important when the iterator is implemented by magic of
statements which we'll see later, but it's not crucial to the basic pattern itself.
Why are there two interfaces involved?
You may well be asking yourself why
IEnumerable doesn't have
Current members itself - why do we need an extra interface? Well, imagine you had two different loops iterating over the same list. You don't want the loops to interfere with each other, so you need two independent representations of the idea of "how much of the list I've looked at so far". This concept is precisely what
IEnumerator is for. Typically it keeps a separate piece of information (such as the current index in the list) as well as referring to the collection itself. Although iterators are typically used for in-memory collections, they certainly don't have to be, as we'll see later. The key is the "ask for data, process it, ask for more data" cycle.
Iterator blocks (C# 2)
In C# 1, implementing iterators was a pain. You had to make your collection type implement
IEnumerable, then create a separate type with a reference to an instance of its "parent", usually adding some more state for how far you'd already gone. It was very, very easy to end up with subtle off-by-one errors, and generally developers didn't do it unless they really had to.
In C# 2, the compiler does all the hard work for you when you use iterator blocks to implement either
IEnumerator (or the generic forms). It builds a state machine for you, and the iterator obtained effectively executes the code within the iterator block, yielding values as it goes. It's easier to demonstrate than to explain - here's a complete program:
static IEnumerable<string> GetDemoEnumerable()
yield return "start";
for (int i=0; i < 5; i++)
yield return i.ToString();
yield return "end";
static void Main()
foreach (string x in GetDemoEnumerable())
The output of the program is simple:
I won't go into all the details of how iterator blocks work (see chapter 6 of C# in Depth for more information, or the language specification for a lot of detail) but it's important to understand the flow:
GetDemoEnumerable() creates a new instance of the extra type generated by the compiler. None of the source code we've written executes yet.
- The iterator executes code until it reaches a
yield statement. In our case, this happens immediately. The iterator remembers that the current item should be "start" and returns
true to indicate that there is data available.
Main uses the
Current property to retrieve the data, then prints it out.
- The iterator continues execution from the point it had previously reached - in other words, it goes to the line after the first
yield return. As before, it executes code (initialising the
i variable) until it reaches the next
- ... The pattern repeats, until there's a call to
MoveNext() which reaches the end of the method - at which point the call returns
false to indicate that there's no more data available.
There are three very important (related) points here: first, that none of our original source code is executed until the first call to
MoveNext(). Second, that subsequent calls to
MoveNext() effectively jump back into the source code at the point they left off before. Yielding a value is sort of like "pausing" the method. Third, the compiler makes sure that the state of the method (in terms of local variables) is preserved - even though the
i variable is local inside the iterator block, its value is kept inside the iterator so that next time
MoveNext() is called, it can still be used.
These features of iterator blocks make them completely different from other methods - it can be quite tricky to get your head round how they work, until it suddenly clicks. It can be quite useful to debug into an iterator block to really see how the flow works for yourself. Now we know about iterators and how they can be easily implemented in C# 2, let's think about how we can chain them together.
Data pipelines and composability
LINQ to Objects is built on the concept of a data pipeline. We start with a data source which implements
IEnumerable, and chain together lots of operations, each one acting on the result of the previous one and returning another
IEnumerable - although it could be one with a different type of result. This act of chaining small operations together to form one large operation is called
composition. The emphasis in composition is of making each of the small operations genuinely simple. Flexibility is provided in LINQ to Objects by the use of delegates which can act on the input elements - for example, to act as a predicate in a filter, indicating whether or not a particular value should pass the filter; or as a projection to map one value to another.
As a silly demonstration of this, suppose we have a sequence of numbers, in some random order. We want a result which is a sequence of strings - we want to filter out any negative numbers, sort the rest, and then call
ToString to get the numbers in hexadecimal format. In C# 3 and .NET 3.5, we can do this with the following query expression:
from number in numbers
where number >= 0
The parser translates this into a non-query expression before the compiler deals with it any further:
numbers.Where (number => number >= 0)
.OrderBy (number => number)
.Select (number => number.ToString("x"))
syntax is for lambda expressions
- a simple way of creating delegates (and also expression trees
methods are extension methods
provided by LINQ to Objects to do filtering, ordering and projections respectively. As you can tell, there are lots of different features interacting here, and I don't intend to go into the details of most of them - we're just concentrating on the behaviour of iterators in this article.
Deferred execution vs immediate execution
Just as when we created our own iterator using iterator blocks, the expression above doesn't execute any significant code when it's executed - it sets up the pipeline, but won't actually ask for any data. It only starts doing any actual filtering/ordering/projecting when the return value of
Select is first asked for some data. This is called deferred execution. Many LINQ to Objects query operators use deferred execution, especially when they're conceptually building more of a pipeline. Other query operators such as
ToList() use immediate execution - which means they perform their work immediately, and give you a result with all the appropriate data.
It's important to be aware of what's going on here so you know when operations will actually take place. For instance, if you specify a projection which might throw the exception, you need to know that the exception will only occur when the data is being requested: just because the call to
Select doesn't throw an exception doesn't mean everything is okay!
Streaming vs buffering
There's another concept which is similar to deferred/immediate execution, but subtly different - and that's whether an operator streams the data or buffers it. An operator which streams data basically only asks for as much data (from the previous part of the pipeline) as it it needs to give the next result.
Select() is the simplest example of this - it retrieves the next item, applies the specified delegate to project it to the result, and then yields that result.
Where() is slightly more complicated - it will keep asking for more data until either there is no more data, or until it finds an element which passes its filter. When it finds a passing element, however, it will yield that value without asking for any more data.
Buffering operators act differently. When they're asked for data, they buffer up all the data from the earlier piece of the pipeline, and then work out all the results.
Reverse() is the simplest example of this - in order to return all of the data in the reverse order, you've got to get to the last element before you can yield anything. Sorting and grouping also buffer data in LINQ to Objects.
LINQ to Objects contains a few generator operators - ones which produce a sequence of data without starting with one.
DefaultIfEmpty is listed as a generator, but in some ways it isn't - it acts on an existing sequence, albeit one which may be empty. We won't look at that here.
There are three other standard generator operators:
Range. We'll look at possible implementations of each of them, and then work out how we could implement all of them slightly differently using composition. Of course, in reality all of these are implemented for us already, but it gives nice practice for understanding iterator blocks and composition.
Empty does exactly what you'd expect it to - it returns an empty sequence, i.e. one where the first call to
false. Strangely enough, implementing it isn't quite as straightforward as we might expect. We don't need to return any values, so we can just leave the method body empty, right? Well, no... we actually need code like this:
public static IEnumerable<TResult> Empty<TResult> ()
statement is an "early out" in iterator blocks - it acts like the end of the method, making
. Why do we need it at all? Well, with an empty method body there's nothing to tell the compiler that we want an iterator block in the first place! The compiler only treats a method as being implemented with an iterator block if it encounters
within the method body. This is the only case I can think of where a
directly before the end of a method actually has an effect.
Repeat takes two parameters - an element to return, and the number of times to return it. It will yield the same element however many times you've specified, and then stop. It's trivial to implement (including checking that the count isn't negative):
public static IEnumerable<TResult> Repeat<TResult>(TResult element, int count)
if (count < 0)
throw new ArgumentOutOfRangeException("count");
for (int i=0; i < count; i++)
yield return element;
A note on error checking: there's a problem here. Ideally, we would like our
method to throw an exception immediately we call it, if
is less than 0. Currently, we'll only throw the exception when some code tries to iterate through the result. That's generally a bad thing, and it would be nice if the language gave us more support here
. For the moment though, we'll leave our slightly broken implementation as it is.
Range is also straightforward. Unlike the previous two operators, it's not generic: it always returns an
IEnumerable<int>. You specify two parameters: the starting point, and how many values to return. The returned sequence starts at the given value, and counts up - so
Range(10, 3) will yield 10, 11, 12. Here's an implementation, omitting error checking:
public static IEnumerable<int> Range(int start, int count)
for (int i=start; i < start+count; i++)
yield return i;
One generator to rule them all
The thing about the above generators is that they're all a bit specialized. In particular, they take no delegates, unlike almost everything else in LINQ to Objects! Let's introduce a new generator. It's so general, I reckon it deserves the name
Generate. It will produce an infinite sequence by taking a starting point, yielding that, and then applying a delegate to the current value, yielding the current value, and repeating ad infinitum. Here's the code (with no error checking - we'd normally check that
step wasn't null):
public static IEnumerable<TSource> Generate<TSource>(TSource start,
TSource current = start;
yield return current;
current = step(current);
The code is as simple as we'd hope - but of course, we'd never want to actually iterate over the result of
Generate directly - it would keep going forever. Indeed,
Generate is almost useless without composition. However, by the cunning use of just a single other operator, we can easily produce the three generators we've already looked at. Rather than show any infinite sequences here, I'll wait for a moment before actually using
Take is what I think of as a "pipeline" operator - it takes a sequence and returns a sequence, just like
Where do. As well as the "source" sequence, you need to specify how many elements of the source you want to be returned.
Take simply yields elements from the source until either the source runs out of data, or the requested number of elements have been returned. So, the implementation (minus error checking) could look like this:
public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source,
foreach (TSource element in source)
yield return element;
Note the use of
here to indicate that we've finished when the counter reaches 0. (We could have used a simple
in this particular case, as there's no more code after the loop, but I thought it would be worth demonstrating a slightly more normal use of
than the one in our
implementation.) The use of
in the declaration of the
parameter indicates that
is an extension method.
Generate in our toolkit, we can now reimplement
Range using composition:
public static IEnumerable<TResult> Repeat<TResult>(TResult element, int count)
return Generate(element, x => x).Take(count);
public static IEnumerable<TResult> Empty<TResult>()
return Repeat(default(TResult), 0);
public static IEnumerable<int> Range(int start, int count)
return Generate(start, x => x+1).Take(count);
is just implemented by first creating an infinite sequence which begins with the specified element and then applies an identity function to it at each step. We then cut that infinite sequence to the required length by composing it with
sequence is simply the default value of the required type, repeated zero times. Finally
creates another infinite sequence - starting with the specified value, it just adds one repeatedly - and again we
the required number of elements.
It's not particularly useful to actually implement the three generator functions in this manner - but it does give an interesting insight into composition. The fact that
Generate always creates an infinite sequence forces you to think about the fact that the values are only generated as they're requested - if
Generate were to try to create all the values first, and then return them in one go, you'd clearly run out of memory quite quickly. I have a somewhat more complicated example, taken from a blog post which contains the full source. It's a query expression which is used to generate a visualisation of the Mandelbrot set:
var query = from row in Enumerable.Range(0, ImageHeight)
from col in Enumerable.Range(0, ImageWidth)
// Work out the initial complex value from the row and column
let c = new Complex((col * SampleWidth) / ImageWidth + OffsetX,
(row * SampleHeight) / ImageHeight + OffsetY)
// Work out the number of iterations
select Generate(c, x => x * x + c).TakeWhile(x => x.SquareLength < 4)
.Count() into count
// Map that to an appropriate byte value
select (byte)(count == MaxIterations ? 0 : (count % 255) + 1);
The use of
here is within the nested query used to find the colour to use for a particular point. We generate an infinite path of values which jump around the field of complex numbers, stopping when either the point has a modulus of 2 or more, or we've performed as many iterations as we're interested in. The full code is downloadable from the blog post, but I hope this gives some indication of how powerful the composition technique is.
LINQ to Objects is full of sequences, and a bunch of simple operators which combine to form myriad possibilities. In C# in Depth, I recommend that when you're faced with a complicated query expression, you look at the sequences from which it's composed. Understanding how iterators work, streaming/buffering and deferred/immediate execution gives important foundation material to being able to work with LINQ effectively. Composing queries is fun, and the iterator blocks which first appeared in C# 2 make iterators reasonably easy to work with. In this article I've given just a few examples, but I'm sure you can come up with your own. Go ahead, create your own operators - and above all, experiment.
For many of us, LINQ isn't just a new technology, but a new way of approaching problems when writing in C#. It always takes a while to really get to grips with fresh ideas, so I'd heartily recommend playing with LINQ before you need to use it professionally. While the other LINQ providers are perhaps "sexier" than LINQ to Objects, there's a purity, simplicity and immediacy involved when you can just define a collection in memory and manipulate it to your heart's content.
Likewise iterator blocks are a mystery to many developers. Unless you've already been thinking in sequences, you may well have never written an iterator block. Hopefully this article has shown you how simple they can be, along with a couple of warnings where you may have expected slightly different behaviour. Iterator blocks were already useful in C# 2, but when combined with LINQ they can be truly spectacular, all within tiny pieces of code. Again - experiment. It'll all be useful in the long run, and if you're anything like me, I promise you'll have a lot of fun along the way.