Buffer.BlockCopy vs MemoryStream for Concatenating Byte Arrays: Speed Tests

by Peter A. Bromberg, Ph.D.

"The boldness of asking deep questions may require unforeseen flexibility if we are to accept the answers." --Brian Greene

Recently someone mentioned to a co-worker and me that they thought using a MemoryStream to concatenate arrays of bytes was faster than Buffer.BlockCopy. We looked at each other, and the first thing out of my mouth was, "Let's make a test." So I did. The answers may surprise you.



First, we need to go over what BlockCopy is because not many programmers are aware it exists for our use:

The Buffer class, which resides in the System namespace, is for manipulating arrays of primitive types. Buffer is applicable to the following primitive types: Boolean, Char, SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, IntPtr, UIntPtr, Single, and Double.

BlockCopy copies a specified number of bytes from a source array, starting at a particular offset, to a destination array, starting at a particular offset. Very simple. So if you need to concatenate two byte arrays, all you need to do is create a target byte array of the correct size and call Buffer.BlockCopy twice. The Buffer class also offers GetByte and SetByte methods.
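As a minimal sketch of the two-call approach just described: the class name ByteArrayConcat and the method name Concat are made up for illustration and are not part of the .NET Framework or of the test program in this article.

```csharp
using System;

static class ByteArrayConcat
{
    // Hypothetical helper (illustration only): joins two byte arrays
    // using two Buffer.BlockCopy calls.
    public static byte[] Concat(byte[] first, byte[] second)
    {
        // The target array must be large enough for both sources.
        byte[] result = new byte[first.Length + second.Length];

        // BlockCopy(src, srcOffset, dst, dstOffset, count):
        // offsets and count are measured in bytes.
        Buffer.BlockCopy(first, 0, result, 0, first.Length);
        Buffer.BlockCopy(second, 0, result, first.Length, second.Length);
        return result;
    }
}
```

Calling ByteArrayConcat.Concat(new byte[] { 1, 2 }, new byte[] { 3, 4, 5 }) returns a five-element array holding 1, 2, 3, 4, 5.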

With a MemoryStream, each write places its bytes at the stream's current position and advances that position, so successive writes append to what is already there. There are a number of constructor overloads, but the most important concept is that the stream will automatically resize to accommodate additional data written to it. However, this can only occur when no initial byte array is passed to the constructor; in other words, when you use new MemoryStream() or new MemoryStream(400000). MemoryStream also provides a handy ToArray method that copies all of the contents out into a new byte array.
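The same concatenation can be sketched with a resizable MemoryStream. The names StreamConcat and Concat are hypothetical, chosen only for this example.

```csharp
using System;
using System.IO;

static class StreamConcat
{
    // Hypothetical helper (illustration only): appends two arrays through
    // a resizable MemoryStream and copies the result out with ToArray.
    public static byte[] Concat(byte[] first, byte[] second)
    {
        // The capacity argument only pre-allocates; the stream can still grow.
        using (MemoryStream ms = new MemoryStream(first.Length))
        {
            ms.Write(first, 0, first.Length);    // Position advances to first.Length
            ms.Write(second, 0, second.Length);  // stream grows to hold the second array
            return ms.ToArray();                 // copies Length bytes into a new array
        }
    }
}
```

The initial capacity here is deliberately too small for both arrays, to show that the second write still succeeds because the stream resizes itself.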

So here is the code for my little test. Each test was repeated 10000 times to get an overall timing perspective on performance:

using System;
using System.IO;
namespace BlockCopyTests
{ 
 class TestSuite
 { 
  [STAThread]
  static void Main(string[] args)
  {
   long  endTicks;
   long elapsedTicks;
   TimeSpan elapsed;
   Console.WriteLine ("test one: blockcopy.");
   long startTicks= DateTime.Now.Ticks ;
   byte[] b1 = new byte[200000];
   byte[] b2 = new byte[200000];
   byte[] b3 = new byte[400000];  
   for (int i = 0;i<10000;i++)
   {
    Buffer.BlockCopy(b1,0,b3,0,200000);
    Buffer.BlockCopy(b2,0,b3,200000,200000);
   }
   endTicks=DateTime.Now.Ticks;
   elapsedTicks=endTicks-startTicks;
   elapsed =  new TimeSpan(elapsedTicks);    
   Console.WriteLine (string.Concat("took " , 
    elapsed.TotalSeconds.ToString()," seconds."));
   Console.WriteLine ("test two: MemoryStream.");
   startTicks=DateTime.Now.Ticks;
   MemoryStream ms = new MemoryStream(400000);
   for (int j=0;j<10000;j++)
   {
    ms.Position =0;
    ms.Write(b1,0,200000);
    ms.Write(b2,0,200000);
    // ToArray() is left out here (bytes stay in the MemoryStream), which improves performance
    //b3=ms.ToArray();
   }
   endTicks= DateTime.Now.Ticks;
   elapsed=new TimeSpan(endTicks-startTicks);    
   Console.WriteLine (string.Concat("took " , 
    elapsed.TotalSeconds.ToString(), " seconds."));
   Console.WriteLine ("test two A: MemoryStream with ToArray().");   
   startTicks=DateTime.Now.Ticks;
   ms = new MemoryStream(400000);
   for (int j=0;j<10000;j++)
   {   
    ms.Position =0;
    ms.Write(b1,0,200000);
    ms.Write(b2,0,200000);
    // ToArray() copies the stream contents into a new byte array on every pass
    b3=ms.ToArray();    
   }
   endTicks= DateTime.Now.Ticks;
   elapsed=new TimeSpan(endTicks-startTicks);    
   Console.WriteLine (string.Concat("took " , 
    elapsed.TotalSeconds.ToString(), " seconds."));
   Console.WriteLine ("test three: MemoryStream: not allocated.");   
   startTicks=DateTime.Now.Ticks;   
   MemoryStream ms2 = new MemoryStream();
   for (int k=0;k<10000;k++)
   {
    ms2.Position =0;
    ms2.Write(b1,0,200000);
    ms2.Write(b2,0,200000);
    // ToArray() is left out here (bytes stay in the MemoryStream), which improves performance
    //b3=ms2.ToArray();
   }

   endTicks= DateTime.Now.Ticks;
   elapsed=new TimeSpan(endTicks-startTicks);    
   Console.WriteLine (string.Concat("took " , elapsed.TotalSeconds.ToString(), 
                " seconds."));
   Console.WriteLine ("Test three A: MemoryStream: not allocated, with ToArray().");   
   startTicks=DateTime.Now.Ticks;   
   ms2 = new MemoryStream(); 
   for (int k=0;k<10000;k++)
   {  
    ms2.Position =0;   
    ms2.Write(b1,0,200000);   
    ms2.Write(b2,0,200000);  
    // ToArray() copies the stream contents into a new byte array on every pass
    // System.Diagnostics.Debug.WriteLine(ms2.Length.ToString());
    b3=ms2.ToArray();
   }
   endTicks= DateTime.Now.Ticks;
   elapsed=new TimeSpan(endTicks-startTicks);    
   Console.WriteLine (string.Concat("took " , elapsed.TotalSeconds.ToString(), 
    " seconds."));
   Console.WriteLine("press any key to quit.");
   Console.ReadLine();
  }
 }
}
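The listing above measures elapsed time with DateTime.Now.Ticks, which is adequate for runs lasting several seconds. On runtime versions that provide System.Diagnostics.Stopwatch, the same loop could be timed as in the sketch below. The names TimingSketch and TimeBlockCopy are hypothetical, and this variant starts the clock after the arrays are allocated, whereas the listing starts it before, so its numbers would not be directly comparable with the results reported here.

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Hypothetical helper (illustration only): runs the BlockCopy loop
    // from test one and returns the elapsed time.
    public static TimeSpan TimeBlockCopy(int iterations, int size)
    {
        byte[] b1 = new byte[size];
        byte[] b2 = new byte[size];
        byte[] b3 = new byte[size * 2];

        // Start timing only after the allocations are done.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Buffer.BlockCopy(b1, 0, b3, 0, size);
            Buffer.BlockCopy(b2, 0, b3, size, size);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For the parameters used in this article, the call would be TimingSketch.TimeBlockCopy(10000, 200000), and the TotalSeconds property of the result gives the figure to print.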

And now for the results:

 

Note that the first test, which creates a third byte array of 400,000 bytes and uses BlockCopy to copy the two original byte arrays in, took 7.84 seconds.

The second test, which uses a MemoryStream created initially with the correct capacity and then writes the two byte arrays into it, took less time than BlockCopy. (Note that this advantage holds only if we do not need to get the final bytes back out, but can leave them in the MemoryStream for subsequent work.)

Test "two A" shows the same test but with a call to ToArray each time, to get the bytes out of the stream. Overhead for this is considerable - nearly 19 seconds total.

The final test, test "three", is where we use the MemoryStream constructor overload with no initial size allocation, and this one took 7.41 seconds. However, if, as in test "two A", we add the need to call ToArray, the time increases to about 19.24 seconds.
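The cost of ToArray comes from allocating a new array and copying the stream contents into it on every call. One way to "leave the bytes in the stream" and still read them, offered here as a sketch rather than as something measured in these tests, is MemoryStream.GetBuffer, which returns the stream's own underlying array without copying. The names BufferAccessSketch and SumWrittenBytes are hypothetical. This assumes the stream was created with new MemoryStream() or new MemoryStream(capacity); streams wrapped around a caller-supplied array may refuse GetBuffer.

```csharp
using System;
using System.IO;

static class BufferAccessSketch
{
    // Hypothetical helper (illustration only): sums the bytes written so far
    // without calling ToArray().
    public static long SumWrittenBytes(MemoryStream ms)
    {
        byte[] raw = ms.GetBuffer();  // the stream's own buffer, no copy is made
        int count = (int)ms.Length;   // raw.Length may be larger than the data written
        long sum = 0;
        for (int i = 0; i < count; i++)
        {
            sum += raw[i];
        }
        return sum;
    }
}
```

The important detail is to read only the first Length bytes, because the underlying buffer is usually larger than the data it holds.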

 

Conclusions:

If you need to concatenate byte arrays, BlockCopy offers the best overall performance. However, in certain circumstances, especially if you can leave the concatenated arrays inside the stream, the MemoryStream can actually offer slightly better performance.

In closing, it is important to understand how the MemoryStream class works. If you use the constructor overload that takes your first byte array as a parameter, rather than the overload that sets an initial size, subsequent writes to the MemoryStream will overwrite the initial byte array, and the stream cannot be resized. It is only when one specifies an initial capacity or uses the parameterless constructor that we can do this "automatic concatenation" with writes to the stream, with automatic resizing of the object.
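Both behaviors of the byte-array constructor can be seen in a short sketch. The names FixedStreamSketch, OverwritesInitial, and WriteOverflows are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;

static class FixedStreamSketch
{
    // Hypothetical helper: a write at the start of the stream lands
    // directly in the array that was passed to the constructor.
    public static bool OverwritesInitial(byte[] initial)
    {
        using (MemoryStream ms = new MemoryStream(initial))
        {
            ms.WriteByte(0xFF);          // Position starts at 0
            return initial[0] == 0xFF;   // the caller's array was modified
        }
    }

    // Hypothetical helper: returns true if appending `extra` after the
    // initial bytes fails because the stream cannot grow.
    public static bool WriteOverflows(byte[] initial, byte[] extra)
    {
        using (MemoryStream ms = new MemoryStream(initial))
        {
            try
            {
                ms.Position = initial.Length;      // move past the initial bytes
                ms.Write(extra, 0, extra.Length);  // needs room the array does not have
                return false;
            }
            catch (NotSupportedException)
            {
                return true;                       // the stream is not expandable
            }
        }
    }
}
```

With a four-byte initial array, OverwritesInitial returns true, and WriteOverflows returns true as soon as extra contains at least one byte.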

When performance is critical, it always pays to take the time to "do the math".


Peter Bromberg is a C# MVP, MCP, and .NET consultant who has worked in the banking and financial industry for 20 years. He has architected and developed web-based corporate distributed application solutions since 1995, and focuses exclusively on the .NET Platform.