In-Memory Data Compression in .NET [C#: Beta 2] PART I

By Peter A. Bromberg, Ph.D.

With XML now going back and forth over the wire, we as .NET developers gain many advantages: SOAP standards, the ability to make remote procedure calls over HTTP through firewalls, simplified and seamless validation of B2B and B2C data exchanged via XML and XML Schema, ADO.NET's ability to work with XML data as either a DataSet or an XmlDataDocument, and much more.



We also have some new problems, not the least of which is that our data, encapsulated and transmitted as a tag-dense XML document stream, can now be up to three times the size of the original data itself. This raises bandwidth concerns and other problems, particularly in multi-user or low-bandwidth (e.g. 56K modem) scenarios.

One workable solution to this issue is to use a lossless compression algorithm to compress the data before sending it and decompress it at the receiving end prior to processing (and, of course, the reverse). I've worked on a number of COM-based solutions using Zlib, including my own VB COM Zlib wrapper component, PABZlib, available here at eggheadcafe.com, and I have also used the Xceed Streaming Compression Library.

My solutions have primarily involved downloading and installing the component in a CAB file onto the client's browser via the standard <OBJECT> tag arrangement, which enables client-side JavaScript code to both compress and decompress data for streaming over the wire. The server, of course, runs its own identical component, so we get seamless, transparent transfer of large XML streams as compressed byte arrays, with average compression ratios in the 80 to 85% range. That is a significant savings in bandwidth and transmission time, at the cost of only the mere milliseconds it takes to compress or decompress these streams before normal processing. In addition, we can embed the byte arrays containing our compressed data right into an XML element using the following construct:

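// Assumes xmlDoc is an MSXML DOMDocument, oRoot is its root element, and
// bytCompressedData is the byte array returned by the compression component
var oElement = xmlDoc.createElement("CompressedData");
oRoot.appendChild(oElement);
oElement.dataType = "bin.base64";
// "bin.hex" also works, but it is larger: 2 characters per byte versus about 1.33 for base64
oElement.nodeTypedValue = bytCompressedData;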
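
The same embedding idea carries over to the .NET side. The following is a minimal sketch, assuming .NET 6 or later: the framework's System.IO.Compression.ZLibStream stands in for the compression component (it is not NZipLib, but it writes the same zlib format), and the compressed bytes go into an XmlDocument element as Base64 text rather than through MSXML's typed nodeTypedValue. The element names Root and CompressedData and the helper names ZlibCompress, WrapInXml and UnwrapFromXml are illustrative only.

```csharp
// Minimal sketch, assuming .NET 6+: ZLibStream stands in for the COM/NZipLib
// compressor; element names and helper names are illustrative only.
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml;

// Deflate a string's UTF-8 bytes into zlib (RFC 1950) format
static byte[] ZlibCompress(string text)
{
    byte[] raw = Encoding.UTF8.GetBytes(text);
    var output = new MemoryStream();
    using (var z = new ZLibStream(output, CompressionLevel.Optimal))
    {
        z.Write(raw, 0, raw.Length);
    } // disposing writes the final block and closes output; ToArray still works
    return output.ToArray();
}

// Put the compressed bytes into <Root><CompressedData> as Base64 text
static XmlDocument WrapInXml(byte[] compressed)
{
    var doc = new XmlDocument();
    XmlElement root = doc.CreateElement("Root");
    doc.AppendChild(root);
    XmlElement element = doc.CreateElement("CompressedData");
    element.InnerText = Convert.ToBase64String(compressed);
    root.AppendChild(element);
    return doc;
}

// Pull the Base64 text back out and inflate it to the original string
static string UnwrapFromXml(XmlDocument doc)
{
    string base64 = doc.DocumentElement["CompressedData"].InnerText;
    var input = new MemoryStream(Convert.FromBase64String(base64));
    using var z = new ZLibStream(input, CompressionMode.Decompress);
    using var reader = new StreamReader(z, Encoding.UTF8);
    return reader.ReadToEnd();
}

string payload = "<Orders>" +
    string.Concat(Enumerable.Repeat("<Order><Id>42</Id><Status>Shipped</Status></Order>", 50)) +
    "</Orders>";

byte[] packed = ZlibCompress(payload);
XmlDocument envelope = WrapInXml(packed);
string restored = UnwrapFromXml(envelope);

Console.WriteLine(envelope.OuterXml.Length + " chars of XML carry " + payload.Length + " chars of payload");
Console.WriteLine("Base64: " + Convert.ToBase64String(packed).Length + " chars, hex: " + Convert.ToHexString(packed).Length + " chars");
Console.WriteLine("Round trip OK: " + (restored == payload));
```

On the receiving end, any zlib decoder should be able to inflate the Base64-decoded bytes, which is the practical payoff of sticking to the standard format.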

We can certainly use compression in .NET as well, but we want to avoid the performance hit of COM Interop. Xceed has released a beta of its ZIP compression library for .NET and expects to release a .NET version of its Streaming Compression Library in early 2002. But if you want to start using compression in .NET right now, I've got something even better :>).

Mike Krüger of icSharpcode.net has released NZipLib, a free, open-source port of the Zlib Zip/GZip library, written entirely in C# for the .NET platform. Mike says, "I've ported the zip library over to C# because I needed gzip/zip compression and I didn't want to use libzip.dll or something like this. I want all in pure C#." Actually, I think Mike is being very modest: he claims all he did was "translate" the ZipLib from its Java version. However, if you look at the professionalism of his code and documentation, you'll see that he has done much more.

The big advantage of using a library like this for streaming data compression and decompression is not just that the "price is right," but that it is a direct implementation of the zlib RFCs (RFC 1950 for the zlib format, RFC 1951 for deflate, RFC 1952 for gzip). You can therefore count on any RFC-compliant zlib engine on the other end, whether COM-based, Java-based, or on any other platform, to handle your compressed data perfectly well, because the compression headers and data format should be identical. Talk about "standards"!
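
To see concretely what "identical headers" means, here is a sketch, assuming .NET 6 or later, that compresses a string with the framework's ZLibStream (used here in place of NZipLib) and checks the RFC 1950 framing: the two header bytes CMF and FLG, which together must form a multiple of 31, and the big-endian Adler-32 checksum of the uncompressed data in the last four bytes. These fields, together with the deflate data between them, are what a compliant decoder on any platform expects. The helper names Adler32 and ZlibCompress are illustrative.

```csharp
// Sketch, assuming .NET 6+: ZLibStream stands in for NZipLib's DeflaterOutputStream.
// Checks the RFC 1950 header and Adler-32 trailer that zlib decoders rely on.
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Adler-32 checksum as defined in RFC 1950
static uint Adler32(byte[] data)
{
    const uint Mod = 65521;        // largest prime below 2^16
    uint a = 1, b = 0;
    foreach (byte x in data)
    {
        a = (a + x) % Mod;
        b = (b + a) % Mod;
    }
    return (b << 16) | a;
}

static byte[] ZlibCompress(byte[] raw)
{
    var output = new MemoryStream();
    using (var z = new ZLibStream(output, CompressionLevel.Optimal))
    {
        z.Write(raw, 0, raw.Length);
    }
    return output.ToArray();       // valid even though disposing z closed output
}

byte[] original = Encoding.UTF8.GetBytes("<Root><Item>hello</Item><Item>hello</Item></Root>");
byte[] zdata = ZlibCompress(original);

// Header: CMF (method + window size) and FLG (check bits, dictionary flag, level)
byte cmf = zdata[0], flg = zdata[1];
bool isDeflate = (cmf & 0x0F) == 8;           // CM = 8 means deflate (RFC 1951)
int windowBits = (cmf >> 4) + 8;              // CINFO + 8; 15 means a 32 KB window
bool headerOk = (cmf * 256 + flg) % 31 == 0;  // FCHECK makes this a multiple of 31

// Trailer: Adler-32 of the uncompressed bytes, most significant byte first
int n = zdata.Length;
uint stored = ((uint)zdata[n - 4] << 24) | ((uint)zdata[n - 3] << 16)
            | ((uint)zdata[n - 2] << 8) | zdata[n - 1];

Console.WriteLine($"CMF=0x{cmf:X2} FLG=0x{flg:X2} deflate={isDeflate} window=2^{windowBits}");
Console.WriteLine($"header check ok={headerOk}, Adler-32 matches={stored == Adler32(original)}");
```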

Let's get into some code. We'll look at my C# Winform project, which uses Mike's NZipLib .NET assembly and its DeflaterOutputStream class to grab the text or XML you paste into one TextBox, compress it, and then display a representation of the compressed byte array in a second TextBox. When you press the Decompress button, the project uses the NZipLib InflaterInputStream class to decompress the data and redisplay it, uncompressed, in the original TextBox. The code and the Compress / DeCompress functions shown here should be sufficient for most C# developers to get started using streaming data compression in just about any program. Time permitting, I plan a sequel illustrating the use of the library with SOAP extensions to intercept and decompress a compressed SOAP message in a .NET Web Service. This is not a complex process: as you'll see from my code, the two simple functions Compress and DeCompress can easily be used in virtually anything, from a Winform app to an ASP.NET page to a Web Service.

Here's the Form1.cs page, which is 95% of the project. Approximately the first two-thirds of the listing is Winform Designer-related code created by the IDE, so you can skip down to the main functions, Compress and DeCompress, which follow the Main method. The entire Beta 2 C# Winform project is in the Zip file, which you can download from the link at the bottom of this article:

using System;
using System.Data;
using System.IO;
using System.Text;
using System.Windows.Forms;
using NZlib.Streams;
namespace NZIPLIBFORM
{
    /// <summary>
    /// Summary description for Form1.
    /// </summary>
    public class Form1 : System.Windows.Forms.Form
    {
        public byte[] crunchedData = null;
        private System.Windows.Forms.TextBox textBox1;
        private System.Windows.Forms.TextBox textBox2;
        private System.Windows.Forms.Button button1;
        private System.Windows.Forms.Button button2;
        private System.Windows.Forms.Button button3;
        private System.Windows.Forms.Button button4;
        /// <summary>
        /// Required designer variable.
        /// </summary>
        private System.ComponentModel.Container components = null;

        public Form1()
        {
            //
            // Required for Windows Form Designer support
            //
            InitializeComponent();

            //
            // TODO: Add any constructor code after InitializeComponent call
            //
        }

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        protected override void Dispose( bool disposing )
        {
            if( disposing )
            {
                if (components != null)
                {
                    components.Dispose();
                }
            }
            base.Dispose( disposing );
        }

        #region Windows Form Designer generated code
        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.textBox2 = new System.Windows.Forms.TextBox();
            this.textBox1 = new System.Windows.Forms.TextBox();
            this.button1 = new System.Windows.Forms.Button();
            this.button2 = new System.Windows.Forms.Button();
            this.button3 = new System.Windows.Forms.Button();
            this.button4 = new System.Windows.Forms.Button();
            this.SuspendLayout();
            //
            // textBox2
            //
            this.textBox2.Location = new System.Drawing.Point(24, 160);
            this.textBox2.MaxLength = 327670;
            this.textBox2.Multiline = true;
            this.textBox2.Name = "textBox2";
            this.textBox2.Size = new System.Drawing.Size(512, 128);
            this.textBox2.TabIndex = 1;
            this.textBox2.Text = "";
            //
            // textBox1
            //
            this.textBox1.Location = new System.Drawing.Point(24, 8);
            this.textBox1.MaxLength = 327670;
            this.textBox1.Multiline = true;
            this.textBox1.Name = "textBox1";
            this.textBox1.Size = new System.Drawing.Size(512, 144);
            this.textBox1.TabIndex = 0;
            this.textBox1.Text = "";
            //
            // button1
            //
            this.button1.Location = new System.Drawing.Point(32, 296);
            this.button1.Name = "button1";
            this.button1.Size = new System.Drawing.Size(80, 24);
            this.button1.TabIndex = 2;
            this.button1.Text = "Compress";
            this.button1.Click += new System.EventHandler(this.button1_Click);
            //
            // button2
            //
            this.button2.Location = new System.Drawing.Point(120, 296);
            this.button2.Name = "button2";
            this.button2.Size = new System.Drawing.Size(88, 24);
            this.button2.TabIndex = 3;
            this.button2.Text = "Decompress";
            this.button2.Click += new System.EventHandler(this.button2_Click);
            //
            // button3
            //
            this.button3.Location = new System.Drawing.Point(216, 296);
            this.button3.Name = "button3";
            this.button3.Size = new System.Drawing.Size(88, 24);
            this.button3.TabIndex = 4;
            this.button3.Text = "Clear";
            this.button3.Click += new System.EventHandler(this.button3_Click);
            //
            // button4
            //
            this.button4.Location = new System.Drawing.Point(312, 296);
            this.button4.Name = "button4";
            this.button4.Size = new System.Drawing.Size(80, 24);
            this.button4.TabIndex = 5;
            this.button4.Text = "Quit";
            this.button4.Click += new System.EventHandler(this.button4_Click);
            //
            // Form1
            //
            this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
            this.ClientSize = new System.Drawing.Size(568, 349);
            this.Controls.AddRange(new System.Windows.Forms.Control[] {
                this.button4,
                this.button3,
                this.button2,
                this.button1,
                this.textBox2,
                this.textBox1});
            this.HelpButton = true;
            this.Name = "Form1";
            this.StartPosition = System.Windows.Forms.FormStartPosition.CenterScreen;
            this.Text = "NZipLib CODEC Test Harness";
            this.ResumeLayout(false);

        }
        #endregion

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            Application.Run(new Form1());
        }
        // Compress method: returns the zlib-compressed UTF-8 bytes of strInput,
        // which button1_Click stores in the public crunchedData field
        private byte[] Compress(string strInput)
        {
            try
            {
                byte[] bytData = System.Text.Encoding.UTF8.GetBytes(strInput);
                MemoryStream ms = new MemoryStream();
                // Closing the DeflaterOutputStream (end of the using block)
                // writes the final compressed block to ms
                using (Stream s = new DeflaterOutputStream(ms))
                {
                    s.Write(bytData, 0, bytData.Length);
                }
                // MemoryStream.ToArray works even after the stream has been closed
                byte[] compressedData = ms.ToArray();
                // show the user what's going to happen---
                MessageBox.Show("Original: " + bytData.Length + " bytes, Compressed: " + compressedData.Length + " bytes");
                return compressedData;
            }
            catch (Exception e)
            {
                MessageBox.Show(e.ToString());
                return null;
            }
        }

        // Decompress method: accepts a compressed byte array and returns the original string
        private string DeCompress(byte[] bytInput)
        {
            try
            {
                using (Stream s2 = new InflaterInputStream(new MemoryStream(bytInput)))
                using (MemoryStream output = new MemoryStream())
                {
                    byte[] writeData = new byte[4096];
                    int size;
                    // Read returns 0 once all of the compressed data has been inflated
                    while ((size = s2.Read(writeData, 0, writeData.Length)) > 0)
                    {
                        output.Write(writeData, 0, size);
                    }
                    // Decode all bytes at once with UTF-8, the encoding Compress used;
                    // decoding each 4096-byte chunk separately (or as ASCII) would
                    // corrupt non-ASCII characters
                    return System.Text.Encoding.UTF8.GetString(output.ToArray());
                }
            }
            catch (Exception e)
            {
                MessageBox.Show(e.ToString());
                return null;
            }
        }

        // Various button handlers follow--
        private void button1_Click(object sender, System.EventArgs e)
        {
            // handler for Compress button
            crunchedData = Compress(textBox1.Text);
            if (crunchedData != null)
            {
                // Display the compressed bytes as Base64; decoding binary data
                // as ASCII text would turn every byte above 127 into '?'
                textBox2.Text = System.Convert.ToBase64String(crunchedData);
            }
        }

        private void button2_Click(object sender, System.EventArgs e)
        {
            // handler for Decompress button: nothing to do until Compress has run
            if (crunchedData == null)
            {
                MessageBox.Show("Nothing to decompress yet. Press Compress first.");
                return;
            }
            textBox1.Text = DeCompress(crunchedData);
        }

        private void button3_Click(object sender, System.EventArgs e)
        {
            textBox1.Clear();
            textBox2.Clear();
        }

        private void button4_Click(object sender, System.EventArgs e)
        {
            Application.Exit();
        }
    }
}
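
For comparison, later versions of .NET include zlib support in the framework itself. The following is a minimal sketch, assuming .NET 6 or later, of a Compress / DeCompress pair with the same shape as the two functions above but built on System.IO.Compression.ZLibStream instead of NZipLib; it is not the project's code. It uses UTF-8 on both sides and lets Stream.CopyTo handle the read loop.

```csharp
// Minimal sketch, assuming .NET 6+. Not the article's code: it uses the built-in
// ZLibStream instead of NZipLib but keeps the same Compress/DeCompress shape.
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Compress a string's UTF-8 bytes into an in-memory zlib stream
static byte[] Compress(string text)
{
    byte[] raw = Encoding.UTF8.GetBytes(text);
    using var output = new MemoryStream();
    // leaveOpen: true keeps output usable after the compressor is disposed
    using (var z = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        z.Write(raw, 0, raw.Length);
    } // disposing flushes the last deflate block and appends the Adler-32 trailer
    return output.ToArray();
}

// Inflate a zlib byte array and decode it with the same encoding (UTF-8)
static string DeCompress(byte[] data)
{
    using var z = new ZLibStream(new MemoryStream(data), CompressionMode.Decompress);
    using var result = new MemoryStream();
    z.CopyTo(result);              // replaces a manual 4096-byte read loop
    return Encoding.UTF8.GetString(result.ToArray());
}

string xml = "<Customer name=\"Müller\"><City>Orlando</City><Note>Grüße, 100 €</Note></Customer>";
byte[] compressedBytes = Compress(xml);
string back = DeCompress(compressedBytes);

Console.WriteLine($"{Encoding.UTF8.GetByteCount(xml)} bytes -> {compressedBytes.Length} bytes, round trip ok: {back == xml}");
```

Because both versions produce and consume the zlib format, bytes from this sketch should be readable by an RFC 1950 decoder such as NZipLib's InflaterInputStream, though that particular pairing is an expectation based on the shared format rather than something verified here. Note that very short strings like this one may not shrink at all; the savings show up on larger documents.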

Aside from the Winform code generated by the IDE, the actual "meat" of the process is in the two functions that follow Main: Compress and DeCompress. The code above will compress and decompress text strings of essentially any size that fits in memory, using the default Deflater settings. On most XML documents, this will provide compression ratios on the order of 70 to 80 percent. A side benefit of compression is that your compressed XML byte stream is unreadable to casual inspection until you decompress it. Keep in mind, however, that this is not encryption: anyone with a zlib-compatible decoder can restore the original, so sensitive data still needs real encryption. And there you have it - seamless, "standards-based" data compression in .NET! Thanks to Mike Krüger for an excellent job on this library. For a follow-up on this with ASP.NET, see PART II!
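
To check ratios like these against your own data, a small measurement helps. This is a sketch, assuming .NET 6 or later, with ZLibStream at CompressionLevel.Optimal standing in for NZipLib's default Deflater, so its numbers will not match the library's exactly. PercentSaved is a hypothetical helper computing 100 × (1 − compressed size / original size), and the sample XML is invented and deliberately repetitive.

```csharp
// Sketch, assuming .NET 6+; ZLibStream's Optimal level stands in for NZipLib's
// default Deflater, and the sample XML below is invented for illustration.
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Percentage of bytes saved: 100 * (1 - compressed / original)
static double PercentSaved(string text)
{
    byte[] raw = Encoding.UTF8.GetBytes(text);
    var output = new MemoryStream();
    using (var z = new ZLibStream(output, CompressionLevel.Optimal))
    {
        z.Write(raw, 0, raw.Length);
    }
    return 100.0 * (1.0 - (double)output.ToArray().Length / raw.Length);
}

// Build a tag-dense XML document with 200 similar rows
var sb = new StringBuilder("<Orders>");
for (int i = 0; i < 200; i++)
{
    sb.Append($"<Order><OrderId>{10000 + i}</OrderId><Status>Shipped</Status><Carrier>UPS</Carrier></Order>");
}
sb.Append("</Orders>");

Console.WriteLine($"{Encoding.UTF8.GetByteCount(sb.ToString())} bytes of XML, {PercentSaved(sb.ToString()):F1}% saved");
```

Real documents with more varied content will usually save less than a sample this repetitive, so it is worth measuring with your own payloads.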

Download the code that accompanies this article

 

Peter Bromberg is an independent consultant specializing in distributed .NET solutions in Orlando and a co-developer of the NullSkull.com developer website. He can be reached at info@eggheadcafe.com