XML Data Compression / Decompression over the Wire

By Peter A. Bromberg, Ph.D.


Most developers who've spent any time working with XML, especially when it has to travel from a client browser to IIS over the wire and back again, are painfully aware of a major limitation: an XML document can often be three times the size of the raw data it's supposed to be carrying! As my buddy and co-site developer Robbe once said, "Why the hell would somebody want to send all that garbage over the wire?" It was said tongue-in-cheek, but there ended up being more truth to his statement than I ever imagined. There are, of course, some sensible ways to cut down on the problem: using attributes instead of elements, reducing tag names to only a couple of characters, even transmitting only the elements / nodes that actually contain data ("leaving out" the empty elements) and, if necessary, "reconstructing" the full XML Document on the server using a template of sorts. And there are other neat tricks people have come up with.
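To put a rough number on the tag overhead and on what the attribute / short-name / skip-the-empties tricks buy you, here is a minimal Python sketch. The record, its field names and the abbreviation table are all made up for illustration; they are not from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the field names are illustrative only.
record = {"firstname": "Ann", "lastname": "Lee", "middlename": "", "city": "Orlando"}

# Hypothetical abbreviation table for the attribute names.
SHORT = {"firstname": "fn", "lastname": "ln", "middlename": "mn", "city": "ct"}

def as_elements(rec):
    """One child element per field, including the empty ones."""
    root = ET.Element("customer")
    for name, value in rec.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

def as_short_attributes(rec, short_names):
    """Attributes with abbreviated names; empty fields are left out."""
    root = ET.Element("c")
    for name, value in rec.items():
        if value:
            root.set(short_names[name], value)
    return ET.tostring(root, encoding="unicode")

verbose = as_elements(record)
compact = as_short_attributes(record, SHORT)
raw_chars = sum(len(v) for v in record.values())

print(len(verbose), verbose)
print(len(compact), compact)
print(raw_chars, "characters of actual data")
```

The receiving side would need the same abbreviation table (the "template of sorts") to rebuild the full document.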



But the bottom line is, if you are in a bandwidth-sparse situation where, for example, you have people connecting to an application over 56K modems, the transmission of the XML data over the wire and back can prove to be a major bottleneck.
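For a feel of the numbers, here is a back-of-the-envelope Python sketch of one-way transfer time over a 56K modem. It assumes the nominal line rate and ignores protocol overhead and latency, so real transfers will be slower; the document sizes are just illustrative.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealised transfer time: payload bits divided by line rate (no protocol overhead)."""
    return size_bytes * 8 / link_bits_per_second

MODEM_BPS = 56_000  # nominal 56K modem rate; real-world throughput is usually lower

for size_kb in (10, 50, 110):
    seconds = transfer_seconds(size_kb * 1024, MODEM_BPS)
    print(f"{size_kb:>4} KB -> {seconds:5.1f} s one way")
```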

I looked into this problem and quite frankly, didn't find any solutions that seemed worth implementing. So I started playing around with some data compression algorithms. My first thought was to try to use Huffman or GZip in script, but I gave up pretty quickly. When you try to implement a compression algorithm in an interpreted scripting environment, you can pretty much bet that it's going to take even longer to get the data compressed than it would to just send it over the wire as-is.

So then I started looking into some components. I played with a few, got some promising results, and then I took a look at XCeed's Streaming Compression Library. This particular component offers about six different compression methods, is COM compliant (meaning it can be installed on the client and instantiated in a client-side VBScript or JavaScript function with CreateObject, or new ActiveXObject in JS), and it seems to offer the most "Bang for the Buck". I am consistently getting compression ratios of 80% to 95% and more with XML documents. Compression of a 110K tag-heavy XML Document can take four seconds or more on the client side, so obviously there are some tradeoffs to measure. But decompression of the compressed document might take only 17/100ths of a second. They give you a 20 day free trial, and then the component is only about $150 or so, and I believe it offers an unlimited royalty-free distribution license, which is what I need.
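If you want to check this kind of ratio and timing on your own documents without installing anything, here is a small Python sketch. Note the assumption: it uses the standard library's zlib as a stand-in, not the Xceed component, and the sample document is generated, repetitive filler, so the figures it prints will not match the ones quoted above.

```python
import time
import zlib

# Build a repetitive, tag-heavy document (made-up content, for illustration only).
rows = "".join(
    f"<order><id>{i}</id><customer>Customer {i % 7}</customer><status>open</status></order>"
    for i in range(500)
)
xml_text = f"<orders>{rows}</orders>"
raw = xml_text.encode("utf-8")

start = time.perf_counter()
packed = zlib.compress(raw, 9)  # level 9 = best compression, slowest
compress_secs = time.perf_counter() - start

start = time.perf_counter()
unpacked = zlib.decompress(packed)
decompress_secs = time.perf_counter() - start

# Space saved, as a percentage of the original size.
saved_percent = (len(raw) - len(packed)) / len(raw) * 100
print(f"original {len(raw)} bytes, compressed {len(packed)} bytes")
print(f"space saved: {saved_percent:.1f}%")
print(f"compress {compress_secs:.4f}s, decompress {decompress_secs:.4f}s")
```

Whether compressing pays off depends on whether the compression time is smaller than the transfer time it saves, so measure with your real payloads and your real link.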

So I set about to see what could be done with this. Long story short, they have just about ZERO documentation or examples using ASP in either VBScript or JavaScript, and so I had no choice but to blaze my own path. They do have 2 VB samples, so at least I could borrow some code and comment out data type declarations and other stuff that wasn't VBScript compliant. (Correction 8/17/01 - the Xceed people liked my article and have informed me that they do have some ASP examples now.) The real trick in working with in-memory compression components like this, however, is to understand that the compressed data is no longer a "string" - it's a byte array. So you'll need to get comfortable with the byte and wide-character variations of ASC, CHR, MID, LEN and other familiar VBScript intrinsic functions (i.e., the ones with a "B" or "W" at the end, like "LenB", "AscW" etc.), since VBScript cannot work directly with real binary data. Also, since you will be using XMLHTTP to transmit binary data (which, by the way, it does very well), you'll need to process this data differently on the server in the receiving page.
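The string-versus-byte-array distinction is not peculiar to VBScript. As a hedged illustration in Python (again with zlib standing in for the Xceed component, and a made-up sample document), the compressed payload is a bytes object, and trying to treat it as text fails:

```python
import zlib

xml_text = "<note><to>Reader</to><body>Hello</body></note>"  # illustrative document
packed = zlib.compress(xml_text.encode("utf-8"))

print(type(packed).__name__)  # the compressed payload is a byte sequence, not text
print(len(packed), "bytes")

# Treating the payload as text is an error: compressed bytes are not valid UTF-8.
try:
    packed.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False
print("decodes as text:", decodes_cleanly)

# Kept as bytes end to end, the document survives the round trip intact.
restored = zlib.decompress(packed).decode("utf-8")
print(restored == xml_text)
```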

The XMLHTTP send() method takes one parameter, which is the requestBody to use. The acceptable VARIANT input types are BSTR, SAFEARRAY of UI1 (unsigned bytes, which is what we are going to transmit here), IDispatch to an XML Document Object Model (DOM) object, and IStream *. The component automatically sets the Content-Length header for all but IStream * input types. You can read the Content-Length header in the receiving page, or you can access the information directly from the ASP Request object, as I'll show shortly.

After downloading and installing the XCeed Streaming Compression Library, you are ready to start compressing and decompressing XML (or any document, actually) in memory. Please bear in mind that the code I'm going to show you has been stripped down to the most generic usage, designed only to get the uninitiated up and running. The rest is up to you -- there are a lot of intricacies, timing issues and IIS-type issues you'll need to study. But my initial results have been so promising, I wanted to distill them into this short article as my way of "giving back" to the developer community.

We will need two pages here. First we'll show XCeedSend.htm, the client side page. You'll see the code to be able to paste any document into a textarea, press a compress / send button, have it compressed and sent over the wire via XMLHttp, and you'll see the original size, the compressed size, and the estimated compression ratio for the particular case.

The second page, XCeedReceive.asp, is the server-side listener page. This retrieves the compressed binary data from the Request body, uses the XCeed library again to decompress it, and writes it back uncompressed so that it arrives in the XMLHTTP.responseText property for display in the original page as "Proof of Concept".

First, let's browse through the code for the client-side page:

<HTML>
<HEAD><TITLE>XML Compression Test</TITLE>
<script language="VBScript">
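' Page-level variables shared by the routines below
Dim uncompsz ' length of the original text
Dim compsz ' length in bytes of the compressed data
Dim elapsed ' seconds spent compressing
Dim starttime
Dim m_vaCompressed ' byte array returned by the compressor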
Function cmdCompress( sTextToCompress)
starttimer
Dim xCompressor
Set xCompressor = CreateObject("Xceed.streamingcompression.1")
'xyz =xCompressor.License("License number is inserted here")
Dim I
Dim lTextLen
Dim lErrorNumber
With xCompressor
sTextToCompress = txtTextToCompress.value
uncompsz= len(sTextToCompress)
lblUncompressedsize.innerText =uncompsz
lTextLen = Len(sTextToCompress)
On Error Resume Next
m_vaCompressed = .Compress(sTextToCompress, True)
lErrorNumber = Err.Number
If lErrorNumber <> 0 Then
cmdCompress= "Error during compress." & vbCrLf & Err.Description & " (" & Hex(Err.Number) & ")"
exit function
End If
On Error GoTo 0
If lErrorNumber = 0 Then
If IsEmpty(m_vaCompressed) Then
cmdCompress="no output"
exit function
End If
end if
End With
Set xCompressor = Nothing
compsz =lenB(m_vaCompressed)
lblCompressedSize.innerText=compsz
cmdCompress=m_vaCompressed
endTime
End function

Sub DoCompress ()
sData = cmdCompress(txtTextToCompress.innerText)
txtTextToCompress.innerText =sData
compsz = LenB(sData) ' sData is a byte array, so count bytes (LenB), not characters (Len)
lblCompressedSize.innerText =compsz

if err <> 0 Then status.innerText = err.description
Dim xmlHttp
set xmlHttp = createObject("MSXML2.XMLHTTP")
xmlHttp.Open "POST", "http://localhost/xceedReceive.asp", false
xmlHttp.Send sData
ratio.innerText = Round((uncompsz - compsz) / uncompsz * 100, 1) & " Percent." ' scale the fraction to a percentage
rText.innerHTML = "<XMP>" & xmlHttp.ResponseText & "</XMP>"
set xmlhttp = Nothing
end sub
Function starttimer()
starttime = timer
End function
Sub endtime()
elapsed = timer - starttime
divelapsed.innerText=elapsed
end sub
</script>
</HEAD>
<BODY>
<CENTER><h3>XML COMPRESSION TEST</h3></CENTER>
<Textarea id=txtTextToCompress ROWS=20 COLS=100></textarea>
<BR><input type=button value ="compress and send" onclick = "DoCompress()">
<BR>
Uncompressed:<div id=lblUncompressedSize></div><BR>
Compressed:<div id=lblCompressedSize></div><BR>
Compression Ratio: <div id=ratio></div><BR>
Elapsed Time:<div id=divelapsed></div><BR>
<div id=status></div>
<CENTER>Return Document after decompression at server:</CENTER>
<HR>
<div id=rText></div>
</BODY>
</HTML>

Ok, let's trace what happens here. First, we render a page with a large textarea and a button that is wired to the "DoCompress()" method. We also create a few DIV tags to hold the Uncompressed and Compressed document sizes, the ratio, the elapsed time, and any status info we want to display. Finally, we have a div, "rText", to display the returned document after it's been decompressed by the listener page on the server.

When we paste an XML Document into the Textarea and press the button, "DoCompress" runs "CmdCompress" with the value of the textarea as a parameter. CmdCompress instantiates the XCeed library, does some other housekeeping, and then calls the compress method on the document:

m_vaCompressed = .Compress(sTextToCompress, True)

The length in bytes of the compressed document ("sData") is obtained, it's displayed to the user back in the same textarea (not that it's going to be of much use to look at the browser's rendition of a bytearray) and then we immediately SEND it to the server:

Dim xmlHttp
set xmlHttp = createObject("MSXML2.XMLHTTP")
xmlHttp.Open "POST", "http://localhost/xceedReceive.asp", false
xmlHttp.Send sData
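For readers who want to try the same client-side step outside a browser, here is a rough Python equivalent. It is a sketch under stated assumptions: zlib replaces the Xceed compressor (so the ASP listener in this article would not understand the payload as-is), build_compressed_post is a hypothetical helper name, and the application/octet-stream content type is my choice rather than anything the original page sets.

```python
import urllib.request
import zlib

def build_compressed_post(url, xml_text):
    """Compress the document and wrap it in a POST request with a binary body."""
    body = zlib.compress(xml_text.encode("utf-8"))
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/octet-stream")
    return request

# The URL mirrors the one in the article; nothing is sent until urlopen() is called.
req = build_compressed_post("http://localhost/xceedReceive.asp", "<doc><item>1</item></doc>")
print(req.get_method(), req.full_url, len(req.data), "bytes")

# To actually send it (requires a listener at that URL that expects zlib data):
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```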

Now we switch gears and hop over to the server side to see what's happening:

<%
SData=Request.BinaryRead(Request.TotalBytes)
finaldata = cmdDecompress(Sdata)
Response.write finaldata

Private Function cmdDecompress( stringToDecompress)
Dim xCompressor
Dim I
Dim sDecompressedText
Dim lErrorNumber
Set xCompressor = server.CreateObject("Xceed.streamingcompression.1")
'xyz =xCompressor.License("License number is inserted here")
With xCompressor
On Error Resume Next ' without this, a failed Decompress aborts the page before Err is checked
vaDecompressed = .Decompress(stringToDecompress, True)
lErrorNumber = Err.Number
If lErrorNumber <> 0 Then
cmdDecompress= "Error during decompress." & vbCrLf & Err.Description & " (" & Hex(Err.Number) & ")"
exit function
End If
On Error GoTo 0
If lErrorNumber = 0 Then
If Not IsEmpty(vaDecompressed) Then
cmdDecompress = vaDecompressed
End If
End If
End With
Set xCompressor = Nothing
End Function
%>

Again, what happens is:

1. We get the length of the binary data from Request.TotalBytes (we could also read the Content-Length header instead with Request.ServerVariables("HTTP_Content_Length") ).
2. We read the binary data with Request.BinaryRead.
3. We send the byte array to the cmdDecompress function, which does the same thing with the Xceed component as on the client, except that it calls the .Decompress method.
4. As a "proof of concept" we simply Response.write out the finaldata (the decompressed document, which should be identical to the one we originally pasted into the textarea) and send it right back to the client page, which is still sitting there loaded.
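The four steps above translate fairly directly to other server environments. As a hedged sketch only, here is the same listener written as a Python WSGI application, with zlib standing in for the Xceed component; receive_app is a hypothetical name, and the request environment at the bottom is hand-built just to exercise the handler in-process without a web server.

```python
import io
import zlib

def receive_app(environ, start_response):
    """Minimal WSGI listener: read the binary body, decompress it, echo the text."""
    total_bytes = int(environ.get("CONTENT_LENGTH") or 0)  # step 1: body length
    body = environ["wsgi.input"].read(total_bytes)         # step 2: raw bytes
    try:
        text = zlib.decompress(body)                       # step 3: decompress
        status = "200 OK"
    except zlib.error as exc:
        text = f"Error during decompress: {exc}".encode("utf-8")
        status = "400 Bad Request"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(text)))])
    return [text]                                          # step 4: send it back

def call_app(payload):
    """Invoke the handler with a hand-built environment; return (status, body)."""
    environ = {"REQUEST_METHOD": "POST",
               "CONTENT_LENGTH": str(len(payload)),
               "wsgi.input": io.BytesIO(payload)}
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(receive_app(environ, start_response))
    return captured["status"], body

status, result = call_app(zlib.compress(b"<doc><item>1</item></doc>"))
print(status, result.decode("utf-8"))
```

Unlike the stripped-down ASP page, this version reports a failed decompression with an error status instead of a normal response, which makes bad payloads easier to spot on the client.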

Finally, back on the client page, we access the XMLHTTP.responseText property to get the data that was sent back to us, and redisplay it in the rText div at the bottom of the page, with "<XMP>" example rendering tags around it so you can see the literal content.

You now have a greatly simplified, but nevertheless quite functional basis for a very efficient and powerful XML Data compression CODEC for over-the-wire data transmission. I'd be very interested in hearing from other developers who have made inroads in this area. Email me or post to our XML forum here on Eggheadcafe.com with whatever you have to share.

NOTE: Since the original writing of this article, I've created a lightweight COM wrapper component for the powerful Zlib C Library called PABZlib. This provides, among others, a "Combined" method called CompressSendReceiveDecompress that handles the XMLHTTP or ServerXMLHTTP send and receive early-bound, all within the same method. We've ramped this component all the way up to 120 requests per second using Application Center Test. The component is available for sale or a trial download HERE.

 

 

download the code that accompanies this article

 

Peter Bromberg is an independent consultant specializing in distributed .NET solutions, a Senior Programmer/Analyst in Orlando, and a co-developer of the NullSkull.com developer website. He can be reached at info@eggheadcafe.com