REVIEW: PreEmptive Solutions Dotfuscator PE

By Peter A. Bromberg, Ph.D.

The hot topic in the .NET space is protecting code. PreEmptive knows how to do that; they were doing it long before .NET was public. Unlike Java, .NET will have only a few important VMs, which allows PreEmptive to do better code protection and optimization.

PreEmptive Solutions' Dotfuscator takes the reverse path from the bytecode optimizers that PreEmptive has engineered in the past: it starts with obfuscation as a goal, not optimization.

Dotfuscator includes identifier renaming, control-flow obfuscation, string encryption, and unused type/method/field removal.



Obfuscation is the technology of shrouding the facts. It's not encryption, but in the context of .NET (or Java) code, it might be better. Early in Java's life, several companies produced encrypting class loaders to fully encrypt Java classes. Decryption was done just in time, prior to execution. Although this made classes completely unreadable, the methodology suffered from a classic encryption flaw: it needed to keep the decryption key with the encrypted data. Therefore, an automated utility could be created to decrypt the code and write it out to disk. Once that happens, the fully unencrypted, unobfuscated code is in plain view.

Without argument, obfuscation (or even encryption) is not 100 percent protection. Even compiled C can be disassembled. If a hacker is persistent enough, they can find the meaning of your code. The goal of obfuscation is to knock out the 90 percent of hackers who aren't willing to go the extra step. From there, the returns on investment diminish: it takes exponentially more effort to thwart progressively fewer decompilers. Although we were told decompilers would rename unprintable identifier names (to something readable), in truth, not many did. Far fewer even attempted to untangle control flow. The goal became to stop all casual hackers and as many serious hackers as possible, up to some level of return on investment. That level varied with customer requirements and new decompiler releases.

Identifier Renaming

As expected of any good obfuscator, Dotfuscator renames all program identifiers to small, meaningless names. Rather than relying on clever names, PreEmptive invented and patented an algorithm called "overload induction" that has been in use in its Java DashO product since its inception. Overload induction works by identifying colliding sets of methods across inheritance hierarchies and renaming such sets according to some enumeration (e.g., the letters of the alphabet). Because separate colliding sets are identified and the enumeration starts at the beginning each time, method overloading is induced on a grand scale.

This effect is far stronger than normal one-to-one renaming for several reasons. First, overload induction raises the 90% casual-hacker number: it takes more work to undo overload induction than not, so fewer people will go to the trouble, and more programs stay protected. Second, in order to undo overload induction effectively, a decompiler needs to implement overload induction itself. That's a lot of work, and "college-kid" free decompilers rarely stay motivated to get that far. Alternatively, a simpler renaming scheme can be used to undo it to a lesser degree.


Regardless of the tack taken, overload induction is provably irreversible. The best overload induction undo-er will come out with a different number of unique methods than the original source code contained. It cannot be undone all the way because overload induction destroys original overloading relationships. In undone overload induction, there will be no overloaded methods. If we assume the grand designers of OO technology implemented overloaded methods as a way of creating "more readable code", then by virtue of removing that ability, the code has less information in it than before.

Apart from obfuscation, overload induction also reduces the final program size of obfuscated code. Because of its heavy reuse of identifier names, it saves significant space.
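
To make the renaming idea concrete, here is a minimal C# sketch of the core trick under heavy simplifying assumptions: it handles a single class, ignores inheritance hierarchies and return types, and treats a method's parameter list as its only collision key. The names OverloadInductionSketch, Rename, and ShortName are hypothetical, and this is not PreEmptive's patented implementation.

```csharp
using System;
using System.Collections.Generic;

public static class OverloadInductionSketch
{
    // Maps each original method, written as "Name(parameters)", to a short name.
    // A short name is reused whenever the parameter list differs, so unrelated
    // methods end up looking like overloads of "a", "b", and so on.
    public static Dictionary<string, string> Rename(
        IEnumerable<(string Name, string Parameters)> methods)
    {
        var result = new Dictionary<string, string>();
        // For each parameter list, how many short names are already taken.
        var usedPerSignature = new Dictionary<string, int>();

        foreach (var (name, parameters) in methods)
        {
            // index stays 0 when this parameter list has not been seen yet.
            usedPerSignature.TryGetValue(parameters, out int index);
            usedPerSignature[parameters] = index + 1;
            result[name + "(" + parameters + ")"] = ShortName(index);
        }
        return result;
    }

    // 0 -> "a", 1 -> "b", ..., 25 -> "z", 26 -> "aa", ...
    public static string ShortName(int index)
    {
        string s = "";
        do
        {
            s = (char)('a' + index % 26) + s;
            index = index / 26 - 1;
        } while (index >= 0);
        return s;
    }
}
```

Given GetName(), SetName(string), Compute(int,int), and Reset(), this sketch renames the first three to a and only Reset() to b, because Reset() collides with GetName() on its empty parameter list. Four distinct names collapse into two, which is where the size saving comes from.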

PreEmptive provides Dotfuscator-Community Edition free for non-commercial use. This is not crippleware; it's a full-fledged renaming obfuscator that incorporates their overload induction renaming system.

The Professional version, which I've reviewed, adds incremental obfuscation, unused-code pruning, control flow obfuscation, and more.

Incremental Obfuscation

The problem this solves: customers distributed their obfuscated code, and their own customers found a bug in the product. They wanted to issue just a patch to fix the problem, but because of obfuscation this wasn't always possible. Fixing bugs would often create or delete classes, methods, or fields, which caused subsequent obfuscation runs to rename things slightly differently. Unfortunately, how and what was renamed differently was a mystery.

Dotfuscator includes incremental obfuscation to combat that problem. Dotfuscator creates a map file to tell you how it did renaming. So if you get a stack trace from your customer, you can match it to the mapfile and find out what unobfuscated class your bug is in (obviously mapfiles are to be treated as confidential by you). However, that same mapfile can be used as input to Dotfuscator on subsequent runs to dictate that renames used previously should be used again wherever possible.

So, if you release your product and then patch a few classes, Dotfuscator can be run in such a way as to mimic its previous renaming scheme. That way, you can issue just the patched classes to your customers.
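
The two uses of the map file described above can be sketched in a few lines of C#. This is only an illustration: the map is modeled as an in-memory dictionary rather than Dotfuscator's actual map file format, the "n0", "n1" naming scheme is made up, and IncrementalRenameSketch and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class IncrementalRenameSketch
{
    // previousMap: original identifier -> obfuscated name from the last release.
    // Identifiers renamed before keep their old obfuscated name; only
    // identifiers that are new in this build receive fresh names.
    public static Dictionary<string, string> Rename(
        IEnumerable<string> identifiers,
        IReadOnlyDictionary<string, string> previousMap)
    {
        var result = new Dictionary<string, string>();
        // Never hand out a name that the previous release already used.
        var taken = new HashSet<string>(previousMap.Values);
        int next = 0;

        foreach (string id in identifiers)
        {
            if (previousMap.TryGetValue(id, out string oldName))
            {
                result[id] = oldName;
                continue;
            }
            string candidate;
            do
            {
                candidate = "n" + next;
                next++;
            } while (!taken.Add(candidate)); // Add returns false if already taken
            result[id] = candidate;
        }
        return result;
    }

    // Reverse lookup, as used when reading an obfuscated stack trace.
    public static string OriginalNameOf(
        string obfuscatedName,
        IReadOnlyDictionary<string, string> map)
    {
        foreach (var pair in map)
        {
            if (pair.Value == obfuscatedName) return pair.Key;
        }
        return obfuscatedName; // not in the map, e.g. a name excluded from renaming
    }
}
```

If the previous release mapped Parser to n0 and Lexer to n1, and the patched build adds an Optimizer class, the sketch keeps n0 and n1 unchanged and assigns n2 to Optimizer, so only the patched classes need to ship.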


String Encryption

Dotfuscator implements string encryption that is decrypted at runtime. Of course, any encryption (or specifically decryption) done at runtime is inherently insecure. That is, a smart hacker can eventually break it, but for strings present in customer code, they found it worthwhile. Effectively, you can apply a simple encryption algorithm to any strings in your application you desire. You can tell from looking at my example below that this can be highly effective!

Let's face it, if a hacker wants to get into your code, he doesn't blindly start searching renamed types. He probably does a nice grep on "Invalid License Key" which points him right to the type where license handling is performed. Searching on strings is incredibly easy. String Encryption raises the bar for the casual hacker and deters that many more non-serious hackers. This algorithm incurs a tiny performance penalty at runtime but as with pretty much everything else in Dotfuscator, this option is fully configurable.
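
As an illustration of what "a simple encryption algorithm" with runtime decryption can look like, here is a minimal C# sketch of a position-dependent XOR cipher. The class and parameter names are hypothetical. The seed 0x1918 and step 0x0202 used in the note below are my own inference from the literal a("\u1948\u1b55\u1d4f\u1f4a") in the Salamander listing later in this review, not PreEmptive's documented scheme.

```csharp
using System;
using System.Text;

public static class StringCipherSketch
{
    // XOR each character with a key that advances by `step` per position.
    // XOR is its own inverse, so the same routine both encodes and decodes.
    public static string Transform(string text, int seed, int step)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            int key = seed + step * i;
            // Truncate to 16 bits so the result is a valid char.
            sb.Append(unchecked((char)(text[i] ^ key)));
        }
        return sb.ToString();
    }
}
```

Under that assumption, StringCipherSketch.Transform("\u1948\u1b55\u1d4f\u1f4a", 0x1918, 0x0202) returns "POST", which fits the WebClient.UploadData call where the literal appears. It also shows both halves of the argument: a grep for the plain string finds nothing, yet anyone willing to study the decoder can reverse it.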

Control Flow Obfuscation

Control flow obfuscation is a strong form of protection, but comes at a cost. Whereas renaming and pruning can actually speed up execution, control flow obfuscation can degrade it.

The most popular academic form of control flow obfuscation comes in the form of opaque predicates. Collberg introduced the five terms used to evaluate obfuscating transforms, including potency, resilience, deobfuscation, cost, and stealth (foundational stuff). The term control-flow obfuscation is a bit unfortunate because it's so broad. Any form of introduction or reduction of control flow is technically obfuscation.
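
For readers who have not met the term, an opaque predicate is a condition whose outcome the obfuscator knows in advance but which is hard to deduce from the code. The following C# sketch is a textbook-style illustration only; it is not necessarily what Dotfuscator emits, and the class and method names are hypothetical.

```csharp
using System;

public static class OpaquePredicateSketch
{
    public static int AbsPlain(int x)
    {
        return x < 0 ? unchecked(-x) : x;
    }

    public static int AbsObfuscated(int x)
    {
        // x * (x + 1) multiplies two consecutive integers, so the product is
        // always even, even when the arithmetic wraps around. The test below
        // is therefore always true, but that is not obvious at a glance.
        bool alwaysTrue = unchecked((x * (x + 1)) & 1) == 0;

        if (alwaysTrue)
        {
            return x < 0 ? unchecked(-x) : x;
        }
        // Dead code: never executed, present only to mislead a reader.
        return x ^ 0x5A5A;
    }
}
```

Both methods return the same value for every int, but the second one presents a decompiler (and its human reader) with an extra branch that has to be reasoned away. The extra multiply and test also show where the runtime cost comes from.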

Dotfuscator 1.1 (Professional Edition) is released with a strong control flow obfuscation system in place. Here is a quick example:

BEFORE:
// Code snippet copyright 2000, Microsoft Corp, from the WordCount.cs sample app
public int CompareTo(Object o) {
    int n = occurrences - ((WordOccurrence)o).occurrences;
    if (n == 0) {
        n = String.Compare(word, ((WordOccurrence)o).word);
    }
    return (n);
}

This is a pretty straightforward code snippet. Now after obfuscation it decompiles to (using Jay Freeman's "Anakrino" decompiler):

public virtual int a(object A_0) {
int local0;
int local1;

local0 = this.a - (c) A_0.a;
if (local0 != 0)
goto i0;
goto i1;
while (true) {
return local1;
i0: local1 = local0;
}
i1: local0 = System.String.Compare(this.b, (c) A_0.b);
goto i0;
}

The version after control flow obfuscation is a complete mess, and it came from a simple code example (one of Microsoft's distributed code examples). PreEmptive's control flow algorithms do a pretty good job of messing up code while still maintaining complete execution consistency and code verification.

Keep in mind that the above code looks exceptionally messed up, but that's only to the decompiler. Preemptive has run thousands of classes through control-flow that won't decompile anymore but run and verify 100%.

In the early days of Java obfuscation, PreEmptive had an arms race with the decompiler companies. They set out to crash the decompilers, not just screw up decompilation. That turned out to be an impossible task for two reasons. One, the decompiler guys were smart and fixed the junk PreEmptive gave them pretty fast. Two, there were more Java VMs than there were decompilers, so it became very difficult to find obfuscations that broke all decompilers without also breaking some primitive VM.

In the .NET space, things look a bit different. First off, the control flow obfuscator is already breaking decompilers out of the box. This is a strong testament to the primitiveness of the existing decompilers. However, eventually, someone is going to need a Ph.D. dissertation and some better ones will come along.

Optimization

Dotfuscator (unlike DashO) currently includes nothing specifically for code optimization. However, the opportunity is far greater since we have a very limited number of VMs and can target performance transforms that enhance those VMs.

Pruning

It seems odd that unused-code removal can actually do anything – after all, who writes code they don't use? Well, the answer is all of us. What's more, we all use libraries and types written by other people that were written to be reusable. Reusable code implies there is contingent code that handles many cases – however, in any given application, you typically only use one or two of those many cases. Pruning figures that out and rips out all the unused code from the compiled IL (it never touches source).
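
Conceptually, this kind of pruning is a reachability analysis: start from the entry points that must survive (what Dotfuscator's configuration calls triggers), follow every call, and drop whatever is never reached. The C# sketch below assumes the call graph is already available as a dictionary of method names. A real pruner works on IL metadata and must also account for virtual dispatch and reflection, which this sketch ignores. PruningSketch and Reachable are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PruningSketch
{
    // callGraph: method -> methods it calls.
    // triggers:  entry points that must be kept no matter what.
    // Returns the set of methods to keep; everything else can be removed.
    public static HashSet<string> Reachable(
        IReadOnlyDictionary<string, string[]> callGraph,
        IEnumerable<string> triggers)
    {
        var keep = new HashSet<string>();
        var pending = new Stack<string>(triggers);

        while (pending.Count > 0)
        {
            string method = pending.Pop();
            if (!keep.Add(method)) continue; // already visited

            if (callGraph.TryGetValue(method, out string[] callees))
            {
                foreach (string callee in callees) pending.Push(callee);
            }
        }
        return keep;
    }
}
```

With a graph where Main calls Parse, Parse calls Read, and an Unused method also calls Read, using Main as the only trigger keeps Main, Parse, and Read. Unused is never reached, so it is a candidate for removal even though it references live code.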

Pruning's most visible result is the reduction in size of the executable. For many applications that are distributed on CD-ROM, the size of the application isn't often a serious worry. However, more and more applications involve a networked/distributed component or are written for embedded systems. In those cases, every byte counts.

The size reduction caused by pruning can be staggering. Some customers have reported a 70% size reduction of their executable. In its own tests, PreEmptive claims to see a solid 40% reduction using DashO (this is pruning, renaming, and general metadata reduction).

Their sample size for Dotfuscator isn't as mature as DashO's, but the results are looking similar. Pruned programs tend to run in less memory too.

What Can't be "Dotfuscated"?

Dotfuscator offers significant customization features that let the programmer specify which types or methods are used dynamically. You can tell pruning or renaming (or control flow) to leave given methods/types/fields alone. The level of customization is as deep as you want to get. Virtually any .NET assembly can be "Dotfuscated", except for those containing embedded native code (Managed C++). For interoperability with the underlying platform, the Managed C++ compiler embeds native code, called "IJW Thunk", inside .NET modules, and it is impossible to round-trip such modules through Dotfuscator. If Dotfuscator detects such a module during loading, it will issue an error and stop processing. Currently, only a small number of .NET users (those using Managed C++) cannot use Dotfuscator. For .NET, MS is strongly pushing either C# or VB.NET (not Managed C++), but if this becomes a big issue, I've been assured by the PreEmptive people that they will deal directly with it.

My Own Test with Dotfuscator

As a test, I did an obfuscation of a Class Library written in C# that "wraps" around the popular NZlib compression library by Mike Kruger. I had a lot of questions and Bill Leach of Preemptive responded quickly with a number of suggestions.

The setup for this obfuscation assumes that you are using the DLL as a library (referencing it from another, unobfuscated application), and is set to perform maximum obfuscation given that constraint.


1) It is using library mode in the general options section. This tells the renamer to leave exposed (e.g. public) types, methods, and fields alone.
2) I have no custom rename excludes in the renaming section. I also turned off the keepnamespace option, and specified a map file.
3) In the control flow section, we don't want the assembly exclusion rule. Control flow obfuscation, like renaming, is all-inclusive by default, and the rules are exclusion-based.
4) I set the String Encryption settings to include all methods in the assembly.
5) I have pruning turned on, even though it doesn't buy much in this application, since it's pretty tightly written.
6) In the trigger section, with the library option turned on, the default behavior is sufficient (that is, use all exposed members as triggers).

With these settings in place, I ran the result through Salamander; the results are below. All of the more complicated methods receive the "Decompilation not complete" message. Those methods are littered with gotos that point to nonexistent labels. In at least one type (DeflatorHuffman), the decompiler did not finish at all: there was a message at the bottom ("Decompilation failed") and not all methods showed up in the decompilation. What this means is that the decompiler is graceful enough to not crash and burn when it can't decompile the code, but the results are generally not all that useful.

Finally, this is a strong-named assembly. In my case, I need to remember to re-sign it after obfuscation (or disable strong name verification for the assembly during testing).

First, let's look at the public exposed methods in my "helper" class, the ones that I had exposed to the COM interface.
Note the methods that could not be decompiled correctly (flagged "Decompilation not complete!") and the string obfuscation (the \u-escaped literals passed to a()):

Source Code for [PABNZLib1]PABNZlib.Helper.PABNZlibHelper
--------------------------------------------------------------------------------

// Decompiled by Salamander version 1.0.6
// Copyright 2002 Remotesoft Inc. All rights reserved.
// http://www.remotesoft.com/salamander

using NZlib.Compression;
using NZlib.Streams;
using PABNZlib.Encryption;
using System;
using System.IO;
using System.Net;
using System.Text;

namespace PABNZlib.Helper
{
[ProgIdAttribute("PABNZlib.Helper")]
[ClassInterfaceAttribute(ClassInterfaceType.AutoDual)]
public class PABNZlibHelper
{

public Stream CompressStream(Stream inStream, int iCompressionLevel)
{
int i;

Stream stream = new MemoryStream();
DeflaterOutputStream deflaterOutputStream = new DeflaterOutputStream(stream, new Deflater(iCompressionLevel));
byte[] bs = new byte[1000];
for (i = inStream.Read(bs, 0, 1000); i > 0; i = inStream.Read(bs, 0, 1000))
{
deflaterOutputStream.Write(bs, 0, i);
}
deflaterOutputStream.Finish();
deflaterOutputStream.Flush();
return stream;
}

public Stream DecompressStream(Stream inStream)
{
return new InflaterInputStream(inStream);
}

public byte[] StringCompressToByte(string strInput, int iCompressionLevel)
{
try
{
byte[] bs1 = Encoding.UTF8.GetBytes(strInput);
MemoryStream memoryStream = new MemoryStream();
Stream stream = new DeflaterOutputStream(memoryStream, new Deflater(iCompressionLevel));
stream.Write(bs1, 0, (int)bs1.Length);
stream.Close();
byte[] bs3 = memoryStream.ToArray();
return bs3;
}
catch (Exception e)
{
throw new Exception(e.Message);
}
}

// Decompilation not complete! (3)
public string ByteDecompressToString(byte[] bytInput)
{
string str1;
int i;
byte[] bs;
Stream stream;
Exception e;
string str2;
str1 = "";
i = 0;
bs = new byte[4096];
stream = new InflaterInputStream(new MemoryStream(bytInput));
int j = stream.Read(bs, 0, (int)bs.Length);
if (j <= 0)
{
goto IL_0057;
}
i = j;
str1 = string.Concat(str1, Encoding.Default.GetString(bs, 0, j));
goto IL_0024;
if (true)
{
goto IL_0024;
}
stream.Close();
str2 = str1;
IL_0060: leave.s IL_0071
IL_0062: stloc.s 5
throw new Exception(e.Message);
return str2;
}

public string StringCompressToString(string strInput, int iCompressionLevel)
{
try
{
byte[] bs1 = Encoding.UTF8.GetBytes(strInput);
MemoryStream memoryStream = new MemoryStream();
Stream stream = new DeflaterOutputStream(memoryStream, new Deflater(iCompressionLevel));
stream.Write(bs1, 0, (int)bs1.Length);
stream.Close();
byte[] bs2 = memoryStream.ToArray();
string str = Encoding.Default.GetString(bs2);
return str;
}
catch (Exception e)
{
throw new Exception(e.Message);
}
}

// Decompilation not complete! (5)
public string StringDeCompressToString(string strInput)
{
string str1;
int i;
byte[] bs1;
Stream stream;
Exception e;
string str2;
str1 = "";
i = 0;
bs1 = new byte[4096];
stream = new InflaterInputStream(new MemoryStream(Encoding.Default.GetBytes(strInput)));
int j = stream.Read(bs1, 0, (int)bs1.Length);
if (j <= 0)
{
goto IL_0065;
}
i = j;
str1 = string.Concat(str1, Encoding.Default.GetString(bs1, 0, j));
goto IL_0031;
if (true)
{
goto IL_0031;
}
stream.Close();
str2 = str1;
IL_006f: leave.s IL_0080
IL_0071: stloc.s 6
throw new Exception(e.Message);
return str2;
}

public string StringCompressSendRecvDecompressString(string postData, string strURL, int iCompressionLevel)
{
WebClient webClient = new WebClient();
byte[] bs1 = null;
bs1 = StringCompressToByte(postData, iCompressionLevel);
byte[] bs2 = webClient.UploadData(strURL, a("\u1948\u1b55\u1d4f\u1f4a"), bs1);
return ByteDecompressToString(bs2);
}

public string Encrypt(string strDataToEncrypt, string strEncryptionKey, bool blnBase64)
{
return new Encryption64().Encrypt(strDataToEncrypt, strEncryptionKey, blnBase64);
}

public static string Decrypt(string strDataToDecrypt, string strEncryptionKey, bool blnBase64)
{
return new Encryption64().Decrypt(strDataToDecrypt, strEncryptionKey, blnBase64);
}
}

}

Now let's take a look at one of the internal NZlib classes:

 

// Decompiled by Salamander version 1.0.6
// Copyright 2002 Remotesoft Inc. All rights reserved.
// http://www.remotesoft.com/salamander

using NZlib.Checksums;
using NZlib.Compression;
using NZlib.Streams;
using System;
using System.IO;

namespace NZlib.GZip
{
public class GZipInputStream : InflaterInputStream
{
protected Crc32 crc = new Crc32();

protected bool eos;

private bool a;


public GZipInputStream(Stream baseInputStream) : this(baseInputStream, 4096)
{
}

public GZipInputStream(Stream baseInputStream, int size) : base(baseInputStream, new Inflater(true), size)
{
}

// Decompilation not complete! (2)
public override int Read(byte[] buf, int offset, int len)
{
int i;

if (!a)
{
b();
}
if (eos)
{
goto IL_0068;
}
else
{
goto IL_0020;
}
crc.Update(buf, offset, i);
break;
i = base.Read(buf, offset, len);
if (i <= 0) goto IL_0056 else goto IL_000d;
a();
break;
if (!inf.get_IsFinished()) goto IL_006a else goto IL_0040;
return -1;

return i;
}

// Compilation failed for this method
private void b();

// Decompilation not complete! (4)
private void a()
{
int i2;

byte[] bs = new byte[8];
int i1 = inf.get_RemainingInput();
if (i1 > 8)
{
i1 = 8;
}
Array.Copy(buf, len - inf.get_RemainingInput(), bs, 0, i1);
int j1 = 8 - i1;
while (j1 > 0)
{
int k = baseInputStream.Read(bs, 8 - j1, j1);
if (k <= 0)
{
throw new Exception(a("\u195d\u1b7b\u1d6e\u1f72\u2159\u2302\u2561\u2769\u296e\u2b0a\u2d4e\u2f4f\u3143\u3357\u357d\u3758\u3948\u3b4f\u3d48\u3f6d\u4134\u4330\u4521\u4727\u4925\u4b6a\u4d0b\u4f14\u5119\u5302\u5574\u5730\u5937\u5b35\u5d28\u5f3b\u6112"));
}
j1 -= k;
}
i2 = bs[0] & byte.MaxValue | (bs[1] & byte.MaxValue) << 8 | (bs[2] & byte.MaxValue) << 16 | bs[3] << 24;
throw new IOException(string.Concat(new object[]{a("\u195f\u1b40\u1d55\u1f4e\u2100\u2341\u2556\u2745\u2908\u2b59\u2d59\u2f43\u3110\u335f\u355d\u3745\u3955\u3b5b\u3d48\u3f5d\u4128\u436e\u4564\u4732\u4920\u4b2f\u4d25\u4f3c\u5123\u5372\u5576"), i2, a("\u193a\u1b3a\u1d7d\u1f70\u2144\u2302\u254b\u2753\u295a\u2b59\u2d0c\u2f0c"), (int)crc.get_Value()}));
int j1;

if ((bs[4] & byte.MaxValue | (bs[5] & byte.MaxValue) << 8 | (bs[6] & byte.MaxValue) << 16 | bs[7] << 24) != inf.get_TotalOut())
{
throw new IOException(a("\u1956\u1b6f\u1d71\u1f7c\u2145\u2350\u2504\u2749\u294e\u2b0a\u2d4e\u2f57\u3144\u3357\u3547\u3716\u3955\u3b53\u3d4f\u3f53\u4121\u4336\u4527\u472e"));
}
eos = true;
return;
if (i2 == (int)crc.get_Value()) goto IL_009f else goto IL_001c;
}
}

}

You can see above that Dotfuscator PE has done a pretty good job of mangling the decompilation results to the point of pure unusability. However, the code ran perfectly! Dotfuscator seems to be the most advanced and well-supported IL obfuscator on the market.

Key Features Include:

  • Complete support for .NET Framework
  • Makes application size smaller
  • Designed to stop even the best of decompilers from producing useful output.
  • Easy-to-use XML-based configuration file.
  • Generated Map files allow you to interpret stack traces.
  • Namespace/Type/Method/Field renaming using patented Overload-Induction™ renaming system
  • Enhanced Overload Induction
  • Incremental Obfuscation
  • Control Flow Obfuscation
  • Pruning - Unused Type/Method/Field removal
  • String Encryption
  • Includes GUI and command line interface suitable for integrating into build environments.
  • Complete and accurate User's Guide in PDF format

Dotfuscator Pro is $1295.00. Not cheap, but when you consider how much your organization has invested in its intellectual property, it's a lot cheaper than all the lawyers' fees you'll need to pay, with no certainty of remuneration, when your code is stolen.