Obfuscation and Packaging of .NET Applications via Compressed Embedded Assemblies

Shows a technique for embedding all of an application's assemblies as compressed resources inside a single stub loader executable, which extracts, decompresses, and loads them into the AppDomain at startup.

O wad some Power the giftie gie us
To see oursels as ithers see us!
--Robert Burns

I was looking over some old code of mine that uses compressed embedded resources to store a Zip Code database inside an assembly when this idea came to me. Why not compress all the assemblies used by an application, store them as embedded resources, and decompress / load them into the AppDomain "on the fly" when a stub "container / loader" executable is run?  Seems like you could do it, right? Of course you can do it! There could be some interesting benefits to doing this:

1. Only need to distribute one executable since all the assemblies your application needs to run are stored in compressed form inside it.

2. Smaller size, since all the needed assemblies are compressed. The result could be 80% smaller, or more.

3. Harder to disassemble, since the only thing a casual inspector sees in the executable is compressed binary data, and the assemblies are never written to the file system. It is certainly not foolproof (anyone who pulls out the resources and decompresses them gets the assemblies back), but the concept is worthy of some study.

So I put together this application as a proof of concept.

Here's how it works:

My application has two assemblies, MainClass and SecondaryClass. MainClass.Class1 shows a Windows Form with a DataGridView and a Button on it. The Button click instantiates and makes a call to SecondaryClass.Class1, which returns an RSS feed in the form of a DataTable, which is then displayed in the Grid on the Form.  Extremely simple stuff, it's just a proof of concept. Two separate assemblies, one for the class showing the main form, and the other containing the class that gets the feed.

The solution also contains two additional projects:

1. Assembly Compressor. This is just a Windows Form that lets you load an assembly and save it in compressed form using LZMA (7-Zip) compression. I chose LZMA because it has a superior compression ratio and very fast decompression compared with the ZIP format. The 7-Zip SDK has almost everything you need to do this in managed code. Use this tool to compress all the assemblies for your application and save them with a ".dat" extension (you can use any extension you want; ".dat" was just handy). Then add these compressed assemblies to your main "StartUp" project and set the build action to "Embedded Resource". My implementation of LZMA includes a handy SevenZipHelper wrapper that makes it easy to use: a byte array goes in and a byte array comes back out, in either direction. It also sets the LZMA parameters for you, which can be very tricky.

2. StartUp. This is the "stub" executable in which you store your compressed assemblies. When it is executed, it retrieves, decompresses, and loads the application's assemblies into the AppDomain, then calls the StartApp method on MainClass.Class1, which shows your form with the grid and button. The form for StartUp itself is never shown, since it is just a utility loader form.

We don't really see the benefits of compression here since these "proof" assemblies are very small. But once you get into a real application, you will see tremendous advantages since many .NET assemblies will compress to only 20% of their original size.
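
The "byte array in, byte array out" shape of the compression helper described above can be sketched as follows. This is not the SevenZipHelper from the sample solution: as an assumption for illustration, it swaps in DeflateStream from System.IO.Compression as a stand-in for LZMA, and the class name DeflateHelper is hypothetical. The compression ratio will differ from LZMA, but the calling pattern is the same.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical stand-in for the article's SevenZipHelper.
// Uses Deflate (built into .NET) instead of LZMA.
public static class DeflateHelper
{
    // Compresses a byte array and returns the compressed bytes.
    public static byte[] Compress(byte[] input)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so the MemoryStream survives the DeflateStream;
            // disposing the DeflateStream flushes the final compressed block.
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }
            return output.ToArray();
        }
    }

    // Decompresses bytes produced by Compress and returns the original bytes.
    public static byte[] Decompress(byte[] input)
    {
        using (MemoryStream source = new MemoryStream(input))
        using (DeflateStream inflate = new DeflateStream(source, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int read;
            // Read returns 0 only at the end of the compressed data.
            while ((read = inflate.Read(chunk, 0, chunk.Length)) > 0)
            {
                output.Write(chunk, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

With a helper of this shape, producing a ".dat" file from an assembly is roughly File.WriteAllBytes("MainClass.dat", DeflateHelper.Compress(File.ReadAllBytes("MainClass.dll"))), where both file names are placeholders.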

Here is what the code for the "loader" looks like:

 
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Text;
using System.Windows.Forms;
using System.Reflection;

namespace StartUp
{
    public partial class Form1 : Form
    {
        public Type t = null;
        public Type t2 = null;
        Assembly assembly2 = null;
        public Form1()
        {
            InitializeComponent();
            LoadAssemblies();
        }

        private byte[] DecompressAssembly(byte[] assemblyBytes)
        {
            return SevenZip.Compression.LZMA.SevenZipHelper.Decompress(assemblyBytes);
        }

        // Reads an embedded resource completely into a byte array.
        private static byte[] ReadResource(string resourceName)
        {
            using (Stream stm = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
            {
                if (stm == null)
                    throw new FileNotFoundException("Embedded resource not found: " + resourceName);
                byte[] bytes = new byte[stm.Length];
                int offset = 0;
                // Read may return fewer bytes than requested, so loop until the buffer is full.
                while (offset < bytes.Length)
                {
                    int read = stm.Read(bytes, offset, bytes.Length - offset);
                    if (read == 0) throw new EndOfStreamException(resourceName);
                    offset += read;
                }
                return bytes;
            }
        }

        private void LoadAssemblies()
        {
            // Subscribe first so references between the embedded assemblies can be resolved.
            AppDomain.CurrentDomain.AssemblyResolve += new ResolveEventHandler(CurrentDomain_AssemblyResolve);
            // Uncomment the next line to examine the true names all your embedded resources are stored as.
            // string[] names = Assembly.GetExecutingAssembly().GetManifestResourceNames();

            // Decompress and load the main assembly.
            Assembly assembly = Assembly.Load(DecompressAssembly(ReadResource("StartUp.MainClass.dat")));
            t = assembly.GetType("MainClass.Class1");

            // Decompress and load the dependency; the AssemblyResolve handler returns this instance.
            assembly2 = AppDomain.CurrentDomain.Load(DecompressAssembly(ReadResource("StartUp.SecondaryClass.dat")));
            t2 = assembly2.GetType("SecondaryClass.Class1");

            // Create MainClass.Class1 and call its StartApp method via reflection.
            MethodInfo mymethod = t.GetMethod("StartApp");
            object obj = Activator.CreateInstance(t);
            mymethod.Invoke(obj, null);
        }

        Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
        {
            // handle assembly resolve events for referenced assemblies
            // can have a switch statement for all dependencies
            string s = args.Name;
            if (s == "SecondaryClass, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null")
            {
                return assembly2;
            }
            return null;
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            // this form is just a proxy "start" form for the real app and is not to be shown.
            this.Visible = false;
        }
    }
}
Note that I'm using the AssemblyResolve event. Once an AppDomain has loaded an assembly, it normally does not load it again when code later uses something in that assembly. The issue here is that the AppDomain doesn't recognize that it has already loaded the assembly, because we loaded it from a byte array.

When the AppDomain tries to load the assembly by name, the load fails, so the AssemblyResolve event is fired to let us supply it. I suppose one could make the case that this is a deficiency in the Framework, since the AppDomain only uses paths to attempt to resolve an assembly's location, and doesn't work from "memory".

In the AssemblyResolve handler, the Name property of the ResolveEventArgs holds the full name of the assembly that the AppDomain could not resolve. We already have that assembly (the AppDomain just doesn't remember where it came from), so the handler simply returns it. NOTE: I am sure there is a better way to handle this; I just haven't had the time to research it, as this is a technique I haven't used before. But it works fine the way shown, so we'll leave that for further study.
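
The handler shown earlier compares args.Name against one exact display string, so a version bump in SecondaryClass would stop it from matching. As a small sketch (the class and method names here are hypothetical, not part of the sample solution), the standard AssemblyName class can reduce a full display name to its short name, which is a sturdier thing to match on:

```csharp
using System;
using System.Reflection;

// Hypothetical helper for illustration only.
public static class ResolveNames
{
    // Extracts the simple assembly name from a full display name such as
    // "SecondaryClass, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null".
    public static string ShortName(string displayName)
    {
        return new AssemblyName(displayName).Name;
    }
}
```

Inside a handler, ResolveNames.ShortName(args.Name) could then be compared with "SecondaryClass" regardless of the version in the request.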

Here is an alternative (and better) implementation. First the code, then I'll explain:

private static Dictionary<string, Assembly> libs = new Dictionary<string, Assembly>();

Assembly CurrentDomain_AssemblyResolve(object sender, ResolveEventArgs args)
{
    // Reduce the full display name ("SecondaryClass, Version=...") to its short name.
    string shortName = new AssemblyName(args.Name).Name;

    // Return the cached instance if this assembly was already loaded.
    Assembly assembly;
    if (libs.TryGetValue(shortName, out assembly)) return assembly;

    using (Stream stm = Assembly.GetExecutingAssembly().GetManifestResourceStream("StartUp." + shortName + ".dat"))
    {
        // Not one of our embedded assemblies: return null so resolution fails normally.
        if (stm == null) return null;

        byte[] assemBytes = new byte[stm.Length];
        stm.Read(assemBytes, 0, (int)stm.Length);
        byte[] decompAssemBytes = DecompressAssembly(assemBytes);
        assembly = Assembly.Load(decompAssemBytes);
        libs[shortName] = assembly;
    }
    return assembly;
}

The reason for "caching" requested assemblies in the Dictionary is to ensure that if the CLR wants the same assembly again in your application, we return the exact same object. If we don't, a loaded assembly's types will be incompatible with those loaded previously, even though the binary images are identical.

If you load twice from the same file path, you get the previously loaded copy. However, if you load twice from a byte array, you get two identical assemblies in memory, and their types are incompatible. Incidentally, loading from a byte array gets around the problem of locked assembly files!

This introduces the possibility of "Selective Patching" of an application. The app could download any updated libraries and save them to Isolated Storage. Then in your AssemblyResolve handler, you could check for the presence of a library in the isolated storage area before loading it from the compressed resource in the executable.
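
The lookup order described above (isolated storage first, embedded resource second) might be sketched like this. It is only an illustration under stated assumptions: the class PatchLookup, the method TryReadPatch, and the convention that a patch is stored as "<shortName>.dat" in the per-user, per-assembly store are all hypothetical and not part of the sample solution.

```csharp
using System;
using System.IO;
using System.IO.IsolatedStorage;

// Hypothetical helper for illustration only.
public static class PatchLookup
{
    // Returns the bytes of "<shortName>.dat" from isolated storage,
    // or null when no downloaded update exists for that library.
    public static byte[] TryReadPatch(string shortName)
    {
        string fileName = shortName + ".dat";
        using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForAssembly())
        {
            // GetFileNames returns the files in the store root that match the pattern.
            if (store.GetFileNames(fileName).Length == 0)
            {
                return null;
            }

            using (IsolatedStorageFileStream stm =
                new IsolatedStorageFileStream(fileName, FileMode.Open, FileAccess.Read, store))
            using (MemoryStream buffer = new MemoryStream())
            {
                byte[] chunk = new byte[8192];
                int read;
                // Copy the file into memory in chunks until end of stream.
                while ((read = stm.Read(chunk, 0, chunk.Length)) > 0)
                {
                    buffer.Write(chunk, 0, read);
                }
                return buffer.ToArray();
            }
        }
    }
}
```

In the AssemblyResolve handler, the compressed bytes would then come from PatchLookup.TryReadPatch(shortName) when it returns a non-null array, and from the embedded "StartUp." + shortName + ".dat" resource otherwise, before being decompressed and loaded as before.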

This new code is not in the provided sample solution, but it should be easy to add the Dictionary line and replace the AssemblyResolve handler with the one above.

I hope this exercise is useful to you. If you have a question, suggestion, or improvement, use the "Ask a Question" link at the bottom of this article to post it to our forums. You can download the Visual Studio 2005 Solution here.

By Peter Bromberg