Chapter 15. Compressing Data to Reduce Memory Footprint

 

The programmer’s primary weapon in the never-ending battle against slow systems is to change the intramodular structure. Our first response should be to reorganize the modules’ data structures.

 
 --Frederick P. Brooks

Games are being produced with multiple gigabytes of game assets, and file sizes are projected to grow at an exponential rate in the years to come. One of the biggest challenges in building reusable and efficient tools is scalability: building tools that can manage countless assets. One way to achieve this goal is to use data compression to reduce the file size of each game asset.

A variety of software development projects employ data compression, and almost all operating systems and platforms provide libraries and tools for compressing different kinds of datasets. Fortunately, .NET 2.0 introduced some new data compression components that make the whole process very easy.

As for a definition, data compression removes redundancy from data, and redundancy takes many different forms depending on the type of data in question. On a small scale, repeated bit sequences (11111111) or repeated byte sequences (XXXXXXXX) can be transformed into shorter descriptions. On a larger scale, redundancy tends to come from sequences of varying lengths that are relatively common. In short, data compression seeks algorithmic transformations of a dataset that produce a more compact representation of the original.
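To make redundancy removal concrete, the following sketch shows a naive run-length encoder that collapses each run of a repeated byte into a (count, value) pair. The method is purely illustrative and assumes a using directive for System.IO; the deflate algorithm underlying gzip is far more sophisticated, but the principle of replacing repetition with a shorter description is the same.

// Naive run-length encoding: each run of a repeated byte is stored
// as a (count, value) pair, so "XXXXXXXX" (8 bytes) becomes 2 bytes.
internal static byte[] RunLengthEncode(byte[] input)
{
    using (MemoryStream output = new MemoryStream())
    {
        int i = 0;
        while (i < input.Length)
        {
            byte value = input[i];
            int runLength = 1;

            // Count how many times the current byte repeats (capped
            // at 255 so the count fits in a single byte).
            while (i + runLength < input.Length &&
                   input[i + runLength] == value &&
                   runLength < 255)
            {
                runLength++;
            }

            output.WriteByte((byte)runLength);
            output.WriteByte(value);
            i += runLength;
        }

        return output.ToArray();
    }
}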

Choosing the best compression algorithm depends on a number of factors, such as expected patterns and regularities in the data, storage and data persistence requirements, and both CPU and memory limits. This chapter briefly covers some data compression theory, but it mostly covers implementation of data compression using the built-in C# components.

Types of Compression

Data compression comes in two flavors: lossy and lossless.

Lossy compression produces a representation that is “close enough” to the original dataset. File sizes are significantly reduced by discarding a reasonable amount of data in the compression process, so lossy compression can produce far more compact dataset representations than lossless compression. The main problem with lossy compression is that valid data is actually lost and unrecoverable, but this limitation is acceptable for images, sound files, and video clips, where data loss is tolerable because humans can only perceive a subset of the actual data anyway. In the data persistence world, where any loss of data amounts to corruption, lossy compression algorithms will not suffice; storing a “close enough” representation of a data file would be useless. Note also that decompressing lossy-compressed data can only ever reproduce an approximation of the original, because the discarded information is gone for good.

Lossless compression produces a representation from which the exact contents of the original dataset can be reproduced by performing a decompression transformation. No data is ever lost in the compression process, making it the right choice for compressing data that must maintain integrity. This chapter covers only lossless data compression, because tools generally must maintain 100 percent data integrity unless they are dealing with something like image compression.

GZipStream Compression in .NET 2.0

Microsoft .NET 1.1 did not include any data compression components, leaving developers to rely on third-party solutions. New in .NET 2.0 is the System.IO.Compression namespace, which provides compression and decompression services for streams. Two algorithms are currently supported: deflate and gzip. This chapter covers the gzip algorithm exclusively.

The gzip algorithm is a lossless data format that is safe from patents. The gzip implementation provided by Microsoft is fully compatible with the Unix gzip format, though the .NET implementation achieves somewhat weaker compression ratios. The implementation follows the format described in RFC 1952. Microsoft .NET 2.0 provides gzip functionality through the GZipStream class.

Another great feature of the gzip format is its cyclic redundancy check (CRC-32), which is used to detect data corruption.
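To give a sense of how that checksum works, here is a minimal bitwise CRC-32 routine using the same reflected polynomial (0xEDB88320) that gzip uses. This is purely illustrative: GZipStream computes and verifies the checksum internally, so you never need to write this yourself, and production implementations use a precomputed lookup table for speed.

// Bitwise CRC-32 over a buffer, polynomial 0xEDB88320 (as used by
// gzip). Clarity is favored over speed here.
internal static uint Crc32(byte[] data)
{
    uint crc = 0xFFFFFFFF;

    foreach (byte b in data)
    {
        crc ^= b;

        // Process one bit at a time, folding in the polynomial
        // whenever the low bit is set.
        for (int bit = 0; bit < 8; bit++)
        {
            if ((crc & 1) != 0)
            {
                crc = (crc >> 1) ^ 0xEDB88320;
            }
            else
            {
                crc >>= 1;
            }
        }
    }

    return ~crc;
}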

Note

The GZipStream class cannot be used to compress files larger than four gigabytes in size.

Implementation for Arbitrary Data

The first step in using the GZipStream class is to include the appropriate namespaces.

using System;
using System.IO;
using System.IO.Compression;

The following method compresses arbitrary data stored in a byte array and returns a byte array containing the compressed data. Notice that the length of the input data is written as the first four bytes of the stream, so that the decompression method can size its output buffer without having to determine the original length of the data some other way. This improves performance at the cost of compatibility with other gzip implementations: knowing the original size of the data up front lets us allocate exactly enough memory to store the data after decompression.

Data is compressed on the fly as it is written into the GZipStream. Notice that the constructor for GZipStream references the memory stream that will hold the resultant data. This compression can be done against any stream object, including a FileStream for files.

internal static byte[] CompressData(byte[] input)
{
    try
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Prefix the stream with the uncompressed length so the
            // decompressor can allocate its buffer up front.
            output.Write(BitConverter.GetBytes(input.Length), 0, 4);

            // leaveOpen is true so the memory stream survives disposal
            // of the GZipStream, which flushes the final block.
            using (GZipStream zipStream = new GZipStream(output,
                                                 CompressionMode.Compress, true))
            {
                zipStream.Write(input, 0, input.Length);
            }

            return output.ToArray();
        }
    }
    catch (Exception)
    {
        return null;
    }
}
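Using the method is straightforward; the following sketch (with a hypothetical asset path) compresses the contents of a file and reports the savings.

byte[] original = File.ReadAllBytes(@"C:\GameAssets\level01.dat"); // hypothetical path
byte[] compressed = CompressData(original);

if (compressed != null)
{
    Console.WriteLine("Original: {0} bytes, compressed: {1} bytes",
                      original.Length, compressed.Length);
}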

Decompression is handled in the same way as compression, except that the CompressionMode.Decompress enumeration value is used. The first step is to read the initial four bytes from the stream as an integer describing the buffer size for the decompressed data. Then the data buffer is created and the input data is decompressed into it. Note that Stream.Read is not guaranteed to return all of the requested bytes in a single call, so the method below reads in a loop until the buffer is full.

internal static byte[] DecompressData(byte[] input)
{
    try
    {
        using (MemoryStream inputData = new MemoryStream(input))
        {
            // Read the four-byte length prefix written by CompressData.
            byte[] lengthData = new byte[4];

            if (inputData.Read(lengthData, 0, 4) == 4)
            {
                int decompressedLength = BitConverter.ToInt32(lengthData, 0);

                using (GZipStream zipStream = new GZipStream(inputData,
                                                    CompressionMode.Decompress))
                {
                    byte[] decompressedData = new byte[decompressedLength];

                    // Read may return fewer bytes than requested, so
                    // loop until the buffer is full or the stream ends.
                    int totalRead = 0;
                    while (totalRead < decompressedLength)
                    {
                        int bytesRead = zipStream.Read(decompressedData,
                                                       totalRead,
                                                       decompressedLength - totalRead);
                        if (bytesRead == 0)
                        {
                            break;
                        }

                        totalRead += bytesRead;
                    }

                    if (totalRead == decompressedLength)
                    {
                        return decompressedData;
                    }
                }
            }
        }

        return null;
    }
    catch (Exception)
    {
        return null;
    }
}
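Continuing the hypothetical example above, a round trip through both methods should reproduce the input byte for byte; a quick sanity check might look like this.

byte[] restored = DecompressData(compressed);

// Lossless compression guarantees an exact byte-for-byte match.
bool intact = restored != null && restored.Length == original.Length;

for (int i = 0; intact && i < original.Length; i++)
{
    intact = (original[i] == restored[i]);
}

Console.WriteLine(intact ? "Data intact" : "Data corrupted");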

Implementation for Serializable Objects

A powerful feature of the .NET platform is the ability to serialize objects into an XML or binary representation, making it extremely easy to store, send, or transform data. Serialization is common practice and is used in many facets of .NET application and systems development. The BinaryFormatter class serializes and deserializes data to and from a stream, which makes GZipStream a suitable target for the transformation.

The first step is to include the appropriate namespaces.

using System;
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

The following code describes a simple serializable class that is used in the accompanying example for this chapter. It shows how to create a serializable class and properly decorate it with the SerializableAttribute.

[Serializable]
internal class TestObject
{
    private string testString;
    private int testInteger;

    public string TestString
    {
        get { return testString; }
        set { testString = value; }
    }

    public int TestInteger
    {
        get { return testInteger; }
        set { testInteger = value; }
    }
    
    internal TestObject()
    {
        testString = string.Empty;
        testInteger = 0;
    }
}
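Before adding compression, it is worth seeing plain binary serialization on its own. The sketch below is an illustrative helper, not part of the accompanying example: it writes the object graph directly into a memory stream, and the compressed version that follows simply routes the formatter's output through a GZipStream instead.

// Plain binary serialization for comparison: BinaryFormatter writes
// the object graph directly into the memory stream, uncompressed.
internal static byte[] SerializeTestObject(TestObject testObject)
{
    using (MemoryStream output = new MemoryStream())
    {
        BinaryFormatter formatter = new BinaryFormatter();
        formatter.Serialize(output, testObject);
        return output.ToArray();
    }
}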

The next method compresses a TestObject instance into a byte array containing the compressed data. You will notice that the code is very similar to compressing arbitrary data, except that the BinaryFormatter is in charge of writing to the GZipStream.

internal static byte[] CompressTestObject(TestObject testObject)
{
    try
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream zipStream = new GZipStream(output,
                                                CompressionMode.Compress))
            {
                // The formatter writes directly into the compression
                // stream, which compresses on the fly.
                BinaryFormatter formatter = new BinaryFormatter();
                formatter.Serialize(zipStream, testObject);
            }

            // Disposing the GZipStream closes the memory stream as
            // well, but ToArray() remains valid on a closed MemoryStream.
            return output.ToArray();
        }
    }
    catch (Exception)
    {
        return null;
    }
}

Decompression works the same way as compression, except that the input data is decompressed and deserialized into a TestObject instance. This approach does not require the data length to be written to the stream, because the BinaryFormatter itself knows how to reconstruct the object from the serialized stream.

internal static TestObject DecompressTestObject(byte[] input)
{
    try
    {
        using (MemoryStream inputData = new MemoryStream(input))
        {
            using (GZipStream zipStream = new GZipStream(inputData,
                                                CompressionMode.Decompress))
            {
                // The formatter reads the decompressed bytes directly
                // from the compression stream.
                BinaryFormatter formatter = new BinaryFormatter();
                return (formatter.Deserialize(zipStream) as TestObject);
            }
        }
    }
    catch (Exception)
    {
        return null;
    }
}
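Putting the two methods together, a round trip might look like the following sketch.

TestObject original = new TestObject();
original.TestString = "Hello, compression";
original.TestInteger = 42;

byte[] compressed = CompressTestObject(original);
TestObject restored = DecompressTestObject(compressed);

if (restored != null)
{
    // Both properties survive the compress/decompress round trip.
    Console.WriteLine("{0} : {1}", restored.TestString, restored.TestInteger);
}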

Conclusion

This chapter briefly covered some data compression theory, barely scratching the surface of a complex topic, and then jumped into implementation details for the GZipStream class introduced in .NET 2.0.

Data compression has been and always will be a crucial element of many tools, especially given the projected growth in the volume of game content over the next few years. Data compression also has its place in network tools, where bandwidth and transfer speed are limited.

 
