"The programmer's primary weapon in the never-ending battle against slow systems is to change the intramodular structure. Our first response should be to reorganize the modules' data structures."
--Frederick P. Brooks
Games are being produced with multiple gigabytes of game assets, and file sizes are projected to increase at an exponential rate in the years to come. One of the largest issues in building reusable and efficient tools is scalability: how to build tools that can manage countless assets. One way to achieve this goal is to use data compression to reduce the file size of each game asset.
A variety of software development projects employ data compression, and almost all operating systems and platforms have libraries and tools available to perform data compression for different types of situations and datasets. Fortunately, .NET 2.0 introduced some new data compression components that make the whole process very easy.
As for a definition, data compression removes redundancy from data, which can come in a lot of different forms depending on the type of data in question. On a small scale, repeated bit sequences (11111111) or repeated byte sequences (XXXXXXXX) can be transformed. On a larger scale, redundancies tend to come from sequences of varying lengths that are relatively common. Basically, data compression aims at finding algorithmic transformations of a dataset that will produce a more compact representation of the original dataset.
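To make the idea of removing redundancy concrete, here is a minimal run-length encoding sketch. This is our own illustrative example, not part of the gzip format (gzip itself uses LZ77 matching combined with Huffman coding); it simply shows how a repeated byte sequence can be transformed into a more compact representation.

```csharp
using System;
using System.Collections.Generic;

internal static class RunLengthExample
{
    // Encode each run of identical bytes as a (count, value) pair.
    // Illustrative only; real compressors use far more sophisticated
    // transformations.
    internal static byte[] Encode(byte[] input)
    {
        List<byte> output = new List<byte>();
        int i = 0;
        while (i < input.Length)
        {
            byte value = input[i];
            int run = 1;
            while (i + run < input.Length && input[i + run] == value && run < 255)
            {
                run++;
            }
            output.Add((byte)run);
            output.Add(value);
            i += run;
        }
        return output.ToArray();
    }
}
```

Eight repeated X bytes (XXXXXXXX) shrink to the two bytes (8, 'X'); data with no runs actually doubles in size, which is one reason real compressors fall back to literal encoding for incompressible input.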
Choosing the best compression algorithm depends on a number of factors, such as expected patterns and regularities in the data, storage and data persistence requirements, and both CPU and memory limits. This chapter briefly covers some data compression theory, but it mostly covers implementation of data compression using the built-in C# components.
Data compression comes in two flavors: lossy and lossless.
Lossy compression produces a representation of the original dataset that is merely “close enough” to the original. File sizes are significantly reduced by discarding a reasonable amount of data in the compression process, which is why lossy compression can produce far more compact dataset representations than lossless compression. The main problem with lossy compression is that valid data is actually lost and unrecoverable. This limitation is acceptable for images, sound files, and video clips, where humans can only perceive a subset of the actual data anyway. In the data persistence world, however, where any loss amounts to corruption, lossy compression will not suffice; storing a “close enough” representation of a data file would be useless. Note also that lossy decompression can only ever reconstruct an approximation of the original, never the exact data.
Lossless compression is a representation of the original dataset that enables reproduction of the exact contents of the original dataset by performing a decompression transformation. No data is ever lost in the compression process, making it the perfect solution for compressing data that must maintain integrity. This chapter only covers lossless data compression, because we generally want tools to maintain 100 percent data integrity unless we are dealing with image compression.
Microsoft .NET 1.1 did not include any data compression components, leaving developers to rely on third-party solutions. Newly introduced in .NET 2.0 is the System.IO.Compression namespace, which provides compression and decompression services for streams. There are currently two supported algorithms: deflate and gzip. This chapter covers the gzip algorithm exclusively.
The gzip algorithm is a lossless data format that is safe from patents. The gzip implementation provided by Microsoft is completely compatible with the unix gzip functionality, though the .NET implementation has a slightly weaker compression algorithm. The gzip implementation follows the format described in RFC 1952. Microsoft .NET 2.0 provides gzip functionality through the GZipStream class.
Another great feature of the gzip format is a cyclic redundancy checksum that is used to detect data corruption.
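A quick way to see the checksum in action is to attempt to decompress a buffer in which a byte has been flipped. This helper is our own sketch, not part of the chapter's source code; note that it expects a raw gzip buffer (without the four-byte length prefix used later in this chapter), and that the exact exception type thrown on corruption (InvalidDataException here) should be verified against your framework version.

```csharp
using System;
using System.IO;
using System.IO.Compression;

internal static class CorruptionExample
{
    // Returns true if the gzip buffer fails its integrity checks.
    internal static bool IsCorrupt(byte[] compressed)
    {
        try
        {
            using (MemoryStream input = new MemoryStream(compressed))
            using (GZipStream zipStream = new GZipStream(input, CompressionMode.Decompress))
            {
                // Read the whole stream; the checksum is validated as
                // the data is consumed, so corruption surfaces here.
                byte[] buffer = new byte[4096];
                while (zipStream.Read(buffer, 0, buffer.Length) > 0)
                {
                }
            }
            return false;
        }
        catch (InvalidDataException)
        {
            return true;
        }
    }
}
```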
The first step to use the GZipStream class is to include the appropriate namespaces.
using System;
using System.IO;
using System.IO.Compression;
The following method compresses arbitrary data stored in a byte array and returns a byte array containing the compressed data. Notice that the input data length is written as the first four bytes of the stream, so that the decompression method can decompress the data without having to determine the original size itself. This improves performance and speed at the cost of compatibility with other gzip implementations: knowing the original size of the data before compression lets us allocate exactly enough memory to hold the data after decompression.
Data is compressed on the fly as it is written into the GZipStream. Notice that the constructor for GZipStream references the memory stream that will hold the resultant data. This compression can be done against any stream object, including FileStream for files.
internal static byte[] CompressData(byte[] input)
{
    try
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Prepend the uncompressed length so the decompressor
            // can allocate an exact-sized buffer.
            output.Write(BitConverter.GetBytes(input.Length), 0, 4);

            using (GZipStream zipStream = new GZipStream(output, CompressionMode.Compress, true))
            {
                zipStream.Write(input, 0, input.Length);
            }

            return output.ToArray();
        }
    }
    catch (Exception)
    {
        return null;
    }
}
Decompression is handled in the same way as compression, except that the CompressionMode.Decompress enumeration value is used. The first step is to read the initial four bytes from the stream as an integer describing the buffer size for the decompressed data. Then the data buffer is created and the input data is decompressed and read into it.
internal static byte[] DecompressData(byte[] input)
{
    try
    {
        using (MemoryStream inputData = new MemoryStream(input))
        {
            byte[] lengthData = new byte[4];
            if (inputData.Read(lengthData, 0, 4) == 4)
            {
                int decompressedLength = BitConverter.ToInt32(lengthData, 0);
                using (GZipStream zipStream = new GZipStream(inputData, CompressionMode.Decompress))
                {
                    byte[] decompressedData = new byte[decompressedLength];

                    // Stream.Read is not guaranteed to return all of the
                    // requested bytes in a single call, so loop until the
                    // buffer is full or the stream ends.
                    int totalRead = 0;
                    while (totalRead < decompressedLength)
                    {
                        int bytesRead = zipStream.Read(decompressedData, totalRead, decompressedLength - totalRead);
                        if (bytesRead == 0)
                        {
                            break;
                        }
                        totalRead += bytesRead;
                    }

                    if (totalRead == decompressedLength)
                    {
                        return decompressedData;
                    }
                }
            }
        }
        return null;
    }
    catch (Exception)
    {
        return null;
    }
}
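As a quick sanity check, the two methods can be round-tripped. This snippet is our own usage example and assumes the CompressData and DecompressData methods defined above are in scope.

```csharp
byte[] original = System.Text.Encoding.ASCII.GetBytes(
    "Highly repetitive data compresses well. " +
    "Highly repetitive data compresses well.");

byte[] compressed = CompressData(original);
byte[] restored = DecompressData(compressed);

// Lossless compression: the restored buffer is byte-for-byte
// identical to the original.
System.Diagnostics.Debug.Assert(restored.Length == original.Length);
for (int i = 0; i < original.Length; i++)
{
    System.Diagnostics.Debug.Assert(restored[i] == original[i]);
}
```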
A powerful feature of the .NET platform is the ability to serialize objects into an XML or binary representation to make storing, sending, or transforming data extremely easy. Serialization is common practice and is used in many facets of .NET application or systems development. The BinaryFormatter class can serialize and deserialize data into a stream, which makes GZipStream a suitable target for data transformation.
The first step is to include the appropriate namespaces.
using System;
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;
The following code describes a simple serializable class that is used in the accompanying example for this chapter. It shows how to create a serializable class and properly decorate it with the SerializableAttribute.
[Serializable]
internal class TestObject
{
    private string testString;
    private int testInteger;

    public string TestString
    {
        get { return testString; }
        set { testString = value; }
    }

    public int TestInteger
    {
        get { return testInteger; }
        set { testInteger = value; }
    }

    internal TestObject()
    {
        testString = string.Empty;
        testInteger = 0;
    }
}
The next method is used to compress TestObject instances into a byte array containing the compressed data. You will notice that the code is very similar to compressing arbitrary data, except that the BinaryFormatter is in charge of writing to the GZipStream.
internal static byte[] CompressTestObject(TestObject testObject)
{
    try
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream zipStream = new GZipStream(output, CompressionMode.Compress))
            {
                BinaryFormatter formatter = new BinaryFormatter();
                formatter.Serialize(zipStream, testObject);
            }

            return output.ToArray();
        }
    }
    catch (Exception)
    {
        return null;
    }
}
Decompression works the same as the compression method, except the input data is decompressed and deserialized into a TestObject instance. This approach does not require the data length to be written to the stream because the BinaryFormatter knows how big the class data is.
internal static TestObject DecompressTestObject(byte[] input)
{
    try
    {
        using (MemoryStream output = new MemoryStream(input))
        {
            using (GZipStream zipStream = new GZipStream(output, CompressionMode.Decompress))
            {
                BinaryFormatter formatter = new BinaryFormatter();
                return (formatter.Deserialize(zipStream) as TestObject);
            }
        }
    }
    catch (Exception)
    {
        return null;
    }
}
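A short usage sketch, assuming the TestObject class and the two methods above are in scope (the property values here are our own example data):

```csharp
TestObject original = new TestObject();
original.TestString = "Hello, compression";
original.TestInteger = 42;

byte[] compressed = CompressTestObject(original);
TestObject restored = DecompressTestObject(compressed);

// The deserialized instance carries the same property values
// as the instance that was serialized.
System.Diagnostics.Debug.Assert(restored.TestString == original.TestString);
System.Diagnostics.Debug.Assert(restored.TestInteger == original.TestInteger);
```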
This chapter briefly covered some data compression theory, barely scratching the surface of a complex topic, and then jumped into implementation details for the GZipStream class introduced in .NET 2.0.
Data compression has been and always will be a crucial element of many tools, especially with the projected increase in the volume of game content over the next couple of years. Data compression also has its place in network tools, where bandwidth and transfer speed are limited.