Binary Data Serialization

There are two ways to serialize ADO.NET objects: through the object’s own XML interface, and through the .NET Framework data formatters. So far, we have reviewed the DataSet object’s methods for serializing data to XML, and you’ve learned how to persist other objects like DataTable and DataView to XML. Let’s look now at what’s needed to serialize ADO.NET objects using the standard .NET Framework data formatters.

The big difference between methods like WriteXml and .NET Framework data formatters is that in the former case, the object itself controls its own serialization process. When .NET Framework data formatters are involved, any object can behave in one of two ways. The object can declare itself as serializable (using the Serializable attribute) and passively let the formatter extrapolate any significant information that needs to be serialized. This type of object serialization uses .NET Framework reflection to list all the properties that make up the state of an object.

The second behavior entails the object implementing the ISerializable interface and passing the formatter the data to be serialized. After this step, however, the object no longer controls the process. A class that is neither marked with the Serializable attribute nor implements the ISerializable interface can’t be serialized. No ADO.NET class declares itself as serializable, and only DataSet and DataTable implement the ISerializable interface. For example, you can’t serialize a DataColumn or a DataRow object with any of the .NET Framework formatters.
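To make the two behaviors concrete, here is a minimal sketch (the class names are illustrative, not from any library) of a passive class marked with the Serializable attribute next to an active class that implements ISerializable:

```csharp
using System;
using System.Runtime.Serialization;

// Passive serialization: the formatter uses reflection
// to discover and persist the public and private fields.
[Serializable]
public class PassiveItem
{
    public string Name;
    public int Quantity;
}

// Active serialization: the class decides for itself
// which values go into the stream.
[Serializable]
public class ActiveItem : ISerializable
{
    public string Name;
    public int Quantity;

    public ActiveItem() {}

    // Called by the formatter during serialization
    public void GetObjectData(SerializationInfo info,
        StreamingContext context)
    {
        info.AddValue("n", Name);
        info.AddValue("q", Quantity);
    }

    // Special constructor called during deserialization
    protected ActiveItem(SerializationInfo info,
        StreamingContext context)
    {
        Name = info.GetString("n");
        Quantity = info.GetInt32("q");
    }
}
```

Note that a class implementing ISerializable is still conventionally marked with the Serializable attribute; the deserialization constructor is matched by signature, not declared by the interface.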

Ordinary .NET Framework Serialization

The .NET Framework comes with two predefined formatter objects defined in the System.Runtime.Serialization.Formatters namespace—the binary formatter and the SOAP formatter. The classes that provide these two serializers are BinaryFormatter and SoapFormatter. The former is more efficient, is faster, and produces more compact output. The latter is designed for interoperability and generates a SOAP-based description of the class that can be easily consumed on non-.NET platforms.

Note

A formatter object is merely a class that implements the IFormatter interface to support the serialization of a graph of objects. The SoapFormatter and BinaryFormatter classes also implement the IRemotingFormatter interface to support remote procedure calls across AppDomains. No technical reasons prevent you from implementing custom formatters. In most cases, however, you only need to tweak the serialization process of a given class instead of creating an extension to the general serialization mechanism. Quite often, this objective can be reached simply by implementing the ISerializable interface.


The following code shows what’s needed to serialize a DataTable object using a binary formatter:

BinaryFormatter bf = new BinaryFormatter();
StreamWriter swDat = new StreamWriter(outputFile);
bf.Serialize(swDat.BaseStream, dataTable);
swDat.Close();

The Serialize method causes the formatter to flush the contents of an object to a binary stream. The Deserialize method does the reverse—it reads from a previously created binary stream, rebuilds the object, and returns it to the caller, as shown here:

BinaryFormatter bf = new BinaryFormatter();
StreamReader sr = new StreamReader(sourceFile);
DataTable dt = (DataTable) bf.Deserialize(sr.BaseStream);
sr.Close();

When you run this code, something surprising happens. Have you ever tried to serialize a DataTable object, or a DataSet object, using the binary formatter? If so, you certainly got a binary file, but one with a ton of XML in it. Unfortunately, the XML data in serialized binary files only makes them huge, without the portability and readability advantages that XML normally offers. As a result, deserializing such files can take a noticeable amount of time, often several seconds.

There is an architectural reason for this odd behavior. The DataTable and DataSet classes implement the ISerializable interface, thus making themselves responsible for the data being serialized. The ISerializable interface consists of a single method—GetObjectData—whose output the formatter takes and flushes into the output stream.

Can you guess what happens next? By design, the DataTable and DataSet classes describe themselves to serializers using an XML DiffGram document. The binary formatter takes this rather long string and appends it to the stream. In this way, DataSet and DataTable objects are always remoted and transferred using XML—which is great. Unfortunately, if you are searching for a more compact representation of persisted tables, the ordinary .NET Framework run-time serialization for ADO.NET objects is not for you. Let’s see how to work around it.
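You can verify this behavior with a quick check (a sketch; the file name is a placeholder): serialize a small DataTable with the binary formatter and scan the resulting file for the DiffGram markup embedded as text.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class DiffGramCheck
{
    static void Main()
    {
        DataTable dt = new DataTable("Items");
        dt.Columns.Add("ID", typeof(int));
        dt.Rows.Add(new object[] { 1 });

        BinaryFormatter bf = new BinaryFormatter();
        FileStream fs = new FileStream("items.dat", FileMode.Create);
        bf.Serialize(fs, dt);
        fs.Close();

        // Read the "binary" file as text and look for the
        // XML DiffGram markup the DataTable handed the formatter
        StreamReader sr = new StreamReader("items.dat");
        string raw = sr.ReadToEnd();
        sr.Close();
        Console.WriteLine(raw.IndexOf("diffgram") >= 0);
    }
}
```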

Custom Binary Serialization

To optimize the binary representation of a DataTable object (or a DataSet object), you have no choice but to map the class to an intermediate object whose serialization process is under your control. The operation breaks down into a few steps:

  1. Create a custom class, and mark it as serializable (or, alternatively, implement the ISerializable interface).

  2. Copy the key properties of the DataTable object to the members of the class. Which members you actually map is up to you. However, the list must certainly include the column names and types, plus the rows.

  3. Serialize this new class to the binary formatter, and when deserialization occurs, use the restored information to build a new instance of the DataTable object.

Let’s analyze these steps in more detail.

Creating a Serializable Ghost Class

Assuming that you need to persist only columns and rows of a DataTable object, a ghost class can be quickly created. In the following example, this ghost class is named GhostDataTable:

[Serializable]
public class GhostDataTable
{
    public GhostDataTable()
    {
        colNames = new ArrayList();
        colTypes = new ArrayList();
        dataRows = new ArrayList();
    }

    public ArrayList colNames;
    public ArrayList colTypes;
    public ArrayList dataRows;
}

This class consists of three serializable ArrayList objects that contain column names, column types, and data rows.

The serialization process now involves the GhostDataTable class rather than the DataTable object, as shown here:

private void BinarySerialize(DataTable dt, string outputFile)
{
    BinaryFormatter bf = new BinaryFormatter();
    StreamWriter swBin = new StreamWriter(outputFile);
            
    // Instantiate and fill the worker class
    GhostDataTable ghost = new GhostDataTable(); 
    CreateTableGraph(dt, ghost);

    // Serialize the object
    bf.Serialize(swBin.BaseStream, ghost);
    swBin.Close();
}

The key step here is how the DataTable object is mapped to the GhostDataTable class. The mapping takes place inside the CreateTableGraph routine.

Mapping Table Information

The CreateTableGraph routine populates the colNames array with column names and the colTypes array with the names of the data types, as shown in the following code. The dataRows array is filled with an array that represents all the values in the row.

void CreateTableGraph(DataTable dt, GhostDataTable ghost)
{
    // Insert column information (names and types)
    foreach(DataColumn col in dt.Columns)
    {
        ghost.colNames.Add(col.ColumnName); 
        ghost.colTypes.Add(col.DataType.FullName);   
    }

    // Insert rows information
    foreach(DataRow row in dt.Rows)
        ghost.dataRows.Add(row.ItemArray);
}

The DataRow object’s ItemArray property is an array of objects. It turns out to be particularly handy, as it lets you handle the contents of the entire row as a single, monolithic piece of data. Internally, the get accessor of ItemArray is implemented as a simple loop that reads and stores one column after the next. The set accessor is even more valuable, because it automatically groups all the changes in a pair of BeginEdit/EndEdit calls and fires column-changed events as appropriate.
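A minimal sketch of ItemArray in both directions (the table and values are illustrative):

```csharp
using System.Data;

DataTable dt = new DataTable("Products");
dt.Columns.Add("ID", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.Rows.Add(new object[] { 1, "Chai" });

// Get: read the entire row as a single object array
object[] values = dt.Rows[0].ItemArray;

// Set: replace every column in one assignment; internally
// the changes are wrapped in BeginEdit/EndEdit calls
dt.Rows[0].ItemArray = new object[] { 2, "Chang" };
```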

Sizing Up Serialized Data

The sample application shown in Figure 9-9 demonstrates that a DataTable object serialized using a ghost class can be up to 80 percent smaller than an identical object serialized the standard way.

Figure 9-9. The difference between ordinary and custom binary serialization.


In particular, consider the DataTable object resulting from the following query:

SELECT * FROM [Order Details]

The table contains five columns and 2155 records. Serialized to the binary formatter as a DataTable object, it takes up about half a megabyte. By using an intermediate ghost class, the output is 83 percent smaller. Looking at things the other way round, the result of the standard serialization process is about 490 percent larger than the result you obtain using the ghost class.

Of course, not all cases give you such an impressive result. In all the tests I ran on the Northwind database, however, I got an average 60 percent reduction. The more the table content consists of numbers, the more space you save. The more BLOB fields you have, the less space you save. Try running the following query, in which photo is the BLOB field that contains an employee’s picture:

SELECT photo FROM employees

The ratio of savings here is only 25 percent and represents the bottom end of the Northwind test results. Interestingly, if you add only a couple of traditional fields to the query, the ratio increases to 28 percent. The application shown in Figure 9-9 (included in this book’s sample files) is a useful tool for fine-tuning the structure of the table and the queries for better serialization results.
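To run this kind of comparison on your own tables, a short snippet is enough (the file names are placeholders for the two outputs, one produced by standard serialization and one by the ghost class):

```csharp
using System;
using System.IO;

// Sizes of the two serialized representations
long standard = new FileInfo("table_standard.dat").Length;
long ghost = new FileInfo("table_ghost.dat").Length;

// Percentage of space saved by the ghost-class representation
double saving = 100.0 * (standard - ghost) / standard;
Console.WriteLine("Space saved: {0:N0}%", saving);
```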

Deserializing Data

Once the binary data has been deserialized, you hold an instance of the ghost class that must be transformed back into a usable DataTable object. Here’s how the sample application accomplishes this:

DataTable BinaryDeserialize(string sourceFile)
{
    BinaryFormatter bf = new BinaryFormatter();
    StreamReader sr = new StreamReader(sourceFile);
    GhostDataTable ghost = 
        (GhostDataTable) bf.Deserialize(sr.BaseStream);  
    sr.Close();

    // Rebuild the DataTable object
    DataTable dt = new DataTable();

    // Add columns
    for(int i=0; i<ghost.colNames.Count; i++)
    {
        DataColumn col = new DataColumn(ghost.colNames[i].ToString(), 
            Type.GetType(ghost.colTypes[i].ToString()));     
        dt.Columns.Add(col);
    }

    // Add rows
    for(int i=0; i<ghost.dataRows.Count; i++)
    {
        DataRow row = dt.NewRow();
        row.ItemArray = (object[]) ghost.dataRows[i];
        dt.Rows.Add(row);
    }

    dt.AcceptChanges();
    return dt;
}

The information stored in the ghost arrays is used to add columns and rows to a newly created DataTable object. Figure 9-9 demonstrates the perfect equivalence of the objects obtained by deserializing a DataTable and a ghost class.

Caution

The ghost class used in the preceding sample code serializes the minimal amount of information necessary to rebuild the DataTable object. You should add new properties to track other DataColumn or DataRow properties that are significant in your own application. Note that you can’t simply serialize the DataColumn and DataRow objects as a whole because none of them is marked as serializable.
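For example, an extended ghost class might also preserve per-column nullability and read-only flags (an illustrative extension, not part of the chapter’s sample):

```csharp
using System;
using System.Collections;
using System.Data;

[Serializable]
public class GhostDataTableEx
{
    public ArrayList colNames = new ArrayList();
    public ArrayList colTypes = new ArrayList();
    public ArrayList dataRows = new ArrayList();

    // Extra per-column metadata worth carrying over
    public ArrayList colAllowNull = new ArrayList();
    public ArrayList colReadOnly = new ArrayList();

    // While mapping, alongside names and types:
    //   colAllowNull.Add(col.AllowDBNull);
    //   colReadOnly.Add(col.ReadOnly);
    // and restore them on the rebuilt DataColumn objects
    // after deserialization.
}
```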

