Chapter 18. File Format Differences

In this chapter

A Brief History of File Formats 366

Using the New Binary File Format: BIFF12 368

Using the New XML File Formats: XLSX and XLSM 368

The Future of File Formats 370

Microsoft does not introduce file format changes lightly. File sharing becomes a nightmare when someone using Excel 2007 tries to share files with someone using Excel 2003.

Excel 2007 offers three new file formats. This chapter discusses these three file formats. Chapter 19, “Compatibility Mode,” discusses how you can continue to share files with people using older versions of Excel.

Note

A fourth file format, the .XLAM file type, can be used by developers to distribute add-ins to extend the functionality of Excel. This file type is beyond the scope of this book.

A Brief History of File Formats

Excel has traditionally stored workbooks in Binary Interchange File Format (BIFF). The BIFF specification has changed occasionally over time.

In 1993, when Excel expanded to 16,384 rows, Microsoft began using BIFF5 format. In 1993, most companies did not have corporate local area networks (LANs); a file format conversion therefore usually affected just one person on one computer. If you had upgraded from Excel 4 to Excel 5, as long as you had a way to convert your Excel 4 files to Excel’s new BIFF5 format, everything was fine.

In 1997, Microsoft introduced a major file change, BIFF8. This version of BIFF allowed 65,536 rows. The rise of the Internet and email meant that far more people were now sharing files. Excel 97 offered a way to save files in the old format in case you needed to share files with a person using prior versions of Excel.

All BIFF versions are proprietary formats. Figure 18.1 shows a simple Excel 2003 spreadsheet and the corresponding BIFF, as viewed in Notepad. You would certainly never be able to open a Notepad window and begin typing a new spreadsheet. Similarly, it would be very difficult for other applications to extract data from the BIFF format.

Figure 18.1. BIFF files are difficult for other applications to read.

Image

In Excel 2000, Microsoft flirted with a new HTML file format. By default, files were stored as XLS files in BIFF8 format. However, you could save a file as an HTML file and later open that HTML file in Excel 2000. With some limitations, most contents of the file and formatting could be successfully round-tripped from Excel to HTML and back to Excel.

This produced an interesting new paradigm: It would be possible for any program that could read or write text files to extract data from the Excel HTML file. Figure 18.2 shows the Figure 18.1 spreadsheet saved in HTML format. A program other than Excel could easily read or produce this format.

Figure 18.2. Starting in Excel 2000, you could choose to save your Excel files as HTML.

Image

Using HTML made sense in 1998–2000. The rise of the Internet made HTML a very popular format. However, although HTML is a great language for the display of information, it is not necessarily a smart language.

In 1998, the World Wide Web Consortium published the first 1.0 specification for a new language called Extensible Markup Language (XML), which presents data that any platform or application can read. Like HTML, XML is a simple text file that can be read or created with Notepad. Excel 2002 offered a way to export data in XML. Excel 2003 continued to use BIFF8 as the standard file format, but you could choose to save a workbook in XML format. When you later opened the XML file in Excel, all the formulas and formatting would be successfully round-tripped. XML in Excel 2003 did not support VBA or charts.

Figure 18.3 shows the spreadsheet from Figures 18.1 and 18.2, saved as XML in Excel 2003.

Figure 18.3. Excel 2003 offered support for XML formats.

Image

There are a number of advantages to XML. Because an XML file is a simple text file, any program can easily read data from it. This file format is also less prone to corruption than BIFF. If you randomly wipe out several bytes of a BIFF file, it is likely that the file will be corrupt and no longer open in Excel. If you truncate or corrupt several bytes of an XML file, the rest of the data is still readable in Excel.

Excel 2007 offers three new official file formats—BIFF12, XLSX, and XLSM—described in the following sections. In addition, Excel offers support for BIFF8 and even BIFF5, in case you have files floating around from Excel 95.

Using the New Binary File Format: BIFF12

With Excel 2007’s increase in rows and columns, BIFF8 would no longer work. Excel 2007 can save files in a new binary file format known as BIFF12. Files stored in BIFF12 have an .xlsb file extension. The Save As dialog box calls this type of file Excel Binary Workbook. For the first time, the binary workbook is not the default method for saving in Excel.

BIFF12 suffers from the same problems as all previous BIFF versions: It is difficult for other applications to read from or write to BIFF formats, and if parts of the BIFF12 file become corrupted or truncated, Excel has a difficult time successfully loading the file.

If you are extremely concerned with performance issues, you might want to use BIFF12 because a large BIFF12 file loads more quickly and saves more quickly than the new XML formats.

Using the New XML File Formats: XLSX and XLSM

XML in Excel 2003 was almost an ideal solution: Files could be round-tripped from Excel to XML and back to Excel, provided that the files did not include VBA macros, charts, or other embedded images.

Excel 2007 now offers complete 100% support for every feature in the new XML file formats. Workbooks can contain charts, tables, WordArt, SmartArt, shapes, and images. For security purposes, Excel supports XML file formats that are macro free and file formats that are macro-enabled. These are the two XML file formats that Excel 2007 supports:

  • XLSX—Files stored with the .xlsx extension are the default file type in Excel 2007. This XML file format does not allow macros.
  • XLSM—Files stored with the .xlsm extension are XML files that allow for the inclusion of VBA macros.

The new XLSX and XLSM file formats are actually zip files, which makes it easy to look inside the file formats.

Figure 18.4 shows a worksheet that has a number of elements. It has a table in Row 21, WordArt, SmartArt, a chart, a shape with a glow effect, and clip art with a reflection.

Figure 18.4. This workbook is saved in XLSM, one of the new XML formats.

Image

To look inside any Excel 2007 document, you follow these steps:

  1. In Windows Explorer, right-click the document name, which is in the format as filename.xlsm, and choose Rename.
  2. Change the file extension to .zip. Windows warns you that if you change a file extension, the file may become unusable.
  3. Click Yes to confirm the change.
  4. Open the ZIP file with WinZip or any other zip utility.

As shown in Figure 18.5, inside the ZIP file, you can see several XML components. The embedded image is included in the ZIP file. All the settings and styles, drawings, and data are stored as separate XML files within the ZIP file. Unzipped, these components would take up 159KB. Because they are zipped files, the data is stored in 29KB.

Figure 18.5. The components of the workbook are stored as XML files, zipped, and then renamed with an .xlsx or .xlsm extension.

Image

The Future of File Formats

Spreadsheet XML will dramatically change the way other applications can interact with Excel data. In May 2006, the 1.3 draft of the Office XML format specifications already spanned 4,000 pages of documentation. As developers become more familiar with the format, you can be sure that more applications will be able to consume Excel spreadsheets in XML.

For the current specification on Spreadsheet XML, use Google to search for “spreadsheet XML specification.”

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.51.157