In this chapter
A Brief History of File Formats 366
Using the New Binary File Format: BIFF12 368
Using the New XML File Formats: XLSX and XLSM 368
The Future of File Formats 370
Microsoft does not introduce file format changes lightly. File sharing becomes a nightmare when someone using Excel 2007 tries to share files with someone using Excel 2003.
Excel 2007 offers three new file formats. This chapter discusses these three file formats. Chapter 19, “Compatibility Mode,” discusses how you can continue to share files with people using older versions of Excel.
Note
A fourth file format, the .XLAM file type, can be used by developers to distribute add-ins to extend the functionality of Excel. This file type is beyond the scope of this book.
Excel has traditionally stored workbooks in Binary Interchange File Format (BIFF). The BIFF specification has changed occasionally over time.
In 1993, when Excel expanded to 16,384 rows, Microsoft began using BIFF5 format. In 1993, most companies did not have corporate local area networks (LANs); a file format conversion therefore usually affected just one person on one computer. If you had upgraded from Excel 4 to Excel 5, as long as you had a way to convert your Excel 4 files to Excel’s new BIFF5 format, everything was fine.
In 1997, Microsoft introduced a major file change, BIFF8. This version of BIFF allowed 65,536 rows. The rise of the Internet and email meant that far more people were now sharing files. Excel 97 offered a way to save files in the old format in case you needed to share files with a person using prior versions of Excel.
All BIFF versions are proprietary formats. Figure 18.1 shows a simple Excel 2003 spreadsheet and the corresponding BIFF, as viewed in Notepad. You would certainly never be able to open a Notepad window and begin typing a new spreadsheet. Similarly, it would be very difficult for other applications to extract data from the BIFF format.
In Excel 2000, Microsoft flirted with a new HTML file format. By default, files were stored as XLS files in BIFF8 format. However, you could save a file as an HTML file and later open that HTML file in Excel 2000. With some limitations, most contents of the file and formatting could be successfully round-tripped from Excel to HTML and back to Excel.
This produced an interesting new paradigm: It would be possible for any program that could read or write text files to extract data from the Excel HTML file. Figure 18.2 shows the Figure 18.1 spreadsheet saved in HTML format. A program other than Excel could easily read or produce this format.
Using HTML made sense in 1998–2000. The rise of the Internet made HTML a very popular format. However, although HTML is a great language for the display of information, it is not necessarily a smart language.
In 1998, the World Wide Web Consortium published the first 1.0 specification for a new language called Extensible Markup Language (XML), which presents data that any platform or application can read. Like HTML, XML is a simple text file that can be read or created with Notepad. Excel 2002 offered a way to export data in XML. Excel 2003 continued to use BIFF8 as the standard file format, but you could choose to save a workbook in XML format. When you later opened the XML file in Excel, all the formulas and formatting would be successfully round-tripped. XML in Excel 2003 did not support VBA or charts.
Figure 18.3 shows the spreadsheet from Figures 18.1 and 18.2, saved as XML in Excel 2003.
There are a number of advantages to XML. Because an XML file is a simple text file, any program can easily read data from it. This file format is also less prone to corruption than BIFF. If you randomly wipe out several bytes of a BIFF file, it is likely that the file will be corrupt and no longer open in Excel. If you truncate or corrupt several bytes of an XML file, the rest of the data is still readable in Excel.
Excel 2007 offers three new official file formats—BIFF12, XLSX, and XLSM—described in the following sections. In addition, Excel offers support for BIFF8 and even BIFF5, in case you have files floating around from Excel 95.
With Excel 2007’s increase in rows and columns, BIFF8 would no longer work. Excel 2007 can save files in a new binary file format known as BIFF12. Files stored in BIFF12 have an .xlsb
file extension. The Save As dialog box calls this type of file Excel Binary Workbook. For the first time, the binary workbook is not the default method for saving in Excel.
BIFF12 suffers from the same problems as all previous BIFF versions: It is difficult for other applications to read from or write to BIFF formats, and if parts of the BIFF12 file become corrupted or truncated, Excel has a difficult time successfully loading the file.
If you are extremely concerned with performance issues, you might want to use BIFF12 because a large BIFF12 file loads more quickly and saves more quickly than the new XML formats.
XML in Excel 2003 was almost an ideal solution: Files could be round-tripped from Excel to XML and back to Excel, provided that the files did not include VBA macros, charts, or other embedded images.
Excel 2007 now offers complete 100% support for every feature in the new XML file formats. Workbooks can contain charts, tables, WordArt, SmartArt, shapes, and images. For security purposes, Excel supports XML file formats that are macro free and file formats that are macro-enabled. These are the two XML file formats that Excel 2007 supports:
.xlsx
extension are the default file type in Excel 2007. This XML file format does not allow macros..xlsm
extension are XML files that allow for the inclusion of VBA macros.The new XLSX and XLSM file formats are actually zip files, which makes it easy to look inside the file formats.
Figure 18.4 shows a worksheet that has a number of elements. It has a table in Row 21, WordArt, SmartArt, a chart, a shape with a glow effect, and clip art with a reflection.
To look inside any Excel 2007 document, you follow these steps:
filename
.xlsm
, and choose Rename..zip
. Windows warns you that if you change a file extension, the file may become unusable.As shown in Figure 18.5, inside the ZIP file, you can see several XML components. The embedded image is included in the ZIP file. All the settings and styles, drawings, and data are stored as separate XML files within the ZIP file. Unzipped, these components would take up 159KB. Because they are zipped files, the data is stored in 29KB.
.xlsx
or .xlsm
extension.Spreadsheet XML will dramatically change the way other applications can interact with Excel data. In May 2006, the 1.3 draft of the Office XML format specifications already spanned 4,000 pages of documentation. As developers become more familiar with the format, you can be sure that more applications will be able to consume Excel spreadsheets in XML.
For the current specification on Spreadsheet XML, use Google to search for “spreadsheet XML specification.”
3.149.27.72