Working with Excel XML Spreadsheets

Problem

You want to export data from Excel to XML, but not in the native format supported by Microsoft.

Solution

If you have an Excel spreadsheet that looks like this:

Date

Price

Volume

20010817

61.88

260163

20010820

62.7

241859

20010821

60.78

233989

20010822

60.66

387444

Then the Excel XML version looks like this:

<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-
microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:
ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/
REC-html40">
  <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
    <Author>Salvatore R. Mangano</Author>
    <LastAuthor>Salvatore R. Mangano</LastAuthor>
    <Created>2002-08-18T00:43:49Z</Created>
    <LastSaved>2002-08-18T02:19:21Z</LastSaved>
    <Company>Descriptix</Company>
    <Version>10.3501</Version>
  </DocumentProperties>
  <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
    <DownloadComponents/>
    <LocationOfComponents HRef="/"/>
  </OfficeDocumentSettings>
  <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
    <WindowHeight>9915</WindowHeight>
    <WindowWidth>10140</WindowWidth>
    <WindowTopX>240</WindowTopX>
    <WindowTopY>255</WindowTopY>
    <ProtectStructure>False</ProtectStructure>
    <ProtectWindows>False</ProtectWindows>
  </ExcelWorkbook>
  <Styles>
    <Style ss:ID="Default" ss:Name="Normal">
      <Alignment ss:Vertical="Bottom"/>
      <Borders/>
      <Font/>
      <Interior/>
      <NumberFormat/>
      <Protection/>
    </Style>
  </Styles>
  <Worksheet ss:Name="msft">
    <Table ss:ExpandedColumnCount="3" ss:ExpandedRowCount="5" x:FullColumns="1" 
    x:FullRows="1">
      <Row>
        <Cell>
          <Data ss:Type="String">Date</Data>
        </Cell>
        <Cell>
          <Data ss:Type="String">Price</Data>
        </Cell>
        <Cell>
          <Data ss:Type="String">Volume</Data>
        </Cell>
      </Row>
      <Row>
        <Cell>
          <Data ss:Type="Number">20010817</Data>
        </Cell>
        <Cell>
          <Data ss:Type="Number">61.88</Data>
        </Cell>
        <Cell>
          <Data ss:Type="Number">260163</Data>
        </Cell>
      </Row>
      <Row>
        <Cell>
          <Data ss:Type="Number">20010820</Data>
        </Cell>
        <Cell>
          <Data ss:Type="Number">62.7</Data>
        </Cell>
        <Cell>
          <Data ss:Type="Number">241859</Data>
        </Cell>
      </Row>
      <Row>
        <Cell>
          <Data ss:Type="Number">20010821</Data>
        </Cell>
        <Cell>
          <Data ss:Type="Number">60.78</Data>
        </Cell>
        <Cell>
          <Data ss:Type="Number">233989</Data>
        </Cell>
      </Row>
      <Row>
        <Cell>
          <Data ss:Type="Number">20010822</Data>
        </Cell>
        <Cell>
          <Data ss:Type="Number">60.66</Data>
        </Cell>
        <Cell>
          <Data ss:Type="Number">387444</Data>
        </Cell>
      </Row>
    </Table>
    <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
      <Selected/>
      <Panes>
        <Pane>
          <Number>3</Number>
          <ActiveRow>11</ActiveRow>
          <ActiveCol>5</ActiveCol>
        </Pane>
      </Panes>
      <ProtectObjects>False</ProtectObjects>
      <ProtectScenarios>False</ProtectScenarios>
    </WorksheetOptions>
  </Worksheet>
</Workbook>

which is probably not what you had in mind!

This example conveniently maps an Excel XML file to a simpler XML file. Many spreadsheets created in Excel have a structure in which the first row contains column names and subsequent rows contain data for those columns.

One obvious mapping would convert the column names into element names and the remaining cells into element content. The only missing pieces of information are the names of the top-level element and the element containing each row. This stylesheet takes these names as parameters with some obvious defaults. It converts some of the useful metadata into comments and throws away the Excel-specific stuff. This section provides several other parameters that increase the generality of the conversion, such as which row contains the column names, where the data starts, and what to do about empty cells:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:o="urn:schemas-microsoft-com:office:office" 
                xmlns:x="urn:schemas-microsoft-com:office:excel" 
                xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  
  <!-- The name of the top level element -->
  <xsl:param name="topLevelName" select=" 'Table' "/>
  <!-- The name of each row -->
  <xsl:param name="rowName" select=" 'Row' "/>
  <!-- The namespace to use -->
  <xsl:param name="namespace"/>
  <!-- The namespace prefix to use -->
  <xsl:param name="namespacePrefix"/>
  <!-- The character to use if column names contain white space -->
  <xsl:param name="wsSub" select="'_'"/>
  <!--Determines which row contains the col names-->
  <xsl:param name="colNamesRow" select="1"/>
  <!--Determines which row the data begins -->
  <xsl:param name="dataRowStart" select="2"/>
  <!-- If false then cells with null or whitespace only content -->
  <!-- will be skipped -->
  <xsl:param name="includeEmpty" select="true(  )"/>
  <!-- If false then author and creation meta data will not be put -->
  <!-- into a comment-->
  <xsl:param name="includeComment" select="true(  )"/>
  
  <!--Normalize the namespacePrefix -->
  <xsl:variable name="nsp">
    <xsl:if test="$namespace">
      <!-- Only use prefix if namespace is specified -->
      <xsl:choose>
        <xsl:when test="contains($namespacePrefix,':')">
          <xsl:value-of 
               select="concat(translate(substring-before(
                                            $namespacePrefix,
                                            ':'),' ',''),':')"/>
        </xsl:when>
        <xsl:when test="translate($namespacePrefix,' ','')">
          <xsl:value-of 
               select="concat(translate($namespacePrefix,' ',''),':')"/>
        </xsl:when>
        <xsl:otherwise/>
      </xsl:choose>
    </xsl:if>
  </xsl:variable>
  
  <!--Get the names of all the columns with white space replaced by  -->
  <xsl:variable name="COLS" select="/*/*/*/ss:Row[$colNamesRow]/ss:Cell"/>
  
  <xsl:template match="o:DocumentProperties">
    <xsl:if test="$includeComment">
      <xsl:text>&#xa;</xsl:text>
      <xsl:comment>
       <xsl:text>&#xa;</xsl:text>
        <xsl:if test="normalize-space(o:Company)">
          <xsl:text>Company: </xsl:text>
          <xsl:value-of select="o:Company"/>
          <xsl:text>&#xa;</xsl:text>
        </xsl:if>
        <xsl:text>Author: </xsl:text>
        <xsl:value-of select="o:Author"/>
        <xsl:text>&#xa;</xsl:text>
        <xsl:text>Created on: </xsl:text>
        <xsl:value-of select="translate(o:Created,'TZ',' ')"/>
        <xsl:text>&#xa;</xsl:text>
        <xsl:text>Last Author: </xsl:text>
        <xsl:value-of select="o:LastAuthor"/>
        <xsl:text>&#xa;</xsl:text>
        <xsl:text>Saved on:</xsl:text>
        <xsl:value-of select="translate(o:LastSaved,'TZ',' ')"/>
        <xsl:text>&#xa;</xsl:text>
      </xsl:comment>
    </xsl:if>
  </xsl:template>
  
  <xsl:template match="ss:Table">
    <xsl:element 
         name="{concat($nsp,translate($topLevelName,
                       '&#x20;&#x9;&#xA;',$wsSub))}" 
         namespace="{$namespace}">
      <xsl:apply-templates select="ss:Row[position(  ) >= $dataRowStart]"/>
    </xsl:element>
  </xsl:template>
  
  <xsl:template match="ss:Row">
    <xsl:element
        name="{concat($nsp,translate($rowName,
                      '&#x20;&#x9;&#xA;',$wsSub))}" 
        namespace="{$namespace}">
      <xsl:for-each select="ss:Cell">
        <xsl:variable name="pos" select="position(  )"/>
   
       <!-- Get the correct column name even if there were empty -->
       <!-- cols in original spreadsheet -->     
        <xsl:variable name="colName">
          <xsl:choose>
            <xsl:when test="@ss:Index and 
                            $COLS[@ss:Index = current(  )/@ss:Index]">
              <xsl:value-of 
                  select="$COLS[@ss:Index = current(  )/@ss:Index]/ss:Data"/>
            </xsl:when>
            <xsl:when test="@ss:Index">
              <xsl:value-of 
                    select="$COLS[number(current(  )/@ss:Index)]/ss:Data"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$COLS[$pos]/ss:Data"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>
        
        <xsl:if test="$includeEmpty or 
                      translate(ss:Data,'&#x20;&#x9;&#xA;','')">
          <xsl:element
               name="{concat($nsp,translate($colName,
                                      '&#20;&#x9;&#xA;',$wsSub))}" 
               namespace="{$namespace}">
            <xsl:value-of select="ss:Data"/>
          </xsl:element>
        </xsl:if>
        
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
  
  <xsl:template match="text(  )"/>
  
</xsl:stylesheet>

The result of the transformation, with default parameter values, is the much more direct XML representation that follows:

<Table>
  <Row>
    <Date>20010817</Date>
    <Price>61.88</Price>
    <Volume>260163</Volume>
  </Row>
  <Row>
    <Date>20010820</Date>
    <Price>62.7</Price>
    <Volume>241859</Volume>
  </Row>
  <Row>
    <Date>20010821</Date>
    <Price>60.78</Price>
    <Volume>233989</Volume>
  </Row>
  <Row>
    <Date>20010822</Date>
    <Price>60.66</Price>
    <Volume>387444</Volume>
  </Row>
</Table>

Discussion

I almost did not include this recipe in the book because it initially seemed trivial. However, I realized that the book needs to handle many special cases, and many implementations (including my first) would miss them. For example, spreadsheets often contain empty columns used as spacers. You need to know how to handle them by looking for the @ss:Index attribute. This book’s initial version also hardcoded many of the choices this version exposes as parameters.

At least one obvious additional extension could be made to this stylesheet: the handling of multiple ss:Worksheet elements. This handling could be done by specifying the worksheet number as a parameter:

  <xsl:param name="WSNum" select="1"/>
   
  <xsl:variable name="COLS" 
         select="/*/ss:Worksheet[$WSNum]/*/ss:Row[$colNamesRow]/ss:Cell"/>
   
  <xsl:template match="ss:Workbook">
    <xsl:element name="{concat($nsp,translate($topLevelName,
                      '&#x20;&#x9;&#xA;',$wsSub))}" 
                 namespace="{$namespace}">
      <xsl:apply-templates select="ss:Worksheet[number($WSNum)]/ss:Table"/>
    </xsl:element>
  </xsl:template>

A more ambitious solution handles each Worksheet in a multiple Worksheet document as a separate element in the resulting document. This setup means that the column names can no longer be handled as a global variable:

  <xsl:template match="ss:Workbook">
    <xsl:element name="{concat($nsp,translate($topLevelName,
                      '&#x20;&#x9;&#xA;',$wsSub))}" 
                 namespace="{$namespace}">
      <xsl:choose>
        <xsl:when test="number($WSNum) > 0">
          <xsl:apply-templates 
               select="ss:Worksheet[number($WSNum)]/ss:Table">
            <xsl:with-param name="COLS" 
                 select="ss:Worksheet[number($WSNum)]
                                         /*/ss:Row[$colNamesRow]/ss:Cell"/>
          </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
          <xsl:for-each select="ss:Worksheet">
            <xsl:element 
                 name="{concat($nsp,translate(@ss:Name,
                        '&#x20;&#x9;&#xA;',$wsSub))}"
                 namespace="{$namespace}">
              <xsl:apply-templates select="ss:Table">
                <xsl:with-param name="COLS" 
                                select="*/ss:Row[$colNamesRow]/ss:Cell"/>
              </xsl:apply-templates>
            </xsl:element>
          </xsl:for-each>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:element>
  </xsl:template>
  
  <xsl:template match="ss:Table">
    <xsl:param name="COLS"/>
      <xsl:apply-templates select="ss:Row[position(  ) >= $dataRowStart]">
        <xsl:with-param name="COLS" select="$COLS"/>
      </xsl:apply-templates>
  </xsl:template>
  
  <xsl:template match="ss:Row">
    <xsl:param name="COLS"/>
   
     <!-- The rest is the same as original ... -->
     
  </xsl:template>

The only trouble with this solution is that it assumes that the column names have to be in the same row in each worksheet.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.118.166.98