Reading an Excel file

In this recipe, you will learn how to read and extract data from an Excel file. If you are a Windows or Mac user, changes are that you have worked in some capacity with Microsoft Excel, the de facto standard for spreadsheets. Excel files are often the default data interchange format of the enterprise. Entire departments are run using incredibly complex Excel files with thousands of rows and absurdly convoluted formulas. As a programmer, it is often the case that you are asked to interact with such files.

Getting ready

Groovy (and Java) does not offer any out-of-the-box ingredient to manipulate Excel files. Fortunately, there is no shortage of third-party libraries that are able to deal with Microsoft documents. The undisputed king of this realm is Apache POI, the Java API for Microsoft documents. With Apache POI, it is possible to parse as well as write Excel 2007 OOXML documents (.xlsx) and older formats (.xls).

How to do it...

Let's start with a simple Excel file with only a few columns:

How to do it...
  1. The script that follows, extracts the content of the Excel file depicted in the previous screenshot and prints it on the console:
    @Grab('org.apache.poi:poi:3.8')
    @Grab('org.apache.poi:poi-ooxml:3.8')
    @GrabExclude('xml-apis:xml-apis')
    import org.apache.poi.xssf.usermodel.XSSFWorkbook
    import org.apache.poi.xssf.usermodel.XSSFSheet
    def excelFile = new File('Workbook1.xlsx')
    excelFile.withInputStream { is ->
      workbook = new XSSFWorkbook(is)
      (0..<workbook.numberOfSheets).each { sheetNumber ->
        XSSFSheet sheet = workbook.getSheetAt( sheetNumber )
        sheet.rowIterator().each { row ->
          row.cellIterator().each { cell ->
            println cell.toString()
          }
        }
      }
    }
  2. Running this script displays the following output:
    100.0
    A
    200.0
    B
    300.0
    C
    400.0
    D

How it works...

The recipe's code uses Grape (see the Simplifying dependency management with Grape recipe in Chapter 2, Using Groovy Ecosystem) to fetch the necessary dependencies: Apache POI and POI OOXML, the latter required to deal with the post-2007 Excel files. The GrabExclude explicitly excludes the xml-apis library to avoid conflicts with the Groovy XML parser. Please note that the code imports classes from the org.apache.poi.xssf, the package used to access the newer Excel format. For the older format (pre-2007), we would have imported classes from the org.apache.poi.xssf package.

The script creates an XSSFWorkbook instance from the Workbook1.xlsx file and iterates on all the sheets found in the document. For each sheet, we use two nested closures (one for the rows and one for the current row's cells) to traverse the cells that have content and print the value on the console.

See also

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.82.154