In this recipe, you will learn how to read and extract data from an Excel file. If you are a Windows or Mac user, changes are that you have worked in some capacity with Microsoft Excel, the de facto standard for spreadsheets. Excel files are often the default data interchange format of the enterprise. Entire departments are run using incredibly complex Excel files with thousands of rows and absurdly convoluted formulas. As a programmer, it is often the case that you are asked to interact with such files.
Groovy (and Java) does not offer any out-of-the-box ingredient to manipulate Excel files. Fortunately, there is no shortage of third-party libraries that are able to deal with Microsoft documents. The undisputed king of this realm is Apache POI, the Java API for Microsoft documents. With Apache POI, it is possible to parse as well as write Excel 2007 OOXML documents (.xlsx
) and older formats (.xls
).
Let's start with a simple Excel file with only a few columns:
@Grab('org.apache.poi:poi:3.8') @Grab('org.apache.poi:poi-ooxml:3.8') @GrabExclude('xml-apis:xml-apis') import org.apache.poi.xssf.usermodel.XSSFWorkbook import org.apache.poi.xssf.usermodel.XSSFSheet def excelFile = new File('Workbook1.xlsx') excelFile.withInputStream { is -> workbook = new XSSFWorkbook(is) (0..<workbook.numberOfSheets).each { sheetNumber -> XSSFSheet sheet = workbook.getSheetAt( sheetNumber ) sheet.rowIterator().each { row -> row.cellIterator().each { cell -> println cell.toString() } } } }
100.0 A 200.0 B 300.0 C 400.0 D
The recipe's code uses Grape (see the Simplifying dependency management with Grape recipe in Chapter 2, Using Groovy Ecosystem) to fetch the necessary dependencies: Apache POI and POI OOXML, the latter required to deal with the post-2007 Excel files. The GrabExclude
explicitly excludes the xml-apis
library to avoid conflicts with the Groovy XML parser. Please note that the code imports classes from the org.apache.poi.xssf
, the package used to access the newer Excel format. For the older format (pre-2007), we would have imported classes from the org.apache.poi.xssf
package.
The script creates an XSSFWorkbook
instance from the Workbook1.xlsx
file and iterates on all the sheets found in the document. For each sheet, we use two nested closures (one for the rows and one for the current row's cells) to traverse the cells that have content and print the value on the console.
3.144.82.154