netCDF4

netCDF4 is the fourth version of the netCDF library. It is implemented on top of HDF5 (Hierarchical Data Format, designed to store and organize large amounts of data), which makes it possible to manage extremely large and complex multidimensional data. The greatest advantage of netCDF4 is that it is a fully portable file format with no limit on the number or size of data objects in a collection, and its files are both appendable and archivable. Many scientific research organizations use it for data storage, and Python has an interface for reading and creating files in this format.

You can download and install the module from its official documentation page at http://unidata.github.io/netcdf4-python/, or clone it from its GitHub repository at https://github.com/Unidata/netcdf4-python. It's not included in the standard Python scientific distribution. It is built on top of NumPy, and it can be built with Cython (this is recommended but not required).

For the following example, we are going to use one of the sample netCDF4 files on the Unidata website at http://www.unidata.ucar.edu/software/netcdf/examples/files.html. We will use the climate system model file, sresa1b_ncar_ccsm3-example.nc.

First, we will use the netCDF4 module to explore the dataset a bit, and extract the values we need for further analysis:

In [1]: import netCDF4 as nc 
In [2]: dataset = nc.Dataset('sresa1b_ncar_ccsm3-example.nc', 'r') 
In [3]: variables = [var for var in dataset.variables] 
In [4]: variables 
Out[4]: 
['area', 'lat', 'lat_bnds', 'lon', 'lon_bnds', 'msk_rgn', 'plev', 
'pr', 'tas', 'time', 'time_bnds', 'ua'] 

We imported the Python netCDF4 module and used the Dataset() function to open the sample netCDF4 file. The r parameter opens the file in read-only mode; we can instead specify a to append to an existing file or w to create a new file. Then, we collected the names of all the variables stored in the dataset and saved them to a list called variables (note that the variables attribute returns a Python dictionary that maps each variable name to its variable object, so iterating over it yields the names). Next, we take a closer look at one of these variables:

In [5]: precipitation = dataset.variables['pr'] 
In [6]: precipitation.standard_name 
Out[6]: 'precipitation_flux' 
In [7]: precipitation.missing_value 
Out[7]: 1e+20 
In [8]: precipitation.ndim 
Out[8]: 3 
In [9]: precipitation.shape 
Out[9]: (1, 128, 256) 
In [10]: precipitation[:, 1, :10] 
Out[10]: 
array([[  8.50919207e-07,   8.01471970e-07,   7.74396426e-07, 
          7.74230614e-07,   7.47181844e-07,   7.21426375e-07, 
          7.19294349e-07,   6.99790974e-07,   6.83397502e-07, 
          6.74683179e-07]], dtype=float32) 

In the preceding example, we picked the variable named pr and saved it to precipitation. netCDF4 is a self-describing file format, so you can create and access any user-defined attribute stored with a variable. The most common one is standard_name, which here tells us that the variable represents the precipitation flux. We checked another commonly used attribute, missing_value, which holds the value used to mark missing data in the netCDF4 file. Then, we printed the number of dimensions of the precipitation variable with its ndim attribute and its shape with the shape attribute. Lastly, we read the first 10 values of the row at index 1 (the second row, because indexing starts at 0); to do this, we use the same indexing and slicing syntax as for a NumPy array.
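As a small illustration of what the missing_value attribute is for, the following sketch masks the no-data entries of an array before computing a statistic. It uses only NumPy; the raw array is a hypothetical stand-in for values read from a variable, and the value 1e+20 is taken from the output shown earlier. This is one possible way to handle missing data, not necessarily what the netCDF4 module does internally.

```python
import numpy as np

# Hypothetical stand-in for values read from a variable; 1e+20 plays the
# role of the missing_value attribute shown in the session above.
missing_value = 1e+20
raw = np.array([8.5e-07, missing_value, 7.7e-07, missing_value],
               dtype=np.float32)

# Mask every element equal to missing_value (within floating-point tolerance).
masked = np.ma.masked_values(raw, missing_value)

valid_count = int(masked.count())  # number of unmasked (valid) values
valid_mean = float(masked.mean())  # mean computed over valid values only

print(valid_count, valid_mean)
```

Without the mask, the two 1e+20 entries would dominate the mean and make the result meaningless.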

Next, we are going to cover the basics of creating a netCDF4 file and storing a three-dimensional NumPy ndarray as a variable:

In [11]: import numpy as np 
In [12]: time = np.arange(10) 
In [13]: lat = 54 + np.random.randn(8) 
In [14]: lon = np.random.randn(6) 
In [15]: data = np.random.randn(480).reshape(10, 8, 6) 

First, we prepared a three-dimensional ndarray (data) to store in the netCDF4 file. The data has three dimensions: time (time, size of 10), latitude (lat, size of 8), and longitude (lon, size of 6). In netCDF4, time is not a datetime object, but the number of time units (these can be seconds, hours, days, and so on) elapsed since a defined start time (specified in the units attribute, which we will set later). Now, we have all the data we want to store in the file, so let's build the netCDF structure:

In [16]: output = nc.Dataset('test_output.nc', 'w') 
In [17]: output.createDimension('time', 10) 
In [18]: output.createDimension('lat', 8) 
In [19]: output.createDimension('lon', 6) 
In [20]: time_var = output.createVariable('time', 'f4', ('time',)) 
In [21]: time_var[:] = time 
In [22]: lat_var = output.createVariable('lat', 'f4', ('lat',)) 
In [23]: lat_var[:] = lat 
In [24]: lon_var = output.createVariable('lon', 'f4', ('lon',)) 
In [25]: lon_var[:] = lon 

We initialized the netCDF4 file by specifying the file path and using the w write mode. Then, we built the structure using createDimension() to specify the dimensions: time, lat, and lon. Each dimension has a variable of the same name that holds its coordinate values, just like the scale of an axis. Next, we are going to save the three-dimensional data to the file:

In [26]: var = output.createVariable('test', 'f8', ('time', 'lat', 'lon')) 
In [27]: var[:] = data 

The creation of a variable always starts with the createVariable() function, which takes the variable name, the variable datatype, and the dimensions associated with it. The second step is to assign an ndarray of the matching shape to the declared variable. Now that all the data is stored in the file, we can set attributes to help describe the dataset. The following example uses the time variable to show how to set attributes:

In [28]: time_var.standard_name = 'Time' 
In [29]: time_var.units = 'days since 2015-01-01 00:00:00' 
In [30]: time_var.calendar = 'gregorian' 

So, now that the time variable has units and a calendar associated with it, the numeric values in the time ndarray can be interpreted as dates based on the units and calendar that we specified; attributes can be set on the other variables in the same way. When the netCDF4 file is complete, the last step is to close the file connection:

In [31]: output.close() 
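To make the time encoding concrete, the following sketch converts numeric offsets into dates using only the Python standard library. The helper decode_times is hypothetical (it is not part of the netCDF4 module), and it assumes the units string has the form '<unit> since YYYY-MM-DD HH:MM:SS' with a standard calendar. The netCDF4 module ships its own conversion helper, num2date, which is the better choice for real data; this sketch only illustrates the idea.

```python
from datetime import datetime, timedelta


def decode_times(values, units):
    """Convert numeric offsets to datetimes for units like 'days since <start>'.

    Hypothetical helper for illustration; assumes a standard calendar.
    """
    unit, _, start_text = units.partition(' since ')
    if unit not in ('days', 'hours', 'minutes', 'seconds'):
        raise ValueError('unsupported time unit: %r' % unit)
    start = datetime.strptime(start_text, '%Y-%m-%d %H:%M:%S')
    # timedelta accepts the unit name as a keyword, e.g. timedelta(days=3.0)
    return [start + timedelta(**{unit: float(v)}) for v in values]


# The same offsets (0..9) and units string used for time_var above.
dates = decode_times(range(10), 'days since 2015-01-01 00:00:00')
print(dates[0], dates[-1])
```

With these inputs, the offsets 0 to 9 correspond to the first ten days of January 2015.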

The examples in this section show how to use the Python netCDF4 API to read and create a netCDF4 file. The module does not perform any scientific computation itself (which is why it is not included in the standard Python scientific distribution); its purpose is to provide an interface for file I/O, which can be the very first or the very last stage of your research and analytics.
