My question is simple.
Take a wrfout file, "out.nc", for example.
The file contains Geo2D, Geo3D, and 1D variables.
Using the GDAL package in Python 2.7, I can easily extract the Geo2D variables like this:
from osgeo import gdal
## T2 is a 2-D variable: the temperature 2 m above the ground
temp = gdal.Open('NETCDF:"'+"out.nc"+'":T2')
But when I use the same code to extract a 1-D array, it fails.
## Time is a 1-D array representing the time series throughout the simulation period
time = gdal.Open('NETCDF:"'+"out.nc"+'":Time')
Nothing happens! I would appreciate advice on an easy way to read WRF output variables of any dimension.
You can also use the NetCDF reader in scipy.io:
from scipy.io import netcdf_file
# Open the NetCDF file and copy the variable's values into an array
with netcdf_file('out.nc', 'r', mmap=False) as f:
    time = f.variables['Time'][:].copy()
It has the benefit that scipy is a very popular and widely installed package, and its interface resembles ordinary file handling in some respects. Note that it reads only NetCDF-3 files.
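As a rough sketch of how the same call handles variables of any dimensionality, the snippet below writes a tiny NetCDF-3 file and reads a 1-D and a 2-D variable back. The helper name `read_variable`, the dimension name `south_north`, and the demo values are made up for illustration; only `Time` and `T2` come from the question, and the approach assumes the file is in a NetCDF-3 format.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def read_variable(path, name):
    """Return one variable from a NetCDF-3 file as an in-memory NumPy array."""
    # mmap=False reads the data into memory, so the array stays valid
    # after the file has been closed.
    with netcdf_file(path, "r", mmap=False) as f:
        return f.variables[name][:].copy()


# Build a tiny demo file (hypothetical data) with a 1-D and a 2-D variable.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.nc")
with netcdf_file(demo_path, "w") as f:
    f.createDimension("time", 3)
    f.createDimension("south_north", 2)
    t = f.createVariable("Time", "f4", ("time",))
    t[:] = [0.0, 1.0, 2.0]
    t2 = f.createVariable("T2", "f4", ("time", "south_north"))
    t2[:] = np.full((3, 2), 280.0)

time_values = read_variable(demo_path, "Time")  # 1-D array
t2_values = read_variable(demo_path, "T2")      # 2-D array
```

The same `[:]` slice is used regardless of the number of dimensions, so no special case is needed for 1-D variables.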
I have a NetCDF file called air.sig995.2012.nc. It has four variables:
('lat','lon','time','air').
I am trying to read the values of one of the variables, let's say air, using the lines below:
import scipy.io.netcdf as S
fileobj=S.netcdf_file('air.sig995.2012.nc','r')
data=fileobj.variables['air'].getValue()
but it gives me the error below:
ValueError: can only convert an array of size 1 to a Python scalar
I am fairly new to Python. Can anyone help me with this one?
If not xarray, you can do this with the netcdf4-python library with the same slice syntax:
from netCDF4 import Dataset
nc = Dataset('air.sig995.2012.nc')
my_array = nc.variables['air'][:]
The output you're expecting is quite ambiguous but both methods should work depending on your specific goal in mind.
import xarray as xr
### open netcdf file ###
df = xr.open_dataset('path/file.nc')
### extract values of variable 'air' ####
air = df['air'][:]
air_flat = df['air'].values.flatten() # 1-d data
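The error in the question can be reproduced with plain NumPy. As far as I can tell, scipy's getValue() is meant for scalar variables and boils down to NumPy's `.item()`, which only works on one-element arrays. The sketch below uses a made-up array as a stand-in for the `air` variable; its shape is an assumption.

```python
import numpy as np

# Stand-in for a multi-element variable such as 'air' (hypothetical shape).
air = np.arange(12, dtype="f4").reshape(3, 2, 2)

# .item() converts a one-element array to a Python scalar ...
scalar = np.array([273.15]).item()

# ... but raises ValueError on anything larger, as seen in the question.
try:
    air.item()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Slicing returns the full array instead.
values = air[:]
```

This is why the answers above all use the `[:]` slice rather than getValue().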
I need to be able to quickly read lots of netCDF variables in python (1 variable per file). I'm finding that the Dataset function in netCDF4 library is rather slow compared to reading utilities in other languages (e.g., IDL).
My variables have a shape of (2600,5200) and type float. They don't seem that big to me (file size = 52 MB).
Here is my code:
import numpy as np
from netCDF4 import Dataset
import time
file = '20151120-235839.netcdf'
t0=time.time()
openFile = Dataset(file,'r')
raw_data = openFile.variables['MergedReflectivityQCComposite']
data = np.copy(raw_data)
openFile.close()
print(time.time() - t0)
It takes about 3 seconds to read one variable (one file). I think the main slowdown is np.copy. raw_data is <type 'netCDF4.Variable'>, thus the copy. Is this the best/fastest way to do netCDF reads in python?
Thanks.
The power of NumPy is that you can create views into existing data in memory via the metadata it keeps about that data. A copy will therefore always be slower than a view, which only sets up pointers. As JCOidl says, it's not clear why you don't just use:
raw_data = openFile.variables['MergedReflectivityQCComposite'][:]
For more information, see the SciPy Cookbook and the Stack Overflow question "View onto a numpy array?".
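The view-versus-copy distinction can be checked with a small in-memory array. This is only a sketch of NumPy's behaviour: slicing a netCDF4 variable still has to read the data from disk, so the comparison below applies to arrays that are already in memory. The array shape is arbitrary.

```python
import numpy as np

data = np.zeros((260, 520), dtype="f4")  # arbitrary demo shape

view = data[:]          # basic slicing: a view, no data is copied
copied = np.copy(data)  # allocates and fills a second buffer

shares_view = np.shares_memory(data, view)
shares_copy = np.shares_memory(data, copied)

view[0, 0] = 1.0        # writes through to `data`, but not to `copied`
```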
I'm not sure what to say about the np.copy operation (which is indeed slow), but I find that the PyNIO module from UCAR works well for both NetCDF and HDF files. This will place data into a numpy array:
import Nio
f = Nio.open_file(file, format="netcdf")
data = f.variables['MergedReflectivityQCComposite'][:]
f.close()
Testing your code versus the PyNIO code on a NetCDF file I have resulted in 1.1 seconds for PyNIO, versus 3.1 seconds for the netCDF4 module. Your results may vary; worth a look though.
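To compare readers on your own files, a small timing helper along these lines may be enough. The name `time_call` and its `repeats` parameter are made up for this sketch. Keep in mind that the operating system caches files, so the first read of a file is often slower than later ones.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Call func several times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result
```

Wrapping each reader (netCDF4, PyNIO, and so on) in its own function and passing it to `time_call` with the same file path gives timings that are measured the same way.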
You can use xarray for that.
import xarray as xr
### Single NetCDF file ###
ds = xr.open_dataset('path/file.nc')
### Open multiple NetCDF files and concatenate them along time ###
### (recent xarray versions need combine='nested' when concat_dim is given) ###
ds = xr.open_mfdataset('path/*.nc', combine='nested', concat_dim='time')
To read the variable, you can simply type ds.MergedReflectivityQCComposite or ds['MergedReflectivityQCComposite'][:]
You can also use xr.load_dataset, but I find that it uses more memory than xr.open_dataset, since it loads the whole file into memory rather than reading lazily. With xr.open_mfdataset, you can also chunk along the file's dimensions if you want. Both functions have other options, which are described in the xarray documentation.
I am working on a task in which I convert .m files to .py. To test the code, I have to dump or log the value of every variable, for both Python and Matlab, to log files.
I then open the logs in an Excel sheet and compare them column by column: which array index is which, what each row/column value is, and so on. This is very tiresome, and I am not sure how to compare the output of a specific variable or statement programmatically, given that this is just a .m to .py conversion.
You can run the program in Matlab and save all the variables using the save command. This saves to a .mat file. Then you can load the variables from that file into python using scipy.io.loadmat and compare them in python.
First, in matlab:
save 'data.mat' var1 var2 var3
Then in python (in the same directory, or provide a full path):
import scipy.io
vars = scipy.io.loadmat('data.mat', squeeze_me=True)
var1_matlab = vars['var1']
var2_matlab = vars['var2']
var3_matlab = vars['var3']
Note that numpy has 1D arrays, while Matlab does not (1D arrays in Matlab are actually 2D arrays in which one dimension has a length of 1). This may mean that the Matlab and Python versions of a variable have a different number of dimensions. squeeze_me fixes this by eliminating dimensions with a length of 1, but it will also, for example, squeeze a genuinely 2D Matlab array that happens to have a length of 1 in some dimension down to a 1D Python array. So you may have to do some manual dimension matching no matter what.
To get this to work, make sure Matlab is configured to save files in the "MATLAB Version 5 or later" file format (in 2014B this is under Preferences > General > MAT-Files).
If you absolutely must use version 7.3 files, you can try hdf5storage, which says it supports them. However, you probably have scipy already installed, and I have personally used the scipy approach and confirmed it worked but have not done the same with hdf5storage.
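Once both sets of variables are in Python, the comparison itself can be automated. The sketch below assumes numeric variables held in two dictionaries (for example, the result of loadmat and a dict collected from the converted Python code). The function name `compare_variables` and the default tolerances are made up for illustration.

```python
import numpy as np


def compare_variables(matlab_vars, python_vars, rtol=1e-7, atol=0.0):
    """Return {name: message} for every variable that does not match."""
    problems = {}
    for name, expected in matlab_vars.items():
        if name.startswith("__"):  # loadmat metadata such as __header__
            continue
        if name not in python_vars:
            problems[name] = "missing on the Python side"
            continue
        # Squeeze both sides so that (1, N) and (N,) compare as equal shapes.
        a = np.squeeze(np.asarray(expected))
        b = np.squeeze(np.asarray(python_vars[name]))
        if a.shape != b.shape:
            problems[name] = "shape {} vs {}".format(a.shape, b.shape)
        elif not np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
            diff = np.nanmax(np.abs(a - b))
            problems[name] = "max abs difference {}".format(diff)
    return problems
```

An empty result means every Matlab variable has a matching Python counterpart within the tolerance. Anything else names the variable and says what differs, which replaces the manual comparison in Excel.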
My usual method for extracting the min/max of a variable's data values from a NetCDF file is an order of magnitude slower now that I have switched from scipy.io.netcdf to the netCDF4 Python module.
I am working with relatively large ocean model output files (from ROMS) with multiple depth levels over a given map region (Hawaii). When these were in NetCDF-3, I used scipy.io.netcdf.
Now that these files are in NetCDF-4 ("Classic") I can no longer use scipy.io.netcdf and have instead switched over to using the netCDF4 Python module. However, the slowness is a concern and I wondered if there is a more efficient method of extracting a variable's data range (minimum and maximum data values)?
Here was my NetCDF-3 method using scipy:
import scipy.io.netcdf
netcdf = scipy.io.netcdf.netcdf_file(file)
var = netcdf.variables['sea_water_potential_temperature']
min = var.data.min()
max = var.data.max()
Here is my NetCDF-4 method using netCDF4:
import netCDF4
netcdf = netCDF4.Dataset(file)
var = netcdf.variables['sea_water_potential_temperature']
var_array = var.data.flatten()
min = var_array.data.min()
max = var_array.data.max()
The notable difference is that I must first flatten the data array in netCDF4, and this operation apparently slows things down.
Is there a better/faster way?
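One thing worth trying is to skip the flatten step: NumPy's min() and max() work directly on N-D arrays, and the variable can be read one slab at a time to limit memory use. The following is a sketch only. The name `minmax_by_slab` is made up, and whether it is faster will depend on how the file is chunked and compressed. It assumes `var` supports NumPy-style slicing and has a `.shape`, as netCDF4 variables do.

```python
import numpy as np


def minmax_by_slab(var, slab=1):
    """Min and max of an N-D variable, read `slab` leading-axis steps at a time."""
    lo, hi = np.inf, -np.inf
    for start in range(0, var.shape[0], slab):
        # Masked arrays (fill values) and plain arrays are handled alike.
        block = np.ma.asarray(var[start:start + slab])
        if block.count() == 0:  # slab is entirely masked
            continue
        lo = min(lo, float(block.min()))
        hi = max(hi, float(block.max()))
    return lo, hi
```

With a netCDF4 variable this could be called as `minmax_by_slab(netcdf.variables['sea_water_potential_temperature'])`. A plain NumPy array works as a stand-in for testing.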
Per suggestion of hpaulj here is a function that calls the nco command ncwa using subprocess. It hangs terribly when using an OPeNDAP address, and I don't have any files on hand to test it locally.
You can see if it works for you and what the speed difference is.
This assumes you have the nco library installed.
def ncwa(path, fnames, var, op_type, times=None, lons=None, lats=None):
    '''Perform arithmetic operations on netCDF file or OPeNDAP data

    Args
    ----
    path: str
        prefix
    fnames: str or iterable
        Names of file(s) to perform operation on
    op_type: str
        ncwa arithmetic operation to perform. Available operations are:
        avg,mabs,mebs,mibs,min,max,ttl,sqravg,avgsqr,sqrt,rms,rmssdn
    times: tuple
        Minimum and maximum timestamps within which to perform the operation
    lons: tuple
        Minimum and maximum longitudes within which to perform the operation
    lats: tuple
        Minimum and maximum latitudes within which to perform the operation

    Returns
    -------
    result: float
        Result of the operation on the selected data

    Note
    ----
    Adapted from the OPeNDAP examples in the NCO documentation:
    http://nco.sourceforge.net/nco.html#OPeNDAP
    '''
    import os
    import netCDF4
    import numpy
    import subprocess

    output = 'tmp_output.nc'

    # Concatenate subprocess command
    cmd = ['ncwa']
    cmd.extend(['-y', '{}'.format(op_type)])
    if times:
        cmd.extend(['-d', 'time,{},{}'.format(times[0], times[1])])
    if lons:
        cmd.extend(['-d', 'lon,{},{}'.format(lons[0], lons[1])])
    if lats:
        cmd.extend(['-d', 'lat,{},{}'.format(lats[0], lats[1])])
    cmd.extend(['-p', path])
    cmd.extend(numpy.atleast_1d(fnames).tolist())
    cmd.append(output)

    # Run cmd and check for errors
    subprocess.run(cmd, stdout=subprocess.PIPE, check=True)

    # Load, read, close data and delete temp .nc file
    data = netCDF4.Dataset(output)
    result = float(data[var][:])
    data.close()
    os.remove(output)

    return result
path = 'https://ecowatch.ncddc.noaa.gov/thredds/dodsC/hycom/hycom_reg6_agg/'
fname = 'HYCOM_Region_6_Aggregation_best.ncd'
times = (0.0, 48.0)
lons = (201.5, 205.5)
lats = (18.5, 22.5)
smax = ncwa(path, fname, 'salinity', 'max', times, lons, lats)
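Since the call can hang on OPeNDAP addresses, it may help to look at the exact command before running it. The sketch below repeats only the command-building part of the function above. The name `build_ncwa_command` is made up; the flags and dimension names are the ones the function already uses.

```python
def build_ncwa_command(path, fnames, op_type, output,
                       times=None, lons=None, lats=None):
    """Return the ncwa argument list without running it."""
    cmd = ["ncwa", "-y", str(op_type)]
    for dim, bounds in (("time", times), ("lon", lons), ("lat", lats)):
        if bounds:
            cmd.extend(["-d", "{},{},{}".format(dim, bounds[0], bounds[1])])
    cmd.extend(["-p", path])
    cmd.extend([fnames] if isinstance(fnames, str) else list(fnames))
    cmd.append(output)
    return cmd
```

Printing `' '.join(cmd)` and pasting the result into a shell makes it easier to tell whether a hang comes from ncwa itself or from the Python wrapper.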
If you're just getting the min/max values across an array of a variable, you can use xarray.
import xarray as xr
ds = xr.open_dataset('infile/file.nc')
max_value = ds.sea_water_potential_temperature.max()
min_value = ds.sea_water_potential_temperature.min()
This gives you a single minimum and maximum value, respectively (each returned as a zero-dimensional DataArray). You could also take the min/max of a variable along a selected dimension such as time, longitude, or latitude. Xarray is great for handling multidimensional arrays, which makes it easy to work with NetCDF in Python when you are not using other tools such as CDO and NCO.
Lastly, xarray is also used by other related libraries that deal with weather and climate data in Python ( http://xarray.pydata.org/en/stable/related-projects.html ).
A Python solution (using CDO as a backend) is my package nctoolkit (https://pypi.org/project/nctoolkit/ https://nctoolkit.readthedocs.io/en/latest/installing.html).
This has a number of built in methods for calculating different types of min/max values.
We would first need to read the file in as a dataset:
import nctoolkit as nc
data = nc.open_data(file)
If you wanted the maximum value across space, for each timestep, you would do the following:
data.spatial_max()
Maximum across all depths for each grid cell and time step would be calculated as follows:
data.vertical_max()
If you wanted the maximum across time, you would do:
data.max()
These methods are chainable, and the CDO backend is very efficient, so should be ideal for working with ROMS data.
I am using Scientific.IO.NetCDF to read NetCDF data into Python. I am trying to read a 4d 32bit variable with size (366,30,476,460) but I end up with zeros in my ndarray. Strangely if I read just the 3d data (1,30,476,460), the returned values are ok.
This is what I am trying to do:
from Scientific.IO.NetCDF import NetCDFFile as Dataset
from collections import namedtuple
# Define output data structure as a named tuple
Roms_data=namedtuple('Roms_data','Ti Tf Nt U V W Zeta')
# Open the NetCDF file for reading.
ncfile = Dataset(data_file,'r')
if Tstart==-1:
    ti=0
    tf=NTsav-1
else:
    ti=Tstart-1
    tf=Tend-1
try:
    udata = ncfile.variables['u'][:]
    print str(udata.shape)
except:
    print ' Failed to read u data from '+data_file
The "[:]" means I am reading the whole 4d variable 'u' into an ndarray called udata. This does not work, and udata is full of zeros. However, if I do:
try:
    udata = ncfile.variables['u'][0,:,:,:]
    print str(udata.shape)
except:
    print ' Failed to read u data from '+data_file
then "udata", which is now a 3d ndarray, has the values it is supposed to read from the NetCDF file.
Any help? Thanks in advance.
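One observation that may be relevant, offered as a guess rather than a confirmed diagnosis: the full variable has more elements than a signed 32-bit integer can count and needs roughly 9.6 GB as 32-bit values. The sketch below checks that arithmetic and shows a workaround that fills a preallocated array one time step at a time. The helper name `read_by_time_step` is made up, and it assumes the variable object has a `.shape` and can be indexed along its first axis.

```python
import numpy as np

shape = (366, 30, 476, 460)  # dimensions quoted in the question
n_elements = int(np.prod(shape, dtype=np.int64))
n_bytes = n_elements * 4     # 32-bit values
exceeds_int32 = n_elements > np.iinfo(np.int32).max


def read_by_time_step(var, dtype="f4"):
    """Fill a preallocated array one leading-axis step at a time."""
    out = np.empty(var.shape, dtype=dtype)
    for i in range(var.shape[0]):
        out[i] = var[i]
    return out
```

Since reading a single time step works in the question, looping over time steps like this is a low-risk thing to try, provided enough memory is available for the full array.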
It is unclear to me what may be causing your problem, but I have one alternative suggestion you may try. It seems you are reading NetCDF4 output from a ROMS ocean model. I do this regularly, and I always prefer to use the netcdf4-python module for it:
from netCDF4 import Dataset
cdf=Dataset("ns8km_avg_16482_GLORYS2V1.nc","r")
u=cdf.variables["u"][:]
One benefit of the netcdf4-python module is that it automatically applies the offset, scale, and fill value stored in a NetCDF file. The 4D array read from the file will therefore contain masked values. I wonder whether the masking is not being done correctly in your approach. You could try installing netcdf4-python and reading your data this way; hopefully it helps.
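To make the automatic adjustment concrete, here is a rough NumPy sketch of what that unpacking amounts to under the usual convention: fill values are masked, then unpacked = raw * scale_factor + add_offset. The function name `unpack` and the sample numbers are made up, and this is an illustration, not the library's actual implementation.

```python
import numpy as np


def unpack(raw, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Mask fill values, then apply raw * scale_factor + add_offset."""
    raw = np.asarray(raw)
    if fill_value is not None:
        masked = np.ma.masked_equal(raw, fill_value)
    else:
        masked = np.ma.asarray(raw)
    return masked * scale_factor + add_offset
```

A reader that skips this step returns the raw packed integers, fill values included, which can look like nonsense data even though the file is fine.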
Cheers, Trond