I am using Scientific.IO.NetCDF to read NetCDF data into Python. I am trying to read a 4D 32-bit variable of size (366, 30, 476, 460), but I end up with zeros in my ndarray. Strangely, if I read just a 3D slice (1, 30, 476, 460), the returned values are fine.
This is what I am trying to do:
from Scientific.IO.NetCDF import NetCDFFile as Dataset
from collections import namedtuple

# Define output data structure as a named tuple
Roms_data = namedtuple('Roms_data', 'Ti Tf Nt U V W Zeta')

# Open the NetCDF file for reading.
ncfile = Dataset(data_file, 'r')

if Tstart == -1:
    ti = 0
    tf = NTsav - 1
else:
    ti = Tstart - 1
    tf = Tend - 1

try:
    udata = ncfile.variables['u'][:]
    print str(udata.shape)
except:
    print ' Failed to read u data from ' + data_file
The "[:]" means i am reading the whole 4d variable 'u' into an ndarray called udata. This does not work and udata is full of zeros. However, if I do:
try:
    udata = ncfile.variables['u'][0,:,:,:]
    print str(udata.shape)
except:
    print ' Failed to read u data from ' + data_file
then "udata" that is now a 3d ndarray has the values it is supposed to read from the NetCDF file.
Any help? Thanks in advance.
It is unclear to me what may cause your problem, but I have one alternative suggestion you may try. It seems you are reading NetCDF4 data output from a ROMS ocean model. I do this regularly, but I always prefer to use the netcdf-python module for this:
from netCDF4 import Dataset
cdf=Dataset("ns8km_avg_16482_GLORYS2V1.nc","r")
u=cdf.variables["u"][:]
One benefit of the netcdf-python module is that it automatically adjusts for the offset, scale, and fill value in a NetCDF file. The 4D array read from the file will therefore contain masked values. I wonder if the masking in your approach is not done correctly. Perhaps you could try installing netcdf-python and reading your data this way; hopefully that helps.
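As a quick sanity check, you could inspect whether the automatic masking and scaling kicked in (a minimal sketch; the file and variable names are taken from the example above):

from netCDF4 import Dataset
import numpy.ma as ma

cdf = Dataset("ns8km_avg_16482_GLORYS2V1.nc", "r")
var = cdf.variables["u"]
var.set_auto_maskandscale(True)  # on by default: applies scale_factor/add_offset and masks _FillValue
u = var[:]
print ma.is_masked(u), u.min(), u.max()  # masked entries are excluded from min/max
cdf.close()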
Cheers, Trond
I want to store multiple GeoTiff files in one HDF5 file to use for further analysis, since the function I am supposed to use can only deal with HDF5 (so basically like a raster stack in R, but stored in HDF5). I have to use Python. I am relatively new to the HDF5 format (and to geoanalysis in Python generally) and don't really know how to approach this issue. Especially keeping the geolocation/projection information seems tricky to me. So far I tried:
import h5py
import rasterio

r1 = rasterio.open("filename.tif")
r2 = rasterio.open("filename2.tif")

with h5py.File('path/test.h5', 'w') as hdf:
    hdf.create_dataset('GeoTiff1', data=r1)
    hdf.create_dataset('GeoTiff2', data=r2)
Yielding the following error:
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
I am pretty sure this is not at all the correct approach, and I'm happy about any suggestions.
What you can try is this:

import h5py
import numpy as np

spec_dtype = h5py.special_dtype(vlen=np.dtype('float64'))

Just make a spec_dtype variable with float64 type, then apply it in create_dataset. Note that create_dataset needs the pixel array (e.g. r1.read()), not the rasterio dataset object itself:

with h5py.File('path/test.h5', 'w') as hdf:
    hdf.create_dataset('GeoTiff1', data=r1.read(), dtype=spec_dtype)
    hdf.create_dataset('GeoTiff2', data=r2.read(), dtype=spec_dtype)

Apply these and hopefully it will work.
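If you also want to keep the geolocation/projection information mentioned in the question, one possible approach (a sketch, assuming rasterio's crs and transform properties) is to store the pixel array and attach the georeferencing as HDF5 attributes:

import h5py
import rasterio

with rasterio.open("filename.tif") as r1, h5py.File('path/test.h5', 'w') as hdf:
    dset = hdf.create_dataset('GeoTiff1', data=r1.read())  # array of shape (bands, rows, cols)
    dset.attrs['crs'] = str(r1.crs)                # coordinate reference system, e.g. 'EPSG:32633'
    dset.attrs['transform'] = tuple(r1.transform)  # six affine transform coefficients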
Using HDFql in Python, your use-case could be solved as follows:
import HDFql
HDFql.execute("SHOW FILE SIZE filename.tif, filename2.tif")
HDFql.cursor_next()
HDFql.execute("CREATE DATASET path/test.h5 GeoTiff1 AS OPAQUE(%d) VALUES FROM BINARY FILE filename.tif" % HDFql.cursor_get_bigint())
HDFql.cursor_next()
HDFql.execute("CREATE DATASET path/test.h5 GeoTiff2 AS OPAQUE(%d) VALUES FROM BINARY FILE filename2.tif" % HDFql.cursor_get_bigint())
I have a NetCDF file called air.sig995.2012.nc. It has four variables:
('lat','lon','time','air').
I am trying to read the values of any of the variables, let's say the variable air, using the line below:
import scipy.io.netcdf as S
fileobj=S.netcdf_file('air.sig995.2012.nc','r')
data=fileobj.variables['air'].getValue()
but it gives me the error below:
ValueError: can only convert an array of size 1 to a Python scalar
I am fairly new to Python. Can anyone help me with this one?
The error occurs because getValue() only works for scalar (zero-dimensional) variables; for arrays, use slicing instead. If not xarray, you can do this with the netcdf4-python library with the same slice syntax:
from netCDF4 import Dataset
nc = Dataset('air.sig995.2012.nc')
my_array = nc.variables['air'][:]
The output you're expecting is somewhat ambiguous, but either method should work depending on your specific goal.
import xarray as xr
### open netcdf file ###
df = xr.open_dataset('path/file.nc')
### extract values of variable 'air' ####
air = df['air'][:]
air_flat = df['air'].values.flatten() # 1-d data
I am opening NetCDF data from an OPeNDAP server (a subset of the data) using a URL. When I open it, the data is (as far as I can see) not actually loaded until a variable is requested. I would like to save the data to a file on disk; how would I do this?
I currently have:
import numpy as np
import netCDF4 as NC
url = u'http://etc/etc/hourly?varname[0:1:10][0:1:30]'
set = NC.Dataset(url) # I think data is not yet loaded here, only the "layout"
varData = set.variables['varname'][:,:] # I think data is loaded here
# now I want to save this data to a file (for example test.nc); set.close() obviously won't work
Hope someone can help, thanks!
If you can use xarray, this should work:
import xarray as xr
url = u'http://etc/etc/hourly?varname[0:1:10][0:1:30]'
ds = xr.open_dataset(url, engine='netcdf4') # or engine='pydap'
ds.to_netcdf('test.nc')
The xarray documentation has another example of how you could do this.
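If you only need part of the remote dataset, you can also subset before saving (a sketch; 'time' is a hypothetical dimension name for your data):

ds.isel(time=slice(0, 10)).to_netcdf('test_subset.nc')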
It's quite simple: create a new NetCDF file and copy whatever you want :) Luckily this can be automated to a large extent by copying the correct dimensions, NetCDF attributes, etc. from the input file. I quickly coded this example; the input file here is also a local file, but if the reading over OPeNDAP already works, it should work in a similar way.
import netCDF4 as nc4

# Open input file in read (r), and output file in write (w) mode:
nc_in = nc4.Dataset('drycblles.default.0000000.nc', 'r')
nc_out = nc4.Dataset('local_copy.nc', 'w')

# For simplicity, copy all dimensions (with correct size) to the output file
for dim in nc_in.dimensions:
    nc_out.createDimension(dim, nc_in.dimensions[dim].size)

# List of variables to copy (they have to be in nc_in...):
# If you want all variables, this could be replaced with nc_in.variables
vars_out = ['z', 'zh', 't', 'th', 'thgrad']

for var in vars_out:
    # Create variable in new file:
    var_in = nc_in.variables[var]
    var_out = nc_out.createVariable(var, datatype=var_in.dtype, dimensions=var_in.dimensions)

    # Copy NetCDF attributes:
    for attr in var_in.ncattrs():
        var_out.setncattr(attr, var_in.getncattr(attr))

    # Copy data:
    var_out[:] = var_in[:]

nc_out.close()
Hope it helps, if not let me know.
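One caveat (an assumption about your input file): if the input has an unlimited dimension, such as time, copying its current size makes it fixed in the output. To keep it unlimited, replace the dimension loop above with something like:

for dim in nc_in.dimensions:
    size = None if nc_in.dimensions[dim].isunlimited() else nc_in.dimensions[dim].size
    nc_out.createDimension(dim, size)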
My question is simple.
Take a wrfout file "out.nc" for example.
The file contains Geo2D, Geo3D, and 1D variables.
Using the GDAL package in Python 2.7, I can extract the Geo2D variables easily, like this:
## T2 is a 2-D variable: the temperature 2 m above the ground
temp = gdal.Open('NETCDF:"'+"out.nc"+'":T2')
But when I want to use this code to extract a 1D array, it fails.
## Time is a 1-D array representing the time series through the simulation period
time = gdal.Open('NETCDF:"'+"out.nc"+'":Time')
Nothing happens! I wish someone could offer advice on how to read WRF output variables of any dimension easily!
You can also use the NetCDF reader in scipy.io:
import scipy.io.netcdf as nc
# Open a netcdf file object and assign the data values to a variable
time = nc.netcdf_file('out.nc', 'r').variables['Time'][:]
It has the benefit that scipy is a very popular and widely installed package, and it works similarly to ordinary file opening in some respects.
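Note that scipy's reader memory-maps the file by default, so the returned array references the open file. If you want the data to outlive the file object, keep a handle and copy before closing (a minimal sketch under that assumption):

import scipy.io.netcdf as nc

f = nc.netcdf_file('out.nc', 'r')      # mmap=True by default for local files
time = f.variables['Time'][:].copy()   # copy so the data survives f.close()
f.close()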
I need to be able to quickly read lots of netCDF variables in python (1 variable per file). I'm finding that the Dataset function in netCDF4 library is rather slow compared to reading utilities in other languages (e.g., IDL).
My variables have shape (2600, 5200) and type float. They don't seem that big to me (file size = 52 MB).
Here is my code:
import numpy as np
from netCDF4 import Dataset
import time
file = '20151120-235839.netcdf'
t0=time.time()
openFile = Dataset(file,'r')
raw_data = openFile.variables['MergedReflectivityQCComposite']
data = np.copy(raw_data)
openFile.close()
print time.time() - t0
It takes about 3 seconds to read one variable (one file). I think the main slowdown is np.copy. raw_data is <type 'netCDF4.Variable'>, hence the copy. Is this the best/fastest way to do netCDF reads in Python?
Thanks.
The power of NumPy is that you can create views into existing data in memory via the metadata it retains about that data, so a copy will always be slower than a view. As JCOidl says, it's not clear why you don't just use:
raw_data = openFile.variables['MergedReflectivityQCComposite'][:]
For more info, see the SciPy Cookbook and the SO question View onto a numpy array?
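For reference, a timing sketch of the slice-based read (reusing the file and variable names from the question):

import time
from netCDF4 import Dataset

t0 = time.time()
nc = Dataset('20151120-235839.netcdf', 'r')
data = nc.variables['MergedReflectivityQCComposite'][:]  # reads directly into a (masked) numpy array
nc.close()
print time.time() - t0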
I'm not sure what to say about the np.copy operation (which is indeed slow), but I find that the PyNIO module from UCAR works well for both NetCDF and HDF files. This will place data into a numpy array:
import Nio
f = Nio.open_file(file, format="netcdf")
data = f.variables['MergedReflectivityQCComposite'][:]
f.close()
Testing your code against the PyNIO code on a NetCDF file I have resulted in 1.1 seconds for PyNIO, versus 3.1 seconds for the netCDF4 module. Your results may vary; it's worth a look though.
You can use xarray for that.
import xarray as xr

### Single netcdf file ###
ds = xr.open_dataset('path/file.nc')

### Opening multiple NetCDF files and concatenating them by time ####
ds = xr.open_mfdataset('path/*.nc', concat_dim='time')
To read the variable you can simply type ds.MergedReflectivityQCComposite or ds['MergedReflectivityQCComposite'][:].
You can also use xr.load_dataset, but I find that it uses more memory than the open function. For xr.open_mfdataset, you can also chunk along the dimensions of the file if you want. There are other options for both functions, and you might be interested to learn more about them in the xarray documentation.
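For example, chunked opening might look like this (a sketch; 'time' is a hypothetical dimension name and 100 an arbitrary chunk size):

ds = xr.open_mfdataset('path/*.nc', concat_dim='time', chunks={'time': 100})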