Unable to read variable from netCDF file - python

I'm trying to read a specific variable from a netCDF file but have been unable to do so. The variable has data type "int16" and five dimensions (e.g. latitude, longitude, time, elevation). Here is what I've tried and, consequently, the errors I've received.
import netCDF4 as nc
import numpy as np
x = nc.Dataset('file.nc')
lat = x.variables('lat')
print(lat)
shape = x.variables('shape')
The error reads:
AttributeError: netCDF4/_netCDF4.pyx in
netCDF4._netCDF4.Dataset.__getattr__()
netCDF4._netCDF4.Dataset.getncattr()
netCDF4._netCDF4.Dataset._get_att()
netCDF4._netCDF4.Dataset._ensure_nc_success()
AttributeError: NetCDF: Attribute not found.
Any help would be greatly appreciated! Thanks!

Some typos above. Try this:
import netCDF4 as nc
import numpy as np
x = nc.Dataset('file.nc')
lat = x.variables['lat']  # variables is a dict, so index with square brackets, not parentheses
print(lat)
shape = x.variables['lat'].shape
# or
shape = lat.shape
print(shape)
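If you then need the actual values rather than the variable object, slicing reads them into a NumPy array; a minimal sketch, assuming the file really has a 'lat' variable:
lat_vals = lat[:]  # slicing a netCDF4 Variable returns its data as a (masked) NumPy array
print(lat_vals.shape, lat_vals.dtype)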

Related

Issues with Scipy interpolate griddata

I have a netCDF file with a spatial resolution of 0.05° and I want to regrid it to a spatial resolution of 0.01°, like this other netCDF file. I tried using scipy.interpolate.griddata, but I am not really getting there; I think there is something I am missing.
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset = xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
According to the scipy.interpolate.griddata documentation, I need to construct my interpolation pipeline as follows:
grid = griddata(points, values, (grid_x_new, grid_y_new),
method='nearest')
So in my case, I assume it would be as follows:
# Saving the old and new grids in variables
grid_x_new = target_dataset['lon']
grid_y_new = target_dataset['lat']
grid_x_old = original_dataset['lon']
grid_y_old = original_dataset['lat']
points = (grid_x_old, grid_y_old)
values = original_dataset['analysed_sst']  # My variable in the netcdf is the sea surface temp.
Now, when I run griddata:
from scipy.interpolate import griddata
grid = griddata(points, values, (grid_x_new, grid_y_new),method='nearest')
I am getting the following error:
ValueError: shape mismatch: objects cannot be broadcast to a single
shape
I assume it has something to do with the lat/lon array shapes. I am quite new to working with netCDF files and don't really know what the issue could be here. Any help would be much appreciated!
In your original code, the indices in grid_x_old and grid_y_old should correspond to each unique coordinate pair in the dataset. To get things working correctly, something like the following will work:
import xarray as xr
from scipy.interpolate import griddata
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset = xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
# Flatten each dataset so there is one row per (lat, lon) pair
old_df = original_dataset.to_dataframe().reset_index()
new_df = target_dataset.to_dataframe().reset_index()
grid_x_old = old_df.lon
grid_y_old = old_df.lat
grid_x_new = new_df.lon
grid_y_new = new_df.lat
values = old_df.analysed_sst
points = (grid_x_old, grid_y_old)
grid = griddata(points, values, (grid_x_new, grid_y_new), method='nearest')
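The key point is that griddata expects scattered points: flattening with to_dataframe().reset_index() produces one row per (lat, lon) pair, so points, values, and the target coordinates all line up element-wise, which is exactly what the original shape-mismatch error was complaining about.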
I recommend using xESMF for regridding xarray datasets. The code below will regrid your dataset:
import xarray as xr
import xesmf as xe
original_dataset = xr.open_dataset('to_regrid.nc')
target_dataset = xr.open_dataset('SSTA_L4_MED_0_1dg_2022-01-18.nc')
# Build a bilinear regridder from the source grid onto the target grid, then apply it
regridder = xe.Regridder(original_dataset, target_dataset, "bilinear")
dr_out = regridder(original_dataset)
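The regridder output is an ordinary xarray object, so it can be written back out with the usual I/O methods; a one-line sketch (the filename is just a placeholder):
dr_out.to_netcdf('regridded_output.nc')  # hypothetical output filename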

xarray cannot directly convert an xarray.Dataset into a numpy array

I'm experiencing an error that I cannot seem to resolve when attempting to convert a Dataset to an array with xarray. I'm encountering this because I'm attempting to add a time dimension to a netCDF file (open the netCDF, add a timestamp that is the same across all data, save the netCDF back out).
import os
import xarray as xr
import pandas as pd
scriptpath = os.path.dirname(os.path.abspath(__file__))
outputfile = scriptpath + '\\20210629_deadgrass.aus.nc'
times = pd.to_datetime(str(yesterday.strftime('%Y%m%d')))  # 'yesterday' is a date defined earlier in the script
time_da = xr.Dataset({"time": times})
arr = xr.open_dataset(outputfile)
ds = arr.to_array()
dst = ds.expand_dims(time=time_da)  # errors here
The error I'm receiving is
Exception has occurred: TypeError
cannot directly convert an xarray.Dataset into a numpy array. Instead, create an xarray.DataArray first, either with indexing on the Dataset or by invoking the `to_array()` method.
File "Z:\UpdateAussieGRASS.py", line 101, in <module>
dst = ds.expand_dims(time=time_da)
I can't seem to work out what I'm doing wrong with to_array() on the second-to-last line. Examples of to_array() are here. Autogenerated documentation is here.
ds is already an xarray.DataArray. The error occurs on this line:
dst = ds.expand_dims(time=time_da) #errors here
While ds is a DataArray, time_da is not (it is a Dataset). This should work:
dst = ds.expand_dims(time=time_da.to_array())
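Alternatively (untested against the original file, but standard xarray behavior), you can skip the helper Dataset entirely and pass the timestamp as a one-element coordinate:
dst = ds.expand_dims(time=[times])  # adds a length-1 'time' dimension with 'times' as its coordinate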

Cannot cast array data from dtype('O') to dtype('int32') using scipy peak_widths method

I am trying to run the code below, but when it tries to calculate the widths of the peaks it gives me this error:
TypeError: Cannot cast array from dtype('O') to dtype('int32') according to the rule 'safe'
I have read the documentation on scipy.signal.find_peaks and scipy.signal.peak_widths, but everything I have read suggests that what I have should work.
Here is the link for the .csv file I am using: https://drive.google.com/file/d/18rtoGSRRLmoeOglvuAYvd2S3NeLvw90T/view?usp=sharing
import pandas as pd
import scipy.signal as sp
signal_data = pd.read_csv('Data.csv')
signal = signal_data['Signal']
retention_time = signal_data['Retention Time (s)']
peaks = sp.find_peaks(signal, distance=300, prominence=2000)
print(peaks)
widths = sp.peak_widths(signal, peaks)
print(widths)
Any help would be much appreciated.
As you can see in the documentation example, sp.find_peaks returns a tuple: the array of peak indices plus a dict of peak properties. So you need to unpack it in the relevant line:
peaks, _ = sp.find_peaks(signal, distance=300, prominence=2000)
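With the tuple unpacked, peak_widths receives a plain index array and the rest of the snippet runs unchanged; a minimal sketch:
peaks, properties = sp.find_peaks(signal, distance=300, prominence=2000)
widths = sp.peak_widths(signal, peaks)  # returns widths, heights, and interpolated left/right positions
print(widths[0])  # the widths themselves, in samples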

How to use numpy in the Programmable Filter in ParaView

Assume, I have a ProgrammableFilter in paraview, which gets two inputs: mesh1 with data and mesh2 without.
Furthermore, I know the permutation of the points from mesh1 to mesh2.
Inside the filter, I can access the point values through
data0 = inputs[0].GetPointData().GetArray('data')
and obtain a part of the array using
subData = data0[0:6]
for example. But how could I add this subData to the output without a Python loop?
To experiment with the code, I created a (not so small) working example:
#!/usr/bin/python
from paraview.simple import *
import numpy as np
import vtk
from vtk.util.numpy_support import numpy_to_vtk
#generate an arbitrary source with data
mesh2=Sphere()
mesh2.Center=[0.0, 0.0, 0.0]
mesh2.EndPhi=360
mesh2.EndTheta=360
mesh2.PhiResolution=100
mesh2.Radius=1.0
mesh2.StartPhi=0.0
mesh2.StartTheta=0.0
mesh2.ThetaResolution=100
mesh2.UpdatePipeline()
#add the data
mesh2Vtk=servermanager.Fetch(mesh2)
nPointsSphere=mesh2Vtk.GetNumberOfPoints()
mesh2Data=paraview.vtk.vtkFloatArray()
mesh2Data.SetNumberOfValues(nPointsSphere)
mesh2Data.SetName("mesh2Data")
#TODO: use numpy here?? do this with a ProgrammableFilter ?
data=np.random.rand(nPointsSphere,1)
for k in range(nPointsSphere):
    mesh2Data.SetValue(k, data[k])
mesh2Vtk.GetPointData().AddArray(mesh2Data)
#send back to paraview server
#from https://public.kitware.com/pipermail/paraview/2011-February/020120.html
t=TrivialProducer()
filter= t.GetClientSideObject()
filter.SetOutput(mesh2Vtk)
t.UpdatePipeline()
w=CreateWriter('Sphere_withData.vtp')
w.UpdatePipeline()
Delete(w)
#create mesh1 without data
mesh1=Line()
mesh1.Point1=[0,0,0]
mesh1.Point2=[0,0,1]
mesh1.Resolution=5
mesh1.UpdatePipeline()
progFilter=ProgrammableFilter(mesh1)
progFilter.Input=[mesh1, t]
progFilter.Script="curT=inputs[1].GetPointData().GetArray('mesh2Data')"\
"\nglobIndices=range(0,6)"\
"\nsubT=curT[globIndices]"\
"\nswap=vtk.vtkFloatArray()"\
"\nswap.SetNumberOfValues(len(globIndices))"\
"\nswap.SetName('T')"\
"\n#TODO: how can i avoid this loop, i.e. write output.GetPointData().AddArray(converToVTK(subT))"\
"\nfor k in range(len(globIndices)):"\
"\n swap.SetValue(k,subT[k])"\
"\noutput.PointData.AddArray(swap)"
progFilter.UpdatePipeline()
w=CreateWriter('Line_withData.vtp')
w.UpdatePipeline()
Delete(w)
I accepted the answer because it looks right. The following two scripts demonstrate the problem:
base script 'run.py':
src1='file1.vtu'
r1=XMLUnstructuredGridReader(FileName=src1)
progFilter=ProgrammableFilter(r1)
progFilter.Input=[r1]
with open('script.py','r') as myFile:
    progFilter.Script = myFile.read()
progFilter.UpdatePipeline()
progData=progFilter.GetPointDataInformation()
print progData.GetArray('T2').GetRange()
and the script for the programmable filter:
import vtk
import vtk.numpy_interface.dataset_adapter as dsa
import numpy as np
globIndices=inputs[0].GetPointData().GetArray('T')
subT=np.ones((globIndices.shape[0],1))
subTVtk=dsa.VTKArray(subT)
output.PointData.append(subTVtk, 'T2')
With this combination, I get the error messages:
File "/usr/lib/python2.7/dist-packages/vtk/numpy_interface/dataset_adapter.py", line 652, in append
self.VTKObject.AddArray(arr)
TypeError: AddArray argument 1: method requires a VTK object
File "run.py", line 15, in
print progData.GetArray('T2').GetRange()
AttributeError: 'NoneType' object has no attribute 'GetRange'
The first error message seems to be the cause of the second one.
Here's a minimal example that creates a VTK data array from a Numpy array. You should be able to adapt it for your purposes.
import numpy as np
import vtk
from vtk.numpy_interface import dataset_adapter as da
np_arr = np.ones(6)
vtk_arr = da.VTKArray(np_arr)  # wrap the NumPy array so VTK can consume it
output.PointData.append(vtk_arr, "my data")
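Applied to the original question, the same dataset-adapter machinery lets the ProgrammableFilter script use plain NumPy slicing with no Python loop. A sketch, assuming the second input carries the 'mesh2Data' array:
# inside the ProgrammableFilter script; 'inputs' and 'output' are provided by ParaView
curT = inputs[1].PointData['mesh2Data']  # already wrapped as a NumPy-compatible VTKArray
subT = curT[0:6]  # NumPy slicing instead of a loop
output.PointData.append(subT, 'T')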

Pandas Dataframe Data Type Conversion or Isomap Transformation

I load images with scipy's misc.imread, which in my case returns a 2304x3 ndarray. Later, I append this array to a list and convert the list to a DataFrame. The purpose of doing so is to later apply an Isomap transform to the DataFrame. My data frame has 84 rows/samples (images in the folder) and 2304 features; each feature is an array/list of 3 elements. When I try using the Isomap transform I get the error:
ValueError: setting an array element with a sequence.
I think the error occurs because the elements of my data frame are of object type. First I tried a to_numeric conversion on each column but got an error; then I wrote a loop to convert each element to numeric. The results I get are still of object type. Here is my code:
import pandas as pd
from scipy import misc
from mpl_toolkits.mplot3d import Axes3D
import matplotlib
import matplotlib.pyplot as plt
import glob
from sklearn import manifold
samples = []
path = 'Datasets/ALOI/32/*.png'
files = glob.glob(path)
for name in files:
    img = misc.imread(name)
    img = img[::2, ::2]
    x = (img/255.0).reshape(-1,3)
    samples.append(x)
df = pd.DataFrame.from_records(samples, coerce_float = True)
for i in range(0,2304):
    for j in range(0,84):
        df[i][j] = pd.to_numeric(df[i][j], errors = 'coerce')
    df[i] = pd.to_numeric(df[i], errors = 'coerce')
print df[2303][83]
print df[2303].dtype
print df[2303][83].dtype
#iso = manifold.Isomap(n_neighbors=6, n_components=3)
#iso.fit(df)
#manifold = iso.transform(df)
#print manifold.shape
The last four lines are commented out because they give an error. The output I get is:
[ 0.05098039 0.05098039 0.05098039]
object
float64
As you can see, each element of the DataFrame is of type float64, but the whole column is an object.
Does anyone know how to convert the whole data frame to numeric?
Is there another way of applying Isomap?
Do you want to reshape your image to a new shape instead of the original one? If that is not the case, then you should change the following line in your code:
x = (img/255.0).reshape(-1,3)
with
x = (img/255.0).reshape(-1)
Hope this resolves your issue.
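With each image flattened to a 1-D vector, the DataFrame becomes a plain 84 x 6912 float64 table, and the commented-out Isomap lines from the question should then run; a sketch reusing the imports from the code above:
iso = manifold.Isomap(n_neighbors=6, n_components=3)
iso.fit(df)
embedded = iso.transform(df)  # shape: (84, 3)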
