Rioxarray dimension missing coordinates - python

I'm currently using the Python module rioxarray to clip an Xarray dataset based on a specific geometry to produce a latitude/longitude grid of coordinates. My data is below:
obs_dataset_full
xarray.Dataset
Dimensions:
Lat: 451 Lon: 350 lat: 450 lon: 350
Coordinates:
Lon
(Lon)
int64
0 1 2 3 4 5 ... 345 346 347 348 349
Lat
(Lat)
int64
0 1 2 3 4 5 ... 446 447 448 449 450
Longitude
(lon)
float64
-105.7 -105.7 ... -78.34 -78.26
Latitude
(lat)
float64
35.04 35.08 35.11 ... 51.52 51.56
spatial_ref
()
int64
0
Data variables:
precip_var
(lon, lat, Lon, Lat)
float64
0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 nan
Attributes:
grid_mapping :
spatial_ref
Note: The Lon/Lat dimensions are irrelevant; I'm trying to utilize the lon/lat, which have the actual coordinates.
obs_dataset_full.rio.write_crs('EPSG:4326',inplace = True)
obs_dataset_cropped = obs_dataset_full.rio.clip(geometries=cropping_geometries, crs='EPSG:4326')
When I run this code, I get the following error:
DimensionMissingCoordinateError: lon missing coordinates.
Both the obs_dataset_full dataset and the precip_var data array have the appropriate coordinates, and the rioxarray documentation page is not particularly clear as to what this exception entails. Any help is much appreciated!

The issue is that rioxarray expects each spatial dimension and its coordinate to have the same name. I would recommend using xarray's rename methods so that the dimensions and coordinates are both named longitude and latitude, or x and y.
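A minimal sketch of that renaming, using a small synthetic dataset as a stand-in for the one in the question (the array sizes and values are made up for illustration). It assumes the real values live in the Longitude/Latitude coordinates, as the question's output suggests, and it uses swap_dims before rename so that the coordinates become indexed dimension coordinates. The transpose to (y, x) order is an assumption about what rioxarray handles most comfortably.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: the dimensions are named lon/lat, but the actual
# values sit in coordinates named Longitude/Latitude, so the dimensions
# themselves have no coordinates of the same name.
ds = xr.Dataset(
    {"precip_var": (("lon", "lat"), np.zeros((4, 3)))},
    coords={
        "Longitude": ("lon", np.linspace(-105.7, -78.26, 4)),
        "Latitude": ("lat", np.linspace(35.04, 51.56, 3)),
    },
)

# Promote the real coordinates to dimension coordinates, then give the
# dimension and coordinate a shared name (x/y here).
fixed = (
    ds.swap_dims({"lon": "Longitude", "lat": "Latitude"})
    .rename({"Longitude": "x", "Latitude": "y"})
    .transpose("y", "x")
)

print(fixed)
```

After this step, calls such as fixed.rio.write_crs(...) and fixed.rio.clip(...) should be able to find the spatial dimensions, although that part is not run here.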

Related

xarray slice function alternative for calculating average along a dimension

I'm using Xarray and netCDF meteorological data. I have the usual dimensions time, latitude and longitude and two main variables: the wind speed (time, lat, lon) and a latitudinal position (time, lon).
<xarray.Dataset>
Dimensions: (lon: 53, time: 25873, lat: 20)
Coordinates:
* lon (lon) float64 -80.0 -77.5 -75.0 -72.5 ... 45.0 47.5 50.0
* time (time) datetime64[ns] 1950-01-31 ... 2020-12-01
* lat (lat) float32 70.0 67.5 65.0 62.5 ... 27.5 25.0 22.5
Data variables:
uwnd (time, lat, lon) float32 -0.0625 0.375 ... -1.812 -2.75
positions (time, lon) float64 40.0 40.0 45.0 ... 70.0 70.0 70.0
For each time, lon, I'd like to calculate a latitudinal average around the positions.
If I do a loop, I would do this (for a +-2.5° latitude average):
for i in ds.lon.values:
    for t in ds.time.values:
        wind_averaged.loc[t,i]=ds.uwnd.sel(lon=i,time=t).sel(lat=slice(2.5+ds.positions.sel(lon=i,time=t).values,ds.positions.sel(lon=i,time=t).values-2.5)).mean('lat')
This is obviously very bad and I wanted to use slice() like this:
wind_averaged=ds.uwnd.sel(lat=slice(2.5+ds.jet_positions.values,ds.jet_positions.values-2.5)).mean('lat')
but it gives an error because I
cannot use non-scalar arrays in a slice for xarray indexing
Is there any alternative to do what I want without doing two for loops by using Xarray power?
Thanks
I believe you are looking for the multidimensional groupby. If I understand correctly, there is a tutorial for this problem here: https://xarray.pydata.org/en/stable/examples/multidimensional-coords.html
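As a possible alternative to the groupby approach (this is a different technique, not the one from the tutorial), a boolean mask built by broadcasting can replace both loops. The sketch below uses a tiny synthetic dataset with the same variable names as the question; the values are invented for illustration.

```python
import numpy as np
import xarray as xr

# Tiny synthetic dataset mirroring the layout in the question.
lat = np.array([70.0, 67.5, 65.0, 62.5, 60.0])
lon = np.array([-80.0, -77.5])
time = np.array(["2000-01-31", "2000-02-29"], dtype="datetime64[ns]")

ds = xr.Dataset(
    {
        "uwnd": (("time", "lat", "lon"), np.arange(20, dtype=float).reshape(2, 5, 2)),
        "positions": (("time", "lon"), np.array([[65.0, 67.5], [62.5, 65.0]])),
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

# ds.lat has dims (lat,), ds.positions has dims (time, lon); xarray
# broadcasts by dimension name, so the mask covers (lat, time, lon).
in_band = abs(ds.lat - ds.positions) <= 2.5

# Keep only the values inside the +-2.5 degree band, then average over lat.
# mean() skips the NaNs introduced by where().
wind_averaged = ds.uwnd.where(in_band).mean("lat")

print(wind_averaged)
```

The mask keeps every latitude within 2.5° of the position for each (time, lon) pair, which matches the inclusive bounds of the slice in the loop version.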

Merging datasets with xarray makes variables to be nan

I want to show two datasets in the same plot, so I am merging them using xarray. This is what they look like:
ds1
<xarray.Dataset>
Dimensions: (time: 1, lat: 1037, lon: 1345)
Coordinates:
* lat (lat) float32 37.7 37.7 37.69 37.69 37.69 ... 35.01 35.01 35.0 35.0
* time (time) datetime64[ns] 2021-11-23
* lon (lon) float32 -9.001 -8.999 -8.996 -8.993 ... -5.507 -5.504 -5.501
Data variables:
CHL (time, lat, lon) float32 ...
ds2
<xarray.Dataset>
Dimensions: (time: 1, lat: 852, lon: 1168)
Coordinates:
* time (time) datetime64[ns] 2021-11-23
* lat (lat) float32 35.0 35.0 35.01 35.01 35.01 ... 37.29 37.29 37.3 37.3
* lon (lon) float32 -5.501 -5.498 -5.494 -5.491 ... -1.507 -1.503 -1.5
Data variables:
CHL (time, lat, lon) float32 ...
So then I use:
ds3 = xr.merge([ds1,ds2])
It works for the dimensions, but my variable CHL becomes nan:
<xarray.Dataset>
Dimensions: (lat: 1887, lon: 2513, time: 1)
Coordinates:
* lat (lat) float64 35.0 35.0 35.0 35.0 35.01 ... 37.69 37.69 37.7 37.7
* lon (lon) float64 -9.001 -8.999 -8.996 -8.993 ... -1.507 -1.503 -1.5
* time (time) datetime64[ns] 2021-11-23
Data variables:
CHL (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
So when I plot this dataset I have the following result:
I assume those white stripes are caused by the variable CHL becoming nan...
Any ideas of what could be happening? Thank you!
I don't think that any values become NaNs. Rather, I think that the latitude coordinates simply differ. Because you do an outer join (the default for xr.merge), xarray has to fill up the matrix at places where there is no information about the values. The default for the fill_value seems to be NaN.
So the question is, what values would you expect in these locations?
One possibility could be to fill the missing places by interpolation. In several dimensions this might be tricky, but as far as I see you are just placing two images next to each other with no overlap in the lon dimension.
In that case, xarray lets you interpolate the lat dimension easily:
ds3["CHL"].interpolate_na(dim="lat", method="linear")
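To illustrate the outer-join behaviour described above, here is a small sketch with two made-up datasets whose lat values interleave and whose lon ranges do not overlap. The sizes and values are assumptions chosen to keep the example readable, and join and compat are spelled out explicitly rather than relying on defaults.

```python
import numpy as np
import xarray as xr

# Two small tiles: latitudes interleave, longitudes do not overlap.
a = xr.Dataset(
    {"CHL": (("lat", "lon"), np.ones((2, 2)))},
    coords={"lat": [35.0, 35.5], "lon": [-9.0, -8.0]},
)
b = xr.Dataset(
    {"CHL": (("lat", "lon"), np.full((2, 2), 2.0))},
    coords={"lat": [35.25, 35.75], "lon": [-5.0, -4.0]},
)

# The outer join takes the union of the coordinates; cells that neither
# input covers are filled with NaN.
merged = xr.merge([a, b], join="outer", compat="no_conflicts")
n_valid_before = int(merged["CHL"].notnull().sum())

# Fill the gaps along lat by linear interpolation, as suggested above.
filled = merged["CHL"].interpolate_na(dim="lat", method="linear")
n_valid_after = int(filled.notnull().sum())

print(merged["CHL"].shape, n_valid_before, n_valid_after)
```

Only gaps between valid points are interpolated; rows outside the latitude range of a given tile are not extrapolated, so some NaNs at the edges can remain.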

Can dimensions be passed to a variable within an XArray Dataset?

I'm currently working on a project that involves clipping Xarray datasets based on a certain grid of latitude and longitude values. To do this, I've imported a file with the specified lats and lons, and added dimensions onto the dataset to give it the necessary lat/lon information. However, I also want to pass the dimensions to a data variable (called 'precip') within the dataset. I know that I'm able to pass the dimensions to the array of values in the variable, but can they be passed to the variable itself?
My code is below:
precip = obs_dataset_full.precip.expand_dims(lon = 350, lat = 450)
precip.coords['Longitude'] = (-1 * (360 - obs_dataset_full.Longitude))
precip.coords['Latitude'] = obs_dataset_full.Latitude
precip
With output as such:
xarray.Dataset
Dimensions:
dim_0: 350 dim_1: 451 lat: 450 lon: 350
Coordinates:
dim_0
(dim_0)
int64
0 1 2 3 4 5 ... 345 346 347 348 349
dim_1
(dim_1)
int64
0 1 2 3 4 5 ... 446 447 448 449 450
Longitude
(lon)
float64
-105.7 -105.7 ... -78.34 -78.26
Latitude
(lat)
float64
35.04 35.08 35.11 ... 51.52 51.56
Data variables:
precip
(dim_0, dim_1)
float64
0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 nan
Attributes: (0)
Specifically, I want the data variable precip to also possess dimensions of lat and lon, as the dataset does.
Thanks in advance!

Slicing longitude coordinate in xarray which crosses the 0

I have an xarray dataset with a longitude coordinate from 0.5 to 359.5, like the following:
Dimensions: (bnds: 2, lat: 40, lev: 35, lon: 31, member_id: 1)
Coordinates:
lev_bnds (lev, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
lat_bnds (lat, bnds) float64 ...
* lon (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 359.5
* lev (lev) float64 2.5 10.0 20.0 32.5 ... 5e+03 5.5e+03 6e+03 6.5e+03
* lat (lat) float64 -89.5 -88.5 -87.5 -86.5 ... -53.5 -52.5 -51.5 -50.5
* member_id (member_id) object 'r1i1p1f1'
Dimensions without coordinates: bnds
Data variables:
so (member_id, lev, lat, lon) float32 nan nan nan ...
The area I'm interested in is between 60W to 30E, which probably corresponds to longitude 300.5 to 30.5. Is there any way to slice the dataset between these coordinates?
I tried to use isel(slice(-60,30)), but it's not possible to have negative to positive numbers in the slice function.
I know I can just split the data into two small ones (300.5-359.5 and 0.5-30.5), but I was wondering if there is a better way.
Thank you!
As you correctly point out, currently isel can’t select from both the start and end of a dimension in a single pass.
If combined with roll http://xarray.pydata.org/en/stable/generated/xarray.DataArray.roll.html, you can move the points you want into a contiguous region, and then select those you need.
NB: I couldn’t be sure from your example, but it looks like you may want sel rather than isel, given you may be selecting by index value rather than position
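A rough sketch of the roll-then-select idea, assuming a 1° grid from 0.5 to 359.5 as in the question. The data values are synthetic, and the bounds of 300 and 30 are taken from the 60W to 30E range the question mentions.

```python
import numpy as np
import xarray as xr

# Synthetic 1-degree longitude axis, 0.5 ... 359.5.
lon = np.arange(0.5, 360.0, 1.0)
so = xr.DataArray(
    np.arange(lon.size, dtype=float),
    coords={"lon": lon},
    dims="lon",
    name="so",
)

# Number of points west of Greenwich that we want (300E and above).
n_west = int((so.lon >= 300).sum())
# Number of points east of Greenwich that we want (up to 30E).
n_east = int((so.lon <= 30).sum())

# Roll the western block to the front so the region becomes contiguous;
# roll_coords=True moves the lon labels together with the data.
rolled = so.roll(lon=n_west, roll_coords=True)

# The region of interest is now the first n_west + n_east points.
subset = rolled.isel(lon=slice(0, n_west + n_east))

print(subset.lon.values[[0, -1]])
```

The lon labels in subset still run 300.5 ... 359.5, 0.5 ... 29.5, so they are not monotonic. If that matters for plotting, they can be relabelled to the -180 to 180 convention afterwards.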

Regridding/Remapping of GCMs data

I have precipitation data in a netCDF file that I downloaded from the CMIP5 database. I was able to make a subset of the file, and its attributes are given below. The data have a 2.5 x 3.75 degree spatial resolution, and I need to convert them to 0.05 degree spatial resolution. Can anyone show me how to do this using Python?
Please keep in mind that I am using Python 3.7 on a Windows machine, and CDO and NCO are not a good fit on Windows. The data properties are below.
Dimensions: (bnds: 2, lat: 15, lon: 13, time: 122)
Coordinates:
* time (time) float64 15.0 45.0 75.0 ... 3.585e+03 3.615e+03 3.645e+03
* lat (lat) float64 -42.5 -40.0 -37.5 -35.0 ... -15.0 -12.5 -10.0 -7.5
* lon (lon) float64 112.5 116.2 120.0 123.8 ... 146.2 150.0 153.8 157.5
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) float64 ...
lat_bnds (lat, bnds) float64 ...
lon_bnds (lon, bnds) float64 ...
pr (time, lat, lon) float32 ...
I would be grateful for any help. Thanks in advance.
I can propose a solution like this, using some random data, where I regrid data from one resolution to another.
#!/usr/bin/env ipython
# ---------------------
import numpy as np
from netCDF4 import Dataset,num2date,date2num
# -----------------------------
ntime,nlon,nlat=10,10,10;
lonin=np.linspace(0.,1.,10);
latin=np.linspace(0.,1.,10);
dataout=np.random.random((ntime,nlat,nlon));
unout='seconds since 2018-01-01 00:00:00'
# ---------------------
# make data:
ncout=Dataset('in.nc','w',format='NETCDF3_CLASSIC');
ncout.createDimension('lon',nlon);
ncout.createDimension('lat',nlat);
ncout.createDimension('time',None);
ncout.createVariable('lon','float32',('lon'));ncout.variables['lon'][:]=lonin;
ncout.createVariable('lat','float32',('lat'));ncout.variables['lat'][:]=latin;
ncout.createVariable('time','float64',('time'));ncout.variables['time'].setncattr('units',unout);ncout.variables['time'][:]=np.linspace(0,3600*ntime,ntime);
ncout.createVariable('randomdata','float32',('time','lat','lon'));ncout.variables['randomdata'][:]=dataout;
ncout.close()
# ----------------------
# regrid:
from scipy.interpolate import griddata
lonout=np.linspace(0.,1.,20);
latout=np.linspace(0.,1.,20);
ncout=Dataset('out.nc','w',format='NETCDF3_CLASSIC');
ncout.createDimension('lon',np.size(lonout));
ncout.createDimension('lat',np.size(latout));
ncout.createDimension('time',None);
ncout.createVariable('lon','float32',('lon'));ncout.variables['lon'][:]=lonout;
ncout.createVariable('lat','float32',('lat'));ncout.variables['lat'][:]=latout;
ncout.createVariable('time','float64',('time'));ncout.variables['time'].setncattr('units',unout);ncout.variables['time'][:]=np.linspace(0,3600*ntime,ntime);
ncout.createVariable('randomdata','float32',('time','lat','lon'));
ncin=Dataset('in.nc');
lonin=ncin.variables['lon'][:];latin=ncin.variables['lat'][:];
lonmin,latmin=np.meshgrid(lonin,latin);
lonmout,latmout=np.meshgrid(lonout,latout);
for itime in range(np.size(ncin.variables['time'][:])):
    zout=griddata((lonmin.flatten(),latmin.flatten()),ncin.variables['randomdata'][itime,:,:].flatten(),(lonmout,latmout),'linear');
    ncout.variables['randomdata'][itime,:]=zout;
ncin.close();ncout.close()
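If the data are already open in xarray, a shorter route may be DataArray.interp, which needs scipy for multi-dimensional interpolation. This is a sketch under assumptions: the grid below mimics the lat/lon ranges from the question, the pr values are random, and only three time steps are used. Linear interpolation only smooths between the coarse points and does not add real fine-scale information.

```python
import numpy as np
import xarray as xr

# Coarse grid mimicking the question: 2.5 degrees in lat, 3.75 in lon.
lat = np.linspace(-42.5, -7.5, 15)
lon = np.linspace(112.5, 157.5, 13)
pr = xr.DataArray(
    np.random.default_rng(0).random((3, lat.size, lon.size)),
    coords={"time": np.arange(3), "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="pr",
)

# Target grid with 0.05 degree spacing over the same extent.
new_lat = np.linspace(-42.5, -7.5, 701)
new_lon = np.linspace(112.5, 157.5, 901)

# Linear interpolation onto the fine grid; time is left untouched.
pr_fine = pr.interp(lat=new_lat, lon=new_lon, method="linear")

print(pr_fine.shape)
```

The result can be written back to disk with pr_fine.to_netcdf(...) if a netCDF file is needed.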
