Extract coordinate values in xarray - python

I would like to extract the values of the coordinate variables.
For example, I create a DataArray as follows:
import xarray as xr
import numpy as np
import pandas as pd
years_arr = range(1982, 1986)
time = pd.date_range('14/1/' + str(years_arr[0]) + ' 12:00:00', periods=len(years_arr), freq=pd.DateOffset(years=1))
lon = range(20, 24)
lat = range(10, 14)
data = np.random.rand(len(time), len(lat), len(lon))  # example data of shape (time, lat, lon)
arr1 = xr.DataArray(data, coords=[time, lat, lon], dims=['time', 'latitude', 'longitude'])
I would now like to output the lon values from arr1.
I'm asking because arr1 is passed into a function, so I may not have the lon values available there.

arr1.coords['longitude'] gives you the longitude coordinate as an xarray.DataArray, and arr1.coords['longitude'].values gives you the values as a NumPy array.
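For the example above, where the dimension is named 'longitude':
lon_da = arr1.coords['longitude']           # the coordinate as an xarray.DataArray
lon_vals = arr1.coords['longitude'].values  # plain numpy array, here array([20, 21, 22, 23])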

Another possible solution is:
time, lat, lon = arr1.indexes.values()
Each entry is a pandas Index (a Float64Index when your lat/lon coordinates are floats); call .values on it if you need a plain NumPy array.

Related

adding extra dimensions in a numpy array or a dataframe

I have a pandas Series named obs, of shape (62824,), that holds temperature values as follows:
0 16.9
1 11.0
2 5.9
3 9.4
4 15.4
...
I want to use the following code to transform my array into an xr.DataArray:
lat = 35.93679
lon = 14.45663
obs_data = xr.DataArray(obs, dims=['time', 'lat', 'lon'],
                        coords=[pd.date_range('1979-01-01', '2021-12-31', freq='D'), lat, lon])
My issue is that I get the following error
ValueError: dimensions ('lat',) must have the same length as the number of data dimensions, ndim=0
From my understanding, this is because the numpy array has only one dimension. I tried the following:
obs = obs[..., np.newaxis, np.newaxis]
However, that did not work either, and I still get the same error.
How can I fix that?
You are correct about adding dimensions to obs.
In Creating a DataArray and the API reference it is mentioned that the coordinates themselves should be array-like.
Your lat and lon are floats. I believe all you have to do is wrap them in a list, like so:
lat = [35.93679] # <- list
lon = [14.45663] # <- list
obs_data = xr.DataArray(
    obs[:, None, None],
    dims=['time', 'lat', 'lon'],
    coords=[
        pd.date_range('1979-01-01', '2021-12-31', freq='D'), lat, lon
    ]
)
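An alternative, if you prefer not to index with None, is xarray's expand_dims, which adds the two size-1 dimensions for you. A minimal sketch with the same time/lat/lon values as above:
obs_1d = xr.DataArray(
    np.asarray(obs),  # works whether obs is a pandas Series or a numpy array
    dims=['time'],
    coords={'time': pd.date_range('1979-01-01', '2021-12-31', freq='D')},
)
obs_data = (
    obs_1d
    .expand_dims(lat=[35.93679], lon=[14.45663])  # new size-1 dims with their coordinates
    .transpose('time', 'lat', 'lon')              # put the dims in the desired order
)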

How to use apply_ufunc depending on data dims?

I want to interpolate a 3D array air (time, lat, lon) using a 2D array newlat (time, lon) that depends on time and lon.
For loop method
import xarray as xr
import numpy as np
from scipy.interpolate import interp1d
air = (
    xr.tutorial.load_dataset("air_temperature")
    .air.sortby("lat")  # np.interp needs the coordinate in ascending order
    .isel(time=slice(4), lon=slice(3))  # choose a small subset for convenience
)
newlat = xr.DataArray(
    np.random.rand(air.sizes['time'], air.sizes['lon']) * 75,
    dims=['time', 'lon'],
    coords={'time': air.time, 'lon': air.lon},
)
# create an empty array to save the result
result = np.empty((4, 3))
# loop over each dim
for t in range(air.sizes['time']):
    for lon in range(air.sizes['lon']):
        # interpolation relying on time and lon
        f = interp1d(air.lat, air.isel(time=t, lon=lon), kind='linear', fill_value='extrapolate')
        result[t, lon] = f(newlat.isel(time=t, lon=lon))
apply_ufunc method
def interp1d_np(data, x, xi):
    f = interp1d(x, data, kind='linear', fill_value='extrapolate')
    return f(xi)

t_index = 0
lon_index = 0
xr.apply_ufunc(
    interp1d_np,
    air.isel(time=t_index, lon=lon_index),
    air.lat,
    newlat.isel(time=t_index, lon=lon_index),
    input_core_dims=[["lat"], ["lat"], []],
    exclude_dims=set(("lat",)),
    vectorize=True,
)
Note that t_index and lon_index are the same for the inputs air and newlat.
The code above only works for one specific slice of air. How can I apply it to the whole air DataArray?
Temporary solution
We can just use the built-in interpolation method like this:
air.interp(lat=newlat, kwargs={"fill_value": None})
But I'm still curious how to use apply_ufunc in this situation, because users may have their own functions instead of this simple interpolation.
You're basically there. Delete the .isel calls and it will work! vectorize=True will automatically loop over the "non-core dimensions", i.e. time and lon in this case.
xr.apply_ufunc(
    interp1d_np,
    air,
    air.lat,
    newlat,
    input_core_dims=[["lat"], ["lat"], []],
    exclude_dims=set(("lat",)),
    vectorize=True,
)
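As a quick sanity check, the output should have dims ('time', 'lon') and agree with the explicit double loop above, since both call the same interp1d:
out = xr.apply_ufunc(
    interp1d_np,
    air,
    air.lat,
    newlat,
    input_core_dims=[["lat"], ["lat"], []],
    exclude_dims=set(("lat",)),
    vectorize=True,
)
print(out.dims)                         # ('time', 'lon')
print(np.allclose(out.values, result))  # expected True: same computation as the for loop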

Interpolating a 3-dim (time, lat, lon) field with _FillValue in Python. How to avoid for loops?

I am interpolating data from the oceanic component of CMIP6 models to a 2x2 grid. The field has dims (time, nav_lat, nav_lon) and NaN values over the continents. Here, nav_lon and nav_lat form a two-dimensional curvilinear grid. I can do the interpolation using griddata from scipy, but I have to loop over time. The loop makes it pretty slow when the data has thousands of time records. My question is how to vectorize the interpolation over time.
The following is my code:
import xarray as xr
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
source = xr.open_dataset('data/zos_2850.nc', decode_times=False)
# obtain old lon and lat (and put lon in 0-360 range)
# nav_lon is from -180 to 180, not in 0-360 range
loni, lati = source.nav_lon.values%360, source.nav_lat.values
# flatten the source coordinates
loni_flat, lati_flat = loni.flatten(), lati.flatten()
# define a 2x2 lon-lat grid
lon, lat = np.linspace(0,360,181), np.linspace(-90,90,91)
# create mesh
X, Y = np.meshgrid(lon,lat)
# loop over time
ntime = len(source.time)
tmp = []
for t in range(ntime):
    print(t)
    var_s = source.zos[t].values
    var_s_flat = var_s.flatten()
    # index marks where the valid (non-NaN) values are
    index = np.where(~np.isnan(var_s_flat))
    # remap the valid values to the new grid
    var_t = griddata((loni_flat[index], lati_flat[index]), var_s_flat[index], (X, Y),
                     method='cubic')
    # interpolate the mask using nearest-neighbour
    maskinterp = griddata((loni_flat, lati_flat), var_s_flat, (X, Y), method='nearest')
    # re-mask the interpolated data
    var_t[np.isnan(maskinterp)] = np.nan
    tmp.append(var_t)
# convert to DataArray
da = xr.DataArray(data=tmp,
                  dims=["time", "lat", "lon"],
                  coords=dict(lon=(["lon"], lon), lat=(["lat"], lat), time=source['time']))
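One way to tidy this up is to wrap the loop body in apply_ufunc with vectorize=True, as in the previous answer; it still loops over time under the hood, but it removes the explicit for loop and can later be combined with dask for parallelism. This is only a sketch: the horizontal dimension names 'y' and 'x' below are assumptions, so use whatever source.zos.dims actually shows:
def regrid_one(var_s):
    # regrid one 2-D time slice from the curvilinear grid to the regular 2x2 grid
    var_s_flat = var_s.flatten()
    valid = ~np.isnan(var_s_flat)
    var_t = griddata((loni_flat[valid], lati_flat[valid]), var_s_flat[valid], (X, Y),
                     method='cubic')
    # re-mask using a nearest-neighbour interpolation of the raw (NaN-containing) field
    mask = griddata((loni_flat, lati_flat), var_s_flat, (X, Y), method='nearest')
    var_t[np.isnan(mask)] = np.nan
    return var_t

da = xr.apply_ufunc(
    regrid_one,
    source.zos,
    input_core_dims=[['y', 'x']],       # horizontal dims consumed per time step (assumed names)
    output_core_dims=[['lat', 'lon']],  # new regular-grid dims
    exclude_dims={'y', 'x'},            # these dims do not appear in the output
    vectorize=True,                     # loop over the remaining dim (time)
).assign_coords(lat=lat, lon=lon)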

Is there a simple way of getting an xyz array from xarray dataset?

Is there a simple way of getting an array of xyz values (i.e. an array with 3 columns and one row per pixel) from an xarray dataset? Something like what we get from the rasterToPoints function in R.
I'm opening a netcdf file with values for a certain variable (chl). I'm not able to add images here directly, but here is a screenshot of the output:
[screenshot: xarray Dataset structure]
I need to end with an array that have this structure:
[[lon1, lat1, val],
[lon1, lat2, val]]
And so on, getting the combination of lon/lat for each point. I'm sorry if I'm missing something really obvious, but I'm new to Python.
The natural format you are probably looking for here is a pandas DataFrame, where lon, lat and chl are columns. This can easily be created using xarray's to_dataframe method, as follows.
import xarray as xr
ds = xr.open_dataset("infile.nc")
df = (
    ds
    .to_dataframe()
    .reset_index()
)
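If you specifically want the n-by-3 array rather than a DataFrame, you can pull it straight out of df (the column names lon, lat and chl are assumptions based on the screenshot; adjust them to the names in your file):
xyz = df[['lon', 'lat', 'chl']].to_numpy()  # shape (n_pixels, 3): lon, lat, value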
I can suggest a small piece of pseudo-code:
import numpy as np
lons = ds['lon'].values
lats = ds['lat'].values
chl = ds['chl'].values
xm, ym = np.meshgrid(lons, lats)
dataout = np.concatenate((xm.flatten()[np.newaxis, :],
                          ym.flatten()[np.newaxis, :],
                          chl.flatten()[np.newaxis, :]), axis=0)
It might not work out of the box, but at least one solution could look similar to this.
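One small note on the sketch above: dataout ends up with shape (3, n_pixels); if you want one row per pixel with columns lon, lat, value (as in the question), just transpose it with dataout.T.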

Coordinates from UTM to Latitude and Longitude in pandas

I have a DataFrame with UTM coordinate columns, COORDENADA_X and COORDENADA_Y, and I want to convert those coordinates to longitude and latitude and add them as new columns to my DataFrame.
For the conversion I am using the following code, but I think there should be a better way that does not require converting the coordinate columns to lists and building a new DataFrame.
import pyproj as pp
from mpl_toolkits.basemap import Basemap
import pandas as pd

cx = dfb.COORDENADA_X.tolist()
cy = dfb.COORDENADA_Y.tolist()
utm15_wgs84 = pp.Proj(init='epsg:32615')
for ix, iy in zip(cx, cy):
    lon, lat = utm15_wgs84(ix, iy, inverse=True)
    print(lon, lat)
Any suggestion for doing this?
Use the apply function of the pandas DataFrame. For example:
dfb[['wgs_x', 'wgs_y']] = dfb.apply(
    lambda row: utm15_wgs84(row['COORDENADA_X'], row['COORDENADA_Y'], inverse=True),
    axis=1,
).apply(pd.Series)
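If you are on pyproj 2 or newer, a fully vectorized alternative is a Transformer, which accepts whole columns at once (a sketch, assuming the same EPSG:32615 input CRS as above):
from pyproj import Transformer

# UTM zone 15N (EPSG:32615) -> WGS84 lon/lat (EPSG:4326); always_xy keeps (x, y)/(lon, lat) order
transformer = Transformer.from_crs("EPSG:32615", "EPSG:4326", always_xy=True)
dfb['wgs_x'], dfb['wgs_y'] = transformer.transform(
    dfb['COORDENADA_X'].to_numpy(),
    dfb['COORDENADA_Y'].to_numpy(),
)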
