How to use apply_ufunc depending on data dims? - python

I want to interpolate 3D array air (time, lat, lon) with one 2D array newlat (time, lon) which depends on time and lon.
For loop method
import xarray as xr
import numpy as np
from scipy.interpolate import interp1d
air = (
xr.tutorial.load_dataset("air_temperature")
.air.sortby("lat") # np.interp needs coordinate in ascending order
.isel(time=slice(4), lon=slice(3))
) # choose a small subset for convenience
newlat = xr.DataArray(np.random.rand(air.sizes['time'], air.sizes['lon'])*75,
dims=['time', 'lon'],
coords={'time': air.time, 'lon': air.lon}
)
# create empty array to save result
result = np.empty((4, 3))
# loop each dim
for t in range(air.sizes['time']):
for lon in range(air.sizes['lon']):
# interpolation relying on time and lon
f = interp1d(air.lat, air.isel(time=t, lon=lon), kind='linear', fill_value='extrapolate')
result[t, lon] = f(newlat.isel(time=t, lon=lon))
apply_ufunc method
def interp1d_np(data, x, xi):
f = interp1d(x, data, kind='linear', fill_value='extrapolate')
return f(xi)
t_index = 0
lon_index = 0
xr.apply_ufunc(
interp1d_np,
air.isel(time=t_index, lon=lon_index),
air.lat,
newlat.isel(time=t_index, lon=lon_index),
input_core_dims=[["lat"], ["lat"], []],
exclude_dims=set(("lat",)),
vectorize=True,
)
Note that t_idnex and lon_index are the same for input air and newlat.
The code above only works for one specific part of air. How to apply it to the whole air DataArray?
Temperary Solution
We can just use the embedded function like this:
air.interp(lat=newlat, kwargs={"fill_value": None})
But, I'm still curious of how to use apply_ufunc in this situation, because users may have their own functions instead of the simple interpolation.

You're basically there. Delete the .isel calls and it will work! vectorize=True will automatically loop over the "non core dimensions" i.e. time and lon in this case.
xr.apply_ufunc(
interp1d_np,
air,
air.lat,
newlat,
input_core_dims=[["lat"], ["lat"], []],
exclude_dims=set(("lat",)),
vectorize=True,
)

Related

Using xgcm to interpolate 2D periodic data to points

Similar to this question, I would like to interpolate a 2D field with periodic boundary conditions to its values at a list of points (2D coords), but specifically using xgcm. I.e. I would like the value of the field, defined on a regular 2D grid (in my case a lat/lon grid), at arbitrary points. I would like to use xgcm to handle the periodic nature of the boundaries. An answer using xesmf would be fine, but I already know how to do this using wrapped data.
import numpy as np
import xarray as xr
import xgcm as xg
data = np.arange(360 * 180).reshape(360, 180)
lon = np.linspace(0.5, 359.5, 360)
lat = np.linspace(-89.5, 89.5, 180)
da = xr.DataArray(
coords=dict(
lon=lon,
lat=lat,
),
data=data,
)
ds = da.to_dataset(name='data')
# Setup xgcm grid with periodic lon.
grid = xg.Grid(ds, coords={'lon': {'center': 'lon'}, 'lat': {'center': 'lat'}}, periodic=['lon'])
# lon/lat values of points - first point is (0.1, 23) and is outside the non-periodic boundary of values,
# because of lon=0.1.
points = np.array([[0.1, 23], [359.9, 43]])
# What comes next?
interp_values = grid.interp(...)

Interpolating a 3-dim (time, lat, lon) field with _FillValue in Python. How to avoid for loops?

I am interpolating data from the oceanic component of CMIP6 models to a 2x2 grid. The field has a dim of (time, nav_lat, nav_lon) and nan values in continent. Here, nav_lon and nav_lat are two-dimensional curvilinear grid. I can do the interpolation using griddata from scipy, but I have to use a loop over time. The loop makes it pretty slow if the data has thousands of time records. My question is how to vectorize the interpolation over time.
The following is my code:
import xarray as xr
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
source = xr.open_dataset('data/zos_2850.nc',decode_times=False)
# obtain old lon and lat (and put lon in 0-360 range)
# nav_lon is from -180 to 180, not in 0-360 range
loni, lati = source.nav_lon.values%360, source.nav_lat.values
# flatten the source coordinates
loni_flat, lati_flat = loni.flatten(), lati.flatten()
# define a 2x2 lon-lat grid
lon, lat = np.linspace(0,360,181), np.linspace(-90,90,91)
# create mesh
X, Y = np.meshgrid(lon,lat)
# loop over time
ntime = len(source.time)
tmp = []
for t in range(ntime):
print(t)
var_s = source.zos[t].values
var_s_flat = var_s.flatten()
# index indicates where they are valid values
index = np.where(~np.isnan(var_s_flat))
# remap the valid values to the new grid
var_t = griddata((loni_flat[index],lati_flat[index]),var_s_flat[index], (X,Y),
method='cubic')
# interpolate mask using nearest
maskinterp = griddata((loni_flat,lati_flat),var_s_flat, (X,Y), method='nearest')
# re-mask interpolated data
var_t[np.isnan(maskinterp)] = np.nan
tmp.append(var_t)
# convert to DataArray
da = xr.DataArray(data=tmp,
dims=["time","lat","lon"],
coords=dict(lon=(["lon"], lon),lat=(["lat"], lat),time=source['time']))

Is there a simple way of getting an xyz array from xarray dataset?

Is there a simple way of getting an array of xyz values (i.e. an array of 3 cols and nrows = number of pixels) from an xarray dataset? Something like what we get from the rasterToPoints function in R.
I'm opening a netcdf file with values for a certain variable (chl). I'm not able to add images here directly, but here is a screenshot of the output:
Xarray dataset structure
I need to end with an array that have this structure:
[[lon1, lat1, val],
[lon1, lat2, val]]
And so on, getting the combination of lon/lat for each point. I'm sorry if I'm missing something really obvious, but I'm new to Python.
The natural format you are probably looking for here is a pandas dataframe, where lon, lat and chl are columns. This can be easily created using xarray's to_dataframe method, as follows.
import xarray as xr
ds = xr.open_dataset("infile.nc")
df = (
ds
.to_dataframe()
.reset_index()
)
I can suggest you a small pseudo-code:
import numpy as np
lons = ds.variables['lon'].load()
lats = ds.variables['lat'].load()
chl = ds.variables['chl'].load()
xm,ym = np.meshgrid(lons,lats)
dataout = np.concatenate((xm.flatten()[np.newaxis,:],ym.flatten()[np.newaxis,:],chla.flatten()[np.newaxis,:]),axis=0)
Might be it does not work out-of-the box, but at least one solution could be similar with this.

creating vector using netcdf into array

I'm fairly new to python and have found stack overflow one of the best resources out there, now I'm hoping someone can help me with what I believe is a fairly basic question.
I'm looking to create a land mask from a list of lats and lons and rainfall data extracted from a netCDF file. I need to get the data from the netcdf file to line up so I can remove rows which have a rainfall value of '-9999.' (indicating no data because its over the ocean). I can access the file, I can create a mesh grid, but when it comes to inserting the rainfall data for the final check I'm getting odd shapes and no luck with the logical test. Can someone have a look at this code and let me know what you think?
from netCDF4 import Dataset
import numpy as np
f=Dataset('/Testing/Ensemble_grid/1970_2012_eMAST_ANUClimate_mon_evap_v1m0_197001.nc')
lat = f.variables['latitude'][:]
lon = f.variables['longitude'][:]
rainfall = np.array(f.variables['lwe_thickness_of_precipitation_amount'])
lons, lats = np.meshgrid(lon,lat)
full_ary = np.array((lats,lons))
full_lats_lons = np.swapaxes(full_ary,0,2)
rain_data = np.squeeze(rainfall,axis=(0,))
grid = np.array((full_lats_lons,rain_data))
full_grid = np.expand_dims(grid,axis=1)
full_grid_col = np.swapaxes(full_grid,0,1)
land_grid = np.logical_not(full_grid_col[:,1]==-9999.)
Here is an alternative method that simply creates a new 2D variable, landmask, where each grid cell is either 0 (ocean) or 1 (land). (I like to use 1 and 0 landmasks because you can transform it into a boolean numpy array and do quick land-averages this way.)
import netCDF4
import numpy as np
ncfile = netCDF4.('/path/to/your/ncfile.nc', 'r')
lat = ncfile.variables['lat'][:]
lon = ncfile.variables['lon'][:]
# Presuming here that rainfall is 2D, if not, just read in the first time step, i.e. [0,:,:]
rain = ncfile.variables['lwe_thickness_of_precipitation_amount'][:,:]
ncfile.close()
nlat, nlon = len(lat), len(lon)
# Populate a 2D landmask array, where 1=land and 0=ocean
landmask = np.zeros([nlat, nlon], dtype='int')
for y in range(nlat):
for x in range(nlon):
if rain[y,x]!=-9999: # We're at a land point
landmask[y,x] = 1
# Now you can write out the landmask into a new netCDF file
filename_out = './landmask.nc'
ncfile_out = netCDF4.Dataset(filename_out, 'w')
ncfile_out.createDimension('lat', nlat)
ncfile_out.createDimension('lon', nlon)
lat_out = ncfile_out.createVariable('lat', 'f4', ('lat',))
lon_out = ncfile_out.createVariable('lon', 'f4', ('lon',))
landmask_out = ncfile_out.createVariable('landmask', 'i', ('lat', 'lon',))
setattr(lat_out, 'units', 'degrees_north')
setattr(lon_out, 'units', 'degrees_east')
setattr(landmask_out, 'description', '1=land 0=ocean')
lat_out[:] = lat
lon_out[:] = lon
landmask_out[:,:] = landmask[:,:]
ncfile_out.close()
Ian, you need to put a repeatable example up here...
I suspect what you need is something like this;
x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
x.flat

How to create a grid from LiDAR points (X,Y,Z) with GDAL python?

I'm new really to python programming, and I was just wondering if you can create a regular grid of 0.5 by o.5 m of resolution using LiDAR points.
My data are in LAS format (reading with from liblas import file as lasfile) and they have the following format: X,Y,Z. Where X and Y are coordinates.
The points are randomly positioned and some pixel are empty (NAN value) and in some pixel there are more of one points. Where there are more of one point, I wish to obtain a mean value. In the end i need to save the data in a TIF format or Ascii format.
I am studying osgeo module and GDAL but I honest to say that i don't know if osgeo module is the best solution.
I am really glad for help with some code that i can study and implement,
Thanks in Advance for the help, I really need.
I don't know the best way to get a grid with these parameters.
It's a bit late but maybe this answer will be useful for others, if not for you...
I have done this with Numpy and Pandas, and it's pretty fast. I was using TLS data and could do this with several million data points without any trouble on a decent 2009-vintage laptop. The key is 'binning' by rounding the data, and then using Pandas' GroupBy methods to do the aggregating and calculate the means.
If you need to round to a power of 10 you can use np.round, otherwise you can round to an arbitrary value by making a function to do so, which I have done by modifying this SO answer.
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
return np.round( np.array(a, dtype=float) / round_val) * round_val
# load data
data = np.load( 'shape of ndata, 3')
n_d = data.shape[0]
# round the data
d_round = np.empty( [n_d, 5] )
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val( d_round[:,0], 0.5)
d_round[:,4] = round_to_val( d_round[:,1], 0.5)
# sorting data
ind = np.lexsort( (d_round[:,4], d_round[:,3]) )
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame( d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculating the mean, write to csv, which saves the file with:
# [x_round, y_round, z_mean] columns. You can exit Python and then start up
# later to clear memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array( [x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmax, xmin = 640000.5, 637000
ymax, ymin = 6070000.5, 6067000
grid_x, grid_y = np.mgrid[640000.5:637000:0.5, 6067000.5:6070000:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
When I've done this, I have saved the output as a .npy and converted to a tiff with the Image library, and then georeferenced in ArcMap. There is probably a way to do that with osgeo but I haven't used it.
Hope this helps someone at least...
You can use the histogram function in Numpy to do binning, for instance:
import numpy as np
points = np.random.random(1000)
#create 10 bins from 0 to 1
bins = np.linspace(0, 1, 10)
means = (numpy.histogram(points, bins, weights=data)[0] /
numpy.histogram(points, bins)[0])
Try LAStools, particularly lasgrid or las2dem.

Categories

Resources