Interpolating between values of a DataArray - python

I would like to know the most efficient, or most elegant, way to interpolate between values of a DataArray. Ideally this should work for an arbitrary number of dimensions, but good solutions for low dimensions such as 2D and 3D would also be useful.
I am aware of the 'method' keyword for the sel method and have the feeling that it is going to be part of the answer, but the solution I came up with does not strike me as very elegant. (I do not know about its efficiency.) Let me illustrate this solution:
>>> import xarray as xr
>>> arr = xr.DataArray([[12, 32], [14, 34]],
... dims=['x', 'y'], coords={'x': [1, 3], 'y': [2, 4]})
>>> arr
<xarray.DataArray (x: 2, y: 2)>
array([[12, 32],
       [14, 34]])
Coordinates:
  * x        (x) int64 1 3
  * y        (y) int64 2 4
>>> arry = ((arr.sel(x=2, method='pad') * (arr.coords['x'].sel(x=2, method='bfill') - 2) +
...          arr.sel(x=2, method='bfill') * (2 - arr.coords['x'].sel(x=2, method='pad'))) /
...         (arr.coords['x'].sel(x=2, method='bfill') - arr.coords['x'].sel(x=2, method='pad')))
>>> arry
<xarray.DataArray (y: 2)>
array([ 13.,  33.])
Coordinates:
  * y        (y) int64 2 4
>>> ((arry.sel(y=3, method='pad') * (arry.coords['y'].sel(y=3, method='bfill') - 3) +
...   arry.sel(y=3, method='bfill') * (3 - arry.coords['y'].sel(y=3, method='pad'))) /
...  (arry.coords['y'].sel(y=3, method='bfill') - arry.coords['y'].sel(y=3, method='pad')))
<xarray.DataArray ()>
array(23.0)
As for why I find this approach sub-optimal: the intermediate arry is computed for every value of the y index, so it can involve a large number of unnecessary arithmetic operations when the y index is long (and not just of length 2, as in this toy example).
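For reference, newer xarray releases (0.10.9 and later, with scipy installed for the multi-dimensional case) provide this operation directly as DataArray.interp, which avoids the manual pad/bfill arithmetic entirely:

```python
import xarray as xr

arr = xr.DataArray(
    [[12, 32], [14, 34]],
    dims=["x", "y"],
    coords={"x": [1, 3], "y": [2, 4]},
)

# linear interpolation along both dimensions at once
result = arr.interp(x=2, y=3)
print(result.item())  # 23.0
```

Interpolating along a single dimension (e.g. arr.interp(x=2)) returns the intermediate array computed by hand above.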


How to reshape xarray dataset by collapsing coordinate

I currently have a dataset that when opened with xarray contains three coordinates x, y, band. The band coordinate has temperature and dewpoint each at 4 different time intervals, meaning there are 8 total bands. Is there a way to reshape this so that I could have x, y, band, time such that the band coordinate is now only length 2 and the time coordinate would be length 4?
I thought I could add a new coordinate named time and then fill the bands in, but
ds = ds.assign_coords(time=[1,2,3,4])
returns ValueError: cannot add coordinates with new dimensions to a DataArray.
You can re-assign the "band" coordinate to a MultiIndex:
In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])
In [5]: da.coords['band'] = pd.MultiIndex.from_arrays(
...: [
...: [1, 1, 1, 1, 2, 2, 2, 2],
...: pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
...: ],
...: names=['band_stacked', 'time'],
...: )
In [6]: da
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
         8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
        [3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
         4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
        [7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
         7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
        [5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
         6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
       ...
       [[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
         2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
        [9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
         6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
        [6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
         5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
        [7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
         5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
  * band          (band) MultiIndex
  - band_stacked  (band) int64 1 1 1 1 2 2 2 2
  - time          (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y
Then you can expand the dimensionality by unstacking:
In [7]: da.unstack('band')
Out[7]:
<xarray.DataArray (x: 4, y: 4, band_stacked: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
          5.23808010e-01],
         [8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
          1.54739786e-02]],
        ...
        [[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
          6.23766131e-02],
         [5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
          2.44385584e-01]]]])
Coordinates:
  * band_stacked  (band_stacked) int64 1 2
  * time          (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
Another more manual option would be to reshape in numpy and just create a new DataArray. Note that this manual reshape is much faster for a larger array:
In [8]: reshaped = xr.DataArray(
...: da.data.reshape((4, 4, 2, 4)),
...: dims=['x', 'y', 'band', 'time'],
...: coords={
...: 'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
...: 'band': [1, 2],
...: },
...: )
Note that if your data is chunked (and assuming you'd like to keep it that way) your options are a bit more limited - see the dask docs on reshaping dask arrays. The first (MultiIndex + unstack) approach does work with dask arrays as long as the arrays are not chunked along the unstacked dimension. See this question for an example.
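As a quick, self-contained sanity check on the manual-reshape route (assuming the band layout described in the question: all four times of band 1 first, then the four times of band 2):

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.random.random((4, 4, 8)), dims=["x", "y", "band"])

# bands are ordered [band 1 at 4 times, band 2 at 4 times], so reshaping
# the last axis into (band, time) preserves the pairing
reshaped = xr.DataArray(
    da.data.reshape((4, 4, 2, 4)),
    dims=["x", "y", "band", "time"],
    coords={
        "time": pd.to_datetime(
            ["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"]
        ),
        "band": [1, 2],
    },
)

# flat band index 5 corresponds to (band index 1, time index 1): 5 == 1 * 4 + 1
assert np.array_equal(reshaped.values[:, :, 1, 1], da.values[:, :, 5])
```

The assert verifies that the reshape keeps each (band, time) slice aligned with the original flat band index.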

Python xarray - vectorized indexing

I'm trying to understand vectorized indexing in xarray by following this example from the docs:
import xarray as xr
import numpy as np
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
                  coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
ind_x = xr.DataArray([0, 1], dims=['x'])
ind_y = xr.DataArray([0, 1], dims=['y'])
The output of the array da is as follows:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
So far so good. Now in the example there are shown two ways of indexing. Orthogonal (not interested in this case) and vectorized (what I want). For the vectorized indexing the following is shown:
In [37]: da[ind_x, ind_x] # vectorized indexing
Out[37]:
<xarray.DataArray (x: 2)>
array([0, 5])
Coordinates:
    y        (x) <U1 'a' 'b'
  * x        (x) int64 0 1
The result seems to be what I want, but this feels very strange to me. ind_x (which in theory refers to dims=['x']) is being passed twice but somehow is capable of indexing what appears to be both in the x and y dims. As far as I understand the x dim would be the rows and y dim would be the columns, is that correct? How come the same ind_x is capable of accessing both the rows and the cols?
This seems to be the concept I need for my problem, but can't understand how it works or how to extend it to more dimensions. I was expecting this result to be given by da[ind_x, ind_y] however that seems to yield the orthogonal indexing surprisingly enough.
Having the example with ind_x being used twice is probably a little confusing: actually, the dimension of the indexer doesn't have to matter at all for the indexing behavior! Observe:
ind_a = xr.DataArray([0, 1], dims=["a"])
da[ind_a, ind_a]
Gives:
<xarray.DataArray (a: 2)>
array([0, 5])
Coordinates:
    x        (a) int32 0 1
    y        (a) <U1 'a' 'b'
Dimensions without coordinates: a
The same goes for the orthogonal example:
ind_a = xr.DataArray([0, 1], dims=["a"])
ind_b = xr.DataArray([0, 1], dims=["b"])
da[ind_a, ind_b]
Result:
<xarray.DataArray (a: 2, b: 2)>
array([[0, 2],
       [4, 6]])
Coordinates:
    x        (a) int32 0 1
    y        (b) <U1 'a' 'c'
Dimensions without coordinates: a, b
The difference is purely in terms of "labeling", as in this case you end up with dimensions without coordinates.
Fancy indexing
Generally stated, I personally do not find "fancy indexing" the most intuitive concept. I did find this example in NEP 21 pretty clarifying: https://numpy.org/neps/nep-0021-advanced-indexing.html
Specifically, this:
Consider indexing a 2D array by two 1D integer arrays, e.g., x[[0, 1], [0, 1]]:

Outer indexing is equivalent to combining multiple integer indices with itertools.product(). The result in this case is another 2D array with all combinations of indexed elements, e.g., np.array([[x[0, 0], x[0, 1]], [x[1, 0], x[1, 1]]]).

Vectorized indexing is equivalent to combining multiple integer indices with zip(). The result in this case is a 1D array containing the diagonal elements, e.g., np.array([x[0, 0], x[1, 1]]).
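The product-vs-zip distinction is easy to check in plain NumPy; np.ix_ builds the outer ("product") indexer, while passing the index lists directly gives vectorized ("zip") indexing:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# vectorized ("zip") indexing: the index arrays are paired element-wise
vectorized = x[[0, 1], [0, 1]]        # x[0, 0] and x[1, 1] -> [0, 5]

# outer ("product") indexing: every row index with every column index
outer = x[np.ix_([0, 1], [0, 1])]     # [[x[0,0], x[0,1]], [x[1,0], x[1,1]]]

print(vectorized)  # [0 5]
print(outer)       # [[0 1]
                   #  [4 5]]
```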
Back to xarray
da[ind_x, ind_y]
Can also be written as:
da.isel(x=ind_x, y=ind_y)
The dimensions are implicit in the order. However, xarray still attempts to broadcast (based on dimension labels), so da[ind_y] mismatches and results in an error. da[ind_a] and da[ind_b] both work.
More dimensions
The dims you provide for the indexer determine the shape of the output, not the dimensions of the array you're indexing.
If you want to select single values along the dimensions (so we're zip()-ing through the indexes simultaneously), just make sure that your indexers share the dimension, here for a 3D array:
da = xr.DataArray(
    data=np.arange(3 * 4 * 5).reshape(3, 4, 5),
    coords={
        "x": [1, 2, 3],
        "y": ["a", "b", "c", "d"],
        "z": [1.0, 2.0, 3.0, 4.0, 5.0],
    },
    dims=["x", "y", "z"],
)
ind_along_x = xr.DataArray([0, 1], dims=["new_index"])
ind_along_y = xr.DataArray([0, 2], dims=["new_index"])
ind_along_z = xr.DataArray([0, 3], dims=["new_index"])
da[ind_along_x, ind_along_y, ind_along_z]
Note that the values of the indexers do not have to be the same -- that would be a pretty severe limitation, after all.
Result:
<xarray.DataArray (new_index: 2)>
array([ 0, 33])
Coordinates:
    x        (new_index) int32 1 2
    y        (new_index) <U1 'a' 'c'
    z        (new_index) float64 1.0 4.0
Dimensions without coordinates: new_index

Stack xarray DataArray

I have N 1D xr.DataArrays, each with one array coordinate b and one scalar coordinate a. I want to combine them into a 2D DataArray with array coordinates b and a. How can I do this? I have tried:
x1 = xr.DataArray(np.arange(0,3)[...,np.newaxis], coords=[('b', np.arange(3,6)),('a', [10])]).squeeze()
x2 = xr.DataArray(np.arange(0,3)[...,np.newaxis], coords=[('b', np.arange(3,6)),('a', [11])]).squeeze()
xcombined = xr.concat([x1, x2])
xcombined
Results in:
<xarray.DataArray (concat_dims: 2, b: 3)>
array([[0, 1, 2],
       [0, 1, 2]])
Coordinates:
  * b        (b) int64 3 4 5
    a        (concat_dims) int64 10 11
Dimensions without coordinates: concat_dims
Now I would like to select a particular 'a':
xcombined.sel(a=10)
However, this raises:
ValueError: dimensions or multi-index levels ['a'] do not exist
If you supply dim to concat, this works:
xcombined = xr.concat([x1, x2], dim='a')
And then:
xcombined.sel(a=10)
<xarray.DataArray (b: 3)>
array([0, 1, 2])
Coordinates:
  * b        (b) int64 3 4 5
    a        int64 10
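The same pattern scales to N arrays; a short sketch (variable names are mine) building the inputs in a loop and letting concat promote the scalar coordinate 'a' to a dimension:

```python
import numpy as np
import xarray as xr

# N arrays, each with an array coordinate 'b' and a scalar coordinate 'a'
arrays = [
    xr.DataArray(np.arange(3), dims=["b"], coords={"b": np.arange(3, 6), "a": a})
    for a in (10, 11, 12)
]

# concat along the scalar coordinate's name turns it into a new dimension
combined = xr.concat(arrays, dim="a")
print(combined.sel(a=11).values)  # [0 1 2]
```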

Getting ValueError when trying to use np.piecewise for multivariate function

I'm trying to define a multivariate piecewise function using np.piecewise as follows:
X = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
])
pw = np.piecewise(
    X,
    [
        np.abs(X[:, 0] - X[:, 1]) < 1,
        np.abs(X[:, 0] - X[:, 1]) >= 1
    ],
    [
        lambda X: 1 + 2 * X[:, 0] + 3 * X[:, 1],
        lambda X: 1.5 + 2.5 * X[:, 0] + 3.5 * X[:, 1]
    ]
)
Running this snippet gives the following error:
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (3,2)
For context, I'm attempting to represent a map f: R^2 -> R in this example, evaluating it on each of the rows of X at once.
Any idea? Do I need to define the final parameter differently so that the indexing correctly broadcasts?
IMO np.piecewise is better suited to arrays coming from np.meshgrid, where the condition arrays have the same shape as the array being evaluated. In your case -- a piecewise map f: R^2 -> R whose input has shape (n, 2) and is evaluated row by row (each column representing a variable) -- the easiest way to write vectorized code is simply np.select:
def pw(X):
    return np.select(
        [np.abs(X[:, 0] - X[:, 1]) < 1, np.abs(X[:, 0] - X[:, 1]) >= 1],
        [1 + 2 * X[:, 0] + 3 * X[:, 1], 1.5 + 2.5 * X[:, 0] + 3.5 * X[:, 1]],
    )
and pw(X) yields the answer you want.
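Concretely, here is the np.select version run on the X from the question (a quick check; note that every row there satisfies the second condition, since |x0 - x1| is always 1):

```python
import numpy as np

def pw(X):
    # row-wise piecewise evaluation: one scalar output per row of X
    return np.select(
        [np.abs(X[:, 0] - X[:, 1]) < 1, np.abs(X[:, 0] - X[:, 1]) >= 1],
        [1 + 2 * X[:, 0] + 3 * X[:, 1], 1.5 + 2.5 * X[:, 0] + 3.5 * X[:, 1]],
    )

X = np.array([[1, 2], [3, 4], [5, 6]])
print(pw(X))  # [11. 23. 35.]  (second branch: 1.5 + 2.5*x0 + 3.5*x1)
```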
By using a structured array I could cast the 2d formulation into a 1d one:
In [76]: X = np.array([(1,2),(3,4),(5,6)],'f,f')
In [77]: X
Out[77]: array([(1., 2.), (3., 4.), (5., 6.)], dtype=[('f0', '<f4'), ('f1', '<f4')])
In [78]: pw = np.piecewise(
...: X,
...: [
...: np.abs(X['f0'] - X['f1']) < 1,
...: np.abs(X['f0'] - X['f1']) >= 1
...: ],
...: [
...: lambda X: 1 + 2 * X['f0'] + 3 * X['f1'],
...: lambda X: 1.5 + 2.5 * X['f0'] + 3.5 * X['f1']
...: ]
...: )
In [79]: pw
Out[79]:
array([(11., 11.), (23., 23.), (35., 35.)],
      dtype=[('f0', '<f4'), ('f1', '<f4')])
The numbers are repeated in pw because np.piecewise returns an array with the same shape and dtype as X, so each computed value is written into both fields of the structured elements.

How to multiply element by element between matrices in Python?

Let's assume I have 2 matrices, each of which represents a vector:
X = np.matrix([[1],[2],[3]])
Y = np.matrix([[4],[5],[6]])
I want the output to be the result of multiplying it element by element, which means it should be:
[[4],[10],[18]]
Note that it is np.matrix and not np.array
Tested np.multiply() on ipython and it worked like a charm
In [41]: X = np.matrix([[1],[2],[3]])
In [42]: Y = np.matrix([[4],[5],[6]])
In [43]: np.multiply(X, Y)
Out[43]:
matrix([[ 4],
        [10],
        [18]])
Remember that NumPy's matrix is a subclass of the NumPy array, and that for plain arrays the * operator is element-wise (for matrix it means matrix multiplication).
Therefore, you can convert your matrices to NumPy arrays and then multiply them with the * operator, which will be element-wise:
>>> import numpy as NP
>>> X = NP.matrix([[1],[2],[3]])
>>> Y = NP.matrix([[4],[5],[6]])
>>> X1 = NP.array(X)
>>> Y1 = NP.array(Y)
>>> XY1 = X1 * Y1
>>> XY1
array([[ 4],
       [10],
       [18]])
>>> XY = NP.matrix(XY1)
>>> XY
matrix([[ 4],
        [10],
        [18]])
alternatively you can use a generic function for element-wise multiplication:
>>> a = NP.matrix("4 5 7; 9 3 2; 3 9 1")
>>> b = NP.matrix("5 2 9; 8 4 2; 1 7 4")
>>> ab = NP.multiply(a, b)
>>> ab
matrix([[20, 10, 63],
        [72, 12,  4],
        [ 3, 63,  4]])
These two approaches differ in their return type (array vs. matrix), so choose the first if the next function in your data flow requires a NumPy array, and the second if it requires a NumPy matrix.
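Both answers can be condensed into one runnable sketch. (Note that modern NumPy discourages np.matrix in favor of plain ndarrays, so the asarray route is generally preferable going forward.)

```python
import numpy as np

X = np.matrix([[1], [2], [3]])
Y = np.matrix([[4], [5], [6]])

# np.multiply is element-wise regardless of the input class
elementwise = np.multiply(X, Y)   # matrix([[ 4], [10], [18]])

# with plain ndarrays, * itself is element-wise
x, y = np.asarray(X), np.asarray(Y)
product = x * y                   # array([[ 4], [10], [18]])

print(elementwise)
print(product)
```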
