I currently have a dataset that when opened with xarray contains three coordinates x, y, band. The band coordinate has temperature and dewpoint each at 4 different time intervals, meaning there are 8 total bands. Is there a way to reshape this so that I could have x, y, band, time such that the band coordinate is now only length 2 and the time coordinate would be length 4?
I thought I could add a new coordinate named time and then add the bands in but
ds = ds.assign_coords(time=[1,2,3,4])
returns ValueError: cannot add coordinates with new dimensions to a DataArray.
You can re-assign the "band" coordinate to a MultiIndex:
In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])
In [5]: da.coords['band'] = pd.MultiIndex.from_arrays(
...: [
...: [1, 1, 1, 1, 2, 2, 2, 2],
...: pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
...: ],
...: names=['band_stacked', 'time'],
...: )
In [6]: da
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
[3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
[7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
[5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
...
[[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
[9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
[6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
[7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
* band (band) MultiIndex
- band_stacked (band) int64 1 1 1 1 2 2 2 2
- time (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y
Then you can expand the dimensionality by unstacking:
In [7]: unstacked = da.unstack('band'); unstacked
Out[7]:
<xarray.DataArray (x: 4, y: 4, band_stacked: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
5.23808010e-01],
[8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
1.54739786e-02]],
...
[[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
6.23766131e-02],
[5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
2.44385584e-01]]]])
Coordinates:
* band_stacked (band_stacked) int64 1 2
* time (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
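Putting the whole round trip together as a self-contained sketch (the 'variable' and 'time' coordinate names are my own labels for the question's temperature/dewpoint bands, and I use set_index to build the MultiIndex rather than assigning one to coords directly, which newer xarray versions discourage):

```python
import numpy as np
import pandas as pd
import xarray as xr

# 8 bands = 2 variables x 4 times, in the order described in the question
da = xr.DataArray(np.arange(4 * 4 * 8).reshape(4, 4, 8), dims=["x", "y", "band"])
times = pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"])
da = da.assign_coords(
    variable=("band", ["temperature"] * 4 + ["dewpoint"] * 4),
    time=("band", list(times) * 2),
)

# Turn the two coords into a MultiIndex over 'band', then unstack it
# into two real dimensions of lengths 2 and 4
unstacked = da.set_index(band=["variable", "time"]).unstack("band")
```

After this, unstacked has dimensions x, y, variable (length 2), and time (length 4), and unstacked.sel(variable="temperature", time=times[0]) recovers the original band 0 slice.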
Another more manual option would be to reshape in numpy and just create a new DataArray. Note that this manual reshape is much faster for a larger array:
In [8]: reshaped = xr.DataArray(
...: da.data.reshape((4, 4, 2, 4)),
...: dims=['x', 'y', 'band', 'time'],
...: coords={
...: 'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
...: 'band': [1, 2],
...: },
...: )
Note that if your data is chunked (and assuming you'd like to keep it that way), your options are a bit more limited - see the dask docs on reshaping dask arrays. The first (MultiIndex + unstack) approach does work with dask arrays as long as the array is not chunked along the dimension being unstacked. See this question for an example.
I'm trying to understand vectorized indexing in xarray by following this example from the docs:
import xarray as xr
import numpy as np
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
ind_x = xr.DataArray([0, 1], dims=['x'])
ind_y = xr.DataArray([0, 1], dims=['y'])
The output of the array da is as follows:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
So far so good. Now the example shows two ways of indexing: orthogonal (not of interest here) and vectorized (what I want). For vectorized indexing the following is shown:
In [37]: da[ind_x, ind_x] # vectorized indexing
Out[37]:
<xarray.DataArray (x: 2)>
array([0, 5])
Coordinates:
y (x) <U1 'a' 'b'
* x (x) int64 0 1
The result seems to be what I want, but this feels very strange to me. ind_x (which in theory refers to dims=['x']) is passed twice, yet somehow it indexes both the x and y dims. As far as I understand, the x dim would be the rows and the y dim would be the columns, is that correct? How come the same ind_x is capable of accessing both the rows and the cols?
This seems to be the concept I need for my problem, but I can't understand how it works or how to extend it to more dimensions. I was expecting this result from da[ind_x, ind_y], but surprisingly enough that seems to yield the orthogonal indexing.
Having the example with ind_x being used twice is probably a little confusing: actually, the dimension of the indexer doesn't have to matter at all for the indexing behavior! Observe:
ind_a = xr.DataArray([0, 1], dims=["a"])
da[ind_a, ind_a]
Gives:
<xarray.DataArray (a: 2)>
array([0, 5])
Coordinates:
x (a) int32 0 1
y (a) <U1 'a' 'b'
Dimensions without coordinates: a
The same goes for the orthogonal example:
ind_a = xr.DataArray([0, 1], dims=["a"])
ind_b = xr.DataArray([0, 1], dims=["b"])
da[ind_a, ind_b]
Result:
<xarray.DataArray (a: 2, b: 2)>
array([[0, 2],
[4, 6]])
Coordinates:
x (a) int32 0 1
y (b) <U1 'a' 'c'
Dimensions without coordinates: a, b
The difference is purely in terms of "labeling", as in this case you end up with dimensions without coordinates.
Fancy indexing
Generally stated, I personally do not find "fancy indexing" the most intuitive concept. I did find this example in NEP 21 pretty clarifying: https://numpy.org/neps/nep-0021-advanced-indexing.html
Specifically, this:
Consider indexing a 2D array by two 1D integer arrays, e.g., x[[0, 1], [0, 1]]:

Outer indexing is equivalent to combining multiple integer indices with itertools.product(). The result in this case is another 2D array with all combinations of indexed elements, e.g., np.array([[x[0, 0], x[0, 1]], [x[1, 0], x[1, 1]]]).

Vectorized indexing is equivalent to combining multiple integer indices with zip(). The result in this case is a 1D array containing the diagonal elements, e.g., np.array([x[0, 0], x[1, 1]]).
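To make the NEP 21 distinction concrete in plain NumPy (a small 2x2 example of my own, not from the NEP itself):

```python
import numpy as np

x = np.array([[10, 11],
              [20, 21]])

# Vectorized (zip-like): pairs the indices -> [x[0, 0], x[1, 1]]
vectorized = x[[0, 1], [0, 1]]

# Outer (product-like): all combinations of row and column indices, via np.ix_
outer = x[np.ix_([0, 1], [0, 1])]

print(vectorized)  # [10 21]
print(outer)       # [[10 11]
                   #  [20 21]]
```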
Back to xarray
da[ind_x, ind_y]
Can also be written as:
da.isel(x=ind_x, y=ind_y)
The dimensions are implicit in the order. However, xarray still attempts to broadcast (based on dimension labels), so da[ind_y] mismatches and results in an error. da[ind_a] and da[ind_b] both work.
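You can verify the equivalence with the question's own da, ind_x, and ind_y; because the two indexers carry different dims ('x' and 'y'), they broadcast against each other into a 2x2 block:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=["x", "y"],
                  coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]})
ind_x = xr.DataArray([0, 1], dims=["x"])
ind_y = xr.DataArray([0, 1], dims=["y"])

# Positional and named (.isel) forms give the same result
positional = da[ind_x, ind_y]
named = da.isel(x=ind_x, y=ind_y)

print(positional.values)  # [[0 1]
                          #  [4 5]]
```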
More dimensions
The dims you provide for the indexer are what determines the shape of the output, not the dimensions of the array you're indexing.
If you want to select single values along the dimensions (so we're zip()-ing through the indexes simultaneously), just make sure that your indexers share the dimension, here for a 3D array:
da = xr.DataArray(
data=np.arange(3 * 4 * 5).reshape(3, 4, 5),
coords={
"x": [1, 2, 3],
"y": ["a", "b", "c", "d"],
"z": [1.0, 2.0, 3.0, 4.0, 5.0],
},
dims=["x", "y", "z"],
)
ind_along_x = xr.DataArray([0, 1], dims=["new_index"])
ind_along_y = xr.DataArray([0, 2], dims=["new_index"])
ind_along_z = xr.DataArray([0, 3], dims=["new_index"])
da[ind_along_x, ind_along_y, ind_along_z]
Note that the values of the indexers do not have to be the same -- that would be a pretty severe limitation, after all.
Result:
<xarray.DataArray (new_index: 2)>
array([ 0, 33])
Coordinates:
x (new_index) int32 1 2
y (new_index) <U1 'a' 'c'
z (new_index) float64 1.0 4.0
Dimensions without coordinates: new_index
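For comparison, the pure-NumPy equivalent of this 3D selection zips the three index lists together, picking elements (0, 0, 0) and (1, 2, 3):

```python
import numpy as np

data = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Fancy indexing with three same-length lists pairs them up elementwise
result = data[[0, 1], [0, 2], [0, 3]]

print(result)  # [ 0 33]
```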
I have N 1D xr.DataArrays, each with one array coordinate b and one scalar coordinate a. I want to combine them into a 2D DataArray with array coordinates b and a. How can I do this? I have tried:
x1 = xr.DataArray(np.arange(0,3)[...,np.newaxis], coords=[('b', np.arange(3,6)),('a', [10])]).squeeze()
x2 = xr.DataArray(np.arange(0,3)[...,np.newaxis], coords=[('b', np.arange(3,6)),('a', [11])]).squeeze()
xcombined = xr.concat([x1, x2])
xcombined
Results in :
<xarray.DataArray (concat_dims: 2, b: 3)>
array([[0, 1, 2],
[0, 1, 2]])
Coordinates:
* b (b) int64 3 4 5
a (concat_dims) int64 10 11
Dimensions without coordinates: concat_dims
Now I'd like to select a particular 'a':
xcombined.sel(a=10)
However, this raises:
ValueError: dimensions or multi-index levels ['a'] do not exist
If you supply dim to concat, this works:
xcombined = xr.concat([x1, x2], dim='a')
And then:
xcombined.sel(a=10)
<xarray.DataArray (b: 3)>
array([0, 1, 2])
Coordinates:
* b (b) int64 3 4 5
a int64 10
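This generalizes directly to N arrays: concatenating along the scalar coordinate 'a' promotes it to a real dimension. A sketch with three arrays (the values here are illustrative):

```python
import numpy as np
import xarray as xr

# Each array has the same 'b' coordinate and a distinct scalar 'a'
arrays = [
    xr.DataArray(np.arange(3), coords={"b": np.arange(3, 6), "a": a_val}, dims=["b"])
    for a_val in (10, 11, 12)
]

combined = xr.concat(arrays, dim="a")   # 'a' becomes a length-3 dimension

print(combined.sel(a=11).values)  # [0 1 2]
```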
I'm trying to define a multivariate piecewise function using np.piecewise as follows:
X = np.array([
[1, 2],
[3, 4],
[5, 6]
])
pw = np.piecewise(
X,
[
np.abs(X[:, 0] - X[:, 1]) < 1,
np.abs(X[:, 0] - X[:, 1]) >= 1
],
[
lambda X: 1 + 2 * X[:, 0] + 3 * X[:, 1],
lambda X: 1.5 + 2.5 * X[:, 0] + 3.5 * X[:, 1]
]
)
Running this snippet gives the following error:
ValueError: shape mismatch: value array of shape (3,) could not be broadcast to indexing result of shape (3,2)
For context, I'm attempting to represent a map f: R^2 -> R in this example, evaluating it on each of the rows of X at once.
Any idea? Do I need to define the final parameter differently so that the indexing correctly broadcasts?
IMO np.piecewise is a better fit when you have two arrays from np.meshgrid, so that the condition arrays have the same shape as the array being filled.
In your case, to represent a piecewise map f: R^2 -> R with input of shape (n, 2) evaluated row by row (each column representing a variable), the easiest way to write vectorized code is np.select:
def pw(X):
    return np.select(
        [np.abs(X[:, 0] - X[:, 1]) < 1, np.abs(X[:, 0] - X[:, 1]) >= 1],
        [1 + 2 * X[:, 0] + 3 * X[:, 1], 1.5 + 2.5 * X[:, 0] + 3.5 * X[:, 1]],
    )
and pw(X) yields the answer you want.
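Restated as a self-contained check: for the question's X, every row has |x0 - x1| = 1, so the second branch is taken each time (e.g. for row (1, 2): 1.5 + 2.5*1 + 3.5*2 = 11.0):

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6]])

def pw(X):
    # np.select picks, per row, the value from the first condition that holds
    return np.select(
        [np.abs(X[:, 0] - X[:, 1]) < 1, np.abs(X[:, 0] - X[:, 1]) >= 1],
        [1 + 2 * X[:, 0] + 3 * X[:, 1], 1.5 + 2.5 * X[:, 0] + 3.5 * X[:, 1]],
    )

print(pw(X))  # [11. 23. 35.]
```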
By using a structured array I could cast the 2d formulation into a 1d one:
In [76]: X = np.array([(1,2),(3,4),(5,6)],'f,f')
In [77]: X
Out[77]: array([(1., 2.), (3., 4.), (5., 6.)], dtype=[('f0', '<f4'), ('f1', '<f4')])
In [78]: pw = np.piecewise(
...: X,
...: [
...: np.abs(X['f0'] - X['f1']) < 1,
...: np.abs(X['f0'] - X['f1']) >= 1
...: ],
...: [
...: lambda X: 1 + 2 * X['f0'] + 3 * X['f1'],
...: lambda X: 1.5 + 2.5 * X['f0'] + 3.5 * X['f1']
...: ]
...: )
In [79]: pw
Out[79]:
array([(11., 11.), (23., 23.), (35., 35.)],
dtype=[('f0', '<f4'), ('f1', '<f4')])
The numbers are repeated in pw because piecewise returns an array with the same shape and dtype as X, even though the lambdas only return scalar values.
Let's assume I have two matrices, each of which represents a vector:
X = np.matrix([[1],[2],[3]])
Y = np.matrix([[4],[5],[6]])
I want the output to be the result of multiplying them element by element, which means it should be:
[[4],[10],[18]]
Note that it is np.matrix and not np.array
Tested np.multiply() in IPython and it worked like a charm:
In [41]: X = np.matrix([[1],[2],[3]])
In [42]: Y = np.matrix([[4],[5],[6]])
In [43]: np.multiply(X, Y)
Out[43]:
matrix([[ 4],
[10],
[18]])
Remember that a NumPy matrix is a subclass of a NumPy array, and array operations are element-wise.
Therefore, you can convert your matrices to NumPy arrays, then multiply them with the "*" operator, which will be element-wise:
>>> import numpy as NP
>>> X = NP.matrix([[1],[2],[3]])
>>> Y = NP.matrix([[4],[5],[6]])
>>> X1 = NP.array(X)
>>> Y1 = NP.array(Y)
>>> XY1 = X1 * Y1
>>> XY1
array([[ 4],
       [10],
       [18]])
>>> XY = NP.matrix(XY1)
>>> XY
matrix([[ 4],
[10],
[18]])
alternatively you can use a generic function for element-wise multiplication:
>>> a = NP.matrix("4 5 7; 9 3 2; 3 9 1")
>>> b = NP.matrix("5 2 9; 8 4 2; 1 7 4")
>>> ab = NP.multiply(a, b)
>>> ab
matrix([[20, 10, 63],
[72, 12, 4],
[ 3, 63, 4]])
These two differ in their return type, so you probably want the first if the next function in your data flow requires a NumPy array, and the second if it requires a NumPy matrix.
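To summarize both options in one place (note that np.matrix is discouraged in modern NumPy, so plain arrays are generally the safer choice going forward):

```python
import numpy as np

X = np.matrix([[1], [2], [3]])
Y = np.matrix([[4], [5], [6]])

as_matrix = np.multiply(X, Y)              # element-wise, stays an np.matrix
as_array = np.asarray(X) * np.asarray(Y)   # element-wise, plain ndarray

print(as_matrix.T)  # [[ 4 10 18]]
print(as_array.T)   # [[ 4 10 18]]
```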