I have N 1-D xr.DataArrays, each with one array coordinate b and one scalar coordinate a. I want to combine them into a 2-D DataArray with array coordinates b and a. How can I do this? I have tried:
x1 = xr.DataArray(np.arange(0,3)[...,np.newaxis], coords=[('b', np.arange(3,6)),('a', [10])]).squeeze()
x2 = xr.DataArray(np.arange(0,3)[...,np.newaxis], coords=[('b', np.arange(3,6)),('a', [11])]).squeeze()
xcombined = xr.concat([x1, x2])
xcombined
This results in:
<xarray.DataArray (concat_dims: 2, b: 3)>
array([[0, 1, 2],
[0, 1, 2]])
Coordinates:
* b (b) int64 3 4 5
a (concat_dims) int64 10 11
Dimensions without coordinates: concat_dims
Now I would like to select a particular 'a':
xcombined.sel(a=10)
However, this raises:
ValueError: dimensions or multi-index levels ['a'] do not exist
If you supply dim to concat, this works:
xcombined = xr.concat([x1, x2], dim='a')
And then:
xcombined.sel(a=10)
<xarray.DataArray (b: 3)>
array([0, 1, 2])
Coordinates:
* b (b) int64 3 4 5
a int64 10
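Putting the working approach together, here is a minimal end-to-end sketch that builds the N arrays in a loop instead of one by one (the values and coordinates are the toy ones from above):

```python
import numpy as np
import xarray as xr

# N 1-D arrays sharing the 'b' coordinate, each carrying a scalar 'a' coordinate
arrays = [
    xr.DataArray(np.arange(3), dims="b",
                 coords={"b": np.arange(3, 6), "a": 10 + i})
    for i in range(2)
]

# concatenating along the scalar coordinate's name promotes 'a' to a real dimension
combined = xr.concat(arrays, dim="a")
print(combined.sel(a=10).values)  # [0 1 2]
```

Passing the coordinate name as `dim` is what tells concat to turn the scalar 'a' coordinates into a new indexed dimension, which is why `.sel(a=...)` then works.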
Related
I have an xarray dataset from which I would like to extract points based on their coordinates. When sel is used with two coordinate arrays it returns a 2D array. Sometimes this is what I want and is the intended behavior, but here I would like to extract a line from the dataset.
import xarray as xr
import numpy as np
ds = xr.Dataset(
{'data': (('y', 'x'), np.linspace(1, 9, 9).reshape(3, 3))},
coords={
'x': [0, 1, 2],
'y': [0, 1, 2]
}
)
"""
<xarray.Dataset>
Dimensions: (x: 3, y: 3)
Coordinates:
* x (x) int32 0 1 2
* y (y) int32 0 1 2
Data variables:
data (y, x) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
"""
xx = np.array([0, 1])
yy = np.array([1, 2])
print(ds.sel(x=xx, y=yy).data.values)
"""
[[4. 5.]
[7. 8.]]
"""
[ds.sel(x=x, y=y).data.item() for x, y in zip(xx, yy)]
"""
[4.0, 8.0]
"""
The example is given for sel. Ideally I would like to use the interp option of the dataset in the same way.
xx = np.array([0.25, 1.25])
yy = np.array([0.75, 1.75])
ds.interp(x=xx, y=yy).data.values
"""
array([[3.5, 4.5],
[6.5, 7.5]])
"""
[ds.interp(x=x, y=y).data.item() for x, y in zip(xx, yy)]
"""
[3.5, 7.5]
"""
See the docs on More Advanced Indexing. When you select or interpolate using a DataArray rather than a numpy array, the result will be reshaped to conform to the dimensions indexing the selector:
xx = xr.DataArray([0, 1], dims=["point"])
yy = xr.DataArray([1, 2], dims=["point"])
# will be indexed by point (len 2), not x or y
ds.sel(x=xx, y=yy)
This works the same way with interp.
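For the dataset above, a short runnable sketch of the pointwise pattern with sel (interp accepts the same DataArray indexers, but additionally requires scipy to be installed):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"data": (("y", "x"), np.linspace(1, 9, 9).reshape(3, 3))},
    coords={"x": [0, 1, 2], "y": [0, 1, 2]},
)

# indexers that share a dimension are zipped together, not crossed
xx = xr.DataArray([0, 1], dims=["point"])
yy = xr.DataArray([1, 2], dims=["point"])

line = ds.sel(x=xx, y=yy).data
print(line.values)  # [4. 8.]
```

The result has the single dimension "point", with the original x and y values kept as non-dimension coordinates along it.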
I currently have a dataset that when opened with xarray contains three coordinates x, y, band. The band coordinate has temperature and dewpoint each at 4 different time intervals, meaning there are 8 total bands. Is there a way to reshape this so that I could have x, y, band, time such that the band coordinate is now only length 2 and the time coordinate would be length 4?
I thought I could add a new coordinate named time and then assign the bands to it, but
ds = ds.assign_coords(time=[1,2,3,4])
returns ValueError: cannot add coordinates with new dimensions to a DataArray.
You can re-assign the "band" coordinate to a MultiIndex:
In [4]: da = xr.DataArray(np.random.random((4, 4, 8)), dims=['x', 'y', 'band'])
In [5]: da.coords['band'] = pd.MultiIndex.from_arrays(
...: [
...: [1, 1, 1, 1, 2, 2, 2, 2],
...: pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01'] * 2),
...: ],
...: names=['band_stacked', 'time'],
...: )
In [6]: da
Out[6]:
<xarray.DataArray (x: 4, y: 4, band: 8)>
array([[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01, 5.23808010e-01,
8.56941412e-01, 2.75757101e-01, 7.88877551e-02, 1.54739786e-02],
[3.70350510e-01, 1.90604842e-02, 2.17871931e-01, 9.40704074e-01,
4.28769745e-02, 9.24407375e-01, 2.81715762e-01, 9.12889594e-01],
[7.36529770e-02, 1.53507827e-01, 2.83341417e-01, 3.00687140e-01,
7.41822972e-01, 6.82413237e-01, 7.92126231e-01, 4.84821281e-01],
[5.24897891e-01, 4.69537663e-01, 2.47668326e-01, 7.56147251e-02,
6.27767921e-01, 2.70630355e-01, 5.44669493e-01, 3.53063860e-01]],
...
[[1.56513994e-02, 8.49568142e-01, 3.67268562e-01, 7.28406400e-01,
2.82383223e-01, 5.00901504e-01, 9.99643260e-01, 1.16446139e-01],
[9.98980637e-01, 2.45060112e-02, 8.12423749e-01, 4.49895624e-01,
6.64880037e-01, 8.73506549e-01, 1.79186788e-01, 1.94347924e-01],
[6.32000394e-01, 7.60414128e-01, 4.90153658e-01, 3.40693056e-01,
5.19820559e-01, 4.49398587e-01, 1.90339730e-01, 6.38101614e-02],
[7.64102189e-01, 6.79961676e-01, 7.63165470e-01, 6.23766131e-02,
5.62677420e-01, 3.85784911e-01, 4.43436365e-01, 2.44385584e-01]]])
Coordinates:
* band (band) MultiIndex
- band_stacked (band) int64 1 1 1 1 2 2 2 2
- time (band) datetime64[ns] 2020-01-01 2021-01-01 ... 2023-01-01
Dimensions without coordinates: x, y
Then you can expand the dimensionality by unstacking (and renaming the band_stacked level back to 'band'):
In [7]: unstacked = da.unstack('band').rename(band_stacked='band'); unstacked
Out[7]:
<xarray.DataArray (x: 4, y: 4, band: 2, time: 4)>
array([[[[2.55228052e-01, 6.71680777e-01, 8.76158643e-01,
5.23808010e-01],
[8.56941412e-01, 2.75757101e-01, 7.88877551e-02,
1.54739786e-02]],
...
[[7.64102189e-01, 6.79961676e-01, 7.63165470e-01,
6.23766131e-02],
[5.62677420e-01, 3.85784911e-01, 4.43436365e-01,
2.44385584e-01]]]])
Coordinates:
* band (band) int64 1 2
* time (time) datetime64[ns] 2020-01-01 2021-01-01 2022-01-01 2023-01-01
Dimensions without coordinates: x, y
Another more manual option would be to reshape in numpy and just create a new DataArray. Note that this manual reshape is much faster for a larger array:
In [8]: reshaped = xr.DataArray(
...: da.data.reshape((4, 4, 2, 4)),
...: dims=['x', 'y', 'band', 'time'],
...: coords={
...: 'time': pd.to_datetime(['2020-01-01', '2021-01-01', '2022-01-01', '2023-01-01']),
...: 'band': [1, 2],
...: },
...: )
Note that if your data is chunked (and assuming you'd like to keep it that way) your options are a bit more limited - see the dask docs on reshaping dask arrays. The first (MultiIndex + unstack) approach does work with dask arrays as long as the arrays are not chunked along the unstacked dimension. See this question for an example.
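For reference, here is the MultiIndex route as one consolidated, runnable sketch. The band/time values are the hypothetical ones from above, and it uses xr.Coordinates.from_pandas_multiindex (the API newer xarray versions recommend instead of assigning a pandas MultiIndex to coords directly):

```python
import numpy as np
import pandas as pd
import xarray as xr

# hypothetical 8-band array: 2 variables x 4 times, stacked along 'band'
da = xr.DataArray(np.arange(4 * 4 * 8).reshape(4, 4, 8), dims=["x", "y", "band"])

# label the stacked 'band' axis with (band_stacked, time) pairs
midx = pd.MultiIndex.from_arrays(
    [
        [1, 1, 1, 1, 2, 2, 2, 2],
        pd.to_datetime(["2020-01-01", "2021-01-01", "2022-01-01", "2023-01-01"] * 2),
    ],
    names=["band_stacked", "time"],
)
da = da.assign_coords(xr.Coordinates.from_pandas_multiindex(midx, "band"))

# unstack splits 'band' into its two levels; rename restores the 'band' name
unstacked = da.unstack("band").rename(band_stacked="band")
print(dict(unstacked.sizes))
```

The unstacked result has band of length 2 and time of length 4, matching the repr shown above.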
I'm trying to understand vectorized indexing in xarray by following this example from the docs:
import xarray as xr
import numpy as np
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
ind_x = xr.DataArray([0, 1], dims=['x'])
ind_y = xr.DataArray([0, 1], dims=['y'])
The output of the array da is as follows:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
So far so good. Now in the example there are shown two ways of indexing. Orthogonal (not interested in this case) and vectorized (what I want). For the vectorized indexing the following is shown:
In [37]: da[ind_x, ind_x] # vectorized indexing
Out[37]:
<xarray.DataArray (x: 2)>
array([0, 5])
Coordinates:
y (x) <U1 'a' 'b'
* x (x) int64 0 1
The result seems to be what I want, but this feels very strange to me. ind_x (which in theory refers to dims=['x']) is being passed twice but somehow is capable of indexing what appears to be both in the x and y dims. As far as I understand the x dim would be the rows and y dim would be the columns, is that correct? How come the same ind_x is capable of accessing both the rows and the cols?
This seems to be the concept I need for my problem, but I can't understand how it works or how to extend it to more dimensions. I was expecting this result to be given by da[ind_x, ind_y]; however, surprisingly enough, that seems to yield the orthogonal indexing.
Having the example with ind_x being used twice is probably a little confusing: actually, the dimension of the indexer doesn't have to matter at all for the indexing behavior! Observe:
ind_a = xr.DataArray([0, 1], dims=["a"])
da[ind_a, ind_a]
Gives:
<xarray.DataArray (a: 2)>
array([0, 5])
Coordinates:
x (a) int32 0 1
y (a) <U1 'a' 'b'
Dimensions without coordinates: a
The same goes for the orthogonal example:
ind_a = xr.DataArray([0, 1], dims=["a"])
ind_b = xr.DataArray([0, 1], dims=["b"])
da[ind_a, ind_b]
Result:
<xarray.DataArray (a: 2, b: 2)>
array([[0, 2],
[4, 6]])
Coordinates:
x (a) int32 0 1
y (b) <U1 'a' 'c'
Dimensions without coordinates: a, b
The difference is purely in terms of "labeling", as in this case you end up with dimensions without coordinates.
Fancy indexing
Generally stated, I personally do not find "fancy indexing" the most intuitive concept. I did find this example in NEP 21 pretty clarifying: https://numpy.org/neps/nep-0021-advanced-indexing.html
Specifically, this:
Consider indexing a 2D array by two 1D integer arrays, e.g., x[[0, 1], [0, 1]]:
Outer indexing is equivalent to combining multiple integer indices with itertools.product(). The result in this case is another 2D array with all combinations of indexed elements, e.g., np.array([[x[0, 0], x[0, 1]], [x[1, 0], x[1, 1]]])
Vectorized indexing is equivalent to combining multiple integer indices with zip(). The result in this case is a 1D array containing the diagonal elements, e.g., np.array([x[0, 0], x[1, 1]]).
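The zip() vs itertools.product() distinction can be checked directly in plain NumPy (np.ix_ builds the outer-product indexer):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# vectorized ("fancy") indexing: the index arrays are zipped together
diag = x[[0, 1], [0, 1]]            # [x[0, 0], x[1, 1]]
print(diag)   # [0 5]

# outer indexing: all combinations of the indices, via np.ix_
outer = x[np.ix_([0, 1], [0, 1])]
print(outer)  # [[0 1]
              #  [4 5]]
```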
Back to xarray
da[ind_x, ind_y]
Can also be written as:
da.isel(x=ind_x, y=ind_y)
The dimensions are implicit in the order. However, xarray still attempts to broadcast (based on dimension labels), so da[ind_y] mismatches and results in an error. da[ind_a] and da[ind_b] both work.
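A quick check of that equivalence, reusing the question's definitions of da, ind_x, and ind_y:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=["x", "y"],
                  coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]})
ind_x = xr.DataArray([0, 1], dims=["x"])
ind_y = xr.DataArray([0, 1], dims=["y"])

# positional indexing and isel with named dims give the same result
same = da[ind_x, ind_y].equals(da.isel(x=ind_x, y=ind_y))
print(same)  # True
```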
More dimensions
The dims you provide for the indexer are what determines the shape of the output, not the dimensions of the array you're indexing.
If you want to select single values along the dimensions (so we're zip()-ing through the indexes simultaneously), just make sure that your indexers share the dimension, here for a 3D array:
da = xr.DataArray(
data=np.arange(3 * 4 * 5).reshape(3, 4, 5),
coords={
"x": [1, 2, 3],
"y": ["a", "b", "c", "d"],
"z": [1.0, 2.0, 3.0, 4.0, 5.0],
},
dims=["x", "y", "z"],
)
ind_along_x = xr.DataArray([0, 1], dims=["new_index"])
ind_along_y = xr.DataArray([0, 2], dims=["new_index"])
ind_along_z = xr.DataArray([0, 3], dims=["new_index"])
da[ind_along_x, ind_along_y, ind_along_z]
Note that the values of the indexers do not have to be the same -- that would be a pretty severe limitation, after all.
Result:
<xarray.DataArray (new_index: 2)>
array([ 0, 33])
Coordinates:
x (new_index) int32 1 2
y (new_index) <U1 'a' 'c'
z (new_index) float64 1.0 4.0
Dimensions without coordinates: new_index
I would like to know what is the most efficient, or most elegant way to interpolate between values of a DataArray. Ideally, this should be usable for arbitrary number of dimensions, but good solutions for low dimensions such as 2D and 3D would also be useful.
I am aware of the 'method' keyword for the sel method and have the feeling that is going to be part of the answer, but I find the solution I came up with not to be very elegant. (I do not know about its efficiency.) Let me illustrate this solution:
>>> import xarray as xr
>>> arr = xr.DataArray([[12, 32], [14, 34]],
... dims=['x', 'y'], coords={'x': [1, 3], 'y': [2, 4]})
>>> arr
<xarray.DataArray (x: 2, y: 2)>
array([[12, 32],
[14, 34]])
Coordinates:
* x (x) int64 1 3
* y (y) int64 2 4
>>> arry = ((arr.sel(x=2, method='pad') * (2 - arr.coords['x'].sel(x=2, method='pad')) +
... arr.sel(x=2, method='bfill') * (arr.coords['x'].sel(x=2, method='bfill') - 2)) /
... (arr.coords['x'].sel(x=2, method='bfill') - arr.coords['x'].sel(x=2, method='pad')))
>>> arry
<xarray.DataArray (y: 2)>
array([ 13., 33.])
Coordinates:
* y (y) int64 2 4
>>> ((arry.sel(y=3, method='pad') * (3 - arry.coords['y'].sel(y=3, method='pad')) +
... arry.sel(y=3, method='bfill') * (arry.coords['y'].sel(y=3, method='bfill') - 3)) /
... (arry.coords['y'].sel(y=3, method='bfill') - arry.coords['y'].sel(y=3, method='pad')))
<xarray.DataArray ()>
array(23.0)
As for why I find this approach sub-optimal: the calculation of arry could include a large number of unnecessary arithmetic operations when the length of the y index is large (and not just 2, as in this toy example).
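One way to make this scheme both generic and less repetitive is to wrap the pad/backfill pair in a loop over dimensions. This is a sketch assuming the targets lie inside each coordinate range; note that modern xarray also provides arr.interp(x=2, y=3), which does this directly but requires scipy:

```python
import numpy as np
import xarray as xr

def interp_linear(arr, **targets):
    """Successive 1-D linear interpolation along each named dimension.

    A sketch of the pad/bfill scheme from the question, wrapped in a loop
    so it works for any number of dimensions. Assumes each target value
    lies within the corresponding coordinate range.
    """
    for dim, value in targets.items():
        lo = arr.sel({dim: value}, method="pad")       # nearest point below
        hi = arr.sel({dim: value}, method="backfill")  # nearest point above
        x0 = lo.coords[dim].item()
        x1 = hi.coords[dim].item()
        if x1 == x0:
            # exact hit on a grid point: no arithmetic needed
            arr = lo.drop_vars(dim)
        else:
            w = (value - x0) / (x1 - x0)
            arr = (1 - w) * lo.drop_vars(dim) + w * hi.drop_vars(dim)
    return arr

arr = xr.DataArray([[12, 32], [14, 34]], dims=["x", "y"],
                   coords={"x": [1, 3], "y": [2, 4]})
print(interp_linear(arr, x=2, y=3).item())  # 23.0
```

Each dimension is handled by exactly two sel calls and one weighted sum, so the redundant per-element arithmetic from the original version is avoided.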
Consider the following vectors (essentially 3x1 matrices):
a = sc.array([[1], [2], [3]])
>>> a
[[1]
[2]
[3]]
b = sc.array([[4], [5], [6]])
>>> b
[[4]
[5]
[6]]
The cross product of these vectors can be calculated using numpy.cross(). Why does this not work:
import numpy as np
np.cross(a, b)
ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)
but this does?:
np.cross(a.T, b.T)
[[-3 6 -3]]
To compute the cross product using numpy.cross, the dimension (length) of the array axis which defines the two vectors must be either two or three. To quote the documentation:
If a and b are arrays of vectors, the vectors
are defined by the last axis of a and b by default, and these axes
can have dimensions 2 or 3.
Note that the last axis is the default. In your example:
In [17]: a = np.array([[1], [2], [3]])
In [18]: b = np.array([[4], [5], [6]])
In [19]: print(a.shape, b.shape)
(3, 1) (3, 1)
the last axis is only of length 1, so the cross product is not defined. However, if you use the transpose, the length along the last axis is 3, so it is valid. You could also do:
In [20]: np.cross(a,b,axis=0)
Out[20]:
array([[-3],
[ 6],
[-3]])
which tells cross that the vectors are defined along the first axis, rather than the last axis.
In numpy we often use 1d arrays to represent vectors, and we treat it as either a row vector or a column vector depending on the context, for example:
In [13]: a = np.array([1, 2, 3])
In [15]: b = np.array([4, 5, 6])
In [16]: np.cross(a, b)
Out[16]: array([-3, 6, -3])
In [17]: np.dot(a, b)
Out[17]: 32
You can store vectors as 2d arrays; this is most useful when you have a collection of vectors you want to treat in a similar way, for example crossing 4 vectors in a with 4 vectors in b. By default numpy assumes the vectors are along the last dimension, but you can use the axisa and axisb arguments to explicitly specify that the vectors are along the first dimension.
In [26]: a = np.random.random((3, 4))
In [27]: b = np.random.random((3, 4))
In [28]: np.cross(a, b, axisa=0, axisb=0)
Out[28]:
array([[-0.34780508, 0.54583745, -0.25644455],
[ 0.03892861, 0.18446659, -0.36877085],
[ 0.36736545, 0.13549752, -0.32647531],
[-0.46253185, 0.56148668, -0.10056834]])
You should create a and b like this:
a = sc.array([1, 2, 3])
b = sc.array([4, 5, 6])
so that their single axis has length 3.
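A short sketch contrasting the two working options from the answers above -- pointing cross at axis 0 for (3, 1) column vectors, or simply using 1-D arrays:

```python
import numpy as np

# column vectors, shape (3, 1): the last axis has length 1, so the
# default np.cross would fail; point it at axis 0 instead
a = np.array([[1], [2], [3]])
b = np.array([[4], [5], [6]])
print(np.cross(a, b, axis=0).ravel())  # [-3  6 -3]

# simplest: plain 1-D arrays of length 3
print(np.cross(np.array([1, 2, 3]), np.array([4, 5, 6])))  # [-3  6 -3]
```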