xarray mask for selected points - python

I can use slicing to select a region when opening netCDF files in xarray by passing a preprocess function, i.e.:
SSA = dict(lat=slice(-38, -34), lon=slice(138, 141))
def Mask(ds):
    return ds.sel(**SSA)
xr.open_mfdataset(filelist, preprocess=Mask)
But what is the most efficient way to extract the data for a list of separate points by latitude and longitude?

A list of points can be selected using a DataArray as the indexer. This will result in the array being reindexed along the indexer's coordinates.
Straight from the docs on More Advanced Indexing:
In [78]: da = xr.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y'])
In [79]: da
Out[79]:
<xarray.DataArray (x: 7, y: 8)>
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55]])
Dimensions without coordinates: x, y
In [80]: da.isel(x=xr.DataArray([0, 1, 6], dims='z'),
....: y=xr.DataArray([0, 1, 0], dims='z'))
....:
Out[80]:
<xarray.DataArray (z: 3)>
array([ 0, 9, 48])
Dimensions without coordinates: z
The indexing array can also be easily pulled out of a pandas DataFrame, with something like da.sel(longitude=df.longitude.to_xarray(), latitude=df.latitude.to_xarray()), which will result in the DataArray being reindexed by the DataFrame's index.
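Applied to the latitude/longitude case in the question, pointwise selection might look like the sketch below. The grid, the coordinate names lat and lon, the point values, and the dimension name "points" are all made up for illustration, and method="nearest" is an assumption that the requested points do not sit exactly on grid nodes.

```python
import numpy as np
import xarray as xr

# Hypothetical regular 1-degree grid; the data values are arbitrary.
lat = np.arange(-40.0, -30.0, 1.0)   # -40, -39, ..., -31
lon = np.arange(135.0, 145.0, 1.0)   # 135, 136, ..., 144
da = xr.DataArray(
    np.arange(lat.size * lon.size).reshape(lat.size, lon.size),
    coords={"lat": lat, "lon": lon},
    dims=["lat", "lon"],
)

# Both indexers share the new dimension "points", so xarray pairs them up
# element by element instead of selecting the full lat x lon block.
pt_lat = xr.DataArray([-37.8, -34.9, -32.2], dims="points")
pt_lon = xr.DataArray([140.8, 138.6, 142.4], dims="points")

picked = da.sel(lat=pt_lat, lon=pt_lon, method="nearest")
print(picked.dims)    # ('points',)
print(picked.values)  # [26 54 87]
```

Presumably the same sel call could be returned from the preprocess function passed to open_mfdataset, so that only the selected points are kept from each file.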

Related

np.take from 3D matrix given indices of second dimension

given a 3D array:
a = np.arange(3*4*5).reshape(3,4,5)
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
I would like to create the following matrix:
result =
array([[20, 21, 22, 23, 24],
[ 5, 6, 7, 8, 9],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]])
Using the indices idx = [1,0,2,2]
I.e., for each row position j, I would like to "take" row j from the matrix specified by idx[j], where len(idx)==a.shape[1] and np.max(idx)<a.shape[0], since idx selects along the first axis.
Your array has three dimensions (x, y, z), and you want one x index for each position along y, so you can pair idx with a range over the y axis:
a[idx, range(a.shape[1])]
Output:
array([[20, 21, 22, 23, 24],
[ 5, 6, 7, 8, 9],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]])
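As a self-contained check of the indexing above, the sketch below reproduces the result and compares it with np.take_along_axis, which is an alternative not mentioned in the answer. The variable names via_fancy and via_take are only illustrative.

```python
import numpy as np

a = np.arange(3 * 4 * 5).reshape(3, 4, 5)
idx = np.array([1, 0, 2, 2])

# Integer-array indexing: idx[j] picks the matrix, j picks the row inside it.
via_fancy = a[idx, np.arange(a.shape[1])]

# Same selection with take_along_axis; idx is reshaped to (1, 4, 1) so it
# broadcasts over the last axis, then the leading length-1 axis is dropped.
via_take = np.take_along_axis(a, idx[None, :, None], axis=0)[0]

print(via_fancy)
print(np.array_equal(via_fancy, via_take))  # True
```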

How to keep np.array properties in pandas dataframe where elements are arrays?

I want to use a dataframe as a sort of database where elements of it are numpy arrays and I want to keep their properties for later use in dataframe operations.
import numpy as np
import pandas as pd
names=["randomname"]*4
identifiers=["a","b","b","c"]
arrays=[np.arange(0,10),np.arange(20,30),np.arange(22,32),np.arange(40,50)]
alltogether=np.array([names,identifiers,arrays])
df=pd.DataFrame(data=alltogether.T,columns=["names","id","arrays"])
This gives me somewhat the desired Dataframe.
However I want to be able to use DataFrame indexing logic together with plotting.
For example
df[df["id"]=="b"].plot()
this currently gives
TypeError: no numeric data to plot
Now can anybody help on how to still keep this element consisting of a np.array ?
Ideally my indexing logic would enable me to plot multiple of the arrays with certain criteria(here id=b)
I am kinda lost
Your code:
In [38]: names=["randomname"]*4
...: identifiers=["a","b","b","c"]
...: arrays=[np.arange(0,10),np.arange(20,30),np.arange(22,32),np.arange(40,
...: 50)]
...: alltogether=np.array([names,identifiers,arrays])
...: df=pd.DataFrame(data=alltogether.T,columns=["names","id","arrays"])
<ipython-input-38-f2bc5f6c3a15>:4: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
alltogether=np.array([names,identifiers,arrays])
Did you get this warning? Does it bother you? It's produced by that alltogether line. You are mixing strings and arrays, so the result has to be an object dtype array.
Anyways, the result (which you should have shown :( ):
In [39]: df
Out[39]:
names id arrays
0 randomname a [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 randomname b [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
2 randomname b [22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
3 randomname c [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
In [40]: df.dtypes
Out[40]:
names object
id object
arrays object
dtype: object
In [41]: alltogether
Out[41]:
array([['randomname', 'randomname', 'randomname', 'randomname'],
['a', 'b', 'b', 'c'],
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
array([22, 23, 24, 25, 26, 27, 28, 29, 30, 31]),
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]], dtype=object)
In [42]: df['arrays'].to_numpy()
Out[42]:
array([array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
array([22, 23, 24, 25, 26, 27, 28, 29, 30, 31]),
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])], dtype=object)
Selecting a couple of rows:
In [46]: df[df['id']=='b']
Out[46]:
names id arrays
1 randomname b [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
2 randomname b [22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
What is the plot method supposed to do? I haven't used it
I can make a simple line plot from a frame like this:
In [58]: adf = pd.DataFrame(np.arange(5)**2)
In [59]: adf
Out[59]:
0
0 0
1 1
2 4
3 9
4 16
In [60]: adf.plot()
But that has one number per cell, not strings (your id and names columns) or arrays.
I could use matplotlib plot function calls on the individual array elements of your frame. But a simple call to the DataFrame plot method won't do it.
In [68]: df[df['id']=='b']['arrays'].to_numpy()
Out[68]:
array([array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
array([22, 23, 24, 25, 26, 27, 28, 29, 30, 31])], dtype=object)
In [69]: plt.plot(_[0],_[1])
Out[69]: [<matplotlib.lines.Line2D at 0x7f3915e6bfd0>]
This uses the 20:30 range as x axis, and 22:32 as y.
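If the goal is one line per selected array, one possible approach (a sketch, not the only option) is to stack the selected arrays into an ordinary numeric frame that DataFrame.plot can handle. Building df from a dict is a swapped-in construction that avoids the ragged np.array call, and the name wide is hypothetical.

```python
import numpy as np
import pandas as pd

# Build the frame column by column; the "arrays" column has object dtype
# and each cell holds a whole numpy array.
df = pd.DataFrame({
    "names": ["randomname"] * 4,
    "id": ["a", "b", "b", "c"],
    "arrays": [np.arange(0, 10), np.arange(20, 30),
               np.arange(22, 32), np.arange(40, 50)],
})

# Ordinary boolean indexing still works on the id column.
selected = df.loc[df["id"] == "b", "arrays"]

# Stack the selected arrays into a 2D numeric block with one column per
# original row; this assumes the selected arrays all have the same length.
wide = pd.DataFrame(np.stack(selected.tolist()).T, columns=selected.index)
print(wide.shape)  # (10, 2)

# wide.plot() should then draw one line per selected row (needs matplotlib).
```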

More Pythonic / elegant way to fill a 2D array with sequences of integers?

I want to create a 6x6 numpy matrix, with the first row filled with: 0, 1, ..., 5, the second row filled with 10, 11, ... , 15, and the last row filled with 50, 51, ... , 55.
I thought about using (1) nested (two layer) list comprehensions, and then converting list-of-list into a numpy.matrix object, or (2) using variables inside of range function, i.e. - range(x) and vary x from 1 to 6. I was not able to get either of these two ideas to work.
Below is my non-vectorized / looping code to construct this matrix. Is there a more Pythonic way of doing this?
a = np.zeros((6,6))
for i in range(6):
    for j in range(6):
        a[i,j] = 10*i + j
print(a)
(This is one of the examples given at 39:00 in the intro video to NumPy on YouTube: Intro to Numerical Computing with NumPy.)
How about np.ogrid?
np.add(*np.ogrid[:60:10, :6])
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
Details
ogrid returns an open meshgrid:
a, b = np.ogrid[:60:10, :6]
a
# array([[ 0],
# [10],
# [20],
# [30],
# [40],
# [50]])
b
# array([[0, 1, 2, 3, 4, 5]])
You can then perform broadcasted addition:
# a + b
np.add(a, b)
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
Similarly, you can also generate two ranges using np.arange and add them:
np.arange(0, 60, 10)[:,None] + np.arange(6)
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
This can be accomplished with broadcasting:
np.arange(0, 6) + 10*np.arange(0, 6)[:, None]
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])
I'd recommend reading https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html and https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html. "Pythonic" doesn't really matter when working with numpy. Sometimes iterating, list comprehensions, and other Pythonic approaches work well with arrays; other times they are terribly inefficient. However, the links given cover some high-level concepts that are very powerful with numpy.
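To confirm that the approaches in this thread agree, here is a small sketch that builds the matrix with the original loop, with ogrid, and with plain broadcasting, then compares them. The np.fromfunction variant is an extra alternative not mentioned above, and the variable names are illustrative.

```python
import numpy as np

# Reference result from the explicit double loop in the question.
looped = np.zeros((6, 6), dtype=int)
for i in range(6):
    for j in range(6):
        looped[i, j] = 10 * i + j

# Open grid: a (6, 1) column and a (1, 6) row that broadcast when added.
via_ogrid = np.add(*np.ogrid[:60:10, :6])

# Same idea written with arange and an explicit new axis.
via_broadcast = 10 * np.arange(6)[:, None] + np.arange(6)

# fromfunction calls the lambda once with whole index arrays i and j.
via_fromfunction = np.fromfunction(lambda i, j: 10 * i + j, (6, 6), dtype=int)

for candidate in (via_ogrid, via_broadcast, via_fromfunction):
    print(np.array_equal(looped, candidate))  # True each time
```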

numpy mask using np.where then replace values

I've got two 2-D numpy arrays with same shape, let's say (10,6).
The first array x is full of some meaningful float numbers.
x = np.arange(60).reshape(-1,6)
The second array a is a sparse array, with each row containing ONLY 2 non-zero values.
a = np.zeros((10,6))
for i in range(10):
    a[i, 1] = 1
    a[i, 2] = 1
Then there's a third array v with shape (10,2), and I want to write each row of v into the first array x at the positions where a is not zero.
v = np.arange(20).reshape(10,2)
so the original x and the updated x will be:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
and
array([[ 0, 0, 1, 3, 4, 5],
[ 6, 2, 3, 9, 10, 11],
[12, 4, 5, 15, 16, 17],
[18, 6, 7, 21, 22, 23],
[24, 8, 9, 27, 28, 29],
[30, 10, 11, 33, 34, 35],
[36, 12, 13, 39, 40, 41],
[42, 14, 15, 45, 46, 47],
[48, 16, 17, 51, 52, 53],
[54, 18, 19, 57, 58, 59]])
I've tried the following method
x[np.where(a!=0)] = v
Then I got this error: shape mismatch: value array of shape (10,2) could not be broadcast to indexing result of shape (20,)
What's wrong with this approach, and is there an alternative way to do it? Thanks a lot.
Thanks to the comment by @Divakar: the problem happens because the two sides of the assignment = have different shapes.
On the left, x[np.where(a!=0)], x[a!=0], and x[np.nonzero(a)] all select the matching elements as a flat, one-dimensional result of shape (20,).
On the right, we need an array of a compatible shape to complete the assignment, so a simple ravel() or reshape(-1) will do the job.
So the solution is as simple as x[a!=0] = v.ravel().
import numpy as np
arrayOne = np.random.rand(6).reshape((2, 3))
arrayTwo = np.asarray([[0,1,2], [1,2,0]])
arrayThree = np.zeros((2, 2))
arrayOne[arrayTwo != 0] = arrayThree.ravel()
print(arrayOne)
[[0.56251284 0. 0. ]
[0. 0. 0.20076913]]
Note regarding edit: The solution above is not mine, all credit goes to Divakar. I edited because my earlier answer misunderstood OP's question and I wish to avoid confusion.
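Here is the same fix applied to the x, a, and v arrays from the question, as a minimal sketch. It relies on the boolean mask visiting elements in row-major order, so the flattened v lines up row by row; that only holds because every row of a has exactly two non-zero entries.

```python
import numpy as np

x = np.arange(60).reshape(-1, 6)
a = np.zeros((10, 6))
a[:, 1:3] = 1                      # same mask as the loop in the question
v = np.arange(20).reshape(10, 2)

# x[a != 0] is a flat selection of 20 elements, so v is flattened to match.
x[a != 0] = v.ravel()

print(x[:3])
# [[ 0  0  1  3  4  5]
#  [ 6  2  3  9 10 11]
#  [12  4  5 15 16 17]]
```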

numpy 3 dimension array middle indexing bug

I seem to have found a bug when using Python 2.7 with the numpy module:
import numpy as np
x=np.arange(3*4*5).reshape(3,4,5)
x
Here I got the full 'x' array as follows:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
Then I try to index a single row in sheet [1]:
x[1][0][:]
Result:
array([20, 21, 22, 23, 24])
But something goes wrong when I try to index a single column in sheet [1]:
x[1][:][0]
The result is still the same as before:
array([20, 21, 22, 23, 24])
Shouldn't it be array([20, 25, 30, 35])?
Is something wrong with indexing the middle dimension with a range?
No, it's not a bug.
When you use [:] you are using slice notation, which returns the whole list:
l = ["a", "b", "c"]
l[:]
#output:
["a", "b", "c"]
and in your case:
x[1][:]
#output:
array([[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]])
What you really want is numpy's multidimensional indexing notation:
x[1, : ,0]
#output:
array([20, 25, 30, 35])
This is not a bug. x[1][:][0] is not a multiple index ("give me the elements where the first dimension is 1, the second is any, the third is 0"). Instead, you are indexing three times, on three successive objects.
x1 = x[1] # x1 is the 4x5 subarray at index 1 (the second one)
x2 = x1[:] # x2 is the same as x1
x3 = x2[0] # x3 is the first row of x2
To use a multiple index, do it in a single indexing expression:
x[1, :, 0]
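The difference between chained and single-expression indexing can be checked directly. A short sketch, with illustrative variable names:

```python
import numpy as np

x = np.arange(3 * 4 * 5).reshape(3, 4, 5)

# Chained indexing: each [] acts on the result of the previous one,
# and [:] on a 2D array just returns all of its rows.
chained = x[1][:][0]
print(chained)            # [20 21 22 23 24]
print(np.array_equal(chained, x[1][0]))  # True

# A single index tuple addresses all three axes at once.
column = x[1, :, 0]
print(column)             # [20 25 30 35]

# Chaining works too if the column is requested from the 2D subarray.
column_chained = x[1][:, 0]
print(np.array_equal(column, column_chained))  # True
```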
