numpy mask using np.where then replace values - python

I've got two 2-D numpy arrays with the same shape, let's say (10,6).
The first array x is full of some meaningful float numbers.
x = np.arange(60).reshape(-1,6)
The second array a is sparse: each row contains exactly 2 non-zero values.
a = np.zeros((10,6))
for i in range(10):
    a[i, 1] = 1
    a[i, 2] = 1
Then there's a third array v with shape (10,2), and I want to write each of its rows into the first array x at the positions where a is non-zero.
v = np.arange(20).reshape(10,2)
so the original x and the updated x will be:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
and
array([[ 0, 0, 1, 3, 4, 5],
[ 6, 2, 3, 9, 10, 11],
[12, 4, 5, 15, 16, 17],
[18, 6, 7, 21, 22, 23],
[24, 8, 9, 27, 28, 29],
[30, 10, 11, 33, 34, 35],
[36, 12, 13, 39, 40, 41],
[42, 14, 15, 45, 46, 47],
[48, 16, 17, 51, 52, 53],
[54, 18, 19, 57, 58, 59]])
I've tried the following method
x[np.where(a!=0)] = v
Then I got an error of shape mismatch: value array of shape (10,2) could not be broadcast to indexing result of shape (20,)
What's wrong with this approach, and is there an alternative? Thanks a lot.

Thanks to the comment by @Divakar: the problem is that the two sides of the assignment = have different shapes.
On the left, x[np.where(a!=0)], x[a!=0] and x[np.nonzero(a)] all return a flat 1-D selection with shape (20,).
On the right, we need an array that can be broadcast to that shape, so a simple ravel() or reshape(-1) does the job. This works because the mask selects elements in row-major order, the same order in which ravel() flattens v.
So the solution is as simple as x[a!=0] = v.ravel().
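As a minimal sketch using the arrays from the question (the name mask is introduced here only for illustration), the flattened assignment can be checked like this. It assumes every row of a has exactly two non-zero entries, so the row-major order of the mask lines up with v.ravel().

```python
import numpy as np

x = np.arange(60).reshape(-1, 6)
a = np.zeros((10, 6))
a[:, 1:3] = 1            # same pattern as the loop in the question
v = np.arange(20).reshape(10, 2)

mask = a != 0            # boolean mask, shape (10, 6), with 20 True entries
print(x[mask].shape)     # (20,)

x[mask] = v.ravel()      # flatten v so it matches the (20,) selection
print(x[:3])
# [[ 0  0  1  3  4  5]
#  [ 6  2  3  9 10 11]
#  [12  4  5 15 16 17]]
```

If the number of non-zero entries differed between rows, the flattened values would still be assigned in row-major order, but they would no longer correspond row by row to v.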

import numpy as np
arrayOne = np.random.rand(6).reshape((2, 3))
arrayTwo = np.asarray([[0,1,2], [1,2,0]])
arrayThree = np.zeros((2, 2))
arrayOne[arrayTwo != 0] = arrayThree.ravel()
print(arrayOne)
[[0.56251284 0. 0. ]
[0. 0. 0.20076913]]
Note regarding edit: The solution above is not mine, all credit goes to Divakar. I edited because my earlier answer misunderstood OP's question and I wish to avoid confusion.

Related

How do you print elements that are less than a variable from a numpy array

Hi, I'm fairly new to Python, and an assignment requires me to print elements that are less than a variable from a numpy array.
I made a 20x10 numpy array of random integers between -5 and 50
x = np.random.randint (-5, 50, (20, 10))
x
array([[17, 23, 15, 13, -1, 17, 30, 14, 2, 3],
[ 8, 0, -5, 3, 10, 10, 48, 6, -1, 34],
[23, 40, 21, 5, 47, 41, 44, 22, 46, 30],
[36, 13, 48, 29, 46, 25, 48, 38, 13, 40],
[18, -4, 1, 37, 48, 43, 25, 11, 21, 30],
[44, 37, 4, 39, 8, 1, 33, 34, 3, 8],
[ 2, 11, 17, 10, 20, 3, 30, 1, 12, 2],
[15, 20, -3, 11, 45, 40, 18, 19, -1, 31],
[39, 44, 18, 25, 49, 20, 15, 28, 32, 18],
[22, 24, 28, 46, 48, 46, 17, 49, 2, 36],
[44, 4, 49, -5, 14, 31, 12, 15, 48, 43],
[-2, 37, -4, 15, 31, -1, 11, 43, 42, 5],
[40, 35, 25, 22, 38, 26, 15, 1, 4, 22],
[42, 30, 14, 7, 13, 44, 5, 29, 28, 38],
[-2, 7, 31, -4, 44, -5, 34, 19, 31, 30],
[ 0, 1, -2, 29, 35, 28, 23, -1, 21, 27],
[40, 46, 4, 48, 0, 28, 2, 25, 3, 49],
[15, 2, -2, 16, 22, 39, -2, 33, 15, 2],
[14, 26, -5, 0, 22, 38, 25, 4, 14, 2],
[16, 32, 23, 3, 38, 41, -5, 35, 46, 33]])
Above is the result. Now I want to print the number of elements that are less than 5 in each row.
I managed to do this
print (x[0, :] < 5)
[False False False False True False False False True True]
The result is shown above, but what I wanted was the number of elements that are less than 5. I wanted it to give me 3, since there are 3 such elements.
Can anyone help me with this? Thank you.
np.sum works on boolean arrays like yours, because True counts as 1 and False as 0. So at first I tried the following:
[np.sum(n<5) for n in x]
This gives me a list [3, 4, 0, 0, 2, 3, 4, 2, 0, 1, 2, 3, 2, 0, 3, 4, 4, 4, 4, 2], which is correct, but a list comprehension loops over the rows in Python, which is slow and best avoided when numpy can do the work itself. Here is the idiomatic way to do this in numpy:
np.sum(x<5, axis=1)
This builds a boolean array from x and then counts the True values in each row by summing along axis 1 (across the columns).
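A small sketch of the per-row count, using a made-up 3x4 array (small is hypothetical, not the asker's data). np.count_nonzero with an axis argument is shown as an equivalent alternative.

```python
import numpy as np

# Hypothetical small array, chosen so the counts are easy to check by eye
small = np.array([[17, -1,  2, 3],
                  [23, 40, 21, 5],
                  [ 8,  0, -5, 6]])

mask = small < 5                              # boolean array, same shape as small
per_row = np.sum(mask, axis=1)                # True counts as 1, False as 0
per_row_alt = np.count_nonzero(mask, axis=1)  # same counts, no summation semantics

print(per_row)       # [3 0 2]
print(per_row_alt)   # [3 0 2]
```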
You can use your boolean mask to index the array and then count the elements. Alternatively, you can use numpy.where(). Given only a condition, it returns the indices where the condition is met rather than a boolean mask.
For your example, for the first row:
indices = numpy.where(x[0] < 5)
values_less_than_5 = x[0][indices]
count = len(values_less_than_5)
print(count)

More Pythonic / elegant way to fill a 2D array with sequences of integers?

I want to create a 6x6 numpy matrix, with the first row filled with: 0, 1, ..., 5, the second row filled with 10, 11, ... , 15, and the last row filled with 50, 51, ... , 55.
I thought about using (1) nested (two layer) list comprehensions, and then converting list-of-list into a numpy.matrix object, or (2) using variables inside of range function, i.e. - range(x) and vary x from 1 to 6. I was not able to get either of these two ideas to work.
Below is my non-vectorized / looping code to construct this matrix. Is there a more Pythonic way of doing this?
a = np.zeros((6,6))
for i in range(6):
    for j in range(6):
        a[i,j] = 10*i + j
print(a)
(This is one of the examples given at 39:00 in the intro video to NumPy on YouTube: Intro to Numerical Computing with NumPy.)
How about np.ogrid?
np.add(*np.ogrid[:60:10, :6])
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
Details
ogrid returns an open meshgrid:
a, b = np.ogrid[:60:10, :6]
a
# array([[ 0],
# [10],
# [20],
# [30],
# [40],
# [50]])
b
# array([[0, 1, 2, 3, 4, 5]])
You can then perform broadcasted addition:
# a + b
np.add(a, b)
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
Similarly, you can also generate two ranges using np.arange and add them:
np.arange(0, 60, 10)[:,None] + np.arange(6)
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
This can be accomplished with broadcasting:
np.arange(0, 6) + 10*np.arange(0, 6)[:, None]
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])
I'd recommend reading https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html and https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html. "Pythonic" doesn't really matter when working with numpy. Sometimes iterating, list comprehensions, and other Pythonic approaches work well with arrays; other times they are terribly inefficient. The links above cover some high-level concepts that are very powerful with numpy.
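To make the broadcasting step concrete, here is a hedged sketch that spells out the shapes involved and compares the result with the double loop from the question. The names rows, cols, grid and looped are introduced only for this illustration.

```python
import numpy as np

rows = 10 * np.arange(6)[:, None]   # shape (6, 1): 0, 10, ..., 50 as a column
cols = np.arange(6)                 # shape (6,):   0, 1, ..., 5

grid = rows + cols                  # (6, 1) + (6,) broadcasts to (6, 6)

# Reference result: the double loop from the question
looped = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        looped[i, j] = 10 * i + j

print(rows.shape, cols.shape, grid.shape)   # (6, 1) (6,) (6, 6)
print(np.array_equal(grid, looped))         # True
```

The column vector is stretched across the columns and the row vector across the rows, so no Python-level loop is needed.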

xarray mask for selected points

I can use slicing to select a region when opening netCDF files in xarray via preprocess, e.g.:
SSA=dict(lat=slice(-38,-34),lon=slice(138,141))
def Mask(ds):
    return ds.sel(**SSA)
xr.open_mfdataset(filelist, preprocess=Mask)
But what is the most efficient way to extract the data for a list of separate points by latitude and longitude?
A list of points can be selected using a DataArray as the indexer. This will result in the array being reindexed along the indexer's coordinates.
Straight from the docs on More Advanced Indexing:
In [78]: da = xr.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y'])
In [79]: da
Out[79]:
<xarray.DataArray (x: 7, y: 8)>
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55]])
Dimensions without coordinates: x, y
In [80]: da.isel(x=xr.DataArray([0, 1, 6], dims='z'),
....: y=xr.DataArray([0, 1, 0], dims='z'))
....:
Out[80]:
<xarray.DataArray (z: 3)>
array([ 0, 9, 48])
Dimensions without coordinates: z
The indexing array can also be easily pulled out of a pandas DataFrame, with something like da.sel(longitude=df.longitude.to_xarray(), latitude=df.latitude.to_xarray()), which will result in the DataArray being reindexed by the DataFrame's index.

numpy 3 dimension array middle indexing bug

I seem to have found a bug when using Python 2.7 with the numpy module:
import numpy as np
x=np.arange(3*4*5).reshape(3,4,5)
x
Here I got the full 'x' array as follows:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
Then I try to index a single row of values in sheet [1]:
x[1][0][:]
Result:
array([20, 21, 22, 23, 24])
But something goes wrong when I try to index a single column in sheet [1]:
x[1][:][0]
The result is still the same as before:
array([20, 21, 22, 23, 24])
Shouldn't it be array([20, 25, 30, 35])?
Is something wrong with indexing the middle axis with a range?
No, it's not a bug.
When you use [:] you are using slice notation, and it returns the whole list:
l = ["a", "b", "c"]
l[:]
#output:
["a", "b", "c"]
and in your case:
x[1][:]
#output:
array([[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]])
What you really want is numpy's multidimensional indexing notation:
x[1, : ,0]
#output:
array([20, 25, 30, 35])
This is not a bug. x[1][:][0] is not a multiple index ("give me the elements where first dimension is 1, second is any, third is 0"). Instead, you are indexing three times, each time on a different object.
x1 = x[1] # x1 is the second 4x5 subarray (index 1)
x2 = x1[:] # x2 has the same contents as x1
x3 = x2[0] # x3 is the first row of x2
To index several dimensions at once, put all the indices in a single pair of brackets:
x[1, :, 0]
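A short sketch contrasting the two forms on the array from the question (chained and column are names introduced here for illustration):

```python
import numpy as np

x = np.arange(3 * 4 * 5).reshape(3, 4, 5)

chained = x[1][:][0]   # x[1] -> (4, 5); [:] -> same (4, 5); [0] -> its first row
column = x[1, :, 0]    # one index per axis: block 1, all rows, column 0

print(chained)   # [20 21 22 23 24]
print(column)    # [20 25 30 35]
```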

Slicing 3d numpy array returns strange shape

If I slice a 2d array with a set of coordinates
>>> test = np.reshape(np.arange(40),(5,8))
>>> coords = np.array((1,3,4))
>>> slice = test[:, coords]
then my slice has the shape that I would expect
>>> slice.shape
(5, 3)
But if I repeat this with a 3d array
>>> test = np.reshape(np.arange(80),(2,5,8))
>>> slice = test[0, :, coords]
then the shape is now
>>> slice.shape
(3, 5)
Is there a reason that these are different? Separating the indices returns the shape that I would expect
>>> slice = test[0][:][coords]
>>> slice.shape
(5, 3)
Why would these views have different shapes?
slice = test[0, :, coords]
is a single indexing operation, in effect saying "take the 0th element of the first axis, all of the second axis, and [1,3,4] of the third axis". Or more precisely, take coordinates (0,whatever,1) and make it our first row, (0,whatever,3) and make it our second row, and (0,whatever,4) and make it our third row. There are 5 whatevers, so you end up with (3,5).
The second example you gave is like this:
slice = test[0][:][coords]
In this case test[0][:] is a (5,8) array, and indexing it with coords takes the elements at indices 1, 3 and 4 along its first axis, which are rows, so this gives a (3,8) array. The (5,3) shape comes from selecting columns instead, as in test[0][:, coords].
Edit to discuss 2D case:
In the 2D case, where:
>>> test = np.reshape(np.arange(40),(5,8))
>>> test
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39]])
the behaviour is similar.
Case 1:
>>> test[:,[1,3,4]]
array([[ 1, 3, 4],
[ 9, 11, 12],
[17, 19, 20],
[25, 27, 28],
[33, 35, 36]])
is simply selecting columns 1,3, and 4.
Case 2:
>>> test[:][[1,3,4]]
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39]])
is taking the 1st, 3rd and 4th element of the array, which are the rows.
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The docs talk about the complexity of combining advanced and basic indexing.
test[0, :, coords]
The dimension from the advanced index coords comes first, with the dimension from the [0,:] part after it, producing the (3,5) shape.
The easiest way to understand the situation may be to think in terms of the result shape. There are two parts to the indexing operation, the subspace defined by the basic indexing (excluding integers) and the subspace from the advanced indexing part. [in the case where]
The advanced indexes are separated by a slice, ellipsis or newaxis. For example x[arr1, :, arr2].
.... the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that.
I recall discussing this kind of indexing in a previous SO question, but it would take some digging to find it.
https://stackoverflow.com/a/28353446/901925 Why does the order of dimensions change with boolean indexing?
How does numpy order array slice indices?
The [:] in test[0][:][coords] does nothing. test[0][:,coords] produces the desired (5,3) result.
In [145]: test[0,:,[1,2,3]] # (3,5) array
Out[145]:
array([[ 1, 9, 17, 25, 33], # test[0,:,1]
[ 2, 10, 18, 26, 34],
[ 3, 11, 19, 27, 35]])
In [146]: test[0][:,[1,2,3]] # same values but (5,3)
Out[146]:
array([[ 1, 2, 3],
[ 9, 10, 11],
[17, 18, 19],
[25, 26, 27],
[33, 34, 35]])
In [147]: test[0][:][[1,2,3]] # [:] does nothing; select 3 from 2nd axis
Out[147]:
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])
In [148]: test[0][[1,2,3]] # same as test[0,[1,2,3],:]
Out[148]:
array([[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31]])
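The shapes discussed above can be checked in one place with the coords from the question. This is only an illustrative sketch; mixed, two_step and chained are names made up here.

```python
import numpy as np

test = np.arange(80).reshape(2, 5, 8)
coords = np.array([1, 3, 4])

mixed = test[0, :, coords]      # integer and array index separated by a slice
two_step = test[0][:, coords]   # pick block 0 first, then select columns
chained = test[0][:][coords]    # [:] is a no-op, so coords picks rows of test[0]

print(mixed.shape)      # (3, 5)
print(two_step.shape)   # (5, 3)
print(chained.shape)    # (3, 8)

# Same values, transposed layout
print(np.array_equal(mixed.T, two_step))   # True
```

If the (5, 3) layout is what you need, test[0][:, coords] or a transpose of the mixed result both give it.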
