Numpy function to extract values from ndarray by indices vector - python

I have a matrix mat of size (3, 5, 4) and a vector vec of size (4,) with indices corresponding to the first dimension of the matrix (i.e. between 0 and 2). I would like to extract an array of size (4, 5), which can be done via mat[vec, :, [True] * len(vec)], but I was wondering if there is a more elegant solution using numpy functions without the need to create a new list of boolean values.

In [15]: mat = np.arange(3 * 5 * 4).reshape(3, 5, 4)
In [16]: idx = np.array([0, 2, 1, 1])
In [18]: mat[idx, :, [True] * len(idx)]
Out[18]:
array([[ 0, 4, 8, 12, 16],
[41, 45, 49, 53, 57],
[22, 26, 30, 34, 38],
[23, 27, 31, 35, 39]])
An equivalent expression, though whether it is more elegant is debatable:
In [19]: mat[idx, :, np.arange(4)]
Out[19]:
array([[ 0, 4, 8, 12, 16],
[41, 45, 49, 53, 57],
[22, 26, 30, 34, 38],
[23, 27, 31, 35, 39]])
Unless you want a (4,5,4), you will have to provide equal size arrays for the 1st and 3rd dimensions. There's no way around that.
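As a rough sketch of the point above, the paired index on the last axis can be built with np.arange, and np.take_along_axis offers one possible alternative. The names via_arange and via_take are illustrative only and do not come from the original posts.

```python
import numpy as np

mat = np.arange(3 * 5 * 4).reshape(3, 5, 4)
idx = np.array([0, 2, 1, 1])

# Paired advanced indices on axes 0 and 2: result[j, :] = mat[idx[j], :, j].
# Because a slice separates the two index arrays, the paired axis comes first.
via_arange = mat[idx, :, np.arange(mat.shape[2])]

# Same selection with take_along_axis: idx is broadcast over the middle axis,
# giving shape (1, 5, 4); drop the leading axis and transpose to (4, 5).
via_take = np.take_along_axis(mat, idx[None, None, :], axis=0)[0].T

print(via_arange.shape)                      # (4, 5)
print(np.array_equal(via_arange, via_take))  # True
```

Either way, an index of length 4 is needed for both the first and the third axis; take_along_axis only hides the construction of the second one.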

Related

How to add values at repeat index locations on multidimensional arrays of Numpy?

I have a matrix B with shape (6, 9). For every row of B, I want to add 1 at some column indices. A column index may appear more than once, in which case I want to add m to that column if its index appears m times. Please see the following example code:
import numpy as np
B = np.arange(6*9).reshape(6, 9)
idx = np.array([[0, 1, 2],
                [6, 7, 0],
                [2, 3, 4],
                [4, 5, 6]], dtype=int)  # np.int was removed in NumPy 1.24
B[:, idx] += 1 # the result is not what I want.
Furthermore, np.add.at and np.bincount do not seem to work for the above case either.
I would appreciate your help. Thanks very much.
More Information:
In the idx array, the indices 0, 2, 4 and 6 appear twice, so I want
B[:, [0, 2, 4, 6]] += 2. For the other indices, which appear once, just add 1. So the final B should be
B = np.array([[ 2, 2, 4, 4, 6, 6, 8, 8, 8],
[11, 11, 13, 13, 15, 15, 17, 17, 17],
[20, 20, 22, 22, 24, 24, 26, 26, 26],
[29, 29, 31, 31, 33, 33, 35, 35, 35],
[38, 38, 40, 40, 42, 42, 44, 44, 44],
[47, 47, 49, 49, 51, 51, 53, 53, 53]])
I think you can use np.add.at function to get what you want. Its syntax is
np.add.at('array', ('slice or array of indices for 1st dimension', 'slice or array of indices for 2nd dimension'), 'what to add')
So, in your case, if you want to add 1 for every row for every column, specified in idx, you should use
>>> a = np.arange(6 * 9).reshape(6, 9)
>>> np.add.at(a, (np.s_[:], idx), 1)
np.s_[:] is a slice object that selects every row, so the addition is applied to each row.
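A minimal sketch of the np.add.at call on the question's data, next to a np.bincount variant that counts each column index once and broadcasts the counts over the rows. The bincount route and the names a, b and counts are my own assumptions, not part of the original answer.

```python
import numpy as np

idx = np.array([[0, 1, 2],
                [6, 7, 0],
                [2, 3, 4],
                [4, 5, 6]])

# Unbuffered add: every occurrence of a column index contributes +1.
a = np.arange(6 * 9).reshape(6, 9)
np.add.at(a, (np.s_[:], idx), 1)

# Alternative: count how often each column index occurs, then add the
# counts to every row through broadcasting.
b = np.arange(6 * 9).reshape(6, 9)
counts = np.bincount(idx.ravel(), minlength=b.shape[1])
b += counts

print(a[0])                  # [2 2 4 4 6 6 8 8 8]
print(np.array_equal(a, b))  # True
```

The plain B[:, idx] += 1 fails because buffered fancy-index assignment applies each repeated index only once, whereas np.add.at accumulates every occurrence.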

xarray mask for selected points

I can use slicing to select a region when opening netcdf files in xarray, using preprocess ie:
SSA=dict(lat=slice(-38,-34),lon=slice(138,141))
def Mask(ds):
    return ds.sel(**SSA)
xr.open_mfdataset(filelist, preprocess=Mask)
but what is the most efficient way to extract the data for a list of separate points by latitude and longitude?
A list of points can be selected using a DataArray as the indexer. This will result in the array being reindexed along the indexer's coordinates.
Straight from the docs on More Advanced Indexing:
In [78]: da = xr.DataArray(np.arange(56).reshape((7, 8)), dims=['x', 'y'])
In [79]: da
Out[79]:
<xarray.DataArray (x: 7, y: 8)>
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55]])
Dimensions without coordinates: x, y
In [80]: da.isel(x=xr.DataArray([0, 1, 6], dims='z'),
....: y=xr.DataArray([0, 1, 0], dims='z'))
....:
Out[80]:
<xarray.DataArray (z: 3)>
array([ 0, 9, 48])
Dimensions without coordinates: z
The indexing array can also be easily pulled out of a pandas DataFrame, with something like da.sel(longitude=df.longitude.to_xarray(), latitude=df.latitude.to_xarray()), which will result in the DataArray being reindexed by the DataFrame's index.

numpy mask using np.where then replace values

I've got two 2-D numpy arrays with same shape, let's say (10,6).
The first array x is full of some meaningful float numbers.
x = np.arange(60).reshape(-1,6)
The second array a is sparse array, with each row contains ONLY 2 non-zero values.
a = np.zeros((10,6))
for i in range(10):
    a[i, 1] = 1
    a[i, 2] = 1
Then there's a third array v with shape (10, 2), and I want to write each of its rows into the corresponding row of x at the positions where a is non-zero.
v = np.arange(20).reshape(10,2)
so the original x and the updated x will be:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
and
array([[ 0, 0, 1, 3, 4, 5],
[ 6, 2, 3, 9, 10, 11],
[12, 4, 5, 15, 16, 17],
[18, 6, 7, 21, 22, 23],
[24, 8, 9, 27, 28, 29],
[30, 10, 11, 33, 34, 35],
[36, 12, 13, 39, 40, 41],
[42, 14, 15, 45, 46, 47],
[48, 16, 17, 51, 52, 53],
[54, 18, 19, 57, 58, 59]])
I've tried the following method
x[np.where(a!=0)] = v
Then I got an error of shape mismatch: value array of shape (10,2) could not be broadcast to indexing result of shape (20,)
What's wrong with this approach, is there an alternative to do it? Thanks a lot.
Thanks to the comment by @Divakar: the problem arises because the two sides of the assignment have different shapes.
On the left, x[np.where(a!=0)], x[a!=0] and x[np.nonzero(a)] all produce a flat selection of shape (20,).
On the right, we need an array of a compatible shape, so a simple ravel() or reshape(-1) will do the job.
So the solution is as simple as x[a!=0] = v.ravel().
import numpy as np
arrayOne = np.random.rand(6).reshape((2, 3))
arrayTwo = np.asarray([[0,1,2], [1,2,0]])
arrayThree = np.zeros((2, 2))
arrayOne[arrayTwo != 0] = arrayThree.ravel()
print(arrayOne)
[[0.56251284 0. 0. ]
[0. 0. 0.20076913]]
Note regarding edit: The solution above is not mine, all credit goes to Divakar. I edited because my earlier answer misunderstood OP's question and I wish to avoid confusion.
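For reference, here is a small sketch of that fix applied to the arrays from the question. It assumes, as in the question, that every row of a has the same number of non-zero entries, so that the row-major order of the mask matches the row-major order of v.

```python
import numpy as np

x = np.arange(60).reshape(-1, 6)
a = np.zeros((10, 6))
a[:, 1:3] = 1                      # two non-zero entries per row
v = np.arange(20).reshape(10, 2)

# Boolean masking selects elements in row-major order, giving a flat (20,)
# target, so the (10, 2) values are flattened in the same order.
x[a != 0] = v.ravel()

print(x[:3])
```

The first three rows then match the expected result listed in the question.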

What does transpose(3, 0, 1, 2) mean?

What does this mean?
data.transpose(3, 0, 1, 2)
Also, if data.shape == (10, 10, 10), why do I get ValueError: axes don't match array?
Let me discuss in terms of Python3.
I use the transpose function in python as data.transpose(3, 0, 1, 2)
This is wrong as this operation requires 4 dimensions, while you only provide 3 (as in (10,10,10)). Reproducible as:
>>> a = np.arange(60).reshape((1,4,5,3))
>>> b = a.transpose((2,0,1))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: axes don't match array
If the image batch size is 1, you can add the missing dimension by reshaping (10,10,10) to (1,10,10,10). This can be done as:
w,h,c = original_image.shape #10,10,10
modified_img = np.reshape(original_image, (1,w,h,c)) #(1,10,10,10)
What does 3, 0, 1, 2 mean?
For 2-D arrays, transpose swaps rows and columns, as the name suggests. For higher-dimensional arrays it permutes the axes in the order given; when only a single axis changes position, as below, the result is the same as np.moveaxis.
>>> a = np.arange(60).reshape((4,5,3))
>>> b = a.transpose((2,0,1))
>>> b.shape
(3, 4, 5)
>>> c = np.moveaxis(a,-1,0)
>>> c.shape
(3, 4, 5)
>>> b
array([[[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57]],
[[ 1, 4, 7, 10, 13],
[16, 19, 22, 25, 28],
[31, 34, 37, 40, 43],
[46, 49, 52, 55, 58]],
[[ 2, 5, 8, 11, 14],
[17, 20, 23, 26, 29],
[32, 35, 38, 41, 44],
[47, 50, 53, 56, 59]]])
>>> c
array([[[ 0, 3, 6, 9, 12],
[15, 18, 21, 24, 27],
[30, 33, 36, 39, 42],
[45, 48, 51, 54, 57]],
[[ 1, 4, 7, 10, 13],
[16, 19, 22, 25, 28],
[31, 34, 37, 40, 43],
[46, 49, 52, 55, 58]],
[[ 2, 5, 8, 11, 14],
[17, 20, 23, 26, 29],
[32, 35, 38, 41, 44],
[47, 50, 53, 56, 59]]])
As evident, both methods work the same.
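Note that transpose(3, 0, 1, 2) turns (samples, rows, columns, channels) into (channels, samples, rows, columns). Converting to (samples, channels, rows, columns), as when going from an OpenCV-style layout to a PyTorch-style one, would instead be transpose(0, 3, 1, 2).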
Have a look at numpy.transpose
Use transpose(a, argsort(axes)) to invert the transposition of tensors
when using the axes keyword argument.
Transposing a 1-D array returns an unchanged view of the original
array.
e.g.
>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
[2, 3]])
>>>
>>> np.transpose(x)
array([[0, 2],
[1, 3]])
You specified too many values in the transpose
>>> a = np.arange(8).reshape(2,2,2)
>>> a.shape
(2, 2, 2)
>>> a.transpose([2,0,1])
array([[[0, 2],
[4, 6]],
[[1, 3],
[5, 7]]])
>>> a.transpose(3,0,1,2)
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
ValueError: axes don't match array
>>>
According to the NumPy documentation for np.transpose, the second argument is axes, an optional list of ints:
by default the dimensions are reversed; otherwise the axes are permuted
according to the values given.
Example :
>>> x = np.arange(9).reshape((3,3))
>>> x
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.transpose(x, (0,1))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> np.transpose(x, (1,0))
array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])
The problem is that you have taken a 3-dimensional array and applied a 4-dimensional transpose.
Your command converts a 4-D array (batch, rows, cols, channel) into another 4-D array (channel, batch, rows, cols), but you need a command for a 3-D array. So remove the 3 and write
data.transpose(2, 0, 1).
For all i, j, k, l, the following holds true:
arr[i, j, k, l] == arr.transpose(3, 0, 1, 2)[l, i, j, k]
transpose(3, 0, 1, 2) reorders the array dimensions from (a, b, c, d) to (d, a, b, c):
>>> arr = np.zeros((10, 11, 12, 13))
>>> arr.transpose(3, 0, 1, 2).shape
(13, 10, 11, 12)
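The identity above, and the argsort trick for undoing a transpose quoted from the documentation earlier, can be checked with a short sketch. The array sizes and the names moved and restored are arbitrary choices for illustration.

```python
import numpy as np

arr = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
axes = (3, 0, 1, 2)
moved = arr.transpose(axes)

print(moved.shape)  # (5, 2, 3, 4)

# Element identity: arr[i, j, k, l] == moved[l, i, j, k]
i, j, k, l = 1, 2, 3, 4
print(arr[i, j, k, l] == moved[l, i, j, k])  # True

# argsort(axes) gives the inverse permutation, which undoes the transpose.
restored = np.transpose(moved, np.argsort(axes))
print(np.array_equal(restored, arr))  # True

# A 3-D array needs a 3-element permutation; four axes raise ValueError.
data = np.zeros((10, 10, 10))
try:
    data.transpose(3, 0, 1, 2)
except ValueError as exc:
    print("ValueError:", exc)
```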

collapsing all dimensions of numpy array except the first two

I have a variable dimension numpy array, for example it could have the following shapes
(64, 64)
(64, 64, 2, 5)
(64, 64, 40)
(64, 64, 10, 20, 4)
What I want to do is that if the number of dimensions is greater than 3, I want to collapse/stack everything else into the third dimension while preserving order. So, in my above example the shapes after the operation should be:
(64, 64)
(64, 64, 10)
(64, 64, 40)
(64, 64, 800)
Also, the order needs to be preserved. For example, the array of the shape (64, 64, 2, 5) should be stacked as
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
(64, 64, 2)
i.e. the 3D slices one after the other. Also, after the operation I would like to reshape it back to the original shape without any permutation i.e. preserve the original order.
One way I could do is multiply all the dimension values from 3 to the last dimension i.e.
shape = array.shape
if len(shape) > 3:
    final_dim = 1
    for i in range(2, len(shape)):
        final_dim *= shape[i]
and then reshape the array. Something like:
array.reshape(64, 64, final_dim)
However, I was first of all not sure if the order is preserved as I want and whether there is a better pythonic way to achieve this?
EDIT: As pointed out in the other answers it is even easier to just provide -1 as the third dimension for reshape. Numpy automatically determines the correct shape then.
I am not sure what the problem here is. You can just use np.reshape and it preserves order. See the following code:
import numpy as np
A = np.random.rand(20,20,2,2,18,5)
print(A.shape)
new_dim = np.prod(A.shape[2:])
print(new_dim)
B = np.reshape(A, (A.shape[0], A.shape[1], new_dim))
print(B.shape)
C = B.reshape((20,20,2,2,18,5))
print(np.array_equal(A,C))
The output is:
(20, 20, 2, 2, 18, 5)
360
(20, 20, 360)
True
This accomplishes exactly what you asked for.
reshape accepts -1 to infer a dimension automatically:
a = np.random.rand(20,20,8,6,4)
s = a.shape[:2]
if a.ndim > 2: s = s + (-1,)
b = a.reshape(s)
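Put together, the -1 approach might look like the sketch below. The helper name collapse_trailing is hypothetical, and small leading dimensions stand in for the 64 x 64 of the question.

```python
import numpy as np

def collapse_trailing(a):
    """Merge every axis after the first two into a single axis (C order)."""
    if a.ndim <= 2:
        return a
    return a.reshape(a.shape[:2] + (-1,))

for shape in [(4, 4), (4, 4, 2, 5), (4, 4, 40), (4, 4, 10, 20, 4)]:
    a = np.arange(np.prod(shape)).reshape(shape)
    b = collapse_trailing(a)
    # Reshaping back to the original shape restores the array exactly.
    print(shape, "->", b.shape, np.array_equal(b.reshape(shape), a))
```

Because reshape never reorders elements, the round trip back to the original shape needs no extra bookkeeping beyond remembering that shape.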
Going by the requirement of stacking for the given (64, 64, 2, 5) sample, I think you need to permute the axes. For the permuting, we can use np.rollaxis, like so -
def collapse_dims(a):
    if a.ndim>3:
        return np.rollaxis(a,-1,2).reshape(a.shape[0],a.shape[1],-1)
    else:
        return a
Sample run on the given four sample shapes -
1) Sample shapes :
In [234]: shp1 = (64, 64)
...: shp2 = (64, 64, 2, 5)
...: shp3 = (64, 64, 40)
...: shp4 = (64, 64, 10, 20, 4)
...:
Case #1 :
In [235]: a = np.random.randint(11,99,(shp1))
In [236]: np.allclose(a, collapse_dims(a))
Out[236]: True
Case #2 :
In [237]: a = np.random.randint(11,99,(shp2))
In [238]: np.allclose(a[:,:,:,0], collapse_dims(a)[:,:,0:2])
Out[238]: True
In [239]: np.allclose(a[:,:,:,1], collapse_dims(a)[:,:,2:4])
Out[239]: True
In [240]: np.allclose(a[:,:,:,2], collapse_dims(a)[:,:,4:6]) # .. so on
Out[240]: True
Case #3 :
In [241]: a = np.random.randint(11,99,(shp3))
In [242]: np.allclose(a, collapse_dims(a))
Out[242]: True
Case #4 :
In [243]: a = np.random.randint(11,99,(shp4))
In [244]: np.allclose(a[:,:,:,:,0].ravel(), collapse_dims(a)[:,:,:200].ravel())
Out[244]: True
In [245]: np.allclose(a[:,:,:,:,1].ravel(), collapse_dims(a)[:,:,200:400].ravel())
Out[245]: True
I'll try to illustrate the concern that @Divakar brings up.
In [522]: arr = np.arange(2*2*3*4).reshape(2,2,3,4)
In [523]: arr
Out[523]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]],
[[[24, 25, 26, 27],
[28, 29, 30, 31],
[32, 33, 34, 35]],
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
4 is the inner most dimension, so it displays the array as 3x4 blocks. And if you pay attention to spaces and [] you'll see there are 2x2 blocks.
Notice what happens when we use the reshape:
In [524]: arr1 = arr.reshape(2,2,-1)
In [525]: arr1
Out[525]:
array([[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]],
[[24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]]])
Now it is 2 2x12 blocks. You can do anything to those 12 element rows, and reshape them back to 3x4 blocks
In [526]: arr1.reshape(2,2,3,4)
Out[526]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
...
But I could also split this array on the last dimension. np.split can do it, but a list comprehension is easier to understand:
In [527]: alist = [arr[...,i] for i in range(4)]
In [528]: alist
Out[528]:
[array([[[ 0, 4, 8],
[12, 16, 20]],
[[24, 28, 32],
[36, 40, 44]]]),
array([[[ 1, 5, 9],
[13, 17, 21]],
[[25, 29, 33],
[37, 41, 45]]]),
array([[[ 2, 6, 10],
[14, 18, 22]],
[[26, 30, 34],
[38, 42, 46]]]),
array([[[ 3, 7, 11],
[15, 19, 23]],
[[27, 31, 35],
[39, 43, 47]]])]
This contains 4 (2,2,3) arrays. Note that the 3 element rows display as columns in the 4d display.
I can reform into a 4d array with np.stack (which is like np.array, but gives more control of how the arrays are joined):
In [529]: np.stack(alist, axis=-1)
Out[529]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
...
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
==========
The split equivalent is [x[...,0] for x in np.split(arr, 4, axis=-1)]. Without the indexing split produces (2, 2, 3, 1) arrays.
collapse_dims produces (for my example):
In [532]: np.rollaxis(arr,-1,2).reshape(arr.shape[0],arr.shape[1],-1)
Out[532]:
array([[[ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11],
[12, 16, 20, 13, 17, 21, 14, 18, 22, 15, 19, 23]],
[[24, 28, 32, 25, 29, 33, 26, 30, 34, 27, 31, 35],
[36, 40, 44, 37, 41, 45, 38, 42, 46, 39, 43, 47]]])
A (2,2,12) array, but with the elements in rows in a different order. It does a transpose on the inner 2 dimensions before flattening.
In [535]: arr[0,0,:,:].ravel()
Out[535]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [536]: arr[0,0,:,:].T.ravel()
Out[536]: array([ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11])
Restoring that back to the original order requires another roll or transpose (arr2 here is the collapse_dims result shown above):
In [542]: arr2.reshape(2,2,4,3).transpose(0,1,3,2)
Out[542]:
array([[[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
....
[[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]]])
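As a sketch of how the permuting variant and its inverse could be paired up for arrays of any rank, the two helpers below use np.moveaxis in place of np.rollaxis. Both function names are hypothetical, and the restore step needs the original shape to be remembered by the caller.

```python
import numpy as np

def collapse_last_axis_first(a):
    # Hypothetical helper: move the last axis to position 2, then flatten,
    # so consecutive blocks of the result hold a[..., 0], a[..., 1], ...
    if a.ndim <= 3:
        return a
    return np.moveaxis(a, -1, 2).reshape(a.shape[0], a.shape[1], -1)

def restore(b, shape):
    # Hypothetical inverse: rebuild the moved-axis shape, then move axis 2 back.
    if len(shape) <= 3:
        return b.reshape(shape)
    moved_shape = shape[:2] + (shape[-1],) + shape[2:-1]
    return np.moveaxis(b.reshape(moved_shape), 2, -1)

arr = np.arange(2 * 2 * 3 * 4).reshape(2, 2, 3, 4)
arr2 = collapse_last_axis_first(arr)
print(arr2[0, 0])   # [ 0  4  8  1  5  9  2  6 10  3  7 11]
print(np.array_equal(restore(arr2, arr.shape), arr))  # True
```

A plain reshape(2, 2, -1) would instead keep the elements in their original memory order, so which variant to use depends on whether the collapsed axis should be grouped by the last index or not.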
