Selecting specific rows and columns from NumPy array - python

I've been going crazy trying to figure out what stupid thing I'm doing wrong here.
I'm using NumPy, and I have specific row indices and specific column indices that I want to select from. Here's the gist of my problem:
import numpy as np
a = np.arange(20).reshape((5,4))
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11],
# [12, 13, 14, 15],
# [16, 17, 18, 19]])
# If I select certain rows, it works
print(a[[0, 1, 3], :])
# array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [12, 13, 14, 15]])
# If I select certain rows and a single column, it works
print(a[[0, 1, 3], 2])
# array([ 2, 6, 14])
# But if I select certain rows AND certain columns, it fails
print(a[[0,1,3], [0,2]])
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# ValueError: shape mismatch: objects cannot be broadcast to a single shape
Why is this happening? Surely I should be able to select the 1st, 2nd, and 4th rows, and 1st and 3rd columns? The result I'm expecting is:
a[[0,1,3], [0,2]] => [[0, 2],
[4, 6],
[12, 14]]

As Toan suggests, a simple hack would be to just select the rows first, and then select the columns over that.
>>> a[[0,1,3], :] # Returns the rows you want
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[12, 13, 14, 15]])
>>> a[[0,1,3], :][:, [0,2]] # Selects the columns you want as well
array([[ 0, 2],
[ 4, 6],
[12, 14]])
[Edit] The built-in method: np.ix_
I recently discovered that numpy gives you a built-in one-liner for doing exactly what @Jaime suggested, but without having to use broadcasting syntax (which is harder to read). From the docs:
Using ix_ one can quickly construct index arrays that will index the
cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
So you use it like this:
>>> a = np.arange(20).reshape((5,4))
>>> a[np.ix_([0,1,3], [0,2])]
array([[ 0, 2],
[ 4, 6],
[12, 14]])
And the way it works is that it takes care of aligning arrays the way Jaime suggested, so that broadcasting happens properly:
>>> np.ix_([0,1,3], [0,2])
(array([[0],
[1],
[3]]), array([[0, 2]]))
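Also, as MikeC says in a comment, np.ix_ has the advantage that it can be used for assignment, which my first (pre-edit) answer could not. Strictly speaking, the result of a[np.ix_(...)] is still a copy rather than a view, but because the assignment is a single indexing operation on a, it writes into a itself: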
>>> a[np.ix_([0,1,3], [0,2])] = -1
>>> a
array([[-1, 1, -1, 3],
[-1, 5, -1, 7],
[ 8, 9, 10, 11],
[-1, 13, -1, 15],
[16, 17, 18, 19]])
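To make the difference concrete, here is a minimal sketch comparing assignment through the chained indexing from the first part of this answer with assignment through np.ix_. The names rows, cols, b, chained_changed and ix_changed are just illustrative:

```python
import numpy as np

rows, cols = [0, 1, 3], [0, 2]

# Chained fancy indexing: a[rows, :] makes a copy, so the assignment
# lands in that temporary copy and `a` itself is left unchanged.
a = np.arange(20).reshape((5, 4))
a[rows, :][:, cols] = -1
chained_changed = bool((a == -1).any())

# Single indexing operation via np.ix_: the assignment modifies `b` in place.
b = np.arange(20).reshape((5, 4))
b[np.ix_(rows, cols)] = -1
ix_changed = bool((b == -1).any())

print(chained_changed, ix_changed)
```

The chained form silently does nothing to the original array, which is why the one-step form is the safer habit when you need to write values back.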

Fancy indexing requires you to provide all indices for each dimension. You are providing 3 indices for the first one, and only 2 for the second one, hence the error. You want to do something like this:
>>> a[[[0, 0], [1, 1], [3, 3]], [[0,2], [0,2], [0, 2]]]
array([[ 0, 2],
[ 4, 6],
[12, 14]])
That is of course a pain to write, so you can let broadcasting help you:
>>> a[[[0], [1], [3]], [0, 2]]
array([[ 0, 2],
[ 4, 6],
[12, 14]])
This is much simpler to do if you index with arrays, not lists:
>>> row_idx = np.array([0, 1, 3])
>>> col_idx = np.array([0, 2])
>>> a[row_idx[:, None], col_idx]
array([[ 0, 2],
[ 4, 6],
[12, 14]])
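If it helps to see why the original attempt fails, the following sketch checks the broadcast shapes directly with np.broadcast. The variable names flat_ok and paired_shape are only for illustration:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
row_idx = np.array([0, 1, 3])
col_idx = np.array([0, 2])

# Shapes (3,) and (2,) cannot be broadcast together, hence the error.
try:
    np.broadcast(row_idx, col_idx)
    flat_ok = True
except ValueError:
    flat_ok = False

# Shapes (3, 1) and (2,) broadcast to (3, 2): one entry per (row, column) pair.
paired_shape = np.broadcast(row_idx[:, None], col_idx).shape

result = a[row_idx[:, None], col_idx]
print(flat_ok, paired_shape)
print(result)
```

The output of the indexing has the broadcast shape of the index arrays, so a (3, 2) result needs index arrays that broadcast to (3, 2).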

USE:
>>> a[[0,1,3]][:,[0,2]]
array([[ 0, 2],
[ 4, 6],
[12, 14]])
OR:
>>> a[[0,1,3],::2]
array([[ 0, 2],
[ 4, 6],
[12, 14]])

Using np.ix_ is the most convenient way to do it (as answered by others), but it can also be done as follows:
>>> rows = [0, 1, 3]
>>> cols = [0, 2]
>>> (a[rows].T)[cols].T
array([[ 0, 2],
[ 4, 6],
[12, 14]])
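As a quick sanity check, this sketch runs the approaches from the answers above side by side and confirms they all produce the same selection. The dictionary keys are just labels I made up:

```python
import numpy as np

a = np.arange(20).reshape((5, 4))
rows, cols = [0, 1, 3], [0, 2]
expected = np.array([[0, 2], [4, 6], [12, 14]])

candidates = {
    "chained": a[rows][:, cols],                    # rows first, then columns
    "ix_": a[np.ix_(rows, cols)],                   # open mesh of indices
    "broadcast": a[np.array(rows)[:, None], cols],  # manual broadcasting
    "transpose": a[rows].T[cols].T,                 # double transpose
}

all_equal = all(np.array_equal(v, expected) for v in candidates.values())
print(all_equal)
```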

Related

splitting an array where it meets the peak values

Hope you are doing well.
I have an extremely big numpy array and want to split it into several ones. My array has three columns and I want to split it where all the columns reach their maximum values:
array = [[0, 0, 0],
[0, 0, 5],
[10, 5, 10],
[1, 1, 1],
[5, 5, 15],
[10, 8, 20],
[2, 0, 0],
[10, 10, 12],
[1, 2, 0],
[2, 5, 9]]
Now, I want to split it into four arrays:
sub_array_1=[[0, 0, 0],
[0, 0, 5],
[10, 5, 10]]
sub_array_2=[[1, 1, 1],
[5, 5, 15],
[10, 8, 20]]
sub_array_3=[[2, 0, 0],
[10, 10, 12]]
sub_array_4=[[1, 2, 0],
[2, 5, 9]]
I tried to do it in a for loop with if statements: give me a sub-array whenever each element of a row is bigger than the corresponding elements in both the row above and the row below. I would also need to handle the last row:
import numpy as np
sub_array_1=np.array([])
for i in array:
if array[i,:]>array[i+1,:] and array[i,:]>array[i+1,:]:
vert_1=np.append(sub_array_1,array[0:i,:])
My code doesn't work, but it simply shows my idea.
I am quite new to Python and could not find a way to write my idea as code. So, I appreciate any help and contribution.
Cheers,
Ali
IIUC, one way using numpy.diff with numpy.array_split:
indices = np.argwhere(np.all(np.diff(array, axis=0) < 0, axis=1))
np.array_split(array, indices.ravel()+1, axis=0)
Output:
[array([[ 0, 0, 0],
[ 0, 0, 5],
[10, 5, 10]]),
array([[ 1, 1, 1],
[ 5, 5, 15],
[10, 8, 20]]),
array([[ 2, 0, 0],
[10, 10, 12]]),
array([[1, 2, 0],
[2, 5, 9]])]
np.diff and np.all find the rows where every element has a negative difference with the next row (i.e. where the peak ends).
np.array_split will then split the given array based on the locations of the peak found.
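For reuse, the two steps can be wrapped into a small helper. This is only a sketch of the same idea; the function name split_after_peaks is made up, and it assumes a "peak" means a row after which every column decreases:

```python
import numpy as np

def split_after_peaks(arr):
    """Split a 2D array after each row where all columns drop in the next row."""
    arr = np.asarray(arr)
    # True at position i when row i+1 is smaller than row i in every column.
    drops = np.all(np.diff(arr, axis=0) < 0, axis=1)
    # Cut just after each such row.
    cut_points = np.flatnonzero(drops) + 1
    return np.split(arr, cut_points, axis=0)

data = [[0, 0, 0], [0, 0, 5], [10, 5, 10],
        [1, 1, 1], [5, 5, 15], [10, 8, 20],
        [2, 0, 0], [10, 10, 12],
        [1, 2, 0], [2, 5, 9]]

parts = split_after_peaks(data)
for part in parts:
    print(part)
```

Note that the last block is returned without any special handling, because np.split keeps whatever follows the final cut point.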

Union of 2d array in Python by row according to first column

I'm trying to find a union of two 2d arrays based on the first column:
>>> x1
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
>>> x2
array([[ 7, -1, -1],
[10, 11, 12]])
If two rows have a matching first value, I want the one from x2. I.e. the union of the first column of x1[:, 0] and x2[:, 0] is [1, 4, 7, 10] and I want the row [7, -1, -1] from x2, not [7, 8, 9] from x1. The expected result in this case is:
>>> res
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, -1, -1],
[10, 11, 12]])
I see there is a possible solution for union of a 2D array here, where I get the result:
>>> res
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, -1, -1],
[ 7, 8, 9],
[10, 11, 12]])
In this result, I wanted the row [7, 8, 9] from x1 to be excluded. How could I do that?
You can use np.unique and np.concatenate, placing x2 first. Unique can compute the index of your values, based on the first occurrence:
values = np.concatenate((x2[:, 0], x1[:, 0]))
_, index = np.unique(values, return_index=True)
mask = index >= x2.shape[0]
result = np.concatenate((x1[index[mask] - x2.shape[0], :], x2[index[~mask], :]), axis=0)
The result is exactly the array you would expect:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, -1, -1],
[10, 11, 12]])
Keep in mind that the result of unique is sorted, which coincidentally happens to correspond to your original order. You can get the original order with a clever application of return_inverse=True, which will be left as an exercise for the reader.
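A different way to get the same result, sketched here as an alternative rather than the return_inverse approach mentioned above, is to drop the rows of x1 whose key also appears in x2 and then sort by the key column. It assumes the keys within each array are unique:

```python
import numpy as np

x1 = np.array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])
x2 = np.array([[7, -1, -1],
               [10, 11, 12]])

# Keep only the rows of x1 whose first-column value is absent from x2.
keep = ~np.isin(x1[:, 0], x2[:, 0])
merged = np.concatenate((x1[keep], x2), axis=0)

# Sort rows by the key column (stable, so ties keep their order).
res = merged[np.argsort(merged[:, 0], kind="stable")]
print(res)
```

If you would rather keep the rows in their concatenated order, skip the final argsort step.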

How to create a 2D array of ranges using numpy

I have an array of start and stop indices, like this:
[[0, 3], [4, 7], [15, 18]]
and I would like to construct a 2D numpy array where each row is a range from the corresponding pair of start and stop indices, as follows:
[[0, 1, 2],
[4, 5, 6],
[15, 16, 17]]
Currently, I am creating an empty array and filling it in a for loop:
ranges = numpy.empty((3, 3))
a = [[0, 3], [4, 7], [15, 18]]
for i, r in enumerate(a):
ranges[i] = numpy.arange(r[0], r[1])
Is there a more compact and (more importantly) faster way of doing this? Possibly something that doesn't involve using a loop?
One way is to use broadcasting to add the left-hand edges to the base arange:
In [11]: np.arange(3) + np.array([0, 4, 15])[:, None]
Out[11]:
array([[ 0, 1, 2],
[ 4, 5, 6],
[15, 16, 17]])
Note: this requires all ranges to be the same length.
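Starting from the start/stop pairs in the question, the same broadcasting trick can be sketched as follows. The names bounds, starts, stops and lengths are mine, and the explicit length check is an added safeguard:

```python
import numpy as np

bounds = np.array([[0, 3], [4, 7], [15, 18]])
starts, stops = bounds[:, 0], bounds[:, 1]
lengths = stops - starts

# The broadcasting approach only works when every range has the same length.
if not np.all(lengths == lengths[0]):
    raise ValueError("all ranges must have the same length")

# Column of starts (3, 1) plus row of offsets (3,) broadcasts to (3, 3).
ranges = starts[:, None] + np.arange(lengths[0])
print(ranges)
```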
If the ranges were to result in different lengths, for a vectorized approach you could use n_ranges from the linked solution:
a = np.array([[0, 3], [4, 7], [15, 18]])
n_ranges(a[:,0], a[:,1], return_flat=False)
# [array([0, 1, 2]), array([4, 5, 6]), array([15, 16, 17])]
Which would also work with the following array:
a = np.array([[0, 3], [4, 9], [15, 18]])
n_ranges(*a.T, return_flat=False)
# [array([0, 1, 2]), array([4, 5, 6, 7, 8]), array([15, 16, 17])]
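Since n_ranges comes from the linked solution and is not part of NumPy, here is a self-contained sketch of one way to build variable-length ranges without a Python loop. The helper name ranges_between is hypothetical and this is not the linked implementation; it assumes every stop is greater than or equal to its start:

```python
import numpy as np

def ranges_between(starts, stops):
    """Return a list of arrays, one np.arange(start, stop) per pair."""
    starts = np.asarray(starts)
    stops = np.asarray(stops)
    lengths = stops - starts
    ends = np.cumsum(lengths)  # end position of each range in the flat output
    # Offset of every flat element within its own range: 0, 1, ..., length-1.
    offsets = np.arange(ends[-1]) - np.repeat(ends - lengths, lengths)
    flat = np.repeat(starts, lengths) + offsets
    return np.split(flat, ends[:-1])

a = np.array([[0, 3], [4, 9], [15, 18]])
pieces = ranges_between(a[:, 0], a[:, 1])
print(pieces)
```

If you only need the concatenated values, return flat directly and skip the final np.split.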

Apply function n items at a time along axis

I am looking for a way to apply a function n items at the time along an axis. E.g.
array([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8]])
If I apply sum across the rows 2 items at a time I get:
array([[ 4, 6],
[ 12, 14]])
Which is the sum of 1st 2 rows and the last 2 rows.
NB: I am dealing with a much larger array and I have to apply the function to n items, where n can be decided at runtime.
The data extends along different axes. E.g.
array([[... [ 1, 2, ...],
[ 3, 4, ...],
[ 5, 6, ...],
[ 7, 8, ...],
...], ...])
This is a reduction:
numpy.add.reduceat(a, [0,2])
>>> array([[ 4, 6],
[12, 14]], dtype=int32)
As long as by "larger" you mean longer in the "y" axis, you can extend:
a = numpy.array([[ 1, 2],
[ 3, 4],
[ 5, 6],
[ 7, 8],
[ 9, 10],
[11, 12]])
numpy.add.reduceat(a, [0,2,4])
>>> array([[ 4, 6],
[12, 14],
[20, 22]], dtype=int32)
EDIT: actually, this works fine for "larger in both dimensions", too:
a = numpy.arange(24).reshape(6,4)
numpy.add.reduceat(a, [0,2,4])
>>> array([[ 4, 6, 8, 10],
[20, 22, 24, 26],
[36, 38, 40, 42]], dtype=int32)
I will leave it up to you to adapt the indices to your specific case.
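One possible way to adapt the indices, sketched under the assumption that groups start at every n-th position along the chosen axis (the helper name sum_every_n is made up):

```python
import numpy as np

def sum_every_n(a, n, axis=0):
    """Sum consecutive groups of n items along `axis` using add.reduceat."""
    a = np.asarray(a)
    # Group start positions: 0, n, 2n, ...
    starts = np.arange(0, a.shape[axis], n)
    return np.add.reduceat(a, starts, axis=axis)

a = np.arange(24).reshape(6, 4)
by_rows = sum_every_n(a, 2)           # sum every 2 rows
by_cols = sum_every_n(a, 2, axis=1)   # sum every 2 columns
print(by_rows)
print(by_cols.shape)
```

When the axis length is not a multiple of n, the last group simply contains the leftover items.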
Reshape the array, splitting the first axis into two axes such that the second of them has length n. This gives a 3D array; then sum along that split axis, like so -
a.reshape(a.shape[0]//n,n,a.shape[1]).sum(1)
It should be pretty efficient, as reshaping just creates a view into the input array.
Sample run -
In [55]: a
Out[55]:
array([[2, 8, 0, 0],
[1, 5, 3, 3],
[6, 1, 4, 7],
[0, 4, 0, 7],
[8, 0, 8, 1],
[8, 3, 3, 8]])
In [56]: n = 2 # Sum every two rows
In [57]: a.reshape(a.shape[0]//n,n,a.shape[1]).sum(1)
Out[57]:
array([[ 3, 13, 3, 3],
[ 6, 5, 4, 14],
[16, 3, 11, 9]])
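The reshape idea can be generalized to any axis and any reducing function. This is a rough sketch; reduce_blocks and its parameters are names I made up, and it assumes the axis length is an exact multiple of n:

```python
import numpy as np

def reduce_blocks(a, n, axis=0, func=np.sum):
    """Apply `func` to consecutive blocks of n items along `axis`."""
    a = np.asarray(a)
    axis = axis % a.ndim  # allow negative axes
    if a.shape[axis] % n:
        raise ValueError("axis length must be a multiple of n")
    # Split `axis` into (number of blocks, n), then reduce over the n part.
    new_shape = a.shape[:axis] + (a.shape[axis] // n, n) + a.shape[axis + 1:]
    return func(a.reshape(new_shape), axis=axis + 1)

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
sums = reduce_blocks(a, 2)                # sum every 2 rows
maxes = reduce_blocks(a, 2, func=np.max)  # max of every 2 rows
pairs = reduce_blocks(a, 2, axis=1)       # sum every 2 columns
print(sums, maxes, pairs, sep="\n")
```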
How about something like this?
n = 2
# calculate the cumsum along axis 0 and take one row from every n rows
cumarr = arr.cumsum(axis = 0)[(n-1)::n]
# calculate the difference of the resulting numpy array along axis 0
np.vstack((cumarr[0][None, :], np.diff(cumarr, axis=0)))
# array([[ 4, 6],
# [12, 14]])

Replace subarrays in numpy

Given an array,
>>> n = 2
>>> a = numpy.array([[[1,1,1],[1,2,3],[1,3,4]]]*n)
>>> a
array([[[1, 1, 1],
[1, 2, 3],
[1, 3, 4]],
[[1, 1, 1],
[1, 2, 3],
[1, 3, 4]]])
I know that it's possible to replace values in it succinctly like so,
>>> a[a==2] = 0
>>> a
array([[[1, 1, 1],
[1, 0, 3],
[1, 3, 4]],
[[1, 1, 1],
[1, 0, 3],
[1, 3, 4]]])
Is it possible to do the same for an entire row (last axis) in the array? I know that a[a==[1,2,3]] = 11 will work and replace all the elements of the matching subarrays with 11, but I'd like to substitute a different subarray. My intuition tells me to write the following, but an error results,
>>> a[a==[1,2,3]] = [11,22,33]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: array is not broadcastable to correct shape
In summary, what I'd like to get is:
array([[[1, 1, 1],
[11, 22, 33],
[1, 3, 4]],
[[1, 1, 1],
[11, 22, 33],
[1, 3, 4]]])
... and n of course is, in general, a lot larger than 2, and the other axes are also larger than 3, so I don't want to loop over them if I don't need to.
Update: The [1,2,3] (or whatever else I'm looking for) is not always at index 1. An example:
a = numpy.array([[[1,1,1],[1,2,3],[1,3,4]], [[1,2,3],[1,1,1],[1,3,4]]])
You can achieve this with much better performance by using np.all to check whether every element along the last axis matches your comparison, then using the resulting mask to replace the values:
mask = np.all(a==[1,2,3], axis=2)
a[mask] = [11, 22, 33]
print(a)
#array([[[ 1, 1, 1],
# [11, 22, 33],
# [ 1, 3, 4]],
#
# [[ 1, 1, 1],
# [11, 22, 33],
# [ 1, 3, 4]]])
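The same mask also handles the case from the question's update, where [1,2,3] is not always at index 1. A minimal sketch using that example array:

```python
import numpy as np

a = np.array([[[1, 1, 1], [1, 2, 3], [1, 3, 4]],
              [[1, 2, 3], [1, 1, 1], [1, 3, 4]]])

# mask has shape (2, 3): True wherever the whole last-axis row equals [1, 2, 3].
mask = np.all(a == [1, 2, 3], axis=-1)

# Each selected row receives the replacement via broadcasting.
a[mask] = [11, 22, 33]
print(mask)
print(a)
```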
You have to do something a little more complicated to achieve what you want.
You can't select slices of arrays as such, but you can select all the specific indexes you want.
So first you need to construct an array that represents the rows you wish to select, i.e.
data = numpy.array([[1,2,3],[55,56,57],[1,2,3]])
to_select = numpy.array([1,2,3]*3).reshape(3,3) # three rows of [1,2,3]
selected_indices = data == to_select
# array([[ True, True, True],
# [False, False, False],
# [ True, True, True]], dtype=bool)
data = numpy.where(selected_indices, [4,5,6], data)
# array([[4, 5, 6],
# [55, 56, 57],
# [4, 5, 6]])
# done in one step, but perhaps not very clear as to its intent
data = numpy.where(data == numpy.array([1,2,3]*3).reshape(3,3), [4,5,6], data)
numpy.where works by selecting from the second argument if true and the third argument if false.
You can use where to select from 3 different types of data. The first is an array that has the same shape as selected_indices, the second is just a value on its own (like 2 or 7). The third is the most complicated: an array of any shape that can be broadcast to the same shape as selected_indices. In this case we provided [4,5,6], which is stacked to get an array with shape 3x3.
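One caveat worth checking with this approach: the comparison is elementwise, so a row that only partly matches gets partly replaced. The sketch below shows that with a made-up row [1, 9, 9], and how reducing the comparison to a per-row mask first avoids it:

```python
import numpy as np

data = np.array([[1, 2, 3], [1, 9, 9], [55, 56, 57]])

# Elementwise: the leading 1 in [1, 9, 9] matches and is replaced by 4.
elementwise = np.where(data == [1, 2, 3], [4, 5, 6], data)

# Row-wise: only rows that match in every position are replaced.
row_mask = np.all(data == [1, 2, 3], axis=1)
whole_rows = np.where(row_mask[:, None], [4, 5, 6], data)

print(elementwise)
print(whole_rows)
```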
Not sure if this is what you want; your code example does not create the array you say it does. But:
>>> a = np.array([[[1,1,1],[1,2,3],[1,3,4]], [[1,1,1],[1,2,3],[1,3,4]]])
>>> a
array([[[1, 1, 1],
[1, 2, 3],
[1, 3, 4]],
[[1, 1, 1],
[1, 2, 3],
[1, 3, 4]]])
>>> a[:,1,:] = [[8, 8, 8], [8,8,8]]
>>> a
array([[[1, 1, 1],
[8, 8, 8],
[1, 3, 4]],
[[1, 1, 1],
[8, 8, 8],
[1, 3, 4]]])
>>> a[:,1,:] = [88, 88, 88]
>>> a
array([[[ 1, 1, 1],
[88, 88, 88],
[ 1, 3, 4]],
[[ 1, 1, 1],
[88, 88, 88],
[ 1, 3, 4]]])
