Select all rows from Numpy array where each column satisfies some condition

I have an array x of the form,
x = [[1,2,3,...,7,8,9],
[1,2,3,...,7,9,8],
...,
[9,8,7,...,3,1,2],
[9,8,7,...,3,2,1]]
I also have an array of non-allowed numbers for each column. I want to select all of the rows which only have allowed numbers in each column. For instance, I might want only the rows which do not have any of [1,2,3] in the first column; I can do this by,
x[~np.in1d(x[:,0], [1,2,3])]
And for any single column, I can do this. But I'm looking to do this for all columns at once, selecting only the rows for which every element is an allowed number for its column. I can't seem to get x.any or x.all to do this well - how should I go about this?
EDIT: To clarify, the non-allowed numbers are different for each column. In actuality, I will have some array y,
y = [[1,4,...,7,8],
[2,5,...,9,4],
[3,6,...,8,6]]
Where I want rows from x for which column 1 cannot be in [1,2,3], column 2 cannot be in [4,5,6], and so on.

You can broadcast the comparison, then all to check:
x[(x != y[:,None,:]).all(axis=(0,-1))]
Break down:
# compare each element of `x` to each element of `y`
# mask.shape == (y.shape[0], x.shape[0], x.shape[1])
mask = (x != y[:,None,:])
# `all(0)` checks, for each element in `x`, that it doesn't match any element in the same column of `y`
# `all(-1)` checks along the rows of `x`
mask = mask.all(axis=(0,-1))
# slice
x[mask]
For example, consider:
x = np.array([[1, 2],
[9, 8],
[5, 6],
[7, 8]])
y = np.array([[1, 4],
[2, 5],
[3, 7]])
Then mask = (x != y[:,None,:]).all(axis=(0,-1)) gives
array([False, True, True, True])
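To make the intermediate shapes concrete, here is a small runnable sketch of the same broadcast on the example arrays. The name `cmp` is only an illustrative label for the intermediate comparison array.

```python
import numpy as np

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])

# y[:, None, :] has shape (3, 1, 2); broadcasting against x (4, 2) gives (3, 4, 2).
cmp = x != y[:, None, :]
print(cmp.shape)        # (3, 4, 2)

# Reduce over the y-row axis (0) and the column axis (-1);
# one boolean per row of x remains.
mask = cmp.all(axis=(0, -1))
print(mask.tolist())    # [False, True, True, True]
print(x[mask].tolist()) # [[9, 8], [5, 6], [7, 8]]
```

The comparison array holds y.shape[0] * x.size booleans, so memory use grows with the number of forbidden values per column.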

np.isin is the recommended replacement for np.in1d (which is deprecated as of NumPy 2.0). It lets you (a) test the entire array at once, and (b) invert the mask more efficiently.
x[np.isin(x, [1, 2, 3], invert=True).all(1)]
np.isin preserves the shape of x, so you can then use .all across the columns. It also has an invert argument, which gives the equivalent of ~np.isin(x, [1, 2, 3]) but more efficiently. Note that this form tests every column against the same list of forbidden values.
Compared with the broadcasting answer, this vectorizes a similar computation much more efficiently (although it is still a linear search) and avoids creating the large temporary arrays.
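When the forbidden values differ per column, as in the question's edit, one possible adaptation is to call np.isin once per column and stack the results. This is only a sketch: the loop over columns and the name `allowed` are assumptions of this example, not part of the answer above.

```python
import numpy as np

x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])   # column j of y lists the values forbidden in column j of x

# allowed[i, j] is True when x[i, j] is not among the forbidden values y[:, j].
allowed = np.column_stack([
    np.isin(x[:, j], y[:, j], invert=True) for j in range(x.shape[1])
])

mask = allowed.all(axis=1)   # keep rows where every column is allowed
result = x[mask]
print(result.tolist())       # [[9, 8], [5, 6], [7, 8]]
```

The Python-level loop runs once per column, not once per row, so it stays cheap for arrays with many rows and few columns.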

Related

Updating elements in multiply indexed np.array

I have a 2D numpy array and need to update a selection of elements via multiple layers of indexing. The obvious way to do this for me does not work since it seems numpy is only updating a copy of the array and not the array itself:
import numpy as np
# Create an array and indices that should be updated
arr = np.arange(9).reshape(3,3)
idx = np.array([[0,2], [1,1],[2,0]])
bool_idx = np.array([True, True, False])
# This line does not work as intended since the original array stays unchanged
arr[idx[:,0],idx[:,1]][bool_idx] = -1 * arr[idx[:,0],idx[:,1]][bool_idx]
This is the resulting output:
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
However, I expected this output:
>>> arr
array([[0, 1, -2],
[3, -4, 5],
[6, 7, 8]])

We need to mask the indices with the given mask and then index into arr and assign new values. For indexing, we can use tuple(masked_indices) to index or use the two columns of the index-array for integer-indexing, thus giving us two methods.
Method #1 :
arr[tuple(idx[bool_idx].T)] *= -1
Method #2 :
idx_masked = idx[bool_idx]
arr[idx_masked[:,0],idx_masked[:,1]] *= -1
Why didn't the original method work?
On the LHS you were doing arr[idx[:,0],idx[:,1]][bool_idx], which is essentially two steps. The first is arr[idx[:,0],idx[:,1]], which under the hood calls arr.__getitem__(indexer)*. When indexer is a slice, the regularity of the elements allows NumPy to return a view (by modifying the strides and offset). When indexer is an arbitrary boolean mask or arbitrary array of integers, there is in general no regularity to the elements selected, so there is no way to return a view and a copy is returned instead. Let's call arr[idx[:,0],idx[:,1]] arr2.
In the next step, the combined arr[idx[:,0],idx[:,1]][bool_idx], i.e. arr2[bool_idx], calls arr2.__setitem__(mask, value) under the hood, which modifies the copy arr2 and as such doesn't propagate back to arr.
*Inspiration from - https://stackoverflow.com/a/38768993/.
More info on __getitem__,__setitem__.
Why did the methods posted in this post work?
Because both directly used the indexer on arr with arr.__setitem__(indexer) that modifies arr.
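The copy-versus-view behaviour described above can be checked directly. The following is a minimal sketch using the arrays from the question; `picked` and `before` are illustrative names.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
idx = np.array([[0, 2], [1, 1], [2, 0]])
bool_idx = np.array([True, True, False])

# Integer-array indexing returns a new array that does not share memory with arr.
picked = arr[idx[:, 0], idx[:, 1]]
print(np.shares_memory(arr, picked))   # False

# Chained indexing: the assignment lands in a temporary copy, so arr is unchanged.
before = arr.copy()
arr[idx[:, 0], idx[:, 1]][bool_idx] = -1
print(np.array_equal(arr, before))     # True

# A single indexing step on arr goes through arr.__setitem__ and modifies arr.
arr[tuple(idx[bool_idx].T)] *= -1
print(arr.tolist())                    # [[0, 1, -2], [3, -4, 5], [6, 7, 8]]
```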

You just need to make a small change to your own attempt -- you need to apply the boolean index array on each of your integer index expressions. In other words, this should work:
arr[idx[:,0][bool_idx],idx[:,1][bool_idx]] *= -1
(I've just moved the [bool_idx] inside the square brackets, to apply it to both of the integer index expressions -- idx[:,0] and idx[:,1])

delete all columns of a dimension except for a specific column

I want to make a function which takes an n-dimensional array, the dimension and the column index, and returns the (n-1)-dimensional array left after removing all the other columns of that specific dimension.
Here is the code I am using now
a = np.arange(6).reshape((2, 3)) # the n-dimensional array
axisApplied = 1
colToKeep = 0
colsToDelete = np.delete(np.arange(a.shape[axisApplied]), colToKeep)
a = np.squeeze(np.delete(a, colsToDelete, axisApplied), axis=axisApplied)
print(a)
# [0, 3]
Note that I have to manually calculate the n-1 indices (the complement of the specific column index) to use np.delete(), and I am wondering whether there is a more convenient way to achieve my goal, e.g. specify which column to keep directly.
Thank you for reading; I welcome any suggestions.

In [1]: arr = np.arange(6).reshape(2,3)
In [2]: arr
Out[2]:
array([[0, 1, 2],
[3, 4, 5]])
Simple indexing:
In [3]: arr[:,0]
Out[3]: array([0, 3])
Or if you need to use the general axis parameter, try take:
In [4]: np.take(arr,0,axis=1)
Out[4]: array([0, 3])
Picking one element, or a list of elements, along an axis is a lot easier than deleting some. Look at the code for np.delete.
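As a sketch of the function the question asks for, np.take with a scalar index already drops the chosen axis. The wrapper name `keep_index` is hypothetical and exists only for this example.

```python
import numpy as np

def keep_index(a, index, axis):
    """Return the slice of `a` at position `index` along `axis`.

    Because `index` is a scalar, the result has one dimension fewer than `a`.
    """
    return np.take(a, index, axis=axis)

a = np.arange(6).reshape(2, 3)
print(keep_index(a, 0, axis=1).tolist())   # [0, 3]

b = np.arange(24).reshape(2, 3, 4)
sub = keep_index(b, 0, axis=1)
print(sub.shape)                           # (2, 4)
print(np.array_equal(sub, b[:, 0, :]))     # True
```

Note that np.take returns a copy, whereas plain indexing such as b[:, 0, :] returns a view.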

Deleting elements at specific positions in a M X N numpy array

I am trying to implement the seam carving algorithm, in which we have to delete a seam from the image. The image is stored as an M X N numpy array. I have found the seam, which is an array of M integers whose values specify the column to delete in each row.
Eg: a 2 X 3 array
import numpy
img_array = numpy.array([[1, 2, 3],[4, 5, 6]])
and
seam = numpy.array([1,2])
This means that we have to delete the 1st element from the 1st row (1) and the 2nd element from the 2nd row (5). After deletion, img_array will be
print(img_array)
[[2,3]
[4,6]]
Work done:
I am new to Python and I have found solutions which deal with a one-dimensional array or with deleting an entire row or column. But I could not find a way to delete elements from specific columns.

Will you always delete one element from each row? If you try to delete one element from one row, but not another, you will end up with a ragged array. That is why there isn't a general purpose way of removing single elements from a 2d array.
One option is to figure out which ones you want to delete, remove them from a flattened array, and then reshape it back to the correct shape. Then it is your responsibility to ensure that the correct number of elements are removed.
All of these 'delete' methods actually copy the 'keep' values to a new array. Nothing actually deletes elements from the original array. So you could just as easily (and just as fast) do your own copy to a new array.
Another option is to work with lists of lists. Those are more tolerant of becoming ragged.
Here's an example of using a boolean mask to remove selected elements from an array (making a copy of course):
In [100]: x=np.arange(1,7).reshape(2,3)
In [101]: x
Out[101]:
array([[1, 2, 3],
[4, 5, 6]])
In [102]: mask=np.ones_like(x,bool)
In [103]: mask
Out[103]:
array([[ True, True, True],
[ True, True, True]], dtype=bool)
In [104]: mask[0,0]=False
In [105]: mask[1,1]=False
In [106]: mask
Out[106]:
array([[False, True, True],
[ True, False, True]], dtype=bool)
In [107]: x[mask]
Out[107]: array([2, 3, 4, 6]) # it's flat
In [108]: x[mask].reshape(2,2)
Out[108]:
array([[2, 3],
[4, 6]])
Notice that even though both x and mask are 2d, the indexing result is flattened. Such a mask could easily have produced an array that couldn't be reshaped back to 2d.
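For the seam case, where exactly one element is removed from every row, the mask can be built in one step by comparing column positions against the seam. This is a sketch under the assumption that the seam holds 1-based column numbers, as in the question; `cols`, `keep` and `carved` are illustrative names.

```python
import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6]])
seam = np.array([1, 2])      # 1-based column to drop in each row (question's convention)
cols = seam - 1              # convert to 0-based column indices

rows, width = img.shape
# keep[i, j] is False exactly where column j is the seam column of row i.
keep = np.arange(width) != cols[:, None]

# One element is dropped per row, so the flat result reshapes cleanly.
carved = img[keep].reshape(rows, width - 1)
print(carved.tolist())       # [[2, 3], [4, 6]]
```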

Each row in your matrix is a single dimensional array.
import numpy
ary=numpy.array([[1,2,3],[4,5,6]])
print(ary[0])
Gives
[1 2 3]
You could iterate over your matrix, using the values from your seam to remove an element from the current row. Append the result to a modified matrix you are building.
seam = numpy.array([1,2])
for i in range(2):
    # seam values are 1-based, numpy.delete expects a 0-based index
    tmp = numpy.delete(ary[i], seam[i]-1)
    if i == 0:
        modified_ary = tmp
    else:
        modified_ary = numpy.vstack((modified_ary, tmp))
print(modified_ary)
Gives
[[2 3]
[4 6]]

How do I remove the first and last rows and columns from a 2D numpy array?

I'd like to know how to remove the first and last rows and columns from a 2D array in numpy. For example, say we have a (N+1) x (N+1) matrix called H then in MATLAB/Octave, the code I'd use would be:
Hsub = H(2:N,2:N);
What's the equivalent code in Numpy? I thought that np.reshape might do what I want but I'm not sure how to get it to remove just the target rows as I think if I reshape to a (N-1) x (N-1) matrix, it'll remove the last two rows and columns.

How about this?
Hsub = H[1:-1, 1:-1]
The 1:-1 range means that we access elements from the second index, or 1, and we go up to the second last index, as indicated by the -1 for a dimension. We do this for both dimensions independently. When you do this independently for both dimensions, the result is the intersection of how you're accessing each dimension, which is essentially chopping off the first row, first column, last row and last column.
Remember, the ending index is exclusive, so if we did 0:3 for example, we only get the first three elements of a dimension, not four.
Also, negative indices mean that we access the array from the end. -1 is the last value to access in a particular dimension, but because of the exclusivity, we are getting up to the second last element, not the last element. Essentially, this is the same as doing:
Hsub = H[1:H.shape[0]-1, 1:H.shape[1]-1]
... but using negative indices is much more elegant. You also don't have to use the number of rows and columns to extract what you need, so the syntax works regardless of the matrix size. However, make sure that the matrix is at least 3 x 3; for anything smaller the slice does not raise an error but silently returns an empty array.
Small bonus
In MATLAB / Octave, you can achieve the same thing without using the dimensions by:
Hsub = H(2:end-1, 2:end-1);
The end keyword with regards to indexing means to get the last element for a particular dimension.
Example use
Here's an example (using IPython):
In [1]: import numpy as np
In [2]: H = np.meshgrid(np.arange(5), np.arange(5))[0]
In [3]: H
Out[3]:
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
In [4]: Hsub = H[1:-1,1:-1]
In [5]: Hsub
Out[5]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
As you can see, the first row, first column, last row and last column have been removed from the source matrix H and the remainder has been placed in the output matrix Hsub.
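A few properties of this slice are easy to check in a short sketch: the negative-index form matches the explicit-shape form, basic slicing returns a view of H, and an input smaller than 3 x 3 yields an empty array. The array contents here are arbitrary test data.

```python
import numpy as np

H = np.arange(25).reshape(5, 5)
Hsub = H[1:-1, 1:-1]
explicit = H[1:H.shape[0] - 1, 1:H.shape[1] - 1]

print(Hsub.shape)                     # (3, 3)
print(np.array_equal(Hsub, explicit)) # True
print(np.shares_memory(H, Hsub))      # True: basic slicing returns a view

# Take a copy if later writes must not affect H.
Hcopy = H[1:-1, 1:-1].copy()
Hcopy[0, 0] = -1
print(H[1, 1])                        # 6, unchanged

small = np.zeros((2, 2))
print(small[1:-1, 1:-1].shape)        # (0, 0)
```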

Indexing NumPy 2D array with another 2D array

I have something like
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried np.ix_ as I read at Simplfy row AND column extraction, numpy, but this did not result in what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.

The numpy way to do this is by using np.choose or fancy indexing/take (see below):
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
select = array([0,1,0,0])
result = np.choose(select, m.T)
So there is no need for Python loops, and you keep all the speed advantages NumPy gives you. m.T is needed only because choose is really a choice between the two arrays, as in np.choose(select, (m[:,0], m[:,1])), but it is straightforward to use it like this.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important, there is np.take, which works on a 1D view (it's quite a bit faster for some reason, but maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))
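The three variants can be run side by side to confirm they agree on the question's data. This is a small sketch; `flat_pos` is an illustrative name for the flat indices passed to take.

```python
import numpy as np

m = np.array([[1, 2],
              [4, 5],
              [7, 8],
              [6, 2]])
select = np.array([0, 1, 0, 0])

# 1) choose between the columns of m, row by row
by_choose = np.choose(select, m.T)

# 2) fancy indexing: pair each row number with its selected column
by_index = m[np.arange(len(select)), select]

# 3) take on the flattened array: flat position = row * n_cols + column
flat_pos = select + np.arange(len(select)) * m.shape[1]
by_take = m.take(flat_pos)

print(by_choose.tolist())   # [1, 5, 7, 6]
print(by_index.tolist())    # [1, 5, 7, 6]
print(by_take.tolist())     # [1, 5, 7, 6]
```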

I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
create the condition array;
generate an index array by calling NP.where, passing in this condition array; and
apply this index array against the source array
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
[False, True],
[ True, False],
[ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array, which in this case, is the result from a single expression comprised of compound conditional expressions (first line above)
If constructing such a value filter doesn't apply to your particular use case, that's fine, you just need to generate the actual boolean matrix (the value of cnd) some other way (or create it directly).
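The same three steps on a plain ndarray (the session above uses a matrix) might look like the sketch below. Using np.isin to build the condition is a substitution made for this example, not what the answer does.

```python
import numpy as np

m = np.array([[1, 2],
              [4, 5],
              [7, 8],
              [6, 2]])

# Step 1: boolean condition array, True where the value is one of the targets.
cnd = np.isin(m, [1, 5, 7, 6])

# Step 2: np.where on a 2-D boolean array returns (row_indices, col_indices).
rows, cols = np.where(cnd)
print(rows.tolist(), cols.tolist())   # [0, 1, 2, 3] [0, 1, 0, 0]

# Step 3: apply the index arrays to the source array.
print(m[rows, cols].tolist())         # [1, 5, 7, 6]
```

For an ndarray, m[cnd] gives the same values directly, in row-major order.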

What about using python?
result = array([subarray[index] for subarray, index in zip(m, select)])

IMHO, this is simplest variant:
m[np.arange(4), select]

Since the title is referring to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices of shape (n,m) with arbitrary large dimension m, named inds, is used to access elements of another 2D array of shape (n,k), named B:
# offset of the first element of each row in the flattened B
offset = np.arange(0, B.size, B.shape[1])
# with no axis given, numpy.take(B, C) indexes into the flattened B; the result has the shape of C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
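A short sketch of the broadcasting form, for both reading and assignment, follows. The sample arrays are made up for illustration, and np.take_along_axis is shown as an alternative for the read case that this answer does not mention.

```python
import numpy as np

B = np.arange(12).reshape(3, 4)       # shape (n, k)
inds = np.array([[0, 3],
                 [1, 1],
                 [2, 0]])             # shape (n, m): column indices for each row

# Column vector of row numbers, shape (n, 1); broadcasts against inds.
row_ids = np.expand_dims(np.arange(B.shape[0]), -1)

picked = B[row_ids, inds]
print(picked.tolist())                # [[0, 3], [5, 5], [10, 8]]

# Same read using np.take_along_axis.
same = np.take_along_axis(B, inds, axis=1)
print(np.array_equal(picked, same))   # True

# The indexing form also works on the left-hand side of an assignment.
C = B.copy()
C[row_ids, inds] = -1
print(C.tolist())   # [[-1, 1, 2, -1], [4, -1, 6, 7], [-1, 9, -1, 11]]
```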

result = array([m[j][0] if i==0 else m[j][1] for i,j in zip(select, range(0, len(m)))])
