Deleting elements at specific positions in an M x N numpy array - python

I am trying to implement the seam carving algorithm, in which we have to delete a seam from the image. The image is stored as an M x N numpy array. I have found the seam, which is simply an array of M integers specifying, for each row, the position of the element to be deleted.
E.g., for a 2 x 3 array
import numpy
img_array = numpy.array([[1, 2, 3],[4, 5, 6]])
and
seam = numpy.array([1,2])
This means that we have to delete the 1st element of the 1st row (value 1) and the 2nd element of the 2nd row (value 5). After deletion, the image will be:
print(img_array)
[[2 3]
 [4 6]]
Work done:
I am new to Python. I have found solutions that deal with one-dimensional arrays or with deleting an entire row or column, but I could not find a way to delete one element per row at a specified position.

Will you always delete one element from each row? If you try to delete one element from one row, but not another, you will end up with a ragged array. That is why there isn't a general purpose way of removing single elements from a 2d array.
One option is to figure out which ones you want to delete, remove them from a flattened array, and then reshape it back to the correct shape. Then it is your responsibility to ensure that the correct number of elements are removed.
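For instance, a minimal sketch of that flatten-delete-reshape idea (assuming, for illustration, 0-based column indices and exactly one deletion per row; the question's seam appears to be 1-based):

import numpy as np

img = np.array([[1, 2, 3], [4, 5, 6]])
seam = np.array([0, 1])  # 0-based column to drop in each row

# The flat position of each doomed element is row * ncols + col
flat = np.arange(img.shape[0]) * img.shape[1] + seam
out = np.delete(img.ravel(), flat).reshape(img.shape[0], img.shape[1] - 1)
print(out)  # [[2 3]
            #  [4 6]]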
All of these 'delete' methods actually copy the 'keep' values to a new array. Nothing actually deletes elements from the original array. So you could just as easily (and just as fast) do your own copy to a new array.
Another option is to work with lists of lists. Those are more tolerant of becoming ragged.
Here's an example of using a boolean mask to remove selected elements from an array (making a copy of course):
In [100]: x=np.arange(1,7).reshape(2,3)
In [101]: x
Out[101]:
array([[1, 2, 3],
       [4, 5, 6]])
In [102]: mask=np.ones_like(x,bool)
In [103]: mask
Out[103]:
array([[ True,  True,  True],
       [ True,  True,  True]], dtype=bool)
In [104]: mask[0,0]=False
In [105]: mask[1,1]=False
In [106]: mask
Out[106]:
array([[False,  True,  True],
       [ True, False,  True]], dtype=bool)
In [107]: x[mask]
Out[107]: array([2, 3, 4, 6]) # it's flat
In [108]: x[mask].reshape(2,2)
Out[108]:
array([[2, 3],
       [4, 6]])
Notice that even though both x and mask are 2d, the indexing result is flattened. Such a mask could easily have produced an array that couldn't be reshaped back to 2d.
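Applied to the seam-carving case, a minimal sketch wrapping the mask approach in a function (assuming 0-based seam indices; the remove_seam name is mine):

import numpy as np

def remove_seam(img, seam):
    # seam[i] is the 0-based column to remove in row i
    rows, cols = img.shape
    mask = np.ones((rows, cols), dtype=bool)
    mask[np.arange(rows), seam] = False       # mark one element per row as dropped
    return img[mask].reshape(rows, cols - 1)  # flat result, reshaped back to 2d

img = np.array([[1, 2, 3], [4, 5, 6]])
print(remove_seam(img, np.array([0, 1])))    # [[2 3]
                                             #  [4 6]]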

Each row in your matrix is a single dimensional array.
import numpy
ary=numpy.array([[1,2,3],[4,5,6]])
print(ary[0])
Gives
array([1, 2, 3])
You could iterate over your matrix, using the values from your seam to remove an element from the current row, and append the result to a modified matrix you are building.
seam = numpy.array([1, 2])
for i in range(2):
    tmp = numpy.delete(ary[i], seam[i] - 1)  # seam values are 1-based
    if i == 0:
        modified_ary = tmp
    else:
        modified_ary = numpy.vstack((modified_ary, tmp))
print(modified_ary)
Gives
[[2 3]
[4 6]]

Related

Finding Element in Array: Indexing Syntax Explanation

I have an array in Python that looks like this:
array = [[UUID('0d9ba9c6-632b-4dd4-912c-e8ff0a7134f7'), array([['1', '1']], dtype='<U21')], [UUID('9cb1feb6-0ef4-4e15-9070-7735545d12c9'), array([['2', '1']], dtype='<U21')], [UUID('955d308b-3570-4166-895e-81a077e6b9f9'), array([['3', '1']], dtype='<U21')]]
I also have a query that looks like this:
query = UUID('0d9ba9c6-632b-4dd4-912c-e8ff0a7134f7')
I am trying to find the sub-array in the main array that contains this UUID. So this query would return:
[UUID('0d9ba9c6-632b-4dd4-912c-e8ff0a7134f7'), array([['1', '1']], dtype='<U21')]
I found this syntax online to do this:
out = array[array[:, 0] == query]
I know that this only works if array is itself a NumPy array. But why and how does this work? I am extremely confused by the syntax.
You might want to read the numpy tutorial on indexing on ndarrays. But here are some basic explanations; understanding the examples below is a good starting point.
So you have an array:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
The most basic indexing is with slices, as for basic lists, but you can access
nested dimensions with tuples (arr[i1:i2, j1:j2]) instead of chained indexing
as with basic lists (arr[i1:i2][j1:j2]):
arr[0] # array([1, 2, 3])
arr[:, 0] # array([1, 4])
arr[0, 0] # 1
Another way of indexing with numpy is to use boolean arrays.
You can use a tuple of lists of booleans, one list per dimension, each list
having the same size as the dimension.
And you can notice that you can use booleans on one dimension ([False, True])
and slices on the other dimension (:):
arr[[False, True], :] # array([[4, 5, 6]])
arr[[False, True]] # array([[4, 5, 6]])
arr[[False, True], [True, False, True]] # array([4, 6])
(In the last case both boolean lists are converted to integer indices and broadcast together, so the result is 1-D.)
You can also use a single big boolean numpy array that has the same shape as
the array you are indexing:
arr[np.array([[False, True, False], [False, False, True]])] # array([2, 6])
Also, the usual elementwise operations (+, /, %, ...) are overloaded by
numpy so they work on whole arrays at once:
def is_even(x):
    return x % 2 == 0
is_even(2) # True
is_even(arr) # array([[False, True, False], [True, False, True]])
Here we just constructed a big boolean array, so it can be used on your original array:
arr[is_even(arr)] # array([2, 4, 6])
But in your case you were only indexing on the first dimension, so using the boolean-list indexing method on that dimension:
arr[is_even(arr[:, 0]), :] # array([[4, 5, 6]])
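Coming back to the original question, a sketch of how the same boolean-mask indexing applies to the UUID lookup (assuming the data is stored in an object-dtype array; the names here are illustrative):

import numpy as np
from uuid import uuid4

# Build a (3, 2) object array of [UUID, payload] rows
data = np.empty((3, 2), dtype=object)
for i in range(3):
    data[i, 0] = uuid4()
    data[i, 1] = np.array([[str(i + 1), '1']])

query = data[0, 0]

# data[:, 0] == query compares the UUID column elementwise and yields
# a boolean mask over the rows; indexing with it keeps matching rows
out = data[data[:, 0] == query]
print(out)  # the single row whose first element equals query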

Select all rows from Numpy array where each column satisfies some condition

I have an array x of the form,
x = [[1, 2, 3, ..., 7, 8, 9],
     [1, 2, 3, ..., 7, 9, 8],
     ...,
     [9, 8, 7, ..., 3, 1, 2],
     [9, 8, 7, ..., 3, 2, 1]]
I also have an array of non-allowed values for each column. I want to select all of the rows which have only allowed values in every column. For instance, I might want only the rows which do not have any of [1,2,3] in the first column; I can do this by,
x[~np.in1d(x[:,0], [1,2,3])]
And for any single column, I can do this. But I'm looking to essentially do this for all columns at once, selecting only the rows for which every element is an allowed value for its column. I can't seem to get x.any or x.all to do this well - how should I go about this?
EDIT: To clarify, the non-allowed numbers are different for each column. In actuality, I will have some array y,
y = [[1, 4, ..., 7, 8],
     [2, 5, ..., 9, 4],
     [3, 6, ..., 8, 6]]
Where I want rows from x for which column 1 cannot be in [1,2,3], column 2 cannot be in [4,5,6], and so on.
You can broadcast the comparison, then all to check:
x[(x != y[:,None,:]).all(axis=(0,-1))]
Break down:
# compare each element of `x` to each element of `y`
# mask.shape == (y.shape[0], x.shape[0], x.shape[1])
mask = (x != y[:,None,:])
# `all(0)` checks, for each element of `x`, that it doesn't match any element in the same column of `y`
# `all(-1)` checks along the rows of `x`
mask = mask.all(axis=(0, -1))
# slice
x[mask]
For example, consider:
x = np.array([[1, 2],
              [9, 8],
              [5, 6],
              [7, 8]])
y = np.array([[1, 4],
              [2, 5],
              [3, 7]])
Then mask = (x != y[:,None,:]).all(axis=(0,-1)) gives
array([False, True, True, True])
It's recommended to use np.isin rather than np.in1d these days. This lets you (a) compare the entire array all at once, and (b) invert the mask more efficiently.
x[np.isin(x, [1, 2, 3], invert=True).all(1)]
np.isin preserves the shape of x, so you can then use .all across the columns. It also has an invert argument which allows you to do the equivalent of ~isin(x, [1, 2, 3]), but more efficiently.
This solution vectorizes a computation similar to the other answer's, but much more efficiently (though still a linear search), and it avoids creating the temporary arrays as well.
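If the disallowed values differ per column, as in the edit, a hedged sketch applying np.isin column by column (still a copy-based linear search):

import numpy as np

x = np.array([[1, 2], [9, 8], [5, 6], [7, 8]])
y = np.array([[1, 4], [2, 5], [3, 7]])  # y[:, j] lists the values banned in column j

# For each column j, flag entries of x[:, j] that appear in y[:, j];
# keep only the rows where no column hits a banned value
bad = np.stack([np.isin(x[:, j], y[:, j]) for j in range(x.shape[1])], axis=1)
print(x[~bad.any(axis=1)])  # [[9 8]
                            #  [5 6]
                            #  [7 8]]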

Updating elements in multiply indexed np.array

I have a 2D numpy array and need to update a selection of elements via multiple layers of indexing. The obvious way to do this for me does not work since it seems numpy is only updating a copy of the array and not the array itself:
import numpy as np
# Create an array and indices that should be updated
arr = np.arange(9).reshape(3,3)
idx = np.array([[0,2], [1,1],[2,0]])
bool_idx = np.array([True, True, False])
# This line does not work as intended since the original array stays unchanged
arr[idx[:,0],idx[:,1]][bool_idx] = -1 * arr[idx[:,0],idx[:,1]][bool_idx]
This is the resulting output:
>>> arr
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
However, I expected this output:
>>> arr
array([[ 0,  1, -2],
       [ 3, -4,  5],
       [ 6,  7,  8]])
We need to mask the indices with the given mask and then index into arr and assign new values. For indexing, we can use tuple(masked_indices) to index or use the two columns of the index-array for integer-indexing, thus giving us two methods.
Method #1 :
arr[tuple(idx[bool_idx].T)] *= -1
Method #2 :
idx_masked = idx[bool_idx]
arr[idx_masked[:,0],idx_masked[:,1]] *= -1
Why didn't the original method work?
On the LHS you were doing arr[idx[:,0],idx[:,1]][bool_idx], which is essentially two steps. The first, arr[idx[:,0],idx[:,1]], under the hood calls arr.__getitem__(indexer)*. When indexer is a slice, the regularity of the elements allows NumPy to return a view (by modifying the strides and offset). When indexer is an arbitrary boolean mask or an arbitrary array of integers, there is in general no regularity to the elements selected, so there is no way to return a view; a copy is returned instead. Let's call arr[idx[:,0],idx[:,1]] arr2.
In the next step, the combined arr[idx[:,0],idx[:,1]][bool_idx] = ..., i.e. arr2[bool_idx] = ..., under the hood calls arr2.__setitem__(mask, values), which modifies arr2 (the copy) and as such doesn't propagate back to arr.
*Inspiration from - https://stackoverflow.com/a/38768993/.
More info on __getitem__,__setitem__.
Why did the methods posted in this post work?
Because both directly used the indexer on arr with arr.__setitem__(indexer) that modifies arr.
You just need to make a small change to your own attempt -- you need to apply the boolean index array on each of your integer index expressions. In other words, this should work:
arr[idx[:,0][bool_idx],idx[:,1][bool_idx]] *= -1
(I've just moved the [bool_idx] inside the square brackets, to apply it to both of the integer index expressions -- idx[:,0] and idx[:,1].)
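For completeness, a quick runnable check of the corrected indexing against the question's data:

import numpy as np

arr = np.arange(9).reshape(3, 3)
idx = np.array([[0, 2], [1, 1], [2, 0]])
bool_idx = np.array([True, True, False])

# Mask the coordinate pairs first, then index arr directly, so
# __setitem__ runs on arr itself rather than on a temporary copy
arr[idx[:, 0][bool_idx], idx[:, 1][bool_idx]] *= -1
print(arr)
# [[ 0  1 -2]
#  [ 3 -4  5]
#  [ 6  7  8]]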

Membership checking in Numpy ndarray

I have written a script that evaluates if some entry of arr is in check_elements. My approach does not compare single entries, but whole vectors inside of arr. Thus, the script checks if [8, 3], [4, 5], ... is in check_elements.
Here's an example:
import numpy as np
# arr.shape -> (2, 3, 2)
arr = np.array([[[8, 3],
                 [4, 5],
                 [6, 2]],
                [[9, 0],
                 [1, 10],
                 [7, 11]]])
# check_elements.shape -> (3, 2)
# generally: (n, 2)
check_elements = np.array([[4, 5], [9, 0], [7, 11]])
# rslt.shape -> (2, 3)
rslt = np.zeros((arr.shape[0], arr.shape[1]), dtype=bool)
for i, j in np.ndindex((arr.shape[0], arr.shape[1])):
    if arr[i, j] in check_elements:  # <-- condition is checked against
                                     #     the whole last dimension
        rslt[i, j] = True
    else:
        rslt[i, j] = False
Now:
print(rslt)
...would print:
[[False  True False]
 [ True False  True]]
For getting the indices of the matches I use:
print(np.transpose(np.nonzero(rslt)))
...which prints the following:
[[0 1]   # arr[0, 1] -> [4, 5]  -> is in check_elements
 [1 0]   # arr[1, 0] -> [9, 0]  -> is in check_elements
 [1 2]]  # arr[1, 2] -> [7, 11] -> is in check_elements
This task would be easy and performant if I were checking a condition on single values, like arr > 3 with np.where(...), but I am not interested in single values. I want to check a condition against the whole last dimension (or slices of it).
My question is: is there a faster way to achieve the same result? Am I right that vectorized attempts and things like np.where cannot be used for my problem, because they always operate on single values and not on a whole dimension or slices of that dimension?
Here is a Numpythonic approach using broadcasting:
>>> (check_elements == arr[:,:,None]).all(axis=-1).any(axis=-1)
array([[False,  True, False],
       [ True, False,  True]], dtype=bool)
(all(axis=-1) demands a full pair match before any(axis=-1) accepts it, so a partial match such as [4, 0] is not falsely reported.)
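The same broadcasting idea can be wrapped in a small helper without hard-coded shapes (a sketch; the rows_in name is mine):

import numpy as np

def rows_in(arr, check_elements):
    # arr[..., None, :] pairs every last-axis vector of arr with every
    # row of check_elements; all(-1) demands a full vector match, and
    # any(-1) accepts the position if any check row matched
    eq = arr[..., None, :] == check_elements
    return eq.all(axis=-1).any(axis=-1)

Called as rows_in(arr, check_elements), this reproduces the rslt array above and works for any leading shape.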
The numpy_indexed package (disclaimer: I am its author) contains functionality to perform these kind of queries; specifically, containment relations for nd (sub)arrays:
import numpy_indexed as npi
flatidx = npi.indices(arr.reshape(-1, 2), check_elements)
idx = np.unravel_index(flatidx, arr.shape[:-1])
Note that the implementation is fully vectorized under the hood.
Also, note that with this approach the order of the indices in idx matches the order of check_elements; the first item in idx gives the row and col of the first item in check_elements. This information is lost with an approach along the lines you posted above, or with one of the alternative answers, which instead give you idx sorted by order of appearance in arr, which is often undesirable.
You can use np.in1d even though it is meant for 1D arrays by giving it a 1D view of your array, containing one element per last axis:
arr_view = arr.view((np.void, arr.dtype.itemsize * arr.shape[-1])).ravel()
check_view = check_elements.view((np.void,
                                  check_elements.dtype.itemsize * check_elements.shape[-1])).ravel()
This will give you two 1D arrays, each containing a void-type version of your 2-element vectors along the last axis. Now you can check which of the elements in arr_view are also in check_view by doing:
flatResult = np.in1d(arr_view, check_view)
This will give a flattened array, which you can then reshape to the shape of arr, dropping the last axis:
print(flatResult.reshape(arr.shape[:-1]))
which will give you the desired result:
array([[False,  True, False],
       [ True, False,  True]], dtype=bool)

Indexing NumPy 2D array with another 2D array

I have something like
m = array([[1, 2],
           [4, 5],
           [7, 8],
           [6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried np.ix_ as I read at Simplfy row AND column extraction, numpy, but this did not result in what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.
The numpy way to do this is by using np.choose or fancy indexing/take (see below):
m = array([[1, 2],
           [4, 5],
           [7, 8],
           [6, 2]])
select = array([0,1,0,0])
result = np.choose(select, m.T)
So there is no need for python loops, with all the speed advantages numpy gives you. m.T is needed because choose is really more a choice between two arrays, as in np.choose(select, (m[:,0], m[:,1])), but it's straightforward to use it like this.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important, there is np.take, which works on a 1D view (it's quite a bit faster for some reason, though maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))
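A quick check of both variants against the example data (a sketch):

import numpy as np

m = np.array([[1, 2], [4, 5], [7, 8], [6, 2]])
select = np.array([0, 1, 0, 0])

print(np.choose(select, m.T))             # [1 5 7 6]
print(m[np.arange(len(select)), select])  # [1 5 7 6]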
I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
1. create the condition array;
2. generate an index array by calling NP.where, passing in this condition array; and
3. apply this index array against the source array
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
        [False,  True],
        [ True, False],
        [ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array, which in this case is the result of a single expression composed of compound conditional expressions (first line above).
If constructing such a value filter doesn't apply to your particular use case, that's fine, you just need to generate the actual boolean matrix (the value of cnd) some other way (or create it directly).
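If selection is by location, a small sketch (using a regular ndarray) of creating that boolean mask directly from a column-selection array rather than from value conditions:

import numpy as np

m = np.array([[1, 2], [4, 5], [7, 8], [6, 2]])
select = np.array([0, 1, 0, 0])

# Build cnd directly: one True per row, at the selected column
cnd = np.zeros(m.shape, dtype=bool)
cnd[np.arange(len(select)), select] = True
print(m[np.where(cnd)])  # [1 5 7 6]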
What about using python?
result = array([subarray[index] for subarray, index in zip(m, select)])
IMHO, this is the simplest variant:
m[np.arange(4), select]
Since the title is referring to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices of shape (n,m), with arbitrarily large m, named inds, is used to access elements of another 2D array of shape (n,k), named B:
# index offsets: the flattened start of each row of B
# (the stride must be B.shape[1], the row length of B)
offset = np.arange(0, B.size, B.shape[1])
# numpy.take(B, C) "flattens" arrays B and C and selects elements from B based on indices in C
Result = np.take(B, offset[:,np.newaxis] + inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take) and for assignment.
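A quick runnable check of that last variant on small data (a sketch; B and inds here are illustrative):

import numpy as np

B = np.array([[10, 20, 30],
              [40, 50, 60]])
inds = np.array([[0, 2],
                 [2, 1]])

# Output row i picks B[i, inds[i, :]]
print(B[np.expand_dims(np.arange(B.shape[0]), -1), inds])
# [[10 30]
#  [60 50]]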
result = array([m[j][0] if i==0 else m[j][1] for i,j in zip(select, range(0, len(m)))])
