I have a 2D numpy array and need to update a selection of elements via multiple layers of indexing. The obvious approach does not work, apparently because numpy updates a copy of the array rather than the array itself:
import numpy as np
# Create an array and indices that should be updated
arr = np.arange(9).reshape(3,3)
idx = np.array([[0,2], [1,1],[2,0]])
bool_idx = np.array([True, True, False])
# This line does not work as intended since the original array stays unchanged
arr[idx[:,0],idx[:,1]][bool_idx] = -1 * arr[idx[:,0],idx[:,1]][bool_idx]
This is the resulting output:
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
However, I expected this output:
>>> arr
array([[0, 1, -2],
[3, -4, 5],
[6, 7, 8]])
We need to mask the indices with the given mask and then index into arr and assign new values. For indexing, we can use tuple(masked_indices) to index or use the two columns of the index-array for integer-indexing, thus giving us two methods.
Method #1 :
arr[tuple(idx[bool_idx].T)] *= -1
Method #2 :
idx_masked = idx[bool_idx]
arr[idx_masked[:,0],idx_masked[:,1]] *= -1
Why didn't the original method work?
On the LHS you were doing arr[idx[:,0],idx[:,1]][bool_idx], which is essentially two steps. The first is arr[idx[:,0],idx[:,1]], which under the hood calls arr.__getitem__(indexer)*. When indexer is a slice, the regularity of the elements allows NumPy to return a view (by modifying the strides and offset). When indexer is an arbitrary boolean mask or arbitrary array of integers, there is in general no regularity to the elements selected, so there is no way to return a view and NumPy returns a copy instead. Let's call arr[idx[:,0],idx[:,1]] arr2.
In the next step, the combined arr[idx[:,0],idx[:,1]][bool_idx] = ... becomes arr2[bool_idx] = ..., which under the hood calls arr2.__setitem__(bool_idx, ...). That modifies arr2, which is a temporary copy, and as such the change doesn't propagate back to arr.
*Inspiration from - https://stackoverflow.com/a/38768993/.
More info on __getitem__,__setitem__.
Why did the methods posted in this post work?
Because both apply a single indexer directly to arr, so the assignment calls arr.__setitem__(indexer, ...), which modifies arr itself.
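To make the copy-versus-view point concrete, here is a minimal sketch using the arrays from the question. The names arr2, copy_shares_memory and after_chained are only illustrative, and np.shares_memory is used as a quick check, not as part of the fix.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
idx = np.array([[0, 2], [1, 1], [2, 0]])
bool_idx = np.array([True, True, False])

# Integer-array ("fancy") indexing returns a copy, not a view.
arr2 = arr[idx[:, 0], idx[:, 1]]
copy_shares_memory = np.shares_memory(arr, arr2)

# Writing into that copy leaves arr untouched.
arr2[bool_idx] *= -1
after_chained = arr.copy()

# A single indexing expression on the left-hand side goes through
# arr.__setitem__ and therefore modifies arr itself.
arr[tuple(idx[bool_idx].T)] *= -1
```

After the last line, arr holds the output the question expected, while after_chained still equals the original array.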
You just need to make a small change to your own attempt -- you need to apply the boolean index array on each of your integer index expressions. In other words, this should work:
arr[idx[:,0][bool_idx],idx[:,1][bool_idx]] *= -1
(I've just moved the [bool_idx] inside the square brackets, to apply it to both of the integer index expressions -- idx[:,0] and idx[:,1])
Can someone explain what this code is doing?
a = np.array([[1, 2], [3, 4]])
a[..., [True, False]]
What is the [True, False] doing there?
Ellipsis Notation and Booleans as Integers
From the numpy docs:
Ellipsis expand to the number of : objects needed to make a selection tuple of the same length as x.ndim. There may only be a single ellipsis present
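True and False are just obfuscated 1 and 0, at least in the NumPy version this answer was written against; newer releases treat lists of booleans as masks, as the deprecation note quoted further down explains. Taking the example from the docs: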
x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
x[...,0]
# outputs: array([[1, 2, 3],
# [4, 5, 6]])
x[..., False] # same thing
The boolean values are specifying an index, just like the numbers 0 or 1 would.
In response to your question in the comments
It first seems magical that
a = np.array([[1, 2], [3, 4]])
a[..., [True, True]] # = [[2,2],[4,4]]
But when we consider it as
a[..., [1,1]] # = [[2,2],[4,4]]
It seems less impressive.
Similarly:
b = np.array([[1,2,3],[4,5,6]])
b[...,[2,2]] # = [[3,3],[5,5]]
After applying the ellipsis rules, the True and False grab column indices, just like 0, 1, or 17 would have.
Boolean Arrays for Complex Indexing
There are some subtle differences (bools have a different type than ints). A lot of the hairy details can be found here. These do not seem to play any role in your code, but they are interesting for figuring out how numpy indexing works.
In particular, this line is probably what you're looking for:
In the future Boolean array-likes (such as lists of python bools) will
always be treated as Boolean indexes
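On recent NumPy releases that change has taken effect, so a list of Python bools acts as a mask and no longer as the integers 0 and 1. A small sketch of the difference, assuming a current NumPy (the names as_mask and as_ints are just for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# A list of bools is a boolean mask over the last axis:
# True keeps column 0, False drops column 1.
as_mask = a[..., [True, False]]

# The integer interpretation now has to be spelled out explicitly:
# pick column 1, then column 0.
as_ints = a[..., [1, 0]]
```

Here as_mask has shape (2, 1) and as_ints has shape (2, 2), so the two spellings are not interchangeable on such versions.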
On this page, they talk about boolean arrays, which are quite complex as an indexing tool
Boolean arrays used as indices are treated in a different manner
entirely than index arrays. Boolean arrays must be of the same shape
as the initial dimensions of the array being indexed
Skipping down a bit
Unlike in the case of integer index arrays, in the boolean case, the
result is a 1-D array containing all the elements in the indexed array
corresponding to all the true elements in the boolean array. The
elements in the indexed array are always iterated and returned in
row-major (C-style) order. The result is also identical to
y[np.nonzero(b)]. As with index arrays, what is returned is a copy of
the data, not a view as one gets with slices.
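The quoted behaviour can be checked with a short sketch; the array y and the condition y > 6 are made up for illustration and are not from the docs page.

```python
import numpy as np

y = np.arange(12).reshape(3, 4)
b = y > 6                         # boolean array with the same shape as y

selected = y[b]                   # 1-D result, elements in row-major order
via_nonzero = y[np.nonzero(b)]    # the same elements via integer indices

# The result is a copy, so it shares no memory with y.
is_copy = not np.shares_memory(y, selected)
```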
I am trying to implement the seam carving algorithm, wherein we have to delete a seam from the image. The image is stored as an M x N numpy array. I have found the seam, which is nothing but an array of M integers whose values specify the column to be deleted in each row.
Eg: a 2 X 3 array
import numpy
img_array = numpy.array([[1, 2, 3],[4, 5, 6]])
and
seam = numpy.array([1,2])
This means that we have to delete from the Img 1st element from the 1st row (1), and second element from the second row (5). After deletion, Img will be
print(img_array)
[[2,3]
[4,6]]
Work done:
I am new to Python and have only found solutions that deal with one-dimensional arrays or with deleting an entire row or column. I could not find a way to delete elements from specific columns.
Will you always delete one element from each row? If you try to delete one element from one row, but not another, you will end up with a ragged array. That is why there isn't a general purpose way of removing single elements from a 2d array.
One option is to figure out which ones you want to delete, remove them from a flattened array, and then reshape it back to the correct shape. Then it is your responsibility to ensure that the correct number of elements are removed.
All of these 'delete' methods actually copy the 'keep' values to a new array. Nothing actually deletes elements from the original array. So you could just as easily (and just as fast) do your own copy to a new array.
Another option is to work with lists of lists. Those are more tolerant of becoming ragged.
Here's an example of using a boolean mask to remove selected elements from an array (making a copy of course):
In [100]: x=np.arange(1,7).reshape(2,3)
In [101]: x
Out[101]:
array([[1, 2, 3],
[4, 5, 6]])
In [102]: mask=np.ones_like(x,bool)
In [103]: mask
Out[103]:
array([[ True, True, True],
[ True, True, True]], dtype=bool)
In [104]: mask[0,0]=False
In [105]: mask[1,1]=False
In [106]: mask
Out[106]:
array([[False, True, True],
[ True, False, True]], dtype=bool)
In [107]: x[mask]
Out[107]: array([2, 3, 4, 6]) # it's flat
In [108]: x[mask].reshape(2,2)
Out[108]:
array([[2, 3],
[4, 6]])
Notice that even though both x and mask are 2d, the indexing result is flattened. Such a mask could easily have produced an array that couldn't be reshaped back to 2d.
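For the seam case, where exactly one element is removed per row, the mask can be built in one step from the seam itself. This is only a sketch: it assumes the seam holds 1-based column numbers, as in the question, and the names mask and carved are illustrative.

```python
import numpy as np

img_array = np.array([[1, 2, 3], [4, 5, 6]])
seam = np.array([1, 2])           # 1-based column to drop in each row (assumption)

rows, cols = img_array.shape
mask = np.ones((rows, cols), dtype=bool)
# Mark one element per row for removal; subtract 1 to get 0-based columns.
mask[np.arange(rows), seam - 1] = False

# Exactly one element is dropped per row, so the reshape is always valid.
carved = img_array[mask].reshape(rows, cols - 1)
```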
Each row in your matrix is a single dimensional array.
import numpy
ary=numpy.array([[1,2,3],[4,5,6]])
print(ary[0])
Gives
[1 2 3]
You could iterate over your matrix, using the values from your seam to remove an element from the current row. Append the result to a modified matrix you are building.
seam = numpy.array([1,2])
for i in range(2):
    tmp = numpy.delete(ary[i], seam[i]-1)  # seam values are 1-based
    if i == 0:
        modified_ary = tmp
    else:
        modified_ary = numpy.vstack((modified_ary, tmp))
print(modified_ary)
Gives
[[2 3]
[4 6]]
import numpy as np
matrix1 = np.array([[1,2,3],[4,5,6]])
vector1 = matrix1[:,0] # This should have shape (2,1) but actually has (2,)
matrix2 = np.array([[2,3],[5,6]])
np.hstack((vector1, matrix2))
ValueError: all the input arrays must have same number of dimensions
The problem is that when I select the first column of matrix1 and put it in vector1, it gets converted to a row vector, so when I try to concatenate with matrix2, I get a dimension error. I could do this.
np.hstack((vector1.reshape(matrix2.shape[0],1), matrix2))
But this looks too ugly for me to do every time I have to concatenate a matrix and a vector. Is there a simpler way to do this?
The easier way is
vector1 = matrix1[:,0:1]
For the reason, let me refer you to another answer of mine:
When you write something like a[4], that's accessing the fifth element of the array, not giving you a view of some section of the original array. So for instance, if a is an array of numbers, then a[4] will be just a number. If a is a two-dimensional array, i.e. effectively an array of arrays, then a[4] would be a one-dimensional array. Basically, the operation of accessing an array element returns something with a dimensionality of one less than the original array.
Here are three other options:
You can tidy up your solution a bit by allowing the row dimension of the vector to be set implicitly:
np.hstack((vector1.reshape(-1, 1), matrix2))
You can index with np.newaxis (or equivalently, None) to insert a new axis of size 1:
np.hstack((vector1[:, np.newaxis], matrix2))
np.hstack((vector1[:, None], matrix2))
You can use np.matrix, for which indexing a column with an integer always returns a column vector (note that the NumPy documentation no longer recommends np.matrix for new code):
matrix1 = np.matrix([[1, 2, 3],[4, 5, 6]])
vector1 = matrix1[:, 0]
matrix2 = np.matrix([[2, 3], [5, 6]])
np.hstack((vector1, matrix2))
Subsetting
The even simpler way is to subset the matrix.
>>> matrix1
[[1 2 3]
[4 5 6]]
>>> matrix1[:, [0]] # Subsetting
[[1]
[4]]
>>> matrix1[:, 0] # Indexing
[1 4]
>>> matrix1[:, 0:1] # Slicing
[[1]
[4]]
I also mentioned this in a similar question.
It works somewhat similarly to a Pandas dataframe. If you index the dataframe, it gives you a Series. If you subset or slice the dataframe, it gives you a dataframe.
Your approach uses indexing, David Z's approach uses slicing, and my approach uses subsetting.
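The three forms can be compared side by side. This is a small sketch with the arrays from the question; np.shares_memory is only used to show which results are views, and np.column_stack is an alternative not mentioned above that accepts the 1-D form directly.

```python
import numpy as np

matrix1 = np.array([[1, 2, 3], [4, 5, 6]])
matrix2 = np.array([[2, 3], [5, 6]])

indexed = matrix1[:, 0]     # shape (2,): the column axis is dropped
subset = matrix1[:, [0]]    # shape (2, 1): integer-list indexing, a copy
sliced = matrix1[:, 0:1]    # shape (2, 1): basic slicing, a view

joined = np.hstack((sliced, matrix2))

# np.column_stack treats 1-D inputs as columns, so no reshape is needed.
joined_1d = np.column_stack((indexed, matrix2))
```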
I have something like
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried ix_ as I read at Simplfy row AND column extraction, numpy, but this did not result in what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.
The numpy way to do this is by using np.choose or fancy indexing/take (see below):
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
select = array([0,1,0,0])
result = np.choose(select, m.T)
So there is no need for Python loops or anything, and you keep all the speed advantages numpy gives you. m.T is needed only because choose is really more a choice between the two arrays, np.choose(select, (m[:,0], m[:,1])), but it's straightforward to use it like this.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important, use np.take, which works on a 1D view (it's quite a bit faster for some reason, but maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))
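As a quick sanity check, the three approaches can be run side by side on the data from the question. The variable names by_choose, by_fancy and by_take are illustrative only.

```python
import numpy as np

m = np.array([[1, 2],
              [4, 5],
              [7, 8],
              [6, 2]])
select = np.array([0, 1, 0, 0])

# np.choose picks, for each position i, element i of the select[i]-th column.
by_choose = np.choose(select, m.T)

# Fancy indexing pairs row i with column select[i].
by_fancy = m[np.arange(len(select)), select]

# np.take on the flattened array: row i starts at flat offset i * m.shape[1].
by_take = m.take(select + np.arange(0, m.size, m.shape[1]))
```

All three give the target from the question, array([1, 5, 7, 6]).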
I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
1. create the condition array;
2. generate an index array by calling NP.where, passing in this condition array; and
3. apply this index array against the source array.
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
[False, True],
[ True, False],
[ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array, which in this case is the result of a single expression made up of compound conditional expressions (first line above).
If constructing such a value filter doesn't apply to your particular use case, that's fine, you just need to generate the actual boolean matrix (the value of cnd) some other way (or create it directly).
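The session above uses a matrix object. With a plain ndarray the same three steps look like the sketch below; the condition m >= 5 is only an example filter and is not taken from the question.

```python
import numpy as np

m = np.array([[1, 2],
              [4, 5],
              [7, 8],
              [6, 2]])

# Step 1: the condition array (any boolean array of m's shape works).
cnd = m >= 5

# Step 2: np.where on a single argument returns one index array per axis.
rows, cols = np.where(cnd)

# Step 3: apply the index arrays to the source array.
picked = m[rows, cols]

# Indexing with the boolean array directly gives the same elements.
picked_direct = m[cnd]
```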
What about using python?
result = array([subarray[index] for subarray, index in zip(m, select)])
IMHO, this is simplest variant:
m[np.arange(4), select]
Since the title is referring to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices of shape (n,m) with arbitrary large dimension m, named inds, is used to access elements of another 2D array of shape (n,k), named B:
# array of flat offsets at which each row of B starts, to be added to each row of inds
offset = np.arange(0, B.size, B.shape[1])
# numpy.take(B, C) "flattens" arrays B and C and selects elements from B based on indices in C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
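A small worked sketch of both variants, with made-up arrays B of shape (2, 3) and inds of shape (2, 2). np.take_along_axis is a third option not mentioned above; it is included here only for comparison.

```python
import numpy as np

B = np.array([[10, 11, 12],
              [20, 21, 22]])
inds = np.array([[2, 0],
                 [1, 1]])

# Broadcasting a column of row numbers against inds pairs each row of B
# with its own row of column indices.
rows = np.arange(B.shape[0])[:, np.newaxis]
picked = B[rows, inds]

# np.take on the flattened B: row i of B starts at flat offset i * B.shape[1].
offset = np.arange(0, B.size, B.shape[1])
via_take = np.take(B, offset[:, np.newaxis] + inds)

# The same lookup with np.take_along_axis.
via_along_axis = np.take_along_axis(B, inds, axis=1)

# The indexing form also works on the left-hand side of an assignment.
B[rows, inds] = 0
```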
result = array([m[j][0] if i==0 else m[j][1] for i,j in zip(select, range(0, len(m)))])
I've got this array, named v, of dtype('float64'):
array([[ 9.33350000e+05, 8.75886500e+06, 3.45765000e+02],
[ 4.33350000e+05, 8.75886500e+06, 6.19200000e+00],
[ 1.33360000e+05, 8.75886500e+06, 6.76650000e+02]])
... which I've acquired from a file by using the np.loadtxt command. I would like to sort it after the values of the first column, without mixing up the structure that keeps the numbers listed on the same line together. Using v.sort(axis=0) gives me:
array([[ 1.33360000e+05, 8.75886500e+06, 6.19200000e+00],
[ 4.33350000e+05, 8.75886500e+06, 3.45765000e+02],
[ 9.33350000e+05, 8.75886500e+06, 6.76650000e+02]])
... i.e. places the smallest number of the third column in the first line, etc. I would rather want something like this...
array([[ 1.33360000e+05, 8.75886500e+06, 6.76650000e+02],
[ 4.33350000e+05, 8.75886500e+06, 6.19200000e+00],
[ 9.33350000e+05, 8.75886500e+06, 3.45765000e+02]])
... where the elements of each line haven't been moved relative to each other.
Try
v[v[:,0].argsort()]
(with v being the array). v[:,0] is the first column, and .argsort() returns the indices that would sort the first column. You then apply this ordering to the whole array using advanced indexing. Note that you get a sorted copy of the array.
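Here is the same idea as a self-contained sketch with the values from the question. The last line is an optional extra: it writes the sorted rows back into v's existing buffer, although the right-hand side still builds a temporary copy first.

```python
import numpy as np

v = np.array([[9.3335e5, 8.758865e6, 345.765],
              [4.3335e5, 8.758865e6, 6.192],
              [1.3336e5, 8.758865e6, 676.65]])

order = v[:, 0].argsort()   # row indices that sort the first column
v_sorted = v[order]         # rows reordered as whole units (a new array)

# Optional: overwrite v with the sorted rows instead of keeping both.
v[:] = v[order]
```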
The only way I know of to sort the array in place is to use a record dtype:
v.dtype = [("x", float), ("y", float), ("z", float)]
v.shape = v.size
v.sort(order="x")
Alternatively, try
import numpy as np
order = v[:, 0].argsort()
v_sorted = np.take(v, order, 0)  # named v_sorted to avoid shadowing the built-in sorted()
'order' holds the indices that sort the first column,
and 'np.take' then takes the rows in that order.
See the help of 'np.take' as
help(np.take)
take(a, indices, axis=None, out=None,
mode='raise')
Take elements from an array along an axis.
This function does the same thing as "fancy" indexing (indexing arrays
using arrays); however, it can be easier to use if you need elements
along a given axis.
Parameters
----------
a : array_like
The source array.
indices : array_like
The indices of the values to extract.
axis : int, optional
The axis over which to select values. By default, the flattened
input array is used.
out : ndarray, optional
If provided, the result will be placed in this array. It should
be of the appropriate shape and dtype.
mode : {'raise', 'wrap', 'clip'}, optional
Specifies how out-of-bounds indices will behave.
* 'raise' -- raise an error (default)
* 'wrap' -- wrap around
* 'clip' -- clip to the range
'clip' mode means that all indices that are too large are replaced
by the index that addresses the last element along that axis. Note
that this disables indexing with negative numbers.
Returns
-------
subarray : ndarray
The returned array has the same type as `a`.
See Also
--------
ndarray.take : equivalent method
Examples
--------
>>> a = [4, 3, 5, 7, 6, 8]
>>> indices = [0, 1, 4]
>>> np.take(a, indices)
array([4, 3, 6])
In this example if `a` is an ndarray, "fancy" indexing can be used.
>>> a = np.array(a)
>>> a[indices]
array([4, 3, 6])
If you have instances where v[:,0] has some identical values and you want to secondarily sort on columns 1, 2, etc., then you'll want to use numpy.lexsort() or numpy.sort(v, order=('col1', 'col2', etc.)), but for the order= case, v will need to be a structured array.
Here is an example application of numpy.lexsort() that sorts the rows of an array and deals with ties in the first column. Note that lexsort takes a sequence of keys and treats the last key as the primary one, so you need to reverse the column order of a and then take the transpose before calling lexsort. The trailing .T on the 1-D result of lexsort is a no-op (you'd have thought this should be easier, but hey!):
In [1]: import numpy as np
In [2]: a = np.array([[1,2,3,4],[1,0,4,1],[0,4,1,1]])
In [3]: a[np.lexsort(np.flip(a, axis=1).T).T]
Out[3]:
array([[0, 4, 1, 1],
[1, 0, 4, 1],
[1, 2, 3, 4]])
In [4]: a
Out[4]:
array([[1, 2, 3, 4],
[1, 0, 4, 1],
[0, 4, 1, 1]])
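An equivalent and slightly shorter spelling, offered as a sketch: a.T[::-1] lists the columns as keys in reverse order, so column 0 ends up as the last (primary) key. The names order, a_sorted and same_as_flip are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [1, 0, 4, 1],
              [0, 4, 1, 1]])

# Keys are the columns of a in reverse order; lexsort sorts by the last
# key first, so column 0 is primary and column 1 breaks ties.
order = np.lexsort(a.T[::-1])
a_sorted = a[order]

# Same ordering as the np.flip-based expression above.
same_as_flip = np.array_equal(order, np.lexsort(np.flip(a, axis=1).T))
```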
Thanks go to @Paul for the suggestion to use lexsort.