Indexing NumPy 2D array with another 2D array - python

I have something like
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
and
select = array([0,1,0,0])
My target is
result = array([1, 5, 7, 6])
I tried _ix as I read at Simplfy row AND column extraction, numpy, but this did not result in what I wanted.
p.s. Please change the title of this question if you can think of a more precise one.

The numpy way to do this is by using np.choose or fancy indexing/take (see below):
m = array([[1, 2],
[4, 5],
[7, 8],
[6, 2]])
select = array([0,1,0,0])
result = np.choose(select, m.T)
So there is no need for python loops, or anything, with all the speed advantages numpy gives you. m.T is just needed because choose is really more a choise between the two arrays np.choose(select, (m[:,0], m[:1])), but its straight forward to use it like this.
Using fancy indexing:
result = m[np.arange(len(select)), select]
And if speed is very important np.take, which works on a 1D view (its quite a bit faster for some reason, but maybe not for these tiny arrays):
result = m.take(select+np.arange(0, len(select) * m.shape[1], m.shape[1]))

I prefer to use NP.where for indexing tasks of this sort (rather than NP.ix_)
What is not mentioned in the OP is whether the result is selected by location (row/col in the source array) or by some condition (e.g., m >= 5). In any event, the code snippet below covers both scenarios.
Three steps:
create the condition array;
generate an index array by calling NP.where, passing in this
condition array; and
apply this index array against the source array
>>> import numpy as NP
>>> cnd = (m==1) | (m==5) | (m==7) | (m==6)
>>> cnd
matrix([[ True, False],
[False, True],
[ True, False],
[ True, False]], dtype=bool)
>>> # generate the index array/matrix
>>> # by calling NP.where, passing in the condition (cnd)
>>> ndx = NP.where(cnd)
>>> ndx
(matrix([[0, 1, 2, 3]]), matrix([[0, 1, 0, 0]]))
>>> # now apply it against the source array
>>> m[ndx]
matrix([[1, 5, 7, 6]])
The argument passed to NP.where, cnd, is a boolean array, which in this case, is the result from a single expression comprised of compound conditional expressions (first line above)
If constructing such a value filter doesn't apply to your particular use case, that's fine, you just need to generate the actual boolean matrix (the value of cnd) some other way (or create it directly).

What about using python?
result = array([subarray[index] for subarray, index in zip(m, select)])

IMHO, this is simplest variant:
m[np.arange(4), select]

Since the title is referring to indexing a 2D array with another 2D array, the actual general numpy solution can be found here.
In short:
A 2D array of indices of shape (n,m) with arbitrary large dimension m, named inds, is used to access elements of another 2D array of shape (n,k), named B:
# array of index offsets to be added to each row of inds
offset = np.arange(0, inds.size, inds.shape[1])
# numpy.take(B, C) "flattens" arrays B and C and selects elements from B based on indices in C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.

result = array([m[j][0] if i==0 else m[j][1] for i,j in zip(select, range(0, len(m)))])

Related

Updating elements in multiply indexed np.array

I have a 2D numpy array and need to update a selection of elements via multiple layers of indexing. The obvious way to do this for me does not work since it seems numpy is only updating a copy of the array and not the array itself:
import numpy as np
# Create an array and indices that should be updated
arr = np.arange(9).reshape(3,3)
idx = np.array([[0,2], [1,1],[2,0]])
bool_idx = np.array([True, True, False])
# This line does not work as intended since the original array stays unchanged
arr[idx[:,0],idx[:,1]][bool_idx] = -1 * arr[idx[:,0],idx[:,1]][bool_idx]
This is the resulting output:
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
However, I expected this output:
>>> arr
array([[0, 1, -2],
[3, -4, 5],
[6, 7, 8]])
We need to mask the indices with the given mask and then index into arr and assign new values. For indexing, we can use tuple(masked_indices) to index or use the two columns of the index-array for integer-indexing, thus giving us two methods.
Method #1 :
arr[tuple(idx[bool_idx].T)] *= -1
Method #2 :
idx_masked = idx[bool_idx]
arr[idx_masked[:,0],idx_masked[:,1]] *= -1
Why didn't the original method work?
On LHS you were doing arr[idx[:,0],idx[:,1]][bool_idx], which is esssentially two steps : arr[idx[:,0],idx[:,1]], which under the hoods calls arr.__getitem__(indexer)*. When indexer is a slice, the regularity of the elements allows NumPy to return a view (by modifying the strides and offset). When indexer is an arbitrary boolean mask or arbitrary array of integers, there is in general no regularity to the elements selected, so there is no way to return a view. Let's call arr[idx[:,0],idx[:,1]] as arr2.
In the next step, with the combined arr[idx[:,0],idx[:,1]][bool_idx], i.e. arr2[bool_idx], under the hoods it calls arr2.__setitem__(mask), which is implemented to modify arr2 and as such doesn't propagate back to arr.
*Inspiration from - https://stackoverflow.com/a/38768993/.
More info on __getitem__,__setitem__.
Why did the methods posted in this post work?
Because both directly used the indexer on arr with arr.__setitem__(indexer) that modifies arr.
You just need to make a small change to your own attempt -- you need to apply the boolean index array on each of your integer index expressions. In other words, this should work:
arr[idx[:,0][bool_idx],idx[:,1][bool_idx]] *= -1
(I've just moved the [bool_idx] inside the square brackets, to apply it on the both of the integer index expressions -- idx[:,0] and idx[:,1])

get indices of n-dimensional array when condition is True python [duplicate]

In Numpy, nonzero(a), where(a) and argwhere(a), with a being a numpy array, all seem to return the non-zero indices of the array. What are the differences between these three calls?
On argwhere the documentation says:
np.argwhere(a) is the same as np.transpose(np.nonzero(a)).
Why have a whole function that just transposes the output of nonzero ? When would that be so useful that it deserves a separate function?
What about the difference between where(a) and nonzero(a)? Wouldn't they return the exact same result?
nonzero and argwhere both give you information about where in the array the elements are True. where works the same as nonzero in the form you have posted, but it has a second form:
np.where(mask,a,b)
which can be roughly thought of as a numpy "ufunc" version of the conditional expression:
a[i] if mask[i] else b[i]
(with appropriate broadcasting of a and b).
As far as having both nonzero and argwhere, they're conceptually different. nonzero is structured to return an object which can be used for indexing. This can be lighter-weight than creating an entire boolean mask if the 0's are sparse:
mask = a == 0 # entire array of bools
mask = np.nonzero(a)
Now you can use that mask to index other arrays, etc. However, as it is, it's not very nice conceptually to figure out which indices correspond to 0 elements. That's where argwhere comes in.
I can't comment on the usefulness of having a separate convenience function that transposes the result of another, but I can comment on where vs nonzero. In it's simplest use case, where is indeed the same as nonzero.
>>> np.where(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
>>> np.nonzero(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
or
>>> a = np.array([[1, 2],[3, 4]])
>>> np.where(a == 3)
(array([1, 0]),)
>>> np.nonzero(a == 3)
(array([1, 0]),)
where is different from nonzero in the case when you wish to pick elements of from array a if some condition is True and from array b when that condition is False.
>>> a = np.array([[6, 4],[0, -3]])
>>> b = np.array([[100, 200], [300, 400]])
>>> np.where(a > 0, a, b)
array([[6, 4], [300, 400]])
Again, I can't explain why they added the nonzero functionality to where, but this at least explains how the two are different.
EDIT: Fixed the first example... my logic was incorrect previously

How to quickly grab specific indices from a numpy array?

But I don't have the index values, I just have ones in those same indices in a different array. For example, I have
a = array([3,4,5,6])
b = array([0,1,0,1])
Is there some NumPy method than can quickly look at both of these and extract all values from a whose indices match the indices of all 1's in b? I want it to result in:
array([4,6])
It is probably worth mentioning that my a array is multidimensional, while my b array will always have values of either 0 or 1. I tried using NumPy's logical_and function, though this returns ValueError with a and b having different dimensions:
a = numpy.array([[3,2], [4,5], [6,1]])
b = numpy.array([0, 1, 0])
print numpy.logical_and(a,b)
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
Though this method does seem to work if a is flat. Either way, the return type of numpy.logical_and() is a boolean, which I do not want. Is there another way? Again, in the second example above, the desired return would be
array([[4,5]])
Obviously I could write a simple loop to accomplish this, I'm just looking for something a bit more concise.
Edit:
This will introduce more constraints, I should also mention that each element of the multidimensional array a may be any arbitrary length, that does not match its neighbour.
You can simply use fancy indexing.
b == 1
will give you a boolean array:
>>> from numpy import array
>>> a = array([3,4,5,6])
>>> b = array([0,1,0,1])
>>> b==1
array([False, True, False, True], dtype=bool)
which you can pass as an index to a.
>>> a[b==1]
array([4, 6])
Demo for your second example:
>>> a = array([[3,2], [4,5], [6,1]])
>>> b = array([0, 1, 0])
>>> a[b==1]
array([[4, 5]])
You could use compress:
>>> a = np.array([3,4,5,6])
>>> b = np.array([0,1,0,1])
>>> a.compress(b)
array([4, 6])
You can provide an axis argument for multi-dimensional cases:
>>> a2 = np.array([[3,2], [4,5], [6,1]])
>>> b2 = np.array([0, 1, 0])
>>> a2.compress(b2, axis=0)
array([[4, 5]])
This method will work even if the axis of a you're indexing against is a different length to b.

Deleting elements at specific positions in a M X N numpy array

I am trying to implement Seam carving algorithm wherein we have to delete a seam from the image. Image is stored as a numpy M X N array. I have found the seam, which is nothing but an array of M integers whose value specifies column values to be deleted.
Eg: a 2 X 3 array
import numpy
img_array = numpy.array([[1, 2, 3],[4, 5, 6]])
and
seam = numpy.array([1,2])
This means that we have to delete from the Img 1st element from the 1st row (1), and second element from the second row (5). After deletion, Img will be
print img_array
[[2,3]
[4,6]]
Work done:
I am new to python and I have found solutions which concern about single dimensional array or deleting an entire row or column. But I could not find a way to delete elements from specific columns.
Will you always delete one element from each row? If you try to delete one element from one row, but not another, you will end up with a ragged array. That is why there isn't a general purpose way of removing single elements from a 2d array.
One option is to figure out which ones you want to delete, remove them from a flattened array, and then reshape it back to the correct shape. Then it is your responsibility to ensure that the correct number of elements are removed.
All of these 'delete' methods actually copy the 'keep' values to a new array. Nothing actually deletes elements from the original array. So you could just as easily (and just as fast) do your own copy to a new array.
Another option is to work with lists of lists. Those are more tolerant of becoming ragged.
Here's an example of using a boolean mask to remove selected elements from an array (making a copy of course):
In [100]: x=np.arange(1,7).reshape(2,3)
In [101]: x
Out[101]:
array([[1, 2, 3],
[4, 5, 6]])
In [102]: mask=np.ones_like(x,bool)
In [103]: mask
Out[103]:
array([[ True, True, True],
[ True, True, True]], dtype=bool)
In [104]: mask[0,0]=False
In [105]: mask[1,1]=False
In [106]: mask
Out[106]:
array([[False, True, True],
[ True, False, True]], dtype=bool)
In [107]: x[mask]
Out[107]: array([2, 3, 4, 6]) # it's flat
In [108]: x[mask].reshape(2,2)
Out[108]:
array([[2, 3],
[4, 6]])
Notice that even though both x and mask are 2d, the indexing result is flattened. Such a mask could easily have produced an array that couldn't be reshape back to 2d.
Each row in your matrix is a single dimensional array.
import numpy
ary=numpy.array([[1,2,3],[4,5,6]])
print ary[0]
Gives
array([1, 2, 3])
You could iterate over your matrix, using the values from you seam to remove an element from the current row. Append the result to a modified matrix you are building.
seam = numpy.array([1,2])
for i in range(2):
tmp = numpy.delete(ary[i],seam[i]-1)
if i == 0:
modified_ary = tmp
else:
modified_ary = numpy.vstack((modified_ary,tmp))
print modified_ary
Gives
[[2 3]
[4 6]]

ValueError: too many boolean indices for a n=600 array (float)

I am getting an issue where I am trying to run (on Python):
#Loading in the text file in need of analysis
x,y=loadtxt('2.8k to 293k 15102014_rerun 47_0K.txt',skiprows=1,unpack=True,dtype=float,delimiter=",")
C=-1.0 #Need to flip my voltage axis
yone=C*y #Actually flipping the array
plot(x,yone)#Test
origin=600.0#Where is the origin? i.e V=0, taking the 0 to 1V elements of array
xorg=x[origin:1201]# Array from the origin to the final point (n)
xfit=xorg[(x>0.85)==True] # Taking the array from the origin and shortening it further to get relevant area
It returns the ValueError. I have tried doing this process with a much smaller array of 10 elements and the xfit=xorg[(x>0.85)==True] command works fine. What the program is trying to do is to narrow the field of vision, of some data, to a relevant point so I can fit a line of best fit a linear element of the data.
I apologise for the formatting being messy but this is the first question I have asked on this website as I cannot search for something that I can understand where I am going wrong.
This answer is for people that don't know about numpy arrays (like me), thanks MrE for the pointers to numpy docs.
Numpy arrays have this nice feature of boolean masks.
For numpy arrays, most operators return an array of the operation applied to every element - instead of a single result like in plain Python lists:
>>> alist = range(10)
>>> alist
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> alist > 5
True
>>> anarray = np.array(alist)
>>> anarray
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> anarray > 5
array([False, False, False, False, False, False, True, True, True, True], dtype=bool)
You can use an array of bool as the index for a numpy array, in this case you get a filtered array for the positions where the corresponding bool array element is True.
>>> mask = anarray > 5
>>> anarray[mask]
array([6, 7, 8, 9])
The mask must not be bigger than the array:
>>> anotherarray = anarray[mask]
>>> anotherarray
array([6, 7, 8, 9])
>>> anotherarray[mask]
ValueError: too many boolean indices
So you cant use a mask bigger than the array you are masking:
>>> anotherarray[anarray > 7]
ValueError: too many boolean indices
>>> anotherarray[anotherarray > 7]
array([8, 9])
Since xorg is smaller than x, a mask based on x will be longer than xorg and you get the ValueError exception.
Change
xfit=xorg[x>0.85]
to
xfit=xorg[xorg>0.85]
x is larger than xorg so x > 0.85 has more elements than xorg
Try the following:
replace your code
xorg=x[origin:1201]
xfit=xorg[(x>0.85)==True]
with
mask = x > 0.85
xfit = xorg[mask[origin:1201]]
This works when x is a numpy.ndarray, otherwise you might end up in problems as advanced indexing will return a view, not a copy, see SciPy/NumPy documentation.
I'm unsure whether you like to use numpy, but when trying to fit data, numpy/scipy is a good choice anyway...

Categories

Resources