numpy write with two masks - python

Suppose I have an array a with 100 elements that I want to conditionally update. I have a first mask m which selects the elements of a that are candidates for updating. Out of a[m] (say, 50 elements), I only want to update a subset and leave the others untouched. So the second mask m2 has m.sum() = 50 elements, only some of which are True.
For completeness, a minimal example:
import numpy as np

a = np.random.random(size=100)
m = a > 0.5                                     # first mask: candidates for updating
m2 = np.random.random(size=m.sum()) < 0.5       # second mask, over a[m]
newvalues = -np.random.randint(10, size=m2.sum())
Then if I were to do
a[m][m2] = newvalues
This does not change the values of a, because the fancy indexing a[m] returns a copy, and the assignment writes into that copy. Using integer indices (via np.where) instead has the same behaviour.
Instead, this works:
m12 = m.copy()
m12[m] = m2
a[m12] = newvalues
However, this is verbose and difficult to read.
Is there a more elegant way to update a subset of a subset of an array?

You can first compute the "final" indices of interest and then use those to update. One way to do this in a more "numpy" way is to mask the index array that is computed from the first mask:
final_mask = np.where(m)[0][m2]
a[final_mask] = newvalues
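For a quick sanity check, here is a tiny self-contained sketch (with made-up values for a, m, m2 and newvalues) showing that the assignment through final_mask really lands in a:
import numpy as np

a = np.array([0.1, 0.9, 0.2, 0.8, 0.7])
m = a > 0.5                          # first mask: three True entries
m2 = np.array([True, False, True])   # second mask, over a[m]
newvalues = np.array([-1.0, -2.0])

final_mask = np.where(m)[0][m2]      # indices into the original array: [1, 4]
a[final_mask] = newvalues
print(a)                             # [ 0.1 -1.   0.2  0.8 -2. ]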

First compute the indices of the elements to update:
indices = np.arange(100)
indices = indices[m][m2]
then use those indices to update the array a:
a[indices] = newvalues

Related

Selecting numpy columns based on values in a row

Suppose I have a numpy array with 2 rows and 10 columns. I want to select the columns with even values in the first row. The outcome I want can be obtained as follows:
a = list(range(10))
b = list(reversed(range(10)))
c = np.concatenate([a, b]).reshape(2, 10).T
c[c[:, 0] % 2 == 0].T
However, this method transposes twice, and I don't think it's very pythonic. Is there a cleaner way to do the same job?
Numpy allows you to select along each dimension separately. You pass in a tuple of indices whose length is the number of dimensions.
Say your array is
a = np.random.randint(10, size=(2, 10))
The even elements in the first row are given by the mask
m = (a[0, :] % 2 == 0)
You can use a[0] to get the first row instead of a[0, :] because missing indices are synonymous with the slice : (take everything).
Now you can apply the mask to just the second dimension:
result = a[:, m]
You can also convert the mask to indices first. There are subtle differences between the two approaches, which you won't see in this simple case. The biggest difference is usually that linear indices are a little faster, especially if applied more than once:
i = np.flatnonzero(m)
result = a[:, i]
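A small sketch (with a made-up array) showing that the boolean mask and the integer indices from flatnonzero select the same columns:
import numpy as np

a = np.array([[4, 1, 2, 7, 6],
              [9, 8, 3, 5, 0]])
m = (a[0] % 2 == 0)      # columns whose first-row value is even
i = np.flatnonzero(m)    # the same selection as integer indices: [0 2 4]

print(a[:, m])                            # [[4 2 6]
                                          #  [9 3 0]]
print(np.array_equal(a[:, m], a[:, i]))   # True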

How can I find the values that are included in each of n arrays (Python)?

I'll preface this with saying that I'm new to Python, but not new to OOP.
I'm using numpy.where to find the indices in n arrays at which a particular condition is met, specifically if the value in the array is greater than x.
What I want to do is find the indices at which all n arrays meet that condition - so that in each array, at index y, the element is greater than x.
n0[y] > x
n1[y] > x
n2[y] > x
n3[y] > x
For example, if my arrays after using numpy.where were:
a = [0,1,2,3,4,5,6,7,8,9,10]
b = [0,2,4,6,8,10,12,14,16,18,20]
c = [0,2,3,5,7,11,13,17,19,23]
d = [0,1,2,3,5,8,13,21,34,55]
I want to get the output
[0,2]
I found the function numpy.isin, which seems to do what I want for just two arrays. I don't know how to go about expanding this to more than two arrays and am not sure if it's possible.
Here's the start of my code, in which I generate the indices meeting my criteria:
n = np.empty([0])
n = np.append(n,np.where(sensor[i] > x)[0])
I'm a little stuck. I know I could create a new array with the same number of indices as my original arrays and set its values to true or false, but that would not be very efficient, and my original arrays are 25k+ elements long.
To find the intersection of n different arrays, first convert them all to sets. Then it is possible to apply set.intersection(). For the example with a, b, c and d, simply do:
set.intersection(*map(set, [a,b,c,d]))
This will result in a set {0, 2}.
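If you would rather stay within NumPy (in the spirit of the numpy.isin idea above), one possible alternative is to fold np.intersect1d over the list of arrays; a minimal sketch with the example arrays:
from functools import reduce
import numpy as np

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
c = [0, 2, 3, 5, 7, 11, 13, 17, 19, 23]
d = [0, 1, 2, 3, 5, 8, 13, 21, 34, 55]

common = reduce(np.intersect1d, [a, b, c, d])
print(common)  # [0 2]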

How to find index of minimum non zero element with numpy?

I have a 4x1 array that I want to search for the minimum non zero value and find its index. For example:
theta = np.array([0, 1, 2, 3]).reshape(4, 1)
It was suggested in a similar thread to use nonzero() or where(), but when I tried to use that in the way that was suggested, it creates a new array that doesn't have the same indices as the original:
np.argmin(theta[np.nonzero(theta)])
gives an index of zero, which clearly isn't right. I think this is because it creates a new array of non zero elements first. I am only interested in the first minimum value if there are duplicates.
np.nonzero(theta) returns the indices of the values that are non-zero. For your 4x1 theta it returns the tuple
(array([1, 2, 3]), array([0, 0, 0]))
Then, theta[np.nonzero(theta)] returns the values
[1, 2, 3]
When you apply np.argmin(theta[np.nonzero(theta)]) to that, it returns the position of the value 1 within this new array, which is 0.
Hence, the correct approach would be:
i, j = np.where(theta == np.min(theta[np.nonzero(theta)]))
where i, j are the indices of the minimum non-zero element of the original numpy array, and theta[i, j] (or theta[i]) gives the value at that index.
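Put together as a runnable sketch on the 4x1 theta from the question (note that np.where returns every position where the minimum occurs; take i[0], j[0] if you only want the first one):
import numpy as np

theta = np.array([0, 1, 2, 3]).reshape(4, 1)

minval = np.min(theta[np.nonzero(theta)])   # smallest non-zero value: 1
i, j = np.where(theta == minval)            # its position(s) in the original array
print(i, j, theta[i, j])                    # [1] [0] [1]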
#!/usr/bin/env python
# Solution utilizing numpy masking of the zero values in the array
import numpy as np
import numpy.ma as ma

a = np.array([0, 1, 2, 3])
print("your array: ", a)
# the non-zero minimum value
minval = np.min(ma.masked_where(a == 0, a))
print("non-zero minimum: ", minval)
# the position/index of the non-zero minimum value in the array
minvalpos = np.argmin(ma.masked_where(a == 0, a))
print("index of non-zero minimum: ", minvalpos)
I think you, @Emily, were very close to the correct answer. You said:
np.argmin(theta[np.nonzero(theta)]) gives an index of zero, which clearly isn't right. I think this is because it creates a new array of non zero elements first.
Your last sentence is correct, and it explains the first: the returned 0 is the index within the new array of non-zero elements, not within the original theta.
Let's now extract the correct index in the old (original) array:
nztheta_ind = np.nonzero(theta)
k = np.argmin(theta[nztheta_ind])
i = nztheta_ind[0][k]
j = nztheta_ind[1][k]
or:
[i[k] for i in nztheta_ind]
for arbitrary dimensionality of original array.
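As a short usage sketch, applying this to the original 4x1 theta (the names follow the snippet above):
import numpy as np

theta = np.array([0, 1, 2, 3]).reshape(4, 1)

nztheta_ind = np.nonzero(theta)          # (array([1, 2, 3]), array([0, 0, 0]))
k = np.argmin(theta[nztheta_ind])        # 0: position among the non-zero values
i, j = [ind[k] for ind in nztheta_ind]   # back to coordinates in theta
print(i, j, theta[i, j])                 # 1 0 1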
ndim Solution
i = np.unravel_index(np.where(theta!=0, theta, theta.max()+1).argmin(), theta.shape)
Explanation
1. Masking the zeros out creates t0. There are other ways; see the alternatives listed at the end.
2. Finding the minimum location returns the flattened (1D) index.
3. unravel_index fixes this problem, and hasn't been suggested yet.
theta = np.triu(np.random.rand(4,4), 1) # example array
t0 = np.where(theta!=0, theta, np.nan) # 1
i0 = np.nanargmin(t0) # 2
i = np.unravel_index(i0, theta.shape) # 3
print(theta, i, theta[i]) #
mask: i = np.unravel_index(np.ma.masked_where(a==0, a).argmin(), a.shape)
nan: i = np.unravel_index(np.nanargmin(np.where(a!=0, a, np.nan)), a.shape)
max: i = np.unravel_index(np.where(a!=0, a, a.max()+1).argmin(), a.shape)

Return elements in a location corresponding to the minimum values of another array

I have two arrays with the same shape in the first two dimensions and I'm looking to record the minimum value in each row of the first array. However I would also like to record the elements in the corresponding position in the third dimension of the second array. I can do it like this:
A = np.random.random((5000, 100))
B = np.random.random((5000, 100, 3))
A_mins = np.ndarray((5000, 4))
for i, row in enumerate(A):
    current_min = min(row)
    A_mins[i, 0] = current_min
    A_mins[i, 1:] = B[i, row == current_min]
I'm new to programming (so correct me if I'm wrong), but I understand that with NumPy, doing calculations on whole arrays is faster than iterating over them. With this in mind, is there a faster way of doing this? I can't see a way to get rid of the row == current_min part, even though the location of the minimum must already have been 'known' to the computer when it calculated min().
Any tips/suggestions appreciated! Thanks.
Something along the lines of what @lib talked about:
index = np.argmin(A, axis=1)
A_mins[:,0] = A[np.arange(len(A)), index]
A_mins[:,1:] = B[np.arange(len(A)), index]
It is much faster than using a for loop.
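A small self-contained check (with smaller, made-up shapes) that the vectorized version reproduces the loop from the question; the loop here uses argmin to pick a single position, which is what row == current_min amounts to when the row minimum is unique:
import numpy as np

A = np.random.random((5, 7))
B = np.random.random((5, 7, 3))

# vectorized version
index = np.argmin(A, axis=1)
A_mins = np.empty((5, 4))
A_mins[:, 0] = A[np.arange(len(A)), index]
A_mins[:, 1:] = B[np.arange(len(A)), index]

# loop version for comparison
expected = np.empty((5, 4))
for i, row in enumerate(A):
    expected[i, 0] = row.min()
    expected[i, 1:] = B[i, np.argmin(row)]

print(np.allclose(A_mins, expected))  # True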
For getting the minimum value of each row, use np.amin instead of min + comparison; np.argmin gives the corresponding index.
The amin function (and many other functions in numpy) takes an axis argument that you can use to get the minimum of each row or each column.
See http://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html
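A brief sketch of what the axis argument gives you here (np.amin for the row minima, np.argmin for their column positions):
import numpy as np

A = np.array([[3., 1., 2.],
              [0., 5., 4.]])

print(np.amin(A, axis=1))    # [1. 0.]  minimum of each row
print(np.argmin(A, axis=1))  # [1 0]    column of each row minimum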

Delete a column in a multi-dimensional array if all elements in that column satisfy a condition

I have a multi-dimensional array such as:
a = [[1,1,5,12,0,4,0],
     [0,1,2,11,0,4,2],
     [0,4,3,17,0,4,9],
     [1,3,5,74,0,8,16]]
How can I delete the column if all entries within that column are equal to zero? In the array a, that would mean deleting the all-zero column (index 4), resulting in:
a = [[1,1,5,12,4,0],
     [0,1,2,11,4,2],
     [0,4,3,17,4,9],
     [1,3,5,74,8,16]]
N.B. I've written a as a nested list only to make it clear. I also don't know a priori where the zero column will be in the array.
My attempt so far only finds the index of the column in which all elements are equal to zero:
a = np.array([[1,1,5,12,0,4,0],[0,1,2,11,0,4,2],[0,4,3,17,0,4,9],[1,3,5,74,0,8,16]])
b = np.vstack(a)
ind = []
for n, m in zip(b.T, range(len(b.T))):
    if sum(n) == 0:
        ind.append(m)
Is there any way to achieve this?
With the code you already have, you can just do:
for place in ind:
    for sublist in a:
        del sublist[place]
Which gets the job done (on the nested-list version of a), but is not very satisfactory...
Edit: numpy is strong
import numpy as np
a = np.array(a)
a = a[:, np.sum(a, axis=0)!=0]
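One small caveat: the sum-based test assumes the entries are non-negative (a column such as [1, -1, 0, 0] also sums to zero). A mask that checks the condition directly, sketched on the example array, would be:
import numpy as np

a = np.array([[1, 1, 5, 12, 0, 4, 0],
              [0, 1, 2, 11, 0, 4, 2],
              [0, 4, 3, 17, 0, 4, 9],
              [1, 3, 5, 74, 0, 8, 16]])

# keep only the columns that contain at least one non-zero entry
a = a[:, (a != 0).any(axis=0)]
print(a)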
