I'm trying to return the element-wise maximum of multiple arrays. For example:
A = array([0, 1, 2])
B = array([1, 0, 3])
C = array([3, 0, 4])
I want the resulting array to be array([3,1,4]).
I wanted to use numpy.maximum, but it is only good for two arrays. Is there a simple function for more than two arrays?
With this setup:
>>> A = np.array([0,1,2])
>>> B = np.array([1,0,3])
>>> C = np.array([3,0,4])
You can either do:
>>> np.maximum.reduce([A,B,C])
array([3, 1, 4])
Or:
>>> np.vstack([A,B,C]).max(axis=0)
array([3, 1, 4])
I would go with the first option.
You can use reduce. It repeatedly applies a binary function to a sequence of values, reducing them to a single result.
For the A, B and C given in the question:
np.maximum.reduce([A, B, C])
array([3, 1, 4])
It first computes np.maximum(A, B), and then computes the np.maximum of that intermediate result and C.
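In other words, for three arrays the reduction amounts to nesting the two-argument call (a quick check with the arrays defined above):
>>> np.maximum(np.maximum(A, B), C)
array([3, 1, 4])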
You can also use:
np.column_stack([A, B, C]).max(axis=1)
The result is the same as the solutions from the other answers.
I use Pandas more heavily than NumPy, so for me it's easier to think of 1D arrays as something similar to Pandas Series. If A, B and C were Series, the above would be equivalent to:
pd.concat([A, B, C], axis=1).max(axis=1)
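As a minimal sketch of that equivalence (assuming the arrays are first wrapped in Series; the variable names mirror the question):
import pandas as pd

A = pd.Series([0, 1, 2])
B = pd.Series([1, 0, 3])
C = pd.Series([3, 0, 4])

# concatenate the Series as columns, then take the row-wise maximum
print(pd.concat([A, B, C], axis=1).max(axis=1).to_numpy())  # [3 1 4]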
I want to create a 3D array where the content is identical to the indices used to access it. So m[2, 5] would result in array([2, 5]).
I couldn't find an obvious solution with the numpy functions indices, ogrid, concatenate, etc.
At the moment I'm using this, but was wondering whether there is a solution that makes better use of the API:
a, b = 3, 4
m = np.ones((a, b, 2))
for x in range(a):
    m[x, :, 1] = np.array(range(b))
for y in range(b):
    m[:, y, 0] = np.array(range(a))
Try np.mgrid:
a, b = 3, 4
m = np.mgrid[:a,:b].transpose(1,2,0)
print(m[1, 2])
# [1 2]
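If you prefer, np.indices gives the same result once the coordinate axis is moved to the end (a sketch under the same a and b, not from the original answer):
a, b = 3, 4
m = np.indices((a, b)).transpose(1, 2, 0)  # shape (a, b, 2), same as the mgrid version
print(m[1, 2])
# [1 2]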
In Numpy, nonzero(a), where(a) and argwhere(a), with a being a numpy array, all seem to return the non-zero indices of the array. What are the differences between these three calls?
On argwhere the documentation says:
np.argwhere(a) is the same as np.transpose(np.nonzero(a)).
Why have a whole function that just transposes the output of nonzero? When would that be so useful that it deserves a separate function?
What about the difference between where(a) and nonzero(a)? Wouldn't they return the exact same result?
nonzero and argwhere both give you information about where in the array the elements are True. where works the same as nonzero in the form you have posted, but it has a second form:
np.where(mask, a, b)
which can be roughly thought of as a numpy "ufunc" version of the conditional expression:
a[i] if mask[i] else b[i]
(with appropriate broadcasting of a and b).
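For instance, a and b can be scalars or arrays of broadcast-compatible shape; a small illustration (values chosen arbitrarily):
>>> x = np.array([-2, -1, 0, 1, 2])
>>> np.where(x > 0, x, 0)   # the scalar 0 is broadcast against x
array([0, 0, 0, 1, 2])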
As far as having both nonzero and argwhere, they're conceptually different. nonzero is structured to return an object which can be used for indexing. This can be lighter-weight than keeping an entire boolean mask around when the nonzero values are sparse:
mask = a != 0        # boolean array, same shape as a
idx = np.nonzero(a)  # tuple of index arrays, one entry per axis
Now you can use that index tuple to index other arrays, etc. However, the tuple-of-arrays form is not very convenient when you want to read off the coordinates of the nonzero elements one at a time. That's where argwhere comes in: it returns one row of coordinates per nonzero element.
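To make the contrast concrete, here is a small example (my own, not from the original answer):
>>> arr = np.array([[0, 2], [3, 0]])
>>> np.nonzero(arr)          # tuple of arrays, one per axis, good for indexing
(array([0, 1]), array([1, 0]))
>>> np.argwhere(arr)         # one (row, col) coordinate per nonzero element
array([[0, 1],
       [1, 0]])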
I can't comment on the usefulness of having a separate convenience function that transposes the result of another, but I can comment on where vs nonzero. In its simplest use case, where is indeed the same as nonzero.
>>> np.where(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
>>> np.nonzero(np.array([[0,4],[4,0]]))
(array([0, 1]), array([1, 0]))
or
>>> a = np.array([[1, 2], [3, 4]])
>>> np.where(a == 3)
(array([1]), array([0]))
>>> np.nonzero(a == 3)
(array([1]), array([0]))
where is different from nonzero in the case when you wish to pick elements from array a if some condition is True and from array b when that condition is False.
>>> a = np.array([[6, 4],[0, -3]])
>>> b = np.array([[100, 200], [300, 400]])
>>> np.where(a > 0, a, b)
array([[  6,   4],
       [300, 400]])
Again, I can't explain why they added the nonzero functionality to where, but this at least explains how the two are different.
EDIT: Fixed the first example... my logic was incorrect previously
I don't have the index values themselves; I just have ones at those same indices in a different array. For example, I have
a = array([3,4,5,6])
b = array([0,1,0,1])
Is there some NumPy method that can quickly look at both of these and extract all values from a whose indices match the indices of all 1's in b? I want it to result in:
array([4,6])
It is probably worth mentioning that my a array is multidimensional, while my b array will always have values of either 0 or 1. I tried using NumPy's logical_and function, but it raises a ValueError when a and b have different dimensions:
a = numpy.array([[3,2], [4,5], [6,1]])
b = numpy.array([0, 1, 0])
print(numpy.logical_and(a, b))
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
Though this method does seem to work if a is flat. Either way, the result of numpy.logical_and() is a boolean array, which I do not want. Is there another way? Again, in the second example above, the desired return would be
array([[4,5]])
Obviously I could write a simple loop to accomplish this; I'm just looking for something a bit more concise.
Edit:
This introduces more constraints: each element of the multidimensional array a may have an arbitrary length that does not match its neighbour's.
You can simply use boolean (mask) indexing.
b == 1
will give you a boolean array:
>>> from numpy import array
>>> a = array([3,4,5,6])
>>> b = array([0,1,0,1])
>>> b==1
array([False, True, False, True], dtype=bool)
which you can pass as an index to a.
>>> a[b==1]
array([4, 6])
Demo for your second example:
>>> a = array([[3,2], [4,5], [6,1]])
>>> b = array([0, 1, 0])
>>> a[b==1]
array([[4, 5]])
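If b is guaranteed to contain only 0s and 1s, casting it to bool directly works the same way (a small variant, not part of the original answer):
>>> a[b.astype(bool)]
array([[4, 5]])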
You could use compress:
>>> a = np.array([3,4,5,6])
>>> b = np.array([0,1,0,1])
>>> a.compress(b)
array([4, 6])
You can provide an axis argument for multi-dimensional cases:
>>> a2 = np.array([[3,2], [4,5], [6,1]])
>>> b2 = np.array([0, 1, 0])
>>> a2.compress(b2, axis=0)
array([[4, 5]])
This method also works when b is shorter than the axis of a you're selecting along: compress simply ignores the trailing elements of a beyond the length of the condition.
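For example, a condition shorter than the axis simply truncates the selection (array values here are made up):
>>> a3 = np.array([10, 20, 30, 40, 50])
>>> a3.compress([0, 1, 1])   # condition shorter than a3; trailing elements ignored
array([20, 30])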
Is there any way to get the indices of several elements in a NumPy array at once?
E.g.
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
I would like to find the index of each element of a in b, namely: [0,1,4].
I find the solution I am using a bit verbose:
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
c = np.zeros_like(a)
for i, aa in np.ndenumerate(a):
    c[i] = np.where(b == aa)[0]
print('c: {0}'.format(c))
Output:
c: [0 1 4]
You could use in1d and nonzero (or where for that matter):
>>> np.in1d(b, a).nonzero()[0]
array([0, 1, 4])
This works fine for your example arrays, but in general the array of returned indices does not honour the order of the values in a. This may be a problem depending on what you want to do next.
In that case, a much better answer is the one #Jaime gives here, using searchsorted:
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([0, 1, 4])
This returns the indices for values as they appear in a. For instance:
a = np.array([1, 2, 4])
b = np.array([4, 2, 3, 1])
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([3, 1, 0]) # the other method would return [0, 1, 3]
This is a simple one-liner using the numpy-indexed package (disclaimer: I am its author):
import numpy_indexed as npi
idx = npi.indices(b, a)
The implementation is fully vectorized, and it gives you control over the handling of missing values. Moreover, it works for nd-arrays as well (for instance, finding the indices of rows of a in b).
Most of the solutions here rely on a linear search. You can use np.argsort and np.searchsorted to speed things up dramatically for large arrays:
sorter = b.argsort()
i = sorter[np.searchsorted(b, a, sorter=sorter)]
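One caveat worth noting (my addition, not from the answer): if a value of a does not occur in b, the lookup either returns the index of some other element or fails with an IndexError (when the insertion point is past the end of b), so it can be worth verifying the result. A minimal sanity check, reusing b, a and i from above:
# every looked-up value should match the corresponding value of a
assert (b[i] == a).all(), "some values of a are not present in b"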
For an order-agnostic solution, you can use np.flatnonzero with np.isin (v 1.13+).
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
res = np.flatnonzero(np.isin(b, a))  # NumPy v1.13+
res = np.flatnonzero(np.in1d(b, a))  # earlier versions
# array([0, 1, 4], dtype=int64)
There are a bunch of approaches for getting the index of multiple items at once mentioned in passing in answers to this related question: Is there a NumPy function to return the first index of something in an array?. The wide variety and creativity of the answers suggests there is no single best practice, so if your code above works and is easy to understand, I'd say keep it.
I personally found this approach to be both performant and easy to read: https://stackoverflow.com/a/23994923/3823857
Adapting it for your example:
import numpy as np
a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
indices = [b_list.index(x) for x in a]
vals_at_indices = b_array[indices]
I personally like adding a little bit of error handling in case a value in a does not exist in b.
import numpy as np
a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
b_set = set(b_list)
indices = [b_list.index(x) if x in b_set else np.nan for x in a]
vals_at_indices = b_array[indices]  # note: this indexing step will raise if any entry is np.nan
For my use case, it's pretty fast, since it relies on parts of Python that are fast (list comprehensions, .index(), sets, numpy indexing). Would still love to see something that's a NumPy equivalent to VLOOKUP, or even a Pandas merge. But this seems to work for now.
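A related sketch (my own, not from the answer above): if the values in b are unique, building a dict of positions once avoids the repeated linear scans done by .index():
import numpy as np

a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)

# map each value of b to its position; assumes the values in b are unique
pos = {v: i for i, v in enumerate(b_list)}
indices = [pos[x] for x in a]        # raises KeyError if a value is missing from b
vals_at_indices = b_array[indices]   # array([1, 2, 4])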
I have 2 arrays, for the sake of simplicity let's say the original one is a random set of numbers:
import numpy as np
a=np.random.rand(N)
Then I sample and shuffle a subset from this array:
b = np.array(...)  # size < N
The shuffling I do does not store the index values, so b is an unordered subset of a.
Is there an easy way to get the original indices of the elements of b in a? Say, if element 2 of b has index 4 in a, I'd like an array recording that assignment for every element of b.
I could use a for loop to check element by element, but perhaps there is a more Pythonic way.
Thanks
I think the most computationally efficient thing to do is to keep track of the indices that associate b with a as b is created.
For example, instead of sampling a, sample the indices of a:
import random

indices = random.sample(range(len(a)), k)  # k < N is the sample size
b = a[indices]
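Equivalently, NumPy can draw the sample positions directly (a sketch; k is assumed to be the sample size, as above):
indices = np.random.choice(len(a), size=k, replace=False)  # distinct positions in a, in random order
b = a[indices]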
On the off chance a happens to be sorted you could do:
>>> from numpy import array
>>> a = array([1, 3, 4, 10, 11])
>>> b = array([11, 1, 4])
>>> a.searchsorted(b)
array([4, 0, 2])
If a is not sorted you're probably best off going with something like #unutbu's answer.