Numpy unique with distinct members - python

I'm trying to find the unique rows of an N x 2 array, irrespective of the order of the two elements within each row. For example, given the array
a = [[1,0],
[2,5],
[0,1],
[1,0]]
it would give me back
a = [[1,0],
[2,5]]
Currently I'm using an approach with numpy.sort()
a = np.unique(np.sort(a, axis=1), axis=0)
which gets the job done but I feel like this is an overly complicated way of achieving my goal and possibly slow especially for larger arrays. Are there better, "sort-avoiding" methods?

Following the comment, I don't think numpy can do this natively.
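What you can do is use Python sets to generate order-independent objects. This relies on the two values in each row being distinct (a row like [1, 1] would collapse to a one-element set) and on equal sets iterating in the same order, which is not guaranteed in general.
numpy.unique doesn't handle sets very well, so an extra conversion step to tuple is needed: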
np.unique(list(map(tuple, map(set, a))), axis=0)
output:
array([[0, 1],
[2, 5]])
If you want to recover the indices:
_, idx = np.unique(list(map(tuple, map(set, a))), axis=0, return_index=True)
a[idx]
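The same index recovery also works with the sort-based approach from the question, which avoids depending on set iteration order. A minimal sketch, assuming `a` is a NumPy array and that keeping the first occurrence of each pair in its original orientation is what is wanted:

```python
import numpy as np

a = np.array([[1, 0],
              [2, 5],
              [0, 1],
              [1, 0]])

# Sort inside each row so [1, 0] and [0, 1] compare equal,
# then ask np.unique for the first index of each distinct row.
_, idx = np.unique(np.sort(a, axis=1), axis=0, return_index=True)

# Index the original array so the rows keep their original orientation;
# sorting idx keeps the rows in order of first appearance.
result = a[np.sort(idx)]
print(result)
```

This still sorts, but only along the short axis of length 2, so the cost is dominated by the np.unique call itself.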

Related

Generation of numpy arrays for permutations with constraints

To make a long story short, I'm trying to generate all the possible permutations of a set of numpy arrays. I have three numbers [j,k,m] and I would like to specify a maximum value for each one [J,K,M]. How would I then get all the combinations of arrays under these values? How could I force the k values to always be even as well? For instance:
So with the max values set to [1,2,2], the permutations would be: [0,0,0], [0,0,1], [0,0,2], [0,2,0], [0,2,1], [0,2,2], [1,0,0], [1,0,1] ...
I realise I don't have any example to code to show but I'm afraid I have literally no idea where to start with this.
From other answers it seems like sympy would be of some use?
I found an answer here that might be of interest to you, and generalised it. You can construct a list of the possible values for each item like so:
X = [[0, 1], [0, 1, 2], [0, 1, 2]]
And then use:
np.array(np.meshgrid(*X)).T.reshape(-1, len(X))
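The output contains all 18 combinations of these values. If you only have the maximum values [J, K, M], you can construct X using X = [range(J+1), range(K+1), range(M+1)]. To keep only even values of k, use range(0, K+1, 2) for the second entry, which leaves 12 combinations for the example above.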
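Putting this together with the even-k constraint from the question, here is a small sketch. The variable name `combos` and the choice of `indexing="ij"` are my own; the latter is used so that the rows come out in the same order as listed in the question.

```python
import numpy as np

J, K, M = 1, 2, 2  # maximum values from the question

# Allowed values per position; k steps by 2 so it is always even.
X = [range(J + 1), range(0, K + 1, 2), range(M + 1)]

# One grid per position, flattened and stacked as columns.
combos = np.array(np.meshgrid(*X, indexing="ij")).reshape(len(X), -1).T
print(combos)
```

For pure Python lists of tuples, `itertools.product(*X)` yields the same combinations in the same order without building the intermediate grids.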

how do you find and save duplicated rows in a numpy array?

I have an array e.g.
Array = [[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5],[1,1,1],[2,2,2]]
And I would like something that would output the following:
Repeated = [[1,1,1],[2,2,2]]
Preserving the number of repeated rows would work too, e.g.
Repeated = [[1,1,1],[1,1,1],[2,2,2],[2,2,2]]
I thought the solution might include numpy.unique, but I can't get it to work. Is there a native Python / numpy function?
Using the axis argument of np.unique along with return_counts=True, which gives us the unique rows and the corresponding count for each of those rows, we can select the rows with counts > 1 and thus get our desired output, like so:
In [688]: a = np.array([[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5],[1,1,1],[2,2,2]])
In [689]: unq, count = np.unique(a, axis=0, return_counts=True)
In [690]: unq[count>1]
Out[690]:
array([[1, 1, 1],
[2, 2, 2]])
If you need to get the indices of the repeated rows:
import numpy as np
a = np.array([[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5],[1,1,1],[2,2,2]])
unq, count = np.unique(a, axis=0, return_counts=True)
repeated_groups = unq[count > 1]
for repeated_group in repeated_groups:
    repeated_idx = np.argwhere(np.all(a == repeated_group, axis=1))
    print(repeated_idx.ravel())
# [0 5]
# [1 6]
If you only need the distinct rows and don't need their order preserved, you could use something like Unique = list(set(map(tuple, Array))). Note that this drops the duplicates rather than selecting them, so the result also contains rows that occur only once. The advantage of this is that you don't need additional dependencies like numpy. Depending on what you're doing next, you could probably get away with Unique = set(map(tuple, Array)) and avoid a type conversion.
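To pick out only the rows that actually repeat while staying dependency-free, one option is to count the row tuples. This is a sketch using `collections.Counter` from the standard library; the names `rows`, `counts` and `repeated` are illustrative.

```python
from collections import Counter

rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4],
        [5, 5, 5], [1, 1, 1], [2, 2, 2]]

# Lists are not hashable, so count tuples instead.
counts = Counter(map(tuple, rows))

# Keep each row that was seen more than once, in order of first appearance.
repeated = [list(row) for row, n in counts.items() if n > 1]
print(repeated)
```

If the number of repetitions matters as well, `counts` already holds it for each row.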

Check how many numpy array within a numpy array are equal to other numpy arrays within another numpy array of different size

My problem
Suppose I have
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
They are two arrays, of different sizes, containing other arrays (the inner arrays have same sizes!)
I want to count how many items of b (i.e. inner arrays) are also in a. Notice that I am not considering their position!
How can I do that?
My Try
count = 0
for bitem in b:
    for aitem in a:
        # aitem == bitem is elementwise, so compare whole arrays instead
        if np.array_equal(aitem, bitem):
            count += 1
Is there a better way? Especially in one line, maybe with a comprehension.
The numpy_indexed package contains efficient (nlogn, generally) and vectorized solutions to these types of problems:
import numpy_indexed as npi
count = len(npi.intersection(a, b))
Note that this is subtly different than your double loop, discarding duplicate entries in a and b for instance. If you want to retain duplicates in b, this would work:
count = npi.in_(b, a).sum()
Duplicate entries in a could also be handled by calling npi.count(a) and factoring in the result of that, but I imagine the distinction probably does not matter to you.
Here is a simple way to do it:
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = np.count_nonzero(
np.any(np.all(a[:, np.newaxis, :] == b[np.newaxis, :, :], axis=-1), axis=0))
print(count)
>>> 2
You can do what you want in a one-liner as follows:
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
Explanation
Here's an explanation of what's happening:
1. Iterate through the two arrays using itertools.product, which creates an iterator over the Cartesian product of the two arrays.
2. Compare the two arrays in each tuple (x, y) coming from step 1 using np.array_equal.
3. Sum the results: True counts as 1 when using sum on a list.
Full example:
import numpy as np
from itertools import product
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
# output: 2
You can convert the rows to dtype np.void and then use np.in1d on the resulting 1d arrays (np.in1d is deprecated since NumPy 2.0 in favor of np.isin):
def void_arr(a):
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
b[np.in1d(void_arr(b), void_arr(a))]
array([[5, 6],
[1, 2]])
If you just want the number of intersections, it's
np.in1d(void_arr(b), void_arr(a)).sum()
2
Note: if there are repeat items in b or a, then np.in1d(void_arr(b), void_arr(a)).sum() likely won't be equal to np.in1d(void_arr(a), void_arr(b)).sum(). I've reversed the order from my original answer to match your question (i.e. how many elements of b are in a?)
For more information, see the third answer here
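For small inputs, the same count can also be obtained without any NumPy tricks by hashing the rows. This is only a sketch of an alternative, not one of the approaches above: it assumes the rows contain hashable scalars, and like the np.in1d version it counts every row of b that appears in a, duplicates in b included.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Build a set of row tuples from a for O(1) membership tests.
a_rows = set(map(tuple, a.tolist()))

# Count the rows of b that occur somewhere in a.
count = sum(tuple(row) in a_rows for row in b.tolist())
print(count)
```

This does one pass over each array instead of comparing every pair of rows.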

Numpy minimum like np.outer()

Maybe I'm just being lazy here, but let's say that I have two arrays, of length n and m, and I'd like a pairwise minimum of all of the elements of the two arrays compared against each other. For example:
a = [1,5,3]
b = [2,4]
cross_min(a,b)
= [[1,1],[2,4],[2,3]]
This is similar to the behavior of np.outer(), except that instead of multiplying the two arrays, it computes the minimum of the two elements.
Is there an operation in numpy that does a similar thing?
I know that I can just run np.minimum() along b and stack the results together. I'm wondering if this is a well-known operation that I just don't know the name of.
You can use np.minimum.outer(a, b)
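A short sketch of that call on the arrays from the question. Every binary NumPy ufunc has an `outer` method, so the same pattern applies to np.maximum, np.add and so on.

```python
import numpy as np

a = np.array([1, 5, 3])
b = np.array([2, 4])

# result[i, j] == min(a[i], b[j])
result = np.minimum.outer(a, b)
print(result)
# [[1 1]
#  [2 4]
#  [2 3]]
```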
You can turn one of the arrays into a 2d array, and then make use of the broadcasting rules and np.minimum:
import numpy as np
a = np.array([1,5,3])
b = np.array([2,4])
np.minimum(a[:,None], b)
#array([[1, 1],
# [2, 4],
# [2, 3]])

best way to create a numpy array from a list and additional individual values

I want to create an array from list entries and some additional individual values.
I am using the following approach which seems clumsy:
x=[1,2,3]
y=some_variable1
z=some_variable2
x.append(y)
x.append(z)
arr = np.array(x)
#print arr --> [1 2 3 some_variable1 some_variable2]
Is there a better solution to the problem?
You can put the extra variables in a list of their own and use list addition to join it to the larger one, like so:
arr = np.array(x + [y, z])
Appending or concatenating lists is fine, and probably fastest.
Concatenating at the array level works as well
In [456]: np.hstack([x,y,z])
Out[456]: array([1, 2, 3, 4, 5])
This is compact, but under the covers it does
np.concatenate([np.array(x),np.array([y]),np.array([z])])
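Another compact spelling is to unpack the list inside a new list literal. In this sketch the values 4 and 5 are placeholders standing in for some_variable1 and some_variable2; unlike the append version in the question, x itself is left unchanged.

```python
import numpy as np

x = [1, 2, 3]
y = 4  # placeholder for some_variable1
z = 5  # placeholder for some_variable2

# Build a new list from x plus the extra values, then convert once.
arr = np.array([*x, y, z])
print(arr)
```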
