Numpy unique elements according to every column

I'm looking for a way to reduce an Nx2 NumPy array to a smaller array in which each number occurs at most once in every column.
For example:
A = np.array([[2, 0],
              [1, 0],
              [0, 1],
              [1, 1],
              [1, 3],
              [1, 2]])
# Would be output as
[[2, 0],
 [0, 1],
 [1, 3]]
Order must be maintained (the first row in which each number occurs must be the one kept).
A pure-Python implementation for this is:
output = []
x_occurrences = set()
y_occurrences = set()
for x, y in A:
    # Keep the row only if neither value has been seen in its column yet
    if x not in x_occurrences and y not in y_occurrences:
        output.append([x, y])
        x_occurrences.add(x)
        y_occurrences.add(y)
But I would like to know whether a more NumPy-centric solution exists. I looked at np.unique(), but the examples I found only seem to work on unique rows or columns.
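Whether a row is kept depends on which earlier rows were kept, so the selection is inherently sequential, and a per-column np.unique(..., return_index=True) does not reproduce it in general. A minimal sketch that packages the loop above as a reusable function returning a NumPy array, generalised to any number of columns (the name unique_per_column is hypothetical):

```python
import numpy as np

def unique_per_column(A):
    """Greedily keep rows whose values are all new in their respective columns.

    A row is kept only if none of its entries already appears in the same
    column of a previously *kept* row, so the result depends on row order.
    """
    seen = [set() for _ in range(A.shape[1])]  # one set of kept values per column
    keep = []
    # Iterating over a plain Python list avoids creating NumPy scalars per element.
    for i, row in enumerate(A.tolist()):
        if all(v not in s for v, s in zip(row, seen)):
            keep.append(i)
            for v, s in zip(row, seen):
                s.add(v)
    return A[keep]  # fancy indexing preserves the original row order
```

Calling unique_per_column(A) on the example array returns the three rows [2, 0], [0, 1] and [1, 3]. Converting with tolist() first is usually faster than looping over NumPy rows directly, but the loop itself remains.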

Related

Efficient way to find all the pairs in a list without using nested loop

Suppose I have a list that stores many 2D points. Some positions in this list store the same point; I consider the indices of two positions that store the same point an index pair. I want to find all such pairs in the list and return them as 2-element index pairs. Some points may be repeated more than twice, but only the first match needs to be treated as a pair.
For example, the list below has 9 points in total, and 5 positions contain repeated points: indices 0, 3, and 7 store the same point ([1, 1]), and indices 1 and 6 store the same point ([2, 3]).
[[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]
So, for this list, I want to return the index pairs (0, 3) and (1, 6). The only solution I can come up with uses nested loops, coded as follows:
import numpy as np
from numpy import linalg

A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)
# I don't want to modify the original list, so I loop through an index list instead.
Index = np.arange(0, A.shape[0], 1, dtype=int)
Pair = []  # stores the index pairs
while Index.size != 0:
    current_index = Index[0]
    pi = A[current_index]
    Index = np.delete(Index, 0, 0)
    for j in range(Index.shape[0]):
        pj = A[Index[j]]
        distance = linalg.norm(pi - pj, ord=2, keepdims=True)
        if distance == 0:
            # Same point: record the pair and drop the matched index
            Pair.append([current_index, Index[j]])
            Index = np.delete(Index, j, 0)
            break
This code works, but its time complexity is O(n^2), where n == len(A). Is there a more efficient way to do this job with a lower time complexity? Thanks for any ideas and help.

You can use a dictionary to keep track of the indices for each point.
Then, you can iterate over the items in the dictionary, printing out the indices corresponding to points that appear more than once. The runtime of this procedure is linear, rather than quadratic, in the number of points in A:
points = {}
for index, point in enumerate(A):
    point_tuple = tuple(point)
    if point_tuple not in points:
        points[point_tuple] = []
    points[point_tuple].append(index)
for point, indices in points.items():
    if len(indices) > 1:
        print(indices)
This prints out:
[0, 3, 7]
[1, 6]
If you only want the first two indices where a point appears, you can use print(indices[:2]) rather than print(indices).

This is similar to the other answer, but since you only want the first two indices when a point occurs more than twice, you can do it in a single pass. Add each index under the appropriate key in a dict and yield the indices when (and only when) exactly two have been collected:
from collections import defaultdict

l = [[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]
def get_pairs(l):
    ind = defaultdict(list)
    for i, pair in enumerate(l):
        t = tuple(pair)
        ind[t].append(i)
        if len(ind[t]) == 2:
            yield list(ind[t])

list(get_pairs(l))
# [[0, 3], [1, 6]]

One pure-NumPy solution without loops (the only one so far) is to call np.unique twice, with a trick that consists of overwriting the first occurrences found by the first search before running the second one. This solution assumes a sentinel value can be set (e.g. -1, the minimum value of an integer type, or NaN), which is generally not a problem (you can use a bigger type if needed).
import numpy as np

A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)
# Copy the array so that A itself is not mutated
tmp = A.copy()
# Find the location of the first occurrence of each unique row
pair1, index1 = np.unique(tmp, return_index=True, axis=0)
# Overwrite the rows found, assuming INT_MIN is never stored in A
INT_MIN = np.iinfo(A.dtype).min
tmp[index1] = INT_MIN
# Find the location of the duplicated rows
pair2, index2 = np.unique(tmp, return_index=True, axis=0)
# Extract the indices of the rows whose values were found by both searches
left = index1[np.isin(pair1, pair2).all(axis=1)]
right = index2[np.isin(pair2, pair1).all(axis=1)]
# Pair each left index with the corresponding right index
result = np.hstack((left[:,None], right[:,None]))
# result = array([[0, 3],
#                 [1, 6]])
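This solution should run in O(n log n) time, as np.unique is based on sorting. Note, however, that np.isin tests individual values rather than whole rows, so the left and right filters can disagree when a row that occurs only once is made of values that also occur in duplicated rows (for example, a single [1, 2] row alongside repeated [1, 1] and [2, 2] rows).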
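For comparison, a sentinel-free sketch of the same sort-based idea (not taken from the answers above; the name first_duplicate_pairs is hypothetical). It labels every row with np.unique(..., return_inverse=True, return_counts=True), stable-sorts the labels so equal rows become contiguous in original index order, and reads off the first two indices of every group that occurs at least twice. Because it groups whole rows rather than testing individual values, it does not rely on np.isin.

```python
import numpy as np

def first_duplicate_pairs(A):
    """Return [i, j] for every repeated row value, where i and j are the
    indices of its first two occurrences; pairs are sorted by i."""
    A = np.asarray(A)
    # inverse[k] is the label of the unique row equal to A[k].
    _, inverse, counts = np.unique(A, axis=0, return_inverse=True,
                                   return_counts=True)
    inverse = inverse.ravel()  # some NumPy 2.0 releases may return a 2-D inverse here
    # A stable sort keeps the original index order inside each group.
    order = np.argsort(inverse, kind="stable")
    # Offset of each group's first member inside `order`.
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    dup = starts[counts >= 2]  # groups with at least two members
    result = np.column_stack((order[dup], order[dup + 1]))
    # Groups come out in sorted-row order; restore input order.
    return result[np.argsort(result[:, 0])]
```

On the question's list this returns [[0, 3], [1, 6]]. It runs in O(n log n) time and needs no sentinel, at the cost of one extra argsort.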

Apply vectorised function to Cartesian product of two ranges in PyTorch

I am kind of new to PyTorch and have a very simple question. Let's say we have a scalar function f():
import numpy as np

def f(x, y):
    return np.cos(x) + y
What I want to do is use the GPU to generate all pairs of data points from two ranges x and y and evaluate f on them. For a simple case, take x = y = [0, 1, 2].
Can I do that without changing the function? If not, how would you change the function?

You can take the Cartesian product of the values before applying your function to their first and second elements:
import torch

x = y = torch.tensor([0, 1, 2])
pairs = torch.cartesian_prod(x, y)
# tensor([[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2], [2, 0], [2, 1], [2, 2]])
x_, y_ = pairs[:, 0], pairs[:, 1]
f(x_, y_)
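The function in the question calls np.cos, which works on NumPy arrays in host memory and therefore cannot be applied directly to CUDA tensors. A minimal sketch, assuming it is acceptable to swap np.cos for torch.cos so that f runs on whichever device holds its inputs; it also shows that broadcasting can evaluate the same grid without materialising the pairs:

```python
import torch

def f(x, y):
    # torch.cos (unlike np.cos) works on tensors on any device, including CUDA.
    return torch.cos(x) + y

# Use the GPU when one is available; fall back to the CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Floating-point values, since cos produces floats anyway.
x = y = torch.tensor([0.0, 1.0, 2.0], device=device)

# Option 1: materialise all (x, y) pairs, as in the answer above.
pairs = torch.cartesian_prod(x, y)        # shape (9, 2)
values = f(pairs[:, 0], pairs[:, 1])      # shape (9,)

# Option 2: broadcasting; grid[i, j] == f(x[i], y[j]).
grid = f(x[:, None], y[None, :])          # shape (3, 3)
```

Since cartesian_prod varies the second element fastest, values.reshape(3, 3) equals grid. Broadcasting avoids building the (9, 2) pairs tensor, which matters when the ranges are large.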

How to generate all arrays whose elements are within two bounds specified by arrays in Numpy?

Suppose that two integer arrays min and max are given and they have equal shape. How can I generate all NumPy arrays ar such that min[indices] <= ar[indices] <= max[indices] for all indices in np.ndindex(shape)? I have looked at the NumPy array creation routines, but none of them seem to do what I want. I also considered starting with the min array and looping over its indices, adding 1 until the corresponding entry in max was reached, but I want to know if NumPy provides methods to do this more cleanly. As an example, if
min = np.array([[0, 1],
                [2, 3]])
max = np.array([[0, 1],
                [3, 4]])
Then I would like to have as an output:
[np.array([[0, 1],
           [2, 3]]),
 np.array([[0, 1],
           [3, 3]]),
 np.array([[0, 1],
           [2, 4]]),
 np.array([[0, 1],
           [3, 4]])]
The reason I want to do this is that I am coding some implementations of tabular RL methods, and the most natural way to specify states and actions is by using arrays. However, I also need to implement Q-tables, and I would ideally have these tables represented by 2D matrices. I want to map the states and actions to indices and use those to access the Q-table. I know I can write a multidimensional Q-table, but then the TF-Agents library I need to use runs some checks on the inputs that throw errors. In any case, by assumption, the tabular RL problems I would solve with this method are relatively small, so the bounds on the matrices will keep the number of resulting arrays small.

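This will work. Since range and itertools.product are lazy, the combinations are generated one at a time; note, however, that the list comprehension below still stores every resulting array, so memory use grows with the number of combinations (iterate over a generator expression instead if you only need one array at a time).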
import numpy as np
from itertools import product

a = np.array([[0, 1],
              [2, 3]])
b = np.array([[0, 1],
              [3, 4]])
l, m = np.shape(a)
ranges = [range(i, j+1) for i, j in zip(a.ravel(), b.ravel())]
ans = [np.array(value).reshape(l, m) for value in product(*ranges)]
print(ans)
Output
[array([[0, 1],
       [2, 3]]), array([[0, 1],
       [2, 4]]), array([[0, 1],
       [3, 3]]), array([[0, 1],
       [3, 4]])]
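If a single NumPy array is more convenient than a list, the Cartesian product can also be built with np.repeat and np.tile, one element at a time. A minimal sketch (the name arrays_between is hypothetical); it still loops over the elements of the bounds, but not over the combinations, and it materialises every combination at once, which is fine only while the count stays small, as the question assumes:

```python
import numpy as np

def arrays_between(lo, hi):
    """Return every integer array ar with lo <= ar <= hi elementwise.

    The result has shape (K, *lo.shape), one candidate array per entry, in the
    same order as the itertools.product answer (last element varies fastest).
    """
    lo, hi = np.asarray(lo), np.asarray(hi)
    combos = np.empty((1, 0), dtype=lo.dtype)  # a single empty prefix to extend
    for a, b in zip(lo.ravel(), hi.ravel()):
        values = np.arange(a, b + 1, dtype=lo.dtype)
        # Pair every existing prefix with every allowed value for this element.
        combos = np.column_stack((np.repeat(combos, len(values), axis=0),
                                  np.tile(values, len(combos))))
    return combos.reshape(-1, *lo.shape)
```

For the example bounds this returns an array of shape (4, 2, 2); list(arrays_between(a, b)) gives the same list of 2x2 arrays as the answer above.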

Custom permutation, Equal distribution of pairs

I've been playing with a strange problem for a few weeks and can't seem to get the results I want.
I'd like to take the permutations of a list of objects to get unique pairs, then order the pairs in a particular way that distributes the objects as evenly as possible at any point in the list. This also means that if an object is at the beginning of a pair, it should also be at the end of a pair soon after. No pairs can repeat. To clarify, here is an example.
list (A,B,C,D) might result in the following:
(A,B)
(C,D)
(B,A)
(D,C)
(A,C)
(B,D)
(C,A)
(D,B)
(A,D)
(B,C)
(D,A)
(C,B)
Notice, every letter is used every 2 pairs, and the letters switch positions frequently.
To get the permutations I used the Python snippet:
perm = list(itertools.permutations(list,2))
which gave me 12 pairs of the letters.
I then manually ordered the pairs so that each letter is chosen as often as possible and switches position as often as possible. At any point in the list, the letters are distributed very evenly. When I work through this problem, I know where in the list I will stop, but I don't know how much that affects the order in which the pairs are placed.
With 4 letters it is easier because (4 letters / 2 letters per pair) = 2.
I would also like this to work with an odd number of letters.
For example:
A,B,C
A,B,C,D,E
etc..
I have tried this a number of ways and tried to recognize patterns, and while there are plenty, there are just many ways to approach this problem. There also may not be a perfect answer.
I have also tried taking a normal permutation of the letters P(4,4) or in the case of 5 letters P(5,5), and I've tried picking certain permutations, combining them, and then chopping them up into pairs. This seems like another route but I can't seem to be able to figure out which pairs to pick unless I manually work through it.
Any help is appreciated! Maybe try to point me in the right direction :)
I ultimately will try to implement this in Python, but I don't necessarily need help writing the code. It's more a question of what the process might be.

What you mean by 'maximize equal distribution' isn't clearly defined. One could consider, for example, the greatest number of pairs between two appearances of a given value. I'll leave it to you to check how the method I give here performs relative to that.
With n objects, we have n*(n-1) pairs. Among these (a, b) pairs:
n have indices such that b = (a+1) modulo n
n have indices such that b = (a+2) modulo n
and so on.
We can generate the first n pairs with a difference of 1, then the n pairs with a difference of 2, and so on.
For each difference, we generate the indices by adding the difference to the index (modulo n). When we get an a that was already used for this difference, we add 1 (modulo n, again). This way, we can generate the n pairs with this difference. As we are 'rolling' through the indices, we are sure that every value will appear regularly.
def pairs(n):
    for diff in range(1, n):
        starts_seen = set()
        index = 0
        for i in range(n):
            pair = [index]
            starts_seen.add(index)
            index = (index+diff) % n
            pair.append(index)
            yield pair
            index = (index+diff) % n
            # Skip to the next start if this one was already used for this diff
            if index in starts_seen:
                index = (index+1) % n

pairs2 = list(pairs(2))
print(pairs2)
# [[0, 1], [1, 0]]
pairs3 = list(pairs(3))
print(pairs3)
# [[0, 1], [2, 0], [1, 2],
#  [0, 2], [1, 0], [2, 1]]
pairs4 = list(pairs(4))
print(pairs4)
# [[0, 1], [2, 3], [1, 2], [3, 0],  <- diff = 1
#  [0, 2], [1, 3], [2, 0], [3, 1],  <- diff = 2
#  [0, 3], [2, 1], [1, 0], [3, 2]]  <- diff = 3
pairs5 = list(pairs(5))
print(pairs5)
# [[0, 1], [2, 3], [4, 0], [1, 2], [3, 4],
#  [0, 2], [4, 1], [3, 0], [2, 4], [1, 3],
#  [0, 3], [1, 4], [2, 0], [3, 1], [4, 2],
#  [0, 4], [3, 2], [1, 0], [4, 3], [2, 1]]

# A check to verify that we get the right number of different pairs:
for n in range(100):
    pairs_n = {tuple(pair) for pair in pairs(n)}
    assert len(pairs_n) == n*(n-1)
print('ok')
# ok
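The answer suggests judging evenness by the greatest number of pairs between two appearances of a given value. A small helper for that metric (the name max_gap is hypothetical, and ignoring the stretch before a value's first appearance and after its last is an assumption) makes it easy to compare the generator's output against a hand-made ordering:

```python
def max_gap(pair_list):
    """Largest number of positions between two consecutive appearances of
    any value in a sequence of pairs; smaller means more evenly spread.

    Stretches before a value's first appearance and after its last are ignored.
    """
    last_seen = {}
    worst = 0
    for pos, pair in enumerate(pair_list):
        for v in set(pair):  # count a value once even if a pair were (v, v)
            if v in last_seen:
                worst = max(worst, pos - last_seen[v])
            last_seen[v] = pos
    return worst
```

Both the pairs(4) sequence printed above and the hand-ordered (A,B,C,D) list from the question score 3 on this metric, so other candidate orderings can be compared against that value.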

re-ordering/unwraping integer pairs efficiently

It is a bit hard to explain what I want to do, so the best way is to show an example I think.
I have a 2D NumPy array which contains a list of integer pairs. Those integers go from 0 to N-1, and each appears in 2 pairs of the list, except for two of them (which will be called the extrema). So the list contains N-1 pairs.
I would like to reorder the array so that it starts with one of the pairs containing an extremum, and each following pair starts with the end of the previous pair (flipping pairs where needed). This is confusing, so look at this example.
I start with this array:
array([[3, 0], [3, 2], [4, 0], [1, 2]])
and I would like to end up with this one:
array([[1, 2], [2, 3], [3, 0], [0, 4]])
Here is an algorithm that works, but it contains a loop over N-1 elements. In this case that is not a problem since N=5, but I would like to handle N=100000 or more, so I would like to do it without an explicit Python loop, and I can't figure out a way to do that.
import numpy as np

co = np.array([[3, 0], [3, 2], [4, 0], [1, 2]])
# The extrema are the values that appear in only one pair
extrema = np.nonzero(np.bincount(co.flat) == 1)[0]
nb_el = co.shape[0]
new_co = np.empty_like(co)
start = extrema[0]
for el_idx in range(nb_el):
    # Find the remaining pair that contains the current end value
    where_idx = np.where(co == start)
    if where_idx[1][0] == 1:
        # The value is in the second column: flip the pair
        new_co[el_idx] = co[where_idx[0][0]][::-1]
    else:
        new_co[el_idx] = co[where_idx[0][0]]
    co = np.delete(co, where_idx[0], 0)
    start = new_co[el_idx][-1]
print(new_co)
# array([[1, 2], [2, 3], [3, 0], [0, 4]])
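Following the chain is inherently sequential, so this sketch (the name unwrap_pairs is hypothetical) does not remove the Python loop. Instead it precomputes each value's neighbours with vectorised NumPy (argsort and searchsorted), so every step of the walk is an O(1) lookup rather than a full np.where scan plus np.delete, bringing the total from O(N^2) down to O(N log N). It assumes the pairs form a single path over the labels 0..N-1, as described in the question:

```python
import numpy as np

def unwrap_pairs(co):
    """Reorder the pairs of a single path so each pair starts where the previous one ends."""
    co = np.asarray(co)
    n_nodes = int(co.max()) + 1
    # View every pair from both ends: (node, neighbour).
    src = np.concatenate((co[:, 0], co[:, 1]))
    dst = np.concatenate((co[:, 1], co[:, 0]))
    # Group neighbours by node; on a path each node has at most two of them.
    order = np.argsort(src, kind="stable")
    nbrs = dst[order].tolist()
    first = np.searchsorted(src[order], np.arange(n_nodes)).tolist()
    # Start from the smallest extremum (degree 1), as the loop above does.
    deg = np.bincount(src, minlength=n_nodes)
    cur, prev = int(np.flatnonzero(deg == 1)[0]), -1
    rows = []
    # Walk the chain, always moving to the neighbour we did not come from.
    for _ in range(co.shape[0]):
        i = first[cur]
        nxt = nbrs[i] if nbrs[i] != prev else nbrs[i + 1]
        rows.append((cur, nxt))
        prev, cur = cur, nxt
    return np.array(rows, dtype=co.dtype)
```

On the example array this returns [[1, 2], [2, 3], [3, 0], [0, 4]], the same result as the loop above. For very large N, a compiled loop (for example via Numba or Cython) over the same neighbour arrays would be the next step.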
