Related
Suppose I have a list that stores many 2D points. In this list, some positions store the same point; consider the indices of positions that store the same point as an index pair. I want to find all such pairs in the list and return them as index pairs. It is possible that some points in the list are repeated more than two times, but only the first match needs to be treated as a pair.
For example, in the list below I have 9 points in total and there are 5 positions containing repeated points. The indices 0, 3, and 7 store the same point ([1, 1]), and the indices 1 and 6 store the same point ([2, 3]).
[[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]
So, for this list, I want to return the index pairs (index 0, index 3) and (index 1, index 6). The only solution I can come up with uses nested loops, which I coded up as follows:
import numpy as np
from numpy import linalg

A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)
# I don't want to modify the original array, so loop through an index list instead.
Index = np.arange(0, A.shape[0], 1, dtype=int)
Pair = []  # stores the index pairs
while Index.size != 0:
    current_index = Index[0]
    pi = A[current_index]
    Index = np.delete(Index, 0, 0)
    for j in range(Index.shape[0]):
        pj = A[Index[j]]
        distance = linalg.norm(pi - pj, ord=2, keepdims=True)
        if distance == 0:
            Pair.append([current_index, Index[j]])
            Index = np.delete(Index, j, 0)
            break
This code works for me, but the time complexity is O(n^2), where n == len(A). I'm wondering if there is a more efficient way to do this job with a lower time complexity. Thanks for any ideas and help.
You can use a dictionary to keep track of the indices for each point.
Then, you can iterate over the items in the dictionary, printing out the indices corresponding to points that appear more than once. The runtime of this procedure is linear, rather than quadratic, in the number of points in A:
points = {}
for index, point in enumerate(A):
    point_tuple = tuple(point)
    if point_tuple not in points:
        points[point_tuple] = []
    points[point_tuple].append(index)

for point, indices in points.items():
    if len(indices) > 1:
        print(indices)
This prints out:
[0, 3, 7]
[1, 6]
If you only want the first two indices where a point appears, you can use print(indices[:2]) rather than print(indices).
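If you would rather collect the result in the pair format the question asks for instead of printing it, here is a minimal follow-up sketch (relying on dicts preserving insertion order in Python 3.7+):

pairs = [indices[:2] for indices in points.values() if len(indices) > 1]
print(pairs)
# [[0, 3], [1, 6]]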
This is similar to the other answer, but since you only want the first two indices in the event of multiple duplicates, you can do it in a single iteration. Add the indices under the appropriate key in a dict and yield them if (and only if) there are exactly two:
from collections import defaultdict
l = [[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]]
def get_pairs(l):
    ind = defaultdict(list)
    for i, pair in enumerate(l):
        t = tuple(pair)
        ind[t].append(i)
        if len(ind[t]) == 2:
            yield list(ind[t])
list(get_pairs(l))
# [[0, 3], [1, 6]]
One pure-NumPy solution without loops (the only one so far) is to use np.unique twice, with a trick that consists in removing the items found by the first search before the second one. This solution assumes a sentinel value can be set (e.g. -1, the minimum value of the integer type, or NaN), which is generally not a problem (you can use a bigger type if needed).
import numpy as np

A = np.array([[1, 1], [2, 3], [1, 4], [1, 1], [10, 3], [5, 2], [2, 3], [1, 1], [3, 4]], dtype=int)
# Copy the array not to mutate it
tmp = A.copy()
# Find the location of unique values
pair1, index1 = np.unique(tmp, return_index=True, axis=0)
# Discard the element found assuming -1 is never stored in A
INT_MIN = np.iinfo(A.dtype).min
tmp[index1] = INT_MIN
# Find the location of duplicated values
pair2, index2 = np.unique(tmp, return_index=True, axis=0)
# Extract the indices that share the same pair of values found
left = index1[np.isin(pair1, pair2).all(axis=1)]
right = index2[np.isin(pair2, pair1).all(axis=1)]
# Combine the each left index with each right index
result = np.hstack((left[:,None], right[:,None]))
# result = array([[0, 3],
# [1, 6]])
This solution should run in O(n log n) time, as np.unique sorts the array internally (quicksort by default).
Say I have the following numpy array:
arr = numpy.array([[0,0], [1, 0], [2, 0], [3, 0]])
How do I add a single sub-array to each of the four sub-arrays? (Say I want to add [2, 1] to each of them; then the output should be [[2, 1], [3, 1], [4, 1], [5, 1]].)
I know that if it's a 1D array you can just write something like arr + 1 and it will add 1 to each element in arr, but what about this case? I have yet to find the relevant information in the documentation.
arr = np.array([np.add(item, [2, 1]) for item in arr])
This should give you the result
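For what it's worth, here is a minimal sketch of the same operation using NumPy broadcasting, which avoids the Python-level loop entirely (the row [2, 1] is added to every row of arr):

import numpy as np

arr = np.array([[0, 0], [1, 0], [2, 0], [3, 0]])
# Broadcasting stretches the 1D array [2, 1] across every row of arr.
result = arr + np.array([2, 1])
print(result)
# [[2 1]
#  [3 1]
#  [4 1]
#  [5 1]]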
I've been playing with a strange problem for a few weeks and can't seem to get the results I want.
I'd like to take a permutation of a list of objects to get unique pairs, then order them in a particular way to maximize the equal distribution of the objects at any point in the list. This also means that if an object is at the beginning of a pair, it should also be at the end of a pair soon after. No pairs can repeat. To clarify, here is an example.
list (A,B,C,D) might result in the following:
(A,B)
(C,D)
(B,A)
(D,C)
(A,C)
(B,D)
(C,A)
(D,B)
(A,D)
(B,C)
(D,A)
(C,B)
Notice, every letter is used every 2 pairs, and the letters switch positions frequently.
To get the permutation I used the python script:
perm = list(itertools.permutations(list,2))
which gave me 12 pairs of the letters.
I then manually ordered the pairs so that each letter is chosen as often as possible and switches position as often as possible. At any point in the list the letters will be distributed very equally. When I go through the process of figuring out this problem I know where in the list I will stop, but I don't know how much that affects the order the pairs are placed in.
With 4 letters it can be done easier because (4 letters / 2 pairs) = 2.
I would also like this to work when the number of letters is odd.
For example:
A, B, C
A,B,C,D,E
etc..
I have tried this a number of ways and tried to recognize patterns, and while there are plenty, there are just too many ways to approach this problem. There also may not be a perfect answer.
I have also tried taking a normal permutation of the letters, P(4,4) or in the case of 5 letters P(5,5), picking certain permutations, combining them, and then chopping them up into pairs. This seems like another route, but I can't figure out which permutations to pick unless I manually work through it.
Any help is appreciated! Maybe try to point me in the right direction :)
I will ultimately try to implement this in Python, but I don't necessarily need help writing the code. It's more a question of what the process might be.
What you mean by 'maximize equal distribution' isn't clearly defined. One could maybe consider the greatest number of pairs between two appearances of a given value. I'll leave it to you to check how the method given here performs relative to that.
With n objects, we have n*(n-1) pairs. In these (a, b) pairs:
n have indices such that b = (a+1) modulo n
n have indices such that b = (a+2) modulo n
and so on.
We can generate the first n pairs with a difference of 1, then the n pairs with a difference of 2...
For each difference, we generate the indices by adding the difference to the index (modulo n). When we get an a that was already used for this difference, we add 1
(modulo n, again). This way, we can generate the n pairs with this difference. As we are 'rolling' through the indices, we are sure that every value will appear regularly.
def pairs(n):
    for diff in range(1, n):
        starts_seen = set()
        index = 0
        for i in range(n):
            pair = [index]
            starts_seen.add(index)
            index = (index+diff) % n
            pair.append(index)
            yield pair
            index = (index+diff) % n
            if index in starts_seen:
                index = (index+1) % n
pairs2 = list(pair for pair in pairs(2))
print(pairs2)
# [[0, 1], [1, 0]]
pairs3 = list(pair for pair in pairs(3))
print(pairs3)
# [[0, 1], [2, 0], [1, 2],
# [0, 2], [1, 0], [2, 1]]
pairs4 = list(pair for pair in pairs(4))
print(pairs4)
# [[0, 1], [2, 3], [1, 2], [3, 0], <- diff = 1
# [0, 2], [1, 3], [2, 0], [3, 1], <- diff = 2
# [0, 3], [2, 1], [1, 0], [3, 2]] <- diff = 3
pairs5 = list(pair for pair in pairs(5))
print(pairs5)
# [[0, 1], [2, 3], [4, 0], [1, 2], [3, 4],
# [0, 2], [4, 1], [3, 0], [2, 4], [1, 3],
# [0, 3], [1, 4], [2, 0], [3, 1], [4, 2],
# [0, 4], [3, 2], [1, 0], [4, 3], [2, 1]]
# A check to verify that we get the right number of different pairs:
for n in range(100):
    pairs_n = set([tuple(pair) for pair in pairs(n)])
    assert len(pairs_n) == n*(n-1)
print('ok')
# ok
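To relate this back to the letters in the question, here is a small usage sketch mapping the generated index pairs onto ['A', 'B', 'C', 'D'] (the ordering differs from the hand-made example above, but every letter still reappears at regular intervals):

letters = ['A', 'B', 'C', 'D']
letter_pairs = [(letters[i], letters[j]) for i, j in pairs(4)]
print(letter_pairs)
# [('A', 'B'), ('C', 'D'), ('B', 'C'), ('D', 'A'),
#  ('A', 'C'), ('B', 'D'), ('C', 'A'), ('D', 'B'),
#  ('A', 'D'), ('C', 'B'), ('B', 'A'), ('D', 'C')]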
In the case of the set np.array([1, 2, 3]), there are only 9 possible combinations/sequences of its constituent elements: [1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3].
If we have the following array:
np.array([[1, 1],
          [1, 2],
          [1, 3],
          [2, 2],
          [2, 3],
          [3, 1],
          [3, 2]])
What is the best way, with NumPy/SciPy, to determine that [2, 1] and [3, 3] are missing? Put another way, how do we find the inverse list of sequences (when we know all of the possible element values)? Manually doing this with a couple of for loops is easy to figure out, but that would negate whatever speed gains we get from using NumPy over native Python (especially with larger datasets).
You can generate a list of all possible pairs using itertools.product and collect all of those that are not in your array:
from itertools import product
pairs = [ [1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 1], [3, 2] ]
allPairs = list(map(list, product([1, 2, 3], repeat=2)))
missingPairs = [ pair for pair in allPairs if pair not in pairs ]
print(missingPairs)
Result:
[[2, 1], [3, 3]]
Note that map(list, ...) is needed to convert the tuples returned by product into lists so that they can be compared against your list of lists. This step could be dropped if your input were already a list of tuples.
This is one way using itertools.product and set.
The trick here is to note that sets may only contain immutable types such as tuples.
import numpy as np
from itertools import product
x = np.array([1, 2, 3])
y = np.array([[1, 1], [1, 2], [1, 3], [2, 2],
[2, 3], [3, 1], [3, 2]])
set(product(x, repeat=2)) - set(map(tuple, y))
{(2, 1), (3, 3)}
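If you need the result back as a NumPy array rather than a set of tuples, a minimal follow-up sketch continuing the snippet above (sorted only to make the row order deterministic):

np.array(sorted(set(product(x, repeat=2)) - set(map(tuple, y))))
# array([[2, 1],
#        [3, 3]])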
If you want to stay in NumPy instead of going back to raw Python sets, you can do it using void views (based on @Jaime's answer here) and NumPy's built-in set methods like in1d.
import numpy as np

def vview(a):
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
x = np.array([1, 2, 3])
y = np.array([[1, 1], [1, 2], [1, 3], [2, 2],
[2, 3], [3, 1], [3, 2]])
xx = np.array([i.ravel() for i in np.meshgrid(x, x)]).T
xx[~np.in1d(vview(xx), vview(y))]
array([[2, 1],
[3, 3]])
import itertools
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 1],
[1, 2],
[1, 3],
[2, 2],
[2, 3],
[3, 1],
[3, 2]])
c = np.array(list(itertools.product(a, repeat=2)))
If you want to use numpy methods, try this...
Compare the array being tested against the product using broadcasting
d = b == c[:,None,:]
#d.shape is (9,7,2)
Check if both elements of a pair matched
e = np.all(d, -1)
#e.shape is (9,7)
Check if any of the test items match an item of the product.
f = np.any(e, 1)
#f.shape is (9,)
Use f as a boolean index into the product to see what is missing.
>>> print(c[np.logical_not(f)])
[[2 1]
[3 3]]
>>>
Every combination corresponds to a number in the range 0..L^2-1, where L = len(array). For example, [2, 2] => 3*(2-1) + (2-1) = 4. The -1 offsets arise because the elements start from 1, not from zero. Such a mapping can be considered a natural perfect hash for this data type.
If operations on integer sets in NumPy are faster than operations on pairs - for example, an integer set of known size can be represented by a bit sequence - then it is worth traversing the pair list, marking the corresponding bits in the integer set, then looking for the unset ones and retrieving the corresponding pairs.
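Here is a minimal sketch of that hashing idea, assuming (as in the example) that the elements range from 1 to L, and using a plain boolean array in place of an actual packed bit sequence; the mapping, the marking, and the decoding back to pairs are all vectorised:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 1], [1, 2], [1, 3], [2, 2],
              [2, 3], [3, 1], [3, 2]])

L = len(a)
# Map each pair (x, y) with 1 <= x, y <= L to a unique id in 0..L*L-1.
ids = (b[:, 0] - 1) * L + (b[:, 1] - 1)
# Mark the ids that are present (the 'integer set' as a boolean array).
present = np.zeros(L * L, dtype=bool)
present[ids] = True
# Decode the unset positions back into pairs.
missing_ids = np.flatnonzero(~present)
missing = np.column_stack((missing_ids // L + 1, missing_ids % L + 1))
print(missing)
# [[2 1]
#  [3 3]]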
I'm looking for a way to reduce an Nx2 numpy matrix to a smaller matrix where each number occurs in every column only once.
For example:
A = np.array([[2, 0],
[1, 0],
[0, 1],
[1, 1],
[1, 3],
[1, 2]])
# Would be output as
[[2, 0],
[0, 1],
[1, 3]]
Order must be maintained (the first case where each number occurs must be the one used).
A python implementation for this is:
output = []
x_occurances = set()
y_occurances = set()
for x, y in A:
    if x not in x_occurances and y not in y_occurances:
        output.append([x, y])
        x_occurances.add(x)
        y_occurances.add(y)
But I would like to know if a more numpy-centric solution exists. I was looking at np.unique(); however, the examples I find only seem to work on unique rows or columns.