Using Numpy to generate random combinations of two arrays without repetition - python

Given two arrays, for example [0,0,0] and [1,1,1], it is already clear (see here) how to generate all the combinations, i.e., [[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]]. itertools (combinations or product) and numpy.meshgrid are the most common ways as far as I know.
However, I couldn't find any discussion of how to generate these combinations randomly, without repetition.
An easy solution could be to generate all the combinations and then choose some of them randomly. For example:
# Three random combinations of [0,0,0] and [1,1,1]
comb = np.array(np.meshgrid([0,1],[0,1],[0,1])).T.reshape(-1,3)
result = comb[np.random.choice(len(comb),3,replace=False),:]
However, this is infeasible when the number of combinations is too big.
Is there a way to generate random combinations without replacement in Python (possibly with Numpy) without generating all the combinations?
EDIT: You can notice in the accepted answer that we also got for free a technique to generate random binary vectors without repetitions, which is just a single line (described in the Bonus Section).

Here's a vectorized approach without generating all combinations -
def unique_combs(A, N):
    # A : 2D Input array with each row representing one group
    # N : No. of combinations needed
    m,n = A.shape
    dec_idx = np.random.choice(2**m,N,replace=False)
    idx = ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
    return A[np.arange(m),idx]
Please note that this assumes every group has the same number of elements, and the binary encoding used above specifically relies on two elements per group.
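One caveat, offered as a hedged aside: as far as I know, the legacy np.random.choice(2**m, N, replace=False) builds a permutation of the whole range internally, so memory still grows with 2**m even though the combinations themselves are never materialized. A minimal sketch that sidesteps this, assuming two elements per group and m small enough for the indices to fit in int64, draws the indices with Python's random.sample over a range object instead. The name unique_combs_sampled and the seed parameter are illustrative and not part of the original answer.

```python
import random
import numpy as np

def unique_combs_sampled(A, N, seed=None):
    # A : 2D array, one group per row, exactly two elements per group (assumption)
    # N : number of unique combinations wanted
    A = np.asarray(A)
    m, n = A.shape
    if n != 2:
        raise ValueError("this sketch assumes two elements per group")
    if m > 62:
        raise ValueError("indices would not fit in int64")
    rng = random.Random(seed)
    # random.sample on a range object does not materialize the range
    dec_idx = np.array(rng.sample(range(2**m), N), dtype=np.int64)
    # bit j of each index selects column 0 or 1 in group j
    idx = (dec_idx[:, None] >> np.arange(m)) & 1
    return A[np.arange(m), idx]

A = np.array([[4, 2], [3, 5], [8, 6]])
print(unique_combs_sampled(A, 4))
```

Because the sampled indices are distinct, the resulting rows are distinct as long as the two elements within each group differ.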
Explanation
To give it a bit of explanation, let's say the groups are stored in a 2D array -
In [44]: A
Out[44]:
array([[4, 2], <-- group #1
[3, 5], <-- group #2
[8, 6]]) <-- group #3
We have two elems per group. Let's say we are looking for 4 unique group combinations : N = 4. Selecting one of the two numbers from each of those three groups gives a total of 2**3 = 8 unique combinations.
Let's generate N unique numbers in that interval of 8 using np.random.choice(8, N, replace=False) -
In [86]: dec_idx = np.random.choice(8,N,replace=False)
In [87]: dec_idx
Out[87]: array([2, 3, 7, 0])
Then, convert those to binary equivalents as later on we need those to index into each row of A -
In [88]: idx = ((dec_idx[:,None] & (1 << np.arange(3)))!=0).astype(int)
In [89]: idx
Out[89]:
array([[0, 1, 0],
[1, 1, 0],
[1, 1, 1],
[0, 0, 0]])
Finally, with fancy-indexing, we get those elems off A -
In [90]: A[np.arange(3),idx]
Out[90]:
array([[4, 5, 8],
[2, 5, 8],
[2, 5, 6],
[4, 3, 8]])
Sample run
In [80]: # Original code that generates all combs
...: comb = np.array(np.meshgrid([4,2],[3,5],[8,6])).T.reshape(-1,3)
...: result = comb[np.random.choice(len(comb),4,replace=False),:]
...:
In [81]: A = np.array([[4,2],[3,5],[8,6]]) # 2D array of groups
In [82]: unique_combs(A, 3) # 3 combinations
Out[82]:
array([[2, 3, 8],
[4, 3, 6],
[2, 3, 6]])
In [83]: unique_combs(A, 4) # 4 combinations
Out[83]:
array([[2, 3, 8],
[4, 3, 6],
[2, 5, 6],
[4, 5, 8]])
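The same idea extends beyond two elements per group if the random index is read as a base-n number rather than a binary one. The following is a minimal sketch under that assumption (all groups share the same size n, and n**m fits in int64); unique_combs_general is a hypothetical name, and it uses the newer np.random.default_rng generator rather than the np.random.choice call from the answer.

```python
import numpy as np

def unique_combs_general(A, N, seed=None):
    # A : 2D array, one group per row, n elements per group
    # N : number of unique combinations wanted
    A = np.asarray(A)
    m, n = A.shape
    total = n ** m  # Python int, so no silent overflow here
    if total > np.iinfo(np.int64).max:
        raise ValueError("n**m does not fit in int64")
    rng = np.random.default_rng(seed)
    dec_idx = rng.choice(total, size=N, replace=False)
    # digit j of each index in base n picks the element from group j
    weights = n ** np.arange(m, dtype=np.int64)
    idx = (dec_idx[:, None] // weights) % n
    return A[np.arange(m), idx]

A = np.arange(12).reshape(4, 3)  # 4 groups, 3 elements each
print(unique_combs_general(A, 5, seed=0))
```

With n = 2 the digit extraction reduces to the bitwise step used in unique_combs.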
Bonus section
Explanation on ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int) :
That step is basically converting decimal numbers to binary equivalents. Let's break it down to smaller steps for a closer look.
1) Input array of decimal numbers -
In [18]: dec_idx
Out[18]: array([7, 6, 4, 0])
2) Convert to 2D upon inserting new axis with None/np.newaxis -
In [19]: dec_idx[:,None]
Out[19]:
array([[7],
[6],
[4],
[0]])
3) Let's assume m = 3, i.e. we want to convert to 3 binary digit number equivalents.
We create 2-powered range array with bit-shift operation -
In [16]: (1 << np.arange(m))
Out[16]: array([1, 2, 4])
Alternatively, an explicit way would be -
In [20]: 2**np.arange(m)
Out[20]: array([1, 2, 4])
4) Now, the crux of the cryptic step there. We perform broadcasted bitwise AND-ing between 2D dec_idx and 2-powered range array.
Consider the first element from dec_idx : 7. We are performing bitwise AND-ing of 7 against 1, 2, 4. Think of it as a filtering process, as we filter 7 at each binary interval of 1, 2, 4 as they represent the three binary digits. Similarly, we do this for all elems off dec_idx in a vectorized manner with broadcasting.
Thus, we would get the bit-wise AND-ing results like so -
In [43]: (dec_idx[:,None] & (1 << np.arange(m)))
Out[43]:
array([[1, 2, 4],
[0, 2, 4],
[0, 0, 4],
[0, 0, 0]])
The filtered numbers thus obtained are either 0 or the 2-powered range array numbers themselves. So, to have the binary equivalents, we just need to consider all non-zeros as 1s and zeros as 0s.
In [44]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0)
Out[44]:
array([[ True, True, True],
[False, True, True],
[False, False, True],
[False, False, False]], dtype=bool)
In [45]: ((dec_idx[:,None] & (1 << np.arange(m)))!=0).astype(int)
Out[45]:
array([[1, 1, 1],
[0, 1, 1],
[0, 0, 1],
[0, 0, 0]])
Thus, we have the binary numbers with MSBs to the right.
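As a quick sanity check of the conversion walked through above, the bits can be compared against np.binary_repr, reversing each string because the bitwise step puts the least significant bit first. This is only a small sketch; to_bits_lsb_first is an illustrative helper name, not something from the answer.

```python
import numpy as np

def to_bits_lsb_first(dec_idx, m):
    # Same expression as in the answer, wrapped in a helper
    dec_idx = np.asarray(dec_idx)
    return ((dec_idx[:, None] & (1 << np.arange(m))) != 0).astype(int)

values = [7, 6, 4, 0]
bits = to_bits_lsb_first(values, 3)

# Reference: fixed-width binary strings, reversed so the LSB comes first
expected = np.array([[int(c) for c in np.binary_repr(v, width=3)[::-1]]
                     for v in values])

print(bits)
print(np.array_equal(bits, expected))
```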

Related

Delete numpy axis 1 based on condition

I need to remove values from a np axis based on a condition.
For example, I would want to remove [:,2] (the value at index 2 on axis 1) if the first value == 0, else I would want to remove [:,3].
Input:
[[0,1,2,3],[0,2,3,4],[1,3,4,5]]
Output:
[[0,1,3],[0,2,4],[1,3,4]]
So now my output has one less value on the 1st axis, depending on if it met the condition or not.
I know I can isolate and manipulate this based on
array[np.where(array[:,0] == 0)] but then I would have to deal with each condition separately, and it's very important for me to preserve the order of this array.
I am dealing with 3D arrays & am hoping to be able to calculate all this simultaneously while preserving the order.
Any help is much appreciated!
A possible solution:
a = np.array([[0,1,2,3],[0,2,3,4],[1,3,4,5]])
b = np.arange(a.shape[1])
np.apply_along_axis(
    lambda x: x[np.where(x[0] == 0, np.delete(b,2), np.delete(b,3))], 1, a)
Output:
array([[0, 1, 3],
[0, 2, 4],
[1, 3, 4]])
Since you are starting and ending with a list, a straightforward iteration is a good solution:
In [261]: alist =[[0,1,2,3],[0,2,3,4],[1,3,4,5]]
In [262]: for row in alist:
     ...:     if row[0]==0: row.pop(2)
     ...:     else: row.pop(3)
     ...:
In [263]: alist
Out[263]: [[0, 1, 3], [0, 2, 4], [1, 3, 4]]
A possible array approach:
In [273]: arr = np.array([[0,1,2,3],[0,2,3,4],[1,3,4,5]])
In [274]: mask = np.ones(arr.shape, bool)
In [275]: mask[np.arange(3),np.where(arr[:,0]==0,2,3)]=False
In [276]: mask
Out[276]:
array([[ True, True, False, True],
[ True, True, False, True],
[ True, True, True, False]])
arr[mask] will be 1d, but since we are deleting the same number of elements each row, we can reshape it:
In [277]: arr[mask].reshape(arr.shape[0],-1)
Out[277]:
array([[0, 1, 3],
[0, 2, 4],
[1, 3, 4]])
I expect the list approach will be faster for small cases, but the array should scale better. I don't know where the trade off is.
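Since the question mentions 3D arrays, here is a hedged sketch of how the mask idea might generalize to any number of leading dimensions. It assumes the condition is always tested on index 0 of the last axis and that exactly one element is dropped per row; drop_one_per_row and its parameter names are hypothetical.

```python
import numpy as np

def drop_one_per_row(arr, col_if_zero=2, col_otherwise=3):
    # For every 1D slice along the last axis, drop one element:
    # col_if_zero when the slice starts with 0, col_otherwise if not.
    arr = np.asarray(arr)
    drop = np.where(arr[..., 0] == 0, col_if_zero, col_otherwise)
    mask = np.ones(arr.shape, dtype=bool)
    # Mark the chosen position in each slice as False
    np.put_along_axis(mask, drop[..., None], False, axis=-1)
    # Boolean indexing flattens in row-major order, so order is preserved
    return arr[mask].reshape(arr.shape[:-1] + (arr.shape[-1] - 1,))

arr2d = np.array([[0, 1, 2, 3], [0, 2, 3, 4], [1, 3, 4, 5]])
arr3d = np.stack([arr2d, arr2d[::-1]])
print(drop_one_per_row(arr2d))
print(drop_one_per_row(arr3d).shape)
```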

selective row sum matrix in numpy

Is there any efficient numpy way to do the following:
Assume I have some matrix M of size R X C. Now assume I have another matrix
E which is of shape R X a (where a is just some constant a < C), which contains row indices of
M (and -1 for padding, i.e., every element of E is in {-1, 0, .., R-1}). For example,
M=array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
E = array([[ 0, 1],
[ 2, -1],
[-1, 0]])
Now, given those matrices, I want to generate a third matrix P, where the i'th row of P will
contain the sum of the following rows of M : E[i,:]. In the example, P will be,
P[0,:] = M[0,:] + M[1,:]
P[1,:] = M[2,:]
P[2,:] = M[0,:]
Yes, doing it with a loop is pretty straightforward and easy. I was wondering if there is
any fancy numpy way to make it more efficient (assuming that I want to do it with large matrices,
e.g., 200 X 200).
Thanks!
One way would be to index into the original array and sum, then subtract the contributions of the -1 entries, which index the last row of M -
out = M[E].sum(1) - M[-1]*(E==-1).sum(1)[:,None]
Another way would be pad zeros at the end of M, so that those -1 would index into those zeros and hence have no effect on the final sum after indexing -
M1 = np.vstack((M, np.zeros((1,M.shape[1]), dtype=M.dtype)))
out = M1[E].sum(1)
If there is at most one -1 per row in E, we can optimize further -
out = M[E].sum(1)
m = (E==-1).any(1)
out[m] -= M[-1]
Another based on tensor-multiplication -
np.einsum('ij,kli->kj',M, (E[...,None]==np.arange(M.shape[0])))
You could index M with E, and np.sum where the actual indices in E are greater or equal to 0. For that we have the where parameter:
np.sum(M[E], where=(E>=0)[...,None], axis=1)
array([[5, 7, 9],
[7, 8, 9],
[1, 2, 3]])
Where we have that:
M[E]
array([[[1, 2, 3],
[4, 5, 6]],
[[7, 8, 9],
[7, 8, 9]],
[[7, 8, 9],
[1, 2, 3]]])
is summed only over the rows where this mask is True:
(E>=0)[...,None]
array([[[ True],
[ True]],
[[ True],
[False]],
[[False],
[ True]]])
Probably not the fastest but maybe educational: The operation you are describing can be thought of as matrix multiplication with a certain adjacency matrix:
from scipy import sparse
# construct adjacency matrix
indices = E[E!=-1]
indptr = np.concatenate([[0],np.count_nonzero(E!=-1,axis=1).cumsum()])
data = np.ones_like(indices)
aux = sparse.csr_matrix((data,indices,indptr), shape=(E.shape[0], M.shape[0]))
# multiply
aux*M
# array([[5, 7, 9],
# [7, 8, 9],
# [1, 2, 3]], dtype=int64)
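When comparing vectorized variants like the ones above, a plain loop is a handy reference. The sketch below pairs such a loop with the zero-padding approach and checks them against each other; row_sums_loop and row_sums_padded are illustrative names, and the only assumption is that -1 is the sole padding value in E.

```python
import numpy as np

def row_sums_loop(M, E):
    # Reference implementation: add up the rows of M listed in each row of E
    out = np.zeros((E.shape[0], M.shape[1]), dtype=M.dtype)
    for i, row in enumerate(E):
        for r in row:
            if r >= 0:  # skip the -1 padding
                out[i] += M[r]
    return out

def row_sums_padded(M, E):
    # Append a row of zeros so that index -1 picks up zeros
    M1 = np.vstack((M, np.zeros((1, M.shape[1]), dtype=M.dtype)))
    return M1[E].sum(1)

M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
E = np.array([[0, 1], [2, -1], [-1, 0]])
print(row_sums_loop(M, E))
print(np.array_equal(row_sums_loop(M, E), row_sums_padded(M, E)))
```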

Numpy sort two arrays together with one array as the keys in axis 1 [duplicate]

I'm trying to get the indices to sort a multidimensional array by the last axis, e.g.
>>> a = np.array([[3,1,2],[8,9,2]])
And I'd like indices i such that,
>>> a[i]
array([[1, 2, 3],
[2, 8, 9]])
Based on the documentation of numpy.argsort I thought it should do this, but I'm getting the error:
>>> a[np.argsort(a)]
IndexError: index 2 is out of bounds for axis 0 with size 2
Edit: I need to rearrange other arrays of the same shape (e.g. an array b such that a.shape == b.shape) in the same way... so that
>>> b = np.array([[0,5,4],[3,9,1]])
>>> b[i]
array([[5,4,0],
[1,3,9]])
Solution:
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You got it right, though I wouldn't describe it as cheating the indexing.
Maybe this will help make it clearer:
In [544]: i=np.argsort(a,axis=1)
In [545]: i
Out[545]:
array([[1, 2, 0],
[2, 0, 1]])
i is the order that we want, for each row. That is:
In [546]: a[0, i[0,:]]
Out[546]: array([1, 2, 3])
In [547]: a[1, i[1,:]]
Out[547]: array([2, 8, 9])
To do both indexing steps at once, we have to use a 'column' index for the 1st dimension.
In [548]: a[[[0],[1]],i]
Out[548]:
array([[1, 2, 3],
[2, 8, 9]])
Another array that could be paired with i is:
In [560]: j=np.array([[0,0,0],[1,1,1]])
In [561]: j
Out[561]:
array([[0, 0, 0],
[1, 1, 1]])
In [562]: a[j,i]
Out[562]:
array([[1, 2, 3],
[2, 8, 9]])
If i identifies the column for each element, then j specifies the row for each element. The [[0],[1]] column array works just as well because it can be broadcasted against i.
I think of
np.array([[0],
[1]])
as 'short hand' for j. Together they define the source row and column of each element of the new array. They work together, not sequentially.
The full mapping from a to the new array is:
[a[0,1] a[0,2] a[0,0]
a[1,2] a[1,0] a[1,1]]
def foo(a):
    i = np.argsort(a, axis=1)
    return (np.arange(a.shape[0])[:,None], i)
In [61]: foo(a)
Out[61]:
(array([[0],
[1]]), array([[1, 2, 0],
[2, 0, 1]], dtype=int32))
In [62]: a[foo(a)]
Out[62]:
array([[1, 2, 3],
[2, 8, 9]])
The above answers are now a bit outdated, since new functionality was added in numpy 1.15 to make it simpler; take_along_axis (https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.take_along_axis.html) allows you to do:
>>> a = np.array([[3,1,2],[8,9,2]])
>>> np.take_along_axis(a, a.argsort(axis=-1), axis=-1)
array([[1, 2, 3],
       [2, 8, 9]])
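The edit in the question also asks to reorder a second array b by the same keys. A short sketch of how that might look with take_along_axis, reusing one argsort result for both arrays (the variable names order, a_sorted and b_sorted are just for illustration):

```python
import numpy as np

a = np.array([[3, 1, 2], [8, 9, 2]])
b = np.array([[0, 5, 4], [3, 9, 1]])

# Compute the per-row sort order once, from the key array
order = np.argsort(a, axis=-1)

# Apply the same order to the keys and to the companion array
a_sorted = np.take_along_axis(a, order, axis=-1)
b_sorted = np.take_along_axis(b, order, axis=-1)

print(a_sorted)
print(b_sorted)
```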
I found the answer here, with someone having the same problem. The key is just cheating the indexing to work properly...
>>> a[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
array([[1, 2, 3],
[2, 8, 9]])
You can also use linear indexing, which might be better with performance, like so -
M,N = a.shape
out = b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
So, a.argsort(1)+(np.arange(M)[:,None]*N) basically are the linear indices that are used to map b to get the desired sorted output for b. The same linear indices could also be used on a for getting the sorted output for a.
Sample run -
In [23]: a = np.array([[3,1,2],[8,9,2]])
In [24]: b = np.array([[0,5,4],[3,9,1]])
In [25]: M,N = a.shape
In [26]: b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
Out[26]:
array([[5, 4, 0],
[1, 3, 9]])
Runtime tests -
In [27]: a = np.random.rand(1000,1000)
In [28]: b = np.random.rand(1000,1000)
In [29]: M,N = a.shape
In [30]: %timeit b[np.arange(np.shape(a)[0])[:,np.newaxis], np.argsort(a)]
10 loops, best of 3: 133 ms per loop
In [31]: %timeit b.ravel()[a.argsort(1)+(np.arange(M)[:,None]*N)]
10 loops, best of 3: 96.7 ms per loop

How to compare two numpy arrays and add missing values to the other with a tweak

I have two numpy arrays of different dimension. I want to add those additional elements of the bigger array to the smaller array, only the 0th element and the 1st element should be given as 0.
For example :
a = [ [2,4],[4,5], [8,9],[7,5]]
b = [ [2,5], [4,6]]
After adding the missing elements to b, b would become as follows :
b = [ [2,5], [4,6], [8,0], [7,0] ]
I have tried the logic up to some extent, however some values are getting redundantly added as I am not able to check whether that element has already been added to b or not.
Secondly, I am doing it with the help of an additional array c which is the copy of b and then doing the desired operations to c. If somebody can show me how to do it without the third array c , would be very helpful.
import numpy as np
a = [[2,3],[4,5],[6,8], [9,6]]
b = [[2,3],[4,5]]
a = np.array(a)
b = np.array(b)
c = np.array(b)
for i in range(len(b)):
    for j in range(len(a)):
        if a[j,0] == b[i,0]:
            print("matched ")
        else:
            print("not matched")
            c = np.insert(c, len(c), [a[j,0], 0], axis = 0)
print(c)
#####For explanation#####
#basic set operation to get the missing elements
c = set([i[0] for i in a]) - set([i[0] for i in b])
#c will just store the missing elements....
#then just append the elements
for i in c:
    b.append([i, 0])
Output -
[[2, 5], [4, 6], [8, 0], [7, 0]]
Edit -
But as they are numpy arrays you can just do this (and without using c as an intermediate) - just two lines
for i in set(a[:, 0]) - (set(b[:, 0])):
    b = np.append(b, [[i, 0]], axis = 0)
Output -
array([[2, 5],
[4, 6],
[8, 0],
[7, 0]])
You can use np.in1d to look for matching rows from b in a to get a mask and based on the mask choose rows from a or set to zeros. Thus, we would have a vectorized approach as shown below -
np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))
Sample run -
In [47]: a
Out[47]:
array([[2, 4],
[4, 5],
[8, 9],
[7, 5]])
In [48]: b
Out[48]:
array([[8, 7],
[4, 6]])
In [49]: np.vstack((b,a[~np.in1d(a[:,0],b[:,0])]*[1,0]))
Out[49]:
array([[8, 7],
[4, 6],
[2, 0],
[7, 0]])
First we should clear up one misconception. c does not have to be a copy. A new variable assignment is sufficient.
c = b
...
c= np.insert(c, len(c), [a[j,0], 0], axis = 0)
np.insert is not modifying any of its inputs. Rather it makes a new array. And the c=... just assigns that to c, replacing the original assignment. So the original c assignment just makes writing the iteration easier.
Since you are adding this new [a[j,0],0] at the end, you could use concatenate (the underlying function used by insert and the stack functions). The new row has to be 2D to match c:
c = np.concatenate((c, [[a[j,0],0]]), axis=0)
That won't make much of a change in the run time. It's better to find all the a[j] and add them all at once.
In this case you want to add a[2,0] and a[3,0]. Leaving aside, for the moment, the question of how we find [2,3], we can do:
In [595]: a=np.array([[2,3],[4,5],[6,8],[9,6]])
In [596]: b=np.array([[2,3],[4,5]])
In [597]: ind = [2,3]
An assign and fill approach would look like:
In [605]: c = np.zeros_like(a) # target array
In [607]: c[0:b.shape[0],:] = b # fill in the b values
In [608]: c[b.shape[0]:,0] = a[ind,0] # fill in the selected a column
In [609]: c
Out[609]:
array([[2, 3],
[4, 5],
[6, 0],
[9, 0]])
A variation would be to construct a temporary array with the new a values, and concatenate
In [613]: a1 = np.zeros((len(ind),2),a.dtype)
In [614]: a1[:,0] = a[ind,0]
In [616]: np.concatenate((b,a1),axis=0)
Out[616]:
array([[2, 3],
[4, 5],
[6, 0],
[9, 0]])
I'm using the a1 create and fill approach because I'm too lazy to figure out how to concatenate a[ind,0] with enough 0s to make the same thing. :)
As Divakar shows, np.in1d is a handy way of finding the matches
In [617]: np.in1d(a[:,0],b[:,0])
Out[617]: array([ True, True, False, False], dtype=bool)
In [618]: np.nonzero(~np.in1d(a[:,0],b[:,0]))
Out[618]: (array([2, 3], dtype=int32),)
In [619]: np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]
Out[619]: array([2, 3], dtype=int32)
In [620]: ind=np.nonzero(~np.in1d(a[:,0],b[:,0]))[0]
If you don't care about the order a[ind,0] can also be gotten with np.setdiff1d(a[:,0],b[:,0]) (the values will be sorted).
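Putting the pieces above together, a small helper could look like the sketch below. It uses np.isin, which newer NumPy releases recommend over np.in1d, and it assumes the first-column keys in a are unique (duplicates in a would be appended more than once). The name add_missing_keys is hypothetical.

```python
import numpy as np

def add_missing_keys(a, b):
    # Append a row [key, 0, ...] to b for every first-column key of a
    # that does not already occur in b's first column.
    a = np.asarray(a)
    b = np.asarray(b)
    missing = ~np.isin(a[:, 0], b[:, 0])
    extra = np.zeros((int(missing.sum()), b.shape[1]), dtype=b.dtype)
    extra[:, 0] = a[missing, 0]  # keeps the order in which keys appear in a
    return np.vstack((b, extra))

a = np.array([[2, 4], [4, 5], [8, 9], [7, 5]])
b = np.array([[2, 5], [4, 6]])
print(add_missing_keys(a, b))
```

No intermediate copy of b is needed, since np.vstack returns a new array.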
Assuming you are working on a single dimensional array:
import numpy as np
a = np.linspace(1, 90, 90)
b = np.array([1,2,3,4,5,6,7,8,9,10,11,13,14,15,16,17,18,19,20,
21,22,23,24,25,27,28,31,32,33,34,35,36,37,38,39,
40,41,42,43,44,46,47,48,49,50,51,52,53,54,55,56,
57,58,59,60,61,62,63,64,65,67,70,72,73,74,75,76,
77,78,79,80,81,82,84,85,86,87,88,89,90])
m_num = np.setxor1d(a, b).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num), m_num))
This also works in a 2D space:
t1 = np.reshape(a, (10, 9))
t2 = np.reshape(b, (10, 8))
m_num2 = np.setxor1d(t1, t2).astype(np.uint8)
print("Total {0} numbers missing: {1}".format(len(m_num2), m_num2))

How can I find the indices in a numpy array that meet multiple conditions?

I have an array in Python like so:
Example:
>>> scores = numpy.asarray([[8,5,6,2], [9,4,1,4], [2,5,3,8]])
>>> scores
array([[8, 5, 6, 2],
[9, 4, 1, 4],
[2, 5, 3, 8]])
I want to find all [row, col] indices in scores where the value is:
1) the minimum in its row
2) larger than a threshold
3) at most .8 times the next largest value in the row
I would like to do it as efficiently as possible, preferably without any loops. I've been struggling with this for a while, so any help you can provide would be greatly appreciated!
It should go something along the lines of
In [1]: scores = np.array([[8,5,6,2], [9,4,1,4], [2,5,3,8]]); threshold = 1.1; scores
Out[1]:
array([[8, 5, 6, 2],
[9, 4, 1, 4],
[2, 5, 3, 8]])
In [2]: part = np.partition(scores, 2, axis=1); part
Out[2]:
array([[2, 5, 6, 8],
[1, 4, 4, 9],
[2, 3, 5, 8]])
In [3]: row_mask = (part[:,0] > threshold) & (part[:,0] <= 0.8 * part[:,1]); row_mask
Out[3]: array([ True, False, True], dtype=bool)
In [4]: rows = row_mask.nonzero()[0]; rows
Out[4]: array([0, 2])
In [5]: cols = np.argmin(scores[row_mask], axis=1); cols
Out[5]: array([3, 0])
At that point, if you're looking for actual coordinate pairs, you can just zip them:
In [6]: coords = zip(rows, cols); coords
Out[6]: [(0, 3), (2, 0)]
Or if you're planning to look those elements up, you can use them as is:
In [7]: scores[rows, cols]
Out[7]: array([2, 2])
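One hedged note on the transcript above: np.partition with kth=2 only guarantees that the element at index 2 is in its sorted position, so columns 0 and 1 hold the two smallest values but not necessarily in order. A variant that partitions at kth=1 guarantees that column 0 is the row minimum and column 1 the second smallest. The sketch below wraps this in a function; find_isolated_minima and the ratio parameter are illustrative names.

```python
import numpy as np

def find_isolated_minima(scores, threshold, ratio=0.8):
    scores = np.asarray(scores)
    # kth=1: column 1 is the second smallest, column 0 is the minimum
    part = np.partition(scores, 1, axis=1)
    lowest, second = part[:, 0], part[:, 1]
    row_mask = (lowest > threshold) & (lowest <= ratio * second)
    rows = np.nonzero(row_mask)[0]
    # Column of the (first) minimum in each qualifying row
    cols = np.argmin(scores[rows], axis=1)
    return list(zip(rows.tolist(), cols.tolist()))

scores = np.array([[8, 5, 6, 2], [9, 4, 1, 4], [2, 5, 3, 8]])
print(find_isolated_minima(scores, threshold=1.1))
```

Wrapping zip in list also matters on Python 3, where zip returns an iterator rather than a list.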
I think that you're going to have a hard time doing this without any for loops (or at least something that performs such a loop but might be disguising it as something else), seeing as how the operation is only dependent on the row and you want to do it for each row. It's not the most efficient (and what is most efficient may depend on how frequently conditions 2 and 3 are true) but this will work:
import heapq
import numpy
threshold = 1.5
ratio = .8
scores = numpy.asarray([[8,5,6,2], [9,4,1,4], [2,5,3,8]])
found_points = []
for i,row in enumerate(scores):
    # two smallest values of the row, in ascending order
    lowest,second_lowest = heapq.nsmallest(2,row)
    if lowest > threshold and lowest <= ratio*second_lowest:
        found_points.append([i,numpy.where(row == lowest)[0][0]])
You get (for the example):
found_points = [[0, 3], [2, 0]]
