I am using the function get_tuples(length, total) from here
to generate an array of all tuples of a given length and sum; an example and the function are shown below. After creating the array I need a way to return the indices of a given number of elements in it. I was able to do that using .index() after converting the array to a list, as shown below. However, this solution, and any other solution based on searching (for example using np.where), takes a lot of time to find the indices. Since all elements in the array (array s in the example) are different, I was wondering whether we can construct a one-to-one mapping, i.e. a function that, given an element of the array, returns its index by doing some addition and multiplication on the element's values. Any ideas if that is possible? Thanks!
import numpy as np
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# array s
In [1]: s
Out[1]:
array([[ 0, 0, 0, 20],
[ 0, 0, 1, 19],
[ 0, 0, 2, 18],
...,
[19, 0, 1, 0],
[19, 1, 0, 0],
[20, 0, 0, 0]])
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
#change array to list
s_list = s.tolist()
#find the indices
indx=[s_list.index(i) for i in elements_to_find.tolist()]
#output
In [2]: indx
Out[2]: [0, 7, 100, 5, 45]
Here is a formula that calculates the index based on the tuple alone, i.e. it needn't see the full array. To compute the index of an N-tuple it needs to evaluate N-1 binomial coefficients. The following implementation is (partly) vectorized; it accepts ND arrays, but the tuples must be in the last dimension.
import numpy as np
from scipy.special import comb
# unfortunately, comb with option exact=True is not vectorized
def bc(N,k):
    return np.round(comb(N,k)).astype(int)

def get_idx(s):
    N = s.shape[-1] - 1
    R = np.arange(1,N)
    ps = s[...,::-1].cumsum(-1)
    B = bc(ps[...,1:-1]+R,1+R)
    return bc(ps[...,-1]+N,N) - ps[...,0] - 1 - B.sum(-1)
# OP's generator
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# compute each index
r = get_idx(s)
# expected: 0,1,2,3,...
assert (r == np.arange(len(r))).all()
print("all ok")
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
print(get_idx(elements_to_find))
Sample run:
all ok
[ 0 7 100 5 45]
How to derive the formula:
1. Use stars and bars to express the full partition count #part(N,k) (N is the total, k is the length) as a single binomial coefficient: (N + k - 1) choose (k - 1).
2. Count back-to-front: it is not hard to verify that after the i-th full iteration of the outer loop of OP's generator, exactly #part(N-i,k) partitions have not yet been enumerated. Indeed, what's left are all partitions p1+p2+... = N with p1 >= i; writing p1 = q1+i gives q1+p2+... = N-i, and this latter partition is constraint-free, so we can use 1. to count.
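As a worked check of the formula: for the tuple (0, 0, 7, 13) with total N = 20 and length k = 4, the full count is #part(20,4) = 23 choose 3 = 1771, the reversed cumulative sums are (13, 20, 20, 20), and the index comes out as 1771 - 13 - 1 - (21 choose 2) - (22 choose 3) = 1771 - 14 - 210 - 1540 = 7, matching the output above.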
You can use binary search to make the search a lot faster.
Binary search makes the lookup O(log n) rather than the O(n) of list.index().
We do not need to sort the tuples since they are already sorted by the generator
import bisect

def get_tuples(length, total):
    " Generates tuples "
    if length == 1:
        yield (total,)
        return
    yield from ((i,) + t for i in range(total + 1) for t in get_tuples(length - 1, total - i))

def find_indexes(x, indexes):
    if len(indexes) > 100:
        # Faster to generate all indexes when we have a large
        # number to check
        d = dict(zip(x, range(len(x))))
        return [d[tuple(i)] for i in indexes]
    else:
        return [bisect.bisect_left(x, tuple(i)) for i in indexes]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
# Tuples are generated in sorted order [(0,0,0,20), ...(20,0,0,0)]
# which allows binary search to be used
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
print('Found indexes:', *y)
print('Indexes & Tuples:')
for i in y:
    print(i, x[i])
Output
Found indexes: 0 7 100 5 45
Indexes & Tuples:
0 (0, 0, 0, 20)
7 (0, 0, 7, 13)
100 (0, 5, 5, 10)
5 (0, 0, 5, 15)
45 (0, 2, 4, 14)
Performance
Scenario 1--Tuples already computed and we just want to find the index of certain tuples
For instance x = list(get_tuples(4, 20)) has already been performed.
Search for
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
Binary Search
%timeit find_indexes(x, indexes)
100000 loops, best of 3: 11.2 µs per loop
Calculates the index based on the tuple alone (courtesy of @PaulPanzer's approach)
%timeit get_idx(indexes)
10000 loops, best of 3: 92.7 µs per loop
In this scenario, binary search is ~8x faster when tuples have already been pre-computed.
Scenario 2--the tuples have not been pre-computed.
%%timeit
import bisect
def find_indexes(x, t):
    " finds the index of each tuple in list t (assumes x is sorted) "
    return [bisect.bisect_left(x, tuple(i)) for i in t]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
100 loops, best of 3: 2.69 ms per loop
@PaulPanzer's approach has the same timing in this scenario (92.97 µs)
=> @PaulPanzer's approach is ~29 times faster when the tuples have not been pre-computed
Scenario 3--Large number of indexes (@PJORR)
A large number of random indexes is generated
x = list(get_tuples(4, 20))
xnp = np.array(x)
indices = xnp[np.random.randint(0,len(xnp), 2000)]
indexes = indices.tolist()
%timeit find_indexes(x, indexes)
#Result: 1000 loops, best of 3: 1.1 ms per loop
%timeit get_idx(indices)
#Result: 1000 loops, best of 3: 716 µs per loop
In this case, @PaulPanzer's approach is ~53% faster.
I have an array val of possible values (ex. val = [0, 1, 2, 3, 4, 5]) and an array A (possibly very long list) of selected values (ex. A = [2, 3, 1, 0, 2, 1, ... , 2, 3, 1, 0, 4])
Now I want to create an array B of the same length as A such that A[i] is different than B[i] for each i and entries in B are selected randomly. How to do it efficiently using numpy?
A simple method would be drawing the difference between A and B modulo n where n is the number of possible outcomes. A[i] != B[i] means that this difference is not zero, hence we draw from 1,...,n-1:
n,N = 10,100
A = np.random.randint(0,n,N)
D = np.random.randint(1,n,N)
B = (A-D)%n
Update: while arguably elegant this solution is not the fastest. We could save some time by replacing the (slow) modulo operator with just testing for negative values and adding n to them.
In this form this solution starts looking quite similar to @Divakar's: two blocks of possible values, one of which needs to be shifted.
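A minimal sketch of that modulo-free variant (same n, N and A as above; an illustration, not the code that was benchmarked):
D = np.random.randint(1,n,N)
B = A - D
B[B < 0] += n   # wrap the negative values instead of taking % n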
But we can do better: instead of shifting on average half the values we can instead swap them out only if A[i] == B[i]. As this is expected to happen rarely unless the list of permissible values is very short, the code runs faster:
B = np.random.randint(1,n,N)
B[B==A] = 0
This is somewhat wasteful, as it creates a temporary list for every item in A, but otherwise it fulfills your requirements:
from random import choice
val = [0, 1, 2, 3, 4, 5]
A = [2, 3, 1, 0, 2, 1, 2, 3, 1, 0, 4]
val = set(val)
B = [choice(list(val - {x})) for x in A]
print(B) # -> [4, 2, 3, 2, 5, 4, 1, 5, 5, 4, 1]
In a nutshell:
What happens is that val is converted to a set, from which the current item of A gets removed. Then an item is chosen at random from the resulting subset and added to B.
You can also test it with:
print(all(x!=y for x, y in zip(A, B)))
which of course returns True
Finally, note that the approach above only works with hashable items. So if you have something like val = [[1, 2], [2, 3], ...], for example, you will run into problems.
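A small illustration of that caveat (toy values assumed): lists are not hashable, so building the set fails unless the inner lists are first converted to tuples:
val = [[1, 2], [2, 3]]
# set(val)                             # raises TypeError: unhashable type: 'list'
val_as_tuples = set(map(tuple, val))   # works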
Here's one vectorized way -
def randnum_excludeone(A, val):
    n = val[-1]
    idx = np.random.randint(0,n,len(A))
    idx[idx>=A] += 1
    return idx
The idea is to generate, for each entry in A, a random integer covering the length of val minus 1. Then we add 1 wherever the generated number is greater than or equal to the corresponding element of A, and keep it unchanged otherwise. For example, with A[i] = 2 and val = [0,...,5], draws of 0 or 1 are kept, while draws of 2, 3 or 4 become 3, 4 or 5, so 2 itself can never appear. That's our final output, idx.
Let's verify the random-ness and make sure it's uniform across non-A elements -
In [42]: A
Out[42]: array([2, 3, 1, 0, 2, 1, 2, 3, 1, 0, 4])
In [43]: val
Out[43]: array([0, 1, 2, 3, 4, 5])
In [44]: c = np.array([randnum_excludeone(A, val) for _ in range(10000)])
In [45]: [np.bincount(i) for i in c.T]
Out[45]:
[array([2013, 2018, 0, 2056, 1933, 1980]),
array([2018, 1985, 2066, 0, 1922, 2009]),
array([2032, 0, 1966, 1975, 2040, 1987]),
array([ 0, 2076, 1986, 1931, 2013, 1994]),
array([2029, 1943, 0, 1960, 2100, 1968]),
array([2028, 0, 2048, 2031, 1929, 1964]),
array([2046, 2065, 0, 1990, 1940, 1959]),
array([2040, 2003, 1935, 0, 2045, 1977]),
array([2008, 0, 2011, 2030, 1937, 2014]),
array([ 0, 2000, 2015, 1983, 2023, 1979]),
array([2075, 1995, 1987, 1948, 0, 1995])]
Benchmarking on large arrays
Other vectorized approach(es) :
# @Paul Panzer's solution
def pp(A, val):
    n,N = val[-1]+1,len(A)
    D = np.random.randint(1,n,N)
    B = (A-D)%n
    return B
Timing results -
In [66]: np.random.seed(0)
...: A = np.random.randint(0,6,100000)
In [67]: %timeit pp(A,val)
100 loops, best of 3: 3.11 ms per loop
In [68]: %timeit randnum_excludeone(A, val)
100 loops, best of 3: 2.53 ms per loop
In [69]: np.random.seed(0)
...: A = np.random.randint(0,6,1000000)
In [70]: %timeit pp(A,val)
10 loops, best of 3: 39.9 ms per loop
In [71]: %timeit randnum_excludeone(A, val)
10 loops, best of 3: 25.9 ms per loop
Extending the range of val to 10 -
In [60]: np.random.seed(0)
...: A = np.random.randint(0,10,1000000)
In [61]: %timeit pp(A,val)
10 loops, best of 3: 31.2 ms per loop
In [62]: %timeit randnum_excludeone(A, val)
10 loops, best of 3: 23.6 ms per loop
Quick and dirty, and improvements could be made, but here goes.
Your requirements can be accomplished as follows:
val = [0, 1, 2, 3, 4, 5]
A = [2, 3, 1, 0, 2, 1,4,4, 2, 3, 1, 0, 4]
val_shifted = np.roll(val,1)
dic_val = {i:val_shifted[i] for i in range(len(val_shifted))}
B = [dic_val[i] for i in A]
which gives a result that meets your requirement:
A = [2, 3, 1, 0, 2, 1, 4, 4, 2, 3, 1, 0, 4]
B = [1, 2, 0, 5, 1, 0, 3, 3, 1, 2, 0, 5, 3]
Here is another approach. B first gets a random shuffle of A. Then all the positions where A and B hold equal values get shuffled among themselves. In the special case where all those positions hold the same value, they get swapped with randomly chosen "good" positions.
An interesting property of this approach is that it also works when A contains only a very limited set of different values. Unlike the other approaches, B is an exact shuffle of A, so it also works when A doesn't have a uniform distribution. Also, B is a completely random shuffle except for the requirement of being different at equal indices.
import random

N = 10000
A = [random.randrange(0,6) for _ in range(N)]
B = A.copy()
random.shuffle(B)
print(A)
print(B)
while True:
    equal_vals = {i for i,j in zip(A, B) if i == j}
    print(len(equal_vals), equal_vals)
    if len(equal_vals) == 0: # finished, no equal values on same positions
        break
    else:
        equal_ind = [k for k, (i, j) in enumerate(zip(A, B)) if i == j]
        # create a list of indices where A and B are equal
        random.shuffle(equal_ind) # as the list was ordered, shuffle it to get a random order
        if len(equal_vals) == 1: # special case, all equal indices have the same value
            special_val = equal_vals.pop()
            # find all the indices where the special_val could be placed without problems
            good_ind = [k for k,(i,j) in enumerate(zip(A, B)) if i != special_val and j != special_val]
            if len(good_ind) < len(equal_ind):
                print("problem: there are too many equal values in list A")
            else:
                # swap each bad index with a random good index
                chosen_ind = random.sample(good_ind, len(equal_ind))
                for k1, k2 in zip(equal_ind, chosen_ind):
                    B[k1], B[k2] = B[k2], B[k1] # swap
            break
        elif len(equal_vals) >= 2:
            # permute B via the list of equal indices;
            # as there are at least 2 different values, at least two indices will get a desired value
            prev = equal_ind[0]
            old_first = B[prev]
            for k in equal_ind[1:]:
                B[prev] = B[k]
                prev = k
            B[prev] = old_first
print(A)
print(B)
I have the following list:
Sum=[54,1536,36,14,9,360]
I need to generate 4 other lists, where each list will consist of 6 random numbers starting from 0, and the numbers will add up to the values in Sum. For example:
l1=[a,b,c,d,e,f] where a+b+c+d+e+f=54
l2=[g,h,i,j,k,l] where g+h+i+j+k+l=1536
and so on up to l6. I need to do this in Python. Can it be done?
Generating a list of random numbers that sum to a certain integer is a very difficult task. Keeping track of the remaining quantity and generating items sequentially with the remaining available quantity results in a non-uniform distribution, where the first numbers in the series are generally much larger than the others. On top of that, the last one will always be different from zero because the previous items in the list will never sum up to the desired total (random generators usually use open intervals in the maximum). Shuffling the list after generation might help a bit but won't generally give good results either.
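As a rough illustration of that skew, here is a sketch of the sequential "remaining quantity" approach described above (the helper is hypothetical, not code from any of the answers):
import numpy as np
def naive_partition(total, length):
    # draw each number from what is still available; the last entry takes the remainder
    remaining = total
    out = []
    for _ in range(length - 1):
        v = np.random.randint(0, remaining)   # open interval at the top, so remaining stays >= 1
        out.append(v)
        remaining -= v
    out.append(remaining)
    return out
samples = np.array([naive_partition(54, 6) for _ in range(10000)])
print(samples.mean(axis=0))   # the first positions have much larger means than the later ones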
A solution could be to generate random numbers and then normalize the result, rounding it if you need the values to be integers.
import numpy as np
totals = np.array([54,1536,36,14]) # don't use Sum as a name because sum is a Python built-in and it's confusing
a = np.random.random((6, 4)) # create random numbers
a = a/np.sum(a, axis=0) * totals # force them to sum to totals
# Ignore the following if you don't need integers
a = np.round(a) # transform them into integers
remainings = totals - np.sum(a, axis=0) # check if there are corrections to be done
for j, r in enumerate(remainings): # implement the correction
    step = 1 if r > 0 else -1
    while r != 0:
        i = np.random.randint(6)
        if a[i,j] + step >= 0:
            a[i, j] += step
            r -= step
Each column of a represents one of the lists you want.
Hope this helps.
This might not be the most efficient way but it will work
totals = [54, 1536, 36, 14]
nums = []
for i in totals:
    x = np.random.randint(0, i, size=(6,))
    while sum(x) != i:
        x = np.random.randint(0, i, size=(6,))
    nums.append(x)
print(nums)
[array([ 3, 19, 21, 11, 0, 0]), array([111, 155, 224, 511, 457, 78]), array([ 8, 5, 4, 12, 2, 5]), array([3, 1, 3, 2, 1, 4])]
This is a way more efficient way to do this
totals = [54,1536,36,14,9,360, 0]
nums = []
for i in totals:
    if i == 0:
        nums.append([0 for i in range(6)])
        continue
    total = i
    temp = []
    for i in range(5):
        val = np.random.randint(0, total)
        temp.append(val)
        total -= val
    temp.append(total)
    nums.append(temp)
print(nums)
[[22, 4, 16, 0, 2, 10], [775, 49, 255, 112, 185, 160], [2, 10, 18, 2, 0, 4], [10, 2, 1, 0, 0, 1], [8, 0, 0, 0, 0, 1], [330, 26, 1, 0, 2, 1], [0, 0, 0, 0, 0, 0]]
I am trying to construct a numpy array (a 2-dimensional numpy array, i.e. a matrix) from a paper that uses non-standard indexing to construct the matrix, i.e. the top-left element is $q_{1,2}$ instead of $q_{0,0}$.
Define the $n \times (n-2)$ matrix $Q$ by its elements $q_{i,j}$, for $i = 1,\dots,n$ and $j = 2,\dots,n-1$, given by
$$q_{j-1,j} = h_{j-1}^{-1}, \qquad q_{j,j} = -h_{j-1}^{-1} - h_j^{-1}, \qquad q_{j+1,j} = h_j^{-1}.$$
(I have posted this in LaTeX form here: http://www.texpaste.com/n/8vwds4fx)
I have tried to implement in python like this:
# n = u_s.size
# n = 299 for this example
n = 299
Q = np.zeros((n,n-2))
for i in range(0,n+1):
    for j in range(2,n):
        Q[j-1,j] = 1.0/h[j-1]
        Q[j,j] = -1.0/h[j-1] - 1.0/h[j]
        Q[j+1,j] = 1.0/h[j]
But I always get the error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-54-c07a3b1c81bb> in <module>()
1 for i in range(1,n+1):
2 for j in range(2,n-1):
----> 3 Q[j-1,j] = 1.0/h[j-1]
4 Q[j,j] = -1.0/h[j-1] - 1.0/h[j]
5 Q[j+1,j] = 1.0/h[j]
IndexError: index 297 is out of bounds for axis 1 with size 297
I initially thought I could decrement both i and j in my for loop to keep edge cases safe, as a quick way to move to zero-indexed notation, but this hasn't worked. I also tried incrementing and modifying the range().
Is there a way to convert this definition to one that python can handle? Is this a common issue?
Simplifying the problem to make the assignment pattern obvious:
In [228]: h=np.arange(10,15)
In [229]: Q=np.zeros((5,5),int)
In [230]: for j in range(1,5):
...: Q[j-1:j+2,j] = h[j-1:j+2]
In [231]: Q
Out[231]:
array([[ 0, 10, 0, 0, 0],
[ 0, 11, 11, 0, 0],
[ 0, 12, 12, 12, 0],
[ 0, 0, 13, 13, 13],
[ 0, 0, 0, 14, 14]])
Assignment to the partial first and last columns may need tweaking. Here's the equivalent built from diagonals:
In [232]: np.diag(h,0)+np.diag(h[:-1],1)+np.diag(h[1:],-1)
Out[232]:
array([[10, 10, 0, 0, 0],
[11, 11, 11, 0, 0],
[ 0, 12, 12, 12, 0],
[ 0, 0, 13, 13, 13],
[ 0, 0, 0, 14, 14]])
With the h[j-1], h[j] indexing this diagonal assignment probably needs tweaking, but it should be a useful starting point.
Selecting h values more like what you use (skipping the 1/h for now):
In [238]: Q=np.zeros((5,5),int)
In [239]: for j in range(1,4):
...: Q[j-1:j+2,j] =[h[j-1],h[j-1]+h[j], h[j]]
...:
In [240]: Q
Out[240]:
array([[ 0, 10, 0, 0, 0],
[ 0, 21, 11, 0, 0],
[ 0, 11, 23, 12, 0],
[ 0, 0, 12, 25, 0],
[ 0, 0, 0, 13, 0]])
I'm skipping the two partial end columns for now. The first slicing approach allowed me to be a bit sloppy, since it's ok to slice 'off the end'. The end columns, if set, will require their own expressions.
In [241]: j=0; Q[j:j+2,j] =[h[j], h[j]]
In [242]: j=4; Q[j-1:j+1,j] =[h[j-1],h[j-1]+h[j]]
In [243]: Q
Out[243]:
array([[10, 10, 0, 0, 0],
[10, 21, 11, 0, 0],
[ 0, 11, 23, 12, 0],
[ 0, 0, 12, 25, 13],
[ 0, 0, 0, 13, 27]])
The relevant diagonal pieces are still evident:
In [244]: h[1:]+h[:-1]
Out[244]: array([21, 23, 25, 27])
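Putting the pieces above together with the question's definition, here is one possible zero-indexed construction of the n x (n-2) matrix (a sketch: the h values are placeholders, and the shifts j -> j-2 for columns and i -> i-1 for rows are an assumed translation of the paper's 1-based indexing):
import numpy as np
n = 6
h = np.arange(1.0, n)                         # placeholder values standing in for h_1 .. h_(n-1)
Q = np.zeros((n, n - 2))
c = np.arange(n - 2)                          # zero-based column index; the paper's j = c + 2
Q[c, c] = 1.0 / h[c]                          # q_(j-1,j) = 1/h_(j-1)
Q[c + 1, c] = -1.0 / h[c] - 1.0 / h[c + 1]    # q_(j,j)   = -1/h_(j-1) - 1/h_j
Q[c + 2, c] = 1.0 / h[c + 1]                  # q_(j+1,j) = 1/h_j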
The equation doesn't contain any value for i; it refers only to j. Q should be a matrix of dimension (n+2) x (n+2). For j = 1, it refers to Q[0,1], Q[1,1] and Q[2,1]; for j = n, it refers to Q[n-1,n], Q[n,n] and Q[n+1,n]. So Q should have indices from 0 to n+1, which is n+2 values.
I don't think you require the i loop. You can achieve your result with only the j loop running from 1 to n, but Q should be indexed from 0 to n+1, as sketched below.
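A minimal sketch of that suggestion (the h values are placeholders, and the (n+2) x (n+2) shape follows this answer rather than the paper):
import numpy as np
n = 5
h = np.arange(1.0, n + 2)          # placeholder spacings; entries h[0] .. h[n] are needed
Q = np.zeros((n + 2, n + 2))
for j in range(1, n + 1):
    Q[j - 1, j] = 1.0 / h[j - 1]
    Q[j, j] = -1.0 / h[j - 1] - 1.0 / h[j]
    Q[j + 1, j] = 1.0 / h[j]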
Given a numpy ndarray, I would like to take the first two axes, and replace them with a new axis, which is the sum of their antidiagonals.
In particular, suppose I have variables x,y,z,..., and the entries of my array represent the probability
array[i,j,k,...] = P(x=i, y=j, z=k, ...)
I would like to obtain
new_array[l,k,...] = P(x+y=l, z=k, ...) = sum_i P(x=i, y=l-i, z=k, ...)
i.e., new_array[l,k,...] is the sum of all array[i,j,k,...] such that i+j=l.
What is the most efficient and/or cleanest way to do this in numpy?
EDIT to add:
On the recommendation of @hpaulj, here is the obvious iterative solution:
array = numpy.arange(30).reshape((2,3,5))
array = array / float(array.sum()) # make it a probability
new_array = numpy.zeros([array.shape[0] + array.shape[1] - 1] + list(array.shape[2:]))
for i in range(array.shape[0]):
    for j in range(array.shape[1]):
        new_array[i+j,...] += array[i,j,...]
new_array.sum() # == 1
There is a trace function that gives the sum of a diagonal. You can specify the offset and 2 axes (0 and 1 are the defaults). And to get the antidiagonal, you just need to flip one dimension. np.flipud does that, though it's just [::-1,...] indexing.
Putting those together,
np.array([np.trace(np.flipud(array),offset=k) for k in range(-1,3)])
matches your new_array.
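The same one-liner without hard-coding the offsets (a sketch, using the same array as above): the valid offsets run from -(array.shape[0]-1) to array.shape[1]-1.
offsets = range(1 - array.shape[0], array.shape[1])
np.array([np.trace(array[::-1], offset=k) for k in offsets])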
It still loops over the possible values of l (4 in this case). trace itself is compiled.
In this small case, it's actually slower than your double loop (2x3 steps). Even if I move the flipud out of the inner loop, it is still slower. I don't know how this scales for larger arrays.
Part of the problem with vectorizing this even further is the fact that each diagonal has a different length.
In [331]: %%timeit
array1 = array[::-1]
np.array([np.trace(array1,offset=k) for k in range(-1,3)])
.....:
10000 loops, best of 3: 87.4 µs per loop
In [332]: %%timeit
new_array = np.zeros([array.shape[0] + array.shape[1] - 1] + list(array.shape[2:]))
for i in range(2):
    for j in range(3):
        new_array[i+j] += array[i,j]
.....:
10000 loops, best of 3: 43.5 µs per loop
scipy.sparse has a dia format, which stores the values of nonzero diagonals. It stores a padded array of values, along with the offsets.
array([[12, 0, 0, 0],
[ 8, 13, 0, 0],
[ 4, 9, 14, 0],
[ 0, 5, 10, 15],
[ 0, 1, 6, 11],
[ 0, 0, 2, 7],
[ 0, 0, 0, 3]])
array([-3, -2, -1, 0, 1, 2, 3])
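For context, padded data and offsets like those above can be produced along these lines (a sketch; the 4x4 np.arange example is an assumption, not stated in the answer):
from scipy import sparse
import numpy as np
m = sparse.dia_matrix(np.arange(16).reshape(4, 4))
print(m.data)      # padded values of the nonzero diagonals
print(m.offsets)   # the corresponding offsets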
While that's a way of getting around the issue of variable diagonal lengths, I don't think it helps in this case where you just need their sums.
I have two 2D numpy arrays,
import numpy as np
a = np.array([[ 1, 15, 16, 200, 10],
[ -1, 10, 17, 11, -1],
[ -1, -1, 20, -1, -1]])
g = np.array([[ 1, 12, 15, 100, 11],
[ 2, 13, 16, 200, 12],
[ 3, 14, 17, 300, 13],
[ 4, 17, 18, 400, 14],
[ 5, 20, 19, 500, 16]])
What I want to do is, for each column of g, to check if it contains any element from the corresponding column of a. For the first column, I want to check if any of the values [1,2,3,4,5] appears in [1,-1,-1] and return True. For the second, I want to return False because no element in [12,13,14,17,20] appears in [15,10,-1]. At the moment, I do this using Python's list comprehension. Running
result = [np.any(np.in1d(g[:,i], a[:, i])) for i in range(5)]
calculates the correct result, but is getting slow when a has a lot of columns. Is there a more "pure numpy" way of doing this same thing? I feel like there should be an axis keyword one could add to the numpy.in1d function, but there isn't any...
I'd use broadcasting tricks, but this depends very much on the size of your arrays and the amount of RAM available to you:
M = g.reshape(g.shape+(1,)) - a.T.reshape((1,a.shape[1],a.shape[0]))
np.any(np.any(M == 0, axis=0), axis=1)
# returns:
# array([ True, False, True, True, False], dtype=bool)
It's easier to explain with a piece of paper and a pen (and smaller test arrays) (see below), but basically you're making copies of each column in g (one copy for each row in a) and subtracting single elements taken from the corresponding column in a from these copies. Similar to the original algorithm, just vectorized.
Caveat: if any of the arrays g or a is 1D, you'll need to force it to become 2D, such that its shape is at least (1,n).
Speed gains:
- based only on your arrays: a factor of ~20
  - python for loops: 301 µs per loop
  - vectorized: 15.4 µs per loop
- larger arrays: a factor of ~80
In [2]: a = np.random.random_integers(-2, 3, size=(4, 50))
In [3]: b = np.random.random_integers(-20, 30, size=(35, 50))
In [4]: %timeit np.any(np.any(b.reshape(b.shape+(1,)) - a.T.reshape((1,a.shape[1],a.shape[0])) == 0, axis=0), axis=1)
10000 loops, best of 3: 39.5 us per loop
In [5]: %timeit [np.any(np.in1d(b[:,i], a[:, i])) for i in range(a.shape[1])]
100 loops, best of 3: 3.13 ms per loop
Image attached to explain broadcasting:
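As a stand-in for that image, here is a small sketch with toy arrays (values assumed) showing which shapes end up broadcasting against which:
import numpy as np
a_small = np.array([[1, 5],
                    [2, 6]])                     # shape (2, 2): rows of a x columns
g_small = np.array([[1, 7],
                    [3, 8],
                    [4, 9]])                     # shape (3, 2): rows of g x columns
M = g_small.reshape(g_small.shape + (1,)) - a_small.T.reshape((1, a_small.shape[1], a_small.shape[0]))
print(M.shape)                                   # (3, 2, 2): (rows of g, columns, rows of a)
print(np.any(np.any(M == 0, axis=0), axis=1))    # [ True False]: column 0 shares the value 1, column 1 shares nothing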
Instead of processing the input by column, you can process it by row. For example, you find out whether any element of the first row of a is present in the corresponding columns of g, so that you can stop processing the columns where a match has already been found.
from numpy import arange, empty, take, where, put, setdiff1d

idx = arange(a.shape[1])
result = empty((idx.size,), dtype=bool)
result.fill(False)
for j in range(a.shape[0]):
    # delete this print in production
    print("%d line, I look only at columns " % (j + 1), idx)
    line_pruned = take(a[j], idx)
    g_pruned = take(g, idx, axis=1)
    positive_idx = where((g_pruned - line_pruned) == 0)[1]
    # delete this print in production
    print("positive hit on the ", positive_idx, " -th columns")
    put(result, positive_idx, True)
    idx = setdiff1d(idx, positive_idx)
    if not idx.size:
        break
To understand how it works, we can consider a different input:
a = np.array([[ 0, 15, 16, 200, 10],
[ -1, 10, 17, 11, -1],
[ 1, -1, 20, -1, -1]])
g = np.array([[ 1, 12, 15, 100, 11],
[ 2, 13, 16, 200, 12],
[ 3, 14, 17, 300, 13],
[ 4, 17, 18, 400, 14],
[ 5, 20, 19, 500, 16]])
The output of the script is:
1 line, I look only at columns [0 1 2 3 4]
positive hit on the [2 3] -th columns
2 line, I look only at columns [0 1 4]
positive hit on the [] -th columns
3 line, I look only at columns [0 1 4]
positive hit on the [0] -th columns
Basically you can see how in the 2nd and 3rd rounds of the loop you're no longer processing the 3rd and 4th columns, where hits were already found.
The performance of this solution really depends on many factors, but it will be faster if it is likely that you hit many True values, and the problem has many rows. This of course depends also on the input, not just on the shape.