How to create a random sequence excluding a set of given values - Python

I am using numpy and I want to generate an array of size n with random integers from a to b [upper bound exclusive] that are not in the array arr (if it helps, all values in arr are unique). I want the probability to be distributed uniformly among the other possible values. I am aware I can do it this way:
randlist = np.random.randint(a, b, n)
while np.intersect1d(randlist, arr).size > 0:
    randlist = np.random.randint(a, b, n)
But this seems really inefficient. What would be the fastest way to do this?

The simplest vectorized way would be with np.setdiff1d + np.random.choice -
c = np.setdiff1d(np.arange(a, b), arr)  # candidate values: [a, b) without arr
out = np.random.choice(c, n)
Another way with masking -
mask = np.ones(b - a, dtype=bool)
mask[arr - a] = False
idx = np.flatnonzero(mask) + a
out = idx[np.random.randint(0, len(idx), n)]
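For a quick sanity check, here is a small, self-contained example (the values of a, b, n and arr are made up for illustration) showing that the approach only draws allowed values:
import numpy as np
a, b, n = 0, 10, 5           # hypothetical bounds and sample size
arr = np.array([2, 5, 7])    # values to exclude
c = np.setdiff1d(np.arange(a, b), arr)
out = np.random.choice(c, n)
assert np.intersect1d(out, arr).size == 0   # nothing from arr was drawn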

Related

numpy write with two masks

I have an array a with 100 elements that I want to conditionally update. The first mask m selects the elements of a that are candidates for updating. Out of a[m] (say, 50 elements), I want to update only a subset and leave the others unchanged. So the second mask m2 has m.sum() (= 50) elements, only some of which are True.
For completeness, a minimal example:
a = np.random.random(size=100)
m = a > 0.5                                  # first mask: candidate elements (threshold is illustrative)
m2 = np.random.random(size=m.sum()) < 0.5    # second mask: subset of the candidates
newvalues = -np.random.randint(1, 10, size=m2.sum())
Then if I were to do
a[m][m2] = newvalues
This does not change the values of a, because fancy indexing a[m] returns a copy here. Using indices (e.g. from np.where) behaves the same way.
Instead, this works:
m12 = m.copy()
m12[m] = m2
a[m12] = newvalues
However, this is verbose and difficult to read.
Is there a more elegant way to update a subset of a subset of an array?
You can first compute the "final" indices of interest and then use those to update. One way to achieve this in a more numpy-idiomatic way is to mask the index array obtained from the first mask:
final_mask = np.where(m)[0][m2]
a[final_mask] = newvalues
First compute the indices of elements to update:
indices = np.arange(100)
indices = indices[m][m2]
then use indices to update array a:
a[indices] = newvalues
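As a brief check (the array contents here are illustrative), the index-based approaches actually modify a, unlike the chained fancy indexing a[m][m2]:
import numpy as np
a = np.arange(10, dtype=float)
m = a > 4                                    # candidates: positions 5..9
m2 = np.array([True, False, True, False, True])
newvalues = np.array([-1.0, -2.0, -3.0])
final_mask = np.where(m)[0][m2]              # positions 5, 7, 9
a[final_mask] = newvalues
print(a)                                     # [ 0.  1.  2.  3.  4. -1.  6. -2.  8. -3.]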

Efficient Numpy search in a non-monotonic array

I am trying to conduct something similar to searchsorted, but in the case where the array is not completely monotonic. Say I have a scalar, c and a 1D array x, I want to find the indices i of all elements such that x[i] < c <= x[i + 1]. Importantly, x is not completely monotonic.
The following code works, but I would just like to know if this is the most efficient way to do this, or if there is a simpler way:
x = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])
c = 2.5
t = c > x[:-1]        # x[i] < c
u = c <= x[1:]        # c <= x[i + 1]
v = t * u
i = v.nonzero()[0]
Or in one line of code:
i = ((c > x[:-1]) * (c <= x[1:])).nonzero()[0]
Is this the most efficient way to recover these indices?
Two additional questions.
1. Is there an easy way to extend this to the case where c is a 1D array and x is a 2D array, where c has as many elements as "rows" in x, and I perform this search for each element of c in the corresponding "row" of x?
2. My ultimate goal is to do this in a three-dimensional case. That is, suppose c is still a 1D vector with n elements. Now let x be a 3D array with dimensions j by n by k. Is there a way to do #1 above for each "submatrix" in x? Basically, performing #1 above j times.
For example:
x1 = np.array([[1,2,3,1,2,3],[1,2,3,1,2,3],[1,2,3,1,2,3]])
x2 = x1 + 1
x = np.array([x1,x2])
c = np.array([1.5,2.5,3.5])
Under #1 above, when we compare c and x1, we would get: [[0,3],[1,4],[]]
When we compare c and x2, we would get: [[],[0,3],[1,4]]
Finally, under #2, I would like to get:
[[[0,3],[1,4],[]],
 [[],[0,3],[1,4]]]
We can do the comparison once to get a boolean mask, then re-use it with negation for the other comparison, together with slicing -
m = c > x
i = np.flatnonzero( m[:-1] & ~m[1:] )
We can extend this to the case of a 2D x and 1D c with a loop, while keeping the per-iteration work minimal by pre-computing the masks in a vectorized manner, like so -
m = c[:,None] > x
m2 = m[:,:-1] & ~m[:,1:]
i = [np.flatnonzero( mi ) for mi in m2]
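For the 3D case from the question (x of shape (j, n, k) and c with n elements), a minimal sketch along the same broadcasting lines could look like this; the nested list comprehension mirrors the structure of the desired output:
m = c[None, :, None] > x            # shape (j, n, k)
m2 = m[..., :-1] & ~m[..., 1:]      # x[..., i] < c <= x[..., i + 1]
i = [[np.flatnonzero(row) for row in sub] for sub in m2]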
For a task like this, numpy performs more comparisons than necessary. You can gain roughly a 5x speed-up with Numba, and it is straightforward to adapt the approach to 3 dimensions.
import numpy as np
import numba

@numba.njit
def ind(x, c):
    res = np.empty(x.size - 1, dtype=np.int64)   # at most x.size - 1 matches
    i = j = 0
    while i < x.size - 1:
        if x[i] < c and c <= x[i + 1]:
            res[j] = i
            j += 1
        i += 1
    return res[:j]
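A quick usage check against the vectorized one-liner, using the array from the question:
x = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3])
c = 2.5
print(ind(x, c))                                    # [1 4 7]
print(np.flatnonzero((c > x[:-1]) & (c <= x[1:])))  # [1 4 7]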

Delete specific values in a 2-dimensional array - Numpy

import numpy as np
I have two arrays of size n (to simplify, I use in this example n = 2):
A = np.array([[1,2,3],[1,2,3]])
B has two dimensions and contains, for each of the n rows, one random integer: 1, 2 or 3.
Let's pretend:
B = np.array([[1],[3]])
What is the most Pythonic way to subtract B from A in order to obtain C = np.array([[2,3],[1,2]])?
I tried to use np.subtract, but due to the broadcasting rules I do not obtain C. I do not want to select by mask or indices but by the elements' values. I also tried np.delete and np.where without success.
Thank you.
This might work and should be quite Pythonic:
dd = [[val for val in A[i] if val not in B[i]] for i in range(len(A))]
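If a fully vectorized variant is preferred, and assuming each row of B removes exactly one value from the corresponding row of A (so all result rows have the same length), a boolean mask plus reshape is a possible sketch:
C = A[A != B].reshape(A.shape[0], -1)   # drop the single matching value in each row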

Combination of matrix elements giving non-zero value (PYTHON)

I have to evaluate the following expression, given two quite large matrices A,B and a very complicated function F:
[The mathematical expression is given as an image in the original post.]
I was wondering whether there is an efficient way to first find the indices i, j that give a non-zero element after the multiplication of the matrices, so that I can avoid the rather slow for loops.
Current working code
import numpy as np

# Starting with 4 random matrices
A = np.random.randint(0, 2, size=(50, 50))
B = np.random.randint(0, 2, size=(50, 50))
C = np.random.randint(0, 2, size=(50, 50))
D = np.random.randint(0, 2, size=(50, 50))
indices = []
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        if A[i, j] != 0:
            for k in range(B.shape[1]):
                if B[j, k] != 0:
                    for l in range(C.shape[1]):
                        if A[i, j] * B[j, k] * C[k, l] * D[l, i] != 0:
                            indices.append((i, j, k, l))
print(indices)
As you can see, in order to get the indices I need I have to use nested loops (= huge computational time).
My guess would be NO: you cannot avoid the for-loops. In order to find all the indices i, j you need to visit every element, which defeats the purpose of the check. Therefore, you should go ahead and use plain elementwise multiplication and dot products in numpy - it should be quite fast, with the loops handled internally by numpy.
However, if what you want to avoid is an explicit Python loop, then the answer is YES: you can push it into numpy, along the lines of the following pseudo-code (= hand-waving):
i, j = np.indices((N, M)) # CAREFUL: you may need to swap i<->j or N<->M
fs = F(i, j, z) # array of values of function F
# for a given z over the index grid
R = np.dot(A*fs, B) # summation over j
# return R # if necessary do a summation over i: np.sum(R, axis=...)
If the issue is that computing fs = F(i, j, z) is a very slow operation, then you will have to identify the elements of A that are non-zero, using loops that are built into numpy (and therefore quite fast):
good = np.nonzero(A) # hidden double loop (for 2D data)
fs = np.zeros_like(A)
fs[good] = F(i[good], j[good], z) # compute F only where A != 0
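Putting the answer's pseudo-code together as a runnable sketch (the function F below is a made-up placeholder, and the exact axis handling is an assumption):
import numpy as np

def F(i, j, z):                       # hypothetical stand-in for the real, expensive F
    return np.sin(i + j) * z

N = M = 50
z = 0.3
A = np.random.randint(0, 2, size=(N, M))
B = np.random.randint(0, 2, size=(M, N))

i, j = np.indices((N, M))
good = np.nonzero(A)                  # evaluate F only where A != 0
fs = np.zeros((N, M))
fs[good] = F(i[good], j[good], z)
R = np.dot(A * fs, B)                 # summation over j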

Fast way to apply function elementwise to a numpy array

I have sets of numpy arrays which I create using
for longtuple in itertools.product([0, 1], repeat=n + m - 1):
    outputs = set(np.convolve(v, longtuple, 'valid').tostring() for v in itertools.product([0, 1], repeat=m))
    if len(outputs) == 2**m:
        print("Hooray!")
However, I would actually like to take every element x of np.convolve(v, longtuple, 'valid'), apply x >> k & 1 (for values of k that I will specify), and then add the resulting array to the set instead. Is there an efficient way to do this?
My use of set and tostring() is simply to see whether there are any duplicates. I am not sure it is correct, however.
You can just take the result of convolve and apply your expression to it:
set((np.convolve(v, longtuple, 'valid') >> k & 1).tostring() for v in itertools.product([0,1], repeat = m))
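As a sanity check that the shift-and-mask expression applied to the whole array matches the elementwise version (the concrete values of v, longtuple and k below are made up):
import numpy as np

v = (0, 1, 1, 0)
longtuple = (1, 0, 1)
k = 1
conv = np.convolve(v, longtuple, 'valid')
vectorized = conv >> k & 1                          # >> binds tighter than &, so this is (conv >> k) & 1
elementwise = np.array([x >> k & 1 for x in conv])
assert np.array_equal(vectorized, elementwise)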
