How to go through a double for loop randomly in Python

Consider the following code:
for i in range(size-1):
    for j in range(i+1, size):
        print((i, j))
I need to go through this for-loop in a random fashion. I attempted to write a generator to do such a thing:
def Neighborhood(size):
    for i in shuffle(range(size-1)):
        for j in shuffle(range(i+1, size)):
            yield i, j

for i, j in Neighborhood(size):
    print((i, j))
However, shuffle cannot be applied to whatever object range is. I do not know how to remedy the situation, and any help is much appreciated. I would prefer a solution that avoids converting range to a list, since I need speed. For example, size could be on the order of 30,000, and I will perform this loop around 30,000 times.
I also plan to escape the for loop early, so I want to avoid solutions that incorporate shuffle(list(range(size)))

You can use random.sample.
The advantage of using random.sample over random.shuffle is that it works directly on a sequence like range without modifying it in place, so:
In Python 3.x you don't need to convert range() to a list.
In Python 2.x, you can use xrange.
The same code can work in Python 2.x and 3.x.
Sample code:
from random import sample

n = 10
l1 = range(n)
for i in sample(l1, len(l1)):
    l2 = range(i, n)
    for j in sample(l2, len(l2)):
        print(i, j)
Edit:
As to why I put in this edit, go through the comments.
def Neighborhood(size):
    range1 = range(size - 1)
    for i in sample(range1, len(range1)):
        range2 = range(i + 1, size)
        for j in sample(range2, len(range2)):
            yield i, j

A simple way to go really random, not row-by-row:
import random

def Neighborhood(size):
    yielded = set()
    while True:  # runs until the caller breaks out (see the note below)
        i = random.randrange(size)
        j = random.randrange(size)
        if i < j and (i, j) not in yielded:
            yield i, j
            yielded.add((i, j))
Demo:
for i, j in Neighborhood(30000):
    print(i, j)
Prints something like:
2045 5990
224 5588
1577 16076
11498 15640
15219 28006
8066 10142
7856 8248
17830 26616
...
Note: I assume you're indeed going to "escape the for loop early". Then this won't have problems with slowing down due to pairs being produced repeatedly.
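Since the question mentions escaping the loop early, here is a minimal sketch (my own illustration, with a made-up stopping condition) of how that looks:
for i, j in Neighborhood(30000):
    if i + j > 50000:  # hypothetical condition; break before exhaustion
        break
    print(i, j)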

I don't think you can randomly traverse an iterator. You can predefine the shuffled lists, though:
random iteration in Python
import random

L1 = list(range(size-1))
random.shuffle(L1)
for i in L1:
    L2 = list(range(i+1, size))
    random.shuffle(L2)
    for j in L2:
        print((i, j))
Of course, not optimal for large lists


My code is very slow. How to optimize it? Python

import numpy as np

def function_1(arr):
    return [j for i in range(len(arr)) for j in range(len(arr))
            if np.array(arr)[i] == np.sort(arr)[::-1][j]]
An array arr is given. For each position i, it is required to find the position of the element arr[i] in the array sorted in descending order. All values of the arr array are distinct.
I have to write the function in one line. It works, but very slowly. I have to do this:
np.random.seed(42)
arr = function_1(np.random.uniform(size=1000000))
print(arr[7] + arr[42] + arr[445677] + arr[53422])
Please help to optimize the code.
You are repeatedly sorting and reversing the array, but the result of that operation is independent of the current value of i or j. The simple thing to do is to pre-compute that, then use its value in the list comprehension.
For that matter, range(len(arr)) can also be computed once.
Finally, arr is already an array; you don't need to make a copy each time through the i loop.
def function_1(arr):
    arr_sr = np.sort(arr)[::-1]
    r = range(len(arr))
    return [j for i in r for j in r if arr[i] == arr_sr[j]]
Fitting this into a single line becomes trickier. Aside from extremely artificial outside constraints, there is no reason to do so, but once Python 3.8 is released, assignment expressions will make it simpler to do so. I think the following would be equivalent.
def function_1(arr):
    return [j for i in (r := range(len(arr))) for j in r if arr[i] == (arr_sr := np.sort(arr)[::-1])[j]]
Have a think about the steps that are going on in here:
[j
 for i in range(len(arr))
 for j in range(len(arr))
 if np.array(arr)[i] == np.sort(arr)[::-1][j]
]
Suppose your array contains N elements.
You pick an i, N different times.
You pick a j, N different times.
Then for each (i, j) pair you are executing the final line.
That is, you're executing the final line N^2 times.
But in that final line, you're sorting an array containing N elements. That's an O(N log N) operation. So the complexity of your code is O(N^3 log N).
Try making a sorted copy of the array before your [... for i ... for j ...] is executed. That'll reduce the time complexity to O(N^2 + N log N).
I think...
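As an aside (my addition, not from the answers above): since all values are distinct, the equality test can be avoided entirely. A double argsort gives each element's rank in the descending order in O(N log N), which is the natural end point of the complexity reduction sketched above:
import numpy as np

def function_1_ranked(arr):  # hypothetical name, a sketch only
    # argsort of the descending sort's argsort yields each element's rank;
    # assumes all values are distinct, as the problem states
    return np.argsort(np.argsort(-np.asarray(arr))).tolist()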

Recursion to replace Looping n times

My question is quite similar to this one here:
Function with varying number of For Loops (python)
However, what I really want is for example:
def loop_rec(n):
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # ... n nested loops
                # do something with i, j and k,
                # such as i + j + k
                pass
The example in the link does not allow the index x to vary.
I want something like the answer suggested in that question, but using the indices instead of just x:
def loop_rec(y, n):
    if n >= 1:
        for x in range(y):  # not just x
            loop_rec(y, n - 1)
    else:
        whatever()
Thanks
For problems where you have to deal with multiple nested loops, the Python standard library provides a convenient tool called itertools.product.
In your particular case, all you have to do is wrap the range with itertools.product and specify how many nested loops via the repeat parameter. itertools.product computes the Cartesian product.
from itertools import product

def loop_rec(y, n):
    for elems in product(range(y), repeat=n):
        # elems is a tuple of n indices: index it as elems[0], elems[1], ...
        # e.g. sum(elems) replaces i + j + k
        pass
Based on your requirement, you might want to use the entire tuple from each element of the Cartesian product, index into the tuple, or, if you know the loop depth, unpack it into variables, as in the sketch below.
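For instance, a minimal sketch (my own illustration) for a known depth of three:
from itertools import product

y = 2
for i, j, k in product(range(y), repeat=3):
    print(i + j + k)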
Thanks, but what if I wanted to change the range for each loop, i.e. i in range(n), j in range(n-1), k in range(n-2)?
Assuming you want to vary the ranges as range(n), range(n-1), range(n-2), ..., range(m), you can rewrite the product as
product(*map(range, range(n, m - 1, -1)))
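A quick check (my own illustration) that this reproduces the decreasing ranges from the comment:
from itertools import product

n, m = 3, 1
for i, j, k in product(*map(range, range(n, m - 1, -1))):
    print(i, j, k)  # i in range(3), j in range(2), k in range(1)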

Python Get Access to locals() Back In 2.7 to Prevent Duplicates

So I am creating a list of primes using the "sieve" method and a Python comprehension.
no_primes = [j for i in range(2,sqrt_n) for j in range(i*2, n, i)]
Problem is, the sieve method generates tons of duplicates in the 'no_primes' list. It was recommended to use locals()['_[1]'] to gain access to the list as it is being built and remove the dups as they occur:
no_primes = [j for i in range(2,sqrt_n) for j in range(i*2, n, i) if j not in locals()['_[1]']]
Problem is, this ability has been removed as of 2.7, so it does not work.
I understand that this method may be "evil" (Dr. Evil with his pinky at his lips). However, I need to remove dups before they affect memory with a massive list. Yes, I can filter or use set to remove dups, but by then the list will have taken over my computer's memory, and/or filter or set will have a massive task ahead.
So how do I get this ability back? I promise not to take over the world with it.
Thanks.
You can use a set-comprehension (which automatically prevents duplicates):
no_primes = {j for i in range(2,sqrt_n) for j in range(i*2, n, i)}
You could then sort it into a list if necessary:
no_primes = sorted(no_primes)
For a further optimization, you can use xrange instead of range:
no_primes = {j for i in xrange(2,sqrt_n) for j in xrange(i*2, n, i)}
Unlike the latter, which produces an unnecessary list, xrange yields its values lazily.
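A short usage sketch (my own illustration, with concrete values for n and sqrt_n, which the question doesn't show) recovering the primes from the set afterwards:
n = 100
sqrt_n = int(n ** 0.5) + 1  # assumed definition
no_primes = {j for i in range(2, sqrt_n) for j in range(i * 2, n, i)}
primes = [x for x in range(2, n) if x not in no_primes]
print(primes)  # [2, 3, 5, 7, 11, 13, ...]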
A simple and readable approach would be:
def get_primes(n):
    multiples = []
    for i in xrange(2, n+1):
        if i not in multiples:
            for j in xrange(i*i, n+1, i):
                multiples.append(j)
    return multiples  # note: despite the name, this returns the composites

m = get_primes(100)
print m

alternative (faster) way to 3 nested for loop python

How can I make this function faster? (I call it a lot of times, so any speed-up would add up.)
def vectorr(I, J, K):
vect = []
for k in range(0, K):
for j in range(0, J):
for i in range(0, I):
vect.append([i, j, k])
return vect
You can take a look at itertools.product:
Equivalent to nested for-loops in a generator expression. For example,
product(A, B) returns the same as ((x,y) for x in A for y in B).
The nested loops cycle like an odometer with the rightmost element
advancing on every iteration. This pattern creates a lexicographic
ordering so that if the input’s iterables are sorted, the product
tuples are emitted in sorted order.
Also, there's no need for the 0 in range(0, I) etc.; just use range(I).
So in your case it can be:
import itertools

def vectorr(I, J, K):
    # note: this yields tuples in (k, j, i) order; reverse each tuple
    # if you need (i, j, k) like the original
    return itertools.product(range(K), range(J), range(I))
You said you want it to be faster. Let's use NumPy!
import numpy as np

def vectorr(I, J, K):
    arr = np.empty((I*J*K, 3), int)
    arr[:, 0] = np.tile(np.arange(I), J*K)
    arr[:, 1] = np.tile(np.repeat(np.arange(J), I), K)
    arr[:, 2] = np.repeat(np.arange(K), I*J)
    return arr
There may be even more elegant tweaks possible here, but that's a basic tiling that gives the same result (but as a 2D array rather than a list of lists). The code for this is all implemented in C, so it's very, very fast--this may be important if the input values may get somewhat large.
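A quick sanity check (my own illustration) that the tiling reproduces the loop version's rows and order:
expected = [[i, j, k] for k in range(4) for j in range(3) for i in range(2)]
assert vectorr(2, 3, 4).tolist() == expected  # identical rows, same order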
The other answers are more thorough and, in this specific case at least, better, but in general, if you're using Python 2, and for large values of I, J, or K, use xrange() instead of range(). xrange gives a generator-like object, instead of constructing a list, so you don't have to allocate memory for the entire list.
In Python 3, range works like Python 2's xrange.
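A minimal Python 2 sketch of the difference:
# Python 2: range(10**7) builds a 10-million-element list up front,
# while xrange(10**7) yields values lazily in constant memory
total = 0
for i in xrange(10 ** 7):
    total += i
print total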
import numpy
def vectorr(I, J, K):
    val = numpy.indices((I, J, K))
    val.shape = (3, -1)
    return val.transpose()  # or val.transpose().tolist()
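One thing to note (my observation, not from the answer): the row order here differs from the loop-based version; the last index varies fastest:
print(vectorr(2, 2, 2))
# [[0 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 1 1]
#  [1 0 0]
#  [1 0 1]
#  [1 1 0]
#  [1 1 1]]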

Is there a way to avoid this memory error?

I'm currently working through the problems on Project Euler, and so far I've come up with this code for a problem.
from itertools import combinations
import time
def findanums(n):
    l = []
    for i in range(1, n + 1):
        s = []
        for j in range(1, i):
            if i % j == 0:
                s.append(j)
        if sum(s) > i:
            l.append(i)
    return l
start = time.time()  # start time
limit = 28123
anums = findanums(limit + 1)  # abundant numbers (1..limit)
print "done finding abundants", time.time() - start
pairs = combinations(anums, 2)
print "done finding combinations", time.time() - start
sums = map(lambda x: x[0]+x[1], pairs)
print "done finding all possible sums", time.time() - start
print "start main loop"
answer = 0
for i in range(1, limit+1):
    if i not in sums:
        answer += i
print "ANSWER:", answer
When I run this I run into a MemoryError.
The traceback:
File "test.py", line 20, in <module>
sums = map(lambda x: x[0]+x[1], pairs)
I've tried to prevent it by disabling garbage collection, based on what I could find on Google, but to no avail. Am I approaching this the wrong way? In my head this feels like the most natural way to do it, and I'm at a loss at this point.
SIDE NOTE: I'm using PyPy 2.0 Beta2 (Python 2.7.4) because it is so much faster than any other Python implementation I've used, and SciPy/NumPy are over my head as I'm still just beginning to program and don't have an engineering or strong math background.
As Kevin mentioned in the comments, your algorithm might be wrong, but in any case your code is not optimized.
When using very big lists, it is common to use generators; there is a famous, great Stack Overflow answer explaining the concepts of yield and generator - What does the "yield" keyword do in Python?
The problem is that your pairs = combinations(anums, 2) doesn't give you a generator but a large object, which is much more memory consuming.
I changed your code to use this function; since you iterate over the collection only once, you can evaluate the values lazily:
import itertools

def generator_sol(anums1, s):
    for comb in itertools.combinations(anums1, s):
        yield comb
Now, instead of pairs = combinations(anums, 2), which generates a large object, use:
pairs = generator_sol(anums, 2)
Then, instead of using the lambda I would use another generator:
sums_sol = (x[0]+x[1] for x in pairs)
Another tip: take a look at xrange, which is more suitable here; it doesn't generate a list but an xrange object, which is more efficient in many cases (such as this one).
Let me suggest that you use generators. Try changing this:
sums = map(lambda x: x[0]+x[1], pairs)
to
sums = (a+b for (a,b) in pairs)
Ofiris's solution is also OK, but it implies that itertools.combinations returns a list, which is wrong. If you are going to keep solving Project Euler problems, have a look at the itertools documentation.
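One caveat (my addition, not from the answers above): a generator can be consumed only once, so the final membership test i not in sums would exhaust it on the first pass. Since the distinct sums never exceed 2*limit, collecting them into a set keeps memory small and makes every lookup O(1):
sums_set = {a + b for a, b in pairs}  # far fewer distinct sums than pairs
answer = sum(i for i in xrange(1, limit + 1) if i not in sums_set)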
The issue is that anums is big - about 28000 elements long, so pairs must be around 28000*28000*8 bytes = 6GB. If you used numpy you could cast anums as a numpy.int32 array (the pairwise sums reach about 56000, which would overflow int16), in which case the result pairs would be about 3GB - more manageable:
import numpy as np

# cast to a compact dtype; int16 would overflow since sums reach ~56000
anums = np.array([anums], dtype=np.int32)
# compute the sum of all the pairs via a broadcast outer sum
pairs = (anums + anums.T).ravel()
