I'm running the following program to compare the timings of multiprocessing and single-core processing.
Here is the script:
from multiprocessing import Pool, cpu_count
from time import *

# Amount to calculate
N = 5000

# Function that works alone
def two_loops(x):
    t = 0
    for i in range(1, x+1):
        for j in range(i):
            t += 1
    return t

# Function that needs to be called in a loop
def single_loop(x):
    tt = 0
    for j in range(x):
        tt += 1
    return tt
print 'Starting loop function'
starttime = time()
tot = 0
for i in range(1, N+1):
    tot += single_loop(i)
print 'Single loop function. Result ', tot, ' in ', time()-starttime, ' seconds'

print 'Starting multiprocessing function'
if __name__ == '__main__':
    starttime = time()
    pool = Pool(cpu_count())
    res = pool.map(single_loop, range(1, N+1))
    pool.close()
    print 'MP function. Result ', res, ' in ', time()-starttime, ' seconds'

print 'Starting two loops function'
starttime = time()
print 'Two loops function. Result ', two_loops(N), ' in ', time()-starttime, ' seconds'
So basically the functions give me the sum of all integers between 1 and N (that is, N(N+1)/2).
The two_loops function is the basic one, using two for loops. The single_loop function is only there to simulate one loop (the j loop).
When I run this script, it completes, but I don't get the right result. I get:
Starting loop function
Single loop function. Result 12502500 in 0.380275964737 seconds
Starting multiprocessing function
MP function. Result [1, 2, 3, ... a lot of values here ..., 4999, 5000] in 0.683819055557 seconds
Starting two loops function
Two loops function. Result 12502500 in 0.4114818573 seconds
It looks like the script runs, but I can't manage to get the correct result. I saw on the web that the close() function was supposed to take care of that, but apparently not.
Do you know how I can fix this?
Thanks a lot!
I don't understand your question but here's how it can be done:
from concurrent.futures.process import ProcessPoolExecutor
from timeit import Timer

def two_loops_multiprocessing():
    with ProcessPoolExecutor() as executor:
        executor.map(single_loop, range(N))

if __name__ == "__main__":
    iterations, elapsed_time = Timer("two_loops(N)", globals=globals()).autorange()
    print(elapsed_time / iterations)
    iterations, elapsed_time = Timer("two_loops_multiprocessing()", globals=globals()).autorange()
    print(elapsed_time / iterations)
What's happening is that your map function chops up the range you provide and runs the single-loop function on each of those numbers separately. Look here to see what it does: https://docs.python.org/3.5/library/multiprocessing.html#multiprocessing.pool.Pool.map Since your single loop just adds 1 to tt for a range up to the given value, each call returns that value back. This effectively means you get a list equivalent to your range() back, which is the result you are seeing.
In your non-multiprocessing version you later add all the values together to get a single total here:
for i in range(1, N+1):
    tot += single_loop(i)
But you forget to do this in the multiprocessing version. What you should do is add up all the values after you have called your map function, and you will get your expected answer.
Besides this, your single-loop function is basically the two-loop function with one loop moved into a function call. I'm not sure what you are trying to accomplish, but there is not a big difference between the two.
Just sum the result list:
res = sum(pool.map(single_loop,range(1,N+1)))
You could avoid calculating the sum in the main thread by using some shared memory, but keep in mind that you will lose more time on synchronization. And again, there's no gain from multiprocessing in this case. It all depends on the specific case. If you needed to call single_loop fewer times and each call would take more time, then multiprocessing would speed up your code.
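For reference, here is a minimal sketch of how the sum fix could slot into the timing part of the original script (Python 3 print syntax and a with-statement are assumed here, so treat it as an illustration rather than a drop-in patch):

from multiprocessing import Pool, cpu_count
from time import time

N = 5000

def single_loop(x):
    tt = 0
    for j in range(x):
        tt += 1
    return tt

if __name__ == '__main__':
    starttime = time()
    with Pool(cpu_count()) as pool:
        # pool.map returns one partial result per input;
        # summing them gives the same total as the serial loop.
        res = sum(pool.map(single_loop, range(1, N + 1)))
    print('MP function. Result', res, 'in', time() - starttime, 'seconds')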
Related
I have a multi-nested for loop and I'd like to parallelize this as much as possible, in Python.
Suppose I have some arbitrary function func(a, b) which accepts two arguments, and I'd like to compute it for all combinations of elements of M and N.
What I've done so far is 'flatten' the indices into a dictionary:
idx_map = {}
count = 0
for i in range(n):
    for j in range(m):
        idx_map[count] = (i, j)
        count += 1
Now that my nested loop is flattened, I can use it like so:
arr = []
for idx in range(n*m):
    i, j = idx_map[idx]
    arr.append(func(M[i], N[j]))
Can I use this with Python's built-in multiprocessing to parallelize? Race conditions should not be an issue because I do not need to aggregate the func calls; rather, I just want to arrive at some final array that evaluates all func(a, b) combinations across M and N. (So async behavior and complexity should not be relevant here.)
What's the best way to accomplish this effect?
I saw this related question, but I don't understand what the author was trying to illustrate:
if 1: # multi-threaded
    pool = mp.Pool(28) # try 2X num procs and inc/dec until cpu maxed
    st = time.time()
    for x in pool.imap_unordered(worker, range(data_Y)):
        pass
    print 'Multiprocess total time is %4.3f seconds' % (time.time()-st)
    print
Yes, you can accomplish this; however, the amount of work you are doing per function call needs to be quite substantial to overcome the overhead of the processes.
Vectorizing using something like numpy is typically easier, like Jérôme stated previously.
I have altered your code so that you may observe the speed up you get by using multiprocessing.
Feel free to change the largNum variable to see how the scaling for multiprocessing improves as the amount of work per function call increases, and how at low values multiprocessing is actually slower.
from concurrent.futures import ProcessPoolExecutor
import time

# Sums the integers below (a+b)**2
def costlyFunc(theArgs):
    a = theArgs[0]
    b = theArgs[1]
    topOfRange = (a+b)**2
    sum = 0
    for i in range(topOfRange):
        sum += i
    return sum

# changed to a list
idx_map = []
largNum = 200

# Your index flattening
for i in range(largNum):
    for j in range(largNum):
        idx_map.append((i, j))
I use the map function in the single-core version to call costlyFunc on every element in the list. Python's concurrent.futures module also has a similar map function; however, it distributes the work over multiple processes.
if __name__ == "__main__":
# No multiprocessing
oneCoreTimer=time.time()
result=[x for x in map(costlyFunc,idx_map)]
oneCoreTime=time.time()-oneCoreTimer
print(oneCoreTime," seconds to complete the function without multiprocessing")
# Multiprocessing
mpTimer=time.time()
with ProcessPoolExecutor() as ex:
mpResult=[x for x in ex.map(costlyFunc,idx_map)]
mpTime=time.time()-mpTimer
print(mpTime," seconds to complete the function with multiprocessing")
print(f"Multiprocessing is {oneCoreTime/mpTime} times faster than using one core")
One of the advantages of a generator is that it uses less memory and consumes fewer resources. That is, we do not produce all the data at once and we do not allocate memory for all of it; only one value is generated at a time. The state and values of the local variables are saved, so in effect the code can be paused and resumed by calling it again.
I wrote two pieces of code and I am comparing them; I see that the same thing can be written as a normal function, and now I do not see the point of the generator. Can anyone tell me what the advantage of this generator is compared to writing it normally? One value is produced per iteration in both of them.
The first code:
def gen(n):
    for i in range(n):
        i = i ** 2
        i += 1
        yield i

g = gen(3)
for i in g:
    print(i)
The second one:
def func(i):
    i = i ** 2
    i += 1
    return i

for i in range(3):
    print(func(i))
I know that the id of g is constant whereas the id of func(i) changes.
Is that what the main advantage of a generator amounts to?
To be specific about the code in the question: there is no difference in memory use between the two approaches you have shown, but the first one is preferable because everything you need is inside the same generator function. In the second case, the loop and the function live in two different places, and every time you want to use the function you have to repeat the loop outside it, which adds unnecessary redundancy.
Actually, the two functions you have written, the generator and the normal function, are not equivalent.
In the generator, you are returning all the values, i.e. the loop is inside the generator function:
def gen(n):
    for i in range(n):
        i = i ** 2
        i += 1
        yield i
But, in the second case, you are just returning one value, and the loop is outside the function:
def func(i):
    i = i ** 2
    i += 1
    return i
In order to make the second function equivalent to the first one, you need to have the loop inside the function:
def func(n):
    for i in range(n):
        i = i ** 2
        i += 1
        return i
Now, of course, the above function always returns the single value computed for i=0 as soon as control enters the loop, so to fix this you need to return an entire sequence, which requires a list or a similar data structure that can store multiple values:
def func(n):
    result = []
    for i in range(n):
        i = i ** 2
        i += 1
        result.append(i)
    return result

for v in func(3):
    print(v)
1
2
5
Now you can clearly differentiate the two cases: in the first one, each value is produced and processed (here, printed) one at a time, but in the second case you end up holding the entire result in memory before you can actually process it.
The main advantage shows up when you have a large dataset. It is basically the idea of lazy loading, which means a value is not produced unless it is required. This saves resources, because with a list the entire thing is loaded at once, which might take up a lot of primary memory if the data is large enough.
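As a small illustration (the numbers here are arbitrary), compare summing a list comprehension with summing the equivalent generator expression; both give the same total, but only the first materializes all the intermediate values:

# The list comprehension allocates all one million squares before sum() starts,
# while the generator expression hands them to sum() one at a time.
total_from_list = sum([i * i for i in range(10 ** 6)])  # builds the full list first
total_from_gen = sum(i * i for i in range(10 ** 6))     # constant extra memory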
The advantage of the first code is relative to something you did not show. The point is that generating and consuming one value at a time takes less memory than first generating all values, collecting them in a list, and then consuming them from the list.
The second code to compare the first one against should have been:
def gen2(n):
    result = []
    for i in range(n):
        i = i ** 2
        i += 1
        result.append(i)
    return result

g = gen2(3)
for i in g:
    print(i)
Note how the result of gen2 can be used exactly like the result of gen from your first example, but gen2 uses more memory if n is getting larger, whereas gen uses the same amount of memory no matter how large n is.
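One quick way to see this, assuming the gen and gen2 definitions above, is to compare the sizes of the objects they hand back:

import sys

n = 10 ** 6
print(sys.getsizeof(gen(n)))   # generator object: a couple of hundred bytes, independent of n
print(sys.getsizeof(gen2(n)))  # list: several megabytes of pointers for n = 10 ** 6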
I am trying to compute the time for every loop iteration. However, I have noticed that the reported time increases with each iteration. I am computing the time using the following commands:
start_time = time.time()
loop:
    (any process)
    print(time.time() - start_time)
When you call the time.time function, you get back the current time in seconds based on the Unix clock, essentially the time local to the system.
You assign that time to start_time once, then run your processes and output the current time minus start_time, so you are essentially measuring how long it has taken to run everything up to that point, which is why the value keeps growing.
Now, I believe what you're trying to do is calculate how long each individual iteration takes; to do that you need to move around some of the lines in the sample code you supplied:
import time

for i in range(10):
    start_time = time.time()
    (any process)
    print(time.time() - start_time)
By moving the assignment of start_time into the loop, you record the time at which each iteration starts and then output the duration of that individual iteration, rather than timing how long the entire loop takes as a whole.
This would output how long each iteration takes.
Please feel free to ask any questions!
Here's an example of how you could perform your timings with timeit.
import timeit

setup = "i = {}"
stmt = """
for x in range(i):
    3 + 3
"""

[timeit.timeit(stmt=stmt, setup=setup.format(i), number=100) for i in range(10)]
Which gives you a list of the times of each loop:
[8.027901640161872e-05,
0.00011072197230532765,
0.00011189299402758479,
0.00012168602552264929,
0.00012224999954923987,
0.0001258430420421064,
0.00013012002455070615,
0.00013478699838742614,
0.000138589006382972,
0.0001438520266674459]
I am using Project Euler problems to test my understanding as I learn Python 3.x. After I cobble together a working solution to each problem, I find the posted solutions very illuminating, and I can "absorb" new ideas after I have struggled myself. I am working on Euler 024 and I am trying a recursive approach. In no way do I believe my approach is the most efficient or most elegant; however, I successfully generate a full set of permutations in increasing order (because I start with a sorted tuple), which is one of the outputs I want.
In addition, in order to find the millionth permutation in the list (which is the other output I want, but can't yet get), I am trying to count how many permutations there are each time I create one, and that's where I get stuck. In other words, what I want to do is count the number of times the recursion reaches the base case, i.e. a completed permutation, not the total number of recursive calls. I have found on StackOverflow some very clear examples of counting the number of executions of recursive calls, but I am having no luck applying the idea to my code.
Essentially, my problems in my attempts so far are about "passing back" the count of the "completed" permutations using a return statement. I think I need to do that because of the way my for loop creates the "stem" and "tail" tuples. At a high level, either I can't get the counter to increment (so it always comes out as "1" or "5"), or the "nested return" just terminates the code after the first permutation is found, depending on where I place the return. Can anyone help insert the counting into my code?
First, the "counting" code I found on SO that I am trying to use:
def recur(n, count=0):
    if n == 0:
        return "Finished count %s" % count
    return recur(n-1, count+1)

print(recur(15))
Next is my permutation code with no counting in it. I have tried lots of approaches, but none of them work, so the following has no "counting" in it, just a comment marking the point in the code where I believe the counter needs to be incremented.
#
# euler 024 : Lexicographic permutations
#
import time
startTime = time.time()
#
def splitList(listStem, listTail):
    for idx in range(0, len(listTail)):
        tempStem = ((listStem) + (listTail[idx],))
        tempTail = ((listTail[:idx]) + (listTail[1+idx:]))
        splitList(tempStem, tempTail)
    if len(listTail) == 0:
        #
        # I want to increment the counter only when I am here
        #
        print("listStem=", listStem, "listTail=", listTail)
#
inStem = ()
#inTail = ("0","1","2","3","4","5","6","7","8","9")
inTail = ("0","1","2","3")
testStem = ("0","1")
testTail = ("2","3","4","5")
splitList(inStem, inTail)
#
print('Code execution duration : ', time.time() - startTime, ' seconds')
Thanks in advance,
Clive
Since it seems you've understood the basic problem but just want to know how to track the recursion, all you need to do is pass along a counter that is shared by every call in the recursion. You can add a third argument to your function and increment it each time a completed permutation is reached:
def splitList(listStem, listTail, count):
    for idx in range(0, len(listTail)):
        ...
        splitList(tempStem, tempTail, count)
    if len(listTail) == 0:
        count[0] += 1
        print('Count:', count)
        ...
Now, call this function like this (same as before):
splitList(inStem, inTail, [0])
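Put together as a complete runnable sketch (the names mirror your code; the list around the counter is what lets every level of the recursion update the same value):

def splitList(listStem, listTail, count):
    for idx in range(len(listTail)):
        tempStem = listStem + (listTail[idx],)
        tempTail = listTail[:idx] + listTail[1 + idx:]
        splitList(tempStem, tempTail, count)
    if len(listTail) == 0:
        count[0] += 1          # incremented only for completed permutations

count = [0]
splitList((), ("0", "1", "2", "3"), count)
print(count[0])                # prints 24, the number of permutations of 4 items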
Why don't you write a generator for this?
Then you can just stop at the nth item ("drop while i < n").
My solution uses itertools, but you can use your own permutations generator. Just yield the next sequence member instead of printing it.
from itertools import permutations as perm, dropwhile as dw

print(''.join(dw(
    lambda x: x[0] < 1000000,
    enumerate(perm('0123456789'), 1)
).__next__()[1]))
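If you prefer to keep your own recursion instead of itertools, a sketch of the same idea is to make splitList yield each finished permutation and then stop at the millionth one with enumerate (this just illustrates the "yield instead of print" suggestion and is not tuned for speed):

def splitList(listStem, listTail):
    if len(listTail) == 0:
        yield listStem                              # a completed permutation
    for idx in range(len(listTail)):
        tempStem = listStem + (listTail[idx],)
        tempTail = listTail[:idx] + listTail[1 + idx:]
        for perm in splitList(tempStem, tempTail):  # pass results up the recursion
            yield perm

for count, perm in enumerate(splitList((), tuple("0123456789")), 1):
    if count == 1000000:
        print(''.join(perm))
        break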
I am a computer science student, and some of the things I do require me to run huge loops on a MacBook with a dual-core i5. Some of the loops take 5-6 hours to complete, but they only use 25% of my CPU. Is there a way to make this process faster? I can't change my loops, but is there a way to make them run faster?
Thank you
Mac OS 10.11
Python 2.7 (I have to use 2.7) with IDLE or Spyder on Anaconda
Here is some sample code that takes 15 minutes:
def test_false_pos():
    sumA = [0] * 1000
    for test in range(1000):
        counter = 0
        bf = BloomFilter(4095, 10)
        for i in range(600):
            bf.rand_inserts()
        for x in range(10000):
            randS = str(rnd.randint(0, 10**8))
            if bf.lookup(randS):
                counter += 1
        sumA[test] = counter / 10000.0
    avg = np.mean(sumA)
    return avg
Sure thing: in Python 2.7, range(<a huge number>) has to generate a huge list and wastes a lot of memory each time you use it.
Try to use the xrange function instead. It doesn't create that gigantic list at once; it produces the members of the sequence lazily.
But if you were to use Python 3 (which is the modern version and the future of Python), you'll find that its range already behaves this way and is even faster than xrange in Python 2.
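To illustrate the xrange point on Python 2 (in Python 3, range already behaves this way):

# Python 2 only: xrange yields numbers lazily,
# while range(10 ** 7) would build a ten-million-element list first.
total = 0
for x in xrange(10 ** 7):
    total += x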
You could split it up into 4 loops:
import multiprocessing

def test_false_pos(times, i, q):
    sumA = [0] * times
    for test in range(times):
        counter = 0
        bf = BloomFilter(4095, 10)
        for _ in range(600):  # don't reuse i here: it is the process index put on the queue below
            bf.rand_inserts()
        for x in range(10000):
            randS = str(rnd.randint(0, 10**8))
            if bf.lookup(randS):
                counter += 1
        sumA[test] = counter / 10000.0
    q.put([i, list(sumA)])

def full_test(pieces):
    threads = []
    q = multiprocessing.Queue()
    steps = 1000 / pieces
    for i in range(pieces):
        threads.append(multiprocessing.Process(target=test_false_pos, args=(steps, i, q)))
    [thread.start() for thread in threads]
    results = [None] * pieces
    for i in range(pieces):
        i, result = q.get()
        results[i] = result
    # Flatten the array (`results` looks like this: [[...], [...], [...], [...]])
    # source: https://stackoverflow.com/a/952952/5244995
    sums = [val for result in results for val in result]
    return np.mean(np.array(sums))

if __name__ == '__main__':
    full_test(multiprocessing.cpu_count())
This will run n processes that each do 1/nth of the work, where n is the number of processors on your computer.
The test_false_pos function has been modified to take three parameters:
times is the number of times to run the loop.
i is passed through to the result.
q is a queue to add the results to.
The function loops times times, then places i and sumA into the queue for further processing.
The main process (full_test) collects each worker's result from the queue and places it in the appropriate position in the results list. Once the list is complete, it is flattened, and the mean is calculated and returned.
Consider looking into Numba and its JIT (just-in-time compiler). It works for functions that are NumPy based. It can handle some plain Python routines, but it is mainly for speeding up numerical calculations, especially ones with loops (like doing Cholesky rank-1 up/downdates). I don't think it would work with a BloomFilter, but it is generally super helpful to know about.
In cases where you must use other packages in your flow alongside numpy, separate out the heavy-lifting numpy routines into their own functions and throw a @jit decorator on top of each one. Then put them into your flow with the normal Python stuff.
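For illustration only (the function and data below are made up, not taken from your code), a minimal sketch of that pattern might look like this; nopython=True is what gives the large speedups when Numba can compile the whole function:

import numpy as np
from numba import jit

@jit(nopython=True)
def heavy_numeric_part(values):
    # A plain Python loop over a numpy array: Numba compiles this to machine code.
    total = 0.0
    for v in values:
        total += v * v
    return total

data = np.random.rand(10 ** 6)   # the "normal Python stuff" stays outside the jitted function
print(heavy_numeric_part(data))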