Multiprocessing Running Slower than a Single Process - python

I'm attempting to use multiprocessing to run many simulations across multiple processes; however, the code I have written only uses 1 of the processes as far as I can tell.
Updated
I've gotten all the processes to work (I think) thanks to @PaulBecotte; however, the multiprocessing version runs significantly slower than its single-process counterpart.
For instance, not including the function and class declarations/implementations and imports, I have:
def monty_hall_sim(num_trial, player_type='AlwaysSwitchPlayer'):
    if player_type == 'NeverSwitchPlayer':
        player = NeverSwitchPlayer('Never Switch Player')
    else:
        player = AlwaysSwitchPlayer('Always Switch Player')
    return (MontyHallGame().play_game(player) for trial in xrange(num_trial))

def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get()
            ret = f(*args)
            for result in ret:
                out_queue.put(result)
        except:
            break
def main():
    logging.getLogger().setLevel(logging.ERROR)

    always_switch_input_queue = multiprocessing.Queue()
    always_switch_output_queue = multiprocessing.Queue()

    total_sims = 20
    num_processes = 5
    process_sims = total_sims/num_processes

    with Timer(timer_name='Always Switch Timer'):
        for i in xrange(num_processes):
            always_switch_input_queue.put((monty_hall_sim, (process_sims, 'AlwaysSwitchPlayer')))

        procs = [multiprocessing.Process(target=do_work, args=(always_switch_input_queue, always_switch_output_queue)) for i in range(num_processes)]

        for proc in procs:
            proc.start()

        always_switch_res = []
        while len(always_switch_res) != total_sims:
            always_switch_res.append(always_switch_output_queue.get())

        always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))

    print '\tLength of Always Switch Result List: {alw_sw_len}'.format(alw_sw_len=len(always_switch_res))
    print '\tThe success average of switching doors was: {alw_sw_prob}'.format(alw_sw_prob=always_switch_success)
which yields:
Time Elapsed: 1.32399988174 seconds
Length: 20
The success average: 0.6
However, I am attempting to use this for total_sims = 10,000,000 over num_processes = 5, and doing so has taken significantly longer than using 1 process (1 process returned in ~3 minutes). The non-multiprocessing counterpart I'm comparing it to is:
def main():
    logging.getLogger().setLevel(logging.ERROR)

    with Timer(timer_name='Always Switch Monty Hall Timer'):
        always_switch_res = [MontyHallGame().play_game(AlwaysSwitchPlayer('Monty Hall')) for x in xrange(10000000)]

    always_switch_success = float(always_switch_res.count(True))/float(len(always_switch_res))

    print '\n\tThe success average of not switching doors was: {not_switching}' \
          '\n\tThe success average of switching doors was: {switching}'.format(not_switching=never_switch_success,
                                                                               switching=always_switch_success)

You could try importing "process" under some if statements.

EDIT: you changed some stuff, so let me try to explain a bit better.
Each message you put into the input queue will cause the monty_hall_sim function to get called and send num_trial messages to the output queue.
So your original implementation was right- to get 20 output messages, send in 5 input messages.
However, your function is slightly wrong.
for trial in xrange(num_trial):
    res = MontyHallGame().play_game(player)
    yield res
This will turn the function into a generator that will provide a new value on each next() call. Great! The problem is here:
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        out_queue.put(ret.next())
    except:
        break
Here, on each pass through the loop you create a NEW generator with a NEW message. The old one is thrown away. So each input message only adds a single output message to the queue before you throw it away and get another one. The correct way to write this is:
while True:
    try:
        f, args = in_queue.get(timeout=1)
        ret = f(*args)
        for result in ret:
            out_queue.put(result)
    except:
        break
Doing it this way will continue to yield output messages from the generator until it finishes (after yielding 4 messages in this case)

I was able to get my code to run significantly faster by changing monty_hall_sim to return a list comprehension, having do_work put whole lists on the output queue, and then extending main's results list with the lists read from the output queue. It now runs in ~13 seconds.
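For anyone curious, a minimal sketch of that batching change (MontyHallGame and the player classes are assumed from the original code, and the main() fragment is reconstructed rather than copied):

def monty_hall_sim(num_trial, player_type='AlwaysSwitchPlayer'):
    if player_type == 'NeverSwitchPlayer':
        player = NeverSwitchPlayer('Never Switch Player')
    else:
        player = AlwaysSwitchPlayer('Always Switch Player')
    # return one list per task instead of yielding individual results
    return [MontyHallGame().play_game(player) for trial in xrange(num_trial)]

def do_work(in_queue, out_queue):
    while True:
        try:
            f, args = in_queue.get(timeout=1)
            # put the whole list on the queue in one go
            out_queue.put(f(*args))
        except:
            break

# in main(), collect one list per submitted task and extend the results:
#     always_switch_res = []
#     for i in xrange(num_processes):
#         always_switch_res.extend(always_switch_output_queue.get())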

Related

Allow a function (section of code) to run for only 30 seconds; if it has not finished and returned a value, stop it and re-run it with different input

In my code, I have a function that takes a number as input. This input heavily affects the function's running time.
This is the line where I call the function, with an input value of 0.35.
frequent_itemsets = get_frequent_items(0.35)
The get_frequent_items function returns a DataFrame, and later in the code I use this DataFrame for other computations, so I need this method to return the DataFrame (here called frequent_itemsets) to be able to continue.
The input value (0.35 in the example) heavily affects the running time: for example, with 0.35 the function takes 28 seconds to return, while with 0.3 it takes 2 hours.
I am thinking of limiting the input values for the function to the options
var_support_options = [0.18, 0.2, 0.25, 0.3, 0.35]
Now, my question is: is there a way to write the code so that it tries the function with these input values (from the var_support_options list), starting from the lowest value and moving to the largest?
EXAMPLE OF DESIRED process:
iteration 1 : frequent_itemsets = get_frequent_items(0.18)
If this iteration takes more than 30 seconds, stop it and try the next value in the input list (0.2 in the example).
Otherwise, if it takes less than 30 seconds, return the frequent_itemsets DataFrame and continue with the code.
I want the function to finish in less than 30 seconds using the smallest possible input value, then return the result and continue to the next lines of code.
Should I do that using multithreading, multiprocessing, or something else? And what should the code look like?
You can use multiprocessing for this because it has methods to kill/terminate a process. But it needs a queue to send the result back to the main process (processes don't share memory, so they can't use a global variable).
The main process then runs a loop which periodically checks whether there is a result in the queue and whether it is time to kill/terminate the other process.
One problem is that, because processes don't share memory, the main process has to send data to the worker process, and that data is serialized with pickle; for big data this may take extra time.
Minimal working example.
It is similar to the examples in the answers suggested by @matszwecja in How to limit execution time of a function call?
import multiprocessing
import time

def get_frequent_items(queue, value):
    # simulate work that takes a different amount of time per value
    time.sleep(10-(value*10))
    # send result
    queue.put(value*2)

def run(value, timeout=30):
    # queue to get the result back
    q = multiprocessing.Queue()

    # start process
    p = multiprocessing.Process(target=get_frequent_items, args=(q, value))
    p.start()

    start = time.time()
    while True:
        time.sleep(0.1)  # reduce CPU consumption

        end = time.time()
        print(f'time: {end-start:.1f}', end='\r')

        if q.empty():  # check if there is a result in the queue
            if end-start > timeout:  # check if it is time to kill the process
                p.terminate()
                return None  # return None when there is no result
        else:
            return q.get()  # return result

# ---- main ---

if __name__ == '__main__':

    for var_support_options in [0.18, 0.2, 0.25, 0.3, 0.35]:

        result = run(var_support_options, timeout=7)
        print('result:', result, 'for', var_support_options)

        # exit the loop when you get the first result
        if result:
            break

    # --- after loop ---

    if result:
        print('final result:', result, 'for', var_support_options)
    else:
        print('no result')

Factorial calculation with threading takes too long to finish. How do I fix it?

I am trying to calculate whether a given number is prime or not with the formula :
(n-1)! mod n =? (n-1)
I must calculate the factorial with different threads working simultaneously, check whether they have all finished, and if so join them. By doing so I calculate the factorial across different threads and can then take the modulo. However, even though my code works fine with small prime numbers, it takes too long to execute when the number is big. I went through my code and couldn't really find anything that would bring the execution time down. Here is my code:
import threading
import time

# GLOBAL VARIABLE
result = 1

# worker class in order to multiply on threads
class Worker:
    # initiating the worker class
    def __init__(self):
        super().__init__()
        self.jobs = []

    # the function that does the actual multiplying
    def multiplier(self, beg, end):
        global result
        for i in range(beg, end+1):
            result *= i
            #print("\tresult updated with *{}:".format(i), result)
        #print("Calculating from {} to {}".format(beg, end), " : ", result)

    # appending threads to the object
    def append_job(self, job):
        self.jobs.append(job)

    # function that is to see the threads
    def see_jobs(self):
        return self.jobs

    # initiating the threads
    def initiate(self):
        for j in self.jobs:
            j.start()

    # finalizing and joining the threads
    def finalize(self):
        for j in self.jobs:
            j.join()

    # controlling the threads by blocking until all threads are asleep
    def work(self):
        while True:
            if 0 == len([t for t in self.jobs if t.is_alive()]):
                self.finalize()
                break

# this is the function to split the factorial into several threads
def splitUp(n, t):
    # defining the remainder and the whole
    remainder, whole = (n-1) % t, (n-1) // t
    # deciding the tuple count
    tuple_count = whole if remainder == 0 else whole + 1
    # empty result list
    result = []
    # iterating
    beginning = 1
    end = (n-1) // t
    for i in range(1, tuple_count+1):
        if i == tuple_count:
            result.append((beginning, n-1))  # if we are at the end, just append all to end
        else:
            result.append((beginning, end*i))
            beginning = end*i + 1
    return result

if __name__ == "__main__":
    threads = 64
    number = 743
    splitted = splitUp(number, threads)
    worker = Worker()
    #print(worker.see_jobs())
    s = time.time()
    # creating the threads
    for arg in splitted:
        thread = threading.Thread(target=worker.multiplier(arg[0], arg[1]))
        worker.append_job(thread)
    worker.initiate()
    worker.work()
    e = time.time()
    print("result found with {} threads in {} secs\n".format(threads, e-s))
    if result % number == number-1:
        print("PRIME")
    else:
        print("NOT PRIME")

"""
-------------------- REPORT ------------------------
result found with 2 threads in 6.162530899047852 secs
result found with 4 threads in 0.29897499084472656 secs
result found with 16 threads in 0.009003162384033203 secs
result found with 32 threads in 0.0060007572174072266 secs
result found with 64 threads in 0.0029952526092529297 secs
note that: these results may differ from machine to machine
-------------------------------------------------------
"""
Thanks in advance.
First and foremost, you have a critical error in your code that you haven't reported or tried to trace:
======================== RESTART: ========================
result found with 2 threads in 5.800899267196655 secs
NOT PRIME
>>>
======================== RESTART: ========================
result found with 64 threads in 0.002002716064453125 secs
PRIME
>>>
As the old saying goes, "if the program doesn't work, it doesn't matter how fast it is".
The only test case you've given is 743; if you want help to diagnose the logic error, please determine the minimal test and parallelism that causes the error, and post a separate question.
I suspect that it's in your multiplier function, as you're working with an ill-advised global variable in parallel code, and your multiply operation is not thread-safe.
In assembly terms, you have an unguarded region:
LOAD result
MUL i
STORE result
If this is interleaved with the same work from another thread, the result is virtually guaranteed to be wrong. You have to make this a critical region, for example by guarding the update with a lock, as sketched below.
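A minimal sketch of one way to do that with a lock (the lock name is illustrative, and computing the partial product locally also keeps the critical region small):

import threading

result = 1
result_lock = threading.Lock()   # guards the shared result

def multiplier(beg, end):
    global result
    # build the partial product locally, without touching shared state
    partial = 1
    for i in range(beg, end + 1):
        partial *= i
    # only the read-modify-write of the shared value is the critical region
    with result_lock:
        result *= partial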
Once you fix that, you still have your speed problem. Factorial is the poster-child for recursion acceleration. I see two obvious accelerants:
Instead of that horridly slow multiplication loop, use functools.reduce to blast through your multiplication series (see the sketch after this list).
If you're going to loop the program with a series of inputs, then short-cut most of the calculations with memoization. The example on the linked page benefits greatly from multiple-recursion; since factorial is linear, you'd need repeated application to take advantage of the technique.
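As a rough illustration of the functools.reduce suggestion (the chunk boundaries below are arbitrary and only for demonstration):

from functools import reduce
from operator import mul

def partial_factorial(beg, end):
    # product beg * (beg+1) * ... * end in a single tight call
    return reduce(mul, range(beg, end + 1), 1)

# e.g. 742! computed as a product of partial products
chunks = [(1, 200), (201, 400), (401, 742)]
total = reduce(mul, (partial_factorial(b, e) for b, e in chunks), 1)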

How to correctly handle exceptions in multiprocessing

It is possible to retrieve the outputs of workers with Pool.map, but when one worker fails, an exception is raised and it's not possible to retrieve the outputs anymore. So, my idea was to log the outputs in a process-synchronized queue so as to retrieve the outputs of all successful workers.
The following snippet seems to work:
from multiprocessing import Pool, Manager
from functools import partial

def f(x, queue):
    if x == 4:
        raise Exception("Error")
    queue.put_nowait(x)

if __name__ == '__main__':
    queue = Manager().Queue()
    pool = Pool(2)
    try:
        pool.map(partial(f, queue=queue), range(6))
        pool.close()
        pool.join()
    except:
        print("An error occurred")

    while not queue.empty():
        print("Output => " + str(queue.get()))
But I was wondering whether a race condition could occur during the queue polling phase. I'm not sure whether the queue process will necessarily be alive when all workers have completed. Do you think my code is correct from that point of view?
As far as "how to correctly handle exceptions", which is your main question:
First, in your case, you will never get to execute pool.close and pool.join. But pool.map will not return until all the submitted tasks have returned their results or generated an exception, so you really don't need to call these to be sure that all of your submitted tasks have completed. If it weren't for worker function f writing the results to a queue, you would never be able to get any results back using map as long as any of your tasks resulted in an exception. You would instead have to submit individual tasks with apply_async and get an AsyncResult instance for each one.
So I would say that a better way of handling exceptions in your worker functions, without having to resort to using a queue, would be as follows. But note that when you use apply_async, tasks are submitted one at a time, which can result in many shared memory accesses. This becomes a performance issue only when the number of tasks being submitted is very large. In that case, it would be better for the worker functions to handle the exceptions themselves and pass back an error indication, which allows the use of map or imap with a chunksize (a sketch of that variant follows the first example below).
When using a queue, be aware that writing to a managed queue has a fair bit of overhead. The second piece of code shows how you can reduce that overhead a bit by using a multiprocessing.Queue instance, which, unlike the managed queue, does not use a proxy. Note the output order, which is not the order in which the tasks were submitted but rather the order in which the tasks completed; this is another potential downside (or upside) of using a queue (you can use a callback function with apply_async if you want the results in completion order). Even with your original code you should not depend on the order of results in the queue.
from multiprocessing import Pool

def f(x):
    if x == 4:
        raise Exception("Error")
    return x

if __name__ == '__main__':
    pool = Pool(2)
    results = [pool.apply_async(f, args=(x,)) for x in range(6)]
    for x, result in enumerate(results):  # result is an AsyncResult instance
        try:
            return_value = result.get()
        except:
            print(f'An error occurred for x = {x}')
        else:
            print(f'For x = {x} the return value is {return_value}')
Prints:
For x = 0 the return value is 0
For x = 1 the return value is 1
For x = 2 the return value is 2
For x = 3 the return value is 3
An error occurred for x = 4
For x = 5 the return value is 5
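For the map/imap variant mentioned above, where the worker functions catch their own exceptions and pass back an error indication, a minimal sketch (returning the exception object itself is just one possible convention) might look like this:

from multiprocessing import Pool

def f(x):
    try:
        if x == 4:
            raise Exception("Error")
        return x
    except Exception as e:
        # return the exception itself as the error indication
        return e

if __name__ == '__main__':
    with Pool(2) as pool:
        # chunksize batches several tasks per inter-process message
        for x, result in enumerate(pool.map(f, range(6), chunksize=2)):
            if isinstance(result, Exception):
                print(f'An error occurred for x = {x}: {result}')
            else:
                print(f'For x = {x} the return value is {result}')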
OP's Original Code Modified to Use multiprocessing.Queue
from multiprocessing import Pool, Queue

def init_pool(q):
    global queue
    queue = q

def f(x):
    if x == 4:
        raise Exception("Error")
    queue.put_nowait(x)

if __name__ == '__main__':
    queue = Queue()
    pool = Pool(2, initializer=init_pool, initargs=(queue,))
    try:
        pool.map(f, range(6))
    except:
        print("An error occurred")

    while not queue.empty():
        print("Output => " + str(queue.get()))
Prints:
An error occurred
Output => 0
Output => 2
Output => 3
Output => 1
Output => 5

Multiprocessing script gets stuck

I have the following Python code:
def workPackage(args):
    try:
        outputdata = dict()
        iterator = 1
        for name in outputnames:
            outputdata[name] = []
        for filename in filelist:
            read_data = np.genfromtxt(filename, comments="#", unpack=True, names=datacolnames, delimiter=";")
            mean_val1 = np.mean(read_data["val1"])
            mean_val2 = np.mean(read_data["val2"])
            outputdata[outputnames[0]].append(read_data["setpoint"][0])
            outputdata[outputnames[1]].append(mean_val1)
            outputdata[outputnames[2]].append(mean_val2)
            outputdata[outputnames[3]].append(mean_val1-mean_val2)
            outputdata[outputnames[4]].append((mean_val1-mean_val2)/read_data["setpoint"][0]*100)
            outputdata[outputnames[5]].append(2*np.std(read_data["val1"]))
            outputdata[outputnames[6]].append(2*np.std(read_data["val2"]))
            print("Process "+str(identifier+1)+": "+str(round(100*(iterator/len(filelist)),1))+"% complete")
            iterator = iterator+1
        queue.put(outputdata)
    except:
        pass  # some message

if __name__ == '__main__':
    "Main script"
This code is used to evaluate a large amount of measurement data: some 900 files across multiple directories (about 13 GB in total).
The main script determines all the filepaths and splits them into 4 chunks. Each chunk (a list of filepaths) is given to one process.
try:
    print("Distributing the workload on "+str(numberOfProcesses)+" processes...")
    for i in range(0, numberOfProcesses):
        q[i] = multiprocessing.Queue()
        Processes[i] = multiprocessing.Process(target=workPackage, args=(filelistChunks[i], colnames, outputdatanames, i, q[i]))
        Processes[i].start()
    for i in range(0, numberOfProcesses):
        Processes[i].join()
except:
    print("Exception while processing stuff...")
After that the results are read from the queues and stored to an output file.
Now here's my problem:
The script starts the 4 processes and each of them runs to 100% (see the print in the workPackage function). They don't finish at the same time but within about 2 minutes.
But then the script simply stops.
If I limit the amount of data to process by simply cutting the filelist, it sometimes runs to the end, but sometimes it doesn't.
I don't get why the script simply gets stuck after all processes reach 100%.
I seriously don't know what's happening there.
You add items to the queue with queue.put(), then call queue.join(), but I don't see where you call queue.get() or queue.task_done(). Join won't release the thread until the queue is empty and task_done() has been called on each item.
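A minimal sketch of that suggestion applied to the main script above (variable names are taken from the question; draining each queue with get() before joining is one common way to keep a child process from blocking on a queue whose buffer is still full):

# Read each worker's result before joining, so a child process is not left
# blocked on queue.put() with data still sitting in the pipe buffer.
results = []
for i in range(0, numberOfProcesses):
    results.append(q[i].get())      # blocks until process i has put its dict

for i in range(0, numberOfProcesses):
    Processes[i].join()             # safe to join once the queues are drained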

python parallel calculations in a while loop

I have been looking around for some time, but haven't had luck finding an example that could solve my problem. I have added an example from my code. As one can see, this is slow and the two functions could be run separately.
My aim is to print the latest parameter values every second while the slow processes are calculated in the background. The latest value is shown, and when any process is ready the value is updated.
Can anybody recommend a better way to do it? An example would be really helpful.
Thanks a lot.
import time

def ProcessA(parA):
    # imitate slow process
    time.sleep(5)
    parA += 2
    return parA

def ProcessB(parB):
    # imitate slow process
    time.sleep(10)
    parB += 5
    return parB

# start from here
i, parA, parB = 1, 0, 0

while True:  # endless loop
    print(i)
    print(parA)
    print(parB)
    time.sleep(1)
    i += 1
    # update parameter A
    parA = ProcessA(parA)
    # update parameter B
    parB = ProcessB(parB)
I imagine this should do it for you. It has the benefit that you can add extra parallel functions, up to a total equal to the number of cores you have. Edits are welcome.
#import time module
import time
#import the appropriate multiprocessing functions
from multiprocessing import Pool

#define your functions
#whatever your slow function is
def slowFunction(x):
    return someFunction(x)

#printingFunction
def printingFunction(new, current, timeDelay):
    while new == current:
        print(current)
        time.sleep(timeDelay)

#set the initial value that will be printed.
#Depending on your function this may take some time.
CurrentValue = slowFunction(someTemporallyDynamicVariable)

#establish your pool
pool = Pool()

while True:  #endless loop
    #an asynchronous call: this continues to run in the background
    #while your printing operates (apply_async takes the function
    #and its arguments separately and returns an AsyncResult)
    NewValue = pool.apply_async(slowFunction, (someTemporallyDynamicVariable,))
    pool.apply(printingFunction, (NewValue, CurrentValue, 1))
    CurrentValue = NewValue

#close your pool
pool.close()
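For the specific goal in the question (print the latest values once per second while the slow updates run in the background), here is a minimal sketch along the same lines, using the question's ProcessA/ProcessB as stand-ins. The key point is that apply_async returns an AsyncResult, so the printing loop can poll ready() and pick up the new value with get() once it is available:

import time
from multiprocessing import Pool

def ProcessA(parA):
    time.sleep(5)   # imitate slow work
    return parA + 2

def ProcessB(parB):
    time.sleep(10)  # imitate slow work
    return parB + 5

if __name__ == '__main__':
    parA, parB, i = 0, 0, 1
    with Pool(2) as pool:
        jobA = pool.apply_async(ProcessA, (parA,))
        jobB = pool.apply_async(ProcessB, (parB,))
        while True:
            if jobA.ready():                                # slow result A arrived
                parA = jobA.get()
                jobA = pool.apply_async(ProcessA, (parA,))  # start the next update
            if jobB.ready():                                # slow result B arrived
                parB = jobB.get()
                jobB = pool.apply_async(ProcessB, (parB,))
            print(i, parA, parB)                            # latest known values, once per second
            time.sleep(1)
            i += 1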
