I have a simple example script that uses multiprocessing to execute a very simple function and prints the runtime of each part of the process. The script is fully reproducible and looks like this:
import time
start_time = time.perf_counter()
import multiprocessing
print(f'Libraries loaded: {round(time.perf_counter()-start_time,2)} sec')
start_time = time.perf_counter()
def test():
    print('Sleeping 1 sec')
    time.sleep(1)
    print('Done Sleeping')
print(f'Functions loaded: {round(time.perf_counter()-start_time,2)} sec')
start_time = time.perf_counter()
if __name__ == '__main__':
    p1 = multiprocessing.Process(target=test)
    p2 = multiprocessing.Process(target=test)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    finish = time.perf_counter()
    print(f'Multiprocessing finished in {round(finish - start_time, 2)} sec')
The output of the script looks as such:
Libraries loaded: 0.03 sec
Functions loaded: 0.0 sec
Libraries loaded: 0.0 sec
Functions loaded: 0.0 sec
Sleeping 1 sec
Libraries loaded: 0.0 sec
Functions loaded: 0.0 sec
Sleeping 1 sec
Done SleepingDone Sleeping
Multiprocessing finished in 1.12 sec
Process finished with exit code 0
As you can see, while the processes run in parallel, each one runs the entire script again instead of just executing the target function test. The script is therefore being run twice completely unnecessarily and I don't understand why.
Could someone explain this to me please?
Thanks
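The guard is the right idea, but it only protects the code placed under it:
if __name__ == "__main__":
    <code that should run only in the main process>
With the spawn start method (the default on Windows and macOS), each new process starts by re-importing your script, so every top-level statement (the imports, the timing prints, the function definitions) runs again in each child. Only the block under the guard is skipped there, because __name__ is not "__main__" in the child. Create and start the processes inside the guard, and move anything else that should run only once, such as the timing prints, inside it as well.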
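As a minimal sketch of that layout (not the asker's exact script): only definitions stay at module level, and the timing happens inside the guard. The helper names timed and run_workers are hypothetical, and the sleep is shortened from 1 sec to keep the example quick.

```python
import multiprocessing
import time


def test():
    print('Sleeping')
    time.sleep(0.2)  # shortened from the 1 sec used in the question
    print('Done Sleeping')


def timed(func):
    """Call func() and return the elapsed wall-clock seconds (hypothetical helper)."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def run_workers():
    """Start two worker processes and wait for both (hypothetical helper)."""
    workers = [multiprocessing.Process(target=test) for _ in range(2)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


if __name__ == '__main__':
    # Only the parent process reaches this block, so the timing is printed once.
    print(f'Multiprocessing finished in {timed(run_workers):.2f} sec')
```

Because nothing at module level prints or measures anything, re-importing the file in a child has no visible side effects.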
Related
The following is a simplified program from my main project. I'm using a Semaphore to allow exactly two processes to execute the test function at a time. Unless I'm mistaken, the program should run for only 10 secs, but it takes 20 secs instead. How do I fix it to bring the running time down to 10 secs?
Note: tested with Sublime on Windows 10.
import time
from multiprocessing import Semaphore, Lock, Process
def test(sem):
    sem.acquire()
    time.sleep(5)
    sem.release()
if __name__ == '__main__':
    sem = Semaphore(2)
    processes = []
    for _ in range(4):
        processes.append(Process(target=test, args=(sem,)))
    start = time.perf_counter()
    for process in processes:
        process.start()
        process.join()
    end = time.perf_counter() - start
    print(f'program finished in {end} secs')
Output
program finished in 20.836512662 secs
[Finished in 21.1s]
for process in processes:
    process.start()
    process.join()
You are starting each process, then immediately waiting for it to finish.
That's 4 processes each doing a 5 sec wait. Hence the 20 secs. There's no actual parallelism in your code.
What you want is to start off the processes all at the same time. Then wait for each to finish:
for process in processes:
    process.start() # start each
for process in processes:
    process.join() # wait for all to finish
Which results in:
program finished in 10.129543458 secs
I am testing multiprocessing on jupyter notebook and spyder:
import multiprocessing
import time
start = time.perf_counter()
def do_something():
    print(f'Sleeping 5 second(s)...')
    time.sleep(5)
    print(f'Done Sleeping...')
p2 = multiprocessing.Process(target = do_something)
p3 = multiprocessing.Process(target = do_something)
p2.start()
p3.start()
p2.join()
p3.join()
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} secounds')
And I got:
Finished in 0.12 secounds
This is much shorter than 5 seconds.
I tested the do_something function on its own and it seems fine. It looks as if, in the code above, do_something was never executed at all...
start = time.perf_counter()
def do_something(seconds):
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    print(f'Done Sleeping...{seconds}')
do_something(5)
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} secounds')
Sleeping 5 second(s)...
Done Sleeping...5
Finished in 5.0 secounds
Your code should throw an error (I won't write the traceback to keep the answer short):
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.
    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:
        if __name__ == '__main__':
            freeze_support()
            ...
    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
Long story short: multiprocessing cannot safely start child processes from this code, because each child re-imports the file and reaches the same start() calls again. You should keep the definitions at the beginning of the file, and put the code you want to execute inside the
if __name__ == '__main__':
guard. Otherwise, each new process will try to execute the same file (and spawn other processes as well). The corrected code takes about 5.22 seconds to complete on my PC.
The need for the "if" is explained in the programming guidelines (section "Safe importing of main module") of the multiprocessing package. Be sure to read them to avoid unwanted behaviour: multithreading and multiprocessing are prone to elusive bugs when not used correctly.
Here is the corrected code:
import multiprocessing
import time
def do_something():
    print('Sleeping 5 seconds...')
    time.sleep(5)
    print('Done Sleeping.')
if __name__ == '__main__':
    start = time.perf_counter()
    p2 = multiprocessing.Process(target=do_something, args=())
    p3 = multiprocessing.Process(target=do_something, args=())
    p2.start()
    p3.start()
    p2.join()
    p3.join()
    finish = time.perf_counter()
    print(f'Finished in {round(finish-start, 2)} seconds')
Why do you see the output after 0.12 seconds? This happens because each child process throws its error and crashes (you should get two identical runtime errors), then the parent process is able to complete.
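One way to notice children that crashed silently is to look at each process's exitcode after join(): it is 0 on a clean exit and non-zero when the child died from an uncaught exception. This is only a sketch; the helper failed is hypothetical and the sleep is shortened.

```python
import multiprocessing
import time


def do_something():
    time.sleep(0.2)  # shortened from the 5 seconds used above


def failed(processes):
    """Return the names of finished processes whose exit code is non-zero (hypothetical helper)."""
    # exitcode is None while a process has not finished (or was never started).
    return [p.name for p in processes if p.exitcode not in (0, None)]


if __name__ == '__main__':
    workers = [multiprocessing.Process(target=do_something) for _ in range(2)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    crashed = failed(workers)
    if crashed:
        print(f'Workers that did not exit cleanly: {crashed}')
    else:
        print('All workers exited cleanly')
```

A suspiciously short total runtime together with non-zero exit codes is a strong hint that the workers never ran the target function.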
I am trying to run a simple multiprocessing task as shown below:
def main():
    def do_something():
        print('sleeping 1 second')
        time.sleep(1)
        print('Done sleeping')
    p1 = multiprocessing.Process(target=do_something())
    p2 = multiprocessing.Process(target=do_something())
    p1.start()
    p2.start()
if __name__ == '__main__':
    main()
Here is the output:
sleeping 1 second
Done sleeping
sleeping 1 second
Done sleeping
Process finished with exit code 0
But I was expecting to output:
sleeping 1 second
sleeping 1 second
Done sleeping
Done sleeping
Process finished with exit code 0
I am on a Windows machine using VS Code. It seems that multiprocessing isn't doing its job. Do I have to enable multiprocessing support, or is it something else?
Please help. Thanks!
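The output is sequential because of target=do_something(): the parentheses call the function right away in the main process, and its return value (None) is what gets passed as the target. Pass the function itself, target=do_something, and define it at module level rather than inside main(), since on Windows a nested function cannot be pickled and sent to a child process. Note that a process is not a thread, and both are different from a pool. It is also good practice to wait until all workers are done with .join() and make sure that they exit correctly. The same pattern with threads: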
import threading as th
import time
def do_something():
    print('sleeping 1 second')
    time.sleep(1)
    print('Done sleeping')
def main():
    th_1 = th.Thread(target=do_something)
    th_2 = th.Thread(target=do_something)
    th_1.start()
    th_2.start()
    th_1.join()
    th_2.join()
if __name__ == '__main__':
    main()
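For comparison, here is a sketch of the question's multiprocessing version with the target passed as a function object (target=do_something, without parentheses) and the function defined at module level so it can be sent to the child processes. The helper make_workers is hypothetical.

```python
import multiprocessing
import time


def do_something():
    print('sleeping 1 second')
    time.sleep(1)
    print('Done sleeping')


def make_workers(count=2):
    """Build unstarted worker processes (hypothetical helper)."""
    # target=do_something passes the function itself;
    # target=do_something() would run it here and pass its return value, None.
    return [multiprocessing.Process(target=do_something) for _ in range(count)]


def main():
    workers = make_workers()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


if __name__ == '__main__':
    main()
```

With both processes started before either is joined, the two "sleeping" lines should appear before the two "Done sleeping" lines, although the exact interleaving is up to the operating system.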
Hey everyone, I have a script that runs tasks in parallel. I was using APScheduler to schedule the tasks, but it runs them synchronously (BlockingScheduler, BackgroundScheduler) and does not work with parallel processes. What would you advise? How can I run the parallel processes every second? I'm also using multiprocessing for the parallel part.
EDIT: I have just solved it. If anyone runs into the same issue, here is the example:
from multiprocessing import Process
# Assumes APScheduler 3.x, where BlockingScheduler lives in schedulers.blocking
from apscheduler.schedulers.blocking import BlockingScheduler
def work_log_cpu1():
    print(" Proces work_log_cpu1")
    list11=[]
    for i in range(10000000):
        list11.append(i*2)
    print("Proces work_log_cpu1 finished")
def work_log_cpu2():
    print("Proces work_log_cpu2")
    list12=[]
    for i in range(10000000):
        list12.append(i*2)
    print("Proces work_log_cpu2 finished")
def work_log_cpu3():
    print(" Proces work_log_cpu3")
    list13=[]
    for i in range(10000000):
        list13.append(i*2)
    print("Proces work_log_cpu3 finished")
def main():
    # sleeps=[3,5,2,7]
    process=Process(target=work_log_cpu1)
    process2=Process(target=work_log_cpu2)
    process3=Process(target=work_log_cpu3)
    process.start()
    process2.start()
    process3.start()
    process.join()
    process2.join()
    process3.join()
if __name__ == '__main__':
    # main()
    # The scheduler instance was missing from the original snippet
    sched = BlockingScheduler()
    sched.add_job(main, 'interval', seconds=1,id='first_job',max_instances=1)
    sched.start()
What's wrong with multiprocessing?
import multiprocessing
p1 = multiprocessing.Process(target=func1, args=("var1", "var2",))
p2 = multiprocessing.Process(target=func2, args=("var3", "var4",))
p1.start()
p2.start()
p1.join()
p2.join()
I am learning the multiprocessing module of Python. I am on Python 3.8. This is my sample code:
# import stuff
def add(x, y):
    time.sleep(10)
    print(f'{x + y} \n')
def main():
    start = time.perf_counter()
    if __name__ == '__main__':
        p1 = mp.Process(target=add, args=(100, 200))
        p2 = mp.Process(target=add, args=(200, 300))
        p1.start(); p2.start()
        p1.join(); p2.join()
    end = time.perf_counter()
    print(f'{end - start} seconds \n')
main()
I am expecting outputs such as:
300
500
10.something seconds
But when I run it I am getting:
5.999999999062311e-07 seconds
5.00000000069889e-07 seconds
500
300
10.704853300000002 seconds
For some reason the end = time.perf_counter(); print(f'{end - start} seconds \n') part is getting executed once after each process is started and one more time after they both end. But here I am specifically writing p1.join(); p2.join() to tell the computer to wait until these processes are finished and then move on to the following line of code.
Why is it behaving like this? And what can I do to fix it?
This is happening because you are running on Windows, which does not support fork. On Linux, I see the output you expect. Because Windows can't fork, it has to re-import your entire module in each child process in order to run your worker function. The code that calculates and prints the runtime is not protected by the if __name__ == "__main__": guard, so it is executed in both of your worker processes when they are launched, in addition to running in your main process once the workers finish. Move it (and any other code you only want to run in the parent process) into the guard to get the output you want:
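# import stuff
def add(x, y):
    time.sleep(10)
    print(f'{x + y} \n')
def main():
    # The timing now lives in main(), which only the parent process calls
    start = time.perf_counter()
    p1 = mp.Process(target=add, args=(100, 200))
    p2 = mp.Process(target=add, args=(200, 300))
    p1.start(); p2.start()
    p1.join(); p2.join()
    end = time.perf_counter()
    print(f'{end - start} seconds \n')
if __name__ == '__main__':
    main()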
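If you want to check this behaviour on a machine where fork is available, one option is to request the spawn start method explicitly, which is what Windows always uses. The sketch below assumes the placeholder imports are `import multiprocessing as mp` and `import time`. The function name run is hypothetical and the sleep is shortened from 10 seconds.

```python
import multiprocessing as mp
import time


def add(x, y):
    time.sleep(0.2)  # shortened from the 10 seconds used above
    print(f'{x + y}')


def run(start_method):
    """Run both workers with the given start method and print the elapsed time."""
    ctx = mp.get_context(start_method)  # a context bound to one start method
    start = time.perf_counter()
    p1 = ctx.Process(target=add, args=(100, 200))
    p2 = ctx.Process(target=add, args=(200, 300))
    p1.start(); p2.start()
    p1.join(); p2.join()
    end = time.perf_counter()
    print(f'{start_method}: {end - start:.2f} seconds')


if __name__ == '__main__':
    # 'spawn' re-imports this module in each child, as on Windows.
    run('spawn')
```

Since everything that prints or measures sits inside run(), which is only called under the guard, the timing line should appear once regardless of the start method.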