Python Multiprocessing module unexpected outputs. What can be the cause of it? - python

I am learning the multiprocessing module of Python. I am on Python 3.8. This is my sample code:
# import stuff
def add(x, y):
    time.sleep(10)
    print(f'{x + y} \n')
def main():
    start = time.perf_counter()
    if __name__ == '__main__':
        p1 = mp.Process(target=add, args=(100, 200))
        p2 = mp.Process(target=add, args=(200, 300))
        p1.start(); p2.start()
        p1.join(); p2.join()
    end = time.perf_counter()
    print(f'{end - start} seconds \n')
main()
I am expecting outputs such as:
300
500
10.something seconds
But when I run it I am getting:
5.999999999062311e-07 seconds
5.00000000069889e-07 seconds
500
300
10.704853300000002 seconds
For some reason the end = time.perf_counter(); print(f'{end - start} seconds \n') part is getting executed once after each process is started and one more time after they both end. But here I am specifically writing p1.join(); p2.join() to tell the computer to wait until these processes are finished and then move on to the following line of code.
Why is it behaving like this? And what can I do to fix it?

This is happening because you are running on Windows, which does not support fork. On Linux, I see the output you expect. Because Windows can't fork, multiprocessing has to re-import your entire module in each child process in order to run your worker function. The code that calculates and prints the runtime is not protected by the if __name__ == "__main__": guard, so it is executed in both of your worker processes when they are launched, in addition to running in your main process once the workers finish. Move it (and any other code you only want to run in the parent process) into the guard to get the output you want:
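import multiprocessing as mp
import time
def add(x, y):
    time.sleep(10)
    print(f'{x + y} \n')
def main():
    # Timing now lives in main(), which only the parent process calls.
    start = time.perf_counter()
    p1 = mp.Process(target=add, args=(100, 200))
    p2 = mp.Process(target=add, args=(200, 300))
    p1.start(); p2.start()
    p1.join(); p2.join()
    end = time.perf_counter()
    print(f'{end - start} seconds \n')
if __name__ == '__main__':
    main()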
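If you are unsure which start method your interpreter uses, a minimal sketch like the following can tell you. It relies only on the standard multiprocessing functions get_start_method() and get_all_start_methods(); the helper name describe_start_method is an illustrative choice and not something from the question.

```python
import multiprocessing as mp
import sys


def describe_start_method():
    """Return the platform name, the default start method, and all supported ones."""
    return sys.platform, mp.get_start_method(), mp.get_all_start_methods()


if __name__ == '__main__':
    platform, default, supported = describe_start_method()
    print(f'platform={platform} default={default} supported={supported}')
```

On Windows this reports spawn. On Linux the default depends on the Python version, so it is worth checking rather than assuming.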

Related

Why multiprocessing.Process does not work here?

I am testing multiprocessing on jupyter notebook and spyder:
import multiprocessing
import time
start = time.perf_counter()
def do_something():
    print(f'Sleeping 5 second(s)...')
    time.sleep(5)
    print(f'Done Sleeping...')
p2 = multiprocessing.Process(target = do_something)
p3 = multiprocessing.Process(target = do_something)
p2.start()
p3.start()
p2.join()
p3.join()
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} secounds')
And I got:
Finished in 0.12 secounds
This is much shorter than 5 seconds.
I did test the do_something function on its own and it seems fine. It looks as if, in the code above, the do_something function was not even executed...
start = time.perf_counter()
def do_something(seconds):
    print(f'Sleeping {seconds} second(s)...')
    time.sleep(seconds)
    print(f'Done Sleeping...{seconds}')
do_something(5)
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} secounds')
Sleeping 5 second(s)...
Done Sleeping...5
Finished in 5.0 secounds
Your code should throw an error (I won't write the traceback to keep the answer short):
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.
    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:
        if __name__ == '__main__':
            freeze_support()
            ...
    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
Long story short: the child processes cannot safely re-import your script as it is written. Keep the definitions at the beginning of the file, and put the code you want to execute inside the
if __name__ == '__main__':
guard. Otherwise, each new process will try to execute the same file (and spawn other processes as well). The corrected code takes about 5.22 seconds to complete on my PC.
The need for the "if" is explained in the programming guidelines (section "Safe importing of main module") of the multiprocessing package. Be sure to read them to avoid unwanted behaviour: multithreading and multiprocessing are prone to elusive bugs when not used correctly.
Here is the corrected code:
import multiprocessing
import time
def do_something():
    print('Sleeping 5 seconds...')
    time.sleep(5)
    print('Done Sleeping.')
if __name__ == '__main__':
    start = time.perf_counter()
    p2 = multiprocessing.Process(target=do_something, args=())
    p3 = multiprocessing.Process(target=do_something, args=())
    p2.start()
    p3.start()
    p2.join()
    p3.join()
    finish = time.perf_counter()
    print(f'Finished in {round(finish-start, 2)} seconds')
Why do you see the output after 0.12 seconds? This happens because each child process throws its error and crashes (you should get two identical runtime errors), then the parent process is able to complete.
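One way to notice that a child crashed instead of doing its work is to look at Process.exitcode after join(). The sketch below is only an illustration: describe_exit is a hypothetical helper, and the sleep is shortened to keep the demo quick. It relies on the documented meaning of exitcode (None while running, 0 on success, a negative value -N when the child was terminated by signal N).

```python
import multiprocessing
import time


def do_something():
    time.sleep(0.1)


def describe_exit(exitcode):
    """Translate Process.exitcode into a short human-readable message."""
    if exitcode is None:
        return 'still running'
    if exitcode == 0:
        return 'finished normally'
    if exitcode < 0:
        return f'terminated by signal {-exitcode}'
    return f'failed with exit code {exitcode}'


if __name__ == '__main__':
    p = multiprocessing.Process(target=do_something)
    p.start()
    p.join()
    # A non-zero exit code here means the child did not run to completion.
    print(describe_exit(p.exitcode))
```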

Starting multiple processes inside a function makes it loop for each process started

I'm writing a program and made a "pseudo" program that imitates what the main one does. The main idea is that the program starts and scans a game. The first part detects whether the game has started, then it opens 2 processes: one that scans the game all the time and sends info to the second process, which analyzes the data and plots it. In short, it's 2 infinite loops running simultaneously.
I'm trying to put it all into functions now so I can run it through tkinter and make a GUI for it.
The issue is that every time a process starts, it loops back to the start of the parent function, executes it again, and then goes on to start the second process. What is the issue here? In this test model, one process sends the value of x to the second process, which prints it out.
import multiprocessing
import time
from multiprocessing import Pipe
def function_start():
    print("GAME DETECTED AND STARTED")
    parent_conn, child_conn = Pipe()
    p1 = multiprocessing.Process(target=function_first_process_loop, args=(child_conn,))
    p2 = multiprocessing.Process(target=function_second_process_loop, args=(parent_conn,))
    function_load(p1)
    function_load(p2)
def function_load(process):
    if __name__ == '__main__':
        print("slept 1")
        process.start()
def function_first_process_loop(conn):
    x=0
    print("FIRST PROCESS STARTED")
    while True:
        time.sleep(1)
        x += 1
        conn.send(x)
        print(x)
def function_second_process_loop(conn):
    print("SECOND PROCESS STARTED")
    while True:
        data = conn.recv()
        print(data)
function_start()
I've also tried rearranging the functions in a few different ways. This is one of them:
import multiprocessing
import time
from multiprocessing import Pipe
def function_load():
    if __name__ == '__main__':
        parent_conn, child_conn = Pipe()
        p1 = multiprocessing.Process(target=function_first_process_loop, args=(child_conn,))
        p2 = multiprocessing.Process(target=function_second_process_loop, args=(parent_conn,))
        p1.start()
        p2.start()
#FIRST
def function_start():
    print("GAME LOADED AND STARTED")
    function_load()
def function_first_process_loop(conn):
    x=0
    print("FIRST PROCESS STARTED")
    while True:
        time.sleep(1)
        x += 1
        conn.send(x)
        print(x)
def function_second_process_loop(conn):
    print("SECOND PROCESS STARTED")
    while True:
        data = conn.recv()
        print(data)
#
function_start()
You should always tag a multiprocessing question with the platform you are running on, but I will infer that it is probably Windows or some other platform that uses the spawn method to launch new processes. That means that when a new process is created, a new Python interpreter is launched and the program source is processed from the top. Any code at global scope that is not protected by the check if __name__ == '__main__': will be executed, which means that each started process will execute the statement function_start().
So, as @PranavHosangadi rightly pointed out, you need the __name__ check in the correct place.
import multiprocessing
from multiprocessing import Pipe
import time
def function_start():
    print("GAME DETECTED AND STARTED")
    parent_conn, child_conn = Pipe()
    p1 = multiprocessing.Process(target=function_first_process_loop, args=(child_conn,))
    p2 = multiprocessing.Process(target=function_second_process_loop, args=(parent_conn,))
    function_load(p1)
    function_load(p2)
def function_load(process):
    print("slept 1")
    process.start()
def function_first_process_loop(conn):
    x=0
    print("FIRST PROCESS STARTED")
    while True:
        time.sleep(1)
        x += 1
        conn.send(x)
        print(x)
def function_second_process_loop(conn):
    print("SECOND PROCESS STARTED")
    while True:
        data = conn.recv()
        print(data)
if __name__ == '__main__':
    function_start()
Let's do an experiment: Before function_start(), add this line:
print(__name__, "calling function_start()")
Now, you get the following output:
__main__ calling function_start()
GAME DETECTED AND STARTED
slept 1
slept 1
__mp_main__ calling function_start()
GAME DETECTED AND STARTED
__mp_main__ calling function_start()
GAME DETECTED AND STARTED
FIRST PROCESS STARTED
SECOND PROCESS STARTED
1
1
2
2
...
Clearly, function_start() is called by each child process when you start it. This is because Python loads the entire script in the child and then calls the target function from that script. In the new processes the module's __name__ is __mp_main__ rather than __main__, and you can make use of that to prevent these processes from calling function_start().
So instead of function_start(), call it this way:
if __name__ == "__main__":
    print(__name__, "calling function_start()")
    function_start()
and now you get what you wanted:
__main__ calling function_start()
GAME DETECTED AND STARTED
slept 1
slept 1
FIRST PROCESS STARTED
SECOND PROCESS STARTED
1
1
2
2
...

Multiprocessing in python doesnt print any statements

Multithreading prints the output, but multiprocessing does not. I searched Stack Overflow, and the answered questions didn't solve the problem.
Multiprocessing is not working.
from threading import Thread
import datetime
from multiprocessing import Process
import sys
import time
def func1():
    print('Working')
    time.sleep(5)
    global a
    a=10
    print(datetime.datetime.now())
def func2():
    print("Working")
    time.sleep(10)
    print(datetime.datetime.now())
p1 = Process(target=func1)
p1.start()
p2 = Process(target=func2)
p2.start()
p1.join()
p2.join()
print(a)
Even the print(a) is not printing the value. It says
NameError: name 'a' is not defined
As I commented, plain variables, be they global or not, won't magically travel between multiprocessing Processes. (Well, actually, that's a bit of a simplification and depends on the OS and multiprocessing spawner you're using, but I digress.)
The simplest communication channel is a multiprocessing.Queue (that actually "magically" works between processes).
As discussed in further comments,
- you can't use multiprocessing in an IDE that doesn't save your script before executing it, since it requires being able to spawn a copy of the script, and if there's no script on disk, there's nothing to spawn.
- on a similar note, you can't use multiprocessing very well from Jupyter notebooks, since they're not run as regular Python scripts, but via the Python kernel process Jupyter starts.
Here's a simple adaptation of your code to actually pass data between the processes.
Remember to guard your multiprocessing main() with if __name__ == "__main__".
import datetime
import time
import multiprocessing
def func1(q: multiprocessing.Queue):
    print("func1 thinking...")
    time.sleep(2)
    q.put(("func1", 10))
    print("func1 quit at", datetime.datetime.now())
def func2(q: multiprocessing.Queue):
    for x in range(10):
        print("func2 working", x)
        q.put(("func2", x))
        time.sleep(0.3)
def main():
    queue = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=func1, args=(queue,))
    p2 = multiprocessing.Process(target=func2, args=(queue,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("Subprocesses ended, reading their results...")
    while not queue.empty():
        print(queue.get())
if __name__ == "__main__":
    main()
The output is:
func1 thinking...
func2 working 0
func2 working 1
func2 working 2
func2 working 3
func2 working 4
func2 working 5
func2 working 6
func1 quit at 2021-06-16 17:58:46.542275
func2 working 7
func2 working 8
func2 working 9
2021-06-16 17:58:47.577008
Subprocesses ended, reading their results...
('func2', 0)
('func2', 1)
('func2', 2)
('func2', 3)
('func2', 4)
('func2', 5)
('func2', 6)
('func1', 10)
('func2', 7)
('func2', 8)
('func2', 9)
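One caveat with the pattern above: the multiprocessing documentation warns that a child which has put items on a Queue may not exit until those items have been flushed, so joining before reading can block when a lot of data is queued, and empty() is not reliable across processes. A hedged alternative is to read until each worker has sent a sentinel, and only then join. The names SENTINEL, producer and drain below are illustrative and not part of the original answer.

```python
import multiprocessing

SENTINEL = None  # hypothetical "I am done" marker; any picklable value works


def producer(q, name, count):
    """Put `count` items on the queue, then signal completion."""
    for i in range(count):
        q.put((name, i))
    q.put(SENTINEL)


def drain(q, n_producers):
    """Read items until every producer has sent its sentinel."""
    results = []
    finished = 0
    while finished < n_producers:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            finished += 1
        else:
            results.append(item)
    return results


def main():
    q = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=producer, args=(q, f'worker{i}', 3))
        for i in range(2)
    ]
    for w in workers:
        w.start()
    results = drain(q, len(workers))  # read first...
    for w in workers:
        w.join()                      # ...then join
    print(sorted(results))


if __name__ == '__main__':
    main()
```

The order in which items from different workers arrive is not deterministic, which is why the demo sorts the results before printing.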

Python How to schedule the parallel scripts for every seconds ? like (cron job)

Hey everyone, I have a script that runs in parallel. I was using APScheduler to schedule the tasks, but it runs them synchronously: BlockingScheduler and BackgroundScheduler don't work with my parallel processes. What would you advise? How can I run the parallel processes every second? I'm using multiprocessing for the parallelism.
EDIT: I have just solved it. If anyone runs into the same issue, here is an example:
from multiprocessing import Process
from apscheduler.schedulers.blocking import BlockingScheduler
def work_log_cpu1():
    print(" Proces work_log_cpu1")
    list11=[]
    for i in range(10000000):
        list11.append(i*2)
    print("Proces work_log_cpu1 finished")
def work_log_cpu2():
    print("Proces work_log_cpu2")
    list12=[]
    for i in range(10000000):
        list12.append(i*2)
    print("Proces work_log_cpu2 finished")
def work_log_cpu3():
    print(" Proces work_log_cpu3")
    list13=[]
    for i in range(10000000):
        list13.append(i*2)
    print("Proces work_log_cpu3 finished")
def main():
    # sleeps=[3,5,2,7]
    process=Process(target=work_log_cpu1)
    process2=Process(target=work_log_cpu2)
    process3=Process(target=work_log_cpu3)
    process.start()
    process2.start()
    process3.start()
    process.join()
    process2.join()
    process3.join()
if __name__ == '__main__':
    # main()
    sched = BlockingScheduler()  # the original snippet used sched without creating it
    sched.add_job(main, 'interval', seconds=1,id='first_job',max_instances=1)
    sched.start()
What's wrong with multiprocessing?
import multiprocessing
p1 = multiprocessing.Process(target=func1, args=("var1", "var2",))
p2 = multiprocessing.Process(target=func2, args=("var3", "var4",))
p1.start()
p2.start()
p2.join()

How to wait for all multiprocessing.Processes to complete before continuing?

I am learning about Python multiprocessing and trying to understand how I can make my code wait for all processes to finish and then continue with the rest of the code. I thought the join() method should do the job, but the output of my code is not what I expected from using it.
Here is the code:
from multiprocessing import Process
import time
def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')
def fun2():
    print('starting fun2')
    time.sleep(5)
    print('finishing fun2')
def fun3():
    print('starting fun3')
    print('finishing fun3')
if __name__ == '__main__':
    processes = []
    print('starting main')
    for i in [fun, fun2, fun3]:
        p = Process(target=i)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print('finishing main')
g=0
print("g",g)
I expected all processes under if __name__ == '__main__': to finish before the lines g=0 and print(g) are called, so something like this was expected:
starting main
starting fun2
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
But the actual output indicates that there's something I don't understand about join() (or multiprocessing in general):
starting main
g 0
g 0
starting fun2
g 0
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
The question is: How do I write the code that finishes all processes first and then continues with the code without multiprocessing, so that I get the former output? I run the code from command prompt on Windows, in case it matters.
On waiting for the processes to finish:
You can just call join() on each Process in your list, something like
import multiprocessing
import time
def func1():
    time.sleep(1)
    print('func1')
def func2():
    time.sleep(2)
    print('func2')
def func3():
    time.sleep(3)
    print('func3')
def main():
    processes = [
        multiprocessing.Process(target=func1),
        multiprocessing.Process(target=func2),
        multiprocessing.Process(target=func3),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
if __name__ == '__main__':
    main()
But if you're thinking about giving your process more complexity, try using a Pool:
import multiprocessing
import time
def func1():
    time.sleep(1)
    print('func1')
def func2():
    time.sleep(2)
    print('func2')
def func3():
    time.sleep(3)
    print('func3')
def main():
    result = []
    with multiprocessing.Pool() as pool:
        result.append(pool.apply_async(func1))
        result.append(pool.apply_async(func2))
        result.append(pool.apply_async(func3))
        for r in result:
            r.wait()
if __name__ == '__main__':
    main()
More info on Pool
On why g0 prints multiple times:
This is happening because you're using spawn or forkserver to start your processes (spawn is the only start method available on Windows), and the g = 0 assignment and the print call are outside a function or the __main__ if block.
From the docs:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process).
(...)
This allows the newly spawned Python interpreter to safely import the module and then run the module’s foo() function.
Similar restrictions apply if a pool or manager is created in the main module.
In other words, each child process runs the module-level code again, because it imports your .py file as a module.
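Applied to the code from the question, a minimal sketch of the fix is to move everything that should run once, after the children finish, inside the guard. The helper run_all is a hypothetical name introduced here for illustration, and the sleeps are shortened to keep the demo quick.

```python
from multiprocessing import Process
import time


def fun():
    print('starting fun')
    time.sleep(0.2)
    print('finishing fun')


def fun2():
    print('starting fun2')
    time.sleep(0.5)
    print('finishing fun2')


def run_all(targets):
    """Start one process per target, wait for all of them, return exit codes."""
    processes = [Process(target=t) for t in targets]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return [p.exitcode for p in processes]


if __name__ == '__main__':
    print('starting main')
    run_all([fun, fun2])
    print('finishing main')
    # Runs only in the parent, and only after every child has been joined.
    g = 0
    print('g', g)
```

With this layout, a spawned child that re-imports the file only defines the functions; nothing at module level prints or starts further processes.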
