I am learning about the Python multiprocessing library and noticed this curious (to me) behavior. I am using Windows with Python 2.7 in Atom, with the Atom Runner script-execution package. Given this code:
import multiprocessing
import time

def f(name):
    time.sleep(1)
    print 'count: ', name

if __name__ == '__main__':
    cnt = 0
    print 'Sleep'
    time.sleep(1)
    while 1:
        p1 = multiprocessing.Process(target=f, args=(cnt,))
        p1.start()
        cnt += 1
        if cnt == 2:
            print 'break'
            break
    p1.join()
The output looks like this:
count: 0
count: 1
Sleep
break
The print commands seem to get locked out until the multiprocessing completes, even though they occur earlier in the code. Why is this? Running it from a command window produces the expected output (counts after break).
Kurt identified the problem. Calling sys.stdout.flush() before the .join() statement corrected the behavior. Output is now:
Pool
Sleep
break
count: 1
count: 0
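For completeness, a sketch of the fix: the only changes to the script above are an import sys and a sys.stdout.flush() right before the join, so the parent's buffered output is pushed out before it blocks waiting on the child.
import multiprocessing
import sys
import time

def f(name):
    time.sleep(1)
    print 'count: ', name

if __name__ == '__main__':
    cnt = 0
    print 'Sleep'
    time.sleep(1)
    while 1:
        p1 = multiprocessing.Process(target=f, args=(cnt,))
        p1.start()
        cnt += 1
        if cnt == 2:
            print 'break'
            break
    sys.stdout.flush()  # push the parent's buffered 'Sleep'/'break' output out before blocking
    p1.join()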
I'm writing a program and made a "pseudo" program that imitates what the main one does. The main idea is that the program starts and scans a game. The first part detects whether the game has started, then it opens 2 processes: one that scans the game all the time and sends info to the second process, which analyzes the data and plots it. In short, it's 2 infinite loops running simultaneously.
I'm trying to put it all into functions now so I can run it through tkinter and make a GUI for it.
The issue is, every time a process starts, it loops back to the start of the parent function, executes it again, and then goes on to start the second process. What is the issue here? In this test model, one process sends the value of x to the second process, which prints it out.
import multiprocessing
import time
from multiprocessing import Pipe

def function_start():
    print("GAME DETECTED AND STARTED")
    parent_conn, child_conn = Pipe()
    p1 = multiprocessing.Process(target=function_first_process_loop, args=(child_conn,))
    p2 = multiprocessing.Process(target=function_second_process_loop, args=(parent_conn,))
    function_load(p1)
    function_load(p2)

def function_load(process):
    if __name__ == '__main__':
        print("slept 1")
        process.start()

def function_first_process_loop(conn):
    x = 0
    print("FIRST PROCESS STARTED")
    while True:
        time.sleep(1)
        x += 1
        conn.send(x)
        print(x)

def function_second_process_loop(conn):
    print("SECOND PROCESS STARTED")
    while True:
        data = conn.recv()
        print(data)

function_start()
I've also tried rearranging the functions in a few different ways. This is one of them:
import multiprocessing
import time
from multiprocessing import Pipe

def function_load():
    if __name__ == '__main__':
        parent_conn, child_conn = Pipe()
        p1 = multiprocessing.Process(target=function_first_process_loop, args=(child_conn,))
        p2 = multiprocessing.Process(target=function_second_process_loop, args=(parent_conn,))
        p1.start()
        p2.start()

# FIRST
def function_start():
    print("GAME LOADED AND STARTED")
    function_load()

def function_first_process_loop(conn):
    x = 0
    print("FIRST PROCESS STARTED")
    while True:
        time.sleep(1)
        x += 1
        conn.send(x)
        print(x)

def function_second_process_loop(conn):
    print("SECOND PROCESS STARTED")
    while True:
        data = conn.recv()
        print(data)

#
function_start()
You should always tag a multiprocessing question with the platform you are running under, but I will infer that it is probably Windows or some other platform that uses the spawn method to launch new processes. That means that when a new process is created, a new Python interpreter is launched and the program source is processed from the top; any code at global scope that is not protected by the if __name__ == '__main__': check will be executed, which means that each started process will execute the statement function_start().
So, as @PranavHosangadi rightly pointed out, you need the __name__ check in the correct place:
import multiprocessing
from multiprocessing import Pipe
import time

def function_start():
    print("GAME DETECTED AND STARTED")
    parent_conn, child_conn = Pipe()
    p1 = multiprocessing.Process(target=function_first_process_loop, args=(child_conn,))
    p2 = multiprocessing.Process(target=function_second_process_loop, args=(parent_conn,))
    function_load(p1)
    function_load(p2)

def function_load(process):
    print("slept 1")
    process.start()

def function_first_process_loop(conn):
    x = 0
    print("FIRST PROCESS STARTED")
    while True:
        time.sleep(1)
        x += 1
        conn.send(x)
        print(x)

def function_second_process_loop(conn):
    print("SECOND PROCESS STARTED")
    while True:
        data = conn.recv()
        print(data)

if __name__ == '__main__':
    function_start()
Let's do an experiment: Before function_start(), add this line:
print(__name__, "calling function_start()")
Now, you get the following output:
__main__ calling function_start()
GAME DETECTED AND STARTED
slept 1
slept 1
__mp_main__ calling function_start()
GAME DETECTED AND STARTED
__mp_main__ calling function_start()
GAME DETECTED AND STARTED
FIRST PROCESS STARTED
SECOND PROCESS STARTED
1
1
2
2
...
Clearly, function_start() is called by the child process every time you start it. This is because Python loads the entire script and then calls the function you want from that script. The new processes have the name __mp_main__ to differentiate them from the main process, and you can make use of that to prevent the call to function_start() in these processes.
So instead of function_start(), call it this way:
if __name__ == "__main__":
    print(__name__, "calling function_start()")
    function_start()
and now you get what you wanted:
__main__ calling function_start()
GAME DETECTED AND STARTED
slept 1
slept 1
FIRST PROCESS STARTED
SECOND PROCESS STARTED
1
1
2
2
...
I am trying to run a simple multiprocessing task as shown below:
import multiprocessing
import time

def main():
    def do_something():
        print('sleeping 1 second')
        time.sleep(1)
        print('Done sleeping')

    p1 = multiprocessing.Process(target=do_something())
    p2 = multiprocessing.Process(target=do_something())
    p1.start()
    p2.start()

if __name__ == '__main__':
    main()
Here is the output:
sleeping 1 second
Done sleeping
sleeping 1 second
Done sleeping
Process finished with exit code 0
But I was expecting to output:
sleeping 1 second
sleeping 1 second
Done sleeping
Done sleeping
Process finished with exit code 0
I am using a Windows machine with VS Code. It seems that multiprocessing isn't doing its job here; do I have to enable multiprocessing support, or is it something else?
Please help. Thanks!
A process is not a thread; it is doing a different task, and the process blocks until done. Note that it is different from a thread pool too. Also, it is good practice to wait until all threads are done with .join() and make sure that they exit correctly:
import threading as th
import time

def do_something():
    print('sleeping 1 second')
    time.sleep(1)
    print('Done sleeping')

def main():
    th_1 = th.Thread(target=do_something)
    th_2 = th.Thread(target=do_something)
    th_1.start()
    th_2.start()
    th_1.join()
    th_2.join()

if __name__ == '__main__':
    main()
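If you'd rather stay with multiprocessing than switch to threads, the same pattern works with multiprocessing.Process; a sketch of the question's code with target given the function object do_something itself (no parentheses), so the work runs in the children rather than in the parent at Process-construction time:
import multiprocessing
import time

def do_something():
    print('sleeping 1 second')
    time.sleep(1)
    print('Done sleeping')

def main():
    # pass the function itself; writing do_something() would call it here in the parent
    p1 = multiprocessing.Process(target=do_something)
    p2 = multiprocessing.Process(target=do_something)
    p1.start()
    p2.start()
    p1.join()
    p2.join()

if __name__ == '__main__':
    main()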
I am learning about Python multiprocessing and trying to understand how I can make my code wait for all processes to finish and then continue with the rest of the code. I thought the join() method should do the job, but the output of my code is not what I expected from using it.
Here is the code:
from multiprocessing import Process
import time

def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')

def fun2():
    print('starting fun2')
    time.sleep(5)
    print('finishing fun2')

def fun3():
    print('starting fun3')
    print('finishing fun3')

if __name__ == '__main__':
    processes = []
    print('starting main')
    for i in [fun, fun2, fun3]:
        p = Process(target=i)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print('finishing main')

g = 0
print("g", g)
I expected all the processes under if __name__ == '__main__': to finish before the lines g = 0 and print("g", g) are reached, so I expected something like this:
starting main
starting fun2
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
But the actual output indicates that there's something I don't understand about join() (or multiprocessing in general):
starting main
g 0
g 0
starting fun2
g 0
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
The question is: how do I write the code so that all processes finish first and only then does the rest of the code (the part without multiprocessing) run, so that I get the former output? I run the code from the command prompt on Windows, in case it matters.
On waiting for the processes to finish:
You can just Process.join your list, something like:
import multiprocessing
import time

def func1():
    time.sleep(1)
    print('func1')

def func2():
    time.sleep(2)
    print('func2')

def func3():
    time.sleep(3)
    print('func3')

def main():
    processes = [
        multiprocessing.Process(target=func1),
        multiprocessing.Process(target=func2),
        multiprocessing.Process(target=func3),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

if __name__ == '__main__':
    main()
But if you're thinking about giving your process more complexity, try using a Pool:
import multiprocessing
import time

def func1():
    time.sleep(1)
    print('func1')

def func2():
    time.sleep(2)
    print('func2')

def func3():
    time.sleep(3)
    print('func3')

def main():
    result = []
    with multiprocessing.Pool() as pool:
        result.append(pool.apply_async(func1))
        result.append(pool.apply_async(func2))
        result.append(pool.apply_async(func3))
        for r in result:
            r.wait()

if __name__ == '__main__':
    main()
More info on Pool
On why g 0 prints multiple times:
This is happening because you're using spawn or forkserver as the start method for your Process, and the g = 0 and print statements are outside a function and outside the if __name__ == '__main__': block.
From the docs:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
(...)
This allows the newly spawned Python interpreter to safely import the module and then run the module’s foo() function.
Similar restrictions apply if a pool or manager is created in the main module.
It's basically interpreting your script again, because each child process imports your .py file as a module.
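Concretely, that means moving the module-level g = 0 and print into the guard so only the parent ever runs them; a minimal sketch using one of the functions from the question:
from multiprocessing import Process
import time

def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')

if __name__ == '__main__':
    print('starting main')
    p = Process(target=fun)
    p.start()
    p.join()
    print('finishing main')
    # module-level statements moved inside the guard, so a spawned child
    # that re-imports this file never reaches them
    g = 0
    print("g", g)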
I am learning the multiprocessing module of Python. I am on Python 3.8. This is my sample code:
# import stuff
import multiprocessing as mp
import time

def add(x, y):
    time.sleep(10)
    print(f'{x + y} \n')

def main():
    start = time.perf_counter()
    if __name__ == '__main__':
        p1 = mp.Process(target=add, args=(100, 200))
        p2 = mp.Process(target=add, args=(200, 300))
        p1.start(); p2.start()
        p1.join(); p2.join()
    end = time.perf_counter()
    print(f'{end - start} seconds \n')

main()
I am expecting outputs such as:
300
500
10.something seconds
But when I run it I am getting:
5.999999999062311e-07 seconds
5.00000000069889e-07 seconds
500
300
10.704853300000002 seconds
For some reason the end = time.perf_counter(); print(f'{end - start} seconds \n') part is getting executed once after each process is started and one more time after they both end. But here I am specifically writing p1.join(); p2.join() to tell the computer to wait until these processes are finished and then move on to the following line of code.
Why is it behaving like this? And what can I do to fix it?
This is happening because you are running on Windows, which does not support fork. On Linux, I see the output you expect. Because Windows can't fork, it has to re-import your entire module in each child process in order to run your worker function. Because you're not protecting the code that calculates and prints the runtime with the if __name__ == "__main__": guard, it is executed in both of your worker processes when they are launched, in addition to running in your main process once the workers finish. Move it (and any other code you only want to run in the parent process) into the guard to get the output you want:
# import stuff
import multiprocessing as mp
import time

def add(x, y):
    time.sleep(10)
    print(f'{x + y} \n')

def main():
    p1 = mp.Process(target=add, args=(100, 200))
    p2 = mp.Process(target=add, args=(200, 300))
    p1.start(); p2.start()
    p1.join(); p2.join()

if __name__ == '__main__':
    main()
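If you still want the "10.something seconds" measurement, the perf_counter calls from the original snippet can also live inside main(), since main() is now only called under the guard; a sketch, assuming the same imports (multiprocessing as mp and time):
import multiprocessing as mp
import time

def add(x, y):
    time.sleep(10)
    print(f'{x + y} \n')

def main():
    start = time.perf_counter()
    p1 = mp.Process(target=add, args=(100, 200))
    p2 = mp.Process(target=add, args=(200, 300))
    p1.start(); p2.start()
    p1.join(); p2.join()
    end = time.perf_counter()
    # runs once, in the parent, after both children have finished
    print(f'{end - start} seconds \n')

if __name__ == '__main__':
    main()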
First situation: the main process cannot finish.
from multiprocessing import Pool, Queue

queue = Queue()

def handle(slogan):
    for i in xrange(100000):
        queue.put(slogan)
    print 'put done'

def main():
    pools = Pool(2)
    for i in xrange(4):
        pools.apply_async(handle, args=('test', ))
    print 'waiting all done...'
    pools.close()
    pools.join()
    print 'all done...'

if __name__ == '__main__':
    main()
The result of this code looks like this:
waiting all done...
put done
put done
put done
put done
I waited for over an hour and I cannot understand it; I thought the multiprocessing module had a bug or something. So I changed the code. This time I do not use the multiprocessing Queue; I just use the loop to compute some numbers. The code is as follows:
from multiprocessing import Pool

def handle(slogan):
    tmp = 0
    for i in xrange(100000):
        tmp += i
    print 'put done'

def main():
    pools = Pool(2)
    for i in xrange(4):
        pools.apply_async(handle, args=('test', ))
    print 'waiting all done...'
    pools.close()
    pools.join()
    print 'all done...'

if __name__ == '__main__':
    main()
This code finishes successfully, with the result:
waiting all done...
put done
put done
put done
put done
all done...
Is this just because I use Queue? I do not know why. Who can explain it to me?
You aren't capturing the result. You should capture the return values from apply_async() and call get() on each one of them.
Also, try specifying a large timeout value in join() or get(). In some versions of Python this is required to work around a bug.
See also: https://stackoverflow.com/a/3571687/4323
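A sketch of what capturing the results looks like, applied to the second (Queue-free) snippet, with handle() changed to return its sum so get() has something to hand back; get() blocks until that task finishes and also re-raises any exception the worker hit:
from multiprocessing import Pool

def handle(slogan):
    tmp = 0
    for i in xrange(100000):
        tmp += i
    print 'put done'
    return tmp

def main():
    pools = Pool(2)
    # keep the AsyncResult objects instead of discarding them
    results = [pools.apply_async(handle, args=('test',)) for i in xrange(4)]
    print 'waiting all done...'
    for r in results:
        print r.get()  # a large timeout can be passed here, e.g. r.get(9999)
    pools.close()
    pools.join()
    print 'all done...'

if __name__ == '__main__':
    main()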