Python Multiprocessing: Execute code serially before and after parallel execution

Novice here: I am trying to execute some code serially, then create a pool of workers and execute some code in parallel. After the parallel execution is done, I want to execute some more code serially.
For example...
import time
from multiprocessing import Pool

print("I only want to print this statement once")

def worker(i):
    """worker function"""
    now = time.time()
    time.sleep(i)
    then = time.time()
    print(now, then)

if __name__ == '__main__':
    with Pool(3) as p:
        p.map(worker, [1, 1, 1])
        p.close()

print("Only print this once as well")
I would like this to return...
I only want to print this statement once
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well
However what it returns is this:
I only want to print this statement once
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well
So it seems to be running the print statements an additional time for each process in the pool.
Any help would be appreciated!

Based on the observed behaviour, I assume you are on an NT/Windows operating system.
The reason you see all those prints is that on Windows the spawn start method is used. When a new process is "spawned", a fresh Python interpreter is launched and it receives the module and the function it is supposed to execute. When the new interpreter imports the module, the top-level print calls are executed. Hence the duplicate prints.
Just move those print statements inside the __main__ guard and you won't see them again, as in the sketch below.
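A minimal sketch of the corrected script (nothing here beyond moving the two prints from the question inside the guard; note the with block already closes the pool, so the explicit close() call can be dropped):
import time
from multiprocessing import Pool

def worker(i):
    """worker function"""
    now = time.time()
    time.sleep(i)  # simulate work
    then = time.time()
    print(now, then)

if __name__ == '__main__':
    # top-level code under the guard runs only in the parent process
    print("I only want to print this statement once")
    with Pool(3) as p:
        p.map(worker, [1, 1, 1])
    print("Only print this once as well")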

Related

How to stop multiprocessing in python running for the full script

I have the following code in Python:
import multiprocessing
import time

print "I want this to show once at the beginning"

def leaveout():
    print "I dont Want This to Show"

def hang(input1):
    print "I want this to show 3 times"
    print "Number = %s" % input1

def main():
    p = multiprocessing.Process(target=hang, args=("0",))
    p.start()
    p1 = multiprocessing.Process(target=hang, args=("1",))
    p1.start()
    p2 = multiprocessing.Process(target=hang, args=("2",))
    p2.start()
    p.join()
    p1.join()
    p2.join()

if __name__ == '__main__':
    main()

print "I want this to show once at the end"
My objective is to run the hang function in three parallel processes, which is happening successfully. My issue is that each child process also runs the top-level statements of the entire script, resulting in the following output:
c:\Evopt>python multiprocessingprac.py
I want this to show once at the beginning
I want this to show once at the beginning
I want this to show once at the end
I want this to show 3 times
Number = 2
I want this to show once at the beginning
I want this to show once at the end
I want this to show 3 times
Number = 1
I want this to show once at the beginning
I want this to show once at the end
I want this to show 3 times
Number = 0
I want this to show once at the end
How can I stop this happening?
When spawning a new process, Windows creates a blank process. A new Python interpreter is then loaded into the spawned process and given the same code base to interpret.
That's why you see duplicated print statements being executed. As they are top-level expressions, they are executed every time a process evaluates that code.
On Unix OSes this is not observed because they implement a totally different process creation mechanism (the fork start method), which does not require loading a new Python interpreter in the child process.
To fix your issue, you need to remove the print( ... ) expressions from the script's top level and move them into the main function.
def main():
    print("I want this to show once at the beginning")
    p0 = multiprocessing.Process( ... )
    p0.start()
    ...
    p2.join()
    print("I want this to show once at the end")
You can read more about process start strategies in the multiprocessing documentation.
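For experimenting, the start method can also be selected explicitly. A small sketch, assuming Python 3.4+ where multiprocessing.set_start_method is available; forcing 'spawn' reproduces the Windows behaviour on any OS:
import multiprocessing

def task():
    print("running in a child process")

if __name__ == '__main__':
    # 'spawn' starts a fresh interpreter per child, as Windows always does;
    # 'fork' (Unix only) clones the parent process instead.
    multiprocessing.set_start_method('spawn')
    p = multiprocessing.Process(target=task)
    p.start()
    p.join()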

Python Multiprocessing: How do I keep it within the specified function?

I've been a long-term observer of Stack Overflow but this time I just can't find a solution to my problem, so here I am asking you directly!
Consider this code:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

print "External"
It's the basic example code for multiprocessing with pools as found here in the first box, plus a print statement at the end.
When executing this in PyCharm Community on Windows 7 with Python 2.7, the Pool part works fine, but "External" is printed multiple times too. As a result, when I try to use multiprocessing on a specific function in another program, all processes end up running the entire program. How do I prevent that, so that only the given function is multiprocessed?
I tried using Process instead, closing, joining and/or terminating the process or pool, embedding the entire thing into a function, calling said function from a different file (it then starts executing that file). I can't find anything related to my problem and feel like I'm missing something very simple.
Since the print instruction is not indented under the guard, it is executed each time the Python file is imported, which is every time a new process is created.
By contrast, all the code placed beneath if __name__ == '__main__': is not executed each time a process is created, but only in the main process, which is the only place where that condition evaluates to true.
Try the following code; you should not see the issue again, and External should be printed to the console only once.
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
    print "External"
Related: python multiprocessing on windows, if __name__ == "__main__"

Python multithreaded print statements delayed until all threads complete execution

I have a piece of code below that creates a few threads to perform a task, which works perfectly well on its own. However I'm struggling to understand why the print statements I call in my function do not execute until all threads complete and the print 'finished' statement is called. I would expect them to be called as the thread executes. Is there any simple way to accomplish this, and why does this work this way in the first place?
import multiprocessing
import time

def func(param):
    time.sleep(.25)
    print param*2

if __name__ == '__main__':
    print 'starting execution'
    launchTime = time.clock()
    params = range(10)
    pool = multiprocessing.Pool(processes=100)  # use N processes to download the data
    _ = pool.map(func, params)
    print 'finished'
For Python 3 you can now use the flush parameter, like this:
print('Your text', flush=True)
This happens due to stdout buffering. You can still flush the buffers manually:
import sys
print 'starting'
sys.stdout.flush()
You can find more info on this issue here and here.
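Applied to the snippet from the question, a minimal sketch, ported to Python 3 syntax with the pool size trimmed; the only change of substance is flush=True in the worker:
import multiprocessing
import time

def func(param):
    time.sleep(.25)
    # flush=True pushes each line out as soon as it is printed, instead of
    # leaving it in the child's stdout buffer until the process exits
    print(param * 2, flush=True)

if __name__ == '__main__':
    print('starting execution')
    with multiprocessing.Pool(processes=10) as pool:
        pool.map(func, range(10))
    print('finished')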
Having run into plenty of issues around this and garbled output (especially under Windows when adding colours to the output), my solution has been to have an exclusive printing thread which consumes a queue.
If this still doesn't work, also add flush=True to your print statement(s) as suggested by @Or Duan.
Further, the "most correct", but heavy-handed, approach to displaying messages with threading is to use the logging library, which can wrap a queue (and write to many places asynchronously, including stdout) or write to a system-level queue (outside Python; availability depends greatly on OS support).
import threading
from queue import Queue

def display_worker(display_queue):
    while True:
        line = display_queue.get()
        if line is None:  # simple termination logic, other sentinels can be used
            break
        print(line, flush=True)  # remove flush if slow or using Python 2

def some_other_worker(display_queue, other_args):
    # NOTE accepts queue reference as an argument, though it could be a global
    display_queue.put("something which should be printed from this thread")

def main():
    display_queue = Queue()  # synchronizes console output
    screen_printing_thread = threading.Thread(
        target=display_worker,
        args=(display_queue,),
    )
    screen_printing_thread.start()

    ### other logic ###

    display_queue.put(None)  # end screen_printing_thread
    screen_printing_thread.join()  # Thread has no stop(); join() waits for the sentinel to end it
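For the logging-based variant mentioned above, a hedged sketch using the standard library's QueueHandler/QueueListener pair (Python 3.2+); the names log_queue and worker are illustrative:
import logging
import logging.handlers
import queue
import threading

log_queue = queue.Queue()

# Workers log through the queue; a single listener thread does the actual
# console I/O, so output never interleaves mid-line.
logging.getLogger().addHandler(logging.handlers.QueueHandler(log_queue))
logging.getLogger().setLevel(logging.INFO)

listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
listener.start()

def worker(n):
    logging.info("message from worker %d", n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

listener.stop()  # flushes the queue and stops the listener thread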

Sandbox shell programs on time

I'm writing a grading robot for a Python programming class, and the students' submissions may loop infinitely. I want to sandbox a shell call to their program so that it can't run longer than a specific amount of time. I'd like to run, say,
restrict --msec=100 --default=-1 python -c "while True: pass"
and have it return -1 if the program runs longer than 100ms, and otherwise return the value of the executed expression (in this case, the output of the python program)
Does Python support this internally? I'm also writing the grading robot in Perl, so I could use some Perl module wrapped around the call to the shell script.
Use apply_async to call the student's function (foo in the example below). Use the get method with a timeout to fetch the result if it returns in 0.1 seconds or less; otherwise get raises a multiprocessing.TimeoutError:
import multiprocessing as mp
import time
import sys

def foo(x):
    time.sleep(x)
    return x*x

if __name__ == '__main__':
    pool = mp.Pool(1)
    for x in (0.01, 1.0):
        try:
            result = pool.apply_async(foo, args=(x,)).get(timeout=0.1)
        except KeyboardInterrupt:
            pool.terminate()
            sys.exit("Cancelled")
        except mp.TimeoutError:
            print("Timed out")
        else:
            print("Result: {r}".format(r=result))
Or, if the student submits a script instead of a function, then you could use jcollado's Command class.
The standard approach is to do the following.
Create a subprocess which runs the student's program in its own Python instance.
Wait for a time.
If the student's subprocess exits, good.
If the subprocess has not exited, you need to kill it.
You'll be happiest downloading the psutil module, which allows easy status checking of the subprocess. A sketch of this pattern follows.
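For reference, a minimal sketch of the run-wait-kill pattern using only the standard library (Python 3.5+); run_with_timeout, its parameters, and the -1 default are hypothetical names mirroring the restrict command from the question:
import subprocess

def run_with_timeout(args, msec=100, default=-1):
    try:
        # run() waits up to the timeout, then kills the child and
        # raises TimeoutExpired if it is still running
        completed = subprocess.run(args, timeout=msec / 1000.0)
        return completed.returncode
    except subprocess.TimeoutExpired:
        return default

# The infinite loop from the question never finishes, so this prints -1
print(run_with_timeout(["python", "-c", "while True: pass"], msec=100))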
