Queue timeout when I execute unrelated code - python

I'm trying to use "exec" to check some external code snippets for correctness and I wanted to trap infinite loops by spawning a process, waiting for a short period of time, then checking the local variables. I managed to shrink the code to this example:
import multiprocessing

def fHelper(queue, codeIn, globalsParamIn, localsParamIn):
    exec(codeIn, globalsParamIn, localsParamIn)  # Execute code string with limited builtins
    queue.put(localsParamIn['spam'])

def f(codeIn):
    globalsParam = {"float": float, "int": int, "len": len}
    spam = False
    localsParam = {'spam': spam}
    if __name__ == '__main__':
        queue = multiprocessing.Queue()
        p = multiprocessing.Process(target=fHelper, args=(queue, codeIn, globalsParam, localsParam))
        p.start()
        p.join(3)  # Wait for 3 seconds or until process finishes
        if p.is_alive():  # Just in case p hangs
            p.terminate()
            p.join()
        return queue.get(timeout=3)

fOut = f("spam=True")
print(fOut)
# assert fOut
The code as-is executes fine, but if you uncomment the last line (or add almost anything else there; print(fOut.copy()) will do it), the queue times out. I'm using Python 3.8.2 on Windows.
I would welcome any suggestions on how to fix the bug, or better yet understand what on earth is going on.
Thanks!
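A minimal sketch of one likely explanation (an assumption, not a confirmed answer from this thread): under the spawn start method used on Windows, the child process re-imports the main module, so the module-level lines fOut = f("spam=True") and assert fOut run again inside the child. There f returns None because __name__ is not '__main__', the assert fails, and the child dies before fHelper ever puts anything on the queue, so queue.get(timeout=3) times out. Keeping the top-level calls under the guard avoids that re-execution:

# Hedged sketch: guard the module-level calls so the spawned child does not re-run them.
if __name__ == '__main__':
    fOut = f("spam=True")
    print(fOut)
    assert fOut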

Related

Python multiprocessing queue not showing results

I am trying to run a separate Python process and store the result in a queue. I can extract the result in two ways: either run queue.get() just once, or use a while loop and iterate over the queue until it's empty.
In the code below, the first method is used if first=True and the second method is used if first=False.
from multiprocessing import Process, Queue

def foo1(queue):
    queue.put(1)

def main(first=False):
    queue = Queue()
    p = Process(target=foo1, args=(queue,))
    p.start()
    if first:
        a = queue.get()
        print(a)
    else:
        while not queue.empty():
            print(queue.get())
    p.join()

if __name__ == "__main__":
    main()
Question: Why does the first method print 1 correctly while the second does not? Aren't they supposed to be equivalent?
I am using Windows 10. I noticed this behavior in both the interactive console and the shell terminal.
Note: Due to the bug mentioned here I have to run the code as one script.
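A sketch (not from the original question) of one way to make the second method reliable: queue.empty() can return True before the child has put anything on the queue, so a blocking get() with a sentinel value avoids that race.

from multiprocessing import Process, Queue

_SENTINEL = None

def foo1(queue):
    queue.put(1)
    queue.put(_SENTINEL)  # tell the consumer there is nothing more to read

def main():
    queue = Queue()
    p = Process(target=foo1, args=(queue,))
    p.start()
    while True:
        item = queue.get()  # blocks until the worker has produced something
        if item is _SENTINEL:
            break
        print(item)
    p.join()

if __name__ == "__main__":
    main()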

Running Python on multiple cores

I have created a (rather large) program that takes quite a long time to finish, and I started looking into ways to speed up the program.
I found that if I open task manager while the program is running only one core is being used.
After some research, I found this website:
Why does multiprocessing use only a single core after I import numpy? which gives a solution of os.system("taskset -p 0xff %d" % os.getpid()),
however this doesn't work for me, and my program continues to run on a single core.
I then found this:
is python capable of running on multiple cores?,
which pointed towards using multiprocessing.
So after looking into multiprocessing, I came across this documentation on how to use it: https://docs.python.org/3/library/multiprocessing.html#examples
I tried the code:
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

a = input("Finished")
After running the code (not in IDLE), it printed this:
Finished
hello bob
Finished
Note: after it said Finished the first time I pressed enter
So after this I am now even more confused, and I have two questions.
First: it still doesn't run on multiple cores (I have an 8-core Intel i7).
Second: why does it show "Finished" before it has even run the if-statement code (and it's not even finished yet!)?
To answer your second question first, "Finished" is printed to the terminal because a = input("Finished") is outside of your if __name__ == '__main__': block. It is a module-level statement, so it executes whenever the module is loaded, which on Windows also happens when the child process re-imports the module, not only when you run the script directly.
To answer the first question, you only created one process, which you run and then wait to complete before continuing. This gives you none of the benefits of multiprocessing and incurs the overhead of creating the new process.
Because you want to create several processes, you need to create a pool via a collection of some sort (e.g. a Python list) and then start all of the processes.
In practice, you need to be concerned with more than the number of processors (such as the amount of available memory, the ability to restart workers that crash, etc.). However, here is a simple example that completes your task above.
import datetime as dt
from multiprocessing import Process, current_process
import sys

def f(name):
    print('{}: hello {} from {}'.format(
        dt.datetime.now(), name, current_process().name))
    sys.stdout.flush()

if __name__ == '__main__':
    worker_count = 8
    worker_pool = []
    for _ in range(worker_count):
        p = Process(target=f, args=('bob',))
        p.start()
        worker_pool.append(p)
    for p in worker_pool:
        p.join()  # Wait for all of the workers to finish.

    # Allow time to view results before program terminates.
    a = input("Finished")  # raw_input(...) in Python 2.
Also note that if you join workers immediately after starting them, you are waiting for each worker to complete its task before starting the next worker. This is generally undesirable unless the ordering of the tasks must be sequential.
Typically Wrong
worker_1.start()
worker_1.join()
worker_2.start() # Must wait for worker_1 to complete before starting worker_2.
worker_2.join()
Usually Desired
worker_1.start()
worker_2.start() # Start all workers.
worker_1.join()
worker_2.join() # Wait for all workers to finish.
For more information, please refer to the following links:
https://docs.python.org/3/library/multiprocessing.html
Dead simple example of using Multiprocessing Queue, Pool and Locking
https://pymotw.com/2/multiprocessing/basics.html
https://pymotw.com/2/multiprocessing/communication.html
https://pymotw.com/2/multiprocessing/mapreduce.html

A print function makes a multiprocessing program fail

In the following code, I'm trying to create a sandboxed master-worker system, in which changes to global variables in a worker are not reflected in other workers.
To achieve this, a new process is created each time a task is created, and to make the execution parallel, the creation of processes itself is managed by ThreadPoolExecutor.
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pipe, Process

def task(conn, arg):
    conn.send(arg * 2)

def isolate_fn(fn, arg):
    def wrapped():
        parent_conn, child_conn = Pipe()
        p = Process(target=fn, args=(child_conn, arg), daemon=True)
        try:
            p.start()
            r = parent_conn.recv()
        finally:
            p.join()
        return r
    return wrapped

def main():
    with ThreadPoolExecutor(max_workers=4) as executor:
        pair = []
        for i in range(0, 10):
            pair.append((i, executor.submit(isolate_fn(task, i))))

            # This function makes the program broken.
            #
            print('foo')

        time.sleep(2)
        for arg, future in pair:
            if future.done():
                print('arg: {}, res: {}'.format(arg, future.result()))
            else:
                print('not finished: {}'.format(arg))
    print('finished')

main()
This program works fine until I put the print('foo') call inside the loop. If that call is present, some tasks remain unfinished, and what is worse, the program itself doesn't finish.
Results are not always the same, but the following is the typical output:
foo
foo
foo
foo
foo
foo
foo
foo
foo
foo
arg: 0, res: 0
arg: 1, res: 2
arg: 2, res: 4
not finished: 3
not finished: 4
not finished: 5
not finished: 6
not finished: 7
not finished: 8
not finished: 9
Why is this program so fragile?
I use Python 3.4.5.
Try using:
from multiprocessing import set_start_method

# ... rest of your code here ...

if __name__ == '__main__':
    set_start_method('spawn')
    main()
If you search Stack Overflow for Python multiprocessing and multithreading, you will find a fair few questions mentioning similar hanging issues (especially for Python versions 2.7 and 3.2).
Mixing multithreading and multiprocessing is still a bit of an issue, and even the Python docs for multiprocessing.set_start_method mention that. In your case 'spawn' and 'forkserver' should work without any issues.
Another option might be to use multiprocessing.Pool directly, but this may not be possible for you in a more complex use case.
Btw. 'not finished' may still appear in your output, as you are not waiting for your subprocesses to finish, but the whole code should not hang anymore and should always finish cleanly.
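For reference, a minimal sketch of the multiprocessing.Pool option mentioned above (an assumption about how it would look here, not part of the original answer); the worker simply returns its value instead of sending it over a Pipe:

from multiprocessing import Pool

def task(arg):
    return arg * 2

if __name__ == '__main__':
    with Pool(processes=4) as pool:  # the Pool manages the child processes itself
        results = pool.map(task, range(10))
    print(results)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Note that Pool reuses its worker processes across tasks, so this gives up the per-task isolation of the original design, which is presumably what makes it unsuitable for the more complex use case.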
You are not creating a ThreadPoolExecutor every time; rather, you are using the pre-initialized pool for every iteration. I am not really able to track which print statement is hindering you.

Stopping processes in ThreadPool in Python

I've been trying to write an interactive wrapper (for use in ipython) for a library that controls some hardware. Some calls are heavy on the IO so it makes sense to carry out the tasks in parallel. Using a ThreadPool (almost) works nicely:
from multiprocessing.pool import ThreadPool

class hardware():
    def __init__(self, IPaddress):
        connect_to_hardware(IPaddress)

    def some_long_task_to_hardware(self, wtime):
        wait(wtime)
        result = 'blah'
        return result

pool = ThreadPool(processes=4)
threads = []
h = [hardware(IP1), hardware(IP2), hardware(IP3), hardware(IP4)]
for tt in range(4):
    task = pool.apply_async(h[tt].some_long_task_to_hardware, (1000,))
    threads.append(task)

alive = [True] * 4
try:
    while any(alive):
        for tt in range(4):
            alive[tt] = not threads[tt].ready()
        do_other_stuff_for_a_bit()
except:
    # some command I cannot find that will stop the threads...
    raise

for tt in range(4):
    print(threads[tt].get())
The problem comes if the user wants to stop the process or there is an IO error in do_other_stuff_for_a_bit(). Pressing Ctrl+C stops the main process but the worker threads carry on running until their current task is complete.
Is there some way to stop these threads without having to rewrite the library or have the user exit python? pool.terminate() and pool.join() that I have seen used in other examples do not seem to do the job.
The actual routine (instead of the simplified version above) uses logging and although all the worker threads are shut down at some point, I can see the processes that they started running carry on until complete (and being hardware I can see their effect by looking across the room).
This is in python 2.7.
UPDATE:
The solution seems to be to switch to using multiprocessing.Process instead of a thread pool. The test code I tried is to run foo_pulse:
class foo(object):
    def foo_pulse(self, nPulse, name):  # just one method of *many*
        print('starting pulse for ' + name)
        result = []
        for ii in range(nPulse):
            print('on for ' + name)
            time.sleep(2)
            print('off for ' + name)
            time.sleep(2)
            result.append(ii)
        return result, name
If you try running this using ThreadPool, then Ctrl-C does not stop foo_pulse from running (even though it does kill the threads right away), and the print statements keep on coming:
from multiprocessing.pool import ThreadPool
import time

def test(nPulse):
    a = foo()
    pool = ThreadPool(processes=4)
    threads = []
    for rn in range(4):
        r = pool.apply_async(a.foo_pulse, (nPulse, 'loop ' + str(rn)))
        threads.append(r)
    alive = [True] * 4
    try:
        while any(alive):  # wait until all threads complete
            for rn in range(4):
                alive[rn] = not threads[rn].ready()
            time.sleep(1)
    except:  # stop threads if user presses ctrl-c
        print('trying to stop threads')
        pool.terminate()
        print('stopped threads')  # this line prints but output from foo_pulse carried on.
        raise
    else:
        for t in threads:
            print(t.get())
However a version using multiprocessing.Process works as expected:
import multiprocessing as mp
import time

def test_pro(nPulse):
    pros = []
    ans = []
    a = foo()
    for rn in range(4):
        q = mp.Queue()
        ans.append(q)
        r = mp.Process(target=wrapper, args=(a, "foo_pulse", q),
                       kwargs={'args': (nPulse, 'loop ' + str(rn))})
        r.start()
        pros.append(r)
    try:
        for p in pros:
            p.join()
        print('all done')
    except:  # stop threads if user stops findRes
        print('trying to stop threads')
        for p in pros:
            p.terminate()
        print('stopped threads')
    else:
        print('output here')
        for q in ans:
            print(q.get())
    print('exit time')
Here I have defined a wrapper for the library foo (so that it did not need to be rewritten). If the return value is not needed, then neither is this wrapper:
def wrapper(a, target, q, args=(), kwargs={}):
    '''Used when return value is wanted'''
    q.put(getattr(a, target)(*args, **kwargs))
From the documentation I see no reason why a pool would not work (other than a bug).
This is a very interesting use of parallelism.
However, if you are using multiprocessing, the goal is to have many processes running in parallel, as opposed to one process running many threads.
Consider these few changes to implement it using multiprocessing:
You have these functions that will run in parallel:
import time
import multiprocessing as mp

def some_long_task_from_library(wtime):
    time.sleep(wtime)

class MyException(Exception): pass

def do_other_stuff_for_a_bit():
    time.sleep(5)
    raise MyException("Something Happened...")
Let's create and start the processes, say 4:
procs = []  # this is not a Pool, it is just a way to handle the
            # processes instead of calling them p1, p2, p3, p4...
for _ in range(4):
    p = mp.Process(target=some_long_task_from_library, args=(1000,))
    p.start()
    procs.append(p)

mp.active_children()  # this joins any processes that have already finished;
                      # the workers themselves start running as soon as start() is called.
The processes are running in parallel, presumably each on a separate CPU core, but that is for the OS to decide. You can check in your system monitor.
In the meantime you run something that will break, and you want to stop the running processes without leaving them orphaned:
try:
    do_other_stuff_for_a_bit()
except MyException as exc:
    print(exc)
    print("Now stopping all processes...")
    for p in procs:
        p.terminate()
print("The rest of the process will continue")
If it doesn't make sense to continue with the main process when one or all of the subprocesses have terminated, you should handle the exit of the main program.
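A hedged sketch of what "handle the exit of the main program" could look like here (not part of the original answer): reap the terminated workers and then stop the main program with a non-zero status.

import sys

for p in procs:
    p.join()  # reap the workers (they were terminated above)
if any(p.exitcode != 0 for p in procs):
    sys.exit("A worker did not exit cleanly; stopping the main program.")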
Hope it helps, and you can adapt bits of this for your library.
In answer to the question of why Pool did not work: this is because (as noted in the documentation) __main__ needs to be importable by the child processes, and due to the nature of this project interactive Python is being used.
At the same time it was not clear why ThreadPool would work, although the clue is right there in the name. ThreadPool creates its pool of workers using multiprocessing.dummy, which as noted here is just a wrapper around the threading module, while Pool uses multiprocessing.Process. This can be seen with this test:
>>> p = ThreadPool(processes=3)
>>> p._pool[0]
<DummyProcess(Thread23, started daemon 12345)>  # no terminate() method

>>> p = Pool(processes=3)
>>> p._pool[0]
<Process(PoolWorker-1, started daemon)>  # has handy terminate() method if needed
As threads do not have a terminate method the worker threads carry on running until they have completed their current task. Killing threads is messy (which is why I tried to use the multiprocessing module) but solutions are here.
The one warning about the solution using the above:
def wrapper(a, target, q, args=(), kwargs={}):
    '''Used when return value is wanted'''
    q.put(getattr(a, target)(*args, **kwargs))
is that changes to attributes inside the instance of the object are not passed back up to the main program. As an example the class foo above can also have methods such as:
def addIP(self, newIP):
    self.hardwareIP = newIP
A call to r = mp.Process(target=a.addIP, args=('127.0.0.1',)) does not update a.
The only way round this for a complex object seems to be shared memory using a custom manager, which can give access to both the methods and attributes of object a. For a very large, complex object based on a library, this may be best done using dir(foo) to populate the manager. If I can figure out how, I'll update this answer with an example (for my future self as much as others).
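As a rough sketch of the custom-manager idea (an assumption, not the author's eventual solution): multiprocessing.managers.BaseManager can serve a proxy to a single foo instance, so method calls such as addIP run against the one copy held in the manager process and their attribute changes persist there. Note that the default proxy only exposes public methods, not direct attribute access.

from multiprocessing.managers import BaseManager

class FooManager(BaseManager):
    pass

FooManager.register('foo', foo)  # register the foo class defined above

if __name__ == '__main__':
    manager = FooManager()
    manager.start()                    # the manager runs in its own process
    a = manager.foo()                  # proxy to a foo living in the manager process
    a.addIP('127.0.0.1')               # updates the managed instance, unlike mp.Process(target=a.addIP, ...)
    print(a.foo_pulse(1, 'managed'))   # methods are forwarded to the same instance
    manager.shutdown()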
If for some reason using threads is preferable, we can use this approach: send a signal to the threads we want to terminate. The simplest signal is a global variable:
import time
from multiprocessing.pool import ThreadPool

_FINISH = False

def hang():
    while True:
        if _FINISH:
            break
        print('hanging..')
        time.sleep(10)

def main():
    global _FINISH
    pool = ThreadPool(processes=1)
    pool.apply_async(hang)
    time.sleep(10)
    _FINISH = True
    pool.terminate()
    pool.join()
    print('main process exiting..')

if __name__ == '__main__':
    main()
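A hedged variant of the same idea (not from the original answer): threading.Event is a slightly cleaner signal than a bare global flag and avoids the global statement.

import time
import threading
from multiprocessing.pool import ThreadPool

finish = threading.Event()

def hang():
    while not finish.is_set():
        print('hanging..')
        time.sleep(1)

def main():
    pool = ThreadPool(processes=1)
    pool.apply_async(hang)
    time.sleep(3)
    finish.set()   # ask the worker to stop at its next check
    pool.close()
    pool.join()
    print('main process exiting..')

if __name__ == '__main__':
    main()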

Python multiprocessing.Pool does not start right away

I want to input text to Python and process it in parallel. For that purpose I use multiprocessing.Pool. The problem is that sometimes, though not always, I have to input text multiple times before anything is processed.
This is a minimal version of my code to reproduce the problem:
import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

if __name__ == '__main__':
    p = None
    while True:
        message = input('In: ')
        if not p:
            p = mp.Pool()
        p.apply_async(do_something, (message,))
What happens is that I have to input text multiple times before I get a result, no matter how long I wait after I have inputted something the first time. (As stated above, that does not happen every time.)
python3 test.py
In: a
In: a
In: a
In: Out: a
Out: a
Out: a
If I create the pool before the while loop or if I add time.sleep(1) after creating the pool, it seems to work every time. Note: I do not want to create the pool before I get an input.
Has someone an explanation for this behavior?
I'm running Windows 10 with Python 3.4.2
EDIT: Same behavior with Python 3.5.1
EDIT:
An even simpler example with Pool and also ProcessPoolExecutor. I think the problem is the call to input() right after applying/submitting, which only seems to be a problem the first time something is applied/submitted.
import concurrent.futures
import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

# ProcessPoolExecutor
# if __name__ == '__main__':
#     with concurrent.futures.ProcessPoolExecutor() as executor:
#         executor.submit(do_something, 'a')
#         input('In:')
#     print('done')

# Pool
if __name__ == '__main__':
    p = mp.Pool()
    p.apply_async(do_something, ('a',))
    input('In:')
    p.close()
    p.join()
    print('done')
Your code works when I tried it on my Mac.
In Python 3, it might help to explicitly declare how many processes will be in your pool (i.e. the number of simultaneous worker processes).
Try using p = mp.Pool(1):
import multiprocessing as mp
import time

def do_something(text):
    print('Out: ' + text, flush=True)
    # do some awesome stuff here

if __name__ == '__main__':
    p = None
    while True:
        message = input('In: ')
        if not p:
            p = mp.Pool(1)
        p.apply_async(do_something, (message,))
I could not reproduce it on Windows 7, but there are a few long shots worth mentioning for your issue.
Your antivirus might be interfering with the newly spawned processes; try temporarily disabling it and see if the issue is still present.
Windows 10 might have a different IO caching algorithm; try inputting larger strings. If that works, it means the OS is trying to be smart and sends data only once a certain amount has piled up.
As Windows has no fork() primitive, you might be seeing the delay caused by the spawn start method.
Python 3 added a new pool of workers called ProcessPoolExecutor; I'd recommend using it regardless of the issue you are suffering from.
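For completeness, a hedged sketch (not from the original answer) of the question's loop rewritten with concurrent.futures.ProcessPoolExecutor; the empty-input break is an addition so the executor can shut down cleanly:

import concurrent.futures

def do_something(text):
    print('Out: ' + text, flush=True)

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        while True:
            message = input('In: ')
            if not message:   # empty line exits the loop
                break
            executor.submit(do_something, message)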
