Python multiprocessing: Event.wait() blocking other processes

I'm working with a toy multiprocessing problem, and event signalling is not working as expected. The multiprocessing documentation defers the detailed description of Event() to the threading documentation, and the methods described there are precisely what I'm trying to use. I want worker processes (subclassed from multiprocessing.Process) spawned by a parent class to wait for a start signal from the parent, do their thing, then terminate. What seems to be happening, however, is that the first process, once running, blocks all the others. What's going on here, and how do I fix it?
from multiprocessing import Process, Event, cpu_count
from time import time, sleep

class Worker(Process):
    def __init__(self, my_id, caller):
        Process.__init__(self)
        self.caller = caller
        self.my_id = my_id

    def run(self):
        print("%i started" % self.my_id)
        self.caller.start_flag.wait()
        print("%i sleeping" % self.my_id)
        sleep(2000)

class ParentProcess(object):
    def __init__(self, num_procs):
        self.procs = []
        self.start_flag = Event()
        for i in range(num_procs):
            self.procs.append(Worker(i, self))

    def run(self):
        for proc in self.procs:
            proc.run()
        self.start_flag.set()
        for proc in self.procs:
            proc.join()
            print("%i done" % proc.my_id)

if __name__ == '__main__':
    cpus = cpu_count()
    world = ParentProcess(cpus)
    start = time()
    world.run()
    end = time()
    runtime = end - start
    print("Runtime: %3.6f" % runtime)
This only outputs "0 started", then hangs. It seems Event.wait() is blocking all the other workers, even the caller. The documentation implies this should not happen.

Here is a working version of the code. When you subclass Process, you implement the run method to define what should run in that process. When you actually want the process to start, you should call its start method (proc.start()).
from multiprocessing import Process, Event
from time import time, sleep

class Worker(Process):
    def __init__(self, my_id, caller):
        Process.__init__(self)
        self.caller = caller
        self.my_id = my_id

    def run(self):
        print("%i started" % self.my_id)
        self.caller.start_flag.wait()
        print("%i sleeping" % self.my_id)
        sleep(5)

class ParentProcess(object):
    def __init__(self, num_procs):
        self.procs = []
        self.start_flag = Event()
        for i in range(num_procs):
            self.procs.append(Worker(i, self))

    def run(self):
        for proc in self.procs:
            proc.start()
        self.start_flag.set()
        for proc in self.procs:
            proc.join()
            print("%i done" % proc.my_id)

if __name__ == '__main__':
    cpus = 4
    world = ParentProcess(cpus)
    start = time()
    world.run()
    end = time()
    runtime = end - start
    print(runtime)
Outputs:
0 started
1 started
2 started
2 sleeping
0 sleeping
1 sleeping
3 started
3 sleeping
0 done
1 done
2 done
3 done
5.01037812233
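The key difference: calling run() directly just executes the method in the calling process, exactly like any other method call, while start() spawns a new process which then invokes run() for you. A minimal sketch (illustrative names, not from the question) makes this visible via os.getpid():

import os
from multiprocessing import Process

class PidWorker(Process):
    def run(self):
        print("run() executing in pid %d" % os.getpid())

if __name__ == '__main__':
    print("parent pid %d" % os.getpid())
    w = PidWorker()
    w.run()     # same pid as the parent: just a blocking method call
    w2 = PidWorker()
    w2.start()  # different pid: run() is invoked in a child process
    w2.join()

This is why the original version hangs: the first Worker's run() executes in the parent, blocks on start_flag.wait(), and the line that would set the flag is never reached.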

Related

Queue in multiprocessing blocked by threading.Timer

I need to transfer data from a subprocess to the main one.
The subprocess is doing a repetitive task using threading.Timer.
Whenever threading.Timer is called, the queue stops working.
The subprocess is acquiring data, while I want to display it in real time in the main process.
I wrote this snippet to showcase the problem:
import threading
import multiprocessing

class MyClass():
    def __init__(self, q):
        self.q = q
        print("put value in q: ", "start")
        self.q.put("start")
        self.i = 0
        self.update()

    def update(self):
        if self.i < 3:
            print("put value in q: ", self.i)
            self.q.put(self.i)
            self.i += 1
            threading.Timer(0.5, self.update).start()
        else:
            self.stop()

    def stop(self):
        print("put value in q: ", "stop")
        self.q.put("stop")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    process = multiprocessing.Process(target=MyClass, args=(q,))
    process.start()
    process.join()
    for i in range(5):
        print("get value in q: ", q.get(block=True, timeout=2))
and this is all I get:
put value in q: start
put value in q: 0
put value in q: 1
put value in q: 2
put value in q: stop
get value in q: start
get value in q: 0
Is there a solution or a workaround?
You have a process. Its main thread is the MyClass() call. threading.Timer() spawns another thread alongside the main thread, so you have to wait until all additional threads have terminated before you stop the process. To solve the problem, replace threading.Timer(0.5, self.update).start() with (wait for the timer thread):
t = threading.Timer(0.5, self.update)
t.start()
t.join()
Or replace threading.Timer(0.5, self.update).start() with (no additional threads):
time.sleep(.5)
self.update()
Both solutions should work.
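For reference, here is a sketch of the full snippet with the second fix applied (time.sleep instead of the Timer; the recursion stays shallow because update() only runs a few times):

import time
import multiprocessing

class MyClass():
    def __init__(self, q):
        self.q = q
        self.q.put("start")
        self.i = 0
        self.update()

    def update(self):
        if self.i < 3:
            self.q.put(self.i)
            self.i += 1
            time.sleep(0.5)  # no extra thread, so the process exits cleanly
            self.update()
        else:
            self.q.put("stop")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    process = multiprocessing.Process(target=MyClass, args=(q,))
    process.start()
    process.join()
    for _ in range(5):
        print("get value in q: ", q.get(block=True, timeout=2))

With this change the main process drains all five values instead of stopping after the first two.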

Python threading: monitoring threads and executing additional code after threads complete

I have the following code:
class MyThread(Thread):
    def __init__(self, command, template, env, build_flavor, logger):
        Thread.__init__(self)
        self.command = command
        self.template = template
        self.env = env
        self.build_flavor = build_flavor
        self.logger = logger

    def run(self):
        self.logger.info('Running (%s)...this may take several minutes. Please be patient' % self.build_flavor)
        run_command(self.command, self.template, self.env)
        self.logger.info('Complete (%s)' % self.build_flavor)
        return
And then in another class, when I create the actual threads:
if self.build_type == 'default':
    threads = []
    for t in self.template:
        modify_template(t)
        build_flavor = self.getmatch(t)
        thread = MyThread(packer, t, self.new_env, build_flavor, self.logger)
        thread.setName(build_flavor)
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    vmware_create()
    openstack_create()
Unfortunately, after the threads are .join()'d, I'm calling vmware_create() and openstack_create() serially. I'd like to execute each of those as soon as its respective thread completes, so that I'm not waiting for both threads to finish before starting one of the *_create() functions, and then waiting for the first to complete before executing the second.
i.e. right now vmware_create() will execute only after BOTH threads are finished, and once vmware_create() is done, only then will openstack_create() begin. I'd like to wait for the respective threads to complete, then execute the _create() function for whichever thread completed first, all the while waiting for the second thread to finish, and once that's done, immediately execute its _create() function for true parallelization.
I haven't been able to figure out how to do this and need a little help.
Functions are objects. Just hand them to the thread:
class MyThread(Thread):
    def __init__(self, command, template, env, build_flavor, logger, func=None):
        Thread.__init__(self)
        self.command = command
        self.template = template
        self.env = env
        self.build_flavor = build_flavor
        self.logger = logger
        self.func = func

    def run(self):
        self.logger.info('Running (%s)...this may take several minutes. Please be patient' % self.build_flavor)
        run_command(self.command, self.template, self.env)
        self.logger.info('Complete (%s)' % self.build_flavor)
        # call func if it is there
        if self.func:
            self.func()
        return
Now, supply the first two threads with a function to call:
if self.build_type == 'default':
    threads = []
    funcs = {0: vmware_create, 1: openstack_create}
    for i, t in enumerate(self.template):
        modify_template(t)
        build_flavor = self.getmatch(t)
        func = funcs.get(i, None)
        thread = MyThread(packer, t, self.new_env, build_flavor,
                          self.logger, func=func)
        thread.setName(build_flavor)
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
Of course, you can hand functions to any of the other threads as well.
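An alternative sketch of the same idea on Python 3 (or Python 2 with the futures backport), using concurrent.futures: add_done_callback fires as soon as each build finishes, without waiting for the other. The build/vmware_create/openstack_create bodies below are stand-ins, not the question's real code:

import time
from concurrent.futures import ThreadPoolExecutor

def build(flavor):
    time.sleep(1)  # stands in for run_command(...)
    return flavor

def vmware_create():  # stub for the real function
    print("vmware_create")

def openstack_create():  # stub for the real function
    print("openstack_create")

pool = ThreadPoolExecutor(max_workers=2)
# each callback runs in the worker thread as soon as its future completes
pool.submit(build, 'vmware').add_done_callback(lambda f: vmware_create())
pool.submit(build, 'openstack').add_done_callback(lambda f: openstack_create())
pool.shutdown(wait=True)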

Multi threading in python using parallel threads

I created two threads, each running a different function.
What I tried to achieve is that if the first thread ends, the second should also end (I tried to achieve this using a global variable).
Once both threads end, the same procedure should start again.
The script is not working as expected.
I am using Linux (CentOS) and Python 2.7.
#!/usr/bin/python
import threading
import time
import subprocess
import datetime
import os
import thread

command = "strace -o /root/Desktop/a.txt -c ./server"
final_dir = "/root/Desktop/"
exitflag = 0

# Define a function for the thread
def print_time(*args):
    os.chdir(final_dir)
    print "IN first thread"
    proc = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.wait(70)
    exitflag = 1

def print_time1(*args):
    print "In second thread"
    global exitflag
    while exitflag:
        thread.exit()
    #proc = subprocess.Popen(command1, shell=True, stdout=subprocess.PIPE, sterr=subprocess.PIPE)

# Create two threads as follows
while (1):
    t1 = threading.Thread(target=print_time)
    t1.start()
    t2 = threading.Thread(target=print_time1)
    t2 = start()
    time.sleep(80)
    z = t1.isAlive()
    z1 = t2.isAlive()
    if z:
        z.exit()
    if z1:
        z1.exit()
    threading.Thread(target=print_time1).start()
    threading.Thread(target=print_time1).start()
    print "In try"
Where am I going wrong?
You could create an object to share state, and have the dependent thread check that state. Something like:
import threading
import time
import datetime

class Worker1(threading.Thread):
    def __init__(self, state):
        super(Worker1, self).__init__()
        self.state = state

    def run(self):
        print_time_helper("Worker1 Start")
        time.sleep(4)
        print_time_helper("Worker1 End")
        self.state.keepOnRunning = False

class Worker2(threading.Thread):
    def __init__(self, state):
        super(Worker2, self).__init__()
        self.state = state

    def run(self):
        while self.state.keepOnRunning:
            print_time_helper("Worker2")
            time.sleep(1)

class State(object):
    def __init__(self):
        self.keepOnRunning = True

def main():
    state = State()
    thread1 = Worker1(state)
    thread2 = Worker2(state)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()

def print_time_helper(name):
    print "{0}: {1}".format(name, datetime.datetime.now().time().strftime("%S"))
which will output something like this (numbers show current time seconds):
Worker1 Start: 39
Worker2: 39
Worker2: 40
Worker2: 41
Worker2: 42
Worker1 End: 43
However, this is a bit simplistic for most situations. You might be better off using message queues - this is a good intro.
Use a threading.Event instead of an int and wait for it to be set.
Also, your logic in print_time1 appears to be wrong: the while loop will never run, since exitflag is initially 0, and even if it were 1 the thread would just exit immediately. It's not actually waiting on anything. (Note too that print_time assigns exitflag = 1 without declaring global exitflag, so it only sets a local variable.)
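A minimal sketch of that suggestion, with placeholder bodies standing in for the strace call and the second task:

import threading
import time

stop_event = threading.Event()

def first():
    time.sleep(2)     # placeholder for the strace subprocess
    stop_event.set()  # signal the second thread to stop

def second():
    # wait(0.5) returns True as soon as the event is set, so this loop
    # wakes up promptly instead of polling a global int
    while not stop_event.wait(0.5):
        print "second thread working"

t1 = threading.Thread(target=first)
t2 = threading.Thread(target=second)
t1.start()
t2.start()
t1.join()
t2.join()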

Process vs. Thread with regards to using Queue()/deque() and class variable for communication and "poison pill"

I would like to create either a Thread or a Process which runs forever in a while True loop.
I need to send and receive data to the worker in the form of queues, either a multiprocessing.Queue() or a collections.deque(). I prefer collections.deque() as it is significantly faster.
I also need to be able to kill the worker eventually (since it runs in a while True loop). Here is some test code I've put together to try to understand the differences between Threads, Processes, Queues, and deques:
import time
from multiprocessing import Process, Queue
from threading import Thread
from collections import deque

class ThreadingTest(Thread):
    def __init__(self, q):
        super(ThreadingTest, self).__init__()
        self.q = q
        self.toRun = False

    def run(self):
        print("Started Thread")
        self.toRun = True
        while self.toRun:
            if type(self.q) == type(deque()):
                if self.q:
                    i = self.q.popleft()
                    print("Thread deque: " + str(i))
            elif type(self.q) == type(Queue()):
                if not self.q.empty():
                    i = self.q.get_nowait()
                    print("Thread Queue: " + str(i))

    def stop(self):
        print("Trying to stop Thread")
        self.toRun = False
        while self.isAlive():
            time.sleep(0.1)
        print("Stopped Thread")

class ProcessTest(Process):
    def __init__(self, q):
        super(ProcessTest, self).__init__()
        self.q = q
        self.toRun = False
        self.ctr = 0

    def run(self):
        print("Started Process")
        self.toRun = True
        while self.toRun:
            if type(self.q) == type(deque()):
                if self.q:
                    i = self.q.popleft()
                    print("Process deque: " + str(i))
            elif type(self.q) == type(Queue()):
                if not self.q.empty():
                    i = self.q.get_nowait()
                    print("Process Queue: " + str(i))

    def stop(self):
        print("Trying to stop Process")
        self.toRun = False
        while self.is_alive():
            time.sleep(0.1)
        print("Stopped Process")

if __name__ == '__main__':
    q = Queue()
    t1 = ProcessTest(q)
    t1.start()
    for i in range(10):
        if type(q) == type(deque()):
            q.append(i)
        elif type(q) == type(Queue()):
            q.put_nowait(i)
        time.sleep(1)
    t1.stop()
    t1.join()
    if type(q) == type(deque()):
        print(q)
    elif type(q) == type(Queue()):
        while q.qsize() > 0:
            print(str(q.get_nowait()))
As you can see, t1 can be either ThreadingTest or ProcessTest. Also, the queue passed to it can be either a multiprocessing.Queue or a collections.deque.
ThreadingTest works with both a Queue and a deque, and run() also exits properly when the stop() method is called.
Started Thread
Thread deque: 0
Thread deque: 1
Thread deque: 2
Thread deque: 3
Thread deque: 4
Thread deque: 5
Thread deque: 6
Thread deque: 7
Thread deque: 8
Thread deque: 9
Trying to stop Thread
Stopped Thread
deque([])
ProcessTest is only able to read from the queue if it is of type multiprocessing.Queue. It doesn't work with collections.deque. Furthermore, I am unable to kill the process using stop().
Process Queue: 0
Process Queue: 1
Process Queue: 2
Process Queue: 3
Process Queue: 4
Process Queue: 5
Process Queue: 6
Process Queue: 7
Process Queue: 8
Process Queue: 9
Trying to stop Process
I'm trying to figure out why. Also, what would be the best way to use a deque with a process? And how would I go about killing the process with some sort of stop() method?
You can't use a collections.deque to pass data between two multiprocessing.Process instances, because collections.deque is not process-aware. multiprocessing.Queue writes its contents to a multiprocessing.Pipe internally, which means that data in it can be enqueued in one process and retrieved in another. collections.deque doesn't have that kind of plumbing, so it won't work. When you write to the deque in one process, the deque instance in the other process isn't affected at all; they're completely separate instances.
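A standalone sketch (illustrative names) that demonstrates this:

from collections import deque
from multiprocessing import Process

d = deque()

def child():
    d.append("from child")
    print("child sees: " + str(list(d)))   # ['from child']

if __name__ == '__main__':
    p = Process(target=child)
    p.start()
    p.join()
    print("parent sees: " + str(list(d)))  # [] -- the append never propagates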
A similar issue is happening to your stop() method. You're changing the value of toRun in the main process, but this won't affect the child at all. They're completely separate instances. The best way to end the child would be to send some sentinel to the Queue. When you get the sentinel in the child, break out of the infinite loop:
def run(self):
    print("Started Process")
    self.toRun = True
    while self.toRun:
        if type(self.q) == type(deque()):
            if self.q:
                i = self.q.popleft()
                print("Process deque: " + str(i))
        elif type(self.q) == type(Queue()):
            if not self.q.empty():
                i = self.q.get_nowait()
                if i is None:
                    break  # Got sentinel, so break
                print("Process Queue: " + str(i))

def stop(self):
    print("Trying to stop Process")
    self.q.put(None)  # Send sentinel
    while self.is_alive():
        time.sleep(0.1)
    print("Stopped Process")
Edit:
If you actually do need deque semantics between two processes, you can use a custom multiprocessing.Manager() to create a shared deque in a manager process; each of your Process instances will get a proxy to it:
import time
from multiprocessing import Process
from multiprocessing.managers import SyncManager
from collections import deque

SyncManager.register('deque', deque)

def Manager():
    m = SyncManager()
    m.start()
    return m

class ProcessTest(Process):
    def __init__(self, q):
        super(ProcessTest, self).__init__()
        self.q = q
        self.ctr = 0

    def run(self):
        print("Started Process")
        self.toRun = True
        while self.toRun:
            if self.q._getvalue():
                i = self.q.popleft()
                if i is None:
                    break
                print("Process deque: " + str(i))

    def stop(self):
        print("Trying to stop Process")
        self.q.append(None)
        while self.is_alive():
            time.sleep(0.1)
        print("Stopped Process")

if __name__ == '__main__':
    m = Manager()
    q = m.deque()
    t1 = ProcessTest(q)
    t1.start()
    for i in range(10):
        q.append(i)
        time.sleep(1)
    t1.stop()
    t1.join()
    print(q)
Note that this probably isn't going to be faster than a multiprocessing.Queue, though, since there's an IPC cost every time you access the deque. It's also a much less natural data structure for passing messages the way you are.

python can't start a new thread

I am building a multithreaded application.
I have set up a thread pool:
[a Queue of size N, and N workers that get data from the queue]
When all tasks are done, I use
tasks.join()
where tasks is the queue.
The application seems to run smoothly until suddenly at some point (after 20 minutes, for example) it terminates with the error
thread.error: can't start new thread
Any ideas?
Edit: The threads are daemon threads and the code is like:
while True:
    t0 = time.time()
    keyword_statuses = DBSession.query(KeywordStatus).filter(KeywordStatus.status==0).options(joinedload(KeywordStatus.keyword)).with_lockmode("update").limit(100)
    if keyword_statuses.count() == 0:
        DBSession.commit()
        break
    for kw_status in keyword_statuses:
        kw_status.status = 1
    DBSession.commit()
    t0 = time.time()
    w = SWorker(threads_no=32, network_server='http://192.168.1.242:8180/', keywords=keyword_statuses, cities=cities, saver=MySqlRawSave(DBSession), loglevel='debug')
    w.work()
print 'finished'
When are the daemon threads killed? When the application finishes, or when work() finishes?
Look at the thread pool and the worker (it's from a recipe):
from Queue import Queue
from threading import Thread, Event, current_thread
import time

event = Event()

class Worker(Thread):
    """Thread executing tasks from a given tasks queue"""
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        '''Start processing tasks from the queue'''
        while True:
            event.wait()
            #time.sleep(0.1)
            try:
                func, args, callback = self.tasks.get()
            except Exception, e:
                print str(e)
                return
            else:
                if callback is None:
                    func(args)
                else:
                    callback(func(args))
                self.tasks.task_done()

class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, args=None, callback=None):
        '''Add a task to the queue'''
        self.tasks.put((func, args, callback))

    def wait_completion(self):
        '''Wait for completion of all the tasks in the queue'''
        self.tasks.join()

    def broadcast_block_event(self):
        '''blocks running threads'''
        event.clear()

    def broadcast_unblock_event(self):
        '''unblocks running threads'''
        event.set()

    def get_event(self):
        '''returns the event object'''
        return event
Also, maybe the problem is that I create SWorker objects in a loop?
What happens to the old SWorker (garbage collection?)?
There is still not enough code to localize the problem, but I'm sure it's because you don't reuse your threads and start too many of them. Did you see the canonical example from the Python Queue documentation, http://docs.python.org/library/queue.html (bottom of the page)?
I can reproduce your problem with the following code:
import threading
import Queue

q = Queue.Queue()

def worker():
    item = q.get(block=True)  # sleeps forever for now
    do_work(item)
    q.task_done()

# creates an unbounded number of worker threads and fails
# after some time with "error: can't start new thread"
while True:
    t = threading.Thread(target=worker)
    t.start()

q.join()  # never reached
Instead, you must create a pool with a known number of threads and put your data into the queue, like:
q = Queue()

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
UPD: In case you need to stop a thread, you can add a flag to it, or send a special sentinel meaning "stop" to break out of the while loop:
class Worker(Thread):
    break_msg = object()  # just a unique sentinel object

    def __init__(self):
        Thread.__init__(self)
        self.keep_running = True  # 'continue' is a reserved word, so use another name

    def run(self):
        while self.keep_running:  # can stop and destroy the thread (variant 1)
            msg = queue.get(block=True)
            if msg is self.break_msg:
                return  # will stop and destroy the thread (variant 2)
            do_work()
            queue.task_done()

workers = [Worker() for _ in xrange(num_workers)]
for w in workers:
    w.start()
for task in tasks:
    queue.put(task)

for _ in xrange(num_workers):
    queue.put(Worker.break_msg)  # stop the threads after all tasks are done;
                                 # you need as many sentinels as you have threads
OR
queue.join()  # wait until all tasks are done
for w in workers:
    w.keep_running = False
    queue.put(None)  # wake any thread blocked in queue.get()
