Adding objects to queue without interruption - python

I would like to put two objects into a queue, but I've got to be sure the objects are in both queues at the same time, therefore it should not be interrupted in between - something like an atomic block. Does some one have a solution? Many thanks...
queue_01.put(car)
queue_02.put(bike)

You could use a Condition object. You can tell the threads to wait with cond.wait(), and signal when the queues are ready with cond.notify_all(). See, for example, Doug Hellman's wonderful Python Module of the Week blog. His code uses multiprocessing; here I've adapted it for threading:
import threading
import Queue
import time
def stage_1(cond,q1,q2):
"""perform first stage of work, then notify stage_2 to continue"""
with cond:
q1.put('car')
q2.put('bike')
print 'stage_1 done and ready for stage 2'
cond.notify_all()
def stage_2(cond,q):
"""wait for the condition telling us stage_1 is done"""
name=threading.current_thread().name
print 'Starting', name
with cond:
cond.wait()
print '%s running' % name
def run():
# http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html#synchronizing-threads-with-a-condition-object
condition=threading.Condition()
queue_01=Queue.Queue()
queue_02=Queue.Queue()
s1=threading.Thread(name='s1', target=stage_1, args=(condition,queue_01,queue_02))
s2_clients=[
threading.Thread(name='stage_2[1]', target=stage_2, args=(condition,queue_01)),
threading.Thread(name='stage_2[2]', target=stage_2, args=(condition,queue_02)),
]
# Notice stage2 processes are started before stage1 process, and yet they wait
# until stage1 finishes
for c in s2_clients:
c.start()
time.sleep(1)
s1.start()
s1.join()
for c in s2_clients:
c.join()
run()
Running the script yields
Starting stage_2[1]
Starting stage_2[2]
stage_1 done and ready for stage 2 <-- Notice that stage2 is prevented from running until the queues have been packed.
stage_2[2] running
stage_2[1] running

To atomically add to two different queues, acquire the locks for both queues first. That's easiest to do by making a subclass of Queue that uses recursive locks.
import Queue # Note: module renamed to "queue" in Python 3
import threading
class MyQueue(Queue.Queue):
"Make a queue that uses a recursive lock instead of a regular lock"
def __init__(self):
Queue.Queue.__init__(self)
self.mutex = threading.RLock()
queue_01 = MyQueue()
queue_02 = MyQueue()
with queue_01.mutex:
with queue_02.mutex:
queue_01.put(1)
queue_02.put(2)

Related

How to use Queue with threading properly

I am new to queue & threads kindly help with the below code , here I am trying to execute the function hd , I need to run the function multiple times but only after a single run has been completed
import queue
import threading
import time
fifo_queue = queue.Queue()
def hd():
print("hi")
time.sleep(1)
print("done")
for i in range(3):
cc = threading.Thread(target=hd)
fifo_queue.put(cc)
cc.start()
Current Output
hi
hi
hi
donedonedone
Expected Output
hi
done
hi
done
hi
done​
You can use a Semaphore for your purposes
A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().
A default value of Semaphore is 1,
class threading.Semaphore(value=1)
so only one thread would be active at once:
import queue
import threading
import time
fifo_queue = queue.Queue()
semaphore = threading.Semaphore()
def hd():
with semaphore:
print("hi")
time.sleep(1)
print("done")
for i in range(3):
cc = threading.Thread(target=hd)
fifo_queue.put(cc)
cc.start()
hi
done
hi
done
hi
done
As #user2357112supportsMonica mentioned in comments RLock would be more safe option
class threading.RLock
This class implements reentrant lock objects. A reentrant lock must be released by the thread that acquired it. Once a thread has acquired a reentrant lock, the same thread may acquire it again without blocking; the thread must release it once for each time it has acquired it.
import queue
import threading
import time
fifo_queue = queue.Queue()
lock = threading.RLock()
def hd():
with lock:
print("hi")
time.sleep(1)
print("done")
for i in range(3):
cc = threading.Thread(target=hd)
fifo_queue.put(cc)
cc.start()
please put the print("down") before sleep.
it will work fine.
Reason:
your program will do this:
thread1:
print
sleep
print
but while the thread is sleeping, other threads will be working and printing their first command.
in my way the thread will write the first, write the second and then go to sleep and wait for other threads to show up.

multiprocessing produces defunct process

I use Tornado as a web server, user can submit a task through the front end page, after auditing they can start the submitted task. In this situation, i want to start an asynchronous sub process to handle the task, so i write the following code in an request handler:
def task_handler():
// handle task here
def start_a_process_for_task():
p = multiprocessing.Process(target=task_handler,args=())
p.start()
return 0
I don't care about the sub process and just start a process for it and return to the front end page and tell user the task is started. The task itself will run in the background and will record it's status or results to database so user
can view on the web page later. So here i don't want to use p.join() which is blocking, but without p.join() after the task finished,the sub process becomes a defunct process and as Tornado runs as a daemon and never exits, the defunct process will never disappear.
Anyone knows how to fix this problem, thanks.
The proper way to avoid defunct children is for the parent to gracefully clean up and close all resources of the exited child. This is normally done by join(), but if you want to avoid that, another approach could be to set up a global handler for the SIGCHLD signal on the parent.
SIGCHLD will be emitted whenever a child exits, and in the handler function you should either call Process.join() if you still have access to the process object, or even use os.wait() to "wait" for any child process to terminate and properly reap it. The wait time here should be 0 as you know for sure a child process has just exited. You will also be able to get the process' exit code / termination signal so it can also be a useful method to handle / log child process crashes.
Here's a quick example of doing this:
from __future__ import print_function
import os
import signal
import time
from multiprocessing import Process
def child_exited(sig, frame):
pid, exitcode = os.wait()
print("Child process {pid} exited with code {exitcode}".format(
pid=pid, exitcode=exitcode
))
def worker():
time.sleep(5)
print("Process {pid} has completed it's work".format(pid=os.getpid()))
def parent():
children = []
# Comment out the following line to see zombie children
signal.signal(signal.SIGCHLD, child_exited)
for i in range(5):
c = Process(target=worker)
c.start()
print("Parent forked out worker process {pid}".format(pid=c.pid))
children.append(c)
time.sleep(1)
print("Forked out {c} workers, hit Ctrl+C to end...".format(c=len(children)))
while True:
time.sleep(5)
if __name__ == '__main__':
parent()
One caveat is that I am not sure if this process works on non-Unix operating systems. It should work on Linux, Mac and other Unixes.
You need to join your subprocesses if you do not want to create zombies. You can do it in threads.
This is a dummy example. After 10 seconds, all your subprocesses are gone instead of being zombies. This launches a thread for every subprocess. Threads do not need to be joined or waited. A thread executes subprocess, joins it and then exits the thread as soon as the subprocess is completed.
import multiprocessing
import threading
from time import sleep
def task_processor():
sleep(10)
class TaskProxy(threading.Thread):
def __init__(self):
super(TaskProxy, self).__init__()
def run(self):
p = multiprocessing.Process(target=task_processor,args=())
p.start()
p.join()
def task_handler():
t = TaskProxy()
t.daemon = True
t.start()
return
for _ in xrange(0,20):
task_handler()
sleep(60)

python daemon thread exits but process still run in the background

I am using python 2.7 and Python thread doesn't kill its process after the main program exits. (checking this with the ps -ax command on ubuntu machine)
I have the below thread class,
import os
import threading
class captureLogs(threading.Thread):
'''
initialize the constructor
'''
def __init__(self, deviceIp, fileTag):
threading.Thread.__init__(self)
super(captureLogs, self).__init__()
self._stop = threading.Event()
self.deviceIp = deviceIp
self.fileTag = fileTag
def stop(self):
self._stop.set()
def stopped(self):
return self._stop.isSet()
'''
define the run method
'''
def run(self):
'''
Make the thread capture logs
'''
cmdTorun = "adb logcat > " + self.deviceIp +'_'+self.fileTag+'.log'
os.system(cmdTorun)
And I am creating a thread in another file sample.py,
import logCapture
import os
import time
c = logCapture.captureLogs('100.21.143.168','somefile')
c.setDaemon(True)
c.start()
print "Started the log capture. now sleeping. is this a dameon?", c.isDaemon()
time.sleep(5)
print "Sleep tiime is over"
c.stop()
print "Calling stop was successful:", c.stopped()
print "Thread is now completed and main program exiting"
I get the below output from the command line:
Started the log capture. now sleeping. is this a dameon? True
Sleep tiime is over
Calling stop was successful: True
Thread is now completed and main program exiting
And the sample.py exits.
But when I use below command on a terminal,
ps -ax | grep "adb"
I still see the process running. (I am killing them manually now using the kill -9 17681 17682)
Not sure what I am missing here.
My question is,
1) why is the process still alive when I already killed it in my program?
2) Will it create any problem if I don't bother about it?
3) is there any other better way to capture logs using a thread and monitor the logs?
EDIT: As suggested by #bug Killer, I added the below method in my thread class,
def getProcessID(self):
return os.getpid()
and used os.kill(c.getProcessID(), SIGTERM) in my sample.py . The program doesn't exit at all.
It is likely because you are using os.system in your thread. The spawned process from os.system will stay alive even after the thread is killed. Actually, it will stay alive forever unless you explicitly terminate it in your code or by hand (which it sounds like you are doing ultimately) or the spawned process exits on its own. You can do this instead:
import atexit
import subprocess
deviceIp = '100.21.143.168'
fileTag = 'somefile'
# this is spawned in the background, so no threading code is needed
cmdTorun = "adb logcat > " + deviceIp +'_'+fileTag+'.log'
proc = subprocess.Popen(cmdTorun, shell=True)
# or register proc.kill if you feel like living on the edge
atexit.register(proc.terminate)
# Here is where all the other awesome code goes
Since all you are doing is spawning a process, creating a thread to do it is overkill and only complicates your program logic. Just spawn the process in the background as shown above and then let atexit terminate it when your program exits. And/or call proc.terminate explicitly; it should be fine to call repeatedly (much like close on a file object) so having atexit call it again later shouldn't hurt anything.

Mutual exclusion thread locking, with dropping of queued functions upon mutex/lock release, in Python?

This is the problem I have: I'm using Python 2.7, and I have a code which runs in a thread, which has a critical region that only one thread should execute at the time. That code currently has no mutex mechanisms, so I wanted to inquire what I could use for my specific use case, which involves "dropping" of "queued" functions. I've tried to simulate that behavior with the following minimal working example:
useThreading=False # True
if useThreading: from threading import Thread, Lock
else: from multiprocessing import Process, Lock
mymutex = Lock()
import time
tstart = None
def processData(data):
#~ mymutex.acquire()
try:
print('thread {0} [{1:.5f}] Do some stuff'.format(data, time.time()-tstart))
time.sleep(0.5)
print('thread {0} [{1:.5f}] 1000'.format(data, time.time()-tstart))
time.sleep(0.5)
print('thread {0} [{1:.5f}] done'.format(data, time.time()-tstart))
finally:
#~ mymutex.release()
pass
# main:
tstart = time.time()
for ix in xrange(0,3):
if useThreading: t = Thread(target = processData, args = (ix,))
else: t = Process(target = processData, args = (ix,))
t.start()
time.sleep(0.001)
Now, if you run this code, you get a printout like this:
thread 0 [0.00173] Do some stuff
thread 1 [0.00403] Do some stuff
thread 2 [0.00642] Do some stuff
thread 0 [0.50261] 1000
thread 1 [0.50487] 1000
thread 2 [0.50728] 1000
thread 0 [1.00330] done
thread 1 [1.00556] done
thread 2 [1.00793] done
That is to say, the three threads quickly get "queued" one after another (something like 2-3 ms after each other). Actually, they don't get queued, they simply start executing in parallel after 2-3 ms after each other.
Now, if I enable the mymutex.acquire()/.release() commands, I get what would be expected:
thread 0 [0.00174] Do some stuff
thread 0 [0.50263] 1000
thread 0 [1.00327] done
thread 1 [1.00350] Do some stuff
thread 1 [1.50462] 1000
thread 1 [2.00531] done
thread 2 [2.00547] Do some stuff
thread 2 [2.50638] 1000
thread 2 [3.00706] done
Basically, now with locking, the threads don't run in parallel, but they run one after another thanks to the lock - as long as one thread is working, the others will block at the .acquire(). But this is not exactly what I want to achieve, either.
What I want to achieve is this: let's assume that when .acquire() is first triggered by a thread function, it registers an id of a function (say a pointer to it) in a queue. After that, the behavior is basically the same as with the Lock - while the one thread works, the others block at .acquire(). When the first thread is done, it goes in the finally: block - and here, I'd like to check to see how many threads are waiting in the queue; then I'd like to delete/drop all waiting threads except for the very last one - and finally, I'd .release() the lock; meaning that after this, what was the last thread in the queue would execute next. I'd imagine, I would want to write something like the following pseudocode:
...
finally:
if (len(mymutex.queue) > 2): # more than this instance plus one other waiting:
while (len(mymutex.queue) > 2):
mymutex.queue.pop(1) # leave alone [0]=this instance, remove next element
# at this point, there should be only queue[0]=this instance, and queue[1]= what was the last thread queued previously
mymutex.release() # once we releace, queue[0] should be gone, and the next in the queue should acquire the mutex/lock..
pass
...
With that, I'd expect a printout like this:
thread 0 [0.00174] Do some stuff
thread 0 [0.50263] 1000
thread 0 [1.00327] done
# here upon lock release, thread 1 would be deleted - and the last one in the queue, thread 2, would acquire the lock next:
thread 2 [1.00350] Do some stuff
thread 2 [1.50462] 1000
thread 2 [2.00531] done
What would be the most straightforward way to achieve this in Python?
Seems like you want a queue-like behaviour, so why not use Queue?
import threading
from Queue import Queue
import time
# threads advertise to this queue when they're waiting
wait_queue = Queue()
# threads get their task from this queue
task_queue = Queue()
def do_stuff():
print "%s doing stuff" % str(threading.current_thread())
time.sleep(5)
def queue_thread(sleep_time):
# advertise current thread waiting
time.sleep(sleep_time)
wait_queue.put("waiting")
# wait for permission to pass
message = task_queue.get()
print "%s got task: %s" % (threading.current_thread(), message)
# unregister current thread waiting
wait_queue.get()
if message == "proceed":
do_stuff()
# kill size-1 threads waiting
for _ in range(wait_queue.qsize() - 1):
task_queue.put("die")
# release last
task_queue.put("proceed")
if message == "die":
print "%s died without doing stuff" % threading.current_thread()
pass
t1 = threading.Thread(target=queue_thread, args=(1, ))
t2 = threading.Thread(target=queue_thread, args=(2, ))
t3 = threading.Thread(target=queue_thread, args=(3, ))
t4 = threading.Thread(target=queue_thread, args=(4, ))
# allow first thread to pass
task_queue.put("proceed")
t1.start()
t2.start()
t3.start()
t4.start()
thread-1 arrives first and "acquires" the section, other threads come later to wait at the queue (and advertise they're waiting). Then, when thread-1 leaves it gives permission to the last thread at the queue by telling all other thread to die, and the last thread to proceed.
You can have finer control using different messages, a typical one would be a thread-id in the wait_queue (so you know who is waiting, and the order in which it arrived).
You can probably utilize non-blocking operations (queue.put(block=False) and queue.get(block=False)) in your favour when you're set on what you need.

Waking up a specific thread from sleep from another thread in python

I've been chasing my tail on this one for days now, and I'm going crazy. I'm a total amateur and totally new to python, so please excuse my stupidity.
My main thread repeatedly checks a database for an entry and then starts threads for every new entry that it finds in the database.
The threads that it starts up basically poll the database for a value and if it doesn't find that value, it does some things and then it sleeps for 60 seconds and starts over.
Simplified non-code for the started up thread
while True:
stop = _Get_a_Value_From_Database_for_Exit() #..... a call to DBMS
If stop = 0:
Do_stuff()
time.sleep(60)
else:
break
There could be many of these threads running at any given time. What I'd like to do is have the main thread check another spot in the database for specific value and then can interrupt the sleep in the example above in a specific thread that was started. The goal would be to exit a specific thread like the one listed above without having to wait the remainder of the sleep duration. All of these threads can be referenced by the database id that is shared. I've seen references to event.wait() and event.set() been trying to figure out how I replace the time.sleep() with it, but I have no idea how I could use it to wake up a specific thread instead of all of them.
This is where my ignorance show through: is there a way where I could do something based on database id for the event.wait (Like 12345.wait(60) in the started up thread and 12345.set() in the main thread (all dynamic based on the ever changing database id).
Thanks for your help!!
The project is a little complex, and here's my version of it.
scan database file /tmp/db.dat, prefilled with two words
manager: create a thread for each word; default is a "whiskey" thread and a "syrup" thread
if a word ends in _stop, like syrup_stop, tell that thread to die by setting its stop event
each thread scans the database file and exits if it sees the word stop. It'll also exit if its stop event is set.
note that if the Manager thread sets a worker's stop_event, the worker will immediately exit. Each thread does a little bit of stuff, but spends most of its time in the stop_ev.wait() call. Thus, when the event does get set, it doesn't have to sit around, it can exit immediately.
The server is fun to play around with! Start it, then send commands to it by adding lines to the database. Try each one of the following:
$ echo pie >> /tmp/db.dat # start new thread
$ echo pie_stop >> /tmp/db.dat # stop thread by event
$ echo whiskey_stop >> /tmp/db.dat # stop another thread "
$ echo stop >> /tmp/db.dat # stop all threads
source
import logging, sys, threading, time
STOP_VALUE = 'stop'
logging.basicConfig(
level=logging.DEBUG,
format="%(asctime)-4s %(threadName)s %(levelname)s %(message)s",
datefmt="%H:%M:%S",
stream=sys.stderr,
)
class Database(list):
PATH = '/tmp/db.dat'
def __init__(self):
super(Database,self).__init__()
self._update_lock = threading.Lock()
def update(self):
with self._update_lock:
self[:] = [ line.strip() for line in open(self.PATH) ]
db = Database()
def spawn(events, key):
events[key] = threading.Event()
th = threading.Thread(
target=search_worker,
kwargs=dict(stop_ev=events[key]),
name='thread-{}'.format(key),
)
th.daemon = True
th.start()
def search_worker(stop_ev):
"""
scan database until "stop" found, or our event is set
"""
logging.info('start')
while True:
logging.debug('scan')
db.update()
if STOP_VALUE in db:
logging.info('stopvalue: done')
return
if stop_ev.wait(timeout=10):
logging.info('event: done')
return
def manager():
"""
scan database
- word: spawn thread if none already
- word_stop: tell thread to die by setting its stop event
"""
logging.info('start')
events = dict()
while True:
db.update()
for key in db:
if key == STOP_VALUE:
continue
if key in events:
continue
if key.endswith('_stop'):
key = key.split('_')[0]
if key not in events:
logging.error('stop: missing key=%s!', key)
else:
# signal thread to stop
logging.info('stop: key=%s', key)
events[key].set()
del events[key]
else:
spawn(events, key)
logging.info('spawn: key=%s', key)
time.sleep(2)
if __name__=='__main__':
with open(Database.PATH, 'w') as dbf:
dbf.write(
'whiskey\nsyrup\n'
)
db.update()
logging.info('start: db=%s -- %s', db.PATH, db)
manager_t = threading.Thread(
target=manager,
name='manager',
)
manager_t.start()
manager_t.join()
Rather change your design architecture and go in for a distributed process-to-process message-passing, than to repetitively re-chase a dbEngine in an infinite loop with repetitive dbSeek-s for a dumb value, re-test it for an in-equality and then trying to "kill-a-sleep".
Both ZeroMQ or nanomsg are smart, broker-less, messaging layers very good in this sense.
A desire to cross-breed fire and water IMHO does not bring anything good for a real-world system.
A smart, scaleable, distributed process-to-process design does.
( Fig. on a simple distributed process-to-process messaging/coordination, courtesy imatix/ZeroMQ )

Categories

Resources