Threaded scripts stall after ending without closing - python

Hopefully this is just something small I'm doing wrong, as these are some of my first threaded scripts using queues. Basically, after running through, the script stops and sits there but won't exit.
import threading
import Queue
class Words(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.queue = Queue.Queue()
    def word(self):
        read = open('words.txt')
        for word in read:
            word = word.replace("\n","")
            self.queue.put(word)
        read.close()
        for i in range(5):
            t = self.run()
            t.setDaemon(True)
            t.start()
        self.queue.join()
    def run(self):
        while True:
            word = self.queue.get()
            print word
            self.queue.task_done()
if __name__ == '__main__':
    Word = Words()
    Word.word()

You are using threads incorrectly in a couple of ways in your code:
First, the code seems to be built on the incorrect assumption that the one Thread subclass object you have can spawn all of the threads you need to do the work. On the contrary, the Thread documentation says that start "must be called at most once per Thread object". In the case of the word method, this is the self reference.
However, it would not be useful to call self.start() because that would spawn a single thread to consume the queue, and you would gain nothing from threading. Since word would have to construct new instances of Words anyway to initiate multiple threads, and the queue object will need to be accessed by multiple Words instances, it would be useful to have both of those separate from the Words object. For example, word could be a function outside of the Words object that starts like:
def word():
    queue = Queue.Queue()
    read = open('words.txt')
    for word in read:
        word = word.replace("\n","")
        queue.put(word)
    read.close()
    #...
This would also mean that Words would have to take the queue object as a parameter so that multiple instances would share the same queue:
class Words(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
Second, your thread function (run) is an infinite loop, so the thread will never terminate. Since you are only running the queue consumer threads after you have added all items to the queue, you should not have a problem terminating the thread once the queue is empty, like so:
def run(self):
    while True:
        try:
            word = self.queue.get(False)
        except Queue.Empty:
            break
        print word
        self.queue.task_done()
It is useful to use exceptions here because otherwise the queue could empty out and then the thread could try to get from it and it would end up waiting forever for an item to be added.
Third, in your for loop you call self.run(), which passes control to the run method, which then processes the entire queue and returns None (once the method is changed to terminate). The following lines would throw exceptions because t would be assigned the value None. Since you want to spawn other threads to do the work, you should do t = Words(queue) to get a new Words thread and then t.start() to start it. So, the code when put together should be
import threading
import Queue
class Words(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            try:
                word = self.queue.get(False)
            except Queue.Empty:
                break
            print word
            self.queue.task_done()
def word():
    queue = Queue.Queue()
    read = open('words.txt')
    for word in read:
        word = word.replace("\n","")
        queue.put(word)
    read.close()
    # each worker is a separate Words instance sharing the same queue
    for i in range(5):
        t = Words(queue)
        t.setDaemon(True)
        t.start()
    queue.join()
if __name__=='__main__':
    word()

It looks to me like you're mixing up a number of different aspects of threads, when you really just need a simple solution. As far as I can tell, the for i in range(5): loop never gets past the first iteration because you run the thread and it gets caught in an infinite loop.
Here's how I would do it:
import threading
import Queue
class Worker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            # try to dequeue a word from the queue
            try:
                word = self.queue.get_nowait()
            # if there's nothing in the queue, break because we're done
            except Queue.Empty:
                break
            # if the 'try' was successful at getting a word, print it
            print word
def fill_queue(queue):
    read = open('words.txt')
    for word in read:
        word = word.replace("\n", "")
        queue.put(word)
    read.close()
if __name__ == "__main__":
    # create empty queue
    queue = Queue.Queue()
    # fill the queue with work
    fill_queue(queue)
    # create 5 worker threads
    threads = []
    for i in range(5):
        threads.append(Worker(queue))
    # start threads
    for thread in threads:
        thread.start()
    # join threads once they finish
    for thread in threads:
        thread.join()
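For anyone on Python 3, here is a rough sketch of the same idea. It is not the code from the answer: the module is named queue there, and to keep the example self-contained it takes the words as a list instead of reading words.txt. The names process_words and seen are made up for illustration.

```python
import queue
import threading


class Worker(threading.Thread):
    def __init__(self, work, seen):
        super().__init__()
        self.work = work
        self.seen = seen  # shared list that collects the processed words

    def run(self):
        while True:
            try:
                word = self.work.get_nowait()
            except queue.Empty:
                break  # queue was filled up front, so empty means done
            self.seen.append(word)


def process_words(words, n_workers=5):
    """Fill the queue first, then let n_workers threads drain it."""
    work = queue.Queue()
    for w in words:
        work.put(w)
    seen = []
    workers = [Worker(work, seen) for _ in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return seen
```

Breaking on queue.Empty is only safe here because the queue is completely filled before any worker starts.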

If you would like to read over some examples of threaded code in Python, the following recipes might be able to teach you some basics regarding the subject. Some of them are demonstrations, and others are programs:
mthread.py (2)
mthread.py (1)
Thread Syncronizer
Bounded Buffer Example (1)
Bounded Buffer Example (2)
Port Forwarding
Module For Running Simple Proxies
Proxy Example
Paint 2.0
spots (2)
Directory Pruner 2

Related

How does Python Queue know it will be empty?

I would like to understand how a queue knows that it won't receive any new items. In the following example the queue will wait indefinitely when the tputter thread is not started (I assume because nothing has been put to it so far). If the tputter is started, it waits between 'puts' until something new is there, and as soon as everything is finished it stops. But how does the tgetter know whether something new will end up in the queue or not?
import threading
import queue
import time
q = queue.Queue()
def getter():
    for i in range(5):
        print('worker:', q.get())
        time.sleep(2)
def putter():
    for i in range(5):
        print('putter: ', i)
        q.put(i)
        time.sleep(3)
tgetter = threading.Thread(target=getter)
tgetter.start()
tputter = threading.Thread(target=putter)
#tputter.start()
A common way to do this is to use the "poison pill" pattern. Basically, the producer and consumer agree on a special "poison pill" object that the producer can load into the queue, which will indicate that no more items are going to be sent, and the consumer can shut down.
So, in your example, it'd look like this:
import threading
import queue
import time
q = queue.Queue()
END = object()
def getter():
    while True:
        item = q.get()
        # END is a unique object, so compare by identity
        if item is END:
            break
        print('worker:', item)
        time.sleep(2)
def putter():
    for i in range(5):
        print('putter: ', i)
        q.put(i)
        time.sleep(3)
    q.put(END)
tgetter = threading.Thread(target=getter)
tgetter.start()
tputter = threading.Thread(target=putter)
# the producer must run, otherwise the pill is never sent
tputter.start()
This is a little contrived, since the producer is hard-coded to always send five items, so you have to imagine that the consumer doesn't know ahead of time how many items the producer will send.
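If several consumers share one queue, each of them needs to see a pill. A minimal sketch, assuming the producer knows how many consumers there are; run_pipeline, consume and results are illustrative names, not part of the question:

```python
import queue
import threading

END = object()  # the "poison pill" sentinel


def consume(q, results):
    """Collect items from q until the sentinel arrives."""
    while True:
        item = q.get()
        if item is END:
            break
        results.append(item)


def run_pipeline(items, n_consumers=3):
    q = queue.Queue()
    results = []
    consumers = [threading.Thread(target=consume, args=(q, results))
                 for _ in range(n_consumers)]
    for t in consumers:
        t.start()
    for item in items:
        q.put(item)
    for _ in consumers:
        q.put(END)  # one pill per consumer, sent after the real items
    for t in consumers:
        t.join()
    return results
```

Because the queue is FIFO and each consumer stops after taking exactly one pill, every real item is consumed before the last consumer exits.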

Should I bother locking the queue when I put to or get from it?

I've been going through the tutorials about multithreading and queues in Python 3. As the official tutorial says, "The Queue class in this module implements all the required locking semantics". But in another tutorial, I've seen the following example:
import queue
import threading
import time
exitFlag = 0
class myThread (threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q
    def run(self):
        print ("Starting " + self.name)
        process_data(self.name, self.q)
        print ("Exiting " + self.name)
def process_data(threadName, q):
    while not exitFlag:
        queueLock.acquire()
        if not workQueue.empty():
            data = q.get()
            queueLock.release()
            print ("%s processing %s" % (threadName, data))
        else:
            queueLock.release()
        time.sleep(1)
threadList = ["Thread-1", "Thread-2", "Thread-3"]
nameList = ["One", "Two", "Three", "Four", "Five"]
queueLock = threading.Lock()
workQueue = queue.Queue(10)
threads = []
threadID = 1
# Create new threads
for tName in threadList:
    thread = myThread(threadID, tName, workQueue)
    thread.start()
    threads.append(thread)
    threadID += 1
# Fill the queue
queueLock.acquire()
for word in nameList:
    workQueue.put(word)
queueLock.release()
# Wait for queue to empty
while not workQueue.empty():
    pass
# Notify threads it's time to exit
exitFlag = 1
# Wait for all threads to complete
for t in threads:
    t.join()
print ("Exiting Main Thread")
I believe the tutorial you're following is a bad example of how to use Python's threadsafe queue. In particular, the tutorial is using the threadsafe queue in a way that unfortunately requires an extra lock. Indeed, this extra lock means that the threadsafe queue in the tutorial could be replaced with an old-fashioned non-threadsafe queue based on a simple list.
The reason that locking is needed is hinted at by the documentation for Queue.empty():
If empty() returns False it doesn't guarantee that a subsequent call to get() will not block.
The issue is that another thread could run in-between the call to empty() and the call to get(), stealing the item that empty() otherwise reported to exist. The tutorial probably uses the lock to ensure that the thread has exclusive access to the queue from the call to empty() until the call to get(). Without this lock, two threads could enter into the if-statement and both issue a call to get(), meaning that one of them could block, waiting for an item that will never be pushed.
Let me show you how to use the threadsafe queue properly. Instead of checking empty() first, just rely directly on the blocking behavior of get():
def process_data(threadName, q):
    while True:
        data = q.get()
        if exitFlag:
            break
        print("%s processing %s" % (threadName, data))
The queue's internal locking will ensure that two threads do not interfere for the duration of the call to get(), and no queueLock is needed. Note that the tutorial's original code would check exitFlag periodically every 1 second, whereas this modified code requires you to push a dummy object into the queue after setting exitFlag to 1 -- otherwise, the flag will never be checked.
The last part of the controller code would need to be modified as follows:
# Notify threads it's time to exit
exitFlag = 1
for _ in range(len(threadList)):
    # Push a dummy element causing a single thread to wake-up and stop.
    workQueue.put(None)
# Wait for all threads to exit
for t in threads:
    t.join()
There is another issue with the tutorial's use of the threadsafe queue, namely that a busy-loop is used in the main thread when waiting for the queue to empty:
# Wait for queue to empty
while not workQueue.empty():
    pass
To wait for the queue to empty it would be better to use Queue.task_done() in the threads and then call Queue.join() in the main thread. At the end of the loop body in process_data(), call q.task_done(). In the main controller code, instead of the while-loop above, call workQueue.join().
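Put together, the task_done()/join() approach could look roughly like this. This is a sketch, not the tutorial's code: it drops exitFlag and treats None itself as the "time to exit" element, and run_workers and processed are names invented for the example.

```python
import queue
import threading


def process_data(q, processed):
    while True:
        data = q.get()
        try:
            if data is None:  # dummy element: this worker should stop
                break
            processed.append(data)
        finally:
            q.task_done()  # exactly one task_done() per successful get()


def run_workers(names, n_threads=3):
    work_queue = queue.Queue(10)
    processed = []
    threads = [threading.Thread(target=process_data,
                                args=(work_queue, processed))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for name in names:
        work_queue.put(name)
    work_queue.join()  # blocks until every item has been marked done
    for _ in threads:
        work_queue.put(None)  # wake each worker once so it can exit
    for t in threads:
        t.join()
    return processed
```

The join() call replaces the busy loop: the main thread sleeps until the workers have called task_done() for everything that was put.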
See also the example in the bottom of Python's documentation page on the queue module.
Alternatively, you can keep the queueLock and replace the threadsafe queue with a plain old list as follows:
Replace workQueue = queue.Queue(10) with workQueue = []
Replace if not workQueue.empty() with if len(workQueue) > 0
Replace workQueue.get() with workQueue.pop(0)
Replace workQueue.put(word) with workQueue.append(word)
Note that this doesn't preserve the blocking behavior of put() present in the original version.

python multithreading queues not running or exiting cleanly

I'm learning python multithreading and queues. The following creates a bunch of threads that pass data through a queue to another thread for printing:
import time
import threading
import Queue
queue = Queue.Queue()
def add(data):
    return ["%sX" % x for x in data]
class PrintThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        data = self.queue.get()
        print data
        self.queue.task_done()
class MyThread(threading.Thread):
    def __init__(self, queue, data):
        threading.Thread.__init__(self)
        self.queue = queue
        self.data = data
    def run(self):
        self.queue.put(add(self.data))
if __name__ == "__main__":
    a = MyThread(queue, ["a","b","c"])
    a.start()
    b = MyThread(queue, ["d","e","f"])
    b.start()
    c = MyThread(queue, ["g","h","i"])
    c.start()
    printme = PrintThread(queue)
    printme.start()
    queue.join()
However, I see only the data from the first thread print out:
['aX', 'bX', 'cX']
Then nothing else, but the program doesn't exit. I have to kill the process to have it exit.
Ideally, after each MyThread does its data processing and puts the result to the queue, that thread should exit? Simultaneously the PrintThread should take whatever is on the queue and print it.
After all MyThread threads have finished and the PrintThread thread has finished processing everything on the queue, the program should exit cleanly.
What have I done wrong?
EDIT:
If each MyThread thread takes a while to process, is there a way to guarantee that the PrintThread thread will wait for all the MyThread threads to finish before it will exit itself?
That way the print thread will definitely have processed every possible data on the queue because all the other threads have already exited.
For example,
class MyThread(threading.Thread):
    def __init__(self, queue, data):
        threading.Thread.__init__(self)
        self.queue = queue
        self.data = data
    def run(self):
        time.sleep(10)
        self.queue.put(add(self.data))
The above modification will wait for 10 seconds before putting anything on the queue. The print thread will run, but I think it's exiting too early since there is no data on the queue yet, so the program prints out nothing.
Your PrintThread does not loop but instead only prints out a single queue item and then stops running.
Therefore, the queue will never be empty and the queue.join() statement will prevent the main program from terminating.
Change the run() method of your PrintThread into the following code in order to have all queue items processed:
def run(self):
    try:
        while True:
            data = self.queue.get_nowait()
            print data
            self.queue.task_done()
    # the module is Queue; the name queue is the Queue instance here
    except Queue.Empty:
        # All items have been taken off the queue
        pass
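For the case in the EDIT, where the MyThread threads are slow, one option is to keep the print thread blocking on get() and only tell it to stop after all producers have been joined. A minimal Python 3 sketch under that assumption; STOP, producer, printer and main_flow are illustrative names and not part of the question's code:

```python
import queue
import threading

STOP = object()  # sentinel telling the printer that no more data will come


def add(data):
    return ["%sX" % x for x in data]


def producer(q, data):
    q.put(add(data))


def printer(q, seen):
    while True:
        item = q.get()  # blocks until a producer delivers something
        if item is STOP:
            break
        seen.append(item)
        print(item)


def main_flow(batches):
    q = queue.Queue()
    seen = []
    print_thread = threading.Thread(target=printer, args=(q, seen))
    print_thread.start()
    producers = [threading.Thread(target=producer, args=(q, batch))
                 for batch in batches]
    for t in producers:
        t.start()
    for t in producers:
        t.join()  # after this, every result is already on the queue
    q.put(STOP)  # so the sentinel is guaranteed to come last
    print_thread.join()
    return seen
```

This way the printer cannot exit early on an empty queue, no matter how long the producers take.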

Need help on producer and consumer thread in python

I want to create a consumer thread and a producer thread in Python that run simultaneously, where the producer thread appends to the queue and the consumer thread retrieves the items stored in the queue. I need to start the consumer thread along with the producer. The consumer thread should wait until the queue gets an item, and it should terminate when there are no more items in the queue. I am new to Python, please help with this.
Requirements:
If there is a list of 10 numbers, the producer thread should insert one item at a time into the queue, and the consumer thread should retrieve the number. Both threads should start simultaneously.
from queue import Queue
import threading
import time
class producer(threading.Thread):
    def __init__(self, list_of_numbers):
        threading.Thread.__init__(self)
        self.list_items = list_of_numbers
    def run(self):
        for i in self.list_items:
            queue.put(str(i))
class consumer(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        while queue.not_empty:
            queue_ret = queue.get()
            print("Retrieved", queue_ret)
queue = Queue()
producers = producer([10,20,5,4,3,2,1])
consumers = consumer()
producers.start()
consumers.start()
producers.join()
consumers.join()
Just put a special item once you are done:
_im_done = object()
class producer(threading.Thread):
    def run(self):
        '''feed the consumer until you are done'''
        queue.put(_im_done)
class consumer(threading.Thread):
    def run(self):
        while True:
            queue_ret = queue.get()
            if queue_ret is _im_done:
                break
            '''normal execution'''
If there are multiple consumers, then you have to put the item back before you stop:
class consumer(threading.Thread):
    def run(self):
        while True:
            queue_ret = queue.get()
            if queue_ret is _im_done:
                queue.put(_im_done)
                break
            '''normal execution'''
You can use the queue module directly. The documentation contains an example for your use case. As a side note, the module is named Queue in Python 2.
However, threading in Python won't make your program any faster if it is CPU bound; in this case you may use the multiprocessing module instead (in IO-bound cases threading may be more viable, since threads are often cheaper). The multiprocessing module also provides a safe queue implementation named multiprocessing.Queue.
queue.get() is blocking. If there are no items in queue it will just get stuck there. You should use while True: queue.get(block=False) and handle Empty exception and exit.
Ok full code to clear confusion
from Queue import Queue, Empty  # Python 2; on Python 3: from queue import Queue, Empty
import threading
import time
started = False
running = True
class producer(threading.Thread):
    def __init__(self, list_of_numbers):
        threading.Thread.__init__(self)
        self.list_items = list_of_numbers
    def run(self):
        # rebind the module-level flags instead of creating locals
        global started, running
        started = True
        for i in self.list_items:
            queue.put(str(i))
        running = False
class consumer(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        while not started:
            time.sleep(0)
        # keep going until the producer is done and the queue is drained
        while running or not queue.empty():
            try:
                queue_ret = queue.get(block=False)
            except Empty:
                time.sleep(0)
                continue
            print("Retrieved", queue_ret)
queue = Queue()
producers = producer([10,20,5,4,3,2,1])
consumers = consumer()
producers.start()
consumers.start()
producers.join()
consumers.join()
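If you would rather not poll with flags, a blocking get() with a timeout is another option. The following Python 3 sketch assumes that "nothing arrived for idle_timeout seconds" is an acceptable definition of "the producer is finished", which will not hold for every program; run_both, produce, consume and idle_timeout are illustrative names.

```python
import queue
import threading


def produce(q, numbers):
    for n in numbers:
        q.put(str(n))


def consume(q, out, idle_timeout=0.5):
    while True:
        try:
            # wait for an item, but give up after idle_timeout seconds
            item = q.get(timeout=idle_timeout)
        except queue.Empty:
            break  # assumption: a quiet queue means the producer is done
        out.append(item)


def run_both(numbers):
    q = queue.Queue()
    out = []
    consumer = threading.Thread(target=consume, args=(q, out))
    producer = threading.Thread(target=produce, args=(q, numbers))
    consumer.start()  # the consumer may start first; it simply waits
    producer.start()
    producer.join()
    consumer.join()
    return out
```

If the producer can pause for longer than the timeout, a sentinel object is the more reliable way to signal the end.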

How to stop a looping thread in Python?

What's the proper way to tell a looping thread to stop looping?
I have a fairly simple program that pings a specified host in a separate threading.Thread class. In this class it sleeps 60 seconds, then runs again until the application quits.
I'd like to implement a 'Stop' button in my wx.Frame to ask the looping thread to stop. It doesn't need to end the thread right away, it can just stop looping once it wakes up.
Here is my threading class (note: I haven't implemented looping yet, but it would likely fall under the run method in PingAssets)
class PingAssets(threading.Thread):
    def __init__(self, threadNum, asset, window):
        threading.Thread.__init__(self)
        self.threadNum = threadNum
        self.window = window
        self.asset = asset
    def run(self):
        config = controller.getConfig()
        fmt = config['timefmt']
        start_time = datetime.now().strftime(fmt)
        try:
            if onlinecheck.check_status(self.asset):
                status = "online"
            else:
                status = "offline"
        except socket.gaierror:
            status = "an invalid asset tag."
        msg =("{}: {} is {}. \n".format(start_time, self.asset, status))
        wx.CallAfter(self.window.Logger, msg)
And in my wxPython Frame I have this function called from a Start button:
def CheckAsset(self, asset):
    self.count += 1
    thread = PingAssets(self.count, asset, self)
    self.threads.append(thread)
    thread.start()
Threaded stoppable function
Instead of subclassing threading.Thread, one can modify the function to allow stopping by a flag.
We need an object, accessible to the running function, on which we set the flag to stop running.
We can use the threading.currentThread() object.
import threading
import time
def doit(arg):
    t = threading.currentThread()
    while getattr(t, "do_run", True):
        print ("working on %s" % arg)
        time.sleep(1)
    print("Stopping as you wish.")
def main():
    t = threading.Thread(target=doit, args=("task",))
    t.start()
    time.sleep(5)
    t.do_run = False
if __name__ == "__main__":
    main()
The trick is that the running thread can have additional properties attached. The solution builds on two assumptions:
the thread has a property "do_run" with default value True
the driving parent process can set the property "do_run" of the started thread to False.
Running the code, we get the following output:
$ python stopthread.py
working on task
working on task
working on task
working on task
working on task
Stopping as you wish.
Pill to kill - using Event
Another alternative is to use a threading.Event as a function argument. It is False by default, but an external process can "set it" (to True) and the function can learn about it using the wait(timeout) function.
We can wait with zero timeout, but we can also use it as the sleeping timer (used below).
def doit(stop_event, arg):
    while not stop_event.wait(1):
        print ("working on %s" % arg)
    print("Stopping as you wish.")
def main():
    pill2kill = threading.Event()
    t = threading.Thread(target=doit, args=(pill2kill, "task"))
    t.start()
    time.sleep(5)
    pill2kill.set()
    t.join()
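Edit: I tried this in Python 3.6. stop_event.wait() called without a timeout blocks until the event is set. With a timeout, as used above, it returns True if the event was set and False if the timeout elapsed (this has been the documented behavior since Python 2.7 and 3.1; earlier versions always returned None). Checking stop_event.is_set() works as well.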
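For reference, the documented return value of Event.wait(timeout) on Python 2.7 and 3.1 or later can be checked with a few lines. This is only a quick check, not part of the answer's code; timed_out and after_set are names picked for the example.

```python
import threading

stop_event = threading.Event()

# Flag not set yet: wait() gives up after the timeout and returns False.
timed_out = stop_event.wait(0.01)

stop_event.set()

# Flag set: wait() returns True without waiting for the timeout.
after_set = stop_event.wait(0.01)

print(timed_out, after_set)
```

That boolean is what lets a loop like "while not stop_event.wait(1)" act as both the sleep and the stop check.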
Stopping multiple threads with one pill
The advantage of the pill to kill is better seen if we have to stop multiple threads at once, as one pill will work for all.
The doit function will not change at all; only main handles the threads a bit differently.
def main():
    pill2kill = threading.Event()
    tasks = ["task ONE", "task TWO", "task THREE"]
    def thread_gen(pill2kill, tasks):
        for task in tasks:
            t = threading.Thread(target=doit, args=(pill2kill, task))
            yield t
    threads = list(thread_gen(pill2kill, tasks))
    for thread in threads:
        thread.start()
    time.sleep(5)
    pill2kill.set()
    for thread in threads:
        thread.join()
This has been asked before on Stack. See the following links:
Is there any way to kill a Thread in Python?
Stopping a thread after a certain amount of time
Basically you just need to set up the thread with a stop function that sets a sentinel value that the thread will check. In your case, you'll have something in your loop check the sentinel value to see if it has changed, and if it has, the loop can break and the thread can die.
I read the other questions on Stack but I was still a little confused on communicating across classes. Here is how I approached it:
I use a list to hold all my threads in the __init__ method of my wxFrame class: self.threads = []
As recommended in How to stop a looping thread in Python? I use a signal in my thread class which is set to True when initializing the threading class.
class PingAssets(threading.Thread):
    def __init__(self, threadNum, asset, window):
        threading.Thread.__init__(self)
        self.threadNum = threadNum
        self.window = window
        self.asset = asset
        self.signal = True
    def run(self):
        while self.signal:
            do_stuff()
            sleep()
and I can stop these threads by iterating over my threads:
def OnStop(self, e):
    for t in self.threads:
        t.signal = False
I had a different approach. I've sub-classed a Thread class and in the constructor I've created an Event object. Then I've written custom join() method, which first sets this event and then calls a parent's version of itself.
Here is my class, I'm using for serial port communication in wxPython app:
import wx, threading, serial, Events, Queue
class PumpThread(threading.Thread):
    def __init__ (self, port, queue, parent):
        super(PumpThread, self).__init__()
        self.port = port
        self.queue = queue
        self.parent = parent
        self.serial = serial.Serial()
        self.serial.port = self.port
        self.serial.timeout = 0.5
        self.serial.baudrate = 9600
        self.serial.parity = 'N'
        self.stopRequest = threading.Event()
    def run (self):
        try:
            self.serial.open()
        except Exception as ex:
            print ("[ERROR]\tUnable to open port {}".format(self.port))
            print ("[ERROR]\t{}".format(ex))
            self.stopRequest.set()
        else:
            print ("[INFO]\tListening port {}".format(self.port))
            self.serial.write("FLOW?\r")
        while not self.stopRequest.isSet():
            msg = ''
            if not self.queue.empty():
                try:
                    command = self.queue.get()
                    self.serial.write(command)
                except Queue.Empty:
                    continue
            while self.serial.inWaiting():
                char = self.serial.read(1)
                if '\r' in char and len(msg) > 1:
                    char = ''
                    #~ print('[DATA]\t{}'.format(msg))
                    event = Events.PumpDataEvent(Events.SERIALRX, wx.ID_ANY, msg)
                    wx.PostEvent(self.parent, event)
                    msg = ''
                    break
                msg += char
        self.serial.close()
    def join (self, timeout=None):
        # request the loop in run() to stop, then wait for the thread
        self.stopRequest.set()
        super(PumpThread, self).join(timeout)
    def SetPort (self, serial):
        self.serial = serial
    def Write (self, msg):
        if self.serial.is_open:
            self.queue.put(msg)
        else:
            print("[ERROR]\tPort {} is not open!".format(self.port))
    def Stop(self):
        if self.isAlive():
            self.join()
The Queue is used for sending messages to the port and main loop takes responses back. I've used no serial.readline() method, because of different end-line char, and I have found the usage of io classes to be too much fuss.
Depends on what you run in that thread.
If that's your code, then you can implement a stop condition (see other answers).
However, if what you want is to run someone else's code, then you should fork and start a process. Like this:
import multiprocessing
proc = multiprocessing.Process(target=your_proc_function, args=())
proc.start()
now, whenever you want to stop that process, send it a SIGTERM like this:
proc.terminate()
proc.join()
And it's not slow: fractions of a second.
Enjoy :)
My solution is:
import threading, time
def a():
    t = threading.currentThread()
    while getattr(t, "do_run", True):
        print('Do something')
        time.sleep(1)
def getThreadByName(name):
    threads = threading.enumerate() #Threads list
    for thread in threads:
        if thread.name == name:
            return thread
threading.Thread(target=a, name='228').start() #Init thread
t = getThreadByName('228') #Get thread by name
time.sleep(5)
t.do_run = False #Signal to stop thread
t.join()
I find it useful to have a class, derived from threading.Thread, to encapsulate my thread functionality. You simply provide your own main loop in an overridden version of run() in this class. Calling start() arranges for the object’s run() method to be invoked in a separate thread.
Inside the main loop, periodically check whether a threading.Event has been set. Such an event is thread-safe.
Inside this class, you have your own join() method that sets the stop event object before calling the join() method of the base class. It can optionally take a time value to pass to the base class's join() method to ensure your thread is terminated in a short amount of time.
import threading
import time
class MyThread(threading.Thread):
    def __init__(self, sleep_time=0.1):
        self._stop_event = threading.Event()
        self._sleep_time = sleep_time
        """call base class constructor"""
        super().__init__()
    def run(self):
        """main control loop"""
        while not self._stop_event.isSet():
            #do work
            print("hi")
            self._stop_event.wait(self._sleep_time)
    def join(self, timeout=None):
        """set stop event and join within a given time period"""
        self._stop_event.set()
        super().join(timeout)
if __name__ == "__main__":
    t = MyThread()
    t.start()
    time.sleep(5)
    t.join(1) #wait 1s max
Having a small sleep inside the main loop before checking the threading.Event is less CPU intensive than looping continuously. You can have a default sleep time (e.g. 0.1s), but you can also pass the value in the constructor.
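Sometimes you don't have control over the running target. In those cases you can use signal.pthread_kill to send a signal to the thread. Note that the default action of SIGTSTP suspends the whole process rather than only the target thread, which is why the shell reports the job as Stopped in the result below.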
from signal import pthread_kill, SIGTSTP
from threading import Thread
from itertools import count
from time import sleep
def target():
    for num in count():
        print(num)
        sleep(1)
thread = Thread(target=target)
thread.start()
sleep(5)
pthread_kill(thread.ident, SIGTSTP)
result
0
1
2
3
4
[14]+ Stopped
