Need some assistance with Python threading/queue - python

import threading
import Queue
import urllib2
import time
class ThreadURL(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            host = self.queue.get()
            sock = urllib2.urlopen(host)
            data = sock.read()
            self.queue.task_done()
hosts = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.facebook.com', 'http://stackoverflow.com']
start = time.time()
def main():
    queue = Queue.Queue()
    for i in range(len(hosts)):
        t = ThreadURL(queue)
        t.start()
    for host in hosts:
        queue.put(host)
    queue.join()
if __name__ == '__main__':
    main()
    print 'Elapsed time: {0}'.format(time.time() - start)
I've been trying to get my head around threading, and after a few tutorials I've come up with the above.
What it's supposed to do is:
1. Initialise the queue
2. Create my thread pool and then queue up the list of hosts
3. My ThreadURL class should then begin work once a host is in the queue and read the website data
4. The program should finish
What I want to know first off is: am I doing this correctly? Is this the best way to handle threads?
Secondly, my program fails to exit. It prints out the Elapsed time line and then hangs there. I have to kill my terminal for it to go away. I'm assuming this is due to my incorrect use of queue.join()?

Your code looks fine and is quite clean.
The reason your application still "hangs" is that the worker threads are still running, waiting for the main application to put something in the queue, even though your main thread is finished.
The simplest way to fix this is to mark the threads as daemons, by doing t.daemon = True before your call to start. This way, the threads will not block the program stopping.
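For reference, here is a minimal Python 3 sketch of the daemon fix. The fetch() function is a hypothetical stand-in for the real urlopen() call so that the example runs without network access; everything else follows the structure of the question.

```python
import queue
import threading

def fetch(host):
    # Hypothetical stand-in for urllib.request.urlopen(host).read()
    return "data from " + host

def worker(q, results):
    # Loop forever; as a daemon thread this does not keep the process alive.
    while True:
        host = q.get()
        results[host] = fetch(host)
        q.task_done()

hosts = ['http://www.google.com', 'http://www.yahoo.com',
         'http://www.facebook.com', 'http://stackoverflow.com']
q = queue.Queue()
results = {}

for _ in hosts:
    t = threading.Thread(target=worker, args=(q, results))
    t.daemon = True   # must be set before start()
    t.start()

for host in hosts:
    q.put(host)

q.join()  # returns once task_done() has been called for every host
```

Because the workers are daemons, the interpreter can exit as soon as the main thread finishes, even though each worker is still blocked in q.get().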

Looks fine. Yann is right about the daemon suggestion; that will fix your hang. My only question is why use the queue at all? You're not doing any cross-thread communication, so it seems like you could just pass the host as an argument to ThreadURL's __init__() and drop the queue.
Nothing wrong with it, just wondering.
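A rough sketch of that queue-free variant, in Python 3. Again fetch() is a hypothetical placeholder for the real download, and storing the result on a data attribute is just one way to get it back out.

```python
import threading

def fetch(host):
    # Hypothetical stand-in for urllib.request.urlopen(host).read()
    return "data from " + host

class ThreadURL(threading.Thread):
    def __init__(self, host):
        super().__init__()
        self.host = host
        self.data = None

    def run(self):
        # One job per thread, so no loop and no queue are needed.
        self.data = fetch(self.host)

hosts = ['http://www.google.com', 'http://www.yahoo.com',
         'http://www.facebook.com', 'http://stackoverflow.com']

threads = [ThreadURL(host) for host in hosts]
for t in threads:
    t.start()
for t in threads:
    t.join()   # each thread ends on its own, so the program exits cleanly

results = {t.host: t.data for t in threads}
```

The trade-off is that this starts one thread per host, which is fine for four URLs but does not cap the number of threads the way a fixed pool reading from a queue does.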

One thing: in the thread's run function, inside the while True loop, if an exception is raised after get() has been called, task_done() may never be called. In that case queue.join() may never return.
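One way to guard against that is to put task_done() in a finally block. This is only a sketch: process() and the 'bad' job are made up to force an exception, and collecting errors in a list is just one possible way to report them.

```python
import queue
import threading

def process(job):
    # Hypothetical job handler that fails for one particular input.
    if job == 'bad':
        raise ValueError('cannot process %r' % job)
    return job.upper()

def worker(q, results, errors):
    while True:
        job = q.get()
        try:
            results.append(process(job))
        except Exception as exc:
            errors.append((job, exc))
        finally:
            # Runs whether or not process() raised, so join() can return.
            q.task_done()

q = queue.Queue()
results, errors = [], []
threading.Thread(target=worker, args=(q, results, errors), daemon=True).start()

for job in ['a', 'bad', 'b']:
    q.put(job)
q.join()
```

Without the finally, the exception from the 'bad' job would end the worker thread and q.join() would block forever.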

Related

How to send a CTRL-C signal to individual threads in Python?

I am trying to figure out how to properly send a CTRL-C signal on Windows using Python. Earlier I was experimenting with youtube-dl, embedded it in a PyQt QThread to do the processing, and added a stop button to stop the thread. When downloading a livestream, however, I was unable to get FFmpeg to stop even after closing the application, and I had to kill the process manually, which breaks the video every time.
I knew I'd have to send it a CTRL-C signal somehow and ended up using this.
os.kill(signal.CTRL_C_EVENT, 0)
I was actually able to get it to work, but if you download more than one video and try to stop one of the threads with the above signal, it kills all the downloads.
Is there any way to send the signal to just one thread without affecting the others?
Here is an example of some regular Python code with 2 separate threads, where the CTRL-C signal is fired in thread_2 after 10 seconds, which ends up killing thread_1.
import os
import signal
import threading
import time
import youtube_dl
def thread_1():
    print("thread_1 running")
    url = 'https://www.cbsnews.com/common/video/cbsn_header_prod.m3u8'
    path = 'C:\\Users\\Richard\\Desktop\\'
    ydl_opts = {
        'format': 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best',
        'outtmpl': '{0}%(title)s-%(id)s.%(ext)s'.format(path),
        'nopart': True,
    }
    ydl_opts = ydl_opts
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        try:
            ydl.download([url])
        except KeyboardInterrupt:
            print('stopped')
def thread_2():
    print("thread_2 running")
    time.sleep(10)
    os.kill(signal.CTRL_C_EVENT, 0)
def launch_thread(target, message, args=[], kwargs={}):
    def thread_msg(*args, **kwargs):
        target(*args, **kwargs)
        print(message)
    thread = threading.Thread(target=thread_msg, args=args, kwargs=kwargs)
    thread.start()
    return thread
if __name__ == '__main__':
    thread1 = launch_thread(thread_1, "finished thread_1")
    thread2 = launch_thread(thread_2, "finished thread_2")
Does anyone have any suggestions or ideas? Thanks.
It is not possible to send signals to another thread, so you need to do something else.
You could possibly raise an exception in another thread, using this hack (for which I won't copy the source here because it comes with an MIT license):
http://tomerfiliba.com/recipes/Thread2/
With that, you could send a KeyboardInterrupt exception to the other thread, which is what happens with Ctrl-C anyway.
While it seems like this would do what you want, it would still break the video which is currently downloading.
On the other hand, since you seem to only be interested in killing all threads when the main thread exits, that can be done in a much simpler way:
Configure all threads as daemons, e.g.:
thread = threading.Thread(target=thread_msg, args=args, kwargs=kwargs)
thread.daemon = True
thread.start()
These threads will exit when the main thread exits, without any additional intervention needed from you.
Is there any way to send the signal to just one thread without affecting the others?
I am not a Python expert, but if I was trying to solve your problem, after reading about signal handling in Python3, I would start planning to use multiple processes instead of using multiple threads within a single process.
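As a rough illustration of the multi-process idea, the sketch below starts each job as its own child process and stops only one of them. The JOB command is a hypothetical placeholder (a Python one-liner that just sleeps) standing in for a real download, and whether a real downloader finalizes its output file when terminated is not something this sketch addresses.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running download command.
JOB = [sys.executable, "-c", "import time; time.sleep(30)"]

def start_job():
    # Each job runs in its own process, so it can be stopped independently.
    return subprocess.Popen(JOB)

job_a = start_job()
job_b = start_job()

# Stop only job_a; job_b is a separate process and is not affected.
job_a.terminate()
job_a.wait(timeout=10)

a_stopped = job_a.poll() is not None
b_running = job_b.poll() is None

# Clean up the second job so the example leaves nothing running.
job_b.terminate()
job_b.wait(timeout=10)
```

Note that Popen.terminate() sends SIGTERM on POSIX and calls TerminateProcess() on Windows, so it is not the same thing as a CTRL-C; the point here is only that separate processes can be stopped one at a time.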
You can use signal.pthread_kill (available on Unix only):
from signal import pthread_kill, SIGTSTP
from threading import Thread
from itertools import count
from time import sleep
def target():
    for num in count():
        print(num)
        sleep(1)
thread = Thread(target=target)
thread.start()
sleep(5)
pthread_kill(thread.ident, SIGTSTP)
Result:
0
1
2
3
4
[14]+ Stopped

Python3: Wait for Daemon to finish iteration

I'm writing a Python script that will start a local fileserver, and while that server is alive it will write to a file every 30 seconds. I would like the server and the writer function to run synchronously, so I made the writer function a daemon thread. My main question is: since this daemon thread will quit once the server is stopped, if the daemon is in the middle of writing to a file, will it complete that operation before exiting? It would be really bad to be left with half a file. Here's the code; the actual file it will write is about 3k lines of JSON, hence the concern.
import http.server
import socketserver
from time import sleep
from threading import Thread
class Server:
    def __init__(self):
        self.PORT = 8000
        self.Handler = http.server.SimpleHTTPRequestHandler
        self.httpd = socketserver.TCPServer(("", self.PORT), self.Handler)
        print("Serving at port", self.PORT)
    def run(self):
        try:
            self.httpd.serve_forever()
        except KeyboardInterrupt:
            print("Server stopped")
def test():
    while True:
        with open('test', mode='w') as file:
            file.write('testing...')
        print('file updated')
        sleep(5)
if __name__ == "__main__":
    t = Thread(target=test, daemon=True)
    t.start()
    server = Server()
    server.run()
It looks like you may have made an incorrect decision in making the writer thread daemonic.
Making a thread daemonic does not mean it will run synchronously. It will still be affected by the GIL.
If you want synchronous execution, you'll have to use multiprocessing.
From the Python docs:
Daemon threads are abruptly stopped at shutdown. Their resources (such
as open files, database transactions, etc.) may not be released
properly. If you want your threads to stop gracefully, make them
non-daemonic and use a suitable signalling mechanism such as an Event.
So that means that daemon threads are only suitable for the tasks that only make sense in context of the main thread and don't matter when the main thread has stopped working. File I/O, particularly data saving, is not suitable for a daemon thread.
So it looks like the most obvious and logical solution would be to make the writer thread non-daemonic.
Then, even if the main thread exits, the Python process won't be ended until all non-daemonic threads have finished. This allows for file I/O to complete and exit safely.
Explanation of daemonic threads in Python can be found here
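A minimal sketch of the non-daemonic writer with an Event, as the quoted docs suggest. The temporary file path and the 0.1 second interval are placeholders chosen so the example finishes quickly; in the real script the stop_event.set() call would go after serve_forever() returns.

```python
import os
import tempfile
import threading

stop_event = threading.Event()

def writer(path, interval):
    while True:
        # The with block always completes before the stop flag is checked,
        # so a write in progress is never cut off.
        with open(path, mode='w') as file:
            file.write('testing...')
        # Sleep for the interval, but wake up immediately if asked to stop.
        if stop_event.wait(interval):
            break

# Placeholder path in a temporary directory (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'test')

t = threading.Thread(target=writer, args=(path, 0.1))  # non-daemonic
t.start()

# ... server.run() would block here until the server is stopped ...

stop_event.set()   # ask the writer to finish
t.join()           # wait for the current write to complete
```

Using stop_event.wait(interval) instead of sleep(interval) means shutdown does not have to wait out the full interval between writes.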

How to make sure queue is empty before exiting main thread

I have a program that has two threads, the main thread and one additional that works on handling jobs from a FIFO queue.
Something like this:
import queue
import threading
q = queue.Queue()
def _worker():
    while True:
        msg = q.get(block=True)
        print(msg)
        q.task_done()
t = threading.Thread(target=_worker)
#t.daemon = True
t.start()
q.put('asdf-1')
q.put('asdf-2')
q.put('asdf-4')
q.put('asdf-4')
What I want to accomplish is basically to make sure the queue is emptied before the main thread exits.
If I set t.daemon to be True the program will exit before the queue is emptied, however if it's set to False the program will never exit. Is there some way to make sure the thread running the _worker() method clears the queue on main thread exit?
The comments touch on using .join(), but depending on your use case, using a join may make threading pointless.
I assume that your main thread will be doing things other than adding items to the queue - and may be shut down at any point, you just want to ensure that your queue is empty before shutting down is complete.
At the end of your main thread, you could add a simple empty check in a loop.
while not q.empty():
    sleep(1)
If you don't set t.daemon = True then the thread will never finish. Setting the thread as a daemon thread will mean that the thread does not cause your program to stay running when the main thread finishes.
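One caveat with polling q.empty(): the queue counts as empty as soon as the worker has taken the last item, which can be before that item has been fully handled. Since the worker in the question already calls task_done(), q.join() waits for the work itself. A small sketch of the difference, where the 0.5 second sleep is an artificial delay to make the gap visible:

```python
import queue
import threading
import time

q = queue.Queue()
done = []

def _worker():
    while True:
        msg = q.get(block=True)
        time.sleep(0.5)      # simulate slow work on the item
        done.append(msg)
        q.task_done()

t = threading.Thread(target=_worker, daemon=True)
t.start()
q.put('asdf-1')

# The empty check passes once the worker has picked the item up...
while not q.empty():
    time.sleep(0.01)
finished_when_empty = len(done)   # usually still 0 at this point

# ...whereas join() only returns after task_done() has been called.
q.join()
finished_after_join = len(done)
```

With a daemon worker plus q.join() at the end of the main thread, the program exits once every queued item has actually been processed.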
Put a special item (e.g. None) in the queue, that signals the worker thread to stop.
import queue
import threading
q = queue.Queue()
def _worker():
    while True:
        msg = q.get(block=True)
        if msg is None:
            return
        print(msg) # do your stuff here
t = threading.Thread(target=_worker)
#t.daemon = True
t.start()
q.put('asdf-1')
q.put('asdf-2')
q.put('asdf-4')
q.put('asdf-4')
q.put(None)
t.join()

Can't catch SIGINT in multithreaded program

I've seen many topics about this particular problem, but I still can't figure out why I'm not catching SIGINT in my main thread.
Here is my code:
def connect(self, retry=100):
    tries=retry
    logging.info('connecting to %s' % self.path)
    while True:
        try:
            self.sp = serial.Serial(self.path, 115200)
            self.pileMessage = pilemessage.Pilemessage()
            self.pileData = pilemessage.Pilemessage()
            self.reception = reception.Reception(self.sp,self.pileMessage,self.pileData)
            self.reception.start()
            self.collisionlistener = collisionListener.CollisionListener(self)
            self.message = messageThread.Message(self.pileMessage,self.collisionlistener)
            self.datastreaminglistener = dataStreamingListener.DataStreamingListener(self)
            self.datastreaming = dataStreaming.Data(self.pileData,self.datastreaminglistener)
            return
        except serial.serialutil.SerialException:
            logging.info('retrying')
            if not retry:
                raise SpheroError('failed to connect after %d tries' % (tries-retry))
            retry -= 1
def disconnect(self):
    self.reception.stop()
    self.message.stop()
    self.datastreaming.stop()
    while not self.pileData.isEmpty():
        self.pileData.pop()
    self.datastreaminglistener.remove()
    while not self.pileMessage.isEmpty():
        self.pileMessage.pop()
    self.collisionlistener.remove()
    self.sp.close()
if __name__ == '__main__':
    import time
    try:
        logging.getLogger().setLevel(logging.DEBUG)
        s = Sphero("/dev/rfcomm0")
        s.connect()
        s.set_motion_timeout(65525)
        s.set_rgb(0,255,0)
        s.set_back_led_output(255)
        s.configure_locator(0,0)
    except KeyboardInterrupt:
        s.disconnect()
In the main function I call connect(), which launches threads over which I don't have direct control.
When I launch this script, I would like to be able to stop it by hitting Control+C, which should call the disconnect() function that stops all the other threads.
In the code I provided it doesn't work because there is no thread in the main function. But I have already tried putting all the instructions from main in a thread with a while loop, without success.
Is there a simple way to solve my problem?
Thanks
Your indentation is messed up, but there's enough to go on.
Your main thread isn't catching SIGINT because it's not alive. There is nothing that stops your main thread from continuing past the try block, seeing no more code, and closing up shop.
I am not familiar with Sphero. I just attempted to google its docs and was linked to a bunch of 404 pages, so I'll tell you what you would normally do in a threaded environment - join your threads to the main thread so that the main thread can't finish execution before the worker threads.
for t in my_thread_list:
    t.join() #main thread can't get past here until all the threads finish
If your Sphero object doesn't provide join-like functionality, you could hack something in that blocks, i.e.
raw_input('Press Enter to disconnect')
s.disconnect()
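Put together, the idea of keeping the main thread alive until Ctrl-C could look something like the Python 3 sketch below. The names wait_for_workers, on_interrupt and the toy worker are all made up for illustration; with the Sphero code, on_interrupt would presumably be s.disconnect, assuming you can get hold of the threads it starts.

```python
import threading

def wait_for_workers(workers, on_interrupt, poll_interval=0.1):
    """Block the main thread until the workers finish or Ctrl-C arrives."""
    try:
        while any(w.is_alive() for w in workers):
            for w in workers:
                # Join with a short timeout so the loop stays responsive.
                w.join(poll_interval)
        return "finished"
    except KeyboardInterrupt:
        on_interrupt()          # e.g. a disconnect() that stops the threads
        for w in workers:
            w.join()
        return "interrupted"

stop = threading.Event()

def worker():
    # Toy worker: a few short steps, ending early if asked to stop.
    for _ in range(3):
        if stop.wait(0.05):
            break

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

outcome = wait_for_workers(workers, on_interrupt=stop.set)
```

The important part is that the try/except surrounds code that actually blocks, so the main thread is still inside it when the KeyboardInterrupt is raised.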

Why is infinite loop needed when using threading and a queue in Python

I'm trying to understand how to use threading and I came across this nice example at http://www.ibm.com/developerworks/aix/library/au-threadingpython/
#!/usr/bin/env python
import Queue
import threading
import urllib2
import time
hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]
queue = Queue.Queue()
class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()
            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)
            #signals to queue job is done
            self.queue.task_done()
start = time.time()
def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()
    #populate queue with data
    for host in hosts:
        queue.put(host)
    #wait on the queue until everything has been processed
    queue.join()
main()
print "Elapsed Time: %s" % (time.time() - start)
The part I don't understand is why the run method has an infinite loop:
def run(self):
    while True:
        ... etc ...
Just for laughs I ran the program without the loop and it looks like it runs fine!
So can someone explain why this loop is needed?
Also how is the loop exited as there is no break statement?
Do you want the thread to perform more than one job? If not, you don't need the loop. If so, you need something that makes it do that, and a loop is a common solution. Your sample data contains five jobs, and the program starts five threads, so no thread needs to do more than one job here. Try adding one more URL to your workload, though, and see what changes.
The loop is required because without it each worker thread terminates as soon as it completes its first task. What you want is for the worker to take another task when it finishes.
In the code above, you create 5 worker threads, which just happens to be sufficient to cover the 5 URLs you are working with. If you had more than 5 URLs, you would find that only the first 5 were processed.
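To see the effect with fewer threads than jobs, here is a small Python 3 sketch that runs the same pool with and without the loop. The run_pool helper and its loop flag are invented for this comparison, and the jobs are plain numbers instead of URLs so nothing is downloaded.

```python
import queue
import threading

def run_pool(num_workers, jobs, loop):
    """Process jobs with num_workers threads; return the jobs that got done.

    With loop=False, len(jobs) must be at least num_workers, otherwise a
    worker would block forever waiting for a job that never arrives.
    """
    q = queue.Queue()
    processed = []
    lock = threading.Lock()

    def handle_one():
        job = q.get()
        with lock:
            processed.append(job)
        q.task_done()

    def worker():
        handle_one()
        while loop:          # without the loop the thread ends after one job
            handle_one()

    for job in jobs:
        q.put(job)
    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()

    if loop:
        q.join()             # every job gets a task_done() eventually
    else:
        for t in threads:    # q.join() would hang here: jobs are left over
            t.join()
    return sorted(processed)

without_loop = run_pool(2, [0, 1, 2, 3, 4], loop=False)   # [0, 1]
with_loop = run_pool(2, [0, 1, 2, 3, 4], loop=True)       # [0, 1, 2, 3, 4]
```

As for how the loop is exited: it never is. The threads are daemons, so they are simply discarded when the main thread finishes after queue.join() returns.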
