Python multithreading using queue - python

I was reading an article on Python multithreading using queues and have a basic question.
Based on the print statements, 5 threads are started as expected. So how does the queue work?
1. The threads are started up front. When the queue is populated with an item, does a thread get restarted and start processing that item?
2. If we use the queue and the threads process it item by item, where does the performance improvement come from? Isn't that the same as serial processing, i.e. one by one?
import Queue
import threading
import urllib2
import datetime
import time
hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]
queue = Queue.Queue()
class ThreadUrl(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        print 'threads are created'
        self.queue = queue
    def run(self):
        while True:
            #grabs host from queue
            print 'thread starting to run'
            now = datetime.datetime.now()
            host = self.queue.get()
            #grabs urls of hosts and prints first 20 bytes of page
            url = urllib2.urlopen(host)
            print 'host=%s ,threadname=%s' % (host,self.getName())
            print url.read(20)
            #signals to queue job is done
            self.queue.task_done()
start = time.time()
if __name__ == '__main__':
    #spawn a pool of threads, and pass them queue instance
    print 'program start'
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()
    #populate queue with data
    for host in hosts:
        queue.put(host)
    #wait on the queue until everything has been processed
    queue.join()
    print "Elapsed Time: %s" % (time.time() - start)

A queue is similar to a list container, but with internal locking to make it a thread-safe way to communicate data.
What happens when you start all of your threads is that they all block on the self.queue.get() call, waiting to pull an item from the queue. When an item is put into the queue from your main thread, one of the threads will become unblocked and receive the item. It can then continue to process it until it finishes and returns to a blocking state.
All of your threads can run concurrently because they are all able to receive items from the queue. This is where you would see your improvement in performance. If urlopen and read take time in one thread and it is waiting on I/O, another thread can do work in the meantime. The queue object's job is simply to manage the locking and to hand items to the callers.
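To make the point about waiting on I/O concrete, here is a minimal Python 3 sketch (the question's code is Python 2). It is not the asker's program: `fake_fetch` is a hypothetical stand-in that only sleeps, as if waiting on the network, and `timed_run` is a helper name invented for this illustration. With five workers, five 0.2 second "downloads" should finish in roughly the time of one, not five.

```python
import queue
import threading
import time


def fake_fetch(host, delay=0.2):
    """Hypothetical stand-in for urlopen(): sleeps as if waiting on the network."""
    time.sleep(delay)
    return host


def worker(q, results):
    while True:
        host = q.get()  # every idle worker blocks here until an item arrives
        try:
            results.append((threading.current_thread().name, fake_fetch(host)))
        finally:
            q.task_done()


def timed_run(hosts, num_threads):
    """Process hosts with num_threads workers; return (elapsed seconds, results)."""
    q = queue.Queue()
    results = []
    for _ in range(num_threads):
        threading.Thread(target=worker, args=(q, results), daemon=True).start()
    start = time.monotonic()
    for host in hosts:
        q.put(host)
    q.join()  # returns once task_done() has been called for every item
    return time.monotonic() - start, results


if __name__ == "__main__":
    hosts = ["h%d" % i for i in range(5)]
    print("1 thread : %.2fs" % timed_run(hosts, 1)[0])
    print("5 threads: %.2fs" % timed_run(hosts, 5)[0])
```

The single-thread run is the serial case from the question; the five-thread run overlaps the waits, which is where the speedup comes from for I/O-bound work.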

Related

Quit signal when waiting for blocking read from queue.Queue

In many cases I have a worker thread which pops data from a Queue and acts on it. On some kind of event I want my worker thread to stop. The simple solution is to add a timeout to the get call and check the Event/flag every time the get times out. This, however, has two problems:
1. It causes an unnecessary context switch
2. It delays the shutdown until a timeout occurs
Is there any better way to listen both for a stop event and for new data in the Queue? Is it possible to listen to two Queues at the same time and block until there's data in either one? (In this case one can use a second Queue just to trigger the shutdown.)
The solution I'm currently using:
from queue import Queue, Empty
from threading import Event, Thread
from time import sleep
def worker(exit_event, queue):
    print("Worker started.")
    while not exit_event.is_set():
        try:
            data = queue.get(timeout=10)
            print("got {}".format(data))
        except Empty:
            pass
    print("Worker quit.")
if __name__ == "__main__":
    exit_event = Event()
    queue = Queue()
    th = Thread(target=worker, args=(exit_event, queue))
    th.start()
    queue.put("Testing")
    queue.put("Hello!")
    sleep(2)
    print("Asking worker to quit")
    exit_event.set()
    th.join()
    print("All done..")
You could simply reduce the timeout to somewhere between 0.1 and 0.01 seconds. A slightly different solution is to use the queue to send both data and control commands to the thread:
import queue
import threading
import time
THREADSTOP = 0
class ThreadControl:
    def __init__(self, command):
        self.command = command
def worker(q):
    print("Worker started.")
    while True:
        data = q.get()
        if isinstance(data, ThreadControl):
            if data.command == THREADSTOP:
                break
        print("got {}".format(data))
    print("Worker quit.")
if __name__ == '__main__':
    q = queue.Queue()
    th = threading.Thread(target=worker, args=(q,))
    th.start()
    q.put("Testing")
    q.put("Hello!")
    time.sleep(2)
    print("Asking worker to quit")
    q.put(ThreadControl(command=THREADSTOP)) # sending command
    th.join()
    print("All done..")
Another option is to use sockets instead of queues.
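The control-command idea above extends to several workers sharing one queue. The following is a minimal sketch under that assumption, not part of the original answer: `STOP` is a hypothetical sentinel object and `run_workers` is a helper invented for the illustration. Each worker exits after consuming exactly one sentinel, so the main thread puts one sentinel per worker.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel; compared by identity, so it cannot clash with real data


def worker(q, seen):
    while True:
        item = q.get()  # blocks with no timeout, so no polling is needed
        if item is STOP:
            break
        seen.append(item)


def run_workers(items, num_workers=3):
    q = queue.Queue()
    seen = []
    threads = [threading.Thread(target=worker, args=(q, seen))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    # One sentinel per worker: each worker consumes exactly one and exits.
    for _ in threads:
        q.put(STOP)
    for t in threads:
        t.join()
    return sorted(seen)


if __name__ == "__main__":
    print(run_workers(["Testing", "Hello!"]))
```

Because the queue is FIFO, every real item is handed out before the first sentinel, so nothing queued before shutdown is lost.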

Implementing a single thread server/daemon (Python)

I am developing a server (daemon).
The server has one "worker thread". The worker thread runs a queue of commands. When the queue is empty, the worker thread is paused (but does not exit, because it should preserve certain state in memory). To have exactly one copy of the state in memory, I need exactly one worker thread (not several and not zero) running at all times.
Requests are added to the end of this queue when a client connects to a Unix socket and sends a command.
After the command is issued, it is added to the worker thread's command queue. Once it has been added, the server replies with something like "OK". There should not be a long pause between the server receiving a command and its "OK" reply. However, running the commands in the queue may take some time.
The main "work" of the worker thread is split into small (taking relatively little time) chunks. Between chunks, the worker thread inspects ("eats" and empties) the queue and continues to work based on the data extracted from the queue.
How can I implement this server/daemon in Python?
This is sample code using internet sockets, which are easily replaced with Unix domain sockets. It takes whatever you write to the socket, passes it as a "command" to the worker, and responds OK as soon as it has queued the command. The single worker simulates a lengthy task as ten one-second chunks (sleep(1) in a loop) and checks the queue between chunks. You can queue as many tasks as you want and receive OK immediately, while the worker prints each command as it pulls it from the queue.
import Queue, threading, socket
from time import sleep
class worker(threading.Thread):
    def __init__(self, q):
        super(worker, self).__init__()
        self.qu = q
    def run(self):
        while True:
            # block until the first command arrives
            new_task = self.qu.get(True)
            print new_task
            i = 0
            while i < 10:
                print "working ..."
                sleep(1)
                i += 1
                # between chunks, check for another command without blocking
                try:
                    another_task = self.qu.get(False)
                    print another_task
                except Queue.Empty:
                    pass
task_queue = Queue.Queue()
w = worker(task_queue)
w.daemon = True
w.start()
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('localhost', 4200))
sock.listen(1)
try:
    while True:
        conn, addr = sock.accept()
        data = conn.recv(32)
        task_queue.put(data)
        conn.sendall("OK")
        conn.close()
finally:
    # always release the listening socket, including on Ctrl-C
    sock.close()
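The question asks for the worker to empty the queue between chunks, while the sample above takes at most one extra command per chunk. A minimal Python 3 sketch of the draining variant follows, leaving the socket part out. All names (`drain`, `worker`, `state`, `chunks_per_command`) are hypothetical, the "work" is just a record appended to a list, and using None as a stop marker is an assumption added so the example can terminate.

```python
import queue
import threading


def drain(q):
    """Return every item currently in q without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def worker(q, state, chunks_per_command=3):
    """Single worker: keeps `state` in memory and works in small chunks."""
    pending = []
    while True:
        if not pending:
            pending.append(q.get())  # idle: block until a command arrives
        command = pending.pop(0)
        if command is None:  # assumed stop marker, not in the original
            return
        for chunk in range(chunks_per_command):
            state.append((command, chunk))  # stand-in for one chunk of real work
            pending.extend(drain(q))        # between chunks, empty the queue


if __name__ == "__main__":
    q = queue.Queue()
    state = []
    t = threading.Thread(target=worker, args=(q, state))
    t.start()
    for command in ["load", "save", None]:
        q.put(command)
    t.join()
    print(state)
```

The accepting loop from the sample would stay as it is: it only needs to call `q.put(data)` and reply OK, so the reply never waits for the worker.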

How to implement a python thread pool to test for network connectivity?

I am trying to implement a Python (2.6.x/2.7.x) thread pool that checks for network connectivity (ping or whatever). All threads in the pool must be killed/terminated when the check succeeds.
So I am thinking of creating a pool of, let's say, 10 worker threads. If any one of them succeeds in pinging, the main thread should terminate all the rest.
How do I implement this?
This is not runnable code; it is just meant to give you an idea of how to make threads communicate.
Communication between processes or threads happens through queues, pipes, and some other mechanisms. Here I'm using queues.
It works like this: I send IP addresses through in_queue and add responses to out_queue. My main thread monitors out_queue, and if it gets the desired result, it marks all the threads to terminate.
Below is the pinger thread definition:
import threading
from Queue import Queue, Empty
# A thread that pings IPs taken from a queue.
class Pinger(threading.Thread):
    def __init__(self, kwargs=None, name=None):
        threading.Thread.__init__(self, name=name)
        self.kwargs = kwargs
        self.stop_pinging = False
    def run(self):
        ip_queue = self.kwargs.get('in_queue')
        out_queue = self.kwargs.get('out_queue')
        while not self.stop_pinging:
            try:
                data = ip_queue.get(timeout=1)
                # ping() is pseudo code, you have to take care of
                # your own ping.
                ping_status = ping(data)
                if ping_status:
                    out_queue.put('success')
                    # you can even break here if you don't want to
                    # continue after one success
                else:
                    out_queue.put('failure')
                if ip_queue.empty():
                    break
            except Empty:
                pass
Here is the main thread block:
# Create the shared queues and launch the thread pool
in_queue = Queue()
out_queue = Queue()
ip_list = ['ip1', 'ip2', '....']
# This is to add all the ips to the queue or you can
# customize to add through some producer way.
for ip in ip_list:
    in_queue.put(ip)
pinger_pool = []
for i in xrange(1, 11):
    pinger_worker = Pinger(kwargs={'in_queue': in_queue, 'out_queue': out_queue}, name=str(i))
    pinger_pool.append(pinger_worker)
    pinger_worker.start()
# Each ip yields exactly one result, so read at most len(ip_list) results.
for _ in ip_list:
    if out_queue.get() == 'success':
        break
# Tell every worker to stop, whether or not a ping succeeded.
for pinger in pinger_pool:
    pinger.stop_pinging = True
Note: This is pseudo code; you should adapt it to make it work for your case.
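For comparison, here is a self-contained Python 3 sketch of the same two-queue idea that can be run as is. It is an illustration under assumptions, not the answer's code: `check_host` is a hypothetical stand-in for a real ping (it treats only the made-up name "good-host" as reachable), and a `threading.Event` replaces the per-thread `stop_pinging` flag.

```python
import queue
import threading


def check_host(host):
    """Hypothetical stand-in for a real ping: only 'good-host' is reachable."""
    return host == "good-host"


def pinger(in_queue, out_queue, stop_event):
    # Take hosts until told to stop or until nothing is left to check.
    while not stop_event.is_set():
        try:
            host = in_queue.get_nowait()
        except queue.Empty:
            return
        out_queue.put((host, check_host(host)))


def first_reachable(hosts, num_workers=4):
    in_queue, out_queue = queue.Queue(), queue.Queue()
    stop_event = threading.Event()
    for host in hosts:
        in_queue.put(host)
    workers = [threading.Thread(target=pinger,
                                args=(in_queue, out_queue, stop_event))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    found = None
    # Every host taken from in_queue yields exactly one result, so this
    # loop cannot block forever even when no host is reachable.
    for _ in hosts:
        host, ok = out_queue.get()
        if ok:
            found = host
            break
    stop_event.set()  # the remaining workers stop before their next host
    for w in workers:
        w.join()
    return found


if __name__ == "__main__":
    print(first_reachable(["10.0.0.1", "good-host", "10.0.0.2"]))
```

Threads cannot be killed from outside in Python, so "terminate" here means the workers notice the event and return on their own; a check already in progress still runs to completion.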

Why is infinite loop needed when using threading and a queue in Python

I'm trying to understand how to use threading and I came across this nice example at http://www.ibm.com/developerworks/aix/library/au-threadingpython/
#!/usr/bin/env python
import Queue
import threading
import urllib2
import time
hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com",
         "http://ibm.com", "http://apple.com"]
queue = Queue.Queue()
class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()
            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print url.read(1024)
            #signals to queue job is done
            self.queue.task_done()
start = time.time()
def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()
    #populate queue with data
    for host in hosts:
        queue.put(host)
    #wait on the queue until everything has been processed
    queue.join()
main()
print "Elapsed Time: %s" % (time.time() - start)
The part I don't understand is why the run method has an infinite loop:
def run(self):
    while True:
        ... etc ...
Just for laughs I ran the program without the loop and it looks like it runs fine!
So can someone explain why this loop is needed?
Also how is the loop exited as there is no break statement?
Do you want the thread to perform more than one job? If not, you don't need the loop. If so, you need something that makes it do that, and a loop is a common solution. Your sample data contains five jobs, and the program starts five threads, so no thread needs to do more than one job here. Try adding one more URL to your workload, though, and see what changes.
The loop is required because without it each worker thread terminates as soon as it completes its first task. What you want is for the worker to take another task when it finishes.
In the code above, you create 5 worker threads, which just happens to be sufficient to cover the 5 URLs you are working with. Without the loop, if you had more than 5 URLs you would find that only the first 5 were processed, and queue.join() would then block forever waiting for the rest.
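The experiment suggested above can be sketched without any network access. This is a Python 3 illustration with hypothetical names (`run_pool`, `work`): the "job" just records the item. It also covers the other half of the question: the looping threads never leave the loop. Because they are daemon threads, they are simply discarded when the main thread exits after queue.join() returns.

```python
import queue
import threading


def run_pool(jobs, num_threads, loop):
    """Run jobs on num_threads workers; return (jobs done, items left in queue)."""
    q = queue.Queue()
    done = []

    def work():
        while True:
            job = q.get()
            done.append(job)  # stand-in for fetching a URL
            q.task_done()
            if not loop:
                break  # one job per thread, as when 'while True' is removed

    for job in jobs:  # queue everything before the threads start
        q.put(job)
    threads = [threading.Thread(target=work, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    if loop:
        q.join()  # all jobs done; workers stay blocked in q.get()
    else:
        # q.join() would hang here: leftover items never get task_done().
        # Assumes at least as many jobs as threads, or a thread would block.
        for t in threads:
            t.join()
    return sorted(done), q.qsize()


if __name__ == "__main__":
    jobs = ["url%d" % i for i in range(6)]
    print(run_pool(jobs, 5, loop=False))  # five done, one left behind
    print(run_pool(jobs, 5, loop=True))   # all six done
```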

Need some assistance with Python threading/queue

import threading
import Queue
import urllib2
import time
class ThreadURL(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            host = self.queue.get()
            sock = urllib2.urlopen(host)
            data = sock.read()
            self.queue.task_done()
hosts = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.facebook.com', 'http://stackoverflow.com']
start = time.time()
def main():
    queue = Queue.Queue()
    for i in range(len(hosts)):
        t = ThreadURL(queue)
        t.start()
    for host in hosts:
        queue.put(host)
    queue.join()
if __name__ == '__main__':
    main()
    print 'Elapsed time: {0}'.format(time.time() - start)
I've been trying to get my head around how to do threading, and after a few tutorials I've come up with the above.
What it's supposed to do is:
1. Initialise the queue
2. Create my thread pool and then queue up the list of hosts
3. My ThreadURL class should then begin work once a host is in the queue and read the website data
4. The program should finish
What I want to know first off is: am I doing this correctly? Is this the best way to handle threads?
Secondly, my program fails to exit. It prints out the Elapsed time line and then hangs there. I have to kill my terminal for it to go away. I'm assuming this is due to my incorrect use of queue.join()?
Your code looks fine and is quite clean.
The reason your application still "hangs" is that the worker threads are still running, waiting for the main application to put something in the queue, even though your main thread is finished.
The simplest way to fix this is to mark the threads as daemons, by doing t.daemon = True before your call to start. This way, the threads will not block the program stopping.
It looks fine. yann is right about the daemon suggestion; that will fix your hang. My only question is why use the queue at all? You're not doing any cross-thread communication, so it seems like you could just pass the host as an argument to ThreadURL's __init__() and drop the queue.
There's nothing wrong with it, I'm just wondering.
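The queue-free variant suggested here might look like the following Python 3 sketch. It is an assumption about what the commenter had in mind: one thread per host, each given its host at construction time. `fetch` is a hypothetical stand-in for `urllib2.urlopen(host).read()` so the example runs offline, and `fetch_all` is a helper name invented for the illustration.

```python
import threading


def fetch(host):
    """Hypothetical stand-in for urllib2.urlopen(host).read()."""
    return "data from " + host


class ThreadURL(threading.Thread):
    def __init__(self, host):
        threading.Thread.__init__(self)
        self.host = host
        self.data = None

    def run(self):
        # One job per thread, so no loop and no queue are needed.
        self.data = fetch(self.host)


def fetch_all(hosts):
    threads = [ThreadURL(host) for host in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the threads end on their own, so the program can exit
    return [t.data for t in threads]


if __name__ == "__main__":
    print(fetch_all(["http://www.google.com", "http://stackoverflow.com"]))
```

This only fits when one thread per host is acceptable. With many more hosts than threads, the queue-based pool is the better match.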
One more thing: in the thread's run function, inside the while True loop, if an exception is raised after get() has been called, task_done() may never be called. queue.join() may then never return.
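One common way to guard against that is to call task_done() in a finally clause. The Python 3 sketch below illustrates the pattern under assumptions: `fetch` is a hypothetical job that fails for the made-up input "bad", and `run` is a helper invented for the example.

```python
import queue
import threading


def fetch(host):
    """Hypothetical job that fails for one particular input."""
    if host == "bad":
        raise ValueError("cannot fetch %s" % host)
    return host.upper()


def worker(q, results, errors):
    while True:
        host = q.get()
        try:
            results.append(fetch(host))
        except Exception as exc:
            errors.append((host, exc))  # record the failure and keep the thread alive
        finally:
            q.task_done()  # always runs, so q.join() can return


def run(hosts, num_threads=2):
    q = queue.Queue()
    results, errors = [], []
    for _ in range(num_threads):
        threading.Thread(target=worker, args=(q, results, errors),
                         daemon=True).start()
    for host in hosts:
        q.put(host)
    q.join()
    return sorted(results), [host for host, _ in errors]


if __name__ == "__main__":
    print(run(["a", "bad", "b"]))
```

Catching the exception also matters on its own: an uncaught exception would end that worker thread, leaving fewer threads to serve the remaining items.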
