I would like to understand how a queue knows that it won't receive any new items. In the following example the queue waits indefinitely when the tputter thread is not started (I assume because nothing has been put into it so far). If tputter is started, the getter waits between 'puts' until something new arrives, and as soon as everything is finished it stops. But how does tgetter know whether something new will end up in the queue or not?
import threading
import queue
import time

q = queue.Queue()

def getter():
    for i in range(5):
        print('worker:', q.get())
        time.sleep(2)

def putter():
    for i in range(5):
        print('putter: ', i)
        q.put(i)
        time.sleep(3)

tgetter = threading.Thread(target=getter)
tgetter.start()

tputter = threading.Thread(target=putter)
#tputter.start()
A common way to do this is to use the "poison pill" pattern. Basically, the producer and consumer agree on a special "poison pill" object that the producer can load into the queue, which will indicate that no more items are going to be sent, and the consumer can shut down.
So, in your example, it'd look like this:
import threading
import queue
import time

q = queue.Queue()

END = object()

def getter():
    while True:
        item = q.get()
        if item == END:
            break
        print('worker:', item)
        time.sleep(2)

def putter():
    for i in range(5):
        print('putter: ', i)
        q.put(i)
        time.sleep(3)
    q.put(END)

tgetter = threading.Thread(target=getter)
tgetter.start()

tputter = threading.Thread(target=putter)
#tputter.start()
This is a little contrived, since the producer is hard-coded to always send five items, so you have to imagine that the consumer doesn't know ahead of time how many items the producer will send.
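The same idea extends to several consumers. A minimal sketch of that variant (not from the original answer; NUM_CONSUMERS is an assumed count used only for illustration) is to put one pill per consumer, so every worker receives its own shutdown signal:

import threading
import queue

q = queue.Queue()
END = object()          # sentinel agreed on by producer and consumers
NUM_CONSUMERS = 3       # assumed number of worker threads for this sketch

def getter():
    while True:
        item = q.get()
        if item is END:
            break       # this worker's own shutdown signal
        print('worker:', item)

def putter():
    for i in range(5):
        q.put(i)
    for _ in range(NUM_CONSUMERS):
        q.put(END)      # one pill per consumer

getters = [threading.Thread(target=getter) for _ in range(NUM_CONSUMERS)]
for t in getters:
    t.start()
threading.Thread(target=putter).start()
for t in getters:
    t.join()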
I'm trying to write code in which there is a single queue and many workers (producer_consumer in the example) that process objects from the queue. I need to use multiprocessing since the code the workers are going to execute is CPU-bound. The setup is the following:
The queue is initialized by the parent process with some initial values (names in the example); the parent then starts the workers.
Workers start getting elements from the queue, and after processing an element each worker may produce a new object to be inserted into the queue (...and then processed by someone else).
All this goes on until the queue is empty. When this happens I would like all workers to stop and control to be given back to the parent to conclude the execution.
I wrote this example in which workers correctly process elements and produce new objects into the queue, but the problem is that the execution hangs when the queue is empty. Any suggestions?
Thanks in advance
import time
import os
import random
import string
from multiprocessing import Process, Queue, Lock

# Produces and consumes names in and from the Queue
def producer_consumer(queue, lock):
    # Synchronize access to the console
    with lock:
        print('Starting consumer => {}'.format(os.getpid()))

    while not queue.empty():
        time.sleep(random.randint(0, 10))

        # If the queue is empty, queue.get() will block until the queue has data
        name = queue.get()

        if random.random() < 0.7:
            product = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(10))
            queue.put(product)
        else:
            product = 'nothing'

        # Synchronize access to the console
        with lock:
            print('{} got {}, produced {}'.format(os.getpid(), name, product))

if __name__ == '__main__':
    # Create the Queue object
    queue = Queue()

    # Create a lock object to synchronize resource access
    lock = Lock()

    producer_consumers = []
    names = ['Mario', 'Peppino', 'Francesco', 'Carlo', 'Ermenegildo']

    for name in names:
        queue.put(name)

    for _ in range(5):
        producer_consumers.append(Process(target=producer_consumer, args=(queue, lock)))

    for process in producer_consumers:
        process.start()

    for p in producer_consumers:
        p.join()

    print('Parent process exiting...')
Here is a simple example:
from collections import deque
from multiprocessing import Process

global_dequeue = deque([])

def push():
    global_dequeue.append('message')

p = Process(target=push)
p.start()

def pull():
    print(global_dequeue)

pull()
The output is deque([]).
If I were to call the push function directly, not as a separate process, the output would be deque(['message']).
How can I get the message into the deque, but still run the push function in a separate process?
You can share data by using a multiprocessing Queue object, which is designed to share data between processes:
from multiprocessing import Process, Queue
import time

def push(q):  # pass the Queue to the function as an argument
    for i in range(10):
        q.put(str(i))  # put an element into the Queue
        time.sleep(0.2)
    q.put("STOP")  # put a poison pill to tell the master to stop taking elements from the Queue

if __name__ == "__main__":
    q = Queue()  # create Queue instance
    p = Process(target=push, args=(q,),)  # create Process
    p.start()  # start it

    while True:
        x = q.get()
        if x == "STOP":
            break
        print(x)

    p.join()  # join the process so the master continues only after it finishes
    print("Finish")
Let me know if it helped, feel free to ask questions.
You can also use Managers to achieve this.
Python 2: https://docs.python.org/2/library/multiprocessing.html#managers
Python 3: https://docs.python.org/3.8/library/multiprocessing.html#managers
Example of usage:
https://pymotw.com/2/multiprocessing/communication.html#managing-shared-state
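For instance, a minimal sketch using a Manager-backed list (the function and variable names here are made up for illustration, not taken from the links above):

from multiprocessing import Process, Manager

def push(shared):
    # appends go to the manager process, which holds the real list
    shared.append('message')

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.list()                  # proxy to a list shared between processes
        p = Process(target=push, args=(shared,))
        p.start()
        p.join()                                 # wait for the child so the append has happened
        print(list(shared))                      # ['message']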
I have written a program that I am using to benchmark a mongodb database performing under multithreaded bulk write conditions.
The problem is that the program hangs and does not finish executing.
I am quite sure that the problem is due to writing 530838 records to the database using 10 threads that bulk-write 50 records at a time. That leaves a remainder of 38 records (530838 = 10616 * 50 + 38), but the run method always fetches 50 records from the queue, so the process hangs once 530800 records have been written and never writes the final 38, because the following code never finishes executing:
for object in range(50):
    objects.append(self.queue.get())
I would like the program to write 50 records at a time until fewer than 50 remain, at which point it should write the remaining records in the queue and then exit the thread when no records remain in the queue.
Thanks in advance :)
import threading
import Queue
import json
from pymongo import MongoClient, InsertOne
import datetime

#Set the number of threads
n_thread = 10
#Create the queue
queue = Queue.Queue()
#Connect to the database
client = MongoClient("mongodb://mydatabase.com")
db = client.threads

class ThreadClass(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        #Assign thread working with queue
        self.queue = queue

    def run(self):
        while True:
            objects = []
            #Get next 50 objects from queue
            for object in range(50):
                objects.append(self.queue.get())
            #Insert the queued objects into the database
            db.threads.insert_many(objects)
            #signals to queue job is done
            self.queue.task_done()

#Create number of processes
threads = []
for i in range(n_thread):
    t = ThreadClass(queue)
    t.setDaemon(True)
    #Start thread
    t.start()

#Start timer
starttime = datetime.datetime.now()
#Read json object by object
content = json.load(open("data.txt","r"))
for jsonobj in content:
    #Put object into queue
    queue.put(jsonobj)
#wait on the queue until everything has been processed
queue.join()
for t in threads:
    t.join()
#Print the total execution time
endtime = datetime.datetime.now()
duration = endtime-starttime
print(divmod(duration.days * 86400 + duration.seconds, 60))
From the docs on Queue.get you can see that the default settings are block=True and timeout=None, which means a get() on an empty queue blocks until an item becomes available.
You could use get_nowait, or get(False), to ensure you're not blocking. If you want the blocking to be conditional on whether the queue has 50 items, whether it is empty, or other conditions, you can use Queue.empty and Queue.qsize, but note that they do not provide race-condition-proof guarantees of non-blocking behavior... they are merely heuristics for deciding whether to use block=False with get.
Something like this:
def run(self):
    while True:
        objects = []
        #Get next 50 objects from queue
        block = self.queue.qsize() >= 50
        for i in range(50):
            try:
                item = self.queue.get(block=block)
            except Queue.Empty:
                break
            objects.append(item)
        #Insert the queued objects into the database
        db.threads.insert_many(objects)
        #signals to queue job is done
        self.queue.task_done()
Another approach would be to set a timeout and use a try ... except block to catch any Empty exceptions that are raised. This has the advantage that you can decide how long to wait, rather than heuristically guessing when to return immediately, but the two approaches are similar.
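A rough sketch of that timeout variant (assuming an arbitrary 5-second timeout, and calling task_done() once per successful get so that queue.join() can still return):

def run(self):
    while True:
        objects = []
        #Collect up to 50 objects, waiting at most 5 seconds for each one
        for i in range(50):
            try:
                objects.append(self.queue.get(timeout=5))
            except Queue.Empty:
                break
        if objects:
            #Insert whatever was collected, even if it is fewer than 50
            db.threads.insert_many(objects)
            for _ in objects:
                #one task_done() per successful get()
                self.queue.task_done()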
Also note that I changed your loop variable from object to i ... you should avoid having your loop variable shadow the built-in object class.
Hopefully this is just something small I'm doing wrong, as these are some of my first threaded scripts using queues. Basically, after running through, it stops and sits there but won't exit.
import threading
import Queue

class Words(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.queue = Queue.Queue()

    def word(self):
        read = open('words.txt')
        for word in read:
            word = word.replace("\n","")
            self.queue.put(word)
        read.close()
        for i in range(5):
            t = self.run()
            t.setDaemon(True)
            t.start()
        self.queue.join()

    def run(self):
        while True:
            word = self.queue.get()
            print word
            self.queue.task_done()

if __name__ == '__main__':
    Word = Words()
    Word.word()
You are using threads incorrectly in a couple of ways in your code:
First, the code seems to be built on the incorrect assumption that the one Thread subclass object you have can spawn all of the threads you need to do the work. On the contrary, the Thread documentation says that start "must be called at most once per Thread object". In the case of the word method, that object is the self reference.
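As a quick illustration (a made-up snippet, not from the original post), calling start() a second time on the same Thread object raises a RuntimeError:

import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()
try:
    t.start()          # second start() on the same object
except RuntimeError as e:
    print(e)           # "threads can only be started once"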
However, it would not be useful to call self.start() because that would spawn a single thread to consume the queue, and you would gain nothing from threading. Since word would have to construct new instances of Words anyway to initiate multiple threads, and the queue object will need to be accessed by multiple Words instances, it would be useful to have both of those separate from the Words object. For example, word could be a function outside of the Words object that starts like:
def word():
    queue = Queue.Queue()
    read = open('words.txt')
    for word in read:
        word = word.replace("\n","")
        queue.put(word)
    read.close()
    #...
This would also mean that Words would have to take the queue object as a parameter so that multiple instances would share the same queue:
class Words(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
Second, your thread function (run) is an infinite loop, so the thread will never terminate. Since you are only running the queue consumer threads after you have added all items to the queue, you should not have a problem terminating the thread once the queue is empty, like so:
def run(self):
    while True:
        try:
            word = self.queue.get(False)
        except Queue.Empty:
            break
        print word
        self.queue.task_done()
It is useful to use exceptions here because otherwise the queue could empty out, the thread could then try to get from it, and it would end up waiting forever for an item to be added.
Third, in your for loop you call self.run(), which passes control to the run method; run then processes the entire queue and, once it is changed to terminate as above, returns None. The following lines would then throw exceptions because t would be assigned the value None. Since you want to spawn other threads to do the work, you should do t = Words(queue) to get a new worker thread and then t.start() to start it. So, the code when put together should be:
class Words(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            try:
                word = self.queue.get(False)
            except Queue.Empty:
                break
            print word
            self.queue.task_done()

def word():
    queue = Queue.Queue()
    read = open('words.txt')
    for word in read:
        word = word.replace("\n","")
        queue.put(word)
    read.close()
    for i in range(5):
        t = Words(queue)
        t.setDaemon(True)
        t.start()
    queue.join()

if __name__=='__main__':
    word()
It looks to me like you're mixing up a number of different aspects of threads, when you really just need a simple solution. As far as I can tell, the for i in range(5): loop never gets past the first iteration because you call run() directly and it gets caught in an infinite loop.
Here's how I would do it:
import threading
import Queue

class Worker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # try to dequeue a word from the queue
            try:
                word = self.queue.get_nowait()
            # if there's nothing in the queue, break because we're done
            except Queue.Empty:
                break
            # if the 'try' was successful at getting a word, print it
            print word

def fill_queue(queue):
    read = open('words.txt')
    for word in read:
        word = word.replace("\n", "")
        queue.put(word)
    read.close()

if __name__ == "__main__":
    # create empty queue
    queue = Queue.Queue()

    # fill the queue with work
    fill_queue(queue)

    # create 5 worker threads
    threads = []
    for i in range(5):
        threads.append(Worker(queue))

    # start threads
    for thread in threads:
        thread.start()

    # join threads once they finish
    for thread in threads:
        thread.join()
If you would like to read over some examples of threaded code in Python, the following recipes might be able to teach you some basics regarding the subject. Some of them are demonstrations, and others are programs:
mthread.py (2)
mthread.py (1)
Thread Syncronizer
Bounded Buffer Example (1)
Bounded Buffer Example (2)
Port Forwarding
Module For Running Simple Proxies
Proxy Example
Paint 2.0
spots (2)
Directory Pruner 2
I have some producer functions which rely on I/O-heavy blocking calls and some consumer functions which also rely on I/O-heavy blocking calls. In order to speed them up, I used the gevent micro-threading library as glue.
Here's what my paradigm looks like:
import gevent
from gevent.queue import *
import time
import random

q = JoinableQueue()
workers = []
producers = []

def do_work(wid, value):
    gevent.sleep(random.randint(0,2))
    print 'Task', value, 'done', wid

def worker(wid):
    while True:
        item = q.get()
        try:
            print "Got item %s" % item
            do_work(wid, item)
        finally:
            print "No more items"
            q.task_done()

def producer():
    while True:
        item = random.randint(1, 11)
        if item == 10:
            print "Signal Received"
            return
        else:
            print "Added item %s" % item
            q.put(item)

for i in range(4):
    workers.append(gevent.spawn(worker, random.randint(1, 100000)))

#This doesnt work.
for j in range(2):
    producers.append(gevent.spawn(producer))

#Uncommenting this makes this script work.
#producer()

q.join()
I have four consumers and would like to have two producers. The producers exit when they receive a signal, i.e. 10. The consumers keep feeding off this queue, and the whole task finishes when the producers and consumers are done.
However, this doesn't work. If I comment out the for loop which spawns multiple producers and use only a single producer, the script runs fine.
I can't seem to figure out what I've done wrong.
Any ideas?
Thanks
You don't actually want to quit when the queue has no unfinished work, because conceptually that's not when the application should finish.
You want to quit when the producers have finished, and then when there is no unfinished work.
# Wait for all producers to finish producing
gevent.joinall(producers)
# *Now* we want to make sure there's no unfinished work
q.join()
# We don't care about workers. We weren't paying them anything, anyways
gevent.killall(workers)
# And, we're done.
I think it does q.join() before anything is put in the queue and exits immediately. Try joining all producers before joining the queue.
What you want to do is block the main program while the producers and workers communicate. Blocking on the queue will wait until the queue is empty and then yield, which could happen immediately. Put this at the end of your program instead of q.join():
gevent.joinall(producers)
I ran into the same issue. The main problem with your code is that the producer is spawned in a gevent greenlet, which means the workers cannot get a task immediately.
I suggest running producer() in the main flow rather than spawning it as a greenlet; that way, when execution reaches producer(), the tasks are pushed into the queue right away.
import gevent
from gevent.queue import *
import time
import random

q = JoinableQueue()
workers = []
producers = []

def do_work(wid, value):
    gevent.sleep(random.randint(0,2))
    print 'Task', value, 'done', wid

def worker(wid):
    while True:
        item = q.get()
        try:
            print "Got item %s" % item
            do_work(wid, item)
        finally:
            print "No more items"
            q.task_done()

def producer():
    while True:
        item = random.randint(1, 11)
        if item == 10:
            print "Signal Received"
            return
        else:
            print "Added item %s" % item
            q.put(item)

producer()

for i in range(4):
    workers.append(gevent.spawn(worker, random.randint(1, 100000)))

# wait until every queued item has been processed
q.join()
The code above makes sense. :)