How to identify if Python threads with a Queue are done with their tasks?

Here I have a MazeRunner class which puts every element of self.boxes into a queue and runs threads on them until the queue becomes empty (q.empty()).
The problem is: how do I actually identify when my program is done processing all the elements in the queue of self.boxes, so I can return True?
It looks challenging because our threads are in a while loop whose behaviour depends on the length of self.boxes and the number of threads (self.threads) we defined.
I have tried putting all the threads in a list and calling t.join() on each of them, but no luck. Any help?
import threading, queue, time

class MazeRunner:
    def __init__(self):
        self.q = queue.Queue()
        self.boxes = [1, 2, 3, 4, 5, 6, 7]  ## `7` elements of list
        self.threads = 5
        for i in self.boxes:
            self.q.put(i)  ### ADDING every element of the list to the queue
        for j in range(self.threads):  ### for j in range(5) threads
            t = threading.Thread(target=self.ProcessQueue)
            t.start()  ### Started `5` threads on `7` elements

    def ProcessQueue(self):
        while not self.q.empty():
            each_element = self.q.get()
            self.SleepFunction(each_element)
            self.q.task_done()

    def SleepFunction(self, each_element):
        print("STARTING : ", each_element)
        time.sleep(10)
        print("DONE : ", each_element)

lets_try = MazeRunner()
if lets_try == True:
    print("All Threads Done on Elements")

You need to wait until all threads are done by calling Thread.join:
HOWTO:
Replace your self.threads = 5 expression with a class constant:
THREAD_NUM = 5
Add an attribute threads (a list of the created threads) in your __init__ method:
...
self.threads = []
Append each created thread to the threads list:
for j in range(self.THREAD_NUM):
    t = threading.Thread(target=self.ProcessQueue)
    self.threads.append(t)
    t.start()
Define a method like check_completed to ensure all threads have terminated (are done):
...
def check_completed(self):
    for t in self.threads:
        t.join()
    return True
The way you check "all done":
m_runner = MazeRunner()
if m_runner.check_completed():
    print("All Threads Done on Elements")


MultiThreading with Python

This is a producer-consumer problem. I need a single producer and multiple consumers to access a shared data cell, and each consumer needs to access the produced data before the producer makes additional data.
The code works fine when there is a single consumer. I have attempted to make a list of the producer and consumers in order to .start() and .join() them. The program works as far as the first consumer, but hangs when it gets to the second consumer. I have tried changing the locking mechanism from notify to notifyAll in getData and setData. I am a beginner in Python and this stuff is pretty foreign to me, but I have been trying things for 10 hours and would really appreciate some help.
import time, random
from threading import Thread, currentThread, Condition

class SharedCell(object):
    def __init__(self):
        self.data = -1
        self.writeable = True
        self.condition = Condition()

    def setData(self, data):
        self.condition.acquire()
        while not self.writeable:
            self.condition.wait()
        print("%s setting data to %d" %
              (currentThread().getName(), data))
        self.data = data
        self.writeable = False
        self.condition.notifyAll()
        self.condition.release()

    def getData(self):
        self.condition.acquire()
        while self.writeable:
            self.condition.wait()
        print(f'accessing data {currentThread().getName()} {self.data}')
        self.writeable = True
        self.condition.notifyAll()
        self.condition.release()
        return self.data

class Producer(Thread):
    def __init__(self, cell, accessCount, sleepMax):
        Thread.__init__(self, name="Producer")
        self.accessCount = accessCount
        self.cell = cell
        self.sleepMax = sleepMax

    def run(self):
        print("%s starting up" % self.getName())
        for count in range(self.accessCount):
            time.sleep(random.randint(1, self.sleepMax))
            self.cell.setData(count + 1)
        print("%s is done producing\n" % self.getName())

class Consumer(Thread):
    def __init__(self, cell, accessCount, sleepMax):
        Thread.__init__(self)
        self.accessCount = accessCount
        self.cell = cell
        self.sleepMax = sleepMax

    def run(self):
        print("%s starting up" % self.getName())
        for count in range(self.accessCount):
            time.sleep(random.randint(1, self.sleepMax))
            value = self.cell.getData()
        print("%s is done consuming\n" % self.getName())

def main():
    accessCount = int(input("Enter the number of accesses: "))
    sleepMax = 4
    cell = SharedCell()
    producer = Producer(cell, accessCount, sleepMax)
    consumer = Consumer(cell, accessCount, sleepMax)
    consumerTwo = Consumer(cell, accessCount, sleepMax)
    threads = []
    threads.append(producer)
    threads.append(consumer)
    threads.append(consumerTwo)
    print("Starting the threads")
    for thread in threads:
        thread.start()
        thread.join()

main()
The join function blocks the current thread and waits until the indicated thread terminates. In your loop at the end of your main function, why do you join each thread immediately after starting it? That results in starting thread 1 and waiting for it to terminate before starting thread 2, then waiting for thread 2 to terminate before starting thread 3, and so on.
Perhaps you meant something like this:
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
so that every thread is started before you wait for them to terminate.
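One more thing worth checking: even with the loops fixed, this particular program only terminates when production and consumption counts are balanced. The producer calls setData accessCount times, while the two consumers together attempt 2 × accessCount calls to getData, so the last consumer can still block forever in wait(). A sketch of one way to balance the counts (illustrative; variable names follow the question's code):
num_consumers = 2
# The producer must write as many items as all consumers together will read.
producer = Producer(cell, accessCount * num_consumers, sleepMax)
consumers = [Consumer(cell, accessCount, sleepMax) for _ in range(num_consumers)]

threads = [producer] + consumers
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()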

Main thread hanging at queue join because q.get() on empty queue does not return

I have a request manager that builds a queue and starts x worker threads (x currently == 1).
Each thread loops, getting elements from the queue and appending the results to a shared list.
If the queue is exhausted, the queue.Empty exception is caught, the current job is marked as done, and the thread should exit. This does work.
The block below, at the end of run(), however seems to break things. The queue has an arbitrary length, and it may occur that the queue is longer than the number of results actually fetchable. In order to exit all threads early, a thread checks whether the result it got has len == 0. If so, the thread clears the queue of all items left, marks itself as done, and exits.
if len(request_result) == 0:
    with self.q.mutex:
        self.q.queue.clear()
    self.q.task_done()
    return
My assumption was that every thread would then finish its current job and exit.
However, the execution of the main thread hangs at q.join() and I can't debug why. From the debugger it looks like the worker thread is not terminating, but that's just guessing.
I've read: Threading queue hangs in Python
but that does not solve the problem. I did, however, set q.unfinished_tasks to 0 manually, but that is not thread safe and will cause the program to crash when threads call task_done() after another thread has just set q.unfinished_tasks to 0.
class RequestManager:
    def __init__(self, config=None):
        self.config = config

    def request_all_heroes(self):
        q = queue.Queue()
        result_list = []
        # todo: get range max from highest hero ID.
        for skip in [x * 100 for x in range(1, 3)]:
            q.put_nowait(skip)
        for _ in range(int(self.config["meta"]["number_of_threads"])):
            RequestWorker(q=q,
                          config=self.config,
                          query_name='all_heroes',
                          shared_result_list=result_list).start()
        q.join()
        return [Hero(item) for sublist in result_list for item in sublist]


class RequestWorker(threading.Thread):
    def __init__(self,
                 q=None,
                 config=None,
                 query_name="",
                 shared_result_list=None, *args, **kwargs):
        self.q = q
        self.config = config
        self.query_file_path = self.config["files"][query_name]
        self.shared_result_list = shared_result_list
        super().__init__(*args, **kwargs)

    def run(self):
        keep_running = True
        while keep_running:
            try:
                skip_number = self.q.get()
            except queue.Empty:
                self.q.task_done()
                return
            sr = SpecificRequest(config=self.config, skip=skip_number,
                                 query_file_path=self.query_file_path)
            request_result = sr.do_specific_request()
            if len(request_result) == 0:
                with self.q.mutex:
                    self.q.queue.clear()
                self.q.task_done()
                return
            self.shared_result_list.append(request_result)
            self.q.task_done()
EDIT 1
if not self.q.empty():
    skip_number = self.q.get()
else:
    return
This works; unfortunately it is plainly wrong, because get is called after the check whether the queue is empty. This will cause problems at some point, because a thread can check and see an element in the queue while another thread snatches that last element in the meantime. Unlikely, but possible.
This question is now about why self.q.get() does not return.
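Two details of queue.Queue explain the hang. First, q.get() blocks by default on an empty queue and never raises queue.Empty; that exception is only raised by get_nowait() or get() with block=False or a timeout, so the except branch in run() is unreachable and the worker sleeps forever once the queue is drained. Second, q.join() returns only after task_done() has been called once per put() item, and clearing q.queue directly removes items without decrementing that counter. A sketch of a run() loop addressing both points (the rest of the class unchanged):
def run(self):
    while True:
        try:
            # Blocks for at most one second; raises queue.Empty on timeout.
            # This also closes the check-then-get race from EDIT 1.
            skip_number = self.q.get(timeout=1)
        except queue.Empty:
            return
        sr = SpecificRequest(config=self.config, skip=skip_number,
                             query_file_path=self.query_file_path)
        request_result = sr.do_specific_request()
        if len(request_result) == 0:
            self.q.task_done()  # account for the item we just got
            # Drain leftovers pairwise with task_done() so q.join() can reach zero.
            while True:
                try:
                    self.q.get_nowait()
                    self.q.task_done()
                except queue.Empty:
                    return
        self.shared_result_list.append(request_result)
        self.q.task_done()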

Python Looping until max number is reached

My tool stops randomly, and it seems like all the threads are 'ghosts'.
How it works:
The tool loops until the maximum number of threads allowed at the same time is running, in this case 20. When a thread finishes, the next one is started.
Problem:
After about an hour of this, the tool is stuck at 20 running threads but nothing happens anymore.
Thanks in advance, everyone!
maxthreadcount = 20

while True:
    if threading.active_count() < maxthreadcount:
        threading.Thread(target=Dealer).start()
Dealer:
def Dealer():
    print("thread started")
    return
You need to terminate previously created threads after their job (the print command in this case) is done.
Take a look at this example from this article:
import time
from threading import Thread

class CountdownTask:
    def __init__(self):
        self._running = True

    def terminate(self):
        self._running = False

    def run(self, n):
        while self._running and n > 0:
            print('T-minus', n)
            n -= 1
            time.sleep(5)

c = CountdownTask()
t = Thread(target=c.run, args=(10,))
t.start()
...
# Signal termination
c.terminate()
# Wait for actual termination (if needed)
t.join()
I think you should call self.terminate() after doing the print. Something like below:
class Dealer():
    def __init__(self):
        self._running = True

    def run(self):
        print("thread started")
        return self.terminate()

    def terminate(self):
        self._running = False
Edit
I also believe you can make use of Python's thread pools here. Instead of spawning threads yourself, you can reuse threads for new tasks after their assigned task is over.
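A minimal sketch of that idea with concurrent.futures.ThreadPoolExecutor (the job count here is made up for illustration): the pool caps concurrency at 20 and reuses worker threads instead of spawning a new one per task:
from concurrent.futures import ThreadPoolExecutor

def dealer():
    print("thread started")
    # ... real work goes here ...

with ThreadPoolExecutor(max_workers=20) as pool:
    # At most 20 dealer() calls run at once; finished workers
    # pick up the next queued task instead of dying.
    for _ in range(1000):  # illustrative number of jobs
        pool.submit(dealer)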

Delete Objects in a List as Passed to Multiprocessing

I need to pass each object in a large list to a function. After the function completes I no longer need the object passed to the function and would like to delete the object to save memory. If I were working with a single process I would do the following:
result = []
while len(mylist) > 0:
    result.append(myfunc(mylist.pop()))
As I loop over mylist, I pop each object off the list so that it is no longer stored in mylist after it has been passed to my function. How do I achieve this same effect in parallel using multiprocessing?
A simple consumer example (credits go here):
import multiprocessing
import time
import random

class Consumer(multiprocessing.Process):
    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        while True:
            task = self.task_queue.get()
            if task is None:
                # Poison pill means shutdown
                self.task_queue.task_done()
                break
            answer = task.process()
            self.task_queue.task_done()
            self.result_queue.put(answer)
        return

class Task(object):
    def process(self):
        time.sleep(0.1)  # pretend to take some time to do the work
        return random.randint(0, 100)

if __name__ == '__main__':
    # Establish communication queues
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()

    # Start consumers
    num_consumers = multiprocessing.cpu_count() * 2
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]
    for consumer in consumers:
        consumer.start()

    # Enqueue jobs
    num_jobs = 10
    for _ in range(num_jobs):
        tasks.put(Task())

    # Add a poison pill for each consumer
    for _ in range(num_consumers):
        tasks.put(None)

    # Wait for all tasks to finish
    tasks.join()

    # Start printing results
    while num_jobs:
        result = results.get()
        print('Result:', result)
        num_jobs -= 1
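Applied to the original question, the same pop-as-you-go effect can also be had with multiprocessing.Pool and a generator that drains the list (a sketch; myfunc here is a stand-in for the real function). Note that the pool's internal feeder thread may pop items faster than workers finish them, so the list drains as fast as tasks can be queued, not strictly one at a time:
import multiprocessing

def myfunc(obj):
    return obj * 2  # stand-in for the real per-object work

def drain(lst):
    # Pop items off the list as they are dispatched, so the parent
    # drops its reference to each object after handing it out.
    while lst:
        yield lst.pop()

if __name__ == '__main__':
    mylist = [1, 2, 3, 4, 5]
    with multiprocessing.Pool() as pool:
        result = list(pool.imap(myfunc, drain(mylist)))
    print(result)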

Communicating and comparing between objects in python-multiprocessing?

I am trying to find a way to compare different objects (inherited from the Thread class) in a way that keeps parallelism (real-time processing).
Every worker has three fields (message, count, n). I update count every time. Let's say I have three worker threads. My server needs to compare the workers based on the count field. How can I access and compare Worker.count of every worker while keeping parallelism?
from queue import Queue
from threading import Thread
import time

class Worker(Thread):
    def __init__(self, message, n):
        Thread.__init__(self)
        self.message = message
        self.count = 0
        self.n = n

    def run(self):
        while True:
            print(self.message)
            self.count += 1
            time.sleep(self.n)

class Comparator(Thread):
    def __init__(self, message, n):
        Thread.__init__(self)
        self.message = message
        self.n = n

    def run(self):
        while True:
            max_count = max([x.count for x in threads])  # how can I access the other threads?
            print("max", max_count)
            time.sleep(self.n)

thread1 = Worker("Test-1", 1)
thread2 = Worker("Test-2", 3)
s = Comparator("Test-3", 2)
s.start()
s.join()

threads = [thread1, thread2]
for g in threads:
    g.start()
for worker in threads:
    # wait for workers
    worker.join()
NOTE: Using a shared object is not a good solution for me; using a Queue(), for example, is not what I want. I need to do the comparison based on a field in the objects that I update on the go (for simplicity, I use max()).
You can pass the threads list to the Comparator __init__() method:
[...]
class Comparator(Thread):
    def __init__(self, message, n, threads):
        Thread.__init__(self)
        self.message = message
        self.n = n
        self.threads = threads

    def run(self):
        while True:
            max_count = max([x.count for x in self.threads])
            print("max", max_count)
            time.sleep(self.n)
[...]
threads = [thread1, thread2]
s = Comparator("Test-3", 2, threads)
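Putting the pieces together, here is one complete runnable version of that idea (a sketch: the daemon flags and the ten-second demo cutoff are additions for illustration, so the workers' infinite loops do not keep the process alive forever):
import time
from threading import Thread

class Worker(Thread):
    def __init__(self, message, n):
        Thread.__init__(self, daemon=True)
        self.message = message
        self.count = 0
        self.n = n

    def run(self):
        while True:
            print(self.message)
            self.count += 1
            time.sleep(self.n)

class Comparator(Thread):
    def __init__(self, message, n, threads):
        Thread.__init__(self, daemon=True)
        self.message = message
        self.n = n
        self.threads = threads  # references to the workers to compare

    def run(self):
        while True:
            max_count = max(worker.count for worker in self.threads)
            print("max", max_count)
            time.sleep(self.n)

workers = [Worker("Test-1", 1), Worker("Test-2", 3)]
comparator = Comparator("Test-3", 2, workers)

for w in workers:
    w.start()
comparator.start()

time.sleep(10)  # let the demo run briefly; daemon threads exit with the main thread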
