Python dynamic MultiThread with Queue - Class - python

I struggled for a while to implement a proper dynamic multi-threading system. The idea is to spin up multiple pools of sub-threads from the main thread (each pool has its own number of threads and its own queue size) to run functions, and to let the user decide whether the main thread should wait for the sub-threads to finish or move on to the next line right after starting them. This multi-threading logic helps to extract data in parallel and at a high frequency.
The solution to my issue is shared below for everyone who wants it. If you have any doubts or questions, please let me know.

# -*- coding: utf-8 -*-
"""
Created on Mon Jul 5 00:00:51 2021
@author: Tahasanul Abraham
"""
#%% Initialization of Libraries
import sys, os, inspect
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0,parentdir)
parentdir_1up = os.path.dirname(parentdir)
sys.path.insert(0,parentdir_1up)
from queue import Queue
from threading import Thread, Lock

class Worker(Thread):
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.lock = Lock()
        self.start()

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                # A plain string "terminate" is the signal to stop this worker
                if func.lower() == "terminate":
                    # Mark the signal as processed so the queue's unfinished count returns to zero
                    self.tasks.task_done()
                    break
            except:
                # func is a callable (it has no .lower()), so run it
                try:
                    with self.lock:
                        func(*args, **kargs)
                except Exception as exception:
                    print(exception)
            self.tasks.task_done()

class ThreadPool:
    def __init__(self, num_threads, num_queue=None):
        if num_queue is None or num_queue < num_threads:
            num_queue = num_threads
        self.tasks = Queue(num_queue)
        self.threads = num_threads
        for _ in range(num_threads): Worker(self.tasks)

    # This function can be called to terminate all the worker threads of the queue
    def terminate(self):
        self.wait_completion()
        for _ in range(self.threads): self.add_task("terminate")
        return None

    # This function can be called to add new work to the queue
    def add_task(self, func, *args, **kargs):
        self.tasks.put((func, args, kargs))

    # This function can be called to wait until all the workers are done processing the pending work. If it is called, the main thread will not process any new lines until all the workers are done with the pending work.
    def wait_completion(self):
        self.tasks.join()

    # This function can be called to check if there is any pending/running work in the queue. It returns True if any work is pending, otherwise False.
    def is_alive(self):
        if self.tasks.unfinished_tasks == 0:
            return False
        else:
            return True
#%% Standalone Run
if __name__ == "__main__":
    import time

    def test_return(x,d):
        print (str(x) + " - pool completed")
        d[str(x)] = x
        time.sleep(5)

    # 2 threads and a FIFO queue of size 1000000000
    pool = ThreadPool(2,1000000000)
    r ={}
    for i in range(10):
        pool.add_task(test_return, i, r)
        print (str(i) + " - pool added")
    print ("Waiting for completion")
    pool.wait_completion()
    print ("pool done")

    # 1 thread and a FIFO queue of size 2
    pool = ThreadPool(1,2)
    r ={}
    for i in range(10):
        pool.add_task(test_return, i, r)
        print (str(i) + " - pool added")
    print ("Waiting for completion")
    pool.wait_completion()
    print ("pool done")

    # 2 threads and a requested FIFO queue size of 1 (raised to 2 by ThreadPool)
    pool = ThreadPool(2,1)
    r ={}
    for i in range(10):
        pool.add_task(test_return, i, r)
        print (str(i) + " - pool added")
    print ("Waiting for completion")
    pool.wait_completion()
    print ("pool done")
Making a new Pool
Using the above classes, one can make a pool of their own choice with the desired number of parallel threads and queue size. For example, to create a pool of 10 threads with a queue size of 200:
pool = ThreadPool(10,200)
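The queue size matters because add_task puts work on a bounded queue.Queue: once the queue is full, the next put blocks the caller until a worker takes an item off. The following is a minimal sketch of that limit using only the standard library. The helper name fill_queue is made up for illustration, and it uses put_nowait so that a full queue raises queue.Full instead of blocking:

```python
from queue import Queue, Full


def fill_queue(maxsize, n_items):
    """Try to enqueue n_items without blocking; return how many fit."""
    q = Queue(maxsize)
    accepted = 0
    for i in range(n_items):
        try:
            q.put_nowait(i)  # raises Full instead of blocking when the queue is full
            accepted += 1
        except Full:
            break
    return accepted


if __name__ == "__main__":
    # Nobody consumes from the queue here, so only maxsize items are accepted
    print(fill_queue(2, 10))
```

With the blocking put used by add_task, the same situation makes the main thread wait instead, which is what the 1-thread example with a queue size of 2 demonstrates.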
Adding work to Pool
Once a pool is created, pool.add_task can be used to run sub-routine work. In my example I used the pool to call a function with its arguments. For example, I called the test_return function with its arguments i and r.
pool.add_task(test_return, i, r)
Waiting for the pool to complete its work
If a pool is given some work to do, the user can either move on to other code lines or wait for the pool to finish its work before the next lines are read. To wait for the pool to finish the work and then return, a call to wait_completion is required. Example:
pool.wait_completion()
Terminate and close down the pool threads
Once the pool threads are no longer needed, it is possible to terminate and close them down to save memory and release the blocked threads. This can be done by calling the following function.
pool.terminate()
Checking if there are any pending works from the pool
There is a function that can be called to check if there is any pending/running work in the queue. If any work is pending, the call returns Boolean True, otherwise it returns Boolean False. To check whether the pool is working or not, call the following function.
pool.is_alive()
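For comparison, the standard library's concurrent.futures.ThreadPoolExecutor covers similar ground. The sketch below is not the class above and has no bounded task queue; it only maps the same three ideas (add work, wait for completion, check for pending work) onto that API. The names square and run_batch are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def square(x):
    return x * x


def run_batch(values, num_threads=2):
    """Submit one task per value, wait for all of them, and return the results."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(square, v) for v in values]  # roughly add_task
        wait(futures)                                       # roughly wait_completion
        pending = any(not f.done() for f in futures)        # roughly is_alive
        results = [f.result() for f in futures]
    # Leaving the with-block shuts the pool down, roughly terminate
    return results, pending


if __name__ == "__main__":
    print(run_batch(range(5)))
```

Unlike add_task, submit returns a Future, so return values and exceptions of each task can be retrieved through result().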

Related

Kill all "workers" on "listener" error (multiprocessing, manager and queue set-up)

I'm using multiprocessing to run workers on different files in parallel. The workers' results are put into a queue. A listener gets the results from the queue and writes them to a file.
Sometimes the listener might run into errors (of various origins). In this case, the listener silently dies, but all other processes continue running (rather surprisingly, worker errors cause all processes to terminate).
I would like to stop all processes (workers, listener, etc.) when the listener catches an error. How can this be done?
The scheme of my code is as follows:
import sys
import multiprocessing as mp

def worker(file_path, q):
    ## do something
    q.put(1.)
    return True

def listener(q):
    while True:
        m = q.get()
        if m == 'kill':
            break
        else:
            try:
                # do something and write to file
                pass
            except Exception as err:
                # raise error
                tb = sys.exc_info()[2]
                raise err.with_traceback(tb)

def main():
    manager = mp.Manager()
    q = manager.Queue(maxsize=3)
    with mp.Pool(5) as pool:
        watcher = pool.apply_async(listener, (q,))
        files = ['path_1','path_2','path_3']
        jobs = [ pool.apply_async(worker, (p,q,)) for p in files ]
        # fire off workers
        for job in jobs:
            job.get()
        # kill the listener when done
        q.put('kill')

# run
if __name__ == "__main__":
    main()
I tried introducing event = manager.Event() and using it as a flag in main():
## inside the pool, after starting workers
while True:
    if event.is_set():
        for job in jobs:
            job.terminate()
No success. Calling os._exit(1) in the listener's exception block raises a broken pipe error, but the processes are not killed.
I also tried setting daemon = True,
for job in jobs:
    job.daemon = True
Did not help.
In fact, to handle listener exceptions, I'm using a callable, as required by apply_async (so that they are not entirely silenced). This complicates the situation, but not much.
Thank you in advance.
As always, there are many ways to accomplish what you're after, but I would probably suggest using an Event to signal that the processes should quit. I also would not use a Pool in this instance, as it only really simplifies things for simple cases where you need something like map. More complicated use cases quickly make it easier to just build your own "pool" with the functionality you need.
from multiprocessing import Process, Queue, Event
from random import random

def might_fail(a):
    assert(a > .001)

def worker(args_q: Queue, result_q: Queue, do_quit: Event):
    try:
        while not do_quit.is_set():
            args = args_q.get()
            if args is None:
                break
            else:
                # do something
                result_q.put(random())
    finally: #signal that worker is exiting even if exception is raised
        result_q.put(None) #signal listener that worker is exiting

def listener(result_q: Queue, do_quit: Event, n_workers: int):
    n_completed = 0
    while n_workers > 0:
        res = result_q.get()
        if res is None:
            n_workers -= 1
        else:
            n_completed += 1
            try:
                might_fail(res)
            except:
                do_quit.set() #let main continue
                print(n_completed)
                raise #reraise error after we signal others to stop
    do_quit.set() #let main continue
    print(n_completed)

if __name__ == "__main__":
    args_q = Queue()
    result_q = Queue()
    do_quit = Event()
    n_workers = 4
    listener_p = Process(target=listener, args=(result_q, do_quit, n_workers))
    listener_p.start()
    for _ in range(n_workers):
        worker_p = Process(target=worker, args=(args_q, result_q, do_quit))
        worker_p.start()
    for _ in range(1000):
        args_q.put("some/file.txt")
    for _ in range(n_workers):
        args_q.put(None)
    do_quit.wait()
    print('done')

Python 3 Limit count of active threads (finished threads do not quit)

I want to limit the number of active threads. What I have seen is that a finished thread stays alive and does not exit by itself, so the number of active threads keeps growing until an error occurs.
The following code runs only 8 threads at a time, but they stay alive even after they have finished, so the number keeps growing:
import threading
import time

class ThreadEx(threading.Thread):
    __thread_limiter = None
    __max_threads = 2

    @classmethod
    def max_threads(cls, thread_max):
        ThreadEx.__max_threads = thread_max
        ThreadEx.__thread_limiter = threading.BoundedSemaphore(value=ThreadEx.__max_threads)

    def __init__(self, target=None, args:tuple=()):
        super().__init__(target=target, args=args)
        if not ThreadEx.__thread_limiter:
            ThreadEx.__thread_limiter = threading.BoundedSemaphore(value=ThreadEx.__max_threads)

    def run(self):
        ThreadEx.__thread_limiter.acquire()
        try:
            #success = self._target(*self._args)
            #if success: return True
            super().run()
        except:
            pass
        finally:
            ThreadEx.__thread_limiter.release()

def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)

ThreadEx.max_threads(8)
for i in range(0, 99):
    t = ThreadEx(target=call_me, args=("Thread count: ", str(threading.active_count())))
    t.start()
Due to the for loop, the number of threads keeps growing to 99.
I know that a thread has done its work because call_me has been executed and threading.active_count() was printed.
Does somebody know how I can make sure that a finished thread does not stay alive?
This may be a silly answer, but to me it looks like you are trying to reinvent ThreadPool.
from multiprocessing.pool import ThreadPool
from time import sleep

p = ThreadPool(8)

def call_me(test1):
    print(test1)
    sleep(1)

for i in range(0, 99):
    p.apply_async(call_me, args=(i,))
p.close()
p.join()
This ensures that at most 8 threads run your function at any point in time. If you want more performance for CPU-bound work, you can import Pool from multiprocessing and use that instead. The interface is the same, but the pool workers are now subprocesses instead of threads, which usually gives a performance boost because the GIL does not get in the way.
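If a pool is not an option, the semaphore idea from the question can also be made to work. In the question's code every thread is started immediately and only then waits for the semaphore inside run(), so all 99 threads count as alive while they wait. A minimal sketch of the alternative, assuming it is acceptable for the starting loop itself to block: acquire the slot before the thread is created and release it when the target returns. The name run_limited is hypothetical.

```python
import threading
import time


def run_limited(target, arg_list, max_threads=8):
    """Run target(*args) for each args tuple, never more than max_threads at once.

    Returns the highest number of targets observed running at the same time.
    """
    limiter = threading.BoundedSemaphore(max_threads)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def wrapper(*args):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        try:
            target(*args)
        finally:
            with lock:
                state["running"] -= 1
            limiter.release()  # free the slot for the next thread

    threads = []
    for args in arg_list:
        limiter.acquire()  # blocks the caller until a slot is free
        t = threading.Thread(target=wrapper, args=args)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    peak = run_limited(lambda i: time.sleep(0.05), [(i,) for i in range(20)], max_threads=4)
    print("peak concurrency:", peak)
```

Because the slot is taken before start(), a new thread only exists once an earlier one has finished its work, so the number of live threads stays close to max_threads.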
I have changed the class according to the help of Hannu.
I post it for reference, maybe it's useful for others that come across this post:
import threading
from multiprocessing.pool import ThreadPool
import time

class MultiThread():
    __thread_pool = None

    @classmethod
    def begin(cls, max_threads):
        MultiThread.__thread_pool = ThreadPool(max_threads)

    @classmethod
    def end(cls):
        MultiThread.__thread_pool.close()
        MultiThread.__thread_pool.join()

    def __init__(self, target=None, args:tuple=()):
        self.__target = target
        self.__args = args

    def run(self):
        try:
            result = MultiThread.__thread_pool.apply_async(self.__target, args=self.__args)
            return result.get()
        except:
            pass

def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)
    return 0

MultiThread.begin(8)
for i in range(0, 99):
    t = MultiThread(target=call_me, args=("Thread count: ", str(threading.active_count())))
    t.run()
MultiThread.end()
The maximum of threads is 8 at any given time determined by the method begin.
And also the method run returns the result of your passed function if it returns something.
Hope that helps.
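One caveat with the class above: run() calls result.get() immediately after apply_async, so each call waits for its own task and the loop effectively runs one task at a time. A minimal sketch of the usual way around this, assuming the results are only needed after everything has been submitted: queue all tasks first and collect the results afterwards. The names slow_double and run_all are hypothetical.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_double(x):
    time.sleep(0.2)  # stand-in for real work
    return 2 * x


def run_all(values, max_threads=8):
    """Submit every task before waiting on any of them."""
    with ThreadPool(max_threads) as pool:
        pending = [pool.apply_async(slow_double, args=(v,)) for v in values]
        # get() blocks per result, but all tasks are already running in the pool
        return [r.get() for r in pending]


if __name__ == "__main__":
    start = time.perf_counter()
    print(run_all(range(8)))
    print("elapsed: %.2f s" % (time.perf_counter() - start))
```

With 8 threads and 8 tasks, the tasks sleep concurrently, so the total time is close to a single task rather than the sum of all of them.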

Python Multiprocessing - terminate / restart worker process

I have a bunch of long running processes that I would like to split up into multiple processes. That part I can do no problem. The issue I run into is sometimes these processes go into a hung state. To address this issue I would like to be able to set a time threshold for each task that a process is working on. When that time threshold is exceeded I would like to restart or terminate the task.
Originally my code was very simple and used a process pool. However, with the pool I could not figure out how to retrieve the processes inside it, never mind how to restart or terminate one of them.
I have resorted to using a queue and process objects, as illustrated in this example (https://pymotw.com/2/multiprocessing/communication.html#passing-messages-to-processes), with some changes.
My attempts to figure this out are in the code below. In its current state the process does not actually get terminated. Further to that I cannot figure out how to get the process to move onto the next task after the current task is terminated. Any suggestions / help appreciated, perhaps I’m going about this the wrong way.
Thanks
import multiprocess
import time

class Consumer(multiprocess.Process):
    def __init__(self, task_queue, result_queue, startTimes, name=None):
        multiprocess.Process.__init__(self)
        if name:
            self.name = name
        print 'created process: {0}'.format(self.name)
        self.task_queue = task_queue
        self.result_queue = result_queue
        self.startTimes = startTimes

    def stopProcess(self):
        elapseTime = time.time() - self.startTimes[self.name]
        print 'killing process {0} {1}'.format(self.name, elapseTime)
        self.task_queue.cancel_join_thread()
        self.terminate()
        # now want to get the process to start procesing another job

    def run(self):
        '''
        The process subclass calls this on a separate process.
        '''
        proc_name = self.name
        print proc_name
        while True:
            # pulling the next task off the queue and starting it
            # on the current process.
            task = self.task_queue.get()
            self.task_queue.cancel_join_thread()
            if task is None:
                # Poison pill means shutdown
                #print '%s: Exiting' % proc_name
                self.task_queue.task_done()
                break
            self.startTimes[proc_name] = time.time()
            answer = task()
            self.task_queue.task_done()
            self.result_queue.put(answer)
        return

class Task(object):
    def __init__(self, a, b, startTimes):
        self.a = a
        self.b = b
        self.startTimes = startTimes
        self.taskName = 'taskName_{0}_{1}'.format(self.a, self.b)

    def __call__(self):
        import time
        import os
        print 'new job in process pid:', os.getpid(), self.taskName
        if self.a == 2:
            time.sleep(20000) # simulate a hung process
        else:
            time.sleep(3) # pretend to take some time to do the work
        return '%s * %s = %s' % (self.a, self.b, self.a * self.b)

    def __str__(self):
        return '%s * %s' % (self.a, self.b)

if __name__ == '__main__':
    # Establish communication queues
    # tasks = this is the work queue and results is for results or completed work
    tasks = multiprocess.JoinableQueue()
    results = multiprocess.Queue()
    #parentPipe, childPipe = multiprocess.Pipe(duplex=True)
    mgr = multiprocess.Manager()
    startTimes = mgr.dict()
    # Start consumers
    numberOfProcesses = 4
    processObjs = []
    for processNumber in range(numberOfProcesses):
        processObj = Consumer(tasks, results, startTimes)
        processObjs.append(processObj)
    for process in processObjs:
        process.start()
    # Enqueue jobs
    num_jobs = 30
    for i in range(num_jobs):
        tasks.put(Task(i, i + 1, startTimes))
    # Add a poison pill for each process object
    for i in range(numberOfProcesses):
        tasks.put(None)
    # process monitor loop,
    killProcesses = {}
    executing = True
    while executing:
        allDead = True
        for process in processObjs:
            name = process.name
            #status = consumer.status.getStatusString()
            status = process.is_alive()
            pid = process.ident
            elapsedTime = 0
            if name in startTimes:
                elapsedTime = time.time() - startTimes[name]
            if elapsedTime > 10:
                process.stopProcess()
            print "{0} - {1} - {2} - {3}".format(name, status, pid, elapsedTime)
            if allDead and status:
                allDead = False
        if allDead:
            executing = False
        time.sleep(3)
    # Wait for all of the tasks to finish
    #tasks.join()
    # Start printing results
    while num_jobs:
        result = results.get()
        print 'Result:', result
        num_jobs -= 1
I generally recommend against subclassing multiprocessing.Process, as it leads to code that is hard to read.
I'd rather encapsulate your logic in a function and run it in a separate process. This keeps the code much cleaner and more intuitive.
Nevertheless, rather than reinventing the wheel, I'd recommend using a library that already solves the issue for you, such as Pebble or billiard.
For example, the Pebble library makes it easy to set timeouts on processes running independently or within a Pool.
Running your function within a separate process with a timeout:
from pebble import concurrent
from concurrent.futures import TimeoutError

@concurrent.process(timeout=10)
def function(foo, bar=0):
    return foo + bar

future = function(1, bar=2)
try:
    result = future.result() # blocks until results are ready
except TimeoutError as error:
    print("Function took longer than %d seconds" % error.args[1])
Same example but with a process Pool.
from pebble import ProcessPool
from concurrent.futures import TimeoutError

with ProcessPool(max_workers=5, max_tasks=10) as pool:
    future = pool.schedule(function, args=[1], timeout=10)
    try:
        result = future.result() # blocks until results are ready
    except TimeoutError as error:
        print("Function took longer than %d seconds" % error.args[1])
In both cases, the process that times out will be terminated automatically for you.
A much simpler solution than reimplementing the Pool is to keep using it and design a mechanism that times out the function you are running.
For instance:
from time import sleep
import signal

class TimeoutError(Exception):
    pass

def handler(signum, frame):
    raise TimeoutError()

def run_with_timeout(func, *args, timeout=10, **kwargs):
    signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout)
    try:
        res = func(*args, **kwargs)
    except TimeoutError as exc:
        print("Timeout")
        res = exc
    finally:
        signal.alarm(0)
    return res

def test():
    sleep(4)
    print("ok")

if __name__ == "__main__":
    import multiprocessing as mp
    p = mp.Pool()
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout":1}).get())
signal.alarm sets a timeout. When it expires, the handler runs and raises TimeoutError, which stops the execution of your function.
EDIT: If you are using a Windows system, it is a bit more complicated, as signal does not implement SIGALRM there. Another solution is to use the C-level Python API. This code has been adapted from this SO answer, with small changes to work on 64-bit systems. I have only tested it on Linux, but it should work the same on Windows.
import threading
import ctypes
from time import sleep

class TimeoutError(Exception):
    pass

def run_with_timeout(func, *args, timeout=10, **kwargs):
    interupt_tid = int(threading.get_ident())

    def interupt_thread():
        # Call the low level C python api using ctypes. tid must be converted
        # to c_long to be valid.
        res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(interupt_tid), ctypes.py_object(TimeoutError))
        if res == 0:
            print(threading.enumerate())
            print(interupt_tid)
            raise ValueError("invalid thread id")
        elif res != 1:
            # "if it returns a number greater than one, you're in trouble,
            # and you should call it again with exc=NULL to revert the effect"
            ctypes.pythonapi.PyThreadState_SetAsyncExc(
                ctypes.c_long(interupt_tid), 0)
            raise SystemError("PyThreadState_SetAsyncExc failed")

    timer = threading.Timer(timeout, interupt_thread)
    try:
        timer.start()
        res = func(*args, **kwargs)
    except TimeoutError as exc:
        print("Timeout")
        res = exc
    else:
        timer.cancel()
    return res

def test():
    sleep(4)
    print("ok")

if __name__ == "__main__":
    import multiprocessing as mp
    p = mp.Pool()
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout": 1}).get())
    print(p.apply_async(run_with_timeout, args=(test,),
                        kwds={"timeout": 5}).get())
For long running processes and/or long iterators, spawned workers might hang after some time. To prevent this, there are two built-in techniques:
Restart workers after they have delivered maxtasksperchild tasks from the queue.
Pass timeout to pool.imap.next(), catch the TimeoutError, and finish the rest of the work in another pool.
The following wrapper implements both, as a generator. This also works when replacing stdlib multiprocessing with multiprocess.
import multiprocessing as mp

def imap(
    func,
    iterable,
    *,
    processes=None,
    maxtasksperchild=42,
    timeout=42,
    initializer=None,
    initargs=(),
    context=mp.get_context("spawn")
):
    """Multiprocessing imap, restarting workers after maxtasksperchild tasks to avoid zombies.

    Example:
        >>> list(imap(str, range(5)))
        ['0', '1', '2', '3', '4']

    Raises:
        mp.TimeoutError: if the next result cannot be returned within timeout seconds.

    Yields:
        Ordered results as they come in.
    """
    with context.Pool(
        processes=processes,
        maxtasksperchild=maxtasksperchild,
        initializer=initializer,
        initargs=initargs,
    ) as pool:
        it = pool.imap(func, iterable)
        while True:
            try:
                yield it.next(timeout)
            except StopIteration:
                return
To catch the TimeoutError:
>>> import time
>>> iterable = list(range(10))
>>> results = []
>>> try:
...     for i, result in enumerate(imap(time.sleep, iterable, processes=2, timeout=2)):
...         results.append(result)
... except mp.TimeoutError:
...     print("Failed to process the following subset of iterable:", iterable[i:])
Failed to process the following subset of iterable: [2, 3, 4, 5, 6, 7, 8, 9]

Thread Getting Stuck At Join

I'm running a thread pool that is giving a random bug. Sometimes it works, sometimes it gets stuck at the pool.join part of this code. I've been at this several days, yet cannot find any difference between when it works or when it gets stuck. Please help...
Here's the code...
def run_thread_pool(functions_list):
    # Make the Pool of workers
    pool = ThreadPool() # left blank to default to machine number of cores
    pool.map(run_function, functions_list)
    # close the pool and wait for the work to finish
    pool.close()
    pool.join()
    return
Similarly, this code is also randomly getting stuck at q.join():
def run_queue_block(methods_list, max_num_of_workers=20):
    from views.console_output_handler import add_to_console_queue
    '''
    Runs methods on threads. Stores method returns in a list. Then outputs that list
    after all methods in the list have been completed.
    :param methods_list: example ((method name, args), (method_2, args), (method_3, args)
    :param max_num_of_workers: The number of threads to use in the block.
    :return: The full list of returns from each method.
    '''
    method_returns = []
    log = StandardLogger(logger_name='run_queue_block')
    # lock to serialize console output
    lock = threading.Lock()

    def _output(item):
        # Make sure the whole print completes or threads can mix up output in one line.
        with lock:
            if item:
                add_to_console_queue(item)
            msg = threading.current_thread().name, item
            log.log_debug(msg)
        return

    # The worker thread pulls an item from the queue and processes it
    def _worker():
        log = StandardLogger(logger_name='_worker')
        while True:
            try:
                method, args = q.get() # Extract and unpack callable and arguments
            except:
                # we've hit a nonetype object.
                break
            if method is None:
                break
            item = method(*args) # Call callable with provided args and store result
            method_returns.append(item)
            _output(item)
            q.task_done()

    num_of_jobs = len(methods_list)
    if num_of_jobs < max_num_of_workers:
        max_num_of_workers = num_of_jobs
    # Create the queue and thread pool.
    q = Queue()
    threads = []
    # starts worker threads.
    for i in range(max_num_of_workers):
        t = threading.Thread(target=_worker)
        t.daemon = True # thread dies when main thread (only non-daemon thread) exits.
        t.start()
        threads.append(t)
    for method in methods_list:
        q.put(method)
    # block until all tasks are done
    q.join()
    # stop workers
    for i in range(max_num_of_workers):
        q.put(None)
    for t in threads:
        t.join()
    return method_returns
I never know when it's going to work. It works most of the time, but most of the time is not good enough. What might possibly cause a bug like this?
You have to call shutdown on the concurrent.futures.ThreadPoolExecutor object. Then return the result of pool.map.
from concurrent.futures import ThreadPoolExecutor

def run_thread_pool(functions_list):
    # Make the Pool of workers
    pool = ThreadPoolExecutor() # left blank to use the default number of workers
    result = list(pool.map(run_function, functions_list))
    # close the pool and wait for the work to finish
    pool.shutdown()
    return result
I've simplified your code without a Queue object and daemon Thread. Check if it fits your requirement.
def run_queue_block(methods_list):
    from views.console_output_handler import add_to_console_queue
    '''
    Runs methods on threads. Stores method returns in a list. Then outputs that list
    after all methods in the list have been completed.
    :param methods_list: example ((method name, args), (method_2, args), (method_3, args)
    :return: The full list of returns from each method.
    '''
    method_returns = []
    log = StandardLogger(logger_name='run_queue_block')
    # lock to serialize console output
    lock = threading.Lock()

    def _output(item):
        # Make sure the whole print completes or threads can mix up output in one line.
        with lock:
            if item:
                add_to_console_queue(item)
            msg = threading.current_thread().name, item
            log.log_debug(msg)
        return

    # Each worker thread runs exactly one method and stores its result
    def _worker(method, *args, **kwargs):
        log = StandardLogger(logger_name='_worker')
        item = method(*args, **kwargs) # Call callable with provided args and store result
        with lock:
            method_returns.append(item)
        _output(item)

    threads = []
    # starts worker threads.
    for method, args in methods_list:
        # unpack args so that _worker receives them as separate positional arguments
        t = threading.Thread(target=_worker, args=(method, *args))
        t.start()
        threads.append(t)
    # wait for all workers to finish
    for t in threads:
        t.join()
    return method_returns
To allow your queue to join in your second example, you need to ensure that all tasks are removed from the queue.
So in your _worker function, mark tasks as done even if they could not be processed, otherwise the queue will never be emptied, and your program will hang.
def _worker():
    log = StandardLogger(logger_name='_worker')
    while True:
        try:
            method, args = q.get() # Extract and unpack callable and arguments
        except:
            # we've hit a nonetype object.
            q.task_done()
            break
        if method is None:
            q.task_done()
            break
        item = method(*args) # Call callable with provided args and store result
        method_returns.append(item)
        _output(item)
        q.task_done()
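A related case that the worker above does not cover is a task that raises: the thread then dies before reaching task_done(), and q.join() waits forever. A minimal sketch of one way to guard against that, assuming errors should be collected rather than re-raised, is to call task_done() in a finally block. The name run_queue and the errors list are hypothetical and not part of the question's code.

```python
import threading
from queue import Queue


def run_queue(methods_list, num_workers=4):
    """Run (method, args) pairs on worker threads; return (results, errors)."""
    q = Queue()
    results, errors = [], []
    lock = threading.Lock()

    def _worker():
        while True:
            task = q.get()
            try:
                if task is None:  # sentinel: stop this worker
                    return
                method, args = task
                value = method(*args)
                with lock:
                    results.append(value)
            except Exception as exc:
                with lock:
                    errors.append(exc)
            finally:
                q.task_done()  # always runs, even when the task raised

    threads = [threading.Thread(target=_worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in methods_list:
        q.put(task)
    q.join()  # returns once every task has been marked done
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return results, errors


if __name__ == "__main__":
    print(run_queue([(abs, (-1,)), (int, ("not a number",)), (abs, (-3,))], num_workers=2))
```

The second task in the demo raises ValueError, yet join() still returns because the finally block marks it as done.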

python can't start a new thread

I am building a multi threading application.
I have setup a threadPool.
[ A Queue of size N and N Workers that get data from the queue]
When all tasks are done I use
tasks.join()
where tasks is the queue.
The application seems to run smoothly until suddenly, at some point (after 20 minutes, for example), it terminates with the error
thread.error: can't start new thread
Any ideas?
Edit: The threads are daemon Threads and the code is like:
while True:
    t0 = time.time()
    keyword_statuses = DBSession.query(KeywordStatus).filter(KeywordStatus.status==0).options(joinedload(KeywordStatus.keyword)).with_lockmode("update").limit(100)
    if keyword_statuses.count() == 0:
        DBSession.commit()
        break
    for kw_status in keyword_statuses:
        kw_status.status = 1
    DBSession.commit()
    t0 = time.time()
    w = SWorker(threads_no=32, network_server='http://192.168.1.242:8180/', keywords=keyword_statuses, cities=cities, saver=MySqlRawSave(DBSession), loglevel='debug')
    w.work()
print 'finished'
When are the daemon threads killed?
When the application finishes or when work() finishes?
Look at the thread pool and the worker (it's from a recipe):
from Queue import Queue
from threading import Thread, Event, current_thread
import time

event = Event()

class Worker(Thread):
    """Thread executing tasks from a given tasks queue"""
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        '''Start processing tasks from the queue'''
        while True:
            event.wait()
            #time.sleep(0.1)
            try:
                func, args, callback = self.tasks.get()
            except Exception, e:
                print str(e)
                return
            else:
                if callback is None:
                    func(args)
                else:
                    callback(func(args))
            self.tasks.task_done()

class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads): Worker(self.tasks)

    def add_task(self, func, args=None, callback=None):
        '''Add a task to the queue'''
        self.tasks.put((func, args, callback))

    def wait_completion(self):
        '''Wait for completion of all the tasks in the queue'''
        self.tasks.join()

    def broadcast_block_event(self):
        '''blocks running threads'''
        event.clear()

    def broadcast_unblock_event(self):
        '''unblocks running threads'''
        event.set()

    def get_event(self):
        '''returns the event object'''
        return event
Also, maybe the problem is that I create SWorker objects in a loop?
What happens to the old SWorker (garbage collection?)?
There is still not enough code to localize the problem, but I'm fairly sure it happens because you don't reuse the threads and start too many of them. Have you seen the canonical example in the Python Queue documentation, http://docs.python.org/library/queue.html (bottom of the page)?
I can reproduce your problem with the following code:
import threading
import Queue

q = Queue.Queue()

def worker():
    item = q.get(block=True) # sleeps forever for now
    do_work(item)
    q.task_done()

# creates an unbounded number of worker threads and fails
# after some time with "error: can't start new thread"
while True:
    t = threading.Thread(target=worker)
    t.start()
q.join() # never reached
Instead, you must create a pool with a known number of threads and put your data on the queue, like:
q = Queue()

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()
for item in source():
    q.put(item)
q.join() # block until all tasks are done
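On the question of creating a new worker object in every loop iteration: one way to keep the thread count flat is to create the pool once and feed each batch through the same queue. The sketch below is written for Python 3 and uses hypothetical names (FixedPool, run_batch, thread_growth). It reports how many threads were added after several batches.

```python
import threading
from queue import Queue


class FixedPool:
    """A fixed set of daemon threads that is reused for every batch."""

    def __init__(self, num_threads):
        self.tasks = Queue()
        for _ in range(num_threads):
            threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            func, arg = self.tasks.get()
            try:
                func(arg)
            except Exception as exc:
                print(exc)
            finally:
                self.tasks.task_done()

    def run_batch(self, func, items):
        for item in items:
            self.tasks.put((func, item))
        self.tasks.join()  # wait for this batch only; the threads stay alive


def thread_growth(num_batches=5, num_threads=4):
    """Return (threads added, items processed) after running several batches."""
    before = threading.active_count()
    pool = FixedPool(num_threads)
    seen = []
    for _ in range(num_batches):
        pool.run_batch(seen.append, range(10))
    return threading.active_count() - before, len(seen)


if __name__ == "__main__":
    print(thread_growth())
```

The number of added threads equals num_threads regardless of how many batches are processed, whereas building a new pool per batch adds num_threads blocked daemon threads each time.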
UPD: In case you need to stop a thread, you can add a flag to it, or send a special marker that means "stop" to break out of the while loop:
class Worker(Thread):
    break_msg = object() # just a unique marker

    def __init__(self):
        Thread.__init__(self)
        self.keep_running = True # "continue" is a reserved word, so the flag needs another name

    def run(self):
        while self.keep_running: # can stop and destroy thread (var 1)
            msg = queue.get(block=True)
            if msg is self.break_msg:
                return # will stop and destroy thread (var 2)
            do_work(msg)
            queue.task_done()

workers = [Worker() for _ in xrange(num_workers)]
for w in workers:
    w.start()
for task in tasks:
    queue.put(task)
for _ in xrange(num_workers):
    queue.put(Worker.break_msg) # stop thread after all tasks done. Need as many messages as you have threads
OR
queue.join() # wait until all tasks done
for w in workers:
    w.keep_running = False
    queue.put(Worker.break_msg) # wake the worker blocked in get() so it can exit
