My multi-threaded script raises this error:
thread.error: can't start new thread
when it reaches 460 threads:
threading.active_count() = 460
I assume the old threads keep stacking up, since the script doesn't kill them. This is my code:
import threading
import Queue
import time
import os
import csv

def main(worker):
    #Do Work
    print worker
    return

def threader():
    while True:
        worker = q.get()
        main(worker)
        q.task_done()

def main_threader(workers):
    global q
    global city
    q = Queue.Queue()
    for x in range(20):
        t = threading.Thread(target=threader)
        t.daemon = True
        print "\n\nthreading.active_count() = " + str(threading.active_count()) + "\n\n"
        t.start()
    for worker in workers:
        q.put(worker)
    q.join()
How do I kill the old threads when their job is done? (Is return not enough?)
I'm sure the old threads' work is done, since I'm printing the results, but I'm not sure why they are still active afterward. Is there a direct way to kill a thread after it finishes its work?
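One possible reading of the symptom: threader() loops forever, so after q.join() returns, the 20 threads are still blocked in q.get(), and every call to main_threader adds 20 more. The sketch below is a Python 3 variant (queue instead of Queue) that sends one sentinel per thread so each worker returns and can be joined. The STOP object, the results list and the num_threads parameter are hypothetical names introduced for illustration, not part of the original script.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel telling a worker to exit


def threader(q, results):
    while True:
        worker = q.get()
        if worker is STOP:
            q.task_done()
            return  # the thread ends here and stops counting as active
        results.append(worker)  # stand-in for the real work
        q.task_done()


def main_threader(workers, num_threads=20):
    q = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=threader, args=(q, results), daemon=True)
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for worker in workers:
        q.put(worker)
    for _ in threads:
        q.put(STOP)  # one sentinel per thread
    for t in threads:
        t.join()     # wait until every worker has returned
    return results


if __name__ == "__main__":
    before = threading.active_count()
    for _ in range(5):
        main_threader(range(100))
    # The count should be unchanged if no worker thread was left behind.
    print(threading.active_count() == before)
```

Reusing one long-lived pool of threads across calls would be another way to avoid the build-up; the sentinel version is shown only because it stays closest to the structure of the original code.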
Related
I'm trying to code a kind of task manager in Python. It's based on a job queue; the main thread is in charge of adding jobs to this queue. I have written this class to handle the queued jobs. It limits the number of concurrent processes and handles the output of the finished processes.
Here is the problem: in the _check_jobs_status function, the returncode value of each process is never updated. Regardless of the process status (running, finished...), job.returncode is always None, so the if statement never matches and I can't remove jobs from the processing job list.
I know it can be done with process.communicate() or process.wait(), but I don't want to block the thread that launches the processes. Is there any other way to do it, maybe using a ProcessPoolExecutor? Jobs can arrive in the queue at any time and I need to be able to handle them.
Thank you all for your time and support :)
from queue import Queue
import subprocess
from threading import Thread
from time import sleep

class JobQueueManager(Queue):
    def __init__(self, maxsize: int):
        super().__init__(maxsize)
        self.processing_jobs = []
        self.process = None
        self.jobs_launcher = Thread(target=self._worker_job)
        self.processing_jobs_checker = Thread(target=self._check_jobs_status)
        self.jobs_launcher.start()
        self.processing_jobs_checker.start()

    def _worker_job(self):
        while True:
            # Run at max 3 jobs concurrently
            if self.not_empty and len(self.processing_jobs) < 3:
                # Get job from queue
                job = self.get()
                # Execute a task without blocking the thread
                self.process = subprocess.Popen(job)
                self.processing_jobs.append(self.process)
                # Useful if queue.join() is used to block the queue
                self.task_done()
            else:
                print("Waiting 4s for jobs")
                sleep(4)

    def _check_jobs_status(self):
        while True:
            # Check if jobs are finished
            for job in self.processing_jobs:
                # Successfully completed
                if job.returncode == 0:
                    self.processing_jobs.remove(job)
            # Wait 4 seconds and repeat
            sleep(4)

def main():
    q = JobQueueManager(100)
    task = ["stress", "--cpu", "1", "--timeout", "20"]
    for i in range(10): #put 10 tasks in the queue
        q.put(task)
    q.join() #block until all tasks are done

if __name__ == "__main__":
    main()
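A note on why returncode stays None: a Popen object only updates returncode when poll(), wait() or communicate() is called on it, and poll() returns immediately without blocking. The following is a minimal sketch of a non-blocking check built on poll(). The helper name reap_finished is hypothetical, and sys.executable is used as a portable stand-in for the stress command.

```python
import subprocess
import sys
import time


def reap_finished(processing_jobs):
    """Remove finished processes from the list and return them."""
    # poll() refreshes returncode and returns None while the process runs.
    # Building a separate list avoids mutating the list being iterated.
    finished = [job for job in processing_jobs if job.poll() is not None]
    for job in finished:
        processing_jobs.remove(job)
    return finished


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    jobs = [subprocess.Popen(cmd) for _ in range(3)]
    while jobs:
        for job in reap_finished(jobs):
            print("finished", job.pid, "returncode", job.returncode)
        time.sleep(0.1)
```

In the class above, the same idea would amount to calling job.poll() inside _check_jobs_status before comparing the return code.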
I'll answer my own question: I have come up with a working solution. The JobExecutor class manages the pool of processes in a custom way. The watch_completed_tasks function watches for finished tasks and handles their output. This way everything is done with only two threads, and the main thread is not blocked when submitting processes.
import subprocess
from threading import Timer
from concurrent.futures import ProcessPoolExecutor, as_completed
import logging

def launch_job(job):
    process = subprocess.Popen(job, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(f"launching {process.pid}")
    # communicate() drains both pipes together and waits for the process to exit
    stdout, stderr = process.communicate()
    return [process.pid, stdout, stderr]

class JobExecutor(ProcessPoolExecutor):
    def __init__(self, max_workers: int):
        super().__init__(max_workers)
        self.futures = []
        self.watch_completed_tasks()

    def submit(self, command):
        future = super().submit(launch_job, command)
        self.futures.append(future)
        return future

    def watch_completed_tasks(self):
        # Manage tasks completion
        for completed_task in as_completed(self.futures):
            print(f"FINISHED task with PID {completed_task.result()[0]}")
            self.futures.remove(completed_task)
        # Call this function every 5 seconds
        timer_thread = Timer(5.0, self.watch_completed_tasks)
        timer_thread.name = "TasksWatcher"
        timer_thread.start()

def main():
    executor = JobExecutor(max_workers=5)
    for i in range(10):
        task = ["stress",
                "--cpu", "1",
                "--timeout", str(i+5)]
        executor.submit(task)

if __name__ == "__main__":
    main()
I am trying to learn the multiprocessing module and have written the code below, where 4 processes are created and assigned 8 jobs (in the processor function), and each job just contains a sleep call (in the exampleJob function). I wrote similar code with the threading module and it worked fine, but here it does not output anything. Please help.
from multiprocessing import Process, Lock
import multiprocessing
import time

print_lock = Lock()

def exampleJob(worker): # function simulating some computation
    time.sleep(.5)
    print_lock.acquire()
    print(multiprocessing.current_process.pid,worker)
    print_lock.release()

def processor(): #function where process pick up the job
    while True:
        worker = q.get()
        exampleJob(worker)
        q.task_done()

q = multiprocessing.JoinableQueue()
process = []
for x in range(4):
    p = multiprocessing.Process(target=processor)
    process.append(p)
for i in range(0,len(process)):
    process[i].start
start = time.time()
for worker in range(8):
    q.put(worker)
q.join()
print('Entire job took:',time.time() - start)
The first problem is start needs to be start().
Also, separate processes have separate global variables, so print_lock = Lock() is a different lock in each process. You have to create the lock once and pass it to the individual processes. This goes for the queue as well.
A JoinableQueue isn't really needed. What's needed is a sentinel flag to tell the processes to exit, and join the processes.
Working example with other fixes:
import multiprocessing as mp
import time

def exampleJob(print_lock,worker): # function simulating some computation
    time.sleep(.5)
    with print_lock:
        print(mp.current_process().name,worker)

def processor(print_lock,q): # function where process pick up the job
    while True:
        worker = q.get()
        if worker is None: # flag to exit the process
            break
        exampleJob(print_lock,worker)

# This "if" required for portability in some OSes.
# Windows for example creates new Python processes and imports the original script.
# Without this the below code would run again in each child process.
if __name__ == '__main__':
    print_lock = mp.Lock()
    q = mp.Queue()
    processes = [mp.Process(target=processor,args=(print_lock,q)) for _ in range(4)]
    for process in processes:
        process.start() # OP code didn't *call* the start method.
    start = time.time()
    for worker in range(8):
        q.put(worker)
    for process in processes:
        q.put(None) # quit indicator
    for process in processes:
        process.join()
    print('Entire job took:',time.time() - start)
Output:
Process-2 2
Process-1 0
Process-3 1
Process-4 3
Process-3 6
Process-1 5
Process-2 4
Process-4 7
Entire job took: 1.1350018978118896
I want to kill a thread in Python. This thread can be stuck in a blocking operation, and join can't terminate it.
Similar to this:
from threading import Thread
import time

def block():
    while True:
        print("running")
        time.sleep(1)

if __name__ == "__main__":
    thread = Thread(target = block)
    thread.start()
    #kill thread
    #do other stuff
My problem is that the real blocking operation is in another module that is not mine, so there is no place where I can break out of it with a running variable.
The thread will be killed when exiting the main process if you set it up as a daemon:
from threading import Thread
import sys
import time

def block():
    while True:
        print("running")
        time.sleep(1)

if __name__ == "__main__":
    thread = Thread(target = block, daemon = True)
    thread.start()
    sys.exit(0)
Otherwise, just set a flag. This is a simplified example (you should use a proper synchronization primitive, not just a plain variable):
from threading import Thread
import time

RUNNING = True

def block():
    global RUNNING
    while RUNNING:
        print("running")
        time.sleep(1)

if __name__ == "__main__":
    thread = Thread(target = block, daemon = True)
    thread.start()
    RUNNING = False # the thread is not killed; it stops at its next loop iteration
    # ... continue your stuff here
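As a sketch of the "some synchronization" mentioned above, the same flag can be expressed with threading.Event, which is designed to be set from one thread and read from another. The name stop_event and the ticks list are hypothetical, and the sleep intervals are shortened for the demo. This only helps when you control the loop, so it does not solve the case where the blocking call lives in someone else's module.

```python
import threading
import time

stop_event = threading.Event()  # hypothetical name; replaces the plain global flag


def block(ticks):
    while not stop_event.is_set():
        ticks.append(time.monotonic())  # stand-in for print("running")
        # wait() sleeps like time.sleep(), but returns early once the event is set
        stop_event.wait(0.05)


if __name__ == "__main__":
    ticks = []
    thread = threading.Thread(target=block, args=(ticks,), daemon=True)
    thread.start()
    time.sleep(0.2)
    stop_event.set()        # ask the thread to stop
    thread.join(timeout=1)  # give it time to leave the loop
    print("thread alive:", thread.is_alive())
```

Using Event.wait() instead of time.sleep() means the thread reacts to the stop request right away rather than after the current sleep finishes.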
Use a running variable:
from threading import Thread
import time

running = True

def block():
    global running
    while running:
        print("running")
        time.sleep(1)

if __name__ == "__main__":
    thread = Thread(target = block)
    thread.start()
    running = False
    # do other stuff
I would prefer to wrap it all in a class, but this should work (untested though).
EDIT
There is a way to asynchronously raise an exception in a separate thread which could be caught by a try: except: block, but it's a dirty dirty hack: https://gist.github.com/liuw/2407154
Original post
"I want to kill a thread in python." You can't. Daemon threads are killed only when the process exits, which happens once no non-daemon threads are left running. Any thread can be asked nicely to terminate itself using standard inter-thread communication methods, but you state that you have no chance to interrupt the function you want to kill. This leaves processes.
Processes have more overhead, and are more difficult to pass data to and from, but they do support being killed by sending SIGTERM or SIGKILL.
from multiprocessing import Process, Queue
from time import sleep

def workfunction(*args, **kwargs): #any arguments you send to a child process must be picklable by python's pickle module
    sleep(args[0]) #really long computation you might want to kill
    return 'results' #anything you want to get back from a child process must be picklable by python's pickle module

class daemon_worker(Process):
    def __init__(self, target_func, *args, **kwargs):
        self.return_queue = Queue()
        self.target_func = target_func
        self.args = args
        self.kwargs = kwargs
        super().__init__(daemon=True)
        self.start()

    def run(self): #called by self.start()
        self.return_queue.put(self.target_func(*self.args, **self.kwargs))

    def get_result(self): #blocks until a result is available
        return self.return_queue.get()

if __name__=='__main__':
    #start some work that takes 1 sec:
    worker1 = daemon_worker(workfunction, 1)
    worker1.join(3) #wait up to 3 sec for the worker to complete
    if not worker1.is_alive(): #if we didn't hit 3 sec timeout
        print('worker1 got: {}'.format(worker1.get_result()))
    else:
        print('worker1 still running')
        worker1.terminate()
        print('killing worker1')
        sleep(.1) #calling worker.is_alive() immediately might incur a race condition where it may or may not have shut down yet.
        print('worker1 is alive: {}'.format(worker1.is_alive()))
    #start some work that takes 100 sec:
    worker2 = daemon_worker(workfunction, 100)
    worker2.join(3) #wait up to 3 sec for the worker to complete
    if not worker2.is_alive(): #if we didn't hit 3 sec timeout
        print('worker2 got: {}'.format(worker2.get_result()))
    else:
        print('worker2 still running')
        worker2.terminate()
        print('killing worker2')
        sleep(.1) #calling worker.is_alive() immediately might incur a race condition where it may or may not have shut down yet.
        print('worker2 is alive: {}'.format(worker2.is_alive()))
Let's say I have the code below:
import Queue
import threading
import time

def basic_worker(queue, thread_name):
    while True:
        if queue.empty(): break
        print "Starting %s" % (threading.currentThread().getName()) + "\n"
        item = queue.get()
        ##do_work on item which might take 10-15 minutes to complete
        queue.task_done()
        print "Ending %s" % (threading.currentThread().getName()) + "\n"

def basic(queue):
    # http://docs.python.org/library/queue.html
    for i in range(10):
        t = threading.Thread(target=basic_worker,args=(queue,tName,))
        t.daemon = True
        t.start()
    queue.join() # block until all tasks are done
    print 'got here' + '\n'

queue = Queue.Queue()
for item in range(4):
    queue.put(item)
basic(queue)
print "End of program"
My question is, if I set t.daemon = True will it exit the code killing the threads that are taking 10-15 minutes to do some work on the item from the queue? Because from what I have read it says that the program will exit if there are any daemonic threads alive. My understanding is that the threads working on the item taking a long time will also exit incompletely. If I don't set t.daemon = True my program hangs forever and doesn't exit when there are no items in the queue.
The reason why the program hangs forever if t.daemon = False is that the following code block ...
if queue.empty(): break
... leads to a race condition.
Imagine there is only one item left in the queue and two threads evaluate the condition above nearly simultaneously. The condition evaluates to False for both threads, so neither of them breaks.
The faster thread gets the last item, while the slower one hangs forever in the statement item = queue.get().
Because daemon mode is False, the program waits for all threads to finish. That never happens.
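To illustrate the race described above, here is a minimal sketch in Python 3 (queue instead of Queue) where checking for emptiness and taking an item happen in one step through get_nowait(), so no thread can get stuck between the two. It assumes every item is queued before the workers start; the run helper and the done list are hypothetical names added for the demo.

```python
import queue
import threading


def basic_worker(q, done):
    while True:
        try:
            item = q.get_nowait()  # check and take in a single step
        except queue.Empty:
            return                 # nothing left: let the thread finish
        done.append(item)          # stand-in for the long-running work
        q.task_done()


def run(items, num_threads=10):
    q = queue.Queue()
    for item in items:
        q.put(item)
    done = []
    # Non-daemon threads on purpose: they all return once the queue is drained.
    threads = [
        threading.Thread(target=basic_worker, args=(q, done))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    q.join()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    print(sorted(run(range(4))))
```

If items can still arrive while the workers are running, an empty queue no longer means the work is over, and a blocking get() with daemon threads or a sentinel value is the safer choice.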
From my point of view, the code you provided (with t.daemon = True) works fine.
Maybe the following sentence confuses you:
The entire Python program exits when no alive non-daemon threads are left.
... but consider: if you start all threads from the main thread with t.daemon = True, the only non-daemon thread is the main thread itself. So the program exits when the main thread is finished.
... and that does not happen until every queued item has been marked as done, because of the queue.join() statement. So your long-running computations inside the child threads will not be interrupted.
There is no need to check queue.empty() when using daemon threads and queue.join().
This should be enough:
#!/bin/python
import Queue
import threading
import time

def basic_worker(queue, thread_name):
    print "Starting %s" % (threading.currentThread().getName()) + "\n"
    while True:
        item = queue.get()
        ##do_work on item which might take 10-15 minutes to complete
        time.sleep(5) # to simulate work
        queue.task_done()

def basic(queue):
    # http://docs.python.org/library/queue.html
    for i in range(10):
        print 'enqueuing', i
        t = threading.Thread(target=basic_worker, args=(queue, i))
        t.daemon = True
        t.start()
    queue.join() # block until all tasks are done
    print 'got here' + '\n'

queue = Queue.Queue()
for item in range(4):
    queue.put(item)
basic(queue)
print "End of program"
I have a simple Python app that will not terminate if I use queue.join(). Below is the code:
import threading
import Queue

q = Queue.Queue()
for i in range(5):
    q.put("BLAH")

def worker():
    while True:
        print q.qsize()
        a = q.get()
        print q.qsize()
        q.task_done()
        print q.qsize()

for i in range(2):
    t = threading.Thread(target=worker())
    t.daemon = True
    t.start()
q.join()
I've also created a watchdog thread that prints threading.enumerate(), then sleeps for 2 seconds. The only thread left is the MainThread, and the queue size is in fact 0. This script never terminates. I have to press Ctrl+Z and then kill it. What's going on?
t = threading.Thread(target=worker)
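You want to pass a reference to the worker function; you should not call it. With target=worker(), the loop runs in the main thread and never returns, so no thread is ever created and q.join() is never reached.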
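A minimal Python 3 sketch of the corrected call (queue instead of Queue): the function object goes in target and its arguments go in args, so the loop runs in the new thread rather than in the caller. The run helper and the seen list are hypothetical names added to make the example self-contained.

```python
import queue
import threading


def worker(q, seen):
    while True:
        seen.append(q.get())
        q.task_done()


def run(items, num_threads=2):
    q = queue.Queue()
    for item in items:
        q.put(item)
    seen = []
    for _ in range(num_threads):
        # Pass the function itself; Thread calls it inside the new thread.
        # Writing target=worker(q, seen) would run the endless loop right here.
        t = threading.Thread(target=worker, args=(q, seen), daemon=True)
        t.start()
    q.join()  # returns once task_done() has been called for every item
    return seen


if __name__ == "__main__":
    print(len(run(["BLAH"] * 5)))
```

Because the workers are daemon threads that stay blocked in q.get(), the program can still exit once q.join() returns in the main thread.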
The worker function does not exit, therefore the thread will never join. Second, you probably want to join the threads, not the queue.
I'm not an expert in Python threading, but a queue is just for passing data between threads.