I am quite new to programming and I am running Linux with Python 3.5.
There are a few similar questions on Stack Overflow, but most of them have no answers, for example: [Python 2.7 multi-thread] In Python, how to timeout a function call in sub-thread?, and Python, Timeout on a function on child thread without using signal and thread.join.
I am able to use signal for a timeout when the function runs in the main thread, and a timeout works when the work is in a separate process. However, the function I am currently running is in a child thread started by apscheduler (it can also be started directly):
schedule.add_job(test_upload.run, 'interval', seconds=10, start_date='2016-01-01 00:00:05',
                 args=['instant'])
and I can't convert it to a child process because it shares a database connection.
I have also tried https://stackoverflow.com/a/36904264/2823816, but the terminal said
result = await future.result(timeout = timeout)
^
SyntaxError: invalid syntax
in
import concurrent

def run():
    return 1

timeout = 10
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(run)  # get a future object
    try:
        result = await future.result(timeout = timeout)
    except concurrent.futures.TimeOutError:
        result = None
I am now not sure how to solve it :( Thanks for any help.
I gave up trying to time out the thread from within my child thread.
So I used a separate process inside the child thread and killed that instead. I could not find any other solution.
https://github.com/dozysun/timeout-timer also works fine in a sub-thread; it uses a sub-thread as the timer.
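For reference, a minimal sketch of that process-based workaround, assuming the job can be moved into its own function (run_upload here is just a placeholder for the real work, and the child process has to open its own database connection):

import multiprocessing

def run_upload(mode):
    pass  # placeholder for the actual upload work

def run(mode, timeout=10):
    # started from the child thread; the timeout is enforced on the process
    p = multiprocessing.Process(target=run_upload, args=(mode,))
    p.start()
    p.join(timeout)      # wait up to `timeout` seconds
    if p.is_alive():     # still running: kill it
        p.terminate()
        p.join()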
Related
I have a program which uses a separate process to execute functions from an external hardware library. The communication between that process and my program happens through a JoinableQueue().
A part of the code looks like this:
# Main Code
queue_cmd.put("do_something")
queue_cmd.join()  # here is my problem

# multiprocess
task = queue_cmd.get()
if task == "do_something":
    external_class.do_something()
queue_cmd.task_done()
Note: external_class is the external hardware library.
This library sometimes crashes and the line queue_cmd.task_done() never gets executed. As a result, my main program hangs indefinitely in the queue_cmd.join() part, waiting for the queue_cmd.task_done() to be called. Unfortunately, there is no timeout parameter for the join() function.
How can I wait for the element in the JoinableQueue to be processed, but also deal with the event of my multiprocess terminating (due to the crash in the do_something() function)?
Ideally, the join function would have a timeout parameter (.join(timeout=30)), which I could use to restart the multiprocess - but it does not.
You can always wrap a blocking call in another thread:
import time
from datetime import datetime
from threading import Thread

queue_cmd.put("do_something")

# run the blocking join() in its own thread
t = Thread(target=queue_cmd.join)
t.start()

# implement a timeout
start = datetime.now()
timeout = 10  # seconds
while t.is_alive() and (datetime.now() - start).seconds < timeout:
    # do something else (or just sleep) while waiting for the join or the timeout
    time.sleep(0.1)

if t.is_alive():
    # the join timed out: kill the worker process that failed
    pass
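If the worker really is a multiprocessing.Process (an assumption based on the question), that "kill" step could look roughly like this, after which you can start a fresh worker and re-queue the command:

worker_process.terminate()   # stop the crashed or hung worker
worker_process.join()        # reap it so it does not linger as a zombie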
I think the best approach here is to start the "crashable" module in (yet) another process:
Main code
queue_cmd.put("do_something")
queue_cmd.join()
Multiprocess (You can now move this to a thread)
task = queue_cmd.get()
if task == "do_something":
    subprocess.run(["python", "pleasedontcrash.py"])
queue_cmd.task_done()
pleasedontcrash.py
external_class.do_something()
As shown, I'd do it using subprocess. If you need to pass parameters (which you could with subprocess using pipes or arguments), it's easier to use multiprocessing.
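A rough sketch of the multiprocessing variant, assuming external_class can be imported in the child process; the parameter some_param is made up purely to show how arguments are passed:

from multiprocessing import Process

def crashable_call(some_param):
    # some_param is only here to illustrate argument passing
    external_class.do_something()

p = Process(target=crashable_call, args=("value",))
p.start()
p.join(timeout=30)   # give the hardware call up to 30 seconds
if p.is_alive():     # it hung or crashed: clean up and restart if needed
    p.terminate()
    p.join()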
import os
import traceback
import multiprocessing
import multiprocessing as mp
import psutil

def kill_proc_tree(pid):
    parent = psutil.Process(pid)
    children = parent.children(recursive=True)
    for child in children:
        child.kill()

while True:
    pid = os.getpid()
    try:
        pool = mp.Pool(processes=1, maxtasksperchild=1)
        result = pool.apply_async(my_func, args=())
        result.get(timeout=60)
        pool.close()
    except multiprocessing.context.TimeoutError:
        traceback.print_exc()
        kill_proc_tree(pid)
I am using the multiprocessing library and am trying to spawn a new process every time my_func finishes running, throws an exception, or has run for longer than 60 seconds (result.get(timeout=60) should throw an exception in that case). Since I want to keep the while loop running but also avoid zombie processes, I need to keep the parent process alive while killing all child processes whenever an exception is thrown in the parent or the child, or the child finishes, before spawning a new process. The kill_proc_tree function that I found online was supposed to take care of this, and it seemed to work at first (my_func opens a new window when a process begins and closes the window when the process supposedly ends), but then I realized that in my Task Manager the Python script is still holding on to memory, and after enough multiprocessing.context.TimeoutError errors (they are thrown by the parent process) my memory fills up completely.
So what should I do to solve this problem? Any help would be greatly appreciated!
The solution should be as simple as calling the terminate method on the pool for all exceptions, not just for a TimeoutError, since result.get(timeout=60) can raise an arbitrary exception if your my_func fails with an exception before the 60 seconds are up.
Note that according to the documentation the terminate method "stops the worker processes immediately without completing outstanding work" and is called implicitly when the pool's context manager is exited, as in the following example:
import multiprocessing

while True:
    try:
        with multiprocessing.Pool(processes=1, maxtasksperchild=1) as pool:
            result = pool.apply_async(my_func, args=())
            result.get(timeout=60)
    except Exception:
        pass
Specifying the maxtasksperchild=1 parameter to the Pool constructor seems somewhat superfluous since you are never submitting more than one task to the pool anyway.
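The same idea without the context manager, calling terminate explicitly in a finally block (a sketch that keeps the question's one-task-per-iteration structure, with my_func being the question's function):

import multiprocessing

while True:
    pool = multiprocessing.Pool(processes=1)
    try:
        result = pool.apply_async(my_func, args=())
        result.get(timeout=60)
    except Exception:
        pass
    finally:
        pool.terminate()   # stop the workers immediately, no outstanding work kept
        pool.join()        # reap them so nothing is left behind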
My basic question is: how do I detect whether the current thread is a dummy thread? I am new to threading, and while recently debugging some code in my Apache2/Flask app I thought this might be useful. I was getting a flip-flopping error where a request was processed successfully on the main thread, unsuccessfully on a dummy thread, then successfully on the main thread again, and so on.
As I said, I am using Apache2 and Flask, and it seems the combination of the two is what creates these dummy threads. I would also be interested in knowing more about that if anyone can teach me.
My code is meant to print information about the threads running on the service and looks something like this:
def allthr_info(self):
    """Returns info in JSON form of all threads."""
    all_thread_infos = Queue()
    for thread_x in threading.enumerate():
        if thread_x is threading.current_thread() or thread_x is threading.main_thread():
            continue
        info = self._thr_info(thread_x)
        all_thread_infos.put(info)
    return list(all_thread_infos.queue)

def _thr_info(self, thr):
    """Consolidation of the thread info that can be obtained from threading module."""
    thread_info = {}
    try:
        thread_info = {
            'name': thr.getName(),
            'ident': thr.ident,
            'daemon': thr.daemon,
            'is_alive': thr.is_alive(),
        }
    except Exception as e:
        LOGGER.error(e)
    return thread_info
You can check if the current thread is an instance of threading._DummyThread.
isinstance(threading.current_thread(), threading._DummyThread)
threading.py itself can teach you what dummy-threads are about:
Dummy thread class to represent threads not started here.
These aren't garbage collected when they die, nor can they be waited for.
If they invoke anything in threading.py that calls current_thread(), they
leave an entry in the _active dict forever after.
Their purpose is to return something from current_thread().
They are marked as daemon threads so we won't wait for them
when we exit (conform previous semantics).
def current_thread():
    """Return the current Thread object, corresponding to the caller's thread of control.

    If the caller's thread of control was not created through the threading
    module, a dummy thread object with limited functionality is returned.

    """
    try:
        return _active[get_ident()]
    except KeyError:
        return _DummyThread()
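Applied to the question's helper, the check could simply become one more field in the returned dict (is_dummy is a made-up key, not part of the original code):

import threading

def _thr_info(self, thr):
    return {
        'name': thr.getName(),
        'ident': thr.ident,
        'daemon': thr.daemon,
        'is_alive': thr.is_alive(),
        'is_dummy': isinstance(thr, threading._DummyThread),
    }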
I was previously using the threading.Thread module. Now I'm using concurrent.futures -> ThreadPoolExecutor. Previously, I was using the following code to exit/kill/finish a thread:
import ctypes

def terminate_thread(thread):
    """Terminates a python thread from another thread.

    :param thread: a threading.Thread instance
    """
    if not thread.isAlive():
        return

    exc = ctypes.py_object(SystemExit)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), exc)
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(thread.ident, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")
This doesn't appear to work with the futures interface. What's the best practice here? Just return? My threads are controlling Selenium instances. I need to make sure that when I kill a thread, the Selenium instance is torn down.
Edit: I had already seen the post referenced as a duplicate. It's insufficient because when you venture into something like futures, behavior can be radically different. In the case of the plain threading module, my terminate_thread function is acceptable and not subject to the criticism of the other Q&A. It's not the same as "killing". Please take a look at the code I posted to see that.
I don't want to kill. I want to check if it's still alive and gracefully exit the thread in the most proper way. How do I do that with futures?
If you want to let the threads finish their current work use:
thread_executor.shutdown(wait=True)
If you want to bash the current futures being run on the head and stop all ...future...(heh) futures use:
thread_executor.shutdown(wait=False)
for t in thread_executor._threads:
    terminate_thread(t)
This uses your terminate_thread function to raise an exception in each of the threads in the thread pool executor. The futures that were disrupted will return with the exception set.
How about .cancel() on the thread result?
cancel() Attempt to cancel the call. If the call is currently being
executed and cannot be cancelled then the method will return False,
otherwise the call will be cancelled and the method will return True.
https://docs.python.org/3/library/concurrent.futures.html
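Putting both suggestions together, a graceful shutdown might look like the sketch below; the stop_event flag, do_selenium_work helper, drivers list, and driver.quit() call are assumptions about how the Selenium workers could cooperate, not part of the original code:

import concurrent.futures
import threading

stop_event = threading.Event()

def worker(driver):
    try:
        while not stop_event.is_set():
            do_selenium_work(driver)   # placeholder for the real work
    finally:
        driver.quit()                  # make sure the Selenium instance is torn down

executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
futures = [executor.submit(worker, d) for d in drivers]  # drivers: your Selenium instances

for f in futures:
    f.cancel()             # cancel anything that has not started yet
stop_event.set()           # ask the running workers to finish up
executor.shutdown(wait=True)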
I am trying to use a Queue in Python that will be shared across multiple threads. I just wanted to know whether the approach I am using is correct, whether I am doing something redundant, or whether there is a better approach I should use.
I am trying to get new requests from a table and schedule them using some logic to perform some operation like running a query.
So here from the main thread I spawn a separate thread for the queue.
if __name__ == '__main__':
    request_queue = SetQueue(maxsize=-1)
    worker = Thread(target=request_queue.process_queue)
    worker.setDaemon(True)
    worker.start()

    while True:
        try:
            # Connect to the database, get all the new requests to be verified
            db = Database(username_testschema, password_testschema, mother_host_testschema,
                          mother_port_testschema, mother_sid_testschema, 0)
            # Get new requests for verification
            verify_these = db.query("SELECT JOB_ID FROM %s.table WHERE JOB_STATUS='%s' ORDER BY JOB_ID" %
                                    (username_testschema, 'INITIATED'))
            # If there are some requests to be verified, put them in the queue.
            if len(verify_these) > 0:
                for row in verify_these:
                    print "verifying : %s" % row[0]
                    verify_id = row[0]
                    request_queue.put(verify_id)
        except Exception as e:
            logger.exception(e)
        finally:
            time.sleep(10)
Now, in the SetQueue class I have a process_queue function which is used to process the top two requests added to the queue in every run.
'''
Overriding the Queue class to use a set as all_items instead of a list, to ensure
unique items are added and processed all the time.
'''
class SetQueue(Queue.Queue):
    def _init(self, maxsize):
        Queue.Queue._init(self, maxsize)
        self.all_items = set()

    def _put(self, item):
        if item not in self.all_items:
            Queue.Queue._put(self, item)
            self.all_items.add(item)

    '''
    The multi-threaded queue for the verification process. Takes the top two items,
    verifies each in a separate thread and sleeps for 10 seconds.
    This way at most two requests per run will be processed.
    '''
    def process_queue(self):
        while True:
            scheduler_obj = Scheduler()
            try:
                if self.qsize() > 0:
                    for i in range(2):
                        job_id = self.get()
                        t = Thread(target=scheduler_obj.verify_func, args=(job_id,))
                        t.start()
                    for i in range(2):
                        t.join(timeout=1)
                        self.task_done()
            except Exception as e:
                logger.exception(
                    "QUEUE EXCEPTION : Exception occurred while processing requests in the VERIFICATION QUEUE")
            finally:
                time.sleep(10)
I want to see if my understanding is correct and if there can be any issues with it.
So the main thread, running the while True loop in the main function, connects to the database, gets new requests, and puts them in the queue. The worker thread (a daemon) for the queue keeps getting new requests from the queue and forks non-daemon threads which do the processing; since the timeout for the join is 1 second, the worker thread keeps taking new requests without getting blocked, and its child threads keep processing in the background. Correct?
So if the main process exits, those child threads won't be killed until they finish their work, but the worker daemon thread would exit.
Doubt: if the parent thread is a daemon and the child is non-daemon, and the parent exits, does the child exit too?
I also read this: David Beazley multiprocessing
In the "Using a Pool as a Thread Coprocessor" section, David Beazley is trying to solve a similar problem. So should I follow his steps:
1. Create a pool of processes.
2. Open a thread like I am doing for request_queue.
3. In that thread:
def process_verification_queue(self):
    while True:
        try:
            if self.qsize() > 0:
                job_id = self.get()
                pool.apply_async(Scheduler.verify_func, args=(job_id,))
        except Exception as e:
            logger.exception("QUEUE EXCEPTION : Exception occurred while processing requests in the VERIFICATION QUEUE")
Use a process from the pool to run verify_func in parallel. Will this give me better performance?
While it's possible to create a new independent thread for the queue and process that data separately the way you are doing it, I believe it is more common for each independent worker thread to post messages to a queue that they already "know" about. That queue is then processed from some other thread by pulling messages out of it.
Design Idea
The way I envision your application, there would be three threads: the main thread and two worker threads. One worker thread would get requests from the database and put them in the queue; the other worker thread would process the data from the queue.
The main thread would just wait for the other threads to finish by using the thread function .join().
You would protect the queue that the threads have access to and make it thread safe by using a mutex. I have seen this pattern in many other designs in other languages as well.
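A minimal sketch of that three-thread layout, written for Python 3 and using the standard queue.Queue (which already does its own internal locking); fetch_new_requests and verify stand in for your database query and Scheduler.verify_func:

import queue
import threading
import time

request_queue = queue.Queue()

def producer():
    while True:
        for job_id in fetch_new_requests():   # e.g. the SELECT shown in the question
            request_queue.put(job_id)
        time.sleep(10)

def consumer():
    while True:
        job_id = request_queue.get()
        try:
            verify(job_id)                    # e.g. Scheduler().verify_func(job_id)
        finally:
            request_queue.task_done()

workers = [threading.Thread(target=producer),
           threading.Thread(target=consumer)]
for w in workers:
    w.start()
for w in workers:
    w.join()   # the main thread just waits for the workers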
Suggested Reading
"Effective Python" by Brett Slatkin has a great example of this very question.
Instead of inheriting from Queue, he just creates a wrapper around it in a class called MyQueue and adds get() and put(message) functions.
He even provides the source code at his Github repo
https://github.com/bslatkin/effectivepython/blob/master/example_code/item_39.py
I'm not affiliated with the book or its author, but I highly recommend it as I learned quite a few things from it :)
I like this explanation of the advantages & differences between using threads and processes -
".....But there's a silver lining: processes can make progress on multiple threads of execution simultaneously. Since a parent process doesn't share the GIL with its child processes, all processes can execute simultaneously (subject to the constraints of the hardware and OS)...."
He has some great explanations of how to get around the GIL and how to improve performance.
Read more here:
http://jeffknupp.com/blog/2013/06/30/pythons-hardest-problem-revisited/