Kill function after a given amount of time? - python

What's the best way to kill a function (that is still running) after a given amount of time in Python? These are two approaches I have found so far:
Say this is our base function:
import time
def foo():
    a_long_time = 10000000
    time.sleep(a_long_time)
TIMEOUT = 5  # seconds
1. Multiprocessing Approach
import multiprocessing
if __name__ == '__main__':
    p = multiprocessing.Process(target=foo, name="Foo")
    p.start()
    p.join(TIMEOUT)
    if p.is_alive():
        print('function terminated')
        p.terminate()
        p.join()
2. Signal Approach
import signal
class TimeoutException(Exception):
    pass
def timeout_handler(signum, frame):
    raise TimeoutException
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(TIMEOUT)
try:
    foo()
except TimeoutException:
    print('function terminated')
What are the advantages and disadvantages in terms of scope, safety and usability of these two methods? Are there any better approaches?

Well, as always, it depends.
As you have probably already verified, both of these methods work. I would say it depends on your application and on a correct implementation (your signalling method is a bit wrong...).
Both methods can be considered "safe" if implemented correctly. It depends on whether your main program needs to do something outside the foo function, or can just sit and wait for foo to either complete or time out. The signalling method does not allow any parallel processing, as your main program will be in foo() until it either completes or times out. But then you need to defuse the signal. If your foo completes in one second, your main program leaves the try/except structure, and four seconds later ... kaboom ... an exception is raised and probably uncaught. Not good.
try:
    foo()
    signal.alarm(0)
except TimeoutException:
    print("function terminated")
solves the problem.
I would personally prefer the multiprocessing approach. It is simpler and does not require signals and exception handling, which in theory can go wrong if your program execution is not where you expect it to be when a signal is raised. If it is OK for your program to wait in join(), then you are done. However, if you want to do something in the main process while you wait, you can enter a loop, track time in a variable, check whether you are over the timeout and, if so, terminate the process. You would just use join with a tiny timeout to "peek" whether the process is still running.
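The "peek" loop described above might look like the following minimal sketch. slow_task and run_with_deadline are hypothetical names, and the 0.05-second poll interval is an arbitrary trade-off between responsiveness and wakeups.

```python
import multiprocessing
import time


def slow_task(seconds):
    """Hypothetical stand-in for foo(): just sleeps."""
    time.sleep(seconds)


def run_with_deadline(target, args, timeout, poll_interval=0.05):
    """Run target(*args) in a child process, terminating it after `timeout` s.

    Returns True if the child finished on its own, False if it was terminated.
    """
    p = multiprocessing.Process(target=target, args=args)
    p.start()
    deadline = time.monotonic() + timeout
    while True:
        p.join(poll_interval)        # "peek": block for at most poll_interval
        if not p.is_alive():
            return True              # finished before the deadline
        if time.monotonic() >= deadline:
            p.terminate()            # SIGTERM on Unix, TerminateProcess on Windows
            p.join()                 # reap the terminated child
            return False
        # The main process is free to do other work here between peeks.


if __name__ == '__main__':
    print(run_with_deadline(slow_task, (0.1,), timeout=2))   # True
    print(run_with_deadline(slow_task, (10,), timeout=0.5))  # False
```

The `if __name__ == '__main__':` guard matters on platforms that start children by re-importing the main module (Windows and macOS by default).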
Another method, depending on your foo(), is to use threads with a class or a global variable. If your foo keeps processing commands instead of possibly waiting for a long time for a command to finish, you can add an if clause there:
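please_exit_now = False  # set to True from the main thread to ask foo() to stop
def foo():
    global please_exit_now
    while True:
        do_stuff()          # placeholder: a short unit of work
        do_more_stuff()     # placeholder
        if foo_is_ready():  # placeholder: True once the work is complete
            break
        if please_exit_now:
            please_exit_now = False
            return
    finalise_foo()          # placeholder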
If do_stuff and do_more_stuff complete in a reasonable amount of time, you can then process things in your main program and simply set the global please_exit_now to True; your thread will eventually notice that and exit.
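A minimal sketch of the same cooperative idea, swapping the global flag for a threading.Event (our substitution, not what the answer uses), which avoids the global statement and is safe to share between threads. The log list, the step count, and the timings are arbitrary illustrations.

```python
import threading
import time


def foo(stop_event, log):
    """Cooperative worker: each step is short, and the stop flag is checked
    between steps, like please_exit_now in the answer above."""
    for step in range(1000):
        time.sleep(0.01)           # stand-in for do_stuff / do_more_stuff
        if stop_event.is_set():    # the main thread asked us to exit
            log.append(('stopped', step))
            return
    log.append(('finished', 1000))  # stand-in for finalise_foo


stop_event = threading.Event()
log = []
t = threading.Thread(target=foo, args=(stop_event, log))
t.start()
time.sleep(0.1)    # the main program is free to do other work meanwhile
stop_event.set()   # plays the role of setting please_exit_now = True
t.join()
print(log[0][0])   # stopped
```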
I would probably just go for your multiprocessing and join, though.
Hannu

Related

Python - How to break immediately out of loop without waiting for next iteration, or stop thread? [duplicate]

Is there a way in Python to interrupt a thread while it's sleeping (as we can in Java)?
I am looking for something like this:
import threading
from time import sleep
def f():
    print('started')
    try:
        sleep(100)
        print('finished')
    except SleepInterruptedException:  # hypothetical: Python has no such exception
        print('interrupted')
t = threading.Thread(target=f)
t.start()
if input() == 'stop':
    t.interrupt()  # hypothetical: threading.Thread has no interrupt() method
The thread sleeps for 100 seconds, and if I type 'stop', the sleep should be interrupted.
The correct approach is to use threading.Event. For example:
import threading
e = threading.Event()
e.wait(timeout=100) # instead of time.sleep(100)
In the other thread, you need to have access to e. You can interrupt the sleep by issuing:
e.set()
This will immediately interrupt the sleep. You can check the return value of e.wait to determine whether it's timed out or interrupted. For more information refer to the documentation: https://docs.python.org/3/library/threading.html#event-objects .
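Here is a runnable version of the question's example built on this answer. It makes one assumption: the main thread calls wake.set() directly instead of waiting for input() == 'stop', so it finishes immediately. wake and outcome are illustrative names.

```python
import threading
import time

wake = threading.Event()
outcome = []


def f():
    print('started')
    # Event.wait() returns True if the event was set and False if it timed out.
    if wake.wait(timeout=100):   # instead of time.sleep(100)
        outcome.append('interrupted')
    else:
        outcome.append('finished')


t = threading.Thread(target=f)
start = time.monotonic()
t.start()
wake.set()   # what the main thread would do after input() == 'stop'
t.join()
print(outcome)   # ['interrupted']
```

If set() happens to run before the thread reaches wait(), wait() returns True immediately, so the interruption is never lost.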
How about using condition objects? https://docs.python.org/2/library/threading.html#condition-objects
Instead of sleep(), you call wait(timeout) while holding the condition's lock. To "interrupt" the wait, another thread calls notify(), which also requires holding the lock.
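Here is a minimal sketch of the condition-variable approach. It uses Condition.wait_for() with a boolean predicate rather than a bare wait(). This is a small variation on the answer that avoids missing a notify() sent before the thread starts waiting. stop_requested and result are illustrative names.

```python
import threading

cond = threading.Condition()
stop_requested = False   # the predicate, guarded by cond's lock
result = []


def f():
    with cond:   # wait() and wait_for() must be called while holding the lock
        # wait_for() re-checks the predicate, so a notify() that happens
        # before the thread starts waiting is not lost. It returns the
        # predicate's final value: True if stopped, False on timeout.
        stopped = cond.wait_for(lambda: stop_requested, timeout=100)
    result.append('interrupted' if stopped else 'finished')


t = threading.Thread(target=f)
t.start()
with cond:   # notify() likewise requires holding the lock
    stop_requested = True
    cond.notify()
t.join()
print(result)   # ['interrupted']
```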
If, for whatever reason, you need to use the time.sleep function (say you expect time.sleep to throw an exception and simply want to test what happens with large sleep values without waiting for the whole timeout), you have two options.
Firstly, sleeping threads are lightweight, and there's no problem just letting them run in daemon mode with threading.Thread(target=f, daemon=True) (so that they exit when the program does). You can check whether the thread has finished, without waiting for the whole execution, by calling t.join(0.5) followed by t.is_alive().
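join() returns None whether or not the thread finished. To tell the two cases apart, call is_alive() afterwards, as in this small sketch. The 100-second sleep mirrors the question.

```python
import threading
import time


def f():
    time.sleep(100)   # a long sleep we do not want to wait out


t = threading.Thread(target=f, daemon=True)   # will not keep the program alive
t.start()
t.join(0.5)           # wait at most 0.5 s; join() always returns None
print(t.is_alive())   # True: still sleeping, which is fine for a daemon
```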
But if you absolutely need to halt the execution of the function, you could use multiprocessing.Process, and call .terminate() on the spawned process. This does not give the process time to clean up (e.g. except and finally blocks aren't run), so use it with care.

Training a model based on time rather than epochs [duplicate]

In Python, for a toy example:
for x in range(0, 3):
    A(x)  # call function A(x)
If function A takes more than five seconds, I want to skip it and continue the for loop, so I won't get stuck or waste time.
After some searching, I realized a subprocess or thread may help, but I have no idea how to implement it here.
I think creating a new process may be overkill. If you're on Mac or a Unix-based system, you should be able to use signal.SIGALRM to forcibly time out functions that take too long. This will work on functions that are idling for network or other issues that you absolutely can't handle by modifying your function. I have an example of using it in this answer:
Option for SSH to timeout after a short time? ClientAlive & ConnectTimeout don't seem to do what I need them to do
Editing my answer in here, though I'm not sure I'm supposed to do that:
import signal
class TimeoutException(Exception):  # Custom exception class
    pass
def timeout_handler(signum, frame):  # Custom signal handler
    raise TimeoutException
# Change the behavior of SIGALRM
signal.signal(signal.SIGALRM, timeout_handler)
for i in range(3):
    # Start the timer. Once 5 seconds are over, a SIGALRM signal is sent.
    signal.alarm(5)
    # This try/except block ensures that
    # you'll catch TimeoutException when it's sent.
    try:
        A(i)  # Whatever your function that might hang
    except TimeoutException:
        continue  # continue the for loop if function A takes more than 5 seconds
    finally:
        # Reset the alarm, even if A(i) raised some other exception
        signal.alarm(0)
This basically sets a timer for 5 seconds, then tries to execute your code. If it fails to complete before time runs out, a SIGALRM is sent, which we catch and turn into a TimeoutException. That forces you to the except block, where your program can continue.
Maybe someone will find this decorator useful; it is based on TheSoundDefense's answer:
import functools
import time
import signal
class TimeoutException(Exception):  # Custom exception class
    pass
def break_after(seconds=2):
    def timeout_handler(signum, frame):  # Custom signal handler
        raise TimeoutException
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            signal.signal(signal.SIGALRM, timeout_handler)
            signal.alarm(seconds)
            try:
                return function(*args, **kwargs)
            except TimeoutException:
                print('Oops, timeout: %s sec reached.' % seconds, function.__name__, args, kwargs)
                return None
            finally:
                signal.alarm(0)  # Clear the alarm, whether or not we timed out
        return wrapper
    return decorator
Test:
@break_after(3)
def test(a, b, c):
    return time.sleep(10)
>>> test(1,2,3)
Oops, timeout: 3 sec reached. test (1, 2, 3) {}
If you can break your work up and check every so often, that's almost always the best solution. But sometimes that's not possible: for example, maybe you're reading a file off a slow file share that every once in a while just hangs for 30 seconds. To deal with that internally, you'd have to restructure your whole program around an async I/O loop.
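Here is a minimal sketch of "break the work up and check every so often". The work is split into items, and the loop stops starting new ones once a deadline passes. process_items and slow_double are hypothetical names, and the 0.35-second budget is arbitrary.

```python
import time


def process_items(items, handle, budget_seconds):
    """Call handle(item) for each item, but stop starting new items once the
    time budget is spent. Returns (results, unprocessed_items).

    The check happens only between items, so a single handle() call that
    hangs is NOT interrupted.
    """
    deadline = time.monotonic() + budget_seconds
    results = []
    for index, item in enumerate(items):
        if time.monotonic() >= deadline:
            return results, items[index:]
        results.append(handle(item))
    return results, []


def slow_double(x):
    """Hypothetical unit of work that takes about 0.1 s."""
    time.sleep(0.1)
    return 2 * x


done, skipped = process_items(list(range(10)), slow_double, budget_seconds=0.35)
print(len(done), len(skipped))
```

The exact split depends on timer precision. The limitation in the docstring is the hanging-file-share case described above, which is where the options below come in.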
If you don't need to be cross-platform, you can use signals on *nix (including Mac and Linux), APCs on Windows, etc. But if you need to be cross-platform, that doesn't work.
So, if you really need to do it concurrently, you can, and sometimes you have to. In that case, you probably want to use a process for this, not a thread. You can't really kill a thread safely, but you can kill a process, and it can be as safe as you want it to be. Also, if the thread is taking 5+ seconds because it's CPU-bound, you don't want to fight with it over the GIL.
There are two basic options here.
First, you can put the code in another script and run it with subprocess:
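import subprocess, sys
# Raises subprocess.TimeoutExpired (after killing the child) if it runs over 5 s
subprocess.check_call([sys.executable, 'other_script.py', arg, other_arg],
                      timeout=5)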
Since this is going through normal child-process channels, the only communication you can use is some argv strings, a success/failure return value (actually a small integer, but that's not much better), and optionally a hunk of text going in and a chunk of text coming out.
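Here is a self-contained sketch of the subprocess route. To avoid needing a separate other_script.py, it passes the child code with `python -c`, an assumption made only for the demo. It uses subprocess.run(), which kills the child and raises subprocess.TimeoutExpired when the timeout expires. The argv strings, the return code, and the captured stdout are exactly the channels described above. run_python is a hypothetical helper name.

```python
import subprocess
import sys


def run_python(code, timeout):
    """Run a Python snippet in a child interpreter.

    Returns (returncode, stdout), or None if the child was killed on timeout.
    """
    try:
        completed = subprocess.run(
            [sys.executable, '-c', code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None   # run() has already killed and reaped the child
    return completed.returncode, completed.stdout


print(run_python("print('hello')", timeout=5))               # (0, 'hello\n')
print(run_python("import time; time.sleep(10)", timeout=1))  # None
```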
Alternatively, you can use multiprocessing to spawn a thread-like child process:
import multiprocessing
p = multiprocessing.Process(target=func, args=args)
p.start()
p.join(5)
if p.is_alive():
    p.terminate()
    p.join()  # reap the terminated child
As you can see, this is a little more complicated, but it's better in a few ways:
You can pass arbitrary Python objects (at least anything that can be pickled) rather than just strings.
Instead of having to put the target code in a completely independent script, you can leave it as a function in the same script.
It's more flexible—e.g., if you later need to, say, pass progress updates, it's very easy to add a queue in either or both directions.
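Here is a minimal sketch of the multiprocessing route, with the result coming back over a multiprocessing.Queue. work and run_with_timeout are hypothetical names. The parent reads from the queue (with a timeout) before joining, because joining a child that has put items on a queue before those items are consumed can deadlock.

```python
import multiprocessing
import queue
import time


def work(seconds, out):
    """Hypothetical task: sleeps, then sends a picklable result back."""
    time.sleep(seconds)
    out.put({'slept': seconds})


def run_with_timeout(seconds, timeout):
    out = multiprocessing.Queue()
    p = multiprocessing.Process(target=work, args=(seconds, out))
    p.start()
    try:
        result = out.get(timeout=timeout)   # wait for the result, up to `timeout`
    except queue.Empty:                     # multiprocessing.Queue raises queue.Empty
        p.terminate()                       # no result in time: kill the child
        result = None
    p.join()
    return result


if __name__ == '__main__':
    print(run_with_timeout(0.1, timeout=5))   # {'slept': 0.1}
    print(run_with_timeout(10, timeout=0.5))  # None
```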
The big problem with any kind of parallelism is sharing mutable data—e.g., having a background task update a global dictionary as part of its work (which your comments say you're trying to do). With threads, you can sort of get away with it, but race conditions can lead to corrupted data, so you have to be very careful with locking. With child processes, you can't get away with it at all. (Yes, you can use shared memory, as Sharing state between processes explains, but this is limited to simple types like numbers, fixed arrays, and types you know how to define as C structures, and it just gets you back to the same problems as threads.)
Ideally, you arrange things so you don't need to share any data while the process is running—you pass in a dict as a parameter and get a dict back as a result. This is usually pretty easy to arrange when you have a previously-synchronous function that you want to put in the background.
But what if, say, a partial result is better than no result? In that case, the simplest solution is to pass the results over a queue. You can do this with an explicit queue, as explained in Exchanging objects between processes, but there's an easier way.
If you can break the monolithic process into separate tasks, one for each value (or group of values) you wanted to stick in the dictionary, you can schedule them on a Pool—or, even better, a concurrent.futures.Executor. (If you're on Python 2.x or 3.1, see the backport futures on PyPI.)
Let's say your slow function looked like this:
def spam():
global d
for meat in get_all_meats():
count = get_meat_count(meat)
d.setdefault(meat, 0) += count
Instead, you'd do this:
import concurrent.futures
def spam_one(meat):
    count = get_meat_count(meat)
    return meat, count
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    results = executor.map(spam_one, get_all_meats(), timeout=5)
    for (meat, count) in results:
        d[meat] = d.get(meat, 0) + count
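As many results as you get within 5 seconds get added to the dict. If that isn't all of them, the remaining tasks are cancelled where possible, and a TimeoutError is raised from the for loop (which you can handle however you want: log it, run some quick fallback code, whatever). One caveat: leaving the with block calls executor.shutdown(wait=True), which still waits for any task that has already been handed to a worker, so a task that truly hangs will block there.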
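Here is a runnable sketch of the timeout-plus-partial-results pattern. It makes two assumptions to stay self-contained. First, slow_square is a hypothetical task. Second, it uses ThreadPoolExecutor, which shares the same Executor.map(..., timeout=...) interface as ProcessPoolExecutor but needs no `__main__` guard. The TimeoutError surfaces while iterating the results, after the finished items have been stored.

```python
import concurrent.futures
import time


def slow_square(x):
    """Hypothetical task: item 3 is much slower than the others."""
    time.sleep(1.5 if x == 3 else 0.05)
    return x, x * x


def squares_within(items, timeout):
    d = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        results = executor.map(slow_square, items, timeout=timeout)
        try:
            for x, sq in results:
                d[x] = sq
        except concurrent.futures.TimeoutError:
            pass   # keep the partial results; log or fall back here instead
    # Leaving the with block waited for item 3, which was already running.
    return d


print(squares_within(range(6), timeout=0.5))   # {0: 0, 1: 1, 2: 4}
```

Note the total runtime. The call returns only after item 3's 1.5-second sleep completes, because shutdown waits for a task that is already running. Items 4 and 5, still pending, are cancelled.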
And if the tasks really are independent (as they are in my stupid little example, but of course they may not be in your real code, at least not without a major redesign), you can parallelize the work for free just by removing that max_workers=1. Then, if you run it on an 8-core machine, it'll kick off 8 workers and give them each 1/8th of the work to do, and things will get done faster. (Usually not 8x as fast, but often 3-6x as fast, which is still pretty nice.)
This seems like a better idea (sorry, I am not sure of the Python names of things yet):
import signal
def signal_handler(signum, frame):
    raise Exception("Timeout!")
signal.signal(signal.SIGALRM, signal_handler)
signal.alarm(3)  # Three seconds for the whole loop, not for each call
try:
    for x in range(0, 3):
        A(x)  # Call function A(x)
except Exception as msg:
    print(msg)  # "Timeout!" when the alarm fired
signal.alarm(0)  # Reset
The comments are correct in that you should check inside. Here is a potential solution. Note that an asynchronous function (by using a thread for example) is different from this solution. This is synchronous which means it will still run in series.
import time
def someFunction():
    start = time.monotonic()  # monotonic clock: unaffected by system clock changes
    while time.monotonic() - start < 5:
        pass  # do one small piece of your normal function's work here
    return
for x in range(0, 3):
    someFunction()

How to stop an infinite loop safely in Python?

I've got a script that runs an infinite loop and adds things to a database and does things that I can't just stop halfway through, so I can't just press Ctrl+C and stop it.
I want to be able to somehow stop the while loop, but let it finish its last iteration before it stops.
Let me clarify:
My code looks something like this:
while True:
    do_something()      # placeholder steps that must not be interrupted midway
    do_more_things()
    do_more_things()
I want to be able to interrupt the while loop at the end, or the beginning, but not between doing things because that would be bad.
And I don't want it to ask me after every iteration if I want to continue.
Thanks for the great answers, I'm super grateful, but my implementation doesn't seem to be working:
import signal
def signal_handler(signal, frame):
    global interrupted
    interrupted = True
class Crawler():
    def __init__(self):
        pass  # not relevant
    def crawl(self):
        interrupted = False
        signal.signal(signal.SIGINT, signal_handler)
        while True:
            # doing things
            # more things
            if interrupted:
                print("Exiting..")
                break
When I press Ctrl+C the program just keeps going ignoring me.
What you need to do is catch the interrupt, set a flag saying you were interrupted, and then continue working until it's time to check the flag (at the end of each loop iteration). Because Python's try-except construct would abandon the current run of the loop, you need to set up a proper signal handler; it handles the interrupt but then lets Python continue where it left off. Here's how:
import signal
import time  # For the demo only
def signal_handler(signal, frame):
    global interrupted
    interrupted = True
signal.signal(signal.SIGINT, signal_handler)
interrupted = False
while True:
    print("Working hard...")
    time.sleep(3)
    print("All done!")
    if interrupted:
        print("Gotta go")
        break
Notes:
Use this from the command line. In the IDLE console, it'll trample on IDLE's own interrupt handling.
A better solution would be to "block" SIGINT (the signal behind KeyboardInterrupt) for the duration of the loop body, and unblock it when it's time to poll for interrupts. Signal masking is a feature of some Unix flavors but not all, which is why Python long did not support it (see the third "General rule"); since Python 3.3, signal.pthread_sigmask() exposes it on Unix systems that provide it.
The OP wants to do this inside a class. But the handler is invoked by the signal handling system with two arguments: the signal number and the current stack frame. A plain function therefore has no self argument giving access to the class object, so the simplest way to set a flag is to use a global variable. You can give the handler access to the object with a closure (i.e., define the signal handler dynamically in __init__()) or by registering a bound method such as self.handle_sigint, which already carries self. But frankly I wouldn't bother unless a global is out of the question due to multi-threading or whatever.
Caveat: If your process is in the middle of a system call, handling a signal may interrupt the system call. So this may not be safe for all applications. Safer alternatives would be (a) instead of relying on signals, use a non-blocking read at the end of each loop iteration (and type input instead of hitting ^C); (b) use threads or interprocess communication to isolate the worker from the signal handling; or (c) do the work of implementing real signal blocking, if you are on an OS that has it. All of them are OS-dependent to some extent, so I'll leave it at that.
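Here is a sketch of the class-based version the OP wanted. It registers a bound method as the handler, which works because signal.signal() accepts any callable taking (signum, frame). This is an illustration, not the answer's own method. Crawler mirrors the question, and the sleep stands in for the real work. The demo simulates Ctrl+C by sending SIGINT to its own process from a timer thread, which behaves this way only on POSIX systems.

```python
import os
import signal
import threading
import time


class Crawler:
    def __init__(self):
        self.interrupted = False

    def _handle_sigint(self, signum, frame):
        # A bound method is a valid handler: signal calls it with
        # (signum, frame), and self is already bound, so no global is needed.
        self.interrupted = True

    def crawl(self):
        # Handlers can only be installed from the main thread.
        previous = signal.signal(signal.SIGINT, self._handle_sigint)
        iterations = 0
        try:
            while True:
                time.sleep(0.1)        # stand-in for "doing things, more things"
                iterations += 1
                if self.interrupted:   # checked only at the end of an iteration
                    print("Exiting..")
                    break
        finally:
            signal.signal(signal.SIGINT, previous)   # restore normal Ctrl+C
        return iterations


# Demo: simulate pressing Ctrl+C after 0.35 s. POSIX only; on Windows,
# os.kill(pid, signal.SIGINT) terminates the process instead.
threading.Timer(0.35, os.kill, args=(os.getpid(), signal.SIGINT)).start()
crawler = Crawler()
print(crawler.crawl() >= 1)   # True
```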
The logic below will help you do this:
import signal
import sys
import time
run = True
def signal_handler(signal, frame):
    global run
    print("exiting")
    run = False
signal.signal(signal.SIGINT, signal_handler)
while run:
    print("hi")
    time.sleep(1)
    # do anything
print("bye")
While this is running, try pressing Ctrl+C.
To clarify @praba230890's solution: the interrupted variable was not defined in the correct scope. It was assigned inside the crawl method, which made it a local variable there, while the handler, defined at the top level of the program, set the global interrupted, which crawl never read.
Here is an edited example of the principle above: an infinite Python loop in a separate thread that ends safely on a signal. It also has a thread-blocking sleep step; it's up to you to keep it, replace it with an asyncio implementation, or remove it.
This function can be imported anywhere in an application and runs without blocking other code (e.g., good for a Redis pub/sub subscription). After SIGINT is caught, the thread's job ends peacefully.
from typing import Callable
import time
import threading
import signal
end_job = False
def run_in_loop(job: Callable, interval_sec: float = 0.5):
    # Must be called from the main thread: signal.signal() only works there.
    def interrupt_signal_handler(signum, frame):
        global end_job
        end_job = True
    signal.signal(signal.SIGINT, interrupt_signal_handler)
    def do_job():
        while True:
            job()
            time.sleep(interval_sec)
            if end_job:
                print("Parallel job ending...")
                break
    th = threading.Thread(target=do_job)
    th.start()
You forgot to add a global statement in the crawl method, so the result will be:
import signal
def signal_handler(signal, frame):
    global interrupted
    interrupted = True
class Crawler():
    def __init__(self):
        ...  # or pass if you don't want this to do anything. ... Is for unfinished code
    def crawl(self):
        global interrupted
        interrupted = False
        signal.signal(signal.SIGINT, signal_handler)
        while True:
            # doing things
            # more things
            if interrupted:
                print("Exiting..")
                break
I hope the code below helps you:
#!/usr/bin/env python3
import sys
import time
import signal
def cb_sigint_handler(signum, stack):
    global is_interrupted
    print("SIGINT received")
    is_interrupted = True
if __name__ == "__main__":
    is_interrupted = False
    signal.signal(signal.SIGINT, cb_sigint_handler)
    while True:
        # do stuff here
        print("processing...")
        time.sleep(3)
        if is_interrupted:
            print("Exiting..")
            # do clean up
            sys.exit(0)
