I had a program that ran recursively, and while 95% of the time it wasn't an issue, sometimes I would hit the recursion limit when a task took too long. In my efforts to convert it to iterative code, I decided to try something along these lines:
import time

while True:
    do_something()          # placeholder for the actual work
    if done:                # check whether the task is done
        print('ALL DONE')
        break
    else:
        time.sleep(600)
I've tested my code and it works fine, but I was wondering if there is anything inherently wrong with this method? Will it eat up RAM or crash the box if it was left to run for too long?
Thanks in advance!
EDIT:
The "do something" I refer to is checking a log file for certain keywords periodically, as data is constantly being written to the log file. Once these lines are written, which happens at varying lengths of time, I have the script perform certain tasks, such as copying specific lines to a separate file.
My original program had two functions: one called itself periodically until it found the keywords, at which point it would call the "do something" function. The "do something" function, upon completion, would call the original function again, and this would repeat until the task was finished or I hit the recursion limit.
There is nothing inherently wrong in this pattern. I have used the daemon function in init.d to start a very similar python script. As long as "do something" doesn't leak, it should be able to run forever.
I think that either way, time.sleep() will not stop you from hitting the recursion limit, because sleep only pauses execution; it doesn't unwind the stack or free any kind of memory. Check the time.sleep() description at https://docs.python.org/2/library/time.html: it suspends the operation, but it does not do any memory optimization.
The pattern you describe is easy to implement, but usually not the best way to do things. If the task completes just after you check, you still have to wait out the full sleep interval (ten minutes with sleep(600)) before resuming processing. However, sometimes there is little choice but to do this; for example, if the only way to detect that the task is complete is to check for the existence of a file, you may have to do it this way. In such cases, the choice of time interval needs to balance the CPU consumed by the "spin" against the wait time.
Another pattern that is also fairly easy is to simply block while waiting on the task to complete. Whether this is easy or not depends on the particular API you are using. But this technique does not scale because all processing must wait for a single activity to complete. Imagine not being able to open a new browser tab while a page is loading.
Best practice today generally uses one of several models for asynchronous processing. Much like writing event handlers for mouse clicks in a website or GUI, you write a callback function that handles the result of processing and pass that callback to the task. No CPU is wasted, and the response is handled immediately without waiting. Many frameworks support this model today; Tulip (the project that became Python's asyncio) is one example.
Specifically regarding the recursion limit, I don't think your sleep loop is responsible for hitting the stack frame limit. Maybe it was something happening within the task itself.
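The callback model described above can be sketched with the standard library's concurrent.futures; the task and handler names here are hypothetical stand-ins for the log-scanning work:

```python
import concurrent.futures
import time

def slow_task():
    # Stand-in for the long-running work (e.g. scanning a log file).
    time.sleep(0.1)
    return "keyword found"

def on_done(future):
    # Runs as soon as the task completes; no polling loop needed.
    print("handled:", future.result())

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(slow_task)
    fut.add_done_callback(on_done)
# Leaving the with-block waits for the task; on_done has already fired.
```

The main thread never sleeps in a loop; the callback is invoked by the executor the moment the result is ready.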
I have a huge problem. I am working on a web Python project where, after clicking a button, a specific controller is called, which in turn calls a function from a Python module, as shown in my code below. However, I need a second button that stops the processing done by the stream controller function.
import analyser

def stream():
    analyser.get_texts()
    response.flash = "Analysis Done."
    return ""
I've been searching a lot for how to stop a process from an external event (something similar to an interrupt), but all the solutions I've found were about stopping a Python script using sys.exit() or programmatically via a return statement, for example. None of these solutions actually work for me.
I want the user to be able to stop that function whenever he wants, since my function analyser.get_texts() keeps processing all the time.
So, my question is how can I stop the execution of stream function, through a button click on my view? Thanks.
If I understand you correctly, then your analyser doesn't provide its own way to terminate an ongoing calculation. You will therefore need to wrap it into something that allows you to terminate the analyser without its "consent".
The right approach for that depends on how bad terminating the analyser in that way is: does it leave resources in a bad state?
Depending on that, you have multiple options:
Run your analysis in a separate process. These can be cleanly killed from the outside. Note that it's usually not a good idea to forcefully stop a thread, so use processes instead.
Use some kind of asynchronous task management that lets you create and stop tasks (e.g. Celery).
I'm writing a program in which I want to evaluate a piece of code asynchronously. I want it to be isolated from the main thread so that it can raise an error, enter an infinite loop, or just about anything else without disrupting the main program. I was hoping to use threading.Thread, but this has a major problem; I can't figure out how to stop it. I have tried Thread._stop(), but that frequently doesn't work. I end up with a thread that I can't control hogging both interpreter time and CPU power. The code in the thread doesn't open any files or do anything else that would cause problems if I hard-killed it.
Python's multiprocessing.Process.terminate() does this really well; unfortunately, initiating a process on Windows takes nearly a second, which is long enough to cause annoying delays in my GUI.
Does anyone know either a: how to kill a Python thread (I don't think I care how dirty the exit is), or b: how to speed up starting a process?
A third possibility would be a third-party library that provides an alternative method for asynchronous execution, but I've never heard of any such thing.
In my case, the best way to do this seems to be to maintain a running worker process, and send the code to it on an as-needed basis. If the process acts up, I kill it and then start a new one immediately to avoid any delay the next time.
I've been working the Python "apscheduler" package (Advanced Python Scheduler) into my app. So far it's going well; I'm able to do almost everything I had envisioned doing with it.
Only one kink left to iron out...
The function my events call will only accept around 3 calls a second before failing, as it triggers very slow hardware I/O :(
I've tried limiting the max number of threads in the threadpool from 20 down to just 1 to try to slow down execution, but since I'm not really putting a big load on apscheduler, my events still fire pretty much concurrently (well... very, very close together, at least).
Is there a way to 'stagger' different events that fire within the same second?
I have recently found this question because I, like yourself, was trying to stagger scheduled jobs slightly to compensate for slow hardware.
Including an argument like this in the scheduler add_job call staggers the start time for each job by 200ms (while incrementing idx for each job):
next_run_time=datetime.datetime.now() + datetime.timedelta(seconds=idx * 0.2)
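For example, computing offsets this way for three jobs spaces their first runs 200 ms apart; the datetime arithmetic below is plain standard library, and only the commented add_job call is apscheduler's (with hypothetical job names):

```python
import datetime

now = datetime.datetime.now()
# One staggered start time per job, incrementing idx for each.
start_times = [now + datetime.timedelta(seconds=idx * 0.2) for idx in range(3)]

# Each job is then registered with its own offset, e.g.:
# scheduler.add_job(poll_hardware, 'interval', seconds=5,
#                   next_run_time=start_times[idx])

gaps = [(b - a).total_seconds() for a, b in zip(start_times, start_times[1:])]
print(gaps)  # -> [0.2, 0.2]
```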
What you want to use is the 'jitter' option.
From the docs:
The jitter option enables you to add a random component to the
execution time. This might be useful if you have multiple servers and
don’t want them to run a job at the exact same moment or if you want
to prevent multiple jobs with similar options from always running
concurrently
Example:
# Run the `job_function` every hour with an extra-delay picked randomly
# in a [-120,+120] seconds window.
sched.add_job(job_function, 'interval', hours=1, jitter=120)
I don't know about apscheduler, but have you considered using a Redis LIST (queue) and simply serializing the event feed into that one critically bounded function so that it fires no more than three times per second? For example, you could have it do a blocking POP with a one-second max delay, increment your trigger count for every event, sleep when it hits three, and zero the trigger count any time the blocking POP times out. (Or you could just use 333-millisecond sleeps after each event.)
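The same serialization idea can be sketched with the standard library's queue.Queue standing in for the Redis LIST (in real use, the blocking get would be redis-py's blpop, and the producers would LPUSH events from other processes):

```python
import queue
import time

events = queue.Queue()
for i in range(5):
    events.put("event-%d" % i)        # producers push into the feed

fired = 0
window_start = time.monotonic()
while True:
    try:
        event = events.get(timeout=1)  # blocking POP, 1 s max delay
    except queue.Empty:
        break                          # feed idle: stop (or reset and wait)
    fired += 1
    # trigger_hardware(event)          # the critically bounded call
    if fired % 3 == 0:
        # Three calls this second: sleep out the rest of the window.
        time.sleep(max(0, 1 - (time.monotonic() - window_start)))
        window_start = time.monotonic()
print("processed", fired, "events")
```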
My solution for future reference:
I added a basic bool lock in the function being called, plus a wait, which seems to do the trick nicely, since it's not the calling of the function itself that raises the error, but rather a deadlock situation in what the function carries out :D
My wx GUI shows thumbnails, but they're slow to generate, so:
The program should remain usable while the thumbnails are generating.
Switching to a new folder should stop generating thumbnails for the old folder.
If possible, thumbnail generation should make use of multiple processors.
What is the best way to do this?
Putting the thumbnail generation in a background thread with threading.Thread will solve your first problem, making the program usable.
If you want a way to interrupt it, the usual way is to add a "stop" variable which the background thread checks every so often (e.g., once per thumbnail), and the GUI thread sets when it wants to stop it. Ideally you should protect this with a threading.Condition. (The condition isn't actually necessary in most cases—the same GIL that prevents your code from parallelizing well also protects you from certain kinds of race conditions. But you shouldn't rely on that.)
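A minimal sketch of that stop flag, using a threading.Event in place of a bare variable plus Condition (the thumbnail work is a placeholder sleep):

```python
import threading
import time

stop = threading.Event()
generated = []

def generate_thumbnails(paths):
    for path in paths:
        if stop.is_set():        # checked once per thumbnail
            return
        time.sleep(0.01)         # stand-in for the slow generation
        generated.append(path)

worker = threading.Thread(target=generate_thumbnails,
                          args=(["img%d.jpg" % i for i in range(100)],))
worker.start()
time.sleep(0.05)                 # GUI thread: user switches folders...
stop.set()                       # ...so ask the worker to stop
worker.join()
print("generated", len(generated), "thumbnails before stopping")
```

Event.set() and Event.is_set() are already thread-safe, which is why no explicit Condition is needed here.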
For the third problem, the first question is: Is thumbnail generation actually CPU-bound? If you're spending more time reading and writing images from disk, it probably isn't, so there's no point trying to parallelize it. But, let's assume that it is.
First, if you have N cores, you want a pool of N threads, or N-1 if the main thread has a lot of work to do too, or maybe something like 2N or 2N-1 to trade off a bit of best-case performance for a bit of worst-case performance.
However, if that CPU work is done in Python, or in a C extension that nevertheless holds the Python GIL, this won't help, because most of the time, only one of those threads will actually be running.
One solution to this is to switch from threads to processes, ideally using the standard multiprocessing module. It has built-in APIs to create a pool of processes, and to submit jobs to the pool with simple load-balancing.
The problem with using processes is that you no longer get automatic sharing of data, so that "stop flag" won't work. You need to explicitly create a flag in shared memory, or use a pipe or some other mechanism for communication instead. The multiprocessing docs explain the various ways to do this.
You can actually just kill the subprocesses. However, you may not want to do this. First, unless you've written your code carefully, it may leave your thumbnail cache in an inconsistent state that will confuse the rest of your code. Also, if you want this to be efficient on Windows, creating the subprocesses takes some time (not as in "30 minutes" or anything, but enough to affect the perceived responsiveness of your code if you recreate the pool every time a user clicks a new folder), so you probably want to create the pool before you need it, and keep it for the entire life of the program.
Other than that, all you have to get right is the job size. Hopefully creating one thumbnail isn't too big of a job—but if it's too small of a job, you can batch multiple thumbnails up into a single job—or, more simply, look at the multiprocessing API and change the way it batches jobs when load-balancing.
Meanwhile, if you go with a pool solution (whether threads or processes), if your jobs are small enough, you may not really need to cancel. Just drain the job queue—each worker will finish whichever job it's working on now, but then sleep until you feed in more jobs. Remember to also drain the queue (and then maybe join the pool) when it's time to quit.
One last thing to keep in mind is that if you successfully generate thumbnails as fast as your computer is capable of generating them, you may actually cause the whole computer—and therefore your GUI—to become sluggish and unresponsive. This usually comes up when your code is actually I/O bound and you're using most of the disk bandwidth, or when you use lots of memory and trigger swap thrash, but if your code really is CPU-bound, and you're having problems because you're using all the CPU, you may want to either use 1 fewer core, or look into setting thread/process priorities.
What is the best way to continuously repeat the execution of a given function at a fixed interval while being able to terminate the executor (thread or process) immediately?
Basically I know two approaches:
use multiprocessing and function with infinite cycle and time.sleep at the end. Processing is terminated with process.terminate() in any state.
use threading and constantly recreate timers at the end of the thread function. Processing is terminated by timer.cancel() while sleeping.
(Both "in any state" and "while sleeping" are fine, even though the latter may not be immediate.) The problem is that I have to use both multiprocessing and threading: the latter appears not to work on ARM (some fuzzy interaction of the Python interpreter and vim; outside of vim everything is fine) (I was using the second approach there; I have not tried threading + cycle, and no code from that is currently left), while the former spawns way too many processes, which I would rather not see unless really required. This leaves me coding two different approaches, even though threading with a cycle is just a few more imports plus drop-in replacements of all the multiprocessing stuff wrapped in if/else (except that there is no thread.terminate()). Is there some better way to do the job?
Currently used code is here (currently with cycle for both jobs), but I do not think it will be much useful to answer the question.
Update: The reason I am using this solution is a set of functions that display file status (and some other things, like the branch) of version control systems in the vim statusline. These statuses must be updated, but updating them immediately cannot be done without using hooks, and I have no idea how to set hooks temporarily and remove them on vim quit without possibly spoiling the user's configuration. Thus the standard solution is a cache expiring after N seconds. But when the cache expires I need to do an expensive shell call, and the delay is noticeable, the more so the heavier the IO load is. What I am implementing now updates values for viewed buffers every N seconds in a separate process, so the delays bother that process and not me. Threads are likely to work as well, because the GIL does not affect calls to external programs.
I'm not clear on why a single long-lived thread that loops infinitely over the tasks wouldn't work for you? Or why you end up with many processes in the multiprocess option?
My immediate reaction would have been a single thread with a queue to feed it things to do. But I may be misunderstanding the problem.
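That single-thread-plus-queue design might look like this, with None as a shutdown sentinel so the thread can be stopped cleanly even while it is blocked waiting for work (the lambda stands in for the expensive VCS shell call):

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    while True:
        job = tasks.get()        # blocks until work (or sentinel) arrives
        if job is None:          # clean termination request
            return
        results.append(job())    # e.g. the expensive VCS status call

t = threading.Thread(target=worker)
t.start()
tasks.put(lambda: "branch: master")   # placeholder for a shell call
tasks.put(None)                       # terminate the worker
t.join()
print(results)  # -> ['branch: master']
```

Since the GIL is released around external program calls, this one long-lived thread should not block the interpreter while the shell calls run.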
I do not know how to do it simply and/or cleanly in Python, but I was wondering if you couldn't take advantage of an existing system scheduler, e.g. crontab on *nix systems.
There is a Python API for it that might satisfy your needs.