Edit: This looks like a duplicate, but I assure you it's not. I'm looking to kill the currently running process cleanly, not to kill a separate process.
The problem is that the process I'm killing isn't spawned by subprocess or exec; it's basically trying to kill itself.
Here's the scenario: the program does cleanup on exit, but sometimes this takes too long. I am sure it is safe to terminate the program, because the first step of the quit sequence saves the database. How do I go about doing this?
I cannot use taskkill, as it is not available in some Windows installs (e.g. Home editions of XP).
tskill also doesn't work.
win32api.TerminateProcess(handle, 0) works, but I'm concerned it may cause memory leaks because I won't have the opportunity to close the handle (the program stops immediately after calling TerminateProcess). Note: yes, I am force-quitting it, so there are bound to be some unfreed resources, but I want to minimize this as much as possible (I will only do it if shutdown takes an unbearable amount of time, for a better user experience), and I don't think Python will run garbage collection if it's force-quit.
I'm currently using the last option, as it just works, but I'm concerned about the unfreed handle. Any thoughts or suggestions would be very much appreciated!
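For reference, a minimal sketch of the approach I'm currently using (assuming the pywin32 package; GetCurrentProcess returns a pseudo-handle, so there is nothing extra to close):

    import win32api

    def force_quit(exit_code=0):
        # Pseudo-handle for the running process; TerminateProcess ends it
        # immediately, skipping any remaining Python-level cleanup.
        handle = win32api.GetCurrentProcess()
        win32api.TerminateProcess(handle, exit_code)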
If a process self-terminates, then you don't need to worry about garbage collection. The OS automatically reclaims all memory used by that process, so there is no memory leak to worry about; a memory leak is when a running process uses more and more memory as time goes by.
So yes, terminating your process this way isn't very "clean", but there won't be any ill side effects.
If I understand your question, you're trying to get the program to shut itself down. This is usually done with sys.exit().
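A minimal sketch of that (sys.exit raises SystemExit, so atexit handlers and finally blocks still get a chance to run; save_database is just a stand-in for the question's first cleanup step):

    import sys

    def quit_program():
        save_database()   # hypothetical first step of the quit sequence
        sys.exit(0)       # raises SystemExit; normal cleanup still runs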
TerminateProcess and taskkill /f do not free resources and will result in memory leaks.
Here is the Microsoft documentation quote on TerminateProcess:
"... Terminating a process does not cause child processes to be terminated.
Terminating a process does not necessarily remove the process object from the system. A process object is deleted when the last handle to the process is closed. ..."
Microsoft heavily uses COM and DCOM, which share handles and resources the OS does not and cannot track. ExitProcess should be used instead, if you do not intend to reboot often, as it allows a process to properly free the resources it used. Linux does not have this problem because it does not use COM or DCOM.
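A hedged sketch of calling ExitProcess from Python, assuming a Windows environment where ctypes can reach kernel32:

    import ctypes

    def exit_process(exit_code=0):
        # ExitProcess lets DLL detach routines run before the process dies,
        # unlike TerminateProcess, which stops everything at once.
        ctypes.windll.kernel32.ExitProcess(exit_code)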
I'm writing a program in which I want to evaluate a piece of code asynchronously. I want it to be isolated from the main thread so that it can raise an error, enter an infinite loop, or just about anything else without disrupting the main program. I was hoping to use threading.Thread, but this has a major problem; I can't figure out how to stop it. I have tried Thread._stop(), but that frequently doesn't work. I end up with a thread that I can't control hogging both interpreter time and CPU power. The code in the thread doesn't open any files or do anything else that would cause problems if I hard-killed it.
Python's multiprocessing.Process.terminate() does this really well; unfortunately, initiating a process on Windows takes nearly a second, which is long enough to cause annoying delays in my GUI.
Does anyone know either a: how to kill a Python thread (I don't think I care how dirty the exit is), or b: how to speed up starting a process?
A third possibility would be a third-party library that provides an alternative method for asynchronous execution, but I've never heard of any such thing.
In my case, the best way to do this seems to be to maintain a running worker process, and send the code to it on an as-needed basis. If the process acts up, I kill it and then start a new one immediately to avoid any delay the next time.
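A minimal sketch of that pattern under my own assumptions about the interface (a multiprocessing.Queue carrying picklable callables is just one way to feed the persistent worker):

    import multiprocessing as mp

    def _worker(task_queue):
        # Long-lived worker: pull work items and run them, isolated from the GUI.
        while True:
            func, args = task_queue.get()
            try:
                func(*args)
            except Exception:
                pass  # errors in submitted code must not kill the worker

    class Worker:
        def __init__(self):
            self._start()

        def _start(self):
            self.queue = mp.Queue()
            self.proc = mp.Process(target=_worker, args=(self.queue,), daemon=True)
            self.proc.start()

        def submit(self, func, *args):
            self.queue.put((func, args))

        def restart(self):
            # Kill a misbehaving worker and start a fresh one right away,
            # so the next submission doesn't pay the process-startup delay.
            self.proc.terminate()
            self._start()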
I have written a Python program that needs to run for multiple days at a time because it constantly collects data. Previously I had no issues running this program for months at a time. I recently made some updates to the program, and now after around 12 hours I get the dreaded out-of-memory killer. The 'dmesg' output is the following:
    [9084334.914808] Out of memory: Kill process 2276 (python2.7) score 698 or sacrifice child
    [9084334.914811] Killed process 2276 (python2.7) total-vm:13279000kB, anon-rss:4838164kB, file-rss:8kB
Besides general Python coding changes, the main change made to the program was the addition of a multiprocessing Queue. This is the first time I have ever used this feature, so I am not sure whether it might be the cause of the issue. The purpose of the Queue in my program is to be able to make dynamic changes in a parallel process. The Queue is initiated in the main program and is continually monitored in the parallel process. A simplified version of how I am doing this in the parallel process is the following (with 'q' being the Queue):
    while(1):
        if q.empty():
            None
        else:
            fr = q.get()
            # Additional code
        time.sleep(1)
The dynamic changes to 'q' do not happen very often, so the majority of the time q.empty() will be true, but the loop is there to be ready as soon as changes are made. My question is, would running this code for multiple hours at a time cause the memory to eventually run low? With the 'while' loop being pretty short and running basically non-stop, I was thinking this might be a problem. If this could be the cause, does anybody have any suggestions on how to improve the code so the out-of-memory killer doesn't get called?
Thank you very much.
The only way you can run out of memory in the way you describe is if you're using more and more memory as time goes on. The loop here does not demonstrate this behavior, so it cannot be (solely) responsible for any memory errors. Running a tight, infinite loop can burn through a lot of needless processor cycles, but it can't cause a MemoryError by itself unless it's storing data somewhere else.
It's likely that elsewhere in your code, you're holding onto some variables that you don't intend to. This is called a memory leak, and you can use a memory profiler to look for where such a leak is coming from.
Some likely suspects are caching methods used to improve performance, or lists of variables that never leave scope. Perhaps your multiprocessing queue is holding on to references to earlier data objects, or items are never deleted from the queue once they're inserted? (This latter case is unlikely given the code you've shown if you're using the builtin queue.Queue, but anything is possible).
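If you want to look for such a leak from inside the program, here is a minimal sketch using the standard-library tracemalloc module (Python 3.4+; on Python 2.7 you would need a third-party profiler instead):

    import tracemalloc

    tracemalloc.start()

    # ... let the program run for a while ...

    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics('lineno')[:10]:
        # The top entries show which source lines have allocated the most memory.
        print(stat)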
You can convert your program into a Linux service and set its OOM policy to continue.
You can check this link and this one to see how to view/edit service parameters and the OOM policy service parameter, respectively.
What is the best way to continuously repeat the execution of a given function at a fixed interval while being able to terminate the executor (thread or process) immediately?
Basically I know two approaches:
use multiprocessing and a function with an infinite loop and time.sleep at the end; processing is terminated with process.terminate() in any state.
use threading and constantly recreate timers at the end of the thread function; processing is terminated by timer.cancel() while sleeping.
(Both "in any state" and "while sleeping" are fine, even though the latter may not be immediate.) The problem is that I have to use both multiprocessing and threading: threading appears not to work on ARM (some fuzzy interaction of the Python interpreter and vim; outside of vim everything is fine; I was using the second approach there, have not tried threading with a loop, and no code is currently left), while multiprocessing spawns way too many processes, which I would like not to see unless really required. This leads to having to code two different approaches, even though threading with a loop is just a few extra imports providing drop-in replacements for all the multiprocessing stuff wrapped in if/else (except that there is no thread.terminate()). Is there some better way to do the job?
The code currently in use is here (currently with a loop for both jobs), but I do not think it will be very useful for answering the question.
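For concreteness, a rough sketch of the two approaches described above (the names and the repeated function are my own placeholders):

    import multiprocessing
    import threading
    import time

    def repeat_with_process(func, interval):
        # Approach 1: a separate process looping over func; it can be killed
        # in any state with proc.terminate().
        def loop():
            while True:
                func()
                time.sleep(interval)
        proc = multiprocessing.Process(target=loop, daemon=True)
        proc.start()
        return proc

    def repeat_with_timers(func, interval):
        # Approach 2: each run schedules the next via threading.Timer;
        # cancelling the pending timer (state['timer'].cancel()) stops the
        # chain while it is sleeping.
        state = {}
        def run():
            func()
            state['timer'] = threading.Timer(interval, run)
            state['timer'].start()
        state['timer'] = threading.Timer(interval, run)
        state['timer'].start()
        return state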
Update: the reason I am using this solution is functions that display file status (and some other things, like the branch) from version control systems in the vim statusline. These statuses must be updated, but updating them immediately cannot be done without using hooks, and I have no idea how to set hooks temporarily and remove them on vim quit without possibly spoiling the user's configuration. Thus the standard solution is a cache expiring after N seconds. But when the cache expires I need to make an expensive shell call, and the delay is noticeable, the more so the heavier the IO load is. What I am implementing now is updating the values for viewed buffers every N seconds in a separate process, so the delays bother that process and not me. Threads are also likely to work, because the GIL does not affect calls to external programs.
I'm not clear on why a single long-lived thread that loops infinitely over the tasks wouldn't work for you, or why you end up with many processes in the multiprocessing option.
My immediate reaction would have been a single thread with a queue to feed it things to do. But I may be misunderstanding the problem.
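Something along these lines is what I have in mind, as a sketch only (the sentinel value for shutdown is my own choice):

    import queue
    import threading

    tasks = queue.Queue()

    def worker():
        # One long-lived thread: block until there is something to do.
        while True:
            job = tasks.get()
            if job is None:        # sentinel: shut the worker down
                break
            job()
            tasks.task_done()

    threading.Thread(target=worker, daemon=True).start()

    # Usage: tasks.put(some_callable) to schedule work, tasks.put(None) to stop.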
I do not know how to do it simply and/or cleanly in Python, but I was wondering if maybe you couldn't take advantage of an existing system scheduler, e.g. crontab on *nix systems.
There is a Python API for it, and it might satisfy your needs.
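If the API meant here is the third-party python-crontab package (an assumption on my part), a minimal sketch could look like this, with a placeholder command:

    from crontab import CronTab

    cron = CronTab(user=True)                                    # current user's crontab
    job = cron.new(command='python /path/to/update_status.py')   # placeholder command
    job.minute.every(1)                                          # run every minute
    cron.write()                                                 # persist the new entry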
The task is:
I have a task queue stored in a database, and it grows. I need to process the tasks with a Python script when I have the resources for it. I see two ways:
a Python script running all the time, but I don't like it (the reason: a possible memory leak);
a Python script called by cron that does a small part of the task, but then I need to solve the problem of keeping only one active script in memory (to prevent the number of active scripts from growing). What is the best solution to implement this in Python?
Any ideas on how to solve this problem at all?
You can use a lockfile to prevent multiple copies of the script started by cron from running at once. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is really just good practice in general for anything where you need to make sure multiple instances won't be running, so you should look into it even if you do have the script running constantly, which I do suggest.
For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with it (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully-designed main loop that monitors the database for new jobs, and then uses the multiprocessing module to fork off new processes to complete each task.
When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.
This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.
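A rough sketch of that shape, with the database polling left as a stub and a deliberately naive lockfile (fetch_next_task and run_task are hypothetical names; a PID-based lockfile as in the linked question would be more robust):

    import multiprocessing
    import os
    import sys
    import time

    LOCKFILE = '/tmp/task_runner.lock'

    def run_task(task):
        # Child process: do the work, then exit so all memory returns to the OS.
        print('processing', task, 'in pid', os.getpid())

    def fetch_next_task():
        # Stub: poll the database queue and return the next task, or None.
        return None

    def main():
        if os.path.exists(LOCKFILE):
            sys.exit('another instance appears to be running')
        open(LOCKFILE, 'w').close()
        try:
            while True:
                task = fetch_next_task()
                if task is None:
                    time.sleep(5)
                    continue
                p = multiprocessing.Process(target=run_task, args=(task,))
                p.start()
                p.join()    # or track several children to run tasks in parallel
        finally:
            os.remove(LOCKFILE)

    if __name__ == '__main__':
        main()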
This is a bit of a vague question. One thing you should remember is that it is very difficult to leak memory in Python, because of the automatic garbage collection. Cron'ing a Python script to handle the queue isn't very nice, although it would work fine.
I would use method 1; if you need more power you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.
I'd suggest using Celery, an asynchronous task queuing system which I use myself.
It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.
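A minimal sketch of what that could look like (the Redis broker URL and the task body are placeholders of mine):

    # tasks.py
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')

    @app.task
    def solve(task_id):
        # Do the real work for one queued task here.
        return task_id

    # Enqueue from anywhere with:        solve.delay(42)
    # Run a worker with something like:  celery -A tasks worker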
What is the "right" or "best" way to monitor the system resources a python script is using and terminate it if the resource use exceeds some predetermined values. In my case memory usage is of concern. I am not asking how to measure the system resource use although I am open to suggestions.
As a simple example, let's assume I have a function that finds prime numbers less than some large number and adds them to a list based on some condition. I don't know ahead of time how many prime numbers will satisfy the condition so I what to be sure to terminate the function if I use up to much system memory (8gb lets say).
I know that there are ways to monitor the size of python objects. What I don't know is the proper way to monitor the size of the list and exit is to just include a size test in the prime function loop and exit if it exceeds 8gb or if there is an "external" (by external I mean external to the loop but still within or part of the python script) way to monitor and exit.
In my case I am running on a mac but am asking the question in general.
On Unix-like systems, a useful "external" way to monitor any process is the ulimit command (you don't clarify whether you instead want to run on Windows, where ulimit doesn't exist and other approaches may apply, but I don't know them ;-)).
If you're thinking about performing such controls inside your own Python programs, just change the function in question to check the size of each object it's appending to the list (and keep a running total) and return when the running total reaches or exceeds a threshold (which you could pass as an extra parameter to the function in question).
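A small sketch of that in-function check, using sys.getsizeof for the per-object size (the function, the condition argument, and the max_bytes parameter are illustrative):

    import sys

    def primes_matching(limit, condition, max_bytes=8 * 1024**3):
        results, total_bytes = [], 0
        for n in range(2, limit):
            if all(n % d for d in range(2, int(n ** 0.5) + 1)) and condition(n):
                results.append(n)
                total_bytes += sys.getsizeof(n)
                if total_bytes >= max_bytes:
                    # Running total has hit the threshold: stop early.
                    return results
        return results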
Edit: the OP has clarified in a comment that they want the monitoring in the very worst place it could possibly be placed -- in the previous paragraphs, I mentioned how it's easy outside of the process, easy inside the function, but the OP wants it "smack in the middle";-).
Least-bad way is probably with a "watchdog thread" -- a separate daemon thread in an infinite loop which, every X seconds, checks the process's resource consumption (e.g. with resource.getrusage, if on Unix-like machines -- again, if on Windows, something else is needed instead) and, if that consumption exceeds the desired limits, attempts to kill the main thread with thread.interrupt_main. Of course, this is far from foolproof: the periodicity X (like in all cases of "polling") must be low enough to stop a runaway process in time, but high enough not to slow the process down to a crawl. Plus, the main thread (the only one that can be interrupted like this) might be blocking exceptions (in which case the watchdog thread might perhaps try with "signals to this very process" of growing severity, all the way up to SIGKILL, the killer signal that can never be blocked or intercepted).
So, this intermediate approach is a lot more work than the ulimit command, is more fragile, and has no substantial added value. But, if you want to put the monitoring "inside the process but outside the resource-consuming function", with no advantages, lots of work, and the other disadvantages I've mentioned, this is the way to do it.
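A hedged sketch of such a watchdog thread on a Unix-like system (Python 3 naming; note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so the threshold units need adjusting per platform):

    import _thread
    import resource
    import threading
    import time

    def start_watchdog(max_rss_kb, period=1.0):
        def watch():
            while True:
                time.sleep(period)
                rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
                if rss_kb >= max_rss_kb:
                    # Raise KeyboardInterrupt in the main thread; escalate to
                    # signals (up to SIGKILL) if the main thread swallows it.
                    _thread.interrupt_main()
                    return
        threading.Thread(target=watch, daemon=True).start()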
resource.getrusage() (in particular ru_idrss) can give you the resource usage of the current Python interpreter, which you can use as a sentinel to stop processing.