I just saw a MemoryError happen on a machine, and I noticed the available cache on the server increased drastically afterwards. Is there some mechanism by which Python triggers a memory-management task when the error is thrown? Or is this potentially managed by the server (Linux / CentOS)?
MemoryError isn't handled specially in any way that would cause this to happen for it and no other exception, but several things work in its favor:

- Exceptions unwind the stack, and objects referenced solely by frames between where the exception is raised and where it is caught are generally released once handling completes (during handling, the exception's traceback tends to create cyclic references that delay that cleanup).
- MemoryError inherits from BaseException, not Exception, so it's less likely to be caught by "generic" except Exception: blocks, meaning more stack frames are unwound and eventually released.
- The CPython cyclic garbage collector decides when to run a collection based on the number of allocations and deallocations that have occurred; if the large stack unwind frees a lot of objects, even more might be freed if that's enough to trigger a collection.
All of this increases the odds that memory will be released, but none of it is specific to MemoryError; the same behavior could be observed if you hit Ctrl-C and triggered a KeyboardInterrupt. More likely, you're seeing the Python process exit, or Linux is responding to the extreme memory request by dumping its cache; the MemoryError would come after the cache is dumped in an attempt to satisfy the large request, particularly if the memory is requested as several sequential blocks rather than one huge allocation.
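If you want to watch the collector's allocation-count heuristic yourself, the gc module exposes it; a quick sketch (the thresholds shown in the comment are CPython defaults and may differ by version):

import gc

print(gc.get_threshold())  # e.g. (700, 10, 10): a gen-0 collection runs once
                           # allocations minus deallocations exceed 700
print(gc.get_count())      # current per-generation allocation counters
print(gc.collect())        # force a full collection; returns the number of
                           # unreachable objects found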
Related
I have an application that relies on SIGINT for a graceful shutdown. I noticed that every once in a while it just keeps running. The cause turned out to be a generator in xml/etree/ElementTree.py.
If SIGINT arrives while that generator is being cleaned up, all exceptions are ignored (recall that the default action for SIGINT is to raise a KeyboardInterrupt). That's not unique to this particular generator, or even to generators in general.
From the Python docs:
"Due to the precarious circumstances under which __del__() methods are invoked, exceptions that occur during their execution are ignored, and a warning is printed to sys.stderr instead"
In over five years of programming in Python, this is the first time I've run into this issue.
If garbage collection can occur at any point, then SIGINT can also theoretically be ignored at any point, and I can't ever rely on it. Is that correct? Have I just been lucky this whole time?
Or is it something about this particular package and this particular generator?
I noticed that CPython will raise a MemoryError even in situations where almost half of all system memory is marked for gc but the number of marked objects is below the gc threshold. Manually calling gc.collect() and retrying the operation that raised the original MemoryError succeeds in these cases, and a large amount of RAM is reclaimed in the process.
Is there a reason CPython doesn't attempt a gc.collect() prior to raising the MemoryError? Are there disadvantages to doing so?
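For what it's worth, the collect-and-retry you describe is easy to wrap by hand; a sketch, where alloc is a hypothetical zero-argument callable performing the failing allocation:

import gc

def allocate_with_retry(alloc):
    # alloc: hypothetical zero-argument callable that does the allocation
    try:
        return alloc()
    except MemoryError:
        gc.collect()    # reclaim cyclic garbage the allocator can't see
        return alloc()  # a second MemoryError propagates normally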
Obviously the GIL prevents switching contexts between threads to protect reference counting, but is signal handling completely safe in CPython?
Signals in Python are caught by a very simple signal handler which, in effect, simply schedules the actual signal handler function to be called on the main thread. The C signal handler doesn't touch any Python objects, so it doesn't risk corrupting any state, while the Python signal handler is executed in-between bytecode op evaluations, so it too won't corrupt CPython's internal state.
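A small illustration of that scheduling (POSIX assumed, so os.kill delivers a real signal to this process):

import os
import signal
import threading

def handler(signum, frame):
    # Runs between bytecode evaluations on the main thread, never inside
    # the raw C-level signal handler.
    print("signal", signum, "handled on", threading.current_thread().name)

signal.signal(signal.SIGINT, handler)
os.kill(os.getpid(), signal.SIGINT)  # delivered now, handled at the next check
print("back in the main flow")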
A signal could be delivered in the middle of a reference counting operation, but since the C-level handler only schedules the Python handler and never touches objects, that's harmless. In case you wonder why CPython doesn't use atomic CPU instructions for reference counting: they are too slow. Atomic operations use memory barriers to sync CPU caches (L1, L2, shared L3) and CPUs (ccNUMA). As you can imagine, this prevents lots of optimizations. Modern CPUs are insanely fast, so fast that they spend a lot of time doing nothing but waiting for data. Reference increments and decrements are very common operations in CPython, and memory barriers would prevent out-of-order execution, which is a very important optimization trick.
The reference counting code is carefully written and takes multi-threading and signals into account. Signal handlers cannot access a partly created or destroyed Python object, just as threads can't. Macros like Py_CLEAR take care of the edge cases. I/O functions take care of EINTR, too, and Python 3.3 has an improved subprocess module that uses only async-signal-safe functions between fork() and execvpe().
You don't have to worry. We have some clever people who know their POSIX fu quite well.
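On the EINTR point: since Python 3.5 (PEP 475) the standard library retries interrupted system calls automatically; on older versions you'd write the retry loop yourself, roughly like this:

import os

def read_retrying(fd, n):
    # Pre-PEP 475 idiom: retry a read interrupted by a signal handler.
    while True:
        try:
            return os.read(fd, n)
        except InterruptedError:  # the OSError subclass for EINTR (3.3+)
            continue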
Edit: Looks like a duplicate, but I assure you, it's not. I'm looking to kill the current running process cleanly, not to kill a separate process.
The problem is the process I'm killing isn't spawned by subprocess or exec. It's basically trying to kill itself.
Here's the scenario: The program does cleanup on exit, but sometimes this takes too long. I am sure that I can terminate the program, because the first step in the quit sequence saves the database. How do I go about doing this?
- taskkill can't be used, as it is not available in some Windows installs (e.g. home editions of XP)
- tskill also doesn't work
- win32api.TerminateProcess(handle, 0) works, but I'm concerned it may cause memory leaks because I won't have the opportunity to close the handle (the program stops immediately after calling TerminateProcess). Note: yes, I am force-quitting it, so there are bound to be some unfreed resources, but I want to minimize this as much as possible (I will only do it if shutdown takes an unbearable amount of time, for a better user experience), and I don't think Python will run gc if it's force-quit.
I'm currently using the last option, as it just works. I'm concerned, though, about the unfreed handle. Any thoughts/suggestions would be very much appreciated!
If a process self-terminates, you don't need to worry about garbage collection. The OS automatically cleans up all memory resources used by that process, so there are no memory leaks to worry about; a memory leak is when a running process uses more and more memory as time goes by.
So yes, terminating your process this way isn't very "clean", but there won't be any ill side effects.
If I understand your question, you're trying to get the program to shut itself down. This is usually done with sys.exit().
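A sketch of that shape, with os._exit as the force-quit path (save_database is a hypothetical stand-in for the save step you already trust; os._exit skips all cleanup much like TerminateProcess, but needs no process handle):

import os
import sys

def shutdown(force=False):
    save_database()  # hypothetical: the first step of your quit sequence
    if force:
        os._exit(0)  # hard stop: no atexit handlers, no gc; OS reclaims everything
    sys.exit(0)      # clean stop: raises SystemExit and runs normal cleanup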
TerminateProcess and taskkill /f do not free resources and will result in memory leaks.
Here is the MS quote on TerminateProcess:
"... Terminating a process does not cause child processes to be terminated. Terminating a process does not necessarily remove the process object from the system. A process object is deleted when the last handle to the process is closed. ..."
MS heavily uses COM and DCOM, which share handles and resources that the OS does not and cannot track. ExitProcess should be used instead, unless you intend to reboot often: it allows a process to properly free the resources it used. Linux does not have this problem because it does not use COM or DCOM.
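If you do want ExitProcess semantics from Python on Windows, ctypes can reach it directly; a minimal sketch:

import ctypes

# Unlike TerminateProcess called from outside, ExitProcess notifies loaded
# DLLs (DLL_PROCESS_DETACH) so the process can release its own resources.
ctypes.windll.kernel32.ExitProcess(0)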
Consider the following code:
from twisted.internet import defer

df = defer.Deferred()
def hah(_): raise ValueError("4")
df.addCallback(hah)
df.callback(None)
When it runs, that exception just gets eaten. Where did it go? How can I get it to be displayed? Calling defer.setDebugging(True) has no effect.
I ask because at other times I get a printout saying "Unhandled error in Deferred:". How do I get that to happen in this case? I see that if I add an errback to df, the errback gets called with the exception, but all I want to do is print the error and nothing else, and I don't want to manually add that handler to every Deferred I create.
The exception is still sitting in the Deferred. There are two possible outcomes at this point:
- You could add an errback to the Deferred. As soon as you do, it will be called with a Failure containing the exception that was raised.
- You could let the Deferred be garbage collected (explicitly delete df, return from the function, or lose the reference in any other way). This triggers the "Unhandled error in Deferred:" code.
Because an errback can be added to a Deferred at any time (i.e., the first point above), Deferreds don't do anything with otherwise unhandled errors right away. They don't know whether the error is really unhandled or just unhandled so far. It's only when the Deferred is garbage collected that it can be sure no one else is going to handle the exception, so that's when it gets logged.
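Both outcomes in one runnable sketch (the gc.collect() is there because the Failure's traceback can form reference cycles, so collection timing is CPython-specific):

import gc
from twisted.internet import defer

def boom(_):
    raise ValueError("4")

# Outcome 1: an errback added after the fact is called immediately with
# the stored Failure.
df = defer.Deferred()
df.addCallback(boom)
df.callback(None)
df.addErrback(lambda f: print("handled:", f.getErrorMessage()))

# Outcome 2: drop the last reference and the stored Failure is logged as
# "Unhandled error in Deferred:" once the Deferred is collected.
df2 = defer.Deferred()
df2.addCallback(boom)
df2.callback(None)
del df2
gc.collect()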
In general, you want to be sure you have errbacks on Deferreds, precisely because it's sometimes hard to predict when a Deferred will get garbage collected. It might be a long time, which means it might be a long time before you learn about the exception if you don't have your own errback attached.
This doesn't have to be a terrible burden. Any Deferred (a) which is returned from a callback on another Deferred (b) (i.e., when chaining happens) will pass its errors along to (b). So (a) doesn't need extra errbacks for logging and reporting; only (b) does. If you have a single logical task which is complicated and involves many asynchronous operations, it's almost always the case that all of the Deferreds involved in those operations should channel their results (success or failure) to one main Deferred that represents the logical operation. You often only need special error-handling behavior on that one Deferred, and it will let you handle errors from any of the other Deferreds involved.
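A sketch of that channeling, where inner_operation is a hypothetical step whose Deferred (a) fails; because it is returned from a callback on (b), its Failure arrives at (b)'s single errback:

from twisted.internet import defer

def inner_operation(_):
    a = defer.Deferred()
    a.errback(ValueError("low-level failure"))  # Deferred (a) fails
    return a  # returned from a callback, so the Failure flows into (b)

b = defer.Deferred()  # the one Deferred representing the logical task
b.addCallback(inner_operation)
b.addErrback(lambda f: print("handled once:", f.getErrorMessage()))
b.callback(None)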