Python - Memory Leak

Python - Memory Leak - python

I'm working on solving a memory leak in my Python application.
Here's the thing - it really only appears to happen on Windows Server 2008 (not R2) but not earlier versions of Windows, and it also doesn't look like it's happening on Linux (although I haven't done nearly as much testing on Linux).
To troubleshoot it, I set up debugging on the garbage collector:
gc.set_debug(gc.DEBUG_UNCOLLECTABLE | gc.DEBUG_INSTANCES | gc.DEBUG_OBJECTS)
Then, periodically, I log the contents of gc.garbage.
Thing is, gc.garbage is always empty, yet my memory usage goes up and up and up.
Very puzzling.

If there's never any garbage in gc.garbage, then I'm not sure what you're trying to do by enabling GC debugging. Sure, it'll tell you which objects are considered for cleanup, but that's not particularly interesting if you end up with no circular references that can't be cleaned up.
If your program is using more and more memory according to the OS, there can generally be four different cases at play:
Your application is storing more and more things, keeping references to each one so they don't get collected.
Your application is creating circular references between objects that can't be cleaned up by the gc module (typically because one of them has a __del__ method.)
Your application is freeing (and re-using) memory, but the OS doesn't want the memory re-used, so it keeps allocating new blocks of memory.
The leak is a real memory leak but in a C/C++ extension module your code is using.
From your description it sounds like it's unlikely to be #1 (as it would behave the same on any OS) and apparently not #2 either (since there's nothing in gc.garbage.) Considering #3, Windows (in general) has a memory allocator that's notoriously bad with fragmented allocations, but Python works around this with its obmalloc frontend for malloc(). It may still be an issue specific in Windows Server 2008 system libraries that make it look like your application is using more and more memory, though. Or it may be a case of #4, a C/C++ extension module, or a DLL used by Python or an extension module, with a memory leak.

In general, the first culprit for memory leaks in python is to be found in C extensions.
Do you use any of them?
Furthermore, you say the issue happens only on 2008; I would then check extensions for any incompatibility, because with Vista and 2008 there were quite a lot of small changes that caused issues on that field.
As and alternative, try to execute your application in Windows compatibility mode, choosing Windows XP - this could help solving the issue, especially if it's related to changes in the security.

Related

Can python implement program-level virtual memory?

Recently, I wrote a python program, which requires a lot of memory. Then the computer memory is not enough, and it explodes.
It is known that the operating system will use part of the hard disk as virtual memory,which could solve the problem of insufficient memory. If you change the virtual memory of the operating system, you can solve the problem of insufficient memory in python programs, but the scope of impact is too wide.
Can python implement program-level virtual memory? That is, when the memory is insufficient, the hard disk is mapped to the program memory.
I need to run python program with large memory consumption.

Using disk space as memory is usually called swapping.
It is usually simpler to do it yourself than making a script to do it for you. But if you insist on your script doing it for you, then a way is just to execute the commands you would use to do it manually.
Here is a tutorial for how to add swap to a Linux system (first result on google) : https://linuxize.com/post/create-a-linux-swap-file/
Take each command in that tutorial, run them using subprocess, and you will get the desired result.
If you are on Windows (which you did not tell) then the method applies (but I could not find quickly an easy way to do it with commands).

Instrument memory access of python scripts

My research requires processing memory traces of applications. For C/C++ programs, this is easy using Intel's PIN library. However, as suggested here Use Intel Pin to instrument Python scripts, I may need to instrument the Python runtime itself, which I'm not sure will represent the true memory behavior of a given python script due to some overheads(If this is not the case, please comment). Some of the existing python memory profilers only talk about the runtime memory "usage" in terms of the heap space usage, etc.
I ended up making an executable from my python script using PyInstaller and running my PINTool over it. However, I'm not sure if this is the right approach.
Is there any way(any library or any hack into the python runtime) that may help in getting the memory traces accessed by the python scripts?

Python C extension memory leakage

I have written some C extension for my python programs, and I just noticed that there are some memory leakage problem, however, the C program itself won't leak memory, so I guess there is some problem in reference count. Currently when I use python console to run my program, after the computation is finished, the total memory of python3 is really big, indicating some objects are not released, is there anyway I can know what objects are there or when the objects are allocated?
The C extension is part of a big package, so it is impossible to paste the whole package here.

compiling python for embedded linux_rt

I am targetting an embedded platform with linux_rt, and would like to compile cpython. I am not asking whether python is appropriate for realtime, or its latency. I AM asking about compiling under platform constraints.
I would like an interpretter embedded in a C shared library, but will also accept an exectuable binary if needs be.
Any C compiling ive done is for mainstream OS deployment, and i usually just hit make install. Im not afraid to get a little dirty, but am afraid of longterm maintenance and repeatability.
To avoid as much memory overhead as possible, are there any compiler configurations that can be changed from defaults? Can I easily strip sections of the standard library I know will not be needed?
Target platform has a 600 MHz Celeron, and 256mb RAM. The required firmware is built for a v2.6 kernel (might be 2.4). The default OS image uses Busybox, and most standard system libraries are minimally available. The root filesystem is around 100mB (flash), although I will have an external memory card mounted and can extended root onto there.
Python should have 70% Cpu and 128mB ram at most times, although I could imagine sloppy execution of the interpretter at times, and on RT linux, that could start to add up. Just trying to take precautions before I dive in.
Looking for simple Do's or Don'ts. Reference to similar projects would be great, but I really want to stick with CPython where possible.
I do not have the target platform in the shop yet, so I cannot post any tests. Will have the unit in 2 weeks and will update this post, at that time, if needed.

make a VM with the target configuration to help you get started. VirtualBox or QEmu. If you don't have a root FS one place to start is TinyCore, which is very small, configurable, but also can run on your laptop -- http://www.linuxjournal.com/article/11023

How to check for memory leaks in Guile extension modules?

I develop an extension module for Guile, written in C. This extension module embeds a Python interpreter.
Since this extension module invokes the Python interpreter, I need to verify that it properly manages the memory occupied by Python objects.
I found that the Python interpreter is well-behaved in its own memory handling, so that by running valgrind I can find memory leaks due to bugs in my own Python interpreter embedding code, if there are no other interfering factors.
However, when I run Guile under valgrind, valgrind reports memory leaks. Such memory leaks obscure any memory leaks due to my own code.
The question is what can I do to separate memory leaks due to bugs in my code from memory leaks reported by valgrind as due to Guile. Another tool instead of valgrind? Special valgrind options? Give up and rely upon manual code walkthrough?

You've got a couple options. One is to write a supressions file for valgrind that turns off reporting of stuff that you're not working on. Python has such a file, for example:
http://svn.python.org/projects/python/trunk/Misc/valgrind-python.supp
If valgrind doesn't like your setup, another possibility is using libmudflap; you compile your program with gcc -fmudflap -lmudflap, and the resulting code is instrumented for pointer debugging. Described in the gcc docs, and here: http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.