Memory leaks in Python when using an external C DLL

I have a Python module that calls a DLL written in C to encode XML strings. Once the function returns the encoded string, it fails to de-allocate the memory which was allocated during this step. Concretely:
encodeMyString = ctypes.create_string_buffer(4096)
CallEncodingFuncInDLL(encodeMyString, InputXML)
I have looked at this, this, and this and have also tried calling gc.collect(), but perhaps since the object has been allocated in an external DLL, Python's gc doesn't have any record of it and fails to remove it. But since the code keeps calling the encoding function, it keeps on allocating memory and eventually the Python process crashes. Is there a way to profile this memory usage?

Since you haven't given any information about the DLL, this will necessarily be pretty vague, but…
Python can't track memory allocated by something external that it doesn't know about. How could it? That memory could be part of the DLL's constant segment, or allocated with mmap or VirtualAlloc, or part of a larger object, or the DLL could just be expecting it to be alive for its own use.
Any DLL that has a function that allocates and returns a new object has to have a function that deallocates that object. For example, if CallEncodingFuncInDLL returns a new object that you're responsible for, there will be a function like DestroyEncodedThingInDLL that takes such an object and deallocates it.
So, when do you call this function?
Let's step back and make this more concrete. Let's say the function is plain old strdup, so the function you call to free up the memory is free. You have two choices for when to call free. No, I have no idea why you'd ever want to call strdup from Python, but it's about the simplest possible example, so let's pretend it's not useless.
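The first option is to call strdup, immediately convert the returned value to a native Python object and free it, and not have to worry about it after that:
libc.strdup.restype = ctypes.c_void_p  # keep the raw pointer; c_char_p would be converted to bytes and the address lost
libc.free.argtypes = [ctypes.c_void_p]
newbuf = libc.strdup(mybuf)
s = ctypes.string_at(newbuf)  # copy the C string into a Python bytes object
libc.free(newbuf)
# now use s, which is just a Python bytes object, so it's GC-able
Or, better, wrap this up so it's automatic by using an errcheck callable (a plain callable as restype makes ctypes assume the function returns a C int, which would truncate a 64-bit pointer):
def convert_and_free_char_p(result, func, args):
    try:
        return ctypes.string_at(result)
    finally:
        libc.free(result)
# restype and free.argtypes are set as in the first snippet
libc.strdup.errcheck = convert_and_free_char_p
s = libc.strdup(mybuf)
# now use s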
But some objects can't be converted to a native Python object so easily—or they can be, but it's not very useful to do so, because you need to keep passing them back into the DLL. In that case, you can't clean it up until you're done with it.
The best way to do this is to wrap that opaque value up in a class that releases it on close or __exit__ or __del__ or whatever seems appropriate. One nice way to do this is with @contextmanager:
@contextlib.contextmanager
def freeing(value):
    try:
        yield value
    finally:
        libc.free(value)
So:
newbuf = libc.strdup(mybuf)
with freeing(newbuf):
    do_stuff(newbuf)
    do_more_stuff(newbuf)
# automatically freed before you get here
# (or even if you don't, because of an exception/return/etc.)
Or:
@contextlib.contextmanager
def strduping(buf):
    value = libc.strdup(buf)
    try:
        yield value
    finally:
        libc.free(value)
And now:
with strduping(mybuf) as newbuf:
    do_stuff(newbuf)
    do_more_stuff(newbuf)
# again, automatically freed here
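The class-based variant mentioned above (release on close or __exit__) is not shown in the answer, so here is a minimal sketch of it. It assumes a POSIX system whose C library exports strdup and free; the class name OwnedCString is made up for illustration, and a real DLL would substitute its own allocate/destroy pair.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. find_library may return None, in which case CDLL(None)
# still exposes the C library symbols of the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p   # raw address, not auto-converted bytes
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


class OwnedCString:
    """Hypothetical wrapper that owns one strdup'd buffer until close()."""

    def __init__(self, data):
        self._ptr = libc.strdup(data)
        if not self._ptr:
            raise MemoryError("strdup returned NULL")

    @property
    def value(self):
        if self._ptr is None:
            raise ValueError("buffer already freed")
        return ctypes.string_at(self._ptr)   # copies the bytes out

    def close(self):
        # Idempotent: a second call is a no-op rather than a double free.
        if self._ptr is not None:
            libc.free(self._ptr)
            self._ptr = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


with OwnedCString(b"<xml/>") as buf:
    copied = buf.value   # read while the C buffer is still alive
```

Making close() idempotent is a deliberate choice here: it lets callers mix explicit close() calls with the with statement without risking a double free.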

Related

How to return an error string from go to python

I'm writing a shared object in Go (c-shared) which will be loaded and run from python. Everything is working fine, until the Go code needs to return an error. I am converting the error to string using error.Error() but when trying to return that to python, cgo is hitting:
panic: runtime error: cgo result has Go pointer
Which is very odd, since this is a string and not a pointer supposedly. I know there are no issues with returning go strings via shared object exported function, as I do that in several other places without any issue.
The Go code looks like:
package main
import "C"
//export MyFunction
func MyFunction() string {
    err := CallSomethingInGo()
    if err != nil {
        return err.Error()
    }
    return ""
}
func main() {}
The Go code is compiled to a .so using -buildmode=c-shared, and then in the Python code I have something like this:
from ctypes import *
lib = cdll.LoadLibrary("./mygocode.so")
class GoString(Structure):
    _fields_ = [("p", c_char_p), ("n", c_longlong)]
theFunction = lib.MyFunction
theFunction.restype = GoString
err = theFunction()
When the last line executes and the golang code returns NO error then everything is fine and it works! But, if the golang code tries to return an error (e.g. CallSomethingInGo fails and returns err) then the python code fails with:
panic: runtime error: cgo result has Go pointer
I've tried manually returning strings from go to python and it works fine, but trying to return error.Error() (which should be a string per my understanding) fails. What is the correct way to return the string representation of the error to python?
One more piece of info - from golang, I did a printf("%T", err) and I see the type of the error is:
*os.PathError
I also did printf("%T", err.Error()) and confirmed the type returned by err.Error() was 'string' so I am still not sure why this isn't working.
Even stranger to me...I tried modifying the go functions as shown below for a test, and this code works fine and returns "test" as a string back to python...
//export MyFunction
func MyFunction() string {
    err := CallSomethingInGo()
    if err != nil {
        // test
        x := errors.New("test")
        return x.Error()
    }
    return ""
}
I'm so confused! How can that test work, but not err.Error() ?

As I said in a comment, you're just not allowed to do that.
The rules for calling Go code from C code are outlined in the Cgo documentation, with this particular issue described in this section, in this way (though I have bolded a few sections in particular):
Passing pointers
Go is a garbage collected language, and the garbage collector needs to know the location of every pointer to Go memory. Because of this, there are restrictions on passing pointers between Go and C.
In this section the term Go pointer means a pointer to memory allocated by Go (such as by using the & operator or calling the predefined new function) and the term C pointer means a pointer to memory allocated by C (such as by a call to C.malloc). Whether a pointer is a Go pointer or a C pointer is a dynamic property determined by how the memory was allocated; it has nothing to do with the type of the pointer.
Note that values of some Go types, other than the type's zero value, always include Go pointers. This is true of string, slice, interface, channel, map, and function types. A pointer type may hold a Go pointer or a C pointer. Array and struct types may or may not include Go pointers, depending on the element types. All the discussion below about Go pointers applies not just to pointer types, but also to other types that include Go pointers.
Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers. The C code must preserve this property: it must not store any Go pointers in Go memory, even temporarily. When passing a pointer to a field in a struct, the Go memory in question is the memory occupied by the field, not the entire struct. When passing a pointer to an element in an array or slice, the Go memory in question is the entire array or the entire backing array of the slice.
C code may not keep a copy of a Go pointer after the call returns. This includes the _GoString_ type, which, as noted above, includes a Go pointer; _GoString_ values may not be retained by C code.
A Go function called by C code may not return a Go pointer (which implies that it may not return a string, slice, channel, and so forth). A Go function called by C code may take C pointers as arguments, and it may store non-pointer or C pointer data through those pointers, but it may not store a Go pointer in memory pointed to by a C pointer. A Go function called by C code may take a Go pointer as an argument, but it must preserve the property that the Go memory to which it points does not contain any Go pointers.
Go code may not store a Go pointer in C memory. C code may store Go pointers in C memory, subject to the rule above: it must stop storing the Go pointer when the C function returns.
These rules are checked dynamically at runtime. The checking is controlled by the cgocheck setting of the GODEBUG environment variable. The default setting is GODEBUG=cgocheck=1, which implements reasonably cheap dynamic checks. These checks may be disabled entirely using GODEBUG=cgocheck=0. Complete checking of pointer handling, at some cost in run time, is available via GODEBUG=cgocheck=2.
It is possible to defeat this enforcement by using the unsafe package, and of course there is nothing stopping the C code from doing anything it likes. However, programs that break these rules are likely to fail in unexpected and unpredictable ways.
This is what you are seeing: you have a program that breaks several rules, and now it fails in unexpected and unpredictable ways. In particular, your lib.MyFunction is "a Go function called by C code", since Python's cdll handlers count as C code. You can return nil, as that's the zero-value, but you are not allowed to return Go strings. The fact that the empty-string constant (and other string constants from some other error types) is not caught at runtime is a matter of luck. [1]
[1] Whether this is good luck or bad luck depends on your point of view. If it failed consistently, perhaps you would have consulted the Cgo documentation earlier. Instead, it fails unpredictably, but not in your most common case. What's happening here is that the string constants were compiled to text (or rodata) sections and therefore are not actually dynamically allocated. However, some (not all, but some) errors' string bytes are dynamically allocated. Some os.PathErrors point into GC-able memory, and these are the cases that are caught by the "reasonably cheap dynamic checks" mentioned in the second-to-last paragraph.
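One common way to stay within these rules is for the Go side to return C-allocated memory instead of a Go string (for example a *C.char produced by C.CString) and to export a second function that frees it. The sketch below shows only the Python calling side of such an arrangement. The export name FreeString is an assumption and does not exist in the question's code. Since no Go library is available here, libc's strdup and free (POSIX) stand in for the two exports.

```python
import ctypes
import ctypes.util


def read_and_free_c_string(get_string, free_string):
    """Call get_string(), copy the C string it returns, then free it."""
    ptr = get_string()
    if not ptr:                      # NULL: treat as "no error"
        return ""
    try:
        return ctypes.string_at(ptr).decode("utf-8", errors="replace")
    finally:
        free_string(ptr)


# With a real Go library the setup might look like this (hypothetical names):
#   lib = ctypes.CDLL("./mygocode.so")
#   lib.MyFunction.restype = ctypes.c_void_p      # keep the raw pointer
#   lib.FreeString.argtypes = [ctypes.c_void_p]
#   err = read_and_free_c_string(lib.MyFunction, lib.FreeString)

# Stand-in demonstration using libc in place of the Go exports.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def fake_my_function():
    # Plays the role of a Go export returning C.CString(err.Error()).
    return libc.strdup(b"open /no/such/file: no such file or directory")


message = read_and_free_c_string(fake_my_function, libc.free)
```

The restype is c_void_p rather than c_char_p because ctypes converts a c_char_p result to bytes immediately, which loses the address needed for the free call.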

What happens to a Python input value if it's not assigned to a variable?

Was just wondering this. So sometimes programmers will insert an input() into a block of code without assigning its value to anything for the purpose of making the program wait for an input before continuing. Usually when it runs, you're expected to just hit enter without typing anything to move forward, but what if you do type something? What happens to that string if it's not assigned to any variable? Is there any way to read its value after the fact?

TL;DR: If you don't immediately assign the return value of input(), it's lost.
I can't imagine how or why you would want to retrieve it afterwards.
Every callable has a return value (None by default). If you call one and do not save its return value, there's no way to get it back. You have one chance to capture the return value, and if you miss it, it's gone.
The return value gets created inside the callable of course, the code that makes it gets run and some memory will be allocated to hold the value. Inside the callable, there's a variable name referencing the value (except if you're directly returning something, like return "unicorns".upper(). In that case there's of course no name).
But after the callable returns, what happens? The return value is still there and can be assigned to a variable name in the calling context. All names that referenced the value inside the callable are gone though. Now if you don't assign the value to a name in your call statement, there are no more names referencing it.
What does that mean? It gets on the garbage collector's hit list and will be removed from memory. How soon depends on the interpreter: the GC implementation may differ between Python interpreters, but the standard CPython implementation uses reference counting, so the object is freed as soon as its last reference disappears.
So to sum it up: if you don't assign the return value a name in your call statement, it's gone for your program; it will be destroyed and the memory it claims will be freed, immediately in CPython and at some later point in implementations that collect in the background.
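The lifetime described above can be observed with a weak reference. This is a small sketch, not something from the original answer: fake_input and the Answer class are hypothetical stand-ins, needed because real str objects cannot be weakly referenced and input() would block.

```python
import gc
import weakref


class Answer:
    """Stand-in for the string that input() would return."""


observers = []


def fake_input():
    value = Answer()
    observers.append(weakref.ref(value))   # watch the object without owning it
    return value


fake_input()          # return value discarded, like a bare input()
gc.collect()          # not needed on CPython; helps on other interpreters
discarded_is_gone = observers[0]() is None

kept = fake_input()   # return value bound to a name
kept_is_alive = observers[1]() is kept
```

One practical exception to "it's gone": in the interactive interpreter, the result of the last expression statement is bound to the name _, so a bare input() typed at the prompt can still be read from there until the next expression is evaluated.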
Now of course a callable might do other stuff with the value before it finally returns it and exits. There are a few possible ways how it could preserve a value:
Write it to an existing, global variable
Write it through any output method, e.g. store it in a file
If it's an instance method of an object, it can also write it to the object's instance variables.
But what for? Unless there would be any benefit from storing the last return value(s), why should it be implemented to hog memory unnecessarily?
There are a few cases where caching the return values makes sense, e.g. for functions with deterministic return values (meaning the same input always results in the same output) that are often called with the same arguments and take long to calculate.
But for the input function? It's probably the least deterministic function in existence; even if you call random.random() you can be more sure of the result than when you ask for user input. Caching makes absolutely no sense here.

The value is discarded. You can't get it back. It's the same as if you just had a line like 2 + 2 or random.random() by itself; the result is gone.

does return object from function lead to memory leak

Taking the following code for example, does returning an object from a function lead to a memory leak?
I'm very curious about what happens to the object handle after used by the function use_age.
import MySQLdb
class Demo(object):
    def _get_mysql_handle(self):
        handle = MySQLdb.connect(host=self.conf["host"],
                                 port=self.conf["port"],
                                 user=self.conf["user"],
                                 passwd=self.conf["passwd"],
                                 db=self.conf["db"])
        return handle
    def use_age(self):
        cursor = self._get_mysql_handle().cursor()
if __name__ == "__main__":
    demo = Demo()
    demo.use_age()

No, that code won't lead to a memory leak.
CPython handles object lifetimes by reference counting. In your example the reference count drops back to 0 and the database connection object is deleted again.
The local name handle in _get_mysql_handle is one reference, it is dropped when _get_mysql_handle returns.
The stack holding the return value from self._get_mysql_handle() is another, it too is dropped when the expression result is completed.
.cursor() is a method, so it'll have another reference for the self argument to that method, until the method exits.
The return value from .cursor() probably stores a reference, it'll be dropped when the cursor itself is reaped. That then depends on the lifetime of the local cursor variable in the use_age() method. As a local it doesn't live beyond the use_age() function.
Other Python implementations use other garbage collection strategies; Jython uses the Java runtime facilities, for example. The object may live a little longer, but won't 'leak'.
In Python versions < 3.4, you do need to watch out for creating circular references with custom classes that define a __del__ method. Those are the circular references that the gc module does not break. You can introspect such chains in the gc.garbage object.
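From Python 3.4 on, that caveat no longer applies: the cycle collector also reclaims cycles whose objects define __del__. A minimal sketch, assuming Python 3.4 or later (the Node class is made up for the demonstration):

```python
import gc

finalized = []


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


a, b = Node("a"), Node("b")
a.other, b.other = b, a      # reference cycle between two objects with __del__
del a, b                     # reference counts stay above zero because of the cycle

gc.collect()                 # the cycle collector finalizes and frees both
uncollectable = list(gc.garbage)
```

On the older versions the answer refers to, the two nodes would instead end up in gc.garbage and their __del__ methods would not run.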

How to delete an object in python function?

I am working with very large numpy/scipy arrays that take up a huge chunk of memory. Suppose my code looks something like the following:
def do_something(a):
    a = a / a.sum()  # new memory is allocated
    # I don't need the original a any longer; how do I delete it?
    # do a lot more stuff
# a = super large numpy array
do_something(a)
print(a)  # still the same as originally (as passed by value)
So I am calling a function with a huge numpy array. The function then processes the array in some way or the other, but the original object is still kept in memory. Is there any way to free the memory inside the function? Deleting the reference does not work.

What you want cannot be done; Python will only free the memory when all references to the array object are gone, and you cannot delete the a reference in the calling namespace from the function.
Instead, break up your problem into smaller steps. Do your calculations on a with one function, then delete a, then call another function to do the rest of the work.
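A sketch of that split, under stated assumptions: BigArray is a plain-Python stand-in for a large NumPy array (chosen so the example runs without NumPy and so the object can be watched through a weak reference), and normalise and do_more_stuff are hypothetical names for the two steps.

```python
import gc
import weakref


class BigArray(list):
    """Stand-in for a large array; a list subclass supports weak references."""


def normalise(values):
    total = sum(values)
    return BigArray(v / total for v in values)   # new memory is allocated


def do_more_stuff(values):
    return max(values)


a = BigArray([1.0, 2.0, 5.0])
original = weakref.ref(a)     # only used to observe when the original is freed

b = normalise(a)              # step 1
del a                         # drop the last reference, at the caller's level
gc.collect()                  # not needed on CPython
original_freed = original() is None

result = do_more_stuff(b)     # step 2 works on the new object only
```

Rebinding at the call site, as in a = normalise(a), has the same effect as the explicit del, because the old object loses its last reference when the name is reassigned.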

Python uses a simple GC algorithm based on reference counting (it also has a generational GC, but that is not relevant here): every new reference to an object increments a counter, and every reference that goes out of scope decrements it.
The memory is deallocated only once the counter reaches 0,
so as long as you hold a reference to that object, it stays in memory.
In your case the caller of do_something still has a reference to the object; if you want the object to go away, you can reduce the scope of that variable.
If you suspect memory leaks, you can set the DEBUG_LEAK flag and inspect the output; more info here: https://docs.python.org/2/library/gc.html

How to do cleanup reliably in python?

I have some ctypes bindings, and for each body.New I should call body.Free. The library I'm binding doesn't have its allocation routines insulated from the rest of the code (they can be called from just about anywhere), and to use a couple of useful features I need to create cyclic references.
I think it would be solved if I could find a reliable way to hook a destructor to an object. (Weakrefs would help if they gave me the callback just before the data is dropped.)
So obviously this code megafails when I put in velocity_func:
class Body(object):
    def __init__(self, mass, inertia):
        self._body = body.New(mass, inertia)
    def __del__(self):
        print('__del__ %r' % self)
        if body:
            body.Free(self._body)
    ...
    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
I also tried to solve it through weakrefs, but with those things only seem to get worse and far more unpredictable.
Even if I don't put in the velocity_func, cycles will still appear, at least when I do this:
class Toy(object):
    def __init__(self, body):
        self.body = body
        self.body.owner = self  # Toy -> body -> Toy: a reference cycle
    ...
def collision(a, b, contacts):
    whatever(a.body.owner)
So how to make sure Structures will get garbage collected, even if they are allocated/freed by the shared library?
There's repository if you are interested about more details: http://bitbucket.org/cheery/ctypes-chipmunk/

What you want to do, that is create an object that allocates things and then deallocates automatically when the object is no longer in use, is almost impossible in Python, unfortunately. The __del__ method is not guaranteed to be called, so you can't rely on that.
The standard way in Python is simply:
try:
    allocate()
    dostuff()
finally:
    cleanup()
Or since 2.5 you can also create context-managers and use the with statement, which is a neater way of doing that.
But both of these are primarily for when you allocate/lock at the beginning of a code snippet. If you want to have things allocated for the whole run of the program, you need to allocate the resource at startup, before the main code of the program runs, and deallocate afterwards. There is one situation which isn't covered here, and that is when you want to allocate and deallocate many resources dynamically and use them in many places in the code, for example if you want a pool of memory buffers or similar. But most of those cases are for memory, which Python will handle for you, so you don't have to bother about those. There are of course cases where you want dynamic pool allocation of things that are NOT memory, and then you would want the type of deallocation you try in your example, and that is tricky to do with Python.

If weakrefs aren't broken, I guess this may work:
from ctypes import cast, c_void_p
from weakref import ref
pointers = set()
class Pointer(object):
    def __init__(self, cfun, ptr):
        pointers.add(self)
        self.ref = ref(ptr, self.cleanup)
        self.data = cast(ptr, c_void_p).value # python cast it so smart, but it can't be smarter than this.
        self.cfun = cfun
    def cleanup(self, obj):
        print('cleanup 0x%x' % self.data)
        self.cfun(self.data)
        pointers.remove(self)
def cleanup(cfun, ptr):
    Pointer(cfun, ptr)
I have yet to try it. The important piece is that the Pointer doesn't hold any strong reference to the foreign pointer, only an integer. This should work as long as ctypes doesn't free memory that I should free through the bindings. Yeah, it's basically a hack, but I think it may work better than the earlier things I've been trying.
Edit: Tried it, and it seems to work after some small fine-tuning of my code. A surprising thing is that even after I removed __del__ from all of my structures, it still seems to fail. Interesting but frustrating.
Neither works. By some weird chance I've been able to drop cyclic references in places, but things stay broken.
Edit: Well.. weakrefs WERE broken after all! So there's likely no solution for reliable cleanup in Python, other than forcing it to be explicit.
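For readers on Python 3.4 or later, the standard library's weakref.finalize packages up the same idea as the Pointer hack: the callback receives only the raw handle, never the wrapper object, and it also runs at interpreter exit if the object is still alive. The sketch below assumes that version; FakeBodyLib is a made-up stand-in for the ctypes-bound library so that the example is self-contained.

```python
import gc
import weakref


class FakeBodyLib:
    """Stand-in for the bound C library; New/Free only track live handles."""

    def __init__(self):
        self.live = set()
        self._next = 1

    def New(self, mass, inertia):
        handle = self._next
        self._next += 1
        self.live.add(handle)
        return handle

    def Free(self, handle):
        self.live.remove(handle)


body = FakeBodyLib()


class Body:
    def __init__(self, mass, inertia):
        self._body = body.New(mass, inertia)
        # The finalizer holds the handle and the library, but not self,
        # so it never keeps the wrapper alive.
        self._finalizer = weakref.finalize(self, body.Free, self._body)

    def close(self):
        """Free now; a finalizer runs at most once, so repeats are harmless."""
        self._finalizer()


b1 = Body(1.0, 2.0)
b1.close()            # explicit cleanup

b2 = Body(3.0, 4.0)
b2.owner = b2         # a reference cycle, similar to the Toy example
del b2
gc.collect()          # the cycle collector reclaims b2 and the finalizer fires
```

Because no __del__ method is involved, the cycle is collectable on any Python version that has weakref.finalize, and explicit close() remains available for deterministic release.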

In CPython, __del__ is a reliable destructor of an object, because it will always be called when the reference count reaches zero (note: there may be cases, like circular references of items with a __del__ method defined, where the reference count will never reach zero, but that is another issue).
Update
From the comments, I understand the problem is related to the order of destruction of objects: body is a global object, and it is being destroyed before all other objects, thus it is no longer available to them.
Actually, using global objects is not good; not only because of issues like this one, but also because of maintenance.
I would then change your class to something like this:
class Body(object):
    def __init__(self, mass, inertia):
        self._bodyref = body  # keep the library object alive as long as this instance
        self._body = body.New(mass, inertia)
    def __del__(self):
        print('__del__ %r' % self)
        # use the stored reference; the global name may already be gone at shutdown
        self._bodyref.Free(self._body)
    ...
    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
A couple of notes:
The change adds a reference to the global body object on each instance and uses it in __del__, so that object will live at least as long as all the objects created from that class.
Still, using a global object is not good because of unit testing and maintenance; better would be to have a factory for the object, that will set the correct "body" to the class, and in case of unit test will easily put a mock object. But that's really up to you and how much effort you think makes sense in this project.
