How to do cleanup reliably in Python?

I have some ctypes bindings, and for each body.New I should call body.Free. The library I'm binding doesn't insulate its allocation routines from the rest of the code (they can be called almost anywhere in it), and to use a couple of useful features I need to create cyclic references.
I think it would be solved if I could find a reliable way to hook a destructor to an object. (Weakrefs would help if they gave me the callback just before the data is dropped.)
So obviously this code megafails when I put in velocity_func:
class Body(object):
    def __init__(self, mass, inertia):
        self._body = body.New(mass, inertia)

    def __del__(self):
        print '__del__ %r' % self
        if body:
            body.Free(self._body)
    ...
    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
I also tried to solve it through weakrefs; with those, things just seemed to get worse and far more unpredictable.
Even if I don't put in the velocity_func, cycles appear at least when I do this:
class Toy(object):
    def __init__(self, body):
        self.body = body   # assumed missing from the snippet: store the body first
        self.body.owner = self
    ...

def collision(a, b, contacts):
    whatever(a.body.owner)
So how do I make sure Structures get garbage collected, even if they are allocated/freed by the shared library?
There's a repository if you are interested in more details: http://bitbucket.org/cheery/ctypes-chipmunk/

What you want to do, that is, create an object that allocates things and then deallocates them automatically when the object is no longer in use, is almost impossible in Python, unfortunately. The __del__ method is not guaranteed to be called, so you can't rely on that.
The standard way in Python is simply:
try:
    allocate()
    dostuff()
finally:
    cleanup()
Or, since Python 2.5, you can also create context managers and use the with statement, which is a neater way of doing the same thing.
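For example, a minimal context-manager sketch of the same pattern (allocate, dostuff and cleanup are the stand-in names from the snippet above):
from contextlib import contextmanager

@contextmanager
def allocated():
    resource = allocate()   # stand-in allocation from the snippet above
    try:
        yield resource
    finally:
        cleanup()           # runs even if dostuff() raises

with allocated():
    dostuff()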
But both of these are primarily for when you allocate or lock at the beginning of a code snippet. If you want things allocated for the whole run of the program, you need to allocate the resource at startup, before the main code runs, and deallocate it afterwards. There is one situation that isn't covered here: when you want to allocate and deallocate many resources dynamically and use them in many places in the code, for example a pool of memory buffers or similar. But most of those cases are for memory, which Python handles for you, so you don't have to bother with them. There are of course cases where you want dynamic pool allocation of things that are NOT memory, and then you would want the type of deallocation you try in your example, and that is tricky to do in Python.

If weakrefs aren't broken, I guess this may work:
from ctypes import cast, c_void_p   # needed for the cast below
from weakref import ref

pointers = set()

class Pointer(object):
    def __init__(self, cfun, ptr):
        pointers.add(self)
        self.ref = ref(ptr, self.cleanup)
        # store the raw address only: no strong reference to the ctypes object
        self.data = cast(ptr, c_void_p).value
        self.cfun = cfun

    def cleanup(self, obj):
        print 'cleanup 0x%x' % self.data
        self.cfun(self.data)
        pointers.remove(self)

def cleanup(cfun, ptr):
    Pointer(cfun, ptr)
I haven't tried it yet. The important piece is that the Pointer doesn't hold any strong reference to the foreign pointer, only an integer. This should work if ctypes doesn't free memory that I should free with the bindings. Yeah, it's basically a hack, but I think it may work better than the earlier things I've been trying.
Edit: Tried it, and it seems to work after a small fine-tuning of my code. A surprising thing is that even after I removed __del__ from all of my structures, it still seems to fail. Interesting but frustrating.
Neither works; by some weird chance I've been able to drop the cyclic references in places, but things stay broken.
Edit: Well... weakrefs WERE broken after all! So there's likely no solution for reliable cleanup in Python, other than forcing it to be explicit.

In CPython, __del__ is a reliable destructor of an object, because it will always be called when the reference count reaches zero (note: there may be cases, like circular references among items with a __del__ method defined, where the reference count never reaches zero, but that is another issue).
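For example, a minimal sketch of that CPython behaviour (Demo is a made-up class):
class Demo(object):
    def __del__(self):
        print 'refcount hit zero, __del__ runs'

d = Demo()
del d   # the only reference disappears, so CPython finalizes immediately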
Update
From the comments, I understand the problem is related to the order of destruction of objects: body is a global object, and it is being destroyed before all other objects, thus it is no longer available to them.
Actually, using global objects is not good; not only because of issues like this one, but also because of maintenance.
I would then change your class to something like this:
class Body(object):
    def __init__(self, mass, inertia):
        self._bodyref = body
        self._body = body.New(mass, inertia)

    def __del__(self):
        print '__del__ %r' % self
        if body:
            body.Free(self._body)
    ...
    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
A couple of notes:
The only change is adding a reference to the global body object, which will thus live at least as long as all the objects created from that class.
Still, using a global object is not good for unit testing and maintenance; it would be better to have a factory for the object that sets the correct "body" on the class, and in unit tests can easily substitute a mock object. But that's really up to you and how much effort you think makes sense in this project.
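For illustration, a rough sketch of such a factory (make_body_class and lib are hypothetical names, not part of the question's code):
def make_body_class(lib):
    """Build a Body class bound to a specific library object, so unit
    tests can inject a mock instead of the real ctypes bindings."""
    class Body(object):
        def __init__(self, mass, inertia):
            self._lib = lib                      # keeps the library alive
            self._body = lib.New(mass, inertia)

        def __del__(self):
            if self._lib:
                self._lib.Free(self._body)
    return Body

Body = make_body_class(body)              # production code
# TestBody = make_body_class(mock_body)   # in a unit test, with a mock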

Related

Segmentation fault in destructor with Python

I have made a class to represent my LED strip, and I would like to switch off the strip when I stop it (i.e. when the program stops and the object is destroyed). Hence, as I would do in C++, I created a destructor to do that. But it looks like Python calls it after it has destroyed the object, and then I get a segmentation fault error.
Here is my class; the destructor just has to call the function that sets the colour of each LED to 0.
class LedStrip:
    def __init__(self, led_count, led_pin, led_freq_hz, led_dma, led_invert, led_brightness, led_channel, color=MyColor(0, 0, 0)):
        self.__strip = Adafruit_NeoPixel(led_count, led_pin, led_freq_hz, led_dma, led_invert, led_brightness, led_channel)
        self.__color = color
        self.__strip.begin()

    def __del__(self):
        self.__color = MyColor(0, 0, 0)
        self.colorWipe(10)

    # ATTRIBUTES (getter/setter)
    @property
    def color(self):
        return self.__color

    @color.setter
    def color(self, color):
        if isinstance(color, MyColor):
            self.__color = color
        else:
            self.__color = MyColor(0, 0, 0)

    def __len__(self):
        return self.__strip.numPixels()

    # METHODS
    def colorWipe(self, wait_ms=50):
        """Wipe color across the display one pixel at a time."""
        color = self.__color.toNum()
        for i in range(self.__strip.numPixels()):
            self.__strip.setPixelColor(i, color)
            self.__strip.show()
            time.sleep(wait_ms / 1000.0)
MyColor is just a class that I made to represent an RGB colour. What would be the correct way to achieve that task in Python? I come from C++, hence my OOP approach is really C++ oriented, so I have some difficulty thinking in a Pythonic way.
Thanks in advance
You have to be very careful when writing __del__ methods (finalizers). They can be called at virtually any time after an object is no longer referenced (it doesn’t necessarily happen immediately) and there's really no guarantee that they'll be called at interpreter exit time. If they do get called during interpreter exit, other objects (such as global variables and other modules) might already have been cleaned up, and therefore unavailable to your finalizer. They exist so that objects can clean up state (such as low-level file handles, connections, etc.), and don't function like C++ destructors. In my Python experience, you rarely need to write your own __del__ methods.
There are other mechanisms you could use here. One choice would be try/finally:
leds = LedStrip(...)
try:
    pass  # application logic to interact with the LEDs
finally:
    leds.clear()  # or whatever logic you need to clear the LEDs to zero
This is still pretty explicit. If you want something a bit more implicit, you could consider using the Python context manager structure instead. To use a context manager, you use the with keyword:
with open("file.txt", "w") as outfile:
outfile.write("Hello!\n")
The with statement calls the special __enter__ method to initialize the "context". When the block ends, the __exit__ method will be called to end the "context". For the case of a file, __exit__ would close the file. The key is that __exit__ will be called even if an exception occurs inside the block (kind of like finally on a try block).
You could implement __enter__ and __exit__ on your LED strip, then write:
with LedStrip(...) as leds:
    pass  # do whatever you want with the leds
and when the block ends, the __exit__ method could reset the state of all the LEDs.
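A sketch of those two methods, assuming the LedStrip class from the question (the clearing logic mirrors the existing __del__):
class LedStrip:
    # __init__, color, colorWipe, etc. as defined in the question

    def __enter__(self):
        # nothing extra to set up: the strip was initialised in __init__
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # runs on normal exit and when an exception escapes the block
        self.color = MyColor(0, 0, 0)
        self.colorWipe(10)
        return False  # let any exception propagate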
Let's put it this way: firstly, the "...as I would do in C++" approach is not appropriate, as I'm sure you know yourself. It goes without saying that Python is a totally different language, but in this particular case it should be stressed, since Python's memory management is quite different from C++'s. Python uses reference counting: when an object's reference count goes to zero, its memory is released (i.e. the object is garbage collected), and so on.
Python user-defined objects sometimes do need to define a __del__() method. But it is not a destructor in any sense (certainly not in the C++ sense); it's a finalizer. Moreover, it is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits. Although we can invoke __del__() explicitly, that should not be done in your case; instead I would advise making the LED switch-off an explicit method, not relying on Python's internals. Just as it goes in the Zen of Python (the import this command):
Explicit is better than implicit.
For more information on __del__(), check this good answer. For more on reference counting check this article.

In Python, is it possible to get an object without a name from another module?

I want to get all objects generated by another module, even objects that have no name or reference. Is that possible? For example:
in module1.py, there's only one line of code:
MyClass()
in module2.py:
module1 = __import__("module1")
# print sth of MyClass from module1
What you're trying to do is generally impossible.
An object that has no name or other reference is garbage. That's the technical meaning of the term "garbage". In CPython (the Python implementation you're probably using if you don't know which one you're using), garbage is collected immediately—as soon as that MyClass() statement ends, the instance gets destroyed.
So, you can't access the object, because it doesn't exist.
In some other Python implementations, the object may not be destroyed until the next garbage collection cycle, but that's going to be pretty soon, and it's not deterministic exactly when—and you still have no way to get at it before it's destroyed. So it might as well not exist, even if it hasn't actually been finalized yet.
Now, "generally" means there are some exceptions. They're not common, but they do exist.
For example, imagine a class like this:
class MyClass:
    _instances = []
    def __init__(self):
        MyClass._instances.append(self)
Now, when you do MyClass(), there actually is a reference to that instance, so it's not garbage. And, if you know where it is (which you'd presumably find in the documentation, or in the source code), you can access it as MyClass._instances[-1]. But it's unlikely that an arbitrary class MyClass does anything like this.
OK, I lied. There is sort of a way to do this, but (a) it’s cheating, and (b) it’s almost certainly a terrible idea that has no valid use cases you’ll ever think of. But just for fun, here’s how you could do this.
You need to write an import hook, and make sure it gets installed before the first time you import the module. Then you can do almost anything you want. The simplest idea I can think of is transforming the AST to turn every expression statement (or maybe just every expression statement at the top level) into an assignment statement that assigns to a hidden variable. You can even make the variable name an invalid identifier, so it'll be safe to run on any legal module no matter what's in the global namespace. Then you can access the first object created and abandoned by the module as something like module.globals()['.0'].
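A rough Python 3 sketch of that idea (the module name module1, the _captured_N naming scheme, and the finder/loader wiring are choices of this sketch, not a standard recipe):
import ast
import importlib.machinery
import sys

class CapturingLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *, _optimize=-1):
        tree = ast.parse(data, filename=path)
        count = 0
        for i, stmt in enumerate(tree.body):
            # rewrite each top-level expression statement into an assignment
            if isinstance(stmt, ast.Expr):
                target = ast.Name(id='_captured_%d' % count, ctx=ast.Store())
                tree.body[i] = ast.copy_location(
                    ast.Assign(targets=[target], value=stmt.value), stmt)
                count += 1
        ast.fix_missing_locations(tree)
        return compile(tree, path, 'exec')

class CapturingFinder:
    def find_spec(self, name, path, target=None):
        if name != 'module1':   # only rewrite the module we care about
            return None
        spec = importlib.machinery.PathFinder.find_spec(name, path)
        if spec is not None and isinstance(
                spec.loader, importlib.machinery.SourceFileLoader):
            spec.loader = CapturingLoader(spec.loader.name, spec.loader.path)
        return spec

sys.meta_path.insert(0, CapturingFinder())
module1 = __import__("module1")
print(module1._captured_0)   # the otherwise-unreferenced MyClass() instance
Note that a module docstring is also an expression statement, so a real version would want to skip string constants.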

Cleaning up in PyPy

I've been looking for ways to clean up objects in python.
I'm currently using pypy.
I found a web page and an example.
First a basic example:
class FooType(object):
    def __init__(self, id):
        self.id = id
        print self.id, 'born'

    def __del__(self):
        print self.id, 'died'

ft = FooType(1)
This SHOULD print:
1 born
1 died
BUT it just prints
1 born
So my question is: how do I clean anything up in PyPy?
When you need a "cleanup" to run for sure at a specific time, use a context manager.
class FooType(object):
    def __init__(self, id):
        self.id = id
        print 'born'

    def __enter__(self):
        print 'entered'
        return self

    def __exit__(self, *exc):
        print 'exited'

with FooType(1) as ft:
    pass  # do something with ft
The same way you should do it in every other Python implementation, including CPython: By explicitly managing the lifetime in code, rather than by relying on automatic memory management and __del__.
Even in CPython, there's more than one case where __del__ is not called at all, and quite a few cases where it's called much later than you might expect (e.g., any time any object gets caught up in a reference cycle, which can happen quite easily). That means it's essentially useless, except perhaps to debug lifetime issues (but there are memory profilers for that!) and as a last resort if some code neglects cleaning up explicitly.
By being explicit about cleanup, you sidestep all these issues. Have a method that does the cleanup, and make client code call it at the right time. Context managers can make this easier to get right in the face of exceptions, and more readable. It often also allows cleaning up sooner than __del__, even if reference counting "immediately" calls __del__. For example, this:
def parse_file(path):
    f = open(path)
    return parse(f.read())  # file stays open during parsing
is worse than this w.r.t. resource usage:
def parse_file(path):
    with open(path) as f:
        s = f.read()
    # file is closed here
    return parse(s)
I would also argue that such a design is cleaner, because it doesn't confuse the lifetime of the resource wrapper object with the lifetime of the wrapped resource. Sometimes, it can make sense to have that object outlive the resource, or even make it take ownership of a new resource.
In your example, __del__ is not called, but that's only because the test program you wrote finishes immediately. PyPy guarantees that __del__ is called some time after the object is no longer reachable, but only as long as the program continues to execute. So if you do ft = FooType(1) in an infinite loop, it will, after some time, print the 'died' too.
As the other answers explain, CPython doesn't really guarantee anything, but in simple cases (e.g. no reference cycles) it will call __del__ immediately and reliably. Still, the point is that you shouldn't strictly rely on this.

Creating a hook to a frequently accessed object

I have an application which relies heavily on a Context instance that serves as the access point to the context in which a given calculation is performed.
If I want to provide access to the Context instance, I can:
rely on global
pass the Context as a parameter to all the functions that require it
I would rather not use global variables, and passing the Context instance to all the functions is cumbersome and verbose.
How would you "hide, but make accessible" the calculation Context?
For example, imagine that Context simply computes the state (position and velocity) of planets according to different data.
class Context(object):
    def state(self, planet, epoch):
        """base class --- suppose `state` is meant
        to return a tuple of vectors."""
        raise NotImplementedError("provide an implementation!")

class DE405Context(Context):
    """Concrete context using DE405 planetary ephemeris"""
    def state(self, planet, epoch):
        """suppose that de405reader exists and can provide
        the required (position, velocity) tuple."""
        return de405reader(planet, epoch)

def angular_momentum(planet, epoch, context):
    """suppose we care about the angular momentum of the planet,
    and that `cross` exists"""
    r, v = context.state(planet, epoch)
    return cross(r, v)

# a second alternative, a "Calculator" class that contains the context
class Calculator(object):
    def __init__(self, context):
        self._ctx = context

    def angular_momentum(self, planet, epoch):
        r, v = self._ctx.state(planet, epoch)
        return cross(r, v)

# use as follows:
my_context = DE405Context()
now = now()  # assume this function returns an epoch

# first case:
print angular_momentum("Saturn", now, my_context)

# second case:
calculator = Calculator(my_context)
print calculator.angular_momentum("Saturn", now)
Of course, I could add all the operations directly into "Context", but it does not feel right.
In real life, the Context not only computes positions of planets! It computes many more things, and it serves as the access point to a lot of data.
So, to make my question more succinct: how do you deal with objects which need to be accessed by many classes?
I am currently exploring Python's context managers, but without much luck. I have also thought about dynamically adding a "context" attribute to the functions themselves (functions are objects, so they can serve as an access point to arbitrary objects), i.e.:
def angular_momentum(planet, epoch):
    r, v = angular_momentum.ctx.state(planet, epoch)
    return cross(r, v)

# somewhere before calling anything...
import angular_momentum
angular_momentum.ctx = my_context
Edit
Something that would be great is to create a "calculation context" with a with statement, for example:
with my_context:
    h = angular_momentum("Earth", now)
Of course, I can already do that if I simply write:
with my_context as ctx:
    h = angular_momentum("Earth", now, ctx)  # first implementation above
Maybe a variation of this with the Strategy pattern?
You generally don't want to "hide" anything in Python. You may want to signal human readers that they should treat it as "private", but this really just means "you should be able to understand my API even if you ignore this object", not "you can't access this".
The idiomatic way to do that in Python is to prefix it with an underscore—and, if your module might ever be used with from foo import *, add an explicit __all__ global that lists all the public exports. Again, neither of these will actually prevent anyone from seeing your variable, or even accessing it from outside after import foo.
See PEP 8 on Global Variable Names for more details.
Some style guides suggest special prefixes, all-caps-names, or other special distinguishing marks for globals, but PEP 8 specifically says that the conventions are the same, except for the __all__ and/or leading underscore.
Meanwhile, the behavior you want is clearly that of a global variable—a single object that everyone implicitly shares and references. Trying to disguise it as anything other than what it is will do you no good, except possibly for passing a lint check or a code review that you shouldn't have passed. All of the problems with global variables come from being a single object that everyone implicitly shares and references, not from being directly in the globals() dictionary or anything like that, so any decent fake global is just as bad as a real global. If that truly is the behavior you want, make it a global variable.
Putting it together:
# do not include _context here
__all__ = ['Context', 'DE405Context', 'Calculator', …
_context = Context()
Also, of course, you may want to call it something like _global_context or even _private_global_context, instead of just _context.
But keep in mind that globals are still members of a module, not of the entire universe, so even a public context will still be scoped as foo.context when client code does an import foo. And this may be exactly what you want. If you want a way for client scripts to import your module and then control its behavior, maybe foo.context = foo.Context(…) is exactly the right way. Of course this won't work in multithreaded (or gevent/coroutine/etc.) code, and it's inappropriate in various other cases, but if that's not an issue, in some cases, this is fine.
Since you brought up multithreading in your comments: In the simple style of multithreading where you have long-running jobs, the global style actually works perfectly fine, with a trivial change—replace the global Context with a global threading.local instance that contains a Context. Even in the style where you have small jobs handled by a thread pool, it's not much more complicated. You attach a context to each job, and then when a worker pulls a job off the queue, it sets the thread-local context to that job's context.
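A minimal sketch of that thread-local variant (set_context and current_context are hypothetical helpers; DE405Context and cross come from the question):
import threading

_tls = threading.local()
_default_context = DE405Context()   # fallback for threads that never set one

def set_context(ctx):
    _tls.context = ctx              # visible only in the calling thread

def current_context():
    return getattr(_tls, 'context', _default_context)

def angular_momentum(planet, epoch):
    r, v = current_context().state(planet, epoch)
    return cross(r, v)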
However, I'm not sure multithreading is going to be a good fit for your app anyway. Multithreading is great in Python when your tasks occasionally have to block for IO and you want to be able to do that without stopping other tasks—but, thanks to the GIL, it's nearly useless for parallelizing CPU work, and it sounds like that's what you're looking for. Multiprocessing (whether via the multiprocessing module or otherwise) may be more of what you're after. And with separate processes, keeping separate contexts is even simpler. (Or, you can write thread-based code and switch it to multiprocessing, leaving the threading.local variables as-is and only changing the way you spawn new tasks, and everything still works just fine.)
It may make sense to provide a "context" in the context manager sense, as an external version of the standard library's decimal module did, so someone can write:
with foo.Context(…):
    pass  # do stuff under custom context
# back to default context
However, nobody could really think of a good use case for that (especially since, at least in the naive implementation, it doesn't actually solve the threading/etc. problem), so it wasn't added to the standard library, and you may not need it either.
If you want to do this, it's pretty trivial. If you're using a private global, just add this to your Context class:
def __enter__(self):
    global _context
    self._stashedcontext = _context
    _context = self

def __exit__(self, *args):
    global _context
    _context = self._stashedcontext
And it should be obvious how to adjust this to public, thread-local, etc. alternatives.
Another alternative is to make everything a member of the Context object. The top-level module functions then just delegate to the global context, which has a reasonable default value. This is exactly how the standard library random module works—you can create a random.Random() and call randrange on it, or you can just call random.randrange(), which calls the same thing on a global default random.Random() object.
If creating a Context is too heavy to do at import time, especially if it might not get used (because nobody might ever call the global functions), you can use the singleton pattern to create it on first access. But that's rarely necessary. And when it's not, the code is trivial. For example, the source to random, starting at line 881, does this:
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
…
And that's all there is to it.
And finally, as you suggested, you could make everything a member of a different Calculator object which owns a Context object. This is the traditional OOP solution; overusing it tends to make Python feel like Java, but using it when it's appropriate is not a bad thing.
You might consider using a proxy object; here's a library that helps in creating object proxies:
http://pypi.python.org/pypi/ProxyTypes
Flask uses object proxies for its current_app, request, and other variables; all it takes to reference them is:
from flask import request
You could create a proxy object that is a reference to your real context, and use thread locals to manage the instances (if that would work for you).
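A bare-bones sketch of that proxy idea without the library (ContextProxy and current_context are made-up names):
import threading

_tls = threading.local()

class ContextProxy(object):
    """Forward attribute access to whichever context is currently
    installed for the calling thread."""
    def __getattr__(self, name):
        return getattr(_tls.context, name)

current_context = ContextProxy()

# setup, e.g. per thread or per request:
# _tls.context = DE405Context()
# r, v = current_context.state("Saturn", now)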

Should I worry about circular references in Python?

Suppose I have code that maintains a parent/children structure. In such a structure I get circular references, where a child points to a parent and a parent points to a child. Should I worry about them? I'm using Python 2.5.
I am concerned that they will not be garbage collected and the application will eventually consume all memory.
"Worry" is misplaced, but if your program turns out to be slow, consume more memory than expected, or have strange inexplicable pauses, the cause is indeed likely to be in those garbage reference loops -- they need to be garbage collected by a different procedure than "normal" (acyclic) reference graphs, and that collection is occasional and may be slow if you have a lot of objects tied up in such loops (the cyclical-garbage collection is also inhibited if an object in the loop has a __del__ special method).
So, reference loops will not affect your program's correctness, but may affect its performance and/or footprint.
If and when you want to remove unwanted loops of references, you can often use the weakref module in Python's standard library.
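For example, a parent/child structure can keep the upward pointer weak so no strong cycle forms (a sketch; Node is a made-up class):
import weakref

class Node(object):
    def __init__(self, parent=None):
        self.children = []
        # strong references go parent -> child only; the back-pointer
        # is weak, so no strong reference cycle is created
        self._parent = weakref.ref(parent) if parent is not None else None
        if parent is not None:
            parent.children.append(self)

    @property
    def parent(self):
        # returns None once the parent has been collected
        return self._parent() if self._parent is not None else None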
If and when you want to exert more direct control (or perform debugging, see what exactly is happening) regarding cyclical garbage collection, use the gc module in Python's standard library.
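For instance, a small sketch of the gc module's hooks:
import gc

gc.set_debug(gc.DEBUG_STATS)   # print statistics on each collection
found = gc.collect()           # force a full cyclic collection right now
print 'unreachable objects found:', found
print 'uncollectable:', gc.garbage   # cycles the collector could not free,
                                     # e.g. loops of objects with __del__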
Experimentally: you're fine:
import itertools

for i in itertools.count():
    a = {}
    b = {"a": a}
    a["b"] = b
It consistently stays at 3.6 MB of RAM.
Python will detect the cycle and release the memory when there are no outside references.
Circular references are a normal thing to do, so I don't see a reason to be worried about them. Many tree algorithms require that each node have links to its children and its parent. They're also required to implement something like a doubly linked list.
I don't think you should worry. Try the following program and you will see that it won't consume all memory:
while True:
    a = range(100)
    b = range(100)
    a.append(b)
    b.append(a)
    a.append(a)
    b.append(b)
There seems to be an issue with references to methods stored in lists on an instance. Here are two examples. The first one does not call __del__. The second one, with weakref, is OK for __del__. However, in this latter case the problem is that you cannot weakly reference methods: http://docs.python.org/2/library/weakref.html
import sys, weakref

class One():
    def __init__(self):
        self.counters = [self.count]

    def __del__(self):
        print("__del__ called")

    def count(self):
        print(sys.getrefcount(self))

sys.getrefcount(One)
one = One()
sys.getrefcount(One)
del one
sys.getrefcount(One)

class Two():
    def __init__(self):
        self.counters = [weakref.ref(self.count)]

    def __del__(self):
        print("__del__ called")

    def count(self):
        print(sys.getrefcount(self))

sys.getrefcount(Two)
two = Two()
sys.getrefcount(Two)
del two
sys.getrefcount(Two)
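For completeness: Python 3.4+ addresses exactly this case with weakref.WeakMethod, which can weakly reference a bound method without keeping the instance alive. A sketch:
import sys, weakref

class Three():
    def __init__(self):
        # WeakMethod does not keep the instance alive, so __del__ still runs
        self.counters = [weakref.WeakMethod(self.count)]

    def __del__(self):
        print("__del__ called")

    def count(self):
        print(sys.getrefcount(self))

three = Three()
method = three.counters[0]()   # dereference: the bound method, or None
if method is not None:
    method()
del three                      # prints "__del__ called"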
