Creating a hook to a frequently accessed object - python

I have an application which relies heavily on a Context instance that serves as the access point to the context in which a given calculation is performed.
If I want to provide access to the Context instance, I can:
rely on global
pass the Context as a parameter to all the functions that require it
I would rather not use global variables, and passing the Context instance to all the functions is cumbersome and verbose.
How would you "hide, but make accessible" the calculation Context?
For example, imagine that Context simply computes the state (position and velocity) of planets according to different data.
class Context(object):
    def state(self, planet, epoch):
        """base class --- suppose `state` is meant
        to return a tuple of vectors."""
        raise NotImplementedError("provide an implementation!")

class DE405Context(Context):
    """Concrete context using DE405 planetary ephemeris"""
    def state(self, planet, epoch):
        """suppose that de405reader exists and can provide
        the required (position, velocity) tuple."""
        return de405reader(planet, epoch)
def angular_momentum(planet, epoch, context):
    """suppose we care about the angular momentum of the planet,
    and that `cross` exists"""
    r, v = context.state(planet, epoch)
    return cross(r, v)
# a second alternative: a "Calculator" class that contains the context
class Calculator(object):
    def __init__(self, context):
        self._ctx = context

    def angular_momentum(self, planet, epoch):
        r, v = self._ctx.state(planet, epoch)
        return cross(r, v)

# use as follows:
my_context = DE405Context()
now = now()  # assume this function returns an epoch

# first case:
print angular_momentum("Saturn", now, my_context)

# second case:
calculator = Calculator(my_context)
print calculator.angular_momentum("Saturn", now)
Of course, I could add all the operations directly into "Context", but it does not feel right.
In real life, the Context not only computes positions of planets! It computes many more things, and it serves as the access point to a lot of data.
So, to make my question more succinct: how do you deal with objects which need to be accessed by many classes?
I am currently exploring Python's context managers, but without much luck. I also thought about dynamically adding a property "context" to all functions directly (functions are objects, so they can have an access point to arbitrary objects), i.e.:
def angular_momentum(planet, epoch):
    r, v = angular_momentum.ctx.state(planet, epoch)
    return cross(r, v)

# somewhere before calling anything...
import angular_momentum
angular_momentum.ctx = my_context
Edit: Something that would be great is to create a "calculation context" with a with statement, for example:
with my_context:
    h = angular_momentum("Earth", now)
Of course, I can already do that if I simply write:
with my_context as ctx:
    h = angular_momentum("Earth", now, ctx)  # first implementation above
Maybe a variation of this with the Strategy pattern?

You generally don't want to "hide" anything in Python. You may want to signal human readers that they should treat it as "private", but this really just means "you should be able to understand my API even if you ignore this object", not "you can't access this".
The idiomatic way to do that in Python is to prefix it with an underscore—and, if your module might ever be used with from foo import *, add an explicit __all__ global that lists all the public exports. Again, neither of these will actually prevent anyone from seeing your variable, or even accessing it from outside after import foo.
See PEP 8 on Global Variable Names for more details.
Some style guides suggest special prefixes, all-caps-names, or other special distinguishing marks for globals, but PEP 8 specifically says that the conventions are the same, except for the __all__ and/or leading underscore.
Meanwhile, the behavior you want is clearly that of a global variable—a single object that everyone implicitly shares and references. Trying to disguise it as anything other than what it is will do you no good, except possibly for passing a lint check or a code review that you shouldn't have passed. All of the problems with global variables come from being a single object that everyone implicitly shares and references, not from being directly in the globals() dictionary or anything like that, so any decent fake global is just as bad as a real global. If that truly is the behavior you want, make it a global variable.
Putting it together:
# do not include _context here
__all__ = ['Context', 'DE405Context', 'Calculator', …
_context = Context()
Also, of course, you may want to call it something like _global_context or even _private_global_context, instead of just _context.
But keep in mind that globals are still members of a module, not of the entire universe, so even a public context will still be scoped as foo.context when client code does an import foo. And this may be exactly what you want. If you want a way for client scripts to import your module and then control its behavior, maybe foo.context = foo.Context(…) is exactly the right way. Of course this won't work in multithreaded (or gevent/coroutine/etc.) code, and it's inappropriate in various other cases, but if that's not an issue, in some cases, this is fine.
Since you brought up multithreading in your comments: In the simple style of multithreading where you have long-running jobs, the global style actually works perfectly fine, with a trivial change—replace the global Context with a global threading.local instance that contains a Context. Even in the style where you have small jobs handled by a thread pool, it's not much more complicated. You attach a context to each job, and then when a worker pulls a job off the queue, it sets the thread-local context to that job's context.
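For instance, a minimal sketch of that thread-local arrangement (the _local, set_context and get_context names are made up for illustration; Context and cross are the question's):

import threading

_local = threading.local()

def set_context(ctx):
    """Attach a Context to the current thread (e.g. when a worker picks its job off the queue)."""
    _local.context = ctx

def get_context():
    """Return the current thread's Context, or fail loudly if none was set."""
    try:
        return _local.context
    except AttributeError:
        raise RuntimeError("no Context set for this thread")

def angular_momentum(planet, epoch):
    # same computation as above, but the context is looked up per-thread
    r, v = get_context().state(planet, epoch)
    return cross(r, v)  # cross() as assumed in the question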
However, I'm not sure multithreading is going to be a good fit for your app anyway. Multithreading is great in Python when your tasks occasionally have to block for IO and you want to be able to do that without stopping other tasks—but, thanks to the GIL, it's nearly useless for parallelizing CPU work, and it sounds like that's what you're looking for. Multiprocessing (whether via the multiprocessing module or otherwise) may be more of what you're after. And with separate processes, keeping separate contexts is even simpler. (Or, you can write thread-based code and switch it to multiprocessing, leaving the threading.local variables as-is and only changing the way you spawn new tasks, and everything still works just fine.)
It may make sense to provide a "context" in the context manager sense, as an external version of the standard library's decimal module did, so someone can write:
with foo.Context(…):
    # do stuff under custom context
# back to default context
However, nobody could really think of a good use case for that (especially since, at least in the naive implementation, it doesn't actually solve the threading/etc. problem), so it wasn't added to the standard library, and you may not need it either.
If you want to do this, it's pretty trivial. If you're using a private global, just add this to your Context class:
def __enter__(self):
    global _context
    self._stashedcontext = _context
    _context = self
    return self  # also allows "with Context(...) as ctx"

def __exit__(self, *args):
    global _context
    _context = self._stashedcontext
And it should be obvious how to adjust this to public, thread-local, etc. alternatives.
Another alternative is to make everything a member of the Context object. The top-level module functions then just delegate to the global context, which has a reasonable default value. This is exactly how the standard library random module works—you can create a random.Random() and call randrange on it, or you can just call random.randrange(), which calls the same thing on a global default random.Random() object.
If creating a Context is too heavy to do at import time, especially if it might not get used (because nobody might ever call the global functions), you can use the singleton pattern to create it on first access. But that's rarely necessary. And when it's not, the code is trivial. For example, the source to random, starting at line 881, does this:
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
…
And that's all there is to it.
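Applied to the question's Context, a rough sketch of the same delegation style might look like this (the module name foo and the delegating functions are illustrative, not an existing API; cross and DE405Context are assumed from the question):

# foo.py -- hypothetical module following the random-module pattern
_context = DE405Context()

# bound-method style, like random's "seed = _inst.seed"
state = _context.state

# or plain delegating functions
def angular_momentum(planet, epoch):
    r, v = _context.state(planet, epoch)
    return cross(r, v)  # cross() as assumed in the question

Client code then just calls foo.angular_momentum("Saturn", now), or builds its own DE405Context() and works with it directly.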
And finally, as you suggested, you could make everything a member of a different Calculator object which owns a Context object. This is the traditional OOP solution; overusing it tends to make Python feel like Java, but using it when it's appropriate is not a bad thing.

You might consider using a proxy object; here's a library that helps in creating object proxies:
http://pypi.python.org/pypi/ProxyTypes
Flask uses object proxies for its "current_app", "request" and other variables; all it takes to reference them is:
from flask import request
You could create a proxy object that is a reference to your real context, and use thread locals to manage the instances (if that would work for you).
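A minimal hand-rolled sketch of such a proxy, without any external library (the _local and current_context names are made up for illustration), might be:

import threading

_local = threading.local()

class ContextProxy(object):
    """Forwards every attribute access to whatever Context the current thread holds."""
    def __getattr__(self, name):
        try:
            real = _local.context
        except AttributeError:
            raise RuntimeError("no Context bound to this thread")
        return getattr(real, name)

current_context = ContextProxy()

# elsewhere: _local.context = DE405Context()
# then:      r, v = current_context.state("Saturn", now)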


Difference between calling a method vs using the field from __init__ in python within a class?

So I have a class with a couple of methods defined as:
import cv2

class Recognizer(object):
    def __init__(self):
        self.image = None
        self.reduced_image = None

    def load_image(self, path):
        self.image = cv2.imread(path)
        return self.image
Say I wanna add a third method that uses a return value from load_image(). Should I define it like this:
def shrink_image(self):
    self.reduced_img = cv2.resize(self.image, (300, 300))
    return self.reduced_img
Or should I define it like this:
def shrink_image(self, path):
    reduced_img = cv2.resize(self.load_image(path), (300, 300))
    return reduced_img
What exactly is the difference between the two? I can see that I have access to the fields set in __init__ from any method I declare within that class, so I guess if I update the fields within __init__ I would be able to access those fields for an instance at a given time.
Is there a consensus on which way is better?
What exactly is the difference between the two?
In Python the method named __init__ is the initializer (constructor) of the object; it is invoked implicitly when you call the class, as in Recognizer().
The term "better" is vague, because in the former example you are saving the image as a property on the object, hence making the object larger.
But in second example you are simply returning the data from the function, to be used by the caller.
So it's a matter of context and style.
A simple rule of thumb: if you are going to be using the property reduced_img in the context of the Recognizer object, then it would be ideal to save it as a property on the object, to be accessed via self. If the caller simply uses reduced_img and Recognizer is unaware of any state changes, then it's fine to just return it from the function.
In the second way the variable is scoped to the shrink_image function.
In the first way the variable is scoped to the object's lifetime, and having self.reduced_img set is a side effect of the method.
Only seeing your code sample, without seeing clients, the second case is "better", because reduced_img isn't used anywhere else and it's unnecessary to bind it to the instance. There may definitely be a use case where you need to persist the result of the last shrink_image call, making self.reduced_img a necessary side effect.
In general it is extremely helpful to minimize side effects. Having side effects especially ones that mutate state can make reasoning about your program more difficult.
This is especially seen when you have multiple accessors to your object.
Imagine having the first shrink_image: you release your program, and a single client at a single call site calls shrink_image. Easy. After the call, self.reduced_img will be the result.
Now imagine sharing the object between multiple call sites. That introduces a temporal coupling: you can no longer assume what reduced_img holds, and reading it before calling shrink_image may no longer give you None, because another caller may have set it already!
Compare this to the second shrink_image: callers no longer share mutable state, and it's easier to reason about the state of a Recognizer instance across shrink_image calls.
Something really nuts happens for the first example when multiple concurrent calls are introduced. It goes from being difficult to reason about and potentially logically incorrect to being a synchronization and data race issue.
Without concurrent callers this isn't going to be an issue. But it is definitely a possibility: if you use this call in a web framework and create a single instance shared between multiple web workers, you get this implicit concurrency and could be subject to race conditions.
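As a schematic illustration of that hazard (the StatefulRecognizer class below is a stand-in for the first shrink_image variant, with cv2 stubbed out):

import threading

class StatefulRecognizer(object):
    # simplified stand-in for the stateful shrink_image variant
    def __init__(self):
        self.reduced_img = None

    def shrink_image(self, data):
        self.reduced_img = "reduced(%s)" % data  # stands in for cv2.resize(...)
        return self.reduced_img

shared = StatefulRecognizer()

def worker(data):
    shared.shrink_image(data)
    # by the time we read the attribute, another thread may have overwritten it
    print(shared.reduced_img)

threads = [threading.Thread(target=worker, args=(name,)) for name in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# either worker may print "reduced(a)" or "reduced(b)" depending on interleaving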

Memory usage: @on_trait_change vs _foo_changed()

I built an application with Enthought Traits which is using too much memory. I think the problem is caused by trait notifications:
There seems to be a fundamental difference in the memory usage of events caught by @on_trait_change versus the special naming convention (e.g. _foo_changed()). I made a little example with two classes, Foo and FooDecorator, which I assumed would show exactly the same behaviour. But they don't!
from traits.api import *

class Foo(HasTraits):
    a = List(Int)

    def _a_changed(self):
        pass

    def _a_items_changed(self):
        pass

class FooDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a[]')
    def bar(self):
        pass

if __name__ == '__main__':
    n = 100000
    c = FooDecorator
    a = [c() for i in range(n)]
When running this script with c = Foo, Windows task manager shows a memory usage for the whole python process of 70MB, which stays constant for increasing n. For c = FooDecorator, the python process is using 450MB, increasing for higher n.
Can you please explain this behaviour to me?
EDIT: Maybe I should rephrase: Why would anyone choose FooDecorator over Foo?
EDIT 2: I just uninstalled python(x,y) 2.7.9 and installed the newest version of canopy with traits 4.5.0. Now the 450MB became 750MB.
EDIT 3: Compiled traits-4.6.0.dev0-py2.7-win-amd64 myself. The outcome is the same as in EDIT 2. So despite all plausibility https://github.com/enthought/traits/pull/248/files does not seem to be the cause.
I believe you are seeing the effect of a memory leak that has been fixed recently:
https://github.com/enthought/traits/pull/248/files
As for why one would use the decorator, in this particular instance the two versions are practically equivalent.
In general, the decorator is more flexible: you can give a list of traits to listen to, and you can use the extended name notation, as described here:
http://docs.enthought.com/traits/traits_user_manual/notification.html#semantics
For example, in this case:
class Bar(HasTraits):
    b = Str

class FooDecorator(HasTraits):
    a = List(Bar)

    @on_trait_change('a.b')
    def bar(self):
        print 'change'
the bar notifier is going to be called for changes to the trait a, its items, and for the change of the trait b in each of the Bar items. Extended names can be quite powerful.
What's going on here is that Traits has two distinct ways of handling notifications: static notifiers and dynamic notifiers.
Static notifiers (such as those created by the specially-named _*_changed() methods) are fairly lightweight: each trait on an instance has a list of notifiers on it, which are basically the functions or methods with a lightweight wrapper.
Dynamic notifiers (such as those created with on_trait_change() and the extended trait name conventions like a[]) are significantly more powerful and flexible, but as a result they are much more heavyweight. In particular, in addition to the wrapper object they create, they also create a parsed representation of the extended trait name and a handler object, some of which are in turn HasTraits subclass instances.
As a result, even for a simple expression like a[] there will be a fair number of new Python objects created, and these objects have to be created for every on_trait_change listener on every instance separately to properly handle corner-cases like instance traits. The relevant code is here: https://github.com/enthought/traits/blob/master/traits/has_traits.py#L2330
Based on the reported numbers, the majority of the difference in memory usage that you are seeing is in the creation of this dynamic listener infrastructure for each instance and each on_trait_change decorator.
It's worth noting that there is a short-circuit for on_trait_change in the case where you are using a simple trait name, in which case it generates a static trait notifier instead of a dynamic notifier. So if you were to instead write something like:
class FooSimpleDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a')
    def a_updated(self):
        pass

    @on_trait_change('a_items')
    def a_items_updated(self):
        pass
you should see similar memory performance to the specially-named methods.
To answer the rephrased question about "why use on_trait_change", in FooDecorator you can write one method instead of two if your response to a change of either the list or any items in the list is the same. This makes code significantly easier to debug and maintain, and if you aren't creating thousands of these objects then the extra memory usage is negligible.
This becomes even more of a factor when you consider more sophisticated extended trait name patterns, where the dynamic listeners automatically handle changes which would otherwise require significant manual (and error-prone) code for hooking up and removing listeners from intermediate objects and traits. The power and simplicity of this approach usually outweighs the concerns about memory usage.

Explanation of importing Python classes

So I know this could be considered quite a broad question, for which I am sorry, but I'm having problems understanding the whole importing, __init__, and self business and all that... I've tried reading through the Python documentation and a few other tutorials, but this is my first language, and I'm a little (a lot) confused.
So far through my first semester at university I have learnt the very basics of Python: functions, numeric types, sequence types, basic logic stuff. But it's moving slower than I would like, so I took it upon myself to try to learn a bit more and create a basic text-based strategy/resource-management sort of game inspired by Ogame.
First problem I ran into was how to define each building, for example each mine, which produces resources. I did some research and found classes were useful, so I have something like this for each building:
class metal_mine:
    level = 1
    base_production = 15
    cost_metal = 40
    cost_crystal = 10
    power_use = 10

    def calc_production():
        metal_mine.production = ...  # a formula goes here

    def calc_cost_metal():
        ...  # etc., same for crystal

    def calc_power_use():
        metal_mine.power_use = ...  # blah blah

    def upgrade():
        # various things, then:
        solar_plant.calc_available_power()
It's kinda long, I left a lot out. Anyway, the important bit is that last line: when I upgrade the mine, to determine whether it has enough power to run, I calculate the power output of the solar plant, which is in its own class (solar_plant.calc_output()) containing many things similar to the metal_mine class. If I throw everything in the same module, this all works fantastically; however, with many buildings and research levels and the like, it gets very long and I get lost in it.
So I tried to split it into different modules: one for mines, one for storage buildings, one for research levels, etc. This makes everything very tidy, but I still need a way to call the functions in classes which are now part of a different module. My initial solution was to put, for example, from power import *, which for the most part made the solar_plant class available in the metal_mine class. I say for the most part because, depending on the order in which I try to do things, sometimes this doesn't seem to work. The solar_plant class itself calls on variables from the metal_mine class; I know this is getting very spaghetti-ish, but I don't know of any better conventions to follow yet.
Anyway, sometimes when I call the solar_plant class, and it in turn tries to call the metal_mine class, it says that metal_mine is not defined, which leads me to think somehow the modules or classes need to be initialized? There seems to be a bit of looping between things in the code. And depending on the order in which I try and 'play the game', sometimes I am unintentionally doing this, sometimes I'm not. I haven't yet been taught the conventions and details of importing and reloading and all that..so I have no idea if I am taking the right approach or anything.
Provided everything I just said made sense, could I get some input on how I would properly go about making the contents of these various modules freely available and modifiable to others? Am I perhaps trying to split things into different modules which you wouldn't normally do, and I should just deal with the large module? Or am I importing things wrong? Or...?
And on a side note, in most tutorials and places I look for help on this, I see classes or functions full of self.something and the __init__ function. Can I get an explanation of this? Or a link to a simple first-time-programmer's tutorial?
==================UPDATE=================
Ok so too broad, like I thought it might be. Based on the help I got, I think I can narrow it down.
I sorted out what I think need to be the class variables (those which don't change): name, base_cost_metal, and base_cost_crystal. All the rest would depend on the player's currently selected building of that type (supposing they could have multiple settlements).
To take a snippet of what I have now:
class metal_mine:
    name = 'Metal Mine'
    base_cost_metal = 60
    base_cost_crystal = 15

    def __init__(self):
        self.level = 0
        self.production = 30
        self.cost_metal = 60
        self.cost_crystal = 15
        self.power_use = 0
        self.efficiency = 1

    def calc_production(self):
        self.production = int(30 + (self.efficiency * int(30 * self.level * 1.1 * self.level)))

    def calc_cost_metal(self):
        self.cost_metal = int(metal_mine.base_cost_metal * 1.5 ** self.level)
So to my understanding, this is now a more correctly defined class? I define the instance variables with their starting values, which are then changed as the user plays.
In my main function where I begin the game, I would create an instance of each mine, say, player_metal_mine = metal_mine(), and then I call all the functions and variables with the likes of
>>> player_metal_mine.level
0
>>> player_metal_mine.upgrade()
>>> player_metal_mine.level
1
So if this is correctly defined, do I now just import each of my modules with these new templates for each building? And once they are imported and an instance created, are all the new instances and their variables contained within the scope (right terminology?) of the main module, meaning no need for new importing or reloading?
Provided the answer to that is yes, and I do just need to import, what method should I use? I understand there is just import mines for example, but that means I would have to use mines.metal_mine() to use it, which is a tiny bit more typing than using the likes of from mines import *, or more particularly, from mines import metal_mine, though that last option means I need to individually import every building from every module. So like I said, provided, yes, I am just importing it, what method is best?
==================UPDATE 2================= (You can probably skip the wall of text and just read this)
So I went through everything, corrected all my classes, everything seems to be importing correctly using from module import *, but I am having issues with the scope of my variables representing the resource levels.
If everything was in 1 module, right at the top I would declare each variable and give it the beginning value, e.g. metal = 1000. Then, in any method of my classes which alters this, such as upgrading a building, costing resources, or in any function which alters this, like the one which periodically adds all the production to the current resource levels, I put global metal, for example, at the top. Then, within the function, I can call and alter the value of metal no problem.
However, now that I am importing these classes and functions from various modules all into one module, functions can't find these variables. What I thought would happen was that in the process of importing I would basically be saying: take everything in this module, pretend it's now in this one, and work with it. But apparently that's not what is happening.
In my main module, I import all my modules using, for example, from mines import *, and define the value of, say, metal to be 1000. Now I create an instance of a metal mine, metal_mine_one = metal_mine(), and I can call its methods and variables, e.g.
>>> metal_mine_one.production
30
But when I try to call a method like metal_mine_one.upgrade(), which contains global metal and then metal -= self.cost_metal, it gives me an error saying metal is not defined. Like I said, if this is all in one module, this problem doesn't happen, but if I try to import things, it does.
So how can I import these modules in a way which doesn't cause this problem, and makes variables in the global scope of my main module available to all functions and methods within all imported modules?
First, a little background on object-oriented programming, i.e. classes. You should think of a class like a blueprint: it shows how to make something. When you write a class, it describes to the program how to make an object. A simple class in Python might look like this:
class foo:
    def __init__(self, bars_starting_value):
        self.bar = bars_starting_value

    def print_bar(self):
        print(self.bar)
This tells Python how to make a foo object. The __init__ function is called a constructor; it is called when you make a new foo. self is a way of referencing the foo that is running the function. In this case every foo has its own bar, which can be accessed from within a foo by using self.bar. Note that you have to put self as the first argument of the function definition; this makes those functions belong to a single foo and not to all of them.
One might use this class like this:
my_foo = foo(15)
my_other_foo = foo(100)
my_foo.print_bar()
my_foo.bar = 20
print(my_foo.bar)
my_other_foo.print_bar()
This would output
15
20
100
As far as imports go: they make things that are defined in one file available in another. This is useful because, if you put a class definition in a file, you can import it into your main program file and make objects from there.
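A minimal sketch of that split, reusing the module names the question already mentions (a mines.py holding the class, and a main script importing it):

# mines.py
class metal_mine:
    name = 'Metal Mine'

    def __init__(self):
        self.level = 0

    def upgrade(self):
        self.level += 1

# main.py
from mines import metal_mine   # or: import mines, then use mines.metal_mine()

player_metal_mine = metal_mine()
player_metal_mine.upgrade()
print(player_metal_mine.level)   # 1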
As far as making variables available to others, you could pass the power that has been generated from all the generators to the mine's function to determine if it has enough power.
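For example, a rough sketch of that approach (the available_power parameter and the calc_available_power call are illustrative names, not a fixed API):

class metal_mine:
    def __init__(self):
        self.level = 0
        self.power_use = 10

    def upgrade(self, available_power):
        # the caller hands in how much power is available, instead of the
        # class reaching into solar_plant itself
        if available_power >= self.power_use:
            self.level += 1
        return self.level

# caller side, somewhere in the main module:
# mine.upgrade(plant.calc_available_power())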
Hope this helps.
A lot of things to cover here. __init__ is a special method that is automatically called when an instance of a class is created. In the code you provided you've created a class; now you need to create an instance of that class. A simpler example:
class Test:
    def __init__(self):
        print "this is called when you create an instance of this class"

    def a_method(self):
        return True

>>> class_instance = Test()
this is called when you create an instance of this class
>>> class_instance.a_method()
True
The first argument of a method is always the instance itself. By convention we just call that argument 'self'. Your methods did not accept any arguments; make sure they accept self (or have the decorator @staticmethod above them). Also, make sure you refer to attributes (in your case, methods) by self.a_method or class_instance.a_method.

CherryPy, Threads, and Member Variables; potential issues?

Let's say I have the following simple class:
import cherrypy
import os
class test:
    test_member = 0

    def __init__(self):
        return

    def index(self):
        self.test_member = self.test_member + 1
        return str(self.test_member)
    index.exposed = True
conf = os.path.join(os.path.dirname(__file__), 'config.ini')
if __name__ == '__main__':
    # CherryPy always starts with app.root when trying to map request URIs
    # to objects, so we need to mount a request handler root. A request
    # to '/' will be mapped to HelloWorld().index().
    cherrypy.config.update({'server.socket_host': '0.0.0.0'})
    cherrypy.quickstart(test(), config=conf)
else:
    # This branch is for the test suite; you can ignore it.
    cherrypy.config.update({'server.socket_host': '0.0.0.0'})
    cherrypy.tree.mount(test(), config=conf)
So when I open my index page the first time I get back 1, the next time 2, then 3, 4, and so on. My questions are:
Are there any big dangers with this, particularly with threads and multiple people accessing the page at the same time?
Do I have to lock the member variable in some way each time it's written to in order to prevent issues?
Does anything change if I'm using a non-basic data type as a member (such as my own complicated class) rather than something as simple as an integer?
I don't totally understand how threading with CherryPy works, I suppose my concern in this simple example would be that on one thread the test_member could be equal to one thing, and when accessed from another thread it'd be something totally different. I apologize in advance if I'm missing something that's well documented, but some googling didn't really turn up what I was looking for. I understand for such a simple example there are a number of relatively easy paths that could solve potential problems here (keep the state of the variable in a database, or something along those lines), but that won't work in my actual use case.
There's a danger of lost updates there. Just setting the value shouldn't need a lock, since replacing an instance variable is atomic with respect to the GIL (assuming it doesn't trigger any special methods, etc.). But incrementing, or using more complex variables, will need different schemes to make them thread-safe.
Shared access in CherryPy is generally no different than any other Python program. Rather than a long rehash of all those options here, it's best to direct you to http://effbot.org/zone/thread-synchronization.htm As it mentions, replacing an instance variable is probably atomic with respect to the GIL and thereby thread-safe, but incrementing is not.
CherryPy only adds some helpers in the opposite direction: when you don't want to share: the cherrypy.request and cherrypy.response objects are newly created (and properly destroyed) for each request/response--feel free to stick data in cherrypy.request.foo if you want to keep it around for the duration of the request only.
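For the increment in the question, a minimal sketch with a lock (the _lock attribute is just an illustrative name) might look like:

import threading

class test(object):
    def __init__(self):
        self.test_member = 0
        self._lock = threading.Lock()

    def index(self):
        # serialize the read-modify-write so concurrent requests cannot lose updates
        with self._lock:
            self.test_member += 1
            return str(self.test_member)
    index.exposed = True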

How to do cleanup reliably in python?

I have some ctypes bindings, and for each body.New I should call body.Free. The library I'm binding doesn't have its allocation routines insulated from the rest of the code (they can be called from just about anywhere), and to use a couple of useful features I need to make cyclic references.
I think it would be solved if I could find a reliable way to hook a destructor to an object. (Weakrefs would help if they gave me the callback just before the data is dropped.)
So obviously this code megafails when I put in velocity_func:
class Body(object):
    def __init__(self, mass, inertia):
        self._body = body.New(mass, inertia)

    def __del__(self):
        print '__del__ %r' % self
        if body:
            body.Free(self._body)

    ...

    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
I also tried to solve it through weakrefs; with those, things just seem to get worse and even more unpredictable.
Even if I don't put in the velocity_func, cycles will appear at least when I do this:
class Toy(object):
    def __init__(self, body):
        self.body = body
        self.body.owner = self

    ...

def collision(a, b, contacts):
    whatever(a.body.owner)
So how to make sure Structures will get garbage collected, even if they are allocated/freed by the shared library?
There's repository if you are interested about more details: http://bitbucket.org/cheery/ctypes-chipmunk/
What you want to do, that is, create an object that allocates things and then deallocates them automatically when the object is no longer in use, is almost impossible in Python, unfortunately. The __del__ method is not guaranteed to be called, so you can't rely on that.
The standard way in Python is simply:
try:
    allocate()
    dostuff()
finally:
    cleanup()
Or since 2.5 you can also create context-managers and use the with statement, which is a neater way of doing that.
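For example, a minimal context-manager sketch wrapping the question's body.New/body.Free pair (assuming the same body ctypes binding as in the question):

from contextlib import contextmanager

@contextmanager
def new_body(mass, inertia):
    b = body.New(mass, inertia)   # the question's ctypes binding
    try:
        yield b
    finally:
        body.Free(b)

# usage:
# with new_body(mass, inertia) as b:
#     ...use b...
# b is freed here even if an exception was raised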
But both of these are primarily for when you allocate/lock at the beginning of a code snippet. If you want to have things allocated for the whole run of the program, you need to allocate the resource at startup, before the main code of the program runs, and deallocate afterwards. There is one situation which isn't covered here, and that is when you want to allocate and deallocate many resources dynamically and use them in many places in the code, for example if you want a pool of memory buffers or similar. But most of those cases are for memory, which Python will handle for you, so you don't have to bother about those. There are of course cases where you want dynamic pool allocation of things that are NOT memory, and then you would want the type of deallocation you try in your example, and that is tricky to do with Python.
If weakrefs aren't broken, I guess this may work:
from weakref import ref
from ctypes import cast, c_void_p

pointers = set()

class Pointer(object):
    def __init__(self, cfun, ptr):
        pointers.add(self)
        self.ref = ref(ptr, self.cleanup)
        self.data = cast(ptr, c_void_p).value  # keep only the raw address; Python's cast is smart, but it can't be smarter than this
        self.cfun = cfun

    def cleanup(self, obj):
        print 'cleanup 0x%x' % self.data
        self.cfun(self.data)
        pointers.remove(self)

def cleanup(cfun, ptr):
    Pointer(cfun, ptr)
def cleanup(cfun, ptr):
Pointer(cfun, ptr)
I have yet to try it. The important piece is that the Pointer doesn't hold any strong reference to the foreign pointer, only an integer. This should work if ctypes doesn't free memory that I should free with the bindings. Yeah, it's basically a hack, but I think it may work better than the earlier things I've been trying.
Edit: Tried it, and it seems to work after a little fine-tuning of my code. A surprising thing is that even if I remove __del__ from all of my structures, it still seems to fail. Interesting but frustrating.
Neither works; by some weird chance I've been able to drop the cyclic references in places, but things stay broken.
Edit: Well... weakrefs WERE broken after all! So there's likely no solution for reliable cleanup in Python, other than forcing it to be explicit.
In CPython, __del__ is a reliable destructor of an object, because it will always be called when the reference count reaches zero (note: there may be cases, like circular references of items with a __del__ method defined, where the reference count will never reach zero, but that is another issue).
Update
From the comments, I understand the problem is related to the order of destruction of objects: body is a global object, and it is being destroyed before all other objects, thus it is no longer available to them.
Actually, using global objects is not good; not only because of issues like this one, but also because of maintenance.
I would then change your class to something like this:
class Body(object):
    def __init__(self, mass, inertia):
        self._bodyref = body  # keep a reference to the global "body" object
        self._body = body.New(mass, inertia)

    def __del__(self):
        print '__del__ %r' % self
        if body:
            body.Free(self._body)

    ...

    def set_velocity_func(self, func):
        self._body.contents.velocity_func = ctypes_wrapping(func)
A couple of notes:
The change only adds a reference to the global body object, which will thus live at least as long as all the objects created from that class.
Still, using a global object is not good because of unit testing and maintenance; better would be to have a factory for the object that sets the correct "body" on the class, and in unit tests can easily substitute a mock object. But that's really up to you and how much effort you think makes sense for this project.
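A rough sketch of such a factory (the make_body_class name and FakeBodyModule are purely illustrative):

def make_body_class(body_module):
    """Build a Body class bound to a particular 'body' binding (or a mock in tests)."""
    class Body(object):
        def __init__(self, mass, inertia):
            self._bodymod = body_module        # keeps the binding alive as long as any Body
            self._body = body_module.New(mass, inertia)

        def __del__(self):
            self._bodymod.Free(self._body)

    return Body

# production:  Body = make_body_class(body)
# unit tests:  Body = make_body_class(FakeBodyModule())   # FakeBodyModule is hypothetical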
