I have a library which stores additional data for foreign user objects in a WeakKeyDictionary:
import weakref

extra_stuff = weakref.WeakKeyDictionary()

def get_extra_stuff_for_obj(o):
    return extra_stuff[o]
When a user object is copied, I want the copy to have the same extra stuff. However, I have limited control over the user object. I would like to define a class decorator for user object classes, to be used in this manner:
def has_extra_stuff(klass):
    def copy_with_hook(self):
        new = magic_goes_here(self)
        extra_stuff[new] = extra_stuff[self]
        return new
    klass.__copy__ = copy_with_hook
    return klass
This is easy if klass already defines __copy__, because I can close copy_with_hook over the original and call it. Typically, however, it's not defined. What do I call here? It obviously can't be copy.copy, because that would result in infinite recursion.
I found this question which appears to ask the exact same question, but afaict the answer is wrong because this results in a deepcopy, not a copy. I would also be unable to do this, as I need to install hooks for both deepcopy and copy. (Incidentally, I would have continued the discussion in that question, but having no reputation I am not able to do this.)
I looked at what the copy module does, which is a bunch of voodoo involving __reduce_ex__(). I could obviously cut and paste this into my code, or call its private methods directly, but I would consider that an absolute last resort. This seems like such a simple thing that I'm convinced I'm missing a simple solution.
Essentially, you need to either (A) preserve the original __copy__ if present (and delegate to it), or (B) trick copy.copy into not using your newly-added __copy__ (and delegate to copy.copy).
So, for example...:
import copy
import threading
copylock = threading.RLock()
def has_extra_stuff(klass):
    def simple_copy_with_hook(self):
        # klass has its own __copy__: delegate to it, then copy the extra stuff
        with copylock:
            new = original_copy(self)
            extra_stuff[new] = extra_stuff[self]
        return new

    def tricky_case(self):
        # klass has no __copy__: temporarily hide ours so copy.copy falls back
        # to its default machinery, then restore it
        with copylock:
            try:
                klass.__copy__ = None
                new = copy.copy(self)
            finally:
                klass.__copy__ = tricky_case
            extra_stuff[new] = extra_stuff[self]
        return new

    original_copy = getattr(klass, '__copy__', None)
    if original_copy is None:
        klass.__copy__ = tricky_case
    else:
        klass.__copy__ = simple_copy_with_hook
    return klass
Not the most elegant code ever written, but at least it just plays around with klass, without monkey-patching or copying-and-pasting copy.py itself :-)
Added: since the OP mentioned in a comment that he can't use this solution because the app is multi-threaded, I've added appropriate locking to make it actually usable. I'm using a single global re-entrant lock to guard against deadlocks due to out-of-order acquisition of multiple locks among multiple threads, and it's perhaps over-locked "just in case": I suspect the simple case and the dict assignment in the tricky case probably don't need the lock... but, when threading threatens, better safe than sorry :-)
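For illustration, a rough usage sketch building on the code above (the Thing class and the stored dict are made up, and extra_stuff is the WeakKeyDictionary from the question):

@has_extra_stuff
class Thing(object):
    def __init__(self, value):
        self.value = value

t = Thing(42)
extra_stuff[t] = {'label': 'original'}

t2 = copy.copy(t)      # Thing has no __copy__ of its own, so tricky_case runs
print extra_stuff[t2]  # {'label': 'original'}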
After some playing I've come up with the following:
import copy_reg, copy

# Library
def hook(new):
    print "new object: %s" % new

def pickle_hooked(o):
    pickle = o.__reduce_ex__(2)
    creator = pickle[0]
    def creator_hook(*args, **kwargs):
        new = creator(*args, **kwargs)
        hook(new)
        return new
    return (creator_hook,) + pickle[1:]

def with_copy_hook(klass):
    copy_reg.pickle(klass, pickle_hooked)
    return klass

# Application
@with_copy_hook
class A(object):
    def __init__(self, value):
        self.value = value
This registers a pass-through copy hook which also has the advantage of working for both copy and deepcopy. The only detail of the return value of __reduce_ex__() it needs to concern itself with is that the first element of the tuple is a creator function. All other details are handed off to existing library code. It is not perfect, because I still don't see a way of detecting whether the target class has already registered a pickler.
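For example, a quick check (assuming the code above) that both copy and deepcopy go through the hook:

a = A(1)
b = copy.copy(a)        # prints: new object: <__main__.A object at ...>
c = copy.deepcopy(a)    # prints: new object: <__main__.A object at ...>
print b.value, c.value  # 1 1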
Related
I have a class that will always have only one object at a time. I'm just starting OOP in Python and I was wondering what is a better approach: to assign an instance of this class to a variable and operate on that variable, or rather have this instance referenced in a class variable instead. Here is an example of what I mean:
Referenced instance:
class Transaction(object):
    current_transaction = None
    in_progress = False

    def __init__(self):
        self.__class__.current_transaction = self
        self.__class__.in_progress = True
        self.name = 'abc'
        self.value = 50

    def update(self):
        do_smth()

Transaction()
if Transaction.in_progress:
    Transaction.current_transaction.update()
print Transaction.current_transaction.name
print Transaction.current_transaction.value
Instance in a variable:
class Transaction(object):
    def __init__(self):
        self.name = 'abc'
        self.value = 50

    def update(self):
        do_smth()

current_transaction = Transaction()
in_progress = True

if in_progress:
    current_transaction.update()
print current_transaction.name
print current_transaction.value
It's possible to see that you've encapsulated too much in the first case just by comparing the overall readability of the code: the second is much cleaner.
A better way to implement the first option is to use class methods: decorate all your methods with @classmethod and then call them with Transaction.method().
There's no practical difference in code quality between these two options. However, assuming that the class is final, that is, without derived classes, I would go for a third choice: use the module as a singleton and kill the class. This would be the most compact and most readable choice. You don't need classes to create singletons.
I think the first version doesn't make much sense, and the second version of your code would be better in almost all situations. It can sometimes be useful to write a Singleton class (where only one instance ever exists) by overriding __new__ to always return the saved instance (after it's been created the first time). But usually you don't need that unless you're wrapping some external resource that really only ever makes sense to exist once.
If your other code needs to share a single instance, there are other ways to do so (e.g. a global variable in some module or a constructor argument for each other object that needs a reference).
Note that if your instances have a very well defined life cycle, with specific events that should happen when they're created and destroyed, and unknown code running and using the object in between, the context manager protocol may be something you should look at, as it lets you use your instances in with statements:
with Transaction() as trans:
    trans.whatever()   # the Transaction will be notified if anything raises
    other_stuff()      # an exception that is not caught within the with block
    trans.foo()        # (so it can do a rollback if it wants to)
foo()  # the Transaction will be cleaned up (e.g. committed) when the indented with block ends
Implementing the context manager protocol requires an __enter__ and __exit__ method.
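A minimal sketch of what that could look like for the Transaction example (the commit and rollback methods are hypothetical, just to show the shape of the protocol):

class Transaction(object):
    def __enter__(self):
        # set up whatever state the transaction needs
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.commit()    # hypothetical: the block finished normally
        else:
            self.rollback()  # hypothetical: an exception escaped the block
        return False         # don't suppress the exception

    def commit(self):
        print 'committed'

    def rollback(self):
        print 'rolled back'

with Transaction() as trans:
    pass  # work with trans here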
I'm coming from the C# world, so my views may be a little skewed. I'm looking to do DI in Python; however, I'm noticing a trend with libraries where they all appear to rely on a service locator. That is, you must tie your object creation to the framework, such as injectlib.build(MyClass), in order to get an instance of MyClass.
Here is an example of what I mean -
from injector import Injector, inject

class Inner(object):
    def __init__(self):
        self.foo = 'foo'

class Outer(object):
    @inject(inner=Inner)
    def __init__(self, inner=None):
        if inner is None:
            print('inner not provided')
            self.inner = Inner()
        else:
            print('inner provided')
            self.inner = inner

injector = Injector()

outer = Outer()
print(outer.inner.foo)

outer = injector.get(Outer)
print(outer.inner.foo)
Is there a way in Python to create a class while automatically inferring dependency types based on parameter names? So if I have a constructor parameter called my_class, then an instance of MyClass will be injected. Reason I ask is that I don't see how I could inject a dependency into a class that gets created automatically via a third party library.
To answer the question you explicitly asked: no, there's no built-in way in Python to automatically get a MyClass object from a parameter named my_class.
That said, neither "tying your object creation to the framework" nor the example code you gave seem terribly Pythonic, and this question in general is kind of confusing because DI in dynamic languages isn't really a big deal.
For general thoughts about DI in Python I'd say this presentation gives a pretty good overview of different approaches. For your specific question, I'll give two options based on what you might be trying to do.
If you're trying to add DI to your own classes, I would use parameters with default values in the constructor, as that presentation shows. E.g.:
import time

class Example(object):
    def __init__(self, sleep_func=time.sleep):
        self.sleep_func = sleep_func

    def foo(self):
        self.sleep_func(10)
        print('Done!')
And then you could just pass in a dummy sleep function for testing or whatever.
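For example, a test could look roughly like this (fake_sleep is just an illustrative stand-in):

def fake_sleep(seconds):
    fake_sleep.calls.append(seconds)  # record the call instead of sleeping
fake_sleep.calls = []

example = Example(sleep_func=fake_sleep)
example.foo()                    # prints 'Done!' immediately
assert fake_sleep.calls == [10]  # the dependency was exercised as expected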
If you're trying to manipulate a library's classes through DI (not something I can really imagine a use case for, but it seems like what you're asking), then I would probably just monkey-patch those classes to change whatever needed changing. E.g.:
import test_module

def dummy_sleep(*args, **kwargs):
    pass

test_module.time.sleep = dummy_sleep
e = test_module.Example()
e.foo()
My Situation
I'm currently working on a project in Python which I want to use to learn a bit more about software architecture. I've read a few texts and watched a couple of talks about dependency injection and learned to love how clearly constructor injection shows the dependencies of an object.
However, I'm kind of struggling how to get a dependency passed to an object. I decided NOT to use a DI framework since:
I don't have enough knowledge of DI to specify my requirements and thus cannot choose a framework.
I want to keep the code free of more "magical" stuff since I have the feeling that introducing a seldom used framework drastically decreases readability. (More code to read of which only a small part is used).
Thus, I'm using custom factory functions to create objects and explicitly pass their dependencies:
# Business and Data Objects
class Foo:
    def __init__(self, bar):
        self.bar = bar

    def do_stuff(self):
        print(self.bar)

class Bar:
    def __init__(self, prefix):
        self.prefix = prefix

    def __str__(self):
        return str(self.prefix) + "Hello"

# Wiring up dependencies
def create_bar():
    return Bar("Bar says: ")

def create_foo():
    return Foo(create_bar())

# Starting the application
f = create_foo()
f.do_stuff()
Alternatively, if Foo has to create a number of Bars itself, it gets the creator function passed through its constructor:
# Business and Data Objects
class Foo:
    def __init__(self, create_bar):
        self.create_bar = create_bar

    def do_stuff(self, times):
        for _ in range(times):
            bar = self.create_bar()
            print(bar)

class Bar:
    def __init__(self, greeting):
        self.greeting = greeting

    def __str__(self):
        return self.greeting

# Wiring up dependencies
def create_bar():
    return Bar("Hello World")

def create_foo():
    return Foo(create_bar)

# Starting the application
f = create_foo()
f.do_stuff(3)
While I'd love to hear improvement suggestions on the code, this is not really the point of this post. However, I feel that this introduction is required to understand
My Question
While the above looks rather clear, readable and understandable to me, I run into a problem when the prefix dependency of Bar is required to be identical in the context of each Foo object and thus is coupled to the Foo object lifetime. As an example consider a prefix which implements a counter (See code examples below for implementation details).
I have a few ideas for how to realize this; however, none of them seems perfect to me:
1) Pass Prefix through Foo
The first idea is to add a constructor parameter to Foo and make it store the prefix in each Foo instance.
The obvious drawback is that it mixes up the responsibilities of Foo. It controls the business logic AND provides one of the dependencies to Bar. Once Bar no longer requires the dependency, Foo has to be modified. That seems like a no-go to me. Since I don't really think this should be a solution, I did not post the code here, but provided it on pastebin for the very interested reader ;)
2) Use Functions with State
Instead of placing the Prefix object inside Foo, this approach tries to encapsulate it inside the create_foo function. By creating one Prefix for each Foo object and referencing it in a nameless function using lambda, I keep the details (a.k.a. there-is-a-prefix-object) away from Foo and inside my wiring logic. Of course a named function would work, too (but lambda is shorter).
# Business and Data Objects
class Foo:
    def __init__(self, create_bar):
        self.create_bar = create_bar

    def do_stuff(self, times):
        for _ in range(times):
            bar = self.create_bar()
            print(bar)

class Bar:
    def __init__(self, prefix):
        self.prefix = prefix

    def __str__(self):
        return str(self.prefix) + "Hello"

class Prefix:
    def __init__(self, name):
        self.name = name
        self.count = 0

    def __str__(self):
        self.count += 1
        return self.name + " " + str(self.count) + ": "

# Wiring up dependencies
def create_bar(prefix):
    return Bar(prefix)

def create_prefix(name):
    return Prefix(name)

def create_foo(name):
    prefix = create_prefix(name)
    return Foo(lambda: create_bar(prefix))

# Starting the application
f1 = create_foo("foo1")
f2 = create_foo("foo2")
f1.do_stuff(3)
f2.do_stuff(2)
f1.do_stuff(2)
This approach seems much more useful to me. However, I'm not sure about common practices and thus fear that having state inside functions is not really recommended. Coming from a Java/C++ background, I'd expect a function to depend on its parameters, its class members (if it's a method), or some global state. Thus, a parameterless function that does not use global state would have to return exactly the same value every time it is called. This is not the case here: once the returned object is modified (which means that the counter in prefix has been increased), the function returns an object which has a different state than it had when being returned the first time.
Is this assumption just caused by my limited experience with Python, and do I have to change my mindset, i.e. think not of functions but of something callable? Or is supplying functions with state an unintended misuse of lambda?
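(For reference, a named closure equivalent to the lambda above; the state lives in the closed-over Prefix object, not in the function itself:)

def create_foo(name):
    prefix = create_prefix(name)
    def bound_create_bar():        # same as: lambda: create_bar(prefix)
        return create_bar(prefix)  # closes over prefix, which holds the counter
    return Foo(bound_create_bar)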
3) Using a Callable Class
To overcome my doubts on stateful functions I could use callable classes where the create_foo function of approach 2 would be replaced by this:
class BarCreator:
    def __init__(self, prefix):
        self.prefix = prefix

    def __call__(self):
        return create_bar(self.prefix)

def create_foo(name):
    return Foo(BarCreator(create_prefix(name)))
While this seems like a usable solution to me, it is so much more verbose.
Summary
I'm not absolutely sure how to handle the situation. Although I prefer number 2, I still have my doubts. Furthermore, I'm still hoping that someone comes up with a more elegant way.
Please comment, if there is anything you think is too vague or can be possibly misunderstood. I will improve the question as far as my abilities allow me to do :)
All examples should run under python2.7 and python3 - if you experience any problems, please report them in the comments and I'll try to fix my code.
If you want to inject a callable object but don't want it to have a complex setup -- if, as in your example, it's really just binding to a single input value -- you could try using functools.partial to provide a function/value pair:
import functools

def factory_function(arg):
    # processing here
    return configured_object_based_on_arg

class Consumer(object):
    def __init__(self, injection):
        self._injected = injection

    def use_injected_value(self):
        print(self._injected())

injectable = functools.partial(factory_function, 'this is the configuration argument')
example = Consumer(injectable)
example.use_injected_value()  # should print the result of your factory function and argument
As an aside, if you're creating a dependency injection setup like your option 3, you probably want to put the knowledge about how to do the configuration into a factory class rather than doing it inline as you're doing here. That way you can swap out factories if you want to choose between strategies. It's not functionally very different (unless the creation is more complex than this example and involves persistent state), but it's more flexible down the road if the code looks like:
factory = FooBarFactory()
bar1 = factory.create_bar()
alt_factory = FooBlahFactory(extra_info)
bar2 = alt_factory.create_bar()
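A minimal sketch of such a factory class, reusing the Bar and create_prefix helpers from the question (the names are illustrative):

class FooBarFactory(object):
    """Knows how to configure Bar objects for one particular strategy."""
    def __init__(self):
        self.prefix = create_prefix("FooBar says: ")

    def create_bar(self):
        return Bar(self.prefix)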
Can someone explain why the following code behaves the way it does:
import types
class Dummy():
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print "delete", self.name

d1 = Dummy("d1")
del d1
d1 = None
print "after d1"

d2 = Dummy("d2")
def func(self):
    print "func called"
d2.func = types.MethodType(func, d2)
d2.func()
del d2
d2 = None
print "after d2"

d3 = Dummy("d3")
def func(self):
    print "func called"
d3.func = types.MethodType(func, d3)
d3.func()
d3.func = None
del d3
d3 = None
print "after d3"
The output (note that the destructor for d2 is never called) is this (Python 2.7):
delete d1
after d1
func called
after d2
func called
delete d3
after d3
Is there a way to "fix" the code so the destructor is called without deleting the method added? I mean, the best place to put the d2.func = None would be in the destructor!
Thanks
[edit] Based on the first few answers, I'd like to clarify that I'm not asking about the merits (or lack thereof) of using __del__. I tried to create the shortest function that would demonstrate what I consider to be non-intuitive behavior. I'm assuming a circular reference has been created, but I'm not sure why. If possible, I'd like to know how to avoid the circular reference....
You cannot assume that __del__ will ever be called - it is not a place to hope that resources are automagically deallocated. If you want to make sure that a (non-memory) resource is released, you should make a release() or similar method and then call that explicitly (or use it in a context manager as pointed out by Thanatos in comments below).
At the very least you should read the __del__ documentation very closely, and then you should probably not try to use __del__. (Also refer to the gc.garbage documentation for other bad things about __del__)
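For what it's worth, the cycle in the d2 case can be made visible with the gc module; a rough sketch under Python 2, reusing the Dummy class and func from the question:

import gc, types

d2 = Dummy("d2")
d2.func = types.MethodType(func, d2)  # the bound method refers back to d2
del d2

gc.collect()
print gc.garbage  # the Dummy instance lands here: part of a cycle and it has __del__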
I'm providing my own answer because, while I appreciate the advice to avoid __del__, my question was how to get it to work properly for the code sample provided.
Short version: The following code uses weakref to avoid the circular reference. I thought I'd tried this before posting the question, but I guess I must have done something wrong.
import types, weakref

class Dummy():
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print "delete", self.name

d2 = Dummy("d2")
def func(self):
    print "func called"
d2.func = types.MethodType(func, weakref.ref(d2))  # This works
#d2.func = func.__get__(weakref.ref(d2), Dummy)    # This works too
d2.func()
del d2
d2 = None
print "after d2"
Longer version:
When I posted the question, I did search for similar questions. I know you can use with instead, and that the prevailing sentiment is that __del__ is BAD.
Using with makes sense, but only in certain situations. Opening a file, reading it, and closing it is a good example where with is a perfectly good solution. You've got a specific block of code where the object is needed, and you want to clean up the object at the end of the block.
A database connection seems to be used often as an example that doesn't work well using with, since you usually need to leave the section of code that creates the connection and have the connection closed in a more event-driven (rather than sequential) timeframe.
If with is not the right solution, I see two alternatives:
You make sure __del__ works (see this blog for a better description of weakref usage).
You use the atexit module to run a callback when your program closes. See this topic for an example, and the sketch below.
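A minimal sketch of the atexit approach (Resource and its cleanup method are made-up names):

import atexit

class Resource(object):
    def __init__(self, name):
        self.name = name
        atexit.register(self.cleanup)  # called when the interpreter exits

    def cleanup(self):
        print "cleaning up", self.name

r = Resource("r1")
# even if r is never explicitly deleted, "cleaning up r1" is printed at exit
# (note that registering the bound method keeps the instance alive until then)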
While I tried to provide simplified code, my real problem is more event-driven, so with is not an appropriate solution (with is fine for the simplified code). I also wanted to avoid atexit, as my program can be long-running, and I want to be able to perform the cleanup as soon as possible.
So, in this specific case, I find it to be the best solution to use weakref and prevent circular references that would prevent __del__ from working.
This may be an exception to the rule, but there are use-cases where using weakref and __del__ is the right implementation, IMHO.
Instead of del, you can use the with statement.
http://effbot.org/zone/python-with-statement.htm
Just like with file-type objects, you could do something like:
with Dummy('d1') as d:
    # stuff
    pass
# d's __exit__ method is guaranteed to have been called
del doesn't call __del__
del, used the way you are using it, removes a local variable. __del__ is called when the object is destroyed. Python as a language makes no guarantees as to when it will destroy an object.
CPython, the most common implementation of Python, uses reference counting. As a result, del will often work as you expect. However, it will not work in the case where you have a reference cycle:
d3 -> d3.func -> d3
Python doesn't detect this and so won't clean it up right away. And it's not just reference cycles. If an exception is thrown, you probably still want your destructor to be called. However, Python will typically hold onto the local variables as part of its traceback.
The solution is not to depend on the __del__ method. Rather, use a context manager.
class Dummy:
    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        print "Destroying", self

with Dummy() as dummy:
    # Do whatever you want with dummy in here
    pass
# __exit__ will be called before you get here
This is guaranteed to work, and you can even check the parameters to see whether you are handling an exception and do something different in that case.
A full example of a context manager.
class Dummy(object):
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        return self

    def __exit__(self, exct_type, exce_value, traceback):
        print 'cleanup:', self

    def __repr__(self):
        return 'Dummy(%r)' % (self.name,)

with Dummy("foo") as d:
    print 'using:', d
print 'later:', d
It seems to me the real heart of the matter is here:
adding the functions is dynamic (at runtime) and not known in advance
I sense that what you are really after is a flexible way to bind different functionality to an object representing program state, also known as polymorphism. Python does that quite well, not by attaching/detaching methods, but by instantiating different classes. I suggest you look again at your class organization. Perhaps you need to separate a core, persistent data object from transient state objects. Use the has-a paradigm rather than is-a: each time state changes, you either wrap the core data in a state object, or you assign the new state object to an attribute of the core.
If you're sure you can't use that kind of pythonic OOP, you could still work around your problem another way by defining all your functions in the class to begin with and subsequently binding them to additional instance attributes (unless you're compiling these functions on the fly from user input):
class LongRunning(object):
    def bark_loudly(self):
        print("WOOF WOOF")

    def bark_softly(self):
        print("woof woof")

while True:
    d = LongRunning()
    d.bark = d.bark_loudly
    d.bark()
    d.bark = d.bark_softly
    d.bark()
An alternative solution to using weakref is to dynamically bind the function to the instance only when it is called, by overriding __getattr__ or __getattribute__ on the class to return func.__get__(self, type(self)) instead of just func for functions assigned to the instance. This is how functions defined on the class behave. Unfortunately (for some use cases), Python doesn't perform the same logic for functions attached to the instance itself, but you can modify it to do so. I've had similar problems with descriptors bound to instances. Performance here probably isn't as good as using weakref, but it is an option that will work transparently for any dynamically assigned function, using only Python builtins.
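A rough sketch of that idea; here the dynamically assigned functions live in their own dict so that __getattr__ (which only fires when normal lookup fails) can do the binding at access time (the class and method names are made up):

class DynamicBinder(object):
    def __init__(self):
        self._dynamic_funcs = {}  # plain functions, not bound to the instance

    def add_func(self, name, func):
        self._dynamic_funcs[name] = func

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails. Bind the stored
        # function to the instance on access, so no long-lived cycle exists.
        try:
            func = self.__dict__['_dynamic_funcs'][name]
        except KeyError:
            raise AttributeError(name)
        return func.__get__(self, type(self))

obj = DynamicBinder()
def greet(self):
    print "hello from", self.__class__.__name__
obj.add_func('greet', greet)
obj.greet()  # bound on access, discarded afterwards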
If you find yourself doing this often, you might want a custom metaclass that does dynamic binding of instance-level functions.
Another alternative is to add the function directly to the class, which will then properly perform the binding when it's called. For a lot of use cases, this would have some headaches involved: namely, properly namespacing the functions so they don't collide. The instance id could be used for this; though, since an id in CPython isn't guaranteed to be unique over the life of the program, you'd need to ponder this a bit to make sure it works for your use case... in particular, you probably need to make sure you delete the class function when an object goes out of scope, and thus its id/memory address becomes available again. __del__ is perfect for this :). Alternatively, you could clear out all methods namespaced to the instance on object creation (in __init__ or __new__).
Another alternative (rather than messing with python magic methods) is to explicitly add a method for calling your dynamically bound functions. This has the downside that your users can't call your function using normal python syntax:
class MyClass(object):
    def dynamic_func(self, func_name):
        return getattr(self, func_name).__get__(self, type(self))

    def call_dynamic_func(self, func_name, *args, **kwargs):
        return getattr(self, func_name).__get__(self, type(self))(*args, **kwargs)

    """
    Alternate without using descriptor functionality:
    def call_dynamic_func(self, func_name, *args, **kwargs):
        return getattr(self, func_name)(self, *args, **kwargs)
    """
Just to make this post complete, I'll show your weakref option as well:
import weakref

inst = MyClass()
def func(self):
    print 'My func'

# You could also use the types module, but the descriptor method is cleaner IMO
inst.func = func.__get__(weakref.ref(inst), type(inst))
use eval()
In [1]: int('25.0')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-67d52e3d0c17> in <module>
----> 1 int('25.0')
ValueError: invalid literal for int() with base 10: '25.0'
In [2]: int(float('25.0'))
Out[2]: 25
In [3]: eval('25.0')
Out[3]: 25.0
The standard way of doing singletons in Python is
class Singleton(object):
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            cls._instance = super(Singleton, cls).__new__(cls, *args, **kwargs)
        return cls._instance
However, this doesn't work on App Engine, since there may be many servers and we would get one instance per server. So how would we do it for an App Engine entity?
Something like:
class MySingleton(db.Model):
    def __init__(self, *args, **kwargs):
        all = MySingleton.all()
        if all.count() > 0:
            return all.fetch(1).get()
        super(MySingleton, self).__init__(*args, **kwargs)
This leads to a recursion error, since get() calls __init__.
How we're going to use it:
We just want to represent a configuration file, i.e.:
{ 'sitename': "My site", 'footer': "This page owned by X"}
Singletons are usually a bad idea, and I'd be interested to see what makes this an exception. Typically they're just globals in disguise, and apart from all the old problems with globals (eg. see http://c2.com/cgi/wiki?GlobalVariablesAreBad, in particular the bit at the top talking about non-locality, implicit coupling, concurrency issues, and testing and confinement), in the modern world you get additional problems caused by distributed and concurrent systems. If your app is potentially running across multiple servers, can you meaningfully have both instances of your application operate on the same singleton instance both safely and correctly?
If the object has no state of its own, then the answer is yes, but you don't need a singleton, just a namespace.
But if the object does have some state, you need to worry about how the two application instances are going to keep the details synchronised. If two instances try reading and then writing to the same instance concurrently then your results are likely to be wrong. (eg. A HitCounter singleton that reads the current value, adds 1, and writes the current value, can miss hits this way - and that's about the least damaging example I can think of.)
I am largely unfamiliar with it, so perhaps Google App Engine has some transactional logic to handle all this for you, but that presumably means you'll have to add some extra stuff in to deal with rollbacks and the like.
So my basic advice would be to see if you can rewrite the algorithm or system without resorting to using a singleton.
If you aren't going to store the data in the datastore, why don't you just create a module with variables instead of a db.Model?
Name your file mysettings.py and inside it write:
sitename = "My site"
footer = "This page owned by X"
Then the python module effectively becomes a "singleton". You can even add functions, if needed. To use it, you do something like this:
import mysettings
print mysettings.sitename
That's how django deals with this with their DJANGO_SETTINGS_MODULE
Update:
It sounds like you really want to use a db.Model, but use memcached so you only retrieve one object once. But you'll have to come up with a way to flush it when you change data, or have it have a timeout so that it gets get'd occasionally. I'd probably go with the timeout version and do something like this in mysettings.py:
import logging
from google.appengine.api import memcache
from google.appengine.ext import db

class MySettings(db.Model):
    # properties...
    pass

def Settings():
    key = "mysettings"
    obj = memcache.get(key)
    if obj is None:
        obj = MySettings.all().get()  # assume there is only one
        if obj:
            memcache.add(key, obj, 360)
        else:
            logging.error("no MySettings found, create one!")
    return obj
Or, if you don't want to use memcache, then just store the object in a module-level variable and always use the Settings() function to reference it. But then you'll have to implement a way to flush it, since otherwise it will live until the interpreter instance is recycled. I would normally use memcache for this sort of functionality.
__init__ cannot usefully return anything: just like in the first example, override __new__ instead!
I don't think there's a real "singleton" object you can hold in a distributed environment with multiple instances running. The closest you can come to this is using memcache.
Perhaps it's better to think less in terms of singletons and more in terms of data consistency. For this App Engine provides transactions, which allow you to trap any changes in an entity that might happen while you're working with that entity.
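A rough sketch of that transactional approach with the old db API (the Counter model and increment function are made up for illustration):

from google.appengine.ext import db

class Counter(db.Model):
    count = db.IntegerProperty(default=0)

def increment(key):
    # Runs inside the transaction: reads and writes of this entity are isolated.
    counter = db.get(key)
    counter.count += 1
    counter.put()

key = Counter(key_name="hits").put()
db.run_in_transaction(increment, key)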