I was trying to monkey patch the NetworkX Graph object without typing out
networkx.Graph.method_name = method_name
for every single method I defined. I tried this (minimal version):
import networkx
class _GraphExtended(networkx.Graph):
    def is_nonnull(self):
        return bool(self.nodes())

for key in _GraphExtended.__dict__:
    networkx.Graph[key] = _GraphExtended[key]
and I got the error "'type' object is not subscriptable" for every key. How do I monkey patch all methods using a loop?
Analysis of your current approach
You are using subscript notation (the square brackets). Normally, you would write my_object[key], which, as a first approximation*, is translated into my_object.__getitem__(key).
In particular, if type(my_object) does not define a __getitem__ attribute, you get an error saying that type(my_object) is not subscriptable.
In your case, type(_GraphExtended) == type holds true. Furthermore, the type class does not define any __getitem__ attribute. Therefore, this is why you get the error message that type is not subscriptable.
*For the sake of completeness, a more accurate translation would be along the lines of: object.__getattribute__(my_object, '__getitem__')(key).
What you probably intended
What you probably intended was to set the 'method_name' attribute on the networkx.Graph class. In general, this can be accomplished with the setattr built-in function, as follows:
setattr(networkx.Graph, key, value)
Also, _GraphExtended.__dict__ contains many more keys than what you intend to monkey patch. You may be able to filter out those that start and end with double underscore, but I am neither confident that this filter works under all circumstances nor confident that it is forward-compatible with Python.
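Putting these together, a loop along the following lines might work (only a sketch: the dunder filter below is the heuristic discussed above, not a guarantee):
import networkx

class _GraphExtended(networkx.Graph):
    def is_nonnull(self):
        return bool(self.nodes())

for key, value in _GraphExtended.__dict__.items():
    # Skip names like __module__, __qualname__, __doc__ that every class defines
    if key.startswith('__') and key.endswith('__'):
        continue
    setattr(networkx.Graph, key, value)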
Pitfalls to monkey patching
Firstly, monkey patching may break forward-compatibility with the networkx library. There is no guarantee that future versions of networkx will avoid the same method names that you have chosen to monkey patch.
Secondly, monkey patching will prevent you from writing reusable code. Other developers can no longer reuse your convenience function(s) unless they monkey patch their own code as well, and there are likely to be reasons that make this impossible or undesirable for them.
Practical advice
Don't do it. Monkey patching library code is very poor style and should only be used as a last resort (e.g. when it has a positive and measurable effect on business revenue or a related resource such as development time).
What are the underlying concern(s) that you wish to solve? I would be willing to follow up with alternative solutions that address each underlying concern you may have.
Also, have you considered the simple approach of defining a helper module containing helper functions, such as:
# Module graph_utils
def is_nonnull(graph):
    return bool(graph.nodes())
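Calling code would then use the helper explicitly (assuming the snippet above is saved as graph_utils.py):
import networkx
import graph_utils

g = networkx.Graph()
g.add_edge(1, 2)
print(graph_utils.is_nonnull(g))  # True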
Other notes
Python already has a convention to handle boolean contexts: anything considered empty should also be considered False. For example, according to networkx documentation, the Graph class defines a __len__ method that returns the number of nodes. Because of __len__, Python allows using Graph objects in contexts where a bool is expected. For instance,
graph = networkx.Graph()
print(not graph)  # Prints True iff len(graph) == 0

if graph:
    print('Graph is nonnull.')
else:
    print('Graph is null.')
Related
In Python 2, it was possible to convert arbitrary callables to methods of a class. Importantly, if the callable was a CPython built-in implemented in C, you could use this to make methods of user-defined classes that were C layer themselves, invoking no byte code when called.
This is occasionally useful if you're relying on the GIL to provide "lock-free" synchronization; since the GIL can only be swapped out between op codes, if all the steps in a particular part of your code can be pushed to C, you can make it behave atomically.
In Python 2, you could do something like this:
import types
from operator import attrgetter

class Foo(object):
    # ... This class maintains a member named length storing the length ...
    def __len__(self):
        return self.length  # We don't want this, because we're trying to push all work to C

# Instead, we explicitly make an unbound method that uses attrgetter to achieve
# the same result as the __len__ above, but with no byte code invoked to satisfy it
Foo.__len__ = types.MethodType(attrgetter('length'), None, Foo)
In Python 3, there is no longer an unbound method type, and types.MethodType only takes two arguments and creates only bound methods (which is not useful for Python special methods like __len__, __hash__, etc., since special methods are often looked up directly on the type, not the instance).
Is there some way of accomplishing this in Py3 that I'm missing?
Things I've looked at:
functools.partialmethod (appears to not have a C implementation, so it fails the requirements, and between the Python implementation and being much more general purpose than I need, it's slow, taking about 5 us in my tests, vs. ~200-300 ns for direct Python definitions or attrgetter in Py2, a roughly 20x increase in overhead)
Trying to make attrgetter or the like follow the non-data descriptor protocol (not possible AFAICT, can't monkey-patch in a __get__ or the like)
Trying to find a way to subclass attrgetter to give it a __get__, but of course, the __get__ needs to be delegated to C layer somehow, and now we're back where we started
(Specific to the attrgetter use case) Using __slots__ to make the member a descriptor in the first place, then trying to somehow convert the resulting data descriptor into a callable that skips the final step of binding and fetching the real value, so that the real value retrieval is deferred
I can't swear I didn't miss something for any of those options though. Anyone have any solutions? Total hackery is allowed; I recognize I'm doing pathological things here. Ideally it would be flexible (to let you make something that behaves like an unbound method out of a class, a Python built-in function like hex, len, etc., or any other callable object not defined at the Python layer). Importantly, it needs to attach to the class, not each instance (both to reduce per-instance overhead, and to work correctly for dunder special methods, which bypass instance lookup in most cases).
Found a (probably CPython only) solution to this recently. It's a little ugly, being a ctypes hack to directly invoke CPython APIs, but it works, and gets the desired performance:
import ctypes
from operator import attrgetter
make_instance_method = ctypes.pythonapi.PyInstanceMethod_New
make_instance_method.argtypes = (ctypes.py_object,)
make_instance_method.restype = ctypes.py_object
class Foo:
    # ... This class maintains a member named length storing the length ...
    # Defines a __len__ method that, at the C level, fetches self.length
    __len__ = make_instance_method(attrgetter('length'))
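A quick usage sketch (setting length by hand, since the class body above is only a stub):
f = Foo()
f.length = 3
print(len(f))  # 3 -- len() dispatches through the instancemethod-wrapped attrgetter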
It's an improvement over the Python 2 version in one way, since, as it doesn't need the class to be defined to make an unbound method for it, you can define it in the class body by simple assignment (where the Python 2 version must explicitly reference Foo twice in Foo.__len__ = types.MethodType(attrgetter('length'), None, Foo), and only after class Foo has finished being defined).
On the other hand, it doesn't actually provide a performance benefit on CPython 3.7 AFAICT, at least not for the simple case here where it replaces def __len__(self): return self.length. In fact, for __len__ accessed via len(instance) on an instance of Foo, ipython %%timeit microbenchmarks show len(instance) is ~10% slower when __len__ is defined via __len__ = make_instance_method(attrgetter('length')). This is likely an artifact of attrgetter itself having slightly higher overhead, both because CPython has not moved it to the "FastCall" protocol (called "Vectorcall" in 3.8, when it was made semi-public for provisional third-party use) while user-defined functions already benefit from it in 3.7, and because attrgetter must dynamically choose between dotted and undotted attribute lookup, and between single and multiple attribute lookup, on every call (something Vectorcall might avoid by choosing a __call__ implementation appropriate to the gets being performed at construction time). It should win for more complicated cases (say, if the attribute to be retrieved is a nested attribute like self.contained.length), since attrgetter's overhead is largely fixed while nested attribute lookup in Python means more byte code, but right now it's not useful very often.
If they ever get around to optimizing operator.attrgetter for Vectorcall, I'll rebenchmark and update this answer.
I built an application with Enthought Traits, and it is using too much memory. I think the problem is caused by trait notifications:
There seems to be a fundamental difference in the memory usage of events caught by @on_trait_change versus the special naming convention (e.g. _foo_changed()). I made a little example with two classes Foo and FooDecorator, which I assumed would show exactly the same behaviour. But they don't!
from traits.api import *
class Foo(HasTraits):
    a = List(Int)

    def _a_changed(self):
        pass

    def _a_items_changed(self):
        pass

class FooDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a[]')
    def bar(self):
        pass

if __name__ == '__main__':
    n = 100000
    c = FooDecorator
    a = [c() for i in range(n)]
When running this script with c = Foo, Windows task manager shows a memory usage for the whole python process of 70MB, which stays constant for increasing n. For c = FooDecorator, the python process is using 450MB, increasing for higher n.
Can you please explain this behaviour to me?
EDIT: Maybe I should rephrase: Why would anyone choose FooDecorator over Foo?
EDIT 2: I just uninstalled python(x,y) 2.7.9 and installed the newest version of canopy with traits 4.5.0. Now the 450MB became 750MB.
EDIT 3: Compiled traits-4.6.0.dev0-py2.7-win-amd64 myself. The outcome is the same as in EDIT 2. So despite all plausibility https://github.com/enthought/traits/pull/248/files does not seem to be the cause.
I believe you are seeing the effect of a memory leak that has been fixed recently:
https://github.com/enthought/traits/pull/248/files
As for why one would use the decorator, in this particular instance the two versions are practically equivalent.
In general, the decorator is more flexible: you can give a list of traits to listen to, and you can use the extended name notation, as described here:
http://docs.enthought.com/traits/traits_user_manual/notification.html#semantics
For example, in this case:
class Bar(HasTraits):
    b = Str

class FooDecorator(HasTraits):
    a = List(Bar)

    @on_trait_change('a.b')
    def bar(self):
        print 'change'
the bar notifier is going to be called for changes to the trait a, its items, and for the change of the trait b in each of the Bar items. Extended names can be quite powerful.
What's going on here is that Traits has two distinct ways of handling notifications: static notifiers and dynamic notifiers.
Static notifiers (such as those created by the specially-named _*_changed() methods) are fairly light-weight: each trait on an instance has a list of notifiers on it, which are basically the functions or methods with a lightweight wrapper.
Dynamic notifiers (such as those created with on_trait_change() and the extended trait name conventions like a[]) are significantly more powerful and flexible, but as a result they are much more heavy-weight. In particular, in addition to the wrapper object they create, they also create a parsed representation of the extended trait name and a handler object, some of which are in turn HasTraits subclass instances.
As a result, even for a simple expression like a[] there will be a fair number of new Python objects created, and these objects have to be created for every on_trait_change listener on every instance separately to properly handle corner-cases like instance traits. The relevant code is here: https://github.com/enthought/traits/blob/master/traits/has_traits.py#L2330
Based on the reported numbers, the majority of the difference in memory usage that you are seeing is in the creation of this dynamic listener infrastructure for each instance and each on_trait_change decorator.
It's worth noting that there is a short-circuit for on_trait_change in the case where you are using a simple trait name, in which case it generates a static trait notifier instead of a dynamic notifier. So if you were to instead write something like:
class FooSimpleDecorator(HasTraits):
    a = List(Int)

    @on_trait_change('a')
    def a_updated(self):
        pass

    @on_trait_change('a_items')
    def a_items_updated(self):
        pass
you should see similar memory performance to the specially-named methods.
To answer the rephrased question about "why use on_trait_change", in FooDecorator you can write one method instead of two if your response to a change of either the list or any items in the list is the same. This makes code significantly easier to debug and maintain, and if you aren't creating thousands of these objects then the extra memory usage is negligible.
This becomes even more of a factor when you consider more sophisticated extended trait name patterns, where the dynamic listeners automatically handle changes which would otherwise require significant manual (and error-prone) code for hooking up and removing listeners from intermediate objects and traits. The power and simplicity of this approach usually outweighs the concerns about memory usage.
I know this may sound like a stupid question, especially to someone who knows python's nature, but I was just wondering, is there a way to know if an object "implements an interface" so as to say?
To give an example of what I want to say:
let's say I have this function:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
My question is: Is there a way to make sure that the object passed to the function is iterable? I know that in Java or C# I could do this by having the method accept any object that implements a specific interface, let's say (for example) iIterable like this: void get_counts(iIterable sequence)
My guess is that in Python I would have to employ preemptive introspection checks (in a decorator perhaps?) and throw a custom exception if the object doesn't have an __iter__ attribute. But is there a more pythonic way to do this?
Use polymorphism and duck-typing before isinstance() or interfaces
You generally define what you want to do with your objects, then either use polymorphism to adjust how each object responds to what you want to do, or you use duck typing: test whether the object at hand can do the thing you want to do in the first place. This is the invocation versus introspection trade-off; conventional wisdom states that invocation is preferable to introspection, but in Python, duck typing is preferred over isinstance testing.
So you need to work out why you need to filter on whether or not something is iterable in the first place; why do you need to know this? Just use a try: iter(object), except TypeError: # not iterable to test.
Or perhaps you just need to throw an exception if whatever that was passed was not an iterable, as that would signal an error.
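A minimal sketch of that approach, applied to the get_counts() function from the question:
def get_counts(sequence):
    try:
        iterator = iter(sequence)  # raises TypeError if sequence is not iterable
    except TypeError:
        raise TypeError('get_counts() expects an iterable, got %r' % type(sequence))
    counts = {}
    for x in iterator:
        counts[x] = counts.get(x, 0) + 1
    return counts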
ABCs
With duck typing, you may find that you have to test for multiple methods, and thus an isinstance() test may look like a better option. In such cases, using an Abstract Base Class (ABC) could also be an option; an ABC lets you 'paint' several different classes as being the right type for a given operation, for example. Using an ABC lets you focus on the tasks that need to be performed rather than the specific implementations used; you can have a Paintable ABC, a Printable ABC, etc.
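For instance, a minimal sketch of such an ABC (the Paintable name comes from the paragraph above; the rest is made up for the example):
import abc

class Paintable(abc.ABC):
    @abc.abstractmethod
    def paint(self, canvas):
        """Draw the object onto the given canvas."""

class Circle(Paintable):
    def paint(self, canvas):
        print('painting a circle on', canvas)

print(isinstance(Circle(), Paintable))  # True
# Instantiating a subclass that does not implement paint() raises TypeError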
Zope interfaces and component architecture
If you find your application is using an awful lot of ABCs or you keep having to add polymorphic methods to your classes to deal with various different situations, the next step is to consider using a full-blown component architecture, such as the Zope Component Architecture (ZCA).
zope.interface interfaces are ABCs on steroids, especially when combined with the ZCA adapters. Interfaces document expected behaviour of a class:
if IFrobnarIterable.providedBy(yourobject):
    # it'll support iteration and yield Frobnars.
but it also lets you look up adapters; instead of putting all the behaviours for every use of shapes in your classes, you implement adapters to provide polymorphic behaviours for specific use-cases. You can adapt your objects to be printable, or iterable, or exportable to XML:
class FrobnarsXMLExport(object):
    adapts(IFrobnarIterable)
    provides(IXMLExport)

    def __init__(self, frobnariterator):
        self.frobnars = frobnariterator

    def export(self):
        entries = []
        for frobnar in self.frobnars:
            entries.append(
                u'<frobnar><width>{0}</width><height>{1}</height></frobnar>'.format(
                    frobnar.width, frobnar.height))
        return u''.join(entries)
and your code merely has to look up adapters for each shape:
for obj in setofobjects:
    self.result.append(IXMLExport(obj).export())
Python (since 2.6) has abstract base classes (also called virtual interfaces), which are more flexible than Java or C# interfaces. To check whether an object is iterable, use collections.Iterable (in Python 3.3+ it lives in collections.abc as collections.abc.Iterable):
if isinstance(obj, collections.Iterable):
    ...
However, if your else block would just raise an exception, then the most Python answer is: don't check! It's up to your caller to pass in an appropriate type; you just need to document that you're expecting an iterable object.
The Pythonic way is to use duck typing and "ask forgiveness, not permission". This usually means performing an operation in a try block assuming it behaves the way you expect it to, then handling other cases in an except block.
I think this is the way that the community would recommend you do it:
def do_something(foo):
    try:
        for i in foo:
            process(i)
    except TypeError:
        # Raised by the for statement when foo is not iterable
        print("Is not iterable")

do_something(True)
Or, you could use something like zope.interface.
It seems there are different conventions for what the __repr__ function should return.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is protected in python and they could just dive in and set it anyway, but seems defining it in __init__ makes it more likely someone might see it and assume it's fine to just pass it in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to not happily take all the stuff I'd set 'private' in C++ as arguments to the constructor, and make teammates using the class set it up using the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, and a separate function that's slightly harder to notice, for the purposes of __repr__
If it makes any difference, I am planning to store these python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.
The __repr__ method is designed to produce the most useful output for the developer, not the end user, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose -- you don't know how your module is imported anyway. Others may prefer option C.
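For instance, a rough sketch of an option B style __repr__ (the attributes and defaults here are placeholders based on your example; whether the string can be fed back to eval() depends on what your constructor accepts, which is exactly your option C concern):
class InfoObj(object):
    def __init__(self, Name):
        self.Name = Name
        self.Pants = False
        self.A = self.B = self.C = None
        self.Valid = False

    def __repr__(self):
        # Option B: a constructor-style string built from the current state
        return ('InfoObj(Name={0.Name!r}, Pants={0.Pants!r}, A={0.A!r}, '
                'B={0.B!r}, C={0.C!r}, Valid={0.Valid!r})'.format(self))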
However, if you want to store Python objects in a database, use pickle.
>>> import pickle
>>> bob = InfoObj(Name="Bob")
>>> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
>>> pickle.loads(pickle.dumps(bob))
Bob(...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.
I'm a bit surprised by Python's extensive use of 'magic methods'.
For example, in order for a class to declare that instances have a "length", it implements a __len__ method, which is called when you write len(obj). Why not just define a len method which is called directly as a member of the object, e.g. obj.len()?
See also: Why does Python code use len() function instead of a length method?
AFAIK, len is special in this respect and has historical roots.
Here's a quote from the FAQ:
Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?

The major reason is history. Functions were used for those operations that were generic for a group of types and which were intended to work even for objects that didn’t have methods at all (e.g. tuples). It is also convenient to have a function that can readily be applied to an amorphous collection of objects when you use the functional features of Python (map(), apply() et al).

In fact, implementing len(), max(), min() as a built-in function is actually less code than implementing them as methods for each type. One can quibble about individual cases but it’s a part of Python, and it’s too late to make such fundamental changes now. The functions have to remain to avoid massive code breakage.
The other "magical methods" (actually called special method in the Python folklore) make lots of sense, and similar functionality exists in other languages. They're mostly used for code that gets called implicitly when special syntax is used.
For example:
overloaded operators (exist in C++ and others)
constructor/destructor
hooks for accessing attributes
tools for metaprogramming
and so on...
From the Zen of Python:
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
This is one of the reasons - with custom methods, developers would be free to choose a different method name, like getLength(), length(), getlength() or whatever. Python enforces strict naming so that the common function len() can be used.
All operations that are common for many types of objects are put into magic methods, like __nonzero__, __len__ or __repr__. They are mostly optional, though.
Operator overloading is also done with magic methods (e.g. __le__), so it makes sense to use them for other common operations, too.
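For example, a short sketch of operator overloading through one of these methods (the class is made up for the example):
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __le__(self, other):
        # Invoked for the <= operator
        return (self.major, self.minor) <= (other.major, other.minor)

print(Version(1, 2) <= Version(1, 10))  # True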
Python uses the term "magic methods" because those methods really perform magic for your program. One of the biggest advantages of using Python's magic methods is that they provide a simple way to make objects behave like built-in types. That means you can avoid ugly, counter-intuitive, and nonstandard ways of performing basic operations.
Consider the following example:
>>> dict1 = {1: "ABC"}
>>> dict2 = {2: "EFG"}
>>> dict1 + dict2
Traceback (most recent call last):
  File "python", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
This gives an error, because the dictionary type doesn't support addition. Now, let's extend the dictionary class and add the "__add__" magic method:
class AddableDict(dict):
    def __add__(self, other):
        result = AddableDict(self)  # copy first so that neither operand is mutated
        result.update(other)
        return result

dict1 = AddableDict({1: "ABC"})
dict2 = AddableDict({2: "EFG"})
print(dict1 + dict2)
Now, it gives the following output:
{1: 'ABC', 2: 'EFG'}
Thus, by adding this method, the magic suddenly happens and the error you were getting earlier goes away.
I hope this makes things clearer. For more information, refer to:
A Guide to Python's Magic Methods (Rafe Kettler, 2012)
Some of these functions do more than a single method would be able to implement (without abstract methods on a superclass). For instance bool() acts kind of like this:
def bool(obj):
    if hasattr(obj, '__nonzero__'):
        return bool(obj.__nonzero__())
    elif hasattr(obj, '__len__'):
        if obj.__len__():
            return True
        else:
            return False
    return True
You can also be 100% sure that bool() will always return True or False; if you relied on a method you couldn't be entirely sure what you'd get back.
Some other functions that have relatively complicated implementations (more complicated than the underlying magic methods are likely to be) are iter() and cmp(), and all the attribute methods (getattr, setattr and delattr). Things like int also access magic methods when doing coercion (you can implement __int__), but do double duty as types. len(obj) is actually the one case where I don't believe it's ever different from obj.__len__().
They are not really "magic names". It's just the interface an object has to implement to provide a given service. In this sense, they are not more magic than any predefined interface definition you have to reimplement.
While the reason is mostly historic, there are some peculiarities in Python's len that make the use of a function instead of a method appropriate.
Some operations in Python are implemented as methods, for example list.index and dict.update, while others are implemented as callables and magic methods, for example str and iter and reversed. The two groups differ enough that the different approach is justified:
They are common.
str, int and friends are types. It makes more sense to call the constructor.
The implementation differs from a plain method call. For example, iter might call __getitem__ if __iter__ isn't available (see the sketch after this list), and it supports additional arguments that don't fit in a method call. For the same reason it.next() was changed to next(it) in recent versions of Python - it makes more sense.
Some of these are close relatives of operators. There's syntax for calling __iter__ and __next__ - it's called the for loop. For consistency, a function is better. And it makes it better for certain optimisations.
Some of the functions are simply way too similar to the rest in some way - repr acts like str does. Having str(x) versus x.repr() would be confusing.
Some of them rarely use the actual implementation method, for example isinstance.
Some of them are actual operators, getattr(x, 'a') is another way of doing x.a and getattr shares many of the aforementioned qualities.
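As a sketch of the __getitem__ fallback mentioned above (the class is made up for the example):
class OldStyleSequence:
    # No __iter__ defined: iter() falls back to the legacy sequence protocol,
    # calling self[0], self[1], ... until IndexError is raised.
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10

print(list(OldStyleSequence()))  # [0, 10, 20]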
I personally call the first group method-like and the second group operator-like. It's not a very good distinction, but I hope it helps somehow.
Having said this, len doesn't exactly fit in the second group. It's closer to the operations in the first one, with the only difference being that it's far more common than almost any of them. The only thing it does is call __len__, and it's very close to L.index. However, there are some differences. For example, __len__ might be called for the implementation of other features, such as bool; if the method were called len, you could break bool(x) with a custom len method that does a completely different thing.
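To illustrate that interaction (the classes are made up for the example):
class EmptyBox:
    def __len__(self):
        return 0

class FullBox:
    def __len__(self):
        return 3

# bool() falls back to __len__ when __bool__ is not defined
print(bool(EmptyBox()))  # False
print(bool(FullBox()))   # True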
In short, you have a set of very common features that classes might implement that might be accessed through an operator, through a special function (that usually does more than the implementation, as an operator would), during object construction, and all of them share some common traits. All the rest is a method. And len is somewhat of an exception to that rule.
There is not a lot to add to the above two posts, but all the "magic" functions are not really magic at all. They are part of the __builtins__ module, which is implicitly/automatically imported when the interpreter starts. I.e.:
from __builtins__ import *
happens every time before your program starts.
I always thought it would be more correct if Python only did this for the interactive shell, and required scripts to import the various parts from builtins that they needed. Also, different __main__ handling would probably be nice in scripts vs. the interactive shell. Anyway, check out all the functions, and see what it is like without them:
dir(__builtins__)
...
del __builtins__
Perhaps you have noticed that it is possible to use certain built-in functions (e.g. len(my_list_or_my_string)) and syntaxes (e.g. my_list_or_my_string[:3], my_fancy_dict['some_key']) on some native types such as list and dict. Maybe you have been curious as to why it is not possible (yet) to use these same syntaxes on some of the classes you have written.
Variables of native types (list, dict, int, str) have unique behaviours and respond to certain syntaxes because they have some special methods defined in their respective classes — these methods are called Magic Methods.
A few magic methods include: __len__, __gt__, __eq__, etc.
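For illustration, here is a small sketch (the class is made up for the example) that gives a user-defined class the same len() and slicing support:
class Playlist:
    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # Called by len(playlist)
        return len(self._songs)

    def __getitem__(self, index):
        # Called by playlist[index] and playlist[start:stop]
        return self._songs[index]

playlist = Playlist(['intro', 'verse', 'chorus'])
print(len(playlist))  # 3
print(playlist[:2])   # ['intro', 'verse']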
Read more here: https://tomisin.dev/blog/supercharging-python-classes-with-magic-methods