I am curious about memory consumption / performance in Python related to nested class instances vs. copied class attributes.
Say I have classes called OtherClass, ClassA, ClassB, and ClassC, where OtherClass needs access to a limited set of attributes of ClassA-C. Assuming ClassA-C are large classes with many attributes, methods, and properties, which of these scenarios is more efficient?
Option 1:
class OtherClass(object):
    def __init__(self, classa, classb, classc):
        self.classa = classa
        self.classb = classb
        self.classc = classc
Option 2:
class OtherClass(object):
    def __init__(self, classa_id, classa_atr1, classa_atr2,
                 classb_id, classb_atr1, classb_atr2,
                 classc_id, classc_atr1, classc_atr2):
        self.classa_id = classa_id
        self.classb_id = classb_id
        self.classc_id = classc_id
        self.classa_atr1 = classa_atr1
        self.classb_atr1 = classb_atr1
        self.classc_atr1 = classc_atr1
        self.classa_atr2 = classa_atr2
        self.classb_atr2 = classb_atr2
        self.classc_atr2 = classc_atr2
I imagine option 1 is better, since the 3 attributes will simply reference the class instances already existing in memory, whereas option 2 adds 6 additional attributes per instance to memory. Is this correct?
TL;DR
My answer is that you should prefer option 1 for its simplicity and better OOP design, and avoid premature optimization.
The Rest
I think the efficiency question here is dwarfed by how difficult it will be in the future to maintain your second option. If one object needs to use attributes of another object (your example code uses a form of composition), then it should have those objects as attributes, rather than creating extra references directly to the object attributes it needs. Your first option is the way to go. The first option supports encapsulation; option 2 very clearly violates it. (Granted, encapsulation isn't as strongly enforced in Python as in some languages, like Java, but it's still a good principle to follow.)
The only efficiency-related reason to prefer option 2 is if you find your code is slow, you profile it, and your profiling shows that these extra lookups are indeed your bottleneck. Then you could consider sacrificing things like ease of maintenance for the speed you need. It is possible that the extra layer of references (foo = self.classa.bar() vs. foo = self.bar()) could slow things down if you're using them in tight loops, but it's not likely.
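If you ever do reach that point, a common hedge is simply to hoist the lookup out of the loop. A minimal sketch with hypothetical classes, timed with timeit, showing the idea:

import timeit

class Inner(object):
    def bar(self):
        return 1

class Outer(object):
    def __init__(self):
        self.inner = Inner()

    def chained(self):
        total = 0
        for _ in range(100000):
            total += self.inner.bar()  # attribute chain resolved every iteration
        return total

    def hoisted(self):
        bar = self.inner.bar  # resolve the chain once, outside the loop
        total = 0
        for _ in range(100000):
            total += bar()
        return total

o = Outer()
print(timeit.timeit(o.chained, number=10))
print(timeit.timeit(o.hoisted, number=10))

On most interpreters the difference is measurable but small, which is exactly why profiling should come first.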
In fact, I would go one step further and say you should modify your code so that OtherClass actually instantiates the objects it needs, rather than having them passed in. With option 1, if I want to use OtherClass, I have to do this:
classa_obj = ClassA(classa_init_args)
classb_obj = ClassB(classb_init_args)
classc_obj = ClassC(classc_init_args)
otherclass_obj = OtherClass(classa_obj, classb_obj, classc_obj)
That's too much setup required just to instantiate OtherClass. Instead, change OtherClass to this:
class OtherClass(object):
    def __init__(self, classa_init_args, classb_init_args, classc_init_args):
        self.classa = ClassA(classa_init_args)
        self.classb = ClassB(classb_init_args)
        self.classc = ClassC(classc_init_args)
Now instantiating an OtherClass object is simply this:
otherclass_obj = OtherClass(classa_init_args, classb_init_args, classc_init_args)
Another option may be to reconfigure your classes so that you don't even have to instantiate the other classes. Have a look at class attributes and the classmethod decorator. That allows you to do things like this:
class Foo(object):
    bar = 2

    @classmethod
    def frobble(cls):
        return "I didn't even have to be instantiated!"

print(Foo.bar)
print(Foo.frobble())
This code prints this:
2
I didn't even have to be instantiated!
If your OtherClass uses attributes or methods of classa, classb, and classc that don't need to be tied to an instance of those classes, consider using them directly via class methods and attributes instead of instantiating the objects. That would actually save you the most memory by avoiding the creation of entire objects.
Related
Overview
I need to duplicate a whole inheritance tree of classes. Simply deep-copying the class objects does not work; a proper factory pattern involves a huge amount of code changes; I'm not sure how to use metaclasses to accomplish this.
Background
The software I work on implements support for specialized external hardware, connected to the host computer via USB. Many years ago, it was assumed that there would only ever be one type of hardware in use at a time. Consequently, the hardware object is used as a singleton. Over the years, secondary classes were configured based on the currently active hardware class.
At the moment, it is impossible to use this library with two types of hardware at the same time, since the class objects cannot be configured for both hardware types at once.
In recent years, we have avoided this issue by creating one Python process per hardware type, but this is becoming untenable.
Here is an extremely simplified example of the architecture:
# ----------
# Hardware classes
class HwBase():
    def customizeComponent(self, compDict):
        compDict['ComponentBase'].hardware = self

class HwA(HwBase):
    def customizeComponent(self, compDict):
        super().customizeComponent(compDict)
        compDict['AnotherComponent'].prop.configure(1, 2, 3)

class HwB(HwBase):
    def customizeComponent(self, compDict):
        super().customizeComponent(compDict)
        compDict['AnotherComponent'].prop.configure(4, 5, 6)

# ----------
# Property classes
class SpecialProperty(property):
    def __init__(self, fvalidate):
        self.fvalidate = fvalidate
        # handle fset, fget, etc. here.
        # super().__init__()

# ----------
# Component classes
class ComponentBase():
    hardware = None

    def validateProp(self, val):
        return val < self.maxVal

    prop = SpecialProperty(fvalidate=validateProp)

class SomeComponent():
    """Users directly instantiate and use this component via an interactive shell.
    This component does complex operations with the hardware attribute."""
    def validateThing(self, val):
        return isinstance(val, ComponentBase)

    thing = SpecialProperty(fvalidate=validateThing)

class AnotherComponent():
    """Users directly instantiate and use this component via an interactive shell.
    This component does complex operations with the hardware attribute."""
    maxVal = 15

# ----------
# Initialization
def initialize():
    """This is only called once per Python instance."""
    # activeCls = HwA
    activeCls = HwB
    allComponents = {
        'ComponentBase': ComponentBase,
        'SomeComponent': SomeComponent,
        'AnotherComponent': AnotherComponent
    }
    hwInstance = activeCls()
    hwInstance.customizeComponent(allComponents)
    return allComponents

components = initialize()

# ----------
# User code goes here
someInstance1 = components['SomeComponent']()
someInstance2 = components['SomeComponent']()
someInstance1.prop = 10
someInstance2.prop = 10
The overarching goal would be to interact with both HwA and HwB at the same time. Since most interactions are done via components instead of the Hw objects themselves, I believe the solution involves having multiple versions of the components, e.g.: two separate inheritance trees, for a total of 6 final components, one tree/set configured for each hardware. This is what I need help with.
Potential solutions
Consider that I have tens of different hardware types to configure for. Furthermore, there are hundreds of different leaf component classes, with many extra base and mixin classes.
Move all configuration steps into the component's __init__ method
Not possible due to the use of properties; these need to be set on the class.
Deepcopy the classobjects
Copy all class objects and swap in the appropriate __bases__. Mutable class variables need to be carefully handled. However, I'm not sure how to deal with properties here, since class-body references within the property objects (such as fvalidate) need to be updated to those of the copied class.
This requires a significant amount of manual intervention to work. Not impossible, but prone to breaking in the long term.
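For what it's worth, a rough sketch of this approach using type() rather than copy.deepcopy (clone_class is a hypothetical helper; mutable class attributes and the property cross-references would still need the manual handling described above):

def clone_class(cls, new_bases):
    namespace = dict(cls.__dict__)
    # These slots are managed by type() itself and must not be copied over:
    namespace.pop('__dict__', None)
    namespace.pop('__weakref__', None)
    return type(cls.__name__, new_bases, namespace)

Note that the property objects in the namespace are still shared between the original and the clone, which is precisely the breakage-prone part.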
Factory pattern
Wrap all component definition in a factory function:
def SomeComponentFactory(hw):
    class SomeComponent(cache[hw].ComponentBase):
        pass
    return SomeComponent
and have some sort of component cache which would handle creating all class objects during initialize().
This is what I consider the most architecturally correct option available. Since the class body is re-executed on every factory call, the attributes of the properties will reference the appropriate class object.
Downside: huge code footprint. I am familiar with doing codebase-wide changes via sed or Python scripts, but this would be quite a lot.
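For concreteness, a minimal sketch of the factory-plus-cache idea, under the assumptions of the simplified example above (the cache layout and the make_components name are hypothetical):

_component_cache = {}

def make_components(hw_cls):
    """Build (or fetch) the set of component classes bound to one hardware class."""
    if hw_cls in _component_cache:
        return _component_cache[hw_cls]

    # The class bodies are re-executed per hardware class, so class-body
    # references (e.g. the fvalidate passed to SpecialProperty) bind to the
    # fresh class objects:
    class ComponentBase():
        hardware = None

        def validateProp(self, val):
            return val < self.maxVal

        prop = SpecialProperty(fvalidate=validateProp)

    class AnotherComponent(ComponentBase):
        maxVal = 15

    components = {'ComponentBase': ComponentBase,
                  'AnotherComponent': AnotherComponent}
    hw_instance = hw_cls()
    hw_instance.customizeComponent(components)
    _component_cache[hw_cls] = components
    return components

With this, make_components(HwA) and make_components(HwB) each return an independent, fully configured tree of component classes.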
Add metaclasses on components
I am not sure how to proceed with this. Based on the Python data model (3.7), the following happens at class creation (which happens right after the class definition indentation ends):
MRO entries are resolved;
the appropriate metaclass is determined;
the class namespace is prepared;
the class body is executed;
the class object is created.
I would need to redo these steps after the class has been defined (like a factory function!), but I'm not sure how to redo step 4. Specifically, the Python documentation states in section 3.3.3.5 that the class body is executed as (approximately) a special form of the exec() builtin. How can I re-exec the class body with a different set of locals/globals? Even if I access the class body's code with inspect shenanigans, I'm not sure I'll be able to reproduce the module environment properly.
Even if I mess with __prepare__ and __new__, I don't see how I can fix the cross-references introduced in the class code block regarding the property instantiation.
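One building block worth knowing about here is types.new_class(), which exposes the class-creation steps programmatically. The sticking point the question identifies remains, though: you still need a callable that can repopulate the namespace (step 4), which Python does not retain from the original definition. A sketch with hypothetical contents:

import types

def body(ns):
    # Stand-in for re-executing the original class body with fresh bindings:
    ns['maxVal'] = 15
    ns['validateProp'] = lambda self, val: val < self.maxVal

CompForHwA = types.new_class('AnotherComponent', (), exec_body=body)
CompForHwB = types.new_class('AnotherComponent', (), exec_body=body)
assert CompForHwA is not CompForHwB  # two independent class objects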
Components as metaclasses
A metaclass is a class factory, just like a class is an object factory. SomeComponent and AnotherComponent could be declared as metaclasses, then get instantiated with the Hw object during initialize():
SomeComponent = SomeComponentMeta(hw)
This is similar to the factory pattern, but would also require quite a few code changes: a lot of class code would have to be moved to the metaclass __init__.
I'd have to spend a lot more time here to properly understand what you need, but if your "TL;DR" of executing the class body with different global/nonlocal variables is the bottom line, the factory approach is a very clean and readable way, as you had considered.
At first, I didn't think a metaclass would be a good approach here, although it could be used to customize your special properties (on my first read, I could not figure out what they actually do and how they should differ between your final classes). If the function-as-a-class-factory can specialize your properties, it would work nonetheless.
If what you need is for the properties to be independent for HwA and HwB, as in accessing a different list object in HwA than is accessed in HwB, then yes, a metaclass could take care of that by automatically recreating any properties when creating a subclass (so that the property objects themselves are not shared with the superclasses and across the hierarchy).
If that is what you need, leave a comment; I can write some proof-of-concept code.
Anyway, it is possible to create a metaclass that, upon instantiating a subclass, will look up the hierarchy for all SpecialProperty instances and create new instances of those for the subclass, so that a base value set on a superclass remains valid for the subclasses, but when configuration runs, each class has an independent configuration. (As it turns out, no metaclass is needed: we are covered by __init_subclass__.)
Another thing to take care of is that subclasses of property cannot simply be copied with Python's copy.copy (tested empirically), so we need a way to create reliable copies of those. I include one function below, but it might need to be improved to work with the actual SpecialProperty class.
from copy import copy

def copy_property(prop):
    cls = prop.__class__
    new_prop = cls.__new__(cls)
    # Initialize the attributes that can't be set from Python code, in place:
    property.__init__(new_prop, prop.fget, prop.fset, prop.fdel)
    if hasattr(prop, "__dict__"):  # only exists for subclasses of property
        # Possible adaptation needed: it may be that for some attributes of
        # SpecialProperty, a deepcopy would be needed.
        # But for the given example attribute of "fvalidate" a simple copy is better:
        new_prop.__dict__ = copy(prop.__dict__)
    return new_prop
# Python 3.6 introduced `__init_subclass__` which is called at subclass _creation_
# time. With it, the logic can be inserted in ComponentBase and there is no need for
# a metaclass.
class ComponentBase():
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for attrname in dir(cls):
            attr = getattr(cls, attrname)
            if not isinstance(attr, SpecialProperty):
                continue
            new_prop = copy_property(attr)
            setattr(cls, attrname, new_prop)

    hardware = None
    ...
As you can see, there are some workarounds that had to be done because your project opted for subclassing property. I am leaving this remark here as a reminder that unless property fits one's exact needs, it is cleaner to write a new class implementing the descriptor protocol, simply by implementing __set__, __get__ and __delete__ directly.
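For reference, a minimal sketch of such a descriptor (ValidatedAttribute is a hypothetical replacement for SpecialProperty, assuming Python 3.6+ for __set_name__):

class ValidatedAttribute:
    def __init__(self, fvalidate):
        self.fvalidate = fvalidate

    def __set_name__(self, owner, name):
        # Called once at class creation; stores the backing attribute name.
        self.name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.name)

    def __set__(self, obj, value):
        if not self.fvalidate(obj, value):
            raise ValueError('invalid value: {!r}'.format(value))
        setattr(obj, self.name, value)

    def __delete__(self, obj):
        delattr(obj, self.name)

Because this class is entirely under your control, copying or re-creating instances of it per subclass involves none of the property.__init__ workarounds above.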
I have a library with one parent and a dozen children:
# mylib1.py:
#
class Foo(object):
    def __init__(self, a):
        self.a = a

class FooChild(Foo):
    def __init__(self, a, b):
        super(FooChild, self).__init__(a)
        self.b = b

# more children here...
Now I want to extend that library with a simple (but a bit specific, for use in another approach) method. So I would like to change the parent class and use its children.
# mylib2.py:
#
import mylib1

def fooMethod(self):
    print('a={}, b={}'.format(self.a, self.b))

setattr(mylib1.Foo, 'fooMethod', fooMethod)
And now I can use it like this:
# way ONE:
import mylib2
fc = mylib2.mylib1.FooChild(3, 4)
fc.fooMethod()
or like this:
# way TWO:
# order doesn't matter here:
import mylib1
import mylib2
fc = mylib1.FooChild(3, 4)
fc.fooMethod()
So, my questions are:
Is this a good thing?
How should this be done in a better way?
A common approach is to use a mixin.
If you want, you could even add mixins dynamically; see "How do I dynamically add mixins as base classes without getting MRO errors?".
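A minimal sketch of the mixin approach applied to the Foo/FooChild example above (ExtendedFooChild is a hypothetical name):

import mylib1

class FooMethodMixin(object):
    def fooMethod(self):
        print('a={}, b={}'.format(self.a, self.b))

class ExtendedFooChild(FooMethodMixin, mylib1.FooChild):
    pass

fc = ExtendedFooChild(3, 4)
fc.fooMethod()  # prints: a=3, b=4

This leaves mylib1.Foo untouched, so other users of the library are unaffected.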
There is a general rule in programming that you should avoid depending on global state. In other words, your globals should, if possible, be constant. Classes are (mostly) globals.
Your approach is called monkey patching. And if you don't have a really, really good reason for it, you should avoid it. This is because monkey patching violates the above rule.
Imagine you have two separate modules and both of them use this approach. One of them sets Foo.fooMethod to some method, the other to another. Then you somehow switch control between these modules. The result would be that it is hard to determine which fooMethod is used where. This means hard-to-debug problems.
There are people (e.g. Brandon Craig Rhodes) who believe that patching is bad even in tests.
What I would suggest is to use some attribute, set when instantiating instances of your Foo() class (and its children), that controls the behaviour of fooMethod. Then the behaviour of this method would depend on how you instantiated the object, not on global state.
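A minimal sketch of that suggestion; the verbose flag is a hypothetical example of such an attribute:

class Foo(object):
    def __init__(self, a, verbose=False):
        self.a = a
        self.verbose = verbose  # controls fooMethod's behaviour per instance

    def fooMethod(self):
        if self.verbose:
            print('a={}'.format(self.a))
        return self.a

quiet = Foo(3)
chatty = Foo(3, verbose=True)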
By using the @property decorator, Python has completely eliminated the need for getters and setters on object properties (some might say 'attributes'). This makes code much simpler, while maintaining the extensibility when things do need to get more complex.
I was wondering what the Pythonic approach to the following kind of method is, though. Say I have the following class:
class A(object):
    def is_winner(self):
        return True  # typically a more arcane method to determine the answer
Such methods typically take no arguments, and have no side effects. One might call these predicates. And given their name, they often closely resemble something one might also have stored as a property.
I am inclined to add a @property decorator to the above, in order to be able to call it as an object property (i.e. foo.is_winner), but I was wondering if this is the standard thing to do. At first glance, I could not find any documentation on this subject. Is there a common standard for this situation?
It seems that the general consensus is that attributes are expected to be instant and next-to-free to use, so if the computation being decorated as a @property is expensive, it's probably best either to cache the outcome for repeated use (@Martijn Pieters) or to leave it as a method, as methods are generally expected to take more time than attribute lookups. PEP 8 notes specifically:
Note 2: Try to keep the functional behavior side-effect free, although side-effects such as caching are generally fine.
Note 3: Avoid using properties for computationally expensive operations; the attribute notation makes the caller believe that access is (relatively) cheap.
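For the caching route, one option (assuming Python 3.8+, and that the computation really is expensive) is functools.cached_property, which keeps attribute-style access while computing the value only once per instance:

from functools import cached_property

class A(object):
    @cached_property
    def is_winner(self):
        print('computing...')  # stands in for the arcane, expensive logic
        return True

a = A()
a.is_winner  # prints 'computing...' and returns True
a.is_winner  # returns the cached True without recomputing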
One particular use case of the @property decorator is to add some behavior to a class without requiring that users of the class change from foo.bar references to foo.bar() calls -- for example, if you wanted to count the number of times that an attribute was referenced, you could convert the attribute into a @property whose decorated method manipulates some state before returning the requested data.
Here is an example of the original class:
class Cat(object):
    def __init__(self, name):
        self.name = name

# In user code
baxter = Cat('Baxter')
print(baxter.name)  # => Baxter
With the #property decorator, we can now add some under-the-hood machinery without affecting the user code:
class Cat(object):
    def __init__(self, name):
        self._name = name
        self._name_access_count = 0

    @property
    def name(self):
        self._name_access_count += 1
        return self._name

# User code remains unchanged
baxter = Cat('Baxter')
print(baxter.name)  # => Baxter

# Also have information available about the number of times baxter's name was accessed
print(baxter._name_access_count)  # => 1
baxter.name  # => 'Baxter'
print(baxter._name_access_count)  # => 2
This treatment of the @property decorator has been mentioned in some blog posts (1, 2) as one of the main use cases -- allowing us to initially write the simplest code possible, and then later switch over to @property-decorated methods when we need the functionality.
I usually use classes similarly to how one might use namedtuple (except of course that the attributes are mutable). Moreover, I try to put lengthy functions in classes that won't be instantiated as frequently, to help conserve memory.
From a memory point of view, is it inefficient to put functions in classes, if it is expected that the class will be instantiated often? Keeping aside that it's good design to compartmentalize functionality, should this be something to be worried about?
Methods don't add any weight to an instance of your class. The method itself only exists once and is parameterized in terms of the object on which it operates. That's why you have a self parameter.
Python doesn't maintain pointers directly to its methods in instances of new-style classes. Instead, it maintains a single pointer to the parent class. Consider the following example:
class Foo:
    def bar(self):
        print('hello')

f = Foo()
f.bar()
In order to dispatch the bar method from the instance f, two lookups need to be made. Instead of f containing a method table to look for bar, f contains a reference to the class object Foo. Foo contains the method table, where it calls bar with f as the first argument. So f.bar() can be rewritten as
Foo.bar(f)
Instances of a class have one pointer that refers to the class; all other features of the class are unique and accessed through that pointer. Something like
foo.bar()
really translates to something like
foo.__class__.bar(foo)
so methods are unique, long-lived objects belonging to the class that take the instance as an argument when called.
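This equivalence is easy to verify directly:

class Foo(object):
    def bar(self):
        return 'hello'

f = Foo()
assert f.bar() == Foo.bar(f) == f.__class__.bar(f)
assert 'bar' not in vars(f)  # the instance carries no method table
assert 'bar' in vars(Foo)    # the function lives on the class, once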
Each object has its own copy of the data members, whereas the member functions are shared. The compiler creates one copy of the member functions, separate from all objects of the class. All the objects of the class share this one copy.
The whole point of OOP is to combine data and functions together. Without OOP, the data cannot be reused, only the functions can be reused.
I'm interested in hearing some discussion about class attributes in Python. For example, what is a good use case for class attributes? For the most part, I can not come up with a case where a class attribute is preferable to using a module level attribute. If this is true, then why have them around?
The problem I have with them, is that it is almost too easy to clobber a class attribute value by mistake, and then your "global" value has turned into a local instance attribute.
Feel free to comment on how you would handle the following situations:
Constant values used by a class and/or its sub-classes. This may include "magic number" dictionary keys or list indexes that will never change, but possibly need one-time initialization.
A default class attribute that on rare occasions is updated for a special instance of the class.
Global data structure used to represent an internal state of a class shared between all instances.
A class that initializes a number of default attributes, not influenced by constructor arguments.
Some Related Posts:
Difference Between Class and Instance Attributes
Regarding #4:
I never use class attributes to initialize default instance attributes (the ones you normally put in __init__). For example:
class Obj(object):
    def __init__(self):
        self.users = 0
and never:
class Obj(object):
    users = 0
Why? Because it's inconsistent: it doesn't do what you want when you assign anything but an immutable object:
class Obj(object):
    users = []
causes the users list to be shared across all objects, which in this case isn't wanted. It's confusing to split these into class attributes and assignments in __init__ depending on their type, so I always put them all in __init__, which I find clearer anyway.
As for the rest, I generally put class-specific values inside the class. This isn't so much because globals are "evil"--they're not so big a deal as in some languages, because they're still scoped to the module, unless the module itself is too big--but if external code wants to access them, it's handy to have all of the relevant values in one place. For example, in module.py:
class Obj(object):
    class Exception(Exception): pass
    ...
and then:
from module import Obj

try:
    o = Obj()
    o.go()
except Obj.Exception:
    print("error")
Aside from allowing subclasses to change the value (which isn't always wanted anyway), it means I don't have to laboriously import exception names and a bunch of other stuff needed to use Obj. "from module import Obj, ObjException, ..." gets tiresome quickly.
what is a good use case for class attributes
Case 0. Class methods are just class attributes. This is not just a technical similarity - you can access and modify class methods at runtime by assigning callables to them.
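To illustrate case 0: a method really is just a class attribute holding a callable, and it can be reassigned at runtime.

class Greeter:
    def greet(self):
        return 'hello'

g = Greeter()
Greeter.greet = lambda self: 'hi there'  # rebinding the class attribute
print(g.greet())  # 'hi there' -- even pre-existing instances see the change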
Case 1. A module can easily define several classes. It's reasonable to encapsulate everything about class A into A... and everything about class B into B.... For example,
# module xxx
class X:
    MAX_THREADS = 100
    ...

# main program
from xxx import X

if nthreads < X.MAX_THREADS: ...
Case 2. This class has lots of default attributes which can be modified in an instance. Here the ability to leave an attribute as a 'global default' is a feature, not a bug.
class NiceDiff:
    """Formats a time difference given in seconds into a form like '15 minutes ago'."""

    magic = .249
    pattern = 'in {0}', 'right now', '{0} ago'
    divisions = 1
    # there are more default attributes
One creates an instance of NiceDiff to use the existing or slightly modified formatting, but a localizer to a different language subclasses it to implement some functions in a fundamentally different way and to redefine constants:
class Разница(NiceDiff):  # NiceDiff localized to Russian
    '''Turns a time difference, like -300, into a concrete '5 минут назад' ('5 minutes ago').'''
    pattern = 'через {0}', 'прям щас', '{0} назад'
Your cases:
Constants -- yes, I put them in the class. It's strange to say self.CONSTANT = ..., so I don't see a big risk of clobbering them.
Default attributes -- mixed; as above, they may go in the class, but they may also go in __init__, depending on the semantics.
Global data structures -- these go in the class if used only by the class, but may also go in the module; in either case they must be very well-documented.
Class attributes are often used to allow overriding defaults in subclasses. For example, BaseHTTPRequestHandler has class constants sys_version and server_version, the latter defaulting to "BaseHTTP/" + __version__. SimpleHTTPRequestHandler overrides server_version to "SimpleHTTP/" + __version__.
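The same pattern in miniature, with hypothetical handler classes standing in for the real ones:

class BaseHandler:
    server_version = 'BaseHTTP/0.6'  # class-level default

    def version_string(self):
        return self.server_version

class SimpleHandler(BaseHandler):
    server_version = 'SimpleHTTP/0.6'  # subclass overrides the default

print(SimpleHandler().version_string())  # SimpleHTTP/0.6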
Encapsulation is a good principle: when an attribute is inside the class it pertains to instead of being in the global scope, this gives additional information to people reading the code.
In your situations 1-4, I would thus avoid globals as much as I can, and prefer using class attributes, which allow one to benefit from encapsulation.