Overview
I need to duplicate a whole inheritance tree of classes. Simply deep-copying the class objects does not work; a proper factory pattern would require a huge number of code changes; and I'm not sure how to use metaclasses to accomplish this.
Background
The software I work on implements support for specialized external hardware, connected to the host computer via USB. Many years ago, it was assumed that there would only ever be one type of hardware in use at a time. Consequently, the hardware object is used as a singleton, and over the years secondary classes came to be configured based on the currently active hardware class.
At the moment it is impossible to use this library with two types of hardware at the same time, since the class objects cannot be configured for both simultaneously.
In recent years we have avoided the issue by creating one Python process per hardware type, but this is becoming untenable.
Here is an extremely simplified example of the architecture:
# ----------
# Hardware classes
class HwBase():
    def customizeComponent(self, compDict):
        compDict['ComponentBase'].hardware = self

class HwA(HwBase):
    def customizeComponent(self, compDict):
        super().customizeComponent(compDict)
        compDict['AnotherComponent'].prop.configure(1, 2, 3)

class HwB(HwBase):
    def customizeComponent(self, compDict):
        super().customizeComponent(compDict)
        compDict['AnotherComponent'].prop.configure(4, 5, 6)
# ----------
# Property classes
class SpecialProperty(property):
    def __init__(self, fvalidate):
        self.fvalidate = fvalidate
        # handle fset, fget, etc. here.
        # super().__init__()
# ----------
# Component classes
class ComponentBase():
    hardware = None

    def validateProp(self, val):
        return val < self.maxVal
    prop = SpecialProperty(fvalidate=validateProp)

class SomeComponent(ComponentBase):
    """Users directly instantiate and use this component via an interactive shell.
    This component does complex operations with the hardware attribute."""
    def validateThing(self, val):
        return isinstance(val, ComponentBase)
    thing = SpecialProperty(fvalidate=validateThing)

class AnotherComponent(ComponentBase):
    """Users directly instantiate and use this component via an interactive shell.
    This component does complex operations with the hardware attribute."""
    maxVal = 15
# ----------
# Initialization
def initialize():
    """This is only called once per Python instance."""
    #activeCls = HwA
    activeCls = HwB
    allComponents = {
        'ComponentBase': ComponentBase,
        'SomeComponent': SomeComponent,
        'AnotherComponent': AnotherComponent
    }
    hwInstance = activeCls()
    hwInstance.customizeComponent(allComponents)
    return allComponents

components = initialize()
# ----------
# User code goes here
someInstance1 = components['SomeComponent']()
someInstance2 = components['SomeComponent']()
someInstance1.prop = 10
someInstance2.prop = 10
The overarching goal would be to interact with both HwA and HwB at the same time. Since most interactions are done via components instead of the Hw objects themselves, I believe the solution involves having multiple versions of the components, e.g.: two separate inheritance trees, for a total of 6 final components, one tree/set configured for each hardware. This is what I need help with.
Potential solutions
Consider that I have around ten different hardware types to configure for. Furthermore, there are hundreds of different leaf component classes, plus many extra base and mixin classes.
Move all configuration steps into the component's __init__ method
Not possible due to the use of properties; these need to be set on the class.
Deepcopy the class objects
Copy all class objects and swap in the appropriate __bases__. Mutable class variables need to be handled carefully. However, I'm not sure how to deal with properties here, since class-body references within the property objects (such as fvalidate) need to be updated to those of the copied class.
This requires a significant amount of manual intervention to work. Not impossible, but prone to breaking in the long term.
Factory pattern
Wrap all component definitions in factory functions:
def ComponentBaseFactory(hw):
    class SomeComponent(cache[hw].ComponentBase):
        pass
and have some sort of component cache which would handle creating all class objects during initialize().
This is what I consider the most architecturally correct option available. Since the class body is re-executed on every factory call, the attributes of the properties will reference the appropriate class object.
Downside: huge code footprint. I am familiar with doing codebase-wide changes via sed or Python scripts, but this would be quite a lot.
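For illustration, a minimal sketch of how the cache-driven factories could fit together (build_components and the bare cache dict are made-up names, the class bodies are trimmed, and this assumes SpecialProperty gains the configure() method that customizeComponent calls):

cache = {}

def build_components(hw_cls):
    """Re-execute the component class bodies, configured for one hardware class."""
    class ComponentBase:
        hardware = None

        def validateProp(self, val):
            return val < self.maxVal
        prop = SpecialProperty(fvalidate=validateProp)

    class AnotherComponent(ComponentBase):
        maxVal = 15

    components = {'ComponentBase': ComponentBase,
                  'AnotherComponent': AnotherComponent}
    hw_cls().customizeComponent(components)
    cache[hw_cls] = components
    return components

# Each hardware type now gets its own, independently configured tree:
components_a = build_components(HwA)
components_b = build_components(HwB)

Because the class bodies re-execute on every call, each tree gets fresh property objects whose references point at the newly created classes.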
Add metaclasses to components
I am not sure how to proceed with this. Based on the Python data model (py3.7), the following happens at class creation (right after the class definition's indentation ends):
MRO entries are resolved;
the appropriate metaclass is determined;
the class namespace is prepared;
the class body is executed;
the class object is created.
I would need to redo these steps after the class has been defined (like a factory function!), but I'm not sure how to redo step 4. Specifically, the Python documentation states in section 3.3.3.5 that the class body is executed (approximately) as exec(body, globals(), namespace). How can I re-exec the class body with a different set of locals/globals? Even if I access the class body's code with inspect shenanigans, I'm not sure I'll be able to reproduce the module environment properly.
Even if I mess with __prepare__ and __new__, I don't see how I can fix the cross-references introduced in the class code block regarding the property instantiation.
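For what it's worth, types.new_class exposes those creation steps programmatically, provided the body can be expressed as a callable that fills in the prepared namespace; a minimal sketch (component_body is a made-up name):

import types

def component_body(ns):
    # Plays the role of step 4: executed against the prepared namespace.
    def validateProp(self, val):
        return val < self.maxVal
    ns['validateProp'] = validateProp
    ns['prop'] = SpecialProperty(fvalidate=validateProp)

# Steps 1-3 and 5 (MRO resolution, metaclass determination, namespace
# preparation, class object creation) are handled by new_class:
FreshComponentBase = types.new_class('ComponentBase', (), exec_body=component_body)

This still does not re-execute the source of an already-written class body, which is the crux of the problem.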
Components as metaclasses
A metaclass is a class factory, just like a class is an object factory. SomeComponent and AnotherComponent could be declared as metaclasses, then get instantiated with the Hw object during initialize():
SomeComponent = SomeComponentMeta(hw)
This is similar to the factory pattern, but would also require quite a few code changes: a lot of class code would have to be moved to the metaclass __init__.
I'd have to spend a lot more time here to properly understand what you need, but if your "TL;DR" of executing the class body with different global/nonlocal variables is the bottom line, the factory approach is a very clean and readable way, as you had considered.
At first I did not think a metaclass would be a good approach here, although it could be used to customize your special properties (on my first read, I could not figure out what they actually do, and how they should differ between your final classes). If the function as a class factory can specialize your properties, it would work nonetheless.
If what you need is for the properties to be independent for HwA and HwB, as in accessing a different list object in HwA than is accessed in HwB, then yes, a metaclass could take care of that, by automatically recreating any properties when creating a subclass (so that the property objects themselves are not shared with the superclasses and across the hierarchy).
If that is what you need, leave a comment; I can write some proof-of-concept code.
Anyway, it is possible to create a metaclass that, upon instantiating a subclass, will look through the hierarchy for all SpecialProperty instances and create new instances of those for the subclass, so that a base value set on a superclass remains valid for the subclasses, but when configuration runs, each class has an independent configuration. (As it turns out, no metaclass is needed: we are covered by __init_subclass__.)
Another thing to take care of is that subclasses of property cannot simply be copied with Python's copy.copy (tested empirically), so we need a way to create reliable copies of them. I include one function below, but it might need to be improved to work with the actual SpecialProperty class.
from copy import copy

def copy_property(prop):
    cls = prop.__class__
    new_prop = cls.__new__(cls)
    # Initialize the attributes that can't be set from Python code, in place:
    property.__init__(new_prop, prop.fget, prop.fset, prop.fdel)
    if hasattr(prop, "__dict__"):  # only exists for subclasses of property
        # Possible adaptation needed: it may be that for some attributes of
        # SpecialProperty, a deepcopy would be needed.
        # But for the given example attribute of "fvalidate" a simple copy is better:
        new_prop.__dict__ = copy(prop.__dict__)
    return new_prop
# Python 3.6 introduced `__init_subclass__`, which is called at subclass _creation_
# time. With it, the logic can be inserted in ComponentBase and there is no need for
# a metaclass.
class ComponentBase():
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for attrname in dir(cls):
            attr = getattr(cls, attrname)
            if not isinstance(attr, SpecialProperty):
                continue
            new_prop = copy_property(attr)
            setattr(cls, attrname, new_prop)

    hardware = None
    ...
As you can see, there are some workarounds that had to be done because your project opted to subclass property. I am leaving this remark here as a reminder that unless property fits one's exact needs, it is cleaner to write a new class implementing the descriptor protocol, by implementing __set__, __get__ and __delete__ directly.
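For reference, a minimal sketch of that alternative: a stand-alone descriptor carrying the same fvalidate idea, so no property internals need copying (ValidatedAttribute and its storage scheme are made-up for illustration):

class ValidatedAttribute:
    """Sketch of a plain descriptor: per-instance storage, validation on set."""
    def __init__(self, fvalidate):
        self.fvalidate = fvalidate

    def __set_name__(self, owner, name):  # Python 3.6+
        self.storage = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage)

    def __set__(self, obj, value):
        if not self.fvalidate(obj, value):
            raise ValueError("validation failed for %s" % self.storage)
        setattr(obj, self.storage, value)

    def __delete__(self, obj):
        delattr(obj, self.storage)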
Related
In a nutshell, I receive JSON events via an API, and recently I've been learning a lot more about classes. One of the recommended ways to use classes is to implement getters, setters, etc. However, my classes aren't too sophisticated; all they're doing is parsing data from a JSON object and passing better-formatted data on to further ETL processes.
Below is a simple example of what I've encountered.
data = {'status': 'ready'}

class StatusHandler:
    def __init__(self, data):
        self.status = data.get('status', None)

class StatusHandler2:
    def __init__(self, data):
        self._status = data.get('status', None)

    @property
    def status(self):
        return self._status

without_getter = StatusHandler(data)
print(without_getter.status)

with_getter = StatusHandler2(data)
print(with_getter.status)
Is there anything wrong with me using the class StatusHandler and referencing a status instance variable and using that to pass information forward to other bits of code? I'm just wondering if further down the line as my project gets more complicated that this would be an issue as it doesn't seem to be standard although I could be wrong...
The point of getters/setters is to allow replacing plain attribute access with computed access without breaking client code if and when you have to change your implementation. Explicit getters/setters only make sense for languages that have no support for computed attributes.
Python has quite strong support for computed attributes through the descriptor protocol, including the generic builtin property type, so you don't need explicit getters/setters: if you have to change your implementation, just replace the affected public attributes with computed ones.
Just make sure not to abuse computed attributes; they should not do any heavy computation, external resource access, or the like. No one expects what looks like an attribute to have a high cost or raise IOErrors ;-)
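A small sketch of that point, with made-up handler names: client code written against a plain attribute keeps working unchanged when the attribute later becomes computed:

class Handler:                      # first version: plain attribute
    def __init__(self, data):
        self.status = data.get('status')

class Handler2:                     # later version: validation added
    VALID = {'ready', 'pending', 'failed'}

    def __init__(self, data):
        self.status = data.get('status')

    @property
    def status(self):
        return self._status

    @status.setter
    def status(self, value):
        if value is not None and value not in self.VALID:
            raise ValueError("unexpected status: %r" % value)
        self._status = value

h = Handler2({'status': 'ready'})
print(h.status)  # same attribute syntax as the first version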
EDIT
With regard to your example: computed attributes are a way to control attribute access, and making an attribute read-only (not providing a setter for your property) IS a perfectly valid use case, IF you have a reason to make it read-only, of course.
I've created an example class (a bitmask class) which has four really simple methods. I've also created a unit test for this class.
import unittest

class BitMask:
    def __init__(self):
        self.__mask = 0

    def set(self, slot):
        self.__mask |= (1 << slot)

    def remove(self, slot):
        self.__mask &= ~(1 << slot)

    def has(self, slot):
        return (self.__mask >> slot) & 1

    def clear(self):
        self.__mask = 0

class TestBitmask(unittest.TestCase):
    def setUp(self):
        self.bitmask = BitMask()

    def test_set_on_valid_input(self):
        self.bitmask.set(5)
        self.assertEqual(self.bitmask.has(5), True)

    def test_has_on_valid_input(self):
        self.bitmask.set(5)
        self.assertEqual(self.bitmask.has(5), True)

    def test_remove_on_valid_input(self):
        self.bitmask.set(5)
        self.bitmask.remove(5)
        self.assertEqual(self.bitmask.has(5), False)

    def test_clear(self):
        for i in range(16):
            self.bitmask.set(i)
        self.bitmask.clear()
        for j in range(16):
            with self.subTest(j=j):
                self.assertEqual(self.bitmask.has(j), False)
The problem I'm facing is that all these tests require both the set and has methods for setting and checking values in the bitmask, but these methods are themselves untested. I cannot confirm that one is correct without knowing that the other one is.
This example class isn't the first time I've experienced this issue. It usually occurs when I need to set up and check values/states within a class in order to test a method.
I've tried to find resources that explain this, but unfortunately their examples only use pure functions, or cases where the changed attribute can be read directly. I could solve the problem by extracting the methods as pure functions, or by using a read-only property that returns the attribute __mask.
But is this the preferred approach? If not, how do I test a method that needs to be set up and/or checked using untested methods?
Not sure this answers your question, as it deals with changing the initial class design, but here it goes.
You have a minimal class with no constructor arguments and no properties, which hides the state of your object. It is not that the set or has methods are untested; the issue is that the state of the object is unknown. Had you a .value property to reveal self.__mask, the question of testing .set() and .has() would be solved.
I would also strongly consider a default value in the constructor, which makes for a better-looking instantiation and allows easier testing (some advice on avoiding setters in Python is here).
def __init__(self, mask=0):
    self.__mask = mask
If there are design considerations that prevent you from having a .value property, perhaps an __eq__ method can be used, if __init__ accepts a value:
a = BitMask(0)
b = BitMask(1 << 5)  # the raw mask with bit 5 set
a.set(5)
assert a == b
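For the snippet above to run, BitMask would need an __eq__ along these lines; a sketch assuming the mask=0 constructor suggested earlier:

class BitMask:
    def __init__(self, mask=0):
        self.__mask = mask

    def set(self, slot):
        self.__mask |= (1 << slot)

    def __eq__(self, other):
        if not isinstance(other, BitMask):
            return NotImplemented
        # name mangling resolves to _BitMask__mask on both operands
        return self.__mask == other.__mask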
Of course, you can challenge that on how __eq__ is tested itself.
Finally, perhaps you are familiar with patching or monkey-patching: a technique to block something inside an object under test or make it work differently (e.g. imitating a web response without an actual call). With any of the patching libraries, I think you would still end up performing a kind of x.__mask = value assignment, which is not too reasonable for a small, nice, locally-defined class like the one here.
Hope this helps along the line of what you are exploring.
I would have used a single underscore instead of double, and just looked directly at _mask in the unit test.
Python doesn't really have private attributes or methods; even double-underscore attributes are accessible on your instance like this: obj._BitMask__mask.
Double underscore is used when you want subclasses not to overwrite the attribute of the superclass. To indicate "private", you should use a single underscore.
Allowing access to private fields is part of Python's design, so using this ability responsibly is not considered wrong, doubly so if you are accessing your own class.
The rationale behind "do not touch the private fields" is that you, as the developer, can mess something up with the internals of the class; also, the private interface of a library can change at any point and break your code.
When you are writing unit tests you are not afraid of messing with your own class, and you accept that you will have to change the unit tests if you change your class, so this programming idiom is not useful to apply here.
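A sketch of that suggestion applied to the example: a single underscore, with the test inspecting the attribute directly instead of going through has():

import unittest

class BitMask:
    def __init__(self):
        self._mask = 0

    def set(self, slot):
        self._mask |= (1 << slot)

class TestBitmask(unittest.TestCase):
    def test_set_on_valid_input(self):
        bm = BitMask()
        bm.set(5)
        # no dependency on has(): check the state directly
        self.assertEqual(bm._mask, 1 << 5)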
I have gone through this: What is a metaclass in Python?
But can anyone explain more specifically when I should use the metaclass concept and when it's very handy?
Suppose I have a class like below:
class Book(object):
    CATEGORIES = ['programming', 'literature', 'physics']

    def _get_book_name(self, book):
        return book['title']

    def _get_category(self, book):
        for cat in self.CATEGORIES:
            if book['title'].find(cat) > -1:
                return cat
        return "Other"

if __name__ == '__main__':
    b = Book()
    dummy_book = {'title': 'Python Guide of Programming', 'status': 'available'}
    print b._get_category(dummy_book)
For this class, in which situation should I use a metaclass, and why would it be useful?
Thanks in advance.
You use metaclasses when you want to mutate the class as it is being created. Metaclasses are hardly ever needed; they're hard to debug and difficult to understand, but occasionally they can make frameworks easier to use. In our 600 kloc code base we've used metaclasses 7 times: ABCMeta once, models.SubfieldBase from Django four times, and twice a metaclass that makes classes usable as views in Django. As @Ignacio writes, if you don't know that you need a metaclass (and have considered all other options), you don't need a metaclass.
Conceptually, a class exists to define what a set of objects (the instances of the class) have in common. That's all. It allows you to think about the instances of the class according to that shared pattern defined by the class. If every object was different, we wouldn't bother using classes, we'd just use dictionaries.
A metaclass is an ordinary class, and it exists for the same reason; to define what is common to its instances. The default metaclass type provides all the normal rules that make classes and instances work the way you're used to, such as:
Attribute lookup on an instance checks the instance followed by its class, followed by all superclasses in MRO order
Calling MyClass(*args, **kwargs) invokes i = MyClass.__new__(MyClass, *args, **kwargs) to get an instance, then invokes i.__init__(*args, **kwargs) to initialise it
A class is created from the definitions in a class block by making all the names bound in the class block into attributes of the class
Etc
If you want to have some classes that work differently to normal classes, you can define a metaclass and make your unusual classes instances of the metaclass rather than type. Your metaclass will almost certainly be a subclass of type, because you probably don't want to make your different kind of class completely different; just as you might want to have some sub-set of Books behave a bit differently (say, books that are compilations of other works) and use a subclass of Book rather than a completely different class.
If you're not trying to define a way of making some classes work differently to normal classes, then a metaclass is probably not the most appropriate solution. Note that the "classes define how their instances work" is already a very flexible and abstract paradigm; most of the time you do not need to change how classes work.
If you google around, you'll see a lot of examples of metaclasses that are really just being used to do a bunch of stuff around class creation; often automatically processing the class attributes, or finding new ones automatically from somewhere. I wouldn't really call those great uses of metaclasses. They're not changing how classes work; they're just processing some classes. A factory function to create the classes, or a class method that you invoke immediately after class creation, or best of all a class decorator, would be a better way to implement this sort of thing, in my opinion, as shown in the sketch below.
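As a sketch of that last suggestion, a class decorator doing the kind of "processing around class creation" that metaclasses are often (ab)used for; REGISTRY and registered are made-up names:

REGISTRY = {}

def registered(cls):
    # Runs once, immediately after class creation - no metaclass involved.
    REGISTRY[cls.__name__] = cls
    return cls

@registered
class Compilation(Book):
    """A Book subclass; the decorator records it the moment it is created."""

REGISTRY then maps 'Compilation' to the class, using ordinary, easy-to-debug code.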
But occasionally you find yourself writing complex code to get Python's default behaviour of classes to do something conceptually simple, and it actually helps to step "further out" and implement it at the metaclass level.
A fairly trivial example is the "singleton pattern", where you have a class of which there can only be one instance; calling the class will return an existing instance if one has already been created. Personally I am against singletons and would not advise their use (I think they're just global variables, cunningly disguised to look like newly created instances in order to be even more likely to cause subtle bugs). But people use them, and there are huge numbers of recipes for making singleton classes using __new__ and __init__. Doing it this way can be a little irritating, mainly because Python wants to call __new__ and then call __init__ on the result of that, so you have to find a way of not having your initialisation code re-run every time someone requests access to the singleton. But wouldn't be easier if we could just tell Python directly what we want to happen when we call the class, rather than trying to set up the things that Python wants to do so that they happen to do what we want in the end?
class Singleton(type):
    def __init__(self, *args, **kwargs):
        super(Singleton, self).__init__(*args, **kwargs)
        self.__instance = None

    def __call__(self, *args, **kwargs):
        if self.__instance is None:
            self.__instance = super(Singleton, self).__call__(*args, **kwargs)
        return self.__instance
Under 10 lines, and it turns normal classes into singletons simply by adding __metaclass__ = Singleton, i.e. nothing more than a declaration that they are a singleton. It's just easier to implement this sort of thing at this level, than to hack something out at the class level directly.
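For instance, a usage sketch with a made-up Config class (Python 2 spelling, matching the answer; in Python 3 you would write class Config(metaclass=Singleton)):

class Config(object):
    __metaclass__ = Singleton

a = Config()
b = Config()
assert a is b  # the second call returned the existing instance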
But for your specific Book class, it doesn't look like you have any need to do anything that would be helped by a metaclass. You really don't need to reach for metaclasses unless you find the normal rules of how classes work are preventing you from doing something that should be simple in a simple way (which is different from "man, I wish I didn't have to type so much for all these classes, I wonder if I could auto-generate the common bits?"). In fact, I have never actually used a metaclass for something real, despite using Python every day at work; all my metaclasses have been toy examples like the above Singleton or else just silly exploration.
A metaclass is used whenever you need to override the default behavior for classes, including their creation.
A class gets created from the name, a tuple of bases, and a class dict. You can intercept the creation process to make changes to any of those inputs.
You can also override any of the services provided by classes:
__call__ which is used to create instances
__getattribute__ which is used to lookup attributes and methods on a class
__setattr__ which controls setting attributes
__repr__ which controls how the class is displayed
In summary, metaclasses are used when you need to control how classes are created or when you need to alter any of the services provided by classes.
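A tiny sketch of the last service listed, using a made-up metaclass to control how the class itself is displayed (Python 3 spelling):

class DescribedMeta(type):
    def __repr__(cls):
        return "<class %s: %s>" % (cls.__name__, cls.__doc__)

class Described(metaclass=DescribedMeta):
    """handled by the metaclass"""

print(repr(Described))  # <class Described: handled by the metaclass>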
If you for whatever reason want to do stuff like Class[x], x in Class etc., you have to use metaclasses:
class Meta(type):
    def __getitem__(cls, x):
        return x ** 2

    def __contains__(cls, x):
        return int(x ** 0.5) == x ** 0.5

# Python 2.x
class Class(object):
    __metaclass__ = Meta

# Python 3.x
class Class(metaclass=Meta):
    pass

print Class[2]    # 4
print 4 in Class  # True
Check the link Meta Class Made Easy to learn how and when to use metaclasses.
I have a number of atomic classes (Components/Mixins, not really sure what to call them) in a library I'm developing, which are meant to be subclassed by applications. This atomicity was created so that applications can only use the features that they need, and combine the components through multiple inheritance.
However, sometimes this atomicity cannot be ensured, because one component may depend on another. For example, imagine I have a component that gives a graphical representation to an object, and another component which uses this graphical representation to perform some collision checking. The first is purely atomic; the latter, however, requires that the current object already subclasses the graphical-representation component, so that its methods are available. This is a problem, because we have to somehow tell the users of this library that, in order to use a certain component, they also have to subclass this other one. We could make the collision component subclass the visual component, but if the user also subclasses the visual component, it wouldn't work, because the class would not be on the same level (unlike a simple diamond relationship, which is desired), and would give cryptic metaclass errors which are hard for the programmer to understand.
Therefore, I would like to know if there is any cool way, through maybe metaclass redefinition or using class decorators, to mark these unatomic components so that, when they are subclassed, the additional dependency is injected into the current object if it's not yet available. Example:
class AtomicComponent(object):
    pass

@depends(AtomicComponent)  # <- something like this?
class UnAtomicComponent(object):
    pass

class UserClass(UnAtomicComponent):  # automatically includes AtomicComponent
    pass

class UserClass2(AtomicComponent, UnAtomicComponent):  # also works without problem
    pass
Can someone give me a hint on how I can do this? Or whether it is even possible...
edit:
Since it is debatable whether the metaclass solution is the best one, I'll leave this unaccepted for 2 days.
Other solutions might be to improve error messages; for example, doing something like UserClass2 would give an error saying that UnAtomicComponent already extends this component. This, however, creates the problem that it is impossible to use two UnAtomicComponents, given that they would subclass object on different levels.
"Metaclasses"
This is what they are for! At time of class creation, the class parameters run through the
metaclass code, where you can check the bases and change then, for example.
This runs without error - though it does not preserve the order of needed classes
marked with the "depends" decorator:
class AutoSubclass(type):
    def __new__(metacls, name, bases, dct):
        new_bases = set()
        for base in bases:
            if hasattr(base, "_depends"):
                for dependence in base._depends:
                    if dependence not in bases:
                        new_bases.add(dependence)
        bases = bases + tuple(new_bases)
        return type.__new__(metacls, name, bases, dct)

__metaclass__ = AutoSubclass

def depends(*args):
    def decorator(cls):
        cls._depends = args
        return cls
    return decorator
class AtomicComponent:
    pass

@depends(AtomicComponent)  # <- something like this?
class UnAtomicComponent:
    pass

class UserClass(UnAtomicComponent):  # automatically includes AtomicComponent
    pass

class UserClass2(AtomicComponent, UnAtomicComponent):  # also works without problem
    pass
(I removed inheritance from "object", as I declared a global __metaclass__ variable. All classes will still be new-style classes and have this metaclass. Inheriting from object or another class does override the global __metaclass__ variable, and a class-level __metaclass__ would have to be declared.)
-- edit --
Without metaclasses, the way to go is to have your classes properly inherit from their dependencies. They will no longer be that "atomic", but since they could not work being that atomic, it may not matter.
In the example below, classes C and D would be your user classes:
>>> class A(object): pass
...
>>> class B(A, object): pass
...
>>>
>>> class C(B): pass
...
>>> class D(B,A): pass
...
>>>
I'm interested in hearing some discussion about class attributes in Python. For example, what is a good use case for class attributes? For the most part, I can not come up with a case where a class attribute is preferable to using a module level attribute. If this is true, then why have them around?
The problem I have with them, is that it is almost too easy to clobber a class attribute value by mistake, and then your "global" value has turned into a local instance attribute.
Feel free to comment on how you would handle the following situations:
Constant values used by a class and/or sub-classes. This may include "magic number" dictionary keys or list indexes that will never change, but may possibly need one-time initialization.
A default class attribute that, on rare occasions, is updated for a special instance of the class.
Global data structure used to represent an internal state of a class shared between all instances.
A class that initializes a number of default attributes, not influenced by constructor arguments.
Some Related Posts:
Difference Between Class and Instance Attributes
Regarding #4:
I never use class attributes to initialize default instance attributes (the ones you normally put in __init__). For example:
class Obj(object):
    def __init__(self):
        self.users = 0
and never:
class Obj(object):
    users = 0
Why? Because it's inconsistent: it doesn't do what you want when you assign anything but an invariant object:
class Obj(object):
    users = []
causes the users list to be shared across all objects, which in this case isn't wanted. It's confusing to split these into class attributes and assignments in __init__ depending on their type, so I always put them all in __init__, which I find clearer anyway.
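A quick demonstration of that pitfall:

class Obj(object):
    users = []

a = Obj()
b = Obj()
a.users.append('alice')
print(b.users)  # ['alice'] - the single class-level list is shared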
As for the rest, I generally put class-specific values inside the class. This isn't so much because globals are "evil"--they're not so big a deal as in some languages, because they're still scoped to the module, unless the module itself is too big--but if external code wants to access them, it's handy to have all of the relevant values in one place. For example, in module.py:
class Obj(object):
    class Exception(Exception): pass
    ...
and then:
from module import Obj

try:
    o = Obj()
    o.go()
except o.Exception:
    print "error"
Aside from allowing subclasses to change the value (which isn't always wanted anyway), it means I don't have to laboriously import exception names and a bunch of other stuff needed to use Obj. "from module import Obj, ObjException, ..." gets tiresome quickly.
what is a good use case for class attributes
Case 0. Class methods are just class attributes. This is not just a technical similarity - you can access and modify class methods at runtime by assigning callables to them.
Case 1. A module can easily define several classes. It's reasonable to encapsulate everything about class A into A... and everything about class B into B.... For example,
# module xxx
class X:
    MAX_THREADS = 100
    ...

# main program
from xxx import X
if nthreads < X.MAX_THREADS: ...
Case 2. This class has lots of default attributes which can be modified on an instance. Here the ability to leave an attribute as a 'global default' is a feature, not a bug.
class NiceDiff:
    """Formats time difference given in seconds into a form '15 minutes ago'."""
    magic = .249
    pattern = 'in {0}', 'right now', '{0} ago'
    divisions = 1
    # there are more default attributes
One creates an instance of NiceDiff to use the existing or slightly modified formatting, but a localizer to a different language subclasses the class to implement some functions in a fundamentally different way and to redefine constants:
class Разница(NiceDiff):  # NiceDiff localized to Russian
    '''Из разницы во времени, типа -300, делает конкретно '5 минут назад'.'''
    pattern = 'через {0}', 'прям щас', '{0} назад'
Your cases:
Constants -- yes, I put them in the class. It's strange to say self.CONSTANT = ..., so I don't see a big risk of clobbering them.
Default attributes -- mixed; as above, they may go in the class, but they may also go in __init__, depending on the semantics.
Global data structures -- go in the class if used only by the class, but may also go in the module; in either case they must be very well documented.
Class attributes are often used to allow overriding defaults in subclasses. For example, BaseHTTPRequestHandler has class constants sys_version and server_version, the latter defaulting to "BaseHTTP/" + __version__. SimpleHTTPRequestHandler overrides server_version to "SimpleHTTP/" + __version__.
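The same pattern in user code might look like this; MyHandler is a made-up name, and http.server is the Python 3 home of BaseHTTPRequestHandler:

from http.server import BaseHTTPRequestHandler

class MyHandler(BaseHTTPRequestHandler):
    # overrides the class-attribute default inherited from the base class
    server_version = "MyHTTP/1.0"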
Encapsulation is a good principle: when an attribute is inside the class it pertains to instead of being in the global scope, this gives additional information to people reading the code.
In your situations 1-4, I would thus avoid globals as much as I can, and prefer using class attributes, which allow one to benefit from encapsulation.