Python design pattern for magic methods and pandas inheritance - python

I'm basically posting this question for some advice.
I have a few classes which do some pandas operations and return a DataFrame. But these DataFrames need addition and subtraction with some filter options applied. I planned to write a class that overrides the __add__ and __sub__ methods, so that these DataFrames are added or subtracted by my code, which implements those filters. Below is a basic structure:
import pandas as pd
from arith_operation import ArithOperation  # assuming ArithOperation lives in its own module

class A:
    def dataA(self, filenameA):
        dfa = pd.read_excel(filenameA)
        return ArithOperation(dfa)

class B:
    def dataB(self, filenameB):
        dfb = pd.read_excel(filenameB)
        return ArithOperation(dfb)
dfa and dfb here are pandas DataFrames.
class ArithOperation:
    def __init__(self, df):
        self.df = df

    def __add__(self, other):
        # here the filtering and manual addition is done
        return ArithOperation(sumdf)

    def __sub__(self, other):
        # here the filtering and manual subtraction is done
        return ArithOperation(deltadf)
Basically I do the calculation as below (the filenames are illustrative):
dfa = A().dataA("fileA.xlsx")
dfb = B().dataB("fileB.xlsx")
sumdf = dfa + dfb
deltadf = dfa - dfb
But how do I make sumdf and deltadf have the default DataFrame methods too? I know I should make ArithOperation inherit from the DataFrame class, but I am confused and a bit uncomfortable about instantiating ArithOperation() in many places.
Is there a better design pattern for this problem?
Which class should ArithOperation inherit from so that ArithOperation objects also have all the pandas DataFrame methods?

It looks to me like you want to customize the behaviour of an existing type (DataFrames in this case, but it could be any type). Usually we want to alter some of that behaviour, or extend it, or both. (When I say behaviour, think methods.)
One option is a has-a approach: wrapping an object by creating a new class whose instances hold a reference to the original object. That way you can create several new, different or similar methods that do new things, possibly invoking some of the original methods through the stored reference. In this way you adapt the original class's interface to a different one. This is known as the wrapper pattern (or adapter pattern).
That is what you have done. But then you face a problem: how do you expose all of the remaining methods of the original class? You would have to rewrite every method (not practical), just to delegate it to the wrapped object, or find a way of delegating them all except the few you override. I will not cover this last possibility in detail, because you have inheritance at your disposal, which makes things like this quite straightforward.
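(For reference, the catch-all delegation alluded to above is usually done with __getattr__; a minimal sketch of the idea, not from the original design, applied to the wrapper version of ArithOperation:)
class ArithOperation:
    def __init__(self, df):
        self.df = df

    def __add__(self, other):
        # the custom filtered addition would go here
        return ArithOperation(self.df + other.df)

    def __getattr__(self, name):
        # only called when normal attribute lookup fails, so every
        # method we did not define is forwarded to the wrapped DataFrame
        return getattr(self.df, name)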
Just inherit from the original class and you'll be able to create objects with the same behaviour as the parent class. If you need new data members, override __init__ and add them there, but don't forget to invoke the parent class's __init__, otherwise the object won't be initialized properly; that's where you use super().__init__(), as described below. If you want to add new methods, just add the usual defs to your child class. If you want to extend existing methods, do as described for __init__. If you want to completely override a method, just write your own version with a def of the same name as the original method; it will totally replace the original.
So in your case you want to inherit from the DataFrame class. You'll have to choose whether you need a custom __init__ or not; if you do, define one, but do not forget to call the parent's original __init__ inside it. Then define your custom methods, say a new __add__ and __sub__, that either replace or extend the original ones (with the same technique).
Be careful not to define methods that you think are new but actually exist in the original class, because they will be overridden. This is a small inconvenience of inheriting, especially if the original class has an interface with lots of methods.
Use super() to extend a parent class's behaviour
class A:
    pass  # has some original __init__

class B(A):  # B inherits from A
    def __init__(self, *args, **kwargs):
        # Preserve the parent's initialization by passing along the
        # received arguments. Without this call, the parent's __init__
        # is simply overridden and the parent part of the object is
        # never initialized properly.
        super().__init__(*args, **kwargs)
        # add your custom code here
The thing with super() is that it solves some problems that arise with multiple inheritance (inheriting from several classes) when determining which parent's method should be called, if several parents define a method with the same name; it is recommended over calling the parent's method directly with SomeParentClass.method() inside the child class.
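A minimal sketch of that point (hypothetical classes, just to illustrate): with a diamond of classes, super() visits each class exactly once in MRO order, which hard-coded parent calls would not guarantee.
class Base:
    def greet(self):
        print("Base")

class Left(Base):
    def greet(self):
        print("Left")
        super().greet()  # follows the MRO, not a hard-coded parent

class Right(Base):
    def greet(self):
        print("Right")
        super().greet()

class Child(Left, Right):
    def greet(self):
        print("Child")
        super().greet()

Child().greet()  # prints Child, Left, Right, Base -- each class runs once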
Also, if you subclass a type because you need it, you shouldn't be afraid of using it "everywhere", as you said. Make sure you customize it for a good reason (so think about whether doing this to DataFrames is appropriate, or whether there is a simpler way of achieving the same objective without it; I can't advise you here, as I have no experience with pandas), then use it instead of the original class.
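Applied to your case, a minimal sketch of the inheritance approach (the filtering logic is left as a stand-in; note the _constructor property, which a later answer on this page explains is what makes pandas operations return your subclass instead of a plain DataFrame):
import pandas as pd

class ArithOperation(pd.DataFrame):
    # pandas consults _constructor when an operation produces a new
    # same-shaped object, so results stay ArithOperation instances
    @property
    def _constructor(self):
        return ArithOperation

    def __add__(self, other):
        # your filtering would go here before adding; this sketch
        # just performs the normal element-wise addition
        return super().__add__(other)

dfa = ArithOperation({"x": [1, 2]})
print(type(dfa + dfa))  # <class '__main__.ArithOperation'>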

Related

Is there a way to decorate a class injecting a parent class?

I have a base class A and a decorator behavior. Both have different behaviors, but sometimes they can be used at the same time.
Is there a way to implement a new class decorator new_behavior that applies behavior and "injects" A as a parent class?
Something like this:
@new_behavior
class B:
    ...
So B will behave just as if it were declared like class B(A):, but will B also inherit all @behavior behaviors?
Broadly speaking, by the time a decorator gets a chance to operate on a class, it's too late to change fundamental properties of the class, like its bases. But that doesn't necessarily mean you can't do what you want, it only rules out direct approaches.
You could have your decorator create a new class with the desired bases and add the contents of the old class to the new one. But there are a lot of subtle details that might go wrong, like methods that don't play correctly with super, and other things that make it somewhat challenging. I would not want to do this on a whim.
One possible option that might be simpler than most is to make a new class that inherits from both the class you're decorating and the base class you want to add. That isn't exactly the same as injecting a base class as a base of the decorated class, but it will usually wind up with the same MRO, and super should work just fine. Here's how I'd implement that:
def new_behavior(cls):
    class NewClass(cls, A):  # do the multiple inheritance by adding A here
        pass
    NewClass.__name__ = f'New{cls.__name__}'  # should modify __qualname__ too
    return NewClass
I'm not applying any other decorators in that code, but you could do that by changing the last line to return some_other_decorator(NewClass) or just applying the decorator to the class statement with @decorator syntax. In order to make introspection nicer, you might want to modify a few attributes of NewClass before returning it. I demonstrate altering the __name__ attribute, but you would probably also want to change __qualname__ (which I've skipped doing because it would be a bit more fiddly and annoying to get something appropriate), and maybe some others that I can't think of off the top of my head.
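A hypothetical usage sketch, reusing the new_behavior defined above (A here is a stand-in base with one method):
class A:
    def hello(self):
        return "hello from A"

@new_behavior
class B:
    pass

b = B()
print(b.hello())        # "hello from A" -- reached through the injected base
print(type(b).__mro__)  # (NewB, B, A, object)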

Inherit from multiple classes and change the return type of all methods of one parent class

I want to create a new class that inherits from two parent classes, for example my own object class and the pandas DataFrame class. Now I want to overwrite the to_excel() method of pandas DataFrame to add some cosmetic stuff.
import pandas

class myObject(object):
    # some stuff
    pass

class myDataFrame(pandas.DataFrame, myObject):
    def to_excel(self, *args, **kwargs):
        super(myDataFrame, self).to_excel(*args, **kwargs)
        # some additional things
        return
The problem is: if an instance of myDataFrame is created, but an operation like the following is done:
a = myDataFrame(data=[1,2,3], index=["a", "b", "c"], columns=["Values"])
a = a.set_index("Values", drop=False)
Then set_index() will return an object of type pandas.DataFrame, not of type myDataFrame (of course I could use set_index(inplace=True), but that's not what I want). Using to_excel() now will, of course, not call my own method but pandas' original method.
In other words, I want objects that are instances of myDataFrame to never change their type. The class myDataFrame should somehow overwrite all methods of pandas.DataFrame and change their return type to myDataFrame, where necessary. Doing this hard-coded, by overwriting all methods by hand, feels unpythonic to me.
What could be the smartest way to do this? I know of metaclasses and decorators, but somehow I don't really understand how I would have to use them to achieve the goal. I also don't want to touch anything within the pandas module.
Thanks for hints!
See the Subclassing pandas Data Structures section of the pandas documentation on overriding constructor properties.
In short, also define a _constructor property that returns your subclass:
@property
def _constructor(self):
    return type(self)
This specific property is responsible for:
_constructor: Used when a manipulation result has the same dimensions as the original.
Additional constructors: _constructor_sliced and _constructor_expanddim can be defined for when an operation on the subclassed structure results in its dimensions getting changed.
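Put together with the question's class, a minimal sketch (dropping the extra myObject base for brevity; the to_excel body is a stand-in):
import pandas

class myDataFrame(pandas.DataFrame):
    @property
    def _constructor(self):
        # pandas uses this to build the results of same-dimension
        # manipulations, so set_index() etc. now return myDataFrame
        return type(self)

    def to_excel(self, *args, **kwargs):
        super(myDataFrame, self).to_excel(*args, **kwargs)
        # some additional things

a = myDataFrame(data=[1, 2, 3], index=["a", "b", "c"], columns=["Values"])
a = a.set_index("Values", drop=False)
print(type(a))  # <class '__main__.myDataFrame'>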

Re-initializing parent of a class

I have become stuck on a problem with a class that I am writing where I need to be able to reinitialize the parents of that class after having created an instance of the class. The problem is that the parent class has a read and a write mode that is determined by passing a string to the init function. I want to be able to switch between these modes without destroying the object and re-initialising. Here is an example of my problem:
from parent import Parent

class Child(Parent):
    def __init__(self, mode="w"):
        super.__init__(mode=mode)

    def switch_mode(self):
        # need to change the mode passed in the super call here somehow
        pass
The idea is to extend a class that I have imported from a module, to offer extended functionality. The problem is that I still need to be able to access the original class's methods from the new extended object. This has all worked smoothly so far, with me simply adding and overwriting methods as needed. As far as I can see, the alternative is to use composition rather than inheritance, so that the object I want to extend is created as a member of the new class. The problem with this is that it requires me to write a method for accessing each of the wrapped object's methods,
i.e. lots of this sort of thing:
def read_frames(self):
    return self.memberObject.read_frames()

def seek(self):
    return self.memberObject.seek()
which doesn't seem all that fantastic, and comes with the problem that if any new methods are added to the base class in the future, I have to create new methods manually in order to access them. But is that perhaps the only option?
Thanks in advance for any help!
This should work. super is a function.
super(Child, self).__init__(mode=mode)
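In context (a sketch; re-running the parent's __init__ from a switch_mode method is one way to read what was asked, with Parent and its mode argument taken from the question):
from parent import Parent

class Child(Parent):
    def __init__(self, mode="w"):
        super(Child, self).__init__(mode=mode)

    def switch_mode(self, mode):
        # re-invoke the parent's initialization with the new mode,
        # "re-initializing" the existing object in place
        super(Child, self).__init__(mode=mode)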

Trouble trying to dynamically add methods to Python class (i.e. django-tables2 'Table')

So for a Django project, I would really like to be able to generate and display tables (not based on querysets) dynamically without needing to know the contents or schema beforehand.
It looks like the django-tables2 app provides nice functionality for rendering tables, but it requires that you either explicitly declare column names by declaring attributes on a custom-defined Table subclass, or else provide a model for it to infer the columns from.
I.e, to use a column named "name", you'd do:
class NameTable(tables.Table):
    name = tables.Column()
The Table class does not provide a method for adding columns after the fact because, from reading the source, it seems to use a metaclass that sweeps the class attributes on __new__ and locks them in.
It seemed like very simple metaprogramming would be an elegant solution. I defined a basic class factory that accepts column names as arguments:
def define_table(columns):
    class klass(tables.Table):
        pass
    for col in columns:
        setattr(klass, col, tables.Column())
    return klass
Sadly this does not work. If I run
x = define_table(["foo", "bar"])(data)
x.foo
x.bar
I get back:
<django_tables2.columns.base.Column object at 0x7f34755af5d0>
<django_tables2.columns.base.Column object at 0x7f347577f750>
But if I list the columns:
print x.base_columns
I get back nothing, i.e. {}
I realize that there are probably simpler solutions (e.g. just bite the bullet and define every possible data configuration in code, or don't use django-tables2 and roll my own), but I am now treating this as an opportunity to learn more about meta programming, so I would really like to make this work this way.
Any idea what I'm doing wrong? My theory is that the __new__ method (which is redefined in the metaclass Table uses) is getting invoked when klass is defined rather than when it's instantiated, so by the time I tack on the attributes it's too late. But that violates my understanding of when __new__ should happen. Otherwise, I'm struggling to understand how the metaclass __new__ can tell the difference between defined-in-code attributes vs. dynamically defined ones.
Thanks!
You're on the right track here, but instead of creating a barebones class and adding attributes to it, you should use the type() built-in function. The reason it's not working the way you're trying is that the metaclass has already done its work.
Using type() allows you to construct a new class with your own attributes while setting the base class. Meaning: you get to describe the fields you want as a blueprint for your class, allowing the Table metaclass to take over after your definition.
Here's an example of using type() with django. I've used this myself for my own project (with some slight variations) but it should give you a nice place to start from, considering you're already almost there.
def define_table(columns):
    attrs = dict((c, tables.Column()) for c in columns)
    klass = type('DynamicTable', (tables.Table,), attrs)
    return klass
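Hypothetical usage, mirroring the question's call (data as in the question):
# the Table metaclass now sweeps up the columns, because type()
# builds the class the same way a class statement would
DynamicTable = define_table(["foo", "bar"])
x = DynamicTable(data)
print(x.base_columns)  # now contains the 'foo' and 'bar' columns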
You're confusing the __new__ of a "regular" class with the __new__ of a metaclass. As you note, Table relies on the __new__ method of its metaclass. The metaclass is indeed invoked when the class is defined. The class is itself an instance of the metaclass, so defining the class is instantiating the metaclass. (In this case, Table is an instance of DeclarativeColumnMetaclass.) So by the time the class is defined, it's too late.
One possible solution is to write a Table subclass that has some method refreshColumns or the like. You could adapt the code from DeclarativeColumnMetaclass.__new__ to essentially make refreshColumns do the same magic again. Then you could call refreshColumns() on your new class.
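A rough sketch of that idea (heavily simplified; the metaclass's actual logic in django-tables2 also handles ordering and inheritance, so treat this as an illustration only):
import django_tables2 as tables

class RefreshableTable(tables.Table):
    @classmethod
    def refreshColumns(cls):
        # sweep the class attributes for Column instances, the way the
        # metaclass does at class-definition time, and register them
        for name, value in list(vars(cls).items()):
            if isinstance(value, tables.Column):
                cls.base_columns[name] = value

setattr(RefreshableTable, 'foo', tables.Column())
RefreshableTable.refreshColumns()  # 'foo' now appears in base_columns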

In Python, when should I use a meta class?

I have gone through this: What is a metaclass in Python?
But can anyone explain more specifically when I should use the metaclass concept and when it's very handy?
Suppose I have a class like below:
class Book(object):
    CATEGORIES = ['programming', 'literature', 'physics']

    def _get_book_name(self, book):
        return book['title']

    def _get_category(self, book):
        for cat in self.CATEGORIES:
            if book['title'].find(cat) > -1:
                return cat
        return "Other"

if __name__ == '__main__':
    b = Book()
    dummy_book = {'title': 'Python Guide of Programming', 'status': 'available'}
    print b._get_category(dummy_book)
For this class, in which situation should I use a metaclass, and why would it be useful?
Thanks in advance.
You use metaclasses when you want to mutate the class as it is being created. Metaclasses are hardly ever needed, they're hard to debug, and they're difficult to understand -- but occasionally they can make frameworks easier to use. In our 600-kloc code base we've used metaclasses 7 times: ABCMeta once, models.SubfieldBase from Django 4 times, and twice a metaclass that makes classes usable as views in Django. As @Ignacio writes, if you don't know that you need a metaclass (and have considered all other options), you don't need a metaclass.
Conceptually, a class exists to define what a set of objects (the instances of the class) have in common. That's all. It allows you to think about the instances of the class according to that shared pattern defined by the class. If every object was different, we wouldn't bother using classes, we'd just use dictionaries.
A metaclass is an ordinary class, and it exists for the same reason: to define what is common to its instances. The default metaclass type provides all the normal rules that make classes and instances work the way you're used to, such as:
Attribute lookup on an instance checks the instance followed by its class, followed by all superclasses in MRO order
Calling MyClass(*args, **kwargs) invokes i = MyClass.__new__(MyClass, *args, **kwargs) to get an instance, then invokes i.__init__(*args, **kwargs) to initialise it
A class is created from the definitions in a class block by making all the names bound in the class block into attributes of the class
Etc
If you want to have some classes that work differently to normal classes, you can define a metaclass and make your unusual classes instances of the metaclass rather than type. Your metaclass will almost certainly be a subclass of type, because you probably don't want to make your different kind of class completely different; just as you might want to have some sub-set of Books behave a bit differently (say, books that are compilations of other works) and use a subclass of Book rather than a completely different class.
If you're not trying to define a way of making some classes work differently to normal classes, then a metaclass is probably not the most appropriate solution. Note that the "classes define how their instances work" is already a very flexible and abstract paradigm; most of the time you do not need to change how classes work.
If you google around, you'll see a lot of examples of metaclasses that are really just being used to go do a bunch of stuff around class creation; often automatically processing the class attributes, or finding new ones automatically from somewhere. I wouldn't really call those great uses of metaclasses. They're not changing how classes work, they're just processing some classes. A factory function to create the classes, or a class method that you invoke immediately after class creation, or best of all a class decorator, would be a better way to implement this sort of thing, in my opinion.
But occasionally you find yourself writing complex code to get Python's default behaviour of classes to do something conceptually simple, and it actually helps to step "further out" and implement it at the metaclass level.
A fairly trivial example is the "singleton pattern", where you have a class of which there can only be one instance; calling the class will return the existing instance if one has already been created. Personally, I am against singletons and would not advise their use (I think they're just global variables, cunningly disguised to look like newly created instances in order to be even more likely to cause subtle bugs). But people use them, and there are huge numbers of recipes for making singleton classes using __new__ and __init__. Doing it this way can be a little irritating, mainly because Python wants to call __new__ and then call __init__ on the result of that, so you have to find a way of not having your initialisation code re-run every time someone requests access to the singleton. But wouldn't it be easier if we could just tell Python directly what we want to happen when we call the class, rather than trying to set up the things that Python wants to do so that they happen to do what we want in the end?
class Singleton(type):
    def __init__(self, *args, **kwargs):
        super(Singleton, self).__init__(*args, **kwargs)
        self.__instance = None

    def __call__(self, *args, **kwargs):
        if self.__instance is None:
            self.__instance = super(Singleton, self).__call__(*args, **kwargs)
        return self.__instance
Under 10 lines, and it turns normal classes into singletons simply by adding __metaclass__ = Singleton, i.e. nothing more than a declaration that they are a singleton. It's just easier to implement this sort of thing at this level, than to hack something out at the class level directly.
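For instance (a sketch, using the Python 3 metaclass syntax shown in a later answer; in Python 2 you'd set __metaclass__ = Singleton in the class body instead):
class Config(metaclass=Singleton):
    pass

a = Config()
b = Config()
print(a is b)  # True -- the second call returned the existing instance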
But for your specific Book class, it doesn't look like you have any need to do anything that would be helped by a metaclass. You really don't need to reach for metaclasses unless you find the normal rules of how classes work are preventing you from doing something that should be simple in a simple way (which is different from "man, I wish I didn't have to type so much for all these classes, I wonder if I could auto-generate the common bits?"). In fact, I have never actually used a metaclass for something real, despite using Python every day at work; all my metaclasses have been toy examples like the above Singleton or else just silly exploration.
A metaclass is used whenever you need to override the default behavior for classes, including their creation.
A class gets created from the name, a tuple of bases, and a class dict. You can intercept the creation process to make changes to any of those inputs.
You can also override any of the services provided by classes:
__call__ which is used to create instances
__getattribute__ which is used to lookup attributes and methods on a class
__setattr__ which controls setting attributes
__repr__ which controls how the class is displayed
In summary, metaclasses are used when you need to control how classes are created or when you need to alter any of the services provided by classes.
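A small hypothetical sketch overriding two of those services:
class VerboseMeta(type):
    def __repr__(cls):
        # controls how the class itself is displayed
        return "<verbose class %r>" % cls.__name__

    def __call__(cls, *args, **kwargs):
        # intercepts instance creation
        print("creating an instance of %s" % cls.__name__)
        return super(VerboseMeta, cls).__call__(*args, **kwargs)

class Widget(metaclass=VerboseMeta):
    pass

print(Widget)  # <verbose class 'Widget'>
w = Widget()   # prints: creating an instance of Widget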
If, for whatever reason, you want to do stuff like Class[x], x in Class, etc., you have to use metaclasses:
class Meta(type):
    def __getitem__(cls, x):
        return x ** 2

    def __contains__(cls, x):
        return int(x ** 0.5) == x ** 0.5

# Python 2.x
class Class(object):
    __metaclass__ = Meta

# Python 3.x
class Class(metaclass=Meta):
    pass

print Class[2]    # 4
print 4 in Class  # True
Check the link Meta Class Made Easy to learn how and when to use metaclasses.
