I have a module where I try to follow the SOLID principles to create and generate data, and I think the following design is based on the Liskov Substitution Principle:
from abc import ABC

class BaseLoader(ABC):
    def __init__(self, dataset_name='mnist'):
        self.dataset_name = dataset_name

class MNISTLoader(BaseLoader):
    def load(self):
        # Logic for loading the data
        pass

class OCTMNISTLoader(BaseLoader):
    def download(self):
        # Logic for downloading the data
        pass
Now I want to create an instance based on a parsed argument or a loaded config file. I wonder whether the following is best practice, or whether there are better ways to create an instance dynamically:
possible_instances = {'mnist': MNISTLoader, 'octmnist': OCTMNISTLoader}
chosen_dataset = 'mnist'
instance = possible_instances[chosen_dataset](dataset_name=chosen_dataset)
EDIT #1:
We also thought about using a function to select the classes dynamically. This function would then be placed inside the module that contains the classes:
def get_loader(loader_name: str) -> type[BaseLoader]:
    loaders = {
        'mnist': MNISTLoader,
        'octmnist': OCTMNISTLoader
    }
    try:
        # Returns the class itself; the caller instantiates it
        return loaders[loader_name]
    except KeyError as err:
        # CustomError is defined elsewhere in the module
        raise CustomError("good error message") from err
I am still not sure which is the most Pythonic way to solve this.
I wouldn't say this has much to do with LSP, since both classes inherit only from an abstract base class that never gets instantiated. You are simply sharing a dataset_name member of the base class to reduce code duplication. And forget about the default value in the argument dataset_name='mnist'; it serves no purpose the way you are using it.
I don't know much about Python, but to instantiate a class from a string you would usually want a factory class, in which you use whatever ugly method you come up with to map a string to an instance of the matching class, perhaps a simple if/elif. The factory method could contain something like this:
if loader_name == "mnist":
    return MNISTLoader(loader_name)
elif loader_name == "octmnist":
    return OCTMNISTLoader(loader_name)
# but you probably don't need to pass the loader_name argument
While the above is not an elegant solution, you now have a reusable class with a method that is the single source of truth for how strings map to types.
Also, you may remove the dataset_name init argument and member, unless you have a reason for objects to hold that information. I guess you tried to use the base init method like a factory, but that's not the way to go.
One problem your code has is the methods "load" and "download": how will you know which of the two methods to call if you fetch instances dynamically? Again, if you named them differently because you thought this had something to do with LSP, it doesn't. You lose the advantage of polymorphism by having two methods with different names. Just have both classes implement the same abstract method "load" (since you're using ABC) and be done with it. Then you can call whatever_the_object_type_is.load().
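The answer's advice can be sketched as follows, assuming Python 3. It swaps the if/elif for a dict lookup (the same mapping the question already uses); `create_loader`, `UnknownLoaderError`, and the placeholder return values are hypothetical, not part of the original code.

```python
from abc import ABC, abstractmethod


class BaseLoader(ABC):
    @abstractmethod
    def load(self):
        """Load the dataset; every concrete loader must implement this."""


class MNISTLoader(BaseLoader):
    def load(self):
        return "mnist data"  # placeholder for the real loading logic


class OCTMNISTLoader(BaseLoader):
    def load(self):
        return "octmnist data"  # placeholder for the real download/loading logic


class UnknownLoaderError(ValueError):
    """Hypothetical error raised for an unsupported dataset name."""


# Single source of truth for how config strings map to loader classes.
_LOADERS = {"mnist": MNISTLoader, "octmnist": OCTMNISTLoader}


def create_loader(name: str) -> BaseLoader:
    """Factory: return a new loader instance for the given dataset name."""
    try:
        loader_cls = _LOADERS[name]
    except KeyError:
        raise UnknownLoaderError(
            f"unknown dataset {name!r}; expected one of {sorted(_LOADERS)}"
        ) from None
    return loader_cls()


# The caller never needs to know the concrete type, only that it has load():
data = create_loader("mnist").load()
```

Because both classes implement the same `load`, adding a dataset means writing one new subclass and one dict entry; no caller has to change.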
I want to define an abstract base class, called ParentClass. I want every child class of ParentClass to have a method called "fit" that defines a property "required". If someone tries to create a child class which does not have the "required" property within its fit method, I want an error to be created when the object's fit method is called. I am having trouble doing this.
The context is that I want to create a Parent class, abstract or otherwise, that requires its children to behave in a certain way and have certain properties so that I can trust them to behave in certain ways no matter who is creating children classes. I have found similar questions, but nothing precisely like what I am asking.
My naive attempt was something like the following:
from abc import ABC, abstractmethod

class ParentClass(ABC):
    @abstractmethod
    def fit(self):
        self.required = True

class ChildClass(ParentClass):
    def __init__(self):
        pass

    def fit(self):
        self.required = True

class ChildClass2(ParentClass):
    def __init__(self):
        pass

    def fit(self):
        self.not_essential = True
This doesn't work, but if possible I would like to refactor ParentClass in such a way that if someone runs:
>> b = ChildClass()
>> b.fit()
everything works fine, but if someone tries to run
>> b2 = ChildClass2()
>> b2.fit()
an error is thrown because the fit method of ChildClass2 doesn't define "required".
Is this possible in Python?
A related question is whether there is a better way to think about structuring my problem. Perhaps there is a better paradigm to achieve what I want? I understand that I can force child classes to have certain methods defined. A clunky way to achieve what I want is to have every property I want defined be returned by a required method, but this feels very clunky, particularly if the number of properties I want to enforce as part of a standard becomes rather large.
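No answer to this question survives in the text above. One possible approach, sketched here assuming Python 3.6+ (for `__init_subclass__`), is for `ParentClass` to wrap each subclass's own `fit` so that a check runs after it returns. `REQUIRED_ATTRS` and the `checked_fit` wrapper are illustrative names, not an established API.

```python
import functools
from abc import ABC, abstractmethod


class ParentClass(ABC):
    # Attribute names every fit() must leave on the instance (illustrative list).
    REQUIRED_ATTRS = ("required",)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        fit = cls.__dict__.get("fit")
        if fit is None:
            return  # fit is inherited, so it is already wrapped (or still abstract)

        @functools.wraps(fit)
        def checked_fit(self, *args, **kw):
            result = fit(self, *args, **kw)
            missing = [name for name in self.REQUIRED_ATTRS if not hasattr(self, name)]
            if missing:
                raise AttributeError(
                    f"{type(self).__name__}.fit() did not set: {', '.join(missing)}"
                )
            return result

        cls.fit = checked_fit  # replace the subclass's fit with the checked version

    @abstractmethod
    def fit(self):
        """Subclasses must implement fit() and set every name in REQUIRED_ATTRS."""


class ChildClass(ParentClass):
    def fit(self):
        self.required = True


class ChildClass2(ParentClass):
    def fit(self):
        self.not_essential = True
```

With this, `ChildClass().fit()` succeeds and `ChildClass2().fit()` raises `AttributeError`. The check only verifies that the attribute exists once `fit` returns, so an attribute set in `__init__` or defined on the class would also satisfy it. Listing several names in `REQUIRED_ATTRS` scales better than one required method per property.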
I was looking into the following code.
On many occasions the __init__ method is not really used but there is a custom initialize function like in the following example:
def __init__(self):
    pass

def initialize(self, opt):
    # ...
    pass
This is then called as:
data_loader = CustomDatasetDataLoader()
# other instance method is called
data_loader.initialize(opt)
I see the problem that, if one forgets to call this custom initialize function, variables used in other instance methods could still be undefined. But what are the benefits of this approach?
Some APIs out in the wild (such as inside setuptools) have a similar kind of thing, and they use it to their advantage. The __init__ call can be used for the low-level internal API, while public constructors are defined as classmethods for the different ways one might construct objects. For instance, in pkg_resources.EntryPoint, the way to create instances of this class is to use the parse classmethod. A similar approach can be followed if a custom initialization is desired:
class CustomDatasetDataLoader(object):
    @classmethod
    def create(cls):
        """standard creation"""
        return cls()

    @classmethod
    def create_with_initialization(cls, opt):
        """create with special options."""
        inst = cls()
        # assign things from opt to inst, like
        # inst.some_update_method(opt.something)
        # inst.attr = opt.some_attr
        return inst
This way users of the class will not need two lines of code to do what a single line could do, they can just simply call CustomDatasetDataLoader.create_with_initialization(some_obj) if that is what they want, or call the other classmethod to construct an instance of this class.
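A runnable variant of this pattern, assuming `opt` is something like an `argparse.Namespace`; the `batch_size` and `shuffle` fields are hypothetical. Assigning every attribute in `__init__` (with defaults) also removes the question's worry about attributes being undefined, since the classmethods simply pass values through to it.

```python
from types import SimpleNamespace


class CustomDatasetDataLoader:
    def __init__(self, batch_size=1, shuffle=False):
        # Every attribute is assigned here, so no other method can see it undefined.
        self.batch_size = batch_size
        self.shuffle = shuffle

    @classmethod
    def create(cls):
        """Standard creation with default settings."""
        return cls()

    @classmethod
    def create_with_initialization(cls, opt):
        """Create an instance from an options object with matching fields."""
        return cls(batch_size=opt.batch_size, shuffle=opt.shuffle)


opt = SimpleNamespace(batch_size=32, shuffle=True)  # stand-in for parsed options
loader = CustomDatasetDataLoader.create_with_initialization(opt)
default_loader = CustomDatasetDataLoader.create()
```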
Edit: I see you had an example linked (I wish underlined links hadn't gone out of fashion). I feel that particular usage and implementation is a poor approach when a classmethod (or simply relying on the standard __init__) would be sufficient.
However, if that initialize function were an interface with some other system that receives an object of a particular type and invokes some method with it (e.g. something akin to the visitor pattern), it might make sense; as it is, it really doesn't.
In Python, I have a class that I've built.
However, there is one method where I apply a rather specific type of substring-search procedure. This procedure could be a standalone function by itself (it just requires a needle and a haystack string), but it feels odd to have the function outside the class, because my class depends on it.
What is the typical design paradigm for this? Is it typical to just have myClassName.py with the main class, as well as all the support functions outside the class itself, in the same file? Or is it better to have the support function embedded within the class at the expense of modularity?
You can create a staticmethod, like so:
class yo:
    @staticmethod
    def say_hi():
        print("Hi there!")
Then, you can do this:
>>> yo.say_hi()
Hi there!
>>> a = yo()
>>> a.say_hi()
Hi there!
It can be called both on an instance and on the class itself.
About where to put your functions...
If a method is required by a class, and it is appropriate for the method to operate on data that is specific to the class, then make it a method. This is what you would want:
class yo:
    def __init__(self):
        self.message = "Hello there!"

    def say_message(self):
        print(self.message)
My say_message relies on the data that is particular to the instance of a class.
If you feel the need to have a function in addition to the class method, by all means go ahead. Use whichever one is more appropriate in your script. There are many examples of this, including in the Python built-ins. Take generator objects, for example:
a = my_new_generator()
a.__next__()
Can also be done as:
a = my_new_generator()
next(a)
Use whichever is more appropriate, and obviously whichever one is more readable. :)
If you can think of any reason to override this function one day, make it a staticmethod; otherwise a plain function is just fine. FWIW, your class probably depends on much more than this simple function. And if you cannot think of any reason for anyone else to ever use this function, keep it in the same module as your class.
As a side note: "myClassName.py" is definitely unpythonic. First, because module names should be all_lower; second, because one-module-per-class is nonsense in Python: we group related classes and functions (and exceptions and whatnot) together.
If the search method you are talking about is really so specific and you will never need to reuse it somewhere else, I do not see any reason to make it static. The fact that it doesn't require access to instance variables doesn't make it static by definition.
If there is a possibility, that this method is going to be reused, refactor it into a helper/utility class (no static again).
ADDED:
Just wanted to add that when you consider whether something should be static, think about how the method name relates to the class name. Does the method name make more sense when used in a class context or an object context?
When and how are static methods supposed to be used in Python? We have already established that using a class method as a factory method to create an instance of an object should be avoided when possible. In other words, it is not best practice to use class methods as an alternate constructor (see Factory method for python object - best practice).
Let's say I have a class used to represent some entity data in a database. Imagine the data is a dict object containing field names and field values, and one of the fields is an ID number that makes the data unique.
class Entity(object):
    def __init__(self, data, db_connection):
        self._data = data
        self._db_connection = db_connection
Here my __init__ method takes the entity data dict object. Let's say I only have an ID number and I want to create an Entity instance. First I will need to find the rest of the data, then create an instance of my Entity object. From my previous question, we established that using a class method as a factory method should probably be avoided when possible.
class Entity(object):
    @classmethod
    def from_id(cls, id_number, db_connection):
        filters = [['id', 'is', id_number]]
        data = db_connection.find(filters)
        return cls(data, db_connection)

    def __init__(self, data, db_connection):
        self._data = data
        self._db_connection = db_connection

# Create entity
entity = Entity.from_id(id_number, db_connection)
Above is an example of what not to do, or at least what not to do if there is an alternative. Now I am wondering whether editing my class method so that it is more of a utility method and less of a factory method is a valid solution. In other words, does the following example comply with best practice for using static methods?
class Entity(object):
    @staticmethod
    def data_from_id(id_number, db_connection):
        filters = [['id', 'is', id_number]]
        data = db_connection.find(filters)
        return data

# Create entity
data = Entity.data_from_id(id_number, db_connection)
entity = Entity(data, db_connection)
Or does it make more sense to use a standalone function to find the entity data from an ID number?
def find_data_from_id(id_number, db_connection):
    filters = [['id', 'is', id_number]]
    data = db_connection.find(filters)
    return data

# Create entity.
data = find_data_from_id(id_number, db_connection)
entity = Entity(data, db_connection)
Note: I do not want to change my __init__ method. Previously people have suggested making my __init__ method look something like __init__(self, data=None, id_number=None), but there could be 101 different ways to find the entity data, so I would prefer to keep that logic separate to some extent. Does that make sense?
When and how are static methods supposed to be used in Python?
The glib answer is: Not very often.
The even glibber but not quite as useless answer is: When they make your code more readable.
First, let's take a detour to the docs:
Static methods in Python are similar to those found in Java or C++. Also see classmethod() for a variant that is useful for creating alternate class constructors.
So, when you need a static method in C++, you need a static method in Python, right?
Well, no.
In Java, there are no functions, just methods, so you end up creating pseudo-classes that are just bundles of static methods. The way to do the same thing in Python is to just use free functions.
That's pretty obvious. However, it's good Java style to look as hard as possible for an appropriate class to wedge a function into, so you can avoid writing those pseudo-classes, while doing the same thing is bad Python style—again, use free functions—and this is much less obvious.
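To make the contrast concrete (the names are hypothetical): the first form is the Java habit of a class that exists only to hold static methods; the second is the idiomatic Python equivalent, where the module itself provides the namespace.

```python
# Java-style: a "pseudo-class" that exists only to bundle static methods.
class StringUtils:
    @staticmethod
    def is_palindrome(text):
        return text == text[::-1]


# Idiomatic Python: a plain module-level function does the same job.
def is_palindrome(text):
    return text == text[::-1]
```

Callers write `StringUtils.is_palindrome(s)` in the first case and `is_palindrome(s)` (or `some_module.is_palindrome(s)`) in the second; the class adds nothing the module doesn't already give you.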
C++ doesn't have the same limitation as Java, but many C++ styles are pretty similar anyway. (On the other hand, if you're a "Modern C++" programmer who's internalized the "free functions are part of a class's interface" idiom, your instincts for "where are static methods useful" are probably pretty decent for Python.)
But if you're coming at this from first principles, rather than from another language, there's a simpler way to look at things:
A @staticmethod is basically just a global function. If you have a function foo_module.bar() that would be more readable for some reason if it were spelled as foo_module.BazClass.bar(), make it a @staticmethod. If not, don't. That's really all there is to it. The only problem is building up your instincts for what's more readable to an idiomatic Python programmer.
And of course use a @classmethod when you need access to the class, but not the instance; alternate constructors are the paradigm case for that, as the docs imply. Although you often can simulate a @classmethod with a @staticmethod just by explicitly referencing the class (especially when you don't have much subclassing), you shouldn't.
Finally, getting to your specific question:
If the only reason clients ever need to look up data by ID is to construct an Entity, that sounds like an implementation detail you shouldn't be exposing, and it also makes client code more complex. Just use a constructor. If you don't want to modify your __init__ (and you're right that there are good reasons you might not want to), use a @classmethod as an alternate constructor: Entity.from_id(id_number, db_connection).
On the other hand, if that lookup is something that's inherently useful to clients in other cases that have nothing to do with Entity construction, it seems like this has nothing to do with the Entity class (or at least no more than anything else in the same module). So, just make it a free function.
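A sketch combining both suggestions, assuming a connection object with the `find(filters)` method used in the question; `FakeConnection` and `AuditedEntity` are hypothetical stand-ins. The lookup is a free function that other code can reuse, and `from_id` is a classmethod alternate constructor. Because it uses `cls` rather than hard-coding `Entity`, a subclass calling `from_id` gets an instance of itself, which a `@staticmethod` naming `Entity` explicitly would not give you.

```python
class FakeConnection:
    """Hypothetical in-memory stand-in for the question's db_connection."""

    def __init__(self, rows):
        self._rows = rows

    def find(self, filters):
        # Supports only the single [field, 'is', value] filter used in the question.
        field, op, value = filters[0]
        assert op == "is"
        return next(row for row in self._rows if row[field] == value)


def find_data_from_id(id_number, db_connection):
    """Free function: useful to clients even when no Entity is being built."""
    return db_connection.find([["id", "is", id_number]])


class Entity:
    def __init__(self, data, db_connection):
        self._data = data
        self._db_connection = db_connection

    @classmethod
    def from_id(cls, id_number, db_connection):
        """Alternate constructor: fetch the data, then defer to __init__."""
        return cls(find_data_from_id(id_number, db_connection), db_connection)


class AuditedEntity(Entity):
    """Hypothetical subclass; from_id builds instances of it automatically."""


conn = FakeConnection([{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}])
entity = AuditedEntity.from_id(2, conn)
```

`__init__` stays untouched, as the question requires; any of the "101 different ways" to find the data can become another classmethod.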
The answer to the linked question specifically says this:
A @classmethod is the idiomatic way to do an "alternate constructor"; there are examples all over the stdlib: itertools.chain.from_iterable, datetime.datetime.fromordinal, etc.
So I don't know how you got the idea that using a classmethod is inherently bad. I actually like the idea of using a classmethod in your specific situation, as it makes following the code and using the api easy.
The alternative would be to use default constructor arguments like so:
class Entity(object):
    def __init__(self, id_number, db_connection, data=None):
        self.id = id_number
        self.db_connection = db_connection
        if data is None:
            self.data = self.from_id(id_number, db_connection)
        else:
            self.data = data

    @staticmethod
    def from_id(id_number, db_connection):
        filters = [['id', 'is', id_number]]
        return db_connection.find(filters)
I prefer the classmethod version that you wrote originally, however, especially since data is fairly ambiguous.
Your first example makes the most sense to me: Entity.from_id is pretty succinct and clear.
It avoids the use of data in the next two examples, which does not describe what's being returned; the data is used to construct an Entity. If you wanted to be specific about the data being used to construct the Entity, then you could name your method something like Entity.with_data_for_id or the equivalent function entity_with_data_for_id.
Using a verb such as find can also be pretty confusing, as it doesn't give any indication of the return value — what is the function supposed to do when it's found the data? (Yes, I realize str has a find method; wouldn't it be better named index_of? But then there's also index...) It reminds me of the classic:
I always try to think what a name would indicate to someone with (a) no knowledge of the system, and (b) knowledge of other parts of the system — not to say I'm always successful!
Here is a decent use case for @staticmethod.
I have been working on a game as a side project. Part of that game includes rolling dice based on stats, and the possibility of picking up items and effects that impact your character's stats (for better or worse).
When I roll the dice in my game, I need to basically say... take the base character stats and then add any inventory and effect stats into this grand netted figure.
You can't take these abstract objects and add them without instructing the program how. I'm not doing anything at the class level or instance level either. I didn't want to define the function in some global module. The last best option was to go with a static method for adding up stats together. It just makes the most sense this way.
class Stats:
    attribs = ['strength', 'speed', 'intellect', 'tenacity']

    def __init__(self,
                 strength=0,
                 speed=0,
                 intellect=0,
                 tenacity=0
                 ):
        self.strength = int(strength)
        self.speed = int(speed)
        self.intellect = int(intellect)
        self.tenacity = int(tenacity)

    # combine adds stats objects together and returns a single stats object
    @staticmethod
    def combine(*args: 'Stats'):
        assert all(isinstance(arg, Stats) for arg in args)
        return_stats = Stats()
        for stat in Stats.attribs:
            for other in args:
                setattr(return_stats, stat,
                        getattr(return_stats, stat) + getattr(other, stat))
        return return_stats
Which would make the stat combination calls work like this
a = Stats(strength=3, intellect=3)
b = Stats(strength=1, intellect=-1)
c = Stats(tenacity=5)
print(Stats.combine(a, b, c).__dict__)
{'strength': 4, 'speed': 0, 'intellect': 2, 'tenacity': 5}
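Since `combine` needs the class (it builds a new `Stats`) but not an instance, the earlier answer's rule of thumb would point to `@classmethod` here. A sketch of that variant, which also replaces the nested `setattr`/`getattr` loop with a dict of sums passed to the constructor; for valid `Stats` arguments it produces the same result as the version above.

```python
class Stats:
    attribs = ("strength", "speed", "intellect", "tenacity")

    def __init__(self, strength=0, speed=0, intellect=0, tenacity=0):
        self.strength = int(strength)
        self.speed = int(speed)
        self.intellect = int(intellect)
        self.tenacity = int(tenacity)

    @classmethod
    def combine(cls, *stats):
        """Return a new instance whose attributes are summed over all arguments."""
        if not all(isinstance(s, Stats) for s in stats):
            raise TypeError("combine() only accepts Stats objects")
        totals = {name: sum(getattr(s, name) for s in stats) for name in cls.attribs}
        return cls(**totals)


a = Stats(strength=3, intellect=3)
b = Stats(strength=1, intellect=-1)
c = Stats(tenacity=5)
print(Stats.combine(a, b, c).__dict__)
# {'strength': 4, 'speed': 0, 'intellect': 2, 'tenacity': 5}
```

Raising `TypeError` instead of using `assert` keeps the check active when Python runs with `-O`, and `cls(**totals)` means a `Stats` subclass calling `combine` gets an instance of itself.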
So for a Django project, I would really like to be able to generate and display tables (not based on querysets) dynamically without needing to know the contents or schema beforehand.
It looks like the django-tables2 app provides nice functionality for rendering tables, but it requires that you either explicitly declare column names by declaring attributes on a custom-defined Table subclass or else provide a model for it to infer the columns from.
I.e., to use a column named "name", you'd do:
import django_tables2 as tables

class NameTable(tables.Table):
    name = tables.Column()
The Table class does not provide a method for adding columns after the fact because, from reading the source, it seems to use a metaclass that sweeps up the class attributes in __new__ and locks them in.
It seemed like very simple metaprogramming would be an elegant solution. I defined a basic class factory that accepts column names as arguments:
def define_table(columns):
    class klass(tables.Table):
        pass
    for col in columns:
        setattr(klass, col, tables.Column())
    return klass
Sadly this does not work. If I run:
x = define_table(["foo", "bar"])(data)
x.foo
x.bar
I get back:
<django_tables2.columns.base.Column object at 0x7f34755af5d0>
<django_tables2.columns.base.Column object at 0x7f347577f750>
But if I list the columns:
print x.base_columns
I get back nothing i.e. {}
I realize that there are probably simpler solutions (e.g. just bite the bullet and define every possible data configuration in code, or don't use django-tables2 and roll my own), but I am now treating this as an opportunity to learn more about metaprogramming, so I would really like to make it work this way.
Any idea what I'm doing wrong? My theory is that the __new__ method (which is redefined in the metaclass that Table uses) is invoked when klass is defined rather than when it's instantiated, so by the time I tack on the attributes it's too late. But that violates my understanding of when __new__ should happen. Otherwise, I'm struggling to understand how the metaclass __new__ can tell the difference between attributes defined in code and dynamically defined ones.
Thanks!
You're on the right track here, but instead of creating a bare-bones class and adding attributes to it afterwards, you should use the type() built-in function. It doesn't work the way you're trying because the metaclass has already done its work by the time you add the attributes.
Using type() allows you to construct a new class with your own attributes while setting the base class. That means you describe the fields you want as a blueprint for your class, and the Table metaclass takes over after your definition.
Here's an example of using type() with Django. I've used this myself in my own project (with some slight variations), and it should give you a good place to start from, considering you're already almost there.
def define_table(columns):
    attrs = {c: tables.Column() for c in columns}
    klass = type('DynamicTable', (tables.Table,), attrs)
    return klass
You're confusing the __new__ of a "regular" class with the __new__ of a metaclass. As you note, Table relies on the __new__ method of its metaclass. The metaclass is indeed invoked when the class is defined: the class is itself an instance of the metaclass, so defining the class is instantiating the metaclass. (In this case, Table is an instance of DeclarativeColumnMetaClass.) So by the time the class is defined, it's too late.
One possible solution is to write a Table subclass that has some method refreshColumns or the like. You could adapt the code from DeclarativeColumnMetaclass.__new__ to essentially make refreshColumns do the same magic again. Then you could call refreshColumns() on your new class.
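The timing issue both answers describe can be reproduced without Django. The sketch below uses a toy metaclass (`ColumnCollectingMeta`, with a stand-in `Column`; neither is django-tables2's real code) that, like the metaclass described above, collects column attributes into `base_columns` once, when the class is created. It shows the question's `setattr` approach leaving `base_columns` empty, the `type()` approach working, and a hypothetical `refresh_columns` classmethod in the spirit of the `refreshColumns` idea.

```python
class Column:
    """Stand-in for a column type (not django-tables2's Column)."""


class ColumnCollectingMeta(type):
    """Toy metaclass: gathers Column attributes into base_columns at class creation."""

    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, attrs)
        cls.base_columns = mcls.collect(cls)  # runs once, at definition time
        return cls

    @staticmethod
    def collect(table_cls):
        # Walk the MRO base-first so subclasses can override inherited columns.
        columns = {}
        for klass in reversed(table_cls.__mro__):
            for key, value in vars(klass).items():
                if isinstance(value, Column):
                    columns[key] = value
        return columns


class Table(metaclass=ColumnCollectingMeta):
    @classmethod
    def refresh_columns(cls):
        """Hypothetical helper: re-run the metaclass's collection step."""
        cls.base_columns = ColumnCollectingMeta.collect(cls)


def define_table_setattr(columns):
    # The question's approach: the metaclass already ran before these setattr calls.
    class klass(Table):
        pass
    for col in columns:
        setattr(klass, col, Column())
    return klass


def define_table_type(columns):
    # The type() approach: the columns are in the namespace the metaclass receives.
    attrs = {col: Column() for col in columns}
    return type("DynamicTable", (Table,), attrs)


late = define_table_setattr(["foo", "bar"])
print(late.base_columns)           # {}
early = define_table_type(["foo", "bar"])
print(sorted(early.base_columns))  # ['bar', 'foo']
late.refresh_columns()
print(sorted(late.base_columns))   # ['bar', 'foo']
```

Calling `type(name, bases, attrs)` with a base whose metaclass is more specific than `type` dispatches to that metaclass's `__new__`, which is why the `type()` version sees the columns.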