I am building a tool that takes directories as inputs and performs actions where necessary. These actions vary depending on certain variables, so I created a few classes that help me with my needs in an organised fashion.
However, I hit a wall figuring out how to best design the following scenario.
For the sake of simplicity, let's assume there are only directories (no files). Also, the below is a heavily simplified example.
I have the following parent class:
# directory.py
from pathlib import Path

class Directory:
    def __init__(self, absolute_path):
        self.path = Path(absolute_path)

    def content(self):
        return [Directory(c) for c in self.path.iterdir()]
So, the parent class has a method that returns a Directory instance for each directory inside the initial absolute_path.
The class above holds all the methods that can be performed on any directory. Now, I have a separate class that inherits from it and adds further methods.
# special_directory.py
from directory import Directory

class SpecialDirectory(Directory):
    def __init__(self, absolute_path):
        super().__init__(absolute_path)

    # More methods
I am using an Object Factory like approach to build one or the other based on a condition like so:
# directory_factory.py
from directory import Directory
from special_directory import SpecialDirectory

def pick(path):
    return SpecialDirectory(path) if 'foo' in path else Directory(path)
So, if 'foo' exists in the path, it should be a SpecialDirectory instead, allowing it to do everything Directory does plus more.
The problem I'm facing is with the content() method. Both classes should be able to use it, but I don't want it limited to producing a list of Directory instances: if any of the contained directories matches "foo*", it should become a SpecialDirectory.
Directory doesn't (and shouldn't) know about SpecialDirectory, so I tried importing and using the factory inside content(), but that raises a circular import error (which makes sense).
I am not particularly stuck, as I have come up with a temporary fix, but it isn't pretty, so I was hoping to get some tips on an effective and clean solution for this specific situation.
What you need is sometimes called a "virtual constructor" which is a way to allow subclasses to determine what type of class instance is created when calling the base class constructor. There's no such thing in Python (or C++ for that matter), but you can simulate them. Below is an example of a way of doing this.
Note this code is very similar to what's in my answer to the question titled Improper use of __new__ to generate classes? (which has more information about the technique). Also see the one to What exactly is a Class Factory?
from pathlib import Path

class Directory:
    subclasses = []

    @classmethod
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.subclasses.append(cls)

    def __init__(self, absolute_path):
        self.path = Path(absolute_path)

    def __new__(cls, path):
        """ Create an instance of the appropriate subclass. """
        for subclass in cls.subclasses:
            if subclass.pick(path):
                return object.__new__(subclass)
        else:
            return object.__new__(cls)  # Default is this base class.

    def content(self):
        return [Directory(c) for c in self.path.iterdir()]

    def __repr__(self):
        classname = type(self).__name__
        return f'{classname}(path={self.path!r})'

    # More methods
    ...

class SpecialDirectory(Directory):
    def __init__(self, absolute_path):
        super().__init__(absolute_path)

    @classmethod
    def pick(cls, path):
        return 'foo' in str(path)

    # More methods
    ...

if __name__ == '__main__':
    root = './_obj_factory_test'
    d = Directory(root)
    print(d.content())
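With a root containing, say, a foo_stuff and a bar_stuff subdirectory, the print would show a mix of SpecialDirectory and Directory instances: content() still only ever calls Directory(c), but __new__ consults each registered subclass's pick() before falling back to the base class, so no factory import (and hence no circular import) is needed.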
class TaskInput:
    def __init__(self):
        self.cfg = my_config  #### Question: How do I do this only once?

class TaskA(TaskInput):
    def __init__(self):
        pass

class TaskB(TaskInput):
    def __init__(self):
        pass
There are many tasks like TaskA, TaskB, etc., and they all inherit from TaskInput.
Tasks also depend on something, let's say a configuration, which I only want to set ONCE.
The code has multiple Task classes, like TaskA, TaskB, etc. They all depend on this common configuration.
One natural way would be to make this configuration a class member of TaskInput, i.e., TaskInput.cfg = my_config, something that's initialized in __init__() of TaskInput.
However, if it's assigned in __init__() of TaskInput, it'll get executed multiple times, once for every new object of type TaskX that is created, as all those Tasks inherit from TaskInput.
What's the best practice and best way to accomplish this in Python?
Make the configuration a class attribute by defining it on the class rather than in __init__.
class TaskInput:
    cfg = my_config
It is now accessible as self.cfg on any instance of TaskInput or its children.
If the configuration isn't available when the class is defined (e.g. it's read later from a file), assign it on the class when it is available:
TaskInput.cfg = my_config
You have two choices as to how to handle writing the class definition in this situation:
1. Don't define cfg in the class definition at all, so you'll get a big juicy AttributeError if you try to access the configuration before it's available.
2. Define a default configuration which gets overwritten when the real configuration is available.
Generally I favor approach #1, since it "fails fast" (i.e. detects logic errors in your code when they occur rather than hiding them until something goes screwy later), but there are situations where you might need a default configuration to get things up and running before you can read the "real" configuration. In that case the default configuration should be the bare minimum possible, just what you need to get going.
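A minimal sketch of approach #1; the configure() classmethod and the config values are hypothetical, added here just to make the one-time assignment explicit:
import types  # not required; shown bare for clarity

class TaskInput:
    @classmethod
    def configure(cls, my_config):
        # One-time, explicit assignment on the class itself;
        # cfg is deliberately absent until this runs ("fail fast").
        cls.cfg = my_config

class TaskA(TaskInput):
    pass

TaskInput.configure({"retries": 3})  # made-up config value
print(TaskA().cfg)                   # {'retries': 3}, shared via the class attribute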
I will not try to guess your need so I will assume you mean exactly what you said below, namely that you want a single initialization of a class member, but done through the creation of an instance.
a class member of TaskInput, i.e., TaskInput.cfg = my_config, something
that's initialized in __init__() of TaskInput.
This can work, but not the way you did it. In your code you never created a class attribute; anything created with self is an instance attribute belonging to a single specific task instance. So:
from copy import deepcopy

class TaskInput:
    _cfg = None  # prefix with '_' to indicate it should be considered private

    def __init__(self, my_config=None):
        # Assign on the class, not the instance, so it only happens once.
        TaskInput._cfg = TaskInput._cfg or my_config

    @property
    def cfg(self):
        """ If you want to privatize it a bit more,
        make yourself a getter that returns a deep copy. """
        return deepcopy(self._cfg)
Now, there is basically no such thing as true privatization in Python, and you will never be able to entirely prevent manipulation. In the example above, any child has direct read-write access to _cfg, so it falls on us not to use it directly and to go through its accessors (__init__() and the cfg property) instead.
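For illustration, a usage sketch of the class above (the config values are made up):
t1 = TaskInput({"retries": 3})   # first instantiation sets the shared config
t2 = TaskInput({"retries": 99})  # ignored: _cfg is already set
print(t1.cfg == t2.cfg == {"retries": 3})  # True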
There's always a way to make things more difficult, like the following, using modules.
Project
├─ __init__.py
├─ settings.py
├─ module1.py
└─ module2.py
settings.py
cfg = None
module1.py
from copy import deepcopy
import settings

class A:
    def __init__(self, cfg_=None):
        settings.cfg = settings.cfg or cfg_

    @property
    def cfg(self):
        return deepcopy(settings.cfg)
module2.py
""" The following classes won't be able to
overwrite the config without importing
from settings.py.
"""
from module1 import A
class B(A):
pass
class C(A):
def __init__(self):
super().__init__("foobar")
Giving these results:
b0 = B()
b0.cfg
# > None
b1 = B({"foo1": "bar1"})
b1.cfg
# > {'foo1': 'bar1'}
b2 = B({"foo1": "bar2", "foo3": "bar3"})
b2.cfg
# > {'foo1': 'bar1'}
try:
    b2.cfg = 1234
except Exception as e:
    print(type(e), e)
# > <class 'AttributeError'> can't set attribute
b2.cfg
# > {'foo1': 'bar1'}
c = C("asdf")
c.cfg
# > {'foo1': 'bar1'}
Which can be overkill, of course, and removes the actual ownership of the configuration from the class.
TLDR;
I am using a @classmethod as a constructor for my class, and I need to override it with a different signature for one specific child class that needs extra parameters. PyCharm warns about overriding a method with a different signature, and I wonder whether that warning also applies to @classmethod constructors.
I am using the PyCharm IDE for my Python project, and I have received the following warning regarding the overriding of a method in a class:
Signature of method [...] does not match signature of base method in class [...]
I understand this is related to the Liskov substitution principle, meaning objects of a parent class should always be replaceable by objects of a child class.
However, in my case I am overriding a @classmethod which is used as a constructor, following some sort of factory pattern. A simplification of my code would be as follows:
class Parent:
    def __init__(self, common, data):
        self.common = common
        self.data = data

    @classmethod
    def from_directory(cls, data_dir, common):
        all_data = [load_data(data_file) for data_file in get_data_files(data_dir)]
        return [cls(common, data) for data in all_data]

class ChildA(Parent):
    def __init__(self, common, data, specific):
        super().__init__(common, data)
        self.specific = specific

    @classmethod
    def from_directory(cls, data_dir, common, specific):
        all_data = [load_data(data_file) for data_file in get_data_files(data_dir)]
        return [cls(common, data, specific) for data in all_data]
In this example, basically I have a parent class Parent with some common attribute that all child classes will inherit, and some particular child class ChildA which has an extra, subclass-specific attribute.
Since I am using the @classmethod as a constructor, I assume the Liskov principle does not apply, in the same way that the __init__() method can be overridden with a different signature. However, the PyCharm warning has made me consider whether there is something I might have missed. I am not sure whether I am using the @classmethod in a sensible way.
My main question is then: Is PyCharm being overzealous with its warnings here or is there any reason the pattern described above should be avoided?
Also, any feedback about any other design issues / misconceptions I might have is most welcome.
I would refine your class method. There are really two class methods to provide here: one that creates an instance of the class from a data file, and one that produces a list of instances from the files in a directory (using the first class method). Further, the class methods shouldn't care about which arguments cls will need: it just passes on whatever it receives (with the exception of data, which it knows about and will provide or override with whatever it reads from a file).
class Parent:
    def __init__(self, common, data, **kwargs):
        super().__init__(**kwargs)
        self.common = common
        self.data = data

    @classmethod
    def from_file(cls, filename, **kwargs):
        # If the caller provided a data argument,
        # ignore it and use the data from the file instead.
        kwargs['data'] = load_data(filename)
        return cls(**kwargs)

    @classmethod
    def from_directory(cls, data_dir, **kwargs):
        return [cls.from_file(data_file, **kwargs)
                for data_file in get_data_files(data_dir)]

class ChildA(Parent):
    def __init__(self, specific, **kwargs):
        super().__init__(**kwargs)
        self.specific = specific
Notice that you no longer need to override Parent.from_directory; it's already agnostic about what arguments it receives that are intended for __init__.
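A usage sketch of the refactored classes; the helper implementations below are throwaway stand-ins (the question assumes load_data() and get_data_files() exist elsewhere):
# Throwaway stand-ins so the sketch runs.
def get_data_files(data_dir):
    return [f"{data_dir}/a.csv", f"{data_dir}/b.csv"]

def load_data(filename):
    return f"contents of {filename}"

# from_directory is inherited unchanged; extra kwargs flow through to __init__.
children = ChildA.from_directory("/data", common="shared", specific=42)
print([(c.common, c.specific) for c in children])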
I think using a base class would be very helpful for a set of classes I am defining for an application. In the (possibly incorrect) example below, I outline what I'm going for: a base class containing an attribute that I won't want to define multiple times. In this case, the base class will define the base part of a file path that each child class will then use to build out their own more specific paths.
However, it seems like I'd have to pass parent_path to the __init__ method of the child classes anyway, regardless of the use of single inheritance from the base class.
import pathlib

class BaseObject:
    def __init__(self, parent_path: pathlib.Path):
        self.parent_path = parent_path

class ChildObject(BaseObject):
    def __init__(self, parent_path: pathlib.Path, child_path: pathlib.Path):
        super(ChildObject, self).__init__()
        self.full_path = parent_path.joinpath(child_path)

class ChildObject2(BaseObject):
    ...

class ChildObject3(BaseObject):
    ...
If this is the case, then is there any reason to use inheritance from a base class like this, other than to make it clearer what my implementation is trying to do?
I don't see an advantage in this implementation. As you've noted, you still have to pass the parent_path into the child instantiation. You also have to call the parent's __init__, which counteracts the one-line clarity "improvement".
To my eyes, you've already made it clear by using good attribute names. I'd switch from parent_path to base_path, so the reader doesn't look for a parent object.
Alternately, you might want to make that a class attribute of the parent: set it once, and let all the objects share it by direct reference, rather than passing in the same value for every instantiation.
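A minimal sketch of that class-attribute alternative (the base_path value here is made up):
import pathlib

class BaseObject:
    # Set once at class level; shared by every instance and subclass.
    base_path = pathlib.Path("/srv/app")

class ChildObject(BaseObject):
    def __init__(self, child_path: pathlib.Path):
        self.full_path = self.base_path / child_path

print(ChildObject(pathlib.Path("logs")).full_path)  # /srv/app/logs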
Yes, it is correct that you have to provide parent_path in the __init__ call of the parent, that is, super(ChildObject, self).__init__(parent_path) (you forgot to pass parent_path in your example).
However, this is Python, so there is usually help to avoid writing boilerplate code. In this case, I would recommend the attrs library. With it you can even avoid writing your __init__ methods altogether.
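A sketch of what that could look like with attrs; the field layout mirrors the question's classes, but treat it as an illustration rather than a drop-in (see the attrs docs for the current API):
import pathlib
import attr  # pip install attrs

@attr.s(auto_attribs=True)
class BaseObject:
    parent_path: pathlib.Path

@attr.s(auto_attribs=True)
class ChildObject(BaseObject):
    child_path: pathlib.Path = pathlib.Path(".")

    def __attrs_post_init__(self):
        # attrs generates __init__ for us; derived attributes go here.
        self.full_path = self.parent_path.joinpath(self.child_path)

co = ChildObject(pathlib.Path("/base"), pathlib.Path("docs"))
print(co.full_path)  # /base/docs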
To get some usefulness out of such an inheritance scheme, make your BaseObject more flexible so it accepts optional (keyword) arguments:
import pathlib

class BaseObject:
    def __init__(self, parent_path: pathlib.Path, child_path: pathlib.Path = None):
        self.parent_path = parent_path
        self.full_path = parent_path.joinpath(child_path) if child_path else parent_path

class ChildObject(BaseObject):
    ...

class ChildObject2(BaseObject):
    ...

class ChildObject3(BaseObject):
    ...

co = ChildObject(pathlib.Path('.'), pathlib.Path('../text_files'))
print(co, vars(co))
# <__main__.ChildObject object at 0x7f1a664b49b0> {'parent_path': PosixPath('.'), 'full_path': PosixPath('../text_files')}
How to get the filename of the subclass?
Example:
base.py:
class BaseClass:
    def __init__(self):
        # How to get the path "./main1.py"?
main1.py:
from base import BaseClass

class MainClass1(BaseClass):
    pass
Remember that self in BaseClass.__init__ is an instance of the actual class that's being initialised. Therefore, one solution is to ask that class which module it came from, and then ask that module for its path:
import importlib

class BaseClass:
    def __init__(self):
        m = importlib.import_module(self.__module__)
        print(m.__file__)
I think there are probably a number of ways you could end up with a module that you can't import, though; this doesn't feel like the most robust solution.
If all you're trying to do is identify where the subclass came from, then probably combining the module name and class name is sufficient, since that should uniquely identify it:
class BaseClass:
    def __init__(self):
        print("{}.{}".format(
            self.__module__,
            self.__class__.__name__,
        ))
You could do it by reaching back through the calling stack to get the global namespace of the caller of the BaseClass.__init__() method, and from that you can extract the name of the file it is in by using the value of the __file__ key in that namespace.
Here's what I mean:
base.py:
import sys

class BaseClass(object):
    def __init__(self):
        print('In BaseClass.__init__()')
        callers_path = sys._getframe(1).f_globals['__file__']
        print(' callers_path:', callers_path)
main1.py:
from base import BaseClass

class MainClass1(BaseClass):
    def __init__(self):
        super().__init__()

mainclass1 = MainClass1()
Sample output of running main1.py:
In BaseClass.__init__()
callers_path: the\path\to\main1.py
I think you're looking to the wrong mechanism for your solution. Your comments suggest that what you want is an exception handler with minimal trace-back capability. This is not something readily handled within the general class mechanism.
Rather, you should look into Python's stack-inspection capabilities. Very simply, you want your __init__ method to report the file name of the calling subclass. You can handle this by requiring the caller to pass its own __file__ value; in automated fashion, you can dig back one stack frame and access __file__ via that frame's context. Note that this approach assumes the only time you need this information is when __init__ is called directly from a subclass method.
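For concreteness, a minimal sketch of that idea using the inspect module (an alternative spelling of the sys._getframe approach shown above):
import inspect

class BaseClass:
    def __init__(self):
        # One frame up the stack is the caller; FrameInfo.filename
        # is the path of the file the call came from.
        caller = inspect.stack()[1]
        print('instantiated from:', caller.filename)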
Is that enough to get you to the right documentation?
I am having trouble with this setup mainly because I am not sure what I actually want in order to solve this problem.
This is the setup
- main.py
- lib
- __init__.py
- index.py
- test.py
__init__.py has this code
import os

for module in os.listdir(os.path.dirname(__file__) + "/."):
    if module == '__init__.py' or module[-3:] != '.py':
        continue
    __import__(module[:-3], locals(), globals())
del module
main.py has this code as of now
from lib.index import *

print(User.__dict__)
index.py has this code
class User(object):
    def test(self):
        return "hi"
test.py has this code
class User(object):
    def test2(self):
        return "hello"
When I execute main.py, it successfully prints the method test from index.py. What I am trying to do is figure out a way where I can just create a file in the lib folder, where that file has only one function, in the format:
class User(object):
    def newFunction(self):
        return abc
and this function should automatically be available for me in main.py
I am sure this is not a hard thing to do, but I honestly don't know what I want (i.e., what to search for to solve this), which is preventing me from researching the solution.
You can use a metaclass to customize class creation and add functions defined elsewhere:
import types
import os
import os.path
import imp  # deprecated in modern Python; importlib offers equivalents

class PluginMeta(type):
    def __new__(cls, name, bases, dct):
        modules = [imp.load_source(filename, os.path.join(dct['plugindir'], filename))
                   for filename in os.listdir(dct['plugindir']) if filename.endswith('.py')]
        for module in modules:
            # use a separate loop variable so the class's own 'name' isn't shadowed
            for attr_name in dir(module):
                function = getattr(module, attr_name)
                if isinstance(function, types.FunctionType):
                    dct[function.__name__] = function
        return type.__new__(cls, name, bases, dct)

class User(metaclass=PluginMeta):
    plugindir = "path/to/the/plugindir"

    def foo(self):
        print("foo")

user = User()
print(dir(user))
Then in the plugin files, just create functions, not classes:
def newFunction(self, abc):
    self.abc = abc
    return self.abc
And the metaclass will find them, turn them into methods, and attach them to your class.
Classes are objects, and methods are nothing more than attributes on class-objects.
So if you want to add a method to an existing class, outside the original class block, all you are doing is adding an attribute to an object, which I would hope you know how to do:
class User(object):
    pass

def newFunction(self):
    return 'foo'

User.newFunction = newFunction
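A quick check that the externally attached function behaves like a normal method:
u = User()
print(u.newFunction())  # prints: foo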
agf's metaclass answer is basically a nifty automatic way of doing this, although it works by adding extra definitions to the class block before the class is created, rather than adding extra attributes to the class object afterwards.
That should be basically all you need to develop a framework in which things defined in one module are automatically added to a class defined elsewhere. But you still need to make a number of design decisions, such as:
- If your externally-defined functions need auxiliary definitions, how do you determine what's supposed to get added to the class and what was just a dependency?
- If you have more than one class you're extending this way, how do you determine what goes in which class?
- At what point(s) in your program does the auto-extension happen?
- Do you want to say in your class "this class has extensions defined elsewhere", or say in your extensions "this is an extension to a class defined elsewhere", or neither, and bind extensions to classes externally from both?
- Do you need to be able to have multiple versions of the "same" class with different extensions active at the same time?
A metaclass such as the one agf proposes can be a very good way of implementing this sort of framework, because it lets you put all the complex code in one place while still "tagging" every class that doesn't work the way classes normally work. It does fix the answers to some of the questions I posed above, though.
Here is working code we used in a project. I'm not sure it's the best way, but it worked, and there is almost no additional code to add to the other files.
cpu.py:
from cpu_base import CPU, CPUBase
import cpu_common
import cpu_ext
cpu_base.py:
__classes__ = []  # registry of every class layered onto the CPU hierarchy

def getClass():
    return __cpu__

def setClass(CPUClass):
    global __cpu__
    __cpu__ = CPUClass
    __classes__.append(CPUClass)

def CPU(*kw):
    return __cpu__(*kw)

class CPUBase:
    def __init__(self):
        # your init stuff here
        # optionally, each class can define a method named
        # <classname>_constructor to mimic __init__ for that layer
        for c in __classes__:
            constructor = getattr(c, c.__name__ + '_constructor', None)
            if constructor is not None:
                constructor(self)

setClass(CPUBase)
cpu_common.py:
from cpu_base import getClass, setClass

class CPUCommon(getClass()):
    def CPUCommon_constructor(self):
        pass

setClass(CPUCommon)
cpu_ext.py:
from cpu_base import getClass, setClass

class CPUExt(getClass()):
    pass

setClass(CPUExt)
To use the class, import CPU from cpu.py.
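For illustration, a hypothetical caller, assuming the modules above are importable:
from cpu import CPU

cpu = CPU()  # builds the most recently registered class (CPUExt here);
             # CPUBase.__init__ then runs each *_constructor in registration order
print(type(cpu).__name__)  # CPUExt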