I have a Python class that requires some data in order to be initialized. This data is usually obtained using a function from another module, which makes calls to an API. One of the parameters my class' initializer takes is the same ID that can be used to obtain the resource with the API.
Calling the API from inside the initializer and obtaining the data there would make for shorter (and cleaner?) initialization. But I am concerned this could make the class harder to test and introduce a dependency deep inside the code.
I'm trying to devise the best way to implement this in a maintainable and testable way.
Would it be bad to call the API module directly from within the initializer, and obtain the data it needs to complete initialization? Or is it better to just call the API from outside and pass the data to the initializer?
The "normal" way(1) is the pass the dependent function, module, or class, into the constructor itself.
Then, in your production code, pass in the real thing. In your test code, pass in a dummy one that will behave exactly as you desire for the specific test case.
That's actually a half-way measure between the two things you posit.
In other words, something like:
def do_something_with(something_generator):
    something = something_generator.get()
    print(something)

# Real code.
do_something_with(ProductionGenerator())

# Test code.
class TestGenerator:
    def get(self):
        return 42

do_something_with(TestGenerator())
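A hand-rolled stub works fine; if you'd rather not write one for every test, the standard library's unittest.mock can build the dummy for you. A minimal sketch using the do_something_with function above:

from unittest.mock import Mock

# Mock accepts arbitrary attribute access; configure get() to return 42.
fake_generator = Mock()
fake_generator.get.return_value = 42

do_something_with(fake_generator)        # prints 42
fake_generator.get.assert_called_once()  # verify the interaction happened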
If you're reluctant to always pass in a dependency, you can get around that with a default value, creating the real thing inside the function if none is given:
def do_something(something_generator=None):
    if something_generator is None:
        local_gen = ProductionGenerator()
    else:
        local_gen = something_generator
    something = local_gen.get()
    print(something)
# Real code.
do_something()

# Test code.
class TestGenerator:
    def get(self):
        return 42

do_something(TestGenerator())
(1) Defined, of course, as the way I do it :-)
I have a module where I try to follow the SOLID principles when creating and generating data, and I think the following is based around the Liskov Substitution Principle:
from abc import ABC

class BaseLoader(ABC):
    def __init__(self, dataset_name='mnist'):
        self.dataset_name = dataset_name

class MNISTLoader(BaseLoader):
    def load(self):
        # Logic for loading the data
        pass

class OCTMNISTLoader(BaseLoader):
    def download(self):
        # Logic for downloading the data
        pass
Now I want to create an instance based on a parsed argument or a loaded config file, and I wonder if the following is best practice or if better ways exist to dynamically create an instance:
possible_instances = {'mnist': MNISTLoader, 'octmnist': OCTMNISTLoader}
chosen_dataset = 'mnist'
instance = possible_instances[chosen_dataset](dataset_name=chosen_dataset)
EDIT #1:
We also thought about using a function to call the classes dynamically. This function is then placed inside the module that contains the classes:
def get_loader(loader_name: str) -> BaseLoader:
    loaders = {
        'mnist': MNISTLoader,
        'octmnist': OCTMNISTLoader
    }
    try:
        return loaders[loader_name]
    except KeyError as err:
        raise CustomError("good error message") from err
I am still not sure which is the most Pythonic way to solve this.
I wouldn't say this has much to do with the LSP, since both classes inherit only from an abstract base class that never gets instantiated. You are simply sharing the dataset_name member of the base class to reduce code duplication. And forget about the default value in the dataset_name='mnist' argument; it has no point the way you are using it.
I don't know much about Python, but to instantiate a class from a string you would usually want a factory class, where you use whatever ugly method you come up with to map a string to an instance of the matching class, perhaps a simple if/elif. The factory method could contain something like this:
if loader_name == "mnist":
return MNISTLoader(loader_name)
elif loader_name == "octmnist":
return OCTMNISTLoader(loader_name)
# but you probably don't need to pass loader_name argument
While the above is not an elegant solution, you now have a reusable class with a method that is the single source of truth for how strings map to types.
Also, you may remove the dataset_name init argument and member unless you have a reason for objects to hold that information. I guess you tried to use the base init method like a factory, but that's not the way to go.
One problem your code has is the pair of methods load and download: how will you know which of the two to call if you fetch instances dynamically? Again, if you named them differently because you thought this had something to do with the LSP, it doesn't. You are losing the advantage of polymorphism by having two methods with different names. Just have both classes implement the same abstract method (since you're using the ABC machinery), say load, and be done with it. Then you can call whatever_the_object_type_is.load().
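Putting those two suggestions together, here is a minimal sketch; the class names come from the question, ValueError stands in for the question's CustomError, and the method bodies are stubs:

from abc import ABC, abstractmethod

class BaseLoader(ABC):
    @abstractmethod
    def load(self):
        """Fetch the dataset and return it."""

class MNISTLoader(BaseLoader):
    def load(self):
        pass  # logic for loading MNIST

class OCTMNISTLoader(BaseLoader):
    def load(self):
        pass  # logic for downloading and then loading OCTMNIST

def get_loader(loader_name: str) -> BaseLoader:
    loaders = {'mnist': MNISTLoader, 'octmnist': OCTMNISTLoader}
    try:
        return loaders[loader_name]()  # note: returns an instance, not the class
    except KeyError as err:
        raise ValueError("unknown loader: {}".format(loader_name)) from err

# Callers no longer care which concrete class they get back:
get_loader('mnist').load()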
Suppose that I have a function in my Python application that defines some kind of context - a user_id, for example. This function calls other functions that do not take this context as an argument. For example:
def f1(user, operation):
    user_id = user.id
    # somehow define user_id as a global/context variable
    # for any function call inside this scope
    f2(operation)

def f2(operation):
    # do something, not important, and then call another function
    f3(operation)

def f3(operation):
    # get user_id if there is a variable user_id in the context, get None otherwise
    user_id = getcontext("user_id")
    # do something with user_id and operation
My questions are:
Can the Context Variables of Python 3.7 be used for this? How?
Is this what these Context Variables are intended for?
How to do this with Python v3.6 or earlier?
EDIT
For multiple reasons (architectural legacy, libraries, etc.) I can't/won't change the signature of intermediary functions like f2, so I can't just pass user_id as an argument, nor can I place all those functions inside the same class.
You can use contextvars in Python 3.7 for what you're asking about. It's usually really easy:
import contextvars

user_id = contextvars.ContextVar("user_id")

def f1(user, operation):
    user_id.set(user.id)
    f2()

def f2():
    f3()

def f3():
    print(user_id.get(default=None))  # gets the user_id value, or None if no value is set
The set method on the ContextVar returns a Token instance, which you can use to reset the variable to the value it had before the set operation took place. So if you wanted f1 to restore things the way they were (not really useful for a user_id context variable, but more relevant for something like setting the precision in the decimal module), you can do:
token = some_context_var.set(value)
try:
    do_stuff()  # can use the value from some_context_var via some_context_var.get()
finally:
    some_context_var.reset(token)
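That set/reset dance is a natural fit for a context manager. A minimal sketch; the set_context_var helper is my name, not part of the contextvars API:

import contextlib

@contextlib.contextmanager
def set_context_var(var, value):
    """Temporarily set a ContextVar, restoring the previous value on exit."""
    token = var.set(value)
    try:
        yield
    finally:
        var.reset(token)

with set_context_var(some_context_var, value):
    do_stuff()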
There's more to the contextvars module than this, but you almost certainly don't need to deal with the other stuff. You probably only need to be creating your own contexts and running code in other contexts if you're writing your own asynchronous framework from scratch.
If you're just using an existing framework (or writing a library that you want to play nice with asynchronous code), you don't need to deal with that stuff. Just create a global ContextVar (or look up one already defined by your framework) and get and set values on it as shown above, and you should be good to go.
A lot of contextvars use is probably going to be in the background, as an implementation detail of various libraries that want to have a "global" state that doesn't leak changes between threads or between separate asynchronous tasks within a single thread. The example above might make more sense in this kind of situation: f1 and f3 are part of the same library, and f2 is a user-supplied callback passed into the library somewhere else.
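To see that "no leaking between tasks" property in action, here is a small self-contained sketch (it assumes only Python 3.7+ and the standard library):

import asyncio
import contextvars

user_id = contextvars.ContextVar("user_id")

async def handle(uid):
    user_id.set(uid)             # each task runs in its own copy of the context
    await asyncio.sleep(0)       # yield so the other task gets a turn
    assert user_id.get() == uid  # the other task's set() did not leak in here

async def main():
    await asyncio.gather(handle(1), handle(2))

asyncio.run(main())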
Essentially, what you're looking for is a way to share state between a set of functions. The canonical way to do so in an object-oriented language is to use a class:
class Foo(object):
    def __init__(self, operation, user=None):
        self._operation = operation
        self._user_id = user.id if user else None

    def f1(self):
        print("in f1 : {}".format(self._user_id))
        self.f2()

    def f2(self):
        print("in f2 : {}".format(self._user_id))
        self.f3()

    def f3(self):
        print("in f3 : {}".format(self._user_id))

f = Foo(operation, user)
f.f1()
With this solution, your class instances (here f) are "the context" in which the functions are executed, each instance having its own dedicated context.
The functional-programming equivalent would be to use closures (a quick sketch follows). While Python supports closures, it is first and foremost an object-oriented language, so the OO solution is the most obvious.
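For completeness, a minimal closure-based sketch (user and operation stand for whatever objects the caller already has):

def make_context(user_id):
    # user_id is captured by the inner functions instead of being passed around

    def f3(operation):
        print("in f3 : {}".format(user_id))

    def f2(operation):
        f3(operation)

    def f1(operation):
        f2(operation)

    return f1

f1 = make_context(user.id)
f1(operation)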
And finally, the clean procedural solution is to pass this context (which can be expressed as a dict or any similar datatype) all along the call chain, as shown in DFE's answer.
As a general rule: relying on global variables or some "magic" context that might - or might not - be set by you-don't-know-who, you-don't-know-where, you-don't-know-when makes for code that is hard, if not impossible, to reason about, and that can break in the most unpredictable ways (googling for "globals evil" will yield an awful lot of literature on the topic).
You can use kwargs in your function calls in order to pass the context down the call stack:
def f1(user, operation):
    user_id = user.id
    # pass user_id as a keyword argument to every call inside this scope
    f2(operation, user_id=user_id)

def f2(operation, **kwargs):
    # do something, not important, and then call another function
    f3(operation, **kwargs)

def f3(operation, **kwargs):
    # get user_id if there is a user_id in the context, get None otherwise
    user_id = kwargs.get("user_id")
    # do something with user_id and operation
The kwargs dict plays the same role as your context variables, but is limited to the call stack. The values it holds are passed along by reference at each call rather than copied in memory.
In my opinion (though I would like to hear what you all think), context variables are an elegant way to legitimize global variables while keeping them under control.
I have a function foo that takes a parameter stuff.
Stuff can be something in a database, and I'd like to create a function that takes a stuff_id, gets the stuff from the database, and executes foo.
Here's my attempt to solve it:
1/ Create a second function with the suffix from_stuff_id:
def foo(stuff):
    ...  # do something

def foo_from_stuff_id(stuff_id):
    stuff = get_stuff(stuff_id)
    foo(stuff)
2/ Modify the first function
def foo(stuff=None, stuff_id=None):
    if stuff_id:
        stuff = get_stuff(stuff_id)
    ...  # do something
I don't like either way.
What's the most Pythonic way to do it?
Assuming foo is the main component of your application, go with your first way. Each function should have a single purpose. The moment you combine multiple purposes into one function, you can easily get lost in long streams of code.
If, however, some other function can also provide stuff, then go with the second.
The only thing I would add is: make sure you add docstrings (PEP 257) to each function to explain in words the role of the function. If necessary, you can also add comments to your code.
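For instance, the first variant with docstrings might look like this (the wording is of course up to you):

def foo(stuff):
    """Do something with an already-loaded stuff object."""
    ...

def foo_from_stuff_id(stuff_id):
    """Look stuff up in the database by its ID, then hand it to foo."""
    foo(get_stuff(stuff_id))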
I'm not a big fan of type overloading in Python, but this is one of the cases where I might go for it if there's really a need:
def foo(stuff):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
With type annotations it would look like this:
from typing import Union

def foo(stuff: Union[int, Stuff]):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
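If this dispatch-on-type ever grows past one branch, the standard library offers functools.singledispatch for exactly this situation; a minimal sketch reusing the question's Stuff and get_stuff names:

from functools import singledispatch

@singledispatch
def foo(stuff):
    ...  # do something with a Stuff instance

@foo.register
def _(stuff: int):
    # an int is treated as an ID: fetch the Stuff, then dispatch again
    foo(get_stuff(stuff))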
It basically depends on how you've defined all these functions. If you're importing get_stuff from another module, the second approach is more Pythonic: from an OOP perspective you create each function for one particular purpose, and since get_stuff is already defined, you don't need a second wrapper function just to call it.
If get_stuff is not defined in another module, then it depends on whether you are using classes or not. If you're using a class and you want to use all these functions together, you can add a method for accessing or connecting to the database and use that method within other methods like foo.
Example:
from some_module import get_stuff

class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        self.stuff_id = kwargs['stuff_id']

    def foo(self):
        stuff = get_stuff(self.stuff_id)
        # do stuff
Or, if the functionality of foo depends on the existence of stuff, you can fetch stuff once in the initializer and simply check its validity:
class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        _stuff_id = kwargs['stuff_id']
        self.stuff = get_stuff(_stuff_id)  # can return None

    def foo(self):
        if self.stuff:
            ...  # do stuff
        else:
            ...  # do other stuff
Or another neat design pattern for such situations might be a dispatcher function (or method in a class) that delegates execution to different functions based on the state of stuff:
def delegator(stuff, stuff_id):
    if stuff:  # or some other condition
        foo(stuff)
    else:
        foo(get_stuff(stuff_id))  # fetch first, then execute
I was looking into the following code.
On many occasions the __init__ method is not really used, but there is a custom initialize function, as in the following example:
class CustomDatasetDataLoader:
    def __init__(self):
        pass

    def initialize(self, opt):
        ...  # the real setup happens here, not in __init__
This is then called as:
data_loader = CustomDatasetDataLoader()
# other instance method is called
data_loader.initialize(opt)
I see the problem that variables used in other instance methods could still be undefined if one forgets to call this custom initialize function. But what are the benefits of this approach?
Some APIs out in the wild (such as inside setuptools) have a similar kind of thing, and they use it to their advantage. The __init__ call can be used for the low-level internal API, while public constructors are defined as classmethods for the different ways one might construct objects. For instance, in pkg_resources.EntryPoint, the way to create instances of this class is to use the parse classmethod. A similar route can be followed if a custom initialization is desired:
class CustomDatasetDataLoader(object):
    @classmethod
    def create(cls):
        """Standard creation."""
        return cls()

    @classmethod
    def create_with_initialization(cls, opt):
        """Create with special options."""
        inst = cls()
        # assign things from opt to inst, like
        # inst.some_update_method(opt.something)
        # inst.attr = opt.some_attr
        return inst
This way users of the class will not need two lines of code to do what a single line could do: they can simply call CustomDatasetDataLoader.create_with_initialization(some_obj) if that is what they want, or call the other classmethod to construct an instance of this class.
Edit: I see, you had an example linked (I wish underlining links hadn't gone out of fashion) - that particular usage and implementation feels like a poor way to do it, when a classmethod (or just relying on the standard __init__) would be sufficient.
However, if that initialize function were an interface to some other system that receives an object of a particular type and invokes some method with it (e.g. something akin to the visitor pattern), it might make sense, but as it stands it really doesn't.
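To make that concrete, here is a sketch of the kind of situation where the hook could earn its keep; every name here is hypothetical:

class Framework:
    """Hypothetical host that insists on constructing objects itself."""
    def __init__(self, loader_cls, opt):
        self.loader = loader_cls()   # the framework controls construction...
        self.loader.initialize(opt)  # ...and injects the options afterwards

class CustomDatasetDataLoader:
    def initialize(self, opt):
        self.opt = opt  # late-initialization hook called by the framework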
I'm more of an engineer and less of a coder, but I know enough Python and C++ to be dangerous.
I'm creating a Python vector/matrix class as a helper class based upon numpy as well as cvxopt. The overall goal (which I've already achieved... the answer to this question will just make the class better) is to make dot products and other operations more unified and easier for numerical methods.
However, I'd like to make my helper class even more transparent. What I'd like to do is redefine the cvxopt.matrix() init function based upon the type of the variable it is given. That is to say, if I have a custom matrix class, cstmat, I'd like the call cvxopt.matrix(cstmat) to be handled by my own methods instead of what is written in the cvxopt package.
In short, I'd like to "intercept" the other function call and use my own function.
The kicker, though, is that I don't want to take over cvxopt.matrix(any_other_type). I just want to redefine the function when it's called on my own custom class. Is this possible?
Thanks,
Jon
You can do this, but it's not pretty.
You can probably do something along these lines:
import cvxopt

cvxopt._orig_matrix = cvxopt.matrix

def my_matrix(*args, **kwargs):
    if isinstance(args[0], cstmat):
        ...  # do your stuff here
    else:
        return cvxopt._orig_matrix(*args, **kwargs)

cvxopt.matrix = my_matrix
But you're probably better off finding a less weird way. And no guarantees that this won't forget who "self" is.
Better would be to use inheritance! Kinda like this:
class Cstmat(cvxopt.matrix):
    def __init__(self, *args, **kwargs):
        pass

    def matrix(self, arg):
        if isinstance(arg, Cstmat):
            ...  # do your stuff here
        else:
            return cvxopt.matrix(arg)