I'm a Python noob and I'm trying to solve my problems the 'Pythonic' way. I have a class whose __init__ method takes six parameters. I need to validate each param and raise an exception if any fails to validate.
Is this the right way?
class DefinitionRunner:
    def __init__(self, canvasSize, flightId, domain, definitionPath, harPath):
        self.canvasSize = canvasSize
        self.flightId = flightId
        self.domain = domain
        self.harPath = harPath
        self.definitionPath = definitionPath
        # ... bunch of validation checks ...
        # ... if fails, raise ValueError ...
If you want the variables to be settable independently of __init__, you could use properties to implement validations in separate methods.
They only work for new-style classes though, so you need to define the class as class DefinitionRunner(object).
So for example,
@property
def canvasSize(self):
    return self._canvasSize

@canvasSize.setter
def canvasSize(self, value):
    # some validation here
    self._canvasSize = value
Broadly speaking, that looks like the way you'd do it. Strictly speaking, though, you might as well validate before rather than after assignment, especially if assignment could be time- or resource-intensive. Style convention also says not to align assignment blocks the way you have.
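For instance, a minimal sketch with the checks up front (the specific rules are invented for illustration):

class DefinitionRunner:
    def __init__(self, canvasSize, flightId, domain, definitionPath, harPath):
        # Validate first, so nothing is assigned if any check fails.
        if not isinstance(canvasSize, tuple) or len(canvasSize) != 2:
            raise ValueError("canvasSize must be a (width, height) tuple")
        if not flightId:
            raise ValueError("flightId must be non-empty")
        # ... checks for domain, definitionPath, harPath ...
        self.canvasSize = canvasSize
        self.flightId = flightId
        self.domain = domain
        self.definitionPath = definitionPath
        self.harPath = harPath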
I would do it the way you did, except for the validation: I would validate in a setter method and use that to set the attributes.
You could do something like this. Make a validator for each type of input. Make a helper function to run validation:
def validate_and_assign(obj, items_d, validators):
    # validate all entries
    for key, validator in validators.items():
        if not validator(items_d[key]):
            raise ValueError("Validation for %s failed" % (key,))
    # set all entries
    for key, val in items_d.items():
        setattr(obj, key, val)
Which you'd use like this:
class DefinitionRunner:
    validators = {
        'canvasSize': canvasSize_validator,
        'flightId': flightId_validator,
        'domain': domain_validator,
        'definitionPath': definitionPath_validator,
        'harPath': harPath_validator,
    }

    def __init__(self, canvasSize, flightId, domain, definitionPath, harPath):
        validate_and_assign(self, {
            'canvasSize': canvasSize,
            'flightId': flightId,
            'domain': domain,
            'definitionPath': definitionPath,
            'harPath': harPath,
        }, DefinitionRunner.validators)
The validators might be the same function, of course, if the data type is the same.
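For example, the two path parameters could plausibly share one validator (the rule below is an assumption for illustration):

import os

def path_validator(value):
    # Reusable check for any parameter that should be an existing path.
    return isinstance(value, str) and os.path.exists(value)

validators = {
    'definitionPath': path_validator,
    'harPath': path_validator,
    # ... distinct validators for the remaining parameters ...
}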
I'm not sure if this is exactly "Pythonic", but I've defined a function decorator called require_type. (To be honest, I think I found it somewhere online.)
def require_type(my_arg, *valid_types):
    '''
    A simple decorator that performs type checking.

    @param my_arg: string indicating argument name
    @param valid_types: list of valid types
    '''
    def make_wrapper(func):
        if hasattr(func, 'wrapped_args'):
            wrapped = getattr(func, 'wrapped_args')
        else:
            body = func.__code__
            wrapped = list(body.co_varnames[:body.co_argcount])
        try:
            idx = wrapped.index(my_arg)
        except ValueError:
            raise NameError(my_arg)

        def wrapper(*args, **kwargs):
            def fail():
                all_types = ', '.join(str(typ) for typ in valid_types)
                raise TypeError("'%s' was type %s, expected to be in following list: %s"
                                % (my_arg, type(arg), all_types))
            if len(args) > idx:
                arg = args[idx]
                if not isinstance(arg, valid_types):
                    fail()
            else:
                if my_arg in kwargs:
                    arg = kwargs[my_arg]
                    if not isinstance(arg, valid_types):
                        fail()
            return func(*args, **kwargs)

        wrapper.wrapped_args = wrapped
        return wrapper
    return make_wrapper
Then, to use it:
class SomeObject(object):
    @require_type("prop1", str)
    @require_type("prop2", numpy.complex128)
    def __init__(self, prop1, prop2):
        pass
I was writing a test case for a function that accepts typing.BinaryIO that comes from fastapi.UploadFile.file.
def upload_binary(data: typing.BinaryIO):
...
I was confused about what kind of object to create that would pass the type check. I tried io.StringIO and io.BytesIO, and the only way to check which one would be accepted as typing.BinaryIO was the IDE's highlighting. It didn't accept StringIO but accepted BytesIO.
So my question: is there a way in Python to manually check whether an object validates against a given typing hint?
For example, some function like:
file1 = StringIO("text")
file2 = BytesIO(b"text")
typing_check(file1, typing.BinaryIO) # >>> False
typing_check(file2, typing.BinaryIO) # >>> True
Update:
Looking at starlette/datastructures.py we have
class UploadFile:
    ...
    file: typing.BinaryIO

    def __init__(...):
        if self.file is None:
            self.file = tempfile.SpooledTemporaryFile(...)
And if you try to test it
s = tempfile.SpooledTemporaryFile()
isinstance(s, typing.BinaryIO) # >>> False
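Short of a full structural check, one pragmatic workaround is duck typing: test for the file-like behaviour the function actually needs rather than the typing hint itself. A sketch (looks_like_binary_io is a made-up helper, not a general typing validator):

def looks_like_binary_io(obj):
    # Check for the methods upload_binary actually relies on.
    if not all(hasattr(obj, name) for name in ('read', 'write', 'seek', 'tell')):
        return False
    # A zero-byte read distinguishes binary from text streams
    # without consuming any data.
    pos = obj.tell()
    data = obj.read(0)
    obj.seek(pos)
    return isinstance(data, bytes)

from io import StringIO, BytesIO
import tempfile

looks_like_binary_io(StringIO("text"))                 # False
looks_like_binary_io(BytesIO(b"text"))                 # True
looks_like_binary_io(tempfile.SpooledTemporaryFile())  # True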
You can try building a basic static type checking decorator using annotations and the inspect module.
inspect.signature(fn) can read the annotations of function parameters, and you can compare types with isinstance.
import inspect

def static_type_checker(fn):
    spec = inspect.signature(fn)
    params = spec.parameters

    def inner_fn(*args, **kwargs):
        for arg, (name, param) in zip(args, params.items()):
            assert isinstance(arg, param.annotation)
        for key, value in kwargs.items():
            assert isinstance(value, params[key].annotation)
        return fn(*args, **kwargs)
    return inner_fn
@static_type_checker
def sample(a: int):
    print(a)

sample(1)    # prints 1
sample("a")  # AssertionError
Note that this is not a perfect solution. It has many limitations, such as handling variable-length arguments. I'm just suggesting the basic idea.
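One of those limitations can be softened with inspect.Signature.bind, which pairs every argument with its parameter name and lets you skip unannotated parameters. A hedged sketch (it still only handles plain classes; typing generics such as List[int] would break isinstance, and *args/**kwargs parameters would need special casing):

import inspect

def static_type_checker(fn):
    sig = inspect.signature(fn)

    def inner_fn(*args, **kwargs):
        # bind() matches positional and keyword arguments to
        # parameter names, mirroring a real call.
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            annotation = sig.parameters[name].annotation
            if annotation is not inspect.Parameter.empty:
                assert isinstance(value, annotation), (
                    "%s should be %s, got %s" % (name, annotation, type(value)))
        return fn(*args, **kwargs)
    return inner_fn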
I have two classes, one of which inherits from the other. When I inherit and redefine the function get_commandes_date, I receive the following error:
TypeError: BooksCommande.get_commandes_date() missing 1 required positional argument: 'key'
This is my code:
from abc import ABC
from typing import NoReturn

class BaseCommande(ABC):
    def __init__(self, list_of_commande: list) -> NoReturn:
        if list_of_commande:
            self.list_of_commande = list_of_commande
        self.commande_date = None
        self.comande_payed = None
        self.commande_price = None
        self.total_commandes = None
        self.process_commande(list_of_commande)
        super().__init__()

    def get_commandes_date(self, list_of_commande):
        return [commande['date_start'] for commande in list_of_commande]

    def process_commande(self, list_of_commande):
        self.commande_date = self.get_commandes_date(list_of_commande)

    def my_dict(self):
        return {
            "commende_date": self.commande_date}

class BooksCommande(BaseCommande):
    def __init__(self, list_of_commande: list) -> NoReturn:
        super().__init__(list_of_commande)
        self.commande_syplies = None
        self.commande_books = None
        self.process_books(list_of_commande)

    def get_commandes_date(self, list_of_commande, key):
        commande_date = []
        for commande in list_of_commande:
            cmd = {
                'date_start': commande['date_start'],
                'key': key,
                'date_end': commande['date_end'],
            }
            commande_date.append(cmd)
        return commande_date

    def get_commande_books(self, books: list):
        return 10

    def process_books(self, list_of_commande):
        self.books_list = self.get_commande_books(list_of_commande)

    def my_dict2(self):
        return {**super().my_dict(),
                "books": self.books_list
                }

commande_list = [{"date_start": "10/10/2021", "date_end": "12/15/2019"}]
print(BooksCommande(commande_list).my_dict2())
Is there a way to force BaseCommande to use the newly redefined function or not? I really don't know how or where to start.
The problem is that you're attempting to change the number of arguments that get passed to the get_commandes_date() method, while the base class's process_commande() still calls it with the old signature.
The workaround is to make the argument optional. So in class BaseCommande, declare a key parameter (with a default, so existing calls keep working):
def get_commandes_date(self, list_of_commande, key=None):
    return [commande['date_start'] for commande in list_of_commande]
And then give it a default value in the derived BooksCommande class version of the method. (I'm not sure what might make sense here, so just made it None.)
def get_commandes_date(self, list_of_commande, key=None):
    commande_date = []
    for commande in list_of_commande:
        cmd = {
            'date_start': commande['date_start'],
            'key': key,
            'date_end': commande['date_end'],
        }
        commande_date.append(cmd)
    return commande_date
As others have explained, the issue with your code is that your subclass, BooksCommande, changes the signature of the get_commandes_date method to be different from the version in the base class, BaseCommande. While that might be a bad idea in an abstract sense, it's not forbidden by Python. The real trouble is that one of BaseCommande's other methods, process_commande, tries to use the old signature, so everything breaks when it gets called.
There is a fairly direct way to fix this, if you want to do so without dramatically changing the code. The general idea is for the two BaseCommande methods to call each other through a private reference. Even if one is overridden in a subclass, the private reference will keep pointing at the original implementation. Name mangling, with two leading underscores, is often useful for this:
class BaseCommande(ABC):
    ...
    def get_commandes_date(self, list_of_commande):  # this method will be overridden
        return [commande['date_start'] for commande in list_of_commande]

    __get_commandes_date = get_commandes_date  # private reference to previous method

    def process_commande(self, list_of_commande):
        self.commande_date = self.__get_commandes_date(list_of_commande)  # use it here
This kind of design won't always be correct, so you'll need to figure out whether it's appropriate for your specific classes. If the fact that process_commande calls get_commandes_date is supposed to be an implementation detail (so it should keep behaving the same way even when the latter method is overridden), then this is a good approach. If the relationship between the methods is part of the class's API, then you probably don't want to do this, since overriding get_commandes_date may be a deliberate way to change the results of process_commande in a subclass.
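To illustrate with the question's data (assuming the rest of both classes is unchanged):

commande_list = [{"date_start": "10/10/2021", "date_end": "12/15/2019"}]
bc = BooksCommande(commande_list)

# process_commande went through the private reference, so the base
# implementation ran even though the method is overridden:
print(bc.commande_date)  # ['10/10/2021']

# A direct call still dispatches to the subclass override:
print(bc.get_commandes_date(commande_list, key='k'))
# [{'date_start': '10/10/2021', 'key': 'k', 'date_end': '12/15/2019'}]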
I think you want the method my_dict to play the role of both my_dict and my_dict2, with a boolean to trigger whichever behaviour you want:

def my_dict(self, trigger=False):
    if not trigger:
        return {
            "commende_date": self.commande_date}
    else:
        return {**super().my_dict(),
                "books": self.books_list}
Put this in place of your old my_dict method
def my_dict(self):
    return {
        "commende_date": self.commande_date}
I have 3 dataclass objects, say:

class Message1:
    def __init__(self, a):
        ...

class Message2:
    def __init__(self, d, e, f):
        ...

class Message3:
    def __init__(self, g, i):
        ...
For these 3 messages I want to make a factory-type method which returns one of the three objects if it succeeds; if not, it should either return the one it identified as the correct message type but failed to create, or notify the user that it could not create any of the messages. Are there any OOP patterns for this?
My initial thought was something like:

def factory_method(**parameters):
    try:
        return Message1(**parameters)
    except TypeError:
        try:
            return Message2(**parameters)
        except TypeError:
            try:
                return Message3(**parameters)
            except TypeError:
                print("Could not deduce message type")
My issues with this idea are:

1. It's not a dynamically scalable solution; with each new message class I introduce, I need to add a new try/except block.
2. If the whole nested block structure fails, I have no feedback as to why: were the parameters right for one of the messages but with a wrong value, or were they plain gibberish?

I realize this might be a bit opinion-based as to what the best outcome is. It might also be that this design is not very elegant, and the simplest way is to just tell factory_method what kind of message to initialize. Any suggestions or ideas would be appreciated.
If you can't join them all in a single class and you can't point a call to a single class, I would match the arguments to the possible class. To make it work, a type hint and a "proxy" class are required. This example assumes that none of the classes contains an __init__(*args, **kwargs). To add a new class you just add it to Message.msg_cls; you can eval the global scope if you don't want to add each class manually.
class Message1:
    def __init__(self, a: int, alt=None, num=10):
        print('Message 1')

class Message2:
    def __init__(self, d: str, e: str, f: int):
        print('Message 2')

class Message3:
    def __init__(self, g: int, i: any):
        print('Message 3')

class Message:
    msg_cls = (
        Message1,
        Message2,
        Message3
    )

    @staticmethod
    def eq_kwargs(cls, kwargs):
        cls_kwargs = cls.__init__.__defaults__
        if cls_kwargs is None:
            return len(kwargs) == 0
        cls_astr = cls.__init__.__code__
        kw_types = [type(t) for t in cls_kwargs]
        for k in kwargs:
            if k in cls_astr.co_varnames:
                if type(kwargs[k]) in kw_types:
                    kw_types.remove(type(kwargs[k]))
                elif type(None) in kw_types:
                    kw_types.remove(type(None))
                else:
                    return False
            else:
                return False
        return True

    @staticmethod
    def eq_args(cls, args):
        cls_args = cls.__init__.__annotations__
        if len(cls_args) != len(args):
            return False
        for a, b in zip(args, cls_args):
            if type(a) != cls_args[b] and cls_args[b] != any:
                return False
        return True

    def __new__(cls, *args, **kwargs):
        for mc in Message.msg_cls:
            if Message.eq_args(mc, args):
                if Message.eq_kwargs(mc, kwargs):
                    return mc(*args, **kwargs)
        raise ValueError('Message.__new__, no match')

if __name__ == '__main__':
    ms_1_a = Message(1, alt='a')
    ms_1_b = Message(2, alt='a', num=5)
    ms_2 = Message('X', 'Y', 5)
    ms_3_a = Message(1, [1, 4])
    ms_3_b = Message(2, Message(10))
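If the main concerns are scalability and feedback, a simpler hedged alternative is to loop over a registry of message classes and record each constructor's TypeError, so the caller can see why nothing matched. Note that this returns the first class whose constructor merely accepts the arguments, which is less discriminating than the type matching above:

MESSAGE_CLASSES = [Message1, Message2, Message3]

def factory_method(*args, **kwargs):
    errors = {}
    for cls in MESSAGE_CLASSES:
        try:
            return cls(*args, **kwargs)
        except TypeError as exc:
            # Remember why this class was rejected.
            errors[cls.__name__] = str(exc)
    raise ValueError("Could not deduce message type: %s" % errors)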
I have created a function that takes a value, does some calculations and returns the different answers as an object. However, when I try to parallelize the code using pp, I get the following error:
File "trmm.py", line 8, in getattr
return self.header_array[name]
RuntimeError: maximum recursion depth exceeded while calling a Python object
Here is a simple version of what I am trying to do.
class DataObject(object):
    """
    Class to handle data objects with several arrays.
    """
    def __getattr__(self, name):
        try:
            return self.header_array[name]
        except KeyError:
            try:
                return self.line[name]
            except KeyError:
                raise AttributeError("%s instance has no attribute '%s'"
                                     % (self.__class__.__name__, name))

    def __setattr__(self, name, value):
        if name in ('header_array', 'line'):
            object.__setattr__(self, name, value)
        elif name in self.line:
            self.line[name] = value
        else:
            self.header_array[name] = value


class TrmmObject(DataObject):
    def __init__(self):
        DataObject.__init__(self)
        self.header_array = {
            'header': None
        }
        self.line = {
            'longitude': None,
            'latitude': None
        }
if __name__ == '__main__':
    import pp

    ppservers = ()
    job_server = pp.Server(2, ppservers=ppservers)

    def get_monthly_values(value):
        tplObj = TrmmObject()
        tplObj.longitude = value
        tplObj.latitude = value * 2
        return tplObj

    job1 = job_server.submit(get_monthly_values, (5,), (DataObject, TrmmObject,), ("numpy",))
    result = job1()
If I change return tplObj to return [tplObj.longitude, tplObj.latitude], there is no problem. However, as I said before, this is a simple version; in reality this change would complicate the program a lot.
I am very grateful for any help.
You almost never need to use __getattr__ and __setattr__, and it almost always ends up with something blowing up; infinite recursion is a typical result. I can't really see any reason for using them here either. (Here the recursion shows up because pp pickles the object to ship it between processes; during unpickling, pickle looks up attributes such as __setstate__ before the instance dictionaries exist, __getattr__ then evaluates self.header_array, which is also missing, so __getattr__ calls itself forever.) Be explicit and use the line and header_array dictionaries directly.
If you want a function that looks up a value over all arrays, create a function for that and call it explicitly. Calling the function __getitem__ and using [] is explicit. :-)
(And please don't call a dictionary "header_array"; it's confusing.)
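A minimal sketch of the explicit style (the renamed header dict and the __getitem__ lookup are illustrative choices, not an existing API):

class TrmmObject(object):
    def __init__(self):
        self.header = {'header': None}
        self.line = {'longitude': None, 'latitude': None}

    def __getitem__(self, name):
        # Explicit lookup over both dictionaries.
        if name in self.line:
            return self.line[name]
        return self.header[name]

tpl = TrmmObject()
tpl.line['longitude'] = 5   # explicit assignment, no __setattr__ magic
print(tpl['longitude'])     # explicit lookup via __getitem__

With no attribute magic left, instances pickle cleanly, which is exactly what pp needs when shipping results back from a worker.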
In essence, I want to put a variable on the stack, that will be reachable by all calls below that part on the stack until the block exits. In Java I would solve this using a static thread local with support methods, that then could be accessed from methods.
Typical example: you get a request, and open a database connection. Until the request is complete, you want all code to use this database connection. After finishing and closing the request, you close the database connection.
What I need this for is a report generator. Each report consists of multiple parts, each part can rely on different calculations, and sometimes different parts rely in part on the same calculation. As I don't want to repeat heavy calculations, I need to cache them. My idea is to decorate methods with a cache decorator. The cache creates an id based on the method name and module, and its arguments, looks up whether it has already calculated this in a stack variable, and executes the method if not.
I will try to clarify by showing my current implementation. What I want to do is to simplify the code for those implementing calculations.
First, I have the central cache access object, which I call MathContext:
class MathContext(object):
    def __init__(self, fn):
        self.fn = fn
        self.cache = dict()

    def get(self, calc_config):
        id = create_id(calc_config)
        if id not in self.cache:
            self.cache[id] = calc_config.exec(self)
        return self.cache[id]
The fn argument is the filename the context is created in relation to, from where data can be read to be calculated.
Then we have the Calculation class:
class CalcBase(object):
    def exec(self, math_context):
        raise NotImplementedError
And here is a stupid Fibonacci example. None of the real methods are actually recursive (they work on large sets of data instead), but this works to demonstrate how you would depend on other calculations:
class Fibonacci(CalcBase):
    def __init__(self, n):
        self.n = n

    def exec(self, math_context):
        if self.n < 2:
            return 1
        a = math_context.get(Fibonacci(self.n - 1))
        b = math_context.get(Fibonacci(self.n - 2))
        return a + b
What I want Fibonacci to be instead is just a decorated function:

@cache
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)
With the math_context example, when math_context goes out of scope, so do all its cached values. I want the same thing for the decorator, i.e. at point X, everything cached by @cache is dereferenced to be garbage collected.
I went ahead and made something that might just do what you want. It can be used as both a decorator and a context manager:
from __future__ import with_statement

try:
    import cPickle as pickle
except ImportError:
    import pickle


class cached(object):
    """Decorator/context manager for caching function call results.

    All results are cached in one dictionary that is shared by all cached
    functions.

    To use this as a decorator:

        @cached
        def function(...):
            ...

    The results returned by a decorated function are not cleared from the
    cache until decorated_function.clear_my_cache() or cached.clear_cache()
    is called.

    To use this as a context manager:

        with cached(function) as function:
            ...
            function(...)
            ...

    The function's return values will be cleared from the cache when the
    with block ends.

    To clear all cached results, call the cached.clear_cache() class method.
    """
    _CACHE = {}

    def __init__(self, fn):
        self._fn = fn

    def __call__(self, *args, **kwds):
        key = self._cache_key(*args, **kwds)
        function_cache = self._CACHE.setdefault(self._fn, {})
        try:
            return function_cache[key]
        except KeyError:
            function_cache[key] = result = self._fn(*args, **kwds)
            return result

    def clear_my_cache(self):
        """Clear the cache for a decorated function"""
        try:
            del self._CACHE[self._fn]
        except KeyError:
            pass  # no cached results

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.clear_my_cache()

    def _cache_key(self, *args, **kwds):
        """Create a cache key for the given positional and keyword
        arguments. pickle.dumps() is used because there could be
        unhashable objects in the arguments, but passing them to
        pickle.dumps() will result in a string, which is always hashable.

        I used this to make the cached class as generic as possible.
        Depending on your requirements, other key generating techniques
        may be more efficient.
        """
        return pickle.dumps((args, sorted(kwds.items())), pickle.HIGHEST_PROTOCOL)

    @classmethod
    def clear_cache(cls):
        """Clear everything from all functions from the cache"""
        cls._CACHE = {}
if __name__ == '__main__':
    # used as decorator
    @cached
    def fibonacci(n):
        print "calculating fibonacci(%d)" % n
        if n == 0:
            return 0
        if n == 1:
            return 1
        return fibonacci(n - 1) + fibonacci(n - 2)

    for n in xrange(10):
        print 'fibonacci(%d) = %d' % (n, fibonacci(n))

    def lucas(n):
        print "calculating lucas(%d)" % n
        if n == 0:
            return 2
        if n == 1:
            return 1
        return lucas(n - 1) + lucas(n - 2)

    # used as context manager
    with cached(lucas) as lucas:
        for i in xrange(10):
            print 'lucas(%d) = %d' % (i, lucas(i))

    for n in xrange(9, -1, -1):
        print 'fibonacci(%d) = %d' % (n, fibonacci(n))

    cached.clear_cache()

    for n in xrange(9, -1, -1):
        print 'fibonacci(%d) = %d' % (n, fibonacci(n))
This question seems to be two questions:

a) sharing a db connection
b) caching/memoizing

(b) you have answered yourself. As for (a), I don't see why you need to put it on the stack. You can do one of these:

- use a class, and the connection can be an attribute of it
- decorate all your functions so that they get a connection from a central location (see the sketch after this list)
- have each function explicitly use a global connection method
- create a connection and pass it around, or create a context object and pass that around; the connection can be part of the context

etc., etc.
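For instance, the decorator option from the list above might look like this sketch (get_connection and the execute call are assumptions, not from the question):

import functools

def with_connection(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        connection = get_connection()  # hypothetical central accessor
        return fn(connection, *args, **kwargs)
    return wrapper

@with_connection
def fetch_report_rows(connection, report_id):
    return connection.execute("SELECT ...", (report_id,))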
You could use a global variable wrapped in a getter function:
connection = None  # module-level cache

def getConnection():
    global connection
    if connection is not None:
        return connection
    connection = createConnection()
    return connection
"you get a request, and open a database connection.... you close the database connection."
This is what objects are for. Create the connection object, pass it to other objects, and then close it when you're done. Globals are not appropriate. Simply pass the value around as a parameter to the other objects that are doing the work.
"Each report consist of multiple parts, each part can rely on different calculations, sometimes different parts relies in part on the same calculation.... I need to cache them"
This is what objects are for. Create a dictionary with useful calculation results and pass that around from report part to report part.
You don't need to mess with "stack variables", "static thread local" or anything like that.
Just pass ordinary variable arguments to ordinary method functions. You'll be a lot happier.
class MemoizedCalculation(object):
    pass

class Fibonacci(MemoizedCalculation):
    def __init__(self):
        self.cache = {0: 1, 1: 1}

    def __call__(self, arg):
        if arg not in self.cache:
            self.cache[arg] = self(arg - 1) + self(arg - 2)
        return self.cache[arg]

class MathContext(object):
    def __init__(self):
        self.fibonacci = Fibonacci()
You can use it like this
>>> mc= MathContext()
>>> mc.fibonacci( 4 )
5
You can define any number of calculations and fold them all into a single container object.
If you want, you can make the MathContext into a formal Context Manager so that it works with the with statement. Add these two methods to MathContext:
def __enter__(self):
    print "Initialize"
    return self

def __exit__(self, type_, value, traceback):
    print "Release"
Then you can do this.
with MathContext() as mc:
    print mc.fibonacci(4)
At the end of the with statement, you're guaranteed that the __exit__ method was called.
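Incidentally, the same behaviour can be sketched with contextlib.contextmanager instead of writing __enter__ and __exit__ by hand (same MathContext as above):

from contextlib import contextmanager

@contextmanager
def math_context():
    print "Initialize"
    mc = MathContext()
    try:
        yield mc
    finally:
        print "Release"

with math_context() as mc:
    print mc.fibonacci(4)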