Using an Abstract Base Class as an argument to a function (Python)

If I have an abstract base class called BaseData whose update method is overridden with different functionality in its child classes, can I have a function like the following, which takes any child class as an argument and calls the update method of that child class?
def date_func(BaseData, time):
    result = BaseData.update(time)
    lastrow = len(result.index) - 1  # zero-based index of the last row
    return result['Time'].iloc[lastrow], result['Time'].iloc[lastrow - 100]

Sure you can. Python won't care because it doesn't do any type checking.
In fact, you can use any type that provides a compatible interface, independently of whether the instance derives from BaseData.

Including the name of the ABC as the name of the parameter won't restrict it to only subclasses of the ABC. All it does is make a parameter of that name.
Any object of any type can be passed as an argument to any function or method. An object that - in this case - doesn't have an update() method will cause an AttributeError to be raised, but any argument whose update() method accepts the one argument given will work fine.
If you want to be certain that the first argument is a subclass of BaseData, follow these steps (sketched in code below the list):
rename the parameter to something like data. This stops the parameter name from shadowing (hiding within this context) the actual BaseData class.
write if isinstance(data, BaseData): at the beginning of the function, indenting everything that was already there so it sits inside the if block.
(optional) write an else clause that raises an error. If you don't do this, then None will simply be returned when the type check fails.
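Putting those steps together, a minimal sketch might look like this (it assumes, as in the question, that update() returns a pandas DataFrame with a 'Time' column):

def date_func(data, time):
    # 'data' no longer shadows the BaseData class itself
    if isinstance(data, BaseData):
        result = data.update(time)
        lastrow = len(result.index) - 1  # iloc is zero-based
        return result['Time'].iloc[lastrow], result['Time'].iloc[lastrow - 100]
    else:
        raise TypeError(f"expected a BaseData instance, got {type(data).__name__}")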
Now that you know how to do what you're asking, you should be aware that there are few worthwhile cases for doing this. Again, any object that fulfills the needed 'protocol' can work and doesn't need to necessarily be a subclass of your ABC.
This follows Python's principle of "it's easier to ask for forgiveness than permission" (EAFP), which lets us assume that the person who passed in an argument gave one of a compatible type. If you're worried about the possibility of someone giving the wrong type, you can wrap the code in a try/except block that deals with the exception raised when it's wrong. This is how we "ask for forgiveness".
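A minimal sketch of the EAFP style, using the same hypothetical date_func (the error message is illustrative):

def date_func(data, time):
    try:
        result = data.update(time)
    except AttributeError:
        # "asking for forgiveness": data had no usable update() method
        raise TypeError(f"{type(data).__name__} has no update() method") from None
    lastrow = len(result.index) - 1
    return result['Time'].iloc[lastrow], result['Time'].iloc[lastrow - 100]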
Generally, if you're going to do type checks, it's because you're prepared to handle different sets of protocols, and the ABCs that define these protocols also (preferably) define __subclasshook__() so that the check doesn't JUST test whether the class is a 'registered' subclass, but rather whether it follows the prescribed protocol.
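For illustration, here is a sketch of a protocol-style ABC whose __subclasshook__ accepts any class with a callable update(), registered or not (the class names are made up for this example):

from abc import ABC, abstractmethod

class SupportsUpdate(ABC):
    @abstractmethod
    def update(self, time): ...

    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is SupportsUpdate:
            # follow the protocol, not the registry
            return callable(getattr(subclass, 'update', None))
        return NotImplemented

class Clock:  # neither registered nor a subclass
    def update(self, time):
        return time

print(isinstance(Clock(), SupportsUpdate))  # True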

Related

How can I type hint that the __init__ params are the same as the fields in a dataclass?

Let us say I have a custom use case, and I need to dynamically create or define the __init__ method for a dataclass.
For example, say I will need to decorate it like @dataclass(init=False) and then modify the __init__() method to take keyword arguments, like **kwargs. However, in the kwargs object, I only check for the presence of known dataclass fields, and set these attributes accordingly (example below).
I would like to type hint to my IDE (PyCharm) that the modified __init__ only accepts listed dataclass fields as parameters or keyword arguments. I am unsure if there is a way to approach this, using the typing library or otherwise. I know that Python 3.11 has dataclass transforms planned, which may or may not do what I am looking for (my gut feeling is no).
Here is some sample code I was playing around with, a basic case that illustrates the problem I am having:
from dataclasses import dataclass

# get value from input source (can be a file or anything else)
def get_value_from_src(_name: str, tp: type):
    return tp()  # dummy value

@dataclass
class MyClass:
    foo: str
    apple: int

    def __init__(self, **kwargs):
        for name, tp in self.__annotations__.items():
            if name in kwargs:
                value = kwargs[name]
            else:
                # here is where I would normally have the logic
                # to read the value from another input source
                value = get_value_from_src(name, tp)
                if value is None:
                    raise ValueError
            setattr(self, name, value)

c = MyClass(apple=None)
print(c)

c = MyClass(foo='bar',  # here, I would like to auto-complete the name
                        # when I start typing `apple`
            )
print(c)
If we assume that the number or names of the fields are not fixed, I am curious whether there could be a generic approach which would basically say to type checkers, "the __init__ of this class accepts only (optional) keyword arguments that match up with the fields defined in the dataclass itself".
Addendums, based on notes in comments below:
Passing @dataclass(kw_only=True) won't work because imagine I am writing this for a library, and I need to support Python 3.7+. Also, kw_only has no effect when a custom __init__() is implemented, as in this case.
The above is just a stub __init__ method. It could have more complex logic, such as setting attributes based on a file source for example. Basically, the above is just a sample implementation of a larger use case.
I can't update each field to foo: Optional[str] = None because that part would be implemented in user code, which I would not have any control over. Also, annotating it in this way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), doesn't seem like the best idea to me.
It would not work to let dataclasses auto-generate an __init__, and instead implement a __post_init__(). This would not work because I need to be able to construct the class without arguments, like MyClass(), as the field values will be set from another input source (think local file or elsewhere); this means that all fields would be required, so annotating them as Optional would be fallacious in this case. I still need to be able to support user to enter optional keyword arguments, but these **kwargs will always match up with dataclass field names, and so I desire some way for auto-completion to work with my IDE (PyCharm)
Hope this post clarifies the expectations and desired result. If there are any questions or anything that is a bit vague, please let me know.
What you are describing is impossible in theory and unlikely to be viable in practice.
TL;DR
Type checkers don't run your code, they just read it. A dynamic type annotation is a contradiction in terms.
Theory
As I am sure you know, the term static type checker is not coincidental. A static type checker does not execute the code you write. It just parses it and infers types according to its own internal logic, by applying certain rules to a graph that it derives from your code.
This is important because unlike some other languages, Python is dynamically typed, which as you know means that the type of a "thing" (variable) can completely change at any point. In general, there is theoretically no way of knowing the type of all variables in your code, without actually stepping through the entire algorithm, which is to say running the code.
As a silly but illustrative example, you could decide to put the name of a type into a text file to be read at runtime and then used to annotate some variable in your code. Could you do that with valid Python code and typing? Sure. But I think it is beyond clear, that static type checkers will never know the type of that variable.
Why your proposition won't work
Abstracting away all the dataclass stuff and the possible logic inside your __init__ method, what you are asking boils down to the following.
"I want to define a method (__init__), but the types of its parameters will only be known at runtime."
Why am I claiming that? I mean, you do annotate the types of the class' attributes, right? So there you have the types!
Sure, but these have -- in general -- nothing whatsoever to do with the arguments you could pass to the __init__ method, as you yourself point out. You want the __init__ method to accept arbitrary keyword-arguments. Yet you also want a static type checker to infer which types are allowed/expected there.
To connect the two (attribute types and method parameter types), you could of course write some kind of logic. You could even implement it in a way that enforces adherence to those types. That logic could read the type annotations of the class attributes, match up the **kwargs and raise TypeError if one of them doesn't match up. This is entirely possible and you almost implemented that already in your example code. But this only works at runtime!
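A runtime-only sketch of that matching logic (adapted from the question's code; it assumes the annotations are real types, not strings):

def __init__(self, **kwargs):
    for name, tp in self.__annotations__.items():
        value = kwargs.get(name, get_value_from_src(name, tp))
        if not isinstance(value, tp):
            # enforced when the code runs; a static checker never sees this
            raise TypeError(f"{name} must be {tp.__name__}, got {type(value).__name__}")
        setattr(self, name, value)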
Again, a static type checker has no way to infer that, especially since your desired class is supposed to just be a base class and any descendant can introduce its own attributes/types at any point.
But dataclasses work, don't they?
You could argue that this dynamic way of annotating the __init__ method works with dataclasses. So why are they so different? Why are they correctly inferred, but your proposed code can't?
The answer is, they aren't.
Even dataclasses don't have any magical way of telling a static type checker which parameter types the __init__ method is to expect, even though they do annotate them, when they dynamically construct the method in _init_fn.
The only reason mypy correctly infers those types, is because they implemented a separate plugin just for dataclasses. Meaning it works because they read through PEP 557 and hand-crafted a plugin for mypy that specifically facilitates type inference based on the rules described there.
You can see the magic happening in the DataclassTransformer.transform method. You cannot generalize this behavior to arbitrary code, which is why they had to write a whole plugin just for this.
I am not familiar enough with how PyCharm does its type checking, but I strongly suspect they used something similar.
So you could argue that dataclasses are "cheating" with regards to static type checking. Though I am certainly not complaining.
Pragmatic solution
Even something as "high-profile" as Pydantic, which I personally love and use extensively, requires its own mypy plugin to realize the __init__ type inference properly (see here). For PyCharm they have their own separate Pydantic plugin, without which the internal type checker cannot provide those nice auto-suggestions for initialization etc.
That approach would be your best bet, if you really want to take this further. Just be aware that this will be (in the best sense of the word) a hack to allow specific type checkers to catch "errors" that they otherwise would have no way of catching.
The reason I argue that it is unlikely to be viable is because it will essentially blow up the amount of work for your project to also cover the specific hacks for those type checkers that you want to satisfy. If you are committed enough and have the resources, go for it.
Conclusion
I am not trying to discourage you. But it is important to know the limitations enforced by the environment. It's either dynamic types and hacky imperfect type checking (still love mypy), or static types and no "kwargs can be anything" behavior.
Hope this makes sense. Please let me know, if I made any errors. This is just based on my understanding of typing in Python.
For
It would not work to let dataclasses auto-generate an __init__, and instead implement a __post_init__(). This would not work because I need to be able to construct the class without arguments, like MyClass(), as the field values will be set from another input source (think local file or elsewhere); this means that all fields would be required, so annotating them as Optional would be fallacious in this case. I still need to be able to support user to enter optional keyword arguments, but these **kwargs will always match up with dataclass field names, and so I desire some way for auto-completion to work with my IDE (PyCharm)
dataclasses.field + default_factory can be a solution.
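A minimal sketch of what that looks like (field names taken from the question):

from dataclasses import dataclass, field

@dataclass
class MyClass:
    foo: str = field(default_factory=str)    # defaults to ''
    apple: int = field(default_factory=int)  # defaults to 0

c = MyClass()          # works without arguments
c = MyClass(apple=42)  # field names still auto-complete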
But, it seems that dataclass field declarations are implemented in user code:
I can't update each field to foo: Optional[str] = None because that part would be implemented in user code, which I would not have any control over. Also, annotating it in this way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), don't seem like the best idea to me.
If your IDE supports ParamSpec, there is a workaround: it is not correct (it cannot pass a static type checker), but it has auto-completion:
from typing import Callable, Iterable, TypeVar, ParamSpec
from dataclasses import dataclass

T = TypeVar('T')
P = ParamSpec('P')

# user defined dataclass
@dataclass
class MyClass:
    foo: str
    apple: int

def wrap(factory: Callable[P, T], annotations: Iterable[tuple[str, type]]) -> Callable[P, T]:
    def default_factory(**kwargs):
        for name, type_ in annotations:
            kwargs.setdefault(name, type_())
        return factory(**kwargs)
    return default_factory

WrappedMyClass = wrap(MyClass, MyClass.__annotations__.items())
WrappedMyClass()  # Okay

How to check an argument type before __init__ gets called

I need to check the argument type in __init__(). I did it this way:
class Matrix:
    def __init__(self, matrix):
        """
        if type(matrix) != list raise argument error
        """
        if type(matrix) != list:
            raise TypeError(f"Expected list got {type(matrix)}")
        self.matrix = matrix

a = 5
new_matrix = Matrix(a)
But when the TypeError gets raised, the __init__() method has already been called. I am wondering how to do this before it gets called. I reckon this could be done using a metaclass to "intercept" it at some point and raise an error, but I do not know how.
First:
one usually does not make such runtime checks in Python - unless strictly necessary. The idea is that whatever gets passed to __init__ in this case will behave similarly enough to a list to be used in its place. That is the idea of "duck typing".
Second:
if this check is really necessary, then an if statement inside the function or method body, just like you did, is the way to do it. It does not matter that the "method was run, and the error was raised inside it" - this is the way dynamic typing works in Python.
Third:
There actually is a way to prevent your program from ever calling __init__ with an incorrect parameter type, and that is to use static type checking. Your program will error in the preparation steps, when you run a checker like "mypy" - that is roughly the same moment in time some static languages would raise the error: when they are compiled in an explicit step prior to being run. Static type checking can add the safety you think you need - but it is a whole new world of boilerplate and bureaucracy to code in Python. A web search for "python static type checking" can list some starting points - the second link I get seems rather interesting: https://realpython.com/python-type-checking/
Fourth:
If you opt for the if-based checking, you should check whether the object you got is "enough of a list" for your purposes, not type(a) != list, which bars subclasses of list. not isinstance(a, list) will accept list subclasses, but block several other object types that might just work. Depending on what you want, your code may work with any "sequence" type. In that case, you can import Sequence from collections.abc and check whether your parameter is an instance of that instead, as sketched below - this will allow the users of your method to pass any class that has a length and can retrieve items in order.
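A sketch of that looser check, reusing the Matrix class from the question:

from collections.abc import Sequence

class Matrix:
    def __init__(self, matrix):
        # accepts lists, tuples, and any user-defined sequence type
        if not isinstance(matrix, Sequence):
            raise TypeError(f"Expected a sequence, got {type(matrix).__name__}")
        self.matrix = matrix

Matrix([1, 2, 3])  # OK: list
Matrix((1, 2, 3))  # OK: tuple is a Sequence too
Matrix(5)          # TypeError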
And, just repeating it again: there is absolutely no problem in making this check inside the method. It could be factored out, by creating a complicated decorator that could do type checking - actually there are Python packages that can use type annotations, just as they are used by static type checking tools, and do runtime checks. But this won't gain you any execution time. Static type checking will do it before running, but the resources gained by that are negligible, nonetheless.
And finally, no, this has nothing to do with a metaclass. It would be possible to use a metaclass to add decorators to all your methods, and have these decorators perform the runtime checking - but you might just as well use the decorator explicitly anyway.

Why method accepts class name and name 'object' as an argument?

Consider the following code. I expected it to generate an error, but it worked. mydef1(self) should only be invoked with an instance of MyClass1 as an argument, but it accepts MyClass1 itself, as well as the rather vague object, as the instance.
Can someone explain why mydef1 accepts the class name (MyClass1) and object as arguments?
class MyClass1:
    def mydef1(self):
        return "Hello"

print(MyClass1.mydef1(MyClass1))
print(MyClass1.mydef1(object))
Output
Hello
Hello
There are several parts to the answer to your question because your question signals confusion about a few different aspects of Python.
First, type names are not special in Python. They're just another variable. You can even do something like object = 5 and cause all kinds of confusion.
Secondly, the self parameter is just that, a parameter. When you say MyClass1.mydef1, you're asking for the value of the attribute named mydef1 on the object MyClass1 (a module, a class, or anything else attribute lookup works on). You get back a plain function that takes one argument.
If you had done this:
aVar = MyClass1()
aVar.mydef1(object)
it would've failed. When Python gets a method from an instance of a class, attribute lookup on the instance has special magic (the descriptor protocol) to bind the first argument to the same object the method was retrieved from. It then returns the bound method, which now takes one less argument.
I would recommend fiddling around in the interpreter and type in your MyClass1 definition, then type in MyClass1.mydef1 and aVar = MyClass1(); aVar.mydef1 and observe the difference in the results.
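In the interpreter, the difference looks roughly like this (addresses will vary):

>>> MyClass1.mydef1          # plain function: self must be passed explicitly
<function MyClass1.mydef1 at 0x...>
>>> aVar = MyClass1()
>>> aVar.mydef1              # bound method: self is already filled in
<bound method MyClass1.mydef1 of <__main__.MyClass1 object at 0x...>>
>>> aVar.mydef1()
'Hello'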
If you come from a language like C++ or Java, this can all seem very confusing. But, it's actually a very regular and logical structure. Everything works the same way.
Also, as people have pointed out, names have no type associated with them. The type is associated with the object the name references. So any name can reference any kind of thing. This is also referred to as 'dynamic typing'. Python is dynamically typed in another way as well. You can actually mess around with the internal structure of something and change the type of an object as well. This is fairly deep magic, and I wouldn't suggest doing it until you know what you're doing. And even then you shouldn't do it as it will just confuse everybody else.
Python is dynamically typed, so it doesn't care what gets passed. It only cares that the single required parameter gets an argument as a value. Once inside the function, you never use self, so it doesn't matter what the argument was; you can't misuse what you don't use in the first place.
This question only arises because you are taking the uncommon action of running an instance method as an unbound method with an explicit argument, rather than invoking it on an instance of the class and letting the Python runtime system take care of passing that instance as the first argument to mydef1: MyClass().mydef1() == MyClass.mydef1(MyClass()).
Python is not a statically-typed language, so you can pass any objects of any data types to any function, as long as you pass in the right number of arguments, and the self parameter of an instance method is no different from any other function parameter.
There is no problem with that whatsoever - self is an object like any other and may be used in any context where object of its type/behavior would be welcome.

Trying to eliminate the types module in python code

Is saying:
if not callable(output.write):
    raise ValueError("Output class must have a write() method")
The same as saying:
if type(output.write) != types.MethodType:
    raise exceptions.ValueError("Output class must have a write() method")
I would rather not use the types module if I can avoid it.
No, they are not the same.
callable(output.write) just checks whether output.write is callable. Things that are callable include:
Bound method objects (whose type is types.MethodType).
Plain-old functions (whose type is types.FunctionType)
partial instances wrapping bound method objects (whose type is functools.partial)
Instances of your own custom callable class with a __call__ method that is designed to be indistinguishable from bound method objects (whose type is your class).
Instances of a subclass of the bound method type (whose type is that subclass).
…
type(output.write) == types.MethodType accepts only the first of these. Nothing else, not even subclasses of MethodType, will pass. (If you want to allow subclasses, use isinstance(output.write, types.MethodType).)
The former is almost certainly what you want. If I've monkeypatched an object to replace the write method with something that acts just like a write method when called, but isn't implemented as a bound method, why would your code want to reject my object?
As for your side question in the comments:
I do want to know if the exceptions.ValueError is necessary
No, it's not.
In Python 2.7, the builtin exceptions are also available in the exceptions module:
>>> ValueError is exceptions.ValueError
True
In Python 3, they were moved to builtins along with all the other builtins:
>>> ValueError is builtins.ValueError
True
But either way, the only reason you'd ever need to refer to its module is if you hid ValueError with a global of the same name in your own module.
One last thing:
As user2357112 points out in a comment, your solution doesn't really ensure anything useful.
The most common problem is almost certainly going to be output.write not existing at all. In which case you're going to get an AttributeError rather than the ValueError you wanted. (If this is acceptable, you don't need to check anything—just call the method and you'll get an AttributeError if it doesn't exist, and a TypeError if it does but isn't callable.) You could solve that by using getattr(output, 'write', None) instead of output.write, because None is not callable.
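A sketch of that combined check:

write = getattr(output, 'write', None)  # None if the attribute is missing
if not callable(write):
    raise ValueError("Output class must have a write() method")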
The next most common problem is probably going to be output.write existing, and being callable, but with the wrong signature. Which means you'll still get the same TypeError you were trying to avoid when you try to call it. You could solve that by, e.g., using the inspect module.
But if you really want to do all of this, you should probably factor it all out into an ABC. ABCs only have built-in support for checking that abstract methods exist as attributes; they don't check whether those attributes are callable, or callable with the right signature. But it's not that hard to extend that support. (Or, maybe better, just grab one of the interface/protocol modules off PyPI.) And I think something like isinstance(output, StringWriteable) would declare your intention a lot better than a bunch of lines involving getattr or hasattr, type checking, and inspect grubbing; a sketch follows.
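A hypothetical StringWriteable ABC along those lines (the name comes from this answer; the hook only checks that write exists and is callable, not its signature):

from abc import ABC, abstractmethod
import io

class StringWriteable(ABC):
    @abstractmethod
    def write(self, data): ...

    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is StringWriteable:
            return callable(getattr(subclass, 'write', None))
        return NotImplemented

print(isinstance(io.StringIO(), StringWriteable))  # True, no registration needed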

Vector in python

I'm working on this project which deals with vectors in python. But I'm new to python and don't really know how to crack it. Here's the instruction:
"Add a constructor to the Vector class. The constructor should take a single argument. If this argument is either an int or a long or an instance of a class derived from one of these, then consider this argument to be the length of the Vector instance. In this case, construct a Vector of the specified length with each element is initialized to 0.0. If the length is negative, raise a ValueError with an appropriate message. If the argument is not considered to be the length, then if the argument is a sequence (such as a list), then initialize with vector with the length and values of the given sequence. If the argument is not used as the length of the vector and if it is not a sequence, then raise a TypeError with an appropriate message.
Next implement the __repr__ method to return a string of python code which could be used to initialize the Vector. This string of code should consist of the name of the class followed by an open parenthesis followed by the contents of the vector represented as a list followed by a close parenthesis."
I'm not sure how to do the class type checking, as well as how to initialize the vector based on the given object. Could someone please help me with this? Thanks!
Your instructor seems not to "speak Python as a native language". ;) The entire concept for the class is pretty silly; real Python programmers just use the built-in sequence types directly. But then, this sort of thing is normal for academic exercises, sadly...
Add a constructor to the Vector class.
In Python, the common "this is how you create a new object and say what it's an instance of" stuff is handled internally by default, and then the baby object is passed to the class' initialization method to make it into a "proper" instance, by setting the attributes that new instances of the class should have. We call that method __init__.
The constructor should take a single argument. If this argument is either an int or a long or an instance of a class derived from one of these
This is tested by using the builtin function isinstance. You can look it up for yourself in the documentation (or try help(isinstance) at the REPL).
In this case, construct a Vector of the specified length with each element is initialized to 0.0.
In our __init__, we generally just assign the starting values for attributes. The first parameter to __init__ is the new object we're initializing, which we usually call "self" so that people understand what we're doing. The rest of the arguments are whatever was passed when the caller requested an instance. In our case, we're always expecting exactly one argument. It might have different types and different meanings, so we should give it a generic name.
When we detect that the generic argument is an integer type with isinstance, we "construct" the vector by setting the appropriate data. We just assign to some attribute of self (call it whatever makes sense), and the value will be... well, what are you going to use to represent the vector's data internally? Hopefully you've already thought about this :)
If the length is negative, raise a ValueError with an appropriate message.
Oh, good point... we should check that before we try to construct our storage. Some of the obvious ways to do it would basically treat a negative number the same as zero. Other ways might raise an exception that we don't get to control.
If the argument is not considered to be the length, then if the argument is a sequence (such as a list), then initialize the vector with the length and values of the given sequence.
"Sequence" is a much fuzzier concept; lists and tuples and what-not don't have a "sequence" base class, so we can't easily check this with isinstance. (After all, someone could easily invent a new kind of sequence that we didn't think of). The easiest way to check if something is a sequence is to try to create an iterator for it, with the built-in iter function. This will already raise a fairly meaningful TypeError if the thing isn't iterable (try it!), so that makes the error handling easy - we just let it do its thing.
Assuming we got an iterator, we can easily create our storage: most sequence types (and I assume you have one of them in mind already, and that one is certainly included) will accept an iterator for their __init__ method and do the obvious thing of copying the sequence data.
Next implement the __repr__ method to return a string of python code which could be used to initialize the Vector. This string of code should consist of the name of the class followed by an open parenthesis followed by the contents of the vector represented as a list followed by a close parenthesis."
Hopefully this is self-explanatory. Hint: you should be able to simplify this by making use of the storage attribute's own __repr__. Also consider using string formatting to put the string together.
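Pulling those hints together, one possible sketch (choosing a plain list for internal storage is my assumption, not part of the assignment):

class Vector:
    def __init__(self, arg):
        if isinstance(arg, int):  # Python 3 folded long into int
            if arg < 0:
                raise ValueError("Vector length must not be negative")
            self._data = [0.0] * arg
        else:
            # iter() raises a meaningful TypeError for non-iterables
            self._data = list(iter(arg))

    def __repr__(self):
        # reuses the list's own repr for the contents
        return f"Vector({self._data})"

print(Vector(3))           # Vector([0.0, 0.0, 0.0])
print(Vector([1.5, 2.5]))  # Vector([1.5, 2.5])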
Everything you need to get started is here:
http://docs.python.org/library/functions.html
There are many examples of how to check types in Python on StackOverflow (see my comment for the top-rated one).
To initialize a class, use the __init__ method:
class Vector(object):
    def __init__(self, sequence):
        self._internal_list = list(sequence)
Now you can call:
my_vector = Vector([1, 2, 3])
And inside other functions in Vector, you can refer to self._internal_list. I put _ before the variable name to indicate that it shouldn't be changed from outside the class.
The documentation for the list function may be useful for you.
You can do the type checking with isinstance.
The initialization of a class with done with an __init__ method.
Good luck with your assignment :-)
This may or may not be appropriate depending on the homework, but in Python programming it's not very usual to explicitly check the type of an argument and change the behaviour based on that. It's more normal to just try to use the features you expect it to have (possibly catching exceptions if necessary to fall back to other options).
In this particular example, a normal Python programmer implementing a Vector that needed to work this way would try using the argument as if it were an integer/long (hint: what happens if you multiply a list by an integer?) to initialize the Vector, and if that raises an exception, try using it as if it were a sequence; if that fails as well, then you can raise a TypeError.
The reason for doing this is that it leaves your class open to working with other objects types people come up with later that aren't integers or sequences but work like them. In particular it's very difficult to comprehensively check whether something is a "sequence", because user-defined classes that can be used as sequences don't have to be instances of any common type you can check. The Vector class itself is quite a good candidate for using to initialize a Vector, for example!
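A sketch of that EAFP ordering (again assuming a list for internal storage):

class Vector:
    def __init__(self, arg):
        try:
            # try the integer interpretation first: list * int repeats it
            if arg < 0:
                raise ValueError("Vector length must not be negative")
            self._data = [0.0] * arg
        except TypeError:
            # arg was not usable as a length; fall back to treating it as a sequence
            try:
                self._data = list(arg)
            except TypeError:
                raise TypeError("expected an integer-like length or a sequence") from None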
But I'm not sure if this is the answer your teacher is expecting. If you haven't learned about exception handling yet, then you're almost certainly not meant to use this approach so please ignore my post. Good luck with your learning!
