Why can’t fields with default values come first? - python

I got the following error using dataclasses. Does anyone know why this isn't valid?
from dataclasses import dataclass

@dataclass(frozen=False)
class Contact:
    contact_id: int = 0
    contact_firstname: str
    contact_lastname: str
    contact_email: str = None
Error: Fields without default values cannot appear after fields with default values

Fields in a dataclass are translated, in the same order, to arguments in the constructor function. So, if it were allowed, then this
@dataclass(frozen=False)
class Contact:
    contact_id: int = 0
    contact_firstname: str
    contact_lastname: str
    contact_email: str = None
would get translated to (omitting the __eq__ and all the other dataclass convenience functions)
class Contact:
    def __init__(self, contact_id=0, contact_firstname, contact_lastname, contact_email=None):
        self.contact_id = contact_id
        self.contact_firstname = contact_firstname
        self.contact_lastname = contact_lastname
        self.contact_email = contact_email
And, by the usual rules of Python functions, default arguments have to come at the end, since (positionally) there's no way to supply later arguments without supplying earlier ones. Now, in a language like Python, you could theoretically use named arguments to make the above syntax useful, but the Python developers decided to keep things simple (Simple is better than complex, after all) and follow the C++ convention of requiring them at the end.
Likewise, they could have reordered the dataclass fields in the constructor so that the default ones end up at the end, but again, they decided to keep it as simple and predictable as possible. And, personally, I think they made the right call. Ruby, for instance, allows default arguments in the middle of a function argument list (not just at the end), and every Ruby style guide I've seen says to avoid that feature like the plague.
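For completeness, the usual fixes are to reorder the fields so that the ones with defaults come last, or, on Python 3.10+, to make the fields keyword-only, which lifts the ordering restriction entirely (a quick sketch):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Contact:
    contact_firstname: str
    contact_lastname: str
    contact_id: int = 0
    contact_email: Optional[str] = None

# Python 3.10+ only: with kw_only=True every field becomes a keyword-only
# argument of __init__, so fields with defaults may precede fields without.
@dataclass(kw_only=True)
class ContactKw:
    contact_id: int = 0
    contact_firstname: str
    contact_lastname: str
    contact_email: Optional[str] = None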

Related

How can I type hint that the __init__ params are the same as the fields in a dataclass?

Let us say I have a custom use case, and I need to dynamically create or define the __init__ method for a dataclass.
For example, say I need to decorate it like @dataclass(init=False) and then modify the __init__() method to take keyword arguments, like **kwargs. However, within the kwargs, I only check for the presence of known dataclass fields, and set those attributes accordingly (example below).
I would like to type hint to my IDE (PyCharm) that the modified __init__ only accepts the listed dataclass fields as parameters or keyword arguments. I am unsure if there is a way to approach this, using the typing library or otherwise. I know that Python 3.11 has dataclass transforms planned, which may or may not do what I am looking for (my gut feeling is no).
Here is sample code I was playing around with; it is a basic case that illustrates the problem I am having:
from dataclasses import dataclass

# get value from input source (can be a file or anything else)
def get_value_from_src(_name: str, tp: type):
    return tp()  # dummy value

@dataclass
class MyClass:
    foo: str
    apple: int

    def __init__(self, **kwargs):
        for name, tp in self.__annotations__.items():
            if name in kwargs:
                value = kwargs[name]
            else:
                # here is where I would normally have the logic
                # to read the value from another input source
                value = get_value_from_src(name, tp)
            if value is None:
                raise ValueError
            setattr(self, name, value)

c = MyClass(apple=None)
print(c)

c = MyClass(foo='bar',  # here, I would like to auto-complete the name
                        # when I start typing `apple`
            )
print(c)
If we assume that the number or names of the fields are not fixed, I am curious if there could be a generic approach which would basically say to type checkers, "the __init__ of this class accepts only (optional) keyword arguments that match up with the fields defined on the dataclass itself".
Addendums, based on notes in comments below:
Passing @dataclass(kw_only=True) won't work because imagine I am writing this for a library, and need to support Python 3.7+. Also, kw_only has no effect when a custom __init__() is implemented, as in this case.
The above is just a stub __init__ method. It could have more complex logic, such as setting attributes based on a file source, for example; basically, it is just a sample implementation of a larger use case.
I can't update each field to foo: Optional[str] = None because that part would be implemented in user code, which I would not have any control over. Also, annotating it that way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), doesn't seem like the best idea to me.
It would not work to let dataclasses auto-generate an __init__ and instead implement a __post_init__(). I need to be able to construct the class without arguments, like MyClass(), as the field values will be set from another input source (think a local file or elsewhere); this means that all fields would be required, so annotating them as Optional would be fallacious in this case. I still need to support the user passing optional keyword arguments, but these **kwargs will always match up with the dataclass field names, and so I want some way for auto-completion to work with my IDE (PyCharm).
Hope this post clarifies the expectations and desired result. If there are any questions or anything that is a bit vague, please let me know.
What you are describing is impossible in theory and unlikely to be viable in practice.
TL;DR
Type checkers don't run your code, they just read it. A dynamic type annotation is a contradiction in terms.
Theory
As I am sure you know, the term static type checker is not coincidental. A static type checker does not execute the code you write. It just parses it and infers types according to its own internal logic, by applying certain rules to a graph that it derives from your code.
This is important because unlike some other languages, Python is dynamically typed, which as you know means that the type of a "thing" (variable) can completely change at any point. In general, there is theoretically no way of knowing the type of all variables in your code, without actually stepping through the entire algorithm, which is to say running the code.
As a silly but illustrative example, you could decide to put the name of a type into a text file to be read at runtime and then used to annotate some variable in your code. Could you do that with valid Python code and typing? Sure. But I think it is beyond clear, that static type checkers will never know the type of that variable.
Why your proposition won't work
Abstracting away all the dataclass stuff and the possible logic inside your __init__ method, what you are asking boils down to the following.
"I want to define a method (__init__), but the types of its parameters will only be known at runtime."
Why am I claiming that? I mean, you do annotate the types of the class' attributes, right? So there you have the types!
Sure, but these have -- in general -- nothing whatsoever to do with the arguments you could pass to the __init__ method, as you yourself point out. You want the __init__ method to accept arbitrary keyword-arguments. Yet you also want a static type checker to infer which types are allowed/expected there.
To connect the two (attribute types and method parameter types), you could of course write some kind of logic. You could even implement it in a way that enforces adherence to those types. That logic could read the type annotations of the class attributes, match up the **kwargs and raise TypeError if one of them doesn't match up. This is entirely possible and you almost implemented that already in your example code. But this only works at runtime!
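A minimal sketch of such a runtime-only check, purely illustrative (the fallback logic and error messages are made up), roughly mirroring your example code:

from dataclasses import dataclass

@dataclass(init=False)
class MyClass:
    foo: str
    apple: int

    def __init__(self, **kwargs):
        for name, tp in self.__annotations__.items():
            value = kwargs.pop(name, tp())  # dummy fallback if not supplied
            # enforce the annotated type, but only at runtime
            if isinstance(tp, type) and not isinstance(value, tp):
                raise TypeError(f'{name} should be {tp.__name__}, got {type(value).__name__}')
            setattr(self, name, value)
        if kwargs:
            raise TypeError(f'unexpected keyword arguments: {sorted(kwargs)}')

A static type checker cannot follow any of this; all it ever sees is def __init__(self, **kwargs).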
Again, a static type checker has no way to infer that, especially since your desired class is supposed to just be a base class and any descendant can introduce its own attributes/types at any point.
But dataclasses work, don't they?
You could argue that this dynamic way of annotating the __init__ method works with dataclasses. So why are they so different? Why are they correctly inferred, but your proposed code can't?
The answer is, they aren't.
Even dataclasses don't have any magical way of telling a static type checker which parameter types the __init__ method is to expect, even though they do annotate them, when they dynamically construct the method in _init_fn.
The only reason mypy correctly infers those types, is because they implemented a separate plugin just for dataclasses. Meaning it works because they read through PEP 557 and hand-crafted a plugin for mypy that specifically facilitates type inference based on the rules described there.
You can see the magic happening in the DataclassTransformer.transform method. You cannot generalize this behavior to arbitrary code, which is why they had to write a whole plugin just for this.
I am not familiar enough with how PyCharm does its type checking, but I strongly suspect they used something similar.
So you could argue that dataclasses are "cheating" with regards to static type checking. Though I am certainly not complaining.
Pragmatic solution
Even something as "high-profile" as Pydantic, which I personally love and use extensively, requires its own mypy plugin to realize the __init__ type inference properly (see here). For PyCharm they have their own separate Pydantic plugin, without which the internal type checker cannot provide those nice auto-suggestions for initialization etc.
That approach would be your best bet, if you really want to take this further. Just be aware that this will be (in the best sense of the word) a hack to allow specific type checkers to catch "errors" that they otherwise would have no way of catching.
The reason I argue that it is unlikely to be viable is because it will essentially blow up the amount of work for your project to also cover the specific hacks for those type checkers that you want to satisfy. If you are committed enough and have the resources, go for it.
Conclusion
I am not trying to discourage you. But it is important to know the limitations enforced by the environment. It's either dynamic types and hacky imperfect type checking (still love mypy), or static types and no "kwargs can be anything" behavior.
Hope this makes sense. Please let me know, if I made any errors. This is just based on my understanding of typing in Python.
Regarding this point:
It would not work to let dataclasses auto-generate an __init__ and instead implement a __post_init__(). I need to be able to construct the class without arguments, like MyClass(), as the field values will be set from another input source (think a local file or elsewhere); this means that all fields would be required, so annotating them as Optional would be fallacious in this case. I still need to support the user passing optional keyword arguments, but these **kwargs will always match up with the dataclass field names, and so I want some way for auto-completion to work with my IDE (PyCharm).
dataclasses.field + default_factory can be a solution.
But, it seems that dataclass field declarations are implemented in user code:
I can't update each field to foo: Optional[str] = None because that part would be implemented in user code, which I would not have any control over. Also, annotating it in this way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), don't seem like the best idea to me.
If your IDE supports ParamSpec, there is a workaround: it is not strictly correct (it will not pass a static type checker), but it does give auto-completion:
from typing import Callable, Iterable, TypeVar, ParamSpec
from dataclasses import dataclass

T = TypeVar('T')
P = ParamSpec('P')

# user defined dataclass
@dataclass
class MyClass:
    foo: str
    apple: int

def wrap(factory: Callable[P, T], annotations: Iterable[tuple[str, type]]) -> Callable[P, T]:
    def default_factory(**kwargs):
        for name, type_ in annotations:
            kwargs.setdefault(name, type_())
        return factory(**kwargs)
    return default_factory

WrappedMyClass = wrap(MyClass, MyClass.__annotations__.items())
WrappedMyClass()  # Okay

Python type hinting for upper vs lower-cased strings?

In Python, if I am creating a list of strings, I would type hint like this:
from typing import List

list_of_strings: List[str] = []
list_of_strings.append('some string')
list_of_strings.append('some other string')
Is there some way to type hint the expected case of the strings? That way, when I write a comparison operator to search for a specific string, for example, I don't accidentally search for the mixed or upper-cased version of a string I know will be lower-cased because all strings in list_of_strings should be lower-cased. I realize I can just add comments and refer back to the list's declaration, but I was wondering if there was a more integrated way to do it.
An alternate way to solve this problem would be to make a class which extends str and rejects any values which aren't in the proper case, and then type hint for that class. Is there any reason why this would be a bad idea aside from it being more of a pain to create than a simple string?
The reason I run into this problem, is that I create lists, dicts, and other structures to store data, then need to add to or search them for a particular key/value and not knowing the expected case creates problems where I add duplicate entries because a simple if 'example' in string_list doesn't find it. And doing if 'example'.upper() in string_list is easy to forget and not very pretty. Jumping back and forth between the declaration (if I wrote a comment there describing expected case) and where I'm coding distracts from my flow, it would be nice to have the information when I'm referencing that object later.
You can, as of Python 3.10, using typing.TypeGuard.
from typing import TypeGuard

class LowerStr(str):
    '''a dummy subclass of str, not actually used at runtime'''

def is_lower_str(val: str) -> TypeGuard[LowerStr]:
    return val.islower()

l: list[LowerStr] = []

def append(lst: list[LowerStr], v: str):
    if not is_lower_str(v):
        raise TypeError('oh no')
    lst.append(v)
You could indeed enforce runtime safety using a subclass of str. The disadvantage would mostly be performance. You would want to take care not to add an unnecessary __dict__, by adding __slots__ = () to the class definition, off the top of my head.
Either way, string literals are not going to be validated automatically, so it will cause some overhead, either by calling the typeguard, passing them to the constructor of the subtype, or using cast(LowerStr, 'myliteral').
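A sketch of that subclass approach, in case it helps; unlike the dummy LowerStr above, this variant actually validates at construction time:

class LowerStr(str):
    __slots__ = ()  # no per-instance __dict__

    def __new__(cls, value: str) -> 'LowerStr':
        if not value.islower():
            raise ValueError(f'not a lowercase string: {value!r}')
        return super().__new__(cls, value)

lst: list[LowerStr] = []
lst.append(LowerStr('some string'))    # fine
# lst.append(LowerStr('Some String'))  # would raise ValueError at runtime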
No, I've been searching on the net for this but it doesn't exist.
Python's typing module doesn't provide any hint for lowercase or uppercase strings.
[...] type hint the expected case of the strings
Remember that a type is, for example, str; what you are describing here is really a hint about the value, not a type hint.
You can anyway create a custom class in this scope:
class UppercaseString(str):
    pass
UppercaseString will inherit all the functionality of the built-in str class (the empty pass body just means it adds nothing of its own).
You can anyway add a method that checks whether the string is really uppercase, and raises an error otherwise.
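For instance, a sketch of that validation, using __new__ so the check runs when the string is created:

class UppercaseString(str):
    def __new__(cls, value: str) -> 'UppercaseString':
        if not value.isupper():
            raise ValueError(f'expected an uppercase string, got {value!r}')
        return super().__new__(cls, value)

names: list[UppercaseString] = []
names.append(UppercaseString('HELLO'))    # fine
# names.append(UppercaseString('Hello'))  # raises ValueError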

Is it possible to reference function parameters in Python's function annotation?

I'd like to be able to say
def f(param) -> type(param): return param
but I get NameError: name 'param' is not defined. The key thing here is that the return type is a function of a function parameter. I have glanced through PEP 3107 (https://www.python.org/dev/peps/pep-3107/), but I don't see any precise description of what constitutes a valid annotation expression.
I would accept an answer which explains why exactly is this not possible at the moment, i.e., does it not fit into current annotation paradigm or is there a technical problem with this?
There are a few issues with the type(param) method.
First off, as Oleh mentioned in his answer, all annotations must be valid at the time of the function's definition. In an example like yours, you could potentially have problems due to variable shadowing.
param = 10

def f(param) -> type(param):
    return param

f('a')
Since the global variable param is of type int at definition time, the function's annotation is essentially read as f(param: Any) -> int. So when you pass in the argument 'a', f returns a str, which is inconsistent with the annotation. Admittedly this example is contrived, but from a language design standpoint, it is something to be careful about.
Instead, as jonrsharpe mentioned, often the best way to reference the generic types of parameters is with type variables.
This can be done using the typing.TypeVar class.
from typing import TypeVar

T = TypeVar('T')

def f(param: T) -> T:
    return param
This means that static checkers won't need to actually access the type of param, just check, at check time, that there is a way to consider both param and the return value to be of the same type. I say consider the same type because you will sometimes only assert that they both implement the same abstract base class/interface, like numbers.Real.
Type variables can then be used in generic types:
from typing import List, TypeVar

T = TypeVar('T')

def total(items: List[T]) -> List[T]:
    return [f(item) for item in items]
Using type variables and generics can be better because it adds additional information and allows for a little more flexibility (as explained in the example with numbers.Real). For instance, the ability to use List[T] is really important. type(param) would only give you list, not a parameterized type like List[T]. So using type(param) would actually lose information, not add it.
Therefore, it is a better idea to stick to using type variables and generic types instead.
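To make the numbers.Real point concrete, a bound type variable says "any type implementing this interface, and the same one throughout" (a small sketch):

import numbers
from typing import TypeVar

R = TypeVar('R', bound=numbers.Real)

def clamp(value: R, lo: R, hi: R) -> R:
    # All three parameters and the result share one type R, and R can be
    # int, float, Fraction, ... anything that behaves like a real number.
    if value < lo:
        return lo
    if value > hi:
        return hi
    return value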
TL;DR:
Due to variable shadowing, type(param) could lead to inconsistent annotations.
Since, when thinking about the types in your system, you are sometimes thinking in terms of interfaces (abstract base classes in Python) rather than concrete types, it can be better to rely on ABCs and type variables.
Using type(param) could lose information that would be provided by generics.
Let's take a glance at PEP 484 - Type Hints, section "Acceptable type hints".
Annotations must be valid expressions that evaluate without raising exceptions at the time the function is defined (but see below for forward references).
Annotations should be kept simple or static analysis tools may not be able to interpret the values. For example, dynamically computed types are unlikely to be understood. (This is an intentionally somewhat vague requirement, specific inclusions and exclusions may be added to future versions of this PEP as warranted by the discussion.)
I'd say that your approach is quite interesting and may be useful for static analysis. But if we accept the PEPs as the source of explanation for the current annotation paradigm, the highlighted text explains why the return type can't be computed dynamically at the time the function is called.

What's a good use case for enums in python?

I've been writing Python 2 code for ~3 years now, and although I've known about enums for a long time, we've only just started using them in our project (backported via the PyPI package enum34).
I'd like to understand when to use them.
One place where we started using them was to map some postgres database level enums to python enums. Therefore we have this enum class
class Status(enum.Enum):
    active = 'active'
    inactive = 'inactive'
But then when using these, we've ended up using them like this:
if value == Status.active.value:
    ...
And so using enums in this case is less helpful than just using a more simple class, like this
class Status(object):
    active = 'active'
    inactive = 'inactive'
Because we could use this more easily, like value == Status.active.
So far the only place I found this useful - though not as useful as I'd like - is in docstrings. Instead of explicitly saying that allowed values are 'active' and 'inactive', I can just declare that my formal parameter expects a member of the Status enum (more helpful when more statuses exist).
So I don't really know what would be their exact use case - I don't know how they're better than string constants.
In short: when to use enums?
A couple points:
Your Enum class can use str as a mixin.
Some database layers allow converting the stored data into Python data types.
For the first point:
class Status(str, Enum):
    active = 'active'
    inactive = 'inactive'

...

if value == Status.active:
    ...
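Because of the str mixin, members compare equal to their plain string values, so the .value indirection disappears (a quick illustration):

value = 'active'   # e.g. a raw string coming back from the database

print(value == Status.active)        # True
print(Status.active == 'inactive')   # False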
For the second point:
I have little experience here, but I believe SQLAlchemy will do this.
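For what it's worth, SQLAlchemy's Enum column type does accept an enum class directly; a sketch, assuming SQLAlchemy 1.4+ (the model and table names are made up):

import enum
from sqlalchemy import Column, Enum, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Status(enum.Enum):
    active = 'active'
    inactive = 'inactive'

class Account(Base):
    __tablename__ = 'account'
    id = Column(Integer, primary_key=True)
    # Persisted as a string in the database, but loaded back as a
    # Status member, so comparisons like row.status is Status.active work.
    status = Column(Enum(Status))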
I use Enum most often in GUI design. Imagine you have a set of radio buttons in a dialog.
This represents a list of fixed choices, so under the hood, it's well represented by an Enum. That is, if a user picks button 2, then Enum.Choice.Choice2 could be returned by some callback. This is better than returning an int 2 or string "choice 2", as there's nothing to validate these later. In other words, if you changed "choice 2" to "user choice 2", you could potentially break downstream components expecting the original symbol.
Think of Enum as a convenient shortcut to presenting a static set of choices, rather than creating boilerplate object classes.
I've found Enums in Java (and other statically typed languages, I presume) to be a bit more useful, as you can declare them in a method signature. For example, a method may have the signature
public void doSomething(Choice mychoice)
instead of
public void doSomething(String mychoice)
In the second case, users may have to know ahead of time that mychoice should be "foo" or "bar", but what if they pass in "baz", which is invalid. Using an Enum will ensure invalid input can't be passed to the method, as Enum.Choice would only have fields foo and bar. You couldn't create a baz Choice if you tried.
Sorry, I strayed into Java. Is this too off topic to be helpful?
The issue you see is because your ORM isn't mapping database values to your Enum object. If it did, you wouldn't have to deal with .value.
An alternative would be something like:
if Status(value) is Status.active:
since the constructor creates an Enum member from the given value.
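A short illustration of that lookup-by-value behaviour:

value = 'active'   # raw value from the database layer

if Status(value) is Status.active:   # Status('active') returns the member itself
    print('record is active')

# Status('bogus') raises ValueError, so invalid values fail loudly.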

Defining my own None-like Python constant

I have a situation in which I'm asked to read collections of database update instructions from a variety of sources. All sources will contain a primary key value so that the code that applies the updates to the database can find the correct record. The files will vary, however, in what additional columns are reported.
When I read and create my update instructions I must differentiate between an update in which a column (for instance, MiddleName) was provided but was empty (meaning no middle name and the field should be updated to NULL) and an update in which the MiddleName field was not included (meaning the update should not touch the middle name column at all).
The former situation (column provided but no value) seems appropriately represented by the None value. For the second situation, however, I'd like to have a NotInFile "value" that I can use similar to the way I use None.
Is the correct way to implement this as follows?
NotInFile = 1

class PersonUpdate(object):
    def __init__(self):
        self.PersonID = None
        self.FirstName = NotInFile
        self.MiddleName = NotInFile
and then in another module
import othermod

upd = othermod.PersonUpdate()
if upd.MiddleName is othermod.NotInFile:
    print 'Hey, middle name was not supplied'
I don't see anything particularly wrong with your implementation. However, 1 isn't necessarily the best sentinel value, as it is a cached constant in CPython (e.g. -1 + 2 is 1 will return True). In these cases, I might consider using a sentinel object instance:
NotInFile = object()
Python also provides a few other named constants which you could use if it seems appropriate: NotImplemented and Ellipsis come to mind immediately. (Note that I'm not recommending you use these constants ... I'm just providing more options.)
No, using the integer 1 is a bad idea. It might work out in this case if MiddleName is always a string or None, but in general the implementation is free to intern integers, strings, tuples and other immutable values as it pleases. CPython does it for small integers and for constants of the aforementioned types. PyPy implements is by value for integers and a few other types. So if MiddleName is ever legitimately 1, your code will consider it not supplied.
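A quick demonstration of why that matters on CPython (small integers are cached):

NotInFile = 1

class PersonUpdate(object):
    def __init__(self):
        self.MiddleName = NotInFile

upd = PersonUpdate()
upd.MiddleName = -1 + 2              # a legitimate value that happens to equal 1
print(upd.MiddleName is NotInFile)   # True, so the "not supplied" check misfires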
Use an object instead, each new object has a distinct identity:
NotInFile = object()
Alternatively, for better debugging output, define your own class:
class NotInFileType(object):
# __slots__ = () if you want to save a few bytes
def __repr__(self):
return 'NotInFile'
NotInFile = NotInFileType()
del NotInFileType # look ma, no singleton
If you're paranoid, you could make it a proper singleton (ugly). If you need several such instances, you could rename the class to Sentinel or something, make the representation an instance variable and use multiple instances.
If you want type-checking, this idiom is now blessed by PEP 484 and supported by mypy:
from enum import Enum

class NotInFileType(Enum):
    _token = 0

NotInFile = NotInFileType._token
If you are using mypy 0.740 or earlier, you need to workaround this bug in mypy by using typing.Final:
from typing import Final

NotInFile: Final = NotInFileType._token
If you are using Python 3.7 or earlier, you can use typing_extensions.Final from pip package typing_extensions instead of typing.Final
