What's a good use case for enums in Python?

I've been writing Python 2 code for ~3 years now, and although I've known about enums for a long time, we've only recently started using them in our project (backported via the PyPI package enum34).
I'd like to understand when to use them.
One place where we started using them was to map some Postgres database-level enums to Python enums. So we have this enum class:
import enum

class Status(enum.Enum):
    active = 'active'
    inactive = 'inactive'
But then when using these, we've ended up using them like this:
if value == Status.active.value:
    ...
And so using enums in this case is less helpful than just using a simpler class, like this:
class Status(object):
    active = 'active'
    inactive = 'inactive'
Because we could use this more easily, like value == Status.active.
So far the only place I found this useful - though not as useful as I'd like - is in docstrings. Instead of explicitly saying that the allowed values are 'active' and 'inactive', I can just declare that my formal parameter expects a member of the Status enum (more helpful when more statuses exist).
So I don't really know what would be their exact use case - I don't know how they're better than string constants.
In short: when to use enums?

A couple of points:
Your Enum class can use str as a mixin.
Some database layers allow converting the stored data into Python data types.
For the first point:
from enum import Enum

class Status(str, Enum):
    active = 'active'
    inactive = 'inactive'

...

if value == Status.active:
    ...
For the second point:
I have little experience here, but I believe SQLAlchemy will do this.
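For example, SQLAlchemy's Enum column type accepts a PEP 435 enum class directly, so values come back from queries as enum members rather than raw strings. A minimal sketch (the Account model and table name are made up for illustration):

import enum
from sqlalchemy import Column, Enum, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Status(enum.Enum):
    active = 'active'
    inactive = 'inactive'

class Account(Base):
    __tablename__ = 'account'
    id = Column(Integer, primary_key=True)
    # stored as a database enum, loaded back as a Status member
    status = Column(Enum(Status))

Loaded rows then compare naturally: if account.status is Status.active: ...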

I use Enum most often in GUI design. Imagine you have a group of radio buttons. The buttons represent a list of fixed choices, so under the hood they're well represented by an Enum. That is, if a user picks button 2, then Choice.Choice2 could be returned by some callback. This is better than returning the int 2 or the string "choice 2", as there's nothing to validate those values later. In other words, if you changed "choice 2" to "user choice 2", you could potentially break downstream components expecting the original symbol.
Think of Enum as a convenient shortcut to presenting a static set of choices, rather than creating boilerplate object classes.
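A minimal tkinter sketch of that pattern (the Choice members and button labels are made up):

import enum
import tkinter as tk

class Choice(enum.Enum):
    Choice1 = 1
    Choice2 = 2

root = tk.Tk()
selected = tk.IntVar(value=Choice.Choice1.value)

def on_pick():
    # the callback hands back a validated enum member, not a bare int or string
    print(Choice(selected.get()))

for choice in Choice:
    tk.Radiobutton(root, text=choice.name, value=choice.value,
                   variable=selected, command=on_pick).pack()

root.mainloop()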
I've found Enums in Java (and, I presume, other statically typed languages) to be a bit more useful, as you can declare them in a method signature. For example, a method may have the signature
public void myMethod(Choice mychoice)
instead of
public void myMethod(String mychoice)
In the second case, users may have to know ahead of time that mychoice should be "foo" or "bar" - but what if they pass in "baz", which is invalid? Using an Enum ensures invalid input can't be passed to the method, as Choice would only have the fields foo and bar. You couldn't create a baz Choice if you tried.
Sorry, I strayed into Java. Is this too off topic to be helpful?
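In Python, the closest analogue is an Enum-annotated parameter plus a runtime coercion. A minimal sketch, with a made-up Choice enum and pick() function:

import enum

class Choice(enum.Enum):
    foo = 'foo'
    bar = 'bar'

def pick(mychoice: Choice) -> None:
    # funnel raw input through the enum: anything that isn't a valid
    # member raises ValueError, so "baz" can never get through
    mychoice = Choice(mychoice)
    print('picked', mychoice.name)

pick(Choice.foo)     # fine
pick(Choice('baz'))  # ValueError: 'baz' is not a valid Choice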

The issue you see is because your ORM isn't mapping database values to your Enum object. If it did, you wouldn't have to deal with .value.
An alternative would be something like:
if Status(value) is Status.active:
    ...
This works since the enum constructor looks the member up from the given value.
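A quick illustration, assuming the Status enum from the question:

value = 'active'                    # e.g. a raw string from the database
if Status(value) is Status.active:  # Status('active') -> Status.active
    print('record is active')

Status('bogus')  # raises ValueError: 'bogus' is not a valid Status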


How can I type hint that the __init__ params are the same as the fields in a dataclass?

Let us say I have a custom use case, and I need to dynamically create or define the __init__ method for a dataclass.
For example, say I need to decorate it like @dataclass(init=False) and then modify the __init__() method to take keyword arguments, like **kwargs. However, in the kwargs object, I only check for the presence of known dataclass fields, and set those attributes accordingly (example below).
I would like to type hint to my IDE (PyCharm) that the modified __init__ only accepts the listed dataclass fields as parameters or keyword arguments. I am unsure if there is a way to approach this, using the typing library or otherwise. I know that Python 3.11 has dataclass transforms planned, which may or may not do what I am looking for (my gut feeling is no).
Here is a sample code I was playing around with, which is a basic case which illustrates problem I am having:
from dataclasses import dataclass

# get value from input source (can be a file or anything else)
def get_value_from_src(_name: str, tp: type):
    return tp()  # dummy value

@dataclass
class MyClass:
    foo: str
    apple: int

    def __init__(self, **kwargs):
        for name, tp in self.__annotations__.items():
            if name in kwargs:
                value = kwargs[name]
            else:
                # here is where I would normally have the logic
                # to read the value from another input source
                value = get_value_from_src(name, tp)
                if value is None:
                    raise ValueError
            setattr(self, name, value)

c = MyClass(apple=None)
print(c)

c = MyClass(foo='bar',  # here, I would like to auto-complete the name
                        # when I start typing `apple`
            )
print(c)
If we assume that the number or names of the fields are not fixed, I am curious if there could be a generic approach which would basically say to type checkers, "the __init__ of this class accepts only (optional) keyword arguments that match up with the fields defined in the dataclass itself".
Addendums, based on notes in comments below:
Passing @dataclass(kw_only=True) won't work because, imagine, I am writing this for a library and need to support Python 3.7+ (kw_only was added in 3.10). Also, kw_only has no effect when a custom __init__() is implemented, as in this case.
The above is just a stub __init__ method. It could have more complex logic, such as setting attributes based on a file source, for example. Basically, the above is just a sample implementation of a larger use case.
I can't update each field to foo: Optional[str] = None because that part would be implemented in user code, which I would not have any control over. Also, annotating it in this way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), doesn't seem like the best idea to me.
It would not work to let dataclasses auto-generate an __init__ and instead implement a __post_init__(). This would not work because I need to be able to construct the class without arguments, like MyClass(), as the field values will be set from another input source (think a local file or elsewhere); this means that all fields would be required, so annotating them as Optional would be fallacious in this case. I still need to support the user entering optional keyword arguments, but these **kwargs will always match up with the dataclass field names, and so I desire some way for auto-completion to work with my IDE (PyCharm).
Hope this post clarifies the expectations and desired result. If there are any questions or anything that is a bit vague, please let me know.
What you are describing is impossible in theory and unlikely to be viable in practice.
TL;DR
Type checkers don't run your code, they just read it. A dynamic type annotation is a contradiction in terms.
Theory
As I am sure you know, the term static type checker is not coincidental. A static type checker is not executing the code you write. It just parses it and infers types according to its own internal logic, by applying certain rules to a graph that it derives from your code.
This is important because unlike some other languages, Python is dynamically typed, which as you know means that the type of a "thing" (variable) can completely change at any point. In general, there is theoretically no way of knowing the type of all variables in your code, without actually stepping through the entire algorithm, which is to say running the code.
As a silly but illustrative example, you could decide to put the name of a type into a text file to be read at runtime and then used to annotate some variable in your code. Could you do that with valid Python code and typing? Sure. But I think it is beyond clear, that static type checkers will never know the type of that variable.
Why your proposition won't work
Abstracting away all the dataclass stuff and the possible logic inside your __init__ method, what you are asking boils down to the following.
"I want to define a method (__init__), but the types of its parameters will only be known at runtime."
Why am I claiming that? I mean, you do annotate the types of the class' attributes, right? So there you have the types!
Sure, but these have -- in general -- nothing whatsoever to do with the arguments you could pass to the __init__ method, as you yourself point out. You want the __init__ method to accept arbitrary keyword-arguments. Yet you also want a static type checker to infer which types are allowed/expected there.
To connect the two (attribute types and method parameter types), you could of course write some kind of logic. You could even implement it in a way that enforces adherence to those types. That logic could read the type annotations of the class attributes, match up the **kwargs and raise TypeError if one of them doesn't match up. This is entirely possible and you almost implemented that already in your example code. But this only works at runtime!
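For illustration, a minimal runtime-only sketch of that logic (the ValidatedInit mixin is hypothetical):

class ValidatedInit:
    # checks **kwargs against the subclass' field annotations - at runtime only
    def __init__(self, **kwargs):
        annotations = self.__annotations__
        for name, value in kwargs.items():
            if name not in annotations:
                raise TypeError(f'unexpected keyword argument {name!r}')
            if not isinstance(value, annotations[name]):
                raise TypeError(f'{name!r} should be {annotations[name].__name__}')
            setattr(self, name, value)

class MyClass(ValidatedInit):
    foo: str
    apple: int

MyClass(foo='bar', apple=1)  # passes
MyClass(foo='bar', pear=2)   # TypeError - but only at runtime, invisible to a static checker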
Again, a static type checker has no way to infer that, especially since your desired class is supposed to just be a base class and any descendant can introduce its own attributes/types at any point.
But dataclasses work, don't they?
You could argue that this dynamic way of annotating the __init__ method works with dataclasses. So why are they so different? Why are their parameter types correctly inferred, while your proposed code's can't be?
The answer is, they aren't.
Even dataclasses have no magical way of telling a static type checker which parameter types the __init__ method expects, even though they do annotate those parameters when they dynamically construct the method in _init_fn.
The only reason mypy correctly infers those types, is because they implemented a separate plugin just for dataclasses. Meaning it works because they read through PEP 557 and hand-crafted a plugin for mypy that specifically facilitates type inference based on the rules described there.
You can see the magic happening in the DataclassTransformer.transform method. You cannot generalize this behavior to arbitrary code, which is why they had to write a whole plugin just for this.
I am not familiar enough with how PyCharm does its type checking, but I strongly suspect they used something similar.
So you could argue that dataclasses are "cheating" with regards to static type checking. Though I am certainly not complaining.
Pragmatic solution
Even something as "high-profile" as Pydantic, which I personally love and use extensively, requires its own mypy plugin to realize the __init__ type inference properly (see here). For PyCharm they have their own separate Pydantic plugin, without which the internal type checker cannot provide those nice auto-suggestions for initialization etc.
That approach would be your best bet, if you really want to take this further. Just be aware that this will be (in the best sense of the word) a hack to allow specific type checkers to catch "errors" that they otherwise would have no way of catching.
The reason I argue that it is unlikely to be viable is because it will essentially blow up the amount of work for your project to also cover the specific hacks for those type checkers that you want to satisfy. If you are committed enough and have the resources, go for it.
Conclusion
I am not trying to discourage you. But it is important to know the limitations enforced by the environment. It's either dynamic types and hacky imperfect type checking (still love mypy), or static types and no "kwargs can be anything" behavior.
Hope this makes sense. Please let me know, if I made any errors. This is just based on my understanding of typing in Python.
Regarding this point from the question:
It would not work to let dataclasses auto-generate an __init__ and instead implement a __post_init__(). This would not work because I need to be able to construct the class without arguments, like MyClass(), as the field values will be set from another input source (think a local file or elsewhere); this means that all fields would be required, so annotating them as Optional would be fallacious in this case. I still need to support the user entering optional keyword arguments, but these **kwargs will always match up with the dataclass field names, and so I desire some way for auto-completion to work with my IDE (PyCharm).
dataclasses.field + default_factory can be a solution.
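A minimal sketch of that approach (it assumes each field type has a zero-argument constructor):

from dataclasses import dataclass, field

@dataclass
class MyClass:
    foo: str = field(default_factory=str)
    apple: int = field(default_factory=int)

MyClass()                    # MyClass(foo='', apple=0)
MyClass(foo='bar', apple=7)  # field names still auto-complete in the IDE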
But, it seems that dataclass field declarations are implemented in user code:
I can't update each field to foo: Optional[str] = None because that part would be implemented in user code, which I would not have any control over. Also, annotating it in this way doesn't make sense when you know a custom __init__() method will be generated for you - meaning not by dataclasses. Lastly, setting a default for each field just so that the class can be instantiated without arguments, like MyClass(), doesn't seem like the best idea to me.
If your IDE supports ParamSpec, there is a workaround: it is not correct (it cannot pass a static type checker), but it has auto-completion:
from typing import Callable, Iterable, TypeVar, ParamSpec
from dataclasses import dataclass

T = TypeVar('T')
P = ParamSpec('P')

# user-defined dataclass
@dataclass
class MyClass:
    foo: str
    apple: int

def wrap(factory: Callable[P, T], annotations: Iterable[tuple[str, type]]) -> Callable[P, T]:
    def default_factory(**kwargs):
        for name, type_ in annotations:
            kwargs.setdefault(name, type_())
        return factory(**kwargs)
    return default_factory

WrappedMyClass = wrap(MyClass, MyClass.__annotations__.items())
WrappedMyClass()  # Okay

Why can’t fields with default values come first?

I got the following error using dataclasses. Does anyone know why this isn't valid?
from dataclasses import dataclass

@dataclass(frozen=False)
class Contact:
    contact_id: int = 0
    contact_firstname: str
    contact_lastname: str
    contact_email: str = None
Error: Fields without default values cannot appear after fields with default values
Fields in a dataclass are translated, in the same order, to arguments in the constructor function. So, if it were allowed, then this
@dataclass(frozen=False)
class Contact:
    contact_id: int = 0
    contact_firstname: str
    contact_lastname: str
    contact_email: str = None
would get translated to (omitting the __eq__ and all the other dataclass convenience functions)
class Contact:
    def __init__(self, contact_id=0, contact_firstname, contact_lastname, contact_email=None):
        self.contact_id = contact_id
        self.contact_firstname = contact_firstname
        self.contact_lastname = contact_lastname
        self.contact_email = contact_email
And, by the usual rules of Python functions, default arguments have to come at the end, since (positionally) there's no way to supply later arguments without supplying earlier ones. Now, in a language like Python, you could theoretically use named arguments to make the above syntax useful, but the Python developers decided to keep things simple (simple is better than complex, after all) and follow the C++ convention of requiring defaults at the end.
Likewise, they could have reordered the dataclass fields in the constructor so that the default ones end up at the end, but again, they decided to keep it as simple and predictable as possible. And, personally, I think they made the right call. Ruby, for instance, allows default arguments in the middle of a function argument list (not just at the end), and every Ruby style guide I've seen says to avoid that feature like the plague.
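So the practical fix for the code in the question is simply to put the fields without defaults first (and, while at it, annotate contact_email as Optional[str], since its default is None):

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=False)
class Contact:
    contact_firstname: str
    contact_lastname: str
    contact_id: int = 0
    contact_email: Optional[str] = None

Contact('Jane', 'Doe')  # Contact(contact_firstname='Jane', contact_lastname='Doe', contact_id=0, contact_email=None)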

How to denote that a class belongs to a particular group: base class or trait attribute?

In statically typed languages, to solve the problem below I would use an interface (or abstract class, etc.). But I wonder if in Python there is a more "pythonic" way of doing it.
So, consider the situation:
class MyClass(object):
    # ...
    def my_function(self, value_or_value_provider):
        if is_value_provider(value_or_value_provider):
            self._value_provider = value_or_value_provider
        else:
            self._value_provider = StandardValueProvider(value_or_value_provider)
            # `StandardValueProvider` just always returns the same value.
Above "value provider" is custom class, which has get_value() method. Of course, the idea is that it can be implemented by the user.
Now, the question is: what is the best way to implement is_value_provider()? I.e., what is the best way to distinguish between "single value" and "value provider"?
The first idea that came to my mind is to use inheritance: introduce a base class BaseValueProvider with an empty implementation, and say in the documentation that custom "value providers" must inherit from it. Then in the is_value_provider() function just check isinstance(objectToCheck, BaseValueProvider). What I don't like about this solution is that inheritance seems somehow redundant in this case (specifically in Python), because we cannot even force someone who derives from it to implement the get_value() method. Besides, for someone who wants to implement a custom "value provider", this solution implies a dependency on the module which exposes BaseValueProvider.
The other solution would be to use a "trait attribute". I.e., instead of checking the base class, check for the existence of a particular attribute with the hasattr() function. We could check for the existence of the get_value() method itself. Or, if we are afraid that the name of the method is too common, we could check for a dedicated trait attribute, like is_my_library_value_provider. Then in the documentation say that any custom "value provider" must have not only a get_value() method, but also is_my_library_value_provider. This second solution seems better, as it does not abuse inheritance and allows implementing custom "value providers" without depending on some additional library which provides the base class.
Could someone comment on which solution is preferable (or if there are other better ones), and why?
EDIT: Changed the example slightly to reflect the fact that the value provider is going to be stored and used later (probably multiple times).
I highly suggest using hasattr().
Your code will be highly readable, and via duck typing you can later make it work with other types you have in mind.
Regarding getting confused with other objects that happen to have a get_value() function: Python idioms assume the coder is a responsible person who won't try to destroy the system they are writing code for, therefore a single hasattr(obj, "get_value") is enough. If the object has .get_value(), it can be assumed to be a value provider and not a value (otherwise, the value is the value itself; a .get_value() on a value that returns self would be rather useless).
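A minimal sketch of that check, reusing the names from the question (this StandardValueProvider is a made-up stand-in):

def is_value_provider(obj):
    # duck typing: anything with a callable get_value() counts as a provider
    return callable(getattr(obj, 'get_value', None))

class StandardValueProvider(object):
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

is_value_provider(StandardValueProvider(42))  # True
is_value_provider(42)                         # False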

Defining my own None-like Python constant

I have a situation in which I'm asked to read collections of database update instructions from a variety of sources. All sources will contain a primary key value so that the code that applies the updates to the database can find the correct record. The files will vary, however, in what additional columns are reported.
When I read and create my update instructions I must differentiate between an update in which a column (for instance, MiddleName) was provided but was empty (meaning no middle name and the field should be updated to NULL) and an update in which the MiddleName field was not included (meaning the update should not touch the middle name column at all).
The former situation (column provided but no value) seems appropriately represented by the None value. For the second situation, however, I'd like to have a NotInFile "value" that I can use similar to the way I use None.
Is the correct way to implement this as follows?
NotInFile = 1

class PersonUpdate(object):
    def __init__(self):
        self.PersonID = None
        self.FirstName = NotInFile
        self.MiddleName = NotInFile
and then in another module
import othermod

upd = othermod.PersonUpdate()
if upd.MiddleName is othermod.NotInFile:
    print 'Hey, middle name was not supplied'
I don't see anything particularly wrong with your implementation. However, 1 isn't necessarily the best sentinel value, as it is a cached constant in CPython (e.g. -1 + 2 is 1 will return True). In these cases, I might consider using a sentinel object instance:
NotInFile = object()
Python also provides a few other named constants which you could use if it seems appropriate: NotImplemented and Ellipsis come to mind immediately. (Note that I'm not recommending you use these constants ... I'm just providing more options.)
No, using the integer 1 is a bad idea. It might work out in this case if MiddleName is always a string or None, but in general the implementation is free to intern integers, strings, tuples and other immutable values as it pleases. CPython does it for small integers and constants of the aforementioned types, and PyPy defines is by value for integers and a few other types. So if MiddleName ever holds the value 1, your code will consider it not supplied.
Use an object instead; each new object has a distinct identity:
NotInFile = object()
Alternatively, for better debugging output, define your own class:
class NotInFileType(object):
    # __slots__ = ()  # if you want to save a few bytes
    def __repr__(self):
        return 'NotInFile'

NotInFile = NotInFileType()
del NotInFileType  # look ma, no singleton
If you're paranoid, you could make it a proper singleton (ugly). If you need several such instances, you could rename the class to Sentinel or something, make the representation an instance variable, and use multiple instances.
If you want type-checking, this idiom is now blessed by PEP 484 and supported by mypy:
from enum import Enum

class NotInFileType(Enum):
    _token = 0

NotInFile = NotInFileType._token
If you are using mypy 0.740 or earlier, you need to work around this bug in mypy by using typing.Final:
from typing import Final
NotInFile: Final = NotInFileType._token
If you are using Python 3.7 or earlier, you can use typing_extensions.Final from the pip package typing_extensions instead of typing.Final.
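For a sense of how the typed sentinel reads in practice, here is a hypothetical signature using the question's names (since the enum has a single member, type checkers can narrow the type after the is check):

from typing import Union

def update_middle_name(name: Union[str, None, NotInFileType]) -> None:
    if name is NotInFile:
        return  # column absent from the file: leave the database untouched
    ...         # str -> set the value, None -> set NULL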

Which is a better __repr__ for a custom Python class?

It seems there are several different styles of output that the __repr__ function can return.
I have a class InfoObj that stores a number of things, some of which I don't particularly want users of the class to set by themselves. I recognize nothing is protected in Python and they could just dive in and set it anyway, but it seems defining it in __init__ makes it more likely someone might see it and assume it's fine to just pass it in.
(Example: Booleans that get set by a validation function when it determines that the object has been fully populated, and values that get calculated from other values when enough information is stored to do so... e.g. A = B + C, so once A and B are set then C is calculated and the object is marked Valid=True.)
So, given all that, which is the best way to design the output of __repr__?
bob = InfoObj(Name="Bob")
# Populate bob.
# Output type A:
bob.__repr__()
'<InfoObj object at 0x1b91ca42>'
# Output type B:
bob.__repr__()
'InfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
# Output type C:
bob.__repr__()
'InfoObj.NewInfoObj(Name="Bob",Pants=True,A=7,B=5,C=2,Valid=True)'
... the point of type C would be to not happily take all the stuff I'd set 'private' in C++ as arguments to the constructor, and make teammates using the class set it up using the interface functions even if it's more work for them. In that case I would define a constructor that does not take certain things in, and a separate function that's slightly harder to notice, for the purposes of __repr__
If it makes any difference, I am planning to store these python objects in a database using their __repr__ output and retrieve them using eval(), at least unless I come up with a better way. The consequence of a teammate creating a full object manually instead of going through the proper interface functions is just that one type of info retrieval might be unstable until someone figures out what he did.
The __repr__ method is designed to produce the most useful output for the developer, not the enduser, so only you can really answer this question. However, I'd typically go with option B. Option A isn't very useful, and option C is needlessly verbose -- you don't know how your module is imported anyway. Others may prefer option C.
However, if you want to store Python objects in a database, use pickle.
>>> import pickle
>>> bob = InfoObj(Name="Bob")
>>> pickle.dumps(bob)
b'...some bytestring representation of Bob...'
>>> pickle.loads(pickle.dumps(bob))
InfoObj(Name="Bob", ...)
If you're using older Python (pre-3.x), then note that cPickle is faster, but pickle is more extensible. Pickle will work on some of your classes without any configuration, but for more complicated objects you might want to write custom picklers.
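For those more complicated cases, the standard hooks are __getstate__ and __setstate__; a minimal sketch (assuming an InfoObj shaped roughly like the one in the question):

import pickle

class InfoObj(object):
    def __init__(self, Name):
        self.Name = Name
        self.Valid = False  # normally set by a validation function

    def __getstate__(self):
        # choose exactly what gets stored
        return {'Name': self.Name, 'Valid': self.Valid}

    def __setstate__(self, state):
        # restore without going through __init__
        self.__dict__.update(state)

bob = pickle.loads(pickle.dumps(InfoObj(Name="Bob")))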
