Make a Union of strings to be used as possible dictionary keys - python

I have some Python 3.7 code and I am trying to add types to it. One of the types I want to add is actually a Union of several possible strings:
from typing import Union, Optional, Dict
PossibleKey = Union["fruits", "cars", "vegetables"]
PossibleType = Dict[PossibleKey, str]
def some_function(target: Optional[PossibleType] = None):
    if target:
        all_fruits = target["fruits"]
        print(f"I have {all_fruits}")
The problem here is that Pyright complains about PossibleKey. It says:
"fruits is not defined"
I would like to get Pyright/Pylance to work.
I have looked at the enum module (from enum import Enum), as suggested in another SO answer, but trying that leads to more issues, since I am actually dealing with a Dict[str, Any] and not an Enum.
What is the proper Pythonic way of representing my type?

"fruits" is not a type (hint), but Literal["fruits"] is.
from typing import Union, Literal  # Literal requires Python 3.8+; on 3.7, use: from typing_extensions import Literal
PossibleKey = Union[Literal["fruits"], Literal["cars"], Literal["vegetables"]]
or the much shorter version,
PossibleKey = Literal["fruits", "cars", "vegetables"]
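Putting the shorter Literal alias back into the question's function gives something Pyright should accept. This is a minimal sketch assuming Python 3.8+ (on 3.7, import Literal from typing_extensions instead); returning the value and using .get() are additions for illustration, not part of the original code.

```python
from typing import Dict, Literal, Optional  # Literal is in `typing` from Python 3.8

# Only these three strings are valid keys as far as a type checker is concerned
PossibleKey = Literal["fruits", "cars", "vegetables"]
PossibleType = Dict[PossibleKey, str]


def some_function(target: Optional[PossibleType] = None) -> Optional[str]:
    if target:
        # .get() returns None instead of raising KeyError when "fruits" is absent
        all_fruits = target.get("fruits")
        print(f"I have {all_fruits}")
        return all_fruits
    return None


some_function({"fruits": "apples", "cars": "two"})  # prints: I have apples
# some_function({"boats": "one"})  # a type checker would reject the "boats" key
```

The Literal keys are still plain str objects at runtime, so an existing Dict[str, Any] can be passed through without conversion; only the type checker sees the narrower key type.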
Or, as you mentioned, define an Enum populated by the three values.
from enum import Enum
from typing import Dict, Optional
class Key(Enum):
    Fruits = "fruits"
    Cars = "cars"
    Vegetables = "vegetables"
PossibleType = Dict[Key, str]  # keys are now Key members, not plain strings
def some_function(target: Optional[PossibleType] = None):
    if target:
        all_fruits = target[Key.Fruits]
        print(f"I have {all_fruits}")
(However, just because target is not None (or empty) doesn't mean it actually has "fruits" as a key; the type only guarantees that it has no keys other than Key.Fruits, Key.Cars, or Key.Vegetables.)

Pyright's error disappears if you define PossibleKey as an Enum, as below.
This requires changing only the PossibleKey definition, plus indexing with the enum member PossibleKey.fruits instead of the string "fruits" (with Enum keys, target["fruits"] is rejected by the type checker and would raise a KeyError at runtime).
If there is some issue with using Enum, please elaborate on that.
from typing import Optional, Dict
from enum import Enum
PossibleKey = Enum("PossibleKey", ["fruits", "cars", "vegetables"])
PossibleType = Dict[PossibleKey, str]
def some_function(target: Optional[PossibleType] = None):
    if target:
        all_fruits = target[PossibleKey.fruits]
        print(f"I have {all_fruits}")
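Since the question mentions an existing Dict[str, Any], one way to bridge it to the functional Enum above is to convert the string keys once, at the boundary. This is a sketch; to_enum_keys is a hypothetical helper name, not part of any library. It relies on PossibleKey[name], which looks a member up by name.

```python
from enum import Enum
from typing import Any, Dict

# Functional Enum API as in the answer above; members get the values 1, 2, 3
PossibleKey = Enum("PossibleKey", ["fruits", "cars", "vegetables"])


def to_enum_keys(raw: Dict[str, Any]) -> Dict[PossibleKey, Any]:
    """Hypothetical helper: convert plain string keys into PossibleKey members."""
    # PossibleKey[name] looks a member up by name and raises KeyError for unknown names
    return {PossibleKey[key]: value for key, value in raw.items()}


converted = to_enum_keys({"fruits": "apples", "cars": "two"})
print(converted[PossibleKey.fruits])  # apples
```

An unexpected key such as "boats" fails immediately with a KeyError instead of silently flowing through the rest of the code.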

Related

Create a pydantic model with dynamic keys

I want to create a Pydantic model for this structure:
{
"key-1": ["value-1", "value-2"],
"key-2": ["value-3"],
"key-3": []
}
My first attempt was
class MyModel(BaseModel):
    __root__ = Dict[str, List[str]]
    @root_validator(pre=True)
    def validate_all_the_things(cls, values):
        # check if keys and values match some regexes
        return values
But this raises an exception:
RuntimeError: no validator found for <class 'typing._GenericAlias'>, see `arbitrary_types_allowed` in Config
If I change Dict to dict, I don't get the exception, but the resulting object yields an empty dict:
>>> MyModel(**{"key-1": ["value-1"]}).dict()
{}
What am I doing wrong?
You have a typo in the model declaration: use a colon instead of the equals sign.
from typing import List, Dict
from pydantic import BaseModel
class MyModel(BaseModel):
    __root__: Dict[str, List[str]]
Then you can create a model instance:
>>> my_instance = MyModel.parse_obj({"key-1": ["value-1"]})
>>> my_instance.dict()
{'__root__': {'key-1': ['value-1']}}
You can find more information in the Pydantic documentation on custom root types. The section on dynamic model creation may also be of interest. (Note that __root__ and parse_obj are Pydantic v1 syntax; Pydantic v2 replaced custom root types with RootModel.)

Python type hinting annotation for Dataclass attribute

I have a dataclass and I use it as a constant store.
from dataclasses import dataclass
@dataclass
class MyClass:
    CONSTANT_1 = "first"
    CONSTANT_2 = "second"
I have a function:
def my_func(value: ?):
    print(value)
I want to annotate my function to specify that the value must be one of the attributes of MyClass.
How can I do this? (I am using Python 3.10.)
Hopefully I haven't misunderstood the question; please let me know if I have. I think the best option in this case is Python's Enum type.
Here is a simple example:
from enum import Enum
class MyEnum(Enum):
    CONSTANT_1 = "first"
    CONSTANT_2 = "second"
Then, to answer the second part: in the annotation, the ? becomes MyEnum. This means any member of this enum, but not the enum class itself.
def my_func(value: MyEnum):
    print(value, value.name, value.value)
Putting it all together, it becomes like:
from enum import Enum
class MyEnum(Enum):
    CONSTANT_1 = "first"
    CONSTANT_2 = "second"
def my_func(value: MyEnum):
    # technically you can remove this check
    if not isinstance(value, MyEnum):
        return
    print(value, value.name, value.value)
# note below: only a type checker or IDE complains; the code still runs
# (the isinstance check above makes the invalid calls return silently)
my_func('hello')  # not OK!
my_func('second')  # not OK!
my_func(MyEnum)  # not OK!
my_func(MyEnum.CONSTANT_1)  # OK
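If the raw values arrive as strings (for example "first" from user input), they can be turned into members at the boundary by calling the enum class with the value. This is a sketch; here my_func returns a string instead of printing so the result is easy to check.

```python
from enum import Enum


class MyEnum(Enum):
    CONSTANT_1 = "first"
    CONSTANT_2 = "second"


def my_func(value: MyEnum) -> str:
    return f"{value.name}={value.value}"


# Calling the enum class with a value returns the member holding that value
member = MyEnum("second")
print(member is MyEnum.CONSTANT_2)  # True
print(my_func(member))  # CONSTANT_2=second

# An unknown value raises ValueError, so bad input fails loudly at the boundary
try:
    MyEnum("hello")
except ValueError as exc:
    print(f"rejected: {exc}")
```

This keeps the isinstance check out of my_func: everything past the conversion point is already a valid member.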
I think this may be an XY problem. From your response in the comments, it seems like what you actually want is:
1. A class-like interface to hold a bunch of constant values.
2. A way to constrain the argument to take only those values.
As mentioned in rv.kvetch's answer, the conventional way of doing this is to use enums. I'm not sure what you mean by "wanting to skip .value": the value field of an enum simply gives you whatever is associated with that member, and it isn't important here at all. Here's an example:
import enum
class StrEnum(enum.Enum):
    FIRST = "first"
    SECOND = "second"
class StrEnum2(enum.Enum):
    FIRST = "first"
    SECOND = "second"
print(StrEnum.FIRST.value)  # first
print(StrEnum2.FIRST.value)  # first
print(StrEnum.FIRST.value == StrEnum2.FIRST.value)  # True
print(StrEnum.FIRST == StrEnum2.FIRST)  # False
class IntEnum(enum.Enum):
    FIRST = enum.auto()
    SECOND = enum.auto()
print(IntEnum.FIRST.value)  # 1
print(IntEnum.SECOND.value)  # 2
This example shows two things:
1. You don't really need .value at all if you're just comparing enum members.
2. You don't even need to assign values manually; enum.auto() assigns a unique value to each member.
At the end of the day, an enum already represents a choice among valid options, so the specific values it holds rarely matter.
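If the goal of skipping .value is to use members directly where strings are expected, another option is to mix str into the enum, as the (str, Enum) answer further down also does. Python 3.11 adds enum.StrEnum for this; since the question targets 3.10, this sketch uses the mixin. The Color and paint names are hypothetical.

```python
import enum


class Color(str, enum.Enum):
    # Mixing in str makes every member a real str instance as well as an enum member
    RED = "red"
    GREEN = "green"


def paint(color: Color) -> str:
    # String methods work directly on the member; no .value needed
    return color.upper()


print(Color.RED == "red")  # True
print(isinstance(Color.RED, str))  # True
print(paint(Color.GREEN))  # GREEN
```

The trade-off is that two different str-based enums with equal values now compare equal to each other (and to plain strings), unlike the StrEnum/StrEnum2 comparison above.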
That said, if all you want is to constrain which values an argument can take, without using enums, you can use the Literal type. See this answer for details. For your example, you could do something like:
from typing import Literal, Final
def my_func(value: Literal["first", "second"]):
    print(value)
my_func("first") # ok
my_func("not first") # bad
x = "first"
y: Final = "first"
my_func(x)  # bad: `x` is not Final, so mypy infers it as plain str
my_func(y) # ok
But note that type annotations don't prevent you from calling a function with an invalid value; they are only hints for IDEs and type checkers.

Type hinting a list of specific strings in Python

I have a function as shown below:
from typing import List, Literal, Union
def foo(possible_values: List[Union[Literal['abcd', 'efgh', 'ijkl']]]):
    return {}
Now this is how I want the code to behave whenever the possible_values parameter gets values other than "abcd", "efgh", and "ijkl". For example:
res = foo(possible_values=["abc", "efgh"])
It should throw an error, as "abc" is not one of the values defined in the function signature.
However,
res = foo(possible_values=["abcd", "efgh"])
should work fine, as they are a subset of what is defined.
Currently, with the above code, it just accepts any arbitrary list of strings.
If you want to constrain values to a predefined set, you might want to use an Enum. As others have mentioned, type hints are not enforced at runtime in Python, so you have to either implement the check in your function's code or use a library that enforces annotations. Here is an example.
from typing import List
from enum import Enum
# Let's define your possible values as an enumeration. Note that it also inherits from
# str, which lets its members be used in comparisons as if they were strings
class PossibleValues(str, Enum):
    abcd = 'abcd'
    efgh = 'efgh'
    ijkl = 'ijkl'
Now your function. Note the type-hinting.
def foo(possible_values: List[PossibleValues]):
    # We unroll the enum as a set, and check that possible_values is a subset of it
    if not set(PossibleValues).issuperset(possible_values):
        raise ValueError(f'Only {[v.value for v in PossibleValues]} are allowed.')
    # Do whatever you need to do
    return {}
Now when you use it:
foo(['abcd', 'efgh'])
# output: {}
foo(['abc', 'efgh'])
# ValueError: Only ['abcd', 'efgh', 'ijkl'] are allowed.
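If you would rather keep the Literal annotation from the question instead of switching to an Enum, the allowed strings can be recovered at runtime with typing.get_args and checked by hand. A minimal sketch, assuming Python 3.8+ (where Literal and get_args live in typing); the names PossibleValue and ALLOWED are illustrative.

```python
from typing import List, Literal, get_args  # Literal and get_args: Python 3.8+

PossibleValue = Literal["abcd", "efgh", "ijkl"]
# get_args unpacks the Literal into a tuple of its allowed strings
ALLOWED = frozenset(get_args(PossibleValue))


def foo(possible_values: List[PossibleValue]) -> dict:
    # The annotation only informs type checkers; this check enforces it at runtime
    invalid = [v for v in possible_values if v not in ALLOWED]
    if invalid:
        raise ValueError(f"Not allowed: {invalid}; expected a subset of {sorted(ALLOWED)}")
    return {}


print(foo(["abcd", "efgh"]))  # {}
```

The set of valid strings is written only once, in the Literal, so the annotation and the runtime check cannot drift apart.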

Python - Dataclass: load attribute value from a dictionary containing an invalid name

Unfortunately I have to load a dictionary containing a key that isn't a valid Python identifier (which I can't change):
dict = {..., "invalid-name": 0, ...}
I would like to cast this dictionary into a dataclass object, but I can't define an attribute with this name.
from dataclasses import dataclass
@dataclass
class Dict:
    ...
    invalid-name: int  # can't do this
    ...
The only solution I could find is to change the dictionary key into a valid one right before casting it into a dataclass object:
dict["valid_name"] = dict.pop("invalid-name")
But I would like to avoid using string literals...
Is there any better solution to this?
One solution would be to use dict-to-dataclass. As its documentation mentions, it has two options:
1. Passing dictionary keys
It's probably quite common that your dataclass fields have the same names as the dictionary keys they map to but in case they don't, you can pass the dictionary key as the first argument (or the dict_key keyword argument) to field_from_dict:
from dataclasses import dataclass
from dict_to_dataclass import DataclassFromDict, field_from_dict
@dataclass
class MyDataclass(DataclassFromDict):
    name_in_dataclass: str = field_from_dict("nameInDictionary")
origin_dict = {
    "nameInDictionary": "field value"
}
dataclass_instance = MyDataclass.from_dict(origin_dict)
>>> dataclass_instance.name_in_dataclass
'field value'
2. Custom converters
If you need to convert a dictionary value that isn't covered by the defaults, you can pass in a converter function using field_from_dict's converter parameter:
def yes_no_to_bool(yes_no: str) -> bool:
    return yes_no == "yes"
@dataclass
class MyDataclass(DataclassFromDict):
    is_yes: bool = field_from_dict(converter=yes_no_to_bool)
dataclass_instance = MyDataclass.from_dict({"is_yes": "yes"})
>>> dataclass_instance.is_yes
True
The following code filters out the keys that don't correspond to a dataclass field:
import dataclasses
@dataclasses.dataclass
class ClassDict:
    valid_name0: str
    valid_name1: int
    ...
dict = {..., "invalid-name": 0, ...}
field_names = {f.name for f in dataclasses.fields(ClassDict)}
dict = {k: v for k, v in dict.items() if k in field_names}
However, I'm sure there should be a better way to do it since this is a bit hacky.
I would define a from_dict class method anyway, which would be a natural place to make the change.
@dataclass
class MyDict:
    ...
    valid_name: int
    ...
    @classmethod
    def from_dict(cls, d):
        d['valid_name'] = d.pop('invalid-name')
        return cls(**d)
md = MyDict.from_dict({'invalid-name': 3, ...})
Whether you should modify d in place or do something to avoid unnecessary copies is another matter.
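A standard-library variant of the from_dict idea is to record the source key next to the field using dataclasses.field(metadata=...), so the odd key name appears once in the class definition and from_dict stays generic. This is a sketch; the "key" metadata name is our own convention (dataclasses stores metadata but never interprets it), and the other field is hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict


@dataclass
class MyDict:
    # The source key is recorded once, next to the field, via field metadata.
    # The "key" metadata name is our own convention, not a dataclasses feature.
    valid_name: int = field(metadata={"key": "invalid-name"})
    other: str = "default"

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "MyDict":
        kwargs = {}
        for f in fields(cls):
            # Fall back to the field's own name when no source key is recorded
            source_key = f.metadata.get("key", f.name)
            if source_key in d:
                kwargs[f.name] = d[source_key]
        # Keys with no matching field are ignored; missing required fields raise TypeError
        return cls(**kwargs)


md = MyDict.from_dict({"invalid-name": 3, "unrelated": True})
print(md)  # MyDict(valid_name=3, other='default')
```

Because it builds a new kwargs dict, this version also leaves the input dictionary unmodified.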
Another option could be to use the dataclass-wizard library, which is likewise a de/serialization library built on top of dataclasses. It should similarly support custom key mappings, as needed in this case.
I've also timed it with the builtin timeit module, and found it to be (on average) about 5x faster than a solution with dict_to_dataclass. I've added the code I used for comparison below.
from dataclasses import dataclass
from timeit import timeit
from typing_extensions import Annotated # Note: in Python 3.9+, can import this from `typing` instead
from dataclass_wizard import JSONWizard, json_key
from dict_to_dataclass import DataclassFromDict, field_from_dict
@dataclass
class ClassDictWiz(JSONWizard):
    valid_name: Annotated[int, json_key('invalid-name')]
@dataclass
class ClassDict(DataclassFromDict):
    valid_name: int = field_from_dict('invalid-name')
my_dict = {"invalid-name": 0}
n = 100_000
print('dict-to-dataclass: ', round(timeit('ClassDict.from_dict(my_dict)', globals=globals(), number=n), 3))
print('dataclass-wizard: ', round(timeit('ClassDictWiz.from_dict(my_dict)', globals=globals(), number=n), 3))
i1, i2 = ClassDict.from_dict(my_dict), ClassDictWiz.from_dict(my_dict)
# assert we get the same result with both approaches
assert i1.__dict__ == i2.__dict__
Results, on my Mac OS X laptop:
dict-to-dataclass: 0.594
dataclass-wizard: 0.098

Python building complex mypy types

In a perfect world, I could just do this:
ScoreBaseType = Union[bool, int, float]
ScoreComplexType = Union[ScoreBaseType, Dict[str, ScoreBaseType]]
But, that says a ScoreComplexType is either a ScoreBaseType or a dictionary which allows multiple types of values... not what I want.
The following looks like it should work to me, but it doesn't:
ScoreBaseTypeList = [bool, int, float]
ScoreBaseType = Union[*ScoreBaseTypeList] # pycharm says "can't use starred expression here"
ScoreDictType = reduce(lambda lhs,rhs: Union[lhs, rhs], map(lambda x: Dict[str, x], ScoreBaseTypeList))
ScoreComplexType = Union[ScoreBaseType, ScoreDictType]
Is there any way I can do something like the above without having to go through this tedium?
ScoreComplexType = Union[bool, int, float,
                     Dict[str, bool],
                     Dict[str, int],
                     Dict[str, float]]
Edit: More fleshed out desired usage example:
# these strings are completely arbitrary and determined at runtime. Used as keys in nested dictionaries.
CatalogStr = NewType('CatalogStr', str)
DatasetStr = NewType('DatasetStr', str)
ScoreTypeStr = NewType('ScoreTypeStr', str)
ScoreBaseType = Union[bool, int, float]
ScoreDictType = Dict[ScoreTypeStr, 'ScoreBaseTypeVar']
ScoreComplexType = Union['ScoreBaseTypeVar', ScoreDictType]
ScoreBaseTypeVar = TypeVar('ScoreBaseTypeVar', bound=ScoreBaseType)
ScoreComplexTypeVar = TypeVar('ScoreComplexTypeVar', bound=ScoreComplexType) # errors: "constraints cannot be parameterized by type variables"
class EvalBase(ABC, Generic[ScoreComplexTypeVar]):
    def __init__(self) -> None:
        self.scores: Dict[CatalogStr,
                          Dict[DatasetStr,
                               ScoreComplexTypeVar]
                          ] = {}
class EvalExample(EvalBase[Dict[float]]):  # can't do this either
    ...
Edit 2:
It occurs to me that I could simplify a LOT of my type hinting if I used tuples instead of nested dictionaries. This seems to maybe work? I've only tried it in the below toy example and haven't yet tried adapting all my code.
# These are used to make typing hints easier to understand
CatalogStr = NewType('CatalogStr', str) # A str corresponding to the name of a catalog
DatasetStr = NewType('DatasetStr', str) # A str corresponding to the name of a dataset
ScoreTypeStr = NewType('ScoreTypeStr', str) # A str corresponding to the label for a ScoreType
ScoreBaseType = Union[bool, int, float]
SimpleScoreDictKey = Tuple[CatalogStr, DatasetStr]
ComplexScoreDictKey = Tuple[CatalogStr, DatasetStr, ScoreTypeStr]
ScoreKey = Union[SimpleScoreDictKey, ComplexScoreDictKey]
ScoreKeyTypeVar = TypeVar('ScoreKeyTypeVar', bound=ScoreKey)
ScoreDictType = Dict[ScoreKey, ScoreBaseType]
# These are used for Generics in classes
DatasetTypeVar = TypeVar('DatasetTypeVar', bound='Dataset') # Must match a type inherited from Dataset
ScoreBaseTypeVar = TypeVar('ScoreBaseTypeVar', bound=ScoreBaseType)
class EvalBase(ABC, Generic[ScoreBaseTypeVar, ScoreKeyTypeVar]):
    def __init__(self):
        self.score: ScoreDictType = {}
class EvalExample(EvalBase[float, ComplexScoreDictKey]):
    ...
Although then what would the equivalent of this be? Seems like I might have to store a couple lists of keys in order to iterate?
for catalog_name in self.catalog_list:
    for dataset_name in self.scores[catalog_name]:
        for score in self.scores[catalog_name][dataset_name]:
            ...
You may need to use TypeVars to express this, but without an example of how you intend to use it, it's hard to say.
An example of how this would be used for typing a return value dependent on input:
from typing import Dict, Iterable, TypeVar, Union
ScoreBaseType = Union[bool, int, float]
ScoreTypeVar = TypeVar('ScoreTypeVar', bound=ScoreBaseType)
ScoreDictType = Union[ScoreTypeVar, Dict[str, ScoreTypeVar]]
def scoring_func(scores: Iterable[ScoreTypeVar]) -> ScoreDictType[ScoreTypeVar]:
    ...
If you're not doing this based on input values, though, you probably want:
ScoreBaseType = Union[bool, int, float]
ScoreDictTypes = Union[Dict[str, bool], Dict[str, int], Dict[str, float]]
ScoreComplexType = Union[ScoreBaseType, ScoreDictTypes]
Depending on how you handle the types, you may also be able to use the SupportsInt or SupportsFloat protocols rather than both int and float.
Edit (additional info based on the edited question):
Since you are typing an ABC with this, it may be sufficient to type the base class using Dict[str, Any] and constrain subclasses further.
If it isn't, you are going to have very verbose type definitions, and there isn't much alternative, as mypy currently has some issues resolving some classes of programmatically generated types, even when operating on constants.
At the time of writing, mypy also didn't support recursive type aliases (newer mypy releases do), so for readability you'd need to define the allowed types for each potential level of nesting, and then collect those into a type representing the full nested structure.
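Another way to get "a dict whose values all share one scalar type" without generating unions programmatically is a generic type alias parameterized by a TypeVar, with each subclass picking the concrete type (e.g. EvalBase[float] instead of the EvalBase[Dict[float]] attempted in the question). This is a sketch under our own assumptions: ScoreT, ScoreDict, and add_score are hypothetical names, and the nesting mirrors the question's catalog/dataset/score-type layout.

```python
from abc import ABC
from typing import Dict, Generic, TypeVar, Union

ScoreBaseType = Union[bool, int, float]

# One TypeVar, bound to the scalar union, shared by the alias and the class
ScoreT = TypeVar("ScoreT", bound=ScoreBaseType)

# Generic alias: every value in the dict has the same scalar type ScoreT
ScoreDict = Dict[str, ScoreT]


class EvalBase(ABC, Generic[ScoreT]):
    def __init__(self) -> None:
        # catalog -> dataset -> score type label -> score
        self.scores: Dict[str, Dict[str, ScoreDict[ScoreT]]] = {}

    def add_score(self, catalog: str, dataset: str, label: str, value: ScoreT) -> None:
        self.scores.setdefault(catalog, {}).setdefault(dataset, {})[label] = value


class EvalExample(EvalBase[float]):
    """Hypothetical subclass whose innermost score dicts hold floats."""


ev = EvalExample()
ev.add_score("catalog_a", "dataset_1", "accuracy", 0.93)
```

With this, mypy should flag a call such as ev.add_score("catalog_a", "dataset_1", "label", "high"), since str is not compatible with float; the runtime itself does not check anything.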
