Create a pydantic model with dynamic keys - python

I want to create a Pydantic model for this structure:
{
"key-1": ["value-1", "value-2"],
"key-2": ["value-3"],
"key-3": []
}
My first attempt was:
class MyModel(BaseModel):
    __root__ = Dict[str, List[str]]

    @root_validator(pre=True)
    def validate_all_the_things(cls, values):
        # check if keys and values match some regexes
But this raises an exception:
RuntimeError: no validator found for <class 'typing._GenericAlias'>, see `arbitrary_types_allowed` in Config
If I change Dict to dict, I don't get the exception, but the resulting object yields an empty dict:
>>> MyModel(**{"key-1": ["value-1"]}).dict()
{}
What am I doing wrong?

You have a typo in the model declaration: use a colon instead of the equals sign. With =, the typing object is assigned as a value rather than declared as the field's type.
from typing import List, Dict
from pydantic import BaseModel

class MyModel(BaseModel):
    __root__: Dict[str, List[str]]
Then you can create a model instance:
>>> my_instance = MyModel.parse_obj({"key-1": ["value-1"]})
>>> my_instance.dict()
{'__root__': {'key-1': ['value-1']}}
You can find more information in the pydantic documentation under custom-root-types.
The dynamic-model-creation section may also be of interest.
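Since the question also asks about checking keys and values against regexes, here is a minimal sketch of how that could be attached to the corrected model. It assumes the pydantic 1.x API used throughout this thread (the import falls back to pydantic.v1 when pydantic 2.x is installed), and KEY_PATTERN and VALUE_PATTERN are hypothetical patterns made up for illustration.

```python
import re
from typing import Dict, List

try:  # pydantic 2.x exposes the 1.x API under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError, validator
except ImportError:  # plain pydantic 1.x
    from pydantic import BaseModel, ValidationError, validator

# Hypothetical patterns; replace them with whatever your data requires.
KEY_PATTERN = re.compile(r"key-\d+")
VALUE_PATTERN = re.compile(r"value-\d+")


class MyModel(BaseModel):
    __root__: Dict[str, List[str]]

    @validator("__root__")
    def keys_and_values_match(cls, mapping):
        # Runs after the Dict[str, List[str]] type check has passed.
        for key, values in mapping.items():
            if not KEY_PATTERN.fullmatch(key):
                raise ValueError(f"invalid key: {key!r}")
            for value in values:
                if not VALUE_PATTERN.fullmatch(value):
                    raise ValueError(f"invalid value: {value!r}")
        return mapping


model = MyModel.parse_obj({"key-1": ["value-1", "value-2"], "key-3": []})
print(model.__root__)

try:
    MyModel.parse_obj({"bad key": ["value-1"]})
except ValidationError as exc:
    print(exc)
```

The validator is attached to the __root__ field itself, so it receives the already parsed dictionary rather than the raw input.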

Related

Is it possible to automatically convert a Union type to only one type automatically with pydantic?

Given the following data model:
class Demo(BaseModel):
id: Union[int, str]
files: Union[str, List[str]]
Is there a way to tell pydantic to always convert id to str and files to List[str] automatically when I access them, instead of doing this manually every time?
Pydantic has built-in validation logic for most common types, including str. The default string validator coerces values of type int, float or Decimal to str. (see str_validator source)
This means that even if you annotate id as str but pass an int value, the model will initialize without a validation error, and id will hold the str version of that value. (e.g. str(42) gives "42")
list also has a built-in default validator, but in this case it may not do what you want. If it encounters a non-list value that it considers sequence-like (a tuple, set, frozenset, deque or generator), it coerces it to a list. (see list_validator source) A str is not treated as sequence-like there, so a plain string passed to a list field is rejected with a validation error rather than converted.
So for list[str] you will likely need your own custom pre=True validator to perform whatever you deem necessary with the str value to turn it into a list[str].
Example:
from pydantic import BaseModel, validator

class Demo(BaseModel):
    id: str
    files: list[str]

    @validator("files", pre=True)
    def str_to_list_of_str(cls, v: object) -> object:
        if isinstance(v, str):
            return v.split(",")
        return v

if __name__ == "__main__":
    obj = Demo.parse_obj({"id": 42, "files": "foo,bar,baz"})
    print(obj)
    print(type(obj.id), type(obj.files))
Output:
id='42' files=['foo', 'bar', 'baz']
<class 'str'> <class 'list'>
As you can see, you don't even need any additional logic for the id field if your values are int, because they end up as str on the model instance.
I figured out how to do it after getting help from the maintainer. The key point is to remove Union from the type definition and use a pre-validation hook (a validator with pre=True) to convert the value before validation. Here is the sample code:
from pydantic import BaseModel, validator
from typing import List

class Demo(BaseModel):
    id: str
    files: List[str]

    @validator('id', pre=True)
    def id_must_be_str(cls, v):
        if isinstance(v, int):
            v = str(v)
        return v

    @validator('files', pre=True)
    def files_must_be_list_of_str(cls, v):
        if isinstance(v, str):
            v = [v]
        return v

obj = Demo.parse_obj({'id': 1, 'files': '/data/1.txt'})
print(type(obj.id))
print(type(obj.files))

Python - Dataclass: load attribute value from a dictionary containing an invalid name

Unfortunately I have to load a dictionary containing an invalid name (which I can't change):
dict = {..., "invalid-name": 0, ...}
I would like to cast this dictionary into a dataclass object, but I can't define an attribute with this name.
from dataclasses import dataclass

@dataclass
class Dict:
    ...
    invalid-name: int # can't do this
    ...
The only solution I could find is to change the dictionary key into a valid one right before casting it into a dataclass object:
dict["valid_name"] = dict.pop("invalid-name")
But I would like to avoid using string literals...
Is there any better solution to this?
One solution would be to use dict-to-dataclass. Its documentation describes two relevant features:
1. Passing dictionary keys
It's probably quite common that your dataclass fields have the same names as the dictionary keys they map to, but in case they don't, you can pass the dictionary key as the first argument (or the dict_key keyword argument) to field_from_dict:
from dataclasses import dataclass
from dict_to_dataclass import DataclassFromDict, field_from_dict

@dataclass
class MyDataclass(DataclassFromDict):
    name_in_dataclass: str = field_from_dict("nameInDictionary")

origin_dict = {
    "nameInDictionary": "field value"
}
dataclass_instance = MyDataclass.from_dict(origin_dict)
>>> dataclass_instance.name_in_dataclass
"field value"
2. Custom converters
If you need to convert a dictionary value that isn't covered by the defaults, you can pass in a converter function using field_from_dict's converter parameter:
def yes_no_to_bool(yes_no: str) -> bool:
    return yes_no == "yes"

@dataclass
class MyDataclass(DataclassFromDict):
    is_yes: bool = field_from_dict(converter=yes_no_to_bool)

dataclass_instance = MyDataclass.from_dict({"is_yes": "yes"})
>>> dataclass_instance.is_yes
True
The following code filters out the keys that do not correspond to a dataclass field:
import dataclasses

@dataclasses.dataclass
class ClassDict:
    valid_name0: str
    valid_name1: int
    ...

dict = {..., "invalid-name": 0, ...}
# dataclasses.fields() returns a tuple of Field objects, each with a .name
field_names = {f.name for f in dataclasses.fields(ClassDict)}
dict = {k: v for k, v in dict.items() if k in field_names}
However, I'm sure there is a better way to do it, since this is a bit hacky.
I would define a from_dict class method anyway, which would be a natural place to make the change.
@dataclass
class MyDict:
    ...
    valid_name: int
    ...

    @classmethod
    def from_dict(cls, d):
        d['valid_name'] = d.pop('invalid-name')
        return cls(**d)

md = MyDict.from_dict({'invalid-name': 3, ...})
Whether you should modify d in place or do something to avoid unnecessary copies is another matter.
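On the in-place question, one possible variant builds a renamed copy so the caller's dictionary is left untouched. This is only a sketch using the standard library; KEY_ALIASES and the extra field other are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping

# Hypothetical mapping from external (invalid) keys to dataclass field names.
KEY_ALIASES: Dict[str, str] = {"invalid-name": "valid_name"}


@dataclass
class MyDict:
    valid_name: int
    other: str = ""  # hypothetical extra field, just to show untouched keys

    @classmethod
    def from_dict(cls, d: Mapping[str, Any]) -> "MyDict":
        # Rename keys while copying, so the input mapping is not modified.
        renamed = {KEY_ALIASES.get(key, key): value for key, value in d.items()}
        return cls(**renamed)


source = {"invalid-name": 3, "other": "x"}
md = MyDict.from_dict(source)
print(md)      # MyDict(valid_name=3, other='x')
print(source)  # {'invalid-name': 3, 'other': 'x'}
```

The string literal still appears once, but only in the alias table, which keeps the rest of the code free of it.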
Another option could be to use the dataclass-wizard library, which is likewise a de/serialization library built on top of dataclasses. It should similarly support custom key mappings, as needed in this case.
I've also timed it with the builtin timeit module, and found it to be (on average) about 5x faster than a solution with dict_to_dataclass. I've added the code I used for comparison below.
from dataclasses import dataclass
from timeit import timeit
from typing_extensions import Annotated # Note: in Python 3.9+, can import this from `typing` instead
from dataclass_wizard import JSONWizard, json_key
from dict_to_dataclass import DataclassFromDict, field_from_dict

@dataclass
class ClassDictWiz(JSONWizard):
    valid_name: Annotated[int, json_key('invalid-name')]

@dataclass
class ClassDict(DataclassFromDict):
    valid_name: int = field_from_dict('invalid-name')

my_dict = {"invalid-name": 0}
n = 100_000
print('dict-to-dataclass: ', round(timeit('ClassDict.from_dict(my_dict)', globals=globals(), number=n), 3))
print('dataclass-wizard: ', round(timeit('ClassDictWiz.from_dict(my_dict)', globals=globals(), number=n), 3))
i1, i2 = ClassDict.from_dict(my_dict), ClassDictWiz.from_dict(my_dict)
# assert we get the same result with both approaches
assert i1.__dict__ == i2.__dict__
Results, on my Mac OS X laptop:
dict-to-dataclass: 0.594
dataclass-wizard: 0.098

Python TypeHint: TypeVar dependency

I have defined a generic function using TypeVar to describe a process that has a common structure.
I want to get a value from data keyed by a Literal string like the following function.
from typing import Literal, TypeVar
data: dict[Literal["cat"] | Literal["dog"], int] = {
"cat": 0,
"dog": 1,
}
Key = TypeVar("Key")
Value = TypeVar("Value")

def get_value(key: Key, data: dict[Key, Value]) -> Value:
    return data[key]  # actually more complex logic

cat = get_value("cat", data)  # Correct! data has 'cat'
bird = get_value("bird", data)  # Wrong! data doesn't have 'bird', but this passes the type check :(
However, the type annotation seems to be interpreted as Callable[[str, dict [str, int]], int].
I want to get the type annotation like Callable[[Literal["cat", "dog"], dict[Literal["cat", "dog"], int]], int].
So I tried to make one TypeVar depend on another like this, but it produces an error:
Key = TypeVar("Key")
Value = TypeVar("Value")
Label = TypeVar("Label", Key)  # TypeVar bound type cannot be generic

def get_value(label: Label, data: dict[Key, Value]) -> Value:
    return data[label]
Question
How can I make one TypeVar depend on another, or is there another approach to get this type annotation?

Make a Union of strings to be used as possible dictionary keys

I have some Python 3.7 code and I am trying to add types to it. One of the types I want to add is actually a Union of several possible strings:
from typing import Union, Optional, Dict
PossibleKey = Union["fruits", "cars", "vegetables"]
PossibleType = Dict[PossibleKey, str]

def some_function(target: Optional[PossibleType] = None):
    if target:
        all_fruits = target["fruits"]
        print(f"I have {all_fruits}")
The problem here is that Pyright complains about PossibleKey. It says:
"fruits is not defined"
I would like to get Pyright/Pylance to work.
I have looked at the enum module (from enum import Enum) suggested in another SO answer, but if I try that I end up with more issues, since I am actually dealing with a Dict[str, Any] and not an Enum.
What is the proper Pythonic way of representing my type?
"fruits" is not a type (hint), but Literal["fruits"] is.
from typing import Union, Literal
PossibleKey = Union[Literal["fruits"], Literal["cars"], Literal["vegetables"]]
or the much shorter version,
PossibleKey = Literal["fruits", "cars", "vegetables"]
Or, as you mentioned, define an Enum populated by the three values.
from enum import Enum

class Key(Enum):
    Fruits = "fruits"
    Cars = "cars"
    Vegetables = "vegetables"

def some_function(target: Optional[PossibleType] = None):
    if target:
        all_fruits = target[Key.Fruits]
        print(f"I have {all_fruits}")
(However, just because target is not None doesn't necessarily mean it actually has "fruits" as a key, only that it doesn't have a key other than Key.Fruits, Key.Cars, or Key.Vegetables.)
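To tie the Literal version back to the original function, here is a rough sketch that also guards against the missing-key case mentioned above. Note that Literal and get_args live in typing from Python 3.8 onward; on 3.7 they would have to come from typing_extensions. ALLOWED_KEYS and check_keys are hypothetical helpers added for illustration, since Literal by itself is only enforced by the type checker and not at runtime.

```python
from typing import Dict, Literal, Optional, get_args

PossibleKey = Literal["fruits", "cars", "vegetables"]
PossibleType = Dict[PossibleKey, str]

# The literal strings, recovered at runtime from the alias itself.
ALLOWED_KEYS = get_args(PossibleKey)


def check_keys(raw: Dict[str, str]) -> None:
    """Raise if raw contains a key outside the Literal (hypothetical helper)."""
    unknown = set(raw) - set(ALLOWED_KEYS)
    if unknown:
        raise KeyError(f"unexpected keys: {sorted(unknown)}")


def some_function(target: Optional[PossibleType] = None) -> Optional[str]:
    # A non-empty dict is not guaranteed to contain "fruits", so test for it.
    if target and "fruits" in target:
        return f"I have {target['fruits']}"
    return None


print(some_function({"fruits": "apples"}))  # I have apples
print(some_function({"cars": "sedans"}))    # None
```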
Pyright error disappears if you define PossibleKey as Enum as below.
This requires only one line change to the original code.
If there is some issue with using Enum, please elaborate on that.
from typing import Union, Optional, Dict
from enum import Enum
PossibleKey = Enum("PossibleKey", ["fruits", "cars", "vegetables"])
PossibleType = Dict[PossibleKey, str]

def some_function(target: Optional[PossibleType] = None):
    if target:
        all_fruits = target["fruits"]
        print(f"I have {all_fruits}")
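One thing to keep in mind with the functional Enum API: the members are named fruits, cars and vegetables, but their values are 1, 2 and 3, and a dictionary typed as Dict[PossibleKey, str] is keyed by the members rather than by plain strings. The following is a small sketch of how a string-keyed dictionary could be converted and then indexed by member; raw and typed are hypothetical names for illustration.

```python
from enum import Enum
from typing import Dict, Optional

PossibleKey = Enum("PossibleKey", ["fruits", "cars", "vegetables"])
PossibleType = Dict[PossibleKey, str]


def some_function(target: Optional[PossibleType] = None) -> Optional[str]:
    # Index with the enum member, not with the string "fruits".
    if target and PossibleKey.fruits in target:
        return f"I have {target[PossibleKey.fruits]}"
    return None


# Hypothetical string-keyed input, e.g. loaded from JSON.
raw = {"fruits": "apples", "cars": "sedans"}

# PossibleKey[name] looks a member up by its name.
typed = {PossibleKey[name]: value for name, value in raw.items()}

print(PossibleKey.fruits.value)  # 1
print(some_function(typed))      # I have apples
```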

How can I instantiate a new dataclass instance in Python without supplying parameters?

I want to create a data class instance and supply values later.
How can I do this?
def create_trade_data():
    trades = []
    td = TradeData()
    td.Symbol = 'New'
    trades.append(td)
    return trades
Dataclass:
from dataclasses import dataclass

@dataclass
class TradeData:
    Symbol: str
    ExecPrice: float
You have to make the attributes optional by giving them a default value of None:
from dataclasses import dataclass
from typing import Optional

@dataclass
class TradeData:
    Symbol: Optional[str] = None
    ExecPrice: Optional[float] = None
Then your create_trade_data function would return
[TradeData(Symbol='New', ExecPrice=None)]
Now, I chose None as the default value to indicate a lack of content. Of course, you could choose more sensible defaults like in the other answer.
from dataclasses import dataclass

@dataclass
class TradeData:
    Symbol: str = ''
    ExecPrice: float = 0.0
You can assign default values with the = operator.
For mutable defaults such as a list, use the field function with default_factory instead.
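As a sketch of how the defaults and field fit together, the version below adds a list-valued attribute. Tags is a hypothetical field that is not part of the original TradeData; it is only there to show default_factory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TradeData:
    Symbol: Optional[str] = None
    ExecPrice: Optional[float] = None
    # Hypothetical field: each instance gets its own fresh list.
    Tags: List[str] = field(default_factory=list)


def create_trade_data() -> List[TradeData]:
    td = TradeData()  # no arguments needed, every field has a default
    td.Symbol = "New"
    return [td]


trades = create_trade_data()
print(trades)  # [TradeData(Symbol='New', ExecPrice=None, Tags=[])]
```

Writing Tags: List[str] = [] directly would raise a ValueError at class definition time, which is why default_factory is used here.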
