Pydantic Model Structure for Largely Similar Objects?

I wonder if anyone might have a suggestion for a better way to build up a Pydantic model for this case?
The data set I am working with (JSON) is mostly the same structure throughout, but with some differences only down at the lowest levels of the tree, i.e.:
// data.json
{
    "FirstItem": {
        "Name": "first item",
        "Data": {
            "attr_1": "a",
            "attr_2": "b"
        }
    },
    "SecondItem": {
        "Name": "second item",
        "Data": {
            "attr_3": "d",
            "attr_4": "e"
        }
    },
    ...
}
So I am wondering: is there a suggested method for building a Pydantic model that uses a standard 'Item' (in this case, one with 'Name' and 'Data') but swaps out the 'Data' on a case-by-case basis?
I have a working example, but it feels quite verbose.
Working example:
from pydantic import BaseModel

class FirstItemData(BaseModel):
    attr_1: str
    attr_2: str

class FirstItem(BaseModel):
    Name: str
    Data: FirstItemData  # <--- The unique part

class SecondItemData(BaseModel):
    attr_3: str
    attr_4: str

class SecondItem(BaseModel):
    Name: str
    Data: SecondItemData

class Example(BaseModel):
    FirstItem: FirstItem
    SecondItem: SecondItem

o = Example.parse_file("data.json")
The above does work, but it feels like building the Item 'holder' each time (the part with 'Name' and 'Data') is redundant. Is there a way to specify a generic 'container' structure and then swap out the 'Data'? Something like:
class GenericContainer(BaseModel):
    Name: str
    Data: ????

class Example(BaseModel):
    FirstItem: GenericContainer(Data=FirstItemData)
    SecondItem: GenericContainer(Data=SecondItemData)
or something of that sort? In this case I have several dozen of these unique 'Items' (unique only in their 'Data' part), and it doesn't seem right to create two classes for each one. Does it?
I do realize that using the type Dict in place of the detailed 'Data' does work to load in the data, but it comes in as a dict instead of an object, which is not ideal in this case.
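For reference, a minimal sketch of that Dict fallback (the LooseItem name here is made up for illustration): the data loads, but Data has no attribute access and no per-field validation.

from typing import Dict
from pydantic import BaseModel

class LooseItem(BaseModel):
    Name: str
    Data: Dict[str, str]

item = LooseItem(Name="first item", Data={"attr_1": "a", "attr_2": "b"})
print(item.Data["attr_1"])  # dict-style access only; item.Data.attr_1 is not available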
Any thoughts or suggestions are much appreciated. Thanks!

Based on the comment from Hernán Alarcón, I wanted to try the generic-model approach. I believe this should work; perhaps it will be useful to someone.
from typing import Generic, Optional, TypeVar

from pydantic import BaseModel, ValidationError
from pydantic.generics import GenericModel

class FirstItemData(BaseModel):
    attr_1: str
    attr_2: str

class SecondItemData(BaseModel):
    attr_3: str
    attr_4: str

TypeX = TypeVar('TypeX')

class GenericContainer(GenericModel, Generic[TypeX]):
    Name: str
    Data: TypeX

class ItemBag(BaseModel):
    FirstItem: Optional[GenericContainer[FirstItemData]]
    SecondItem: Optional[GenericContainer[SecondItemData]]

# some tests
one_bag = ItemBag(FirstItem={"Name": "My first item", "Data": {"attr_1": "test1", "attr_2": "test2"}})
another_bag = ItemBag(
    FirstItem={"Name": "My first item", "Data": {"attr_1": "test1", "attr_2": "test2"}},
    SecondItem={"Name": "My second item", "Data": {"attr_3": "test3", "attr_4": "test4"}},
)

# failing tests to spot-check validation: both raise a ValidationError
try:
    ItemBag(FirstItem={"Name": "My first item", "Data": {"attr_3": "test1", "attr_42": "test2"}})
except ValidationError as e:
    print(e)
try:
    ItemBag(SecondItem={"Name": "My second item", "Data": {"attr_3": "test3", "attr_42": "test2"}})
except ValidationError as e:
    print(e)

# the parsing way
parsed_bag = ItemBag.parse_obj({
    "FirstItem": {"Name": "My first item", "Data": {"attr_1": "test1", "attr_2": "test2"}},
    "SecondItem": {"Name": "My second item", "Data": {"attr_3": "test3", "attr_4": "test4"}},
})
So it works, but I am not sure I'd choose genericity over readability.
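Worth adding for later readers: the GenericModel import above is Pydantic v1 syntax. In Pydantic v2 the separate GenericModel class is gone; a generic model subclasses BaseModel and Generic[TypeX] directly, and parse_obj becomes model_validate. A minimal sketch of the same pattern under v2:

from typing import Generic, Optional, TypeVar
from pydantic import BaseModel

TypeX = TypeVar('TypeX')

class GenericContainer(BaseModel, Generic[TypeX]):
    Name: str
    Data: TypeX

class FirstItemData(BaseModel):
    attr_1: str
    attr_2: str

class ItemBag(BaseModel):
    FirstItem: Optional[GenericContainer[FirstItemData]] = None

bag = ItemBag.model_validate(
    {"FirstItem": {"Name": "My first item", "Data": {"attr_1": "a", "attr_2": "b"}}}
)
print(bag.FirstItem.Data.attr_1)  # "a"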

Related

How to create a dataclass with optional fields that outputs a field in JSON only if the field is not None

I am unclear about how to use a @dataclass to convert a mongo doc into a Python dataclass. My NoSQL documents may or may not contain some of the fields. I only want to output a field (using asdict) from the dataclass if that field was present in the mongo document.
Is there a way to create a field that will be output with dataclasses.asdict only if it exists in the mongo doc?
I have tried using __post_init__ but have not figured out a solution.
import json
from dataclasses import InitVar, asdict, dataclass

# in this example I want to output the 'author' field ONLY if it is present in the mongo document
@dataclass
class StoryTitle:
    _id: str
    title: str
    author: InitVar[str] = None
    dateOfPub: int = None

    def __post_init__(self, author):
        print(f'__post_init__ got called....with {author}')
        if author is not None:
            self.newauthor = author
            print(f'self.author is now {self.newauthor}')

# foo and bar approximate documents in mongodb
foo = dict(_id='b23435xx3e4qq', title='goldielocks and the big bears', author='mary', dateOfPub=220415)
newFoo = StoryTitle(**foo)
json_foo = json.dumps(asdict(newFoo))
print(json_foo)

bar = dict(_id='b23435xx3e4qq', title='War and Peace', dateOfPub=220415)
newBar = StoryTitle(**bar)
json_bar = json.dumps(asdict(newBar))
print(json_bar)
My output json does not (of course) have the 'author' field. Anyone know how to accomplish this? I suppose I could just create my own asdict method ...
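A quick note on the "create my own asdict" idea: the stdlib asdict accepts a dict_factory hook, so dropping None-valued fields needs no third-party library. A minimal sketch, with the caveat that it cannot tell an absent field from an explicit null, which the answer below addresses:

import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class StoryTitle:
    _id: str
    title: str
    author: Optional[str] = None
    dateOfPub: Optional[int] = None

def asdict_skip_none(obj):
    # dict_factory receives the (name, value) pairs; filter out the None ones
    return asdict(obj, dict_factory=lambda items: {k: v for k, v in items if v is not None})

bar = StoryTitle(_id='b23435xx3e4qq', title='War and Peace', dateOfPub=220415)
print(json.dumps(asdict_skip_none(bar)))
# {"_id": "b23435xx3e4qq", "title": "War and Peace", "dateOfPub": 220415}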
The dataclasses.asdict helper function doesn't offer a way to exclude fields with default or un-initialized values, unfortunately -- however, the dataclass-wizard library does.
The dataclass-wizard is a (de)serialization library I've created, which is built on top of the dataclasses module. Its only dependency outside the stdlib is the typing-extensions module, for compatibility with earlier Python versions.
To skip dataclass fields with default or un-initialized values in serialization, for example with asdict, the dataclass-wizard provides the skip_defaults option. However, there is also a minor issue I noted with your code above: if we set the default for the author field to None, we won't be able to distinguish between an explicit null and the case where the author field is simply absent when de-serializing the JSON data.
So in the example below, I've created a CustomNull object similar to the None singleton in Python. The name and implementation don't matter much; in our case we use it as a sentinel object to determine whether a value for author was passed in. If it is not present in the input data when from_dict is called, then we simply exclude it when serializing the data with to_dict or asdict, as shown below.
from __future__ import annotations  # can be removed in Python 3.10+

from dataclasses import dataclass

from dataclass_wizard import JSONWizard

# create our own custom `NoneType` class
class CustomNullType:
    # these methods are not really needed, but useful to have.
    def __repr__(self):
        return '<null>'

    def __bool__(self):
        return False

# this is analogous to the builtin `None = NoneType()`
CustomNull = CustomNullType()

# in this example I want to output the 'author' field ONLY if it is present in the mongo document
@dataclass
class StoryTitle(JSONWizard):
    class _(JSONWizard.Meta):
        # skip default values for dataclass fields when `to_dict` is called
        skip_defaults = True

    _id: str
    title: str
    # note: we could also define it like
    #   author: str | None = None
    # however, using that approach we won't know if the value is
    # populated as a `null` when de-serializing the json data.
    author: str | None = CustomNull
    # by default, the `dataclass-wizard` library uses regex to case transform
    # json fields to snake case, and caches the field name for next time.
    #   dateOfPub: int = None
    date_of_pub: int = None

# foo and bar approximate documents in mongodb
foo = dict(_id='b23435xx3e4qq', title='goldielocks and the big bears', author='mary', dateOfPub=220415)
new_foo = StoryTitle.from_dict(foo)
json_foo = new_foo.to_json()
print(json_foo)

bar = dict(_id='b23435xx3e4qq', title='War and Peace', dateOfPub=220415)
new_bar = StoryTitle.from_dict(bar)
json_bar = new_bar.to_json()
print(json_bar)

# lastly, we try de-serializing with `author=null`. the `author` field should still
# be populated when serializing the instance, as it was present in the input data.
bar = dict(_id='b23435xx3e4qq', title='War and Peace', dateOfPub=220415, author=None)
new_bar = StoryTitle.from_dict(bar)
json_bar = new_bar.to_json()
print(json_bar)
Output:
{"_id": "b23435xx3e4qq", "title": "goldielocks and the big bears", "author": "mary", "dateOfPub": 220415}
{"_id": "b23435xx3e4qq", "title": "War and Peace", "dateOfPub": 220415}
{"_id": "b23435xx3e4qq", "title": "War and Peace", "author": null, "dateOfPub": 220415}
Note: the dataclass-wizard can be installed with pip:
$ pip install dataclass-wizard

Pydantic validations for extra fields that are not defined in the schema

I am using pydantic for schema validation, and I would like to raise an error when any extra field that isn't defined is added to a schema.
from typing import Literal, Union
from pydantic import BaseModel, Field, ValidationError

class Cat(BaseModel):
    pet_type: Literal['cat']
    meows: int

class Dog(BaseModel):
    pet_type: Literal['dog']
    barks: float

class Lizard(BaseModel):
    pet_type: Literal['reptile', 'lizard']
    scales: bool

class Model(BaseModel):
    pet: Union[Cat, Dog, Lizard] = Field(..., discriminator='pet_type')
    n: int

print(Model(pet={'pet_type': 'dog', 'barks': 3.14, 'eats': 'biscuit'}, n=1))

# try:
#     Model(pet={'pet_type': 'dog'}, n=1)
# except ValidationError as e:
#     print(e)
In the above code, I have added the eats field, which is not defined. The pydantic validations are applied, and the extra values I passed are silently removed from the response. I want to throw an error saying something like "eats is not allowed for Dog". Is there any way to achieve that?
And is there any way to provide the input directly instead of nesting it under the pet object, e.g. print(Model({'pet_type': 'dog', 'barks': 3.14, 'eats': 'biscuit', 'n': 1}))? I tried without the discriminator, but then the pet_type-specific validations are missing. Can someone guide me on how to achieve either of these?
You can use the extra field in the Config class to forbid extra attributes during model initialisation (by default, additional attributes will be ignored).
For example:
from pydantic import BaseModel, Extra

class Pet(BaseModel):
    name: str

    class Config:
        extra = Extra.forbid

data = {
    "name": "some name",
    "some_extra_field": "some value",
}

my_pet = Pet.parse_obj(data)  # <- effectively the same as Pet(**data)
will raise a ValidationError:
ValidationError: 1 validation error for Pet
some_extra_field
  extra fields not permitted (type=value_error.extra)
Works as well when the model is "nested", e.g.:
class PetModel(BaseModel):
    my_pet: Pet
    n: int

pet_data = {
    "my_pet": {"name": "Some Name", "invalid_field": "some value"},
    "n": 5,
}

pet_model = PetModel.parse_obj(pet_data)
# Effectively the same as
# pet_model = PetModel(my_pet={"name": "Some Name", "invalid_field": "some value"}, n=5)
will raise:
ValidationError: 1 validation error for PetModel
my_pet -> invalid_field
  extra fields not permitted (type=value_error.extra)
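If you are on Pydantic v2, the inner Config class and the Extra enum are replaced by model_config; a sketch of the equivalent setup:

from pydantic import BaseModel, ConfigDict

class Pet(BaseModel):
    model_config = ConfigDict(extra='forbid')

    name: str

# Pet.model_validate(data) now raises a ValidationError for unknown fields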
Pydantic is made to validate your input against the schema. In your case, you want to remove one of its validation features.
I think you should create a new class that inherits from BaseModel:
from typing import Any
from pydantic import BaseModel

class ModifiedBaseModel(BaseModel):
    def __init__(__pydantic_self__, **data: Any) -> None:
        registered, not_registered = __pydantic_self__.filter_data(data)
        super().__init__(**registered)
        for k, v in not_registered.items():
            __pydantic_self__.__dict__[k] = v

    @classmethod
    def filter_data(cls, data):
        registered_attr = {}
        not_registered_attr = {}
        annots = cls.__annotations__
        for k, v in data.items():
            if k in annots:
                registered_attr[k] = v
            else:
                not_registered_attr[k] = v
        return registered_attr, not_registered_attr
then create your validation classes:

class Cat(ModifiedBaseModel):
    pet_type: Literal['cat']
    meows: int

Now you can create a new Cat without worrying about undefined attributes, like this:

my_cat = Cat(pet_type='cat', meows=3, name='blacky', age=3)
For the second question: to pass the input directly from a dict, you can use double-asterisk unpacking:
Dog(**my_dog_data_in_dict)
or
Dog(**{'pet_type': 'dog', 'barks': 3.14, 'eats': 'biscuit', 'n': 1})
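As an aside: if the goal is simply to keep unknown fields rather than filter them with custom logic, Pydantic v1 can already do that natively with Extra.allow. A minimal sketch:

from typing import Literal
from pydantic import BaseModel, Extra

class Cat(BaseModel):
    pet_type: Literal['cat']
    meows: int

    class Config:
        extra = Extra.allow  # unknown fields are kept as attributes

my_cat = Cat(pet_type='cat', meows=3, name='blacky', age=3)
print(my_cat.name)  # 'blacky'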

Python: mapping between class and json

I am getting data via a REST interface, and I want to store that data in a class object.
My class could look like this:
class Foo:
    firstname = ''
    lastname = ''
    street = ''
    number = ''
and the json may look like this:
[
{
"fname": "Carl",
"lname": "any name",
"address": ['carls street', 12]
}
]
What's the easiest way to map between the json and my class?
My problem is: I want to have a class with a different structure than the json.
I want the names of the attributes to be more meaningful.
Of course I know that I could simply write a to_json method and a from_json method which does what I want.
The thing is: I have a lot of these classes, and I am looking for a more declarative way to write the code.
e.g. in Java I probably would use mapstruct.
Thanks for your help!
Use a dict for the JSON input. Use **kwargs in an __init__ method in your class and map the variables accordingly, as sketched below.
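A minimal sketch of that suggestion, with a hypothetical FIELD_MAP dict holding the declarative JSON-key-to-attribute renaming:

class Foo:
    FIELD_MAP = {'fname': 'firstname', 'lname': 'lastname'}

    def __init__(self, **kwargs):
        # 'address' arrives as a [street, number] pair, so unpack it first
        address = kwargs.pop('address', None)
        if address is not None:
            self.street, self.number = address
        for json_key, value in kwargs.items():
            setattr(self, self.FIELD_MAP.get(json_key, json_key), value)

foo = Foo(fname='Carl', lname='any name', address=['carls street', 12])
print(foo.firstname, foo.street, foo.number)  # Carl carls street 12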
I had a similar problem, and I solved it by using @classmethod:
import json

class Robot:
    def __init__(self, x, y):
        self.type = "new-robot"
        self.x = x
        self.y = y

    @classmethod
    def create_robot(cls, sdict):
        if sdict["type"] == "new-robot":
            position = sdict["position"]
            return cls(position['x'], position['y'])
        else:
            raise Exception("Unable to create a new robot!!!")

if __name__ == '__main__':
    input_string = '{"type": "new-robot", "position": {"x": 3, "y": 3}}'
    cmd = json.loads(input_string)
    bot = Robot.create_robot(cmd)
    print(bot.type)
Perhaps you could use two classes: one directly aligned with the JSON (your source class) and the other having the actual structure you need. Then you could map between them using the ObjectMapper class (https://pypi.org/project/object-mapper/). This is very close to the MapStruct library for Java.
ObjectMapper is a class for automatic object mapping. It helps you to create objects between project layers (data layer, service layer, view) in a simple, transparent way.

Traitlets: Best way for a "Dict of Instance"?

In my code, I need a Dict of Instance (for example a list of Parameter keyed by name). Currently I've solved this by using a regular Dict-traitlet as the incoming property (parameters) and then having a function that "translates" these into instances of the Parameter class.
Is there a better way to do this than:
import traitlets as t
import traitlets.config as tc

class Parameter(tc.Configurable):
    name = t.Unicode().tag(config=True)
    description = t.Unicode(allow_none=True).tag(config=True)
    value = t.Any(default_value=None).tag(config=True)

class Job(tc.Configurable):
    parameters = t.Dict(allow_none=True).tag(config=True)
    _parameter_map = t.Dict()

    def init_parameters(self):
        self._parameter_map.clear()
        for name, configuration in self.parameters.items():
            configuration['name'] = name
            parameter = Parameter(**configuration, parent=self)
            self._parameter_map[name] = parameter
And then this:
c.Job.parameters = {
    "parameter1": {
        "description": "The first parameter",
        "value": True
    }
}
It works, and logic dictates that, since you configure by "class names" with traitlets, it is the only way, but I just wanted to be sure.
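Not a full replacement for the translation step, but assuming traitlets 5, Dict accepts a value_trait, so the values can at least be validated as Parameter instances at assignment time. A sketch (the config-dict to Parameter construction would still live in something like init_parameters):

class Job(tc.Configurable):
    # each value must be a Parameter instance; keys stay free-form
    parameters = t.Dict(value_trait=t.Instance(Parameter))

job = Job()
job.parameters = {"parameter1": Parameter(name="parameter1", value=True)}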

XML to Python Class to C Struct

I need some advice. Two questions: does something already exist for this, and what modules should I use to develop it?
I have some structures that come from an XML file. I want to represent them in Python classes (maybe using a factory to create a class per structure), but I want these classes to have a function that will emit the structure as a C struct.
From my research, ctypes seems like the recommended module for representing the structures in Python classes, but I don't see any methods for emitting C structs for the creation of a header file.
From the OP's comment, I think the minimal solution is a set of helper functions instead of classes. The xmltodict library makes it easy to turn the XML data into nested dictionaries, more or less like JSON. A set of helpers that parse the contents and generate appropriate C-struct strings is all that's really needed. If you can work with dictionaries like:
example = {
    "name": "my_struct",
    "members": [
        {
            "name": "intmember",
            "ctype": "int"
        },
        {
            "name": "floatmember",
            "ctype": "float"
        }
    ]
}
You can do something like:

from string import Template

struct_template_string = '''
typedef struct {
$defs
} $structname;
'''

struct_template = Template(struct_template_string)
member_template = Template("    $ctype $name;")

def spec_to_struct(spec_dict):
    structname = spec_dict['name']
    member_data = spec_dict['members']
    members = [member_template.substitute(d) for d in member_data]
    return struct_template.substitute(structname=structname, defs="\n".join(members))
Which will produce something like:

typedef struct {
    int intmember;
    float floatmember;
} my_struct;
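With the spec dict above bound to the name example (as in the descriptor demo below), that output comes from:

print(spec_to_struct(example))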
I'd try to get it working with basic functions first before trying to build up a class scaffold. It would be pretty easy to hide the details in a class using property descriptors:
class data_property:
    def __init__(self, path, wrapper=None):
        self.path = path
        self.wrapper = wrapper

    def __get__(self, instance, owner):
        result = instance[self.path]
        if self.wrapper:
            # wrap a list of dicts item by item, a single dict directly
            if hasattr(result, '__iter__'):
                return [self.wrapper(**i) for i in result]
            return self.wrapper(**result)
        return result

class MemberWrapper(dict):
    name = data_property('name')
    type = data_property('ctype')

class StructWrapper(dict):
    name = data_property('name')
    members = data_property('members', MemberWrapper)

# `example` is the spec dict shown above
test = StructWrapper(**example)
print(test.name)
print(test.members)
for member in test.members:
    print(member.type, member.name)

# my_struct
# [{'name': 'intmember', 'ctype': 'int'}, {'name': 'floatmember', 'ctype': 'float'}]
# int intmember
# float floatmember
