Is there a straightforward approach to generate a Pydantic model from a dictionary?
Here is a sample of the data I have.
{
    'id': '424c015f-7170-4ac5-8f59-096b83fe5f5806082020',
    'contacts': [{
        'displayName': 'Norma Fisher',
        'id': '544aa395-0e63-4f9a-8cd4-767b3040146d'
    }],
    'startTime': '2020-06-08T09:38:00+00:00'
}
Expecting a model similar to ...
class NewModel(BaseModel):
    id: str
    contacts: list
    startTime: str
You can use MyModel.parse_obj(my_dict) to create a model instance from a dictionary. According to the documentation:
this is very similar to the __init__ method of the model, except it takes a dict rather than keyword arguments.
You can also use its __init__ method:
your_model = YourModel(**your_dict)
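As a minimal sketch of the keyword-argument route, here is the question's sample data loaded into the model the asker was expecting. The NewModel class is assumed to be written by hand, as in the question; nothing here generates it.

```python
from pydantic import BaseModel


class NewModel(BaseModel):
    id: str
    contacts: list
    startTime: str


sample = {
    "id": "424c015f-7170-4ac5-8f59-096b83fe5f5806082020",
    "contacts": [
        {"displayName": "Norma Fisher", "id": "544aa395-0e63-4f9a-8cd4-767b3040146d"}
    ],
    "startTime": "2020-06-08T09:38:00+00:00",
}

# Unpacking the dict into keyword arguments validates it against the model.
new_model = NewModel(**sample)
print(new_model.contacts[0]["displayName"])
```

Note that this creates an instance of an existing model; it does not derive the model's fields from the dictionary.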
I use this method to generate models at run time using a dictionary definition. This approach allows you to define nested models too. The field type syntax borrows from the create_model method.
from pydantic import create_model

m = {
    "a": (int, ...),
    "b": {
        "c": (str, "hi"),
        "d": {
            "e": (bool, True),
            "f": (float, 0.5)
        }
    }
}


def dict_model(name: str, dict_def: dict):
    fields = {}
    for field_name, value in dict_def.items():
        if isinstance(value, tuple):
            # (type, default) tuples are passed straight to create_model
            fields[field_name] = value
        elif isinstance(value, dict):
            # nested dicts become nested, required models
            fields[field_name] = (dict_model(f'{name}_{field_name}', value), ...)
        else:
            raise ValueError(f"Field {field_name}:{value} has invalid syntax")
    return create_model(name, **fields)


model = dict_model("some_name", m)
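To see what the generated model accepts, here is a short usage sketch. It repeats dict_model so that it runs on its own; the instance values are made up for illustration.

```python
from pydantic import create_model


def dict_model(name: str, dict_def: dict):
    """Same approach as above: tuples become fields, dicts become nested models."""
    fields = {}
    for field_name, value in dict_def.items():
        if isinstance(value, tuple):
            fields[field_name] = value
        elif isinstance(value, dict):
            fields[field_name] = (dict_model(f"{name}_{field_name}", value), ...)
        else:
            raise ValueError(f"Field {field_name}:{value} has invalid syntax")
    return create_model(name, **fields)


definition = {
    "a": (int, ...),
    "b": {"c": (str, "hi"), "d": {"e": (bool, True), "f": (float, 0.5)}},
}
Model = dict_model("some_name", definition)

# Nested models are required fields, so "b" and "b.d" must be supplied,
# but their leaf fields fall back to the declared defaults.
instance = Model(a=1, b={"d": {}})
print(instance.b.c, instance.b.d.e, instance.b.d.f)
```

Because every nested model is declared with ..., leaving out "b" entirely raises a validation error even though all of its leaves have defaults.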
There's no method for exactly that, but you can use create_model() to create a model if you know the field types.
Or there's datamodel-code-generator (separate package) which allows you to generate models from schema definitions.
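If the field types are not known up front, one rough option is to take them from the sample values themselves and hand them to create_model(). This is only a sketch under loud assumptions: every key is treated as required, each field is typed after the Python type of its sample value, and a None value would yield a field that accepts nothing but None.

```python
from pydantic import create_model

sample = {
    "id": "424c015f-7170-4ac5-8f59-096b83fe5f5806082020",
    "contacts": [
        {"displayName": "Norma Fisher", "id": "544aa395-0e63-4f9a-8cd4-767b3040146d"}
    ],
    "startTime": "2020-06-08T09:38:00+00:00",
}

# Assumption: each field is required and typed after its sample value (str, list, ...).
field_definitions = {key: (type(value), ...) for key, value in sample.items()}
NewModel = create_model("NewModel", **field_definitions)

instance = NewModel(**sample)
print(instance.startTime)
```

This reproduces the shape asked for in the question (id: str, contacts: list, startTime: str) but infers nothing about the contents of nested lists or dicts.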
If you have a sample JSON document and want a Pydantic model to validate it, you can try https://jsontopydantic.com/, which generates a model from sample JSON.
While I like @data_wiz's dictionary definition, here is an alternative based on my own needs: taking simple JSON responses on the fly, whose keys are normally CamelCase, and processing them into a Pythonic-style class.
With the standard functions, JSON converts to a dict easily. However:
I wanted to work with the data in a Pythonic style.
I also wanted to be able to override types, converting strings to Python types.
I also wanted to indicate which elements are optional. This is where I started loving Pydantic.
The following snippet generates a model from an actual data dict taken from a JSON API response. Because the keys are CamelCase, it converts them to snake_case field names but retains the original CamelCase names as aliases.
This aliasing makes it easy to consume JSON converted to a dict without renaming keys, and to export JSON-formatted output directly. Note the configuration of the dynamic model: DynamicModel.__config__.allow_population_by_field_name = True allows a DynamicModel to be created from either the aliases or the Pythonic field names.
This code is not fully featured (it currently cannot handle lists), but it works well for me in simple cases.
An example of use is in the docstring of pydanticModelGenerator.
from typing import Any, Dict, Optional

from inflection import underscore
from pydantic import BaseModel, Field, create_model


class ModelDef(BaseModel):
    """Helper class for Pydantic dynamic model generation."""

    field: str
    field_alias: str
    field_type: Any


class pydanticModelGenerator:
    """
    Takes source_data: Dict (a single example instance of something like a JSON node) and generates a Pythonic data model with aliases to the original source field names. This makes it easy to populate or export to other systems while handling the data in a Pythonic way.
    Because the result is a Pydantic model, all of Pydantic's data validation is available, and these models can easily be used in FastAPI and/or an ORM.
    It does not process full JSON data structures; it takes a simple JSON document with basic elements.
    Provide a model_name, an example of the JSON data, and a dict of type overrides.

    Example:
        import uuid
        from datetime import datetime

        source_data = {'Name': '48 Rainbow Rd',
                       'GroupAddressStyle': 'ThreeLevel',
                       'LastModified': '2020-12-21T07:02:51.2400232Z',
                       'ProjectStart': '2020-12-03T07:36:03.324856Z',
                       'Comment': '',
                       'CompletionStatus': 'Editing',
                       'LastUsedPuid': '955',
                       'Guid': '0c85957b-c2ae-4985-9752-b300ab385b36'}
        source_overrides = {'Guid': {'type': uuid.UUID},
                            'LastModified': {'type': datetime},
                            'ProjectStart': {'type': datetime},
                            }
        source_optionals = {"Comment": True}
        # create the model
        model_Project = pydanticModelGenerator(
            model_name="Project",
            source_data=source_data,
            overrides=source_overrides,
            optionals=source_optionals).generate_model()
        # create an instance using the dynamic model
        project_instance = model_Project(**source_data)
    """

    def __init__(
        self,
        model_name: str = None,
        source_data: Dict = None,
        overrides: Optional[Dict] = None,
        optionals: Optional[Dict] = None,
    ):
        # Avoid mutable default arguments
        overrides = overrides or {}
        optionals = optionals or {}

        def field_type_generator(k, overrides, optionals):
            # Default to str unless an override is given; wrap in Optional if flagged
            field_type = str if not overrides.get(k) else overrides[k]["type"]
            return field_type if not optionals.get(k) else Optional[field_type]

        self._model_name = model_name
        self._json_data = source_data
        self._model_def = [
            ModelDef(
                field=underscore(k),
                field_alias=k,
                field_type=field_type_generator(k, overrides, optionals),
            )
            for k in source_data.keys()
        ]

    def generate_model(self):
        """
        Creates a pydantic BaseModel
        from the json and overrides provided at initialization
        """
        fields = {
            d.field: (d.field_type, Field(alias=d.field_alias)) for d in self._model_def
        }
        DynamicModel = create_model(self._model_name, **fields)
        DynamicModel.__config__.allow_population_by_field_name = True
        return DynamicModel
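The key-to-field mapping is the heart of this answer, so here is a dependency-free sketch of it. The to_snake_case helper is a hypothetical stand-in for inflection.underscore, not a copy of it: it only inserts an underscore before each capital letter, so runs of capitals (acronyms) may come out differently from what the library produces.

```python
import re


def to_snake_case(name: str) -> str:
    """Insert an underscore before every capital that is not at the start, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


source_keys = ["Name", "GroupAddressStyle", "LastModified", "Guid"]

# Pythonic field name -> original key, which the generator stores as the alias.
alias_by_field = {to_snake_case(key): key for key in source_keys}
print(alias_by_field)
```

Each entry of alias_by_field corresponds to one ModelDef: the dictionary key is the generated field name and the value is the alias used when reading or exporting the original JSON.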
Here is customized code for data model generation using Python dicts.
The code is mostly borrowed from @data_wiz.
Helper Functions
from pydantic import create_model
# https://stackoverflow.com/questions/62267544/generate-pydantic-model-from-a-dict
from copy import deepcopy


def get_default_values(input_schema_copy):
    """Get the default values from the structured schema dictionary. Recursive traversal of the schema is performed here.
    Args:
        input_schema_copy (dict): The input structured dictionary schema. Pass a deepcopy of the input schema, because it is modified in place.
    Returns:
        default_values (dict): The default values of the input schema.
    """
    for k, v in input_schema_copy.items():
        if isinstance(v, dict):
            input_schema_copy[k] = get_default_values(v)
        else:
            # v is a (type, default) tuple
            input_schema_copy[k] = v[1]
    return input_schema_copy


def get_defaults(input_schema):
    """Wrapper around get_default_values that works on a deepcopy of the input schema, so the original is left unchanged.
    Args:
        input_schema (dict): The input structured dictionary schema.
    Returns:
        default_values (dict): The default values of the input schema.
    """
    input_schema_copy = deepcopy(input_schema)
    return get_default_values(input_schema_copy)


def are_any_defaults_empty(default_values):
    """Check whether any of the default values are empty (Ellipsis - ...).
    Args:
        default_values (dict): The default values of the input schema.
    Returns:
        Bool: True if any of the default values are empty (Ellipsis - ...), False otherwise.
    """
    for _, v in default_values.items():
        if isinstance(v, dict):
            # propagate the result of the nested check instead of discarding it
            if are_any_defaults_empty(v):
                return True
        elif v is Ellipsis:  # ... symbol
            return True
    return False


def correct_schema_structure(input_schema_copy):
    """Normalise shorthand entries to (type, default) tuples. Modifies the dictionary in place and returns it."""
    for k, v in input_schema_copy.items():
        if isinstance(v, dict):
            input_schema_copy[k] = correct_schema_structure(v)
        elif isinstance(v, type):
            # a bare type means a required field of that type
            input_schema_copy[k] = (v, ...)
        elif not hasattr(v, '__iter__') or isinstance(v, str):
            # a bare scalar means a field of that scalar's type with the scalar as default
            input_schema_copy[k] = (type(v), v)
    return input_schema_copy


def dict_model(dict_def: dict, name: str = "Demo_Pydantic_Nested_Model"):
    """Helper function to create the Pydantic Model from the dictionary.
    Args:
        dict_def (dict): The Schema Definition using a Dictionary.
        name (str): The Model Name that you wish to give to the Pydantic Model.
    Raises:
        ValueError: When the Schema Definition is not a Tuple/Dictionary.
    Returns:
        pydantic.Model: A Pydantic Model.
    """
    fields = {}
    for field_name, value in dict_def.items():
        if isinstance(value, tuple):
            fields[field_name] = value
        elif isinstance(value, dict):
            # assign defaults to nested structures here (if present)
            default_value = get_defaults(value)
            default_value = Ellipsis if are_any_defaults_empty(default_value) else default_value
            fields[field_name] = (dict_model(value, f'{name}_{field_name}'), default_value)
        else:
            raise ValueError(f"Field {field_name}:{value} has invalid syntax")
    print(fields)  # helpful for debugging
    return create_model(name, **fields)
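The nested check in are_any_defaults_empty only does its job if the result of the recursive call is passed back to the caller; otherwise a required field buried in a nested dict goes unnoticed. A compact, standalone sketch of that check, where has_required_leaf is a hypothetical name used only for this illustration:

```python
def has_required_leaf(defaults: dict) -> bool:
    """Return True if any value, at any depth, is Ellipsis (the 'required' marker)."""
    return any(
        has_required_leaf(value) if isinstance(value, dict) else value is Ellipsis
        for value in defaults.values()
    )


all_defaulted = {"c": "hi", "d": {"e": True, "f": 0.5}}
nested_required = {"c": "hi", "d": {"e": ..., "f": 0.5}}

print(has_required_leaf(all_defaulted))    # False
print(has_required_leaf(nested_required))  # True
```

In dict_model, a True result is what turns the nested field's default into Ellipsis, making the whole nested structure required.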
Schema Correction
input_schema = {
    "a": (int, ...),
    "b": {
        "c": (str, "hi"),
        "d": {
            "e": (bool, True),
            "f": (float, 0.5)
        },
    },
    "g": "hello",
    "h": 123,
    "i": str,
    "k": int
}
input_schema_corrected = correct_schema_structure(input_schema)
input_schema_corrected
Output :
{'a': (int, Ellipsis),
'b': {'c': (str, 'hi'), 'd': {'e': (bool, True), 'f': (float, 0.5)}},
'g': (str, 'hello'),
'h': (int, 123),
'i': (str, Ellipsis),
'k': (int, Ellipsis)}
Actual Model Creation
model = dict_model(dict_def=input_schema_corrected, name="Demo_Pydantic_Nested_Model")
Checking the Model Schema
model.schema()
{'title': 'Demo_Pydantic_Nested_Model',
'type': 'object',
'properties': {'a': {'title': 'A', 'type': 'integer'},
'b': {'title': 'B',
'default': {'c': 'hi', 'd': {'e': True, 'f': 0.5}},
'allOf': [{'$ref': '#/definitions/Demo_Pydantic_Nested_Model_b'}]},
'g': {'title': 'G', 'default': 'hello', 'type': 'string'},
'h': {'title': 'H', 'default': 123, 'type': 'integer'},
'i': {'title': 'I', 'type': 'string'},
'k': {'title': 'K', 'type': 'integer'}},
'required': ['a', 'i', 'k'],
'definitions': {'Demo_Pydantic_Nested_Model_b_d': {'title': 'Demo_Pydantic_Nested_Model_b_d',
'type': 'object',
'properties': {'e': {'title': 'E', 'default': True, 'type': 'boolean'},
'f': {'title': 'F', 'default': 0.5, 'type': 'number'}}},
'Demo_Pydantic_Nested_Model_b': {'title': 'Demo_Pydantic_Nested_Model_b',
'type': 'object',
'properties': {'c': {'title': 'C', 'default': 'hi', 'type': 'string'},
'd': {'title': 'D',
'default': {'e': True, 'f': 0.5},
'allOf': [{'$ref': '#/definitions/Demo_Pydantic_Nested_Model_b_d'}]}}}}}
Validation on Test Data
test_dict = { "a" : 0, "i" : "hello", "k" : 123}
model(**test_dict).dict()
Advantages over original answer :
Extended Default Values (for nested structures)
Easier Type Declarations
Related
I have the following Pydantic base model:
from typing import Dict, List, Optional, Union
from pydantic import BaseModel


class WSMessage(BaseModel):
    action: str
    success: Optional[bool] = None
    sent_from: Optional[str] = None
    send_to: Optional[str] = None
    data: Optional[Union[str, Dict, List]] = None
    msg: Optional[Union[str, Dict, List]] = None
    reason: Optional[Union[str, Dict, List]] = None

    class Config:
        extra = "allow"
And the following data:
data = {
    'action': 'reply',
    'sent_from': 'master',
    'send_to': '192.168.0.100_UE4yWw69iSBEf67JhhWTpg==',
    'data': None,
    'success': True,
    'msg': [
        {'name': 'entry1_name', 'value': 'entry1_value'},
        {'name': 'entry2_name', 'value': 'entry2_value'}
    ],
    'reason': None,
    'to_action': 'get_system_properties',
    'completed': True,
}
However, when I try to load the values into the Pydantic model, data['msg'] is translated to a single dict instead of a list of dicts.
>>> msg = WSMessage(**data)
>>> msg
WSMessage(
action='reply',
success=True,
sent_from='jumphost',
send_to='46.235.96.113_UE4yWw69iSBEf67JhhWTpg==',
data=None,
msg={'name': 'value'},
reason=None,
completed=True,
to_action='get_system_properties'
)
>>> msg.msg
{'name': 'value'}
What am I doing wrong? I want msg to accept any form of data. Or more specifically, string, dict or list.
If I remove 'msg' from my model, it will properly parse it to list of dict.
Your code almost works. First, you should use List[Dict] rather than a bare List, since it is more precise. Second, when you use a Union to define a field, Pydantic tries the members in the order they are listed and keeps the first one that matches.
With the first change applied, your field reads:
msg: Optional[Union[str, Dict, List[Dict]]] = None
Given a list of dictionaries, Pydantic will try to coerce your value to a dict before attempting a list of dicts. That coercion succeeds: each entry in your list has exactly two keys, so dict() reads each entry's keys as a key/value pair, which is why you end up with {'name': 'value'}.
If you switch the order of the union:
msg: Optional[Union[str, List[Dict], Dict]] = None
Pydantic will now first check if the value is a list of dictionaries, before resolving to match a dictionary. This should now work.
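A trimmed-down sketch of the reordered union, keeping only the two fields needed to show the effect. The left-to-right matching described here is the behaviour the question ran into; newer Pydantic releases may resolve unions differently, so treat the ordering rule as version-dependent.

```python
from typing import Dict, List, Optional, Union

from pydantic import BaseModel


class WSMessage(BaseModel):
    action: str
    # List[Dict] is listed before Dict so a list of dicts is tried as a list first.
    msg: Optional[Union[str, List[Dict], Dict]] = None


message = WSMessage(
    action="reply",
    msg=[
        {"name": "entry1_name", "value": "entry1_value"},
        {"name": "entry2_name", "value": "entry2_value"},
    ],
)
print(message.msg)
```

With this ordering, msg keeps both entries as a list instead of collapsing into a single dict.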
Relevantly: Discriminated unions are an oft debated subject in pydantic:
https://github.com/pydantic/pydantic/issues/619
https://github.com/pydantic/pydantic/issues/4675
The following data is the input:
data = {
    'campaigns': [
        {
            'title': 'GBP',
            'geo_segment': 'WW',
            'ac_type': 'Value',
            'conversion': 'soft',
            'asset_type': 'ALL',
            'date': '22.04.21',
            'name': 'GBP_WW_1_core_22.04.21',
            'budget': '2000',
            'cpa': '1,00'
        }
    ],
    'stages': [
        'pre',
        'post'
    ],
    'language_mode': 'all_en'
}
To parse campaigns, I use the parse_obj_as() function:
campaigns = parse_obj_as(List[CampaignData], data['campaigns'])

class CampaignData(BaseModel):
    title: NonEmptyString
    geo_segment: NonEmptyString
    ......
It works.
How to validate the rest of the data (stages: List, language_mode: str), which is not of type dict?
class GoogleCheckCampaignStages(BaseModel):
    stages: List[str]

class GoogleCheckLanguageMode(BaseModel):
    language_mode: str
If I run
stages = parse_obj_as(List[GoogleCheckCampaignStages], data['stages'])
returns
value is not a valid dict (type=type_error.dict)
Same result with data['language_mode'].
If I try with parse_raw_as() method
parse_raw_as(GoogleCheckLanguageMode, data['language_mode'])
returns
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
So how to parse str and list values?
The error you are encountering is because you are passing in data["stages"] which is just a list/array. It has no key called stages and therefore Pydantic doesn't know how to assign it.
Assuming that NonEmptyString is defined as below, I would suggest creating one model that processes the whole data object, for example like this:
from pydantic import BaseModel, parse_obj_as, constr
from typing import List
NonEmptyString = constr(min_length=1)
class CampaignData(BaseModel):
    title: NonEmptyString
    geo_segment: NonEmptyString
    # and so on ...


class Data(BaseModel):
    campaigns: List[CampaignData]
    stages: List[str]
    language_mode: str

parsed_data = parse_obj_as(Data, data)
print(parsed_data)
# campaigns=[CampaignData(title='GBP', geo_segment='WW', ...)] stages=['pre', 'post'] language_mode='all_en'
If you'd like to access only specific elements, you can easily do it like this:
print(parsed_data.stages)
# ['pre', 'post']
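If you do want to validate the bare values on their own, one option is to validate against the value's type (List[str], str) rather than against a model that wraps it. The sketch below assumes nothing about which Pydantic major version is installed: validate_as is a hypothetical helper that uses TypeAdapter where it exists and falls back to parse_obj_as otherwise.

```python
from typing import List

try:
    # Newer Pydantic versions expose TypeAdapter for validating arbitrary types.
    from pydantic import TypeAdapter

    def validate_as(type_, value):
        return TypeAdapter(type_).validate_python(value)

except ImportError:
    # Older versions offer parse_obj_as for the same purpose.
    from pydantic import parse_obj_as

    def validate_as(type_, value):
        return parse_obj_as(type_, value)


data = {"stages": ["pre", "post"], "language_mode": "all_en"}

# Validate the list and the string directly, without a wrapping model.
stages = validate_as(List[str], data["stages"])
language_mode = validate_as(str, data["language_mode"])
print(stages, language_mode)
```

The original attempts failed because the target type was a model with a stages or language_mode key, while the value passed in was just the list or the string.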
I am trying to map some values from data to a template. I want to fill in the values (with some manipulations) in the template only if they are already present in it. My template has hundreds of keys, and my goal is to avoid the if statement before each manipulation and assignment.
The point of the if statements is to defer evaluation of the manipulations I am performing as they may be expensive to perform. Any solutions should take this into account.
data = {
    'a': 1,
    'b': 2,
    'c': 3,
    'd': 4,
    'e': 5
}
template1 = {
    'p': 'Nan',
    'q': 'Nan',
    'r': 'Nan'
}
template2 = {
    'p': 'Nan',
    's': 'Nan',
    't': 'Nan'
}

def func(template, data):
    if 'p' in template.keys():
        template['p'] = data['a']
    if 'q' in template.keys():
        template['q'] = data['b'][:2] + 'some manipulation'
    if 'r' in template.keys():
        template['r'] = data['c']
    if 's' in template.keys():
        template['s'] = data['d'] + 'some mainpulation'
    if 't' in template.keys():
        template['t'] = data['e']
I know I am missing something basic, my actual code and requirements are pretty complex and I tried to simplify them and bring them down to this simple structure.
Thanks for your help in advance!
You could also store manipulations directly in your data dict using lambda functions, then check whether each value retrieved from the data dict is callable() when using this dict to update the template. Assuming you can't modify the keys in the data dict, this approach could still work together with the template_dict mapping approach suggested by Jlove.
data = {
    'p': 1,
    'q': 2,
    'r': 3,
    's': 4,
    't': 5,
    'u': lambda x: x * 2
}
template1 = {
    'p': 'Nan',
    'q': 'Nan',
    'r': 'Nan',
    'u': 2
}

def func(template, data):
    for key in template:
        if callable(data[key]):
            # the data entry is a manipulation: apply it to the template's current value
            template[key] = data[key](template[key])
        else:
            template[key] = data[key]

# driver
func(template1, data)
for k in template1.items():
    print(k)
--- expanded solution based on comments ---
Basically the same as the above, but this shows how to use a mapping dict to direct how the data dict and an actions dict are combined to modify the template dict. It also shows how to map keys to functions using a dict.
from collections import defaultdict

def qManipulation(x):
    return x * 10

def sManipulation(x):
    return x * 3

data = {
    'a': 1,
    'b': 2,
    'c': 3,
    'd': 4,
    'e': 5
}
actions = {
    'q': qManipulation,
    's': sManipulation,
    'u': lambda x: x * 7
}
tempToDataMap = defaultdict(lambda: None, {
    'p': 'a',
    'q': 'b',
    'r': 'c',
    's': 'd',
    't': 'e'
})
template1 = {
    'p': 'Nan',
    'q': 'Nan',
    'r': 'Nan',
    'u': 2
}

def func(template, data):
    for key, val in template.items():
        dataKey = tempToDataMap[key]
        # check if the template key corresponds to a data dict key
        if dataKey is not None:
            # if the key mapped from template to data is actually in the data dict, use the data value in the template
            if dataKey in data:
                template[key] = data[dataKey]
                # if the template key is registered to an action in the actions dict, run the action
                if key in actions:
                    template[key] = actions[key](data[dataKey])
        # use this if you have a manipulation on a template field that is not populated by data.
        # this isn't present in the example, but could be handy if the template ever has default values other than Nan
        elif key in actions:
            template[key] = actions[key](template[key])

func(template1, data)
for k in template1.items():
    print(k)
If your manipulations can be expressed as simple lambdas, you could encapsulate the condition/assignment in a function to reduce the code clutter:
def func(template, data):
    def apply(k, action):
        # action is only called when the key is present, so the work is deferred
        if k in template:
            template[k] = action()

    apply('p', lambda: data['a'])
    apply('q', lambda: data['b'][:2] + 'some manipulation')
    apply('r', lambda: data['c'])
    apply('s', lambda: data['d'] + 'some mainpulation')
    apply('t', lambda: data['e'])
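To make the deferral visible, here is a small runnable sketch of the same apply pattern. The expensive function and the calls list are hypothetical instrumentation added only to record which manipulations actually run, and the manipulations use str() so that they work with the integer sample data.

```python
calls = []


def expensive(label, value):
    """Stand-in for a costly manipulation; records that it actually ran."""
    calls.append(label)
    return value


def func(template, data):
    def apply(key, action):
        # The lambda is only called when the key exists in the template.
        if key in template:
            template[key] = action()

    apply("p", lambda: expensive("p", data["a"]))
    apply("q", lambda: expensive("q", str(data["b"]) + " manipulated"))
    apply("s", lambda: expensive("s", str(data["d"]) + " manipulated"))
    return template


data = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
template1 = {"p": "Nan", "q": "Nan", "r": "Nan"}

func(template1, data)
print(template1)  # {'p': 1, 'q': '2 manipulated', 'r': 'Nan'}
print(calls)      # ['p', 'q']
```

The manipulation for 's' never runs because template1 has no 's' key, which is the deferred evaluation the question asks for.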
This is probably not a great idea but you could subclass dict and override __setitem__.
class GuardDict(dict):
    def __setitem__(self, key, callable_value):
        if key in self:
            super().__setitem__(key, callable_value())

    # we need a method to transform back to a dict
    def to_dict(self):
        return dict(self)

data = {
    'a': 1,
    'b': '2',
    'c': 3,
    'd': '4',
    'e': 5
}
template1 = {
    'p': 'Nan',
    'q': 'Nan',
    'r': 'Nan'
}
template2 = {
    'p': 'Nan',
    's': 'Nan',
    't': 'Nan'
}

def func(template, data):
    # create a GuardDict from the dict
    # this will utilize __setitem__ and only actually set keys
    # that already exist in the original dict
    template = GuardDict(template)
    template['p'] = lambda: data['a']
    template['q'] = lambda: data['b'] + 'some manipulation'
    template['r'] = lambda: data['c']
    template['s'] = lambda: data['d'] + 'some mainpulation'
    template['t'] = lambda: data['e']
    # set back to a dict
    return template.to_dict()

template1 = func(template1, data)
template2 = func(template2, data)
print(template1)
print(template2)
I should probably note if there are other users of your code they will probably hate you for this.
A dynamically functional approach might relieve you of all the ifs and elses, but it might complicate the overall program structure.
data = {
    'a': 1,
    'b': 2,
    'c': 3,
    'd': 4,
    'e': 5
}
template1 = {
    'p': 'Nan',
    'q': 'Nan',
    'r': 'Nan'
}
template2 = {
    'p': 'Nan',
    's': 'Nan',
    't': 'Nan'
}

# first, define your complex logic in functions, accounting for every possible template key
def p_logic(data, x):
    return data[x]

def q_logic(data, x):
    return data[x][:2] + 'some manipulation'

# Then build a dict of every possible template key, the associated value and reference to one of the
# functions defined above
logic = {
    'p': {
        'value': 'a',
        'logic': p_logic
    },
    'q': {
        'value': 'b',
        'logic': q_logic
    },
    # add entries for 'r', 's' and 't' in the same way;
    # func() raises KeyError for any template key missing from this dict
}

def func(template, data):
    # for every key in a template, lookup that key in our logic dict
    # grab the value from the data
    # and apply the complex logic that has been defined for this template value
    for item in template:  # template.keys() is not necessary!
        template[item] = logic[item]['logic'](data, logic[item]['value'])
The only thing I could think to do here would be to have some sort of dict and run your template through a for loop instead. Such as:
template_dict = {'p': 'a', 'q': 'b', 'r': 'c', 's': 'd', 't': 'e'}

def func(template, data):
    for key, value in template_dict.items():
        if key in template.keys():
            template[key] = data[value]
Otherwise, I'm not sure how you might be able to avoid all those conditionals.
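A variation on the mapping idea that also keeps the manipulations deferred is to map each template key to a function of the data rather than to a data key. This is only a sketch: the rules table and its manipulations are hypothetical, and str() is used so that they work with the integer sample data.

```python
# Hypothetical rule table: template key -> function that computes the value from data.
rules = {
    "p": lambda d: d["a"],
    "q": lambda d: str(d["b"]) + " manipulated",
    "r": lambda d: d["c"],
    "s": lambda d: str(d["d"]) + " manipulated",
    "t": lambda d: d["e"],
}


def func(template, data):
    # Only keys present in both the template and the rule table are evaluated.
    for key in template.keys() & rules.keys():
        template[key] = rules[key](data)
    return template


data = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
template2 = {"p": "Nan", "s": "Nan", "t": "Nan", "x": "Nan"}
print(func(template2, data))
```

Template keys without a rule (here 'x') are left untouched, and rules for keys that are absent from the template are never called.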
I'm getting a JSON object with a key "sera:blah"
How would I deserialize that object into a python data type using the marshmallow library as that colon is an invalid property name?
Edit:
So classes in Python cannot accept a colon in a property name. It's invalid syntax.
Edit2:
Ideally I would like to have a workaround within marshmallow.
I see 2 routes you can take with this
Try to deserialize it with json.loads first, then iterate through each property and replace all the malformed keys before feeding it to marshmallow, or
Use the json.JSONDecoder class and roll your own object_hook function. Then call the .decode() method before feeding the result to marshmallow.
I've expanded on the latter (which I think is more appropriate)
from json import loads, JSONDecoder

s = """{
    "obj1": 123,
    "list": [
        {"example2": 42},
        {"sera:blah": false},
        {"object:3": {"nest:ed": "obj"}}
    ]
}"""

data = loads(s)
print(data)

def obj_transform(obj):
    # Iterate over a copy of the keys, because the dict is modified inside the loop
    for key in list(obj.keys()):
        if ':' in key:
            obj[key.replace(':', '_')] = obj.pop(key)
    return obj

decoder = JSONDecoder(object_hook=obj_transform)
print(decoder.decode(s))
The result of this prints:
{'obj1': 123, 'list': [{'example2': 42}, {'sera:blah': False}, {'object:3': {'nest:ed': 'obj'}}]}
{'obj1': 123, 'list': [{'example2': 42}, {'sera_blah': False}, {'object_3': {'nest_ed': 'obj'}}]}
Which seems like what you are looking for, to sanitize your input to marshmallow.
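For completeness, the first route (load with json.loads, then rename the keys) could look roughly like the sketch below. sanitize_keys is a hypothetical helper name; it builds new dicts and lists instead of modifying the parsed data in place.

```python
import json


def sanitize_keys(value):
    """Recursively return a copy with ':' in dict keys replaced by '_'."""
    if isinstance(value, dict):
        return {key.replace(":", "_"): sanitize_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize_keys(item) for item in value]
    return value


raw = '{"obj1": 123, "list": [{"sera:blah": false}, {"object:3": {"nest:ed": "obj"}}]}'
clean = sanitize_keys(json.loads(raw))
print(clean)
```

The result has the same shape as the object_hook output, so either route can feed marshmallow.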
Marshmallow handles this with the data_key argument, which names the key used in the external (serialized) data:
import marshmallow as ma

class MySchema(ma.Schema):
    sera_blah = ma.fields.String(data_key="sera:blah")
(This is marshmallow 3 syntax. Marshmallow 2 used load_from and dump_to.)
I am trying to serialize a list of nested objects as scalar values by taking only one field from the nested item. Instead of [{key: value}, ...] I want to receive [value1, value2, ...].
Code:
from marshmallow import *
class MySchema(Schema):
    key = fields.String(required=True)

class ParentSchema(Schema):
    items = fields.Nested(MySchema, only='key', many=True)
Given the above schemas, I want to serialize some data:
>>> data = {'items': [{'key': 1}, {'key': 2}, {'key': 3}]}
>>> result, errors = ParentSchema().dump(data)
>>> result
{'items': ['1', '2', '3']}
This works as expected, giving me the list of scalar values. However, when trying to deserialize the data using the models above, the data is suddenly invalid:
>>> data, errors = ParentSchema().load(result)
>>> data
{'items': [{}, {}, {}]}
>>> errors
{'items': {0: {}, '_schema': ['Invalid input type.', 'Invalid input type.', 'Invalid input type.'], 1: {}, 2: {}}}
Is there any configuration option I am missing or is this simply not possible?
For anyone stumbling across the same issue, this is the workaround I am using currently:
class MySchema(Schema):
    key = fields.String(required=True)

    def load(self, data, *args):
        # wrap bare strings back into {'key': ...} dicts before the normal load
        data = [
            {'key': item} if isinstance(item, str) else item
            for item in data
        ]
        return super().load(data, *args)

class ParentSchema(Schema):
    items = fields.Nested(MySchema, only='key', many=True)