python-marshmallow: deserializing nested schema with only one exposed key

python-marshmallow: deserializing nested schema with only one exposed key - python

I am trying to serialize a list of nested objects as scalar values by taking only one field from the nested item. Instead of [{key: value}, ...] I want to receive [value1, value2, ...].
Code:
from marshmallow import *
class MySchema(Schema):
key = fields.String(required=True)
class ParentSchema(Schema):
items = fields.Nested(MySchema, only='key', many=True)
Given the above schemas, I want to serialize some data:
>>> data = {'items': [{'key': 1}, {'key': 2}, {'key': 3}]}
>>> result, errors = ParentSchema().dump(data)
>>> result
{'items': ['1', '2', '3']}
This works as expected, giving me the list of scalar values. However, when trying to deserialize the data using the models above, the data is suddenly invalid:
>>> data, errors = ParentSchema().load(result)
>>> data
{'items': [{}, {}, {}]}
>>> errors
{'items': {0: {}, '_schema': ['Invalid input type.', 'Invalid input type.', 'Invalid input type.'], 1: {}, 2: {}}}
Is there any configuration option I am missing or is this simply not possible?

For anyone stumbling across the same issue, this is the workaround I am using currently:
class MySchema(Schema):
key = fields.String(required=True)
def load(self, data, *args):
data = [
{'key': item} if isinstance(item, str) else item
for item in data
]
return super().load(data, *args)
class ParentSchema(Schema):
items = fields.Nested(MySchema, only='key', many=True)

Related

Applying keys sequentially to a dict from a list

I'm scraping a website, which returns a dictionary:
person = {'name0':{'first0': 'John', 'last0':'Smith'},
'age0':'10',
'location0':{'city0':'Dublin'}
}
I'm trying to write a function that will return a dictionary {'name':'John', 'age':'10'} when passed the above dictionary.
I want to ideally put a try:... except KeyError around each item since sometimes keys will be missing.
def func(person):
filters = [('age', 'age0'), ('name', ['name0', 'first0'])]
result = {'name': None, 'age': None}
for i in filters:
try:
result[i[0]] = person[i[1]]
except KeyError:
pass
return result
The problem is result[i[0]] = person[i[1]] doesn't work for 'name' since there's two keys that need to be followed sequentially and I don't know how to do that.
I want some way of telling it (in the loop) to go to person['name0']['first0'] (and so on to whatever depth the thing I want is).
I have lots of things to extract, so I'd rather do it in a loop instead of a try..except statement for each variable individually.

In order to follow several key sequentially, you can use get and set the default value to {} (empty dictionary) for the upper levels. Set the default value to None (or whatever suits you) for the last level:
def func(person):
return {'name': person.get('name0', {}).get('first0', None),
'age': person.get('age0', None)}

Best I could manage was using a for loop to iterate through the keys:
person = {'name0':{'first0': 'John', 'last0':'Smith'},
'age0':'10',
'location0':{'city0':'Dublin'}
}
Additionally I used .get(key) rather than try..except as suggested by #wiwi
def func(person):
filters = [('age', ['age0']), ('name', ['name0', 'first0'])]
result = {'name': None, 'age': None}
for filter in filters:
temp = person.copy()
for key in filter[1]:
temp = temp.get(key)
if not temp: # NoneType doesn't have .get method
break
result[filter[0]] = temp
return result
func(person) then returns {'name': 'John', 'age': '10'}.
It handles missing input too:
person2 = {'age0':'10',
'location0':{'city0':'Dublin'}}
func(person2) returns {'name': None, 'age': '10'}

You can put the try...except in another loop, if there's a list of keys instead of a single key:
def getNestedVal(obj, kPath:list, defaultVal=None):
if isinstance(kPath, str) or not hasattr(kPath, '__iter__'):
kPath = [kPath] ## if not iterable, wrap as list
for k in kPath:
try: obj = obj[k]
except: return defaultVal
return obj
def func(person):
filters = [('age', 'age0'), ('name', ['name0', 'first0']),#]
('gender', ['gender0'], 'N/A')] # includes default value
return {k[0]: getNestedVal(person, *k[1:3]) for k in filters}
[I added gender just to demonstrate how defaults can also be specified for missing values.]
With this, func(person) should return
{'age': '10', 'name': 'John', 'gender': 'N/A'}
I also have a flattenObj function, a version of which is defined below:
def flattenDict(orig:dict, kList=[], kSep='_', stripNum=True):
if not isinstance(orig, dict): return [(kList, orig)]
tList = []
for k, v in orig.items():
if isinstance(k, str) and stripNum: k = k.strip('0123456789')
tList += flattenDict(v, kList+[str(k)], None)
if not isinstance(kSep, str): return tList
return {kSep.join(kl): v for kl,v in tList}
[I added stripNum just to get rid of the 0s in your keys...]
flattenDict(person) should return
{'name_first': 'John', 'name_last': 'Smith', 'age': '10', 'location_city': 'Dublin'}

Generate pydantic model from a dict

Is there a straight-forward approach to generate a Pydantic model from a dictionary?
Here is a sample of the data I have.
{
'id': '424c015f-7170-4ac5-8f59-096b83fe5f5806082020',
'contacts': [{
'displayName': 'Norma Fisher',
'id': '544aa395-0e63-4f9a-8cd4-767b3040146d'
}],
'startTime': '2020-06-08T09:38:00+00:00'
}
Expecting a model similar to ...
class NewModel(BaseModel):
id: str
contacts: list
startTime: str

You can use MyModel.parse_obj(my_dict) to generate a model from a dictionary. According to the documentation –
this is very similar to the __init__ method of the model, except it takes a dict rather than keyword arguments.

You can also use its __init__ method:
your_mode = YourMode(**your_dict)

I use this method to generate models at run time using a dictionary definition. This approach allows you to define nested models too. The field type syntax borrows from the create_model method.
from pydantic import create_model
m = {
"a":(int,...),
"b":{
"c":(str,"hi"),
"d":{
"e":(bool,True),
"f":(float,0.5)
}
}
}
def dict_model(name:str,dict_def:dict):
fields = {}
for field_name,value in dict_def.items():
if isinstance(value,tuple):
fields[field_name]=value
elif isinstance(value,dict):
fields[field_name]=(dict_model(f'{name}_{field_name}',value),...)
else:
raise ValueError(f"Field {field_name}:{value} has invalid syntax")
return create_model(name,**fields)
model = dict_model("some_name",m)

There's no method for exactly that, but you can use create_model() to create a model if you know the field types.
Or there's datamodel-code-generator (separate package) which allows you to generate models from schema definitions.

If you have a sample json and want to generate a pydantic model for validation and use it, then you can try this website - https://jsontopydantic.com/
which can generate a pydantic model from a sample json

Whilst I like #data_wiz dictionary definition, Here is an alternative suggestion based on what my needs to take simple JSON responses on the fly which are normally CamelCase key elements and be able to process this into a pythonic styled class.
With the standard functions JSON converts to Dict easily, however!
I wanted to work on this in a pythonic style
I also wanted to be able to have some type overrides converting strings to pythonic types
I also wanted to indicated elements that are optional. This is where I start loving Pydantic.
The following code snippet can generate a model from an actual data Dict from a JSON API response, as keys are camelcase it will convert them to pythonic snake style but retain the CamelCase as Alias.
This pydantic aliasing enables easy consumption of a JSON converted to Dict without key conversion and also the direct export of JSON formatted output. NB observe the config of the dynamic model DynamicModel.__config__.allow_population_by_field_name = True this allow the creation of a dynamicModel from Alias or Pythonic field names.
This Code is not fully featured currently cannot handle Lists but it is working well for me for simple cases.
Example of use is in the docstring of the pydanticModelGenerator
from inflection import underscore
from typing import Any, Dict, Optional
from pydantic import BaseModel, Field, create_model
class ModelDef(BaseModel):
"""Assistance Class for Pydantic Dynamic Model Generation"""
field: str
field_alias: str
field_type: Any
class pydanticModelGenerator:
"""
Takes source_data:Dict ( a single instance example of something like a JSON node) and self generates a pythonic data model with Alias to original source field names. This makes it easy to popuate or export to other systems yet handle the data in a pythonic way.
Being a pydantic datamodel all the richness of pydantic data validation is available and these models can easily be used in FastAPI and or a ORM
It does not process full JSON data structures but takes simple JSON document with basic elements
Provide a model_name, an example of JSON data and a dict of type overrides
Example:
source_data = {'Name': '48 Rainbow Rd',
'GroupAddressStyle': 'ThreeLevel',
'LastModified': '2020-12-21T07:02:51.2400232Z',
'ProjectStart': '2020-12-03T07:36:03.324856Z',
'Comment': '',
'CompletionStatus': 'Editing',
'LastUsedPuid': '955',
'Guid': '0c85957b-c2ae-4985-9752-b300ab385b36'}
source_overrides = {'Guid':{'type':uuid.UUID},
'LastModified':{'type':datetime },
'ProjectStart':{'type':datetime },
}
source_optionals = {"Comment":True}
#create Model
model_Project=pydanticModelGenerator(
model_name="Project",
source_data=source_data,
overrides=source_overrides,
optionals=source_optionals).generate_model()
#create instance using DynamicModel
project_instance=model_Project(**project_info)
"""
def __init__(
self,
model_name: str = None,
source_data: str = None,
overrides: Dict = {},
optionals: Dict = {},
):
def field_type_generator(k, overrides, optionals):
pass
field_type = str if not overrides.get(k) else overrides[k]["type"]
return field_type if not optionals.get(k) else Optional[field_type]
self._model_name = model_name
self._json_data = source_data
self._model_def = [
ModelDef(
field=underscore(k),
field_alias=k,
field_type=field_type_generator(k, overrides, optionals),
)
for k in source_data.keys()
]
def generate_model(self):
"""
Creates a pydantic BaseModel
from the json and overrides provided at initialization
"""
fields = {
d.field: (d.field_type, Field(alias=d.field_alias)) for d in self._model_def
}
DynamicModel = create_model(self._model_name, **fields)
DynamicModel.__config__.allow_population_by_field_name = True
return DynamicModel

Here is a customized code for data model generation using python dicts.
Code mostly borrowed from #data_wiz
Helper Functions
from pydantic import create_model
# https://stackoverflow.com/questions/62267544/generate-pydantic-model-from-a-dict
from copy import deepcopy
def get_default_values(input_schema_copy):
"""Get the default values from the structured schema dictionary. Recursive Traversal of the Schema is performed here.
Args:
input_schema_copy (dict): The input structured dictionary schema. Preferred deepcopy of the input schema to avoid inplace changes for the same.
Returns:
default_values (dict): The default values of the input schema.
"""
for k, v in input_schema_copy.items():
if isinstance(v, dict):
input_schema_copy[k] = get_default_values(v)
else:
input_schema_copy[k] = v[1]
return input_schema_copy
def get_defaults(input_schema):
"""Wrapper around get_default_values to get the default values of the input schema using a deepcopy of the same to avoid arbitrary value changes.
Args:
input_schema (dict): The input structured dictionary schema.
Returns:
default_values (dict): The default values of the input schema.
"""
input_schema_copy = deepcopy(input_schema)
return get_default_values(input_schema_copy)
def are_any_defaults_empty(default_values):
"""Check if any of the default values are empty (Ellipsis - ...)?
Args:
default_values (dict): The default values of the input schema.
Returns:
Bool: True if any of the default values are empty (Ellipsis - ...), False otherwise.
"""
for _, v in default_values.items():
if isinstance(v, dict):
are_any_defaults_empty(v)
else:
if v is Ellipsis: # ... symbol
return True
return False
def correct_schema_structure(input_schema_copy):
for k, v in input_schema_copy.items():
if isinstance(v, dict):
input_schema_copy[k] = correct_schema_structure(v)
elif type(v) == type:
input_schema_copy[k] = (v,...)
elif not hasattr(v, '__iter__') or isinstance(v, str):
input_schema_copy[k] = (type(v),v)
return input_schema_copy
def dict_model(dict_def:dict, name :str = "Demo_Pydantic_Nested_Model"):
"""Helper function to create the Pydantic Model from the dictionary.
Args:
name (str): The Model Name that you wish to give to the Pydantic Model.
dict_def (dict): The Schema Definition using a Dictionary.
Raises:
ValueError: When the Schema Definition is not a Tuple/Dictionary.
Returns:
pydantic.Model: A Pydantic Model.
"""
fields = {}
for field_name,value in dict_def.items():
if isinstance(value,tuple):
fields[field_name]=value
elif isinstance(value,dict):
# assign defaults to nested structures here (if present)
default_value = get_defaults(value)
default_value = Ellipsis if are_any_defaults_empty(default_value) else default_value
fields[field_name]=(dict_model(value, f'{name}_{field_name}'),default_value)
else:
raise ValueError(f"Field {field_name}:{value} has invalid syntax")
print(fields) # helpful for debugging
return create_model(name,**fields)
Schema Correction
input_schema = {
"a":(int,...),
"b":{
"c":(str,"hi"),
"d":{
"e":(bool,True),
"f":(float,0.5)
},
},
"g":"hello",
"h" : 123,
"i" : str,
"k" : int
}
input_schema_corrected = correct_schema_structure(input_schema)
input_schema_corrected
Output :
{'a': (int, Ellipsis),
'b': {'c': (str, 'hi'), 'd': {'e': (bool, True), 'f': (float, 0.5)}},
'g': (str, 'hello'),
'h': (int, 123),
'i': (str, Ellipsis),
'k': (int, Ellipsis)}
Actual Model Creation
model = dict_model(dict_def= input_schema, name= "Demo_Pydantic_Nested_Model")
Checking the Model Schema
model.schema()
{'title': 'Demo_Pydantic_Nested_Model',
'type': 'object',
'properties': {'a': {'title': 'A', 'type': 'integer'},
'b': {'title': 'B',
'default': {'c': 'hi', 'd': {'e': True, 'f': 0.5}},
'allOf': [{'$ref': '#/definitions/Demo_Pydantic_Nested_Model_b'}]},
'g': {'title': 'G', 'default': 'hello', 'type': 'string'},
'h': {'title': 'H', 'default': 123, 'type': 'integer'},
'i': {'title': 'I', 'type': 'string'},
'k': {'title': 'K', 'type': 'integer'}},
'required': ['a', 'i', 'k'],
'definitions': {'Demo_Pydantic_Nested_Model_b_d': {'title': 'Demo_Pydantic_Nested_Model_b_d',
'type': 'object',
'properties': {'e': {'title': 'E', 'default': True, 'type': 'boolean'},
'f': {'title': 'F', 'default': 0.5, 'type': 'number'}}},
'Demo_Pydantic_Nested_Model_b': {'title': 'Demo_Pydantic_Nested_Model_b',
'type': 'object',
'properties': {'c': {'title': 'C', 'default': 'hi', 'type': 'string'},
'd': {'title': 'D',
'default': {'e': True, 'f': 0.5},
'allOf': [{'$ref': '#/definitions/Demo_Pydantic_Nested_Model_b_d'}]}}}}}
Validation on Test Data
test_dict = { "a" : 0, "i" : "hello", "k" : 123}
model(**test_dict).dict()
Advantages over original answer :
Extended Default Values (for nested structures)
Easier Type Declarations

Looping through a function

I am struggling with figuring out the best way to loop through a function. The output of this API is a Graph Connection and I am a-little out of my element. I really need to obtain ID's from an api output and have them in a dict or some sort of form that I can pass to another API call.
**** It is important to note that the original output is a graph connection.... print(type(api_response) does show it as a list however, if I do a print(type(api_response[0])) it returns a
This is the original output from the api call:
[{'_from': None, 'to': {'id': '5c9941fcdd2eeb6a6787916e', 'type': 'user'}}, {'_from': None, 'to': {'id': '5cc9055fcc5781152ca6eeb8', 'type': 'user'}}, {'_from': None, 'to': {'id': '5d1cf102c94c052cf1bfb3cc', 'type': 'user'}}]
This is the code that I have up to this point.....
api_response = api_instance.graph_user_group_members_list(group_id, content_type, accept,limit=limit, skip=skip, x_org_id=x_org_id)
def extract_id(result):
result = str(result).split(' ')
for i, r in enumerate(result):
if 'id' in r:
id = (result[i+1].translate(str.maketrans('', '', string.punctuation)))
print( id )
return id
extract_id(api_response)
def extract_id(result):
result = str(result).split(' ')
for i, r in enumerate(result):
if 'id' in r:
id = (result[i+8].translate(str.maketrans('', '', string.punctuation)))
print( id )
return id
extract_id(api_response)
def extract_id(result):
result = str(result).split(' ')
for i, r in enumerate(result):
if 'id' in r:
id = (result[i+15].translate(str.maketrans('', '', string.punctuation)))
print( id )
return id
extract_id(api_response)
I have been able to use a function to extract the ID's but I am doing so through a string. I am in need of a scalable solution that I can use to pass these ID's along to another API call.
I have tried to use a for loop but because it is 1 string and i+1 defines the id's position, it is redundant and just outputs 1 of the id's multiple times.
I am receiving the correct output using each of these functions however, it is not scalable..... and just is not a solution. Please help guide me......

So to solve the response as a string issue I would suggest using python's builtin json module. Specifically, the method .loads() can convert a string to a dict or list of dicts. From there you can iterate over the list or dict and check if the key is equal to 'id'. Here's an example based on what you said the response would look like.
import json
s = "[{'_from': None, 'to': {'id': '5c9941fcdd2eeb6a6787916e', 'type': 'user'}}, {'_from': None, 'to': {'id': '5cc9055fcc5781152ca6eeb8', 'type': 'user'}}, {'_from': None, 'to': {'id': '5d1cf102c94c052cf1bfb3cc', 'type': 'user'}}]"
# json uses double quotes and null; there is probably a better way to do this though
s = s.replace("\'", '\"').replace('None', 'null')
response = json.loads(s) # list of dicts
for d in response:
for key, value in d['to'].items():
if key == 'id':
print(value) # or whatever else you want to do
# 5c9941fcdd2eeb6a6787916e
# 5cc9055fcc5781152ca6eeb8
# 5d1cf102c94c052cf1bfb3cc

Flatten a nested dict structure into a dataset

For some post-processing, I need to flatten a structure like this
{'foo': {
'cat': {'name': 'Hodor', 'age': 7},
'dog': {'name': 'Mordor', 'age': 5}},
'bar': { 'rat': {'name': 'Izidor', 'age': 3}}
}
into this dataset:
[{'foobar': 'foo', 'animal': 'dog', 'name': 'Mordor', 'age': 5},
{'foobar': 'foo', 'animal': 'cat', 'name': 'Hodor', 'age': 7},
{'foobar': 'bar', 'animal': 'rat', 'name': 'Izidor', 'age': 3}]
So I wrote this function:
def flatten(data, primary_keys):
out = []
keys = copy.copy(primary_keys)
keys.reverse()
def visit(node, primary_values, prim):
if len(prim):
p = prim.pop()
for key, child in node.iteritems():
primary_values[p] = key
visit(child, primary_values, copy.copy(prim))
else:
new = copy.copy(node)
new.update(primary_values)
out.append(new)
visit(data, { }, keys)
return out
out = flatten(a, ['foo', 'bar'])
I was not really satisfied because I have to use copy.copy to protect my inputs. Obviously, when using flatten one does not want the inputs be altered.
Then I thought about one alternative that uses more global variables (at least global to flatten) and uses an index instead of directly passing primary_keys to visit. However, this does not really help me to get rid of the ugly initial copy:
keys = copy.copy(primary_keys)
keys.reverse()
So here is my final version:
def flatten(data, keys):
data = copy.copy(data)
keys = copy.copy(keys)
keys.reverse()
out = []
values = {}
def visit(node, id):
if id:
id -= 1
for key, child in node.iteritems():
values[keys[id]] = key
visit(child, id)
else:
node.update(values)
out.append(node)
visit(data, len(keys))
return out
Is there a better implementation (that can avoid the use of copy.copy)?

Edit: modified to account for variable dictionary depth.
By using the merge function from my previous answer (below), you can avoid calling update which modifies the caller. There is then no need to copy the dictionary first.
def flatten(data, keys):
out = []
values = {}
def visit(node, id):
if id:
id -= 1
for key, child in node.items():
values[keys[id]] = key
visit(child, id)
else:
out.append(merge(node, values)) # use merge instead of update
visit(data, len(keys))
return out
One thing I don't understand is why you need to protect the keys input. I don't see them being modified anywhere.
Previous answer
How about list comprehension?
def merge(d1, d2):
return dict(list(d1.items()) + list(d2.items()))
[[merge({'foobar': key, 'animal': sub_key}, sub_sub_dict)
for sub_key, sub_sub_dict in sub_dict.items()]
for key, sub_dict in a.items()]
The tricky part was merging the dictionaries without using update (which returns None).

Unable to access dict values indjango view

I want to save an array of objects passed from javascript through ajax to me database. This is my view code:
data2 = json.loads(request.raw_get_data)
for i in data2:
print(key)
obj = ShoppingCart(quantity = i.quantity , user_id = 3, datetime = datetime.now(), product_id = i.pk)
obj.save()
return render_to_response("HTML.html",RequestContext(request))
After the first line, i get this in my dictionary:
[{'model': 'Phase_2.product', 'fields': {'name': 'Bata', 'category': 2, 'quantity': 1, 'subcategory': 1, 'count': 2, 'price': 50}, 'imageSource': None, 'pk': 1}]
(Only one object in the array right now)
I want to be able access individual fields like quantity, id, etc in order to save the data to my database. When i debug this code, it gives a name error on 'i'. I also tried accessing the fields like this: data2[0].quantity but it gives this error: {AttributeError}dict object has no attribute quantity.
Edited code:
for i in data2:
name = i["fields"]["name"]
obj = ShoppingCart(quantity = i["fields"]["quantity"] , user_id = 3, datetime = datetime.now(), product_id = i["fields"]["pk"])
obj.save()

It might help you to visualise the returned dict with proper formatting:
[
{
'model': 'Phase_2.product',
'fields': {
'name': 'Bata',
'category': 2,
'quantity': 1,
'subcategory': 1,
'count': 2,
'price': 50
},
'imageSource': None,
'pk': 1
}
]
The most likely reason for your error is that you are trying to access values of of the inner 'fields' dictionary as if they belong to the outer i dictionary.
i.e.
# Incorrect
i["quantity"]
# Gives KeyError
# Correct
i["fields"]["quantity"]
Edit
You have the same problem in your update:
# Incorrect
i["fields"]["pk"]
# Correct
i["pk"]
The "pk" field is in the outer dictionary, not the inner "fields" dictionary.

You may try:
i['fields']['quantity']
The json.loads() returns you a dictionary, which should be accessed by key.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python-marshmallow: deserializing nested schema with only one exposed key - python

Related

Applying keys sequentially to a dict from a list

Generate pydantic model from a dict

Looping through a function

Flatten a nested dict structure into a dataset

Unable to access dict values indjango view

Categories

Resources