Python Marshmallow: Dict validation Error - python

I'm quite new to marshmallow but my question refers to the issue of handling dict-like objects. There are no workable examples in the Marshmallow documentation. I came across with a simple example here in stack overflow Original question and this is the original code for the answer suppose this should be quite simple
from marshmallow import Schema, fields, post_load, pprint
class UserSchema(Schema):
name = fields.String()
email = fields.Email()
friends = fields.List(fields.String())
class AddressBookSchema(Schema):
contacts =fields.Dict(keys=fields.String(),values=fields.Nested(UserSchema))
#post_load
def trans_friends(self, item):
for name in item['contacts']:
item['contacts'][name]['friends'] = [item['contacts'][n] for n in item['contacts'][name]['friends']]
data = """
{"contacts": {
"Steve": {
"name": "Steve",
"email": "steve#example.com",
"friends": ["Mike"]
},
"Mike": {
"name": "Mike",
"email": "mike#example.com",
"friends": []
}
}
}
"""
deserialized_data = AddressBookSchema().loads(data)
pprint(deserialized_data)
However, when I run the code I get the following NoneType value
`None`
The input hasn't been marshalled.
I'm using the latest beta version of marshmallow 3.0.0b20. I can't find a way to make this work even it looks so simple. The information seems to indicate that nested dictionaries are being worked by the framework.
Currently I'm working in a cataloging application for flask where I'm receiving JSON messages where I can't really specify the schema beforehand. My specific problem is the following:
data = """
{"book": {
"title": {
"english": "Don Quixiote",
"spanish": "Don Quijote"
},
"author": {
"first_name": "Miguel",
"last_name": "Cervantes de Saavedra"
}
},
"book": {
"title": {
"english": "20000 Leagues Under The Sea",
"french": "20000 Lieues Sous Le Mer",
"japanese": "海の下で20000リーグ",
"spanish": "20000 Leguas Bajo El Mar",
"german": "20000 Meilen unter dem Meeresspiegel",
"russian": "20000 лиг под водой"
},
"author": {
"first_name": "Jules",
"last_name": "Verne"
}
}
}
This is just toy data but exemplifies that the keys in the dictionaries are not fixed, they change in number and text.
So the questions are why am I getting the validation error in a simple already worked example and if it's possible to use the marshmallow framework to validate my data,
Thanks

There are two issues in your code.
The first is the indentation of the post_load decorator. You introduced it when copying the code here, but you don't have it in the code you're running, otherwise you wouldn't get None.
The second is due to a documented change in marshmallow 3. pre/post_load/dump functions are expected to return the value rather than mutate it.
Here's a working version. I also reworked the decorator:
from marshmallow import Schema, fields, post_load, pprint
class UserSchema(Schema):
name = fields.String()
email = fields.Email()
friends = fields.List(fields.String())
class AddressBookSchema(Schema):
contacts = fields.Dict(keys=fields.String(),values=fields.Nested(UserSchema))
#post_load
def trans_friends(self, item):
for contact in item['contacts'].values():
contact['friends'] = [item['contacts'][n] for n in contact['friends']]
return item
data = """
{
"contacts": {
"Steve": {
"name": "Steve",
"email": "steve#example.com",
"friends": ["Mike"]
},
"Mike": {
"name": "Mike",
"email": "mike#example.com",
"friends": []
}
}
}
"""
deserialized_data = AddressBookSchema().loads(data)
pprint(deserialized_data)
And finally, the Dict in marshmallow 2 doesn't have key/value validation feature, so it will just silently ignore the keys and values argument and perform no validation.

Related

FastAPI create a generic response model that would suit requirements

I've been working with FastAPI for some time, it's a great framework.
However real life scenarios can be surprising, sometimes a non-standard approach is necessary. There's a one case I'd like to ask your help with.
There's a strange external requirement that a model response should be formatted as stated in example:
Desired behavior:
GET /object/1
{status: ‘success’, data: {object: {id:‘1’, category: ‘test’ …}}}
GET /objects
{status: ‘success’, data: {objects: [...]}}}
Current behavior:
GET /object/1 would respond:
{id: 1,field1:"content",... }
GET /objects/ would send a List of Object e.g.,:
{
[
{id: 1,field1:"content",... },
{id: 1,field1:"content",... },
...
]
}
You can substitute 'object' by any class, it's just for description purposes.
How to write a generic response model that will suit those reqs?
I know I can produce response model that would contain status:str and (depending on class) data structure e.g ticket:Ticket or tickets:List[Ticket].
The point is there's a number of classes so I hope there's a more pythonic way to do it.
Thanks for help.
Generic model with static field name
A generic model is a model where one field (or multiple) are annotated with a type variable. Thus the type of that field is unspecified by default and must be specified explicitly during subclassing and/or initialization. But that field is still just an attribute and an attribute must have a name. A fixed name.
To go from your example, say that is your model:
{
"status": "...",
"data": {
"object": {...} # type variable
}
}
Then we could define that model as generic in terms of the type of its object attribute.
This can be done using Pydantic's GenericModel like this:
from typing import Generic, TypeVar
from pydantic import BaseModel
from pydantic.generics import GenericModel
M = TypeVar("M", bound=BaseModel)
class GenericSingleObject(GenericModel, Generic[M]):
object: M
class GenericMultipleObjects(GenericModel, Generic[M]):
objects: list[M]
class BaseGenericResponse(GenericModel):
status: str
class GenericSingleResponse(BaseGenericResponse, Generic[M]):
data: GenericSingleObject[M]
class GenericMultipleResponse(BaseGenericResponse, Generic[M]):
data: GenericMultipleObjects[M]
class Foo(BaseModel):
a: str
b: int
class Bar(BaseModel):
x: float
As you can see, GenericSingleObject reflects the generic type we want for data, whereas GenericSingleResponse is generic in terms of the type parameter M of GenericSingleObject, which is the type of its data attribute.
If we now want to use one of our generic response models, we would need to specify it with a type argument (a concrete model) first, e.g. GenericSingleResponse[Foo].
FastAPI deals with this just fine and can generate the correct OpenAPI documentation. The JSON schema for GenericSingleResponse[Foo] looks like this:
{
"title": "GenericSingleResponse[Foo]",
"type": "object",
"properties": {
"status": {
"title": "Status",
"type": "string"
},
"data": {
"$ref": "#/definitions/GenericSingleObject_Foo_"
}
},
"required": [
"status",
"data"
],
"definitions": {
"Foo": {
"title": "Foo",
"type": "object",
"properties": {
"a": {
"title": "A",
"type": "string"
},
"b": {
"title": "B",
"type": "integer"
}
},
"required": [
"a",
"b"
]
},
"GenericSingleObject_Foo_": {
"title": "GenericSingleObject[Foo]",
"type": "object",
"properties": {
"object": {
"$ref": "#/definitions/Foo"
}
},
"required": [
"object"
]
}
}
}
To demonstrate it with FastAPI:
from fastapi import FastAPI
app = FastAPI()
#app.get("/foo/", response_model=GenericSingleResponse[Foo])
async def get_one_foo() -> dict[str, object]:
return {"status": "foo", "data": {"object": {"a": "spam", "b": 123}}}
Sending a request to that route returns the following:
{
"status": "foo",
"data": {
"object": {
"a": "spam",
"b": 123
}
}
}
Dynamically created model
If you actually want the attribute name to also be different every time, that is obviously no longer possible with static type annotations. In that case we would have to resort to actually creating the model type dynamically via pydantic.create_model.
In that case there is really no point in genericity anymore because type safety is out of the window anyway, at least for the data model. We still have the option to define a GenericResponse model, which we can specify via our dynamically generated models, but this will make every static type checker mad, since we'll be using variables for types. Still, it might make for otherwise concise code.
We just need to define an algorithm for deriving the model parameters:
from typing import Any, Generic, Optional, TypeVar
from pydantic import BaseModel, create_model
from pydantic.generics import GenericModel
M = TypeVar("M", bound=BaseModel)
def create_data_model(
model: type[BaseModel],
plural: bool = False,
custom_plural_name: Optional[str] = None,
**kwargs: Any,
) -> type[BaseModel]:
data_field_name = model.__name__.lower()
if plural:
model_name = f"Multiple{model.__name__}"
if custom_plural_name:
data_field_name = custom_plural_name
else:
data_field_name += "s"
kwargs[data_field_name] = (list[model], ...) # type: ignore[valid-type]
else:
model_name = f"Single{model.__name__}"
kwargs[data_field_name] = (model, ...)
return create_model(model_name, **kwargs)
class GenericResponse(GenericModel, Generic[M]):
status: str
data: M
Using the same Foo and Bar examples as before:
class Foo(BaseModel):
a: str
b: int
class Bar(BaseModel):
x: float
SingleFoo = create_data_model(Foo)
MultipleBar = create_data_model(Bar, plural=True)
This also works as expected with FastAPI including the automatically generated schemas/documentations:
from fastapi import FastAPI
app = FastAPI()
#app.get("/foo/", response_model=GenericResponse[SingleFoo]) # type: ignore[valid-type]
async def get_one_foo() -> dict[str, object]:
return {"status": "foo", "data": {"foo": {"a": "spam", "b": 123}}}
#app.get("/bars/", response_model=GenericResponse[MultipleBar]) # type: ignore[valid-type]
async def get_multiple_bars() -> dict[str, object]:
return {"status": "bars", "data": {"bars": [{"x": 3.14}, {"x": 0}]}}
Output is essentially the same as with the first approach.
You'll have to see, which one works better for you. I find the second option very strange because of the dynamic key/field name. But maybe that is what you need for some reason.

Dynamically create classes- Python

I'm trying to figure out what would be the best way to create classes in a dynamic manner based on the contents of a JSON file. So for example, here's a snippet from the JSON file:
{
"stuff": [{
"name": "burger",
"aka": ["cheeseburger", "hamburger"]
},
{
"name": "fries",
"aka": ["french fries", "potatoes"]
},
{
"name": "meal",
"items": [{
"name": "burger",
"value": "<burger>"
},
{
"name": "fries",
"value": "<fries>"
}
]
}
]
}
And now based on this JSON, I want classes that represent these objects. So for example, something like:
class Burger:
def __init__(self):
self.name = "burger"
self.aka = ["cheeseburger", "hamburger"]
class Meal:
def __init__(self):
self.name = "meal"
self.burger = Burger()
self.fries = Fries()
So basically, based on that JSON, I want to be able to create classes that represent the same attributes and relationships that we see in the JSON. Any ideas about the best way to approach this would be appreciated!
Assuming json variable contains your json data try this:
for d in json:
name = d.pop('name')
t = type(name, (object,), d)
What it does is to call type, which will create new type in python (exactly the same as if you did class name, which correct name set to content of name variable, with base class object and attributes in d. Variable t will contain class object you want.

How to access ForeignKey data without making extra queries in Wagtail(django)

I have the following two classes in my app.models and i'm using the wagtail APIs to get the data as json
class AuthorMeta(Page):
author=models.OneToOneField(User)
city = models.ForeignKey('Cities', related_name='related_author')
class Cities(Page):
name = models.CharField(max_length=30)
So, when I try /api/v1/pages/?type=dashboard.AuthorMeta&fields=title,city, it returns the following data:
{
"meta": {
"total_count": 1
},
"pages": [
{
"id": 11,
"meta": {
"type": "dashboard.AuthorMeta",
"detail_url": "http://localhost:8000/api/v1/pages/11/"
},
"title": "Suneet Choudhary",
"city": {
"id": 10,
"meta": {
"type": "dashboard.Cities",
"detail_url": "http://localhost:8000/api/v1/pages/10/"
}
}
}
]
}
In the city field, it returns the id and meta of the city. How can I get the name of the city in the response here, without making an extra query? :/
I couldn't find any solution in the Documentation. Am I missing something?
Use Django model property to return through the ForeignKey:
class AuthorMeta(Page):
author=models.OneToOneField(User)
city = models.ForeignKey('Cities', related_name='related_author')
city_name = property(get_city_name)
def get_city_name(self):
return self.city.name
Check Term Property to better understand the concept
In case you have the foreign key in a Streamfield, e.g. a PageChooserBlock, you can customize the api response by overwriting the get_api_representation of a block, as described in the example as provided here:
class CustomPageChooserBlock(blocks.PageChooserBlock):
""" Customize the api response. """
def get_api_representation(self, value, context=None):
""" Return the url path instead of the id. """
return value.url_path

Elasticsearch/Python - Re-index data after changing the mappings?

I'm a little stuck on how to re-index data in elastic search after a mapping or a data type has been changed.
According to elastic search docs
Pull the documents in from your old index, using a scrolled search and index them into the new index using the bulk API. Many of the client APIs provide a reindex() method which will do all of this for you. Once you are done, you can delete the old index.
This is my old mapping
{
"test-index2": {
"mappings": {
"business": {
"properties": {
"address": {
"type": "nested",
"properties": {
"country": {
"type": "string"
},
"full_address": {
"type": "string"
}
}
}
}
}
}
}
}
New Index mapping, I'm changing full_address -> location_address
{
"test-index2": {
"mappings": {
"business": {
"properties": {
"address": {
"type": "nested",
"properties": {
"country": {
"type": "string"
},
"location_address": {
"type": "string"
}
}
}
}
}
}
}
}
I'm using the python client for elasticsearch
https://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.reindex
from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex
es = Elasticsearch(["es.node1"])
reindex(es, "source_index", "target_index")
However this transfers the data from one index to another.
How may i use this to change the mappings/(data types etc) for my case above?
It's Straightforward if you use the scan&scroll and the Bulk API already implemented in the python client of elasticsearch
First -> Fetch all the documents by scan&scroll method
Loop through and make neccessary modifications to each document
Insert the modified documents into a new index using the Bulk API
from elasticsearch import Elasticsearch, helpers
es = Elasticsearch()
# Use the scan&scroll method to fetch all documents from your old index
res = helpers.scan(es, query={
"query": {
"match_all": {}
},
"size":1000
},index="old_index")
new_insert_data = []
# Change the mapping and everything else by looping through all your documents
for x in res:
x['_index'] = 'new_index'
# Change "address" to "location_address"
x['_source']['location_address'] = x['_source']['address']
del x['_source']['address']
# This is a useless field
del x['_score']
es.indices.refresh(index="testing_index3")
# Add the new data into a list
new_insert_data.append(x)
es.indices.refresh(index="new_index")
print new_insert_data
#Use the Bulk API to insert the list of your modified documents into the database
helpers.bulk(es,new_insert_data)
The reindex() API simply "moves" documents from one index to another. There is no way it can detect/infer that the field name full_address in documents of the old index should be location_address in documents in the new index. I doubt there is any API provided by standard Elasticsearch clients that can do what you desire. The only way I can think of achieving this is through additional custom logic on the client side which maintains a dictionary of field names from old index to new index and then read documents from old index and indexes the corresponding document to the new index with new field names obtained from the field name dictionary.
After updating the mapping, this can be done by updating the exiting documents using bulk API.
POST /_bulk
{"update":{"_id":"59519","_type":"asset","_index":"assets"}}
{"doc":{"facility_id":491},"detect_noop":false}
Note - Use 'detect_noop' for detecting the noop update.

Dereference relations in MongoEngine embedded documents

I have a schema that using MongoEngine that looks like this
class User(db.Document)
email = db.EmailField(unique=True)
class QueueElement(db.EmbeddedDocument):
accepts = db.ListField(db.ReferenceField('Resource'))
user = db.ReferenceField(User)
class Resource(db.Document):
name = db.StringField(max_length=255, required=True)
current_queue_element = db.EmbeddedDocumentField('QueueElement')
class Queue(db.EmbeddedDocument):
name = db.StringField(max_length=255, required=True)
resources = db.ListField(db.ReferenceField(Resource))
queue_elements = db.ListField(db.EmbeddedDocumentField('QueueElement'))
class Room(db.Document):
name = db.StringField(max_length=255, required=True)
queues = db.ListField(db.EmbeddedDocumentField('Queue'))
and I would like to return a JSON object of a Room object that would include the information about its queues (together with the referenced resources), and the nested queue_elements ( together with their referenced "accepts" references, and user references)
However, when I want to return a Room with its relationships dereferenced:
room = Room.objects(slug=slug).select_related()
if (room):
return ast.literal_eval(room.to_json())
abort(404)
I don't get any dereferencing. I get:
{
"_cls":"Room",
"_id":{
"$oid":"552ab000605cd92f22347d79"
},
"created_at":{
"$date":1428842482049
},
"name":"second",
"queues":[
{
"created_at":{
"$date":1428842781490
},
"name":"myQueue",
"queue_elements":[
{
"accepts":[
{
"$oid":"552aafb3605cd92f22347d78"
},
{
"$oid":"552aafb3605cd92f22347d78"
},
{
"$oid":"552ab1f8605cd92f22347d7a"
}
],
"created_at":{
"$date":1428849389503
},
"user":{
"$oid":"552ac8c7605cd92f22347d7b"
}
}
],
"resources":[
{
"$oid":"552aafb3605cd92f22347d78"
},
{
"$oid":"552aafb3605cd92f22347d78"
},
{
"$oid":"552ab1f8605cd92f22347d7a"
}
]
}
],
"slug":"secondslug"
}
even though I'm using the select_related() function. I believe this is because MongoEngine may not follow references on embedded documents. Note, I can actually dereference in the python if I do something like this:
room = Room.objects(slug=slug).first().queues[0].queue_elements[0].accepts[0]
return ast.literal_eval(room.to_json())
which yields
{
"_id":{
"$oid":"552aafb3605cd92f22347d78"
},
"created_at":{
"$date":1428842849393
},
"name":"myRes"
}
which is clearly the dereferenced Resource document.
Is there a way I can follow references on embedded documents? Or is this coming up because I'm following a bad pattern, and should be finding a different way to store this information in MongoDB (or indeed, switch to a Relational DB) ? Thanks!

Categories

Resources