I'm trying to use marshmallow 2.18.0 on Python 3.7 to validate data. I expect JSON like {'name': 'foo', 'emailAddress': 'x@x.org'} and load it with this schema:
class FooLoad(Schema):
    name = fields.Str()
    email = fields.Email(data_key='emailAddress', required=True)
I expect that data_key will make load return something like {'name': 'foo', 'email': 'x@x.org'}, but instead I get an error in the errors field:
schema_load = FooLoad()
after_load = schema_load.load({'name': 'foo', 'emailAddress': 'x@x.org'})
after_load.errors # return {'email': ['Missing data for required field.']}
But according to the data_key example in the marshmallow docs (the one using devDependencies) and a related GitHub issue, after_load should contain data like {'name': 'foo', 'email': 'x@x.org'}.
I want to deserialize incoming data whose key names differ from the schema attribute names (specifying the expected key via data_key), but I get errors when I try. How can I deserialize input data whose keys differ from the schema attribute names and are declared in the data_key argument of those attributes?
data_key was introduced in marshmallow 3.
See changelog entry:
Backwards-incompatible: Add data_key parameter to fields for specifying the key in the input and output data dict. This parameter replaces both load_from and dump_to (#717).
and associated pull-request.
When using marshmallow 2, you must use load_from/dump_to:
class FooLoad(Schema):
    name = fields.Str()
    email = fields.Email(load_from='emailAddress', dump_to='emailAddress', required=True)
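With load_from in place, the same payload deserializes as expected. A minimal sketch (note that in marshmallow 2, load returns an UnmarshalResult with data and errors attributes):

from marshmallow import Schema, fields

class FooLoad(Schema):
    name = fields.Str()
    email = fields.Email(load_from='emailAddress', dump_to='emailAddress', required=True)

result = FooLoad().load({'name': 'foo', 'emailAddress': 'x@x.org'})
print(result.data)    # {'name': 'foo', 'email': 'x@x.org'}
print(result.errors)  # {}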
You're using marshmallow 2 but reading the docs for marshmallow 3.
Note that marshmallow 3 contains a bunch of improvements and is in RC state, so if you're starting a project, you could go for marshmallow 3 and save yourself some transition work in the future.
I was experiencing the same phenomenon while trying to parse an API response. It turned out, though, that I needed to drill one level deeper into the response, earlier than I was doing.
The response was:
{
    "meta": {
        "status": 200,
        "message": null
    },
    "response": {
        "ownerId": "…",
        "otherData": […]
    }
}
Then I was calling:
MySchema().load(response.json())
…
class MySchema(Schema):
    owner_id = fields.String(data_key='ownerId')
    …

    class Meta:
        unknown = INCLUDE

    @post_load
    def load_my_object(self, data, **kwargs):
        inner = data.get('response', data)
        return MyObject(**inner)
But really, it should have been:
inner = data.get('response', data)
return MySchema().load(inner)
…

class MySchema(Schema):
    owner_id = fields.String(data_key='ownerId')
    …

    class Meta:
        unknown = INCLUDE

    @post_load
    def load_my_object(self, data, **kwargs):
        return MyObject(**data)
I have a model that uses django hashid fields for the id.
class Artwork(Model):
    id = HashidAutoField(primary_key=True, min_length=8, alphabet="0123456789abcdefghijklmnopqrstuvwxyz")
    title = ....
This is the related item of another model
class ArtworkFeedItem(FeedItem):
    artwork = models.OneToOneField('artwork.Artwork', related_name='artwork_feeditem', on_delete=models.CASCADE)
Now I'm trying to set up [django elasticsearch dsl](https://github.com/django-es/django-elasticsearch-dsl) and to that end have the Document
@registry.register_document
class ArtworkFeedItemDocument(Document):
    class Index:
        name = 'feed'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0
        }

    artwork = fields.ObjectField(
        properties={
            'id': fields.TextField(),
            'title': fields.TextField(
                attr='title',
                fields={
                    'suggest': fields.Completion(),
                }
            )
        }
    )

    class Django:
        model = ArtworkFeedItem
        fields = []
        related_models = [Artwork]
However, when I try to rebuild indices with python manage.py search_index --rebuild I get the following exception
elasticsearch.exceptions.SerializationError: ({'index': {'_id': Hashid(135): l2vylzm9, '_index': 'feed'}}, TypeError("Unable to serialize Hashid(135): l2vylzm9 (type: <class 'hashid_field.hashid.Hashid'>)",))
Django elasticsearch dsl clearly does not know what to do with such a hashid field.
I thought maybe I could just make my own HashIdField like
from elasticsearch_dsl import Field

class HashIdField(Field):
    """
    Custom DSL field to support HashIds
    """
    name = "hashid"

    def _serialize(self, data):
        return data.hashid
then use it in 'id': HashIdField, but this gave me yet another exception
elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'No handler for type [hashid] declared on field [id]')
Does anyone know how I could get this to work?
For anyone interested, I managed to solve this by overriding the generate_id method of the Document so that the _id used is just a plain string:
@classmethod
def generate_id(cls, object_instance):
    return object_instance.id.hashid
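In context, the override sits directly on the Document; a sketch assuming the same ArtworkFeedItemDocument as above:

@registry.register_document
class ArtworkFeedItemDocument(Document):
    # ... Index, artwork field and Django class as shown above ...

    @classmethod
    def generate_id(cls, object_instance):
        # use the plain string form of the hashid as the Elasticsearch _id
        return object_instance.id.hashid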
python version 3.7, marshmallow 3.1.1
class userSchema(Schema):
    created_datetime = fields.Str(required=False, missing=str(datetime.datetime.now()))
    name = fields.Str()

for i in range(3):
    time.sleep(5)
    user_data = {"name": "test"}
    test_load = userSchema().load(user_data)
    print(test_load)
I found that the loaded data all have the same created_datetime, whereas I expect them to be different.
Is it the case that missing and default can only be fixed values?
You need to provide a callable that accepts no arguments to generate a dynamic default/missing value. For example, using your code above:
import time
from datetime import datetime, timezone

from marshmallow import Schema, fields

def datetime_str_now():
    """Create a timezone-aware datetime string."""
    return datetime.now(
        tz=timezone.utc  # remove this if you don't need timezone-aware dates
    ).isoformat()

class userSchema(Schema):
    created_datetime = fields.Str(
        required=False,
        missing=datetime_str_now
    )
    name = fields.Str()
for i in range(3):
    time.sleep(5)
    user_data = {"name": "test"}
    test_load = userSchema().load(user_data)
    print(test_load)
""" Outputs:
{'created_datetime': '2020-10-07T09:08:00.847929+00:00', 'name': 'test'}
{'created_datetime': '2020-10-07T09:08:01.851066+00:00', 'name': 'test'}
{'created_datetime': '2020-10-07T09:08:02.854573+00:00', 'name': 'test'}
"""
I am trying to use a Marshmallow schema to serialize a Python object. Below is the schema I have defined for my data.
from marshmallow import Schema, fields

class User:
    def __init__(self, name=None, age=None, is_active=None, details=None):
        self.name = name
        self.age = age
        self.is_active = is_active
        self.details = details

class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()
    is_active = fields.Bool()
    details = fields.Dict()
The input will be in dictionary format and all the values will be strings.
user_data = {"name":"xyz", "age":"20", "is_active": 'true',"details":"{'key1':'val1', 'key2':'val2'}"}
When I run the snippet below, the values of age and is_active get converted to their respective datatypes, but details remains unchanged.
user_schema = UserSchema()
user_dump_data = user_schema.dump(user_data)
print(user_dump_data)
Output:
{'name': 'xyz', 'is_active': True, 'details': "{'key1':'val1', 'key2':'val2'}", 'age': 20}
I need to serialize the input data into the respective datatypes I defined in my schema. Is there anything I am doing wrong? Can anyone guide me on how to achieve this using Marshmallow?
I am using
python 3.6
marshmallow 3.5.1
Edit
The above input data is fetched from HBase. By default, HBase stores and returns all its values as bytes. Below is the format I get from HBase:
{b'name': b'xyz', b'age': b'20', b'is_active': b'true', b'details': b"{'key1':'val1', 'key2':'val2'}"}
Then I decode this dictionary and pass it to my UserSchema to serialize it for use in a web API.
You're confusing serializing (dumping) and deserializing (loading).
Dumping is going from object form to json-serializable basic python types (using Schema.dump) or json string (using Schema.dumps). Loading is the reverse operation.
Typically, your API loads (and validates) data from the outside world and dumps (without validation) your objects to the outside world.
If your input data is this data and you want to load it into objects, you need to use load, not dump.
user_data = {"name":"xyz", "age":"20", "is_active": 'true',"details":"{'key1':'val1', 'key2':'val2'}"}
user_loaded_data = user_schema.load(user_data)
user = User(**user_loaded_data)
Except if you do so, you'll hit another issue: fields.Dict expects the data as a dict, not a str. You need to pass
user_data = {"name":"xyz", "age":"20", "is_active": 'true',"details": {'key1':'val1', 'key2':'val2'}}
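Putting it together, a minimal sketch of loading the corrected payload, reusing the User and UserSchema definitions from the question:

user_data = {"name": "xyz", "age": "20", "is_active": "true", "details": {"key1": "val1", "key2": "val2"}}
user_loaded_data = UserSchema().load(user_data)
# {'name': 'xyz', 'age': 20, 'is_active': True, 'details': {'key1': 'val1', 'key2': 'val2'}}
user = User(**user_loaded_data)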
As Jérôme mentioned, you're confusing serializing (dumping) with deserializing (loading). As per your requirement, you should use Schema.load as suggested.
Since all the input values are expected to be strings, you can use pre_load to register a method for pre-processing the data, as follows:
from ast import literal_eval

from marshmallow import Schema, fields, pre_load

class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()
    is_active = fields.Bool()
    details = fields.Dict()

    @pre_load
    def pre_process_details(self, data, **kwargs):
        # literal_eval safely parses the dict literal, unlike plain eval
        data['details'] = literal_eval(data['details'])
        return data

user_data = {"name": "xyz", "age": "20", "is_active": "true", "details": "{'key1':'val1', 'key2':'val2'}"}
user_schema = UserSchema()
user_loaded_data = user_schema.load(user_data)
print(user_loaded_data)
Here, pre_process_details converts the details string into a dictionary (via ast.literal_eval, which is safer than eval on untrusted input) so it deserializes correctly.
I have a list of JSON objects like so:
print(type(listed))  # <class 'list'>
print(listed)
[
    {
        "email": "x@gmail.com",
        "fullname": "xg gf",
        "points": 5,
        "image_url": "https://imgur.com/random.pmg"
    },
    {
        ... similar json for the next user and so on
    }
]
I'm trying to insert them into my postgres database that has a model like this:
class Users(db.Model):
    __tablename__ = 'users'

    email = db.Column(db.String(), primary_key=True)
    displayName = db.Column(db.String())
    image = db.Column(db.String())
    points = db.Column(db.Integer())
But I'm quite stuck; I've tried several approaches but none worked. Can anyone guide me with an example of how to do this properly?
Here's a solution without pandas, using SQLAlchemy Core
Create the engine:

import sqlalchemy
from sqlalchemy.orm import Session

engine = sqlalchemy.create_engine('...')

Load the metadata, using the engine as the bind parameter:

metadata = sqlalchemy.MetaData(bind=engine)

Make a reference to the table:

users_table = sqlalchemy.Table('users', metadata, autoload=True)

You can then start your inserts:

for user in json:
    query = users_table.insert().values(**user)
    my_session = Session(engine)
    my_session.execute(query)
    my_session.close()
This creates a session for every user in json, but I thought you might like it anyway. It's very flexible and works for any table; you don't even need a model. Just make sure the json doesn't contain any keys that don't exist as columns in the db (meaning the json key and the db column name must match exactly, e.g. image_url in both).
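As a variant, SQLAlchemy can also do all the inserts in a single call and a single transaction. A sketch, assuming the same reflected users_table and that every key in json matches a column:

with engine.begin() as conn:
    # executemany-style bulk insert: one statement, a list of parameter dicts
    conn.execute(users_table.insert(), json)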
Here is an example json list, like you provided.
json = [
    {
        "email": "x@gmail.com",
        "fullname": "xg gf",
        "points": 5,
        "image_url": "https://imgur.com/random.pmg"
    },
    {
        "email": "onur@gmail.com",
        "fullname": "o g",
        "points": 7,
        "image_url": "https://imgur.com/random_x.pmg"
    }
]
Now create an empty dataframe all_df and iterate over your json list. Each iteration creates a dataframe from a dictionary in the list, transposes it, and appends it to all_df.
import pandas as pd

all_df = pd.DataFrame()
for i in json:
    df = pd.DataFrame.from_dict(data=i, orient='index').T
    all_df = all_df.append(df)  # on pandas >= 2.0, use: all_df = pd.concat([all_df, df])
Now you can go ahead and create a session to your database and push all_df:
all_df.to_sql(con=your_session.bind, name='your_table_name', if_exists='your_preferred_method', index=False)
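For example, with a SQLAlchemy engine (a sketch; the connection string, table name and if_exists policy are placeholders to adapt):

from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost/mydb')  # placeholder DSN
all_df.to_sql(con=engine, name='users', if_exists='append', index=False)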
Using marshmallow-sqlalchemy, you can validate the incoming JSON and create general utilities for loading and dumping data. First, define the schemas.
schema.py
from marshmallow_sqlalchemy import ModelSchema

from app import db
from app.models import Users  # assumption: import Users from wherever your model is defined

class UserSchema(ModelSchema):
    class Meta(ModelSchema.Meta):
        model = Users
        sqla_session = db.session

user_schema_full = UserSchema(only=(
    'email',
    'displayName',
    'image',
    'points'
))
utils.py
The exact details below don't matter, but the idea is to create general utilities for going from JSON to ORM objects and from ORM objects to JSON. schema_partial is used for auto-generated primary keys.
from marshmallow import ValidationError

# user_schema_full is imported from schema.py;
# InvalidData and InvalidDump are custom API exceptions defined elsewhere in the app

def loadData(data, schema_partial, many=False,
             schema_full=None, instance=None):
    try:
        if instance is not None:
            answer = schema_full.load(data, instance=instance, many=many)
        else:
            answer = schema_partial.load(data, many=many)
    except ValidationError as errors:
        raise InvalidData(errors, status_code=400)
    return answer

def loadUser(data, instance=None, many=False):
    return loadData(data=data,
                    schema_partial=user_schema_full,
                    many=many,
                    schema_full=user_schema_full,
                    instance=instance)

def dumpData(load_object, schema, many=False):
    try:
        answer = schema.dump(load_object, many=many)
    except ValidationError as errors:
        raise InvalidDump(errors, status_code=400)
    return answer

def dumpUser(load_object, many=False):
    return dumpData(load_object, schema=user_schema_full, many=many)
Use loadUser and dumpUser within the API to produce clean, flat code.
api.py
from flask import jsonify, request

@app.route('/users/', methods=['POST'])
def post_users():
    """Post many users"""
    users_data = request.get_json()
    users = loadUser(users_data, many=True)
    for user in users:
        db.session.add(user)
    object_dump = dumpUser(users, many=True)
    db.session.commit()
    return jsonify(object_dump), 201
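A quick way to exercise the endpoint; a sketch using Flask's test client and the column names from the Users model:

with app.test_client() as client:
    resp = client.post('/users/', json=[
        {'email': 'x@gmail.com', 'displayName': 'xg gf',
         'image': 'https://imgur.com/random.pmg', 'points': 5},
    ])
    print(resp.status_code)  # expect 201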
I am trying to integrate my Django project with the Mailchimp API. To add users to a list I need to generate JSON in the following format:
{
    "email_address": "EMAIL",
    "status": "subscribed",
    "merge_fields": {
        "FNAME": "FIRST_NAME",
        "LNAME": "SURNAME"
    }
}
Sadly I am having some struggles with the nested merge_fields. I expected the following to work:
class MergeSerializer(serializers.Serializer):
    FNAME = serializers.SerializerMethodField('get_first_name')
    LNAME = serializers.SerializerMethodField('get_surname')

    def get_first_name(self, obj):
        return obj.first_name

    def get_surname(self, obj):
        return obj.surname

class CreateContactSerializer(serializers.Serializer):
    email_address = serializers.EmailField()
    status = serializers.SerializerMethodField('get_alternative_status')
    merge_fields = MergeSerializer(read_only=True)

    def get_alternative_status(self, obj):
        return "subscribed"
This only generates JSON with the email_address and the status, and completely ignores the merge_fields. After hours of trying I have absolutely no clue what to try next. Does anybody know how to solve this problem?
Since I thought the documentation for the marshmallow framework was a bit clearer, I also tried it with their package; however, this returned exactly the same result (ignoring my merge_fields):
class MergeFieldsSchema(Schema):
    FNAME = fields.String(attribute="first_name")
    LNAME = fields.String(attribute="surname")

class CreateContactSerializer(Schema):
    merge_fields = fields.Nested(MergeFieldsSchema)
    email_address = fields.String()
    status = fields.Constant("subscribed")
You don't say this, but I am assuming that surname and first_name are also part of the same object as email_address on your model, which is why the nested serializer does not work (as nested serializers are for foreign keys). If this is not the case, please add the model to the OP.
Because you just want to customize the output, you can use a SerializerMethodField on your main CreateContactSerializer:
class CreateContactSerializer(serializers.Serializer):
    email_address = serializers.EmailField()
    status = serializers.SerializerMethodField('get_alternative_status')
    merge_fields = serializers.SerializerMethodField('get_merge_fields')

    def get_alternative_status(self, obj):
        return "subscribed"

    def get_merge_fields(self, obj):
        return {
            "FNAME": obj.first_name,
            "LNAME": obj.surname
        }
If you want, you could even reuse the serializer you already wrote and do:
def get_merge_fields(self, obj):
    serializer = MergeSerializer(obj)
    return serializer.data
Don't forget to add merge_fields to your fields
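With that in place, serializing a contact should produce the Mailchimp shape. A minimal sketch (Contact is a hypothetical model exposing email_address, first_name and surname attributes):

contact = Contact(email_address='x@gmail.com', first_name='Jane', surname='Doe')  # hypothetical model instance
serializer = CreateContactSerializer(contact)
print(serializer.data)
# {'email_address': 'x@gmail.com', 'status': 'subscribed',
#  'merge_fields': {'FNAME': 'Jane', 'LNAME': 'Doe'}}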