Datatype conversion using Python Marshmallow

I am trying to use a Marshmallow schema to serialize a Python object. Below is the schema I have defined for my data.
from marshmallow import Schema, fields

class User:
    def __init__(self, name=None, age=None, is_active=None, details=None):
        self.name = name
        self.age = age
        self.is_active = is_active
        self.details = details

class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()
    is_active = fields.Bool()
    details = fields.Dict()
The input is a dictionary and all of its values are strings.
user_data = {"name": "xyz", "age": "20", "is_active": "true", "details": "{'key1':'val1', 'key2':'val2'}"}
When I run the snippet below, the values of age and is_active get converted to their respective datatypes, but details remains unchanged.
user_schema = UserSchema()
user_dump_data = user_schema.dump(user_data)
print(user_dump_data)
Output:
{'name': 'xyz', 'is_active': True, 'details': "{'key1':'val1', 'key2':'val2'}", 'age': 20}
I need to convert the input data into the respective datatypes I defined in my schema. Am I doing something wrong? Can anyone guide me on how to achieve this using Marshmallow?
I am using
python 3.6
marshmallow 3.5.1
Edit
The above input data is fetched from HBase. By default, HBase stores all its values as bytes and returns them as bytes. Below is the format I get from HBase:
{b'name': b'xyz', b'age': b'20', b'is_active': b'true', b'details': b"{'key1':'val1', 'key2':'val2'}"}
I then decode this dictionary and pass it to my UserSchema so it can be serialized for use in a web API.
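For reference, the decoding step could look like this sketch (variable names are illustrative, not from the original post):

raw = {b'name': b'xyz', b'age': b'20', b'is_active': b'true',
       b'details': b"{'key1':'val1', 'key2':'val2'}"}

# decode both keys and values from bytes to str before handing the dict to the schema
user_data = {k.decode('utf-8'): v.decode('utf-8') for k, v in raw.items()}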

You're confusing serializing (dumping) and deserializing (loading).
Dumping goes from object form to JSON-serializable basic Python types (using Schema.dump) or to a JSON string (using Schema.dumps). Loading is the reverse operation.
Typically, your API loads (and validates) data from the outside world and dumps (without validation) your objects to the outside world.
If this is your input data and you want to load it into objects, you need to use load, not dump.
user_data = {"name":"xyz", "age":"20", "is_active": 'true',"details":"{'key1':'val1', 'key2':'val2'}"}
user_loaded_data = user_schema.load(user_data)
user = User(**user_loaded_data)
Except that if you do so, you'll hit another issue: the Dict field expects the data as a dict, not a str. You need to pass
user_data = {"name": "xyz", "age": "20", "is_active": "true", "details": {"key1": "val1", "key2": "val2"}}

As Jérôme mentioned, you're confusing serializing (dumping) with deserializing (loading). For your requirement, you should use Schema.load as suggested.
Since all the input values are expected to be strings, you can use pre_load to register a method that pre-processes the data, as follows:
from marshmallow import Schema, fields, pre_load

class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()
    is_active = fields.Bool()
    details = fields.Dict()

    @pre_load
    def pre_process_details(self, data, **kwargs):
        # note: ast.literal_eval is safer than eval for untrusted input
        data['details'] = eval(data['details'])
        return data

user_data = {"name": "xyz", "age": "20", "is_active": "true", "details": "{'key1':'val1', 'key2':'val2'}"}
user_schema = UserSchema()
user_loaded_data = user_schema.load(user_data)
print(user_loaded_data)
Here, pre_process_details converts the string into a dictionary so that it deserializes correctly.
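A slightly safer variant of the same hook, assuming the stored string is a Python-literal dict, uses ast.literal_eval instead of eval:

import ast
from marshmallow import Schema, fields, pre_load

class UserSchema(Schema):
    name = fields.Str()
    age = fields.Int()
    is_active = fields.Bool()
    details = fields.Dict()

    @pre_load
    def pre_process_details(self, data, **kwargs):
        # literal_eval only evaluates Python literals, so untrusted input
        # cannot execute arbitrary code the way it could with eval
        data['details'] = ast.literal_eval(data['details'])
        return data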

Related

python marshmallow: missing option with a custom function keeps outputting the same value

python version 3.7, marshmallow 3.1.1
import datetime
import time
from marshmallow import Schema, fields

class userSchema(Schema):
    created_datetime = fields.Str(required=False, missing=str(datetime.datetime.now()))
    name = fields.Str()

for i in range(3):
    time.sleep(5)
    user_data = {"name": "test"}
    test_load = userSchema().load(user_data)
    print(test_load)
I found that the loaded data all have the same created_datetime, whereas I expect them to be different.
Is it the case that missing and default can only be a fixed value?
You need to provide a callable that accepts no arguments to generate a dynamic default/missing value. For example, using your code above:
import time
from datetime import datetime, timezone
from marshmallow import Schema, fields

def datetime_str_now():
    """Create a timezone-aware datetime string."""
    return datetime.now(
        tz=timezone.utc  # remove this if you don't need timezone-aware dates
    ).isoformat()

class userSchema(Schema):
    created_datetime = fields.Str(
        required=False,
        missing=datetime_str_now
    )
    name = fields.Str()

for i in range(3):
    time.sleep(5)
    user_data = {"name": "test"}
    test_load = userSchema().load(user_data)
    print(test_load)
""" Outputs:
{'created_datetime': '2020-10-07T09:08:00.847929+00:00', 'name': 'test'}
{'created_datetime': '2020-10-07T09:08:01.851066+00:00', 'name': 'test'}
{'created_datetime': '2020-10-07T09:08:02.854573+00:00', 'name': 'test'}
"""

SQLAlchemy insert from a JSON list to database

I have a list of JSON objects like so:
print(type(listed))  # <class 'list'>
print(listed)
[
    {
        "email": "x@gmail.com",
        "fullname": "xg gf",
        "points": 5,
        "image_url": "https://imgur.com/random.pmg"
    },
    {
        ... similar JSON for the next user, and so on
    }
]
I'm trying to insert them into my postgres database that has a model like this:
class Users(db.Model):
    __tablename__ = 'users'

    email = db.Column(db.String(), primary_key=True)
    displayName = db.Column(db.String())
    image = db.Column(db.String())
    points = db.Column(db.Integer())
But I'm quite stuck; I've tried several approaches but none worked. Can anyone guide me with an example of how to do it properly?
Here's a solution without pandas, using SQLAlchemy Core.
Create the engine:
import sqlalchemy
from sqlalchemy.orm import Session

engine = sqlalchemy.create_engine('...')
Load the metadata using the engine as the bind parameter:
metadata = sqlalchemy.MetaData(bind=engine)
Make a reference to the table:
users_table = sqlalchemy.Table('users', metadata, autoload=True)
You can then start your inserts:
for user in json:
    query = users_table.insert().values(**user)  # insert() returns a statement; chain values() onto it
    my_session = Session(engine)
    my_session.execute(query)
    my_session.commit()  # persist the row
    my_session.close()
This creates a session for every user in json, but I thought you might like it anyway. It's very flexible and works for any table; you don't even need a model. Just make sure the json doesn't contain any keys that don't exist in the db (i.e. the json key and the db column name have to match, e.g. "image_url" in both).
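As a side note, a reasonably recent SQLAlchemy can also insert the whole list in one round trip by passing the list of dicts to execute (an executemany), which avoids opening a session per user; a minimal sketch reusing the engine, users_table, and json from above:

with engine.begin() as conn:
    # passing a list of parameter dicts performs an executemany inside one transaction
    conn.execute(users_table.insert(), json)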
Here is an example json list, like you provided.
json = [
    {
        "email": "x@gmail.com",
        "fullname": "xg gf",
        "points": 5,
        "image_url": "https://imgur.com/random.pmg"
    },
    {
        "email": "onur@gmail.com",
        "fullname": "o g",
        "points": 7,
        "image_url": "https://imgur.com/random_x.pmg"
    }
]
Now create an empty dataframe all_df and iterate over your json list.
Each iteration creates a dataframe from a dictionary in the list, transposes it, and appends it to all_df.
import pandas as pd

all_df = pd.DataFrame()
for i in json:
    df = pd.DataFrame.from_dict(data=i, orient='index').T
    all_df = all_df.append(df)  # on newer pandas, use pd.concat([all_df, df]) instead
Now you can go ahead, create a session to your database, and push all_df:
all_df.to_sql(con=your_session.bind, name='your_table_name', if_exists='your_preferred_method', index=False)
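For example, with a plain engine the last step might look like this (the connection string and table name are illustrative):

import sqlalchemy

engine = sqlalchemy.create_engine('postgresql://user:password@localhost/mydb')

# 'append' keeps existing rows; 'replace' drops and recreates the table
all_df.to_sql(con=engine, name='users', if_exists='append', index=False)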
Using marshmallow-sqlalchemy, you can validate the incoming JSON and create general utilities for loading and dumping data.
Define the schemas first.
schema.py
from marshmallow import EXCLUDE
from marshmallow_sqlalchemy import ModelSchema

from app import db

class UserSchema(ModelSchema):
    class Meta(ModelSchema.Meta):
        model = Users
        sqla_session = db.session

user_schema_full = UserSchema(only=(
    'email',
    'displayName',
    'image',
    'points'
))
utils.py
The exact details below don't matter much; the idea is to create general utilities for going from JSON to ORM objects and from ORM objects to JSON. schema_partial is used for auto-generated primary keys.
from marshmallow import ValidationError

def loadData(data, schema_partial, many=False,
             schema_full=None, instance=None):
    try:
        if instance is not None:
            answer = schema_full.load(data, instance=instance, many=many)
        else:
            answer = schema_partial.load(data, many=many)
    except ValidationError as errors:
        raise InvalidData(errors, status_code=400)
    return answer

def loadUser(data, instance=None, many=False):
    return loadData(data=data,
                    schema_partial=user_schema_full,
                    many=many,
                    schema_full=user_schema_full,
                    instance=instance)

def dumpData(load_object, schema, many=False):
    try:
        answer = schema.dump(load_object, many=many)
    except ValidationError as errors:
        raise InvalidDump(errors, status_code=400)
    return answer

def dumpUser(load_object, many=False):
    return dumpData(load_object, schema=user_schema_full, many=many)
Use loadUser and dumpUser within the API to keep the code clean and flat.
api.py
@app.route('/users/', methods=['POST'])
def post_users():
    """Post many users"""
    users_data = request.get_json()
    users = loadUser(users_data, many=True)

    for user in users:
        db.session.add(user)

    object_dump = dumpUser(users, many=True)
    db.session.commit()
    return jsonify(object_dump), 201
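For completeness, a client call against that endpoint might look like the following sketch (the host, port, and payload values are illustrative, not from the original answer):

import requests

payload = [
    {"email": "a@example.com", "displayName": "Alice",
     "image": "https://example.com/a.png", "points": 3},
    {"email": "b@example.com", "displayName": "Bob",
     "image": "https://example.com/b.png", "points": 7},
]

resp = requests.post("http://localhost:5000/users/", json=payload)
print(resp.status_code, resp.json())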

How to make a Python object JSON-serializable?

I want to serialize a Python object and save it into MySQL (via the Django ORM); later I want to fetch it and pass the object to a function that needs this kind of object as a parameter.
The following two parts are my main logic:
1. Save-param part:
class Param(object):
    def __init__(self, name=None, targeting=None, start_time=None, end_time=None):
        self.name = name
        self.targeting = targeting
        self.start_time = start_time
        self.end_time = end_time
        # ...

param = Param()
param.name = "name1"
param.targeting = "targeting1"

task_param = {
    "task_id": task_id,             # string
    "user_name": user_name,         # string
    "param": param,                 # Param object
    "save_param": save_param_dict,  # dictionary
    "access_token": access_token,   # string
    "account_id": account_id,       # string
    "page_id": page_id,             # string
    "task_name": "sync_create_ad"   # string
}
class SyncTaskList(models.Model):
    task_id = models.CharField(max_length=128, blank=True, null=True)
    ad_name = models.CharField(max_length=128, blank=True, null=True)
    user_name = models.CharField(max_length=128, blank=True, null=True)
    task_status = models.SmallIntegerField(blank=True, null=True)
    task_fail_reason = models.CharField(max_length=255, blank=True, null=True)
    task_name = models.CharField(max_length=128, blank=True, null=True)
    start_time = models.DateTimeField()
    end_time = models.DateTimeField(blank=True, null=True)
    task_param = models.TextField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'sync_task_list'
SyncTaskList(
    task_id=task_id,
    ad_name=param.name,
    user_name=user_name,
    task_status=0,
    task_param=task_param,
).save()
2. Use-param part:
def add_param(param, access_token):
    pass

task_list = SyncTaskList.objects.filter(task_status=0)
for task in task_list:
    task_param = json.loads(task.task_param)
    add_param(task_param["param"], task_param["access_token"])  # pass the param object to add_param
If I directly use the Django ORM to save task_param into MySQL, I get this error:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
because after the ORM operation, I get a string whose property names are enclosed in single quotes, like:
# in mysql it is saved as
task_param: "{'task_id': 'e4b8b240cefaf58fa9fa5a591221c90a',
              'user_name': 'jimmy',
              'param': Param(name='name1',
                             targeting='geo_locations',
                             ),
              'save_param': {}}"
I am now confused about serializing a Python object: how do I load the original object back and pass it to a function?
Any commentary is very welcome. Many thanks.
Update: my solution so far
task_param = {
    # ...
    "param": vars(param),  # turn the Param object into a dictionary
    # ...
}

SyncTaskList(
    # ...
    task_param=json.dumps(task_param),
    # ...
).save()

# task_list = SyncTaskList.objects.filter(task_status=0)
# for task in task_list:
task_param = json.loads(task.task_param)
add_param(Param(**task_param["param"]), task_param["access_token"])
Update based on @AJS's answer: directly pickle.dumps the data and save it in a binary field, then pickle.loads it; this also works.
Is there a better solution for this?
Try looking into msgpack
https://msgpack.org/index.html
Unlike pickle, which is Python-specific, msgpack is supported by many languages (so the language you use to write to MySQL can be different from the language used to read).
There are also some projects out there that integrate these serializer-libraries into Django model fields:
Pickle: https://pypi.org/project/django-picklefield/
MsgPack: https://github.com/vakorol/django-msgpackfield/blob/master/msgpackfield/msgpackfield.py
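A minimal msgpack round trip with the msgpack-python package looks like the sketch below; note that, like JSON, msgpack only handles basic types, so the Param object would still need to be converted to a dict first:

import msgpack

packed = msgpack.packb({"name": "testname", "points": 5})  # bytes, can be stored in a binary column
restored = msgpack.unpackb(packed, raw=False)              # back to a dict with str keys
print(restored)  # {'name': 'testname', 'points': 5}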
You can use pickle. Basically, you serialize your Python object and save it as bytes in your MySQL DB, using BinaryField as the model field type in Django. I don't think JSON serialization would work in your case, since you have a Python object as a value in your dict. When you fetch the data from the DB, simply unpickle it; the syntax is similar to the json library, see below.
import pickle

# to pickle
data = pickle.dumps({'name': 'testname'})

# to unpickle just do
pickle.loads(data)
So in your case, when you unpickle the object you should get your data back in the same form it had before you pickled it.
Hope this helps.
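Tying that back to the model in the question, a hedged sketch (only the relevant field shown; the other fields stay as in the question) could look like:

import pickle
from django.db import models

class SyncTaskList(models.Model):
    # store the pickled payload as raw bytes instead of text
    task_param = models.BinaryField(blank=True, null=True)
    # ... remaining fields as in the question

# saving
SyncTaskList(task_param=pickle.dumps(task_param)).save()

# reading back
task = SyncTaskList.objects.filter(task_status=0).first()
original = pickle.loads(task.task_param)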

'Missing data' when trying to load data with data_key using marshmallow

I am trying to use marshmallow 2.18.0 on Python 3.7 to validate data. I am expecting the JSON {'name': 'foo', 'emailAddress': 'x@x.org'} and load it with this schema:
class FooLoad(Schema):
    name = fields.Str()
    email = fields.Email(data_key='emailAddress', required=True)
I expect that data_key on load will return something like {'name': 'foo', 'email': 'x@x.org'}, but I get an error in the errors field:
schema_load = FooLoad()
after_load = schema_load.load({'name': 'foo', 'emailAddress': 'x@x.org'})
after_load.errors  # returns {'email': ['Missing data for required field.']}
But according to the example in the marshmallow docs and the related GitHub issue, after_load should contain data like {'name': 'foo', 'email': 'x@x.org'}.
I want to deserialize incoming data whose names differ from the schema attribute names (specifying the expected key via data_key), but I get errors when I try it. How can I deserialize input data with names that differ from the schema attributes and are declared in the data_key argument of those fields?
data_key was introduced in marshmallow 3.
See changelog entry:
Backwards-incompatible: Add data_key parameter to fields for specifying the key in the input and output data dict. This parameter replaces both load_from and dump_to (#717).
and associated pull-request.
When using marshmallow 2, you must use load_from/dump_to:
class FooLoad(Schema):
    name = fields.Str()
    email = fields.Email(load_from='emailAddress', dump_to='emailAddress', required=True)
You're using marshmallow 2 but reading the docs for marshmallow 3.
Note that marshmallow 3 contains a bunch of improvements and is in RC state, so if you're starting a project, you could go for marshmallow 3 and save yourself some transition work in the future.
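With the marshmallow 2 schema above, the renamed key is then picked up on load; in marshmallow 2 (non-strict mode) load returns a result with .data and .errors rather than raising:

schema_load = FooLoad()
result = schema_load.load({'name': 'foo', 'emailAddress': 'x@x.org'})
print(result.data)    # {'name': 'foo', 'email': 'x@x.org'}
print(result.errors)  # {}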
I was experiencing the same phenomenon while trying to parse an API response. It turned out I needed to drill one level deeper into the response, earlier than I was doing.
The response was:
{
    "meta": {
        "status": 200,
        "message": null
    },
    "response": {
        "ownerId": "…",
        "otherData": […]
    }
}
Then I was calling:
MySchema().load(response.json())
…
class MySchema(Schema):
    owner_id = fields.String(data_key='ownerId')
    …

    class Meta:
        unknown = INCLUDE

    @post_load
    def load_my_object(self, data, **kwargs):
        inner = data.get('response', data)
        return MyObject(**inner)
But really, it should have been:
inner = data.get('response', data)
return MySchema().load(inner)
…
class MySchema(Schema):
    owner_id = fields.String(data_key='ownerId')
    …

    class Meta:
        unknown = INCLUDE

    @post_load
    def load_my_object(self, data, **kwargs):
        return MyObject(**data)

Peewee - How to Convert a Dict into a Model

Let's say I have
from peewee import Model, CharField

class Foo(Model):
    name = CharField()
I would like to do the following:
f = {'id': 1, 'name': 'bar'}
foo = Foo.create_from_dict(f)
Is this native in Peewee? I was unable to spot anything in the source code.
I've written this function, which works, but I would rather use the native function if it exists:
import sys
from functools import reduce

# clazz is a string with the name of the Model, i.e. 'Foo'
def model_from_dict(clazz, dictionary):
    # convert the string into the actual model class
    clazz = reduce(getattr, clazz.split("."), sys.modules[__name__])
    model = clazz()
    for key in dictionary.keys():
        # set the attributes of the model
        model.__dict__['_data'][key] = dictionary[key]
    return model
I have a web page that displays all the foos and allows the user to edit them. I would like to be able to pass a JSON string to the controller, where I would convert it to a dict and then make Foos out of it, so I can update as necessary.
If you have a dict, you can simply:
class User(Model):
    name = CharField()
    email = CharField()

d = {'name': 'Charlie', 'email': 'foo@bar.com'}
User.create(**d)
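And if the data arrives as a list of dicts (for example parsed from the JSON posted by the page), peewee's insert_many handles the bulk case:

rows = [
    {'name': 'Charlie', 'email': 'charlie@example.com'},
    {'name': 'Dana', 'email': 'dana@example.com'},
]

# the dict keys must match the model's field names
User.insert_many(rows).execute()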
You could use PickledKeyStore, which allows you to save any value as if it were a Python dict; it works like Python's pickle library.
