python, mongo and marshmallow: datetime struggles

I'm trying to do something pretty simple: get the current time, validate my object with marshmallow, store it in mongo
python 3.7
requirements:
datetime==4.3
marshmallow==3.5.1
pymongo==3.10.1
schema.py
from marshmallow import Schema, fields
...
class MySchema(Schema):
    user_id = fields.Str(required=True)
    user_name = fields.Str()
    date = fields.DateTime()
    account_type = fields.Str()
    object = fields.Raw()
preparedata.py
from datetime import datetime
from schema import MySchema
...
dt = datetime.now()
x = dt.isoformat()
data = {
    "user_id": '123123123',
    "user_name": 'my cool name',
    "date": x,
    "account_type": 'another sting',
    "trade": {'some':'dict'}
}
# validate the schema for storage
validator = MySchema().load(data)
if 'errors' in validator:
    log.info('validator.errors')
    log.info(validator.errors)
...
res = MyService().create(
    data
)
myservice.py
def create(self, data):
    log.info("in creating data service")
    log.info(data)
    self.repo.create(data)
    return MySchema().dump(data)
The connector to mongo is fine; I am saving other data that has no datetime with no issue.
I seem to have gone through a hundred different variations of formatting the datetime before passing it to the date key, as well as specifying the 'format' option in the schema field both inline and in the meta class, example:
#class Meta:
# datetimeformat = '%Y-%m-%dT%H:%M:%S+03:00'
Most variations I try result in:
{'date': ['Not a valid datetime.']}
I've finally managed to pass validation on the way in by simply using
x = dt.isoformat()
and leaving the field schema as default ( date = fields.DateTime() )
but when i dump back through marshmallow i get
AttributeError: 'str' object has no attribute 'isoformat'
The record is created in MongoDB fine, but the field type is string; ideally I'd like to use the native mongo date field.
if i try and pass
datetime.now()
to the date, it fails with
{'date': ['Not a valid datetime.']}
same for
datetime.utcnow()
Any guidance really appreciated.
Edit: when bypassing marshmallow, and using either
datetime.now(pytz.utc)
or
datetime.utcnow()
the field is stored in mongo as a date, as expected. So I think the issue can be stated more succinctly: how can I have marshmallow fields.DateTime() validate either of these formats?
Edit 2:
so we have already begun refactoring thanks to Jérôme's insightful answer below.
for anyone who wants to 'twist' marshmallow to behave like the original question stated, we ended up going with:
date = fields.DateTime(
    #dump_only=True,
    default=lambda: datetime.utcnow(),
    missing=lambda: datetime.utcnow(),
    allow_none=False
)
i.e. skip passing date at all, have marshmallow generate it from missing, which was satisfying our use case.

The point of marshmallow is to load data from serialized (say, JSON, isoformat string, etc.) into actual Python objects (int, datetime,...). And conversely to dump it from object to a serialized string.
Marshmallow also provides validation on load, and only on load. When dumping, the data comes from the application and shouldn't need validation.
It is useful in an API to load and validate data from the outside world before using it in an application. And to serialize it back to the outside world.
If your data is in serialized form, which is the case when you call isoformat() on your datetime, then marshmallow can load it, and you get a Python object, with a real datetime in it. This is what you should feed pymongo.
from marshmallow import ValidationError
# load/validate the schema for storage
try:
    loaded_data = MySchema().load(data)
except ValidationError as exc:
    log.info('validator.errors')
    # marshmallow 3 exposes the error dict as exc.messages
    log.info(exc.messages)
    ...
# Store object in database
res = MyService().create(loaded_data)
Since marshmallow 3, load always returns deserialized content and you need to try/catch validation errors.
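To make the load/dump asymmetry concrete, here is a minimal sketch that uses only the standard library. The functions load_date and dump_date are hypothetical stand-ins for what a DateTime field does in each direction; they are not marshmallow's actual implementation, and the sample values are made up.

```python
from datetime import datetime


def load_date(value):
    """Deserialize: ISO 8601 string -> datetime (stand-in for Schema.load)."""
    if not isinstance(value, str):
        # A datetime object is already deserialized, so loading rejects it.
        raise ValueError("Not a valid datetime.")
    return datetime.fromisoformat(value)


def dump_date(value):
    """Serialize: datetime -> ISO 8601 string (stand-in for Schema.dump)."""
    return value.isoformat()


raw = {"user_id": "123123123", "date": "2020-04-01T12:30:00"}

# Load first, then hand the loaded dict (with a real datetime) to the database.
loaded = dict(raw, date=load_date(raw["date"]))

# Dumping the loaded value works; dumping the raw string raises AttributeError.
serialized = dump_date(loaded["date"])
try:
    dump_date(raw["date"])
    dump_error = None
except AttributeError as exc:
    dump_error = str(exc)
```

Under these assumptions, the string goes in through load, the datetime goes to the database, and only the datetime can be dumped back out. That matches the two errors reported in the question.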
If your data does not come to your application in serialized form (if it is in object form already), then maybe marshmallow is not the right tool for the job, because it does not perform validation on deserialized objects (see https://github.com/marshmallow-code/marshmallow/issues/1415).
Or maybe it is. You could use an Object-Document Mapper (ODM) to manage the validation and database management. This is an extra layer over pymongo. umongo is a marshmallow-based MongoDB ODM. There are other ODMs out there: mongoengine, pymodm.
BTW, what is this
datetime==4.3
Did you install DateTime? You don't need this.
Disclaimer: marshmallow and umongo maintainer speaking.

Related

How Do I Make a Field Either fields.Dict or Specify a Schema?

I have a field on a schema that should be a specific schema based on a value in that field if it exists. To elaborate, the field should be fields.Dict() if it doesn't contain a version property. Otherwise, the schema should be retrieved from a map
def pricing_schema_serialization(base_object, parent_obj):
    # Use dictionary access to support using this function to look up schema for both serialization directions,
    # and test data
    if type(base_object) is not dict:
        object_dict = base_object.__dict__
    else:
        object_dict = base_object
    pricing_version = object_dict.get("version", None)
    if pricing_version in version_to_schema:
        return version_to_schema[pricing_version]()  # Works fine
    return fields.Dict(missing=dict, default=dict)  # Produces error
class Product:
    # pricing = fields.Dict(missing=dict, default=dict) # This worked
    pricing = PolyField(
        serialization_schema_selector=pricing_schema_serialization,
        deserialization_schema_selector=pricing_schema_serialization,
        required=False,
        missing=dict,
        default=dict,
    )
This is the error produced:
{'pricing': ["Unable to use schema. Ensure there is a deserialization_schema_selector and that it returns a schema when the function is passed in {}. This is the class I got. Make sure it is a schema: <class 'marshmallow.fields.Dict'>"]}
I saw that marshmallow 3.x has a from_dict() method. Unfortunately, we're stuck on 2.x.
Things I tried:
The code above
Creating my own from_dict class method following the same implementation here. Same result

Store Subtitles in a Database

I'm working on a project that uses AI to recognise the speech of an audio file. The output of this AI is a huge JSON object with tons of values. I'll remove some keys, and the final structure will look as follows.
{
    text: "<recognised text>",
    language: "<detected language>",
    segments: [
        {startTimestamp: "00:00:00", endTimestamp: "00:00:10", text: "<some text>"},
        {startTimestamp: "00:00:10", endTimestamp: "00:00:17", text: "<some text>"},
        {startTimestamp: "00:00:17", endTimestamp: "00:00:26", text: "<some text>"},
        { ... },
        { ... }
    ]
}
Now, I wish to store this new trimmed object in a SQL database because I wish to be able to edit it manually. I'll create a React application to edit segments, delete segments, etc. Additionally, I want to add this feature to the React application, where the information will be saved every 5 seconds using an AJAX call.
Now, I don't understand how I should store this object in the SQL database. Initially, I thought I would store the whole object as a string in a database. Whenever some change is made to the object, I'll send a JSON object from the React application, the backend will sanitize it and then replace the old stringified object in the database with the new sanitised string object. This way updating and deletion will happen with ease but there can be issues in case of searching. But I'm wondering if there are any better approaches to do this.
Could someone guide me on this?
Tech Stack
Frontend - React
Backend - Django 3.2.15
Database - PostgreSQL
Thank you

Now, I don't understand how I should store this object in the SQL database. Initially, I thought I would store the whole object as a string in a database.
If the data has a clear structure, you should not store it as a JSON blob in a relational database. While relational databases have some support for JSON nowadays, it is still not very effective, and normally it means you can not effectively filter, aggregate, and manipulate data, nor can you check referential integrity.
You can work with two models that look like:
from django.db import models
from django.db.models import F, Q
class Subtitle(models.Model):
    text = models.CharField(max_length=128)
    language = models.CharField(max_length=128)
class Segment(models.Model):
    startTimestamp = models.DurationField()
    endTimestamp = models.DurationField()
    subtitle = models.ForeignKey(
        Subtitle, on_delete=models.CASCADE, related_name='segments'
    )
    text = models.CharField(max_length=512)
    class Meta:
        ordering = ('subtitle', 'startTimestamp', 'endTimestamp')
        constraints = [
            # the start of a segment must be strictly before its end
            models.CheckConstraint(
                check=Q(startTimestamp__lt=F('endTimestamp')),
                name='start_before_end',
            )
        ]
This also guarantees, for example, that startTimestamp is before endTimestamp, and that these fields store durations (and not "foo").
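As a rough illustration of what the duration fields and the check constraint enforce, the sketch below parses the HH:MM:SS strings from the question into timedelta values and applies the same start-before-end rule in plain Python. The helpers parse_timestamp and validate_segment are hypothetical and not part of Django; in the real application the database constraint does this job.

```python
from datetime import timedelta


def parse_timestamp(value):
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def validate_segment(segment):
    """Return (start, end) as timedeltas, or raise ValueError if start >= end."""
    start = parse_timestamp(segment["startTimestamp"])
    end = parse_timestamp(segment["endTimestamp"])
    if not start < end:
        raise ValueError("startTimestamp must be before endTimestamp")
    return start, end


start, end = validate_segment(
    {"startTimestamp": "00:00:10", "endTimestamp": "00:00:17", "text": "<some text>"}
)
```

A value such as "foo" fails in parse_timestamp with a ValueError, and a segment whose start is not before its end is rejected by validate_segment.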
You can convert from and to JSON with serializers [drf-doc]:
from rest_framework import serializers
class SegmentSerializer(serializers.ModelSerializer):
    class Meta:
        model = Segment
        fields = ['startTimestamp', 'endTimestamp', 'text']
class SubtitleSerializer(serializers.ModelSerializer):
    segments = SegmentSerializer(many=True)
    class Meta:
        model = Subtitle
        fields = ['text', 'language', 'segments']

Getting 'str' object has no attribute 'isoformat' for peewee DateTimeField

I am using the peewee ORM to read data from a MySQL database. My DB model class is as below
import peewee
import datetime
from collections import OrderedDict
...............
class User(peewee.Model):
    ...............
    created_by = CharField(null=True)
    update_by = CharField(null=True)
    updated_date = DateTimeField(default=datetime.datetime.now)
    .................
    def __to_dict__(self):
        user_dict = OrderedDict([
            .................
            ('created_by', self.created_by),
            ('update_by', self.update_by),
            ('updated_date', self.updated_date.isoformat())
        ])
        .............
I am fetching data through the ORM with the following code
users= User.select().distinct()
return [user.__to_dict__() for user in users]
I am getting the following error for some of the rows that have an updated_date of '0000-00-00 00:00:00'
user = user.__to_dict__()
File "/opt/appserver/app1/app/models/user.py", line 172, in __to_dict__
('updated_date', self.updated_date.isoformat())
AttributeError: 'str' object has no attribute 'isoformat'
Why am I getting this error?
PS: the question "AttributeError: 'str' object has no attribute 'isoformat'" does not answer my question

Probably the database you are using contains datetimes that are not parse-able or otherwise cannot be handled correctly when reading the data off the cursor. Peewee will automatically try to convert string datetime values into the appropriate python datetime instance, but if you have garbage or weirdly-formatted data in the table it will not work.
This is typically only a problem for sqlite databases, as the other dbs will enforce a valid datetime being stored in a datetime-affinity column.
You can try to work around it by extending the supported formats of the relevant fields. e.g., DateTimeField has a list of formats it will automatically parse (field.formats). You can extend this to include whatever format you are using.
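The idea of trying a list of formats can be sketched with the standard library alone. This is not peewee's code: parse_datetime, FORMATS and safe_isoformat are hypothetical names, and treating an unparseable value such as MySQL's zero date as None is an assumption about what the application wants.

```python
from datetime import datetime

# Formats to try in order; extend this list with whatever the table contains.
FORMATS = ["%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def parse_datetime(value):
    """Return a datetime for a parseable value, or None if no format matches."""
    if isinstance(value, datetime):
        return value
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            continue
    # '0000-00-00 00:00:00' ends up here: it is not a valid datetime.
    return None


def safe_isoformat(value):
    """Serialize a column value without assuming it is already a datetime."""
    parsed = parse_datetime(value)
    return parsed.isoformat() if parsed is not None else None
```

A method like the question's __to_dict__ could call such a helper instead of calling isoformat() directly on the attribute, so rows with unparseable dates serialize as None rather than raising.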

Flask Sqlalchemy add multiple row

I am using flask-restful. This is the resource class I want to insert with:
class OrderHistoryResource(Resource):
    model = OrderHistoryModel
    schema = OrderHistorySchema
    order = OrderModel
    product = ProductModel
    def post(self):
        value = req.get_json()
        data = cls.schema(many=True).load(value)
        data.insert()
In my model
def insert(self):
    db.session.add(self)
    db.session.commit()
schema
from config.ma import ma
from model.orderhistory import OrderHistoryModel
class OrderHistorySchema(ma.ModelSchema):
    class Meta:
        model = OrderHistoryModel
        include_fk = True
Example Data I want to insert
[
    {
        "quantity":99,
        "flaskSaleStatus":true,
        "orderId":"ORDER_64a79028d1704406b6bb83b84ad8c02a_1568776516",
        "proId":"PROD_9_1568779885_64a79028d1704406b6bb83b84ad8c02a"
    },
    {
        "quantity":89,
        "flaskSaleStatus":true,
        "orderId":"ORDER_64a79028d1704406b6bb83b84ad8c02a_1568776516",
        "proId":"PROD_9_1568779885_64a79028d1704406b6bb83b84ad8c02a"
    }
]
this is what i got after insert method has started
TypeError: insert() takes exactly 2 arguments (0 given)
or there is another way to do this action?

Edited - released marshmallow-sqlalchemy loads directly to instance
You need to loop through the OrderModel instances in your list.
You can then use add_all to add the OrderModel objects to the session, then bulk update - see the docs
Should be something like:
db.session.add_all(data)
db.session.commit()
See this post for brief discussion on why add_all is best when you have complex ORM relationships.
Also - not sure you need to have all your models/schemas as class variables, it's fine to have them imported (or just present in the same file, as long as they're declared before the resource class).

You are calling insert on a list, because data is a list of OrderHistoryModel instances.
Also, the post method doesn't need to be a classmethod, and you probably had an error there as well.
Since data is a list of model instances, you can use the db.session.add_all method to add them to the session in bulk.
def post(self):
    value = req.get_json()
    data = self.schema(many=True).load(value)
    db.session.add_all(data)
    db.session.commit()
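The TypeError in the question is consistent with this explanation: if load with many=True returns a plain Python list, then data.insert() resolves to the built-in list.insert, which needs an index and a value. The small sketch below reproduces that with a hypothetical FakeOrderHistory class standing in for the model; no database or Flask is involved.

```python
class FakeOrderHistory:
    """Hypothetical stand-in for OrderHistoryModel; no database involved."""

    def __init__(self, quantity):
        self.quantity = quantity

    def insert(self):
        return "inserted %d" % self.quantity


# With many=True the loaded result is a list of instances, not one instance.
data = [FakeOrderHistory(99), FakeOrderHistory(89)]

try:
    data.insert()  # this is list.insert(index, value), not the model method
    error = None
except TypeError as exc:
    error = exc

# Calling the model method means going through the items one by one.
results = [item.insert() for item in data]
```

Looping works, but with a real session each insert would commit separately, which is one reason to prefer a single add_all followed by one commit.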

Is it possible to add fields not present in the structure on the fly?

I was trying out mongokit and I'm having a problem. I thought it would be possible to add fields not present in the schema on the fly, but apparently I can't save the document.
The code is as below:
from mongokit import *
connection = Connection()
#connection.register
class Test(Document):
    structure = {'title': unicode, 'body': unicode}
On the python shell:
test = connection.testdb.testcol.Test()
test['foo'] = u'bar'
test['title'] = u'my title'
test['body'] = u'my body'
test.save()
This gives me a
StructureError: unknown fields ['foo'] in Test
I have an application where, while I have a core of fields that are always present, I can't predict beforehand what new fields will be necessary. Basically, in this case, it's up to the client to insert whatever fields it finds necessary. I'll just receive whatever it sends, do my thing, and store it in mongodb.
But there is still a core of fields that are common to all documents, so it would be nice to type and validate them.
Is there a way to solve this with mongokit?

According to the MongoKit structure documentation you can have optional fields if you use the Schemaless Structure feature.
As of version 0.7, MongoKit allows you to save partially structured documents.
So if you set up your class like this, it should work:
from mongokit import *
class Test(Document):
    use_schemaless = True
    structure = {'title': unicode, 'body': unicode}
    required_fields = [ 'title', 'body' ]
That will require title and body but should allow any other fields to be present. According to the docs:
MongoKit will raise an exception only if required fields are missing
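The behaviour described here, a typed and required core plus arbitrary extra fields, can be illustrated without MongoKit. The sketch below is plain Python 3 (so str replaces unicode), and validate_partial is a hypothetical helper, not MongoKit's API or implementation.

```python
STRUCTURE = {"title": str, "body": str}
REQUIRED_FIELDS = ["title", "body"]


def validate_partial(doc, structure=STRUCTURE, required=REQUIRED_FIELDS):
    """Check required and typed core fields; ignore fields outside the structure."""
    missing = [name for name in required if name not in doc]
    if missing:
        raise ValueError("missing required fields: %s" % missing)
    for name, expected in structure.items():
        if name in doc and not isinstance(doc[name], expected):
            raise TypeError("%s must be %s" % (name, expected.__name__))
    return doc


# 'foo' is not in the structure but is accepted alongside the core fields.
doc = validate_partial({"title": "my title", "body": "my body", "foo": "bar"})
```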
