marshmallow schema validation - python

I have the following Joi schema validation in my Node project, which I am planning to convert into Python using the marshmallow library.
Joi Schema:
aws_access_key: Joi.string().label('AWS ACCESS KEY').required().token().min(20),
aws_secret_key: Joi.string().label('AWS SECRET KEY').required().base64().min(40),
encryption: Joi.string().label('AWS S3 server-side encryption').valid('SSE_S3', 'SSE_KMS', 'CSE_KMS').optional(),
kmsKey: Joi.string().label('AWS S3 server-side encryption KMS key').when('encryption', { is: Joi.valid('SSE_KMS', 'CSE_KMS'), then: Joi.string().required() })
Here is what I have so far using marshmallow in Python:
from marshmallow import Schema, fields
from marshmallow.validate import OneOf, Length
class AWSSchema(Schema):
    aws_access_key = fields.String(required=True, validate=Length(min=20))
    aws_secret_key = fields.String(required=True, validate=Length(min=40))
    encryption = fields.String(required=False, validate=OneOf(['SSE_S3', 'SSE_KMS', 'CSE_KMS']))
    kmskey = fields.String(validate=lambda obj: fields.String(required=True) if obj['encryption'] in ('SSE_KMS', 'CSE_KMS') else fields.String(required=False))
demo = {
"aws_access_key": "AKXXXXXXXXXXXXXXXXXXX",
"aws_secret_key": "YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY",
"encryption_type": "SSE_KMS"
}
schema = AWSSchema()
print(schema.dump(demo))
If the encryption_type value is set to SSE_KMS or CSE_KMS, then the kmskey field should be required. But the validation is not working as expected. Any help is appreciated.

Marshmallow has methods you can override to do top-level validation at various points in the dump or load process. The documentation for pre_dump can be found here. Also check out pre_load and post_dump.
https://marshmallow.readthedocs.io/en/stable/api_reference.html#marshmallow.decorators.pre_dump
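For the conditional kmskey requirement itself, a schema-level validator is one way to express it. Below is a minimal sketch (using marshmallow's @validates_schema decorator rather than the pre/post hooks mentioned above; the error message is illustrative):

from marshmallow import Schema, fields, validates_schema, ValidationError
from marshmallow.validate import OneOf, Length

class AWSSchema(Schema):
    aws_access_key = fields.String(required=True, validate=Length(min=20))
    aws_secret_key = fields.String(required=True, validate=Length(min=40))
    encryption = fields.String(required=False, validate=OneOf(['SSE_S3', 'SSE_KMS', 'CSE_KMS']))
    kmskey = fields.String(required=False)

    @validates_schema
    def require_kms_key(self, data, **kwargs):
        # kmskey becomes mandatory only when a KMS-based encryption mode is chosen
        if data.get('encryption') in ('SSE_KMS', 'CSE_KMS') and not data.get('kmskey'):
            raise ValidationError('kmskey is required when encryption is SSE_KMS or CSE_KMS.', field_name='kmskey')

# Validation runs on load(), not dump()
AWSSchema().load({'aws_access_key': 'A' * 20, 'aws_secret_key': 'B' * 40, 'encryption': 'SSE_KMS'})  # raises ValidationError

Note that marshmallow validates on load(), so the demo dict would go through schema.load(demo) rather than schema.dump(demo), and its key has to match the field name (encryption, not encryption_type).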

Related

The alias field of the Pydantic Model schema is by default in swagger instead of the original field

I need to receive data from an external platform (Cognito) that uses PascalCase, and the Pydantic model supports this through field aliases: by adding alias_generator = to_camel in the model config, every field gets a corresponding PascalCase alias.
In this way, the model:
class AuthenticationResult(BaseModel):
    access_token: str
    expires_in: int
    token_type: str
    refresh_token: str
    id_token: str

    class Config:
        alias_generator = to_camel
        allow_population_by_field_name = True
it can receive the following dictionary without the slightest problem:
data = {
"AccessToken": "myToken",
"ExpiresIn": 0,
"TokenType": "string",
"RefreshToken": "string",
"IdToken": "string"
}
auth_data = AuthenticationResult(**data)
print(auth_data.access_token)
# Output: myToken
However, in the application's Swagger documentation the schema also appears in PascalCase, even though it must be returned in snake_case format, which is strange, since by_alias is False by default.
I need it to be in snake_case format to send it to the client. How can I do this so that the model still accepts being built from a PascalCase dictionary?
Might be easiest to create a sub model inheriting from your main model to set the alias generator on, then use that model for validation and the first one to generate the schema.
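A minimal sketch of that suggestion (pydantic v1 style, matching the question; to_pascal is an illustrative stand-in for the question's to_camel helper, so verify the behaviour against your pydantic version):

from pydantic import BaseModel

def to_pascal(field_name: str) -> str:
    # illustrative helper: access_token -> AccessToken
    return "".join(part.capitalize() for part in field_name.split("_"))

class AuthenticationResult(BaseModel):
    # plain snake_case model: use this one to generate the response schema
    access_token: str
    expires_in: int
    token_type: str
    refresh_token: str
    id_token: str

class CognitoAuthenticationResult(AuthenticationResult):
    # sub model used only for parsing Cognito's PascalCase payload
    class Config:
        alias_generator = to_pascal
        allow_population_by_field_name = True

data = {
    "AccessToken": "myToken",
    "ExpiresIn": 0,
    "TokenType": "string",
    "RefreshToken": "string",
    "IdToken": "string",
}
auth_data = CognitoAuthenticationResult(**data)
print(auth_data.access_token)  # myToken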

python, mongo and marshmallow: datetime struggles

I'm trying to do something pretty simple: get the current time, validate my object with marshmallow, store it in mongo
python 3.7
requirements:
datetime==4.3
marshmallow==3.5.1
pymongo==3.10.1
schema.py
from marshmallow import Schema, fields
...
class MySchema(Schema):
    user_id = fields.Str(required=True)
    user_name = fields.Str()
    date = fields.DateTime()
    account_type = fields.Str()
    object = fields.Raw()
preparedata.py
from datetime import datetime
from schema import MySchema
...
dt = datetime.now()
x = dt.isoformat()
data = {
"user_id": '123123123',
"user_name": 'my cool name',
"date": x,
"account_type": 'another sting',
"trade": {'some':'dict'}
}
# validate the schema for storage
validator = MySchema().load(data)
if 'errors' in validator:
    log.info('validator.errors')
    log.info(validator.errors)
...
res = MyService().create(
data
)
myservice.py
def create(self, data):
    log.info("in creating data service")
    log.info(data)
    self.repo.create(data)
    return MySchema().dump(data)
The connector to mongo is fine; I'm saving other data that has no datetime with no issue.
I seem to have gone through a hundred different variations of formatting the datetime before passing it to the date key, as well as specifying the 'format' option in the schema field both inline and in the meta class, example:
#class Meta:
# datetimeformat = '%Y-%m-%dT%H:%M:%S+03:00'
Most variations I try result in:
{'date': ['Not a valid datetime.']}
I've finally managed to get validation to pass going in by simply using
x = dt.isoformat()
and leaving the field schema as default ( date = fields.DateTime() )
but when I dump back through marshmallow I get
AttributeError: 'str' object has no attribute 'isoformat'
The record is created in mongo DB fine, but the field type is string; ideally I'd like to leverage the native mongo date field.
If I try and pass
datetime.now()
to the date, it fails with
{'date': ['Not a valid datetime.']}
same for
datetime.utcnow()
Any guidance really appreciated.
Edit: when bypassing marshmallow, and using either
datetime.now(pytz.utc)
or
datetime.utcnow()
the field data is stored in mongo as a date, as expected, so I think the issue can be stated more succinctly as: how can I have marshmallow fields.DateTime() validate either of these formats?
Edit 2:
So we have already begun refactoring thanks to Jérôme's insightful answer below.
For anyone who wants to 'twist' marshmallow into behaving as the original question stated, we ended up going with:
date = fields.DateTime(
    # dump_only=True,
    default=lambda: datetime.utcnow(),
    missing=lambda: datetime.utcnow(),
    allow_none=False
)
i.e. skip passing date at all and have marshmallow generate it via missing, which satisfied our use case.
The point of marshmallow is to load data from serialized (say, JSON, isoformat string, etc.) into actual Python objects (int, datetime,...). And conversely to dump it from object to a serialized string.
Marshmallow also provides validation on load, and only on load. When dumping, the data comes from the application and shouldn't need validation.
It is useful in an API to load and validate data from the outside world before using it in an application. And to serialize it back to the outside world.
If your data is in serialized form, which is the case when you call isoformat() on your datetime, then marshmallow can load it, and you get a Python object, with a real datetime in it. This is what you should feed pymongo.
# load/validate the schema for storage
try:
    loaded_data = MySchema().load(data)
except ValidationError as exc:
    log.info('validator.errors')
    log.info(exc.messages)
    ...

# Store object in database
res = MyService().create(loaded_data)
Since marshmallow 3, load always returns deserialized content and you need to try/catch validation errors.
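To make that concrete, here is a small illustrative sketch (the db handle and collection name are assumptions, not from the question): after a successful load, the value under "date" is a real datetime, so pymongo stores it as a native BSON date rather than a string.

from datetime import datetime

loaded_data = MySchema().load(data)            # "date" is now a datetime object
assert isinstance(loaded_data["date"], datetime)

# assuming `db` is a pymongo Database, e.g. MongoClient()["mydb"]
db.my_collection.insert_one(loaded_data)       # stored as a BSON Date, not a string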
If your data does not come to your application in serialized form (if it is already in object form), then maybe marshmallow is not the right tool for the job, because it does not perform validation on deserialized objects (see https://github.com/marshmallow-code/marshmallow/issues/1415).
Or maybe it is. You could use an Object-Document Mapper (ODM) to manage the validation and database management. This is an extra layer on top of pymongo. umongo is a marshmallow-based MongoDB ODM. There are other ODMs out there: mongoengine, pymodm.
BTW, what is this
datetime==4.3
Did you install DateTime? You don't need this.
Disclaimer: marshmallow and umongo maintainer speaking.

falcon-autocrud: how to handle unique rows?

I want to create a simple app with Falcon that is able to handle a small sqlite database with hostname: ip records. I want to be able to replace rows in sqlite, so I decided that hostname is a unique field. I have a model.py:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine, Column, Integer, String
Base = declarative_base()
DB_URI = 'sqlite:///clients.db'
class Client(Base):
    __tablename__ = 'clients'
    id = Column(Integer, primary_key=True)
    hostname = Column(String(50), unique=True)
    ip = Column(String(50))
My simple resources.py:
from falcon_autocrud.resource import CollectionResource, SingleResource
from models import *
class ClientCollectionResource(CollectionResource):
    model = Client
    methods = ['GET', 'POST']
When I make a POST request with updated information about a hostname:ip pair, I get a Unique constraint violated error:
req = requests.post('http://localhost:8000/clients',
headers={'Content-Type': 'application/json'},
data=json.dumps({'hostname': 'laptop1', 'ip': '192.168.0.33'}));
req.content
>> b'{"title": "Conflict", "description": "Unique constraint violated"}'
Is there any way to replace existing records using sqlalchemy? Or was I wrong to choose sqlite for these purposes?
When building a RESTful API you should not use POST to update existing resources; POST to a resource should only ever create new resources. falcon-autocrud is doing the right thing here.
Instead, use PUT on the individual resource (the SingleResource resource registered for .../clients/<identifier>) to alter existing resources.
If you use hostname in your SingleResource definition then falcon-autocrud should automatically use that column as the identifier (assuming that your SingleResource subclass is called ClientResource):
app.add_route('/clients/{hostname}', ClientResource(db_engine))
at which point you can PUT the new ip value directly with:
requests.put('http://localhost:8000/clients/laptop1', json={'ip': '192.168.0.33'})
(Note that requests supports JSON requests directly; the json= keyword argument is encoded to JSON for you, and the Content-Type header is set for you automatically when you use it).
You may want to limit what fields are returned for your Client objects. With a unique hostname you wouldn't want to confuse clients by also sending the primary key column. I'd limit the response fields by setting the response_fields attribute on your resource classes:
class ClientCollectionResource(CollectionResource):
    model = Client
    response_fields = ['hostname', 'ip']
    methods = ['GET', 'POST']

class ClientResource(SingleResource):
    model = Client
    response_fields = ['hostname', 'ip']
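For completeness, here is a rough sketch of how the pieces might be wired together. The Middleware import and the resources taking the engine in their constructor follow my reading of the falcon-autocrud README, so verify against your installed version; ClientResource is the SingleResource subclass shown above.

import falcon
from sqlalchemy import create_engine
from falcon_autocrud.middleware import Middleware

from models import Base, DB_URI
from resources import ClientCollectionResource, ClientResource

db_engine = create_engine(DB_URI)
Base.metadata.create_all(db_engine)  # create the clients table if it does not exist

app = falcon.API(middleware=[Middleware()])

# POST/GET on the collection, PUT on a single client addressed by hostname
app.add_route('/clients', ClientCollectionResource(db_engine))
app.add_route('/clients/{hostname}', ClientResource(db_engine))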
I see that falcon-autocrud doesn't yet support PATCH requests on the collection that alter existing resources (only "op": "add" is supported), otherwise that'd be another route to alter existing entries too.

Combine Flask-Marshmallow with marshmallow-jsonapi

Overview
I am using Flask-SqlAlchemy and now I am looking into marshmallow to help me serialize and deserialize request data.
I was able to successfully:
Create my models using Flask-SqlAlchemy
Use Flask-Marshmallow to serialize database objects using the same model, by using the Optional Flask-SqlAlchemy Integration
Use marshmallow-jsonapi to quickly generate JSON API compliant responses. This required me to declare new Schemas to specify which attributes I want to include (duplicating what is already declared on the Flask-SqlAlchemy models).
Code Samples
Flask-SqlAlchemy Declarative Model
class Space(db.Model):
    __tablename__ = 'spaces'
    id = sql.Column(sql.Integer, primary_key=True)
    name = sql.Column(sql.String)
    version = sql.Column(sql.String)
    active = sql.Column(sql.Boolean)
flask_marshmallow Schema Declaration (Inherits from SqlAlchemy Model)
ma = flask_marshmallow.Marshmallow(app)
class SpaceSchema(ma.ModelSchema):
    class Meta:
        model = Space
# API Response
space = Space.query.first()
return SpaceSchema().dump(space).data
# Returns:
{
'id': 123,
'version': '0.1.0',
'name': 'SpaceName',
'active': True
}
marshmallow-jsonapi: requires a new Schema declaration; each attribute and type must be listed manually
class SpaceJsonSchema(marshmallow_json.Schema):
    id = fields.Str(dump_only=True)
    name = fields.Str()
    version = fields.Str()
    active = fields.Bool()

    class Meta:
        type_ = 'spaces'
        self_url = '/spaces/{id}'
        self_url_kwargs = {'id': '<id>'}
        self_url_many = '/spaces/'
        strict = True
# Returns Json API Compliant
{
    'data': {
        'id': '1',
        'type': 'spaces',
        'attributes': {
            'name': 'Phonebooth',
            'active': True,
            'version': '0.1.0'
        },
        'links': {'self': '/spaces/1'}
    },
    'links': {'self': '/spaces/1'}
}
Issue
As shown in the code, marshmallow-jsonapi allows me to create JSON API compliant responses, but I end up having to maintain both the declarative model and a separate response schema.
flask-marshmallow allows me to create Schema responses from the SqlAlchemy models, so I don't have to maintain a separate set of properties for each model.
Question
Is it at all possible to use flask-marshmallow and marshmallow-jsonapi together, so that I can (1) create a Marshmallow Schema from a SqlAlchemy model, and (2) automatically generate JSON API responses?
I tried creating a Schema declaration that inherits from both ma.ModelSchema and marshmallow_json.Schema, in both orders, but it does not work (it raises exceptions for missing methods and properties).
marshmallow-jsonapi
marshmallow-jsonapi provides a simple way to produce JSON
API-compliant data in any Python web framework.
flask-marshmallow
Flask-Marshmallow includes useful extras for integrating with
Flask-SQLAlchemy and marshmallow-sqlalchemy.
Not a solution to this exact problem, but I ran into similar issues when implementing this library: https://github.com/thomaxxl/safrs (sqlalchemy + flask-restful + jsonapi compliant spec).
I don't remember exactly how I got around it, but if you try it and serialization doesn't work, I can help you resolve it if you open an issue on GitHub.

Query google datastore by key in gcloud api

I'm trying to query for some data using the gcloud api that I just discovered. I'd like to query for a KeyProperty, e.g.:
from google.appengine.ext import ndb
class User(ndb.Model):
    email = ndb.StringProperty()

class Data(ndb.Model):
    user = ndb.KeyProperty('User')
    data = ndb.JsonProperty()
In GAE, I can query this pretty easily assuming I have a user's key:
user = User.query(User.email == 'me@domain.com').get()
data_records = Data.query(Data.user == user.key).fetch()
I'd like to do something similar using gcloud:
from gcloud import datastore
client = datastore.Client(project='my-project-id')
user_qry = client.query(kind='User')
user_qry.add_filter('email', '=', 'me@domain.com')
users = list(user_qry.fetch())
user = users[0]
data_qry = client.query(kind='Data')
data_qry.add_filter('user', '=', user.key) # This doesn't work ...
results = list(data_qry.fetch()) # results = []
Looking at the documentation for add_filter, it doesn't appear that Entity.key is a supported type:
value (int, str, bool, float, NoneType, :class:`datetime.datetime`) – The value to filter on.
Is it possible to add filters for key properties?
I've done a bit more sleuthing to try to figure out what is really going on here. I'm not sure that this is helpful for me to understand this issue at the present, but maybe it'll be helpful for someone else.
I've mocked out the underlying calls in the respective libraries to record the protocol buffers that are being serialized and sent to the server. For GAE, it appears to be Batch.create_async in the datastore_query module.
For gcloud, it is the datastore.Client.connection.run_query method. Looking at the resulting protocol buffers (anonymized), I see:
gcloud query pb.
kind {
  name: "Data"
}
filter {
  composite_filter {
    operator: AND
    filter {
      property_filter {
        property {
          name: "user"
        }
        operator: EQUAL
        value {
          key_value {
            partition_id {
              dataset_id: "s~app-id"
            }
            path_element {
              kind: "User"
              name: "user_string_id"
            }
          }
        }
      }
    }
  }
}
GAE query pb.
kind: "Data"
Filter {
  op: 5
  property <
    name: "User"
    value <
      ReferenceValue {
        app: "s~app-id"
        PathElement {
          type: "User"
          name: "user_string_id"
        }
      }
    >
    multiple: false
  >
}
The two libraries are using different versions of the proto as far as I can tell, but the data being passed looks very similar...
This is a subtle bug with your use of the ndb library:
All ndb properties accept a single positional argument that specifies the property's name in Datastore
Looking at your model definition, you'll see user = ndb.KeyProperty('User'). This isn't actually saying that the user property is a key of a User entity, but that it should be stored in Datastore with the property name User. You can verify this in your gae protocol buffer query where the property name is (case sensitive) User.
If you want to limit the key to a single kind, you need to specify it using the kind option.
user = ndb.KeyProperty(kind="User")
The KeyProperty also supports:
user = ndb.KeyProperty(User) # User is a class here, not a string
Here is a description of all the magic.
As it is now, your gcloud query is filtering on the wrong-cased property name and should be:
data_qry = client.query(kind='Data')
data_qry.add_filter('User', '=', user.key)
