Serializing SQLAlchemy models for a REST API while respecting access control? - python

Like most web frameworks, ours serializes a model through a method call that dumps it into some format. In our case, every model has a to_dict() method that constructs and returns a dictionary whose keys are the field names and whose values are the corresponding instance values.
Throughout our code we have snippets like json.dumps(some_model_object.to_dict()), which serialize some_model_object to JSON. Recently, we've decided to expose some internal resources to our users, but some of these resources have private instance values that we do not want to send back during serialization unless the requesting user is a superuser.
I'm trying to come up with a clean design that makes serialization easier and also lets us serialize to formats other than JSON. I think this is a pretty good use case for Aspect Oriented Design/Programming, where the aspects respect the requesting access controls and serialize the object based on the requesting user's permissions.
Here's something similar to what I have now:
from framework import current_request
class User(SQLAlchemyDeclarativeModel):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode(255))
    last_name = Column(Unicode(255))
    private_token = Column(Unicode(4096))
    def to_dict(self):
        serialized = dict((column_name, getattr(self, column_name))
                          for column_name in self.__table__.c.keys())
        # current request might not be bound yet, could be in a unit test etc.
        if current_request and not current_request.user.is_superuser():
            # we explicitly define the allowed items because if we accidentally add
            # a private variable to the User table, then it might be exposed.
            allowed = ['id', 'first_name', 'last_name']
            serialized = dict((k, v) for k, v in serialized.iteritems() if k in allowed)
        return serialized
As one can see, this is less than ideal because now I have to couple the database model with the current request. While this is very explicit, the request coupling is a code smell and I'm trying to see how to do this cleanly.
One way I've thought about doing it is to register some fields on the model like so:
class User(SQLAlchemyDeclarativeModel):
    __tablename__ = 'users'
    __public__ = ['id', 'first_name', 'last_name']
    # User is not defined yet inside its own class body, so refer to the
    # class-level name directly.
    __internal__ = __public__ + ['private_token']
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode(255))
    last_name = Column(Unicode(255))
    private_token = Column(Unicode(4096))
Then, I would have a serializer class that is bound with the current request on every WSGI call that will take the desired serializer. For example:
import simplejson
from framework import JSONSerializer # json serialization strategy
from framework import serializer
# assume response format was requested as json
serializer.register_serializer(JSONSerializer(simplejson.dumps))
serializer.bind(current_request)
Then in my view somewhere, I would just do:
from framework import Response
user = session.query(User).first()
return Response(serializer.serialize(user), code=200)
serialize would be implemented as follows:
def serialize(self, db_model_obj):
    attributes = '__public__'
    if self.current_request.user.is_superuser():
        # matches the __internal__ list declared on the model
        attributes = '__internal__'
    payload = dict((c, getattr(db_model_obj, c))
                   for c in getattr(db_model_obj, attributes))
    return self.serialization_strategy.execute(payload)
Thoughts on this approach's readability and clarity? Is this a pythonic approach to the problem?
Thanks in advance.

Establish the "serialization" contract via a mixin:
class Serializer(object):
    __public__ = None
    "Must be implemented by implementors"
    __internal__ = None
    "Must be implemented by implementors"
    def to_serializable_dict(self):
        # do stuff with __public__, __internal__
        # ...
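One way the elided body of to_serializable_dict might be filled in, as a minimal sketch. The include_internal flag and the plain User class are assumptions made for illustration and are not part of the answer; the idea is that the caller, not the model, decides whether the requester is privileged, which keeps the request out of the model layer.

```python
class Serializer(object):
    """Mixin: subclasses declare which attribute names may be exposed."""
    __public__ = None    # names any requester may see
    __internal__ = None  # names only privileged requesters may see

    def to_serializable_dict(self, include_internal=False):
        # include_internal is a hypothetical flag; a view would set it from
        # something like request.user.is_superuser().
        names = self.__internal__ if include_internal else self.__public__
        if names is None:
            raise NotImplementedError("declare __public__ and __internal__")
        return dict((name, getattr(self, name)) for name in names)


class User(Serializer):
    """Plain-Python stand-in for the SQLAlchemy model, for illustration only."""
    __public__ = ['id', 'first_name', 'last_name']
    __internal__ = __public__ + ['private_token']

    def __init__(self, id, first_name, last_name, private_token):
        self.id = id
        self.first_name = first_name
        self.last_name = last_name
        self.private_token = private_token


user = User(1, 'Ada', 'Lovelace', 'secret')
public_view = user.to_serializable_dict()
internal_view = user.to_serializable_dict(include_internal=True)
```

Because the whitelist is explicit, a column added to the table later stays hidden until someone lists it in __public__ or __internal__.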
Keep the WSGI integration simple. A "register" call, JSONSerializer as an object, and all that is a Java/Spring-style pattern; you don't need that fanfare. Below is my Pylons 1.0-style solution (I'm not on Pyramid yet):
def my_controller(self):
    # ...
    return to_response(request, response, myobject)
# elsewhere
def to_response(req, resp, obj):
    # this would be more robust, look in
    # req, resp, catch key errors, whatever.
    # xxx_serialize are just functions. don't need state
    serializer = {
        'application/json': json_serialize,
        'application/xml': xml_serialize,
        # ...
    }[req.headers['content-type']]
    return serializer(obj)
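The "more robust" lookup hinted at in the comments could look roughly like the sketch below. It assumes the headers arrive as a plain dict with a lower-case 'content-type' key (real frameworks usually provide a case-insensitive mapping), and pick_serializer, SERIALIZERS and the two serialize functions are illustrative names rather than anything from Pylons.

```python
import json
from xml.etree import ElementTree


def json_serialize(payload):
    return json.dumps(payload, sort_keys=True)


def xml_serialize(payload, root_tag='object'):
    # Flat dicts only; nested structures would need a recursive walk.
    root = ElementTree.Element(root_tag)
    for key, value in sorted(payload.items()):
        child = ElementTree.SubElement(root, key)
        child.text = str(value)
    return ElementTree.tostring(root, encoding='unicode')


SERIALIZERS = {
    'application/json': json_serialize,
    'application/xml': xml_serialize,
}


def pick_serializer(headers, default='application/json'):
    # Drop parameters such as "; charset=utf-8" and normalise case.
    raw = headers.get('content-type', default)
    media_type = raw.split(';', 1)[0].strip().lower()
    # Unknown media types fall back to the default instead of raising KeyError.
    return SERIALIZERS.get(media_type, SERIALIZERS[default])


serializer = pick_serializer({'content-type': 'application/xml'})
body = serializer({'id': 1, 'first_name': 'Ada'})
```

The serializers stay plain functions, as the answer suggests; only the lookup gains a fallback.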

Flask Mongoengine ValidationError Field is required on .save() but fields already exist in db

Problem: I get a ValidationError when trying to perform a .save() when appending a value to an EmbeddedDocumentListField because I am missing required fields that already exist on the document.
Note that at this point the User document has already been created as part of the signup process, so it already has an email and password in the DB.
My classes:
class User(gj.Document):
    email = db.EmailField(required=True, unique=True)
    password = db.StringField(required=True)
    long_list_of_thing_1s = db.EmbeddedDocumentListField("Thing1")
    long_list_of_thing_2s = db.EmbeddedDocumentListField("Thing2")
class Thing1(gj.EmbeddedDocument):
    some_string = db.StringField()
class Thing2(gj.EmbeddedDocument):
    some_string = db.StringField()
Trying to append a new EmbeddedDocument to the EmbeddedDocumentListField in my User class in the Thing2 Resource endpoint:
class Thing2(Resource):
    def post(self):
        try:
            body = request.get_json()
            user_id = body["user_id"]
            user = UserModel.objects.only("long_list_of_thing_2s").get(id=user_id)
            some_string = body["some_string"]
            new_thing_2 = Thing2Model()
            new_thing_2.some_string = some_string
            user.long_list_of_thing_2s.append(new_thing_2)
            user.save()
            return 201
        except Exception as exception:
            raise InternalServerError
On hitting this endpoint I get the following error on the user.save()
mongoengine.errors.ValidationError: ValidationError (User:603e39e7097f3e9a6829f422) (Field is required: ['email', 'password'])
I think this is because of the .only("long_list_of_thing_2s").
But I am specifically using UserModel.objects.only("long_list_of_thing_2s") because I don't want to load the entire UserModel into memory when I only want to append something to long_list_of_thing_2s.
Is there a different way I should be going about this? I am relatively new to Flask and Mongoengine so I am not sure what all the best practices are when going about this process.
You are correct: this is due to the .only() call, and it is a known "bug" in MongoEngine. Because the document was loaded without email and password, validation on .save() treats them as missing.
Unless your model is really large, using .only() will not make a big difference, so I'd recommend using it only if you observe performance issues.
If you do have to keep the .only() for whatever reason, you should be able to use the push atomic operator instead of .save(), along the lines of UserModel.objects(id=user_id).update_one(push__long_list_of_thing_2s=new_thing_2). An advantage of the push operator is that under race conditions (concurrent requests) it deals gracefully with the different updates. This is not the case with a regular .save(), which will overwrite the list.

How do I build a Django model that retrieves some fields from an API?

Question
How can I build a Model that stores one field in the database, and then retrieves other fields from an API behind the scenes when necessary?
Details:
I'm trying to build a Model called Interviewer that stores an ID in the database, and then retrieves name from an external API. I want to avoid storing a copy of name in my app's database. I also want the fields to be retrieved in bulk rather than per model instance because these will be displayed in a paginated list.
My first attempt was to create a custom Model Manager called InterviewerManager that overrides get_queryset() in order to set name on the results like so:
class InterviewerManager(models.Manager):
    def get_queryset(self):
        query_set = super().get_queryset()
        for result in query_set:
            result.name = 'Mary'
        return query_set
class Interviewer(models.Model):
    # ID provided by API, stored in database
    id = models.IntegerField(primary_key=True, null=False)
    # Fields provided by API, not in database
    name = 'UNSET'
    # Custom model manager
    interviewers = InterviewerManager()
However, it seems like the hardcoded value of Mary is only present if the QuerySet is not chained with subsequent calls. I'm not sure why. For example, in the django shell:
>>> list(Interviewer.interviewers.all())[0].name
'Mary' # Good :)
>>> Interviewer.interviewers.all().filter(id=1).first().name
'UNSET' # Bad :(
My current workaround is to build a cache layer inside InterviewerManager that the model accesses like so:
class InterviewerManager(models.Manager):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.api_cache = {}
    def get_queryset(self):
        query_set = super().get_queryset()
        for result in query_set:
            # Mock querying a remote API
            self.api_cache[result.id] = {
                'name': 'Mary',
            }
        return query_set
class Interviewer(models.Model):
    # ID provided by API, stored in database
    id = models.IntegerField(primary_key=True, null=False)
    # Custom model
    interviewers = InterviewerManager()
    # Fields provided by API, not in database
    @property
    def name(self):
        return Interviewer.interviewers.api_cache[self.id]['name']
However this doesn't feel like idiomatic Django. Is there a better solution for this situation?
Thanks
Why not just make the API call in the name property?
@property
def name(self):
    name = get_name_from_api(self.id)
    return name
If the API cannot be asked for several names in a single GET request, the easy way is to make the calls in a loop.
I would recommend building a so-called proxy: load the data into a dataframe/dict, save that data (for example with pickle), and use it when necessary. This reduces load times and is reasonably efficient.
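Since the question asks for bulk retrieval for a paginated list, one alternative is to materialise the page first and then make a single API call for all of its ids. The sketch below uses a plain Interviewer class rather than a Django model, and fetch_names_bulk is a hypothetical stand-in for the external API. As far as I understand Django's behaviour, chaining .filter() builds a fresh QuerySet that queries the database again and creates new instances, which would explain why attributes set inside get_queryset() disappear.

```python
class Interviewer(object):
    """Plain stand-in for the Django model; only the id is stored locally."""
    def __init__(self, id):
        self.id = id
        self.name = None  # filled in from the API, never persisted


def fetch_names_bulk(ids):
    """Hypothetical API client: one round trip for many ids."""
    return dict((i, 'name-%d' % i) for i in ids)


def attach_names(interviewers, fetch=fetch_names_bulk):
    # Materialise first: with a Django QuerySet this runs the query once, and
    # the objects annotated here are the same ones handed to the template.
    page = list(interviewers)
    names = fetch([obj.id for obj in page])
    for obj in page:
        obj.name = names.get(obj.id)
    return page


page = attach_names([Interviewer(1), Interviewer(2)])
```

In a view this would be applied to the final, already filtered and paginated result, so no later chaining can discard the attached names.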

Flask Sqlalchemy add multiple row

I am using flask-restful. This is the resource class I want to insert with:
class OrderHistoryResource(Resource):
    model = OrderHistoryModel
    schema = OrderHistorySchema
    order = OrderModel
    product = ProductModel
    def post(self):
        value = req.get_json()
        data = cls.schema(many=True).load(value)
        data.insert()
In my model:
def insert(self):
    db.session.add(self)
    db.session.commit()
Schema:
from config.ma import ma
from model.orderhistory import OrderHistoryModel
class OrderHistorySchema(ma.ModelSchema):
    class Meta:
        model = OrderHistoryModel
        include_fk = True
Example Data I want to insert
[
{
"quantity":99,
"flaskSaleStatus":true,
"orderId":"ORDER_64a79028d1704406b6bb83b84ad8c02a_1568776516",
"proId":"PROD_9_1568779885_64a79028d1704406b6bb83b84ad8c02a"
},
{
"quantity":89,
"flaskSaleStatus":true,
"orderId":"ORDER_64a79028d1704406b6bb83b84ad8c02a_1568776516",
"proId":"PROD_9_1568779885_64a79028d1704406b6bb83b84ad8c02a"
}
]
This is what I got once the insert method was called:
TypeError: insert() takes exactly 2 arguments (0 given)
Or is there another way to do this?
Edited - released marshmallow-sqlalchemy loads directly to instance
You need to loop through the OrderModel instances in your list.
You can then use add_all to add the OrderModel objects to the session, then bulk update - see the docs
Should be something like:
db.session.add_all(data)
db.session.commit()
See this post for brief discussion on why add_all is best when you have complex ORM relationships.
Also - not sure you need to have all your models/schemas as class variables, it's fine to have them imported (or just present in the same file, as long as they're declared before the resource class).
You are calling insert on a list, because data is a list of OrderHistoryModel instances. data.insert() therefore resolves to the built-in list.insert, which expects an index and an object, hence the TypeError.
Also, the post method doesn't need to be a classmethod, and you probably had an error there as well.
Since data is a list of model instances, you can use the db.session.add_all method to add them to the session in bulk.
def post(self):
    value = req.get_json()
    data = self.schema(many=True).load(value)
    db.session.add_all(data)
    db.session.commit()
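The error message is easier to read once you see which insert is actually being called. Below is a minimal sketch without a database: FakeModel and the saved list are stand-ins invented for illustration (saved plays the role of the session), and data mimics roughly what a many=True load hands back.

```python
saved = []  # stands in for the database session in this sketch


class FakeModel(object):
    """Hypothetical stand-in for OrderHistoryModel."""
    def __init__(self, quantity):
        self.quantity = quantity

    def insert(self):
        saved.append(self)


# Roughly what schema(many=True).load(...) returns: a list of instances.
data = [FakeModel(99), FakeModel(89)]

error = None
try:
    data.insert()  # this is list.insert(index, object), not FakeModel.insert
except TypeError as exc:
    error = str(exc)

# Calling the model method on each instance works; add_all covers the
# whole list in one call instead.
for row in data:
    row.insert()
```

The exact wording of the TypeError differs between Python versions, but in each case it comes from the list type rather than from the model.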

HTTP requests for nested objects

Is it possible to use the library 'requests' (HTTP library for python) to post and update nested objects in django rest framework?
I made a new create method in serializers, but I can't post outside the shell, nor with the requests library or in the api webview.
My Serializers:
class QualityParameterSerializer(serializers.ModelSerializer):
    class Meta:
        model = QualityParameter
        fields = ("id", "name", "value")
class ProductQualityMonitorSerializer(serializers.ModelSerializer):
    parameters = QualityParameterSerializer(many=True)
    class Meta:
        model = ProductQualityMonitor
        fields = ("id", "product_name", "area", "timeslot", "processing_line",
                  "updated_on", "parameters")
    def create(self, validated_data):
        params_data = validated_data.pop('parameters')
        product = ProductQualityMonitor.objects.create(**validated_data)
        for param_data in params_data:
            QualityParameter.objects.create(product=product, **param_data)
        return product
POST HTTP request
If I may suggest the following form for your serializer:
from django.db import transaction
class ProductQualityMonitorSerializer(serializers.ModelSerializer):
    parameters = QualityParameterSerializer(many=True)
    class Meta:
        model = ProductQualityMonitor
        fields = (
            "id",
            "updated_on",
            "product_name",
            "area",
            "timeslot",
            "processing_line",
            "parameters",
        )
    def create(self, validated_data):
        # use a transaction so that if one of the parameter objects isn't valid,
        # the creation of the parent ProductQualityMonitor is rolled back too,
        # leaving no dangling objects in the database
        params_data = validated_data.pop('parameters')
        with transaction.atomic():
            product = ProductQualityMonitor.objects.create(**validated_data)
            # you can create the objects in a batch, hitting the DB only once
            params = [QualityParameter(product=product, **param) for param in params_data]
            QualityParameter.objects.bulk_create(params)
        return product
About using the Python requests library: you will have to pay attention to the following aspects when posting to a Django back-end:
- if you use session authentication, you must provide a valid CSRF token in your request; by default Django hands it out in the csrftoken cookie and expects it back in the X-CSRFToken header;
- you must provide the proper authentication headers / tokens / cookies; this is your choice and depends on how you're implementing authentication on the DRF back-end;
- if this is a request from one domain to another, then you have to take care of the CORS setup.
More to the point: what have you tried already that didn't work?
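For the request itself, nested objects are easiest to send as a JSON body, since a list of objects is awkward to express as form-encoded data. The sketch below builds such a request with the standard library so that it runs offline; the URL and all field values are made up for illustration. With the requests library the equivalent call would be requests.post(URL, json=payload), which serializes the payload and sets the Content-Type header for you.

```python
import json
import urllib.request

# Hypothetical endpoint; replace it with the route your DRF router exposes.
URL = 'http://localhost:8000/api/product-quality-monitors/'

# Illustrative values only; the keys follow the serializer's fields.
payload = {
    'product_name': 'Widget',
    'area': 'A1',
    'timeslot': '08:00-09:00',
    'processing_line': 'L1',
    'parameters': [
        {'name': 'ph', 'value': 7.1},
        {'name': 'moisture', 'value': 12.5},
    ],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
    method='POST',
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Authentication headers, and a CSRF token if the back-end requires one, would be added to the headers dict in the same way.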

Serializing Python Arrow objects for the Flask-Restless API

I am currently developing an application with Flask-Restless. When I substituted my SQLAlchemy models' typical DateTime fields with corresponding arrow fields, all went smoothly. This was due to the help of SQLAlchemy-Utils and its ArrowType field.
However, after using the API to return a JSON representation of these objects, I received the following error:
TypeError: Arrow [2015-01-05T01:17:48.074707] is not JSON serializable
Where would be the ideal place to modify how the model gets serialized? Do I modify Flask-Restless code to support Arrow objects or write a model method that Flask-Restless can identify and use to retrieve a JSON-compatible object?
I could also write an ugly post-processor function in the meantime but that solution seems like a terrible hack.
Below is an example model with the ArrowType field:
class Novel(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.Unicode, unique=True, nullable=False)
    created_at = db.Column(ArrowType, nullable=False)
    def __init__(self, title):
        self.title = title
        self.created_at = arrow.utcnow()
Arrow now has a for_json method. For example: arrow.utcnow().for_json()
How about a custom JSONEncoder which supports Arrow types? Looking at the Flask-Restless source code, it uses Flask's built in jsonify under the hood. See this snippet for an example which serializes regular datetime objects in a different format: http://flask.pocoo.org/snippets/119/
Here's a full self-contained example for good measure:
import flask
# the flask.ext.* import aliases were removed from Flask; import the
# extension packages directly
import flask_sqlalchemy
import flask_restless
from flask.json import JSONEncoder
import sqlalchemy_utils
import arrow
app = flask.Flask(__name__)
app.config['DEBUG'] = True
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///test.db'
db = flask_sqlalchemy.SQLAlchemy(app)
class Event(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    timestamp = db.Column(sqlalchemy_utils.ArrowType)
class ArrowJSONEncoder(JSONEncoder):
    def default(self, obj):
        try:
            if isinstance(obj, arrow.Arrow):
                return obj.format('YYYY-MM-DD HH:mm:ss ZZ')
            iterable = iter(obj)
        except TypeError:
            pass
        else:
            return list(iterable)
        return JSONEncoder.default(self, obj)
app.json_encoder = ArrowJSONEncoder
db.create_all()
manager = flask_restless.APIManager(app, flask_sqlalchemy_db=db)
manager.create_api(Event, methods=['GET', 'POST'])
app.run()
From the two options in your post, I'd suggest adding the method to your model class to retrieve a JSON-compatible object, only because it's simpler and more maintainable. If you want to modify Flask-Restless, you either need to fork it or monkey patch it.
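The two suggestions can also be combined: an encoder that defers to a for_json() method on any object that provides one. The sketch below uses only the standard library so it runs without Flask or Arrow; Timestamp is a hypothetical stand-in for an Arrow object, and ForJSONEncoder is an illustrative name.

```python
import json
from datetime import datetime, timezone


class Timestamp(object):
    """Hypothetical stand-in for an Arrow object: wraps a datetime and
    exposes for_json(), the hook mentioned in the first answer."""
    def __init__(self, moment):
        self.moment = moment

    def for_json(self):
        return self.moment.isoformat()


class ForJSONEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is only called for objects json cannot encode natively.
        hook = getattr(obj, 'for_json', None)
        if callable(hook):
            return hook()
        return super().default(obj)  # raises TypeError for unknown types


created = Timestamp(datetime(2015, 1, 5, 1, 17, 48, tzinfo=timezone.utc))
encoded = json.dumps({'title': 'Novel', 'created_at': created},
                     cls=ForJSONEncoder)
```

The encoder never needs to know about Arrow specifically, so other custom types can opt in by defining the same method.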
