I've just started working with the marshmallow-sqlalchemy package in Python for my Flask application. Everything works fine and the API spits out the content of my database, but it seems to sort the fields alphabetically instead of keeping the order in which I created them in the SQLAlchemy.Model class. Is there a way to prevent that, or at least to sort the fields manually?
This is how I create my database table:
class Product(db.Model):
    p_id = db.Column(db.Integer, primary_key=True)
    p_name = db.Column(db.String(100), nullable=False)
    p_type = db.Column(db.String, nullable=False)
    p_size = db.Column(db.Integer, nullable=False)
    p_color = db.Column(db.String, nullable=False)

    def __repr__(self):
        return f"Product(name={self.p_name}, type={self.p_type}, size={self.p_size}, color={self.p_color})"
And this is my schema:
class ProductSchema(SQLAlchemyAutoSchema):
    class Meta:
        ordered = True  # I have read about this property in another post, but it doesn't do anything here
        model = Product
My function returning the content in json format:
def get(self, id):
    if id:
        product = Product.query.filter_by(p_id=id).first()
        if product:
            product_schema = ProductSchema()
            output = product_schema.dump(product)
        else:
            abort(Response('product not found', 400))
    else:
        products = Product.query.all()
        products_schema = ProductSchema(many=True)
        output = products_schema.dump(products)
    return jsonify(output), 200
Aaand the output I get (alphabetically sorted):
[
    {
        "p_color": "test color1",
        "p_id": 1,
        "p_name": "test name1",
        "p_size": 8,
        "p_type": "test type1"
    },
    {
        "p_color": "test color2",
        "p_id": 2,
        "p_name": "test name2",
        "p_size": 8,
        "p_type": "test type2"
    },
    {
        "p_color": "test color3",
        "p_id": 3,
        "p_name": "test name3",
        "p_size": 8,
        "p_type": "test type3"
    },
    {
        "p_color": "test color4",
        "p_id": 4,
        "p_name": "test name4",
        "p_size": 8,
        "p_type": "test type4"
    }
]
As described above, my application is fully functional. But I'd at least like to know what's going on, so any help is appreciated!
By default, Flask sorts the keys when dumping JSON output. This is done so that the order is deterministic, which makes it possible to compute hashes and the like.
See the docs.
You may disable this with the JSON_SORT_KEYS parameter.
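The sorting happens in the JSON encoder, not in marshmallow, which is why `ordered = True` on the schema alone has no visible effect. A minimal sketch of the behaviour using the standard library's json module directly (the Flask config lines in the comments are the usual switches, which one applies depends on your Flask version):

```python
import json

payload = {"p_id": 1, "p_name": "test name1", "p_color": "test color1"}

# jsonify sorts keys unless told otherwise; plain json shows both behaviours:
print(json.dumps(payload, sort_keys=True))   # keys come out alphabetically
print(json.dumps(payload, sort_keys=False))  # insertion order is preserved

# The equivalent switches in Flask itself (depending on version):
#   app.config["JSON_SORT_KEYS"] = False   # Flask < 2.3
#   app.json.sort_keys = False             # Flask >= 2.3
```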
If you want to debug the marshmallow part, just print the dump (output) in your view function.
Unless you have a good reason to force the output order, you're probably better off just letting it go.
Note: In flask-smorest, I don't mind the payload being alphabetically ordered in the responses but I like it ordered when publishing the OpenAPI spec file, so I don't modify JSON_SORT_KEYS and in the resource serving the spec file, I don't use jsonify but raw json.dumps.
The error is clear:
RecursionError: maximum recursion depth exceeded while calling a Python object
A model cycles through its properties, including its relationships, and outputs them. The relationships have a backref, so this becomes an endless recursion cycle.
Example
Consider an Author describing its Books. During formatting (the default method), the Author model asks, "is the object a Book?" If so, it asks Book to serialize itself. In other examples, the Author might hardcode the Book's key/value pairs instead of asking Book to describe itself. I'd like to avoid that, as I want to reduce the amount of awareness one model has of another.
Is there a way to track/pass what level is being called?
What I'd prefer is to track the recursion level, such that
book = Book()
book.to_json
Will display something like
{
    "id": 1,
    "name": "Python on Stack Overflow",
    "authors": [
        {
            "id": 300,
            "name": "Mike",
            "books": [
                { "id": 1, "name": "Python on Stack Overflow", "authors": ["<Author id=300>"] },
                { "id": 2, "name": "The Worst Question Ever Asked", "authors": ["<Author id=100>", "<Author id=200>", "<Author id=300>", "<Author id=400>"] },
                { "id": 3, "name": "The Greatest Question Ever Answered", "authors": ["<Author id=300>", "<Author id=400>"] }
            ]
        },
        ...
    ]
}
Don't ask Book to describe its authors if Book calls Author, which calls Book again (more than one level deep).
Models
Disclaimer: this is a limited example and doesn't include imports or other attributes, methods, mixins, or functions.
Book.py
# models/book.py
def default(object):
    # format dates
    if isinstance(object, (date, datetime)):
        return object.strftime('%Y-%m-%d %H:%M %z')
    # ask 'Author' to serialize itself
    if object.__class__.__name__ == 'Author':  # <-- one place to be call-aware; `and level == 1`
        return object.to_json
    # instance display
    return f'<{object.__class__.__name__} id={object.id}>'

class Book(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text, index=True, unique=True, nullable=False)
    authors = db.relationship('Author', secondary=Published.__table__, back_populates='books')

    @property
    def to_json(self):
        columns = self.keys()
        response = {}
        for column in columns:
            response[column] = getattr(self, column)
        return json.loads(json.dumps(response, default=default))
Author.py
# models/author.py
def default(object):
    # format dates
    if isinstance(object, (date, datetime)):
        return object.strftime('%Y-%m-%d %H:%M %z')
    # ask 'Book' to serialize itself
    if object.__class__.__name__ == 'Book':  # <-- one place to be call-aware; `and level == 1`
        return object.to_json
    # instance display
    return f'<{object.__class__.__name__} id={object.id}>'

class Author(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text, index=True, unique=True, nullable=False)
    books = db.relationship('Book', secondary=Published.__table__, back_populates='authors')

    @property
    def to_json(self):
        columns = self.keys()
        response = {}
        for column in columns:
            response[column] = getattr(self, column)
        return json.loads(json.dumps(response, default=default))
One potential solution is to use a global tracking variable.
recursion_level = None

def default(object):
    global recursion_level
    # format dates
    if isinstance(object, (date, datetime)):
        return object.strftime('%Y-%m-%d %H:%M %z')
    # ask the object to serialize itself
    max_recursion = 1
    classes = [model.class_.__name__ for model in app.db.Model.registry.mappers]
    if object.__class__.__name__ in classes and recursion_level < max_recursion:
        recursion_level += 1
        json_str = object.to_json
        recursion_level -= 1
        return json_str
    # instance display
    return f'<{object.__class__.__name__} id={object.id}>'

class Author(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text, index=True, unique=True, nullable=False)
    books = db.relationship('Book', secondary=Published.__table__, back_populates='authors')

    @property
    def to_json(self):
        columns = self.keys()
        response = {}
        for column in columns:
            response[column] = getattr(self, column)
        global recursion_level  # <-- new block
        if recursion_level is None:
            recursion_level = 1
        # NOTE: I don't know how to pass `recursion_level` to `default`,
        # which is why it's a global variable for now
        return json.loads(json.dumps(response, default=default))
Comment:
to_json and default are actually defined in one place on a base model class to keep the code DRY. Try not to be distracted by the placement here.
Even though this answer uses a global variable, that is not my preference. Python is supposedly single-threaded, so it might be safe enough when not using async, but I'm new to Python and don't fully understand the call stack or the scoping of globals, so I defer to the experts to poke holes in it.
My preference would be to pass the recursion level to default as an argument and use it as the terminating condition for the recursion, but I'm not sure how to pass such a value through json.dumps.
object.id is used in default's instance display output, but because the function may handle multiple classes (not just Book), those classes may not include an id column. A more robust solution is to survey the primary keys and use those values. Something like:
pks = object.__table__.primary_key.columns.values()
pk_pairs = [f'{pk.name}={getattr(object, pk.name)}' for pk in pks]
return f'<{object.__class__.__name__} {" ".join(pk_pairs)}>'
NOTE: this all depends on how much control you have over your models and primary keys. This could be made even safer, but for the purposes of this demo it should suffice.
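On the open question of passing the recursion level into default without a global: one option is to build the default hook with a closure over a mutable counter. This is only a sketch with toy classes (make_default, as_dict, and the Book/Author classes here are illustrative stand-ins, not the actual models); the key trick is the same nested json.dumps used in to_json above, so the counter stays raised for the whole nested serialization:

```python
import json

def make_default(max_depth=1):
    """Build a json.dumps `default` hook carrying its own depth counter (no global)."""
    state = {"depth": 0}
    def default(obj):
        if hasattr(obj, "as_dict") and state["depth"] < max_depth:
            state["depth"] += 1
            try:
                # serialize the nested object fully while the counter is raised,
                # mirroring the nested json.dumps trick used in to_json
                return json.loads(json.dumps(obj.as_dict(), default=default))
            finally:
                state["depth"] -= 1
        # past the depth limit: fall back to a short textual reference
        return f"<{obj.__class__.__name__} id={obj.id}>"
    return default

class Book:
    def __init__(self, id):
        self.id, self.authors = id, []
    def as_dict(self):
        return {"id": self.id, "authors": self.authors}

class Author:
    def __init__(self, id):
        self.id, self.books = id, []
    def as_dict(self):
        return {"id": self.id, "books": self.books}

book, author = Book(1), Author(300)
book.authors.append(author)
author.books.append(book)

serialized = json.dumps(book.as_dict(), default=make_default(max_depth=1))
print(serialized)  # nests one level, then emits "<Book id=1>" placeholders
```

Because each call to make_default creates a fresh counter, there is no shared mutable state between requests, which sidesteps the thread-safety worry above.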
I have a question you guys might be able to answer.
I have a json file that looks something like this:
[
    {
        "address": "some address",
        "full_time_school": false,
        "name": "some name",
        "official_id": "722154",
        "school_type": "Grundschule",
        "school_type_entity": "Grundschule",
        "state": "BW"
    },
    {
        "address": "some other address",
        "name": "some other name",
        "official_id": "722190",
        "state": "BW"
    }
]
The point is that not every entry has all keys.
I have a flask-sqlalchemy model that looks like this:
class School(db.Model):
    __tablename__ = "school"  # pragma: no cover

    address = db.Column(db.String)
    full_time_school = db.Column(db.Boolean)
    name = db.Column(db.String)
    official_id = db.Column(db.Integer)
    school_type = db.Column(db.String)
    school_type_entity = db.Column(db.String)
    state = db.Column(db.String)

    def __repr__(self):
        return f"<name {self.name}>"
And I have a Python script to add the JSON entries to my PostgreSQL database that looks like this:
from my_project import db
from my_project.models import School
import json
import os

# insert data
for filename in os.listdir("datamining"):
    if filename.endswith(".json"):
        file = open(os.path.join("datamining", filename))
        print(f"Add schools from {filename.strip('.json')}")
        data = json.load(file)
        cleaned_data = {school["official_id"]: school for school in data}.values()
        print(f"Adding {len(data)} schools to the database.")
        for school in cleaned_data:
            entry = School(
                id=school["official_id"]
            )
            for key, value in school.items():
                entry.key = value
            db.session.add(entry)
        db.session.commit()
        file.close()
print("Added all schools!!!")
I don't know why, but somehow every cell is NULL except the official_id field. How so, and how can I fix it? I'm at my wits' end right now. Every pointer or bit of help is much appreciated.
EDIT:
What I've found out so far is that entry.key is not interpreted as, say, entry.state, but actually creates an attribute literally named key, e.g. entry.key = "BW". Why is that?
Your problem is
entry.key = value
You are just writing your values over and over into the attribute key of your School model. I'm actually surprised SQLAlchemy doesn't raise some kind of error here...
Just pass all your values into the constructor and you should be fine:
school["id"] = school.pop("official_id")
entry = School(**school)
EDIT: It's "BW" because that happens to be the last value written into the attribute.
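If you do want to assign the attributes dynamically rather than through the constructor, setattr is the tool: it resolves the name held in the variable at runtime. A minimal sketch with a plain class (no SQLAlchemy needed to see the difference):

```python
class School:
    pass

school = {"state": "BW", "name": "some name"}
entry = School()

for key, value in school.items():
    # `entry.key = value` would literally create/overwrite an attribute named "key";
    # setattr() uses the *value* of `key` as the attribute name instead
    setattr(entry, key, value)

print(entry.state)  # the attributes now exist under their real names
```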
You can do this much more easily, and faster, all in one go by executing this native parameterized query, passing the text contents of the JSON file as the parameter jsontext:
insert into school
select * from jsonb_populate_recordset(null::school, :jsontext::jsonb);
I have a feeling that I've made things more complex than they need to be - this can't possibly be such a rare case. It seems to me that it should be possible - or perhaps that I'm doing something fundamentally wrong.
The issue at hand is this: I've declared a database element, Element, which consists of about 10 many-to-many relations with other elements, one of which is Tag.
I want to enable the user of my application to filter Element by all of these relations, some of them or none of them. Say the user wants to see only Elements which are related to a certain Tag.
To make things even more difficult, the function that will carry out this objective is called from a GraphQL API, meaning it will receive IDs instead of ORM objects.
I'm trying to build a resolver in my Python Flask project, using SQLAlchemy, which will provide an interface like so:
# graphql request
query getElements {
    getElements(tags: [2, 3], people: [8, 13]) {
        id
    }
}

# graphql response
{
    "data": {
        "getElements": [
            {
                "id": "2"
            },
            {
                "id": "3"
            },
            {
                "id": "8"
            }
        ]
    }
}
I imagine the resolver would look something like this simplified pseudo-code, but I can't for the life of me figure out how to pull it off:
def get_elements(tags=None, people=None):
    args = {'tags': tags, 'people': people}
    if any(args):
        # this is the tricky bit - for each of DataElement's related elements,
        # I want to check if its ID is given in the corresponding argument
        data_elements = DataElement.query.filter_by(this in args)
    else:
        data_elements = DataElement.query.all()
    return data_elements
Here's a peek at the simplified database model, as requested. DataElement holds a lot of relations like this, and it works perfectly:
class DataElement(db.Model):
    __tablename__ = 'DataElement'

    id = db.Column(db.Integer, primary_key=True)
    tags = db.relationship('Tag', secondary=DataElementTag, back_populates='data_elements')

class Tag(db.Model):
    __tablename__ = 'Tag'

    id = db.Column(db.Integer, primary_key=True)
    data_elements = db.relationship('DataElement', secondary=DataElementTag, back_populates='tags')

DataElementTag = db.Table('DataElementTag',
    db.Column('id', db.Integer, primary_key=True),
    db.Column('data_element_id', db.Integer, db.ForeignKey('DataElement.id')),
    db.Column('tag_id', db.Integer, db.ForeignKey('Tag.id'))
)
Please, ORM wizards and python freaks, I call upon thee!
I've solved it in a rather clunky manner. I suppose there must be a more elegant way to pull this off, and I'm still holding out for better answers.
I ended up looping over all the given arguments and using eval() (not on user input, don't worry) to get the corresponding database model. From there, I was able to grab the DataElement objects through the many-to-many relationship. My final solution looks like this:
args = {
    'status': status,
    'person': people,
    'tag': tags,
    'event': events,
    'location': locations,
    'group': groups,
    'year': year
}  # dictionary of args for easier data handling

if any(args.values()):
    final = []  # will contain elements matching the criteria
    for key, value in args.items():
        if value:
            model = eval(key.capitalize())  # get ORM model from dictionary key name (eval used on a hardcoded string, hence safe)
            for id in value:
                filter_element = model.query.filter_by(id=id).one_or_none()  # get the element in question from the db
                if filter_element:
                    elements = filter_element.data_elements  # get data_elements linked to the element in question
                    for element in elements:
                        if element not in final:  # to avoid duplicates
                            final.append(element)
    return final
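For reference, the same filtering can usually be expressed without eval() by chaining one relationship .any() filter per argument onto a single query, letting the database do the matching and de-duplication. A self-contained sketch with plain SQLAlchemy and an in-memory SQLite database (the model and table names mirror the simplified model above; extending it to people, events, etc. means adding one more .filter() per argument):

```python
from sqlalchemy import Column, ForeignKey, Integer, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

DataElementTag = Table(
    'DataElementTag', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('data_element_id', Integer, ForeignKey('DataElement.id')),
    Column('tag_id', Integer, ForeignKey('Tag.id')),
)

class DataElement(Base):
    __tablename__ = 'DataElement'
    id = Column(Integer, primary_key=True)
    tags = relationship('Tag', secondary=DataElementTag, back_populates='data_elements')

class Tag(Base):
    __tablename__ = 'Tag'
    id = Column(Integer, primary_key=True)
    data_elements = relationship('DataElement', secondary=DataElementTag, back_populates='tags')

def get_elements(session, tags=None):
    # each non-empty argument adds one EXISTS condition; no eval() needed
    query = session.query(DataElement)
    if tags:
        query = query.filter(DataElement.tags.any(Tag.id.in_(tags)))
    return query.order_by(DataElement.id).all()

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    t2, t3 = Tag(id=2), Tag(id=3)
    session.add_all([
        DataElement(id=1, tags=[t2]),
        DataElement(id=2, tags=[t3]),
        DataElement(id=3),  # no tags: filtered out when tags are given
    ])
    session.commit()
    matched = [e.id for e in get_elements(session, tags=[2, 3])]
    print(matched)
```

With no arguments the query falls through unfiltered, which also covers the "return everything" branch of the resolver.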
I want to be able to query a database and jsonify() the results to send over the server.
My function is supposed to incrementally send x posts each time it is called, i.e. sending posts 1-10, ..., then posts 31-40, and so on.
I have the following query:
q = Post.query.filter(Post.column.between(x, x + 10))
result = posts_schema.dump(q)
return make_response(jsonify(result), 200)  # or would it be ...jsonify(result.data), 200)?
Ideally, it would return something like this:
[
    {
        "id": 1,
        "title": "Title",
        "description": "A descriptive description."
    },
    {
        "id": 2,
        ...
    },
    ...
]
The SQLAlchemy model I am using and the Marshmallow schema:
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(30))
    content = db.Column(db.String(150))

    def __init__(self, title, description):
        self.title = title
        self.description = description

class PostSchema(ma.Schema):
    class Meta:
        fields = ('id', 'title', 'description')

posts_schema = PostSchema(many=True)
I am new to SQLAlchemy, so I don't know too much about querying yet. Another user pointed me in the direction of the current query, but I don't think it is quite right.
In SQL, I am looking to reproduce the following:
SELECT * FROM Post WHERE id BETWEEN value1 AND value2
To paginate with Flask-SQLAlchemy, you would do the following:
# In the view function, collect the page and per_page values
@app.route('/posts/<int:page>/<int:per_page>', methods=['GET'])
def posts(page=1, per_page=30):
    # ... insert other logic here
    posts = Post.query.order_by(Post.id.asc())  # don't forget to order these by ID
    posts = posts.paginate(page=page, per_page=per_page)
    return jsonify({
        'page': page,
        'per_page': per_page,
        'has_next': posts.has_next,
        'has_prev': posts.has_prev,
        'page_list': [iter_page if iter_page else '...' for iter_page in posts.iter_pages()],
        'posts': [{
            'id': p.id,
            'title': p.title,
            'content': p.content
        } for p in posts.items]
    })
On the front end, you would use the page_list, page, per_page, has_next, and has_prev values to help the user choose which page to go to next.
The values you pass in the URL dictate which page to go to next. This is all handily built into Flask-SQLAlchemy for you, which is another reason it is such a great library.
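Under the hood, paginate() is essentially an ORDER BY plus OFFSET/LIMIT. A rough sketch of the arithmetic in plain Python, with list slicing standing in for the SQL (the paginate function here is a hypothetical stand-in, not Flask-SQLAlchemy's):

```python
def paginate(items, page=1, per_page=30):
    # OFFSET (page - 1) * per_page  LIMIT per_page
    start = (page - 1) * per_page
    return {
        'page': page,
        'per_page': per_page,
        'items': items[start:start + per_page],
        'has_prev': page > 1,
        'has_next': start + per_page < len(items),
    }

result = paginate(list(range(1, 101)), page=2, per_page=10)
print(result['items'])  # posts 11 through 20
```

This is also why the ordering matters: without a stable ORDER BY, the rows falling into each OFFSET window are not guaranteed to be consistent between requests.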
I found out a solution to my question:
Post.query.filter((Post.id >= x) & (Post.id <= (x + 10))).all()
I am using flask-marshmallow along with marshmallow-sqlalchemy.
I would like to have my own kind of HATEOAS implementation: for n-to-many relationships, along with the link, I'd like to have the count of related objects.
For that, I have a regular sqlalchemy model with a many-to-many relationship:
class ParentChild(Model):
    __tablename__ = 'parent_child'

    parent_id = Column(Integer, ForeignKey('parent.id'), primary_key=True)
    child_id = Column(Integer, ForeignKey('child.id'), primary_key=True)

class Parent(Model):
    __tablename__ = 'parent'

    id = Column(Integer, primary_key=True)
    name = Column(String())
    children = relationship('Child', secondary='parent_child', back_populates='parents')

class Child(Model):
    __tablename__ = 'child'

    id = Column(Integer, primary_key=True)
    name = Column(String())
    parents = relationship('Parent', secondary='parent_child', back_populates='children')
Using the following marshmallow schema, I manage to get the data I want:
class ParentSchema(Schema):
    class Meta:
        model = Parent

    children = URLFor('api.parents_children_by_parent_id', parent_id='<id>')
    children_count = base_fields.Function(lambda obj: len(obj.children))
Returns:
{
    "id": 42,
    "name": "Bob",
    "children": "/api/parents/42/children",
    "children_count": 3
}
But I have issues when I want to encapsulate the fields like this:
{
    "id": 42,
    "name": "bob",
    "children": {
        "link": "/api/parents/42/children",
        "count": 3
    }
}
I tried using a base_fields.Dict:
children = base_fields.Dict(
    link=URLFor('api.parents_children_by_parent_id', parent_id='<id>'),
    count=base_fields.Function(lambda obj: len(obj.children))
)
But I get
TypeError: Object of type 'Child' is not JSON serializable
I tried various other solutions, without success:
flask-marshmallow's Hyperlinks only accepts dictionaries of Hyperlinks, not Functions.
I think the solution would be to use a base_fields.Nested, but that breaks the behaviour of URLFor, which can no longer catch the '<id>'.
I can't find a solution to this in the documentation.
At some point it's hard to think outside the box. Am I missing something? Any help would be appreciated.
So I found a workaround that I'm going to post, but I think it can be improved.
To override the children field with the object I want, I use a base_fields.Method:
class ParentSchema(Schema):
    class Meta:
        model = Parent

    children = base_fields.Method('build_children_obj')

    def build_children_obj(self, obj):
        return {
            "count": len(obj.children),
            "link": URLFor('api.parents_children_by_parent_id', parent_id=obj.id)
        }
At that point, I was getting TypeError: Object of type 'URLFor' is not JSON serializable
So after checking the source of the _serialize method of URLFor I added a check in my (customized) JSONEncoder:
if isinstance(o, URLFor):
    return str(o._serialize(None, None, o))
And I finally got the payload I wanted, but I don't find it very clean. Any ideas?
EDIT: After testing, I found that using len(obj.children) to get the count was very expensive, because it loads the entire list of children. Instead, I now do db.session.query(func.count(Child.id)).filter(Child.parents.any(id=obj.id)).scalar(), which is much better optimized.
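To make that final optimization concrete, here is a self-contained sketch of the count query with plain SQLAlchemy and an in-memory SQLite database (model names follow the simplified models above; the point is that func.count() runs entirely in the database instead of hydrating every Child object):

```python
from sqlalchemy import Column, ForeignKey, Integer, Table, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

parent_child = Table(
    'parent_child', Base.metadata,
    Column('parent_id', Integer, ForeignKey('parent.id'), primary_key=True),
    Column('child_id', Integer, ForeignKey('child.id'), primary_key=True),
)

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    children = relationship('Child', secondary=parent_child, back_populates='parents')

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parents = relationship('Parent', secondary=parent_child, back_populates='children')

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Parent(id=42, children=[Child(id=i) for i in (1, 2, 3)]))
    session.commit()
    # SELECT count(child.id) ... WHERE EXISTS (an association row for parent 42),
    # instead of loading and len()-ing the whole children collection
    children_count = (
        session.query(func.count(Child.id))
        .filter(Child.parents.any(id=42))
        .scalar()
    )
    print(children_count)  # 3
```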