I have a question you guys might be able to answer.
I have a json file that looks something like this:
[
    {
        "address": "some address",
        "full_time_school": false,
        "name": "some name",
        "official_id": "722154",
        "school_type": "Grundschule",
        "school_type_entity": "Grundschule",
        "state": "BW"
    },
    {
        "address": "some other address",
        "name": "some other name",
        "official_id": "722190",
        "state": "BW"
    }
]
The point is that not every entry has all keys.
I have a flask-sqlalchemy model that looks like this:
class School(db.Model):
    __tablename__ = "school"  # pragma: no cover

    address = db.Column(db.String)
    full_time_school = db.Column(db.Boolean)
    name = db.Column(db.String)
    official_id = db.Column(db.Integer)
    school_type = db.Column(db.String)
    school_type_entity = db.Column(db.String)
    state = db.Column(db.String)

    def __repr__(self):
        return f"<name {self.name}>"
And I have a python script to add the json entries into my postgresql database that looks like this:
from my_project import db
from my_project.models import School
import json
import os
# insert data
for filename in os.listdir("datamining"):
    if filename.endswith(".json"):
        file = open(os.path.join("datamining", filename))
        print(f"Add schools from {filename.strip('.json')}")
        data = json.load(file)
        cleaned_data = {school["official_id"]: school for school in data}.values()
        print(f"Adding {len(data)} schools to the database.")
        for school in cleaned_data:
            entry = School(
                id=school["official_id"]
            )
            for key, value in school.items():
                entry.key = value
            db.session.add(entry)
        db.session.commit()
        file.close()
print("Added all schools!!!")
I don't know why, but somehow every cell is NULL except the official_id field. How so, and how can I fix it? I'm at my wits' end right now. Any pointer or help is much appreciated.
EDIT:
What I've found out so far is that entry.key is not interpreted as, e.g., entry.state; it literally creates an attribute named key, so the model ends up with entry.key = "BW". Why is that?
Your problem is
entry.key = value
You are just writing your values over and over into the attribute 'key' within your School model. I'm actually surprised SQLAlchemy doesn't raise some kind of error here...
Just pass all your values into the constructor and you should be fine:
school["id"] = school.pop("official_id")
entry = School(**school)
EDIT: It's "BW" because this happens to be the last value that is written into the attribute.
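To see the mechanics outside SQLAlchemy, here is the failure mode and the setattr fix in a framework-free sketch (plain Python objects; the attribute mechanics are the same ones the model uses):

```python
class Obj:
    pass

data = {"name": "some name", "state": "BW"}

# Wrong: this assigns to a literal attribute called `key` on every pass,
# so only the last value survives, and it lands in the wrong place.
o = Obj()
for key, value in data.items():
    o.key = value

# Right: setattr resolves the attribute name at runtime.
p = Obj()
for key, value in data.items():
    setattr(p, key, value)
```

So if you prefer the loop over the constructor, `setattr(entry, key, value)` is the one-line fix.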
You can do this much more easily and quickly, all in one go, by executing this native parameterized query, passing the text contents of the JSON file as the jsontext parameter:
insert into school
select * from jsonb_populate_recordset(null::school, :jsontext::jsonb);
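A sketch of driving that statement from Python; the session and file name here are assumptions for illustration, not code from the question:

```python
import json

# Parameterized statement; :jsontext is bound to the raw JSON file contents.
SQL = (
    "insert into school "
    "select * from jsonb_populate_recordset(null::school, :jsontext::jsonb)"
)

def load_schools(session, path="schools.json"):
    """`session` is any session/connection accepting named parameters;
    `path` is a hypothetical file name."""
    with open(path) as f:
        payload = f.read()
    json.loads(payload)  # fail fast on malformed JSON before touching the DB
    session.execute(SQL, {"jsontext": payload})
```

With SQLAlchemy 1.4+/2.x, wrap the string in sqlalchemy.text() before executing.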
The error is clear:
RecursionError: maximum recursion depth exceeded while calling a Python object
A model cycles through its properties, including its relationships, and outputs them. The relationships have a backref, so it's an endless recursion cycle.
Example
Consider an Author describing its Books. During the formatting (default method), the Author model says, "is the object a Book?" If so, it asks Book to serialize itself. In other examples, the Author might hardcode the Book's key/value pairs instead of asking Book to describe itself. I'd like to avoid that as I want to reduce the amount of awareness one model has of another.
Is there a way to track/pass what level is being called?
What I'd prefer is to track the recursion level, such that
book = Book()
book.to_json
Will display something like
{
    "id": 1,
    "name": "Python on Stack Overflow",
    "authors": [
        {
            "id": 300,
            "name": "Mike",
            "books": [
                { "id": 1, "name": "Python on Stack Overflow", "authors": ["<Author id=300>"] },
                { "id": 2, "name": "The Worst Question Ever Asked", "authors": ["<Author id=100>", "<Author id=200>", "<Author id=300>", "<Author id=400>"] },
                { "id": 3, "name": "The Greatest Question Ever Answered", "authors": ["<Author id=300>", "<Author id=400>"] }
            ]
        },
        ...
    ]
}
Don't ask Book to describe its authors when the chain is Book calling Author calling Book (greater than one level deep).
Models
Disclaimer: This is a limited example and doesn't include imports or other attributes, methods, mixins, or functions.
Book.py
# models/book.py
def default(object):
    # format dates
    if isinstance(object, (date, datetime)):
        return object.strftime('%Y-%m-%d %H:%M %z')
    # ask 'Author' to serialize itself
    if object.__class__.__name__ == 'Author':  # <-- one place to be call-aware; `and level==1`
        return object.to_json
    # instance display
    return f'<{object.__class__.__name__} id={object.id}>'

class Book(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text, index=True, unique=True, nullable=False)
    authors = db.relationship('Author', secondary=Published.__table__, back_populates='books')

    @property
    def to_json(self):
        columns = self.keys()
        response = {}
        for column in columns:
            response[column] = getattr(self, column)
        return json.loads(json.dumps(response, default=default))
Author.py
# models/author.py
def default(object):
    # format dates
    if isinstance(object, (date, datetime)):
        return object.strftime('%Y-%m-%d %H:%M %z')
    # ask 'Book' to serialize itself
    if object.__class__.__name__ == 'Book':  # <-- one place to be call-aware; `and level==1`
        return object.to_json
    # instance display
    return f'<{object.__class__.__name__} id={object.id}>'

class Author(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text, index=True, unique=True, nullable=False)
    books = db.relationship('Book', secondary=Published.__table__, back_populates='authors')

    @property
    def to_json(self):
        columns = self.keys()
        response = {}
        for column in columns:
            response[column] = getattr(self, column)
        return json.loads(json.dumps(response, default=default))
One potential solution is to use a global tracking variable.
recursion_level = None

def default(object):
    global recursion_level
    # format dates
    if isinstance(object, (date, datetime)):
        return object.strftime('%Y-%m-%d %H:%M %z')
    # ask the object to serialize itself
    max_recursion = 1
    classes = [model.class_.__name__ for model in app.db.Model.registry.mappers]
    if object.__class__.__name__ in classes and recursion_level < max_recursion:
        recursion_level += 1
        json_str = object.to_json
        recursion_level -= 1
        return json_str
    # instance display
    return f'<{object.__class__.__name__} id={object.id}>'
class Author(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text, index=True, unique=True, nullable=False)
    books = db.relationship('Book', secondary=Published.__table__, back_populates='authors')

    @property
    def to_json(self):
        columns = self.keys()
        response = {}
        for column in columns:
            response[column] = getattr(self, column)
        global recursion_level  # <-- new block
        if recursion_level is None:
            recursion_level = 0  # start at 0 so the first nested level is still serialized
        # NOTE: I don't know how to pass `recursion_level` to `default`,
        # which is why it's a global variable for now
        return json.loads(json.dumps(response, default=default))
Comment:
to_json and default are actually defined in one place, on a base model class, to keep the code DRY. Try not to be distracted by their placement here.
Even though this answer uses global variables, that is not my preference. Python is supposedly single-threaded, so it might be safe enough if not using async, but I'm new to Python and don't fully understand the call stack or the scoping of globals, so I defer to experts to poke holes in it.
My preference would be to pass a variable for the recursion level to default, to be used as the recursion-terminating condition, but I'm not sure how to pass the value through json.dumps.
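One stdlib-only way to do that: json.dumps accepts any callable as default, so a functools.partial can carry the current depth, and each nested to_json hands down depth + 1. A sketch with a stand-in Node class rather than the real models:

```python
import json
from functools import partial

class Node:
    """Stand-in for a model with a self-referencing relationship."""
    def __init__(self, id, child=None):
        self.id = id
        self.child = child

    def to_json(self, depth=0):
        response = {"id": self.id, "child": self.child}
        # hand the current depth to `default` via partial -- no global needed
        return json.loads(json.dumps(response, default=partial(default, depth=depth)))

def default(obj, depth=0, max_depth=1):
    if isinstance(obj, Node) and depth < max_depth:
        return obj.to_json(depth=depth + 1)  # recurse one level deeper
    return f"<Node id={obj.id}>"            # past the limit: instance display

root = Node(1, child=Node(2, child=Node(3)))
```

Here `root.to_json()` expands one level and then falls back to the `<Node id=...>` display, which is exactly the terminating condition described above.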
object.id is used in default's instance display output, but because the function may handle multiple classes (and not just Book), those classes may not include an id column. A more robust solution is to survey the primary keys and use those values. Something like:
pks = object.__table__.primary_key.columns.values()
pk_pairs = [f'{pk.name}={getattr(object, pk.name)}' for pk in pks]
return f'<{object.__class__.__name__} {" ".join(pk_pairs)}>'

NOTE: this all depends on how much control you have over your models and primary keys. This could be made even safer, but for the purposes of this demo it should suffice.
I want to set up edit/delete permissions for the creator only. The main problem is that from the frontend any user can update and delete records without being the creator. I tried a UUID so the id value can't be guessed, but the problem is still there.
def create_user_education(request: schemas.StudentEducation, db: Session, current_user: My_Education = Depends(oauth2.get_current_user)):
    try:
        uid = str(uuid.uuid4().hex)
        new_education = My_Education(
            id=uid,
            user_id=request.user_id,
            institute=request.institute,
            website=request.website,
            country=request.country,
            city=request.city,
            degree=request.degree,
            start_date=request.start_date + timedelta(hours=+6),
            end_date=request.end_date + timedelta(hours=+6),
            description=request.description,
        )
        db.add(new_education)
        db.commit()
        db.refresh(new_education)
        return {properties.create_message}
    except SQLAlchemyError:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=properties.error_message)

def update_user_education(id: str, request: schemas.StudentEducation, db: Session, current_user: My_Education = Depends(oauth2.get_current_user)):
    try:
        education = db.query(My_Education).filter(My_Education.id == id)
        if not education.first():
            raise HTTPException(status_code=status.HTTP_404_NOT_FOUND,
                                detail=f"user education with id {id} not found")
        education.update({
            'user_id': request.user_id,
            'institute': request.institute,
            'website': request.website,
            'country': request.country,
            'city': request.city,
            'degree': request.degree,
            'start_date': request.start_date + timedelta(hours=+6),
            'end_date': request.end_date + timedelta(hours=+6),
            'description': request.description,
        })
        db.commit()
        return {properties.update_message}
    except SQLAlchemyError:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                            detail=properties.error_message)
For example, I want a record like this:

{
    "id": "a8a8caa2f94f492c9a8e72276d116a3c",
    "user_id": 2,
    "institute": "Texas_high_school",
    "website": "https://mjrgeorge.netlify.app/",
    "country": "Denmark",
    "city": " Copenhagen",
    "degree": "MBA",
    "start_date": "2007-02-06T00:00:00",
    "end_date": "2008-02-21T00:00:00",
    "description": "Hello description Hello description Hello description Hello description "
}

to be updatable and deletable only by its creator (here, the user with "user_id": 2). No one else.
Here is my model class:
class My_Education(Base):
    __tablename__ = properties.My_Education

    id = Column(String, primary_key=True, index=True)
    user_id = Column(Integer, ForeignKey('tbl_stu_usr-users.id'))
    institute = Column(String)
    website = Column(String)
    country = Column(String)
    city = Column(String)
    degree = Column(String)
    start_date = Column(DateTime)
    end_date = Column(DateTime)
    description = Column(String)
You need authentication. To elaborate: the intended user should first log in. You can issue a login token, which you then validate before updating.
How do you do that?
There are many ways; I'll add a few links below for reference, and you may use any of them.
Simple solution from docs
FastAPI users project
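Whichever auth scheme you adopt, the missing piece is an ownership check before mutating the row: compare the authenticated user's id with the record's user_id and reject mismatches (in FastAPI you would raise HTTPException(status_code=403)). A framework-free sketch of that check; the names here are illustrative, not from your code:

```python
class Forbidden(Exception):
    """Stand-in for FastAPI's HTTPException(status_code=403)."""

def ensure_owner(record_user_id, current_user_id):
    # Only the creator of the record may update or delete it.
    if record_user_id != current_user_id:
        raise Forbidden("only the creator may modify this record")

# In update_user_education, after the 404 check, something like:
#     ensure_owner(education.first().user_id, current_user.id)
```

The UUID only makes ids hard to guess; this check is what actually enforces the permission server-side.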
I've just started working with the marshmallow-sqlalchemy package in Python for my Flask application. Everything works fine and the API spits out the content of my database, but it seems to sort the fields alphabetically, as opposed to the order in which I created them in the SQLAlchemy.Model class. Now I was wondering if there is a way to prevent that, or at least order the fields manually?
This is how I create my database table:
class Product(db.Model):
    p_id = db.Column(db.Integer, primary_key=True)
    p_name = db.Column(db.String(100), nullable=False)
    p_type = db.Column(db.String, nullable=False)
    p_size = db.Column(db.Integer, nullable=False)
    p_color = db.Column(db.String, nullable=False)

    def __repr__(self):
        return f"Product(name={self.p_name}, type={self.p_type}, size={self.p_size}, color={self.p_color})"
And this is my schema:
class ProductSchema(SQLAlchemyAutoSchema):
    class Meta:
        ordered = True  # I have read about this property in another post, but it doesn't do anything here
        model = Product
My function returning the content in json format:
def get(self, id):
    if id:
        product = Product.query.filter_by(p_id=id).first()
        if product:
            product_schema = ProductSchema()
            output = product_schema.dump(product)
        else:
            abort(Response('product not found', 400))
    else:
        products = Product.query.all()
        products_schema = ProductSchema(many=True)
        output = products_schema.dump(products)
    return jsonify(output), 200
Aaand the output I get (alphabetically sorted):
[
    {
        "p_color": "test color1",
        "p_id": 1,
        "p_name": "test name1",
        "p_size": 8,
        "p_type": "test type1"
    },
    {
        "p_color": "test color2",
        "p_id": 2,
        "p_name": "test name2",
        "p_size": 8,
        "p_type": "test type2"
    },
    {
        "p_color": "test color3",
        "p_id": 3,
        "p_name": "test name3",
        "p_size": 8,
        "p_type": "test type3"
    },
    {
        "p_color": "test color4",
        "p_id": 4,
        "p_name": "test name4",
        "p_size": 8,
        "p_type": "test type4"
    }
]
As described above, my application is fully functional, but I'd at least like to know what's going on. Any help is appreciated!
By default, Flask sorts the keys when dumping JSON output. This is done so that the order is deterministic, which makes it possible to compute hashes and the like.
See the docs.
You may disable this with the JSON_SORT_KEYS parameter.
If you want to debug the marshallow part, just print the dump (output) in your view function.
Unless you have a good reason to force the output order, you're probably better off just letting it go.
Note: In flask-smorest, I don't mind the payload being alphabetically ordered in the responses, but I like it ordered when publishing the OpenAPI spec file, so I don't modify JSON_SORT_KEYS; in the resource serving the spec file, I don't use jsonify but raw json.dumps.
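For context, jsonify ends up calling the stdlib json.dumps with sort_keys=True; that is the switch JSON_SORT_KEYS flips (in Flask 2.3+ the setting moved to app.json.sort_keys). The underlying behaviour, shown with the stdlib alone:

```python
import json

record = {"p_id": 1, "p_name": "test name1", "p_type": "test type1",
          "p_size": 8, "p_color": "test color1"}

sorted_out = json.dumps(record, sort_keys=True)    # what jsonify does by default
ordered_out = json.dumps(record, sort_keys=False)  # insertion order preserved

# Flask: app.config["JSON_SORT_KEYS"] = False   (Flask < 2.3)
#        app.json.sort_keys = False             (Flask >= 2.3)
```

With sorting disabled, the declaration order of the dict (and hence of an ordered marshmallow dump) survives into the response.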
I have a list with a JSON like so:
print(type(listed)) # <class 'list'>
print (listed)
[
    {
        "email": "x#gmail.com",
        "fullname": "xg gf",
        "points": 5,
        "image_url": "https://imgur.com/random.pmg"
    },
    {
        ... similar json for the next user and so on
    }
]
I'm trying to insert them into my postgres database that has a model like this:
class Users(db.Model):
    __tablename__ = 'users'

    email = db.Column(db.String(), primary_key=True)
    displayName = db.Column(db.String())
    image = db.Column(db.String())
    points = db.Column(db.Integer())
But I'm quite stuck, I've tried several approaches but none worked, anyone can guide me with an example on how to do it properly?
Here's a solution without pandas, using SQLAlchemy Core.

Create the engine:

engine = sqlalchemy.create_engine('...')

Load the metadata, using the engine as the bind parameter:

metadata = sqlalchemy.MetaData(bind=engine)

Make a reference to the table:

users_table = sqlalchemy.Table('users', metadata, autoload=True)

You can then start your inserts:

for user in json:
    query = users_table.insert().values(**user)
    my_session = Session(engine)
    my_session.execute(query)
    my_session.close()

This creates a session for every user in json, but I thought you might like it anyway. It's very flexible and works for any table; you don't even need a model. Just make sure the json doesn't contain any columns that don't exist in the db (this means the json keys and the db column names must match exactly).
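The per-user session can also be collapsed into one bulk insert: SQLAlchemy Core accepts a list of dicts in a single execute (session.execute(users_table.insert(), json)). The same idea shown with only the stdlib, against an in-memory SQLite table whose schema is adapted from the question's model:

```python
import sqlite3

users = [
    {"email": "x#gmail.com", "displayName": "xg gf",
     "image": "https://imgur.com/random.pmg", "points": 5},
    {"email": "onur#gmail.com", "displayName": "o g",
     "image": "https://imgur.com/random_x.pmg", "points": 7},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table users (email text primary key, displayName text, "
    "image text, points integer)"
)

# One round trip for the whole list; named placeholders match the dict keys.
conn.executemany(
    "insert into users (email, displayName, image, points) "
    "values (:email, :displayName, :image, :points)",
    users,
)
conn.commit()
```

Note the dict keys here already match the column names, which is the same constraint the answer above points out.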
Here is an example json list, like you provided.
json = [
    {
        "email": "x#gmail.com",
        "fullname": "xg gf",
        "points": 5,
        "image_url": "https://imgur.com/random.pmg"
    },
    {
        "email": "onur#gmail.com",
        "fullname": "o g",
        "points": 7,
        "image_url": "https://imgur.com/random_x.pmg"
    }
]
Now iterate over your json list. Each iteration creates a dataframe from one dictionary in the list and transposes it; collect the frames and concatenate them into all_df at the end (DataFrame.append was deprecated and removed in pandas 2.0, so pd.concat is used here instead):

import pandas as pd

frames = []
for i in json:
    df = pd.DataFrame.from_dict(data=i, orient='index').T
    frames.append(df)
all_df = pd.concat(frames, ignore_index=True)
Now you can go ahead, create a session to your database, and push all_df:
all_df.to_sql(con=your_session.bind, name='your_table_name', if_exists='your_preferred_method', index=False)
Using marshmallow-sqlalchemy
validate the incoming JSON
create general utilities for loading and dumping data
Define schemas
schema.py
from marshmallow import EXCLUDE
from marshmallow_sqlalchemy import ModelSchema
from app import db

class UserSchema(ModelSchema):
    class Meta(ModelSchema.Meta):
        model = Users
        sqla_session = db.session

user_schema_full = UserSchema(only=(
    'email',
    'displayName',
    'image',
    'points'
))
utils.py
The exact details below don't matter; the point is to create general utilities for going from JSON to ORM objects and from ORM objects to JSON. schema_partial is used for auto-generated primary keys.
def loadData(data, schema_partial, many=False,
             schema_full=None, instance=None):
    try:
        if instance is not None:
            answer = schema_full.load(data, instance=instance, many=many)
        else:
            answer = schema_partial.load(data, many=many)
    except ValidationError as errors:
        raise InvalidData(errors, status_code=400)
    return answer

def loadUser(data, instance=None, many=False):
    return loadData(data=data,
                    schema_partial=user_schema_full,
                    many=many,
                    schema_full=user_schema_full,
                    instance=instance)

def dumpData(load_object, schema, many=False):
    try:
        answer = schema.dump(load_object, many=many)
    except ValidationError as errors:
        raise InvalidDump(errors, status_code=400)
    return answer

def dumpUser(load_object, many=False):
    return dumpData(load_object, schema=user_schema_full, many=many)
Use loadUser and dumpUser within the API to produce clean, flat code.
api.py
@app.route('/users/', methods=['POST'])
def post_users():
    """Post many users"""
    users_data = request.get_json()
    users = loadUser(users_data, many=True)
    for user in users:
        db.session.add(user)
    object_dump = dumpUser(users, many=True)
    db.session.commit()
    return jsonify(object_dump), 201
I am using mongoengine to integrate with Flask, and I want to know how to get a document's object id. Every time I try, I get:

File "/var/www/flask_projects/iot_mongo/app.py", line 244, in post
    return jsonify(user.id)
AttributeError: 'BaseQuerySet' object has no attribute 'id'
class Test(Resource):
    def post(self):
        parser = reqparse.RequestParser()
        parser.add_argument('email', required=True, help='email')
        args = parser.parse_args()
        user = AdminUser.objects(email=args['email'])
        return jsonify(user.id)

api.add_resource(Test, '/test')

if __name__ == '__main__':
    app.run(debug=True)
I've been doing this. An example User model would be like~
class User(Document):
    first_name = StringField(required=True, max_length=50)
    last_name = StringField(required=True, max_length=50)
    username = StringField(required=True)
    password = StringField(required=True, min_length=6)

    def to_json(self):
        return {
            "_id": str(self.pk),
            "first_name": self.first_name,
            "last_name": self.last_name,
            "username": self.username,
            "password": self.password
        }
I convert the id to a string. I would then get a single object with~
user = User.objects.get(pk=user_id)
return user.to_json()
for a whole object, but if I just want the id I would do...
user.pk
I created my own to_json method that converts the primary key to a string because otherwise it would return "id": ObjectID("SomeID") instead of neatly printing "id": "SomeID".
Hope this helps!
If you want to find someone by email I suggest~
User.objects.get(email=args['email'])
Check out the documentation, Document.objects is a QuerySet object.
You seem to be expecting that this part of your code
user=AdminUser.objects(email=args['email']) # user will be a QuerySet
will give you a single result, which is not the case: it gives you a QuerySet with zero or more results. A QuerySet does not have an attribute id, which is why you get that error message when you try to access it here:

return jsonify(user.id)  # QuerySet does not have the attribute id

You need to fetch the actual result(s) you want from it. Assuming you are sure your query will return a single result, or you don't care that there might be more than one and just want the first, you probably want something along these lines:

user = AdminUser.objects(email=args['email']).first()  # extract first result
return jsonify(user)

Alternatively, returning all results would look like this:

users = AdminUser.objects(email=args['email']).all()  # extract all results
return jsonify(users)