How to load pre-existing data with Flask-SQLAlchemy - Python

I am writing a REST API using flask_restful and managing the MySQL db using Flask-SQLAlchemy. I would like to know the best practice for loading existing data into a table when the app starts.
I am currently calling the db.create_all() method within an endpoint decorated with @app.before_first_request. I would like to then fill one of the created tables with existing data from a CSV file. Should the code that pushes the data be in a separate script or within that function?
Thanks!

I would separate loading initial database data from application initialization. In my experience, initial data rarely changes, loading it can take some time if the file is big, and you usually don't need to reload it into the database every time the application starts.
I think you will most certainly need database migrations at some point in your application's development, so I would suggest setting up Flask-Migrate to handle that and running its upgrade method on application creation (in the create_app method, if you are using the Flask application factory pattern). Introducing migrations later, on a database already populated via db.create_all(), is a headache you can avoid by starting with them now.
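If you are using an application factory, a minimal sketch of running migrations on startup could look like this (create_app and the database URI are illustrative; flask_migrate.upgrade() is the programmatic equivalent of the flask db upgrade command):
from flask import Flask
from flask_migrate import Migrate, upgrade
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()
migrate = Migrate()

def create_app():
    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "mysql://user:password@localhost/mydb"
    db.init_app(app)
    migrate.init_app(app, db)
    with app.app_context():
        upgrade()  # apply any pending migrations instead of calling db.create_all()
    return app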
For populating the database with seed data I would go with the Flask CLI or Flask-Script. In one of my recent projects I used Flask-Script and created a separate manage.py file which, amongst other application management commands, contained an initial data seeding method that looked something like this:
@manager.command
def seed():
    """Load initial data into database."""
    db.session.add(...)
    db.session.commit()
And it was run on demand with the following command:
python manage.py seed
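Since the question mentions seeding from a CSV file, here is a hedged sketch of what that seed command might look like; the User model, its columns, and the users.csv path are assumptions for illustration:
import csv

@manager.command
def seed():
    """Load initial data into the database from a CSV file."""
    with open("users.csv", newline="") as f:
        for row in csv.DictReader(f):
            # Column names here are hypothetical; adjust to your model.
            db.session.add(User(name=row["name"], email=row["email"]))
    db.session.commit()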

Related

Flask-SQLAlchemy with multiple gunicorn workers causes inconsistent reads

I have developed a Flask application, and so far I have only ever deployed it using a single worker. The app connects to an SQLite DB using Flask-SQLAlchemy. In the beginning, I check if my DB already has data, and if not, I initialize some data like a default setup user like this:
root_user = User.query.filter_by(username='root').one_or_none()
if not root_user:
    new_user = User(username="root", password_hash="SomeLongAndSecurePasswordHash")
    new_user.roles = [serveradmin_role]
    db.session.add(new_user)
    db.session.commit()  # the write that collides under multiple workers
When I run this code with multiple gunicorn workers and threads, the workers crash because they all try to create a root user, which violates the UNIQUE constraint in the DB. Apparently they all read the DB at the same time, while the root user does not exist yet, and then they all try to write the user to the DB, which only works for one of the workers.
What would be a good way of preventing this? I feel like my code should just deal better with the SQLAlchemy error being thrown, or is there something I am missing here? The same thing might also happen in production: if two people try to create the same user at exactly the same time, how would I deal with it there?
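One way to tolerate this race, in the spirit of the question's own suggestion to handle the SQLAlchemy error, is to attempt the insert and treat a UNIQUE violation as "another worker got there first". A sketch only, reusing the names from the snippet above:
from sqlalchemy.exc import IntegrityError

root_user = User.query.filter_by(username='root').one_or_none()
if not root_user:
    new_user = User(username="root", password_hash="SomeLongAndSecurePasswordHash")
    new_user.roles = [serveradmin_role]
    db.session.add(new_user)
    try:
        db.session.commit()
    except IntegrityError:
        # Another worker (or user) created 'root' between our read and write.
        db.session.rollback()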

Do I authenticate at database level, at Flask User level, or both?

I have an MS SQL database deployed on AWS RDS that I'm writing a Flask front end for.
I've been following some intro Flask tutorials, all of which seem to pass the DB credentials in the connection string URI. I'm following the tutorial here:
https://medium.com/@rodkey/deploying-a-flask-application-on-aws-a72daba6bb80#.e6b4mzs1l
For deployment, do I prompt for the DB login info and add it to the connection string? If so, where? Using SQLAlchemy, I don't see any calls to create_engine (using the code in the tutorial); I just see an initialization using config.from_object, referencing the config.py where the SQLALCHEMY_DATABASE_URI is stored, which points to the DB location. Trying to call config.update(dict(UID='****', PASSWORD='******')) from my application has no effect, and the config dict doesn't seem to have any applicable entries to set for this purpose. What am I doing wrong?
Or should I be authenticating using Flask-User, and then get rid of the DB level authentication? I'd prefer authenticating at the DB layer, for ease of use.
The tutorial you are using relies on Flask-SQLAlchemy to abstract away the database setup, which is why you don't see engine.connect().
Frameworks like Flask-SQLAlchemy are designed around the idea that you create a connection pool to the database on launch and share that pool amongst your various worker threads; they take care of initializing the session and engine early in the process. You will not be able to use that for what you are doing.
Because of your requirements, I don't think you'll be able to make use of connection pooling at all. Instead, you'll have to handle connections yourself. The actual connection isn't too hard:
engine = create_engine('dialect://username:password@host/db')
connection = engine.connect()
result = connection.execute("SOME SQL QUERY")
for row in result:
    ...  # do something with each row
connection.close()
The issue is that you're going to have to do that in every endpoint. A database connection isn't something you can store in the session, so you'll have to store the credentials there instead and do a connect/disconnect cycle in every endpoint you write. Worse, you'll have to figure out either encrypted sessions or server-side sessions (without a db connection!) to keep those stored credentials from becoming a horrible security leak.
I promise you, it will be easier both now and in the long run to figure out a simple way to authenticate users so that they can share a connection pool that is abstracted out of your app endpoints. But if you HAVE to do it this way, this is how you will do it. (Make sure you are closing those connections every time!)
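For completeness, a sketch of the per-request connect/disconnect pattern described above; the session keys, host, and driver in the URL are illustrative, and keeping credentials in the session assumes the encrypted or server-side sessions already discussed:
from flask import session
from sqlalchemy import create_engine, text

def run_query(sql):
    # Build an engine from the per-user credentials kept in the session.
    engine = create_engine(
        "mssql+pyodbc://{user}:{password}@myhost/mydb".format(
            user=session["db_user"], password=session["db_password"]
        )
    )
    connection = engine.connect()
    try:
        return list(connection.execute(text(sql)))
    finally:
        connection.close()  # close every time, as stressed above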

Setup Cassandra DB in django using cqlengine but without using django-cassandra-engine

I'm a Django beginner and have developed one app using MySQL as the primary DB. In my next project I have to use a Cassandra db via https://github.com/cqlengine/cqlengine, but without using https://github.com/r4fek/django-cassandra-engine (which is a wrapper over cqlengine?).
I don't have a clue how to start. How and where should I create the db connection, and then create the models in the models.py file?
Should I create the connection in the __init__.py file? In views.py? What would be the most efficient way?
It would be great (for future readers too) if someone provided a simple configuration and a model.
The twissandra demo should be a good example of how to build an app using Cassandra and Django.
In this implementation there is no models.py and the connection is maintained in the file cass.py.
You'll see cass.py also hosts all the functions required to return data from the C* database and build the objects used by the system. This is where you would swap in your CqlEngine code in place of those API requests.
I hope these resources get you pointed in the right direction.
Rustyrazorblade shows an easy way to accomplish this via his CQLEngine tutorial branch HERE.
You can easily setup the connection by doing something like this in your_app_project/models/connection.py:
from cqlengine import management
from cqlengine.connection import setup

def connect():
    setup(["127.0.0.1", "127.0.1.1", "127.0.1.2"], "tutorial", retry_connect=True)
    management.create_keyspace("tutorial", replication_factor=1, strategy_class="SimpleStrategy")
In this example, "tutorial" is the keyspace we are using, strategy_class is the replication strategy your C* instance uses, replication_factor is the number of replicas stored throughout the ring, 127.0.0.1 is a Cassandra cluster node IP address (you can pass a list or a string), and retry_connect specifies whether to attempt to reconnect after a connection failure.
This is where new C* users often get confused: you can call this anytime before syncing the C* tables or running a C* query.
So, you'll want to do something like:
from cqlengine.management import sync_table
from models.connection import connect
from models.somemodels import MyCassandraModel
# This will fire off our previously setup 'connect' method
connect()
# This will setup the Model as a table in your C* DB
sync_table(MyCassandraModel)
You can even drop this into manage.py, just as long as that CQLEngine setup() is properly executed.

Django TestCase: don't flush the DB after each test

I'm working on a Django API project with a rather unusual configuration (I think):
I actually have two Django projects: one is the main API and one is the user API. Whenever I create a user using the main API, the user is in fact created in the database of the user API (the two APIs communicate over HTTP requests). In the main API, I keep a table of users that contains only a unique id. When a user is created in the user API, it is created with the same unique id as in the main API.
I have to do this because in production, I have to store the data in different servers.
Now comes my problem.
I want to write tests for my main API (for instance, test user creation, update, and deletion). The problem is that when I run the tests, I have to run a separate Django instance (on another port) that represents the user API. During the tests, Django creates a test database for the main API, but since I use HTTP requests to communicate with the user API, there is no test database for the user API, so I have to flush its DB after I run all the tests. Until now I used the unittest library and everything was fine, but I would like to be able to override some settings during the tests (for instance, the address of the user API is a setting, and I would like to use a different address in the tests). For that, I have to use django.test.TestCase, which gives me the following problem:
Imagine I have a test_a method that creates a user A and a test_b method that creates a user B. With django.test.TestCase, test_a runs and user A is created with id 1. Then, I believe, the test database (of the main API) is flushed, because when test_b runs, user B is also created with id 1. The problem is that between the two tests the database of the user API is not flushed, so I get an error because I cannot create user B there.
I'm looking for an elegant way to deal with this problem, but I really have no idea.
(Sorry, this is quite long, but I wanted to be a little precise.)
Can't you do the DB flushing in the setUp method of your TestCase? Since that method runs before each test, you can have a clean DB for both test_a and test_b.
To flush the db using a bash script, you can use subprocess, like so:
def setUp(self):
    import subprocess
    subprocess.call(['<path-to-bash-script>', 'arg1', 'arg2'])
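Since the question also wants to override settings such as the user API address, here is a sketch of how the flush-in-setUp idea could combine with Django's settings override; USER_API_URL is a hypothetical setting name, and the script path is the placeholder from above:
from django.test import TestCase, override_settings

@override_settings(USER_API_URL="http://localhost:8001")  # hypothetical setting
class UserTests(TestCase):
    def setUp(self):
        import subprocess
        # Reset the user API's database before every test so the ids line up.
        subprocess.call(['<path-to-bash-script>', 'arg1', 'arg2'])

    def test_a(self):
        ...  # create user A against the overridden user API address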

Update and deploy PostgreSQL schema to Heroku

I have a PostgreSQL schema that resides in a schema.sql file that gets run each time a database connection is initiated in Python. It looks something like:
CREATE TABLE IF NOT EXISTS users (
    id SERIAL PRIMARY KEY,
    facebook_id TEXT NOT NULL,
    name TEXT NOT NULL,
    access_token TEXT,
    created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
The app is deployed on Heroku, using their PostgreSQL and everything works as expected.
Now, what if I want to change the structure of my users table a bit? What is the easiest and best way to do this? I thought of writing an ALTER... line in schema.sql for each change I want to make to the database, but I don't think this is the best approach, since after some time the schema file will be full of ALTERs and it will slow down my app.
What's the indicated way to deploy changes made to a database?
Running a hard-coded script on each connection is not a great way to handle schema management.
You need to either manage the schema manually, or use a full-fledged tool that keeps a schema version identifier in the database, checks it, and applies a script to upgrade to the next schema version whenever it's behind the latest one. Rails calls this "migrations", and it kind-of works. If you're using Django, it has schema management too.
If you're not using a framework like that, I suggest writing your own schema upgrade scripts. Add a "schema_version" table with a single row. SELECT it when the app first starts after a redeploy, and if it's lower than the current version the app knows about, apply the update scripts in order, e.g. schema_1_to_2, schema_2_to_3, etc.
I don't recommend doing this on connect; do it on app start, or better, as a special maintenance command. If you do it on every connection you'll have multiple connections trying to make the same changes, and you'll end up with duplicated columns and all sorts of other mess.
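A minimal sketch of that schema_version approach, run once at app start; the table layout, version constant, and script file names are illustrative:
import psycopg2

CURRENT_VERSION = 3
UPGRADE_SCRIPTS = {1: "schema_1_to_2.sql", 2: "schema_2_to_3.sql"}

def upgrade_schema(dsn):
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur:  # one transaction, committed on success
        cur.execute("SELECT version FROM schema_version")
        version = cur.fetchone()[0]
        while version < CURRENT_VERSION:
            with open(UPGRADE_SCRIPTS[version]) as f:
                cur.execute(f.read())  # run the next upgrade script
            version += 1
            cur.execute("UPDATE schema_version SET version = %s", (version,))
    conn.close()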
I support several Django apps on Heroku with Postgres. I just connect via pgAdmin and run my scripts when changes are required. I don't see any need to run a script every time a connection is made.
