Django TestCase: don't flush the DB after each test

I'm working on a Django API project with a rather unusual configuration (I think):
I have indeed two Django projects: one is the main API and one is the user API. Whenever I create a user using the main API, the user is in fact created in the database of the user API (the two APIs communicate over HTTP requests). In the main API, I keep a table of users that contains only a unique id. When the user is created in the user API, it's created with the same unique id as in the main API.
I have to do this because in production, I have to store the data in different servers.
Now comes my problem.
I want to write tests for my main API (for instance test the user creation, user update, user deletion). The problem is that when I run the tests, I have to run a separate instance of Django (using another port) that represents the user API. Running the tests, Django creates a test database for the main API but since I use http requests to communicate with the user API, there is no test database for the user API so I have to flush the DB after I run all the tests. Until now, I used the unittest library and everything was fine. But I would like to be able to override some settings during the tests (for instance, the address of the user API is a setting and I would like to have a different address for the tests). For that, I have to use django.test.TestCase but I have the following problem:
imagine I have a test_a method that creates a user A and a test_b method that creates a user B. With django.test.TestCase, test_a is run, user A is created with id 1. Then I believe that the test database (of the main API) is flushed because when test_b is run, user B is created with id 1 also. The problem is that between the two tests, the database of the user API is not flushed so I get an error because I cannot create user B in the test database.
I'm looking for an elegant way to deal with this problem but I really have no idea.
(Sorry, this is quite long but I wanted to be a little bit precise).

Can't you do the DB flushing in the setUp method of your TestCase? Since that method runs once before each test you can have a clean DB for test_a and test_b.
To flush the db using a bash script, you can use subprocess, like so:
import subprocess

def setUp(self):
    subprocess.call(['<path-to-bash-script>', 'arg1', 'arg2'])
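If you also need different settings during the tests (the original goal), the same setUp can combine the per-test override with the flush. Here is a minimal stdlib sketch using unittest.mock in place of django.test's override_settings; the SETTINGS dict, the USER_API_URL key, and the port numbers are hypothetical stand-ins:

```python
import subprocess
import unittest
from unittest import mock

# Stand-in for the settings object the main API reads (hypothetical names).
SETTINGS = {"USER_API_URL": "http://localhost:8000"}

class UserCreationTests(unittest.TestCase):
    def setUp(self):
        # Point the code under test at the test instance of the user API
        # for the duration of each test; the original value is restored after.
        patcher = mock.patch.dict(SETTINGS, {"USER_API_URL": "http://localhost:8001"})
        patcher.start()
        self.addCleanup(patcher.stop)
        # Reset the user API's database before every test, mirroring what
        # Django does for the main API's test database:
        # subprocess.call(['<path-to-bash-script>', 'arg1', 'arg2'])

    def test_address_is_overridden(self):
        self.assertEqual(SETTINGS["USER_API_URL"], "http://localhost:8001")
```

With django.test.TestCase the override would instead be @override_settings(USER_API_URL=...) on the class or method; the subprocess call stays the same.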

Related

Flask-SQLAlchemy with multiple gunicorn workers causes inconsistent reads

I have developed a Flask application, and so far I have only ever deployed it using a single worker. The app connects to an SQLite DB using Flask-SQLAlchemy. In the beginning, I check if my DB already has data, and if not, I initialize some data, such as a default setup user, like this:
root_user = User.query.filter_by(username='root').one_or_none()
if not root_user:
    new_user = User(username="root", password_hash="SomeLongAndSecurePasswordHash")
    new_user.roles = [serveradmin_role]
    db.session.add(new_user)
    db.session.commit()
When I run this code with multiple gunicorn workers and threads, the workers crash because they each try to create a root user, which violates the UNIQUE constraint in the DB. Apparently they all read the DB at the same time, while the root user does not exist yet, and then they all try to write the user to the DB, which only works for one of the workers.
What would be a good way of preventing this? I feel like my code should just deal better with the SQLAlchemy error being thrown, or is there anything I am missing here? The same thing might also happen in production, if two people try to create the same user at exactly the same time, how would I deal with it there?
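One common pattern for this race is to attempt the insert and treat the duplicate-key error as "another worker got there first". Below is a framework-free sketch using the stdlib sqlite3 module; with Flask-SQLAlchemy the equivalent is catching sqlalchemy.exc.IntegrityError and calling db.session.rollback():

```python
import sqlite3

def ensure_root_user(conn):
    """Insert the seed user; tolerate another worker winning the race."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES ('root')")
    except sqlite3.IntegrityError:
        pass  # another worker created the row first; nothing to do

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT UNIQUE)")
ensure_root_user(conn)
ensure_root_user(conn)  # the duplicate insert is swallowed, not crashed on
```

The same insert-then-catch shape also answers the production case of two people creating the same user at the same moment: the database's UNIQUE constraint is the arbiter, and the application simply handles the loser gracefully.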

how to load pre-existing data flask-sqlalchemy

I am writing a REST API using flask_restful and managing the MySQL DB using flask-sqlalchemy. I would like to know what the best practice is for loading existing data into a table when the app starts.
I am currently calling the db.create_all() method within a function decorated with @app.before_first_request. I would like to then fill one of the created tables with existing data from a CSV file. Should the code that pushes the data live in a separate script or within that function?
Thanks!
I would separate loading initial database data from application initialization: in my experience the initial data doesn't change often, loading it can take some time if the file is big, and you usually don't need to reload it into the database every time the application starts.
I think you will most certainly need database migrations at some point in your application's development, so I would suggest setting up Flask-Migrate to handle that, and running its upgrade method on application creation (in the create_app method if you are using the Flask application factory pattern). Introducing migrations later, on a database already populated with actual data via db.create_all(), is a headache you can avoid now.
And for populating database with seed data I would go with Flask CLI or Flask-Script. In one of my recent projects I used Flask-Script for this, and created separate manage.py file which amongst other application management methods contained initial data seeding method which looked something like this:
@manager.command
def seed():
    """Load initial data into database."""
    db.session.add(...)
    db.session.commit()
And it was run on demand with the following command:
python manage.py seed
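As for the CSV part of the original question, the body of such a seed command can stream rows in with the stdlib csv module. A framework-free sketch using sqlite3 and a hypothetical name/price schema; with Flask-SQLAlchemy you would build model instances and db.session.add them instead:

```python
import csv
import io
import sqlite3

def seed_from_csv(conn, csv_file):
    """Bulk-insert rows from a CSV file with a 'name,price' header (hypothetical schema)."""
    reader = csv.DictReader(csv_file)
    rows = [(r["name"], float(r["price"])) for r in reader]
    with conn:  # one transaction for the whole load
        conn.executemany("INSERT INTO products (name, price) VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
# In real use this would be open('data.csv'); StringIO keeps the sketch self-contained.
seed_from_csv(conn, io.StringIO("name,price\nwidget,9.99\ngadget,4.50\n"))
```

Keeping the load in one executemany call inside a single transaction is noticeably faster than row-by-row inserts for larger files.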

How to log a user out in django (without the request)

In django I can log a user out with:
from django.contrib.auth import logout
logout(request)
However, how would I manually log a user out, for example if I want to "sign users out of all tabs"? That is, how do I flush all sessions for that user in my DB?
You can add the following method to your user object:
from django.utils import timezone
from django.contrib.sessions.models import Session

class User(models.Model):
    ...
    def remove_all_sessions(self):
        user_sessions = []
        # Only non-expired sessions need checking; expired ones are already dead.
        all_sessions = Session.objects.filter(expire_date__gte=timezone.now())
        for session in all_sessions:
            if str(self.pk) == session.get_decoded().get('_auth_user_id'):
                user_sessions.append(session.pk)
        return Session.objects.filter(pk__in=user_sessions).delete()
You can iterate over all sessions in DB, decode all of them, and delete those belonging to that user. But it's slow, particularly if your site has high traffic and there are lots of sessions.
If you need a faster solution, you can use a session backend that lets you query and get the sessions of a specific user. In these session backends, Session has a foreign key to User, so you don't need to iterate over all session objects:
django-qsessions (based on django's db, cached_db session backends)
django-user-sessions (based on django's db session backend)
Using these backends, deleting all sessions of a user can be done in a single line of code:
user.session_set.all().delete()
Disclaimer: I am the author of django-qsessions.
To understand this problem, consider what happens with the database backend. When a user logs in, Django adds a row to the django_session database table. Django updates this row each time the session data changes. If the user logs out manually, Django deletes the row. But if the user does not log out, the row never gets deleted. A similar process happens with the file backend.
Django does not provide automatic purging of expired sessions. Therefore, it’s your job to purge expired sessions on a regular basis. Django provides a clean-up management command for this purpose: clearsessions. It’s recommended to call this command on a regular basis, for example as a daily cron job.
Note that the cache backend isn’t vulnerable to this problem, because caches automatically delete stale data. Neither is the cookie backend, because the session data is stored by the users’ browsers.
so run:
$ ./manage.py clearsessions

How to maintain a Class instance for each User's session in Flask? [duplicate]

This question already has an answer here: Store large data or a service connection per Flask session (1 answer). Closed 5 years ago.
I am currently building a web application on the Flask framework; it will serve around 10 user accounts once finished.
One of its frequently used key features is built around a compute-intensive class (let's call it Class A, defined in a.py), and I have run into some issues with it that I'm seeking solutions for.
Originally, I imported Class A directly into one of the view files and created a route function for it: once a user clicks the button that invokes this route, the route function creates an instance of Class A, and this instance runs on the received data (e.g. JSON). But I found the system can slow down, because an instance of Class A has to be created every single time a user uses the feature (and there can be 10 users), and Class A is too heavy to be created again and again.
Therefore I am wondering: is there any way to create the instance of Class A only once (e.g. when the Flask application starts), so that each logged-in user can access this instance rather than create it over and over again?
Thanks in advance
Flask requests are stateless, so to preserve data for a user across requests the options are limited. Here are some ideas:
Serialize the class instance, store it in the Flask session (just a wrapper for browser session cookies), and retrieve it later.
Store it in a database and retrieve it when needed.
Pickle it to disk keyed by user name, and load it when needed.
Alternatively, depending on the application, a cache solution might be good enough (e.g. Flask-Caching). The route/view would instantiate the class the first time it's called and return a value. If the view is called again with the same arguments/data, the previous return value is returned without running the view function again.
Flask has extensions, which can be set up at startup, exactly like you need.
The docs are here: http://flask.pocoo.org/docs/0.12/extensiondev/
You can probably ignore the whole first part about diskutils etc. and jump to "Initializing Extensions".
We used this exact extension point for this purpose and it works as it should.
You could also use a singleton pattern in your class, but the extension point works well with the rest of the flask ecosystem.
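The "create once, share across requests" idea can be sketched without any Flask specifics: build the heavy object a single time behind a lock, and have every view fetch the shared instance. HeavyModel and get_model are hypothetical names standing in for Class A:

```python
import threading

class HeavyModel:
    """Stand-in for the compute-heavy Class A (hypothetical name)."""
    def __init__(self):
        # Imagine an expensive model load / setup happening here.
        self.calls = 0

    def run(self, data):
        self.calls += 1
        return {"echo": data}

_instance = None
_lock = threading.Lock()

def get_model():
    """Create HeavyModel once; every later call returns the same object."""
    global _instance
    if _instance is None:           # fast path: no lock once created
        with _lock:                 # slow path: guard the first creation
            if _instance is None:
                _instance = HeavyModel()
    return _instance
```

Each route would then call get_model().run(...) instead of constructing Class A itself. The lock matters if the server uses threaded workers; with multiple gunicorn worker processes, each process still gets its own copy, which is usually acceptable for 10 users.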

Do I authenticate at database level, at Flask User level, or both?

I have an MS-SQL deployed on AWS RDS, that I'm writing a Flask front end for.
I've been following some intro Flask tutorials, all of which seem to pass the DB credentials in the connection string URI. I'm following the tutorial here:
https://medium.com/@rodkey/deploying-a-flask-application-on-aws-a72daba6bb80#.e6b4mzs1l
For deployment, do I prompt for the DB login info and add to the connection string? If so, where? Using SQLAlchemy, I don't see any calls to create_engine (using the code in the tutorial), I just see an initialization using config.from_object, referencing the config.py where the SQLALCHEMY_DATABASE_URI is stored, which points to the DB location. Trying to call config.update(dict(UID='****', PASSWORD='******')) from my application has no effect, and looking in the config dict doesn't seem to have any applicable entries to set for this purpose. What am I doing wrong?
Or should I be authenticating using Flask-User, and then get rid of the DB level authentication? I'd prefer authenticating at the DB layer, for ease of use.
The tutorial you are using uses Flask-Sqlalchemy to abstract the database setup stuff, that's why you don't see engine.connect().
Frameworks like Flask-Sqlalchemy are designed around the idea that you create a connection pool to the database on launch, and share that pool amongst your various worker threads. You will not be able to use that for what you are doing... it takes care of initializing the session and things early in the process.
Because of your requirements, I don't know that you'll be able to make any use of things like connection pooling. Instead, you'll have to handle that yourself. The actual connection isn't too hard...
from sqlalchemy import create_engine

engine = create_engine('dialect://username:password@host/db')
connection = engine.connect()
result = connection.execute("SOME SQL QUERY")
for row in result:
    ...  # do something with each row
connection.close()
The issue is that you're going to have to do that in every endpoint. A database connection isn't something you can store in the session; you'll have to store the credentials there and do a connect/disconnect loop in every endpoint you write. Worse, to keep those credentials in the session from becoming a horrible security leak, you'll have to figure out either encrypted sessions or server-side sessions (without a DB connection!).
I promise you, it will be easier both now and in the long run to figure out a simple way to authenticate users so that they can share a connection pool that is abstracted out of your app endpoints. But if you HAVE to do it this way, this is how you will do it. (make sure you are closing those connections every time!)
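If you do go the per-endpoint route, you can at least factor the connect/close boilerplate into a context manager so no endpoint can forget the close(). A stdlib sketch with a stand-in connect(); with SQLAlchemy you would call engine.connect() there, or use `with engine.connect() as connection:` directly:

```python
from contextlib import contextmanager

class StubConnection:
    """Stand-in for a real DB connection object."""
    def __init__(self):
        self.closed = False

    def execute(self, sql):
        return [("row",)]  # pretend result set

    def close(self):
        self.closed = True

def connect(credentials):
    """Stand-in for create_engine(...).connect() built from per-user credentials."""
    return StubConnection()

@contextmanager
def db_connection(credentials):
    conn = connect(credentials)
    try:
        yield conn
    finally:
        conn.close()  # always runs, even if the endpoint's query raises

# An endpoint body shrinks to:
with db_connection({"user": "u", "password": "p"}) as conn:
    rows = conn.execute("SOME SQL QUERY")
```

This doesn't remove the per-request connection cost, only the repetition; the shared-pool approach the answer recommends is still the better long-term design.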
