Django migration doesn't progress and locks the database - python

I tried to deploy to our production environment (the deploy includes running migrations), but my Django migrations (for example, adding columns) very often stall and never finish.
I'm working with PostgreSQL 9.3, and I found one reason for this problem: if PostgreSQL has an active transaction holding a lock on the table, the ALTER TABLE query cannot proceed. So until now, restarting the PostgreSQL service before migrating has been my workaround, but I think this is a bad idea.
Is there a better way to make deployments go smoothly?

Open connections will likely block schema updates. If you can't wait for existing connections to finish, or if your environment is such that long-running connections are used, you may need to halt all connections while you run the update(s).
The downtime, if it's likely to be significant for you, could be mitigated if you have a read-only slave that could stay online. If not, ensuring your site fails over to some sort of error/explanation page or redirect would at least avoid raw failure-code responses to requests that come in, if downtime for migrations is acceptable.
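If it helps to see what is blocking the ALTER TABLE before migrating, a rough sketch along these lines can be used; the connection string is a placeholder, and terminating backends should only be done for sessions you know are safe to kill:

import psycopg2

# Placeholder connection string; adjust for your environment.
conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
conn.autocommit = True
cur = conn.cursor()

# Sessions that are idle inside an open transaction hold locks that will
# block ALTER TABLE until they commit or roll back.
cur.execute("""
    SELECT pid, usename, state, query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
""")
for pid, user, state, query in cur.fetchall():
    print(pid, user, state, query)

# Optionally terminate them (only if you are sure they are safe to kill):
# cur.execute("SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
#             "WHERE state = 'idle in transaction' AND pid <> pg_backend_pid()")

On PostgreSQL 9.3 you can also SET lock_timeout for the migration session, so the ALTER TABLE fails quickly instead of hanging behind an open transaction.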

Related

How do I analyze what is hanging my Flask application

I have a Python Flask web application, which uses a PostgreSQL database.
When I put load on my application, it stops responding. This only happens when I request pages that use the database.
My setup:
nginx frontend (although in my test environment, skipping this tier doesn't make a difference), connecting via a UNIX socket to:
gunicorn application server with 3 child processes, connecting via a UNIX socket to:
pgbouncer, a connection pooler for PostgreSQL, connecting via TCP/IP to:
postgresql 13, the database server.
I need pgbouncer because SQLAlchemy only has connection pooling per process. If I don't use pgbouncer, my database gets overloaded with connection requests very quickly.
I have a test environment on Debian Linux (with nginx) and on my iMac, and the application hang occurs on both machines.
I put load on the application with hey, an HTTP load generator. I use the defaults, which generate 200 requests with 50 workers. The test page issues two queries to the database.
When I run my load test, I see gunicorn getting worker timeouts. It kills the timed-out processes and starts up new ones. Eventually (after a lot of timeouts) everything is fine again. Because of this, I lowered the statement timeout setting of PostgreSQL; first it was 30 seconds, later I set it to 15 seconds. Gunicorn's worker timeouts now happened more quickly. (I don't understand this behaviour; why would gunicorn recycle a worker when a query times out?)
When I look at pgbouncer with the SHOW CLIENTS; command, I see some waiting clients. I think this is a hint of the problem: my web application is waiting on pgbouncer, and pgbouncer seems to be waiting for Postgres. When the waiting entries are gone, the application behaves normally again (trying a few requests). Also, when I restart the gunicorn process, everything goes back to normal.
But with my application under stress, when I look at PostgreSQL (querying over a direct connection, bypassing pgbouncer), I can't see anything wrong, or waiting, or whatever. When I query pg_stat_activity, all I see are idle connections (except for the connection I use to query the view).
How do I debug this? I'm a bit stuck. pg_stat_activity should show running queries, but that doesn't seem to be the case. Is there something else wrong? How do I get my application to work under load, and how do I analyze this?
So, I solved my own question.
As it turned out, not being able to see what SQLAlchemy was doing was the most confusing part. I could see what Postgres was doing (pg_stat_activity) and also what pgbouncer was doing (SHOW CLIENTS;).
SQLAlchemy does have echo and echo_pool settings, but for some reason these didn't help me.
What helped me was the realization that SQLAlchemy uses standard Python logging. For me, the best way to hook into it was to add the default Flask logging handler to its loggers, something like this:
import logging

log_level = "INFO"
app.logger.setLevel(log_level)
# Attach the Flask handler to SQLAlchemy's standard loggers so their output
# shows up alongside the application's own log messages.
for log_name in ["sqlalchemy.dialects", "sqlalchemy.engine", "sqlalchemy.orm", "sqlalchemy.pool"]:
    additional_logger = logging.getLogger(log_name)
    additional_logger.setLevel(log_level)
    additional_logger.addHandler(app.logger.handlers[0])
(Of course I can control my solution via a config file, but I left that part out for clarity.)
Now I could see what was actually happening. Still no statistics like with the other tiers, but this helped.
Eventually I found the problem. I was using two (slightly) different connection strings to the same database. I had them because the first was for authentication (used by Flask-Session and Flask-Login via the ORM), and the other was for application queries (used by my own queries via PugSQL). In the end, the different connection strings were not necessary. However, they made SQLAlchemy do strange things under stress.
I'm still not sure what the actual problem was (probably there were two connection pools fighting each other), but this solved it.
Nice benefit: I don't need pgbouncer in my situation, so that removes a lot of complexity.
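Since each gunicorn worker now relies on SQLAlchemy's built-in pool instead of pgbouncer, it can be worth sizing that pool explicitly. A minimal sketch; the DSN and numbers are placeholders, not values from the setup above:

from sqlalchemy import create_engine

# One engine per gunicorn worker; SQLAlchemy pools connections per engine.
engine = create_engine(
    "postgresql+psycopg2://user:password@localhost/appdb",  # placeholder DSN
    pool_size=5,         # steady-state connections kept open per worker
    max_overflow=5,      # extra connections allowed under burst load
    pool_timeout=30,     # seconds to wait for a free connection before raising
    pool_pre_ping=True,  # verify connections are alive before handing them out
)

With 3 workers this caps the total at 3 * (pool_size + max_overflow) = 30 connections, which should stay comfortably below PostgreSQL's max_connections.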

sqlite3: avoiding "database locked" collision

I am running two Python files in parallel on one CPU, both of which make use of the same sqlite3 database. I am handling the sqlite3 database using SQLAlchemy, and my understanding is that SQLAlchemy handles all the threading issues within one app. My question is how to handle access from the two different apps.
One of my two programs is a Flask application and the other is a cronjob which updates the database from time to time.
It seems that even read-only tasks on the sqlite database lock the database, meaning that if both apps want to read or write at the same time I get this error:
OperationalError: (sqlite3.OperationalError) database is locked
Let's assume that my cronjob app runs every 5 minutes. How can I make sure that there are no collisions between my two apps? I could write some read flag into a file which I check before accessing the database, but it seems to me there should be a standard way to do this?
Furthermore, I am running my app with gunicorn, and in principle it is possible to have multiple jobs running... so I eventually want more than 2 parallel jobs for my Flask app...
It's true, SQLite isn't built for this kind of application. SQLite is really for lightweight, single-threaded, single-instance applications.
SQLite connections are one per instance, and if you start getting into some kind of threaded multiplexer (see https://www.sqlite.org/threadsafe.html) it'd be possible, but it's more trouble than it's worth. There are other solutions that provide that functionality out of the box; take a look at PostgreSQL or MySQL. Both are open source, well documented, well supported, and support the kind of concurrency you need.
I'm not sure how SQLAlchemy handles connections, but if you were using Peewee ORM then the solution is quite simple.
When your Flask app receives a request, you open a connection to the DB. Then, when Flask sends the response, you close the DB connection.
Similarly, in your cron script, open a connection when you start to use the DB, then close it when the process is finished.
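A minimal sketch of that per-request pattern with Peewee and Flask, assuming Peewee 3.x; the database filename is a placeholder:

from flask import Flask
from peewee import SqliteDatabase

app = Flask(__name__)
db = SqliteDatabase("app.db")  # placeholder path

@app.before_request
def open_db_connection():
    # Open (or reuse) a connection for the duration of the request.
    db.connect(reuse_if_open=True)

@app.teardown_request
def close_db_connection(exc):
    # Release the connection as soon as the response is done,
    # so the cron job isn't locked out any longer than necessary.
    if not db.is_closed():
        db.close()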
Another thing you might consider is using SQLite in WAL mode. This can improve concurrency. You set the journaling mode with a PRAGMA query when you open your connection.
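If you stay with SQLAlchemy, one way to apply that PRAGMA to every new connection is a connect-time event listener. A minimal sketch; the database path is a placeholder, and the busy_timeout line is an extra suggestion of mine, not part of the advice above:

from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///app.db")  # placeholder path

@event.listens_for(engine, "connect")
def set_sqlite_pragmas(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    cursor.execute("PRAGMA busy_timeout=30000")  # wait up to 30 s for a lock instead of erroring
    cursor.close()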
For more info, see http://charlesleifer.com/blog/sqlite-small-fast-reliable-choose-any-three-/

master slave postgresql with logging and monitoring for a django application

I have a django application running.
The database backend that I use for it is PostgreSQL.
Everything is working fine for me.
Now I want to create master-slave replication for my database, such that:
Whatever change happens on master, is replicated on slave.
If the master shuts down, the slave takes charge, and an error notification is sent.
Backup is created automatically of the database.
Logging is taken care of.
Monitoring is taken care of.
I went through https://docs.djangoproject.com/en/dev/topics/db/multi-db/ (the entire article).
But I don't have much idea how to implement all 5 steps above. As you will have gathered, I don't have much experience, so please suggest pointers on how to proceed. Thanks.
Have I missed anything that should be kept in mind for the database?
It sounds like you want a dual-node HA setup for PostgreSQL, using synchronous streaming replication and failover.
Check out http://repmgr.org/ for one tool that'll help with this, particularly when coupled with a PgBouncer front-end. You may also want to read about "heartbeat", "high availability", "fencing" and "STONITH".
You need to cope with the master continuing to run but failing, not just shutting down. Consider what happens if the master runs out of disk space: all write queries will return errors, but it won't shut down or crash.
This is really an issue of database administration / server management.
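For the Django side of the question, read traffic can be pointed at the replica with a database router once it is listed in DATABASES. A minimal sketch, assuming a second alias named "replica" (both the alias and the class name are illustrative):

class PrimaryReplicaRouter:
    """Send reads to the replica and writes to the primary ("default")."""

    def db_for_read(self, model, **hints):
        return "replica"

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases point at the same replicated data.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only run migrations against the primary.
        return db == "default"

The router is enabled with DATABASE_ROUTERS = ["path.to.PrimaryReplicaRouter"] in settings.py. Failover, backups, logging and monitoring still have to be handled on the PostgreSQL side, as described above.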

Postgres 8.4.4 + psycopg2 + python 2.6.5 + Win7 instability

You can see the combination of software components I'm using in the title of the question.
I have a simple 10-table database running on a Postgres server (Win 7 Pro). I have client apps (Python, using psycopg2 to connect to Postgres) that connect to the database at random intervals to conduct relatively light transactions. Only one client app at a time does any kind of heavy transaction, and those are typically < 500 ms. The rest spend more time connecting than actually waiting for the database to execute the transaction. The point is that the database is under light load, and the load is evenly split between reads and writes.
My client apps run as servers/services themselves. I've found that it is pretty common for me to be able to (1) take the Postgres server completely down, and (2) ruin the database, by killing a client app with a keyboard interrupt.
By (1), I mean that the Postgres process on the server aborts and the service needs to be restarted.
By (2), I mean that the database crashes again whenever a client tries to access it after the server has restarted and (presumably) finished its "recovery mode" operations. I need to delete the old database/schema from the database server and rebuild it each time to return it to a stable state. (After recovery mode, I have tried various combinations of VACUUMs to see whether that improves stability; the vacuums run, but the server still goes down quickly when clients try to access the database again.)
I don't recall seeing the same effect when I kill the client app using "taskkill", only when using a keyboard interrupt to take down the Python process. It doesn't happen all the time, but frequently enough that it's a major concern (25%?).
I'm really surprised that anything on a client would actually be able to take down an "enterprise class" database. Can anyone share tips on how to improve robustness, and hopefully help me understand why this is happening in the first place? Thanks, M
If you're having problems with postgresql acting up like this, you should read this page:
http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
For an example of a real bug, and how to ask a question that gets action and answers, read this thread.
http://archives.postgresql.org/pgsql-general/2010-12/msg01030.php

Turning on DEBUG on a Django production site

I'm using the Django ORM in a non-Django application and would like to turn on the DEBUG setting so that I can periodically log my queries. So I have something vaguely like this:
from django.db import connection

def thread_main_loop():
    while keep_going:
        connection.queries[:] = []
        do_something()
        some_logging_function(connection.queries)
I would like to do this on my production server, but the doc warns, "It is also important to remember that when running with DEBUG turned on, Django will remember every SQL query it executes. This is useful when you are debugging, but on a production server, it will rapidly consume memory."
Because the connection.queries list is cleared every time through the main loop of each thread, I believe that Django query logging will not cause my application to consume memory. Is this correct? And are there any other reasons not to turn DEBUG on in a production environment if I'm only using the Django ORM?
In DEBUG mode any error in your application will lead to the detailed Django stack trace. This is very undesirable in a production environment, as it will probably leak sensitive information that attackers can use against your site. Even if your application seems pretty stable, I wouldn't risk it.
I would rather employ a middleware that logs queries to a file, or take statistics from the database directly, e.g. (for MySQL):
watch -n 1 mysqladmin --user=<user> --password=<password> processlist
Edit:
If you are only using the Django ORM, then AFAIK only two things will be different:
Queries will be recorded through the CursorDebugWrapper.
If a query results in a database warning, this will raise an exception.
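A minimal sketch of such a query-logging middleware; the class and logger names are illustrative, and it only works while connection.queries is being populated (i.e. DEBUG is on):

import logging

from django.db import connection, reset_queries

logger = logging.getLogger("query_log")  # point a FileHandler at this logger elsewhere

class QueryLogMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        reset_queries()                   # start the request with an empty query log
        response = self.get_response(request)
        for q in connection.queries:      # each entry is a dict with "sql" and "time"
            logger.info("%s (%s s)", q["sql"], q["time"])
        return response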
