Django ORM and multiprocessing

I am using the Django ORM in a standalone Python script, i.e. it is not running in the context of a normal Django project.
I am also using the multiprocessing module, and the different processes in turn make queries.
The processes ran successfully for about an hour and then exited with this message:
"IOError: [Errno 32] Broken pipe"
Upon further diagnosis and debugging, the error pops up when I call save() on a model instance.
I am wondering:
Is the Django ORM process-safe?
If not, why else would this error arise?
Cheers
Ankur
Found the answer: I was calling return right after starting the process. The error sneaked in when I cut and pasted a function.

It's a little hard to say without more information, but the problem is probably caused by having an open database connection as you spawn new processes and then trying to use that same connection in the separate processes. Don't reuse database connections from the parent process in the multiprocessing workers you spawn; always recreate them, as in the sketch below.
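A minimal sketch of that pattern for a standalone script, assuming Django 1.8+ (for connections.close_all()); the settings module and the Task model are placeholders:

    import multiprocessing
    import os

    # Placeholder settings module for a standalone (non-project) script.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
    import django
    django.setup()

    from django.db import connections
    from myapp.models import Task  # placeholder model


    def worker(pk):
        # The first query in this process transparently opens a fresh connection.
        obj = Task.objects.get(pk=pk)
        obj.save()


    if __name__ == "__main__":
        # Close the parent's connections so children don't inherit live sockets.
        connections.close_all()
        procs = [multiprocessing.Process(target=worker, args=(pk,)) for pk in (1, 2, 3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()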

Related

Run code on first Django start

I have a Django application that displays a web page with data from a model, based on the primary key passed in the URL. This all works fine, and the Django component is working perfectly for the most part.
My question, though (and I have tried multiple methods, such as using an AppConfig), is how I can arrange for code to run when the Django server boots up that creates a separate thread to monitor an external source and log valid data from that source into the database as model instances.
I have the threading code written, along with the section that creates the model instance and saves it to the database. My issue is that if I try to use an AppConfig to create the thread, I get a django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet. error and the server does not boot up.
Where would be the appropriate place for this code? Is my approach to the matter incorrect?
Trying to use threading to get around blocking processes like web servers is an exercise in pain. I've done it before, and it's fragile and often yields unpredictable results.
A much easier idea is to create a separate worker that runs in a totally different process that you start separately. It would have the same database access and could even use your Django models. This is how hosts like Heroku approach this problem. It comes with the added benefit of being able to be tested separately and doesn't need to run at all while you're working on your main Django application.
These days, with a multitude of virtualization options like Vagrant and containerization options like Docker, running parallel processes and workers is trivial. In the wild, they may literally run on separate servers, with your database on yet another server. As mentioned in the comments, starting a worker process can easily be delegated to a separate Django management command, as sketched below. This, in turn, can fairly easily be turned into separate worker processes by gunicorn on your web server.
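A hedged sketch of such a worker as a management command, assuming an app named monitor and a hypothetical poll_source() helper for the external source; it would live in monitor/management/commands/run_monitor.py and be started separately with python manage.py run_monitor:

    import time

    from django.core.management.base import BaseCommand

    from monitor.models import Reading  # hypothetical model for the logged data
    from monitor.sources import poll_source  # hypothetical external-source client


    class Command(BaseCommand):
        help = "Monitor the external source and log valid readings to the database"

        def handle(self, *args, **options):
            # Runs in its own process, so it never blocks the web server.
            while True:
                data = poll_source()
                if data is not None:  # only log valid data
                    Reading.objects.create(value=data)
                time.sleep(5)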

sqlite3: avoiding "database locked" collision

I am running two Python files in parallel on one CPU, both of which use the same sqlite3 database. I am handling the database with SQLAlchemy, and my understanding is that SQLAlchemy handles all the threading issues within one app. My question is how to handle access from two different apps.
One of my two programs is a Flask application and the other is a cronjob that updates the database from time to time.
It seems that even read-only tasks lock the sqlite database, meaning that if both apps want to read or write at the same time I get an error:
OperationalError: (sqlite3.OperationalError) database is locked
Let's assume my cronjob app runs every 5 minutes. How can I make sure there are no collisions between my two apps? I could write a read flag to a file and check it before accessing the database, but it seems to me there should be a standard way to do this.
Furthermore, I am running my app with gunicorn, where in principle it is possible to have multiple workers running... so I eventually want more than two parallel jobs for my Flask app...
thanks
carl
It's true: SQLite isn't built for this kind of application. SQLite is really for lightweight, single-threaded, single-instance applications.
SQLite connections are one per instance, and while some kind of threaded multiplexer would be possible (see https://www.sqlite.org/threadsafe.html), it's more trouble than it's worth. There are other solutions that provide that functionality: take a look at PostgreSQL or MySQL. Those databases are open source, well documented, well supported, and support the kind of concurrency you need.
I'm not sure how SQLAlchemy handles connections, but if you were using the Peewee ORM, the solution would be quite simple.
When your Flask app receives a request, you open a connection to the DB; when Flask sends the response, you close the connection again (see the sketch below).
Similarly, in your cron script, open a connection when you start to use the DB, then close it when the process is finished.
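A minimal sketch of that open-per-request/close-per-response pattern, using Flask request hooks with Peewee (the database filename is a placeholder):

    from flask import Flask
    from peewee import SqliteDatabase

    app = Flask(__name__)
    db = SqliteDatabase("app.db")  # placeholder filename


    @app.before_request
    def open_db():
        db.connect()


    @app.teardown_request
    def close_db(exc):
        # Runs after every response, even if the view raised an exception.
        if not db.is_closed():
            db.close()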
Another thing you might consider is using SQLite in WAL (write-ahead logging) mode, which can improve concurrency. You set the journaling mode with a PRAGMA query when you open your connection.
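For example (a sketch on a raw sqlite3 connection; recent Peewee versions can do the same via the pragmas argument to SqliteDatabase):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # WAL mode lets readers proceed while a single writer is active.
    conn.execute("PRAGMA journal_mode=WAL;")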
For more info, see http://charlesleifer.com/blog/sqlite-small-fast-reliable-choose-any-three-/

How is a WSGI application handled by the Apache server with mod_wsgi

I'm from a PHP/Apache background. With a PHP/Apache setup, the PHP interpreter is loaded by an Apache module, and on every page request a new worker executes the entire script. I've been working with WSGI applications in Python recently, and it seems that Apache (mod_wsgi) loads the entire application once, keeps it alive, and then executes pieces of code based on incoming requests. If my understanding is correct, would that explain why __del__ never runs on some of my objects?
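That understanding matches a toy experiment you can run under mod_wsgi: module-level state survives across requests within the same daemon process (a sketch, not tied to any particular deployment):

    # The module is imported once per process, so STATE persists across requests.
    STATE = {"requests": 0}


    def application(environ, start_response):
        STATE["requests"] += 1  # keeps counting across requests, unlike PHP
        body = ("served %d requests from this process" % STATE["requests"]).encode()
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]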
Edit:
In my case I have a wrapper class around the Python MySQL module, with only one instance used in the entire application. Its responsibilities are to run queries and to reuse the MySQL connection throughout the code where possible, but I have to make sure the connection is closed when everything has been processed. In some parts I'm using multiprocessing, where I tell the object not to reuse the connection for spawned child processes. Initially I thought I could use __del__ to close the MySQL connection, but I noticed it would never be called. I ended up using a Flask teardown function to make sure the connection is closed after each request, but I was wondering if there are any other options to handle this nicely.
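For reference, a minimal sketch of that teardown approach, storing one connection per request on flask.g (the connection parameters are placeholders):

    import MySQLdb
    from flask import Flask, g

    app = Flask(__name__)


    def get_db():
        # Lazily open one connection per request and cache it on flask.g.
        if "db" not in g:
            g.db = MySQLdb.connect(host="localhost", user="user",
                                   passwd="secret", db="mydb")
        return g.db


    @app.teardown_appcontext
    def close_db(exc):
        # Runs once the request context ends, even on errors.
        db = g.pop("db", None)
        if db is not None:
            db.close()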

When to disconnect from mongodb

I am fairly new to databases and have just figured out how to use MongoDB from Python 2.7 on Ubuntu 12.04. An application I'm writing uses multiple Python modules (imported into a main module) that connect to the database. Basically, each module starts by opening a connection to the DB, which is then used for various operations.
However, when the program exits, the main module is the only one that 'knows' about the exit, and it closes only its own connection to MongoDB. The other modules do not know this and have no chance to close their connections. Since I have little experience with databases, I wonder whether leaving connections open on exit causes any problems.
Should I:
Leave it like this?
Instead open the connection before and close it after each operation?
Change my application structure completely?
Solve this in a different way?
You can use one pymongo connection across different modules: open it in a separate module and import it into the other modules on demand. After the program has finished working, you can close it. This is the best option (see the sketch below).
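A minimal sketch of such a shared module, using pymongo's MongoClient (the URI and database name are placeholders):

    # db.py -- imported by every module that needs database access
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")  # placeholder URI
    db = client["mydatabase"]  # placeholder database name

    # In any other module:
    #     from db import db
    #     db.items.insert_one({"x": 1})
    #
    # In the main module, once the program is done:
    #     from db import client
    #     client.close()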
About your other questions:
You can leave it like this (all connections will be closed when the script finishes execution), but leaving something unclosed is bad form.
You could open/close a connection for each operation, but establishing a connection is a time-expensive operation.
That's what I'd advise (see the first paragraph of this answer).
I think this point can be merged with 3.

caching issues in MySQL response with MySQLdb in Django

I use MySQL with the MySQLdb module in Python, in Django.
I'm running in autocommit mode in this case (and Django's transaction.is_managed() actually returns False).
I have several processes interacting with the database.
One process fetches all Task models with Task.objects.all()
Then another process adds a Task model (I can see it in a database management application).
If I call Task.objects.all() again in the first process, I don't see the new Task. But if I call connection._commit() and then Task.objects.all(), I do see it.
My question is: is there any caching involved at the connection level? And is this normal behaviour? (It doesn't seem normal to me.)
This certainly seems autocommit/table-locking related.
If MySQLdb implements the DB-API 2 spec, it will probably run the connection as one single continuous transaction. When you say 'running in autocommit mode', do you mean MySQL itself, the MySQLdb module, or Django?
Not committing intermittently perfectly explains the behaviour you are getting:
i) the connection is implemented as one single transaction in MySQLdb (probably by default);
ii) connections are not opened/closed only when needed, but one (or more) persistent database connections are (re)used (my guess; this could be inherited from Django's architecture);
iii) your selects ('reads') take a simple read lock on a table. Other connections can still read the table, but connections wanting to write can't do so immediately, because the read lock prevents them from getting the exclusive lock needed for writing. The write is thus postponed until a (short) exclusive lock on the table can be obtained, i.e. when you close the connection or commit manually.
I'd do the following in your case:
find out which table locks are held on your database during the scenario above
read about Django and transactions here. A quick skim suggests that using standard Django functionality implicitly causes commits; handcrafted raw SQL (INSERT, UPDATE, ...) may not. A sketch of both steps follows this list.
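A sketch of those two steps, reusing the private connection._commit() call from the question (the app path for Task is assumed; on current Django you would use the public transaction APIs instead):

    # In a MySQL shell, while both processes are running:
    #     SHOW OPEN TABLES WHERE In_use > 0;   -- tables currently locked
    #     SHOW FULL PROCESSLIST;               -- who is holding/waiting on them

    from django.db import connection
    from myapp.models import Task  # assumed app path for the Task model

    first = Task.objects.count()

    # ... the other process inserts a Task here ...

    connection._commit()  # end the current transaction, as in the question
    second = Task.objects.count()  # the new row is visible now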
