I'm working on a distributed system where one process is controlling a hardware piece and I want it to be running as a service. My app is Django + Twisted based, so Twisted maintains the main loop and I access the database (SQLite) through Django, the entry point being a Django Management Command.
On the other hand, for user interface, I am writing a web application on the same Django project on the same database (also using Crossbar as websockets and WAMP server). This is a second Django process accessing the same database.
I'm looking for some validation here. Is anything fundamentally wrong to this approach? I'm particularly scared of issues with database (two different processes accessing it via Django ORM).
Consider that Django, like all WSGI-based web servers, almost always has multiple processes accessing the database. Because a single WSGI process can handle only one connection at a time, it's normal for servers to run multiple processes in parallel when they get any significant amount of traffic.
That doesn't mean there's no cause for concern. You have the database as if data might change between any two calls to it. Familiarize yourself with how Django uses transactions (default is autocommit mode, not atomic requests), and …
and oh, you said sqlite. Yeah, sqlite is probably not the best database to use when you need to write to it from multiple processes. I can imagine that might work for a single-user interface to a piece of hardware, but if you run in to any problems when adding the webapp, you'll want to trade up to a database server like postgresql.
No there is nothing inherently wrong with that approach. We currently use a similar approach for a lot of our work.
Related
I have a Django application written to handle displaying a webpage with data from a model based on the primary key passed in the URL, this all works fine and the Django component is working perfectly for the most part.
My question though is, and I have tried multiple methods such as using an AppConfig, is how I can make it so when the Django server boots up, code is called that would then create a separate thread which would then monitor an external source, logging valid data from that source as a model into the database.
I have the threading code written along with the section that creates the model and saves it in the database, my issue though is that if I try to use an AppConfig to create the thread which would then handle the code, I get an django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet. error and the server does not boot up.
Where would be appropriate to place the code? Is my approach incorrect to the matter?
Trying to use threading to get around blocking processes like web servers is an exercise in pain. I've done it before and it's fragile and often yields unpredictable results.
A much easier idea is to create a separate worker that runs in a totally different process that you start separately. It would have the same database access and could even use your Django models. This is how hosts like Heroku approach this problem. It comes with the added benefit of being able to be tested separately and doesn't need to run at all while you're working on your main Django application.
These days, with a multitude of virtualization options like Vagrant and containerization options like Docker, running parallel processes and workers is trivial. In the wild they may literally be running on separate servers with your database on yet another server. As was mentioned in the comments, starting a worker process could easily be delegated to a separate Django management command. This, in turn, can be fairly easily turned into separate worker processes by gunicorn on your web server.
I am running two python files on one cpu in parallel, both of which make use of the same sqlite3 database. I am handling the sqlite3 database using sqlalchemy and my understanding is that sqlalchemy handles all the threading database issues within one app. My question is how to handle the access from the two different apps?
One of my two programs is a flask application and the other is a cronjob which updates the database from time to time.
It seems that even read-only tasks on the sqlite database lock the database, meaning that if both apps want to read or write at the same time I get an error.
OperationalError: (sqlite3.OperationalError) database is locked
Lets assume that my cronjob app runs every 5min. How can I make sure that there are no collisions between my two apps? I could write some read flag into a file which I check before accessing the database, but it seems to me there should be a standard way to do this?
Furthermore I am running my app with gunicorn and in principle it is possible to have multiple jobs running... so I eventually want more than 2 parallel jobs for my flask app...
thanks
carl
It's true. Sqlite isn't built for this kind of application. Sqlite is really for lightweight single-threaded, single-instance applications.
Sqlite connections are one per instance, and if you start getting into some kind of threaded multiplexer (see https://www.sqlite.org/threadsafe.html) it'd be possible, but it's more trouble than it's worth. And there are other solutions that provide that function-- take a look at Postgresql or MySQL. Those DB's are open source, are well documented, well supported, and support the kind of concurrency you need.
I'm not sure how SQLAlchemy handles connections, but if you were using Peewee ORM then the solution is quite simple.
When your Flask app initiates a request, you will open a connection to the DB. Then when Flask sends the response, you close the DB.
Similarly, in your cron script, open a connection when you start to use the DB, then close it when the process is finished.
Another thing you might consider is using SQLite in WAL mode. This can improve concurrency. You set the journaling mode with a PRAGMA query when you open your connection.
For more info, see http://charlesleifer.com/blog/sqlite-small-fast-reliable-choose-any-three-/
I am a self taught Python programmer and I have an idea for a project that I'd like to use to better my understanding of Socket programming and networking in general. I was hoping that someone would be able to tell me if I am on the right path, or point me in another direction.
The general idea is to be able to update a database via a website UI, then a Python Server would then be consistently checking that database for changes. If it notices changes it would then hand out a set of instructions to the first available connected client.
The objective is to
A - Create a server that reads from a database.
B - Create clients that connect to said server from a remote machine.
C - The server would then consistently read the database and and look for changes, for instance a column that is a Boolean that would signify Run/Don't run.
D - If say the run Boolean is true, the server than would hand the instructions off to the first available client.
E - The client itself would then handle updating the database of certain runtime occurrences
Questions/Concerns
A - My first concern is the resources it would take to be constantly reading the database and searching for changes. Is there a better way of doing this? Or Could I write a loop to do this and not worry much about it?
B - I have been reading the documentation/tutorials on Twisted and at this moment this looks like a viable option to be able to handle the connections of multiple clients (20-30 for arguments sake). From what I've read Threading looks to be more of a hassle than it's worth.
Am I on the right track? Any suggestions? Or reading material worth looking at?
Thank you in advance
The Django web framework implements something called Signals which at the ORM level lets you detect changes to specific database objects and attach handlers to those changes. Since it is an open source project, you might want to check the source code to understand how Django does it while supporting multiple database backends (Mysql, postgres, oracle, sqlite).
If you directly want to listen to database changes then most databases have some kind of a system that logs every transactional change. For example MySql has the binary log you can keep reading from to detect changes to the database.
Also while Twisted is great I would recommend you use Gevent/Greenlet + this. If you want to integrate with Django, how to combine django plus gevent the basics? would help
I am developing web app on flask, python, sqlalchemy and postgresql.
My question is here regarding concurrency handling in this app.
How I wrote the app :
I take the example of adding user in database. I post the form and a view is called. I process all the form data and then call add_user(*arg) which uses sqlalchemy code to insert user in database and returns on successful execution and I return the response from the view.
What I assumed:
Ok now I assumed that my web server (which I have not decided yet) will either spawn a thread or a process if two users are trying to signup at the same time and will handle all the concurreny requirements.
Do i need to write threaded code here? By threaded code I mean that before writing I acquire a lock and after write release it.
I am pretty new to web development and multithreading/multiprocessing programing and would like some guidance on how write web app which can handle concurrency well.
Writing concurrency handling from start is right or this thought should come when a large number of concurrent users are using the webapp. Even If it should be done later I would like some pointers about it.
Basically I have no idea about concurrency part of webapp development. If you can point to resources from where I can learn more about it would be really helpful.
Flask will execute each request in a separate thread or even in separate processes. The number of threads and processes to spawn is determined by the WSGI server (for example, Apache with mod_wsgi).
If you use SQLAlchemy ScopedSessions, the session is perfectly thread-safe. You must not share ORM-controlled objects across threads (but in the large majority of cases, you won't let your objects live longer than a request anyway so this is usually not a concern).
In other words, as long as you don't intend to share state between requests other than through the database or cookies, you don't need to worry about concurrency issues. You don't need to create a lock for writing to the database.
If you create your own long-lived objects within your application, which you most likely don't need to do, and if those objects communicate or share state with request handling code, then you must take appropriate precautions to avoid synchronization issues (race conditions, deadlocks, use of libraries that are not thread-safe, etc.)
I am searching for some way to scale one instance of tornado application to many. I have 5 servers and want to run at each 4 instances of application. The main issue I don't know how to resolve - is to make communication between instances in right way. I see next approaches to make it:
Use memcached for sharing data. I don't think this approach is good, because much traffic would go to server with memcached. Therefore in the future there can be trafic-related issues.
Open sockets between each instance. As for me it will be too hard to maintain such way of communication.
Use tools like ZeroMQ. I am not familiar with this technology. Is it can be the way to scale application between servers?
I'm actually looking at something similar and the thought I have come up with is this. Use the Python Multiprocessing module ( http://docs.python.org/library/multiprocessing.html ) to link the processes together in that way on the individual servers. Then use a memcached server for session specific data. (SessionIDs, IP information, information used to tie the session to a specific user and to the thread of activity they are using) The rest being data driven from a DB instance.
What you could do is for each server you run a memcached instance and a tornado instance. Make the memcached instances "Master replicate" with each other using repcached so each instance of tornado can access memcached data from its machine. Four servers for the tornado and memcached instances and the fifth to run haproxy to load balance the others.
www.haproxy.org/
repcached.lab.klab.org/