I am currently planning a django project which consists of two parts. The normal django application and an additional application which uses MQTT to read sensors. For better loading times for the HTTP responses, I planned to receive the MQTT publish messages in an external process or thread and write it into the database used by django. The sensor data in this database then is always used whenever a HTTP request is made.
Do you guys have any better architectural solutions to my problem?
Best regards
Django background tasks could be a simple way of doing what you need.
It uses your existing database to run tasks in the background and is very easy to set up. It also supports scheduling repeated tasks.
Related
The technology I would like to use in this example is Celery for queueing and python for component implementation.
Imagine a simple project hat exists of 2 components. One is a web app that connects to an API and gathers data. Component 2 is a processor that can then process the data. When the web app has gotten a piece of data from the API it is supposed to send a task into a task queue including the just crawled data which is then consumed by the processor to process the Data.
Whether or not this is a sensible way to go about a task like this is debatable and not the point of my question.
My question is, the tasks to process things are defined within the processor since they state what processing function shall be executed and the definition of that function is obviously within the processor. Now that the web app doesn't have access to the task definition how does he communicate the task to the processor?
Do you have to hold a copy of the source code of the processor within the web app?
Do you make the processor a dependency of the web app?
What is the best practice approach to handle such a scenario?
What you are describing is probably one of the most common use-cases for Celery. Just look how many people are asking Django/Flask + Celery questions here on StackOverflow... If you are a Django user, there is an entire section in the Celery documentation describing how to do exactly what you want. Things should be similar with other frameworks.
Do you have to hold a copy of the source code of the processor within the web app?
As far as I know you do not have to (I do not use any web framework) but it could be that you do need to because of some deeper integration with Celery. If your web application knows the Celery task name, and its parameters, it can schedule it to run without actually having access to the Python code. This is accomplished using send_task(task_name, ...).
Do you make the processor a dependency of the web app?
As I wrote above there are several ways to use it. If you want tighter integration then yes. If you just want to run task and get result using the send_task() than your web application should only depend on Celery.
What is the best practice approach to handle such a scenario?
Follow the Django guide. I advise you to run Celery independently, run some tasks, just so you learn about basic principles how it distributes the work, etc.
There's a lot of info out there and honestly it's a bit too much to digest and I'm a bit lost.
My web app has to do so some very resource intensive tasks. Standard setup right now app on server static / media on another for hosting. What I would like to do is setup celery so I can call task.delay for these resource intensive tasks.
I'd like to dedicate the resources of entire separate servers to these resource intensive tasks.
Here's the question: How do I setup celery in this way so that from my main server (where the app is hosted) the calls for .delay are sent from the apps to these servers?
Note: These functions will be kicking data back to the database / affecting models so data integrity is important here. So, how does the data (assuming the above is possible...) retrieved get sent back to the database from the seperate servers while preserving integrity?
Is this possible and if so wth do I begin - information overload?
If not what should I be doing / what am I doing wrong?
The whole point of Celery is to work in exactly this way, ie as a distributed task server. You can spin up workers on as many machines as you like, and the broker - ie rabbitmq - will distribute them as necessary.
I'm not sure what you're asking about data integrity, though. Data doesn't get "sent back" to the database; the workers connect directly to the database in exactly the same way as the rest of your Django code.
I have a simple Django project.
Each time a user hits the homepage,some operations are performed based on which,view is generated. Now the problem is that when a user hits the homepage ,sometimes the operations take a long time based on network connectivity. If in the meantime, a new user hits the homepage,he has to wait for the request from the previous user to get serviced before the page gets rendered.
I found Celery is used for task scheduling and queuing . But I wonder if Celery is what i need.I need each user to have his request be processed independently and not queued.
My project is a single app project and will receive a maximum of 100 users a time.
Thanks.
If the long process needs to be done in order to serve the request and generate the proper response then you cannot use Celery.
The debug web-server that is shipped with Django is a multi-threaded-single-process server, but is really very limited and should not be used in production.
If you use gunicorn or other wsgi servers you can run your application in multiple processes but you will hit the limit quickly if you're doing heavy processing.
The solution would be in my opinion is to either change the way you're processing stuff, either prepare ahead or serve the request and do the processing in the background, you can show the user a Please wait... message, here you can use Celery to do the processing.
The other solution would be to use event-based web-server like Twisted or cyclone or others
I am developing web app on flask, python, sqlalchemy and postgresql.
My question is here regarding concurrency handling in this app.
How I wrote the app :
I take the example of adding user in database. I post the form and a view is called. I process all the form data and then call add_user(*arg) which uses sqlalchemy code to insert user in database and returns on successful execution and I return the response from the view.
What I assumed:
Ok now I assumed that my web server (which I have not decided yet) will either spawn a thread or a process if two users are trying to signup at the same time and will handle all the concurreny requirements.
Do i need to write threaded code here? By threaded code I mean that before writing I acquire a lock and after write release it.
I am pretty new to web development and multithreading/multiprocessing programing and would like some guidance on how write web app which can handle concurrency well.
Writing concurrency handling from start is right or this thought should come when a large number of concurrent users are using the webapp. Even If it should be done later I would like some pointers about it.
Basically I have no idea about concurrency part of webapp development. If you can point to resources from where I can learn more about it would be really helpful.
Flask will execute each request in a separate thread or even in separate processes. The number of threads and processes to spawn is determined by the WSGI server (for example, Apache with mod_wsgi).
If you use SQLAlchemy ScopedSessions, the session is perfectly thread-safe. You must not share ORM-controlled objects across threads (but in the large majority of cases, you won't let your objects live longer than a request anyway so this is usually not a concern).
In other words, as long as you don't intend to share state between requests other than through the database or cookies, you don't need to worry about concurrency issues. You don't need to create a lock for writing to the database.
If you create your own long-lived objects within your application, which you most likely don't need to do, and if those objects communicate or share state with request handling code, then you must take appropriate precautions to avoid synchronization issues (race conditions, deadlocks, use of libraries that are not thread-safe, etc.)
I've got my Django project running well, and a separate background process which will collect data from various sources and store that data in an index.
I've got a model in a Django app called Sources which contains, essentially, a list of sources that data can come from! I've successfully managed to create a signal that is activated/called when a new entry is put in the Sources model.
My question is, is there a simple way that anybody knows of whereby I can send some form of signal/message to my background process indicating that the Sources model has been changed? Or should I just resort to polling for changes every x seconds, because it's so much simpler?
Many thanks for any help received.
It's unclear how are you running the background process you're talking about.
Anyway, I'd suggest that in your background task you use the Sources model directly. There are convenient ways to run the task without leaving realm of Django (so as to have an access to your models. You can use Celery [1], for example, or RQ [2].
Then you won't need to pass any messages, any changes to Sources model will take effect the next time your task is run.
[1] Celery is an open source asynchronous task queue/job queue, it isn't hard to set up and integrates with Django well.
Celery: general introduction
Django with celery introduction
[2] RQ means "Redis Queue", it is ‘a simple Python library for queueing jobs and processing them in the background with workers’.
Introductory post
GitHub repository
Polling is probably the easiest if you don't need split-second latency.
If you do, however, then you'll probably want to look into either, say,
sending an UNIX signal (or other methods of IPC, depending on platform) to the process
having the background process have a simple listening socket that you just send, say, a byte to (which is, admittedly, a form of IPC), and that triggers the action you want to trigger
or some sort of task/message queue. Celery or ZeroMQ come to mind.