I am hoping to gain a basic understanding of scheduled task processes and why things like Celery are recommended for Flask.
My situation is a web-based tool which generates spreadsheets based on user input. I save those spreadsheets to a temp directory, and when the user clicks the "download" button, I use Flask's "send_from_directory" function to serve the file as an attachment. I need a background service to run every 15 minutes or so to clear the temp directory of all files older than 15 minutes.
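For context, the download route is roughly this (paths and names are illustrative):

    # Roughly what the download route looks like (paths and names are illustrative).
    import os
    from flask import Flask, send_from_directory

    app = Flask(__name__)
    TEMP_DIR = os.path.join(os.getcwd(), "temp_spreadsheets")

    @app.route("/download/<filename>")
    def download(filename):
        # Serve a previously generated spreadsheet as an attachment.
        return send_from_directory(TEMP_DIR, filename, as_attachment=True)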
My initial plan was a basic Python script running in a while True loop, but I did some research into what people normally do, and everything recommends Celery or other task managers. I looked into Celery and found that I would also need to learn about Redis, and apparently host Redis in a Unix environment. This is a lot of trouble for a script that just deletes files every 15 minutes.
I'm developing my Flask app locally on Windows with the built-in development server and deploying to a virtual machine on the company intranet with IIS. I'm learning as I go, so please explain why this much machinery is needed to regularly call a script that simply deletes things. It seems like a vast overcomplication, but I want to learn to do it correctly.
Thanks!
You wouldn't use Celery or Redis for this. A cron job would be perfectly appropriate.
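A minimal sketch of that approach (the temp path is a placeholder; on your Windows/IIS box, Windows Task Scheduler can invoke the same script):

    # cleanup_temp.py -- delete files older than 15 minutes from the temp directory.
    # A crontab entry such as
    #   */15 * * * * /usr/bin/python /path/to/cleanup_temp.py
    # (or an equivalent Windows Task Scheduler task) runs it periodically.
    import os
    import time

    TEMP_DIR = "/path/to/temp"        # placeholder: point this at the real temp directory
    MAX_AGE_SECONDS = 15 * 60

    now = time.time()
    for name in os.listdir(TEMP_DIR):
        path = os.path.join(TEMP_DIR, name)
        # Only remove regular files whose last modification is older than the cutoff.
        if os.path.isfile(path) and now - os.path.getmtime(path) > MAX_AGE_SECONDS:
            os.remove(path)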
Celery is for jobs that need to run asynchronously, but in response to events in the main server process rather than on a timer. For example, if a sign-up form requires sending an email notification, that would be scheduled and run via Celery so as not to block the main web response.
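For illustration only (the task and broker URL are made up), such a task might look like:

    # Hypothetical illustration of the email example above.
    from celery import Celery

    # The broker (Redis or RabbitMQ) is what makes the extra machinery necessary.
    celery_app = Celery("myapp", broker="redis://localhost:6379/0")

    @celery_app.task
    def send_signup_email(user_email):
        ...  # imagine an SMTP call here; a separate worker process runs it

    # In the signup view, the web request returns immediately after queueing:
    #   send_signup_email.delay(user.email)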
Related
I'm hosting a Flask application on Heroku (free tier) which acts as an API and reads from an SQLite database file.
When the project ran on my computer, I had scheduled Python scripts that would run every night and append new data to my SQLite database, which the Flask application would then read.
However, hosted on Heroku, I don't think I will be able to run my Flask application and a Python script 24/7. I know there is an alternative, APScheduler, which would run tasks as Python functions inside the Flask application. However, according to Heroku's free-tier guidelines, if there is no traffic to my page for 30 minutes, the application will "sleep." I'm assuming that means scheduled tasks will no longer run once the application is asleep, which defeats the purpose of using APScheduler.
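For reference, the APScheduler approach I mean is roughly this (the job body and schedule are just for illustration):

    # Roughly the APScheduler setup I mean (job body and schedule are illustrative).
    from apscheduler.schedulers.background import BackgroundScheduler

    def nightly_update():
        ...  # append new data to the SQLite database

    scheduler = BackgroundScheduler()
    # The scheduler runs inside the Flask process, so it is exactly what
    # stops firing once Heroku puts the dyno to sleep.
    scheduler.add_job(nightly_update, "cron", hour=2)
    scheduler.start()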
Are there any alternatives I could use to go about this?
I have prototyped a system using Python on Linux. I am now designing the architecture to move to a web-based system. I will use Django to serve public and private admin pages. I also need a service running that will periodically run scripts, connect to the internet, and allow API messaging with an admin user. There will thus be three components: web server, api_service, and database.
1) What is the best mechanism for deploying a Python api_service on the VM? My background is mainly C++/C#, and I would usually have deployed a C#-written service on the same VM as the web server and used some sort of TCP messaging wrapper for the API. My admin API code will be ad hoc Python scripts run from my machine to execute functionality in this service.
2) All my database code is written to an interface that presently uses flat files. Any database suggestions? PostgreSQL, MongoDB, ...
Many thanks in advance for helpful suggestions. I am an ex-Windows/C++/C# developer who now absolutely loves Python/Cython and needs a little help please ...
Right, am answering my own question. Have done a fair bit of research since posting.
2) PostgreSQL seems a good choice. There seem to be no damning warnings against using it and there is much searchable help. I am therefore implementing concrete PostgreSQL classes to implement my serialization interfaces.
1) Rather than implement my own service in Python that sits on a remote machine, I am going to use Celery. RabbitMQ will act as the distributed TCP message wrapper. I can put the required functionality in Python scripts on the VM that Celery can find and execute as tasks. I can run these Celery tasks in three ways: i) a web request through Django can queue a task; ii) I can manually queue a remote Celery task from my machine by running a Python script; iii) I can use Celery Beat to schedule tasks periodically. This fits my needs perfectly, as I have a handful of daily/periodic tasks that can be scheduled, plus a few rare maintenance tasks that I can fire off from my machine.
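Roughly, the Celery/Beat side will look like this (broker URL, module and task names are placeholders):

    # Rough sketch of the Celery + Celery Beat side (broker URL, module and
    # task names are placeholders; this assumes the module is api_service.py).
    from celery import Celery
    from celery.schedules import crontab

    app = Celery("api_service", broker="amqp://guest@localhost//")

    @app.task
    def daily_refresh():
        ...  # one of the periodic scripts that lives on the VM

    @app.task
    def admin_command(payload):
        ...  # queued ad hoc from a Django view or from a script on my machine

    # Celery Beat handles the periodic scheduling.
    app.conf.beat_schedule = {
        "daily-refresh": {
            "task": "api_service.daily_refresh",
            "schedule": crontab(hour=3, minute=0),
        },
    }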
To summarize then, where before I would have created a Windows service that handled both incoming TCP commands and scheduled behaviour, I can now use RabbitMQ, Celery, Celery Beat and Python scripts that sit on the VM.
Hope this helps anybody with a similar 'how to get started' problem .....
I'm looking for some advice on running intensive jobs on demand somewhere like AWS or DigitalOcean.
Here's my scenario:
I have a template/configuration of a VM with its dependencies (ImageMagick, Ruby, Python, etc.)
I have a codebase that runs a job, e.g. querying a db and running reports, then emailing those reports to my user base
I want to be able to trigger this job externally (e.g. from a web app somewhere else, or from a command line, maybe a cron job on another cloud instance)
When I trigger this job, it needs to spin up a copy of the template on AWS or DO and run the job, which could take any length of time, until all reports are generated and sent out
Once the job has finished, shut down the instance so I'm not paying for something that's always running in the background
I'd like not to commit to one provider (e.g. AWS) but rather have a template that can be dropped in anywhere to test out the differences between cloud providers
Initially I was thinking of rubber, but that seems more like something you'd use for CI than for spinning up an instance, running a long job, and shutting the instance down once finished.
Does anything already exist for this, or would I need to build something myself hooking into the relevant APIs?
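For the roll-your-own route, I imagine something like this minimal boto3 sketch (the AMI ID, instance type and user-data are made up, and DigitalOcean would need its own client):

    # Minimal sketch of "hooking into the APIs" myself with boto3; everything
    # here (AMI ID, instance type, paths) is made up for illustration.
    import boto3

    ec2 = boto3.resource("ec2")

    # The user-data script runs the job on boot and powers the machine off when
    # done; with shutdown behaviour set to "terminate", billing stops as well.
    user_data = """#!/bin/bash
    /opt/jobs/run_reports.sh
    shutdown -h now
    """

    ec2.create_instances(
        ImageId="ami-0123456789abcdef0",   # pre-baked template with the dependencies
        InstanceType="t3.medium",
        MinCount=1,
        MaxCount=1,
        UserData=user_data,
        InstanceInitiatedShutdownBehavior="terminate",
    )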
I have a Django app that is intended to run on VirtualBox VMs on LANs. The typical user will be a savvy IT end-user, not a sysadmin.
Part of the app's job is to connect to external databases on the LAN, run some Python batches against those databases, and save the results in its local db. The user can then explore the systems using Django pages.
Run time for the batches isn't all that long, but it runs to minutes, potentially tens of minutes, not seconds. Run frequency is infrequent at best; I think you could go days without needing a refresh.
This is not celery's normal use case of long tasks that eventually push their results back into the web UI via ajax and/or polling. It is closer to a dev's occasional use of django-admin commands, but this time intended for an end user.
The user should be able to initiate a run of one or several of those batches whenever they want, in order to refresh the calculations for a given external database (the target db is a parameter to the batch).
Until the batches are done for a given db, the app really isn't usable. You can access its pages, but many functions won't be available.
It is very important, from a support point of view, that the batches remain easy to run at all times. Dropping down to SSH on the VM would probably require frequent hand-holding, which wouldn't be good; it is best if they can be launched from the Django web pages.
What I currently have:
Each batch is in its own script.
I can run each one on the command line (via if __name__ == "__main__":).
The batches are also hooked up as celery tasks and work fine that way.
Given the way I have written them, it would be relatively easy to allow running them from subprocess calls in Python. I haven't really looked into it, but I suppose I could make them into django-admin commands as well (sketched after this list).
The batches already have their own rudimentary status checks. For example, they can look at the calculated data, tell whether they have been run, and display that in Django pages without needing to look at celery task status backends.
The batches themselves are relatively robust and I can make them more so. This is about their launch mechanism.
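To make the django-admin idea concrete, wrapping one batch as a management command would look roughly like this (the app, module and argument names are placeholders):

    # Rough sketch: myapp/management/commands/run_batch.py
    # (app, module and argument names are placeholders).
    from django.core.management.base import BaseCommand

    from myapp.batches import refresh_external_db  # the existing batch entry point


    class Command(BaseCommand):
        help = "Refresh the calculations for one external database"

        def add_arguments(self, parser):
            parser.add_argument("target_db", help="name of the external db to refresh")

        def handle(self, *args, **options):
            refresh_external_db(options["target_db"])
            self.stdout.write("batch finished for %s" % options["target_db"])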
What's not so great:
In my Mac dev environment, I find the celery/celerycam/rabbitmq stack to be somewhat unstable. It seems as if the rabbitmq daemon sometimes balloons in CPU/RAM use and then needs to be terminated. That mightily confuses the celery processes, and I find I have to kill -9 various tasks and relaunch them manually. Sometimes celery still works but celerycam doesn't, so there are no task updates. Some of these issues may be OS X specific or may be due to the DEBUG flag being switched on for now, which celery warns about.
So then I need to run the batches on the command line, which is what I was trying to avoid, until the whole celery stack has been reset.
This might be acceptable on a normal website, with an admin watching over it. But I can't have that happen on a remote VM to which only the user has access.
Given that these are somewhat fire-and-forget batches, I am wondering if celery isn't overkill at this point.
Some options I have thought about:
writing a cleanup shell/Python script to restart rabbitmq/celery/celerycam and generally make them more robust, i.e. whatever is required to make celery and friends more stable. I've already used psutil to check whether the rabbit/celery processes are running and to display their status in Django.
Running the batches via subprocess instead and avoiding celery (see the sketch after this list). What about django-admin commands here? Does that make a difference? They would still need to be run from the web pages.
an alternative task/process manager to celery, with less capability but also fewer moving parts?
not using subprocess but relying on the Python multiprocessing module? To be honest, I have no idea how that compares to launching via subprocess.
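For the subprocess option above, the fire-and-forget launch from a view could be as simple as this (the script path is a placeholder):

    # Sketch of the subprocess option (script path is a placeholder).
    import subprocess
    import sys

    def launch_batch(target_db):
        # Detach from the web request; the batch writes its own status into the
        # local db, so the pages can report progress without a task backend.
        subprocess.Popen(
            [sys.executable, "/opt/app/batches/refresh.py", target_db],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )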
Environment:
nginx, wsgi, Ubuntu on VirtualBox, Chef to build the VMs.
I'm not sure how your celery configuration makes it unstable, but it sounds like celery is still the best fit for your problem. I'm using redis as the queue system and, in my experience, it works better than rabbitmq. Maybe you can try it and see if it improves things.
Otherwise, just use cron as the driver for the periodic tasks. Let it run your script periodically and update the database; your UI component will poll the database with no conflict.
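For example (paths and names are placeholders, and this assumes your batches stay importable as plain functions), the script cron drives could be as small as:

    # refresh_batches.py -- invoked by cron, e.g.
    #   0 2 * * * /opt/app/venv/bin/python /opt/app/refresh_batches.py
    # Paths, module names and the settings module are placeholders.
    import os

    import django

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
    django.setup()  # needed so the batch code can use the Django ORM

    from myapp.batches import refresh_external_db

    # Refresh each configured external database; the web pages keep reading
    # status straight from the local db, as they already do.
    for target in ["external_db_one", "external_db_two"]:
        refresh_external_db(target)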
I need to run a python script (which listens to Twitter) that will call various methods on my django app when it gets tweets that match a particular hashtag.
At the moment, I just run the script by hand on the command line but I'd like it to run inside django if possible so that I can control it from there and so it doesn't have to perform HTTP POSTs when it gets new data.
I've looked at celery (briefly), but to me it seems to be for performing small tasks at regular intervals.
Is there a way to use celery (or anything else) to control this long-running "listen to Twitter" script that I've got?
You should use Supervisord to run your Django application and your script. Making the script part of the Django project will let you use Django signals: you can define a custom signal that is emitted every time your Twitter logic finishes doing what it is supposed to. Signals are blocking; if you want them to be asynchronous, use Celery with Django.
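A minimal sketch of the custom-signal idea (the signal and handler names are made up):

    # Minimal sketch of the custom-signal idea (names are made up).
    import django.dispatch

    # e.g. in myapp/signals.py
    hashtag_matched = django.dispatch.Signal()

    # In the Twitter-listening script, once it is part of the Django project:
    def on_tweet(tweet):
        # Emitted synchronously; connected receivers run in this same process.
        hashtag_matched.send(sender=None, tweet=tweet)

    # Somewhere in myapp, a receiver reacting to matching tweets:
    from django.dispatch import receiver

    @receiver(hashtag_matched)
    def handle_tweet(sender, tweet, **kwargs):
        ...  # call the relevant model methods here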
An alternative would be to run your Django application and the Twitter script via Supervisord, expose a REST API on the Django application, and have the script make HTTP POSTs to it. You can use Tastypie for that.