My current project contains quite a few custom management commands inside an app, which act as listeners on a message bus. Each of these tasks is blocking, which means they have to run in their own processes.
[bus]
consume_pay_transaction_completed
consume_pay_transaction_declined
consume_pay_transaction_failed
This makes development/testing difficult because I will have to run each command individually to test the workflow.
I am wondering how easy it would be to write a master command that runs the other ones as its children, monitors their health and respawns them if necessary. Are there any existing utilities/libraries in Django or Python to help me write a 'start_all' command?
[bus]
consume_pay_transaction_completed
consume_pay_transaction_declined
consume_pay_transaction_failed
start_all
The start_all command could be implemented with call_command.
Monitoring their health and respawning them if necessary sounds like a job for something like Celery.
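For local development, a start_all that covers both parts of the question might look like the sketch below. One caveat: call_command runs a command inside the calling process and only returns when that command finishes, so calling the blocking consumers one after another would never get past the first. This sketch therefore starts each consumer as its own `python manage.py <name>` child process and restarts any child that exits, instead of using call_command or Celery. supervise() and manage_py_argv() are hypothetical helpers, and manage.py is assumed to be in the working directory.

```python
import subprocess
import sys
import time

# The blocking consumer commands from the question.
CONSUMERS = [
    "consume_pay_transaction_completed",
    "consume_pay_transaction_declined",
    "consume_pay_transaction_failed",
]


def manage_py_argv(command):
    """argv that runs one management command in its own Python process.

    Assumes manage.py is in the current working directory.
    """
    return [sys.executable, "manage.py", command]


def supervise(argvs, poll_interval=1.0, max_restarts=None, should_stop=lambda: False):
    """Start one child per argv and restart any child that exits.

    Returns {index: restart_count} once should_stop() returns True.
    """
    procs = {i: subprocess.Popen(argv) for i, argv in enumerate(argvs)}
    restarts = dict.fromkeys(procs, 0)
    try:
        while not should_stop():
            for i, proc in list(procs.items()):
                if proc.poll() is None:
                    continue  # child is still alive
                if max_restarts is not None and restarts[i] >= max_restarts:
                    continue  # restart budget used up; leave it dead
                restarts[i] += 1
                procs[i] = subprocess.Popen(argvs[i])  # respawn
            time.sleep(poll_interval)
    finally:
        # Terminate children that are still running, then reap them all.
        for proc in procs.values():
            if proc.poll() is None:
                proc.terminate()
        for proc in procs.values():
            proc.wait()
    return restarts
```

Inside the start_all command's handle() method this would be called as `supervise([manage_py_argv(c) for c in CONSUMERS])`. Ctrl-C raises KeyboardInterrupt in the polling loop, and the finally block then stops the children. For production, a dedicated process supervisor is usually a better fit than a hand-rolled loop.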
I have a Django website where I can register event listeners and monitoring tasks on certain websites, see info about these tasks, edit them, delete them, etc. These tasks are long-running, so I launch them as tasks in an asyncio event loop. I want them to be independent of the Django website, so I run these tasks in an event loop alongside a Sanic web server and control them with API calls from the Django server. I don't know why, but I still feel that this solution is pretty scuffed, so is there a better way to do it? I was thinking about using Kubernetes, but these tasks aren't resource-heavy and are simple, so I don't think it's worth launching a new pod for each.
Thanks for the help.
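For reference, the in-process setup described in the question amounts to a registry of long-running asyncio tasks that a control API can start, inspect and cancel. A minimal framework-free sketch follows; TaskRegistry and its methods are hypothetical names, and in the question's setup the Sanic route handlers would be the ones calling them.

```python
import asyncio


class TaskRegistry:
    """Keep named long-running asyncio tasks so an API can manage them."""

    def __init__(self):
        self._tasks = {}

    def start(self, name, coro_factory, *args):
        """Schedule coro_factory(*args) on the running loop under `name`."""
        if name in self._tasks and not self._tasks[name].done():
            raise ValueError(f"task {name!r} is already running")
        self._tasks[name] = asyncio.create_task(coro_factory(*args), name=name)

    @staticmethod
    def _describe(task):
        if not task.done():
            return "running"
        if task.cancelled():
            return "cancelled"
        return "failed" if task.exception() is not None else "finished"

    def status(self, name):
        return self._describe(self._tasks[name])

    async def stop(self, name):
        """Cancel the task, wait until it has really stopped, report its state."""
        task = self._tasks.pop(name)
        task.cancel()
        await asyncio.wait({task})  # waits without re-raising the task's outcome
        return self._describe(task)
```

Because every watcher shares one process and one event loop, a crash or an accidental blocking call in one of them affects all of them; that is the main trade-off against running each job separately.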
Ideally, you would launch a new pod for each new event or job.
You can use a Kubernetes CronJob so the job pods are cleaned up automatically once the work is done.
It's generally better to keep separate, small microservices rather than running the whole monolithic application inside one container.
On the management side, starting a new pod per job is easy to manage, and it is also cost-efficient if you scale your cluster up and down according to resource requirements.
You can also use a message broker with a listener that subscribes to a channel on the broker and performs the async task when an event arrives. Run the listener as a separate pod.
I have a Django app where users upload many images. What I want is to trigger a management command using the call_command function from django.core.management once a new object is created, but the problem is that I want to do it asynchronously. I don't want to keep the user waiting for the management command to finish. Yes, I am aware of third-party services like Celery, but I want to keep it simple. I also know that I can schedule a cron job, but I want the change to be reflected instantly. Is there any other way to do this?
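Without Celery or cron, one simple option (not from the original thread) is to run call_command in a background thread so the request returns immediately. The caveat is that the work runs inside the web server process: it is lost if that process restarts, and there are no retries. run_in_background is a hypothetical helper name.

```python
import logging
import threading

logger = logging.getLogger(__name__)


def run_in_background(func, *args, **kwargs):
    """Start func(*args, **kwargs) in a new thread and return the thread at once.

    Exceptions are logged rather than propagated, since no caller is waiting.
    """
    def target():
        try:
            func(*args, **kwargs)
        except Exception:
            logger.exception("background job %r failed", func)

    thread = threading.Thread(target=target, name=f"bg-{getattr(func, '__name__', 'job')}")
    thread.start()
    return thread
```

In a view or post_save receiver, something like `transaction.on_commit(lambda: run_in_background(call_command, "process_images", str(obj.pk)))` would start the command only after the new row is committed; "process_images" is a placeholder for your own command name.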
I've got a Django project with a simple form that takes users' details. I want a Python bot running in the background that constantly checks the Django database for any changes. Is Celery the right tool for this job? Is there any other solution? Thank you
I don't think Celery is really what you want here - Celery is primarily for moving tasks that don't need to be dealt with in the same process to a separate worker, such as sending registration emails.
For this situation I'd be inclined to use Django's signals to trigger the required functionality whenever the appropriate changes are made to the database. For instance, if it needed to be triggered when a particular type of object was created, such as a new user, then you might use the post_save signal of the user model.
The bot would be in a separate process, but it's not too hard to communicate between processes using Redis. Just have the signal publish a message to Redis, and have the bot listen for that message and carry out the required action on that event.
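The signal-plus-Redis idea might look like this with the redis-py client (`Redis.publish()` and the `pubsub().listen()` iterator). So the sketch runs without Django or a live Redis server, the client objects are passed in; the channel name and the JSON payload shape are assumptions.

```python
import json

CHANNEL = "user-events"  # hypothetical channel name


def publish_user_created(client, user_id):
    """Called from the post_save receiver: announce the new user on Redis."""
    return client.publish(CHANNEL, json.dumps({"event": "user_created", "id": user_id}))


def run_bot(pubsub, handle_event, max_messages=None):
    """Bot process: subscribe, then call handle_event(dict) for each message.

    Subscribe confirmations and other non-"message" items are skipped.
    Returns after max_messages events (None means run forever).
    """
    pubsub.subscribe(CHANNEL)
    handled = 0
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        data = message["data"]
        if isinstance(data, bytes):  # redis-py yields bytes unless decode_responses=True
            data = data.decode("utf-8")
        handle_event(json.loads(data))
        handled += 1
        if max_messages is not None and handled >= max_messages:
            break
    return handled
```

On the Django side, a receiver decorated with `@receiver(post_save, sender=User)` would call `publish_user_created(redis.Redis(), instance.pk)` when its `created` argument is true, and the bot would run `run_bot(redis.Redis().pubsub(), handle_event)`. Redis pub/sub does not store messages, so anything published while the bot is not subscribed is lost.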
I don't know the details of your needs, but there are a few ways to achieve this:
"Constantly checking" approach:
A crontab entry that launches your Python script every minute.
Like you said, you could use Celery beat to do what a crontab would do, from within your Python environment.
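For the "constantly checking" approach, the script launched every minute needs to remember what it has already seen between runs. One common way, sketched here with the standard library's sqlite3, a hypothetical user_details table and a hypothetical state file, is to keep the highest primary key processed so far and fetch only newer rows; with Django the same query would go through the ORM, e.g. `.filter(pk__gt=last_id)`.

```python
import sqlite3
from pathlib import Path


def load_last_id(path):
    """High-water mark from the previous run (0 on the first run)."""
    p = Path(path)
    return int(p.read_text()) if p.exists() else 0


def save_last_id(path, last_id):
    Path(path).write_text(str(last_id))


def fetch_new_rows(conn, last_seen_id):
    """Return rows inserted since last_seen_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name FROM user_details WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seen_id
    return rows, new_last
```

This only detects new rows; catching edits to existing rows would need something like an updated_at column, which is one reason the "on change" approach below is usually preferable.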
"On change" approach:
Probably the best option, if you have control of the Django project: have your script run when the form is validated/saved. For this, you can add a Celery task, run the Python script, use Django signals...
I'm wondering what kind of options there are for monitoring celery tasks from a browser, after they have been deployed to a worker?
My current application stack is a Flask app running inside Twisted, using Celery to run dozens to thousands of small background tasks (updating metadata in a repository, creating image derivatives, etc.). I'm envisioning using AJAX long-polling to monitor the status of the Celery tasks initiated by the user. I'm using Redis as both the broker and the result backend.
I see Celery has some command-line ways to monitor tasks, and Flower for a web dashboard. But if I wanted to see more detailed status from a particular task sent to Celery, would it make more sense for that task to print/write to a log file, and then long-poll that file for changes from the Flask front end?
At this point a user can say, "update these 10,000 items", the tasks are sent to celery, and the front-end very quickly says, "job sent!". And the tasks do complete. But I'd like to have the user navigate to "/status" and see the status of those 10,000 small jobs - even a scrolling log file would probably work.
Any suggestions would be greatly appreciated. Took a lot of head scratching to make it this far sketching things out, but I'm spinning my wheels figuring out exactly WHAT to long-poll from the user front-end.
Try Jobtastic, which extends Celery.
From the project description:
Jobtastic gives you goodies like:
Easy progress estimation/reporting
Job status feedback
Helper methods for gracefully handling a dead task broker (delay_or_eager and delay_or_fail)
Super-easy result caching
Thundering herd avoidance
Integration with a celery jQuery plugin for easy client-side progress display
Memory leak detection in a task run
Jobtastic was a great idea, but not quite what worked for us. In the end, we decided to create an incrementing job number (stored in Redis alongside the results and broker), push all Celery task IDs associated with that job number into a Python object, then pickle it and store it in Redis. We can then use that later to see whether the entire "job" is complete, or what its status is. For our purposes, it works just lovely.
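The job-number approach just described might be sketched as follows: an INCR'd counter gives each user request a job number, the Celery task IDs are pickled under that number, and the status view aggregates the individual task states. The key names are hypothetical, and the Redis client and the state lookup are passed in so the sketch runs without Redis or Celery; with Celery, get_state could be `lambda tid: app.AsyncResult(tid).state`.

```python
import pickle
from collections import Counter

# Celery's terminal states (mirrors celery.states.READY_STATES).
READY_STATES = frozenset({"SUCCESS", "FAILURE", "REVOKED"})


def create_job(redis_client, task_ids):
    """Allocate a new job number and store the task IDs that belong to it."""
    job_id = redis_client.incr("job:counter")
    redis_client.set(f"job:{job_id}:tasks", pickle.dumps(list(task_ids)))
    return job_id


def job_status(redis_client, job_id, get_state):
    """Counts per Celery state, and whether every task has finished."""
    raw = redis_client.get(f"job:{job_id}:tasks")
    if raw is None:
        raise KeyError(f"unknown job {job_id}")
    task_ids = pickle.loads(raw)
    counts = Counter(get_state(tid) for tid in task_ids)
    complete = all(state in READY_STATES for state in counts)
    return {"total": len(task_ids), "counts": dict(counts), "complete": complete}
```

A /status endpoint could return job_status(...) as JSON, which gives the long-polling front end something small and concrete to poll instead of a log file.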
I want to write a long running process (linux daemon) that serves two purposes:
responds to REST web requests
executes jobs which can be scheduled
I originally had it working as a simple program that would go through its runs and do the updates, which I then scheduled with cron, but now I have the added REST requirement, and I would also like to change the frequency of some jobs but not others (let’s say all jobs have different frequencies).
I have zero experience writing long-running processes, especially ones that do things on their own rather than responding to requests.
My basic plan is to run the REST part in a separate thread/process, and I figured I’d run the jobs part separately.
I’m wondering whether there are any patterns, specifically in Python (I’ve looked and haven’t really found any examples of what I want to do), or if anyone has suggestions on where to begin with transitioning my project to meet these new requirements.
I’ve seen a few projects that touch on scheduling, but I’m really looking for real world user experience / suggestions here. What works / doesn’t work for you?
If the REST server and the scheduled jobs have nothing in common, do two separate implementations, the REST server and the jobs stuff, and run them as separate processes.
As mentioned previously, look into existing schedulers for the jobs part. I don't know if Twisted would be an alternative, but you might want to check out that platform.
If, OTOH, the REST interface invokes the same functionality as the scheduled jobs do, you should try to look at them as two interfaces to the same functionality, e.g. like this:
Write the actual jobs as programs the REST server can fork and run.
Have a separate scheduler that handles the timing of the jobs.
If a job is due to run, let the scheduler issue a corresponding REST request to the local server.
This way the scheduler only handles job descriptions and has no knowledge of how the jobs are implemented.
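The scheduler's side of that design, where it only knows a job's name and asks the local REST server to run it, could be as small as the function below. The `/jobs/<name>/run` URL scheme and the use of POST are assumptions; only the standard library's urllib.request is used.

```python
import urllib.error
import urllib.parse
import urllib.request


def trigger_job(base_url, job_name, timeout=5.0):
    """POST to the local REST server to start job_name; return the HTTP status.

    The scheduler needs no knowledge of how the job is implemented.
    """
    url = f"{base_url}/jobs/{urllib.parse.quote(job_name)}/run"
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 if the server does not know the job
```

Keeping the trigger this thin means a job can be started by hand (e.g. with curl) exactly the way the scheduler starts it.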
It's common for long-running, high-availability processes to have an additional "supervisor" process that just checks that the necessary daemons are up and running, and restarts them as necessary.
One option is to simply choose a lightweight WSGI server from this list:
http://wsgi.org/wsgi/Servers
and let it do the work of a long-running process that serves requests. (I would recommend Spawning.) Your code can concentrate on the REST API and handling requests through the well defined WSGI interface, and scheduling jobs.
There are at least a couple of scheduling libraries you could use, but I don't know much about them:
http://sourceforge.net/projects/pycron/
http://code.google.com/p/scheduler-py/
Here's what we did.
Wrote a simple, pure-WSGI web application to respond to REST requests.
Start jobs
Report status of jobs
Extended the built-in wsgiref server to use the select module to check for incoming requests.
Activity on the socket is an ordinary REST request; we let wsgiref handle it. It will, eventually, call our WSGI applications to respond to status and submit requests.
Timeout means that we have to do two things:
Check all children that are running to see if they're done. Update their status, etc.
Check a crontab-like schedule to see if there's any scheduled work to do. This is a SQLite database that this server maintains.
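That structure maps fairly directly onto the standard library: wsgiref's WSGIServer is a socketserver server, and socketserver's handle_request() waits (using select) for at most `self.timeout` seconds, calling handle_timeout() when nothing arrives. This sketch assumes that mechanism is what "extending the wsgiref server" amounts to; the `/jobs/<name>` routes, the JOB_COMMANDS whitelist and run_due_jobs() are hypothetical, and the SQLite schedule is left as a stub.

```python
import json
import subprocess
import sys
from wsgiref.simple_server import WSGIServer, make_server

# Hypothetical whitelist: job name -> command line run as a child process.
JOB_COMMANDS = {"noop": [sys.executable, "-c", "pass"]}
RUNNING = {}  # job id -> subprocess.Popen
STATUS = {}   # job id -> "running" or the child's exit code
_next_id = 0


def start_job(name):
    global _next_id
    _next_id += 1
    RUNNING[_next_id] = subprocess.Popen(JOB_COMMANDS[name])
    STATUS[_next_id] = "running"
    return _next_id


def reap_children():
    """Check all running children; record the exit code of finished ones."""
    for job_id, proc in list(RUNNING.items()):
        code = proc.poll()
        if code is not None:
            STATUS[job_id] = code
            del RUNNING[job_id]


def run_due_jobs():
    """Stub: consult the crontab-like SQLite schedule and call start_job()."""


def app(environ, start_response):
    """Pure-WSGI REST app: POST /jobs/<name> starts a job, GET /jobs reports."""
    method, path = environ["REQUEST_METHOD"], environ.get("PATH_INFO", "")
    if method == "POST" and path.startswith("/jobs/") and path[6:] in JOB_COMMANDS:
        status, body = "202 Accepted", {"id": start_job(path[6:])}
    elif method == "GET" and path == "/jobs":
        status, body = "200 OK", {str(k): v for k, v in STATUS.items()}
    else:
        status, body = "404 Not Found", {"error": "not found"}
    data = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


class JobServer(WSGIServer):
    timeout = 1.0  # seconds handle_request() waits before calling handle_timeout()

    def handle_timeout(self):
        reap_children()
        run_due_jobs()


def serve(host="127.0.0.1", port=8000):
    server = make_server(host, port, app, server_class=JobServer)
    while True:
        server.handle_request()  # serves one request, or ticks on timeout
```

Under constant request traffic the timeout never fires, so a real version might also call reap_children() and run_due_jobs() after every handle_request().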
I usually use cron for scheduling. As for REST, you can use one of the many, many web frameworks out there, but just running a small server built on SimpleHTTPServer (http.server in Python 3) should be enough.
You can schedule the REST service startup with cron's @reboot:
@reboot (cd /path/to/my/app && nohup python myserver.py &)
The usual design pattern for a scheduler would be:
Maintain a list of scheduled jobs, sorted by next-run-time (as Date-Time value);
When woken up, compare the first job in the list with the current time. If it's due or overdue, remove it from the list and run it. Continue working your way through the list this way until the first job is not due yet, then go to sleep for (next_job_due_date - current_time);
When a job finishes running, re-schedule it if appropriate;
After adding a job to the schedule, wake up the scheduler process.
Tweak as appropriate for your situation (eg. sometimes you might want to re-schedule jobs to run again at the point that they start running rather than finish).
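The pattern above could be sketched with heapq as the sorted job list and a threading.Condition as the wake-up mechanism. Jobs here are plain callables with a fixed interval (None for one-shot jobs), re-scheduled after they finish as in the third step; the Scheduler class and its method names are illustrative, not from a library.

```python
import heapq
import itertools
import threading
import time


class Scheduler:
    """Run callables at intervals, keeping jobs ordered by next run time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._heap = []                # (next_run, seq, interval, func)
        self._seq = itertools.count()  # tie-breaker so funcs are never compared
        self._cond = threading.Condition()
        self._stopped = False

    def add(self, func, interval, delay=0.0):
        with self._cond:
            heapq.heappush(self._heap, (self._clock() + delay, next(self._seq), interval, func))
            self._cond.notify()  # wake the loop: the new job may now be first

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify()

    def run(self):
        while True:
            with self._cond:
                while not self._stopped:
                    now = self._clock()
                    if self._heap and self._heap[0][0] <= now:
                        break  # first job is due or overdue
                    timeout = self._heap[0][0] - now if self._heap else None
                    self._cond.wait(timeout)  # sleep until due, or woken by add()
                if self._stopped:
                    return
                _, _, interval, func = heapq.heappop(self._heap)
            func()  # run outside the lock so add() is never blocked by a job
            if interval is not None:
                self.add(func, interval, delay=interval)  # re-schedule after finishing
```

A job that raises will end run(); in real use, func() would be wrapped in try/except and the error logged.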