I'm launching jobs on a server.
The server can process only one job at a time, so I use the trick of having several user accounts on the server: userA, userB, userC, userD.
For the moment I launch a job with a function:
run_job_on_server(some_args, user_name)
My question is quite simple: how can I, using multiprocessing (or another module), launch many jobs using the different users available, and, when a job finishes, make its user available again and immediately launch a new job with that user?
Thanks for your help!
I think your question goes into the library selection (multiprocessing) too quickly. The first thing to do is to establish the design pattern. As a start, I think you could look at the dispatcher or mailbox pattern, and the active object pattern.
As for libraries, you're not stuck with the Python standard library; pip has many nice options too. I personally love ZeroMQ for distributed systems, but that's step two. Maybe the standard library's Queue and multiprocessing modules will do.
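As a starting point, here is a minimal standard-library sketch of that idea, assuming run_job_on_server is importable by the worker processes and blocks until the job is done (all_job_args stands for your list of job arguments): a shared queue hands a free user account to each task and takes it back when the job finishes.

import multiprocessing

USERS = ["userA", "userB", "userC", "userD"]

def init(user_queue):
    # Runs once in each worker process; stores the shared queue of free users.
    global available_users
    available_users = user_queue

def worker(job_args):
    user = available_users.get()            # block until a user account is free
    try:
        return run_job_on_server(job_args, user)   # your existing function
    finally:
        available_users.put(user)           # make the user available again

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    user_queue = manager.Queue()
    for u in USERS:
        user_queue.put(u)
    # One worker process per user account: as soon as a job finishes,
    # its user goes back into the queue and the next job picks it up.
    with multiprocessing.Pool(len(USERS), initializer=init, initargs=(user_queue,)) as pool:
        results = pool.map(worker, all_job_args)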
I've got a Django project with Celery 2.5.5 and a RabbitMQ backend on Debian 6. I've got over 6000 tasks of different types in one queue. There was a bug in the code and I need to list all tasks in that queue and pull out some of them. All I need is to find out all the task ids in the RabbitMQ queue. I can't find a way to connect to the RabbitMQ queue and list its contents, ideally without starting up the management plugin.
Something Pythonish like this would be great:
import somelib
conn = somelib.server(credentials, vhost)
queue = conn.get_queue(queue_name)
messages = queue.get_messages()
But any other tool to list such a queue would help. I found a tool that is installed using npm, but Debian 6 does not know about npm, and building it from source is not the most pleasant way.
Something to back up RabbitMQ queues in a human-readable form would also be appreciated.
Thanks for ideas
Pavel
You can use the Celery Flower library to do that.
It provides multiple features, like displaying task progress and history, showing task details, and presenting graphs and statistics in a pretty dashboard-style interface.
The original answer included screenshots for reference: the task dashboard, worker tasks, and task info views.
If you are happy with a premade interface, you will like Flower. It shows you all tasks in a nice web view.
If, however, you are trying to process each task programmatically, Flower isn't the right tool, since it doesn't support that. In that case you would have to use a RabbitMQ/AMQP library for Python, which has been discussed before, e.g. here: Good Python library for AMQP
With such a library it should definitely be possible to write something like the code you imagined in one way or another, but you'll have to read up on it yourself; Celery and Flower have been enough for me so far.
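For reference, here is a minimal sketch using pika (one such AMQP library); the host, vhost, credentials and queue name are placeholders, and it assumes the tasks are JSON-serialized (older Celery setups default to pickle). It peeks at every ready message without acknowledging it, so closing the connection puts everything back in the queue:

import json
import pika

# Placeholders: adjust host, vhost, credentials and queue name for your setup.
params = pika.ConnectionParameters(
    host="localhost",
    virtual_host="/",
    credentials=pika.PlainCredentials("guest", "guest"),
)
connection = pika.BlockingConnection(params)
channel = connection.channel()

while True:
    method, properties, body = channel.basic_get(queue="celery", auto_ack=False)
    if method is None:
        break                                  # no more ready messages
    task = json.loads(body)                    # assumes CELERY_TASK_SERIALIZER = "json"
    print(task.get("id"), task.get("task"))

# Nothing was acknowledged, so closing the connection requeues every message.
connection.close()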
I have been looking at Linux daemons such as httpd and have also looked at some code that can be used as a skeleton. I have done a fair amount of research and now I want to practice writing one. However, I'm not sure what I could use a daemon for. Any good examples/ideas that I can try to implement?
I was thinking of using a daemon along with libnotify on Ubuntu to have pop-up notifications of select tweets.
Is this a bad example for implementing a daemon?
Will you even need a daemon for this?
Can this be implemented as a service rather than a daemon?
First: PEP 3143 tries to enumerate all of the fiddly details you have to get right to write a daemon in Python. And it specifies a library that takes care of those details for you.
The PEP was deferred, at least in part because the community felt it was more the responsibility of POSIX or some Linux standards group to first define exactly what is essential to being a daemon before Python could take its own position on how to implement one. It's still a great guide, though, and the reference implementation of the proposed library lives on as python-daemon, which you can install from PyPI.
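To give a feel for what that buys you, a minimal sketch of python-daemon usage (main_loop is a hypothetical function holding your actual work):

import daemon

def main_loop():
    # Hypothetical: watch for tweets and fire libnotify pop-ups.
    ...

if __name__ == "__main__":
    # DaemonContext takes care of the fiddly details: forking, detaching
    # from the terminal, closing file descriptors, setting the umask, etc.
    with daemon.DaemonContext():
        main_loop()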
Meanwhile, the really interesting question for this project isn't so much service vs. daemon, as root vs. user. Do you want a single process that keeps track of all users' twitter accounts, and sends notifications to anyone who's logged in? Just a per-user process? Or maybe both, a single process watching all the tweets, then sending notifications via user processes?
Of course you don't really need a daemon or service for this. For example, it could be a GUI app whose main window is a configuration dialog, which keeps running (maybe with a traybar thingy) even when you close the config dialog, and it would work just as well. The question isn't whether you need a daemon, but whether it's more appropriate. Which really is a design choice.
I have a Django website, and one page has a button (or link) that when clicked will launch a somewhat long running task. Obviously I want to launch this task as a background task and immediately return a result to the user. I want to implement this using a simple approach that will not require me to install and learn a whole new messaging architecture like Celery for example. I do not want to use Celery! I just want to use a simple approach that I can set up and get running over the next half hour or so. Isn't there a simple way to do this in Django without having to add (yet another) 3rd party package?
Just use a thread.
import threading

from django.http import HttpResponse

def my_view(request):
    # Hand the long-running work off to a background thread and return at once.
    # long_process, args and kwargs come from your own code.
    t = threading.Thread(target=long_process, args=args, kwargs=kwargs)
    t.daemon = True          # don't keep the process alive just for this thread
    t.start()
    return HttpResponse()
See this question for more details:
Can Django do multi-thread works?
Have a look at django-background-tasks - it does exactly what you need and doesn't need any additional services to be running like RabbitMQ or Redis. It manages a task queue in the database and has a Django management command which you can run once or as a cron job.
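For a feel of what that looks like, a minimal sketch (notify_user is a hypothetical task; the decorator and the process_tasks command come from django-background-tasks):

# tasks.py
from background_task import background

@background(schedule=60)       # run roughly 60 seconds after being queued
def notify_user(user_id):
    ...                        # the slow work goes here

# In a view, this call only stores a task row and returns immediately:
# notify_user(user.id)

The queued tasks are then executed by running python manage.py process_tasks, either once, in a loop, or from cron.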
If you're willing to install a 3rd party library, but you want something a whole lot simpler than Celery, check out Redis Queue. It does require Redis, which is pretty easy in itself, but that can provide a lot of other benefits as well.
RQ itself has almost zero configuration. It's startlingly simple.
References:
http://python-rq.org/
http://nvie.com/posts/introducing-rq/
https://devcenter.heroku.com/articles/python-rq (RQ on Heroku)
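For a sense of how little code is involved, a minimal sketch of the RQ side, assuming a Redis instance on localhost and that long_process lives in an importable module:

from redis import Redis
from rq import Queue

q = Queue(connection=Redis())          # default queue on the local Redis
job = q.enqueue(long_process, *args)   # returns immediately with a Job handle
print(job.id, job.get_status())

A worker started separately with the rq worker command picks the job up and runs it.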
We are trying to solve a problem related to cluster job scheduling.
The problem is the following: we have a set of Python scripts which are executed on a cluster, and the launching process is currently done through human interaction; to start a test we have a bash script which interacts with the cluster to request the resources needed for the execution. What we intend to do is build an automatic launching process (which should be robust in the sense that it knows the job status and, based on that, waits for the job to end, restarts the execution, etc.). Basically we have to implement a layer between the user workstation and the cluster.
An additional difficulty is that our layer must be clever enough to interact with different cluster job schedulers. We wonder if there exists a tool or framework which helps us interact with the cluster without having to deal with the details of each scheduler. We have searched the web but did not find anything suitable for our needs.
By the way, the programming language we use is Python.
Thanks in advance!
Br.-
Use supervisor (http://supervisord.org/) and celery (http://www.celeryproject.org/) together.
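A hedged sketch of how that combination could look: the existing scripts become celery tasks (the module name, broker URL and run_script helper below are assumptions), and supervisord keeps the worker process running and restarts it if it dies.

# cluster_tasks.py (assumed module name)
import subprocess

from celery import Celery

# The broker URL is an assumption; point it at your RabbitMQ or Redis instance.
app = Celery("cluster_tasks", broker="amqp://guest@localhost//")

@app.task(bind=True, max_retries=3)
def run_script(self, script_path):
    # Run one of the existing Python scripts and retry on failure.
    result = subprocess.run(["python", script_path], capture_output=True, text=True)
    if result.returncode != 0:
        raise self.retry(countdown=60, exc=RuntimeError(result.stderr))
    return result.stdout

The worker itself would then be started and kept alive by supervisord, e.g. running celery -A cluster_tasks worker.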
Take a look at the ipcluster_tools. The documentation is sparse but it is easy to use.
I want to write a long running process (linux daemon) that serves two purposes:
responds to REST web requests
executes jobs which can be scheduled
I originally had it working as a simple program that would run through the jobs and do the updates, which I then cron'd, but now I have the added REST requirement and would also like to change the frequency of some jobs but not others (let's say all jobs have different frequencies).
I have 0 experience writing long running processes, especially ones that do things on their own, rather than responding to requests.
My basic plan is to run the REST part in a separate thread/process, and I figured I'd run the jobs part separately.
I'm wondering if there exist any patterns for this, specifically in Python (I've looked and haven't really found examples of what I want to do), or if anyone has suggestions on where to begin transitioning my project to meet these new requirements.
I’ve seen a few projects that touch on scheduling, but I’m really looking for real world user experience / suggestions here. What works / doesn’t work for you?
If the REST server and the scheduled jobs have nothing in common, do two separate implementations, the REST server and the jobs stuff, and run them as separate processes.
As mentioned previously, look into existing schedulers for the jobs stuff. I don't know if Twisted would be an alternative, but you might want to check this platform.
If, OTOH, the REST interface invokes the same functionality as the scheduled jobs do, you should try to look at them as two interfaces to the same functionality, e.g. like this:
Write the actual jobs as programs the REST server can fork and run.
Have a separate scheduler that handles the timing of the jobs.
If a job is due to run, let the scheduler issue a corresponding REST request to the local server.
This way the scheduler only handles job descriptions and has no knowledge of how they are implemented; a sketch of this follows below.
It's a common trait for long-running, high-availability systems to have an additional "supervisor" process that just checks that the necessary daemons are up and running, and restarts them as necessary.
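A minimal sketch of that third point, assuming the REST server listens locally on port 8000 and exposes a hypothetical /jobs/<name>/run endpoint; the job table here is illustrative:

import time
import urllib.request

# Hypothetical job table: name -> interval in seconds.
JOBS = {"refresh-feeds": 300, "cleanup": 3600}

def trigger(job_name):
    # The scheduler only knows job names; the REST server knows how to run them.
    url = "http://127.0.0.1:8000/jobs/%s/run" % job_name
    urllib.request.urlopen(urllib.request.Request(url, method="POST"))

next_run = {name: time.time() for name in JOBS}
while True:
    now = time.time()
    for name, interval in JOBS.items():
        if now >= next_run[name]:
            trigger(name)
            next_run[name] = now + interval
    time.sleep(1)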
One option is to simply choose a lightweight WSGI server from this list:
http://wsgi.org/wsgi/Servers
and let it do the work of a long-running process that serves requests. (I would recommend Spawning.) Your code can concentrate on the REST API, handling requests through the well-defined WSGI interface, and scheduling jobs.
There are at least a couple of scheduling libraries you could use, but I don't know much about them:
http://sourceforge.net/projects/pycron/
http://code.google.com/p/scheduler-py/
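The WSGI interface mentioned above is deliberately small; a minimal application that any of those servers can host looks roughly like this (the /status route is a made-up example):

import json

def application(environ, start_response):
    # environ describes the request; start_response sets the status and headers.
    if environ.get("PATH_INFO") == "/status":
        body = json.dumps({"status": "ok"}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]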
Here's what we did.
Wrote a simple, pure-wsgi web application to respond to REST requests.
Start jobs
Report status of jobs
Extended the built-in wsgiref server to use the select module to check for incoming requests.
Activity on the socket is an ordinary REST request; we let wsgiref handle it. It will, eventually, call our WSGI application to respond to status and submit requests.
Timeout means that we have to do two things:
Check all children that are running to see if they're done. Update their status, etc.
Check a crontab-like schedule to see if there's any scheduled work to do. This is a SQLite database that this server maintains.
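A minimal sketch of that select-with-timeout loop (application is the WSGI app; check_running_jobs and run_due_jobs are hypothetical stand-ins for the two timeout actions described above):

import select
from wsgiref.simple_server import make_server

httpd = make_server("", 8000, application)

while True:
    # Wait up to 5 seconds for an incoming REST request.
    readable, _, _ = select.select([httpd], [], [], 5.0)
    if readable:
        httpd.handle_request()      # ordinary request: let wsgiref dispatch it
    else:
        check_running_jobs()        # hypothetical: update the status of child jobs
        run_due_jobs()              # hypothetical: consult the crontab-like schedule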
I usually use cron for scheduling. As for REST you can use one of the many, many web frameworks out there. But just running SimpleHTTPServer should be enough.
You can schedule the REST service startup with cron #reboot
#reboot (cd /path/to/my/app && nohup python myserver.py&)
The usual design pattern for a scheduler would be:
Maintain a list of scheduled jobs, sorted by next-run-time (as Date-Time value);
When woken up, compare the first job in the list with the current time. If it's due or overdue, remove it from the list and run it. Continue working your way through the list this way until the first job is not due yet, then go to sleep for (next_job_due_date - current_time);
When a job finishes running, re-schedule it if appropriate;
After adding a job to the schedule, wake up the scheduler process.
Tweak as appropriate for your situation (e.g. sometimes you might want to re-schedule jobs relative to when they start running rather than when they finish).
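A compact sketch of that pattern, using a heap for the sorted job list and a threading.Event as the wake-up mechanism (the job functions and intervals are whatever your application needs; this version re-schedules relative to the finish time):

import heapq
import threading
import time

class Scheduler:
    def __init__(self):
        self._jobs = []                 # heap of (next_run_time, seq, func, interval)
        self._seq = 0                   # tie-breaker so the heap never compares functions
        self._wakeup = threading.Event()
        self._lock = threading.Lock()

    def add(self, func, interval, first_run=None):
        with self._lock:
            when = first_run if first_run is not None else time.time() + interval
            heapq.heappush(self._jobs, (when, self._seq, func, interval))
            self._seq += 1
        self._wakeup.set()              # wake the loop so it re-checks the head of the list

    def run_forever(self):
        while True:
            due, timeout = [], None
            with self._lock:
                now = time.time()
                while self._jobs and self._jobs[0][0] <= now:
                    _, _, func, interval = heapq.heappop(self._jobs)
                    due.append((func, interval))
                if self._jobs:
                    timeout = self._jobs[0][0] - now
            for func, interval in due:
                func()                    # run the job (or hand it off to a worker)
                self.add(func, interval)  # re-schedule if appropriate
            self._wakeup.wait(timeout)    # sleep until the next due time or a new add()
            self._wakeup.clear()

In a real setup you would hand func() off to a thread or subprocess instead of running it inline, so one slow job cannot delay the others.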