Is it possible to empty a job queue on a Gearman server - python

Is it possible to empty a job queue on a Gearman server? I am using the python driver for Gearman, and the documentation does not have any information about emptying queues. I would imagine that this functionality should exist, possibly, with a direct connection to the Gearman server.

I came across this method:
/usr/bin/gearman -t 1000 -n -w -f function_name > /dev/null
which basically dumps all the jobs into /dev/null.

The telnet-able administrative protocol (search for "Administrative Protocol") doesn't have a command to empty a queue either; there is only a shutdown command.
If you wish to avoid downtime, you could write a generic "job consumer" worker and use that to empty the queues. I've set one up as a script which takes a list of job names, and just sits there accepting jobs and consuming them.
Something like:
# generic_consumer.py job1 job2 job3
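A minimal sketch of such a consumer, using the python-gearman library (the server address, script name, and the no-op handler below are assumptions, not the original script):
# generic_consumer.py -- accept and discard every job name passed on the command line (sketch)
import sys
import gearman

def consume(gearman_worker, gearman_job):
    return ""  # acknowledge the job and throw the payload away

worker = gearman.GearmanWorker(["127.0.0.1:4730"])
for job_name in sys.argv[1:]:
    worker.register_task(job_name, consume)
worker.work()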
You can use the administrative protocol's status command to get a list of the function names and counts on the queue. The administrative protocol docs tell you the format of the response.
# (echo status ; sleep 0.1) | netcat 127.0.0.1 4730
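If you would rather query it from Python than shell out to netcat, a rough equivalent over a raw socket (host, port, and buffer size are assumptions) looks like:
# ask the Gearman admin protocol for per-function queue counts (sketch)
import socket

sock = socket.create_connection(("127.0.0.1", 4730))
sock.sendall(b"status\n")
response = sock.recv(65536).decode()
sock.close()
for line in response.splitlines():
    if line == ".":  # the admin protocol ends the listing with a single dot
        break
    function_name, queued, running, workers = line.split("\t")
    print(function_name, queued, running, workers)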

As far as I have been able to tell from the docs and from using Gearman with PHP, the only way to clear the job queue is to restart the gearmand job server. If you are using persistent job queues, you will also need to empty whatever you are using as the persistent storage; if that is DB storage, you will need to delete all the rows from the appropriate tables.
stop gearmand --> empty table rows --> start gearmand
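For example, if the persistent queue lives in MySQL, the "empty table rows" step is just a TRUNCATE of whatever table gearmand was configured with; the connection details and table name below are placeholders, not defaults you can rely on:
# empty the persistent queue table while gearmand is stopped (sketch)
import pymysql

conn = pymysql.connect(host="localhost", user="gearman", password="secret", db="gearman")
with conn.cursor() as cur:
    cur.execute("TRUNCATE TABLE gearman_queue")  # use the table name from your gearmand config
conn.commit()
conn.close()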
Hope this is clear enough.

Related

During a long-running process, will Flask be unresponsive to new requests?

My Flask project takes in orders as POST requests from multiple online stores, saves those orders to a database, and forwards the purchase information to a service which delivers the product. Sometimes, the product is not set up in the final service and the request sits in my service's database in an "unresolved" state.
When the product is set up in the final service, I want to kick off a long-running (maybe a minute) process to send all "unresolved" orders to the final service. During this process, will Flask still be able to receive orders from the stores and continue processing as normal? If not, do I need to offload this to a task runner like rq?
I'm not worried about speed as much as I am about consistency. The items being purchased are tickets to a live event so as long as the order information is passed along before the event begins, it should make no difference to the customer.
There are a few different answers, each valid in different situations. The quick answer is that a job queue like RQ is usually the right solution, especially in the long run as your project grows.
As long as the WSGI server has workers available, another request can be handled. Each worker handles one request at a time. The development server uses threads, so an unlimited number of workers are available (with the performance constraints of threads in Python). Production servers like Gunicorn can use multiple workers, and different types of workers such as threads, processes, or eventlets. If you want to run a task in response to an HTTP request and wait until the task is finished to send a response, you'll need enough workers to block on those tasks along with handling regular requests.
@app.route("/admin/send-purchases")
def send_purchases():
    ...  # do stuff, wait for it to finish
    return "success"
However, the task you're describing seems like a cleanup task that should be run regardless of HTTP requests from a user. In that case, you should write a Flask CLI command and call it using cron or another scheduling system.
@app.cli.command()
def send_purchases():
    ...
    click.echo("done")
# crontab hourly job
0 * * * * env FLASK_APP=myapp /path/to/venv/bin/flask send-purchases
If you do want a user to initiate the task, but don't want to block a worker waiting for it to finish, then you want a task queue such as RQ or Celery. You could make a CLI command that submits the job too, to be able to trigger it on request and on a schedule.
@rq.job
def send_purchases():
    ...

@app.route("/admin/send-purchases", endpoint="send_purchases")
def send_purchases_view():
    send_purchases.queue()
    return "started"

@app.cli.command("send-purchases")
def send_purchases_command():
    send_purchases.queue()
    click.echo("started")
Flask's development server will spawn a new thread for each request. Similarly, production servers can be started with multiple workers.
You can run your app with gunicorn or similar with multiple processes. For example with four process workers:
gunicorn -w 4 app:app
For example with eventlet workers:
gunicorn -k eventlet app:app
See the docs on deploying in production as well: https://flask.palletsprojects.com/en/1.1.x/deploying/

How to create multiple workers in Python-RQ?

We have recently been forced to replace Celery with RQ as it is simpler and Celery was giving us too many problems. Now we are not able to find a way to create multiple queues dynamically, because we need multiple jobs done concurrently. Basically, every request to one of our routes should start a job, and it doesn't make sense to have multiple users wait for one user's job to be done before we can proceed with the next jobs. We periodically send a request to the server in order to get the status of the job and some metadata. This way we can update the user with a progress bar (it could be a lengthy process, so this has to be done for the sake of UX).
We are using Django and Python's rq library. We are not using django-rq (please let me know if there are advantages to using it).
So far we start a task in one of our controllers like:
from redis import Redis
from rq import Queue

redis_conn = Redis()
q = Queue(connection=redis_conn)
job = q.enqueue(render_task, new_render.pk, domain=domain, data=csv_data, timeout=1200)
Then in our render_task method we add meta data to the job based on the state of the long task:
from rq import get_current_job

current_job = get_current_job()
current_job.meta['state'] = 'PROGRESS'
current_job.meta['process_percent'] = process_percent
current_job.meta['message'] = 'YOUTUBE'
current_job.save()
Now we have another endpoint that gets the current job and its metadata and passes it back to the client (this happens through a periodic AJAX request).
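For reference, that endpoint is roughly the following (a sketch using plain rq; the view name and Redis wiring are placeholders):
# return the job's status and meta to the polling client (sketch)
from django.http import JsonResponse
from redis import Redis
from rq.job import Job

def job_status(request, job_id):
    job = Job.fetch(job_id, connection=Redis())
    return JsonResponse(dict(job.meta, status=job.get_status()))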
How do we go about running jobs concurrently without blocking other jobs? Should we make queues dynamically? Is there a way to make use of Workers in order to achieve this?
As far as I know RQ does not have any facility to manage multiple workers. You have to start a new worker process defining which queue it will consume. One way of doing this which works pretty well for me is using Supervisor. In supervisor you configure your worker for a given queue and number of processes to have concurrency. For example you can have queue "high-priority" with 5 workers and queue "low-priority" with 1 worker.
It is not only possible but ideal to run multiple workers. I use a bash file for the start command to enter the virtual env, and launch with a custom Worker class.
Here's a supervisor config that has worked very well for me for RQ workers, under a production workload as well. Note that startretries is high since this runs on AWS and needs retries during deployments.
[program:rq-workers]
process_name=%(program_name)s_%(process_num)02d
command=/usr/local/bin/start_rq_worker.sh
autostart=true
autorestart=true
user=root
numprocs=5
startretries=50
stopsignal=INT
killasgroup=true
stopasgroup=true
stdout_logfile=/opt/elasticbeanstalk/tasks/taillogs.d/super_logs.conf
redirect_stderr=true
Contents of start_rq_worker.sh
#!/bin/bash
date > /tmp/date
source /opt/python/run/venv/bin/activate
source /opt/python/current/env
/opt/python/run/venv/bin/python /opt/python/current/app/manage.py rqworker --worker-class rq.SimpleWorker default
I would like to suggest a very simple solution using django-rq:
Sample settings.py
...
RQ_QUEUES = {
    'default': {
        'HOST': os.getenv('REDIS_HOST', 'localhost'),
        'PORT': 6379,
        'DB': 0,
        'DEFAULT_TIMEOUT': 360,
    },
    'low': {
        'HOST': os.getenv('REDIS_HOST', 'localhost'),
        'PORT': 6379,
        'DB': 0,
        'DEFAULT_TIMEOUT': 360,
    },
}
...
Run Configuration
Run python manage.py rqworker default low as many times (each time in its own shell, or as its own Docker container, for instance) as the number of desired workers. The order of queues in the command determines their priority. At this point, all workers are listening to both queues.
In the Code
When calling a job to run, pass in the desired queue. For high/normal priority jobs, you can make the call without any parameters and the job will enter the default queue. For low priority, you must specify the queue, either at the job level:
@job('low')
def my_low_priority_job():
    ...  # some code
And then call my_low_priority_job.delay().
Alternatively, determine priority when calling:
queue = django_rq.get_queue('low')
queue.enqueue(my_variable_priority_job)

celery: remove empty queues that are more than 5 minutes old?

I am trying to clean up all the stale queues that linger. I want to remove queues that have been empty for over 5 minutes.
Another way I was thinking of is to use pyrabbit to access the queue directly, but I am not sure how I can find out whether a queue is older than 5 minutes.
You can do this from the command line using
sudo rabbitmqctl set_policy expiry ".*" '{"expires":300000}' --apply-to queues
This deletes all unused queues after 300 seconds (the expires value is in milliseconds). Unused means the queue has no consumers, has not been redeclared, and basic.get has not been invoked on it for at least the expiration period.
Note this expiry time can also be set when declaring a queue. More at rabbitmq docs.
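For example, with pika the same expiry can be set per queue at declaration time via the x-expires argument (the queue name and connection details below are just examples):
# declare a queue that the broker deletes after 300 seconds of being unused (sketch)
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="example_queue", arguments={"x-expires": 300000})
connection.close()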

Gearman worker client_id python

What does GearmanWorker.set_client_id(client_id) do?
http://packages.python.org/gearman/worker.html#gearman.worker.GearmanWorker.set_client_id
Does it mean that the worker only serves clients with the specified id?
If so, how can I find a client's id?
From the Gearman protocol docs:
SET_CLIENT_ID
This sets the worker ID in a job server so monitoring and reporting
commands can uniquely identify the various workers, and different
connections to job servers from the same worker.
So it does not have anything to do with the worker-client relationship; that is handled only by the function name that the client passes and the worker registers for. This ID shows up in administrative commands' output and can help you in debugging and monitoring your application. As a matter of fact, some interfaces (e.g. PHP) do not support this setting and are still fully usable.
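For example, with the python-gearman library the id is just an arbitrary string you choose for the connection (the worker id and task below are made up):
# tag this worker connection so admin/monitoring output can identify it (sketch)
import gearman

worker = gearman.GearmanWorker(["127.0.0.1:4730"])
worker.set_client_id("reporting-worker-01")
worker.register_task("reverse", lambda w, job: job.data[::-1])
worker.work()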

How to purge all tasks of a specific queue with celery in python?

How do I purge all scheduled and running tasks of a specific queue with Celery in Python? The question seems pretty straightforward, but to be clear: I am not looking for command-line code.
I have the following line, which defines the queue, and I would like to purge that queue to manage tasks:
CELERY_ROUTES = {"socialreport.tasks.twitter_save": {"queue": "twitter_save"}}
At some point I want to purge all tasks in the twitter_save queue with Python code, maybe with a broadcast function? I couldn't find documentation about this. Is this possible?
Just to update Sam Stoelinga's answer for Celery 3.1, it can now be done like this in a terminal:
celery amqp queue.purge <QUEUE_NAME>
For Django be sure to start it from the manage.py file:
./manage.py celery amqp queue.purge <QUEUE_NAME>
If not, be sure celery is able to point correctly to the broker by setting the --broker= flag.
The original answer does not work for Celery 3.1. Hassek's update is the correct command if you want to do it from the command line. But if you want to do it programmatically, do this:
Assuming you ran your Celery app as:
celery_app = Celery(...)
Then:
import celery.bin.amqp
amqp = celery.bin.amqp.amqp(app = celery_app)
amqp.run('queue.purge', 'name_of_your_queue')
This is handy for cases where you've enqueued a bunch of tasks, and one task encounters a fatal condition that you know will prevent the rest of the tasks from executing.
E.g. you enqueued a bunch of web crawler tasks, and in the middle of the run your server's IP address gets blocked. There's no point in executing the rest of the tasks, so in that case your task itself can purge its own queue.
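A rough sketch of that pattern, reusing the programmatic purge from above (ip_is_blocked and the "crawler" queue name are assumptions for illustration):
# a task that empties its own queue when it hits a fatal condition (sketch)
import celery.bin.amqp

@celery_app.task
def crawl(url):
    if ip_is_blocked():  # hypothetical check for the fatal condition
        amqp = celery.bin.amqp.amqp(app=celery_app)
        amqp.run('queue.purge', 'crawler')
        return
    ...  # normal crawling work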
Lol it's quite easy, hope somebody can help me still though.
from celery.bin.camqadm import camqadm
camqadm('queue.purge', queue_name_as_string)
The only problem with this is that I still need to stop celeryd before purging the queue; after purging I need to run celeryd again to handle tasks for the queue. I will update this question if I succeed.
I succeeded, but please correct me if this is not a good method to stop celeryd, purge the queue, and start it again. I know I am killing the worker process because I actually want the task to be terminated.
# kill the running celeryd worker for the twitter_save queue
kill_command = "ps auxww | grep 'celeryd -n twitter_save' | awk '{print $2}' | xargs kill -9"
subprocess.call(kill_command, shell=True)
# purge the queue while the worker is down
camqadm('queue.purge', 'twitter_save')
# restart the worker in the background and re-send the task
rerun_command = "/home/samos/Software/virt_env/twittersyncv1/bin/python %s/manage.py celeryd -n twitter_save -l info -Q twitter_save" % settings.PROJECT_ROOT
os.popen(rerun_command+' &')
send_task("socialreport.tasks.twitter_save")
