celery get tasks count - python

I am using Python with Celery and RabbitMQ. I can't find a way to get the number of tasks in a given queue.
Something like this:
celery.queue('myqueue').count()
Is it possible to get the task count for a certain queue?
One solution is to run an external command from my Python script:
"rabbitmqctl list_queues -p my_vhost"
and parse the results. Is this a good way to do it?

I suppose that using the rabbitmqctl command is not a good solution, especially on my Ubuntu server, where rabbitmqctl can be executed only with root privileges.
By playing with pika objects I found a working solution:
import pika
from django.conf import settings
def tasks_count(queue_name):
    ''' Connects to message queue using django settings and returns count of messages in queue with name queue_name. '''
    credentials = pika.PlainCredentials(settings.BROKER_USER, settings.BROKER_PASSWORD)
    parameters = pika.ConnectionParameters(
        credentials=credentials,
        host=settings.BROKER_HOST,
        port=settings.BROKER_PORT,
        virtual_host=settings.BROKER_VHOST)
    connection = pika.BlockingConnection(parameters=parameters)
    channel = connection.channel()
    queue = channel.queue_declare(queue=queue_name, durable=True)
    message_count = queue.method.message_count
    return message_count
I did not find documentation about inspecting an AMQP queue with pika, so I am not sure this solution is correct.
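One caveat with the function above: a plain queue_declare creates the queue if it does not exist yet, and the broker rejects it if the queue exists with different settings. A minimal sketch of a read-only variant, assuming pika's passive=True flag on queue_declare, which only looks the queue up and reports its message count. The helper name queue_depth is hypothetical, and the channel is passed in so the connection setup shown above can be reused:

```python
def queue_depth(channel, queue_name):
    """Return the message count the broker reports for an existing queue.

    `channel` is assumed to be an open pika channel, for example the result
    of BlockingConnection.channel(). passive=True asks the broker to look the
    queue up without creating or altering it.
    """
    frame = channel.queue_declare(queue=queue_name, passive=True)
    return frame.method.message_count
```

The count covers messages waiting in the queue; messages already delivered to a worker but not yet acknowledged are not expected to be included.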

Related

python : dynamically spawn multithread workers with flask-socket io and python-binance

Hello fellow developers,
I'm trying to create a small web app that would let me monitor multiple Binance accounts from a dashboard and maybe, in the future, perform some small automatic trading actions.
My frontend is implemented with Vue+quasar and my backend server is based on python Flask for the REST api.
What I would like is to be able to start a background process dynamically when a specific endpoint of my server is called. Once this process is started on the server, I would like it to communicate via websocket with my Vue client.
Right now I can spawn the worker and create the websocket communication, but I can't figure out how to make all the threads in my worker work together. Let me be a bit more specific:
Once my worker is started, I'm trying to create at least two threads. One is the infinite loop that lets me automate some small actions, and the other is the flask-socketio server that will handle the socket connections. Here is the code of that worker:
customWorker.py
import os
import time
from flask import Flask
from flask_socketio import SocketIO, send, emit
import threading
import json
import eventlet
# custom class allowing me to communicate with my MongoDB
from db_wrap import DbWrap
from binance.client import Client
from binance.exceptions import BinanceAPIException, BinanceWithdrawException, BinanceRequestException
from binance.websockets import BinanceSocketManager
def process_message(msg):
    print('got a websocket message')
    print(msg)
class customWorker:
    def __init__(self, workerId, sleepTime, dbWrap):
        self.workerId = workerId
        self.sleepTime = sleepTime
        self.socketio = None
        self.dbWrap = DbWrap()
        # this retrieves worker configuration from database
        self.config = json.loads(self.dbWrap.get_worker(workerId))
        keys = self.dbWrap.get_worker_keys(workerId)
        self.binanceClient = Client(keys['apiKey'], keys['apiSecret'])
    def handle_message(self, data):
        print('My PID is {} and I received {}'.format(os.getpid(), data))
        send(os.getpid())
    def init_websocket_server(self):
        app = Flask(__name__)
        socketio = SocketIO(app, async_mode='eventlet', logger=True, engineio_logger=True, cors_allowed_origins="*")
        eventlet.monkey_patch()
        socketio.on_event('message', self.handle_message)
        self.socketio = socketio
        self.app = app
    def launch_main_thread(self):
        while True:
            print('My PID is {} and workerId {}'
                  .format(os.getpid(), self.workerId))
            if self.socketio is not None:
                info = self.binanceClient.get_account()
                self.socketio.emit('my_account', info, namespace='/')
    def launch_worker(self):
        self.init_websocket_server()
        self.socketio.start_background_task(self.launch_main_thread)
        self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False)
Once the REST endpoint is called, the worker is spawned by calling the birth_worker() method of the "Broker" object available within my server:
from multiprocessing import Process
from custom_worker import customWorker
#...
    def create_worker(self, workerid, sleepTime, dbWrap):
        worker = customWorker(workerid, sleepTime, dbWrap)
        worker.launch_worker()
    def birth_worker(self, workerid, dbWrap, sleepTime=5):
        p = Process(target=self.create_worker, args=(workerid, sleepTime, dbWrap))
        p.start()
So when this is done, the worker is launched in a separate process that successfully creates threads and listens for socket connections. But my problem is that I can't use my binanceClient in my main thread. I think that it uses threads internally and that eventlet, in particular the monkey_patch() function, breaks it. When I try to call the binanceClient.get_account() method I get the error AttributeError: module 'select' has no attribute 'poll'
I'm pretty sure that it comes from monkey_patch, because if I use the client in the __init__() method of my worker (before patching) it works and I can get the account info. So I guess there is a conflict here that I've been trying to resolve unsuccessfully.
I've tried using only the threading mode for my socket.io app with async_mode='threading', but then my flask-socketio app won't start and listen for sockets, as the line self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False) blocks everything.
I'm pretty sure I have an architecture problem here and that I shouldn't start my app by launching socketio.run. I've been unable to start it with gunicorn, for example, because I need it to be dynamic and to call it from my Python scripts. I've been struggling to find the proper way to do this and that's why I'm here today.
Could someone please give me a hint on how this is supposed to be achieved? How can I dynamically spawn a subprocess that will manage a socket server thread, an infinite loop thread and connections with binanceClient? I've been roaming Stack Overflow without success; any advice is welcome, even an architecture rework.
Here is my environment:
Manjaro Linux 21.0.1
pip-chill:
eventlet==0.30.2
flask-cors==3.0.10
flask-socketio==5.0.1
pillow==8.2.0
pymongo==3.11.3
python-binance==0.7.11
websockets==8.1

RQ - Empty & Delete Queues

I'm using RQ, and I have a failed queue with thousands of items, and another test queue I created a while back for testing which is now empty and unused. I'm wondering how to remove all jobs from the failed queue, and delete the test queue altogether?
Apologies for the basic question, but I can't find info on this in the RQ docs, and I'm completely new to both Redis and RQ... Thanks in advance!
Cleanup using rq
RQ offers methods to make any queue empty:
>>> from redis import Redis
>>> from rq import Queue
>>> qfail = Queue("failed", connection=Redis())
>>> qfail.count
8
>>> qfail.empty()
8L
>>> qfail.count
0
You can do the same for the test queue, if it still exists.
Cleanup using rq-dashboard
Install rq-dashboard:
$ pip install rq-dashboard
Start it:
$ rq-dashboard
RQ Dashboard, version 0.3.4
* Running on http://0.0.0.0:9181/
Open in browser.
Select the queue
Click the red button "Empty"
And you are done.
Python function Purge jobs
If you run a Redis version too old for the command RQ uses, you might still succeed in deleting jobs with Python code:
The code takes the name of a queue, which holds the job ids.
Using LPOP we ask for the job ids one by one.
Adding the prefix (by default "rq:job:") to a job id gives the key where the job is stored.
Using DEL on each key we purge the database job by job.
>>> import redis
>>> r = redis.StrictRedis(decode_responses=True)  # return str instead of bytes on Python 3
>>> qname = "rq:queue:failed"
>>> def purgeq(r, qname):
...     while True:
...         jid = r.lpop(qname)
...         if jid is None:
...             break
...         r.delete("rq:job:" + jid)
...         print(jid)
...
>>> purgeq(r, qname)
a0be3624-86c1-4dc4-bb2e-2043d2734b7b
3796c312-9b02-4a77-be89-249aa7325c25
ca65f2b8-044c-41b5-b5ac-cefd56699758
896f70a7-9a35-4f6b-b122-a08513022bc5
- 2016 -
You can now use rq's empty option from the command line:
/path/to/rq empty queue_name
So you can use it to empty any queue, not just the failed one.
None of the above solutions worked for me:
the failed queue is not registered under queues,
so I moved all of the failed jobs to the default queue and used
rq empty queue_name --url [redis-url]
The monitoring tool rqinfo can empty the failed queue.
Just make sure you have an active virtualenv with rq installed, and run
$ rqinfo --empty-failed-queue
See rqinfo --help for more details.
You can just log in to Redis and clear all queues.
To log in:
user@user:~$ redis-cli
Enter this command and hit enter:
FLUSHALL
And you're done.
Edit: This will delete everything stored in Redis.
Here's how to clear the failed job registry using django_rq:
import django_rq
from rq.registry import FailedJobRegistry
queue = django_rq.get_queue("your_queue_with_failed_jobs")
registry = FailedJobRegistry(queue=queue)
for job_id in registry.get_job_ids():
    registry.remove(job_id)
- 2022 -
I was struggling with this as well and this is a piece of code which works for me.
It loops over the queue names (in my case, 'default' and 'low'), fetches all failed jobs for each queue and removes them.
import django_rq
from rq.registry import FailedJobRegistry
from redis import Redis
from rq.job import Job
from django.conf import settings
redis = Redis(host=settings.REDIS_HOST, port=settings.REDIS_PORT)
queues = ["default", "low"]
for q in queues:
    queue = django_rq.get_queue(q)
    registry = FailedJobRegistry(queue=queue)
    for job_id in registry.get_job_ids():
        job = Job.fetch(job_id, connection=redis)
        registry.remove(job)
By default, 'rq' job keys are prefixed with 'rq:job:', so you can delete these jobs from Redis using the following command:
redis-cli KEYS rq:job:* | xargs redis-cli DEL
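Two things to watch with the one-liner above: KEYS walks the whole keyspace in a single blocking call, and an unquoted rq:job:* can be expanded by the shell before redis-cli sees it. A minimal sketch of the same cleanup from Python, assuming `conn` is a redis-py client (it only needs scan_iter and delete). The function name delete_rq_jobs and the batch size are hypothetical choices, and like the one-liner it removes every job key, not only failed ones:

```python
def delete_rq_jobs(conn, pattern="rq:job:*", batch_size=500):
    """Delete every key matching `pattern`; return how many were removed."""
    deleted = 0
    batch = []
    # scan_iter pages through the keyspace with SCAN instead of one big KEYS call
    for key in conn.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            deleted += conn.delete(*batch)
            batch = []
    if batch:
        # flush the final, partially filled batch
        deleted += conn.delete(*batch)
    return deleted
```

Called as delete_rq_jobs(redis.Redis()), it would return the number of job keys removed.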

Detect whether Celery is Available/Running

I'm using Celery to manage asynchronous tasks. Occasionally, however, the celery process goes down which causes none of the tasks to get executed. I would like to be able to check the status of celery and make sure everything is working fine, and if I detect any problems display an error message to the user. From the Celery Worker documentation it looks like I might be able to use ping or inspect for this, but ping feels hacky and it's not clear exactly how inspect is meant to be used (if inspect().registered() is empty?).
Any guidance on this would be appreciated. Basically what I'm looking for is a method like so:
def celery_is_alive():
    from celery.task.control import inspect
    return bool(inspect().registered()) # is this right??
EDIT: It doesn't even look like registered() is available on celery 2.3.3 (even though the 2.1 docs list it). Maybe ping is the right answer.
EDIT: Ping also doesn't appear to do what I thought it would do, so still not sure the answer here.
Here's the code I've been using. celery.task.control.Inspect.stats() returns a dict containing lots of details about the currently available workers, None if there are no workers running, or raises an IOError if it can't connect to the message broker. I'm using RabbitMQ - it's possible that other messaging systems might behave slightly differently. This worked in Celery 2.3.x and 2.4.x; I'm not sure how far back it goes.
def get_celery_worker_status():
    ERROR_KEY = "ERROR"
    try:
        from celery.task.control import inspect
        insp = inspect()
        d = insp.stats()
        if not d:
            d = { ERROR_KEY: 'No running Celery workers were found.' }
    except IOError as e:
        from errno import errorcode
        msg = "Error connecting to the backend: " + str(e)
        if len(e.args) > 0 and errorcode.get(e.args[0]) == 'ECONNREFUSED':
            msg += ' Check that the RabbitMQ server is running.'
        d = { ERROR_KEY: msg }
    except ImportError as e:
        d = { ERROR_KEY: str(e)}
    return d
From the documentation of celery 4.2:
from your_celery_app import app
def get_celery_worker_status():
    i = app.control.inspect()
    availability = i.ping()
    stats = i.stats()
    registered_tasks = i.registered()
    active_tasks = i.active()
    scheduled_tasks = i.scheduled()
    result = {
        'availability': availability,
        'stats': stats,
        'registered_tasks': registered_tasks,
        'active_tasks': active_tasks,
        'scheduled_tasks': scheduled_tasks
    }
    return result
Of course, you could (and should) improve the code with error handling.
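A minimal sketch of what that error handling might look like, assuming the same app.control.inspect() API used above. The function name check_celery_workers and the shape of the returned dict are hypothetical, and the broad except is a deliberate simplification because the exact exception raised for an unreachable broker depends on the transport:

```python
def check_celery_workers(app, timeout=1.0):
    """Summarise worker availability without letting broker errors escape.

    `app` is assumed to be the project's Celery application object.
    """
    try:
        stats = app.control.inspect(timeout=timeout).stats()
    except Exception as exc:  # broker down, bad credentials, network error...
        return {"ok": False, "workers": [], "error": "broker error: {}".format(exc)}
    if not stats:
        # inspect() calls return None when no worker replies within the timeout
        return {"ok": False, "workers": [], "error": "no workers replied"}
    return {"ok": True, "workers": sorted(stats), "error": None}
```

A view could then call check_celery_workers(app) and show the "error" text to the user when "ok" is False.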
To check the same from the command line when Celery is running as a daemon:
Activate the virtualenv and go to the directory where the 'app' is.
Now run: celery -A [app_name] status
It will show whether Celery is up, plus the number of nodes online.
Source:
http://michal.karzynski.pl/blog/2014/05/18/setting-up-an-asynchronous-task-queue-for-django-using-celery-redis/
The following worked for me:
import socket
from kombu import Connection
celery_broker_url = "amqp://localhost"
try:
    conn = Connection(celery_broker_url)
    conn.ensure_connection(max_retries=3)
except socket.error:
    raise RuntimeError("Failed to connect to RabbitMQ instance at {}".format(celery_broker_url))
One method to test if any worker is responding is to send out a 'ping' broadcast and return with a successful result on the first response.
from .celery import app # the celery 'app' created in your project
def is_celery_working():
    result = app.control.broadcast('ping', reply=True, limit=1)
    return bool(result) # True if at least one result
This broadcasts a 'ping' and will wait up to one second for responses. As soon as the first response comes in, it will return a result. If you want a False result faster, you can add a timeout argument to reduce how long it waits before giving up.
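The timeout variant mentioned above could look like the sketch below. It assumes the timeout keyword of app.control.broadcast (in seconds); the 0.5 default is an arbitrary choice, and the app is passed in as a parameter here only so the function can be exercised without a running broker:

```python
def is_celery_working(app, timeout=0.5):
    """Return True if at least one worker answers a 'ping' within `timeout`."""
    replies = app.control.broadcast("ping", reply=True, limit=1, timeout=timeout)
    # broadcast returns a list of replies; it is empty when nobody answered
    return bool(replies)
```

A shorter timeout gives a faster False, at the risk of missing a worker that is merely slow to reply.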
I found an elegant solution:
from .celery import app
try:
    app.broker_connection().ensure_connection(max_retries=3)
except Exception as ex:
    raise RuntimeError("Failed to connect to celery broker, {}".format(str(ex)))
You can use ping method to check whether any worker (or specific worker) is alive or not https://docs.celeryproject.org/en/latest/_modules/celery/app/control.html#Control.ping
celery_app.control.ping()
You can test on your terminal by running the following command.
celery -A proj_name worker -l INFO
You can review every time your celery runs.
The script below worked for me.
import logging
# Import the celery app from the project
from application_package import app as celery_app
logger = logging.getLogger(__name__)
def get_celery_worker_status():
    insp = celery_app.control.inspect()
    nodes = insp.stats()
    if not nodes:
        raise Exception("celery is not running.")
    logger.info("celery workers are: {}".format(nodes))
    return nodes
Run celery status to get the status.
When celery is running,
(venv) ubuntu@server1:~/project-dir$ celery status
-> celery@server1: OK
1 node online.
When no celery worker is running, you get the below information displayed in terminal.
(venv) ubuntu@server1:~/project-dir$ celery status
Error: No nodes replied within time constraint

How can I list or discover queues on a RabbitMQ exchange using python?

I need a Python client that can discover the queues on a restarted RabbitMQ server exchange and then start up clients to resume consuming messages from each queue. How can I discover queues using some RabbitMQ-compatible Python API or library?
There does not seem to be a direct AMQP-way to manage the server but there is a way you can do it from Python. I would recommend using a subprocess module combined with the rabbitmqctl command to check the status of the queues.
I am assuming that you are running this on Linux. From a command line, running:
rabbitmqctl list_queues
will result in:
Listing queues ...
pings 0
receptions 0
shoveled 0
test1 55199
...done.
(well, it did in my case due to my specific queues)
In your code, use this code to get output of rabbitmqctl:
import subprocess
proc = subprocess.Popen("/usr/sbin/rabbitmqctl list_queues", shell=True, stdout=subprocess.PIPE)
stdout_value = proc.communicate()[0]
print(stdout_value.decode())
Then, just come up with your own code to parse stdout_value for your own use.
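As a starting point for that parsing step, here is a small sketch that turns output like the sample above into a dict. The function name parse_list_queues is hypothetical, and it assumes the default two-column output (queue name, then message count), skipping banner lines such as "Listing queues ..." and "...done.":

```python
def parse_list_queues(output):
    """Parse `rabbitmqctl list_queues` output into {queue_name: message_count}."""
    if isinstance(output, bytes):
        # subprocess returns bytes on Python 3
        output = output.decode()
    queues = {}
    for line in output.splitlines():
        # split on the last run of whitespace: everything before it is the name
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # banner, header or blank line
        queues[parts[0].strip()] = int(parts[1])
    return queues
```

For the sample output shown earlier, parse_list_queues(stdout_value)["test1"] would be 55199.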
As far as I know, there isn't any way of doing this. That's nothing to do with Python, but because AMQP doesn't define any method of queue discovery.
In any case, in AMQP it's clients (consumers) that declare queues: publishers publish messages to an exchange with a routing key, and consumers determine which queues those routing keys go to. So it does not make sense to talk about queues in the absence of consumers.
You can enable the rabbitmq_management plugin:
sudo /usr/lib/rabbitmq/bin/rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart
Then use its REST API:
import requests
def rest_queue_list(user='guest', password='guest', host='localhost', port=15672, virtual_host=None):
    url = 'http://%s:%s/api/queues/%s' % (host, port, virtual_host or '')
    response = requests.get(url, auth=(user, password))
    queues = [q['name'] for q in response.json()]
    return queues
I'm using the requests library in this example, but that is not essential.
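One detail worth guarding against in the REST call above: the default virtual host is named "/", and the management API expects it percent-encoded as %2F in the URL path. A small sketch using only the standard library; the helper names queues_url and message_counts are hypothetical, and the "messages" field is assumed to be present in each queue object returned by /api/queues (it is treated as 0 when missing):

```python
from urllib.parse import quote

def queues_url(host="localhost", port=15672, virtual_host=None):
    """Build the management API URL for listing queues."""
    url = "http://%s:%s/api/queues" % (host, port)
    if virtual_host is not None:
        # safe="" makes quote() encode "/" too, so the vhost "/" becomes "%2F"
        url += "/" + quote(virtual_host, safe="")
    return url

def message_counts(queues_json):
    """Map queue name -> message count from a decoded /api/queues response."""
    return {q["name"]: q.get("messages", 0) for q in queues_json}
```

With requests, message_counts(requests.get(queues_url(virtual_host="/"), auth=(user, password)).json()) would then give per-queue totals along with the names.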
I also found a library that does this for us, pyrabbit:
from pyrabbit.api import Client
cl = Client('localhost:15672', 'guest', 'guest')
queues = [q['name'] for q in cl.get_queues()]
Since I am a RabbitMQ beginner, take this with a grain of salt, but there's an interesting Management Plugin, which exposes an HTTP interface to "From here you can manage exchanges, queues, bindings, virtual hosts, users and permissions. Hopefully the UI is fairly self-explanatory."
http://www.rabbitmq.com/blog/2010/09/07/management-plugin-preview-release/
I use https://github.com/bkjones/pyrabbit. It talks directly to the API of RabbitMQ's management plugin, and is very handy for interrogating RabbitMQ.
Management features are due in a future version of AMQP, so for now you will have to wait for a new version that comes with that functionality.
I found this works for me, /els being my demo vhost name..
rabbitmqctl list_queues --vhost /els
pyrabbit didn't work so well for me; However, the Management Plugin itself has its own command line script that you can download from your own admin GUI and use later on (for example, I downloaded mine from
http://localhost:15672/cli/
for local use)
I would use simply this:
Just replace the user(default= guest), passwd(default= guest) and port with your values.
import requests
import json
def call_rabbitmq_api(host, port, user, passwd):
    url = 'http://%s:%s/api/queues' % (host, port)
    r = requests.get(url, auth=(user,passwd))
    return r
def get_queue_name(json_list):
    res = []
    # loop variable renamed so it no longer shadows the json module
    for queue in json_list:
        res.append(queue["name"])
    return res
if __name__ == '__main__':
    host = 'rabbitmq_host'
    port = 55672
    user = 'guest'
    passwd = 'guest'
    res = call_rabbitmq_api(host, port, user, passwd)
    print ("--- dump json ---")
    print (json.dumps(res.json(), indent=4))
    print ("--- get queue name ---")
    q_name = get_queue_name(res.json())
    print (q_name)
Referred from here: https://gist.github.com/hiroakis/5088513#file-example_rabbitmq_api-py-L2

tornado - transferring a file to cdn without blocking

I have the nginx upload module handling site uploads, but still need to transfer files (let's say 3-20mb each) to our cdn, and would rather not delegate that to a background job.
What is the best way to do this with tornado without blocking other requests? Can i do this in an async callback?
You may find it useful in the overall architecture of your site to add a message queuing service such as RabbitMQ.
This would let you complete the upload via the nginx module, then in the tornado handler, post a message containing the uploaded file path and exit. A separate process would be watching for these messages and handle the transfer to your CDN. This type of service would be useful for many other tasks that could be handled offline ( sending emails, etc.. ). As your system grows, this also provides you a mechanism to scale by moving queue processing to separate machines.
I am using an architecture very similar to this. Just make sure to add your message consumer process to supervisord or whatever you are using to manage your processes.
In terms of implementation, if you are on Ubuntu, installing RabbitMQ is as simple as:
sudo apt-get install rabbitmq-server
On CentOS w/EPEL repositories:
yum install rabbitmq-server
There are a number of Python bindings to RabbitMQ. Pika is one of them and it happens to be created by an employee of LShift, who is responsible for RabbitMQ.
Below is a bit of sample code from the Pika repo. You can easily imagine how the handle_delivery method would accept a message containing a filepath and push it to your CDN.
import sys
import pika
import asyncore
conn = pika.AsyncoreConnection(pika.ConnectionParameters(
    sys.argv[1] if len(sys.argv) > 1 else '127.0.0.1',
    credentials = pika.PlainCredentials('guest', 'guest')))
print 'Connected to %r' % (conn.server_properties,)
ch = conn.channel()
ch.queue_declare(queue="test", durable=True, exclusive=False, auto_delete=False)
should_quit = False
def handle_delivery(ch, method, header, body):
    print "method=%r" % (method,)
    print "header=%r" % (header,)
    print " body=%r" % (body,)
    ch.basic_ack(delivery_tag = method.delivery_tag)
    global should_quit
    should_quit = True
tag = ch.basic_consume(handle_delivery, queue = 'test')
while conn.is_alive() and not should_quit:
    asyncore.loop(count = 1)
if conn.is_alive():
    ch.basic_cancel(tag)
    conn.close()
print conn.connection_close
Advice on the Tornado Google group points to using an async callback (documented at http://www.tornadoweb.org/documentation#non-blocking-asynchronous-requests) to move the file to the CDN.
The nginx upload module writes the file to disk and then passes parameters describing the upload(s) back to the view. Therefore, the file isn't in memory, but the time it takes to read from disk (which would cause the request process to block itself, but not other Tornado processes, afaik) is negligible.
That said, anything that doesn't need to be processed online shouldn't be, and should be deferred to a task queue like celeryd or similar.
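If the transfer does stay inside the web process, the blocking upload call has to be kept off the event loop. A minimal sketch of one way to do that, using the standard library's asyncio with a thread pool rather than the callback style linked above (Tornado 5 and later run on the asyncio event loop). Here upload_to_cdn is a hypothetical stand-in for whatever blocking CDN client call is actually used, and the pool size is arbitrary:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Small pool so a burst of uploads cannot spawn unbounded threads.
_executor = ThreadPoolExecutor(max_workers=4)

def upload_to_cdn(path):
    """Hypothetical blocking transfer; the sleep stands in for network I/O."""
    time.sleep(0.05)
    return "uploaded " + path

async def transfer(path):
    """Run the blocking upload in a worker thread and await its result."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, upload_to_cdn, path)

async def demo():
    # The two transfers overlap instead of running one after the other.
    return await asyncio.gather(transfer("/tmp/a.bin"), transfer("/tmp/b.bin"))
```

Inside an async Tornado handler, awaiting transfer(path) would let other requests proceed while the upload runs in the pool.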
