Celery send_task but worker sometimes doesn't receive it - python

Basically, everything works as expected, but occasionally the worker does not receive a sent task. The task does not appear in Flower and there is nothing in the worker logs, but we know the task was sent because we log its task id.
We are using Redis for the backend and broker and have multiple Celery workers with the following configuration:
import logging

from config import (
    CELERY_BROKER_URL,
    CELERY_RESULT_BACKEND,
)

worker_concurrency = 1
worker_redirect_stdouts_level = logging.INFO
worker_prefetch_multiplier = 1

broker_url = CELERY_BROKER_URL
result_backend = CELERY_RESULT_BACKEND

result_backend_transport_options = {
    # Reschedule tasks if they are not processed in 3 hours.
    "visibility_timeout": 60 * 60 * 3,
    # Enable priorities
    "queue_order_strategy": "priority",
}
Sending a task:
def process(self, message):
    config = get_configs()
    link_result = signature(...)
    fallback = signature(...)
    link_error = create_link_error(...)

    result = self._app.send_task(
        TASK_NAME,
        args=(
            message.session_id,
            message.chunk_id,
            message.audio_resource_uri,
            message.start_offset_sec,
            config,
        ),
        priority=5,  # Medium Redis priority
        link=link_result,
        link_error=link_error,
    )
    logger.info(f"Executed task {TASK_NAME} with id: {result.task_id}")
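For reference, a minimal sketch (assuming the same logger is available in this module) that hooks Celery's after_task_publish signal, which fires in the sending process only after the message has been handed to the broker; it can help distinguish "send_task returned an id" from "the publish actually reached Redis":
from celery.signals import after_task_publish

@after_task_publish.connect
def confirm_publish(sender=None, headers=None, body=None,
                    exchange=None, routing_key=None, **kwargs):
    # With task protocol 2 the task metadata is in the headers;
    # protocol 1 messages carry it in the body.
    info = headers if headers and 'task' in headers else body
    logger.info(
        "Broker accepted task %s (id=%s) exchange=%r routing_key=%r",
        sender, info.get('id'), exchange, routing_key,
    )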
Has anyone experienced similar behavior?

Related

Flask + Celery task duplications on 3rd party notifications

I have a Flask app which sends emails/SMSs to users at a specific time using Celery's ETA/countdown features with Redis as a broker. The issue is that the email and SMS tasks duplicate randomly: sometimes users get 10 emails/SMSs, sometimes 20+, even though each task is only supposed to run once. The data flow:
The initial function schedule_event_main queues the ETA tasks for the notifications:
date_event = datetime.combine(day, time.max)
schedule_ratings_email.apply_async([str(event[0])], eta=date_event)
schedule_ratings_sms.apply_async([str(event[0])], eta=date_event)
Inside the schedule_ratings_email and schedule_ratings_sms tasks, a .delay call creates the individual Celery tasks that send out the emails and SMSs to the various guests for an event.
@app.task(bind=True)
def schedule_ratings_email(self, event_id):
    """Fetch the guests for an event and queue up a task to email each one."""
    try:
        url = SITE_URL + 'user/dashboard'
        guests = db.session.query(EventGuest).filter(EventGuest.event_id == int(event_id)).all()
        event_details = db.session.query(Event).filter(Event.id == event_id).first()
        if guests:
            if event_details.status == "archived":
                for guest in guests:
                    schedule_individual_ratings_emails.delay(guest.user.first_name, guest.event.host.first_name, guest.user.email, url)
    except Exception as e:
        log.error("Error processing ratings email for %s" % event_id, exc_info=e)
        # self.retry()
This is the final individual task, called with .delay, that sends the notifications:
@app.task()
def schedule_individual_ratings_emails(guest_name, host, guest, url):
    try:
        email_rating(guest_name, host, guest, url)
    except Exception as e:
        log.error("Error processing ratings email for %s", guest, exc_info=e)
I've tried multiple SO answers and tweaked a lot of variables, including Celery settings, but the notifications are still duplicating. It's only the ETA/countdown tasks, and ONLY the ones hitting 3rd-party servers: I have other ETA tasks which write data to the DB and those have no issues.
This is an issue both locally and on Heroku (production). Current tech stack:
Flask==1.0.2
celery==4.1.0
Redis 4.0.9
Celery startup: worker: celery worker --app openseat.tasks --beat --concurrency 1 --loglevel info
Celery config details:
CELERY_ACKS_LATE = True
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TIMEZONE = 'Africa/Johannesburg'
CELERY_ENABLE_UTC = True
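Not part of the original config, but a hedged sketch of the Redis broker setting that is often involved when ETA/countdown tasks duplicate: the Redis broker redelivers any message that is not acknowledged within the visibility timeout, and ETA tasks are only acknowledged when they finally run, so the timeout generally needs to exceed the longest ETA used. The 24-hour value below is an assumed example, in the same old-style uppercase naming as the config above:
# Assumed addition to the Celery config (value must exceed the longest ETA/countdown).
BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 60 * 60 * 24}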

Can't seem to make python celery signals work

I'm rather new to celery development and I have an issue implementing signals.
I have an application that consists of many different workers.
Currently it uses RabbitMQ as a broker and Redis as a backend.
Each worker has its own queue. This is the way we have it configured at the moment:
celery = Celery(queueDict['test'], broker=config.REDIS_SERVER, backend=config.REDIS_SERVER)

default_exchange = Exchange('default', type='direct')
test_queue = Queue(queueDict['test'], default_exchange, routing_key=queueDict['test'])

logger = get_task_logger(__name__)

celery.conf.task_queues = (test_queue, )

@celery.task(name='signal2', bind=True)
def signal2(self, param):
    print("dog" + param)
I would like to use signals so that I will be able to catch failed tasks on any worker in the application. When I use it inside the same worker with a task_failure event it works.
But I would like to have another worker catch these events (or even my flask app)
but I seem to be missing something...
Here is my current attempt at making it work.
celery = Celery('consumer', broker=config.REDIS_SERVER, backend=config.REDIS_SERVER)

default_exchange = Exchange('default', type='direct')
default_queue = Queue(queueDict['default'], default_exchange, routing_key=queueDict['default'])

logger = get_task_logger(__name__)

celery.conf.task_queues = (default_queue, )

@task_failure.connect
def process_failure_signal(sender=None, task_id=None, exception=None,
                           args=None, kwargs=None, traceback=None, einfo=None, **akwargs):
    msg = 'Signal exception: %s (%s)' % (
        exception.__class__.__name__, exception)
    exc_info = (type(exception), exception, traceback)
    extra = {
        'data': {
            'task_id': str(task_id),
            'sender': str(sender),
            'args': str(args),
            'kwargs': str(kwargs),
        }
    }
    logger.error(msg, exc_info=exc_info, extra=extra)
But it never receives any signals...
Thanks for the help.
DejanLekic was correct and the page he shared had exactly what I wanted.
for those interested:
https://docs.celeryproject.org/en/stable/userguide/monitoring.html#real-time-processing
This can be easily used to capture events and monitor tasks.
Real-time processing
To process events in real-time you need the following
An event consumer (this is the Receiver)
A set of handlers called when events come in.
You can have different handlers for each event type, or a catch-all handler can be used ('*')
State (optional)
app.events.State is a convenient in-memory representation of tasks and workers in the cluster that’s updated as events come in.
It encapsulates solutions for many common things, like checking if a worker is still alive (by verifying heartbeats), merging event fields together as events come in, making sure time-stamps are in sync, and so on.
Combining these you can easily process events in real-time:
from celery import Celery

def my_monitor(app):
    state = app.events.State()

    def announce_failed_tasks(event):
        state.event(event)
        # task name is sent only with -received event, and state
        # will keep track of this for us.
        task = state.tasks.get(event['uuid'])

        print('TASK FAILED: %s[%s] %s' % (
            task.name, task.uuid, task.info(),))

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-failed': announce_failed_tasks,
            '*': state.event,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    app = Celery(broker='amqp://guest@localhost//')
    my_monitor(app)
Note: The wakeup argument to capture sends a signal to all workers to force them to send a heartbeat. This way you can immediately see workers when the monitor starts.
You can listen to specific events by specifying the handlers:
from celery import Celery

def my_monitor(app):
    state = app.events.State()

    def announce_failed_tasks(event):
        state.event(event)
        # task name is sent only with -received event, and state
        # will keep track of this for us.
        task = state.tasks.get(event['uuid'])

        print('TASK FAILED: %s[%s] %s' % (
            task.name, task.uuid, task.info(),))

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-failed': announce_failed_tasks,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    app = Celery(broker='amqp://guest@localhost//')
    my_monitor(app)
Monitoring and Management Guide — Celery 4.4.2 documentation

Celery: client_recent_max_output_buffer and connected_clients continue to increase in redis

I use Celery as an MQ for a face recognition project.
I have three task queues ("task_gpu0", "task_gpu1", "task_download") serving a gunicorn Flask server, which uses Redis as broker and backend.
When I use JMeter to stress-test the server, after about 20 minutes the program raises an exception:
OperationalError: Error 104 while writing to socket. Connection reset by peer.
On checking the Redis log, I find that client_recent_max_output_buffer and connected_clients keep increasing in INFO clients. But when I check the key-values with Redis Desktop Manager, the results look fine.
I don't know why the Redis output buffer and connected_clients continue to increase.
Redis log:
:15:M 01 Apr 2019 09:11:18.113 # Client id=58081 addr=172.16.3.22:33832 fd=54 name= age=428 idle=0 flags=P db=1 sub=74995 psub=0 multi=-1 qbuf=79 qbuf-free=32689 obl=0 oll=452 omem=9267808 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
info clients:
connected_clients:587
client_recent_max_input_buffer:4
client_recent_max_output_buffer:55524832
blocked_clients:2
task_download.py
from celery import Celery

app = Celery()
app.config_from_object("celery_app_tmp.celeryconfig")

@app.task
def download(addImageInput, faceSetId):
    download_someting()
    return result_dict
task0.py is the same as task1.py:
from celery import Celery

app = Celery()
app.config_from_object("celery_app_tmp.celeryconfig")

@app.task
def faceRec(addImageInput, faceSetId):
    do_something()
    return result_dict
celeryconfig.py
from kombu import Queue
from kombu import Exchange

result_serializer = 'msgpack'
task_serializer = 'msgpack'
accept_content = ['json', 'msgpack']

broker_url = "redis://:redis@172.16.3.22:7369/1"
result_backend = "redis://:redis@172.16.3.22:7369/1"

worker_concurrency = 8
result_exchange_type = 'direct'
result_expires = 5

task_queues = (
    Queue('gpu_0', exchange=Exchange('gpu_0'), routing_key='gpu_0'),
    Queue('gpu_1', exchange=Exchange('gpu_1'), routing_key='gpu_1'),
)

task_routes = {
    'celery_app_tmp.task0.faceRec': {'queue': 'gpu_0', 'routing_key': 'gpu_0'},
    'celery_app_tmp.task1.faceRec': {'queue': 'gpu_1', 'routing_key': 'gpu_1'},
    'celery_app_tmp.task_download.download': {'queue': 'download', 'routing_key': 'download'},
    'celery_app_tmp.task_download.delFile': {'queue': 'download', 'routing_key': 'download'},
}
main.py
import random
from celery import chain

import celery_app_tmp.task0
import celery_app_tmp.task1
import celery_app_tmp.task_download

list_task = [celery_app_tmp.task0, celery_app_tmp.task1]  # GPU task modules

result_exc = chain(
    celery_app_tmp.task_download.download.s(addImageInput),
    random.choice(list_task).faceRec.s(faceSetId),
)()

while True:
    if result_exc.ready():
        dict_exc = result_exc.get()
        break
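A hedged sketch of the same wait written without busy-polling: AsyncResult.get() blocks until the chain result arrives (the 300-second timeout is an assumed value), which avoids hammering the result backend while waiting.
# Sketch: block for the chain result instead of spinning on ready().
try:
    dict_exc = result_exc.get(timeout=300)  # assumed timeout
finally:
    result_exc.forget()  # drop the stored result once it has been consumed
If the connection count still grows under load, Celery's redis_max_connections setting caps the result-backend connection pool, and the short result_expires above helps Redis reclaim result keys quickly.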

Celery worker doesn't take updated values from the database

I am building a webapp using flask and using celery to send mails periodically.
The problem is, whenever there is a new entry in the database, Celery doesn't see it and continues to use the old entries. I have to restart the Celery worker each time to make it work properly. Celery beat is running and I am using Redis as a broker.
Celery related functions:
@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(30.0, appointment_checkout, name='appointment_checkout')

@celery.task(name='app.Blueprints.periods.appointment_checkout')
def appointment_checkout():
    from app.Blueprints.api.routes import fetchAllAppointments, fetch_user_email, fetch_user_phone
    from app.Blueprints import db

    dt = datetime.now() + timedelta(minutes=10)
    # fa = Appointment.query.filter_by(date=dt.strftime("%Y-%m-%d"))
    fa = fetchAllAppointments()

    for i in fa:
        # send emails to clients and counsellors
        try:
            if str(i.date.year) != dt.strftime("%Y") or str(i.date.month) != dt.strftime("%m") or str(i.date.day) != dt.strftime("%d"):
                continue
        except:
            continue

        if i.reminderFlag == 1:
            continue

        if int(dt.strftime("%H")) == int(i.time.hour) and int(dt.strftime("%M")) == int(i.time.minute):
            client = fetch_user_email(i.user)
            counsellor = fetch_user_email(i.counsellor)
            client_phone = fetch_user_phone(i.user)
            counsellor_phone = fetch_user_phone(i.counsellor)

            i.reminderFlag = 1
            db.session.add(i)
            db.session.commit()

            # client email
            subject = "appointment notification"
            msg = "<h1>Greetings</h1><p>This is to notify you that your appointment is about to begin soon.</p>"

            sendmail.delay(subject, msg, client)
            sendmail.delay(subject, msg, counsellor)
            sendmsg.delay(client_phone, msg)
            sendmsg.delay(counsellor_phone, msg)
When I add something to the appointment table, Celery doesn't see the new entry. After restarting the Celery worker, it sees it.
I am running beat and worker using the following commands:
celery -A periods beat --loglevel=INFO
celery -A periods worker --loglevel=INFO --concurrency=2
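A hedged sketch of one common fix, assuming the stale reads come from the worker's long-lived SQLAlchemy session serving cached rows from its identity map: resetting the scoped session at the start of each run forces the next query to hit the database (db is the same Flask-SQLAlchemy object imported inside the task above).
# Sketch: at the top of appointment_checkout(), before querying.
from app.Blueprints import db

db.session.remove()        # discard the previous scoped session
# db.session.expire_all()  # alternative: keep the session, expire cached objects

fa = fetchAllAppointments()  # now re-reads current rows from the database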

celery task routes not working as expected

I am practicing Celery and I want to assign my task to a specific queue; however, it does not work as expected.
My __init__.py
import os
import sys
from celery import Celery
CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(CURRENT_DIR)
app = Celery()
app.config_from_object('celery_config')
My celery_config.py
amqp = 'amqp://guest:guest@localhost:5672//'

broker_url = amqp
result_backend = amqp

task_routes = ([
    ('import_feed', {'queue': 'queue_import_feed'}),
])
My tasks.py
from . import app

@app.task(name='import_feed')
def import_feed():
    pass
How I run my worker:
celery -A subscriber1.tasks worker -l info
My client's __init__.py:
import os
import sys
from celery import Celery
CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(CURRENT_DIR)
app = Celery()
app.config_from_object('celery_config')
My client's celery_config.py:
from kombu.common import Broadcast
amqp = 'amqp://guest:guest@localhost:5672//'
BROKER_URL = amqp
CELERY_RESULT_BACKEND = amqp
Then in my client's shell I tried:
from publisher import app
result = app.send_task('import_feed')
Then my worker got the task?! Which I did not expect, because I assigned that task to a specific queue. I then tried the command below in my client, and no task was received by my worker, which is the behaviour I expected from the first call instead.
result = app.send_task('import_feed', queue='queue_import_feed')
It seems I misunderstood something in the routing part. What I really want is for the import_feed task to run only if the queue_import_feed queue is specified when sending a task.
You can change the default queue that the worker processes.
app.send_task('import_feed') sends the task to the default celery queue.
app.send_task('import_feed', queue='queue_import_feed') sends the task to queue_import_feed, but your worker is only consuming from the celery queue.
To process specific queues, use the -Q switch:
celery -A subscriber1.tasks worker -l info -Q 'queue_import_feed'
Edit
In order to place a restriction on send_task such that a worker reacts to the import_feed task only when it's published with an explicit queue, you need to override send_task on Celery and also provide a custom AMQP with default_queue set to None.
reactor.py
from celery.app.amqp import AMQP
from celery import Celery

class MyCelery(Celery):

    def send_task(self, name=None, args=None, kwargs=None, **options):
        if 'queue' in options:
            return super(MyCelery, self).send_task(name, args, kwargs, **options)

class MyAMQP(AMQP):
    default_queue = None
celery_config.py
from kombu import Exchange, Queue
...

task_exchange = Exchange('default', type='direct')
task_create_missing_queues = False

task_queues = [
    Queue('feed_queue', task_exchange, routing_key='feeds'),
]

task_routes = {
    'import_feed': {'queue': 'feed_queue', 'routing_key': 'feeds'}
}
__init__.py
celeree = MyCelery(amqp='reactor.MyAMQP')
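A short usage sketch, assuming the celeree app above is configured from the celery_config.py shown:
celeree.config_from_object('celery_config')

# Without an explicit queue the overridden send_task returns None and publishes nothing.
celeree.send_task('import_feed')

# With a queue it delegates to Celery's normal send_task and the message lands on feed_queue.
celeree.send_task('import_feed', queue='feed_queue')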
