Python - Retry a failed Celery task from another queue

I'm posting data to a web service from a Celery task. Sometimes the data is not posted because the internet connection is down, and the task is retried indefinitely until it succeeds. Retrying is pointless while the connection is down.
I thought of a better solution: if a task fails three times (that is, after at least 3 retries), it is moved to another queue, which holds all the failed tasks.
Once the connection is back and data is being posted again, i.e. a task from the normal queue has completed, the worker starts processing the tasks in the failed-task queue.
This avoids wasting CPU and memory on retrying the task again and again.
Here's my code. Right now I'm just retrying the task, but I doubt that's the right way to do it.
from decimal import Decimal
from celery import shared_task
from pysimplesoap.client import SoapClient  # assuming SoapClient comes from PySimpleSOAP

@shared_task(default_retry_delay=1 * 60, max_retries=10)
def post_data_to_web_service(data, url):
    try:
        client = SoapClient(
            location=url,
            action='http://tempuri.org/IService_1_0/',
            namespace="http://tempuri.org/",
            soap_ns='soap', ns=False
        )
        response = client.UpdateShipment(
            Weight=Decimal(data['Weight']),
            Length=Decimal(data['Length']),
            Height=Decimal(data['Height']),
            Width=Decimal(data['Width']),
        )
    except Exception as exc:
        # Any failure (including a dead connection) schedules a retry.
        raise post_data_to_web_service.retry(exc=exc)
How do I maintain two queues simultaneously and execute tasks from both of them?
Settings.py
BROKER_URL = 'redis://localhost:6379/0'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

By default, Celery adds all tasks to a queue named celery. You can run your task there; when an exception occurs it is retried, and once it reaches the maximum number of retries you can move it to a new queue, say foo:
from celery import shared_task
from celery.exceptions import MaxRetriesExceededError

@shared_task(default_retry_delay=1 * 60, max_retries=10)
def post_data_to_web_service(data, url):
    try:
        pass  # do something with given args
    except Exception:
        try:
            # retry() called without exc raises MaxRetriesExceededError
            # once max_retries has been used up
            raise post_data_to_web_service.retry()
        except MaxRetriesExceededError:
            # re-send the same task to the queue "foo"
            post_data_to_web_service.apply_async([data, url], queue='foo')
When you start your worker, this task will try to do something with the given data. If it fails, it is retried up to 10 times with a delay of 60 seconds. Once the retries are exhausted and MaxRetriesExceededError is raised, it posts the same task to the new queue foo.
To consume these tasks you have to start a new worker:
    celery worker -l info -A my_app -Q foo
Alternatively, the default worker can consume them as well if you start it with:
    celery worker -l info -A my_app -Q celery,foo
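
The hand-off described in this answer can be sketched without a broker to make the control flow explicit. This is a minimal simulation, not Celery code: `run_with_retries`, `drain_failed_queue`, `flaky_post` and `network` are hypothetical names, and an in-process `queue.Queue` stands in for the `foo` queue.

```python
import queue

MAX_RETRIES = 3

# Stand-in for the Celery queue "foo" that parks failed tasks.
failed_queue = queue.Queue()

# Hypothetical switch that simulates the internet connection.
network = {"up": False}


def run_with_retries(func, args, max_retries=MAX_RETRIES):
    """Call func(*args); once all retries have failed, park the call."""
    for _attempt in range(max_retries + 1):  # first try + max_retries retries
        try:
            return func(*args)
        except Exception:
            continue
    # Every attempt failed: park the call instead of retrying forever.
    failed_queue.put((func, args))
    return None


def drain_failed_queue():
    """Re-run each parked call once; calls that fail again are parked again."""
    for _ in range(failed_queue.qsize()):
        func, args = failed_queue.get()
        run_with_retries(func, args)


def flaky_post(data):
    """Hypothetical web-service call that only works while the network is up."""
    if not network["up"]:
        raise ConnectionError("network is down")
    return "posted %s" % data


run_with_retries(flaky_post, ("shipment-1",))
print(failed_queue.qsize())  # 1: the call was parked after 3 retries

network["up"] = True
drain_failed_queue()
print(failed_queue.qsize())  # 0: the parked call succeeded
```

In the Celery version, the worker started with `-Q foo` plays the role of `drain_failed_queue`.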

Related

Scheduling/Queueing a job within another job with redis queue

The Tasker class sets up the initial job when instantiated. What I want is to put a job in 'main_queue'; that job checks whether the same job is already running or queued in 'process_queue' and, if so, simply returns. Otherwise it queues a job in 'process_queue'. When that process job finishes, it puts a new job in 'main_queue'.
However, 'process_queue' still holds the job with the same id for that whole time, even though, judging by the output, it should have finished. So a new job is never queued for processing. Is there a deadlock happening that I am unable to see?
main_queue worker
$ rq worker main_queue --with-scheduler
22:44:19 Worker rq:worker:7fe23a24ae404135a10e301f7509eb7e: started, version 1.9.0
22:44:19 Subscribing to channel rq:pubsub:7fe23a24ae404135a10e301f7509eb7e
22:44:19 *** Listening on main_queue...
22:44:19 Trying to acquire locks for main_queue
22:44:19 Scheduler for main_queue started with PID 3747
22:44:19 Cleaning registries for queue: main_queue
22:44:33 main_queue: tasks.redis_test_job() (e90e0dff-bbcc-48ab-afed-6d1ba8b020a8)
None
Job is enqueued to process_queue!
22:44:33 main_queue: Job OK (e90e0dff-bbcc-48ab-afed-6d1ba8b020a8)
22:44:33 Result is kept for 500 seconds
22:44:47 main_queue: tasks.redis_test_job() (1a7f91d0-73f4-466e-92f4-9f918a9dd1e9)
<Job test_job: tasks.print_job()>
!!Scheduler added job to main but same job is already queued in process_queue!!
22:44:47 main_queue: Job OK (1a7f91d0-73f4-466e-92f4-9f918a9dd1e9)
22:44:47 Result is kept for 500 seconds
process_queue worker
$ rq worker process_queue
22:44:24 Worker rq:worker:d70daf20ff324c18bc17f0ea9576df52: started, version 1.9.0
22:44:24 Subscribing to channel rq:pubsub:d70daf20ff324c18bc17f0ea9576df52
22:44:24 *** Listening on process_queue...
22:44:24 Cleaning registries for queue: process_queue
22:44:33 process_queue: tasks.print_job() (test_job)
The process job executed.
22:44:42 process_queue: Job OK (test_job)
22:44:42 Result is kept for 500 seconds
tasker.py
from datetime import timedelta
from redis import Redis
from rq import Queue
import tasks

class Tasker():
    def __init__(self):
        # RedisClient is the asker's own helper; its definition is not shown.
        self.tasker_conn = RedisClient().conn
        self.process_queue = Queue(name='process_queue', connection=Redis(),
                                   default_timeout=-1)
        self.main_queue = Queue(name='main_queue', connection=Redis(),
                                default_timeout=-1)
        self.__setup_tasks()

    def __setup_tasks(self):
        self.main_queue.enqueue_in(timedelta(seconds=3), tasks.redis_test_job)
tasks.py
from datetime import timedelta
from time import sleep
from redis import Redis
from rq import Queue
import tasks

def redis_test_job():
    q = Queue('process_queue', connection=Redis(), default_timeout=-1)
    # Look up the job by its fixed id; returns None if no such job is stored.
    queued = q.fetch_job('test_job')
    print(queued)
    if queued:
        print("!!Scheduler added job to main but same job is already queued in process_queue!!")
        return False
    else:
        q.enqueue(tasks.print_job, job_id='test_job')
        print("Job is enqueued to process_queue!")
        return True

def print_job():
    sleep(8)
    print("The process job executed.")
    # Schedule the next check on main_queue once this job is done.
    q = Queue('main_queue', connection=Redis(), default_timeout=-1)
    q.enqueue_in(timedelta(seconds=5), tasks.redis_test_job)

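From the docs, enqueued jobs have a result_ttl that defaults to 500 seconds if you don't define it. The finished job therefore stays in Redis for that long (your log shows "Result is kept for 500 seconds"), so fetch_job('test_job') keeps returning it and your check treats it as still queued.
If you want to change that, e.g. to make the job and its result live for only 1 second, enqueue your job like this:
    q.enqueue(tasks.print_job, job_id='test_job', result_ttl=1)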
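
An alternative to shortening result_ttl is to look at the fetched job's status rather than at its mere existence. The sketch below assumes the job object exposes `get_status()` returning values such as "queued", "started" or "finished", as RQ jobs do; `is_pending` and `FakeJob` are hypothetical names, and `FakeJob` only stands in for a real job so that the snippet runs without Redis.

```python
# Statuses that mean the job has not completed yet.
# A tuple is used so that membership is decided by ==, which also works
# if the library returns a string-valued enum instead of a plain str.
PENDING_STATUSES = ("queued", "started", "deferred", "scheduled")


def is_pending(job):
    """Return True only if a job exists and has neither finished nor failed."""
    if job is None:  # fetch_job() returns None when the id is unknown
        return False
    return job.get_status() in PENDING_STATUSES


class FakeJob:
    """Hypothetical stand-in for a job fetched from the queue."""

    def __init__(self, status):
        self._status = status

    def get_status(self):
        return self._status


print(is_pending(None))                 # False
print(is_pending(FakeJob("queued")))    # True
print(is_pending(FakeJob("finished")))  # False
```

In redis_test_job this check would replace `if queued:` with `if is_pending(queued):`. Whether re-using the id of a finished job is acceptable should be verified against the RQ version in use.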

Flask + Celery task duplications on 3rd party notifications

I have a Flask app which sends emails/SMSs to users at a specific time using Celery's ETA/countdown options, with Redis as the broker. The issue is that the email and SMS tasks are duplicated randomly: sometimes users get 10 emails/SMSs, sometimes 20+, although each task is supposed to run only once. The data flow:
The initial function schedule_event_main calls the ETA tasks with the notifications:
date_event = datetime.combine(day, time.max)
schedule_ratings_email.apply_async([str(event[0])], eta=date_event)
schedule_ratings_sms.apply_async([str(event[0])], eta=date_event)
Inside the schedule_ratings_email and schedule_ratings_sms tasks, .delay is called to create the individual Celery tasks that send the emails and SMSs to the various guests of an event.
@app.task(bind=True)
def schedule_ratings_email(self, event_id):
    """ Fetch feed of URLs to crawl and queue up a task to grab and process
    each url. """
    try:
        url = SITE_URL + 'user/dashboard'
        guests = db.session.query(EventGuest).filter(EventGuest.event_id == int(event_id)).all()
        event_details = db.session.query(Event).filter(Event.id == event_id).first()
        if guests:
            if event_details.status == "archived":
                for guest in guests:
                    schedule_individual_ratings_emails.delay(guest.user.first_name, guest.event.host.first_name, guest.user.email, url)
    except Exception as e:
        log.error("Error processing ratings email for %s" % event_id, exc_info=e)
        # self.retry()
This is the final .delay individual task for sending the notifications:
@app.task()
def schedule_individual_ratings_emails(guest_name, host, guest, url):
    try:
        email_rating(guest_name, host, guest, url)
    except Exception as e:
        # Pass the recipient so the %s placeholder is actually filled in.
        log.error("Error processing ratings email for %s", guest, exc_info=e)
I've tried multiple SO answers and tweaked a lot of variables, including Celery settings, but the notifications are still duplicated. It only affects the ETA/countdown tasks, and ONLY those that talk to third-party servers: I have other ETA tasks that write data to the DB, and those don't have any issues.
This is an issue both locally and on Heroku (production). Current tech stack:
Flask==1.0.2
celery==4.1.0
Redis 4.0.9
Celery startup: worker: celery worker --app openseat.tasks --beat --concurrency 1 --loglevel info
Celery config details:
CELERY_ACKS_LATE = True
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TIMEZONE = 'Africa/Johannesburg'
CELERY_ENABLE_UTC = True

I am trying to run an endless worker thread (daemon) from within Django

I have a worker thread whose only task is to query a list of active users from the database every 10 minutes and to send them an SMS message if a certain condition is fulfilled (which is checked every minute). The worker thread does not hinder the main application at all.
So far I have managed to get the thread up and running, and sending SMS works just fine. However, for some reason the thread stops or gets killed after a random amount of time (hours). I run a try: except Exception as e: within a while True loop to catch any errors, and I print a message saying which error occurred.
However, I never see any message, and the thread is definitely down. Therefore, I suspect Gunicorn or Django kills my thread more or less gracefully.
I have put log and print statements all over the code but haven't found anything indicating why my thread is getting killed.
My wsgi.py, where I call the function that starts my thread:
"""
WSGI config for django_web project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/2.1/howto/deployment/wsgi/
"""
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'django_web.settings')
application = get_wsgi_application()
'''
Start background services
Import has to happen after "get_wsgi_application()"; otherwise docker container crashes
'''
try:
    from threading import Thread
    from timesheet.admin import runWorkmateServices
    runWorkmateServices()
except Exception as exp:
    print(exp)
The function that is called from wsgi.py. I check whether the thread has already been started, to avoid having two running.
import threading
from threading import Thread
from django.contrib import messages

def runWorkmateServices(request=None):
    service_name = 'TimeKeeperWorkMateReminderService'
    thread_found = False
    for thread in threading.enumerate():
        if service_name in thread.name:
            thread_found = True
            break  # Leave loop now
    if thread_found:
        print(f'Service has already been started: {service_name}')
        if request:
            messages.add_message(request, messages.ERROR, f'Service has already been started:: {service_name}')
    else:
        Thread(target=WorkMateReminders, args=(), name=service_name, daemon=True).start()
        print(f'Started Service: {service_name}')
        if request:
            messages.add_message(request, messages.SUCCESS, f'Started Service: {service_name}')
The worker thread itself
def WorkMateReminders():
    print('Thread Started: WorkMateReminders')
    timer = 0
    employees = User.objects.none()
    while True:
        try:
            # Update user list every n * sleep time (10 minutes)
            if timer % 10 == 0:
                timer = 0
                # Get active employees
                employees = User.objects.filter(is_active=True, profile__workmate_sms_reminders_activated=True)
                print(f'Employees updated at {datetime.now().date()} - {datetime.now().time()}: {employees}')
            WorkMateCheckClockOffTimes(employees=employees)
            WorkMateClockOnReminder(employees=employees)
            WorkMateEndOfBreakReminder(employees=employees)
            timer += 1  # increment timer
        except Exception as exp:
            print(f'Error: {exp}')
        time.sleep(60 * 1)
My goal is to have this worker thread running for as long as Django is up.

Most WSGI servers spawn workers that are killed/recycled fairly regularly, so spawning threads from these workers is not the best solution to your problem. There are several ways to go about this:
Cron
Create a management command that does what you want and configure cron to run it every 10 minutes.
Celery/Celerybeat
Set up a Celery worker, a process that runs asynchronously to your Django application. Using celerybeat, you can have tasks run at intervals.
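
For the Celery route, the periodic schedule is plain configuration. A minimal sketch, assuming the Celery app reads the Django settings with the CELERY namespace (so the beat_schedule setting is spelled CELERY_BEAT_SCHEDULE) and that one pass of the reminder loop has been moved into a task. The task path timesheet.tasks.workmate_reminders is hypothetical.

```python
# settings.py (fragment)
CELERY_BEAT_SCHEDULE = {
    'workmate-reminders': {
        # Hypothetical task wrapping one pass of the WorkMateReminders loop body.
        'task': 'timesheet.tasks.workmate_reminders',
        # Run every 60 seconds, matching the once-a-minute check in the question.
        'schedule': 60.0,
    },
}
```

A separate beat process publishes the task on schedule and a regular worker executes it, so nothing depends on the lifetime of a WSGI worker.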

Undesired delay in the celery process

I am encountering an undesired delay in the Celery process that I cannot explain. My intent is to manage live processing of incoming data (at a rate of 10 to 60 items per second). Processing one piece of data is divided into two fully sequential tasks, but parallelization is used to start processing the next piece of data (with task 1) while processing of the current one (with task 2) has not finished yet. Getting the shortest possible delay is of utmost importance since this is a live application.
Once in a while, I encounter a freeze in the process. To see where this problem came from, I started monitoring the occupation of my workers. It appears to happen during the communication between workers. I designed the lightest and simplest example to illustrate it here.
Here is my code. As you can see, I have two tasks that do nothing but wait 10 ms each. I call them using Celery chains once every 20 ms. I track each worker's occupation using the prerun and postrun signals along with logging. In most cases everything happens sequentially, as the time spent by both workers doesn't exceed the send interval.
from __future__ import absolute_import
import time
from celery import chain
from celery.signals import task_prerun, task_postrun
from celery import Celery
from kombu import Queue, Exchange

N_ITS = 100000  # Total number of chains sent
LOG_FILE = 'log_file.txt'  # Path to the log file

def write_to_log_file(text):
    with open(LOG_FILE, 'a') as f:
        f.write(text)

# Create celery app
app = Celery('live')
app.config_from_object('celeryconfig')
default_exchange = Exchange('default', type='direct')
# list(...) keeps the concatenation valid on Python 3 as well
app.conf.task_queues = tuple(
    Queue(route['queue'], default_exchange, routing_key=route['queue'])
    for route in list(app.conf.task_routes.values()) + [{'queue': 'default'}])
app.conf.update(result_expires=3600)

# Define functions that record timings
@task_prerun.connect()
def task_prerun(signal=None, sender=None, task_id=None, task=None, **kwargs):
    text = 'task_prerun; {0}; {1:.16g}\n'.format(task.name, time.time())
    write_to_log_file(text)

@task_postrun.connect()
def task_postrun(signal=None, sender=None, task_id=None, task=None, **kwargs):
    text = 'task_postrun; {0}; {1:.16g}\n'.format(task.name, time.time())
    write_to_log_file(text)

# Define tasks
@app.task
def task_1(i):
    print('Executing task_1: {}'.format(i))
    time.sleep(0.01)

@app.task
def task_2(i):
    print('Executing task_2: {}'.format(i))
    time.sleep(0.01)

# Send chained tasks
def main():
    celery_chains = []
    for i in range(N_ITS):
        print('[{}] - Dispatching tasks'.format(i))
        celery_chains.append(chain(task_1.si(i) | task_2.si(i))())
        time.sleep(0.02)
    # wait for all tasks to complete
    [c.get() for c in celery_chains]

if __name__ == '__main__':
    main()
Here is the Celery configuration as well, in case it is needed:
from __future__ import absolute_import
import os
name = 'live'
broker_url = 'pyamqp://{}'.format(os.environ.get('RMQ_HOST', 'localhost'))
print('broker_url: {}'.format(broker_url))
include = ['live']
DEFAULT_QUEUE = 'celery'
# A named queue that's not already defined in task_queues will be created automatically.
task_create_missing_queues = True
broker_pool_limit = 10000
task_routes = {
    'live.task_1': {'queue': 'worker_1'},
    'live.task_2': {'queue': 'worker_2'}
}
# We always set the routing key to be the queue name so we do it here automatically.
for v in task_routes.values():
    v.update({'routing_key': v['queue']})
task_serializer = 'pickle'
result_serializer = 'pickle'
accept_content = ['json', 'pickle']
timezone = 'Europe/Paris'
enable_utc = True
For the broker, I use the Docker image rabbitmq:3.6-alpine with the basic configuration, apart from having enabled rabbitmq_management.
This results in the following worker occupation chronogram (the color indicates the index of the data being processed, so you can link tasks belonging to the same chain):
As you can see, usually everything goes well and task 2 is called right after task 1 finishes. However, sometimes (indicated by the arrows on the figure) task 2 doesn't start immediately even though worker 2 isn't occupied. This introduces a delay of 27 ms, which is more than twice the duration of a single task. It happened approximately every 2 seconds during this execution.
I investigated further using firehose to study the message exchange in RabbitMQ, and it turned out that the messages are indeed sent on time. To my understanding, the worker waits before fetching the message and processing the task, but I cannot understand why.
I tried setting the broker pool limit to a high number, but the issue remains.

Retry with celery Http Callback Tasks

I'm looking at the HTTP callback tasks (http://celeryproject.org/docs/userguide/remote-tasks.html) in Celery. They work well enough when the remote endpoint is available, but when it is unavailable, or even when it returns a failure, I would like the task to be retried (as per the retry policy). At the moment this seems to be treated as an error and the task is ignored.
Any ideas?

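You should be able to define your task like this:
# Import paths as in Celery 3.1 and earlier, where celery.task.http still exists.
from celery.task import Task
from celery.task.http import URL, InvalidResponseError, RemoteExecuteError

class RemoteCall(Task):
    default_retry_delay = 30 * 60  # retry in 30 minutes

    def run(self, arg, **kwargs):
        try:
            res = URL("http://example.com/").get_async(arg)
        except (InvalidResponseError, RemoteExecuteError) as exc:
            self.retry(args=[arg], kwargs=kwargs, exc=exc)
This will keep retrying, up to max_retries times, once every 30 minutes.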
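
To also retry when the endpoint answers but reports a failure, the response body has to be inspected. The sketch below assumes the webhook convention from the linked user guide, in which the endpoint returns JSON with a "status" of "success" (plus "retval") or "failure" (plus "reason"). `parse_webhook_response` and `RemoteFailure` are hypothetical names and not part of Celery.

```python
import json


class RemoteFailure(Exception):
    """Hypothetical error for a response that reports status "failure"."""


def parse_webhook_response(body):
    """Return the remote return value, or raise so that the caller can retry."""
    payload = json.loads(body)
    status = payload.get("status")
    if status == "success":
        return payload.get("retval")
    if status == "failure":
        raise RemoteFailure(payload.get("reason", "no reason given"))
    raise ValueError("unexpected status: %r" % (status,))


print(parse_webhook_response('{"status": "success", "retval": 100}'))  # 100
```

Inside a task, catching RemoteFailure next to the connection errors and passing it to self.retry(exc=exc) would treat both cases the same way.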
