I have a task:
@celery.task(name='request_task', default_retry_delay=2, acks_late=True)
def request_task(data):
    try:
        if some_condition:
            request_task.retry()
    except Exception as e:
        request_task.retry()
I use Celery with the MongoDB broker and the MongoDB results backend enabled.
When the task's retry() method is called, whether from the conditional branch or after catching the exception, the task is not retried.
In the worker's terminal I get a message like this:
[2012-08-10 19:21:54,909: INFO/MainProcess] Task request_task[badb3131-8964-41b5-90a7-245a8131e68d] retry: Task can be retried
What could be wrong?
UPDATE: In the end I did not solve this and had to use a while loop inside the task, so my tasks never actually get retried.
I know this answer is too late, but the log message you're seeing means that you're calling the task directly, as request_task(), rather than queuing it, so it isn't running on a worker. In that case retry() will just raise the original exception if there is one, or a Retry exception otherwise. This is the relevant code from the Task.retry method, if you want to look at it:
# Not in worker or emulated by (apply/always_eager),
# so just raise the original exception.
if request.called_directly:
    # raises orig stack if PyErr_Occurred,
    # and augments with exc' if that argument is defined.
    raise_with_context(exc or Retry('Task can be retried', None))
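In other words, retry() only does something useful when the task is actually executing on a worker. A minimal sketch of the difference, reusing the request_task and data from the question:

# Runs the task in the current process; retry() just raises, as shown above.
request_task(data)

# Sends a message to the broker so a worker executes the task;
# retry() can then re-queue it properly.
request_task.delay(data)                  # shorthand
request_task.apply_async(args=[data])     # explicit form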
Using task.retry() does not retry the task on the same worker; it sends a new message using task.apply_async(), so the retry may be picked up by another worker, which is something to take into account when handling retries. You can access the retry count via task.request.retries, and you can also set the max_retries option on the task decorator.
By the way, using bind=True on the task decorator makes the task instance available as the first argument:
@app.task(bind=True, name='request_task', max_retries=3)
def request_task(self, data):
    # Current retry count for this task
    self.request.retries

    try:
        # Your code here
        ...
    except SomeException as exc:
        # Retry the task.
        # Passing the exc argument makes the task fail with this exception
        # instead of MaxRetriesExceededError once max_retries is exceeded.
        self.retry(exc=exc)
You should read the section on retrying in the Celery docs.
http://celery.readthedocs.org/en/latest/userguide/tasks.html#retrying
It looks like in order to retry, you must raise a retry exception.
raise request_task.retry()
That seems to let the retry be handled by the function that decorated your task.
Related
I have this piece of code that just downloads files from a WebDAV server.
_download(self) is a thread function; it is managed by a multi_download(self) controller that keeps the thread count under 24, and that part works fine. It should ensure that no more than 24 sockets are used. It is very straightforward, so I am not going to post that method here. Possibly relevant: I am using Threads rather than ThreadPoolExecutor (I am not much of a fan of pools) and handling the maximum thread count manually.
The problem appears when, for example, the VPN drops and I cannot connect, or some other unhandled network problem occurs. I could of course handle that, but that's not the point here.
The unexpected behaviour is HERE:
After a while of running retries and logging exceptions, the file descriptor count seems to exceed the limit, because the code starts throwing the error below. This never happened when there were no errors/retries during the whole run:
NOTE: the webdav.download() library method uses with open(file, 'wb') to write the data, so there should be no dangling FDs there either.
2022-02-09 10:36:53,898 - DEBUG - 2294-1644212940.tdms thrd - Retried download successfull on 25 attempt [webdav.py:_download:183]
2022-02-09 10:36:53,904 - DEBUG - 2294-1644212940.tdms thrd - downloaded 900 files [webdav.py:add_download_counter:67]#just a log
2022-02-09 10:36:59,801 - DEBUG - 2294-1644219643.tdms thrd - Retried download successfull on 25 attempt [webdav.py:_download:183]
2022-02-09 10:36:59,856 - DEBUG - 2294-1644213248.tdms thrd - Retried download successfull on 25 attempt [webdav.py:_download:183]
2022-02-09 10:36:59,905 - WARNING - 2294-1643646904.tdms thrd - WebDav cannot connect: HTTPConnectionPool(host='123.16.456.123', port=987):
Max retries exceeded with url:/path/to/webdav/file (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7b3377d898>:
Failed to establish a new connection: [Errno 24] Too many open files'))
# Marked in code where this is thrown !
I assume this means I am opening too many sockets, but I have tried to clean up after myself; in the code you can see me closing the session and even deleting the reference to the client to keep things tidy. BUT after a while of debugging I still cannot pin down WHERE I am forgetting something and where the hanging sockets are. I am asking for help before I start counting FDs and subclassing the easywebdav2 classes :) Thanks, Q.
# Python 3.7.3
from easywebdav2 import Client as WebDavClient
# WebDav source:
# https://github.com/zabuldon/easywebdav/blob/master/easywebdav2/client.py

def clean_webdav(self, webdav):
    """Closing sockets after use."""
    try:
        webdav.session.close()
    except Exception as err:
        logger.error(f'Err closing session: {err}')
    finally:
        del webdav

def _download(self, local, remote, *, retry=0):
    """This is a thread function, therefore raising SystemExit."""
    try:
        # kw (connection settings) is defined elsewhere, not shown here
        webdav = WebDavClient(**kw)
    except Exception as err:
        logger.error(f'There is an err creating client: {err}')
        raise SystemExit
    try:
        webdav.download(remote, local)  # <--------------- HERE THROWS
        if retry != 0:
            logger.info(f'Retry number {retry} was successful')
    except (ConnectionError, requests.exceptions.ConnectionError) as err:
        if retry >= MAX_RETRY:
            logger.exception(f'There was err: {err}')
            return
        retry += 1
        self.clean_webdav(webdav)
        self._download(local, remote, retry=retry)
    except Exception as err:
        logger.error(f'Unhandled Exception: {err}')
    finally:
        self.clean_webdav(webdav)
        raise SystemExit
EDIT: Since one answer mentioned that WebDAV is an extension of the HTTP protocol (which it is): HTTP keep-alive should not play a role here if I am explicitly closing the requests session with webdav.session.close(), which is indeed THE requests session created by webdav. There should be no keep-alive after explicitly closing it, right?
I'm not specifically familiar with the WebDAV client package you're using, but any WebDAV client would usually support HTTP Keep-Alive, which means that after you release a connection, it will keep it alive in a pool for a while in case you need it again.
The way you use a client like that would be to construct one WebDavClient for your application, and use that one client for all requests. Of course, you need to know that it's thread safe if you're going to call it from multiple download threads.
Since you are creating a WebDavClient in each thread, there's a good chance that total number of connections being kept alive across all of their pools exceeds your file handle limit.
--
EDIT: A quick look on the web indicates that each WebDavClient creates a requests Session object, which does indeed have a connection pool, but unfortunately isn't thread-safe. You should create one WebDavClient per thread and use it for all of the downloads that that thread does. That will probably require a little refactoring.
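A minimal sketch of that per-thread approach, assuming the same easywebdav2 client as in the question (the job list and connection settings here are placeholders):

import threading
from easywebdav2 import Client as WebDavClient

def download_worker(jobs, client_kw):
    """One thread: one client, reused for every file this thread handles."""
    webdav = WebDavClient(**client_kw)
    try:
        for remote, local in jobs:
            webdav.download(remote, local)  # reuses the client's pooled connection
    finally:
        webdav.session.close()  # release the pool once the thread is done

# Hypothetical usage: split the files into at most 24 chunks, one thread per chunk.
# threads = [threading.Thread(target=download_worker, args=(chunk, settings))
#            for chunk in chunks]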
Using Celery 4.4.2, I have a group of tasks that connect to remote devices and gather data; when that completes, the results are collated and emailed by the callback task.
However, if gathering the data from one of the remote devices fails, the callback also fails. I've read that using link_error should resolve this, but I'm not sure about my implementation; I have executed the below and it still failed:
for device in device_ids:
    task_count += 1
    tasks.append(config_auth.s(device.id, device.hostname, device.get_scripts_ip()))

callback = email_auth_results.s().set(link_error=error_handler.s())
tasks = group(tasks)
r = chord(tasks)(callback)
return '{} tasks sent to queue'.format(task_count)
@app.task
def error_handler(uuid):
    result = AsyncResult(uuid)
    exc = result.get(propagate=False)
    print('Task {0} raised exception: {1!r}\n{2!r}'.format(
        uuid, exc, result.traceback))
original error:
celery.exceptions.ChordError: Dependency 5ffc10c9-edc7-4b91-a660-08c372c60ab2 raised NetmikoTimeoutException('Connection to device timed-out')
I still want to log that a task failed so I can see the failures in Flower, but I want the chord to ignore the failures, or to append the results so they just say "failure" and I can see them in the emailed results.
Thanks
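One hedged sketch of a way to get that behaviour (the task names come from the question above; gather_device_data and the result fields are illustrative placeholders): catch exceptions inside each header task and return a failure marker instead of raising, so the chord callback always runs and can report the failures in the email.

from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task
def config_auth(device_id, hostname, ip):
    try:
        data = gather_device_data(hostname, ip)  # hypothetical helper
        return {'device': device_id, 'status': 'ok', 'data': data}
    except Exception as exc:
        # Still logged, but the chord is not broken by the failure.
        logger.exception('Device %s failed: %r', device_id, exc)
        return {'device': device_id, 'status': 'failure', 'error': str(exc)}

@app.task
def email_auth_results(results):
    failures = [r for r in results if r['status'] == 'failure']
    # ... build and send the email, marking the failed devices ...

Note that with this approach the header tasks report success in Flower; the failure only shows up in the logs and in the emailed results.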
I have these Celery settings:
WORKER_MAX_TASKS_PER_CHILD = 1
TASK_TIME_LIMIT = 30
When I run a group of tasks:
from celery import group, shared_task
from time import sleep

@shared_task
def do_something(arg):
    sleep(60)
    return arg * 2

group([do_something.s(i) for i in range(3)]).apply_async()
I'm getting TimeLimitExceeded inside the group and then the worker is killed by Celery at once. How can I handle it?
According to the documentation:
The soft time limit allows the task to catch an exception to clean up before it is killed: the hard timeout isn’t catch-able and force terminates the task.
The answer is simple: do not use hard time limits for tasks if you want to catch the timeout exception; use a soft time limit instead.
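A minimal sketch of the soft-limit approach, reusing the do_something task from the question (the limit values are illustrative):

from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded
from time import sleep

@shared_task(soft_time_limit=30, time_limit=60)  # the soft limit fires first
def do_something(arg):
    try:
        sleep(60)
        return arg * 2
    except SoftTimeLimitExceeded:
        # Clean up and return (or retry with bind=True and self.retry)
        # instead of being force-killed by the hard limit.
        return None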
I am using Tornado to asynchronously scrape data from many thousands of URLs. Each of them is 5-50 MB, so they take a while to download. I keep getting "Exception: HTTP 599: Connection closed http:…" errors, despite the fact that I am setting both connect_timeout and request_timeout to a very large number.
Why, despite the large timeout settings, am I still timing out on some requests after only a few minutes of running the script? Is there a way to instruct httpclient.AsyncHTTPClient to NEVER time out? Or is there a better solution to prevent timeouts?
The following is how I'm calling the fetch (each worker calls this request_and_save_url() sub-coroutine from its Worker() coroutine):
@gen.coroutine
def request_and_save_url(url, q_all):
    try:
        response = yield httpclient.AsyncHTTPClient().fetch(
            url,
            partial(handle_request, q_all=q_all),
            connect_timeout=60*24*3*999999999,
            request_timeout=60*24*3*999999999,
        )
    except Exception as e:
        print('Exception: {0} {1}'.format(e, url))
        raise gen.Return([])
As you note, HTTP 599 is raised on a connect or request timeout, but that is not the only case. It is also raised when the connection is closed by the server before the request completes (including fetching the entire response), e.g. because of the server's own timeout for handling the request.
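Because of that, making the timeouts enormous will not remove all 599s; it is usually more robust to catch the error and retry the fetch. A rough sketch in the same coroutine style as the question (the attempt count and timeout value are illustrative):

from tornado import gen, httpclient

@gen.coroutine
def fetch_with_retries(url, attempts=3):
    for attempt in range(attempts):
        try:
            response = yield httpclient.AsyncHTTPClient().fetch(
                url, request_timeout=3600)  # long but finite timeout
            raise gen.Return(response)
        except httpclient.HTTPError as e:
            if e.code != 599 or attempt == attempts - 1:
                raise  # not a timeout/closed connection, or out of attempts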
I am looking at Celery's HTTP callback tasks (http://celeryproject.org/docs/userguide/remote-tasks.html). They work well enough when the remote endpoint is available, but when it is unavailable I would like the task to retry (as per the retry policy), and likewise when the remote endpoint returns a failure. At the moment it just turns into an error and the task is ignored.
Any ideas?
You should be able to define your task like:
class RemoteCall(Task):
    default_retry_delay = 30 * 60  # retry in 30 minutes

    def run(self, arg, **kwargs):
        try:
            res = URL("http://example.com/").get_async(arg)
        except (InvalidResponseError, RemoteExecuteError) as exc:
            self.retry([arg], exc=exc, **kwargs)
This will keep trying it up to max_retries tries, once every 30 minutes.