celery task raise an error while redis backend connection socket timeout

celery task raise an error while redis backend connection socket timeout - python

I am using celery to run task one by one with redis broker, but when i run 2 task then after completing the first redis given an timeout socket error for second task so that second task would be failed.
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/result.py", line 194, in get
on_message=on_message,
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/async.py", line 189, in wait_for_pending
for _ in self._wait_for_pending(result, **kwargs):
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/async.py", line 256, in _wait_for_pending
on_interval=on_interval):
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/async.py", line 57, in drain_events_until
yield self.wait_for(p, wait, timeout=1)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/async.py", line 66, in wait_for
wait(timeout=timeout)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/redis.py", line 69, in drain_events
m = self._pubsub.get_message(timeout=timeout)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/client.py", line 2513, in get_message
response = self.parse_response(block=False, timeout=timeout)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/client.py", line 2430, in parse_response
return self._execute(connection, connection.read_response)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/client.py", line 2408, in _execute
return command(*args)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
response = self._parser.read_response()
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/connection.py", line 284, in read_response
response = self._buffer.readline()
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/connection.py", line 216, in readline
self._read_from_socket()
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/connection.py", line 187, in _read_from_socket
raise TimeoutError("Timeout reading from socket")
TimeoutError: Timeout reading from socket
I am running celery by using this command:
celery -A flask_application.celery worker --loglevel=info --max-tasks-per-child=1 --concurrency=1
I am calling celery task by using: .delay() function
celery_response = run_algo.run_pipeline.delay(request.get_json())
Getting output by using: .get() function
output_file_path = celery_response.get()

There's a warning in the docs of AsyncResult.get saying that calling it it within an async task can cause a deadlock, which may be what's happening here, though it's hard to tell without more context of where these things are being called.

Related

How to add retry for celery backend connection?

I am using celery 5.0.1 and using CELERY_BACKEND_URL as redis://:password#redisinstance1:6379/0. It works fine, but when there is a Redis instance loose connection, it breaks out tasks with an error.
Exception: Error while reading from socket: (104, 'Connection reset by peer')
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 198, in _read_from_socket
data = recv(self._sock, socket_read_size)
File "/usr/local/lib/python3.7/dist-packages/redis/_compat.py", line 72, in recv
return sock.recv(*args, **kwargs)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 477, in trace_task
uuid, retval, task_request, publish_result,
File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 154, in mark_as_done
self.store_result(task_id, result, state, request=request)
File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 439, in store_result
request=request, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 855, in _store_result
current_meta = self._get_task_meta_for(task_id)
File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 873, in _get_task_meta_for
meta = self.get(self.get_key_for_task(task_id))
File "/usr/local/lib/python3.7/dist-packages/celery/backends/redis.py", line 346, in get
return self.client.get(key)
File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 1606, in get
return self.execute_command('GET', name)
File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 901, in execute_command
return self.parse_response(conn, command_name, **options)
File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 915, in parse_response
response = connection.read_response()
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 739, in read_response
response = self._parser.read_response()
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 324, in read_response
raw = self._buffer.readline()
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 256, in readline
self._read_from_socket()
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 223, in _read_from_socket
(ex.args,))
redis.exceptions.ConnectionError: Error while reading from socket: (104, 'Connection reset by peer')
Celery worker: None
Celery task id: 244b56af-7c96-56cf-a01a-9256cfd98ade
Celery retry attempt: 0
Task args: []
Task kwargs: {'address': 'ipadd', 'uid': 'uid', 'hexID': 'hexID', 'taskID': '244b56af-7c96-56cf-a01a-9256cfd98ade'}
When I run the second tasks, it works fine, there is some glitch in the connection for a short period of time.
Can I set something by which, when celery tries to update the results to Redis, if it returns an error, it will retry after 2-5 seconds?
I know how to set retry in the task, but this does not task failure. My tasks work fine and it returns the data, but celery is losing connection while updating to the backend.

To deal with connection timeouts you can have the following in your Celery configuration:
app.conf.broker_transport_options = {
'retry_policy': {
'timeout': 5.0
}
}
app.conf.result_backend_transport_options = {
'retry_policy': {
'timeout': 5.0
}
}
There are few other Redis backend settings that you may want to consider having in your configuration, like the redis_retry_on_timeout for an example.

Error trying to connect Celery through SQS using STS

I'm trying to use Celery with SQS as broker. In order to use the SQS from my container I need to assume a role and for that I'm using STS. My code looks like this:
role_info = {
'RoleArn': 'arn:aws:iam::xxxxxxx:role/my-role-execution',
'RoleSessionName': 'roleExecution'
}
sts_client = boto3.client('sts', region_name='eu-central-1')
credentials = sts_client.assume_role(**role_info)
aws_access_key_id = credentials["Credentials"]['AccessKeyId']
aws_secret_access_key = credentials["Credentials"]['SecretAccessKey']
aws_session_token = credentials["Credentials"]["SessionToken"]
os.environ["AWS_ACCESS_KEY_ID"] = aws_access_key_id
os.environ["AWS_SECRET_ACCESS_KEY"] = aws_secret_access_key
os.environ["AWS_DEFAULT_REGION"] = 'eu-central-1'
os.environ["AWS_SESSION_TOKEN"] = aws_session_token
broker = "sqs://"
backend = 'redis://redis-service:6379/0'
celery = Celery('tasks', broker=broker, backend=backend)
celery.conf["task_default_queue"] = 'my-queue'
celery.conf["broker_transport_options"] = {
'region': 'eu-central-1',
'predefined_queues': {
'my-queue': {
'url': 'https://sqs.eu-central-1.amazonaws.com/xxxxxxx/my-queue'
}
}
}
In the same file I have the following task:
#celery.task(name='my-queue.my_task')
def my_task(content) -> int:
print("hello")
return 0
When I execute the following code I get an error:
[2020-09-24 10:38:03,602: CRITICAL/MainProcess] Unrecoverable error: ClientError('An error occurred (AccessDenied) when calling the ListQueues operation: Access to the resource https://eu-central-1.queue.amazonaws.com/ is denied.',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 921, in create_channel
return self._avail_channels.pop()
IndexError: pop from empty list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/celery/worker/worker.py", line 208, in start
self.blueprint.start(self)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
return self.obj.start()
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/connection.py", line 23, in start
c.connection = c.connect()
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 405, in connect
conn = self.connection_for_read(heartbeat=self.amqheartbeat)
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 412, in connection_for_read
self.app.connection_for_read(heartbeat=heartbeat))
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 439, in ensure_connected
callback=maybe_shutdown,
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 422, in ensure_connection
callback, timeout=timeout)
File "/usr/local/lib/python3.6/site-packages/kombu/utils/functional.py", line 341, in retry_over_time
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 275, in connect
return self.connection
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 823, in connection
self._connection = self._establish_connection()
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 778, in _establish_connection
conn = self.transport.establish_connection()
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 941, in establish_connection
self._avail_channels.append(self.create_channel(self))
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 923, in create_channel
channel = self.Channel(connection)
File "/usr/local/lib/python3.6/site-packages/kombu/transport/SQS.py", line 100, in __init__
self._update_queue_cache(self.queue_name_prefix)
File "/usr/local/lib/python3.6/site-packages/kombu/transport/SQS.py", line 105, in _update_queue_cache
resp = self.sqs.list_queues(QueueNamePrefix=queue_name_prefix)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 337, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 656, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListQueues operation: Access to the resource https://eu-central-1.queue.amazonaws.com/ is denied.
If I use boto3 directly without Celery, I'm able to connect to the queue and retrieve data without this error. I don't know why Celery/Kombu try to list queues when I specify the predefined_queues configuration, tha is used to avoid these behavior (from docs):
If you want Celery to use a set of predefined queues in AWS, and to never attempt to list SQS queues, nor attempt to create or delete them, pass a map of queue names to URLs using the predefined_queue_urls setting
Source here
Anyone know what happens? How I should modify my code in order to make it work?. Seems that Celery is not using the credentials at all.
The versions I'm using:
celery==4.4.7
boto3==1.14.54
kombu==4.5.0
Thanks!
PS: I created and issue in Github to track if this can be a library error or not...

I solved the problem updating dependencies to the latest versions:
celery==5.0.0
boto3==1.14.54
kombu==5.0.2
pycurl==7.43.0.6

I was able to get celery==4.4.7 and kombu==4.6.11 working by setting the following configuration option:
celery.conf["task_create_missing_queues"] = False

Timeout context manager error using sanic and telepot same time

I'm trying to create a web app that communicates with Telegram. And trying to use Sanic web framework with Telepot. Both are asyncio based. Now I'm getting a very weird error.
This is my code:
import datetime
import telepot.aio
from sanic import Sanic
app = Sanic(__name__, load_env=False)
app.config.LOGO = ''
#app.listener('before_server_start')
async def server_init(app, loop):
app.bot = telepot.aio.Bot('anything', loop=loop)
# here we fall
await app.bot.sendMessage(
"#test",
"Wao! {}".format(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),)
)
if __name__ == "__main__":
app.run(
debug=True
)
The error that I'm getting is:
[2018-01-18 22:41:43 +0200] [10996] [ERROR] Experienced exception while trying to serve
Traceback (most recent call last):
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/sanic/app.py", line 646, in run
serve(**server_settings)
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/sanic/server.py", line 588, in serve
trigger_events(before_start, loop)
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/sanic/server.py", line 496, in trigger_events
loop.run_until_complete(result)
File "uvloop/loop.pyx", line 1364, in uvloop.loop.Loop.run_until_complete
File "/home/mk/Dev/project/sanic-telepot.py", line 14, in server_init
"Wao! {}".format(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),)
File "/usr/lib/python3.6/asyncio/coroutines.py", line 109, in __next__
return self.gen.send(None)
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/telepot/aio/__init__.py", line 100, in sendMessage
return await self._api_request('sendMessage', _rectify(p))
File "/usr/lib/python3.6/asyncio/coroutines.py", line 109, in __next__
return self.gen.send(None)
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/telepot/aio/__init__.py", line 78, in _api_request
return await api.request((self._token, method, params, files), **kwargs)
File "/usr/lib/python3.6/asyncio/coroutines.py", line 109, in __next__
return self.gen.send(None)
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/telepot/aio/api.py", line 139, in request
async with fn(*args, **kwargs) as r:
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/aiohttp/client.py", line 690, in __aenter__
self._resp = yield from self._coro
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/aiohttp/client.py", line 221, in _request
with timer:
File "/home/mk/Dev/project/venv/lib/python3.6/site-packages/aiohttp/helpers.py", line 712, in __enter__
raise RuntimeError('Timeout context manager should be used '
RuntimeError: Timeout context manager should be used inside a task
Telepot inside uses aiohttp as dependency and for the HTTP calls. And the very similar code is working if I make very similar functionality with just aiohttp.web.
I'm not sure to what project this problem is related. Also, all other dependencies like redis, database connections that I connected the same approach are working perfectly.
Any suggestions how to fix it?

Error 111 connection refused (Python, celery, redis)

I tried to get all the active/scheduled/reserved tasks in redis:
from celery.task.control import inspect
inspect_obj = inspect()
inspect_obj.active()
inspect_obj.scheduled()
inspect_obj.reserved()
But was greeted with a list of errors as follows:
My virtual environment ==> HubblerAPI.
Iam using this from the ec2 console
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/celery/app/control.py", line 81, in active
return self._request('dump_active', safe=safe)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/celery/app/control.py", line 71, in _request
timeout=self.timeout, reply=True,
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/celery/app/control.py", line 316, in broadcast
limit, callback, channel=channel,
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/kombu/pidbox.py", line 283, in _broadcast
chan = channel or self.connection.default_channel
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/kombu/connection.py", line 771, in default_channel
self.connection
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/kombu/connection.py", line 756, in connection
self._connection = self._establish_connection()
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/kombu/connection.py", line 711, in _establish_connection
conn = self.transport.establish_connection()
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/kombu/transport/pyamqp.py", line 116, in establish_connection
conn = self.Connection(**opts)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/amqp/connection.py", line 165, in __init__
self.transport = self.Transport(host, connect_timeout, ssl)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/amqp/connection.py", line 186, in Transport
return create_transport(host, connect_timeout, ssl)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/amqp/transport.py", line 299, in create_transport
return TCPTransport(host, connect_timeout)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-
packages/amqp/transport.py", line 95, in __init__
raise socket.error(last_err)
**OSError: [Errno 111] Connection refused**
My celery config file is as follows:
BROKER_TRANSPORT = 'redis'
BROKER_TRANSPORT_OPTIONS = {
'queue_name_prefix': 'dev-',
'wait_time_seconds': 10,
'polling_interval': 30,
# The polling interval decides the number of seconds to sleep
between unsuccessful polls
'visibility_timeout': 3600 * 5,
# If a task is not acknowledged within the visibility_timeout, the
task will be redelivered to another worker and executed.
}
CELERY_MESSAGES_DB = 6
BROKER_URL = "redis://%s:%s/%s" % (AWS_REDIS_ENDPOINT, AWS_REDIS_PORT,
CELERY_MESSAGES_DB)
What am i doing wrong here as the error log suggests that its not using the redis broker.

Looks like your python code doesn't recognize your configs since it is attempting to use RabbitMQ's ampq protocol instead of the configured broker.
I suggest the following
https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html
Your configs look similar to Django configs for Celery yet it doesn't seem you are using Celery with Django.
https://docs.celeryq.dev/en/latest/django/first-steps-with-django.html

The issue is using "BROKER_URL" instead of "CELERY_BROKER_URL" in settings.py. Celery wasn't finding the URL and was defaulting to the rabbitmq port instead of the redis port.

Celery upgrade (3.1->4.1) - Connection reset by peer

We are working with celery at the last year, with ~15 workers, each one defined with concurrency between 1-4.
Recently we upgraded our celery from v3.1 to v4.1
Now we are having the following errors in each one of the workers logs, any ideas what can cause to such error?
2017-08-21 18:33:19,780 94794 ERROR Control command error: error(104, 'Connection reset by peer') [file: pidbox.py, line: 46]
Traceback (most recent call last):
File "/srv/dy/venv/lib/python2.7/site-packages/celery/worker/pidbox.py", line 42, in on_message
self.node.handle_message(body, message)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 129, in handle_message
return self.dispatch(**body)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 112, in dispatch
ticket=ticket)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 135, in reply
serializer=self.mailbox.serializer)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 265, in _publish_reply
**opts
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
exchange_name, declare,
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
mandatory=mandatory, immediate=immediate,
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/channel.py", line 1748, in _basic_publish
(0, exchange, routing_key, mandatory, immediate), msg
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 64, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/method_framing.py", line 178, in write_frame
write(view[:offset])
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/transport.py", line 272, in write
self._write(s)
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer
BTW: our tasks in the form:
#app.task(name='EXAMPLE_TASK'],
bind=True,
base=ConnectionHolderTask)
def example_task(self, arg1, arg2, **kwargs):
# task code

We are also having massive issues with celery... I spend 20% of my time just dancing around weird idle-hang/crash issues with our workers sigh
We had a similar case that was caused by a high concurrency combined with a high worker_prefetch_multiplier, as it turns out fetching thousands of tasks is a good way to frack the connection.
If that's not the case: try to disable the broker pool by setting broker_pool_limit to None.
Just some quick ideas that might (hopefully) help :-)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

celery task raise an error while redis backend connection socket timeout - python

There's a warning in the docs of AsyncResult.get saying that calling it it within an async task can cause a deadlock, which may be what's happening here, though it's hard to tell without more context of where these things are being called.

Related

How to add retry for celery backend connection?

Error trying to connect Celery through SQS using STS

Timeout context manager error using sanic and telepot same time

Error 111 connection refused (Python, celery, redis)

Celery upgrade (3.1->4.1) - Connection reset by peer

Categories

Resources