How to add retry for celery backend connection? - python

I am using Celery 5.0.1 with CELERY_BACKEND_URL set to redis://:password@redisinstance1:6379/0. It works fine, but when the Redis instance briefly loses its connection, our tasks break with an error.
Exception: Error while reading from socket: (104, 'Connection reset by peer')
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 198, in _read_from_socket
data = recv(self._sock, socket_read_size)
File "/usr/local/lib/python3.7/dist-packages/redis/_compat.py", line 72, in recv
return sock.recv(*args, **kwargs)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/celery/app/trace.py", line 477, in trace_task
uuid, retval, task_request, publish_result,
File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 154, in mark_as_done
self.store_result(task_id, result, state, request=request)
File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 439, in store_result
request=request, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 855, in _store_result
current_meta = self._get_task_meta_for(task_id)
File "/usr/local/lib/python3.7/dist-packages/celery/backends/base.py", line 873, in _get_task_meta_for
meta = self.get(self.get_key_for_task(task_id))
File "/usr/local/lib/python3.7/dist-packages/celery/backends/redis.py", line 346, in get
return self.client.get(key)
File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 1606, in get
return self.execute_command('GET', name)
File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 901, in execute_command
return self.parse_response(conn, command_name, **options)
File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 915, in parse_response
response = connection.read_response()
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 739, in read_response
response = self._parser.read_response()
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 324, in read_response
raw = self._buffer.readline()
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 256, in readline
self._read_from_socket()
File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 223, in _read_from_socket
(ex.args,))
redis.exceptions.ConnectionError: Error while reading from socket: (104, 'Connection reset by peer')
Celery worker: None
Celery task id: 244b56af-7c96-56cf-a01a-9256cfd98ade
Celery retry attempt: 0
Task args: []
Task kwargs: {'address': 'ipadd', 'uid': 'uid', 'hexID': 'hexID', 'taskID': '244b56af-7c96-56cf-a01a-9256cfd98ade'}
When I run the task a second time it works fine; there was just a glitch in the connection for a short period of time.
Can I set something so that, when Celery tries to write the result to Redis and gets an error, it retries after 2-5 seconds?
I know how to set retries on the task itself, but this is not a task failure. My task runs fine and returns its data; Celery just loses the connection while writing the result to the backend.

To deal with connection timeouts you can have the following in your Celery configuration:
app.conf.broker_transport_options = {
    'retry_policy': {
        'timeout': 5.0
    }
}
app.conf.result_backend_transport_options = {
    'retry_policy': {
        'timeout': 5.0
    }
}
There are a few other Redis backend settings that you may want to consider adding to your configuration, such as redis_retry_on_timeout.
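As a hedged sketch, a fuller configuration along those lines could look like this; the retry_policy keys map onto kombu's retry_over_time() parameters, and the concrete values are illustrative assumptions rather than recommendations:

app.conf.result_backend_transport_options = {
    'retry_policy': {
        'timeout': 5.0,          # overall time budget for retrying a backend operation
        'max_retries': 10,       # assumption: give up after 10 attempts
        'interval_start': 0,     # first retry immediately...
        'interval_step': 0.5,    # ...then back off by 0.5s per retry...
        'interval_max': 3,       # ...capped at 3s between retries
    },
}
app.conf.redis_retry_on_timeout = True      # retry Redis commands that hit a socket timeout
app.conf.redis_socket_timeout = 120         # seconds (Celery's default)
app.conf.redis_socket_connect_timeout = 30  # fail connection attempts faster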

Related

Why is the connection between a python motor client and a mongoDB being dropped for about 50% of queries?

I'm seeing odd behavior in my application
I have a GraphQL Python application that lives in a Docker container and uses FastAPI and Uvicorn. It connects to a MongoDB instance in another Docker container using Motor. This all runs in Kubernetes.
About 50% of the time, but with no real pattern, I won't get results back. The other 50% of the time it works fine.
Based on the MongoDB logs it looks like the Python application is closing the connection before the response can be sent back. This is not a long-running query (< 10ms). Additionally, I've tried setting the various timeouts in Motor/PyMongo very high (~3600s) but still get the same behavior.
Client Initialization:
try:
    # kwargs contains username, password, connectTimeoutMS, socketTimeoutMS, serverSelectionTimeoutMS
    return AsyncIOMotorClient(host, port, appname=appname, connect=True, **kwargs)
except:
    # log it
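For context, a hedged sketch of how those kwargs might be assembled; the host and credentials below are placeholders, and only the option names come from the comment above (appname matches the appName seen in the MongoDB logs):

from motor.motor_asyncio import AsyncIOMotorClient

kwargs = {
    "username": "app_user",                # placeholder, not from the original post
    "password": "app_password",            # placeholder, not from the original post
    "connectTimeoutMS": 3600 * 1000,       # the question mentions trying ~3600s
    "socketTimeoutMS": 3600 * 1000,
    "serverSelectionTimeoutMS": 3600 * 1000,
}
client = AsyncIOMotorClient("mongo-host", 27017, appname="mappingapi", connect=True, **kwargs)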
The Query Code:
async def callQuery(vendor, name):
    match_query = {
        "$match": {
            "_id.product_vendor": vendor,
            "_id.product_name": name,
        }
    }
    add_fields = {
        "$addFields": {
            "product_name": "$_id.product_name",
            "product_vendor": "$_id.product_vendor",
        }
    }
    project = {"$project": {"_id": 0}}
    cursor = collection.aggregate([match_query, add_fields, project])
    products = await cursor.to_list(length=None)  # line 173 in product_lookup, to compare with the backtrace below

def submit_aggregation(pipeline: List[dict], collection, **kwargs):
    return collection.aggregate(pipeline, **kwargs)
MongoDB logs:
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"D3", "c":"NETWORK", "id":22934, "ctx":"conn5","msg":"Starting server-side compression negotiation"}
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"D3", "c":"NETWORK", "id":22935, "ctx":"conn5","msg":"Compression negotiation not requested by client"}
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"D3", "c":"FTDC", "id":23905, "ctx":"conn5","msg":"Using exhaust for isMaster or hello protocol"}
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn5","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","appName":"mappingapi","command":{"hello":1,"topologyVersion":{"processId":{"$oid":"619d4e56e9dc669ff6ade24b"},"counter":0},"maxAwaitTimeMS":10000,"$db":"admin","$readPreference":{"mode":"primary"}},"numYields":0,"reslen":313,"locks":{},"remote":"127.0.0.1:41430","protocol":"op_msg","durationMillis":0}}
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"D2", "c":"QUERY", "id":22783, "ctx":"conn5","msg":"Received interrupt request for unknown op","attr":{"opId":25787,"knownOps":[]}}
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"D3", "c":"-", "id":5127803, "ctx":"conn5","msg":"Released the Client","attr":{"client":"conn5"}}
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"D3", "c":"-", "id":5127801, "ctx":"conn5","msg":"Setting the Client","attr":{"client":"conn5"}}
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"D2", "c":"COMMAND", "id":21965, "ctx":"conn5","msg":"About to run the command","attr":{"db":"admin","commandArgs":{"hello":1,"topologyVersion":{"processId":{"$oid":"619d4e56e9dc669ff6ade24b"},"counter":0},"maxAwaitTimeMS":10000,"$db":"admin","$readPreference":{"mode":"primary"}}}}
{"t":{"$date":"2021-11-23T20:59:25.690+00:00"},"s":"D3", "c":"FTDC", "id":23904, "ctx":"conn5","msg":"Using maxAwaitTimeMS for awaitable hello protocol","attr":{"maxAwaitTimeMS":10000}}
Backtrace of the disconnect from the python client:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/promise/promise.py", line 844, in handle_future_result
resolve(future.result())
File "./mappingdb/api.py", line 247, in resolve_productdbProducts
client=get_pdb_client(),
File "/usr/local/lib/python3.7/site-packages/productdata/_utils.py", line 42, in func_future
return await func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/productdata/query.py", line 173, in product_lookup
products = await cursor.to_list(length=None)
File "/usr/local/lib/python3.7/site-packages/motor/core.py", line 1372, in _to_list
result = get_more_result.result()
File "/usr/local/lib/python3.7/site-packages/motor/core.py", line 1611, in _on_started
pymongo_cursor = future.result()
File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.7/site-packages/pymongo/collection.py", line 2507, in aggregate
**kwargs)
File "/usr/local/lib/python3.7/site-packages/pymongo/collection.py", line 2421, in _aggregate
retryable=not cmd._performs_write)
File "/usr/local/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1525, in _retryable_read
return func(session, server, sock_info, secondary_ok)
File "/usr/local/lib/python3.7/site-packages/pymongo/aggregation.py", line 149, in get_cursor
user_fields=self._user_fields)
File "/usr/local/lib/python3.7/site-packages/pymongo/pool.py", line 726, in command
self._raise_connection_failure(error)
File "/usr/local/lib/python3.7/site-packages/pymongo/pool.py", line 929, in _raise_connection_failure
_raise_connection_failure(self.address, error)
File "/usr/local/lib/python3.7/site-packages/pymongo/pool.py", line 238, in _raise_connection_failure
raise NetworkTimeout(msg)
graphql.error.located_error.GraphQLLocatedError: productdb:27017: timed out

Error trying to connect Celery through SQS using STS

I'm trying to use Celery with SQS as the broker. In order to use SQS from my container I need to assume a role, and for that I'm using STS. My code looks like this:
import os
import boto3
from celery import Celery

role_info = {
    'RoleArn': 'arn:aws:iam::xxxxxxx:role/my-role-execution',
    'RoleSessionName': 'roleExecution'
}

sts_client = boto3.client('sts', region_name='eu-central-1')
credentials = sts_client.assume_role(**role_info)

aws_access_key_id = credentials["Credentials"]['AccessKeyId']
aws_secret_access_key = credentials["Credentials"]['SecretAccessKey']
aws_session_token = credentials["Credentials"]["SessionToken"]

os.environ["AWS_ACCESS_KEY_ID"] = aws_access_key_id
os.environ["AWS_SECRET_ACCESS_KEY"] = aws_secret_access_key
os.environ["AWS_DEFAULT_REGION"] = 'eu-central-1'
os.environ["AWS_SESSION_TOKEN"] = aws_session_token

broker = "sqs://"
backend = 'redis://redis-service:6379/0'

celery = Celery('tasks', broker=broker, backend=backend)
celery.conf["task_default_queue"] = 'my-queue'
celery.conf["broker_transport_options"] = {
    'region': 'eu-central-1',
    'predefined_queues': {
        'my-queue': {
            'url': 'https://sqs.eu-central-1.amazonaws.com/xxxxxxx/my-queue'
        }
    }
}
In the same file I have the following task:
@celery.task(name='my-queue.my_task')
def my_task(content) -> int:
    print("hello")
    return 0
When I execute this code I get the following error:
[2020-09-24 10:38:03,602: CRITICAL/MainProcess] Unrecoverable error: ClientError('An error occurred (AccessDenied) when calling the ListQueues operation: Access to the resource https://eu-central-1.queue.amazonaws.com/ is denied.',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 921, in create_channel
return self._avail_channels.pop()
IndexError: pop from empty list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/celery/worker/worker.py", line 208, in start
self.blueprint.start(self)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 369, in start
return self.obj.start()
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/usr/local/lib/python3.6/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/connection.py", line 23, in start
c.connection = c.connect()
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 405, in connect
conn = self.connection_for_read(heartbeat=self.amqheartbeat)
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 412, in connection_for_read
self.app.connection_for_read(heartbeat=heartbeat))
File "/usr/local/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 439, in ensure_connected
callback=maybe_shutdown,
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 422, in ensure_connection
callback, timeout=timeout)
File "/usr/local/lib/python3.6/site-packages/kombu/utils/functional.py", line 341, in retry_over_time
return fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 275, in connect
return self.connection
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 823, in connection
self._connection = self._establish_connection()
File "/usr/local/lib/python3.6/site-packages/kombu/connection.py", line 778, in _establish_connection
conn = self.transport.establish_connection()
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 941, in establish_connection
self._avail_channels.append(self.create_channel(self))
File "/usr/local/lib/python3.6/site-packages/kombu/transport/virtual/base.py", line 923, in create_channel
channel = self.Channel(connection)
File "/usr/local/lib/python3.6/site-packages/kombu/transport/SQS.py", line 100, in __init__
self._update_queue_cache(self.queue_name_prefix)
File "/usr/local/lib/python3.6/site-packages/kombu/transport/SQS.py", line 105, in _update_queue_cache
resp = self.sqs.list_queues(QueueNamePrefix=queue_name_prefix)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 337, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 656, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListQueues operation: Access to the resource https://eu-central-1.queue.amazonaws.com/ is denied.
If I use boto3 directly, without Celery, I'm able to connect to the queue and retrieve data without this error. I don't know why Celery/Kombu try to list queues when I specify the predefined_queues configuration, which is supposed to avoid exactly this behavior (from the docs):
If you want Celery to use a set of predefined queues in AWS, and to never attempt to list SQS queues, nor attempt to create or delete them, pass a map of queue names to URLs using the predefined_queue_urls setting
Source here
Does anyone know what is happening? How should I modify my code to make it work? It seems that Celery is not using the credentials at all.
The versions I'm using:
celery==4.4.7
boto3==1.14.54
kombu==4.5.0
Thanks!
PS: I created an issue on GitHub to track whether this could be a library error or not...
I solved the problem by updating the dependencies to the latest versions:
celery==5.0.0
boto3==1.14.54
kombu==5.0.2
pycurl==7.43.0.6
I was able to get celery==4.4.7 and kombu==4.6.11 working by setting the following configuration option:
celery.conf["task_create_missing_queues"] = False

How to handle "Redis.exceptions.ConnectionError: Connection has data"

I receive the following output:
Traceback (most recent call last):
File "/home/ec2-user/env/lib64/python3.7/site-packages/redis/connection.py", line 1192, in get_connection
raise ConnectionError('Connection has data')
redis.exceptions.ConnectionError: Connection has data
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/env/lib64/python3.7/site-packages/eventlet/hubs/hub.py", line 457, in fire_timers
timer()
File "/home/ec2-user/env/lib64/python3.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
cb(*args, **kw)
File "/home/ec2-user/env/lib64/python3.7/site-packages/eventlet/greenthread.py", line 214, in main
result = function(*args, **kwargs)
File "crawler.py", line 53, in fetch_listing
url = dequeue_url()
File "/home/ec2-user/WebCrawler/helpers.py", line 109, in dequeue_url
return redis.spop("listing_url_queue")
File "/home/ec2-user/env/lib64/python3.7/site-packages/redis/client.py", line 2255, in spop
return self.execute_command('SPOP', name, *args)
File "/home/ec2-user/env/lib64/python3.7/site-packages/redis/client.py", line 875, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "/home/ec2-user/env/lib64/python3.7/site-packages/redis/connection.py", line 1197, in get_connection
raise ConnectionError('Connection not ready')
redis.exceptions.ConnectionError: Connection not ready
I couldn't find any issue related to this particular error. I emptied/flushed all Redis databases, so there should be no data there. I assume it has something to do with eventlet and monkey patching, but even when I put the following code right at the beginning of the file, the error still appears.
import eventlet
eventlet.monkey_patch()
What does this error mean?
Finally, I came up with the answer to my problem.
When connecting to Redis from Python, I had specified database number 0:
redis = redis.Redis(host="example.com", port=6379, db=0)
After changing the database to number 1, it worked:
redis = redis.Redis(host="example.com", port=6379, db=1)
Another way is to set protected-mode to no in /etc/redis/redis.conf. This is only recommended when running Redis locally.
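For reference, that directive in /etc/redis/redis.conf looks like this; disabling it removes Redis's unauthenticated-access safeguard, so only do it for a locally bound development instance:

# /etc/redis/redis.conf
protected-mode no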

Celery task raises an error when the Redis backend connection socket times out

I am using Celery to run tasks one by one with a Redis broker, but when I run two tasks, after the first one completes Redis raises a socket timeout error for the second task, so the second task fails.
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/result.py", line 194, in get
on_message=on_message,
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/async.py", line 189, in wait_for_pending
for _ in self._wait_for_pending(result, **kwargs):
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/async.py", line 256, in _wait_for_pending
on_interval=on_interval):
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/async.py", line 57, in drain_events_until
yield self.wait_for(p, wait, timeout=1)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/async.py", line 66, in wait_for
wait(timeout=timeout)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/celery/backends/redis.py", line 69, in drain_events
m = self._pubsub.get_message(timeout=timeout)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/client.py", line 2513, in get_message
response = self.parse_response(block=False, timeout=timeout)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/client.py", line 2430, in parse_response
return self._execute(connection, connection.read_response)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/client.py", line 2408, in _execute
return command(*args)
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/connection.py", line 624, in read_response
response = self._parser.read_response()
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/connection.py", line 284, in read_response
response = self._buffer.readline()
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/connection.py", line 216, in readline
self._read_from_socket()
File "/home/ubuntu/.virtualenvs/aide_venv/local/lib/python2.7/site-packages/redis/connection.py", line 187, in _read_from_socket
raise TimeoutError("Timeout reading from socket")
TimeoutError: Timeout reading from socket
I am running Celery using this command:
celery -A flask_application.celery worker --loglevel=info --max-tasks-per-child=1 --concurrency=1
I am calling the Celery task using the .delay() function:
celery_response = run_algo.run_pipeline.delay(request.get_json())
I am getting the output using the .get() function:
output_file_path = celery_response.get()
There's a warning in the docs for AsyncResult.get saying that calling it within a task can cause a deadlock, which may be what's happening here, though it's hard to tell without more context on where these calls are being made.
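If that .get() really is being called from inside another task, one hedged workaround is to chain a follow-up task instead of blocking on the result; handle_output here is a hypothetical task, not something from the original post:

from celery import chain

@celery.task
def handle_output(output_file_path):
    # hypothetical: do whatever the code currently does with the value returned by .get()
    ...

# Celery passes run_pipeline's return value to handle_output, so nothing blocks on .get():
chain(run_algo.run_pipeline.s(request.get_json()), handle_output.s()).apply_async()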

Error 111 connection refused (Python, celery, redis)

I tried to get all the active/scheduled/reserved tasks in redis:
from celery.task.control import inspect
inspect_obj = inspect()
inspect_obj.active()
inspect_obj.scheduled()
inspect_obj.reserved()
But I was greeted with a list of errors as follows.
My virtual environment ==> HubblerAPI.
I am using this from the EC2 console.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/celery/app/control.py", line 81, in active
return self._request('dump_active', safe=safe)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/celery/app/control.py", line 71, in _request
timeout=self.timeout, reply=True,
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/celery/app/control.py", line 316, in broadcast
limit, callback, channel=channel,
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/pidbox.py", line 283, in _broadcast
chan = channel or self.connection.default_channel
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/connection.py", line 771, in default_channel
self.connection
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/connection.py", line 756, in connection
self._connection = self._establish_connection()
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/connection.py", line 711, in _establish_connection
conn = self.transport.establish_connection()
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/transport/pyamqp.py", line 116, in establish_connection
conn = self.Connection(**opts)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/amqp/connection.py", line 165, in __init__
self.transport = self.Transport(host, connect_timeout, ssl)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/amqp/connection.py", line 186, in Transport
return create_transport(host, connect_timeout, ssl)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/amqp/transport.py", line 299, in create_transport
return TCPTransport(host, connect_timeout)
File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/amqp/transport.py", line 95, in __init__
raise socket.error(last_err)
OSError: [Errno 111] Connection refused
My celery config file is as follows:
BROKER_TRANSPORT = 'redis'
BROKER_TRANSPORT_OPTIONS = {
'queue_name_prefix': 'dev-',
'wait_time_seconds': 10,
'polling_interval': 30,
# The polling interval decides the number of seconds to sleep
between unsuccessful polls
'visibility_timeout': 3600 * 5,
# If a task is not acknowledged within the visibility_timeout, the
task will be redelivered to another worker and executed.
}
CELERY_MESSAGES_DB = 6
BROKER_URL = "redis://%s:%s/%s" % (AWS_REDIS_ENDPOINT, AWS_REDIS_PORT,
CELERY_MESSAGES_DB)
What am I doing wrong here? The error log suggests that it's not using the Redis broker.
Looks like your Python code doesn't recognize your configs, since it is attempting to use RabbitMQ's amqp protocol instead of the configured broker.
I suggest the following
https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html
Your configs look similar to Django configs for Celery yet it doesn't seem you are using Celery with Django.
https://docs.celeryq.dev/en/latest/django/first-steps-with-django.html
The issue was using "BROKER_URL" instead of "CELERY_BROKER_URL" in settings.py. Celery wasn't finding the URL and was defaulting to the RabbitMQ port instead of the Redis port.
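For reference, a minimal sketch of the namespaced Django-style setup that answer points to, assuming the config_from_object pattern from the Celery docs and reusing the variables from the config above (the project name is a placeholder):

# settings.py
CELERY_BROKER_URL = "redis://%s:%s/%s" % (AWS_REDIS_ENDPOINT, AWS_REDIS_PORT, CELERY_MESSAGES_DB)

# celery.py
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')  # placeholder project name
app = Celery('myproject')                                              # placeholder app name
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()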
