After my celery service has been running for 7-10 days, I receive this exception out of nowhere, and it causes my tasks to stop being processed. A restart of celery fixes the problem.
INTERNAL ERROR: RuntimeError('Acquire on closed pool',)
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 253, in trace_task
I, R, state, retval = on_error(task_request, exc, uuid)
File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 201, in on_error
R = I.handle_error_state(task, eager=eager)
File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 85, in handle_error_state
}[self.state](task, store_errors=store_errors)
File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 118, in handle_failure
req.id, exc, einfo.traceback, request=req,
File "/usr/lib/python2.7/dist-packages/celery/backends/base.py", line 121, in mark_as_failure
traceback=traceback, request=request)
File "/usr/lib/python2.7/dist-packages/celery/backends/amqp.py", line 124, in store_result
with self.app.amqp.producer_pool.acquire(block=True) as producer:
File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 868, in acquire
R = self.prepare(R)
File "/usr/lib/python2.7/dist-packages/kombu/pools.py", line 63, in prepare
conn = self._acquire_connection()
File "/usr/lib/python2.7/dist-packages/kombu/pools.py", line 38, in _acquire_connection
return self.connections.acquire(block=True)
File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 859, in acquire
raise RuntimeError('Acquire on closed pool')
RuntimeError: Acquire on closed pool
Software versions
software -> celery:3.1.20 (Cipater) kombu:3.0.35 py:2.7.6
billiard:3.3.0.22 py-amqp:1.4.9
platform -> system:Linux arch:64bit, ELF imp:CPython
loader -> celery.loaders.default.Loader
settings -> transport:amqp results:amqp
CELERY_ACCEPT_CONTENT: ['json', 'pickle', 'yaml']
CELERY_ENABLE_UTC: True
CELERY_IGNORE_RESULT: False
CELERY_IMPORTS:
('catalogue.app.voice.cluster.deploy_cluster',
'catalogue.app.common.install_uc',
'hypervisor.app.deploy_esx',
'hypervisor.app.vm_operations',
'tools.deploy_tools')
CELERYD_CHDIR: '/usr/local/src/imbue/application/app'
CELERY_TASK_RESULT_EXPIRES: 18000
CELERY_RESULT_PERSISTENT: True
CELERY_TIMEZONE: 'US/Eastern'
BROKER_URL: 'amqp://******:********@rabbitmq:5672//'
CELERY_RESULT_BACKEND: 'amqp'
The only workaround for now is to restart.
Ubuntu 14.04 2 GB RAM/2 CPU/40 GB HDD
This looks like a bug in celery. Asksol fixed this a few days back.
You can install celery from source and try it. If it still causes problems, please create a new issue on GitHub.
I am trying to use Azure Service Bus as the broker for my celery app.
I have patched together a solution by referring to various sources.
The goal is to use Azure Service Bus as the broker and PostgreSQL as the backend.
I created an Azure Service Bus namespace and copied the credentials for the RootManageSharedAccessKey policy into the celery app.
The following is the task.py:
from time import sleep
from celery import Celery
from kombu.utils.url import safequote

SAS_policy = safequote("RootManageSharedAccessKey")  # SAS Policy
SAS_key = safequote("1234222zUY28tRUtp+A2YoHmDYcABCD")  # Primary key from the previous SS
namespace = safequote("bluenode-dev")

app = Celery('tasks', backend='db+postgresql://afsan.gujarati:admin@localhost/local_dev',
             broker=f'azureservicebus://{SAS_policy}:{SAS_key}=@{namespace}')

@app.task
def divide(x, y):
    sleep(30)
    return x/y
When I try to run the Celery app using the following command:
celery -A tasks worker --loglevel=INFO
I get the following error:
[2020-10-09 14:00:32,035: CRITICAL/MainProcess] Unrecoverable error: AzureHttpError('Unauthorized\n<Error><Code>401</Code><Detail>claim is empty or token is invalid. TrackingId:295f7c76-770e-40cc-8489-e0eb56248b09_G5S1, SystemTracker:bluenode-dev.servicebus.windows.net:$Resources/Queues, Timestamp:2020-10-09T20:00:31</Detail></Error>')
Traceback (most recent call last):
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/transport/virtual/base.py", line 918, in create_channel
return self._avail_channels.pop()
IndexError: pop from empty list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/azure/servicebus/control_client/servicebusservice.py", line 1225, in _perform_request
resp = self._filter(request)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/azure/servicebus/control_client/_http/httpclient.py", line 211, in perform_request
raise HTTPError(status, message, respheaders, respbody)
azure.servicebus.control_client._http.HTTPError: Unauthorized
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 311, in start
blueprint.start(self)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/worker/consumer/connection.py", line 21, in start
c.connection = c.connect()
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 398, in connect
conn = self.connection_for_read(heartbeat=self.amqheartbeat)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 404, in connection_for_read
return self.ensure_connected(
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/celery/worker/consumer/consumer.py", line 430, in ensure_connected
conn = conn.ensure_connection(
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/connection.py", line 383, in ensure_connection
self._ensure_connection(*args, **kwargs)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/connection.py", line 435, in _ensure_connection
return retry_over_time(
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/utils/functional.py", line 325, in retry_over_time
return fun(*args, **kwargs)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/connection.py", line 866, in _connection_factory
self._connection = self._establish_connection()
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/connection.py", line 801, in _establish_connection
conn = self.transport.establish_connection()
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/transport/virtual/base.py", line 938, in establish_connection
self._avail_channels.append(self.create_channel(self))
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/transport/virtual/base.py", line 920, in create_channel
channel = self.Channel(connection)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/kombu/transport/azureservicebus.py", line 64, in __init__
for queue in self.queue_service.list_queues():
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/azure/servicebus/control_client/servicebusservice.py", line 313, in list_queues
response = self._perform_request(request)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/azure/servicebus/control_client/servicebusservice.py", line 1227, in _perform_request
return _service_bus_error_handler(ex)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/azure/servicebus/control_client/_serialization.py", line 569, in _service_bus_error_handler
return _general_error_handler(http_error)
File "/Users/afsan.gujarati/.pyenv/versions/3.8.1/envs/celery-servicebus/lib/python3.8/site-packages/azure/servicebus/control_client/_common_error.py", line 41, in _general_error_handler
raise AzureHttpError(message, http_error.status)
azure.common.AzureHttpError: Unauthorized
<Error><Code>401</Code><Detail>claim is empty or token is invalid. TrackingId:295f7c76-770e-40cc-8489-e0eb56248b09_G5S1, SystemTracker:bluenode-dev.servicebus.windows.net:$Resources/Queues, Timestamp:2020-10-09T20:00:31</Detail></Error>
I don't see a straightforward solution for this anywhere. What am I missing?
P.S. I did not create the queue in Azure Service Bus. I am assuming that celery would create the queue by itself when the celery app is executed.
P.P.S. I also tried using the exact same credentials with Python's Service Bus client, and they seemed to work. It feels like a Celery issue, but I am not able to figure out exactly what.
If you want to use the Azure Service Bus transport to connect to Azure Service Bus, the URL should be azureservicebus://{SAS policy name}:{SAS key}@{Service Bus Namespace}.
For example, use the RootManageSharedAccessKey entry under Shared access policies (shown in a screenshot in the original answer).
Code:
from celery import Celery
from kombu.utils.url import safequote

SAS_policy = "RootManageSharedAccessKey"  # SAS Policy
# Primary key from the previous SS
SAS_key = safequote("X/*****qyY=")
namespace = "bowman1012"

app = Celery('tasks', backend='db+postgresql://<>@localhost/<>',
             broker=f'azureservicebus://{SAS_policy}:{SAS_key}@{namespace}')

@app.task
def add(x, y):
    return x + y
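Once a worker is running against that configuration, a quick smoke test looks roughly like the sketch below; the module name tasks and the 30-second timeout are assumptions based on the snippet above, not part of the original answer.

# Hypothetical smoke test, run from a separate shell or REPL while the worker is up
from tasks import add

result = add.delay(4, 4)        # publishes the task message to the Service Bus queue
print(result.get(timeout=30))   # reads the result back from the PostgreSQL backend -> 8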
We have been working with celery for the last year, with ~15 workers, each one defined with a concurrency between 1 and 4.
Recently we upgraded our celery from v3.1 to v4.1.
Now we are seeing the following error in each of the workers' logs. Any ideas what can cause such an error?
2017-08-21 18:33:19,780 94794 ERROR Control command error: error(104, 'Connection reset by peer') [file: pidbox.py, line: 46]
Traceback (most recent call last):
File "/srv/dy/venv/lib/python2.7/site-packages/celery/worker/pidbox.py", line 42, in on_message
self.node.handle_message(body, message)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 129, in handle_message
return self.dispatch(**body)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 112, in dispatch
ticket=ticket)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 135, in reply
serializer=self.mailbox.serializer)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 265, in _publish_reply
**opts
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
exchange_name, declare,
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
mandatory=mandatory, immediate=immediate,
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/channel.py", line 1748, in _basic_publish
(0, exchange, routing_key, mandatory, immediate), msg
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 64, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/method_framing.py", line 178, in write_frame
write(view[:offset])
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/transport.py", line 272, in write
self._write(s)
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer
BTW: our tasks are in the form:
@app.task(name='EXAMPLE_TASK',
          bind=True,
          base=ConnectionHolderTask)
def example_task(self, arg1, arg2, **kwargs):
    # task code
We are also having massive issues with celery... I spend 20% of my time just dancing around weird idle-hang/crash issues with our workers, sigh.
We had a similar case that was caused by high concurrency combined with a high worker_prefetch_multiplier; as it turns out, fetching thousands of tasks is a good way to frack the connection.
If that's not the case: try to disable the broker pool by setting broker_pool_limit to None (see the sketch below).
Just some quick ideas that might (hopefully) help :-)
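A minimal sketch of those two settings, assuming a Celery 4.x app object and the lowercase setting names; the broker URL is a placeholder and the values are starting points rather than recommendations.

# Hypothetical tuning sketch for Celery 4.x
from celery import Celery

app = Celery('tasks', broker='amqp://guest:guest@localhost:5672//')
app.conf.worker_prefetch_multiplier = 1   # prefetch at most one task per worker process
app.conf.broker_pool_limit = None         # disable the broker connection pool entirely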
I am a newbie with Toil and AWS, trying to run the HelloWorld.py example from the Toil documentation. I have already successfully installed toil and the related python packages on my local Mac laptop and have set up my account at AWS. I have created a small leader/worker cluster
$ cgcloud create-cluster toil -s 2 -t m3.large
and started it:
$ cgcloud ssh toil-leader
This changed my screen prompt to:
mesosbox@ip-172-31-25-135:~$
Then, from another window on my Mac, I started the Toil HelloWorld example with the command:
$ python2.7 HelloWorld.py --batchSystem=mesos --mesosMaster=mesos-master:5050 aws:us-west-2:my-aws-jobstore
And I got the following output:
Apples-Air 2017-06-02 19:30:53,524 MainThread INFO toil.lib.bioio: Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
Apples-Air 2017-06-02 19:30:53,524 MainThread INFO toil.lib.bioio: Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
Apples-Air 2017-06-02 19:30:54,852 MainThread WARNING toil.jobStores.aws.jobStore: Exception during panic
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 209, in initialize
self.destroy()
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 1334, in destroy
self._bind(create=False, block=False)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 241, in _bind
versioning=True)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 721, in _bindBucket
bucket = self.s3.get_bucket(bucket_name, validate=True)
File "/usr/local/lib/python2.7/site-packages/boto/s3/connection.py", line 502, in get_bucket
return self.head_bucket(bucket_name, headers=headers)
File "/usr/local/lib/python2.7/site-packages/boto/s3/connection.py", line 535, in head_bucket
raise err
S3ResponseError: S3ResponseError: 403 Forbidden
Traceback (most recent call last):
File "helloWorld.py", line 22, in <module>
print(Job.Runner.startToil(j, options)) #Prints Hello, world!, ….
File "/usr/local/lib/python2.7/site-packages/toil/job.py", line 740, in startToil
with Toil(options) as toil:
File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 614, in __enter__
jobStore.initialize(config)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 209, in initialize
self.destroy()
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 206, in initialize
self._bind(create=True)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 241, in _bind
versioning=True)
File "/usr/local/lib/python2.7/site-packages/toil/jobStores/aws/jobStore.py", line 721, in _bindBucket
bucket = self.s3.get_bucket(bucket_name, validate=True)
File "/usr/local/lib/python2.7/site-packages/boto/s3/connection.py", line 502, in get_bucket
return self.head_bucket(bucket_name, headers=headers)
File "/usr/local/lib/python2.7/site-packages/boto/s3/connection.py", line 535, in head_bucket
raise err
boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
Please help.
Thanks.
---John
I realize that this answer is a little late. One problem I notice is with the mesosMaster argument.
Instead, your command should have looked like:
python2.7 HelloWorld.py --batchSystem=mesos --mesosMaster=172.31.25.135:5050 aws:us-west-2:my-aws-jobstore
Notice that I replaced mesos-master with the actual IP address from
mesosbox@ip-172-31-25-135:~$
Hopefully, in the future one will not need to pass this argument at all; however, this is not yet implemented as of 26 July 2017.
Also, for further problems with Toil you will probably have better luck posting a new issue on the Toil GitHub page.
I'm following the instructions at https://github.com/jorilallo/celery-flower-heroku to deploy the Flower celery monitoring app to Heroku.
After configuring and deploying my app I see the following in the heroku logs:
Traceback (most recent call last):
File "/app/.heroku/python/bin/flower", line 9, in <module>
load_entry_point('flower==0.7.0', 'console_scripts', 'flower')()
File "/app/.heroku/python/lib/python2.7/site-packages/flower/__main__.py", line 11, in main
flower.execute_from_commandline()
File "/app/.heroku/python/lib/python2.7/site-packages/celery/bin/base.py", line 306, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
File "/app/.heroku/python/lib/python2.7/site-packages/flower/command.py", line 99, in handle_argv
return self.run_from_argv(prog_name, argv)
File "/app/.heroku/python/lib/python2.7/site-packages/flower/command.py", line 75, in run_from_argv
**app_settings)
File "/app/.heroku/python/lib/python2.7/site-packages/flower/app.py", line 40, in __init__
max_tasks_in_memory=max_tasks)
File "/app/.heroku/python/lib/python2.7/site-packages/flower/events.py", line 60, in __init__
state = shelve.open(self._db)
File "/app/.heroku/python/lib/python2.7/shelve.py", line 239, in open
return DbfilenameShelf(filename, flag, protocol, writeback)
File "/app/.heroku/python/lib/python2.7/shelve.py", line 223, in __init__
Shelf.__init__(self, anydbm.open(filename, flag), protocol, writeback)
File "/app/.heroku/python/lib/python2.7/anydbm.py", line 85, in open
return mod.open(file, flag, mode)
File "/app/.heroku/python/lib/python2.7/dumbdbm.py", line 250, in open
return _Database(file, mode)
File "/app/.heroku/python/lib/python2.7/dumbdbm.py", line 71, in __init__
f = _open(self._datfile, 'w')
IOError: [Errno 2] No such file or directory: 'postgres://USERNAME:PASSWORD@ec2-HOST.compute-1.amazonaws.com:5432/DBNAME.dat'
Notice the .dat suffix there? I have no idea where it comes from; it's not present in my DATABASE_URL env variable.
Furthermore, the error above is with flower 0.7. I also tried installing 0.6, with which I do get further (namely, the DB is correctly recognized and the connection established), but I then get the following warnings once flower starts:
2014-06-19T15:14:02.464424+00:00 app[web.1]: [E 140619 15:14:02 state:138] Failed to inspect workers: '[Errno 104] Connection reset by peer', trying again in 128 seconds
2014-06-19T15:14:02.464844+00:00 app[web.1]: [E 140619 15:14:02 events:103] Failed to capture events: '[Errno 104] Connection reset by peer', trying again in 128 seconds.
Loading flower in my browser does show a few tabs of stuff, but there is no data.
How do I resolve these issues?
Flower doesn't support database persistence. It saves its state to file(s) using the shelve module.
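That also explains the mysterious .dat suffix: shelve treats the configured value as a filename, and the dumbdbm backend in the traceback appends .dat to it. A rough illustration of what the persistence amounts to follows; the 'flower' filename is just an example, not something Flower mandates.

# Sketch of the file-based persistence flower uses internally (shelve, not a SQL database)
import shelve

state = shelve.open('flower')    # creates files such as flower.dat / flower.dir, depending on the dbm backend
state['last_event'] = 'example'  # arbitrary demo value
state.close()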
I recently upgraded to Celery 3.0.1 from 2.3.0 and all the tasks run fine. Unfortunately, I'm getting a "Framing Error" exception pretty frequently. I'm also running supervisor to restart the threads, but since these are never really killed, supervisor has no way of knowing that celery needs to be restarted. Has anyone seen this before?
2012-07-13 18:53:59,004: ERROR/MainProcess] Unrecoverable error: Exception('Framing Error, received 0x00 while expecting 0xce',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/worker/__init__.py", line 350, in start
component.start()
File "/usr/local/lib/python2.7/dist-packages/celery/worker/consumer.py", line 360, in start
self.consume_messages()
File "/usr/local/lib/python2.7/dist-packages/celery/worker/consumer.py", line 445, in consume_messages
drain_nowait()
File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 175, in drain_nowait
self.drain_events(timeout=0)
File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 171, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/amqplib.py", line 262, in drain_events
return connection.drain_events(**kwargs)
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/amqplib.py", line 97, in drain_events
chanmap, None, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/amqplib.py", line 155, in _wait_multiple
channel, method_sig, args, content = read_timeout(timeout)
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/amqplib.py", line 129, in read_timeout
return self.method_reader.read_method()
File "/usr/local/lib/python2.7/dist-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
raise m
Exception: Framing Error, received 0x00 while expecting 0xce
While I am not sure why this actually happens, switching from amqplib to librabbitmq helped me overcome this trouble.
I haven't changed anything in the configuration, just:
pip uninstall amqplib
pip install librabbitmq
and restarted the celery workers.
I got this idea from https://github.com/celery/celery/issues/922
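If you prefer to pin the transport explicitly rather than rely on auto-detection, a hedged sketch of the broker setting is below; it assumes kombu's librabbitmq transport alias and uses placeholder credentials.

# Hypothetical config sketch: select the librabbitmq transport explicitly
BROKER_URL = 'librabbitmq://guest:guest@localhost:5672//'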