Celery shutting down worker from task_success handler not working - python

I'm trying to make a worker run only one task at a time, then shutdown. I've got the shutdown part working correctly (some background here: celery trying shutdown worker by raising SystemExit in task_postrun signal but always hangs and the main process never exits), but when it shuts down, I'm getting an error:
[2013-02-13 12:19:05,689: CRITICAL/MainProcess] Couldn't ack 1, reason:AttributeError("'NoneType' object has no attribute 'method_writer'",)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/kombu/transport/base.py", line 104, in ack_log_error
self.ack()
File "/usr/local/lib/python2.7/site-packages/kombu/transport/base.py", line 99, in ack
self.channel.basic_ack(self.delivery_tag)
File "/usr/local/lib/python2.7/site-packages/amqplib/client_0_8/channel.py", line 1742, in basic_ack
self._send_method((60, 80), args)
File "/usr/local/lib/python2.7/site-packages/amqplib/client_0_8/abstract_channel.py", line 75, in _send_method
self.connection.method_writer.write_method(self.channel_id,
AttributeError: 'NoneType' object has no attribute 'method_writer'
Why is this happening? Not only does it not ack, but it also purges all of the other tasks that are left in the queue (big problem).
How do I fix this?
UPDATE
Below is the stack trace with everything updated (pip install -U kombu amqp amqplib celery):
[2013-02-13 11:58:05,357: CRITICAL/MainProcess] Internal error: AttributeError("'NoneType' object has no attribute 'method_writer'",)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/worker/__init__.py", line 372, in process_task
req.execute_using_pool(self.pool)
File "/usr/local/lib/python2.7/dist-packages/celery/worker/job.py", line 219, in execute_using_pool
timeout=task.time_limit)
File "/usr/local/lib/python2.7/dist-packages/celery/concurrency/base.py", line 137, in apply_async
**options)
File "/usr/local/lib/python2.7/dist-packages/celery/concurrency/base.py", line 27, in apply_target
callback(target(*args, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/celery/worker/job.py", line 333, in on_success
self.acknowledge()
File "/usr/local/lib/python2.7/dist-packages/celery/worker/job.py", line 439, in acknowledge
self.on_ack(logger, self.connection_errors)
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/base.py", line 98, in ack_log_error
self.ack()
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/base.py", line 93, in ack
self.channel.basic_ack(self.delivery_tag)
File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 1562, in basic_ack
self._send_method((60, 80), args)
File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 57, in _send_method
self.connection.method_writer.write_method(
AttributeError: 'NoneType' object has no attribute 'method_writer'

Exiting in task_postrun is not recommended as task_postrun is executed outside of the "task body" error handling.
Exactly what happens when a task calls sys.exit is not well defined,
and actually it depends on the pool being used.
With multiprocessing the child process will simply be replaced by a new one.
In other pools the worker will shutdown, but this is something that is likely to change
so that it's consistent with multiprocessing behavior.
Calling exit outside of the task body is regarded as an internal error (crash).
The "task body" is whatever executes at task.__call__()
I think maybe a better solution for this would be to use a custom execution
strategy:
from celery.worker import strategy
from functools import wraps
#staticmethod
def shutdown_after_strategy(task, app, consumer):
default_handler = strategy.default(task, app, consumer)
def _shutdown_to_exit_after(fun):
#wraps(fun)
def _inner(*args, **kwargs):
try:
return fun(*args, **kwargs)
finally:
raise SystemExit()
return _inner
return _decorate_to_exit_after(default_handler)
#celery.task(Strategy=shutdown_after_strategy)
def shutdown_after():
print('will shutdown after this')
This isn't exactly beautiful, but the execution strategy is there to optimize
task execution and not to be easily extendable (the worker "precompiles" the execution
path for each task type by caching Task.Strategy)
In Celery 3.1 you can extend the worker and consumer using "bootsteps", so likely
there will be a pretty solution then.

Related

Stopping cocotb forked coroutine

I have a coroutine that waiting for a signal rising :
#cocotb.coroutine
def wait_for_rise(self):
yield RisingEdge(self.dut.mysignal)
I'm launching it in my «main» test function like it :
mythread = cocotb.fork(wait_for_rise())
I want to stop it after a while even if no signal rise happen. I tryed to «kill» it:
mythread.kill()
But exception happen :
Send raised exception: 'RunningCoroutine' object has no attribute '_join'
File "/opt/cocotb/cocotb/decorators.py", line 121, in send
return self._coro.send(value)
File "/myproject.py", line 206, in i2c_read
wTXDRwthread.kill()
File "/opt/cocotb/cocotb/decorators.py", line 151, in kill
cocotb.scheduler.unschedule(self)
File "/opt/cocotb/cocotb/scheduler.py", line 453, in unschedule
if coro._join in self._trigger2coros:
Is there a solution to stop forked coroutine properly ?
This very much looks like it is the same problem as in https://github.com/potentialventures/cocotb/issues/650 - you can subscribe to the issue to be notified when its status changes.

Celery upgrade (3.1->4.1) - Connection reset by peer

We are working with celery at the last year, with ~15 workers, each one defined with concurrency between 1-4.
Recently we upgraded our celery from v3.1 to v4.1
Now we are having the following errors in each one of the workers logs, any ideas what can cause to such error?
2017-08-21 18:33:19,780 94794 ERROR Control command error: error(104, 'Connection reset by peer') [file: pidbox.py, line: 46]
Traceback (most recent call last):
File "/srv/dy/venv/lib/python2.7/site-packages/celery/worker/pidbox.py", line 42, in on_message
self.node.handle_message(body, message)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 129, in handle_message
return self.dispatch(**body)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 112, in dispatch
ticket=ticket)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 135, in reply
serializer=self.mailbox.serializer)
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 265, in _publish_reply
**opts
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
exchange_name, declare,
File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
mandatory=mandatory, immediate=immediate,
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/channel.py", line 1748, in _basic_publish
(0, exchange, routing_key, mandatory, immediate), msg
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 64, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/method_framing.py", line 178, in write_frame
write(view[:offset])
File "/srv/dy/venv/lib/python2.7/site-packages/amqp/transport.py", line 272, in write
self._write(s)
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer
BTW: our tasks in the form:
#app.task(name='EXAMPLE_TASK'],
bind=True,
base=ConnectionHolderTask)
def example_task(self, arg1, arg2, **kwargs):
# task code
We are also having massive issues with celery... I spend 20% of my time just dancing around weird idle-hang/crash issues with our workers sigh
We had a similar case that was caused by a high concurrency combined with a high worker_prefetch_multiplier, as it turns out fetching thousands of tasks is a good way to frack the connection.
If that's not the case: try to disable the broker pool by setting broker_pool_limit to None.
Just some quick ideas that might (hopefully) help :-)

Python : _MainThread' object has no attribute '_state'

Hey Guys I am creating an application which takes in a request from the user. The main class in the server side is the Controller . I spawn a thread during init, which keeps actively listening for requests from the client ( I need to spawn a thread here. )
Once I get an request, I look into the type of request and call a function to handle it.
In that function, I want to create multiple processes to utilise my 8 cores effectively.
Here is the code:-
class Controller(app_manager.RyuApp):
OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]
def __init__(self, *args, **kwargs):
self.datapaths={}
self.monitor_thread=hub.spawn(self._monitor)
super(Controller, self).__init__( *args, **kwargs)
def _monitor(self):
global connstream
while True:
#Get Connection from client
data = connstream.read(15000)
data=eval(data)
print "Recieved a request from the client:-",data
for key,value in data.iteritems():
type=int(key)
request=value
if type==4:
self.get_route(type,request,connstream)
def get_route(self,type,request,connection):
global get_route_result
cities=request['Cities']
number_of_cities=request['Number_of_Cities']
city_count=0
processes=[]
pool = mp.Pool(processes=8)
for city,destination_ip in cities.iteritems():
args=(type,destination_ip)
processes.append(args)
city_count=city_count+1
if city_count==number_of_cities:
break
pool.map(self.get_route_process,processes)
def get_route_process(self,HOST,destination):
#Do Something
But the error I get is:-
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 325, in _handle_workers
while thread._state == RUN or (pool._cache and thread._state != TERMINATE):
AttributeError: '_MainThread' object has no attribute '_state'
So in a nutshell, I create a thread, which tries to create multiple processes, but the code fails.

Django-Celery worker hangs when SQL Server Database crashes using FreeTDS driver

I have a celery worker, running on a Ubuntu 14.04 server, that is reading and writing to an MSSQL Server database using pyodbc and the freetds driver. When the SQL Server goes down, the function fails as expected and the celery worker starts trying to clean up and get ready for the next task. At this time, the worker calls django's "connection.close()" method. This method appears to send a command to rollback any incomplete transactions. Since the server is down this throws an exception that is not caught by the celery worker. The worker then hangs and neither releases the task or move on to the next task.
I tried overriding the on_failure and after_return methods for the function and calling connection.close() there (as specified in other answers), but that didn't work. I suspect it is because the when I call connection.close() it has the same issue, and just bubbles the exception up, or because celery's cleanup code gets run before those two methods get called.
Any ideas on how to either catch this exception before it gets to celery, or avoid it all together?
Below is the stack trace of the exception:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 283, in trace_task
state, retval, uuid, args, kwargs, None,
File "/var/www/cortex/corespring/tasks.py", line 13, in after_return
connections['xxx'].close()
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/init.py", line 317, in close
self.connection.close()
File "/usr/local/lib/python2.7/dist-packages/pyodbc.py", line 2642, in close
self.rollback()
File "/usr/local/lib/python2.7/dist-packages/pyodbc.py", line 2565, in rollback
check_success(self, ret)
File "/usr/local/lib/python2.7/dist-packages/pyodbc.py", line 987, in check_success
ctrl_err(SQL_HANDLE_DBC, ODBC_obj.dbc_h, ret, ODBC_obj.ansi)
File "/usr/local/lib/python2.7/dist-packages/pyodbc.py", line 965, in ctrl_err
raise DatabaseError(state,err_text)
DatabaseError: (u'08S01', u'[08S01] [FreeTDS][SQL Server]Write to the server failed')

Circus/ZeroMQ "socket in use" error

I'm running a Flask app and internally using a library written in Node.js, which I access through ZeroRPC (the actual node process is managed by Circus). This works fine on its own; I can unit test with no issues. But when starting the Flask app as a listening process, and calling into a REST api which calls this libary, the program throws an exception when trying to start the process. The code to start the service is as follows:
from circus.watcher import Watcher
from circus.arbiter import ThreadedArbiter
from circus.util import (DEFAULT_ENDPOINT_DEALER, DEFAULT_ENDPOINT_SUB,
DEFAULT_ENDPOINT_MULTICAST)
class Node(object):
{... omitted code that initializes self._arbiter and self._client ...}
def start(self):
if self._arbiter and self._client:
return
port = 'ipc:///tmp/inlinejs_%s' % os.getpid()
args = 'lib/server.js --port %s' % port
watcher = Watcher('node', '/usr/local/bin/node', args,
working_dir=INLINEJS_DIR)
self._arbiter = ThreadedArbiter([watcher], DEFAULT_ENDPOINT_DEALER,
DEFAULT_ENDPOINT_SUB, multicast_endpoint=DEFAULT_ENDPOINT_MULTICAST)
self._arbiter.start()
self._client = zerorpc.Client()
self._client.connect(port)
This function returns, but shortly afterwards in a separate thread, I get this error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/python/lib/python2.7/site-packages/circus/_patch.py", line 21, in _bootstrap_inner
self.run()
File "/python/lib/python2.7/site-packages/circus/arbiter.py", line 647, in run
return Arbiter.start(self)
File "/python/lib/python2.7/site-packages/circus/util.py", line 319, in _log
return func(self, *args, **kw)
File "/python/lib/python2.7/site-packages/circus/arbiter.py", line 456, in start
self.initialize()
File "/python/lib/python2.7/site-packages/circus/util.py", line 319, in _log
return func(self, *args, **kw)
File "/python/lib/python2.7/site-packages/circus/arbiter.py", line 427, in initialize
self.evpub_socket.bind(self.pubsub_endpoint)
File "socket.pyx", line 432, in zmq.core.socket.Socket.bind (zmq/core/socket.c:4022)
File "checkrc.pxd", line 21, in zmq.core.checkrc._check_rc (zmq/core/socket.c:5838)
ZMQError: Address already in use
I have no idea why this is happening, especially since it doesn't happen in unit tests. Can anyone shed any light?
In general, when you get this type of "Address in use" error, it means that your program is trying to bind on an IP port number but something else got there first.
I am not familiar with this library, but since the error is caused by "evpub_socket.bind", I am going to guess that you have a conflict with the port number specified by the constant DEFAULT_ENDPOINT_SUB. From the circus source code I see these constants:
DEFAULT_ENDPOINT_DEALER = "tcp://127.0.0.1:5555"
DEFAULT_ENDPOINT_SUB = "tcp://127.0.0.1:5556"
DEFAULT_ENDPOINT_STATS = "tcp://127.0.0.1:5557"
Check your system (netstat) and see if any process is listening on ports 5555, 5556, 5557. Or perhaps you are running this program twice and you forgot about the first one.

Categories

Resources