Celery Beat process crashes after nslookup failure - python

I'm using Celery 3.1.19 with scheduled tasks. I start the process like so:
celery beat --app=my_app.celery.app:app --pidfile=/usr/local/celerybeat.pid --schedule=/usr/local/celerybeat-schedule -l INFO
I've had a couple of occurrences where the celery process terminates after an nslookup failure. This causes future scheduled tasks to not run. Eventually I notice and restart celery beat.
As far as I can tell, the hostname it's trying to look up is my RabbitMQ host. The nslookup failures are temporary; the hostname is correct and evidently there was a blip in name resolution. Ideally that would not crash the process, and instead it would retry until the hostname lookup succeeded.
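(For reference, and not something this thread confirms helps: Celery 3.x has broker connection retry settings, though it's not clear to me whether they cover the code path beat hits in the trace below.)
# Hedged sketch: these are real Celery 3.x settings, but whether they apply to
# beat's publish path (rather than just worker startup) is an assumption.
BROKER_CONNECTION_RETRY = True        # retry if the broker connection drops
BROKER_CONNECTION_MAX_RETRIES = None  # None/0 means retry forever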
Questions:
Is this expected behavior?
Is there a common way to ensure that the scheduler keeps running?
Do people have a system to watch the process and restart if it crashes?
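For the last question, one option I'm considering is a small Python watchdog around the same command, though systemd/supervisord would be the more usual answer. A minimal, untested sketch (paths match the command above):
# Untested sketch: restart celery beat whenever it exits.
import subprocess
import time

CMD = [
    "celery", "beat",
    "--app=my_app.celery.app:app",
    "--pidfile=/usr/local/celerybeat.pid",
    "--schedule=/usr/local/celerybeat-schedule",
    "-l", "INFO",
]

while True:
    code = subprocess.call(CMD)  # blocks until beat exits
    print("celery beat exited with code %s, restarting in 5s" % code)
    time.sleep(5)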
Stack trace:
Message Error: Couldn't apply scheduled task ping: Error opening socket: hostname lookup failed
File "/usr/local/bin/celery", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/celery/__main__.py", line 30, in main
main()
File "/usr/local/lib/python2.7/dist-packages/celery/bin/celery.py", line 81, in main
cmd.execute_from_commandline(argv)
File "/usr/local/lib/python2.7/dist-packages/celery/bin/celery.py", line 770, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "/usr/local/lib/python2.7/dist-packages/celery/bin/base.py", line 311, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
File "/usr/local/lib/python2.7/dist-packages/celery/bin/celery.py", line 762, in handle_argv
return self.execute(command, argv)
File "/usr/local/lib/python2.7/dist-packages/celery/bin/celery.py", line 694, in execute
).run_from_argv(self.prog_name, argv[1:], command=argv[0])
File "/usr/local/lib/python2.7/dist-packages/celery/bin/base.py", line 315, in run_from_argv
sys.argv if argv is None else argv, command)
File "/usr/local/lib/python2.7/dist-packages/celery/bin/base.py", line 377, in handle_argv
return self(*args, **options)
File "/usr/local/lib/python2.7/dist-packages/celery/bin/base.py", line 274, in __call__
ret = self.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery/bin/beat.py", line 79, in run
return beat().run()
File "/usr/local/lib/python2.7/dist-packages/celery/apps/beat.py", line 83, in run
self.start_scheduler()
File "/usr/local/lib/python2.7/dist-packages/celery/apps/beat.py", line 112, in start_scheduler
beat.start()
File "/usr/local/lib/python2.7/dist-packages/celery/beat.py", line 473, in start
File "/usr/local/lib/python2.7/dist-packages/celery/beat.py", line 221, in tick

Related

asyncio problematic with apscheduler even as new process

I try to connect to IB
ib.connect(host,port , clientId=3, readonly=readonly)
This internally uses asyncio.
When I am doing it from a python file ib_test, it works well.
But when I try to do
from multiprocessing import Process
p=Process(target=main)
p.start()
p.join()
(main is the main function of the same file, the only thing that gets called), it doesn't work. It gets a timeout. What is strange is that in Wireshark it looks like the server doesn't send the next packet, but maybe the client just doesn't receive it (although I'd expect it to appear anyway).
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\xxxx\tttt\ibtest.py", line 56, in main
mediator=IBMediator()
File "C:\Users\xxxx\tttt\ibtest.py", line 22, in __init__
self._ibsource : IBSource=IBSource(host='127.0.0.1',port=PORT,clientId=IBMediator.clientId)
File "c:\users\xxxx\zzzz\ibsource.py", line 22, in __init__
self.ib.connect(host,port , clientId=3, readonly=readonly)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\site-packages\ib_insync-0.9.70-py3.9.egg\ib_insync\ib.py", line 269, in connect
return self._run(self.connectAsync(
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\site-packages\ib_insync-0.9.70-py3.9.egg\ib_insync\ib.py", line 308, in _run
return util.run(*awaitables, timeout=self.RequestTimeout)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\site-packages\ib_insync-0.9.70-py3.9.egg\ib_insync\util.py", line 332, in run
result = loop.run_until_complete(task)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 642, in run_until_complete
return future.result()
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\site-packages\ib_insync-0.9.70-py3.9.egg\ib_insync\ib.py", line 1658, in connectAsync
await self.client.connectAsync(host, port, clientId, timeout)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\site-packages\ib_insync-0.9.70-py3.9.egg\ib_insync\client.py", line 216, in connectAsync
await asyncio.wait_for(self.apiStart, timeout)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python39\lib\asyncio\tasks.py", line 494, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
Do you know what the cause could be?
Of course, directly calling main won't work either.
Before running it, I start a BackgroundScheduler, which seems like the only plausible culprit.
Adding an asyncio event loop didn't work either.
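Not from the thread, but on Windows multiprocessing spawns a fresh interpreter that re-imports your module, so module-level side effects (starting the BackgroundScheduler, connecting, etc.) run again in the child unless they are guarded. Since adding an event loop alone didn't help, the guard may be the more relevant part. A minimal sketch of the shape this usually takes (IBMediator/IBSource are the names from the trace; whether this fixes the timeout is an assumption):
import asyncio
from multiprocessing import Process

def main():
    # give the child its own event loop before any ib_insync code runs
    asyncio.set_event_loop(asyncio.new_event_loop())
    # IBMediator() / IBSource(...).ib.connect(...) would go here, per the trace

if __name__ == '__main__':
    # BackgroundScheduler setup and other parent-only work belongs here,
    # so the spawned child does not re-run it on import
    p = Process(target=main)
    p.start()
    p.join()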

BrokenPipeError celery.bin.base in out

We have an EC2 Fargate instance running on AWS; there is a Celery worker which is run with
celery -A app status
We are repeatedly getting the following error on this instance:
BrokenPipeError: [Errno 32] Broken pipe
File "celery", line 8, in <module>
sys.exit(main())
File "celery/__main__.py", line 16, in main
_main()
File "celery/bin/celery.py", line 322, in main
cmd.execute_from_commandline(argv)
File "celery/bin/celery.py", line 499, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "celery/bin/base.py", line 305, in execute_from_commandline
return self.handle_argv(self.prog_name, argv[1:])
File "celery/bin/celery.py", line 491, in handle_argv
return self.execute(command, argv)
File "celery/bin/celery.py", line 419, in execute
).run_from_argv(self.prog_name, argv[1:], command=argv[0])
File "celery/bin/base.py", line 309, in run_from_argv
sys.argv if argv is None else argv, command)
File "celery/bin/base.py", line 393, in handle_argv
return self(*args, **options)
File "celery/bin/base.py", line 253, in __call__
ret = self.run(*args, **kwargs)
File "celery/bin/control.py", line 239, in run
nodecount, text.pluralize(nodecount, 'node')))
File "celery/bin/base.py", line 413, in out
print(s, file=fh or self.stdout)
Any insights into it?
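Not an answer from the thread, but the trace ends in out() printing to stdout, so this error typically means the process's stdout pipe was closed while Celery was writing the status output. One way to sidestep the CLI entirely is to ping the workers from Python (sketch; assumes the Celery instance lives in app.py as `app`, matching `-A app`):
from app import app  # assumption: `app` is the Celery instance used with -A app

# ping all workers instead of shelling out to `celery -A app status`
replies = app.control.inspect(timeout=5).ping() or {}
print("workers responding:", sorted(replies))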

Airflow [Errno 104] Connection reset by peer

I am trying to run tasks through the 'airflow scheduler' command, and it produced this error after I tried to run one of the DAGs.
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 28, in <module>
args.func(args)
File "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 839, in scheduler
job.run()
File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 200, in run
self._execute()
File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 1309, in _execute
self._execute_helper(processor_manager)
File "/usr/local/lib/python3.5/dist-packages/airflow/jobs.py", line 1441, in _execute_helper
self.executor.heartbeat()
File "/usr/local/lib/python3.5/dist-packages/airflow/executors/base_executor.py", line 132, in heartbeat
self.sync()
File "/usr/local/lib/python3.5/dist-packages/airflow/executors/celery_executor.py", line 88, in sync
state = async.state
File "/home/userName/.local/lib/python3.5/site-packages/celery/result.py", line 436, in state
return self._get_task_meta()['status']
File "/home/userName/.local/lib/python3.5/site-packages/celery/result.py", line 375, in _get_task_meta
return self._maybe_set_cache(self.backend.get_task_meta(self.id))
File "/home/userName/.local/lib/python3.5/site-packages/celery/backends/amqp.py", line 156, in get_task_meta
binding.declare()
File "/home/userName/.local/lib/python3.5/site-packages/kombu/entity.py", line 605, in declare
self._create_queue(nowait=nowait, channel=channel)
File "/home/userName/.local/lib/python3.5/site-packages/kombu/entity.py", line 614, in _create_queue
self.queue_declare(nowait=nowait, passive=False, channel=channel)
File "/home/userName/.local/lib/python3.5/site-packages/kombu/entity.py", line 649, in queue_declare
nowait=nowait,
File "/home/userName/.local/lib/python3.5/site-packages/amqp/channel.py", line 1147, in queue_declare
nowait, arguments),
File "/home/userName/.local/lib/python3.5/site-packages/amqp/abstract_channel.py", line 50, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/home/userName/.local/lib/python3.5/site-packages/amqp/method_framing.py", line 166, in write_frame
write(view[:offset])
File "/home/userName/.local/lib/python3.5/site-packages/amqp/transport.py", line 258, in write
self._write(s)
**ConnectionResetError: [Errno 104] Connection reset by peer**
I am using Python 3.5, Airflow 1.8, Celery 4.1.0, and RabbitMQ 3.5.7 as the broker.
It looks like I am having a problem on RabbitMQ, but I cannot figure out the reason.
The reported error seems to be a known issue that was fixed in Airflow 1.10.0.
Had the same issue.
Your DAG contains many API calls to a server, and the Airflow scheduler has a limit it needs to respect. There isn't a specific number of simultaneous requests to abide by; you have to do trial and error to find the number that works for your Airflow environment. This usually occurs when your DAG has a number of tasks that run alongside each other simultaneously.
This issue is not resolved by any of the upgrades claimed in the answers; I was getting the error even when using the latest release.
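To make the concurrency suggestion above concrete, here is a sketch using Airflow 1.x DAG arguments (the dag_id and dates are made up; the right numbers are the trial-and-error part):
from datetime import datetime
from airflow import DAG

dag = DAG(
    dag_id='my_dag',                  # hypothetical DAG id
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
    concurrency=4,                    # max task instances of this DAG running at once
    max_active_runs=1,                # max simultaneous runs of this DAG
)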

Popen() fails with "[WinError 6] The handle is invalid" in _cleanup() sometimes

I'm working on a django project. One of the views calls Popen(). Most of the time everything works fine. But once in a while Popen() would fail with the following error messages:
Traceback (most recent call last):
File "C:\***\views.py", line 116, in process_request
proc = Popen(['python', exec_path, input_excel, current_tab, project])
File "C:\Python34\lib\subprocess.py", line 754, in __init__
_cleanup()
File "C:\Python34\lib\subprocess.py", line 474, in _cleanup
res = inst._internal_poll(_deadstate=sys.maxsize)
File "C:\Python34\lib\subprocess.py", line 1146, in _internal_poll
if _WaitForSingleObject(self._handle, 0) == _WAIT_OBJECT_0:
OSError: [WinError 6] The handle is invalid
Restarting the server usually solves the problem, but it can show up again later. Attempts immediately after the failure usually fail too (I retried in a loop). Manually reloading the page multiple times sometimes solves the problem.
I also tried both 64-bit and 32-bit Python versions. The problem shows up on both.
It seems that _cleanup() manages the _active list, which is used to avoid zombie processes. Since I'm on Windows, I commented out the _cleanup() call in Popen(), which seems to be working fine so far. Clearly it's not a proper fix. Any better idea?
Update:
Following eryksun's advice I looked closer at the handles. It seems that the process handle is closed by django's autoreload.py for some reason. See below for details.
---------------------------------------------------
try handle:
Handle(908)
File "manage.py", line 10, in <module>
execute_from_command_line(sys.argv)
File "C:\Python34\lib\site-packages\django\core\management\__init__.py", line 338, in execute_from_command_line
utility.execute()
File "C:\Python34\lib\site-packages\django\core\management\__init__.py", line 330, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "C:\Python34\lib\site-packages\django\core\management\base.py", line 393, in run_from_argv
self.execute(*args, **cmd_options)
File "C:\Python34\lib\site-packages\django\core\management\commands\runserver.py", line 49, in execute
super(Command, self).execute(*args, **options)
File "C:\Python34\lib\site-packages\django\core\management\base.py", line 444, in execute
output = self.handle(*args, **options)
File "C:\Python34\lib\site-packages\django\core\management\commands\runserver.py", line 88, in handle
self.run(**options)
File "C:\Python34\lib\site-packages\django\core\management\commands\runserver.py", line 97, in run
autoreload.main(self.inner_run, None, options)
File "C:\Python34\lib\site-packages\django\utils\autoreload.py", line 325, in main
reloader(wrapped_main_func, args, kwargs)
File "C:\Python34\lib\site-packages\django\utils\autoreload.py", line 291, in python_reloader
reloader_thread()
File "C:\Python34\lib\site-packages\django\utils\autoreload.py", line 267, in reloader_thread
change = fn()
File "C:\Python34\lib\site-packages\django\utils\autoreload.py", line 204, in code_changed
for filename in gen_filenames():
File "C:\Python34\lib\site-packages\django\utils\autoreload.py", line 92, in gen_filenames
_cached_filenames = clean_files(_cached_filenames)
File "C:\Python34\lib\site-packages\django\utils\autoreload.py", line 139, in clean_files
if os.path.exists(filename):
File "C:\Python34\lib\genericpath.py", line 19, in exists
os.stat(path)
File "C:\Python34\lib\subprocess.py", line 452, in Close
print(traceback.print_stack())
None
handle closed.
---------------------------------------------------
The above error information is generated by the modification below.
def Close(self, CloseHandle=_winapi.CloseHandle):
    print('---------------------------------------------------')
    print('try handle:')
    print(self)
    if not self.closed:
        self.closed = True
        CloseHandle(self)
        print(traceback.print_stack())
    print('handle closed.')
    print('---------------------------------------------------')
Later the exceptions complain about Handle(908).
I can't quite follow how os.stat(path) closed the handle and why the process isn't taken off the _active list by subprocess.py.
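Not from the thread, but since the question mentions retrying in loops: a sketch of a retry wrapper that pauses briefly between attempts (running the dev server with --noreload is also a quick way to check whether autoreload is involved; the helper name here is made up):
import time
from subprocess import Popen

def popen_with_retry(args, attempts=3, delay=0.5):
    # purely illustrative: retry Popen a few times, sleeping between attempts
    last_err = None
    for _ in range(attempts):
        try:
            return Popen(args)
        except OSError as err:  # e.g. WinError 6 raised from _cleanup()
            last_err = err
            time.sleep(delay)
    raise last_err

# proc = popen_with_retry(['python', exec_path, input_excel, current_tab, project])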

UNEXPECTED_FRAME - expected content header for class 60, got non content header frame instead

What I am doing is this: imagine that you have several workflows that need to execute. These workflows have tasks, and the targets of the tasks are different hosts.
The fastest way to do this is to run every workflow inside a process, and run them in parallel.
I am trying to use Python multiprocessing to execute a remote function that I call with the help of Celery. My program runs fine if I just run one process, but when I run more than one process I get the error below. As far as I can tell, the issue is concurrent publishing on the same channel; channels should not be shared between threads/processes.
How can I make Celery resolve this? Is it a parameter that I should pass to the 'celeryd' command, or do I need to do it in my Python program?
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "testHello.py", line 16, in test_hello_aux
print output.get()
File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 169, in get
no_ack=no_ack,
File "/usr/local/lib/python2.7/dist-packages/celery/backends/amqp.py", line 155, in wait_for
on_interval=on_interval)
File "/usr/local/lib/python2.7/dist-packages/celery/backends/amqp.py", line 229, in consume
no_ack=no_ack, accept=self.accept) as consumer:
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 359, in __init__
self.revive(self.channel)
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 371, in revive
self.declare()
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 381, in declare
queue.declare()
File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 505, in declare
self.queue_declare(nowait, passive=False)
File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 531, in queue_declare
nowait=nowait)
File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 1254, in queue_declare
self._send_method((50, 10), args)
File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 56, in _send_method
self.channel_id, method_sig, args, content,
File "/usr/local/lib/python2.7/dist-packages/amqp/method_framing.py", line 221, in write_method
write_frame(1, channel, payload)
File "/usr/local/lib/python2.7/dist-packages/amqp/transport.py", line 177, in write_frame
frame_type, channel, size, payload, 0xce,
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "testHello.py", line 16, in test_hello_aux
print output.get()
File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 169, in get
no_ack=no_ack,
File "/usr/local/lib/python2.7/dist-packages/celery/backends/amqp.py", line 155, in wait_for
on_interval=on_interval)
File "/usr/local/lib/python2.7/dist-packages/celery/backends/amqp.py", line 229, in consume
no_ack=no_ack, accept=self.accept) as consumer:
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 359, in __init__
Process Process-3:
self.revive(self.channel)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 371, in revive
self.declare()
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 381, in declare
queue.declare()
File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 504, in declare
self.run()
self.exchange.declare(nowait)
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
self._target(*self._args, **self._kwargs)
nowait=nowait, passive=passive,
File "testHello.py", line 16, in test_hello_aux
File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 613, in exchange_declare
print output.get()
File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 169, in get
no_ack=no_ack,
File "/usr/local/lib/python2.7/dist-packages/celery/backends/amqp.py", line 155, in wait_for
on_interval=on_interval)
File "/usr/local/lib/python2.7/dist-packages/celery/backends/amqp.py", line 229, in consume
no_ack=no_ack, accept=self.accept) as consumer:
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 359, in __init__
self._send_method((40, 10), args)
File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 56, in _send_method
self.channel_id, method_sig, args, content,
File "/usr/local/lib/python2.7/dist-packages/amqp/method_framing.py", line 221, in write_method
self.revive(self.channel)
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 371, in revive
self.declare()
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 381, in declare
write_frame(1, channel, payload)
queue.declare()
File "/usr/local/lib/python2.7/dist-packages/amqp/transport.py", line 177, in write_frame
File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 504, in declare
frame_type, channel, size, payload, 0xce,
File "/usr/lib/python2.7/socket.py", line 224, in meth
self.exchange.declare(nowait)
File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
nowait=nowait, passive=passive,
File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 620, in exchange_declare
return getattr(self._sock,name)(*args)
error: [Errno 32] Broken pipe
(40, 11), # Channel.exchange_declare_ok
File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
self.channel_id, allowed_methods)
File "/usr/local/lib/python2.7/dist-packages/amqp/connection.py", line 237, in _wait_method
self.method_reader.read_method()
File "/usr/local/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
raise m
error: [Errno 104] Connection reset by peer
Process Process-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "testHello.py", line 16, in test_hello_aux
print output.get()
File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 169, in get
no_ack=no_ack,
File "/usr/local/lib/python2.7/dist-packages/celery/backends/amqp.py", line 155, in wait_for
on_interval=on_interval)
File "/usr/local/lib/python2.7/dist-packages/celery/backends/amqp.py", line 229, in consume
no_ack=no_ack, accept=self.accept) as consumer:
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 359, in __init__
self.revive(self.channel)
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 371, in revive
self.declare()
File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 381, in declare
queue.declare()
File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 505, in declare
self.queue_declare(nowait, passive=False)
File "/usr/local/lib/python2.7/dist-packages/kombu/entity.py", line 531, in queue_declare
nowait=nowait)
File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 1258, in queue_declare
(50, 11), # Channel.queue_declare_ok
File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
self.channel_id, allowed_methods)
File "/usr/local/lib/python2.7/dist-packages/amqp/connection.py", line 270, in _wait_method
self.wait()
File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 69, in wait
return self.dispatch_method(method_sig, args, content)
File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 87, in dispatch_method
return amqp_method(self, args)
File "/usr/local/lib/python2.7/dist-packages/amqp/connection.py", line 526, in _close
(class_id, method_id), ConnectionError)
UnexpectedFrame: Basic.publish: (505) UNEXPECTED_FRAME - expected content header for class 60, got non content header frame instead
celery --version 3.1.11 (Cipater)
amq --version 0.9.1
When using Celery you should not need to use the python multiprocessing module. Celery takes care of everything for you.
Define your task in a file called tasks.py
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y
Now assume the add function is actually whatever you would like to run in parallel. Let's also consider terms: parallel means at the same time, while async means not synchronously. I cannot guarantee your tasks will be run at the same time, though I can guarantee they will not be run synchronously. For that reason, let's stick with the term async.
Celery has Canvas, a set of primitives for async flow control. Two you would be interested in are group and chord. group allows you to run a group of async tasks and block on the results of all of them (accomplishing what you were attempting with your join). chord provides the same functionality as group, but fires a callback when all of the tasks complete.
An example of the calling code:
WAIT_TIME = 10  # however long you are willing to wait for your tasks

from tasks import add
from celery import group

future = group(add.s(i**i, i**i) for i in xrange(10))()
results = future.get(timeout=WAIT_TIME)
Celery tasks are automatically run in their own process (the workers you spawn) and do not require you to create further processes yourself.
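For completeness, the chord variant of the same call might look like this (sketch; tsum is a hypothetical callback task that sums a list of numbers):
from celery import chord
from tasks import add, tsum  # tsum is hypothetical, not defined above

# run the add() calls asynchronously, then call tsum once with all the results
result = chord(add.s(i**i, i**i) for i in xrange(10))(tsum.s())
print(result.get(timeout=10))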
