Airflow scheduler crashes when a DAG is run - python

The Airflow scheduler crashes when I trigger a DAG manually from the dashboard.
executor = DaskExecutor
Airflow version = 1.10.7
sql_alchemy_conn = postgresql://airflow:airflow@localhost:5432/airflow
Python version = 3.6
The logs on crash are:
[2020-08-20 07:01:49,288] {scheduler_job.py:1148} INFO - Sending ('hello_world', 'dummy_task', datetime.datetime(2020, 8, 20, 1, 31, 47, 20630, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) to executor with priority 2 and queue default
[2020-08-20 07:01:49,288] {base_executor.py:58} INFO - Adding to queue: ['airflow', 'run', 'hello_world', 'dummy_task', '2020-08-20T01:31:47.020630+00:00', '--local', '--pool', 'default_pool', '-sd', '/workflows/dags/helloWorld.py']
/mypython/lib/python3.6/site-packages/airflow/executors/dask_executor.py:63: UserWarning: DaskExecutor does not support queues. All tasks will be run in the same cluster
'DaskExecutor does not support queues. '
distributed.protocol.pickle - INFO - Failed to serialize <function DaskExecutor.execute_async.<locals>.airflow_run at 0x12057a9d8>. Exception: Cell is empty
[2020-08-20 07:01:49,292] {scheduler_job.py:1361} ERROR - Exception when executing execute_helper
Traceback (most recent call last):
File "/mypython/lib/python3.6/site-packages/distributed/worker.py", line 843, in dumps_function
result = cache[func]
KeyError: <function DaskExecutor.execute_async.<locals>.airflow_run at 0x12057a9d8>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mypython/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 38, in dumps
result = pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
AttributeError: Can't pickle local object 'DaskExecutor.execute_async.<locals>.airflow_run'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mypython/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 1359, in _execute
self._execute_helper()
File "/mypython/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 1420, in _execute_helper
if not self._validate_and_run_task_instances(simple_dag_bag=simple_dag_bag):
File "/mypython/lib/python3.6/site-packages/airflow/jobs/scheduler_job.py", line 1482, in _validate_and_run_task_instances
self.executor.heartbeat()
File "/mypython/lib/python3.6/site-packages/airflow/executors/base_executor.py", line 130, in heartbeat
self.trigger_tasks(open_slots)
File "/mypython/lib/python3.6/site-packages/airflow/executors/base_executor.py", line 154, in trigger_tasks
executor_config=simple_ti.executor_config)
File "/mypython/lib/python3.6/site-packages/airflow/executors/dask_executor.py", line 70, in execute_async
future = self.client.submit(airflow_run, pure=False)
File "/mypython/lib/python3.6/site-packages/distributed/client.py", line 1279, in submit
actors=actor)
File "/mypython/lib/python3.6/site-packages/distributed/client.py", line 2249, in _graph_to_futures
'tasks': valmap(dumps_task, dsk3),
File "/mypython/lib/python3.6/site-packages/toolz/dicttoolz.py", line 83, in valmap
rv.update(zip(iterkeys(d), map(func, itervalues(d))))
File "/mypython/lib/python3.6/site-packages/distributed/worker.py", line 881, in dumps_task
return {'function': dumps_function(task[0]),
File "/mypython/lib/python3.6/site-packages/distributed/worker.py", line 845, in dumps_function
result = pickle.dumps(func)
File "/mypython/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 51, in dumps
return cloudpickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL)
File "/mypython/lib/python3.6/site-packages/cloudpickle/cloudpickle_fast.py", line 101, in dumps
cp.dump(obj)
File "/mypython/lib/python3.6/site-packages/cloudpickle/cloudpickle_fast.py", line 540, in dump
return Pickler.dump(self, obj)
File "/usr/local/opt/python#3.6/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/usr/local/opt/python#3.6/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/mypython/lib/python3.6/site-packages/cloudpickle/cloudpickle_fast.py", line 722, in save_function
*self._dynamic_function_reduce(obj), obj=obj
File "/mypython/lib/python3.6/site-packages/cloudpickle/cloudpickle_fast.py", line 659, in _save_reduce_pickle5
dictitems=dictitems, obj=obj
File "/usr/local/opt/python#3.6/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 610, in save_reduce
save(args)
File "/usr/local/opt/python#3.6/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/opt/python#3.6/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 751, in save_tuple
save(element)
File "/usr/local/opt/python#3.6/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/usr/local/opt/python#3.6/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 736, in save_tuple
save(element)
File "/usr/local/opt/python#3.6/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/mypython/lib/python3.6/site-packages/dill/_dill.py", line 1146, in save_cell
f = obj.cell_contents
ValueError: Cell is empty
[2020-08-20 07:01:49,302] {helpers.py:322} INFO - Sending Signals.SIGTERM to GPID 11451
[2020-08-20 07:01:49,303] {dag_processing.py:804} INFO - Exiting gracefully upon receiving signal 15
[2020-08-20 07:01:49,310] {dag_processing.py:1379} INFO - Waiting up to 5 seconds for processes to exit...
[2020-08-20 07:01:49,318] {helpers.py:288} INFO - Process psutil.Process(pid=11451, status='terminated') (11451) terminated with exit code 0
[2020-08-20 07:01:49,319] {helpers.py:288} INFO - Process psutil.Process(pid=11600, status='terminated') (11600) terminated with exit code None
[2020-08-20 07:01:49,319] {scheduler_job.py:1364} INFO - Exited execute loop
I am running this on macOS Catalina, in case that helps isolate the error.

I believe this issue is possibly what you are experiencing.
Looking at that ticket, it appears to still be open: a fix has been made, but it has not yet made it into an official release.
This pull request contains the fix for the issue linked above; you could try building your Airflow stack locally from there and see if it resolves the issue for you.
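If you want to go that route, an unreleased fix can usually be installed straight from a git ref. A hedged example (the ref below is a placeholder; take the real branch or commit from the linked pull request):
pip install "git+https://github.com/apache/airflow.git@<fix-branch>#egg=apache-airflow"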

This started happening with new versions of downstream Dask dependencies (cloudpickle and distributed). Pinning the versions fixes the issue.
pip uninstall cloudpickle distributed
pip install cloudpickle==1.4.1 distributed==2.17.0
These were the problematic versions:
cloudpickle==1.6.0
distributed==2.26.0
I run Airflow 1.10.10 in Docker and use the same image for Dask 2.13.0.
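For what it's worth, here is a minimal sketch of the failure mode, assuming the version mix from the traceback (Python 3.6, cloudpickle==1.6.0, and dill installed, which Airflow pulls in): importing dill patches the stdlib pickler's dispatch table, and cloudpickle serializes a dynamic closure by pickling its cells empty and filling them in afterwards, so dill's save_cell hits an empty cell and raises.
import dill  # merely importing dill patches pickle.Pickler's dispatch table
import cloudpickle

def execute_async(command):
    # stand-in for DaskExecutor.execute_async, which defines a local closure
    def airflow_run():
        return command
    return airflow_run

# On the broken version pair this raises ValueError: Cell is empty;
# with cloudpickle==1.4.1 (or an Airflow fix that drops the closure) it succeeds.
cloudpickle.dumps(execute_async(['airflow', 'run']))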

Related

Multiprocessing; How to debug: _pickle.PicklingError: Could not pickle object as excessively deep recursion required

I have a simulation that I can run using Python code, and I want to create multiple instances of it using a SubprocVecEnv from stable-baselines3. This uses multiprocessing to run the simulations on different cores, and it was working before I made a number of changes to my code. However, now I receive the error below and do not know how to debug it, because I don't understand which part of my code is causing it. Is there a way to find out which object or method is causing the recursion depth to be exceeded? I also do not remember writing a recursive method anywhere in my code. Researching the error message was not successful.
/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
Traceback (most recent call last):
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
return Pickler.dump(self, obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 639, in reducer_override
if sys.version_info[:2] < (3, 7) and _is_parametrized_type_hint(obj): # noqa # pragma: no branch
RecursionError: maximum recursion depth exceeded in comparison
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/philipp/Code/ba_pw/train.py", line 84, in <module>
venv = utils.make_venv(env_class, network, params, remote_ports, monitor_log_dir)
File "/home/philipp/Code/ba_pw/sumo_rl/utils/utils.py", line 170, in make_venv
return vec_env.SubprocVecEnv(env_fs)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 106, in __init__
process.start()
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/context.py", line 291, in _Popen
return Popen(process_obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/popen_forkserver.py", line 35, in __init__
super().__init__(process_obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/popen_forkserver.py", line 47, in _launch
reduction.dump(process_obj, buf)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 372, in __getstate__
return cloudpickle.dumps(self.var)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/home/philipp/anaconda3/envs/sumo_rl/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 570, in dump
raise pickle.PicklingError(msg) from e
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.
I finally figured out a solution using the answer to this question. It looks like the object I want to pickle has too many layers. I called:
sys.setrecursionlimit(3000)
and now it works.
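For anyone hitting the same wall, a minimal sketch of where the call has to go: before the vectorized env is constructed, since that is when cloudpickle serializes the env factories. CartPole is just a hypothetical stand-in for the real simulation env, and make_env for the factories returned by utils.make_venv.
import sys

import gym
from stable_baselines3.common.vec_env import SubprocVecEnv

def make_env(rank):
    def _init():
        return gym.make("CartPole-v1")  # stand-in for the real simulation env
    return _init

if __name__ == "__main__":
    # cloudpickle recurses once per layer of object nesting, so a deeply
    # composed env can exceed the default limit of 1000.
    sys.setrecursionlimit(3000)
    venv = SubprocVecEnv([make_env(i) for i in range(4)])
Raising the limit treats the symptom; if the depth keeps growing, it is worth checking what became so deeply nested after the code changes.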

could not serialize access due to concurrent update while creating pos picking from a job

Impacted versions:
12.0
Steps to reproduce:
I have made a customization to postpone the creation of the POS order picking and delegate the task to a job. Sometimes I get the error below.
Current behavior:
2020-06-18 17:49:24,588 1370 ERROR cafe9.rabeh.io odoo.addons.base.models.ir_cron: Call from cron POS Orders: Process Pending Orders for server action #610 failed in Job #24
Traceback (most recent call last):
File "/opt/rabeh/odoo/odoo/addons/base/models/ir_cron.py", line 102, in _callback
self.env['ir.actions.server'].browse(server_action_id).run()
File "/opt/rabeh/odoo/odoo/addons/base/models/ir_actions.py", line 569, in run
res = func(action, eval_context=eval_context)
File "/opt/rabeh/odoo/odoo/addons/base/models/ir_actions.py", line 445, in run_action_code_multi
safe_eval(action.sudo().code.strip(), eval_context, mode="exec", nocopy=True)  # nocopy allows to return 'action'
File "/opt/rabeh/odoo/odoo/tools/safe_eval.py", line 350, in safe_eval
return unsafe_eval(c, globals_dict, locals_dict)
File "/opt/rabeh-12/rabeh_addons/pos_pending_session/models/pos_order.py", line 68, in pending_picking_creation
po_order.create_picking()
File "/opt/rabeh-12/rabeh_addons/pos_pending_session/models/pos_order.py", line 36, in
create_picking
res = super(PosOrder, orders).create_picking()
File "/opt/rabeh/odoo/addons/point_of_sale/models/pos_order.py", line 841, in create_picking
order._force_picking_done(order_picking)
File "/opt/rabeh/odoo/addons/point_of_sale/models/pos_order.py", line 856, in _force_picking_done
picking.action_done()
File "/opt/rabeh/odoo/addons/stock/models/stock_picking.py", line 631, in action_done
todo_moves._action_done()
File "/opt/rabeh/odoo/addons/purchase_stock/models/stock.py", line 96, in _action_done
res = super(StockMove, self)._action_done()
File "/opt/rabeh/odoo/addons/stock_account/models/stock.py", line 389, in _action_done
res = super(StockMove, self)._action_done()
File "/opt/rabeh/odoo/addons/stock/models/stock_move.py", line 1137, in _action_done
moves_todo.mapped('move_line_ids')._action_done()
File "/opt/rabeh/odoo/addons/stock/models/stock_move_line.py", line 445, in _action_done
Quant._update_available_quantity(ml.product_id, ml.location_dest_id, quantity, lot_id=ml.lot_id, package_id=ml.result_package_id, owner_id=ml.owner_id, in_date=in_date)
File "/opt/rabeh/odoo/addons/stock/models/stock_quant.py", line 216, in _update_available_quantity
self._cr.execute("SELECT 1 FROM stock_quant WHERE id = %s FOR UPDATE NOWAIT", [quant.id], log_exceptions=False)
File "/opt/rabeh/odoo/odoo/sql_db.py", line 148, in wrapper
return f(self, *args, **kwargs)
File "/opt/rabeh/odoo/odoo/sql_db.py", line 225, in execute
res = self._obj.execute(query, params)
psycopg2.errors.SerializationFailure: could not serialize access due to concurrent update
Expected behavior:
I think this line should generate "could not obtain lock".
I was just wondering when it could generate "could not serialize access due to concurrent update" instead.
It could have obtained the lock, as the lock is currently available. But it had been locked at some previous point which overlaps with the current transaction's snapshot. So obtaining the lock is possible, but would create a serialization problem if it were to acquire it. Reporting that as a serialization failure seems like the correct outcome.
Using FOR UPDATE NOWAIT in a transaction with elevated isolation level seems inconsistent, or at least unnecessary. What are you hoping to accomplish by doing this? Your description of "while creating pos picking from a job" doesn't elucidate this for me. Is that some odoo-specific jargon?
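To make that concrete, a small sketch (psycopg2, with a simplified made-up table; Odoo runs its cursors at REPEATABLE READ, which appears to be the elevated isolation level in play) of how NOWAIT can still end in a serialization failure even though the lock itself is free:
import psycopg2
from psycopg2 import errors

conn_a = psycopg2.connect("dbname=test")
conn_b = psycopg2.connect("dbname=test")
conn_a.set_session(isolation_level="REPEATABLE READ")

cur_a = conn_a.cursor()
cur_a.execute("SELECT qty FROM stock_quant WHERE id = 1")  # snapshot taken here

# Meanwhile another transaction updates the same row and commits.
cur_b = conn_b.cursor()
cur_b.execute("UPDATE stock_quant SET qty = qty + 1 WHERE id = 1")
conn_b.commit()

try:
    # The row lock is available (nobody holds it), but acquiring it would
    # expose a row version this snapshot must not see, so PostgreSQL reports
    # a serialization failure rather than "could not obtain lock".
    cur_a.execute("SELECT 1 FROM stock_quant WHERE id = 1 FOR UPDATE NOWAIT")
except errors.SerializationFailure:
    conn_a.rollback()  # the usual remedy is to retry the whole transaction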

KeyError: 'browser' with Splinter and Behaving automated testing

I followed the instructions here: http://shon.github.io/2014/06/19/ui_testing_and_bdd.html about setting up Splinter with Behaving to run automated tests. I'm able to run a test successfully, but at the end of the test, it throws an error saying:
KeyError: 'browser'
and it won't continue testing any additional feature files. I'm pretty new to python and need some help in troubleshooting this.
Exception KeyError: 'browser'
Traceback (most recent call last):
File "/usr/local/bin/behave", line 11, in <module> sys.exit(main())
File "/Library/Python/2.7/site-packages/behave/__main__.py", line 109, in main
failed = runner.run()
File "/Library/Python/2.7/site-packages/behave/runner.py", line 672, in run
return self.run_with_paths()
File "/Library/Python/2.7/site-packages/behave/runner.py", line 693, in run_with_paths
return self.run_model()
File "/Library/Python/2.7/site-packages/behave/runner.py", line 483, in run_model
failed = feature.run(self)
File "/Library/Python/2.7/site-packages/behave/model.py", line 523, in run
failed = scenario.run(runner)
File "/Library/Python/2.7/site-packages/behave/model.py", line 867, in run
runner.run_hook('before_scenario', runner.context, self)
File "/Library/Python/2.7/site-packages/behave/runner.py", line 405, in run_hook
self.hooks[name](context, *args)
File "features/environment.py", line 48, in before_scenario
context.browser = default_browser
File "/Library/Python/2.7/site-packages/behave/runner.py", line 223, in __setattr__
record = self._record[attr]
KeyError: 'browser'
I found the issue. It is related to the Feature file structure. The Feature file was missing:
Background:
Given a browser
This also required changes to the environment.py file based on the info here: https://github.com/ggozad/behaving
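For anyone else wiring this up, here is a sketch of the environment.py delegation based on the behaving README (treat it as a starting point, not verbatim): the behaving hooks create and clean up context.browser, so your own before_scenario should not assign it directly.
from behaving import environment as benv

PERSONAS = {}

def before_all(context):
    context.personas = PERSONAS
    benv.before_all(context)

def after_all(context):
    benv.after_all(context)

def before_scenario(context, scenario):
    benv.before_scenario(context, scenario)

def after_scenario(context, scenario):
    benv.after_scenario(context, scenario)
With that in place, "Given a browser" in the feature's Background is what actually instantiates the browser for each scenario.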

When I shut down the Zope server it shows an AttributeError

I am using Plone 4.3.3 for my Plone site, but when I shut down the server it shows the following error.
Traceback (most recent call last):
File "/Plone/zinstance/parts/instance/bin/interpreter", line 298, in <module>
exec(compile(__file__f.read(), __file__, "exec"))
File "/Plone/buildout-cache/eggs/Zope2-2.13.22-py2.7.egg/Zope2/Startup/run.py", line 76, in <module>
run()
File "/Plone/buildout-cache/eggs/Zope2-2.13.22-py2.7.egg/Zope2/Startup/run.py", line 26, in run
starter.run()
File "/Plone/buildout-cache/eggs/Zope2-2.13.22-py2.7.egg/Zope2/Startup/__init__.py", line 108, in run
self.shutdown()
File "/Plone/buildout-cache/eggs/Zope2-2.13.22-py2.7.egg/Zope2/Startup/__init__.py", line 113, in shutdown
db.close()
File "/Plone/buildout-cache/eggs/ZODB3-3.10.5-py2.7-linux-i686.egg/ZODB/DB.py", line 624, in close
@self._connectionMap
File "/Plone/buildout-cache/eggs/ZODB3-3.10.5-py2.7-linux-i686.egg/ZODB/DB.py", line 506, in _connectionMap
self.pool.map(f)
File "/Plone/buildout-cache/eggs/ZODB3-3.10.5-py2.7-linux-i686.egg/ZODB/DB.py", line 206, in map
self.all.map(f)
File "/Plone/buildout-cache/eggs/transaction-1.1.1-py2.7.egg/transaction/weakset.py", line 58, in map
f(elt)
File "/Plone/buildout-cache/eggs/ZODB3-3.10.5-py2.7-linux-i686.egg/ZODB/DB.py", line 628, in _
c._release_resources()
File "/Plone/buildout-cache/eggs/ZODB3-3.10.5-py2.7-linux-i686.egg/ZODB/Connection.py", line 1075, in _release_resources
c._storage.release()
AttributeError: 'NoneType' object has no attribute 'release'
There is an issue with Zope2 shutdown that tries to close a database connection (and in turn, a storage). This late-running sequence has a cosmetic side effect for users of RelStorage. It is annoying, but not fundamentally a problem, and it should not cause any data integrity issues.
Users of FileStorage or ZEO should not see this.
References:
https://github.com/zopefoundation/Zope/commit/5032027470091957a6c0028da04c0fc0a1ed646b
https://mail.zope.org/pipermail/zodb-dev/2013-August/015119.html
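The shape of the fix is a defensive guard around the release (a paraphrased sketch, not the literal patch from the commit above):
# during shutdown, a RelStorage connection may already have lost its storage
storage = getattr(connection, '_storage', None)
if storage is not None:
    storage.release()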

weird error with django-celery or python

I'm having trouble running tasks. I run ./manage.py celeryd -B -l info, and it correctly loads all tasks into the registry.
The error happens when any of the tasks run: the task starts, does its thing, and then I get:
[ERROR/MainProcess] Thread 'ResultHandler' crashed: ValueError('Octet out of range 0..2**64-1',)
Traceback (most recent call last):
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/celery/concurrency/processes/pool.py", line 221, in run
return self.body()
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/celery/concurrency/processes/pool.py", line 458, in body
on_state_change(task)
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/celery/concurrency/processes/pool.py", line 436, in on_state_change
state_handlers[state](*args)
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/celery/concurrency/processes/pool.py", line 413, in on_ack
cache[job]._ack(i, time_accepted, pid)
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/celery/concurrency/processes/pool.py", line 1016, in _ack
self._accept_callback(pid, time_accepted)
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/celery/worker/job.py", line 424, in on_accepted
self.acknowledge()
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/celery/worker/job.py", line 516, in acknowledge
self.on_ack()
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/celery/worker/consumer.py", line 405, in ack
message.ack()
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/kombu-2.1.0-py2.7.egg/kombu/transport/base.py", line 98, in ack
self.channel.basic_ack(self.delivery_tag)
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/amqplib-1.0.2-py2.7.egg/amqplib/client_0_8/channel.py", line 1740, in basic_ack
args.write_longlong(delivery_tag)
File "/Users/jzelez/Sites/my_virtual_env/lib/python2.7/site-packages/amqplib-1.0.2-py2.7.egg/amqplib/client_0_8/serialization.py", line 325, in write_longlong
raise ValueError('Octet out of range 0..2**64-1')
ValueError: Octet out of range 0..2**64-1
I also must note that this worked on my previous Lion install; even if I create a blank virtualenv with some test code, the error appears whenever a task runs.
This happens with Python 2.7.2 and 2.6.4.
Django==1.3.1
amqplib==1.0.2
celery==2.4.6
django-celery==2.4.2
It appears there is some bug with the Homebrew-installed Python. I've now switched to the native Lion one (2.7.1) and it works.
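For reference, the check that fires lives in amqplib's serialization layer: AMQP delivery tags travel as unsigned 64-bit integers, so any corrupted value outside that range is rejected. Paraphrased from amqplib 1.0.2:
from struct import pack

def write_longlong(n):
    # delivery tags must fit an unsigned 64-bit AMQP field
    if n < 0 or n >= 2 ** 64:
        raise ValueError('Octet out of range 0..2**64-1')
    return pack('>Q', n)
A delivery_tag mangled by the broken interpreter build would land outside that range, which is consistent with swapping Python builds making the error disappear.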
