I am writing tests using pytest and pytest-xdist and I want to run pytest_sessionstart before all workers start running and pytest_sessionfinish when they are done.
I found this solution: link, but it is not working as expected. There are multiple sessions starting and finishing during the test run, so multiple cleanups are performed, which causes the tests to fail (the cleanup removes the tmp directory and the tests fail with FileNotFoundError).
I added code that writes to a file when a session starts and when it finishes. The log looks like this:
init 0x00007f988f5ee120
worker init gw0
...
worker init gw7
init 0x00007f229cdac2e0
cleanup 0x00007f229cdac2e0 0
init 0x00007f1a31a4e2e0
cleanup 0x00007f1a31a4e2e0 0
worker done gw0
...
worker done gw4
cleanup 0x00007f988f5ee120 1
As you can see, some sessions start after all workers have started and before they are done.
My code looks like this:
def pytest_sessionstart(session: pytest.Session):
    if hasattr(session.config, 'workerinput'):
        # log to file 'worker init {id}'
        return
    # log to file 'init {sess id}'
    # do some stuff

def pytest_sessionfinish(session: pytest.Session, exitstatus: int):
    if hasattr(session.config, 'workerinput'):
        # log to file 'worker done {id}'
        return
    # log to file 'cleanup {sess id} {exitstatus}'
    # do some stuff
It turned out that VS Code starts pytest in the background with the --collect-only argument. Those sessions were not filtered out, as they were not worker sessions, so they performed the init/cleanup as well.
The solution is to also check whether the --collect-only argument is present.
Code:
def pytest_sessionfinish(session: pytest.Session, exitstatus: int):
    if hasattr(session.config, 'workerinput'):
        return
    if '--collect-only' in session.config.invocation_params.args:
        return
    # do some stuff
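The same guard belongs in pytest_sessionstart. A sketch with a small helper (the helper name is mine, not from the original code) that keeps both hooks in sync:
def _is_main_run(config):
    # skip xdist workers and background --collect-only runs (e.g. from VS Code)
    return (not hasattr(config, 'workerinput')
            and '--collect-only' not in config.invocation_params.args)

def pytest_sessionstart(session: pytest.Session):
    if not _is_main_run(session.config):
        return
    # do some stuff

def pytest_sessionfinish(session: pytest.Session, exitstatus: int):
    if not _is_main_run(session.config):
        return
    # do some stuff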
In a nutshell
I get a BrokenProcessPool exception when parallelizing my code with concurrent.futures. No further error is displayed. I want to find the cause of the error, and I am asking for ideas on how to do that.
Full problem
I am using concurrent.futures to parallelize some code.
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as pool:
    mapObj = pool.map(myMethod, args)
I end up with (and only with) the following exception:
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
Unfortunately, the program is complex and the error appears only after the program has run for 30 minutes. Therefore, I cannot provide a nice minimal example.
In order to find the cause of the issue, I wrapped the method that I run in parallel in a try-except block:
def myMethod(*args):
    try:
        ...
    except Exception as e:
        print(e)
The problem remained the same and the except block was never entered. I conclude that the exception does not come from my code.
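(Strictly speaking, a try/except only catches Python-level exceptions. If native code called from myMethod crashes the process, the except block is bypassed entirely, because the OS kills the process before Python can raise anything. A minimal demonstration, which will kill the interpreter running it:)
import ctypes

try:
    ctypes.string_at(0)  # dereference a NULL pointer: hard crash, not an exception
except Exception as e:
    print('never printed:', e)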
My next step was to write a custom ProcessPoolExecutor class that is a child of the original ProcessPoolExecutor and allows me to replace some methods with customized ones. I copied and pasted the original code of the method _process_worker and added some print statements.
def _process_worker(call_queue, result_queue):
    """Evaluates calls from call_queue and places the results in result_queue.
    ...
    """
    while True:
        call_item = call_queue.get(block=True)
        if call_item is None:
            # Wake up queue management thread
            result_queue.put(os.getpid())
            return
        try:
            r = call_item.fn(*call_item.args, **call_item.kwargs)
        except BaseException as e:
            print("??? Exception ???")  # newly added
            print(e)                    # newly added
            exc = _ExceptionWithTraceback(e, e.__traceback__)
            result_queue.put(_ResultItem(call_item.work_id, exception=exc))
        else:
            result_queue.put(_ResultItem(call_item.work_id,
                                         result=r))
Again, the except block is never entered. This was to be expected, because I already ensured that my code does not raise an exception (and if everything worked well, the exception should be passed to the main process).
Now I am lacking ideas for how to find the error. The exception is raised here:
def submit(self, fn, *args, **kwargs):
    with self._shutdown_lock:
        if self._broken:
            raise BrokenProcessPool('A child process terminated '
                                    'abruptly, the process pool is not usable anymore')
        if self._shutdown_thread:
            raise RuntimeError('cannot schedule new futures after shutdown')

        f = _base.Future()
        w = _WorkItem(f, fn, args, kwargs)

        self._pending_work_items[self._queue_count] = w
        self._work_ids.put(self._queue_count)
        self._queue_count += 1
        # Wake up queue management thread
        self._result_queue.put(None)

        self._start_queue_management_thread()
        return f
The process pool is set to be broken here:
def _queue_management_worker(executor_reference,
                             processes,
                             pending_work_items,
                             work_ids_queue,
                             call_queue,
                             result_queue):
    """Manages the communication between this process and the worker processes.
    ...
    """
    executor = None

    def shutting_down():
        return _shutdown or executor is None or executor._shutdown_thread

    def shutdown_worker():
        ...

    reader = result_queue._reader

    while True:
        _add_call_item_to_queue(pending_work_items,
                                work_ids_queue,
                                call_queue)

        sentinels = [p.sentinel for p in processes.values()]
        assert sentinels
        ready = wait([reader] + sentinels)
        if reader in ready:
            result_item = reader.recv()
        else:  # THIS BLOCK IS ENTERED WHEN THE ERROR OCCURS
            # Mark the process pool broken so that submits fail right now.
            executor = executor_reference()
            if executor is not None:
                executor._broken = True
                executor._shutdown_thread = True
                executor = None
            # All futures in flight must be marked failed
            for work_id, work_item in pending_work_items.items():
                work_item.future.set_exception(
                    BrokenProcessPool(
                        "A process in the process pool was "
                        "terminated abruptly while the future was "
                        "running or pending."
                    ))
                # Delete references to object. See issue16284
                del work_item
            pending_work_items.clear()
            # Terminate remaining workers forcibly: the queues or their
            # locks may be in a dirty state and block forever.
            for p in processes.values():
                p.terminate()
            shutdown_worker()
            return
        ...
It is (or seems to be) a fact that a process terminates, but I have no clue why. Are my thoughts correct so far? What are possible causes that make a process terminate without a message? (Is this even possible?) Where could I apply further diagnostics? Which questions should I ask myself in order to come closer to a solution?
I am using Python 3.5 on 64-bit Linux.
I think I got as far as I could:
I changed the _queue_management_worker method in my customized ProcessPoolExecutor module so that the exit code of the failed process is printed:
def _queue_management_worker(executor_reference,
                             processes,
                             pending_work_items,
                             work_ids_queue,
                             call_queue,
                             result_queue):
    """Manages the communication between this process and the worker processes.
    ...
    """
    executor = None

    def shutting_down():
        return _shutdown or executor is None or executor._shutdown_thread

    def shutdown_worker():
        ...

    reader = result_queue._reader

    while True:
        _add_call_item_to_queue(pending_work_items,
                                work_ids_queue,
                                call_queue)

        sentinels = [p.sentinel for p in processes.values()]
        assert sentinels
        ready = wait([reader] + sentinels)
        if reader in ready:
            result_item = reader.recv()
        else:
            # BLOCK INSERTED FOR DIAGNOSIS ONLY ---------
            vals = list(processes.values())
            for s in ready:
                j = sentinels.index(s)
                print("is_alive()", vals[j].is_alive())
                print("exitcode", vals[j].exitcode)
            # -------------------------------------------

            # Mark the process pool broken so that submits fail right now.
            executor = executor_reference()
            if executor is not None:
                executor._broken = True
                executor._shutdown_thread = True
                executor = None
            # All futures in flight must be marked failed
            for work_id, work_item in pending_work_items.items():
                work_item.future.set_exception(
                    BrokenProcessPool(
                        "A process in the process pool was "
                        "terminated abruptly while the future was "
                        "running or pending."
                    ))
                # Delete references to object. See issue16284
                del work_item
            pending_work_items.clear()
            # Terminate remaining workers forcibly: the queues or their
            # locks may be in a dirty state and block forever.
            for p in processes.values():
                p.terminate()
            shutdown_worker()
            return
        ...
Afterwards I looked up the meaning of the exit code:
from multiprocessing.process import _exitcode_to_name
print(_exitcode_to_name[my_exit_code])
where my_exit_code is the exit code that was printed in the block I inserted into _queue_management_worker. In my case the code was -11, which means that I ran into a segmentation fault. Finding the reason for this issue will be a huge task, but it goes beyond the scope of this question.
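For reference, a negative exit code from multiprocessing is the negated number of the signal that killed the worker, and faulthandler can dump a traceback at the moment of such a crash. A small sketch using only the standard library (the initializer argument of ProcessPoolExecutor requires Python 3.7+, so it was not available on the 3.5 setup above):
import faulthandler
import signal
from concurrent.futures import ProcessPoolExecutor

# exit code -11 means the worker was killed by signal 11
print(signal.Signals(11).name)  # SIGSEGV

# enable faulthandler in every worker so that a segfault dumps a
# Python traceback to stderr before the process dies
with ProcessPoolExecutor(initializer=faulthandler.enable) as pool:
    pass  # e.g. pool.map(myMethod, args)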
If you are using macOS, there is a known issue with how some versions of macOS use fork(), which Python does not consider fork-safe in some scenarios. The workaround that worked for me is to use the no_proxy environment variable.
Edit ~/.bash_profile and include the following (it might be better to specify a list of domains or subnets here instead of *):
no_proxy='*'
Refresh the current context
source ~/.bash_profile
The local versions on which the issue was seen and worked around: Python 3.6.0 on macOS 10.14.1 and 10.13.x.
Sources:
Issue 30388
Issue 27126
Please let me know how to set up a worker for an arq job in Python.
The error I got:
assert len(self.functions) > 0, 'at least one function or cron_job must be registered'
AssertionError: at least one function or cron_job must be registered
You need a function that the worker should run. Otherwise the worker would be quite unnecessary.
For example, with the function the_task, you add it to the functions argument of the worker:
from arq import Worker
from arq.connections import RedisSettings

async def the_task(ctx):
    print('running the task')
    return 42

w = Worker(functions=[the_task],
           redis_settings=RedisSettings(),  # adjust connection settings as needed
           max_jobs=1000,
           keep_result_forever=True,
           job_timeout=86000,
           max_tries=1000)
w.run()
Maybe start with the demo example: https://arq-docs.helpmanual.io/#simple-usage
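Alternatively, arq's documented pattern is to define a WorkerSettings class and let the arq CLI run it. A minimal sketch (the file name is arbitrary):
# worker.py -- run with: arq worker.WorkerSettings
from arq.connections import RedisSettings

async def the_task(ctx):
    print('running the task')
    return 42

class WorkerSettings:
    functions = [the_task]
    redis_settings = RedisSettings()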
The goal: I need to set some random value in the pytest cache before test collection.
The problem: If I run tests in parallel using pytest-xdist with the --cache-clear option, the master and each worker will clear the cache, so I need to make sure that all workers are ready before setting the value.
Possible solution:
def pytest_configure(config):
    if (
        hasattr(config, "workerinput")
        and "--cache-clear" in config.invocation_params.args
    ):
        # if it's a worker of a parallel run with cache clearing,
        # we need to wait to make sure that all workers have started.
        # Otherwise, the next started worker will clear the cache and create
        # a new organization name
        time.sleep(10)
    name = config.cache.get(CACHE_ORG_KEY, None)
    if not name:
        name = <set random value>
        config.cache.set(CACHE_ORG_KEY, name)
It works fine. I have a 10-second sleep, and it seems to be enough for all workers (nodes) to start. All workers start and all of them clear the cache; the first one sets a value in the cache and the others get it. But I don't like this approach, because there is no guarantee that all workers have started, plus there is extra waiting time.
I am thinking about other approaches:
Disable clearing the cache for workers
Check that all workers are started
But I cannot figure out how to do it. Any ideas?
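For the first idea, one possible direction (a sketch, not from the original post; the key name org_name is arbitrary) is to rely on pytest-xdist's pytest_configure_node hook: generate the value once on the controller and hand it to each worker via workerinput, so the workers never depend on the shared cache at all:
# conftest.py -- sketch; set_name is the helper from the example below
import time

from test_clear_cache import set_name

def pytest_configure_node(node):
    # runs on the controller once per worker, before the worker starts
    node.workerinput["org_name"] = node.config._org_name

def pytest_configure(config):
    if hasattr(config, "workerinput"):
        # worker: receive the value generated by the controller
        set_name(config.workerinput["org_name"])
    else:
        # controller: create the value exactly once
        config._org_name = f"name_{time.time_ns()}"
        set_name(config._org_name)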
UPD #1. Minimal reproducible example
Requirements:
pytest==6.2.5
pytest-xdist==2.5.0
Code:
conftest.py
import time

from test_clear_cache import set_name

def pytest_configure(config):
    # if (
    #     hasattr(config, "workerinput")
    #     and "--cache-clear" in config.invocation_params.args
    # ):
    #     time.sleep(10)
    name = config.cache.get("name", None)
    if not name:
        name = f"name_{time.time_ns()}"
        config.cache.set("name", name)
    set_name(name)
test_clear_cache.py
import sys

NAME = "default"

def set_name(name):
    global NAME
    NAME = name

def test_clear_cache1():
    print(f"Test #1: {NAME}", file=sys.stderr)

def test_clear_cache2():
    print(f"Test #2: {NAME}", file=sys.stderr)

def test_clear_cache3():
    print(f"Test #3: {NAME}", file=sys.stderr)

def test_clear_cache4():
    print(f"Test #4: {NAME}", file=sys.stderr)
Output:
(venv) C:\Users\HP\PycharmProjects\PytestCacheClear>pytest -s -n=4 --cache-clear
========================================================================================================== test session starts ===========================================================================================================
platform win32 -- Python 3.7.8, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: C:\Users\HP\PycharmProjects\PytestCacheClear
plugins: forked-1.4.0, xdist-2.5.0
gw0 [4] / gw1 [4] / gw2 [4] / gw3 [4]
Test #4: name_1643377905887805600
Test #3: name_1643377905816748300
Test #2: name_1643377905735875700
Test #1: name_1643377905645880100
....
=========================================================================================================== 4 passed in 0.61s ============================================================================================================
Note: if you uncomment the code in conftest.py, the tests will print the same name.
Do I need to worry about file atomicity in luigi with the following code, which pickles a dataframe and returns it as the output of a task? I don't get the atomicity part, as I would hope luigi simply waits for the task to finish writing a file before declaring the task complete.
import pickle

import luigi
import pandas as pd

class readSQLtoPickle(luigi.Task):
    sql = luigi.Parameter()
    pickle = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(self.pickle, format=luigi.format.Nop)

    def run(self):
        data = pd.read_sql(self.sql, ariel)  # ariel: DB connection defined elsewhere
        with self.output().open('w') as f:
            pickle.dump(data, f)

class grabData(luigi.Task):  # standard Luigi Task class
    sql = luigi.Parameter(default="SELECT * FROM DIM_DRUG_PRODUCT")
    pickle = luigi.Parameter(default="drug_product.pkl")

    def requires(self):
        # we need to read the log file before we can process it
        return readSQLtoPickle(sql=self.sql, pickle=self.pickle)

    def run(self):
        with self.input().open('r') as f:
            df = pickle.load(f)
        print(type(df))
        print(df.head(100))
        print(len(df))
Writing to a LocalTarget is atomic. Behind the scenes, luigi first writes to a temp file and then moves the temp file to your actual target. Look for atomic_file in the source code.
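A quick way to observe this behavior (a sketch; the file name is arbitrary):
import luigi

target = luigi.LocalTarget("demo.bin", format=luigi.format.Nop)
with target.open("w") as f:
    # the data goes to a temporary file like demo.bin-luigi-tmp-0123456789
    f.write(b"payload")
# closing the file atomically renames the temp file onto demo.bin, so
# other processes either see the complete file or no file at all
print(target.exists())  # True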
I don't get the atomicity part, as I would hope luigi would just wait for the task to complete writing a file before stating the task is complete.
If you use a local scheduler to run your task (--local-scheduler) and have only one worker, then you should be fine.
It becomes a problem when several workers work on the same tasks and try to identify which tasks are now available to run.
In your example, one worker could be checking whether grabData is ready to run and see that the file is available, while another worker is in the middle of readSQLtoPickle, still writing to the file.
I want to use pyinotify to watch changes on the filesystem. If a file has changed, I want to update my database file accordingly (re-read tags, other information...)
I put the following code in my app's signals.py
import pyinotify
....

# create filesystem watcher in a separate thread
wm = pyinotify.WatchManager()
notifier = pyinotify.ThreadedNotifier(wm, ProcessInotifyEvent())
# notifier.setDaemon(True)
notifier.start()

mask = pyinotify.IN_CLOSE_WRITE | pyinotify.IN_CREATE | pyinotify.IN_MOVED_TO | pyinotify.IN_MOVED_FROM
dbgprint("Adding path to WatchManager:", settings.MUSIC_PATH)
wdd = wm.add_watch(settings.MUSIC_PATH, mask, rec=True, auto_add=True)
def connect_all():
    """
    to be called from models.py
    """
    rescan_start.connect(rescan_start_callback)
    upload_done.connect(upload_done_callback)
    ....
This works great when Django is run with ./manage.py runserver. However, when run as ./manage.py runfcgi, Django won't start. There is no error message; it just hangs and won't daemonize, probably at the line notifier.start().
When I run ./manage.py runfcgi method=threaded and enable the line notifier.setDaemon(True), the notifier thread is stopped (isAlive() = False).
What is the correct way to start endless threads together with Django when Django is run as FCGI? Is it even possible?
Well, duh. Never start your own endless thread alongside Django. I use Celery, which is a better fit for running work like this.
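For illustration, a minimal sketch of that split (the task name update_tags and the broker URL are assumptions, not from the original answer): the pyinotify watcher runs as its own standalone process and enqueues a Celery task for each change, instead of living inside the Django process.
# tasks.py
from celery import Celery

app = Celery('myapp', broker='redis://localhost:6379/0')  # broker URL assumed

@app.task
def update_tags(path):
    # re-read tags for the changed file and update the database here
    print('updating tags for', path)

# watcher.py -- run as a standalone process, not inside Django
import pyinotify

from tasks import update_tags

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        update_tags.delay(event.pathname)  # enqueue instead of doing the work here

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, Handler())
wm.add_watch('/path/to/music', pyinotify.IN_CLOSE_WRITE, rec=True, auto_add=True)
notifier.loop()  # blocks; run under supervisord or similar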