I have scripts running 24/7 that sometimes get stuck when a thread in concurrent.futures gets no response for a request.
The hanging-threads 2.0.5 module prints out which thread hangs and why.
The print looks something like this:
Thread 139646566659840 "ThreadPoolExecutor-666849_1" hangs -
File "/usr/lib/python3.9/threading.py", line 912, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
self.run()
File "/usr/lib/python3.9/threading.py", line 892, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 77, in _worker
work_item.run()
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
How can I, instead of just printing out the hanging threads and files, raise an exception when a thread is not responding in a given time? The script should just restart itself if hanging occurs, instead of waiting for a response.
I have tried with timeout, but concurrent futures can not be cancelled while running.
concurrent futures can not be cancelled while running
This is your problem. A hanging thread is still 'running'. Cancelling it from outside is not possible.
Thus you have two options:
switch to something which can be cancelled, like a ProcessPoolExecutor, or
rewrite the blocking code so it fails.
Since you say 'response to a request'---if this is a network request and you are early enough/frustrated enough in the dev cycle I thoroughly recommend switching to a concurrent multiprocessing framework like asyncio. This is exactly what they were developed for. In particular you may be interested in trios implementation of cancel scopes.
Related
I use asyncio to write to mongo, using motor library.
When I have few bulk_writes, it works with no problem.
However, when I have many write in the same time, I get an exception RuntimeError: can't start new thread.
File "/usr/local/lib/python3.7/site-packages/motor/metaprogramming.py", line 77, in method **unwrapped_kwargs)
File "/usr/local/lib/python3.7/site-packages/motor/frameworks/asyncio/__init__.py", line 74 in run_on_executor
_EXECUTOR, functools.partial(fn, *args, **kwargs))
File "uvloop/loop.pyx", line 2702, in uvloop.loop.Loop.run_in_exector
File "/usr/local/lib/python3.7/concurrent/features/thread.py", line 160, in submit
self._adjust_thread_count()
File "/usr/local/lib/python3.7/concurrent/features/thread.py", line 181, in _adjust_thread_count
t.start()
File "usr/local/lib/python3.7/threading.py", line 847, in start
_start_new_thread(self._bootsrap, ())
RuntimeError: can't start new thread
I tried to change maxPoolSize, but it didn't work.
Important facts:
In my local computer, it works with no errors. However, in Openshift I have this problems.
In Openshift I run my code via gunicorn via gunicorn app:app --worker-class uvicorn.workers.UvicornWorker
In Openshift, when I have only one worker, it works. But with 2+ worker I have this problem.
I don't open many connection of AsyncIOMontorClient, I have only two at a time.
With pymongo with almost the same code, I have no error, but there is no asyncio support in pymongo.
Without the part of mongo, my code works with no problems.
Solved.
There is a limit of 1024 threads per openshift pod.
I am using multiprocessing.pool to work with the http server in python - it works great, but when I terminate, I get a slew of errors from all the spawnpoolworkers - and I'm just wondering how I avoid this.
My main code:
def run(self):
global pool
port = self.arguments.port
pool = multiprocessing.Pool( processes= self.arguments.threads)
with http.server.HTTPServer( ("", port), Handler ) as daemon:
print(f"serving on port {port}")
while True:
try:
daemon.handle_request()
except KeyboardInterrupt:
print("\nexiting")
pool.terminate()
pool.join()
return 0
I've tried doing nothing to the pool, I've tried doing pool.close() - I've tried not joining. But even if I just run that - never even access the port or call anything onto the pool, I still get a random list of things like this when I press control-c
Process SpawnPoolWorker-8:
Process SpawnPoolWorker-4:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/opt/homebrew/Cellar/python#3.10/3.10.1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/queues.py", line 365, in get
with self._rlock:
File "/opt/homebrew/Cellar/python#3.10/3.10.1/F
how do I exit the pool cleanly, with no errors, and with no output?
ok - I'm stupid - the control-c was also interrupting all the child processes. This fixed it:
def ignore_control_c():
signal.signal(signal.SIGINT, signal.SIG_IGN)
pool = multiprocessing.Pool( processes = self.arguments.threads, initializer = ignore_control_c )
I have a simple execution statement that updates if a condition is met. I understand a deadlock is when both process A and B fight for the same resources, and they both rely on another to finish.
However, I just have a simple UPDATE statement, and I can't find the other source of the deadlock - but I was just confused how this line can cause a deadlock statement:
await self.bot.pg_con.execute("UPDATE gameusers SET raid_pass = raid_pass + 1 WHERE raid_pass < 10")
File "/home/debian/.local/lib/python3.7/site-packages/asyncpg/pool.py", line 518, in execute
return await con.execute(query, *args, timeout=timeout)
File "/home/debian/.local/lib/python3.7/site-packages/asyncpg/connection.py", line 272, in execute
return await self._protocol.query(query, timeout)
File "asyncpg/protocol/protocol.pyx", line 316, in query
asyncpg.exceptions.DeadlockDetectedError: deadlock detected
DETAIL: Process 24326 waits for ShareLock on transaction 75790925; blocked by process 24240.
Process 24240 waits for ShareLock on transaction 75790924; blocked by process 24326.
How can it occur in a simple UPDATE statement?
Thanks for your time.
I have a coroutine that waiting for a signal rising :
#cocotb.coroutine
def wait_for_rise(self):
yield RisingEdge(self.dut.mysignal)
I'm launching it in my «main» test function like it :
mythread = cocotb.fork(wait_for_rise())
I want to stop it after a while even if no signal rise happen. I tryed to «kill» it:
mythread.kill()
But exception happen :
Send raised exception: 'RunningCoroutine' object has no attribute '_join'
File "/opt/cocotb/cocotb/decorators.py", line 121, in send
return self._coro.send(value)
File "/myproject.py", line 206, in i2c_read
wTXDRwthread.kill()
File "/opt/cocotb/cocotb/decorators.py", line 151, in kill
cocotb.scheduler.unschedule(self)
File "/opt/cocotb/cocotb/scheduler.py", line 453, in unschedule
if coro._join in self._trigger2coros:
Is there a solution to stop forked coroutine properly ?
This very much looks like it is the same problem as in https://github.com/potentialventures/cocotb/issues/650 - you can subscribe to the issue to be notified when its status changes.
I'm trying to make a worker run only one task at a time, then shutdown. I've got the shutdown part working correctly (some background here: celery trying shutdown worker by raising SystemExit in task_postrun signal but always hangs and the main process never exits), but when it shuts down, I'm getting an error:
[2013-02-13 12:19:05,689: CRITICAL/MainProcess] Couldn't ack 1, reason:AttributeError("'NoneType' object has no attribute 'method_writer'",)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/kombu/transport/base.py", line 104, in ack_log_error
self.ack()
File "/usr/local/lib/python2.7/site-packages/kombu/transport/base.py", line 99, in ack
self.channel.basic_ack(self.delivery_tag)
File "/usr/local/lib/python2.7/site-packages/amqplib/client_0_8/channel.py", line 1742, in basic_ack
self._send_method((60, 80), args)
File "/usr/local/lib/python2.7/site-packages/amqplib/client_0_8/abstract_channel.py", line 75, in _send_method
self.connection.method_writer.write_method(self.channel_id,
AttributeError: 'NoneType' object has no attribute 'method_writer'
Why is this happening? Not only does it not ack, but it also purges all of the other tasks that are left in the queue (big problem).
How do I fix this?
UPDATE
Below is the stack trace with everything updated (pip install -U kombu amqp amqplib celery):
[2013-02-13 11:58:05,357: CRITICAL/MainProcess] Internal error: AttributeError("'NoneType' object has no attribute 'method_writer'",)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/worker/__init__.py", line 372, in process_task
req.execute_using_pool(self.pool)
File "/usr/local/lib/python2.7/dist-packages/celery/worker/job.py", line 219, in execute_using_pool
timeout=task.time_limit)
File "/usr/local/lib/python2.7/dist-packages/celery/concurrency/base.py", line 137, in apply_async
**options)
File "/usr/local/lib/python2.7/dist-packages/celery/concurrency/base.py", line 27, in apply_target
callback(target(*args, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/celery/worker/job.py", line 333, in on_success
self.acknowledge()
File "/usr/local/lib/python2.7/dist-packages/celery/worker/job.py", line 439, in acknowledge
self.on_ack(logger, self.connection_errors)
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/base.py", line 98, in ack_log_error
self.ack()
File "/usr/local/lib/python2.7/dist-packages/kombu/transport/base.py", line 93, in ack
self.channel.basic_ack(self.delivery_tag)
File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 1562, in basic_ack
self._send_method((60, 80), args)
File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 57, in _send_method
self.connection.method_writer.write_method(
AttributeError: 'NoneType' object has no attribute 'method_writer'
Exiting in task_postrun is not recommended as task_postrun is executed outside of the "task body" error handling.
Exactly what happens when a task calls sys.exit is not well defined,
and actually it depends on the pool being used.
With multiprocessing the child process will simply be replaced by a new one.
In other pools the worker will shutdown, but this is something that is likely to change
so that it's consistent with multiprocessing behavior.
Calling exit outside of the task body is regarded as an internal error (crash).
The "task body" is whatever executes at task.__call__()
I think maybe a better solution for this would be to use a custom execution
strategy:
from celery.worker import strategy
from functools import wraps
#staticmethod
def shutdown_after_strategy(task, app, consumer):
default_handler = strategy.default(task, app, consumer)
def _shutdown_to_exit_after(fun):
#wraps(fun)
def _inner(*args, **kwargs):
try:
return fun(*args, **kwargs)
finally:
raise SystemExit()
return _inner
return _decorate_to_exit_after(default_handler)
#celery.task(Strategy=shutdown_after_strategy)
def shutdown_after():
print('will shutdown after this')
This isn't exactly beautiful, but the execution strategy is there to optimize
task execution and not to be easily extendable (the worker "precompiles" the execution
path for each task type by caching Task.Strategy)
In Celery 3.1 you can extend the worker and consumer using "bootsteps", so likely
there will be a pretty solution then.