I use asyncio to write to MongoDB, using the Motor library.
When I have a few bulk_writes, it works with no problem.
However, when I have many writes at the same time, I get the exception RuntimeError: can't start new thread.
File "/usr/local/lib/python3.7/site-packages/motor/metaprogramming.py", line 77, in method **unwrapped_kwargs)
File "/usr/local/lib/python3.7/site-packages/motor/frameworks/asyncio/__init__.py", line 74 in run_on_executor
_EXECUTOR, functools.partial(fn, *args, **kwargs))
File "uvloop/loop.pyx", line 2702, in uvloop.loop.Loop.run_in_exector
File "/usr/local/lib/python3.7/concurrent/features/thread.py", line 160, in submit
self._adjust_thread_count()
File "/usr/local/lib/python3.7/concurrent/features/thread.py", line 181, in _adjust_thread_count
t.start()
File "usr/local/lib/python3.7/threading.py", line 847, in start
_start_new_thread(self._bootsrap, ())
RuntimeError: can't start new thread
I tried to change maxPoolSize, but it didn't work.
Important facts:
On my local computer it works with no errors; however, on OpenShift I have this problem.
On OpenShift I run my code with gunicorn: gunicorn app:app --worker-class uvicorn.workers.UvicornWorker
On OpenShift, when I have only one worker, it works, but with 2+ workers I get this problem.
I don't open many AsyncIOMotorClient connections; I have only two at a time.
With PyMongo and almost the same code I get no error, but PyMongo has no asyncio support.
Without the MongoDB part, my code works with no problems.
Solved.
There is a limit of 1024 threads per OpenShift pod. Motor dispatches each operation to a thread-pool executor (the run_on_executor call in the traceback), so with several gunicorn workers issuing many bulk_writes at once the pod apparently hit that limit and Python could no longer start new threads.
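This wasn't part of the original answer, but one way to stay under such a limit is to cap how many bulk_writes are in flight at once, since the executor only spawns threads lazily as work is submitted. A minimal sketch, assuming one shared client; the database, collection, and batch names are made up:
import asyncio
from motor.motor_asyncio import AsyncIOMotorClient

client = AsyncIOMotorClient("mongodb://localhost:27017")
collection = client.mydb.mycollection

# Cap concurrent bulk_writes so the process never needs more executor
# threads than the pod's thread limit allows.
write_semaphore = asyncio.Semaphore(50)

async def bounded_bulk_write(operations):
    async with write_semaphore:
        return await collection.bulk_write(operations)

async def write_all(batches):
    # batches: an iterable of lists of pymongo operations (e.g. InsertOne).
    await asyncio.gather(*(bounded_bulk_write(batch) for batch in batches))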
Related
I have scripts running 24/7 that sometimes get stuck when a thread in concurrent.futures gets no response to a request.
The hanging-threads 2.0.5 module prints out which thread hangs and why.
The print looks something like this:
Thread 139646566659840 "ThreadPoolExecutor-666849_1" hangs -
File "/usr/lib/python3.9/threading.py", line 912, in _bootstrap
self._bootstrap_inner()
File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
self.run()
File "/usr/lib/python3.9/threading.py", line 892, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 77, in _worker
work_item.run()
File "/usr/lib/python3.9/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
How can I, instead of just printing out the hanging threads and files, raise an exception when a thread is not responding in a given time? The script should just restart itself if hanging occurs, instead of waiting for a response.
I have tried with a timeout, but concurrent futures cannot be cancelled while running.
concurrent futures cannot be cancelled while running
This is your problem. A hanging thread is still 'running'. Cancelling it from outside is not possible.
Thus you have two options:
switch to something which can be cancelled, like a ProcessPoolExecutor, or
rewrite the blocking code so it fails.
Since you say 'response to a request': if this is a network request, and you are early enough (or frustrated enough) in the dev cycle, I thoroughly recommend switching to a concurrency framework such as asyncio. This is exactly what it was developed for. In particular, you may be interested in trio's implementation of cancel scopes.
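To make the asyncio suggestion concrete, here is a minimal sketch (the coroutine is a stand-in, not real request code): unlike a running concurrent.futures future, an awaited coroutine can be cancelled, so a deadline actually aborts the hanging call.
import asyncio

async def slow_request():
    # Stand-in for a network request that never answers.
    await asyncio.sleep(3600)
    return "response"

async def fetch_with_deadline():
    try:
        # wait_for cancels slow_request() once the timeout expires,
        # which is exactly what cannot be done to a running thread.
        return await asyncio.wait_for(slow_request(), timeout=10)
    except asyncio.TimeoutError:
        raise RuntimeError("request hung; restart the job")

asyncio.run(fetch_with_deadline())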
I'm trying to build a web UI for solving an optimization problem, using Flask as the web framework, Pyomo as the optimization library, and CBC as the optimization engine. The error appears when I call the solver while the web server is running.
If I run only the optimization task, I get no error. The problem seems to occur only when it runs under the Flask web server.
The error occurs when Flask calls this line: solver = pyomo.SolverFactory('cbc', executable='CBC_PATH')
Error when running the web server:
File "C:\Users\siwapolt\Envs\venv\lib\site-packages\pyomo\opt\base\solvers.py", line 582, in solve
_status = self._apply_solver()
File "C:\Users\siwapolt\Envs\venv\lib\site-packages\pyomo\opt\solver\shellcmd.py", line 244, in _apply_solver
self._rc, self._log = self._execute_command(self._command)
File "C:\Users\siwapolt\Envs\venv\lib\site-packages\pyomo\opt\solver\shellcmd.py", line 308, in _execute_command
define_signal_handlers = self._define_signal_handlers
File "C:\Users\siwapolt\Envs\venv\lib\site-packages\pyutilib\subprocess\processmngr.py", line 545, in run_command
= signal.signal(signal.SIGINT, handler)
File "c:\users\siwapolt\appdata\local\continuum\anaconda3\Lib\signal.py", line 47, in signal
handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread
Yes, as long as you have PyUtilib 5.6.3, you have this fix. That said, signal handlers are still on by default. If you want to turn them off, you need to:
import pyutilib.subprocess.GlobalData
pyutilib.subprocess.GlobalData.DEFINE_SIGNAL_HANDLERS_DEFAULT = False
References: https://github.com/PyUtilib/pyutilib/issues/31#issuecomment-382479024
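For context, a rough sketch of where that snippet might sit in the Flask app; the route, the build_model() helper, and CBC_PATH are placeholders, not from the question:
import pyutilib.subprocess.GlobalData
pyutilib.subprocess.GlobalData.DEFINE_SIGNAL_HANDLERS_DEFAULT = False

from flask import Flask
import pyomo.environ as pyomo

app = Flask(__name__)

@app.route("/solve")
def solve():
    model = build_model()  # placeholder for the actual Pyomo model setup
    # With the flag set above, solver.solve() no longer tries to install
    # a SIGINT handler from the request-handling (non-main) thread.
    solver = pyomo.SolverFactory('cbc', executable='CBC_PATH')
    results = solver.solve(model)
    return str(results.solver.status)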
I have a Celery worker, running on an Ubuntu 14.04 server, that is reading from and writing to an MSSQL Server database using pyodbc and the FreeTDS driver. When the SQL Server goes down, the function fails as expected and the Celery worker starts trying to clean up and get ready for the next task. At this point the worker calls Django's connection.close() method. This method appears to send a command to roll back any incomplete transactions. Since the server is down, this throws an exception that is not caught by the Celery worker. The worker then hangs and neither releases the task nor moves on to the next one.
I tried overriding the on_failure and after_return methods for the task and calling connection.close() there (as specified in other answers), but that didn't work. I suspect this is because when I call connection.close() there it hits the same issue and just bubbles the exception up, or because Celery's cleanup code runs before those two methods get called.
Any ideas on how to either catch this exception before it gets to Celery, or avoid it altogether?
Below is the stack trace of the exception:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 283, in trace_task
state, retval, uuid, args, kwargs, None,
File "/var/www/cortex/corespring/tasks.py", line 13, in after_return
connections['xxx'].close()
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/init.py", line 317, in close
self.connection.close()
File "/usr/local/lib/python2.7/dist-packages/pyodbc.py", line 2642, in close
self.rollback()
File "/usr/local/lib/python2.7/dist-packages/pyodbc.py", line 2565, in rollback
check_success(self, ret)
File "/usr/local/lib/python2.7/dist-packages/pyodbc.py", line 987, in check_success
ctrl_err(SQL_HANDLE_DBC, ODBC_obj.dbc_h, ret, ODBC_obj.ansi)
File "/usr/local/lib/python2.7/dist-packages/pyodbc.py", line 965, in ctrl_err
raise DatabaseError(state,err_text)
DatabaseError: (u'08S01', u'[08S01] [FreeTDS][SQL Server]Write to the server failed')
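For what it's worth, the direction the question is pointing at would look roughly like the sketch below: swallow the failure inside after_return so it never reaches Celery's cleanup. This is only an illustration (the question reports that a variant of this did not work); the connection alias 'xxx' mirrors the traceback and the base-class name is invented.
from celery import Task
from django.db import connections

class CloseSafelyTask(Task):
    abstract = True

    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        try:
            # close() triggers a rollback; with the server down this raises
            # a DatabaseError, so swallow it and let the worker move on.
            connections['xxx'].close()
        except Exception:
            pass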
I'm using Django 1.6 and Django-ImageKit 3.2.1.
I'm trying to generate images asynchronously with ImageKit. Async image generation works locally but not on the production server.
I'm using Celery and I've tried both:
IMAGEKIT_DEFAULT_CACHEFILE_BACKEND = 'imagekit.cachefiles.backends.Async'
IMAGEKIT_DEFAULT_CACHEFILE_BACKEND = 'imagekit.cachefiles.backends.Celery'
Using the Simple backend (synchronous) instead of Async or Celery works fine on the production server. So I don't understand why the asynchronous backend gives me the following ImportError (pulled from the Celery log):
[2014-04-05 21:51:26,325: CRITICAL/MainProcess] Can't decode message body: DecodeError(ImportError('No module named s3utils',),) [type:u'application/x-python-serialize' encoding:u'binary' headers:{}]
body: '\x80\x02}q\x01(U\x07expiresq\x02NU\x03utcq\x03\x88U\x04argsq\x04cimagekit.cachefiles.backends\nCelery\nq\x05)\x81q\x06}bcimagekit.cachefiles\nImageCacheFile\nq\x07)\x81q\x08}q\t(U\x11cachefile_backendq\nh\x06U\x12ca$
Traceback (most recent call last):
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/messaging.py", line 585, in _receive_callback
decoded = None if on_m else message.decode()
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/message.py", line 142, in decode
self.content_encoding, accept=self.accept)
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 184, in loads
return decode(data)
File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
self.gen.throw(type, value, traceback)
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 59, in _reraise_errors
reraise(wrapper, wrapper(exc), sys.exc_info()[2])
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 55, in _reraise_errors
yield
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 184, in loads
return decode(data)
File "/opt/python/run/venv/lib/python2.6/site-packages/kombu/serialization.py", line 64, in pickle_loads
return load(BytesIO(s))
DecodeError: No module named s3utils
s3utils is what defines my AWS S3 bucket paths. I'll post it if need be, but the strange thing, I think, is that the synchronous backend has no problem importing s3utils while the asynchronous one does, and it fails ONLY on the production server, not locally.
I'd be so grateful for any help debugging this. I've been wrestling with it for days. I'm still learning Django and Python, so I'm hoping this is a simple mistake on my part. My Google-fu has failed me.
As I hinted at in my comment above, this kind of thing is usually caused by forgetting to restart the worker.
It's a common gotcha with Celery. The workers are a separate process from your web server so they have their own versions of your code loaded. And just like with your web server, if you make a change to your code, you need to reload so it sees the change. The web server talks to your worker not by directly running code, but by passing serialized messages via the broker, which will say something like "call the function do_something()". Then the worker will read that message and—and here's the tricky part—call its version of do_something(). So even if you restart your webserver (so that it has a new version of your code), if you forget to reload the worker (which is what actually calls the function), the old version of the function will be called. In other words, you need to restart the worker any time you make a change to your tasks.
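To make that concrete, a tiny hypothetical layout (the module, app, and task names are invented):
# tasks.py -- imported by BOTH the web process and the Celery worker.
from celery import Celery

app = Celery("proj", broker="amqp://localhost//")

@app.task
def do_something():
    # If you edit this body and only restart the web server, the worker
    # keeps running the version it loaded at startup: the broker only
    # carries the message "run do_something", never the code itself.
    return "new behaviour"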
You might want to check out the autoreload option for development. It could save you some headaches.
I've been trying to use RabbitMQ from within my gevent program with the Pika library (monkey-patched by gevent), but gevent keeps randomly throwing a timeout error.
What should I do? Is there another library I could use?
WARNING:root:Document not found, retrying primary.
Traceback (most recent call last):
...
File "/usr/lib/python2.7/dist-packages/pika/adapters/blocking_connection.py", line 32, in __init__
BaseConnection.__init__(self, parameters, None, reconnection_strategy)
File "/usr/lib/python2.7/dist-packages/pika/adapters/base_connection.py", line 50, in __init__
reconnection_strategy)
File "/usr/lib/python2.7/dist-packages/pika/connection.py", line 170, in __init__
self._connect()
File "/usr/lib/python2.7/dist-packages/pika/connection.py", line 228, in _connect
self.parameters.port or spec.PORT)
File "/usr/lib/python2.7/dist-packages/pika/adapters/blocking_connection.py", line 44, in _adapter_connect
self._handle_read()
File "/usr/lib/python2.7/dist-packages/pika/adapters/base_connection.py", line 151, in _handle_read
data = self.socket.recv(self._suggested_buffer_size)
File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 427, in recv
wait_read(sock.fileno(), timeout=self.timeout, event=self._read_event)
File "/usr/lib/python2.7/dist-packages/gevent/socket.py", line 169, in wait_read
switch_result = get_hub().switch()
File "/usr/lib/python2.7/dist-packages/gevent/hub.py", line 164, in switch
return greenlet.switch(self)
timeout: timed out
Pika is not ideally suited to use with gevent, because Pika implements its own asynchronous connection to RabbitMQ based on non-blocking sockets. This just does not fit well with gevent's implementation of the same.
You may want to consider using py-amqplib or Kombu instead.
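To give a feel for the alternative, here is a minimal Kombu sketch (the broker URL and queue name are placeholders); Kombu talks to RabbitMQ over ordinary blocking sockets, which tend to cooperate better with gevent's monkey patching:
from kombu import Connection

with Connection("amqp://guest:guest@localhost:5672//") as conn:
    queue = conn.SimpleQueue("documents")

    # Publish a message.
    queue.put({"document_id": 42})

    # Consume one message, waiting up to 5 seconds.
    message = queue.get(block=True, timeout=5)
    print(message.payload)
    message.ack()

    queue.close()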
I'm also having timeout problems using Pika in a Django/Gunicorn application. I played with raising connection_attempts and increasing the timeout, but RabbitMQ always closed the connection with a handshake error. The latter seems to indicate that Pika never transmitted any data on the socket.
The cause of the timeouts could be this libevent bug; at least in my environment, the script attached to the bug reproduces the issue.
You could try upgrading to gevent>=1.0 (at the time of writing not released yet):
wget http://gevent.googlecode.com/files/gevent-1.0b4.tar.gz
pip install gevent-1.0b4.tar.gz