gunicorn threads getting killed silently - python

gunicorn version 19.9.0
I have the following gunicorn config:
accesslog = "access.log"
worker_class = 'sync'
workers = 1
worker_connections = 1000
timeout = 300
graceful_timeout = 300
keepalive = 300
proc_name = 'server'
bind = '0.0.0.0:8080'
name = 'server.py'
preload = True
log_level = "info"
threads = 7
max_requests = 0
backlog = 100
As you can see, the server is configured to run 7 threads.
The server is started with:
gunicorn -c gunicorn_config.py server:app
Here are the per-thread request counts (number of log lines per thread ID) from our log file at the beginning of the run, with the last line being the main server thread:
10502 140625414080256
10037 140624842843904
9995 140624859629312
9555 140625430865664
9526 140624851236608
9409 140625405687552
2782 140625422472960
6 140628359804736
So 7 threads are processing the requests. (Already we can see that thread 140625422472960 is processing substantially fewer requests than the other threads.)
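Counts like these can be produced from the access log with a small script; a sketch only, assuming the thread ID is logged as the last whitespace-separated field of each line (adjust the field index to your log format):

# Count access-log lines per thread ID; assumes the thread ID is the
# last whitespace-separated field on each line.
from collections import Counter

counts = Counter()
with open("access.log") as f:
    for line in f:
        fields = line.split()
        if fields:
            counts[fields[-1]] += 1

for thread_id, n in counts.most_common():
    print(n, thread_id)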
But later in the log, thread 140625422472960 just vanishes, and the log file has only:
19602 140624859629312
18861 140625405687552
18766 140624851236608
18765 140624842843904
12523 140625414080256
2111 140625430865664
(excluding the main thread here)
From the server logs we could see that the thread received a request and started processing it, but never finished. The client received no response either.
There is no error/warning in the log file, nor in stderr.
And after running the app a little longer, two more threads are gone:
102 140624842843904
102 140624851236608
68 140624859629312
85 140625405687552
How can I debug this?

Digging further into the stderr logs, I finally found an exception stack trace like this:
[2018-11-04 17:57:55 +0330] [31] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
This is caused by this gunicorn bug.
An interim solution, until the bug is fixed, is to monkey-patch gunicorn as done by asantoni.
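The linked patch isn't reproduced here, but the general shape of such a monkey patch is to wrap the worker's request-handling entry point so an unexpected OSError is swallowed instead of killing the thread. A sketch only, assuming gunicorn 19.x's SyncWorker.handle(self, listener, client, addr) signature; the names below are mine, not asantoni's:

# Sketch of a gunicorn monkey patch; apply it before workers start,
# e.g. at the top of gunicorn_config.py.
import errno
from gunicorn.workers.sync import SyncWorker

_original_handle = SyncWorker.handle

def _patched_handle(self, listener, client, addr):
    try:
        _original_handle(self, listener, client, addr)
    except OSError as e:
        # 107 (ENOTCONN): the client hung up before the handshake finished;
        # ignore it rather than letting the exception kill the thread.
        if e.errno != errno.ENOTCONN:
            raise

SyncWorker.handle = _patched_handle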

Related

Error 111 connection refused (Python, celery, redis)

I tried to get all the active/scheduled/reserved tasks in redis:
from celery.task.control import inspect
inspect_obj = inspect()
inspect_obj.active()
inspect_obj.scheduled()
inspect_obj.reserved()
But I was greeted with the following list of errors.
My virtual environment: HubblerAPI.
I am running this from the EC2 console.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/celery/app/control.py", line 81, in active
    return self._request('dump_active', safe=safe)
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/celery/app/control.py", line 71, in _request
    timeout=self.timeout, reply=True,
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/celery/app/control.py", line 316, in broadcast
    limit, callback, channel=channel,
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/pidbox.py", line 283, in _broadcast
    chan = channel or self.connection.default_channel
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/connection.py", line 771, in default_channel
    self.connection
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/connection.py", line 756, in connection
    self._connection = self._establish_connection()
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/connection.py", line 711, in _establish_connection
    conn = self.transport.establish_connection()
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/kombu/transport/pyamqp.py", line 116, in establish_connection
    conn = self.Connection(**opts)
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/amqp/connection.py", line 165, in __init__
    self.transport = self.Transport(host, connect_timeout, ssl)
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/amqp/connection.py", line 186, in Transport
    return create_transport(host, connect_timeout, ssl)
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/amqp/transport.py", line 299, in create_transport
    return TCPTransport(host, connect_timeout)
  File "/home/ec2-user/HubblerAPI/local/lib/python3.4/site-packages/amqp/transport.py", line 95, in __init__
    raise socket.error(last_err)
OSError: [Errno 111] Connection refused
My celery config file is as follows:
BROKER_TRANSPORT = 'redis'
BROKER_TRANSPORT_OPTIONS = {
    'queue_name_prefix': 'dev-',
    'wait_time_seconds': 10,
    # The polling interval decides the number of seconds to sleep
    # between unsuccessful polls.
    'polling_interval': 30,
    # If a task is not acknowledged within the visibility_timeout,
    # the task will be redelivered to another worker and executed.
    'visibility_timeout': 3600 * 5,
}
CELERY_MESSAGES_DB = 6
BROKER_URL = "redis://%s:%s/%s" % (AWS_REDIS_ENDPOINT, AWS_REDIS_PORT,
                                   CELERY_MESSAGES_DB)
What am I doing wrong here? The error log suggests that it's not using the Redis broker.
It looks like your Python code doesn't pick up your config, since it is attempting to use RabbitMQ's amqp protocol instead of the configured broker.
I suggest the following:
https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html
Your configs look similar to Django configs for Celery, yet it doesn't seem you are using Celery with Django:
https://docs.celeryq.dev/en/latest/django/first-steps-with-django.html
The issue was using BROKER_URL instead of CELERY_BROKER_URL in settings.py. Celery wasn't finding the URL and was defaulting to the RabbitMQ port instead of the Redis port.
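In settings.py terms, the fix is the rename below (a sketch; the AWS_* names and CELERY_MESSAGES_DB come from the question's own config):

# settings.py -- Django-style Celery config (namespace='CELERY').
# Celery's Django integration only reads CELERY_-prefixed settings, so the
# old 3.x name BROKER_URL is ignored and Celery falls back to the default
# amqp:// broker on localhost:5672.
CELERY_BROKER_URL = "redis://%s:%s/%s" % (
    AWS_REDIS_ENDPOINT, AWS_REDIS_PORT, CELERY_MESSAGES_DB)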

Celery upgrade (3.1->4.1) - Connection reset by peer

We have been working with Celery for the last year, with ~15 workers, each one defined with a concurrency between 1 and 4.
Recently we upgraded our Celery from v3.1 to v4.1.
Now we are getting the following error in each of the workers' logs. Any idea what can cause such an error?
2017-08-21 18:33:19,780 94794 ERROR Control command error: error(104, 'Connection reset by peer') [file: pidbox.py, line: 46]
Traceback (most recent call last):
  File "/srv/dy/venv/lib/python2.7/site-packages/celery/worker/pidbox.py", line 42, in on_message
    self.node.handle_message(body, message)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 129, in handle_message
    return self.dispatch(**body)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 112, in dispatch
    ticket=ticket)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 135, in reply
    serializer=self.mailbox.serializer)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 265, in _publish_reply
    **opts
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
    exchange_name, declare,
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
    mandatory=mandatory, immediate=immediate,
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/channel.py", line 1748, in _basic_publish
    (0, exchange, routing_key, mandatory, immediate), msg
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 64, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/method_framing.py", line 178, in write_frame
    write(view[:offset])
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/transport.py", line 272, in write
    self._write(s)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer
BTW, our tasks are of the form:
@app.task(name='EXAMPLE_TASK',
          bind=True,
          base=ConnectionHolderTask)
def example_task(self, arg1, arg2, **kwargs):
    # task code
We are also having massive issues with celery... I spend 20% of my time just dancing around weird idle-hang/crash issues with our workers, sigh.
We had a similar case that was caused by high concurrency combined with a high worker_prefetch_multiplier; as it turns out, fetching thousands of tasks is a good way to frack the connection.
If that's not the case: try disabling the broker pool by setting broker_pool_limit to None.
Just some quick ideas that might (hopefully) help :-)
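For reference, those two knobs in Celery 4's lowercase setting names; a sketch only, with illustrative values rather than tuned recommendations:

from celery import Celery

app = Celery('proj')  # broker/backend config omitted; as in your project

# Lower the prefetch so a worker doesn't grab thousands of tasks at once.
app.conf.worker_prefetch_multiplier = 1

# Disable the broker connection pool entirely.
app.conf.broker_pool_limit = None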

Twisted plugin needs to fail fast if port is taken

I have a twistd plugin that listens on a port and does very simple things. The problem is that when I start it and the port is not available, it just sits there with the process running but doing nothing. I need the process to exit immediately in this case so the larger system can notice and deal with the problem.
I have code like this:
def makeService(options):
    root = Resource()  # Not what I actually have...
    factory = server.Site(root)
    server_string = b'tcp:{0}:interface={1}'.format(options['port'], options['interface'])
    endpoint = endpoints.serverFromString(reactor, server_string)
    service = internet.StreamServerEndpointService(endpoint, factory)
    return service
This results in:
[2016-12-19T11:42:21-0600] [info] [3082] [-] Log opened.
[2016-12-19T11:42:21-0600] [info] [3082] [-] twistd 15.5.0 (/home/matthew/code-venvs/wgcbap/bin/python 2.7.6) starting up.
[2016-12-19T11:42:21-0600] [info] [3082] [-] reactor class: twisted.internet.epollreactor.EPollReactor.
[2016-12-19T11:42:21-0600] [critical] [3082] [-] Unhandled Error
Traceback (most recent call last):
  File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/scripts/_twistd_unix.py", line 394, in startApplication
    service.IService(application).privilegedStartService()
  File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/application/service.py", line 278, in privilegedStartService
    service.privilegedStartService()
  File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/application/internet.py", line 352, in privilegedStartService
    self._waitingForPort = self.endpoint.listen(self.factory)
  File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/internet/endpoints.py", line 457, in listen
    interface=self._interface)
  --- <exception caught here> ---
  File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 121, in execute
    result = callable(*args, **kw)
  File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 478, in listenTCP
    p.startListening()
  File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/internet/tcp.py", line 984, in startListening
    raise CannotListenError(self.interface, self.port, le)
twisted.internet.error.CannotListenError: Couldn't listen on 127.0.0.1:9999: [Errno 98] Address already in use.
And it continues to run, doing nothing....
Adding a line service._raiseSynchronously = True just above the return works, but seems to be undocumented and feels dirty.
Is there an approved way to do this?
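For what it's worth, here is that workaround shown in context; a sketch only, and _raiseSynchronously is a private, undocumented attribute, so treat it accordingly:

from twisted.application import internet
from twisted.internet import endpoints, reactor
from twisted.web import server
from twisted.web.resource import Resource

def makeService(options):
    root = Resource()  # placeholder, as in the question
    factory = server.Site(root)
    server_string = 'tcp:{0}:interface={1}'.format(options['port'], options['interface'])
    endpoint = endpoints.serverFromString(reactor, server_string)
    service = internet.StreamServerEndpointService(endpoint, factory)
    # Private/undocumented: makes privilegedStartService re-raise listen
    # errors (e.g. CannotListenError) so twistd exits instead of idling.
    service._raiseSynchronously = True
    return service

Alternatively, twisted.application.strports.service builds the same StreamServerEndpointService and, at least in the Twisted versions I have looked at, sets this flag itself, so it may be the more "approved" route.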

Celery kombu fails after self.connections.acquire

After my Celery service has been running for 7-10 days, I receive this exception out of nowhere, and it causes my tasks not to be processed. A restart of Celery fixes the problem.
INTERNAL ERROR: RuntimeError('Acquire on closed pool',)
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 253, in trace_task
    I, R, state, retval = on_error(task_request, exc, uuid)
  File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 201, in on_error
    R = I.handle_error_state(task, eager=eager)
  File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 85, in handle_error_state
    }[self.state](task, store_errors=store_errors)
  File "/usr/lib/python2.7/dist-packages/celery/app/trace.py", line 118, in handle_failure
    req.id, exc, einfo.traceback, request=req,
  File "/usr/lib/python2.7/dist-packages/celery/backends/base.py", line 121, in mark_as_failure
    traceback=traceback, request=request)
  File "/usr/lib/python2.7/dist-packages/celery/backends/amqp.py", line 124, in store_result
    with self.app.amqp.producer_pool.acquire(block=True) as producer:
  File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 868, in acquire
    R = self.prepare(R)
  File "/usr/lib/python2.7/dist-packages/kombu/pools.py", line 63, in prepare
    conn = self._acquire_connection()
  File "/usr/lib/python2.7/dist-packages/kombu/pools.py", line 38, in _acquire_connection
    return self.connections.acquire(block=True)
  File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 859, in acquire
    raise RuntimeError('Acquire on closed pool')
RuntimeError: Acquire on closed pool
Software versions
software -> celery:3.1.20 (Cipater) kombu:3.0.35 py:2.7.6
            billiard:3.3.0.22 py-amqp:1.4.9
platform -> system:Linux arch:64bit, ELF imp:CPython
loader   -> celery.loaders.default.Loader
settings -> transport:amqp results:amqp

CELERY_ACCEPT_CONTENT: ['json', 'pickle', 'yaml']
CELERY_ENABLE_UTC: True
CELERY_IGNORE_RESULT: False
CELERY_IMPORTS:
    ('catalogue.app.voice.cluster.deploy_cluster',
     'catalogue.app.common.install_uc',
     'hypervisor.app.deploy_esx',
     'hypervisor.app.vm_operations',
     'tools.deploy_tools')
CELERYD_CHDIR: '/usr/local/src/imbue/application/app'
CELERY_TASK_RESULT_EXPIRES: 18000
CELERY_RESULT_PERSISTENT: True
CELERY_TIMEZONE: 'US/Eastern'
BROKER_URL: 'amqp://******:********@rabbitmq:5672//'
CELERY_RESULT_BACKEND: 'amqp'
Only workaround now is to restart.
Ubuntu 14.04 2 GB RAM/2 CPU/40 GB HDD
This looks like a bug in Celery. Asksol fixed it a few days back.
You can install Celery from source and try it. If it is still causing problems, please create a new issue on GitHub.
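Until a release containing the fix is out, one common way to try it (assuming pip and git are available; pin a specific commit or tag if you need reproducibility) is to install straight from the repository:
pip install git+https://github.com/celery/celery.git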

Bottle causing a Python program to crash? Uses a simple implementation of threads, queues, and forking

I've tried to simplify this as much as possible, but I'm still getting an error. I have a simple HTTP server (Bottle) that, upon receiving a POST request, executes a function which is supposed to quickly fork itself. The parent process simply returns a job ID and closes, while the child process continues to process the same data (a list of URLs). I've removed all the input and output functions and hard-coded the data, but my program still crashes. The funny part is that when I alter the program to run directly at the command line, rather than starting an HTTP server and waiting for Bottle to execute it, everything works fine!
#!/usr/bin/python
#This is a comment
import sys, time, bottle, os
from threading import Thread
from Queue import Queue
from bottle import route, run, request, abort

num_fetch_threads = 2
url_queue = Queue()

def fetchURLContent(i, q):
    while True:
        #print '%s: Looking for URLs in queue' % i
        url = q.get()
        #print 'URL found: %s' % url[0]
        q.task_done()
        time.sleep(1)

@route('/', method='POST')  # or @route('/login', method='POST')
def main():
    urls = ['http://www.yahoo.com', 'http://www.google.com']
    newpid = os.fork()
    if newpid == 0:
        for i in range(num_fetch_threads):
            worker = Thread(target=fetchURLContent, args=(i, url_queue))
            worker.setDaemon(True)
            worker.start()
        for url in urls:
            print 'Queuing: ', url
            url_queue.put(url)
        time.sleep(2)
        print 'main thread waiting...'
        url_queue.join()
        print 'Done'
    else:
        print "Your job id is 5"
        return

def webServer():
    run(host='33.33.33.10', port=8080)

if __name__ == "__main__":
    print 'Listening on 8080...'
    webServer()
The error message I get is as follows:
Listening on 8080...
Bottle v0.11.3 server starting up (using WSGIRefServer())...
Listening on http://33.33.33.10:8080/
Hit Ctrl-C to quit.
33.33.33.1 - - [19/Oct/2012 21:21:24] "POST / HTTP/1.1" 200 0
Traceback (most recent call last):
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 86, in run
    self.finish_response()
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 128, in finish_response
    self.finish_content()
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 246, in finish_content
    self.send_headers()
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 268, in send_headers
    self.send_preamble()
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 189, in send_preamble
    self._write('HTTP/%s %s\r\n' % (self.http_version,self.status))
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 389, in _write
    self.stdout.write(data)
  File "/usr/lib/python2.7/socket.py", line 324, in write
    self.flush()
  File "/usr/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
----------------------------------------
Exception happened during processing of request from ('33.33.33.1', 57615)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 310, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 640, in __init__
    self.finish()
  File "/usr/lib/python2.7/SocketServer.py", line 693, in finish
    self.wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
----------------------------------------
Any ideas?
Your main() function terminates immediately without returning anything. Bottle writes an empty HTTP response to the socket and the web server closes the connection.
Your forked-off process stays in main() a bit longer, but then terminates too and causes Bottle to write another empty response to the already-closed socket. That's the error you get (broken pipe).
Forking at that point cannot work. HTTP does not allow more than one response per request. You can either block until all work is done and then send a response, or send the response immediately and do the work in a background thread, as sketched below.
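A minimal sketch of the second option (respond first, work later). The job-ID handling is made up for illustration and the URL "fetching" is a stub; the point is that the handler returns exactly one response and never forks:

import threading
import time
from bottle import route, run

def process_urls(job_id, urls):
    # Stand-in for the real fetching work; runs outside the request cycle.
    for url in urls:
        time.sleep(1)

@route('/', method='POST')
def main():
    urls = ['http://www.yahoo.com', 'http://www.google.com']
    job_id = 5  # illustrative; a real app would generate a unique ID
    worker = threading.Thread(target=process_urls, args=(job_id, urls))
    worker.daemon = True  # don't block interpreter shutdown
    worker.start()
    return "Your job id is %d" % job_id  # exactly one response per request

run(host='33.33.33.10', port=8080)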
