Nameko/RabbitMQ: OSError: Server unexpectedly closed connection - python

I have two nameko services that communicate using RPC via RabbitMQ. Locally, with docker-compose, everything works fine. Then I deployed everything to a Kubernetes/Istio cluster on DigitalOcean and started getting the following errors. They recur continuously, once every 10, 20, or 60 minutes. Communication between the services works fine (before and after the reconnect, I suppose), but the logs are cluttered with these unexpected reconnections, which should not happen.
Helm RabbitMQ configuration file
I tried to increase RAM and CPU configuration (to the values in the configuration files above: 512Mb and 400m) but still have the same behavior.
NB: I don't touch the services after deployment. No messages are sent and no requests are made, yet the error first appears after around 60 minutes. When I make requests they succeed, but these errors still eventually show up in the logs afterwards.
Nameko service log:
"Connection to broker lost, trying to re-establish connection...",
"exc_info": "Traceback (most recent call last):
File \"/usr/local/lib/python3.6/site-packages/kombu/mixins.py\", line 175, in run for _ in self.consume(limit=None, **kwargs):
File \"/usr/local/lib/python3.6/site-packages/kombu/mixins.py\", line 197, in consume conn.drain_events(timeout=safety_interval)
File \"/usr/local/lib/python3.6/site-packages/kombu/connection.py\", line 323, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File \"/usr/local/lib/python3.6/site-packages/kombu/transport/pyamqp.py\", line 103, in drain_events
return connection.drain_events(**kwargs)
File \"/usr/local/lib/python3.6/site-packages/amqp/connection.py\", line 505, in drain_events
while not self.blocking_read(timeout):
File \"/usr/local/lib/python3.6/site-packages/amqp/connection.py\", line 510, in blocking_read\n frame = self.transport.read_frame()
File \"/usr/local/lib/python3.6/site-packages/amqp/transport.py\", line 252, in read_frame
frame_header = read(7, True)
File \"/usr/local/lib/python3.6/site-packages/amqp/transport.py\", line 446, in _read
raise IOError('Server unexpectedly closed connection')
OSError: Server unexpectedly closed connection"}
{"name": "kombu.mixins", "asctime": "29/12/2019 20:22:54", "levelname": "INFO", "message": "Connected to amqp://user:**@rabbit-rabbitmq:5672//"}
RabbitMQ log
2019-12-29 20:22:54.563 [warning] <0.718.0> closing AMQP connection <0.718.0> (127.0.0.1:46504 -> 127.0.0.1:5672, vhost: '/', user: 'user'):
client unexpectedly closed TCP connection
2019-12-29 20:22:54.563 [warning] <0.705.0> closing AMQP connection <0.705.0> (127.0.0.1:46502 -> 127.0.0.1:5672, vhost: '/', user: 'user'):
client unexpectedly closed TCP connection
2019-12-29 20:22:54.681 [info] <0.3424.0> accepting AMQP connection <0.3424.0> (127.0.0.1:43466 -> 127.0.0.1:5672)
2019-12-29 20:22:54.689 [info] <0.3424.0> connection <0.3424.0> (127.0.0.1:43466 -> 127.0.0.1:5672): user 'user' authenticated and granted access to vhost '/'
2019-12-29 20:22:54.690 [info] <0.3431.0> accepting AMQP connection <0.3431.0> (127.0.0.1:43468 -> 127.0.0.1:5672)
2019-12-29 20:22:54.696 [info] <0.3431.0> connection <0.3431.0> (127.0.0.1:43468 -> 127.0.0.1:5672): user 'user' authenticated and granted access to vhost '/'
UPD:
Rabbit pod yaml

The issue is the Istio proxy being injected as a sidecar container into the RabbitMQ pod. Exclude the RabbitMQ pod from Istio sidecar injection and it should work.

Have you tried increasing the heartbeat of the connection? It is likely that your connection gets terminated at a lower level due to inactivity.
Also make sure that you have enough resources to run all containers on the host machine.
I had similar issue and I am not sure which one of the following solved it for me:
Proper resource management
Making the Dockerfile entry point a bash script that runs the worker code in an infinite loop. (I know this one solved the memory leaks: the bash script executes the file with your code, your code listens for a message, processes it and exits, and the bash script starts it again.) I had my workers restarting after each message (the whole worker exits and a new one is started, which is a bad idea).
Hope this gets you somewhere.

Related

Authenticating rabbitmq using ExternalCredentials

I have a rabbitmq server and use the pika library with Python to produce/consume messages. For development purposes, I was simply using
credentials = pika.PlainCredentials(<user-name>, <password>)
I want to change that to use pika.ExternalCredentials or TLS.
I have set up my rabbitmq server to listen for TLS on port 5671, and have configured it correctly. I am able to communicate with rabbitmq from localhost, but the moment I try to communicate with it from outside the localhost it doesn't like that. I have a feeling my "credentials" are based on the "guest" user in rabbitmq.
rabbitmq.config
%% -*- mode: erlang -*-
[
{rabbit,
[
{ssl_listeners, [5671]},
{auth_mechanisms, ['PLAIN', 'AMQPLAIN', 'EXTERNAL']},
{ssl_options, [{cacertfile,"~/tls-gen/basic/result/ca_certificate.pem"},
{certfile,"~/tls-gen/basic/result/server_certificate.pem"},
{keyfile,"~/tls-gen/basic/result/server_key.pem"},
{verify,verify_none},
{ssl_cert_login_from, common_name},
{fail_if_no_peer_cert,false}]}
]}
].
I can confirm this works, since in my logs for rabbitmq I see:
2019-08-21 15:34:47.663 [info] <0.442.0> started TLS (SSL) listener on [::]:5671
Server-side everything seems to be set up, I have also generated certificates and all the .pem files required.
test_rabbitmq.py
import pika
import ssl
from pika.credentials import ExternalCredentials
context = ssl.create_default_context(cafile="~/tls-gen/basic/result/ca_certificate.pem")
context.load_cert_chain("~/tls-gen/basic/result/client_certificate.pem",
"~/tls-gen/basic/result/client_key.pem")
ssl_options = pika.SSLOptions(context, "10.154.0.27")
params = pika.ConnectionParameters(port=5671,ssl_options=ssl_options, credentials = ExternalCredentials())
connection = pika.BlockingConnection(params)
channel = connection.channel()
When I run the script locally
(<Basic.GetOk(['delivery_tag=1', 'exchange=', 'message_count=0', 'redelivered=False', 'routing_key=foobar'])>, <BasicProperties>, b'Hello, world!')
When I run the script from another instance
Traceback (most recent call last):
File "pbbarcode.py", line 200, in <module>
main()
File "pbbarcode.py", line 187, in main
connection = pika.BlockingConnection(params)
File "/usr/local/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 359, in __init__
self._impl = self._create_connection(parameters, _impl_class)
File "/usr/local/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 450, in _create_connection
raise self._reap_last_connection_workflow_error(error)
pika.exceptions.AMQPConnectionError
When I run the script locally, and delete the guest user
Traceback (most recent call last):
File "test_mq.py", line 12, in <module>
with pika.BlockingConnection(conn_params) as conn:
File "/home/daudn/.local/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 359, in __init__
self._impl = self._create_connection(parameters, _impl_class)
File "/home/daudn/.local/lib/python3.7/site-packages/pika/adapters/blocking_connection.py", line 450, in _create_connection
raise self._reap_last_connection_workflow_error(error)
pika.exceptions.ProbableAuthenticationError: ConnectionClosedByBroker: (403) 'ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.'
It seems like SSL is configured with the user "guest" and rabbitmq doesn't allow connections to guest outside of localhost. How can I use SSL with a different user?
When I delete the guest user, this is what the rabbitmq log says:
2019-08-22 10:14:40.054 [info] <0.735.0> accepting AMQP connection <0.735.0> (127.0.0.1:59192 -> 127.0.0.1:5671)
2019-08-22 10:14:40.063 [error] <0.735.0> Error on AMQP connection <0.735.0> (127.0.0.1:59192 -> 127.0.0.1:5671, state: starting):
PLAIN login refused: user 'guest' - invalid credentials
2019-08-22 10:14:40.063 [warning] <0.735.0> closing AMQP connection <0.735.0> (127.0.0.1:59192 -> 127.0.0.1:5671):
client unexpectedly closed TCP connection
2019-08-22 10:15:12.613 [info] <0.743.0> Creating user 'guest'
2019-08-22 10:15:28.370 [info] <0.750.0> Setting user tags for user 'guest' to [administrator]
2019-08-22 10:15:51.352 [info] <0.768.0> Setting permissions for 'guest' in '/' to '.*', '.*', '.*'
2019-08-22 10:15:54.237 [info] <0.774.0> accepting AMQP connection <0.774.0> (127.0.0.1:59202 -> 127.0.0.1:5671)
2019-08-22 10:15:54.243 [info] <0.774.0> connection <0.774.0> (127.0.0.1:59202 -> 127.0.0.1:5671): user 'guest' authenticated and granted access to vhost '/'
This also clearly means the SSL is still using the username and password to connect to rabbitmq? HELP!
References:
tls_official_example
pika_official_tls_docs
added_authentication_external
You will have to enable the rabbitmq_auth_mechanism_ssl plugin; I think you are missing that part.
To enable the plugin, do the following (the example is for a Windows setup):
rabbitmq-plugins.bat enable rabbitmq_auth_mechanism_ssl
Going to leave this here for future reference
ssl_options = pika.SSLOptions(context, "rabbitmq-node-name")
params = pika.ConnectionParameters(host="rabbitmq-node-name",port=5671,ssl_options=ssl_options, credentials = ExternalCredentials())
The confusion was that I believed SSLOptions(context, "rabbitmq-node-name") already supplied the host, so I did not have to pass it again to ConnectionParameters(). That turns out to be incorrect: the second argument to SSLOptions is only the server hostname used for TLS, and if no host is passed to ConnectionParameters(), it defaults to localhost. That is why the script ran locally but not from outside the local network.

How to connect to rabbit on vagrant host?

I set up a server using Vagrant on a virtual machine. After installing RabbitMQ, I tried to connect to it using a script running outside the VM. Django and RabbitMQ are already running on the VM. When I run the script I get an exception:
pika.exceptions.IncompatibleProtocolError: StreamLostError: ('Transport indicated EOF',)
How to solve my problem?
A friend already ran the code below on a Raspberry Pi, where it executed successfully. The only things I changed on my PC were the hostname, from the IP he specified to '127.0.0.1', and the port number, which I added.
import pika
import sys
import random
import time
credentials = pika.PlainCredentials(username='admin', password='admin')
connection = pika.BlockingConnection(pika.ConnectionParameters(host='127.0.0.1',port=15672,credentials=credentials))
channel = connection.channel()
channel.queue_declare(queue='hello',durable=True)
Error message:
$ python send.py
Traceback (most recent call last):
File "send.py", line 8, in <module>
connection = pika.BlockingConnection(pika.ConnectionParameters(host='127.0.0.1',port=15672,credentials=credentials))
File "C:\Users\Pigeonnn\AppData\Local\Programs\Python\Python37\lib\site-packages\pika\adapters\blocking_connection.py", line 360, in __init__
self._impl = self._create_connection(parameters, _impl_class)
File "C:\Users\Pigeonnn\AppData\Local\Programs\Python\Python37\lib\site-packages\pika\adapters\blocking_connection.py", line 451, in _create_connection
raise self._reap_last_connection_workflow_error(error)
pika.exceptions.IncompatibleProtocolError: StreamLostError: ('Transport indicated EOF',)
@Pigeonnn answered his own question in a comment on the original question:
Actually I've just found a solution. The thing is if you want to
listen to rabbitmq you need to connect through port 5672 - not 15672.
Changed ports, forwarded and everything works :)
Quoting the docs to highlight the answer, the RabbitMQ listening ports are:
AMQP: 5672
AMQP/ssl: 5671
HTTP management UI: 15672
First, forward a host port to a guest port in the Vagrant configuration file (Vagrantfile). Take care not to use a host port that is already in use.
Vagrant.configure("2") do |config|
  config.vm.network "forwarded_port", guest: 5672, host: 5671 # Rabbit
end
Then connect like so:
credentials = pika.PlainCredentials(username='admin', password='admin')
connection = pika.BlockingConnection(pika.ConnectionParameters(host='127.0.0.1',port=5671,credentials=credentials))
Don't forget to configure the user admin accordingly.
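To check which forwarded port actually speaks AMQP, one option is to send the AMQP 0-9-1 protocol header and look at the first bytes of the reply. The following is a minimal standard-library sketch, not part of pika; the names classify_reply and probe_port are made up for this illustration, and the way a non-AMQP port reacts (closing the connection, answering with HTTP, or staying silent) is an assumption that can vary by server.

```python
import socket

# Protocol header an AMQP 0-9-1 client sends right after connecting.
AMQP_0_9_1_HEADER = b"AMQP\x00\x00\x09\x01"


def classify_reply(data):
    """Guess what kind of service produced the first bytes of a reply."""
    if not data:
        return "closed without reply"
    if data[:1] == b"\x01":
        # Frame type 1 is a method frame; a broker opens with Connection.Start.
        return "amqp"
    if data.startswith(b"AMQP"):
        # The broker answered with its own header: it wants another version.
        return "amqp, different protocol version"
    if data.startswith(b"HTTP/"):
        return "http"
    return "unknown"


def probe_port(host, port, timeout=3.0):
    """Send the AMQP header to host:port and classify whatever comes back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(AMQP_0_9_1_HEADER)
        try:
            data = sock.recv(16)
        except ConnectionResetError:
            data = b""
        except socket.timeout:
            return "no reply before timeout"
    return classify_reply(data)
```

Calling probe_port('127.0.0.1', 5671) against the forwarded port from the Vagrantfile above should report "amqp" if the forward reaches the broker's AMQP listener; a result such as "closed without reply" or "http" suggests the port leads somewhere else, for example to the management UI.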

How to connect to Tor control port (9051) from a remote host?

I'm trying to connect to control port (9051) of tor from a remote machine using stem python library.
dum.py
from stem import Signal
from stem.control import Controller
def set_new_ip():
    """Change IP using TOR"""
    with Controller.from_port(address = '10.130.8.169', port=9051) as controller:
        controller.authenticate(password='password')
        controller.signal(Signal.NEWNYM)
set_new_ip()
I'm getting the following error
Traceback (most recent call last):
File "/home/jkl/anaconda3/lib/python3.5/site-packages/stem/socket.py", line 398, in _make_socket
control_socket.connect((self._control_addr, self._control_port))
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dum.py", line 28, in <module>
set_new_ip();
File "dum.py", line 7, in set_new_ip
with Controller.from_port(address = '10.130.4.162', port=9051) as controller:
File "/home/jkl/anaconda3/lib/python3.5/site-packages/stem/control.py", line 998, in from_port
control_port = stem.socket.ControlPort(address, port)
File "/home/jkl/anaconda3/lib/python3.5/site-packages/stem/socket.py", line 372, in __init__
self.connect()
File "/home/jkl/anaconda3/lib/python3.5/site-packages/stem/socket.py", line 243, in connect
self._socket = self._make_socket()
File "/home/jkl/anaconda3/lib/python3.5/site-packages/stem/socket.py", line 401, in _make_socket
raise stem.SocketError(exc)
stem.SocketError: [Errno 111] Connection refused
Then I went through /etc/tor/torrc config file.
It says
## The port on which Tor will listen for local connections from Tor
## controller applications, as documented in control-spec.txt.
ControlPort 9051
## If you enable the controlport, be sure to enable one of these
## authentication methods, to prevent attackers from accessing it.
HashedControlPassword 16:E5364A963AF943CB607CFDAE3A49767F2F8031328D220CDDD1AE30A471
SocksListenAddress 0.0.0.0:9050
CookieAuthentication 1
My question is:
How do I connect to the control port of Tor from a remote host?
Is there any workaround or config parameter that I need to set?
a possible duplicate of Stem is giving the "Unable to connect to port 9051" error which has no answers
Tested with Tor 0.3.3.7.
ControlListenAddress config is OBSOLETE and Tor will ignore it and log the following message
[warn] Skipping obsolete configuration option 'ControlListenAddress'
You can still set ControlPort to 0.0.0.0:9051 in your torrc file. Though, Tor is not very happy about it (and rightly so) and will warn you
You have a ControlPort set to accept connections from a non-local
address. This means that programs not running on your computer can
reconfigure your Tor. That's pretty bad, since the controller protocol
isn't encrypted! Maybe you should just listen on 127.0.0.1 and use a
tool like stunnel or ssh to encrypt remote connections to your control
port.
Also, you have to set either CookieAuthentication or HashedControlPassword otherwise ControlPort will be closed
You have a ControlPort set to accept unauthenticated connections from
a non-local address. This means that programs not running on your
computer can reconfigure your Tor, without even having to guess a
password. That's so bad that I'm closing your ControlPort for you. If
you need to control your Tor remotely, try enabling authentication and
using a tool like stunnel or ssh to encrypt remote access.
All the risks mentioned in @drew010's answer still stand.
You'd need to set ControlListenAddress in addition to the ControlPort. You could set that to 0.0.0.0 (binds to all addresses) or a specific IP your server listens on.
If you choose to do this it would be extremely advisable to configure your firewall to only allow control connections from specific IPs and block them from all others.
Also note, the control port traffic will not be encrypted, so it'd also be advisable to use cookie authentication so your password isn't sent over the net.
You could also run a hidden service to expose the control port over Tor and then connect to the hidden service using Stem and Tor.
But the general answer is ControlListenAddress needs to be set to bind to an IP other than 127.0.0.1 (localhost).
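Tor's own warning quoted above suggests leaving ControlPort on 127.0.0.1 and reaching it through an encrypted tunnel. Assuming such a tunnel already forwards local port 9051 to the remote control port (for example one set up with ssh; the tunnel itself is not shown), the client only ever talks to localhost. The sketch below speaks the text-based control protocol directly with the standard library instead of Stem, purely for illustration; the helper names are made up, and only password authentication with single-line replies is handled.

```python
import socket


def quote_control_string(value):
    """Quote a value for the control protocol, escaping backslash and quote."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_newnym_commands(password):
    """Return the command lines needed to authenticate and request NEWNYM."""
    return [
        ("AUTHENTICATE " + quote_control_string(password) + "\r\n").encode("utf-8"),
        b"SIGNAL NEWNYM\r\n",
        b"QUIT\r\n",
    ]


def reply_ok(line):
    """Control-protocol replies that start with 250 indicate success."""
    return line.startswith(b"250")


def request_new_identity(password, host="127.0.0.1", port=9051, timeout=10.0):
    """Send AUTHENTICATE, SIGNAL NEWNYM and QUIT; raise on any failure reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rwb")
        for command in build_newnym_commands(password):
            stream.write(command)
            stream.flush()
            reply = stream.readline()
            if not reply_ok(reply):
                raise RuntimeError("Tor rejected %r: %r" % (command, reply))
```

With the tunnel in place, the Stem code from the question should work the same way if its address is changed to '127.0.0.1'.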

How to prevent errno 32 broken pipe?

Currently I am using an app built in python. When I run it in personal computer, it works without problems.
However, when I move it to a production server, it keeps showing me the error attached below.
I've done some research and learned that this happens when the end user's browser closes the connection while the server is still busy sending data.
I wonder why this happens, and what prevents the app from running properly on the production server while it works on my personal computer. Any advice is appreciated.
Exception happened during processing of request from ('127.0.0.1', 34226)
Traceback (most recent call last):
File "/usr/lib/python2.7/SocketServer.py", line 284, in
_handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 310, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python2.7/SocketServer.py", line 641, in __init__
self.finish()
File "/usr/lib/python2.7/SocketServer.py", line 694, in finish
self.wfile.flush()
File "/usr/lib/python2.7/socket.py", line 303, in flush
self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
Your server process has received a SIGPIPE writing to a socket. This usually happens when you write to a socket fully closed on the other (client) side. This might be happening when a client program doesn't wait till all the data from the server is received and simply closes a socket (using close function).
In a C program you would normally either ignore the SIGPIPE signal or install a dummy signal handler for it. In that case a simple error is returned when writing to a closed socket. In your case Python raises an exception, which can be handled as a premature disconnect of the client.
The broken pipe error usually occurs when your request is blocked or takes too long: after the request-side timeout, the client closes the connection, and when the responding side (the server) then tries to write to the socket, it gets a broken pipe error.
It depends on how you tested it, and possibly on differences in the TCP stack implementation of the personal computer and the server.
For example, if your sendall always completes immediately (or very quickly) on the personal computer, the connection may simply never have broken during sending. This is very likely if your browser is running on the same machine (since there is no real network latency).
In general, you just need to handle the case where a client disconnects before you're finished, by handling the exception.
Remember that TCP communications are asynchronous, but this is much more obvious on physically remote connections than on local ones, so conditions like this can be hard to reproduce on a local workstation. Specifically, loopback connections on a single machine are often almost synchronous.
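Handling the disconnect can be as small as wrapping the write in a try/except. This is a minimal sketch assuming Python 3, where the error surfaces as BrokenPipeError or a related exception (in Python 2.7, as in the traceback above, it is a socket.error with errno 32 instead); the function name send_or_detect_disconnect is made up for this example.

```python
import socket


def send_or_detect_disconnect(sock, payload):
    """Send payload; return False if the peer has already gone away."""
    try:
        sock.sendall(payload)
    except (BrokenPipeError, ConnectionResetError, ConnectionAbortedError):
        # The client closed its end before we finished writing.
        # Treat this as a normal early disconnect rather than a crash.
        return False
    return True
```

A local socket pair created with socket.socketpair() is a convenient way to try this: close one end, then call the function on the other. Over a real TCP connection the first write after the peer closes may still succeed, with the error only appearing on a later write, which is part of why the problem is hard to reproduce on a local workstation.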
This might be because you are using two methods to insert data into the database, which slows the site down.
def add_subscriber(request, email=None):
    if request.method == 'POST':
        email = request.POST['email_field']
        e = Subscriber.objects.create(email=email).save()  # <====
        return HttpResponseRedirect('/')
    else:
        return HttpResponseRedirect('/')
In above function, the error is where arrow is pointing. The correct implementation is below:
def add_subscriber(request, email=None):
    if request.method == 'POST':
        email = request.POST['email_field']
        e = Subscriber.objects.create(email=email)
        return HttpResponseRedirect('/')
    else:
        return HttpResponseRedirect('/')
If it's a Python web application or service built with Flask or FastAPI, this error might occur if the production server is configured to time out requests that take too long. Gunicorn and Uvicorn have relevant parameters, such as GRACEFUL_TIMEOUT and TIMEOUT, that need to be configured according to the needs of your application. You may also want to check the timeout thresholds of your reverse proxy or gateway.
Try this code at the top of your program:
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
It should fix the issue.

I/O error(socket error): [Errno 111] Connection refused

I have a program that uses urllib to periodically fetch a url, and I see intermittent
errors like :
I/O error(socket error): [Errno 111] Connection refused.
It works 90% of the time, but the other 10% it fails. If I retry the fetch immediately after it fails, it succeeds. I'm unable to figure out why this is so. I tried to see if any ports are available, and they are. Any debugging ideas?
For additional info, the stack trace is:
File "/usr/lib/python2.6/urllib.py", line 203, in open
return getattr(self, name)(url)
File "/usr/lib/python2.6/urllib.py", line 342, in open_http
h.endheaders()
File "/usr/lib/python2.6/httplib.py", line 868, in endheaders
self._send_output()
File "/usr/lib/python2.6/httplib.py", line 740, in _send_output
self.send(msg)
File "/usr/lib/python2.6/httplib.py", line 699, in send
self.connect()
File "/usr/lib/python2.6/httplib.py", line 683, in connect
self.timeout)
File "/usr/lib/python2.6/socket.py", line 512, in create_connection
raise error, msg
Edit - A Google search wasn't very helpful. What I got out of it is that the server I'm fetching from sometimes refuses connections. How can I verify that this is indeed the case and that it's not a bug in my code?
Use a packet sniffer like Wireshark to look at what happens. You need to see a SYN-flagged packet outgoing, a SYN+ACK-flagged packet incoming and then an ACK-flagged packet outgoing. After that, the port is considered open on the local side.
If you only see the first packet and the error message comes after several seconds of waiting, the other side is not answering at all (for example: unplugged cable, overloaded server, misrouted packet that was discarded) and your local network stack aborts the connection attempt. If you see RST packets, the host actually denies the connection. If you see "ICMP Port unreachable" or host unreachable packets, a firewall or the target host is informing you that the port is actually closed.
Of course you cannot expect the service to be available at all times (consider all the points of failure in between you and the data), so you should try again later.
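The same distinctions can be observed from Python without a packet sniffer, since the operating system reports them as different errors. This is a rough sketch assuming Python 3 and the standard library only; the function name diagnose_connect and its return labels are made up for this example, and the mapping to what Wireshark would show is approximate (an ICMP port unreachable reply, for instance, is usually reported as a refused connection too).

```python
import errno
import socket


def diagnose_connect(host, port, timeout=5.0):
    """Attempt one TCP connection and report how the attempt ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # the three-way handshake completed
    except ConnectionRefusedError:
        return "refused"           # the other end actively rejected us
    except socket.timeout:
        return "timeout"           # nothing answered before the deadline
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable"   # routing or firewall said no
        raise                      # anything else (e.g. DNS) is not ours
```

Logging the result of diagnose_connect for the failing host each time the fetch fails would show whether the intermittent failures are really refusals from the server or something else, such as timeouts.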
Getting an ECONNREFUSED errno means that your kernel was refused a connection at the other end, so if it's a bug, it's either in your kernel or in the other end.
What you can do is to trap the error in a very specific way and try again in a little while, since this seems to work:
# This is Python > 2.5 code
import errno, time
for attempt in range(MAXIMUM_NUMBER_OF_ATTEMPTS):
    try:
        pass  # your urllib call here
    except EnvironmentError as exc:  # replace " as " with ", " for Python<2.6
        if exc.errno == errno.ECONNREFUSED:
            time.sleep(A_COUPLE_OF_SECONDS)
        else:
            raise  # re-raise otherwise
    else:  # we tried, and we had no failure, so
        break
else:  # we never broke out of the for loop
    raise RuntimeError("maximum number of unsuccessful attempts reached")
Replace the two all-caps constants with your favourite numbers.
I previously had this problem with my EC2 instance (I was serving couchdb to serve resources -- am considering Amazon's S3 for the future).
One thing to check (assuming Ec2) is that the couchdb port is added to your open ports within your security policy.
I specifically encountered
"[Errno 111] Connection refused"
over EC2 when the instance was stopped and started. The problem seems to be a pidfile race. The solution for me was killing couchdb (entirely and properly) via:
pkill -f couchdb
and then restarting with:
/etc/init.d/couchdb restart
I'm not exactly sure what's causing this. You can try looking in your socket.py (mine is a different version, so line numbers from the trace don't match, and I'm afraid some other details might not match as well).
Anyway, it seems like good practice to put your URL fetching code in a try: ... except: ... block, and handle this with a short pause and a retry. The URL you're trying to fetch may be down, or too loaded, and that's something you'll only be able to handle with a retry anyway.
It seems that the server is not running properly. You can check this from a terminal with:
telnet ip port
For example:
telnet localhost 8069
If it reports that it is connected to localhost, there is no problem with the connection.
If it reports Connection refused, there is a problem with the connection.
