Folks,
I ran into an issue trying to reconnect to a server after rebooting it. I have read other articles about similar issues, but everything I try ends with the same error.
Goal
Automatically reconnect to a server after reboot
Script
import select
from paramiko import SSHClient, AutoAddPolicy, SSHException, agent
ssh_client = SSHClient()
ssh_client.set_missing_host_key_policy(AutoAddPolicy())
ssh_client.connect(hostname=host, port=port, username=user, password=psw)
s = ssh_client.get_transport().open_session()
agent.AgentRequestHandler(s)
try:
    stdin, stdout, stderr = ssh_client.exec_command(command, get_pty=True)
    get_output(stdout)
    channel = stdout.channel
    stdin.close()
    channel.shutdown_write()
    stdout_chunks = []
    stdout_chunks.append(channel.recv(len(channel.in_buffer)))
    while not channel.closed or channel.recv_ready() or channel.recv_stderr_ready():
        got_chunk = False
        readq, _, _ = select.select([stdout.channel], [], [])
        for c in readq:
            if c.recv_ready():
                stdout_chunks.append(channel.recv(len(c.in_buffer)))
                got_chunk = True
            if c.recv_stderr_ready():
                stderr.channel.recv_stderr(len(c.in_stderr_buffer))
                got_chunk = True
        if not got_chunk \
                and channel.exit_status_ready() \
                and not channel.recv_stderr_ready() \
                and not channel.recv_ready():
            channel.shutdown_read()
            channel.close()
            break
    stdout.close()
    stderr.close()
except (ConnectionResetError, SSHException):
    print('Connection died')
The error is caught by the try/except block:
Connection died
(the same line is printed 15 times in a row)
The script that I am running on the remote server ends with a reboot command:
/sbin/shutdown -r now
I am posting this as an answer because it is too long to explain in a comment.
Your code is still missing parts, as we do not know how you call the try/except structure and what happens when the exception is caught. However, if I may guess from your indentation, when an exception is caught you somehow repeat the try/except.
You seem to rely on the channel's closed status in your logic, but there is an underlying layer: the TCP socket. When you reboot the server, not only your channel dies but also the TCP connection underneath it. Your exception handling needs to recreate that connection.
I would try something like this:
import sys
from time import sleep
try:
    ...
    ...
    ...
    stdout.close()
    stderr.close()
except (...):
    sleep(2)  # to prevent a busy loop while your server is rebooting
    try:
        ssh_client.close()  # Close the connection just in case it is still alive
    except Exception:
        pass  # We do not care if it succeeds or fails
    counter = 0  # optional
    while True:
        sleep(2)  # to prevent a busy loop while your server is rebooting
        counter += 1
        if counter > X:
            print("server permanently down, exiting")
            sys.exit(1)
        try:
            ssh_client.connect(hostname=host, port=port, username=user, password=psw)
            s = ssh_client.get_transport().open_session()
            break  # We have liftoff
        except Exception:
            pass  # Server not responding yet. Try again.
(I did not test the above code; I just wrote it here to convey the idea. There might be typos in it.)
You can ignore the counter part. I generally use a counter to keep programs from retrying until the cows come home if the server is down for the long term. If you want to keep trying indefinitely, remove these lines. If you use them, set X high enough to give the server plenty of time to reboot, and then some.
The key part is recreating your TCP connection after an error and only leaving the error handler when you have a working connection again.
We attempt to close the existing connection just in case it is still there, so that we do not exhaust server resources if the problem is something other than the connection dropping; we do not care whether the close succeeds or fails. Then we recreate the connection from scratch.
This may or may not work in your case, as we cannot tell from your code how you re-enter this after an exception, and judging by your comments you do not seem to be sure either.
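The reconnect loop in the answer above can also be packaged as a small reusable helper. This is a minimal sketch, not tested against a real server: retry_connect, max_attempts and delay are hypothetical names, and it assumes connect is a zero-argument callable that raises while the server is still rebooting and returns something usable (for example, a new session) once it is back. The sleep parameter exists only so the helper can be tested without real waiting.

```python
import time


def retry_connect(connect, max_attempts=30, delay=2.0, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is used up.

    connect is any zero-argument callable that raises while the server is
    unreachable and returns something useful (e.g. a session) once it is up.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()  # We have liftoff
        except Exception as exc:  # server not responding yet
            last_error = exc
            if attempt < max_attempts:
                sleep(delay)  # avoid a busy loop while the server reboots
    # Give up and let the caller decide what to do (instead of exit(1)).
    raise ConnectionError(
        "server still down after %d attempts" % max_attempts
    ) from last_error
```

With Paramiko, connect would be a small function of your own that calls ssh_client.close() (ignoring errors), then ssh_client.connect(hostname=host, port=port, username=user, password=psw), and returns ssh_client.get_transport().open_session(). Raising ConnectionError at the end, rather than exiting, leaves the decision to give up to the caller.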
Related
Currently, I'm using pymodbus to read data from a few slave devices. I discovered that even if the connection fails, the script still continues instead of failing. I want it to fail properly so that I can catch the error itself.
Below is the function I made for establishing a connection.
from time import sleep
from pymodbus.client.sync import ModbusSerialClient as ModbusClient
def init485(port_in, baudrate_in):
    client = ModbusClient(method="rtu", port=port_in, stopbits=1, bytesize=8, parity="N", baudrate=baudrate_in, timeout=3)
    connection = client.connect()
    print("Connecting to", port_in, "...")
    sleep(5)
    if connection is True:
        print("Connection successful at", getcurrenttime())
    else:
        print("Failed to connect. Please check if settings are correct.")
        sleep(2)
    return connection, client
And below that is the try-except within a while loop.
while True:
    try:
        c = init485("/dev/ttyUSB0", 9600)
        connection = c[0]
        mb_client = c[1]
        while connection is True:
            ...
    except:
        mb_client.close()
        print("Failed")
        sleep(60)
The idea is that if the connection to the port fails, it'd catch and print "Failed". Right now, it's just stuck in the while loop. Is there a way to make it work?
The error I want to catch is this:
ERROR:pymodbus.client.sync:could not open port '/dev/ttyUSB0': FileNotFoundError(2, 'The system cannot find the path specified.', None, 3)
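As the question describes, connect() appears to log that error and return False instead of raising, so nothing ever reaches the except: block. One possible approach, sketched here under that assumption, is to raise an exception yourself when connect() returns False. ConnectionFailed and open_or_raise are hypothetical names; client stands for the ModbusClient created in init485 (any object whose connect() returns a bool works).

```python
class ConnectionFailed(Exception):
    """Hypothetical exception: the serial port could not be opened."""


def open_or_raise(client, port_name):
    # connect() signals failure with a False return value, not an exception,
    # so turn that into an exception the caller's try/except can catch.
    if not client.connect():
        raise ConnectionFailed("could not open port %s" % port_name)
    return client
```

In the outer loop you would then catch ConnectionFailed explicitly rather than using a bare except:, so that unrelated bugs are not silently reported as "Failed".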
I have created a multithreaded socket server in Python so that many clients can connect to it. If a client stops unexpectedly due to an exception, the server thread for that client keeps running nonstop. Is there a way to kill that particular thread alone and keep the rest running?
Server:
import socket
import sys
from threading import Thread
class ClientThread(Thread):
    def __init__(self, ip, port):
        Thread.__init__(self)
        self.ip = ip
        self.port = port
        print("New server socket thread started for " + ip + ":" + str(port))
    def run(self):
        while True:
            try:
                message = conn.recv(2048)
                dataInfo = message.decode('ascii')
                print("recv:::::" + str(dataInfo) + "::")
            except:
                print("Unexpected error:", sys.exc_info()[0])
                Thread._stop(self)
tcpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpServer.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
tcpServer.bind((TCP_IP, 0))
tcpServer.listen(10)
print("Port:" + str(tcpServer.getsockname()[1]))
threads = []
while True:
    print("Waiting for connections from clients...")
    (conn, (ip, port)) = tcpServer.accept()
    newthread = ClientThread(ip, port)
    newthread.start()
    threads.append(newthread)
for t in threads:
    t.join()
Client:
import logging
import socket
def Main():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, int(port)))
    while True:
        try:
            message = input("Enter Command")
            s.send(message.encode('ascii'))
        except Exception as ex:
            logging.exception("Unexpected error:")
            break
            s.close()
Sorry for the very, very long answer, but here goes.
There are quite a few issues with your code. First of all, your client does not actually close the socket, as s.close() never gets executed: your loop is interrupted at break, and anything that follows it in that block is skipped. So swap the order of these two statements for the sake of good programming, although it has nothing to do with your problem.
Your server code is wrong in quite a few ways. As currently written, it never exits, and your threads do not work correctly either. I have fixed your code so that it is a working multithreaded server, but it still does not exit, as I have no idea what the trigger for exiting should be. But let us start with the main loop:
while True:
    print("Waiting for connections from clients...")
    (conn, (ip, port)) = tcpServer.accept()
    newthread = ClientThread(conn, ip, port)
    newthread.daemon = True
    newthread.start()
    threads.append(newthread)  # Do we need this?
for t in threads:
    t.join()
I have added passing conn to your client thread; the reason will become apparent in a moment. However, your while True loop never breaks, so you never reach the for loop where you join your threads. If your server is meant to run indefinitely, this is not a problem at all: just remove the for loop and this part is fine. You do not need to join threads just for the sake of joining them. Joining a thread only makes your program block until that thread has finished executing.
Another addition is newthread.daemon = True. This makes your threads daemonic, which means they are terminated as soon as your main thread exits. Now your server responds to Ctrl+C even when there are active connections.
If your server is meant to be never-ending, there is also no need for the main loop to store threads in the threads list. The list just keeps growing, since a new entry is added every time a client connects and is never removed when it disconnects, and this leaks memory because you are not using the list for anything. I have kept it because it was there, but there is still no mechanism to exit the infinite loop.
Next, let us move on to your thread. If you want to simplify the code, you can replace the run method with a plain function passed to Thread as its target. There is no need to subclass Thread in this case, but subclassing works, so I have kept your structure:
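import socket
import sys
from threading import Thread
class ClientThread(Thread):
    def __init__(self, conn, ip, port):
        Thread.__init__(self)
        self.ip = ip
        self.port = port
        self.conn = conn  # the per-client socket returned by accept()
        print("New server socket thread started for " + ip + ":" + str(port))
    def run(self):
        while True:
            try:
                message = self.conn.recv(2048)
                if not message:  # b'' means the client closed the connection
                    print("closed")
                    try:
                        self.conn.close()
                    except socket.error:
                        pass  # we do not care if close() fails
                    return
                try:
                    dataInfo = message.decode('ascii')
                    print("recv:::::" + str(dataInfo) + "::")
                except UnicodeDecodeError:
                    print("non-ascii data")
                    continue  # ignore the bad message and keep serving
            except socket.error:
                print("Unexpected error:", sys.exc_info()[0])
                try:
                    self.conn.close()
                except socket.error:
                    pass  # we do not care if close() fails
                return  # returning from run() ends the thread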
First of all, we store conn in self.conn. Your version used the global conn variable, which caused unexpected results when more than one client was connected to the server. conn is actually a new socket that accept() creates for the client connection, and it is unique to each thread. This is how servers tell client connections apart: they listen on a known port, and when the server accepts a connection, accept() creates a new socket for that particular connection and returns it. (The new socket still uses the server's port number; the operating system distinguishes connections by the combination of server and client addresses and ports.) This is why we need to pass it to the thread and then read from self.conn instead of the global conn.
Your server "hung" on client connection errors because your loop had no mechanism to detect them. If the client closes the connection, socket.recv() does not raise an exception; it returns an empty bytes object (b''). This is the condition you need to detect. I am fairly sure you do not even need try/except here, but it does not hurt; you do, however, need to name the exception you are expecting. In this case, catching everything with a bare except: is simply wrong, because you have another statement there that can raise exceptions. If your client sends something that cannot be decoded with the ascii codec, you get a UnicodeDecodeError (try this without error handling: telnet to your server port, paste some Hebrew or Japanese text into the connection, and see what happens). If you caught everything and treated it as a socket error, you would enter the thread-ending part of the code just because you could not parse a message. Typically we just ignore "illegal" messages and carry on, so I have added that. If you want to shut down the connection upon receiving a "bad" message, just add self.conn.close() and return to that exception handler as well.
Then, when you really do encounter a socket error, or the client has closed the connection, you need to close the socket and exit the thread. You call close() on the socket, wrapped in try/except, since you do not really care if it fails because the socket is already gone.
And when you want to exit your thread, you simply return from run(). When you do this, your thread exits in an orderly way. As simple as that.
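Following up on the remark that subclassing Thread is not necessary, here is a minimal sketch of the same per-connection logic as a plain function used as the thread's target. handle_client and start_client_thread are hypothetical names. It uses "with conn:" so the socket is closed on every exit path instead of the explicit try/close/except blocks, and otherwise behaves like the fixed ClientThread: print what arrives, skip chunks that are not ASCII, and return on b'' or a socket error.

```python
import socket
import threading


def handle_client(conn, ip, port):
    """Per-connection loop; returning from this function ends the thread."""
    print("New server socket thread started for %s:%s" % (ip, port))
    with conn:  # closes the socket however we leave the loop
        while True:
            try:
                message = conn.recv(2048)
            except OSError as exc:  # socket.error is an alias of OSError
                print("Unexpected error:", exc)
                return
            if not message:  # b'' means the client closed the connection
                print("closed")
                return
            try:
                print("recv:::::" + message.decode("ascii") + "::")
            except UnicodeDecodeError:
                print("non-ascii data")  # ignore the bad chunk, keep serving


def start_client_thread(conn, ip, port):
    t = threading.Thread(target=handle_client, args=(conn, ip, port))
    t.daemon = True  # do not keep the process alive on Ctrl+C
    t.start()
    return t
```

In the main loop, start_client_thread(conn, ip, port) would replace the three lines that create, configure and start ClientThread.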
There is yet another potential problem if you are not only printing the messages but also parsing them and acting on the data you receive. I do not fix this; I leave it to you.
TCP sockets transmit a stream of bytes, not messages. When you build a communication protocol, you must not assume that each time recv() returns, it returns exactly one message. When your recv() returns, it can mean one of five things:
1. The client has closed the connection, and nothing (b'') is returned.
2. There is exactly one full message, and you receive it.
3. There is only a partial message, either because you read the socket before the client had transmitted all the data, or because the client sent more than 2048 bytes (even if your client never sends more than 2048 bytes, a malicious client certainly would try).
4. There is more than one message waiting, and you receive them all.
5. As 4, but the last message is partial.
Most socket programming mistakes are related to this. The programmer expects 2 to happen (as you do now) but does not cater for 3 to 5. You should instead analyse what was received and act accordingly. If there seems to be less data than a full message, store it somewhere and wait for more data to arrive. When more data arrives, concatenate it with what you stored and see whether you now have a full message. And when you have parsed a full message from this buffer, inspect the buffer to see whether there is more data in it: the first part of the next message, or even several full messages if your client is fast and your server is slow. If you process a message and then wipe the whole buffer, you might also wipe bytes belonging to your next message.
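The buffering strategy described above can be sketched as follows. This assumes a hypothetical protocol in which every message ends with a newline byte; a protocol with a length prefix or a different delimiter would need a different split step. MessageBuffer, feed and max_size are illustrative names. It handles cases 2 to 5: partial data stays in the buffer, and several messages arriving in one chunk are all returned. Case 1, an empty recv() result, still has to be checked by the caller before calling feed().

```python
class MessageBuffer:
    """Accumulates raw bytes from recv() and returns complete messages.

    Assumes a hypothetical protocol where every message ends with a newline.
    """

    def __init__(self, max_size=65536):
        self._buf = bytearray()
        self._max_size = max_size  # guard against a client that never sends "\n"

    def feed(self, data):
        """Add one recv() chunk; return a list of all complete messages."""
        self._buf.extend(data)
        messages = []
        while True:
            end = self._buf.find(b"\n")
            if end == -1:
                break  # partial message: keep it and wait for more data
            messages.append(bytes(self._buf[:end]))
            del self._buf[:end + 1]  # drop only the bytes we consumed
        if len(self._buf) > self._max_size:
            raise ValueError("message too long")
        return messages
```

In the thread's loop you would check that recv() did not return b'', then handle each item of buf.feed(message) in turn. The size limit is there because, as noted above, a malicious client may simply never send the delimiter.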
Is there a way to resume an SSH connection after it was interrupted? Paramiko seems to time out when it doesn't get any response from the connected device. After the disconnection, if I try to execute something over the SSH connection, I get the error "Socket is closed". I do know that ssh.connect() has a timeout option, but I already tried setting it to 99999 and to None, and neither worked.
Btw, the program continuously tries to send input via ssh.write(). If that doesn't work, it waits 2 seconds and tries again.
Try something like this: it writes to the SSH connection and reconnects if the write raises a socket error (for example, after the connection has timed out).
import socket
def writeOrReconnect(towrite):
    try:
        return ssh.write(towrite)
    except socket.error as e:
        # re-connect here
        return ssh.write(towrite)
To use it:
writeOrReconnect('sudo apt-get install ufw') #or whatever you put inside ssh.write()
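One hedged way to structure the reconnect step is a small wrapper that owns the connection object together with a zero-argument connect callable that you write yourself (for example, one that opens a new Paramiko session and returns the object whose write() you call). ReconnectingWriter is a hypothetical name; it retries the write once on a fresh connection and lets a second failure propagate, so your existing "wait 2 seconds and try again" loop still applies.

```python
import socket


class ReconnectingWriter:
    """Wraps a connection object and reconnects once if a write fails.

    connect is a zero-argument callable (written by you) that returns a fresh
    object with a write(data) method.
    """

    def __init__(self, connect):
        self._connect = connect
        self._conn = connect()

    def write(self, data):
        try:
            return self._conn.write(data)
        except socket.error:  # alias of OSError; assumed to cover "Socket is closed"
            self._conn = self._connect()  # re-connect here
            return self._conn.write(data)  # a second failure propagates
```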
I'm writing a Python program that uses Telnet to send the same few commands once every second, reads the output, organizes it into a dictionary, and writes it to a JSON file (where it is later read by a front-end web GUI). The purpose of this is to provide live updates of crucial Telnet command outputs.
The problem I am having is that if the connection is lost partway through, the program crashes. I have tried a number of ways to deal with this, such as using a Boolean that is set to True once the connection is made and to False if there is a timeout error, but this has some limitations. If the connection is successfully made but later gets disconnected, the Boolean still reads True even though the connection has been lost. I have found some ways to deal with this too (e.g., if a Telnet command returns no output within 5 seconds, the connection was lost, and the Boolean is updated to False).
However, it is a complex program, and there seem to be too many ways a disconnect can slip past the checks I have written and still crash the program.
I am hoping to find a very simple way of checking whether the Telnet connection is still up; better yet if it is a single line of code. The only way I currently know to check is to try to connect again, which fails if the network connection is lost. However, I do not want to open a new Telnet connection every time I check. If it is already connected, that wastes crucial time, and there is no way to know it is not connected until after you try to connect.
I'm looking for something like:
tnStatus = [function or line of code that checks if Telnet is connected (w/o trying to open a connection), and returns boolean]
if tnStatus == True:
    sendComand('bla')
Any suggestions?
I'm running Python 2.6 (cannot update for backwards compatibility reasons)
EDIT:
This is an abridged version of the code I am currently using to connect to Telnet and send/read commands.
import telnetlib
class cliManager():
    '''
    Class to manage a Command Line Interface connection via Telnet
    '''
    def __init__(self, host, port, timeout):
        self.host = host
        self.port = port
        self.timeout = timeout  # Timeout for connecting to telnet
        self.isConnected = False
        self.tn = None  # No connection yet; connect() checks this attribute
    # CONNECT to device via TELNET, catch connection errors.
    def connect(self):
        try:
            if self.tn:
                self.tn.close()
            print("Connecting...")
            self.tn = telnetlib.Telnet(self.host, self.port, self.timeout)
            print("Connection Established")
            self.isConnected = True
        except Exception:
            print("Connection Failed")
            self.isConnected = False
    # ...
    def sendCmd(self, cmd):
        # CHECK if connected, if not then reconnect
        output = {}
        if not self.reconnect():
            return output
        # Ensure cmd is valid, strip out \r\t\n, etc
        cmd = self.validateCmd(cmd)
        # Send Command and newline
        self.tn.write(cmd + "\n")
        response = ''
        try:
            response = self.tn.read_until('\n*', 5)
            if len(response) == 0:
                print "No data returned!"
                self.isConnected = False
        except EOFError:
            print "Telnet Not Connected!"
            self.isConnected = False
        output = self.parseCmdStatus(response)
        return output
Elsewhere...
cli = cliManager("136.185.10.44", 6000, 2)
cli.connect()
giDict = cli.sendCmd('getInfo')
[then giDict and other command results go to other methods where they are formatted and interpreted for the front end user]
You can try the following code to check whether the Telnet connection is still usable.
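def is_connected(self):
    try:
        # Returns (and therefore consumes) any output already waiting, and
        # raises EOFError only if the connection is closed and nothing is
        # buffered. A link that died silently (no close from the peer) is
        # not detected this way.
        self.tn.read_very_eager()
        return True
    except EOFError:
        print("EOFerror: telnet connection is closed")
        return False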
You can also refer to https://docs.python.org/3/library/telnetlib.html for Telnet.read_very_eager() usage, and to:
https://lgfang.github.io/computer/2007/07/06/py-telnetlib#:~:text=The%20difference%20is%20that%20read_eager,read%20as%20much%20as%20possible.&text=The%20remaining%20read%20functions%20basically%20block%20until%20they%20received%20designated%20data.
I have a RabbitMQ server and an AMQP consumer (Python) using kombu.
I have installed my app on a system with a firewall that closes idle connections after 1 hour.
This is my amqp_consumer.py:
from kombu import Connection
try:
    # connections
    with Connection(self.broker_url, ssl=_ssl, heartbeat=self.heartbeat) as conn:
        chan = conn.channel()
        # more stuff here
        with conn.Consumer(queue, callbacks=[messageHandler], channel=chan):
            # Process messages and handle events on all channels
            while True:
                conn.drain_events()
except Exception as e:
    # do stuff
    pass
What I want is to reconnect if the firewall closes the connection. Should I use the heartbeat argument, or should I pass a timeout argument (of 3600 seconds) to the drain_events() function?
What are the differences between the two options? (They seem to do the same thing.)
Thanks.
drain_events() on its own does not produce any heartbeats unless there are messages to consume and acknowledge. If the queue is idle, the connection will eventually be closed (by the RabbitMQ server or by your firewall).
What you should do is use both the heartbeat and the timeout like so:
import socket
while True:
    try:
        conn.drain_events(timeout=1)
    except socket.timeout:
        conn.heartbeat_check()
This way even if the queue is idle the connection won't be closed.
Besides that, you might want to wrap the whole thing in a retry policy in case the connection does get closed or some other network error occurs.
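A retry policy in this spirit could look like the sketch below. It is a generic, hypothetical helper (run_with_retry is not part of kombu); it only borrows the key names kombu uses in its retry_policy dictionaries (max_retries, interval_start, interval_step, interval_max) for its waiting scheme: the first wait is interval_start seconds, each later wait is interval_step seconds longer, capped at interval_max. In the consumer, func would be a function of your own that opens the Connection and runs the drain_events loop. kombu itself also provides Connection.ensure_connection(), which is worth checking before rolling your own.

```python
import time


def run_with_retry(func, max_retries=5, interval_start=2.0, interval_step=2.0,
                   interval_max=30.0, errors=(OSError,), sleep=time.sleep):
    """Call func(); on one of the given errors, wait and call it again.

    After max_retries failed retries, the last error is re-raised.
    Errors not listed in `errors` propagate immediately.
    """
    retries = 0
    while True:
        try:
            return func()
        except errors:
            if retries >= max_retries:
                raise  # give up: let the caller see the real error
            delay = min(interval_start + retries * interval_step, interval_max)
            retries += 1
            sleep(delay)
```

OSError covers socket.timeout and ConnectionResetError on Python 3; pass a wider tuple in errors if your client library raises its own exception types.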