Python: Multithreaded socket server runs endlessly when client stops unexpectedly


I have created a multithreaded socket server in Python so that many clients can connect to it. If a client stops unexpectedly due to an exception, the server runs nonstop. Is there a way to kill only that particular thread in the server and keep the rest running?
Server:
class ClientThread(Thread):
    def __init__(self,ip,port):
        Thread.__init__(self)
        self.ip = ip
        self.port = port
        print("New server socket thread started for " + ip + ":" + str(port))

    def run(self):
        while True :
            try:
                message = conn.recv(2048)
                dataInfo = message.decode('ascii')
                print("recv:::::"+str(dataInfo)+"::")
            except:
                print("Unexpected error:", sys.exc_info()[0])
                Thread._stop(self)

tcpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpServer.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
tcpServer.bind((TCP_IP, 0))
tcpServer.listen(10)
print("Port:"+ str(tcpServer.getsockname()[1]))
threads = []
while True:
    print( "Waiting for connections from clients..." )
    (conn, (ip,port)) = tcpServer.accept()
    newthread = ClientThread(ip,port)
    newthread.start()
    threads.append(newthread)

for t in threads:
    t.join()
Client:
def Main():
    s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    s.connect((host,int(port)))
    while True:
        try:
            message = input("Enter Command")
            s.send(message.encode('ascii'))
        except Exception as ex:
            logging.exception("Unexpected error:")
            break
            s.close()

Sorry about a very, very long answer but here goes.
There are quite a few issues with your code. First of all, your client does not actually close the socket, as s.close() will never be executed: your loop is interrupted at break, and anything that follows it in the same block is skipped. Move s.close() after the loop for the sake of good programming, although it has nothing to do with your problem.
Your server code is wrong in quite a few ways. As it is currently written, it never exits, and your threads do not work correctly either. I have fixed your code so that it is a working multithreaded server, but it still does not exit, as I have no idea what the trigger to make it exit would be. But let us start with the main loop:
while True:
    print( "Waiting for connections from clients..." )
    (conn, (ip,port)) = tcpServer.accept()
    newthread = ClientThread(conn, ip,port)
    newthread.daemon = True
    newthread.start()
    threads.append(newthread) # Do we need this?

for t in threads:
    t.join()
I have added passing conn to your client thread; the reason for this becomes apparent in a moment. However, your while True loop never breaks, so you will never reach the for loop where you join your threads. If your server is meant to run indefinitely, this is not a problem at all: just remove the for loop and this part is fine. You do not need to join threads just for the sake of joining them; joining a thread only makes your program block until that thread has finished executing.
Another addition is newthread.daemon = True. This makes your threads daemonic, which means they do not keep the process alive: they are stopped as soon as your main thread exits. Now your server responds to Ctrl+C even when there are active connections.
If your server is meant to run forever, there is also no need to store the threads in the threads list in your main loop. This list just keeps growing, as a new entry is added every time a client connects and nothing is removed when it disconnects, so it leaks memory while you are not using the list for anything. I have kept it because it was there, but there is still no mechanism to exit the infinite loop.
Then let us move on to your thread. If you want to simplify the code, you can replace the run part with a function. There is no need to subclass Thread in this case, but this works so I have kept your structure:
class ClientThread(Thread):
    def __init__(self,conn, ip,port):
        Thread.__init__(self)
        self.ip = ip
        self.port = port
        self.conn = conn
        print("New server socket thread started for " + ip + ":" + str(port))

    def run(self):
        while True :
            try:
                message = self.conn.recv(2048)
                if not message:
                    print("closed")
                    try:
                        self.conn.close()
                    except:
                        pass
                    return
                try:
                    dataInfo = message.decode('ascii')
                    print("recv:::::"+str(dataInfo)+"::")
                except UnicodeDecodeError:
                    print("non-ascii data")
                    continue
            except socket.error:
                print("Unexpected error:", sys.exc_info()[0])
                try:
                    self.conn.close()
                except:
                    pass
                return
First of all, we store conn in self.conn. Your version used a global conn variable, which caused unexpected results when you had more than one connection to the server: every thread read from whichever socket the main loop had accepted most recently. conn is actually a new socket that accept() creates for the client connection, and it is unique to each thread. This is how servers differentiate between client connections: they listen on a known port, and when the server accepts a connection, accept() creates a new socket for that particular connection and returns it. The new socket keeps the same local port; the operating system tells connections apart by the client's address and port. This is why we need to pass conn to the thread and then read from self.conn instead of the global conn.
Your server "hung" on client connection errors because there was no mechanism to detect them in your loop. If the client closes the connection, socket.recv() does not raise an exception but returns an empty bytes object (b''). This is the condition you need to detect. I am fairly sure you do not even need try/except here, but it does not hurt; you do, however, need to name the exception you are expecting. In this case, catching everything with a bare except is just wrong.
You also have another statement there that can raise exceptions. If your client sends something that cannot be decoded with the ascii codec, you get a UnicodeDecodeError (try this without error handling: telnet to your server port, paste some Hebrew or Japanese into the connection and see what happens). If you just caught everything and treated it as a socket error, you would enter the thread-ending part of the code just because you could not parse a message. Typically we just ignore "illegal" messages and carry on, and I have added this. If you want to shut down the connection upon receiving a "bad" message, just add self.conn.close() and return to this exception handler as well.
Then, when you really do encounter a socket error, or the client has closed the connection, you need to close the socket and exit the thread. Call close() on the socket, wrapped in try/except, as you do not really care if it fails because the socket is already gone.
And when you want to exit your thread, you just return from run(). The thread then exits in an orderly way. As simple as that.
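As noted above, subclassing Thread is optional. Below is a minimal sketch, assuming Python 3, of the same per-client logic written as a plain function passed to threading.Thread(target=...). The names handle_client and serve, and the shared received list used to collect messages, are hypothetical and only for illustration; the behaviour mirrors the fixed run() method: return on b'' or a socket error, and skip data that cannot be decoded.

```python
import socket
import threading


def handle_client(conn, addr, received):
    """Serve one client until it disconnects or its socket fails.

    `received` is a hypothetical list that simply collects decoded messages.
    """
    with conn:  # closes this client's socket however the loop is left
        while True:
            try:
                data = conn.recv(2048)
            except OSError as exc:  # socket.error is an alias of OSError in Python 3
                print("Unexpected error from", addr, exc)
                return  # ends only this client's thread
            if not data:  # b"" means the client closed the connection
                print("closed", addr)
                return
            try:
                received.append(data.decode("ascii"))
            except UnicodeDecodeError:
                print("non-ascii data from", addr)  # ignore the chunk, keep serving


def serve(host="127.0.0.1", port=0):
    """Accept clients forever, one daemon thread per connection."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(10)
    print("Port:", server.getsockname()[1])
    received = []
    while True:
        conn, addr = server.accept()
        threading.Thread(target=handle_client, args=(conn, addr, received),
                         daemon=True).start()
```

Each thread owns its own conn, so a client that disappears only ends its own thread. Sharing one list between threads is just for the example; a queue.Queue is the more usual way to hand messages to other code.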
Then there is yet another potential problem, if you are not only printing the messages but are parsing them and doing something with the data you receive. This I do not fix but leave this to you.
TCP sockets transmit data, not messages. When you build a communication protocol, you must not assume that when your recv returns, it will return a single message. When your recv() returns something, it can mean one of five things:
1. The client has closed the connection and nothing (b'') is returned
2. There is exactly one full message and you receive that
3. There is only a partial message, either because you read the socket before the client had transmitted all the data, or because the client sent more than 2048 bytes (even if your client never sends over 2048 bytes, a malicious client would definitely try this)
4. There is more than one message waiting and you receive them all
5. As 4, but the last message is partial.
Most socket programming mistakes are related to this. The programmer expects 2 to happen (as you do now) but they do not cater for 3-5. You should instead analyse what was received and act accordingly. If there seems to be less data than a full message, store it somewhere and wait for more data to appear. When more data appears, concatenate these and see if you now have a full message. And when you have parsed a full message from this buffer, inspect the buffer to see if there is more data there - the first part of the next message or even more full messages if your client is fast and server is slow. If you process a message and then wipe the buffer, you might have wiped also bytes from your next message.
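The buffering described above can be sketched as follows. This assumes a hypothetical protocol in which every message ends with a newline (b"\n"); the class name LineBuffer and the max_size limit are illustrative assumptions, not part of the original code. Each recv() result is fed in, complete messages come out, and any trailing partial message stays in the buffer for the next call.

```python
class LineBuffer:
    """Accumulate raw TCP chunks and return complete newline-terminated messages.

    The b"\n" delimiter is an assumption for illustration; use whatever framing
    (delimiter or length prefix) your own protocol defines.
    """

    def __init__(self, delimiter=b"\n", max_size=64 * 1024):
        self.delimiter = delimiter
        self.max_size = max_size  # guards against a client that never sends a delimiter
        self.buffer = b""

    def feed(self, chunk):
        """Add one recv() result; return every complete message now available."""
        self.buffer += chunk
        messages = []
        while True:
            end = self.buffer.find(self.delimiter)
            if end == -1:
                break  # only a partial message (or nothing) is left: wait for more data
            messages.append(self.buffer[:end])
            # keep whatever follows: it may be the start of the next message
            self.buffer = self.buffer[end + len(self.delimiter):]
        if len(self.buffer) > self.max_size:
            raise ValueError("partial message exceeds max_size")
        return messages
```

In a receive loop you would call `for msg in buf.feed(data): ...` after checking for b'', with one LineBuffer per connection, so bytes belonging to the next message are never wiped.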

Related

How to send multiple messages over same socket connection?

I am trying to send an array of messages through the same socket connection, but I get an error.
Here is my client code:
def send_over_socket(hl7_msg_array):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((config.HOST, config.PORT))
    for single_hl7_msg in hl7_msg_array:
        sock.send(single_hl7_msg.to_mllp().encode('UTF-8'))
        received = sock.recv(1024*1024)
        print("Sent: ", received)
    sock.shutdown(socket.SHUT_RDWR)
    sock.close()
While debugging the code, I see that the exception occurs when I call the sock.recv(1024*1024) for the second message.
Here is the error:
ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine
Server-side code:
def run_mllp_server():
    class PDQHandler(AbstractHandler):
        def reply(self):
            msg = hl7.parse(self.incoming_message)
            msg_pid = msg[1][3]
            msg_key = msg[2][3][0][1]
            msg_value = msg[2][5]
            lock = RLock()
            lock.acquire()
            results_collection[str(msg_pid)][str(msg_key)] = str(msg_value)
            lock.release()
            print("Received: ", repr(self.incoming_message))
            return parse_message(self.incoming_message).to_mllp()

    # error handler
    class ErrorHandler(AbstractErrorHandler):
        def reply(self):
            if isinstance(self.exc, UnsupportedMessageType):
                print("Error handler success 1")
            else:
                print("Error handler else case")

    handlers = {
        'ORU^R01^ORU_R01': (PDQHandler,),
        'ERR': (ErrorHandler,)
    }
    server = MLLPServer(config.SOCKET_HOST, config.SOCKET_PORT, handlers)
    print("Running Socket on port ", config.SOCKET_PORT)
    server.serve_forever()
Here I am using the MLLP protocol, which runs over a TCP connection behind the scenes.
Can you please help me figure out what is going on? Is it a problem with the ACK?

I do not know python at all but...
I do not think multiple messages are your problem. Looking at the exception, I guess your first message is being sent correctly. Then your client code waits for an ACK to be received, but the server never sends one; instead, it closes the connection.
Also, check whether sendall should be used instead of send: send may write only part of the data and returns the number of bytes it actually sent.
After the above issue is fixed, to send multiple messages on the same connection you have to follow MLLP (also called LLP) so that the server can tell the messages apart.
Description                   Hex      Decimal (ASCII)   Symbol
Message starting character    0B       11                <VT>
Message ending characters     1C, 0D   28, 13            <FS>, <CR>
This way, when you send a message to Listener (TCP/MLLP server), it looks for Start and End Block in your incoming data. Based on it, it differentiates each message.
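A minimal sketch of the framing in the table, written without any HL7 library: mllp_frame wraps one message in the start and end blocks, and MLLPDeframer pulls complete messages out of a TCP byte stream. Both names are hypothetical; if your HL7 library already frames messages for you, use that and keep only the receiving logic as a reference.

```python
START_BLOCK = b"\x0b"    # <VT>, hex 0B
END_BLOCK = b"\x1c\x0d"  # <FS><CR>, hex 1C 0D


def mllp_frame(message: bytes) -> bytes:
    """Wrap one message in MLLP start and end blocks."""
    return START_BLOCK + message + END_BLOCK


class MLLPDeframer:
    """Split a TCP byte stream into MLLP messages (one instance per connection)."""

    def __init__(self):
        self.buffer = b""

    def feed(self, chunk: bytes) -> list:
        self.buffer += chunk
        messages = []
        while True:
            start = self.buffer.find(START_BLOCK)
            if start == -1:
                self.buffer = b""  # no start block yet: nothing worth keeping
                break
            end = self.buffer.find(END_BLOCK, start + 1)
            if end == -1:
                self.buffer = self.buffer[start:]  # partial message: wait for more data
                break
            messages.append(self.buffer[start + 1:end])
            self.buffer = self.buffer[end + len(END_BLOCK):]
        return messages
```

On the sending side you would write sock.sendall(mllp_frame(payload)); on the receiving side you feed every recv() chunk to the connection's MLLPDeframer and process each message it returns, however the bytes were split across reads.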
Please refer to this answer for more details.

TCP Socket unable to send data even after executing sendall ()

Hi, I have multiple systems communicating via messages over TCP connections.
When two of them communicate, I first send the message "Start Process", to which the other should reply "Process Started".
However, the message "Process Started" is not received by the other system, even though the line sendall("Process Started") executes without any exception.
My sample code is as follows:
TCP Initialisation:
def __init__(self):
    self.tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    self.tcp.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    self.tcp.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, struct.pack('ll',0,1))
    self.tcp.bind(('',12000))
    self.tcp.listen(1)
    self.client, self.address = self.tcp.accept()
    (self.Ip,self.port)= self.address
Main Function:
while True:
    msg =""
    try:
        msg = self.client.recv(512)
    except socket.error as e:
        if e[0] == 11:
            # exception for RECVTIMEO used in the socket creation.
            pass
    if (msg == "Start Process"):
        send = "Process Started"
        self.client.sendall(send)
        print "status sent"
While executing this code, I do receive the message "Start Process", but "Process Started" is not sent, even though the line self.client.sendall(send) executes. I captured the packets in Wireshark: I see the packet containing "Start Process" but no packet for "Process Started".
Can someone help me with this?

Your code has several issues, but the one you are struggling with now is this: if you connect to port 12000 with telnet, type "Start Process" and hit Enter, you actually send (and receive) Start Process followed by a line ending (telnet typically sends \r\n), not just Start Process, so your if statement does not match. If you strip the trailing whitespace, you can mitigate this:
try:
    msg = self.client.recv(512)
    msg = msg.strip()
Your program may need some tweaking, though. As you set a receive timeout of one microsecond (struct.pack('ll', 0, 1)), your socket is effectively non-blocking, so self.tcp.accept() raises an exception when you try to accept a connection that is not there yet. The same goes for self.client.recv() after you have received the correct message. When you add strip, your if clause will match and you will get a response. However, it will just keep responding, as it immediately tries to recv(512) again, fails with an exception, and then you catch it and evaluate the if clause again. msg is still "Start Process" as it has not changed (the subsequent recv just failed; it did not return "" or None).
Try moving your if clause inside your try/except construction and only compare msg if you actually have received one.
This program would also be immensely wasteful, as it would be running in a busy loop most of the time, waiting for either connections or messages. If you need to use non-blocking sockets, you would need to address this.
A better way might be using threads to have a blocking thread that listens and accepts new connections, and for every connection it would spawn a worker thread. These threads could then be blocking in recv() until a message is received.
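The thread-per-connection idea above could look roughly like this in Python 3. It is a sketch under assumptions: handle_connection and serve_forever are hypothetical names, messages are bytes, and each recv() is still assumed to carry one whole command, which a real protocol would need to frame properly. It combines the suggestions above: a blocking accept(), a blocking recv() in each worker, strip() for the line ending, and comparing msg only after a successful receive.

```python
import socket
import threading


def handle_connection(conn):
    """Worker: block in recv() and answer each "Start Process" request."""
    with conn:
        while True:
            data = conn.recv(512)  # blocks until data arrives or the peer closes
            if not data:           # b"": the peer closed the connection
                return
            # strip() drops the line ending a telnet client appends
            if data.strip() == b"Start Process":
                conn.sendall(b"Process Started")


def serve_forever(port=12000):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", port))  # bind() takes a single (host, port) tuple
    listener.listen(5)
    while True:
        conn, _addr = listener.accept()  # blocking accept: no busy loop
        threading.Thread(target=handle_connection, args=(conn,), daemon=True).start()
```

Because every call blocks, the process sleeps while there is nothing to do instead of spinning on timeouts.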
Just a thought.
Hannu

Python IRC ChatBot hangs on socket.recv after seemingly random time even though socket.settimeout is 8

Hey, so I decided to create an IRC ChatBot whose sole purpose is to read incoming messages from Twitch Chat and, if a giveaway is recognized by a keyword, enter the giveaway by sending !enter in Chat.
I built the bot upon this source: https://github.com/BadNidalee/ChatBot. I only changed things in Run.py, so that's the only code I'm going to post. The unaltered ChatBot does work, but it has no reconnect ability and regularly stops receiving data because the socket closes or for other reasons.
All I wanted to change was to make the ChatBot stable, so that it can stay in the IRC chat constantly without disconnecting. I tried to achieve this by setting a timeout of 8 seconds on my socket, catching the timeout exceptions that occur and reconnecting after them.
And all in all it does seem to work: my bot does what it's supposed to even when a lot of messages are coming in, and it recognizes when a giveaway starts and answers accordingly. IRC server PING messages are also handled and answered correctly. If there is no message in chat for over 8 seconds, the exception gets thrown correctly and the bot also reconnects correctly to IRC.
BUT here's my problem: after seemingly random times the socket just stops working. What I find strange is that it sometimes works for 20 minutes and sometimes for an hour. It doesn't coincide with special events, like lots of messages or anything else happening in chat; it really seems random. It does not time out, there's just nothing happening anymore. If I cancel the program with Ctrl-C at this point, the console says the last call was "readbuffer = s.recv(1024)". But why is it not throwing a timeout exception at that point? If s.recv was called, the socket should time out if nothing is received after 8 seconds, but the program just stops and there is no more output until you manually abort it.
Maybe I went about it the wrong way completely. I just want a stable 24/7-capable ChatBot that scans for one simple keyword and answers with one simple !enter.
This is also my first time programming in Python, so if I broke any conventions or made any grave mistakes, let me know.
The getUser Method returns the username of the line of chat that is scanned currently.
The getMessage Method returns the message of the line of chat that is scanned.
The openSocket Method opens the Socket and sends JOIN NICK PASS etc to the IRC
#!/usr/bin/python
import string
import socket
import datetime
import time
from Read import getUser, getMessage
from Socket import openSocket, sendMessage
from Initialize import joinRoom

connected = False
readbuffer = ""

def connect():
    print "Establishing Connection..."
    irc = openSocket()
    joinRoom(irc)
    global connected
    connected = True
    irc.settimeout(8.0)
    print "Connection Established!"
    return irc

while True:
    s = connect()
    s.settimeout(8.0)
    while connected:
        try:
            readbuffer = s.recv(1024)
            temp = string.split(readbuffer, "\n")
            readbuffer = temp.pop()
            for line in temp:
                if "PING" in line:
                    s.send(line.replace("PING", "PONG"))
                    timern = str(datetime.datetime.now().time())
                    timern = timern[0:8]
                    print timern + " PING received"
                    break
                user = getUser(line)
                message = getMessage(line)
                timern = str(datetime.datetime.now().time())
                timern = timern[0:8]
                print timern +" " + user + ": " + message
                if "*** NEW" in message:
                    sendMessage(s, "!enter")
                    break
        except socket.timeout:
            connected = False
            print "Socket Timed Out, Connection closed!"
            break
        except socket.error:
            connected = False
            print "Socket Error, Connection closed!"
            break

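I think relying on the socket timeout alone is the wrong approach here.
s.settimeout(8.0)
This sets a timeout on blocking operations on s (connect(), recv(), send() and so on), which is why you do see socket.timeout when the chat is quiet; on its own, though, it does not give you much control over how your loop waits for data.
Furthermore, what you usually want to use instead is s.setblocking(0); however, this alone won't help you either (probably).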
Instead what you want to use is:
import select
ready = select.select([s], [], [], timeout_in_seconds)
if ready[0]:
    data = s.recv(1024)
select() waits up to timeout_in_seconds for the socket to become readable, that is, for incoming data (or a closed connection) to be available. If it is, you call recv(), which is itself a blocking operation but now has something to return. If nothing arrives in time, select() returns three empty lists and you should avoid calling recv().
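Wrapped into a helper, the select() pattern above might look like this. recv_with_timeout is a hypothetical name; the sketch simply makes explicit the three outcomes a caller has to handle: data, a closed connection (b"") and a timeout (None).

```python
import select
import socket


def recv_with_timeout(sock, timeout, bufsize=1024):
    """Return data from sock, b"" if the peer closed, or None if nothing arrived in time."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out: the caller can reconnect or send a PING
    return sock.recv(bufsize)  # should not block: select reported the socket readable
```

A bot loop could then treat None as "quiet for too long" and b"" as "server hung up", and reconnect in either case.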
If you're running everything on Linux (epoll is Linux-only), you're also better off using epoll.
from select import epoll, EPOLLIN
poll = epoll()
poll.register(s.fileno(), EPOLLIN)
events = poll.poll(1)  # 1 sec timeout
for fileno, event in events:
    # event is a bit mask, so test the EPOLLIN bit instead of using "is"
    if event & EPOLLIN and fileno == s.fileno():
        data = s.recv(1024)
This is a crude example of how epoll could be used.
But it's quite fun to play around with, and you should read more about it.

Deadlock in Python Threads

I am trying to implement a simple port scanner in Python. It works by creating a number of worker threads which scan ports that are provided in a queue. They save the results in another queue. When all ports are scanned, the threads and the application should terminate. And here lies the problem: for small numbers of ports everything works fine, but if I try to scan 200 or more ports, the application gets caught in a deadlock. I have no idea why.
class ConnectScan(threading.Thread):
    def __init__(self, to_scan, scanned):
        threading.Thread.__init__(self)
        self.to_scan = to_scan
        self.scanned = scanned

    def run(self):
        while True:
            try:
                host, port = self.to_scan.get()
            except Queue.Empty:
                break
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                s.connect((host, port))
                s.close()
                self.scanned.put((host, port, 'open'))
            except socket.error:
                self.scanned.put((host, port, 'closed'))
            self.to_scan.task_done()

class ConnectScanner(object):
    def scan(self, host, port_from, port_to):
        to_scan = Queue.Queue()
        scanned = Queue.Queue()
        for port in range(port_from, port_to + 1):
            to_scan.put((host, port))
        for i in range(20):
            ConnectScan(to_scan, scanned).start()
        to_scan.join()
Does anybody see what might be wrong? I would also appreciate some tips on how to debug such threading issues in Python.

I don't see anything obviously wrong with your code, but as it stands the break will never be hit - self.to_scan.get() will wait forever rather than raising Queue.Empty. Given that you're loading up the queue with ports to scan before starting the threads, you can change that to self.to_scan.get(False) to have the worker threads exit correctly when all the ports have been claimed.
Combined with the fact that you have non-daemon threads (threads that will keep the process alive after the main thread finishes), that could be the cause of the hang. Try printing something after the to_scan.join() to see whether it's stopped there, or at process exit.
As Ray says, if an exception other than socket.error is raised between self.to_scan.get() and self.to_scan.task_done(), then the join call will hang. It could help to change that code to use a try/finally to be sure:
def run(self):
    while True:
        try:
            host, port = self.to_scan.get(False)
        except Queue.Empty:
            break
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                s.connect((host, port))
                s.close()
                self.scanned.put((host, port, 'open'))
            except socket.error:
                self.scanned.put((host, port, 'closed'))
        finally:
            self.to_scan.task_done()
In general, debugging multithreaded processes is tricky. I try to avoid anything blocking indefinitely: it's better to have something crash noisily because a timeout was too short than to have it just stop forever waiting for an item that will never appear. So I'd specify timeouts for your self.to_scan.get and socket.connect calls, and for the final wait. Queue.join() itself does not accept a timeout, so you would bound that wait by joining the worker threads with a timeout instead.
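Here is a rough Python 3 sketch of the worker loop with those ideas applied: a non-blocking get, a connect timeout, task_done() in a finally block, and a bounded wait at the end via Thread.join(timeout). The function name scan_ports and all timeout values are assumptions for illustration, not tuned numbers.

```python
import queue
import socket
import threading


def scan_ports(host, ports, workers=20, connect_timeout=1.0, join_timeout=30.0):
    """Connect-scan `ports` on `host`; every blocking call has a time limit."""
    to_scan = queue.Queue()
    results = queue.Queue()
    for port in ports:
        to_scan.put(port)  # fill the queue before any worker starts

    def worker():
        while True:
            try:
                port = to_scan.get_nowait()  # same as get(False): Empty once drained
            except queue.Empty:
                return
            try:
                # create_connection applies the timeout to connect() and closes
                # the socket itself if connecting fails
                with socket.create_connection((host, port), timeout=connect_timeout):
                    results.put((port, "open"))
            except OSError:  # refused, unreachable or timed out
                results.put((port, "closed"))
            finally:
                to_scan.task_done()  # always balances the get(), whatever happened

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(join_timeout)  # bounded wait instead of an unbounded to_scan.join()
    if any(t.is_alive() for t in threads):
        raise TimeoutError("scan did not finish in time")

    found = []
    while not results.empty():
        found.append(results.get_nowait())
    return sorted(found)
```

If a worker hangs despite the timeouts, the TimeoutError makes the problem visible instead of leaving the process stuck.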
Use logging to work out the order in which events occur: output from print can get interleaved between threads, but loggers are thread-safe.
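As a small illustration, assuming Python 3: adding %(threadName)s to the log format shows which worker produced each line. The logger name "scanner" and the worker names are arbitrary choices for this example.

```python
import logging
import threading

logging.basicConfig(format="%(relativeCreated)6d ms %(threadName)-10s %(message)s")
log = logging.getLogger("scanner")
log.setLevel(logging.DEBUG)  # the logger's own level decides which records are created


def work(item):
    log.debug("claimed item %d", item)
    log.debug("finished item %d", item)


threads = [threading.Thread(target=work, args=(i,), name=f"worker-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Giving each thread a meaningful name makes the log far easier to follow than the default Thread-N names.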
Also, something like this recipe can be handy for dumping the current stack trace for each thread.
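A comparable helper can be written with the standard library alone. This sketch relies on sys._current_frames(), a CPython function that the documentation reserves for internal and specialized purposes; dump_thread_stacks is a hypothetical name. In Python 3, faulthandler.dump_traceback(all_threads=True) is a ready-made alternative that writes to stderr.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return the current stack of every running thread as one string."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"--- thread {names.get(ident, '?')} ({ident}) ---")
        parts.append("".join(traceback.format_stack(frame)))
    return "\n".join(parts)
```

Calling it from a watchdog timer or a signal handler when the program seems stuck shows exactly which call each thread is blocked in.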
I haven't used any debuggers with support for debugging multiple threads in Python, but there are some listed here.

It is likely that not all items on the to_scan queue are consumed and that you're not calling the task_done method enough times to unblock ConnectScanner.
Could it be that an exception is thrown during the runtime of ConnectScan.run that you're not catching and your threads prematurely terminate?

zeromq: how to prevent infinite wait?

I just got started with ZMQ. I am designing an app whose workflow is:
one of many clients (who have random PULL addresses) PUSH a request to a server at 5555
the server is forever waiting for client PUSHes. When one comes, a worker process is spawned for that particular request. Yes, worker processes can exist concurrently.
When that process completes its task, it PUSHes the result to the client.
I assume that the PUSH/PULL architecture is suited for this. Please correct me on this.
But how do I handle these scenarios?
the client_receiver.recv() will wait for an infinite time when server fails to respond.
the client may send request, but it will fail immediately after, hence a worker process will remain stuck at server_sender.send() forever.
So how do I setup something like a timeout in the PUSH/PULL model?
EDIT: Thanks to user938949's suggestions, I got a working answer and I am sharing it for posterity.

If you are using zeromq >= 3.0, then you can set the RCVTIMEO socket option:
client_receiver.RCVTIMEO = 1000 # in milliseconds
But in general, you can use pollers:
poller = zmq.Poller()
poller.register(client_receiver, zmq.POLLIN) # POLLIN for recv, POLLOUT for send
And poller.poll() takes a timeout:
evts = poller.poll(1000) # wait *up to* one second for a message to arrive.
evts will be an empty list if there is nothing to receive.
You can poll with zmq.POLLOUT, to check if a send will succeed.
Or, to handle the case of a peer that might have failed, a:
worker.send(msg, zmq.NOBLOCK)
might suffice, which will always return immediately - raising a ZMQError(zmq.EAGAIN) if the send could not complete.

This was a quick hack I made after referring to user938949's answer and http://taotetek.wordpress.com/2011/02/02/python-multiprocessing-with-zeromq/ . If you can do better, please post your answer and I will recommend it.
For those wanting lasting solutions for reliability, refer to http://zguide.zeromq.org/page:all#toc64
Version 3.0 of zeromq (beta ATM) supports timeout in ZMQ_RCVTIMEO and ZMQ_SNDTIMEO. http://api.zeromq.org/3-0:zmq-setsockopt
Server
The zmq.NOBLOCK ensures that when a client does not exist, the send() does not block.
import time
import zmq

context = zmq.Context()
ventilator_send = context.socket(zmq.PUSH)
ventilator_send.bind("tcp://127.0.0.1:5557")
i=0
while True:
    i=i+1
    time.sleep(0.5)
    print ">>sending message ",i
    try:
        ventilator_send.send(repr(i),zmq.NOBLOCK)
        print "   succeed"
    except zmq.ZMQError:  # raised (EAGAIN) when the message cannot be queued right now
        print "   failed"
Client
The poller object can listen on many receiving sockets (see the "Python Multiprocessing with ZeroMQ" post linked above); I registered it only on work_receiver. In the infinite loop, the client polls with a timeout of 1000 ms. The socks object is empty if no message has been received in that time.
import time
import zmq

context = zmq.Context()
work_receiver = context.socket(zmq.PULL)
work_receiver.connect("tcp://127.0.0.1:5557")
poller = zmq.Poller()
poller.register(work_receiver, zmq.POLLIN)
# Loop and accept messages from both channels, acting accordingly
while True:
    socks = dict(poller.poll(1000))
    if socks:
        if socks.get(work_receiver) == zmq.POLLIN:
            print "got message ",work_receiver.recv(zmq.NOBLOCK)
    else:
        print "error: message timeout"

The send won't block if you use ZMQ_NOBLOCK, but if you then try to close the socket and the context, this step can block the program from exiting.
The reason is that the socket waits for a peer so that the outgoing messages it has queued can still be delivered. To close the socket immediately and discard the outgoing messages still in the buffer, set ZMQ_LINGER to 0.

If you're only waiting for one socket, rather than create a Poller, you can do this:
if work_receiver.poll(1000, zmq.POLLIN):
    print "got message ",work_receiver.recv(zmq.NOBLOCK)
else:
    print "error: message timeout"
You can use this if your timeout changes depending on the situation, instead of setting work_receiver.RCVTIMEO.
