How do I gracefully close a socket with a persistent HTTP connection? - python

I'm writing a very simple client in Python that fetches an HTML page from the WWW. This is the code I've come up with so far:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("www.mywebsite.com", 80))
sock.send(b"GET / HTTP/1.1\r\nHost:www.mywebsite.com\r\n\r\n")
while True:
    chunk = sock.recv(1024) # (1)
    if len(chunk) == 0:
        break
    print(chunk)
sock.close()
The problem is that, because HTTP/1.1 connections are persistent by default, the code gets stuck at # (1) waiting for more data from the server once the response has been fully transmitted.
I know I can solve this by a) adding the Connection: close request header, or b) setting a timeout on the socket. A non-blocking socket would not help here either: the select() syscall would still hang unless I gave it a timeout, which is just another form of option b.
So is there another way to do it, while keeping the connection persistent?

As has already been said in the comments, there's a lot to consider if you're trying to write an all-singing, all-dancing HTTP processor. However, if you're just practising with sockets then consider this.
Let's assume that you know how the response will end. For example, if we do essentially what you're doing in your code against the main Google page, we know that the header section of the response ends with '\r\n\r\n'. So what we can do is read 1 byte at a time and look out for that terminating sequence.
This code will NOT give you the full Google main page: it stops at the end of the headers and, as you will see from them, the body is chunked - and that's a whole new ball game.
Having said all of that, you may find this instructive:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
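try:
    sock.connect(('www.google.com', 80))
    sock.sendall(b'GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n')
    end = [b'\r', b'\n', b'\r', b'\n']
    d = []
    # Read one byte at a time until the blank line that ends the headers
    while d[-len(end):] != end:
        byte = sock.recv(1)
        if not byte:  # server closed the connection before the headers ended
            break
        d.append(byte)
    print(b''.join(d).decode('iso-8859-1'))
finally:
    sock.close()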
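If you do want to keep the connection persistent, the usual approach is the one HTTP itself prescribes: after the headers, let the Content-Length or Transfer-Encoding: chunked header tell you how many body bytes to read, and stop there instead of waiting for the server to close. Below is a minimal sketch, not a complete HTTP parser: read_response is a hypothetical helper, and it ignores 1xx responses, HEAD requests, the contents of trailers and bodies delimited by connection close. It reads from a binary file-like object, so with a real connection you would pass it sock.makefile('rb') and keep using the same socket for the next request.

```python
def read_response(f):
    """Read one HTTP/1.1 response from a binary file-like object.

    Sketch only: handles Content-Length and chunked bodies; ignores 1xx
    responses, HEAD requests and close-delimited bodies.
    """
    status_line = f.readline().decode("iso-8859-1").rstrip("\r\n")
    headers = {}
    while True:
        line = f.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line: end of the header section
        name, _, value = line.decode("iso-8859-1").partition(":")
        headers[name.strip().lower()] = value.strip()

    if headers.get("transfer-encoding", "").lower() == "chunked":
        body = b""
        while True:
            size_line = f.readline()
            if not size_line:
                raise ConnectionError("connection closed mid-body")
            # Chunk size is hex; anything after ';' is a chunk extension
            size = int(size_line.split(b";")[0].strip(), 16)
            if size == 0:
                break  # last chunk
            body += f.read(size)
            f.readline()  # CRLF that follows each chunk's data
        # Skip optional trailer lines up to the final blank line
        while f.readline() not in (b"\r\n", b"\n", b""):
            pass
    else:
        body = f.read(int(headers.get("content-length", "0")))
    return status_line, headers, body
```

Because the function consumes exactly one response, the next call on the same file object starts at the next response, which is what makes the connection reusable.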

Related

What is a proper endless socket server loop in Python

I am a Python newbie and my first task is to create a small server program that will forward events from a network unit to a REST API.
The overall structure of my code seems to work, but I have one problem. After I receive the first packet, nothing happens. Is something wrong with my loop, such that new packets (from the same client) aren't accepted?
Packets look something like this: EVNTTAG 20190219164001132%0C%3D%E2%80h%90%00%00%00%01%CBU%FB%DF ... not that it matters, but I'm sharing it just for clarity.
My code (I skipped the irrelevant init of rest etc. but the main loop is the complete code):
# Configure TAGP listener
ipaddress = ([l for l in ([ip for ip in socket.gethostbyname_ex(socket.gethostname())[2] if not ip.startswith("127.")][:1], [[(s.connect(('8.8.8.8', 53)), s.getsockname()[0], s.close()) for s in [socket.socket(socket.AF_INET, socket.SOCK_DGRAM)]][0][1]]) if l][0][0])
server_name = ipaddress
server_address = (server_name, TAGPListenerPort)
print ('starting TAGP listener on %s port %s' % server_address)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(server_address)
sock.listen(1)
sensor_data = {'tag': 0}
# Listen for TAGP data and forward events to ThingsBoard
try:
    while True:
        data = ""
        connection, client_address = sock.accept()
        data = str(connection.recv(1024))
        if data.find("EVNTTAG") != -1:
            timestamp = ((data.split())[1])[:17]
            tag = ((data.split())[1])[17:]
            sensor_data['tag'] = tag
            client.publish('v1/devices/me/telemetry', json.dumps(sensor_data), 1)
        print (data)
except KeyboardInterrupt:
    # Close socket server (TAGP)
    connection.shutdown(1)
    connection.close()
    # Close client to ThingsBoard
    client.loop_stop()
    client.disconnect()
There are multiple issues with your code:
First of all, you need a loop over what the client sends. You first call connection, client_address = sock.accept() and now you have a client. But in the next iteration of the loop you call .accept() again, overwriting your old connection with a new client. If there is no new client, this simply waits forever. And that's what you observe.
So this can be fixed like this:
while True:
    conn, addr = sock.accept()
    while True:
        data = conn.recv(1024)
but this code has another issue: no new client can connect until the old one disconnects (well, at the moment it just loops indefinitely regardless of whether the client is alive or not; we'll deal with that later). To overcome this you can use threads (or async programming) and process each client independently. For example:
from threading import Thread
def client_handler(conn):
    while True:
        data = conn.recv(1024)
while True:
    conn, addr = sock.accept()
    t = Thread(target=client_handler, args=(conn,))
    t.start()
Async programming is harder and I'm not gonna address it here. Just be aware that there are multiple advantages of async over threads (you can google those).
Now each client has its own thread and the main thread only worries about accepting connections. Things happen concurrently. So far so good.
Let's focus on the client_handler function. What you misunderstand is how sockets work. This:
data = conn.recv(1024)
does not read 1024 bytes from the buffer. It actually reads up to 1024 bytes, with 0 being possible as well. Even if you send 1024 bytes, it can still read, say, 3. And when you receive a buffer of length 0, this indicates that the client closed its end of the connection. So first of all you need this:
def client_handler(conn):
    while True:
        data = conn.recv(1024)
        if not data:
            break
Now the real fun begins. Even if data is nonempty it can be of arbitrary length between 1 and 1024. Your data can be chunked and may require multiple .recv calls. And no, there is nothing you can do about it. Chunking can happen due to some other proxy servers or routers or network lag or cosmic radiation or whatever. You have to be prepared for it.
So in order to handle that correctly you need a proper framing protocol. For example, you have to somehow know how big the incoming packet is (so that you can answer the question "did I read everything I need?"). One way to do that is by prefixing each frame with (say) 2 bytes that combine into the total length of the frame. The code may look like this:
def client_handler(conn):
    while True:
        chunk = conn.recv(1) # read first byte
        if not chunk:
            break
        size = ord(chunk)
        chunk = conn.recv(1) # read second byte
        if not chunk:
            break
        size += (ord(chunk) << 8)
Now you know that the incoming buffer will be of length size. With that you can loop to read everything:
def handle_frame(conn, frame):
    if frame.find(b"EVNTTAG") != -1:  # frame is bytes, so search for bytes
        pass # do your stuff here now
def client_handler(conn):
    while True:
        chunk = conn.recv(1)
        if not chunk:
            break
        size = ord(chunk)
        chunk = conn.recv(1)
        if not chunk:
            break
        size += (ord(chunk) << 8)
        # recv until everything is read
        frame = b''
        while size > 0:
            chunk = conn.recv(size)
            if not chunk:
                return
            frame += chunk
            size -= len(chunk)
        handle_frame(conn, frame)
IMPORTANT: this is just an example of handling a protocol that prefixes each frame with its length. Note that the client has to be adjusted as well. You either have to define such a protocol or, if you are given one, read its spec and understand how framing works there. For example, HTTP does it very differently: you read until you meet \r\n\r\n, which signals the end of the headers, and then you check the Content-Length or Transfer-Encoding headers (not to mention hardcore things like protocol switching) to determine the next action. This gets quite complicated, though. I just want you to be aware that there are other options. Either way, framing is necessary.
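Since the client has to speak the same framing, here is a sketch of both sides, assuming the same 2-byte little-endian length prefix as the code above (struct format '<H', so a frame can be at most 65535 bytes). send_frame, recv_frame and recv_exactly are hypothetical helper names, not part of any library.

```python
import struct


def recv_exactly(conn, n):
    """Read exactly n bytes from conn; return None if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))  # may return fewer bytes than asked for
        if not chunk:
            return None  # connection closed before the frame was complete
        buf += chunk
    return bytes(buf)


def send_frame(conn, payload):
    """Client side: prefix payload with its length as 2 little-endian bytes."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 2-byte length prefix")
    conn.sendall(struct.pack("<H", len(payload)) + payload)


def recv_frame(conn):
    """Server side: return the next payload, or None once the peer is gone."""
    header = recv_exactly(conn, 2)
    if header is None:
        return None
    (size,) = struct.unpack("<H", header)
    return recv_exactly(conn, size)
```

With these, client_handler reduces to calling recv_frame(conn) in a loop until it returns None.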
Also, network programming is hard. I'm not gonna dive into things like security (e.g. against DDoS) and performance. The code above should be treated as an extreme simplification, not production-ready. I advise using existing software.
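As one example of existing software, Python's standard library ships the socketserver module, whose ThreadingTCPServer runs each connection's handler in its own thread, much like the Thread-based loop above. A minimal sketch, assuming an uppercase-echo handler as a placeholder for the real processing (EchoHandler and start_server are illustrative names). It does no framing, so a real handler would still need a protocol like the one described above.

```python
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """One instance per client connection, run in its own thread."""

    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data:  # client closed its side of the connection
                break
            # Placeholder processing: a real handler would parse frames here.
            self.request.sendall(data.upper())


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    server.daemon_threads = True  # client threads won't block interpreter exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call server.shutdown() and then server.server_close() to stop it.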

How to limit the number of connections to a socket and trigger timeout on client (Python)

How can I set a limit on the number of connections that a server socket can accept at once? I want to be able to set a max number of connections, and then once that limit is reached, any further attempts from clients to connect will result in a timeout. So far, I have tried something like this for the server:
sock = socket.socket()
sock.setblocking(0)
sock.bind(address)
sock.listen(0)
connections = []
while True:
    readable, writable, exceptional = select.select([sock], [], [])
    if readable and len(connections) < MAX_CONNECTIONS:
        connection, client_address = sock.accept()
        connections.append(connection)
        # Process connection asynchronously
and for the client:
try:
    sock = socket.create_connection(self.address, timeout=TIMEOUT)
    sock.settimeout(None)
    print "Established connection."
except socket.error as err:
    print >> sys.stderr, "Socket connection error: " + str(err)
    sys.exit(1)
# If connection successful, do stuff
Because of the structure of the rest of the program, I have chosen to use a non-blocking socket on the server and I do not wish to change this. Right now, the clients are able to connect to the server, even after the limit is reached and the server stops accepting them. How do I solve this? Thanks.
I believe there might be a slight misunderstanding of select() at play here. According to the manpage, select() returns file descriptors that are "ready for some class of IO operation", where ready means "it is possible to perform a corresponding IO operation without blocking".
The corresponding IO operation on a listening socket is accept(), which can only be performed without blocking once the OS has completed the full TCP handshake for you; otherwise you might block waiting for the client's final ACK, for instance.
This means that as long as the listening socket is open, connections will be accepted by the OS, even if not being handled by the application.
If you want to reject connections after a set number, you basically have two options:
1. Simply accept and close directly after accepting.
2. Close the listening socket upon reaching the limit and reopen it when done.
The second option is more convoluted and requires use of the SO_REUSEADDR option, which might not be the right thing in your case. It might also not work on all OSs, though it does seem to work reliably on Linux.
Here's a quick sketch of the second solution (since the first is pretty straightforward).
import select
import socket
def get_listening_socket():
    sock = socket.socket()
    sock.setblocking(0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('0.0.0.0', 5555))
    sock.listen(0)
    return sock
sock = get_listening_socket()
LIMIT = 1
conns = {sock}
while True:
    readable, writable, exceptional = select.select(conns, [], [])
    if sock in readable: # new connection on the listening socket
        conn, caddr = sock.accept()
        conns.add(conn)
        if len(conns) > LIMIT: # ">" because "sock" is also in there
            conns.remove(sock)
            sock.close()
    else: # reading from an established connection
        for c in readable:
            buf = c.recv(4096)
            if not buf:
                conns.remove(c)
                c.close()
                if sock not in conns: # reopen only if the listener was closed
                    sock = get_listening_socket()
                    conns.add(sock)
            else:
                print("received: %s" % buf)
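For completeness, the first option (accept, then close immediately when over the limit) could look roughly like this. It is a sketch with a hypothetical accept_or_reject helper and an assumed LIMIT of 2. Note that a rejected client does not time out as the question asks: its connect() succeeds, and it then sees the connection closed (recv() returns an empty bytes object).

```python
LIMIT = 2  # hypothetical maximum number of simultaneous clients


def accept_or_reject(listener, clients, limit=LIMIT):
    """Accept one pending connection; keep it only if under the limit.

    Call this when select() reports the listening socket as readable.
    Returns the accepted socket, or None if it was rejected.
    """
    conn, _addr = listener.accept()
    if len(clients) >= limit:
        conn.close()  # the client sees EOF right away
        return None
    clients.append(conn)
    return conn
```

When a client disconnects, remove its socket from clients so that a new one can be admitted.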
You may, however, want to rethink why you'd want to do this in the first place. If it's only about saving some memory on the server, then you might be over-optimizing and should be looking into SYN cookies instead.

When/why to use s.shutdown(socket.SHUT_WR)?

I have just started learning python network programming. I was reading Foundations of Python Network Programming and could not understand the use of s.shutdown(socket.SHUT_WR) where s is a socket object.
Here is the code (where sys.argv[2] is the number of bytes the user wants to send, which is rounded up to a multiple of 16) in which it is used:
import socket, sys
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
HOST = '127.0.0.1'
PORT = 1060
if sys.argv[1:] == ['server']:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((HOST, PORT))
    s.listen(1)
    while True:
        print 'Listening at', s.getsockname()
        sc, sockname = s.accept()
        print 'Processing up to 1024 bytes at a time from', sockname
        n = 0
        while True:
            message = sc.recv(1024)
            if not message:
                break
            sc.sendall(message.upper()) # send it back uppercase
            n += len(message)
            print '\r%d bytes processed so far' % (n,),
            sys.stdout.flush()
        print
        sc.close()
        print 'Completed processing'
elif len(sys.argv) == 3 and sys.argv[1] == 'client' and sys.argv[2].isdigit():
    bytes = (int(sys.argv[2]) + 15) // 16 * 16 # round up to // 16
    message = 'capitalize this!' # 16-byte message to repeat over and over
    print 'Sending', bytes, 'bytes of data, in chunks of 16 bytes'
    s.connect((HOST, PORT))
    sent = 0
    while sent < bytes:
        s.sendall(message)
        sent += len(message)
        print '\r%d bytes sent' % (sent,),
        sys.stdout.flush()
    print
    s.shutdown(socket.SHUT_WR)
    print 'Receiving all the data the server sends back'
    received = 0
    while True:
        data = s.recv(42)
        if not received:
            print 'The first data received says', repr(data)
        received += len(data)
        if not data:
            break
        print '\r%d bytes received' % (received,),
    s.close()
else:
    print >>sys.stderr, 'usage: tcp_deadlock.py server | client <bytes>'
And this is the explanation that the author provides which I am finding hard to understand:
Second, you will see that the client makes a shutdown() call on the socket after it finishes sending its transmission. This solves an important problem: if the server is going to read forever until it sees end-of-file, then how will the client avoid having to do a full close() on the socket and thus forbid itself from doing the many recv() calls that it still needs to make to receive the server’s response? The solution is to “half-close” the socket—that is, to permanently shut down communication in one direction but without destroying the socket itself—so that the server can no longer read any data, but can still send any remaining reply back in the other direction, which will still be open.
My understanding of what it will do is that it will prevent the client application from further sending the data and thus will also prevent the server side from further attempting to read any data.
What I can't understand is why it is used in this program, and in what situations I should consider using it in my programs.
My understanding of what it will do is that it will prevent the client application from further sending the data and thus will also prevent the server side from further attempting to read any data.
Your understanding is correct.
What I can't understand is why it is used in this program …
As your own statement suggests, without the client's s.shutdown(socket.SHUT_WR) the server would not stop waiting for data, but would instead stay stuck in its sc.recv(1024) forever, because no connection termination request would be sent to the server.
Since the server would then never get to its sc.close(), the client, for its part, would not stop waiting for data either, but would stay stuck in its s.recv(42) forever, because no connection termination request would be sent from the server.
Reading this answer to "close vs shutdown socket?" might also be enlightening.
The explanation is half-baked: it applies only to this specific code, and overall I would strongly argue that this is bad practice.
Now, to understand why, you need to look at the server code. The server loops on sc.recv(1024), which returns whatever data has arrived (up to 1024 bytes) and blocks while nothing is available; the loop only ends when recv() returns an empty string, i.e. at end-of-file. A TCP stream has no built-in message boundaries, so without that end-of-file the server cannot tell "that was everything" apart from "more is still on its way".
To resolve this you need to tell the server "hey, there is no more data coming your way", so that message = sc.recv(1024) returns an empty string and the loop ends, and you do this by shutting down the socket in one direction.
You do not want to fully close the socket, because then the server would not be able to send you the reply.
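To see the half-close in isolation, here is a small sketch on a connected socket pair: the client sends everything, calls shutdown(socket.SHUT_WR), and keeps reading until the server closes. Unlike the book's server, this hypothetical uppercase_server reads the whole request before replying, which keeps the example short; the function names are illustrative only.

```python
import socket


def uppercase_server(conn):
    """Read until the peer half-closes (recv returns b''), then reply once."""
    received = bytearray()
    while True:
        chunk = conn.recv(1024)
        if not chunk:  # client called shutdown(SHUT_WR): EOF on our side
            break
        received += chunk
    conn.sendall(bytes(received).upper())
    conn.close()  # this is the EOF the client is waiting for


def client_roundtrip(conn, payload):
    """Send everything, half-close, then read the reply until the server closes."""
    conn.sendall(payload)
    conn.shutdown(socket.SHUT_WR)  # "no more data from me", but keep receiving
    reply = bytearray()
    while True:
        chunk = conn.recv(42)
        if not chunk:
            break
        reply += chunk
    conn.close()
    return bytes(reply)
```

Replacing the shutdown() call with nothing would leave both sides blocked in recv(), which is exactly the situation the answers above describe.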

Correct multiprocessing to treat UDP in Python

I am trying to implement a simple UDP client and server. Server should receive a message and return a transformed one.
My main technique for the server is to listen for UDP messages in a loop, then spawn a multiprocessing.Process for each incoming message and send the reply within each Process instance:
class InputProcessor(Process):
    ...
    def run(self):
        output = self.process_input()
        self.sock.sendto(output, self.addr) # send a reply
if __name__ == "__main__":
    print "serving at %s:%s" % (UDP_IP, UDP_PORT)
    sock = socket.socket(socket.AF_INET, # Internet
                         socket.SOCK_DGRAM) # UDP
    sock.bind((UDP_IP,UDP_PORT))
    while True:
        data, addr = sock.recvfrom(1024) # buffer size is 1024 bytes
        print "received message: %s from %s:%s" % (data, addr[0], addr[1])
        p = InputProcessor(sock, data, addr)
        p.start()
In the test client, I do something like this:
def send_message(ip, port, data):
    sock = socket.socket(socket.AF_INET, # Internet
                         socket.SOCK_DGRAM) # UDP
    print "sending: %s" % data
    sock.sendto(data, (ip, port))
    sock.close()
for i in xrange(SECONDS*REQUESTS_PER_SECOND):
    data = generate_data()
    p = multiprocessing.Process(target=send_message, args=(UDP_IP,
                                                           UDP_PORT,
                                                           data))
    p.start()
    time.sleep(1.0/REQUESTS_PER_SECOND) # 1.0 avoids Python 2 integer division
The problem I am having with the code above is that when REQUESTS_PER_SECOND goes above a certain value (~50), some client processes seem to receive responses destined for other processes, i.e. process #1 receives the response for process #2, and vice versa.
Please criticize my code as much as possible, since I am new to network programming and may be missing something obvious. Maybe it's even better for some reason to use Twisted; however, I am highly interested in understanding the internals. Thanks.
As per the previous answer, I think the main reason is a race condition on the UDP port for the clients. I do not see the receiving part of the client code, but presumably it is similar to the one in the server. What I think happens, concretely, is that below about 50 requests/second the request-response round trip completes and the client exits. When more requests arrive, multiple processes may be blocked reading the UDP socket, and then which client process receives the incoming message is probably nondeterministic. If the network latency is larger in the real setting, this limit will be hit sooner.
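The client's receiving code is not shown in the question. Here is a sketch of one way to write it, assuming each request uses its own socket: the OS then gives every client socket a distinct ephemeral port, and the server's sendto(output, self.addr) is delivered only to the socket that sent the request. udp_request is a hypothetical helper; the timeout is there because UDP datagrams can be lost, and the address check assumes server_addr is given in the same numeric form the reply arrives from (e.g. ('127.0.0.1', port), not 'localhost').

```python
import socket


def udp_request(server_addr, payload, timeout=2.0):
    """Send one datagram and wait for the reply on the same socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)  # don't wait forever for a lost reply
        sock.sendto(payload, server_addr)
        while True:
            reply, addr = sock.recvfrom(65535)
            if addr == server_addr:  # ignore stray datagrams from other senders
                return reply
    finally:
        sock.close()
```

On timeout, recvfrom() raises socket.timeout, which the caller can treat as a lost request.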
Thanks a lot, guys! It seems I've found why my code failed before. I was using multiprocessing.Manager().dict() in the client to check whether the results from the server were correct. However, I didn't use any locks around a set of write operations to that dict(), and thus got a lot of errors even though the output from the server was correct.
In short, in the client I was checking the server responses incorrectly.

Simple Python Web Server trouble

I'm trying to write a python web server using the socket library. I've been through several sources and can't figure out why the code I've written doesn't work. Others have run very similar code and claim it works. I'm new to python so I might be missing something simple.
The only way it works now is if I send the data variable back to the client: the browser then prints the original GET request. When I try to send an HTTP response, the connection times out.
import socket
##Creates several variables, including the host name, the port to use
##the size of a transmission, and how many requests can be handled at once
host = ''
port = 8080
backlog = 5
size = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host,port))
s.listen(backlog)
while 1:
    client, address = s.accept()
    data = client.recv(16)
    if data:
        client.send('HTTP/1.0 200 OK\r\n')
        client.send("Content-Type: text/html\r\n\r\n")
        client.send('<html><body><h1>Hello World</body></html>')
    client.close()
    s.close()
You need to consume the input before responding, and you shouldn't close the socket in your while loop:
Replace client.recv(16) with client.recv(size), to consume the request.
Move your last line, s.close(), back one indent so that it is not in your while loop. At the moment you are closing the listening socket and then trying to accept from it again, so your server will crash after the first request.
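Putting those fixes together, a Python 3 sketch of a handler that reads the request before answering might look like this. serve_one is a hypothetical name. Rather than a single recv(size), it loops until the blank line that ends the request headers, since one recv() may return only part of the request; it also sends a Content-Length header, uses sendall() so the whole response goes out, and closes only the client connection, leaving the listening socket open for the next accept().

```python
import socket


def serve_one(listener):
    """Accept one client, read its request headers, send a complete response."""
    client, _addr = listener.accept()
    with client:  # closes the client socket, not the listener
        request = b""
        # A GET request has no body, so the blank line ends the request.
        while b"\r\n\r\n" not in request:
            chunk = client.recv(1024)
            if not chunk:
                return  # client went away before finishing its request
            request += chunk
        body = b"<html><body><h1>Hello World</h1></body></html>"
        header = (
            "HTTP/1.0 200 OK\r\n"
            "Content-Type: text/html\r\n"
            "Content-Length: %d\r\n"
            "\r\n" % len(body)
        ).encode("ascii")
        client.sendall(header + body)  # sendall retries partial writes
```

In a server, call serve_one(s) inside the while loop and close s only after the loop.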
Unless you are doing this as an exercise, you should extend SimpleHTTPServer (http.server in Python 3) instead of using sockets directly.
Also, adding this line after you create the socket (before bind) fixes any "Address already in use" errors you might be getting:
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
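In Python 3 the SimpleHTTPServer module became part of http.server. A minimal sketch of the same Hello World page built on http.server.BaseHTTPRequestHandler (rather than SimpleHTTPRequestHandler, which serves files from the current directory); HelloHandler and start are illustrative names, and binding to port 0 lets the OS pick a free port.

```python
import http.server
import threading

BODY = b"<html><body><h1>Hello World</h1></body></html>"


class HelloHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # The base class has already parsed the request line and headers.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)


def start(host="127.0.0.1", port=0):
    """Run the server in a background thread and return it."""
    server = http.server.HTTPServer((host, port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The library takes care of reading the request, so the recv() size and the socket lifetime problems above do not arise.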
Good luck!
