I'm writing a very simple client in Python that fetches an HTML page from the WWW. This is the code I've come up with so far:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("www.mywebsite.com", 80))
sock.send(b"GET / HTTP/1.1\r\nHost:www.mywebsite.com\r\n\r\n")
while True:
    chunk = sock.recv(1024) # (1)
    if len(chunk) == 0:
        break
    print(chunk)
sock.close()
The problem is that, because HTTP/1.1 connections are persistent by default, the code gets stuck at # (1) waiting for more data from the server once the transmission is over.
I know I can solve this by a) adding the Connection: close request header, or by b) setting a timeout to the socket. A non-blocking socket here would not help, as the select() syscall would still hang (unless I set a timeout on it, but that's just another form of case b)).
So is there another way to do it, while keeping the connection persistent?
As has already been said in the comments, there's a lot to consider if you're trying to write an all-singing, all-dancing HTTP processor. However, if you're just practising with sockets then consider this.
Let's assume that you know how the part you want will end. For example, if we do essentially what you're doing in your code to the main Google page, we know that the response headers will end with '\r\n\r\n'. So, what we can do is just read 1 byte at a time and look out for that terminating sequence.
This code will NOT give you the full Google main page: it stops at the end of the headers and, as you will see, the body is chunked - and that's a whole new ball game.
Having said all of that, you may find this instructive:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.connect(('www.google.com', 80))
    sock.send(b'GET / HTTP/1.1\r\nHost:www.google.com\r\n\r\n')
    end = [b'\r', b'\n', b'\r', b'\n']
    d = []
    while d[-len(end):] != end:
        d.append(sock.recv(1))
    print(''.join(b.decode() for b in d))
finally:
    sock.close()
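If the server sends a Content-Length header, you can keep the connection persistent and still know exactly when the response is over: read the headers, then read that many body bytes. Below is a minimal sketch of that idea. The function name read_http_response is made up for illustration, and it assumes a non-chunked response and no pipelined requests on the same connection.

```python
import socket

def read_http_response(sock):
    """Read one HTTP response (headers plus Content-Length body) from sock."""
    buf = b""
    # Read until the blank line that ends the header block.
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before headers were complete")
        buf += chunk
    head, _, body = buf.partition(b"\r\n\r\n")

    # Look for Content-Length (header names are case-insensitive).
    length = None
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        # Assumption of this sketch: chunked bodies are not handled.
        raise ValueError("no Content-Length header in response")

    # Keep reading until the whole body has arrived.
    while len(body) < length:
        chunk = sock.recv(length - len(body))
        if not chunk:
            raise ConnectionError("connection closed before body was complete")
        body += chunk
    return head, body[:length]
```

After it returns, the socket is still open and can be reused for the next request, with no timeout and no Connection: close needed.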
I have the following problem: I want a server to send the contents of a text file when requested to do so. I have written a server script which sends the contents to the client, and a client script which receives all the contents with a recvall loop. The recvall works fine when I run the server and client on the same device for testing.
But when I run the server on a different device in the same wifi network and try to receive the text file contents from it, the recvall doesn't work and I only receive the first 1460 bytes of the text.
server script
import socket
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("", 5000))
server.listen(5)
def send_file(client):
    read_string = open("textfile", "rb").read() #6 kilobyte large textfile
    client.send(read_string)
while True:
    client, data = server.accept()
    connect_data = client.recv(1024)
    if connect_data == b"send_string":
        send_file(client)
    else:
        pass
client script
import socket
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("192.168.1.10", 5000))
connect_message = client.send(b"send_string")
receive_data = ""
while True: # the recvall loop
    receive_data_part = client.recv(1024).decode()
    receive_data += receive_data_part
    if len(receive_data_part) < 1024:
        break
print(receive_data)
recv(1024) means to receive at least 1 and at most 1024 bytes. If the connection has closed, you receive 0 bytes, and if something goes wrong, you get an exception.
TCP is a stream of bytes. It doesn't try to keep the bytes from any given send together for the recv. When you make the call, if the TCP endpoint has some data, you get that data.
In the client, you assume that anything less than 1024 bytes must be the last bit of data. Not so. You can receive partial buffers at any time. It's a bit subtle on the server side, but you make the same mistake there by assuming that you'll receive exactly the command b"send_string" in a single call.
You need some sort of a protocol that tells receivers when they've gotten the right amount of data for an action. There are many ways to do this, so I can't really give you the answer. But this is why there are protocols out there like zeromq, xmlrpc, http, etc...
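As one illustration, probably the simplest protocol for a one-shot file transfer is "one transfer per connection": the sender shuts down its write side when it is done, and the receiver reads until recv() returns an empty bytes object. This is only a sketch. The helper names send_all_and_finish and recv_until_eof are made up, and it assumes you don't need to send anything else on that connection afterwards.

```python
import socket

def send_all_and_finish(conn, payload):
    """Send the whole payload, then signal end-of-stream to the peer."""
    conn.sendall(payload)          # sendall() loops until every byte is handed to the OS
    conn.shutdown(socket.SHUT_WR)  # peer's recv() will return b"" once all data is read

def recv_until_eof(conn, bufsize=1024):
    """Collect data until the peer shuts down its sending side."""
    parts = []
    while True:
        part = conn.recv(bufsize)
        if not part:               # b"" means end-of-stream, not "short read"
            break
        parts.append(part)
    return b"".join(parts)
```

Note that the loop never looks at how long each part is. Only an empty result ends it.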
I have a problem when receiving messages sent over an SSL socket. On rare occasions I lose the first few bytes of data in the message. I am pretty certain this is somehow a speed problem, since it only seems to happen when 2 messages are sent in rapid succession (1-2 milliseconds apart). I am running the receiving code in a separate thread, with minimal code dumping the messages in a queue as they arrive.
queue = Queue()
...
def read_feed(session_key, hostname, port, ssl_socket):
    ''' READ whatever is coming on the stream '''
    while (1):
        try:
            output = ssl_socket.recv(2048) # Message size always < 2048
        except (ConnectionResetError, OSError):
            logger.info("Connecting feed")
            try:
                ssl_socket.connect((hostname, port))
            except ValueError: # Something's wrong, disconnect and do a new round
                ssl_socket.close()
            else:
                cmd = {"cmd":"login", "args":{"session_key":session_key}}
                data = str.encode(json.dumps(cmd) + "\n")
                num_bytes = ssl_socket.send(data)
        else:
            queue.put(output)
...
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ssl_socket = ssl.wrap_socket(s)
...
t3 = threading.Thread(target=read_feed, name = 'Read Feed', args=(session_key, hostname, port, ssl_socket))
t3.start()
At first I suspected that the other running threads were somehow stealing too much CPU time, so that the network buffer was filled before this thread got a chance to run, but I have tried a multi-core machine and the problem persists.
In essence, this should be the only code running when I am connected?
while (1):
    output = ssl_socket.recv(2048) # Message size always < 2048
    queue.put(output)
Or am I making the wrong assumptions here? Maybe the try:/except: construct is costly, or is the queue.put method slow and I should use something else? Or maybe Python is not the right tool for the job?
Any suggestions on how to improve the code so that I don't lose those few precious first bytes?
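One possible explanation, in line with how stream sockets behave in general, is that no bytes are lost at all: when two messages are sent 1-2 ms apart, a single recv(2048) may return the end of one message glued to the start of the next, or only part of a message. Below is a minimal sketch of a buffer that splits the stream back into messages. It assumes the feed is newline-delimited (the login command in the question ends with "\n", but the format of the incoming messages is not shown, so that is a guess), and the LineBuffer class is made up for illustration.

```python
class LineBuffer:
    """Reassemble delimiter-terminated messages from arbitrary recv() chunks."""

    def __init__(self, delimiter=b"\n"):
        self._buf = b""
        self._delimiter = delimiter

    def feed(self, data):
        """Add newly received bytes; return the list of complete messages."""
        self._buf += data
        # Everything before the last delimiter is complete; the tail is kept
        # until the rest of that message arrives.
        *complete, self._buf = self._buf.split(self._delimiter)
        return complete
```

In the reading thread you would then call feed(output) after each recv() and put each returned message on the queue, rather than putting the raw chunk on it.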
I've read quite a few things and this still escapes me. I know how to do it when using raw sockets. The following works just fine, times out after 1 second if no data is received:
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind((HOST, PORT))
sock.listen(1)
while 1:
    conn, addr = sock.accept()
    data = ''
    conn.settimeout(1)
    try:
        while 1:
            chunk = conn.recv(1024)
            data += chunk
            if not chunk:
                break
        print 'Read: %s' % data
        conn.send(data.upper())
    except (socket.timeout, socket.error, Exception) as e:
        print(str(e))
    finally:
        conn.close()
        print 'Done'
But when trying something similar when using SocketServer.TCPServer with SocketServer.BaseRequestHandler (not with SocketServer.StreamRequestHandler where I know how to set a timeout) it seems not as trivial. I didn't find a way to set a timeout for receiving the data. Consider this snippet (not complete code):
class MyTCPHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        data = ''
        while 1:
            chunk = self.request.recv(1024)
            data += chunk
            if not chunk:
                break
if __name__ == "__main__":
    HOST, PORT = "0.0.0.0", 9987
    SocketServer.TCPServer.allow_reuse_address = True
    server = SocketServer.TCPServer((HOST, PORT), MyTCPHandler)
    server.serve_forever()
Suppose the client sends only 10 bytes. The while loop runs once, chunk is not empty, so then executes self.request.recv() again but the client has no more data to send and recv() blocks indefinitely ...
I know I can implement a small protocol, check for terminating strings/chars, check message length etc., but I really want to implement a timeout as well for unforeseen circumstances (client "disappears" for example).
I'd like to set and also update a timeout, i.e. reset the timeout after every chunk, needed for slow clients (though that's a secondary issue at the moment).
Thanks in advance
You can do the same thing with SocketServer.BaseRequestHandler.request.settimeout() as you did with the raw socket.
eg:
class MyTCPHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.settimeout(1)
        ...
In this case self.request.recv() will raise socket.timeout if it takes longer than 1 second to complete.
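For reference, here is a fuller sketch of that idea written for Python 3, where the module is called socketserver. The class name TimeoutHandler and the recv_timeout attribute are made up for illustration. Because a socket timeout applies to each blocking call separately, the clock effectively restarts after every chunk, which should also cover the slow-client case.

```python
import socket
import socketserver

class TimeoutHandler(socketserver.BaseRequestHandler):
    recv_timeout = 1.0  # seconds; hypothetical attribute used only by this sketch

    def handle(self):
        # self.request is the connected socket, so settimeout() works as usual.
        self.request.settimeout(self.recv_timeout)
        data = b""
        try:
            while True:
                chunk = self.request.recv(1024)
                if not chunk:   # client closed the connection
                    break
                data += chunk
        except socket.timeout:
            pass                # client went quiet; carry on with what arrived
        self.request.sendall(data.upper())
```

You would serve it the same way as in the question, e.g. socketserver.TCPServer((HOST, PORT), TimeoutHandler).serve_forever().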
class MyTCPHandler(SocketServer.BaseRequestHandler):
    timeout=5
    ...
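... will raise an exception (which serve_forever() will catch) and shut down the connection if 5 seconds pass without receiving data after calling recv(). Be careful, though; it'll also shut down your connection if you're sending data for more than 5 seconds as well.
This may be Python 3 specific, mind, but it works for me. Note that in the standard library the timeout class attribute is read by StreamRequestHandler.setup(), so a handler that derives only from BaseRequestHandler has to apply it itself with self.request.settimeout(self.timeout).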
When I try to receive larger amounts of data, it gets cut off and I have to press enter to get the rest of the data. At first I was able to increase it a little bit, but it still won't receive all of it. As you can see, I have increased the buffer on the conn.recv(), but it still doesn't get all of the data. It cuts it off at a certain point. I have to press enter on my raw_input in order to receive the rest of the data. Is there any way I can get all of the data at once? Here's the code.
port = 7777
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('0.0.0.0', port))
sock.listen(1)
print ("Listening on port: "+str(port))
while 1:
    conn, sock_addr = sock.accept()
    print "accepted connection from", sock_addr
    while 1:
        command = raw_input('shell> ')
        conn.send(command)
        data = conn.recv(8000)
        if not data: break
        print data,
    conn.close()
TCP/IP is a stream-based protocol, not a message-based protocol. There's no guarantee that every send() call by one peer results in a single recv() call by the other peer receiving the exact data sent—it might receive the data piece-meal, split across multiple recv() calls, due to packet fragmentation.
You need to define your own message-based protocol on top of TCP in order to differentiate message boundaries. Then, to read a message, you continue to call recv() until you've read an entire message or an error occurs.
One simple way of sending a message is to prefix each message with its length. Then to read a message, you first read the length, then you read that many bytes. Here's how you might do that:
import struct

def send_msg(sock, msg):
    # Prefix each message with a 4-byte length (network byte order)
    msg = struct.pack('>I', len(msg)) + msg
    sock.sendall(msg)

def recv_msg(sock):
    # Read message length and unpack it into an integer
    raw_msglen = recvall(sock, 4)
    if not raw_msglen:
        return None
    msglen = struct.unpack('>I', raw_msglen)[0]
    # Read the message data
    return recvall(sock, msglen)

def recvall(sock, n):
    # Helper function to recv n bytes or return None if EOF is hit
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return data
Then you can use the send_msg and recv_msg functions to send and receive whole messages, and they won't have any problems with packets being split or coalesced on the network level.
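To see the framing at work, here is a small self-contained round trip over socket.socketpair(). It repeats compact versions of the functions so it runs on its own; the helper is named recv_exactly here (a made-up name) so as not to clash with the other recvall variants in this thread. Two messages are sent back to back, and the receiver still gets them back as two separate messages even though they sit in the same stream.

```python
import socket
import struct

def send_msg(sock, msg):
    # 4-byte big-endian length prefix followed by the payload
    sock.sendall(struct.pack(">I", len(msg)) + msg)

def recv_exactly(sock, n):
    # Loop until exactly n bytes have arrived, or return None on EOF
    data = bytearray()
    while len(data) < n:
        packet = sock.recv(n - len(data))
        if not packet:
            return None
        data.extend(packet)
    return bytes(data)

def recv_msg(sock):
    header = recv_exactly(sock, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    return recv_exactly(sock, length)

left, right = socket.socketpair()
send_msg(left, b"first")
send_msg(left, b"second message")
left.close()  # lets the receiver see EOF after the last message

messages = []
while True:
    msg = recv_msg(right)
    if msg is None:
        break
    messages.append(msg)
right.close()
print(messages)
```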
You can use it as: data = recvall(sock)
def recvall(sock):
    BUFF_SIZE = 4096 # 4 KiB
    data = b''
    while True:
        part = sock.recv(BUFF_SIZE)
        data += part
        if len(part) < BUFF_SIZE:
            # either 0 or end of data
            break
    return data
The accepted answer is fine, but it will be really slow with big files: a string is immutable, which means a new object is created every time you use the + sign. Using a list as a stack structure will be more efficient.
This should work better
fragments = []
while True:
    chunk = s.recv(10000)
    if not chunk:
        break
    fragments.append(chunk)
print "".join(fragments)
Most of the answers describe some sort of recvall() method. If your bottleneck when receiving data is creating the byte array in a for loop, I benchmarked three approaches of allocating the received data in the recvall() method:
Byte string method:
arr = b''
while len(arr) < msg_len:
    arr += sock.recv(max_msg_size)
List method:
fragments = []
while True:
    chunk = sock.recv(max_msg_size)
    if not chunk:
        break
    fragments.append(chunk)
arr = b''.join(fragments)
Pre-allocated bytearray method:
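arr = bytearray(msg_len)
pos = 0
while pos < msg_len:
    # recv() may return fewer bytes than requested, so advance by the actual count
    chunk = sock.recv(min(max_msg_size, msg_len - pos))
    if not chunk:
        break  # peer closed before msg_len bytes arrived
    arr[pos:pos+len(chunk)] = chunk
    pos += len(chunk)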
Results:
You may need to call conn.recv() multiple times to receive all the data. Calling it a single time is not guaranteed to bring in all the data that was sent, due to the fact that TCP streams don't maintain frame boundaries (i.e. they only work as a stream of raw bytes, not a structured stream of messages).
See this answer for another description of the issue.
Note that this means you need some way of knowing when you have received all of the data. If the sender will always send exactly 8000 bytes, you could count the number of bytes you have received so far and subtract that from 8000 to know how many are left to receive; if the data is variable-sized, there are various other methods that can be used, such as having the sender send a number-of-bytes header before sending the message, or if it's ASCII text that is being sent you could look for a newline or NUL character.
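Both ideas (a newline-terminated line and a known number of bytes) can be sketched with socket.makefile(), which wraps the socket in a buffered reader that does the repeated recv() calls for you. The function name read_line_then_block is made up, and the "header line followed by a fixed-size block" layout is only an assumed example format.

```python
import socket

def read_line_then_block(sock, block_size):
    """Read one newline-terminated line, then exactly block_size bytes."""
    # The buffered reader may read ahead, so in real code keep a single
    # reader per connection instead of creating one per call.
    with sock.makefile("rb") as reader:
        line = reader.readline()         # up to and including b"\n" (or EOF)
        block = reader.read(block_size)  # block_size bytes, fewer only at EOF
    return line, block
```

With the 8000-byte case from above, you would call it with block_size=8000.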
Disclaimer: There are very rare cases in which you really need to do this. If possible use an existing application layer protocol or define your own eg. precede each message with a fixed length integer indicating the length of data that follows or terminate each message with a '\n' character. (Adam Rosenfield's answer does a really good job at explaining that)
With that said, there is a way to read all of the data available on a socket. However, it is a bad idea to rely on this kind of communication as it introduces the risk of losing data. Use this solution with extreme caution and only after reading the explanation below.
def recvall(sock):
    BUFF_SIZE = 4096
    data = bytearray()
    while True:
        packet = sock.recv(BUFF_SIZE)
        if not packet: # Important!!
            break
        data.extend(packet)
    return data
Now the if not packet: line is absolutely critical!
Many answers here suggested using a condition like if len(packet) < BUFF_SIZE: which is broken and will most likely cause you to stop reading prematurely and lose data. It wrongly assumes that one send on one end of a TCP socket corresponds to one receive of the same number of bytes on the other end. It does not. There is a very good chance that sock.recv(BUFF_SIZE) will return a chunk smaller than BUFF_SIZE even if there's still data waiting to be received. There is a good explanation of the issue here and here.
By using the above solution you are still risking data loss if the other end of the connection is writing data slower than you are reading. You may just simply consume all data on your end and exit when more is on the way. There are ways around it that require the use of concurrent programming, but that's another topic of its own.
A variation using a generator function (which I consider more pythonic):
def recvall(sock, buffer_size=4096):
    buf = sock.recv(buffer_size)
    while buf:
        yield buf
        if len(buf) < buffer_size: break
        buf = sock.recv(buffer_size)
# ...
with socket.create_connection((host, port)) as sock:
    sock.sendall(command)
    response = b''.join(recvall(sock))
You can do it using Serialization
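from socket import *
import json

def recvall(conn):
    data = b""
    while True:
        chunk = conn.recv(1024)
        if not chunk:
            raise ConnectionError("connection closed before a full JSON document arrived")
        data += chunk  # accumulate: one document may span several recv() calls
        try:
            return json.loads(data)
        except ValueError:
            continue  # not a complete document yet, keep reading

def sendall(conn, data):
    conn.sendall(json.dumps(data).encode())
NOTE: If you want to share a file using the code above, you need to encode / decode it into base64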
I think this question has been pretty well answered, but I just wanted to add a method using Python 3.8 and the new assignment expression (walrus operator) since it is stylistically simple.
import socket
host = "127.0.0.1"
port = 31337
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host,port))
s.listen()
con, addr = s.accept()
msg_list = []
while (walrus_msg := con.recv(3)) != b'\r\n':
    msg_list.append(walrus_msg)
print(msg_list)
In this case, 3 bytes are received from the socket and immediately assigned to walrus_msg. Once the socket receives a b'\r\n' it breaks the loop. walrus_msg are added to a msg_list and printed after the loop breaks. This script is basic but was tested and works with a telnet session.
NOTE: The parentheses around (walrus_msg := con.recv(3)) are needed. Without them, while walrus_msg := con.recv(3) != b'\r\n': assigns the result of the comparison (True) to walrus_msg instead of the actual data from the socket.
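The precedence point can be checked without a socket. In this small sketch a plain bytes value stands in for what con.recv(3) would return, and the variable names are made up for illustration.

```python
chunk = b"abc"  # stand-in for the bytes returned by con.recv(3)

# With parentheses: the data is assigned first, then compared.
if (with_parens := chunk) != b"\r\n":
    pass

# Without parentheses: the comparison runs first and its result is assigned.
if without_parens := chunk != b"\r\n":
    pass

print(with_parens)     # b'abc'
print(without_parens)  # True
```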
Modifying Adam Rosenfield's code:
def send_msg(sock, msg):
    # msg is a bytes object; len() gives the payload size (sys.getsizeof would
    # report the size of the Python object, which is not what goes on the wire)
    package = str(len(msg)).encode() + b":" + msg #Create our package size,":",message
    sock.sendall(package)

def recv_msg(sock):
    header = sock.recv(2)#Magic, small number to begin with.
    while b":" not in header:
        chunk = sock.recv(2) #Keep looping, picking up two bytes each time
        if not chunk:
            return None # connection closed before the header was complete
        header += chunk
    size_of_package, separator, message = header.partition(b":")
    size = int(size_of_package)
    # Anything after ":" already belongs to the message; read the rest of it
    while len(message) < size:
        chunk = sock.recv(size - len(message))
        if not chunk:
            return None # connection closed before the message was complete
        message += chunk
    return message
I would, however, heavily encourage using the original approach.
For anyone else who's looking for an answer in cases where you don't know the length of the packet in advance.
Here's a simple solution that reads 4096 bytes at a time and stops when fewer than 4096 bytes were received. However, it will not work when the total length of the data is an exact multiple of 4096 bytes - then it will call recv() again and hang. It can also stop early, because recv() may return fewer than 4096 bytes while more data is still on the way.
def recvall(sock):
    data = b''
    bufsize = 4096
    while True:
        packet = sock.recv(bufsize)
        data += packet
        if len(packet) < bufsize:
            break
    return data
This code reads up to 1024*32 (= 32768) bytes, in at most 32 iterations, from the data received from the server:
jsonString = bytearray()
for _ in range(32):
    packet = clisocket.recv(1024)
    if not packet:
        break
    jsonString.extend(packet)
The data ends up in the jsonString variable.
Plain and simple:
data = b''
while True:
    data_chunk = client_socket.recv(1024)
    if data_chunk:
        data+=data_chunk
    else:
        break