How to read Python socket recv - python

I'm attempting to send an HTTP Request to a website and read the data it returns. The first website I tried worked successfully. It returned about 4 packets of data and then returned a 0 packet which the script caught and terminated.
However, attempting to load http://www.google.com/ does not work this way. Instead, it returns about 10 packets of the same length, a final smaller packet, and then proceeds to time out. Is it normal for this to happen? Does it all just depend on what server the host is using?
If anyone could recommend an alternative way of reading with socket.recv() that takes into account that a final null packet is not always sent, it would be greatly appreciated. Thanks.
try:
    data = s.recv(4096)
    while True:
        more = s.recv(4096)
        print len(more)
        if not more:
            break
        else:
            data += more
except socket.timeout:
    # use % so the message is a formatted string rather than a (string, tuple) pair
    errMsg = "Connection timed-out while connecting to %s. Request headers were as follows: %s" % (parsedUrl.netloc, rHeader.headerContent)
    self.logger.exception(errMsg)
    raise Exception

For HTTP, use requests rather than writing your own.
> ipython
In [1]: import requests
In [2]: r = requests.get('http://www.google.com')
In [3]: r.status_code
Out[3]: 200
In [4]: r.text[:80]
Out[4]: u'<!doctype html><html itemscope="itemscope" itemtype="http://schema.org/WebPage">'
In [5]: len(r.text)
Out[5]: 10969

TCP does not give you "packets", but sequential bytes sent from the other side. It is a stream. recv() gives you chunks of that stream that are currently available. You stitch them back together and parse the stream content.
HTTP is a rather involved protocol to work out by hand, so you probably want to start with an existing library like httplib (http.client in Python 3) instead.

It could be that Google uses Keep-Alive to keep the socket open in order to serve a further request. This would require parsing of the header and reading the exact number of bytes.
Whether the connection stays open depends on which version of HTTP you use: HTTP/1.1 keeps it open by default, while HTTP/1.0 does so only if you add Connection: Keep-Alive to your headers. (This might be the simplest solution: just use HTTP/1.0 instead of 1.1.)
If you use that feature nevertheless, you would have to receive your first chunk of data and
check whether it contains '\r\nContent-Length: '; if so, take the bytes between that and the next '\r\n' and convert them to a number. That is your size.
Then check whether your data contains '\r\n\r\n'. If so, that is the end of your header. From there, you must read the exact number of bytes mentioned above.
Example:
import socket
s = socket.create_connection(('www.google.com', 80))
s.send("GET / HTTP/1.1\r\n\r\n")
x = s.recv(10000)
poscl = x.lower().find('\r\ncontent-length: ')
poseoh = x.find('\r\n\r\n')
if poscl < poseoh and poscl >= 0 and poseoh >= 0:
    # found CL header
    poseocl = x.find('\r\n',poscl+17)
    cl = int(x[poscl+17:poseocl])
    realdata = x[poseoh+4:]
Now, you have the content length in cl and the (start of the) payload data in realdata. The number of bytes still missing from this response is missing = cl - len(realdata). If it is 0, you've got everything; if not, do s.recv(missing) and recalculate missing until it is 0.
The code above is a simple start of the job to be done; there are some places where you might need to recv() further before you can proceed.
This is quite complicated. By far easier ways would be
to use HTTP 1.1's Connection: close header in the request,
to use HTTP 1.0,
to use one of the libraries crafted for this task and not to reinvent the wheel.
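For reference, the Content-Length approach described above could be sketched in Python 3 roughly as follows. This is an illustration under stated assumptions, not a full HTTP client: read_http_response and fetch are made-up names, the request adds a Host header (which HTTP/1.1 requires), and chunked transfer encoding and responses without Content-Length are not handled.

```python
import socket


def read_http_response(sock, bufsize=4096):
    """Read one HTTP response that carries a Content-Length header.

    Returns (header_bytes, body_bytes). Hypothetical helper: it does not
    handle chunked transfer encoding or a missing Content-Length.
    """
    data = b""
    # Keep receiving until the blank line that ends the header has arrived.
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)
        if not chunk:
            raise ConnectionError("connection closed before end of header")
        data += chunk
    header, _, body = data.partition(b"\r\n\r\n")

    # Look for Content-Length; header names are case-insensitive.
    content_length = None
    for line in header.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("no Content-Length header in response")

    # Read exactly the bytes that are still missing.
    while len(body) < content_length:
        chunk = sock.recv(min(bufsize, content_length - len(body)))
        if not chunk:
            raise ConnectionError("connection closed before end of body")
        body += chunk
    return header, body[:content_length]


def fetch(host, path="/", port=80):
    """Send a minimal HTTP/1.1 GET and read the response by Content-Length."""
    request = "GET {} HTTP/1.1\r\nHost: {}\r\n\r\n".format(path, host)
    with socket.create_connection((host, port), timeout=10) as s:
        s.sendall(request.encode("ascii"))
        return read_http_response(s)
```

Because the loop stops as soon as Content-Length bytes of body have arrived, it does not wait for the server to close a keep-alive connection, which is what caused the timeout in the question.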

Related

Python Sockets - Data is being sent way too fast

I'm using Python sockets to send data across, but whenever I send data to the client, it seems to miss my data unless I'm debugging (which lets me pause execution when needed).
Server snippet:
def send_file(client_socket: socket):
    with open('client.py', 'rb') as file:
        while True:
            read_data = file.read()
            client_socket.sendall(read_data)
            if not read_data:
                client_socket.sendall('End'.encode())
                break
    print('Finished')
The server reports that it has finished and sent the 'End' message, but my client seems to hang, listening for too long, even though I thought adding an end message would help.
Client Snippet:
with open('test.txt', 'wb') as file:
    while True:
        received_bytes = sock.recv(BUFFER_SIZE)
        if received_bytes == b'End':
            break
        file.write(received_bytes)
# TODO: Restart client program
What am I doing wrong here?
received_bytes = sock.recv(BUFFER_SIZE)
reads BUFFER_SIZE bytes or fewer. Depending on BUFFER_SIZE and the message you send, End might become part of received_bytes rather than the whole of it, or be split across subsequent reads. Consider the following example: say your message is HELLOWORLD, so you send HELLOWORLDEnd, and BUFFER_SIZE equals 3. The successive values of received_bytes are then
HEL
LOW
ORL
DEn
d
Thus the last chunk is clearly not End. You do not need a special way of signalling the end; see this part of the socket example from the docs:
while True:
    data = conn.recv(1024)
    if not data: break
    conn.sendall(data)
In your case this means not sending End and replacing
if received_bytes == b'End':
using
if not received_bytes:
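As a minimal sketch of that approach in Python 3: recv() returns an empty bytes object once the sending side has closed or shut down its end of the connection, so the sender signals the end by doing exactly that rather than by sending a marker. The helper names send_all_and_finish and recv_until_eof are made up for illustration.

```python
import socket


def send_all_and_finish(sock, payload):
    """Send the payload, then signal end-of-stream to the peer."""
    sock.sendall(payload)
    # After shutdown(SHUT_WR) the peer's recv() returns b"" once all data is read.
    sock.shutdown(socket.SHUT_WR)


def recv_until_eof(sock, bufsize=4096):
    """Collect chunks until recv() returns b"", meaning the peer finished sending."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

With this, the chunk boundaries (HEL, LOW, ORL, ...) no longer matter, because nothing in the data itself has to be recognised as the end.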
EDIT after the comment "server cannot send an empty message and so the client is still listening for data even though it has well sufficiently been sent":
If you must use an end-of-message marker AT ANY PRICE, then consider using a single byte for that purpose, one which never appears in your message. For example, if you elect \x00 for this purpose, you might then do
received_bytes = sock.recv(BUFFER_SIZE)
file.write(received_bytes.rstrip(b'\x00'))
if received_bytes.endswith(b'\x00'):
    break
The .endswith and .rstrip methods work for bytes the same way as for str. Thus this code writes received_bytes without the trailing \x00 (note that .rstrip does not modify received_bytes), then breaks if received_bytes ends with said byte.

Python file transfer (tcp socket), problem with slow network

I set up a secure socket using Tor and SOCKS, but I'm facing a problem when sending large amounts of data.
Sender:
socket.send(message.encode())
Receiver:
chunks = []
while 1:
    part = connection.recv(4096)
    chunks.append(part.decode())
    if len(part) < 4096:
        break
response = "".join(chunks)
Since the network speed is not consistent, a recv in the loop doesn't always fill the 4096-byte buffer, so the loop breaks and I don't receive the full data.
Lowering the buffer size doesn't seem to be an option because the "packet" size can be as low as 20 bytes sometimes.
TCP can split your data into any number of pieces it wants. So you should never rely on the size of the chunks received at the other end of a socket. You have to invent another mechanism for detecting end of message/end of file.
If you are going to send only one blob and close the socket, then on the server side you just read until you get a false value (an empty bytes object):
while True:
    data = sock.recv(1024)
    if data:
        print(data)
        # continue
    else:
        sock.close()
        break
If you are going to send multiple messages, you have to decide what the separator between them will be. For text protocols it is a good idea to use a line ending. You can then enjoy the power of Twisted's LineReceiver protocol and others.
If you are doing a binary protocol, it's common practice to prefix each message with its size as a byte/word/dword.
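For the text-protocol case, a line-based reader can also be sketched with plain sockets. The class below is a hypothetical helper (LineReader is not part of the standard library or of Twisted); it assumes messages are terminated by b"\n" and never contain that byte themselves.

```python
import socket


class LineReader:
    """Hypothetical helper that splits a TCP byte stream into newline-terminated messages."""

    def __init__(self, sock, bufsize=4096):
        self.sock = sock
        self.bufsize = bufsize
        self.buffer = b""

    def read_line(self):
        """Return the next message without its trailing newline, or None at end of stream."""
        # Receive until at least one complete line is buffered.
        while b"\n" not in self.buffer:
            chunk = self.sock.recv(self.bufsize)
            if not chunk:
                return None  # peer closed; a partial line left in the buffer is dropped
            self.buffer += chunk
        # Hand back one line and keep whatever follows for the next call.
        line, _, self.buffer = self.buffer.partition(b"\n")
        return line
```

The buffer is what makes this independent of chunk sizes: a recv() may deliver half a line or several lines at once, and read_line() still returns exactly one message per call.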
Try using the struct module (import struct) to send the length of the incoming data to the receiver first. That way the receiving end knows exactly how much data to receive. In this example bytes are being sent over the socket. I've borrowed the examples from my GitHub upload github.com/nsk89/netcrypt for reference, and cut out the encryption steps from the send function as well as the sending of a serialised dictionary.
Edit: I should also clarify that when you send data over the socket, especially if you're sending multiple messages, they all sit in the stream as one long message. Not every message is 4096 bytes in length. If one is 2048 bytes long and the next 4096, and you receive 4096 into your buffer, you'll receive the first message plus half of the next message, or hang completely waiting for more data that doesn't exist.
import struct
data_to_send = struct.pack('>I', len(data_to_send)) + data_to_send # pack the length of data in the first four bytes of data stream, >I is a big-endian (network byte order) unsigned int
socket_object.sendall(data_to_send) # transport data
def recv_message(socket_object):
    raw_msg_length = recv_all(socket_object, 4) # receive first 4 bytes of data in stream
    if not raw_msg_length:
        return None
    # unpack first 4 bytes using network byte order to retrieve incoming message length
    msg_length = struct.unpack('>I', raw_msg_length)[0]
    return recv_all(socket_object, msg_length) # recv rest of stream up to message length
def recv_all(socket_object, num_bytes):
    data = b''
    while len(data) < num_bytes: # while amount of data recv is less than message length passed
        packet = socket_object.recv(num_bytes - len(data)) # recv remaining bytes/message
        if not packet:
            return None
        data += packet
    return data
By the way, there is no need to decode every part before combining them; combine all the parts into one chunk and then decode that. (Decoding part by part can also fail when a multi-byte UTF-8 character is split across two parts.)
For your situation, the better way is to use 2 steps.
Step 1: the sender sends the size of the message; the receiver reads this size and gets ready to receive the message.
Step 2: the sender sends the message; the receiver combines the data if necessary.
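Sender
# Step 1: send the size as a fixed-width, 10-digit header so the receiver knows where it ends
payload = message.encode("utf-8")
socket.sendall(str(len(payload)).zfill(10).encode())
# Step 2
socket.sendall(payload)
Receiver
# Step 1: read exactly the 10 header bytes (a bare recv(1024) could swallow part of the message too)
header = b''
while len(header) < 10:
    part = connection.recv(10 - len(header))
    if not part:
        raise ConnectionError("connection closed while reading the size")
    header += part
message_size = int(header.decode())
print("Will receive message size:", message_size)
# Step 2
received_size = 0
received_data = b''
while received_size < message_size:
    part = connection.recv(min(1024, message_size - received_size))
    if not part:
        break  # connection closed early
    received_size += len(part)
    received_data += part
print(received_data.decode("utf-8", "ignore"))
print("message receive done ....", received_size)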

TCP socket reads out of turn

I am using TCP with Python sockets, transferring data from one computer to another. However, the recv command reads more than it should on the server side, and I could not find the issue.
client.py
while rval:
    image_string = frame.tostring()
    sock.sendall(image_string)
    rval, frame = vc.read()
server.py
while True:
    image_string = ""
    while len(image_string) < message_size:
        data = conn.recv(message_size)
        image_string += data
The length of each message is 921600 (message_size), so it is sent with sendall. However, when I print the lengths of the received messages, they are sometimes wrong and sometimes correct.
921600
921600
921923 # wrong
922601 # wrong
921682 # wrong
921600
921600
921780 # wrong
As you see, the wrong arrivals have no pattern. As I use TCP, I expected more consistency; however, it seems the buffers are mixed up and somehow I am receiving a part of the next message, therefore producing a longer message. What is the issue here?
I tried to add just the relevant part of the code; I can add more if you wish. The code performs well on localhost but fails between two computers, so there should be no errors besides the transmitting part.
Edit1: I inspected this question a bit; it mentions that all send commands in the client may not be received by a single recv in the server, but I could not understand how to apply this in practice.
TCP is a stream protocol. There is ABSOLUTELY NO CONNECTION between the sizes of the chunks of data you send, and the chunks of data you receive. If you want to receive data of a known size, it's entirely up to you to only request that much data: you're currently requesting the total length of the data each time, which is going to try to read too much except in the unlikely event of the entire data being retrieved by the first .recv() call. Basically, you need to do something like data = conn.recv(message_size - len(image_string)) to reflect the fact that the amount of remaining data is decreasing.
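A minimal Python 3 sketch of that fix, assuming the message size is known in advance (as with the 921600-byte frames in the question); recv_exactly is a hypothetical helper name:

```python
import socket


def recv_exactly(sock, size):
    """Return exactly `size` bytes from the stream, or raise if the peer closes early."""
    chunks = []
    remaining = size
    while remaining > 0:
        # Never ask for more than what is still missing from this message,
        # so bytes belonging to the next message stay in the socket.
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

On the server this would be called once per frame, for example image_string = recv_exactly(conn, message_size), so each returned value has exactly message_size bytes.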
Think of TCP as a raw stream of bytes. It is your responsibility to track where you are in the stream and interpret it correctly. Buffer what you read and only extract what you currently need.
Here's an (untested) class to illustrate:
class Buffer:
    def __init__(self,socket):
        self.socket = socket
        self.buffer = b''
    def recv_exactly(self,count):
        # Could return less if socket closes early...
        while len(self.buffer) < count:
            data = self.socket.recv(4096)
            if not data: break
            self.buffer += data
        ret,self.buffer = self.buffer[:count],self.buffer[count:]
        return ret
The recv always requests the same amount of data and queues it in a buffer. recv_exactly only returns the number of bytes requested and leaves any extra in the buffer.

Raw load found, how to access?

To start off, I have read through other Raw answers pertaining to scapy on here; however, none have been useful. Maybe I am just doing something wrong, and that's what has brought me here today.
So, for starters, I have a pcap file, which started out corrupted with some retransmissions; I believe I have put it back together correctly.
It contains Radiotap header, IEEE 802.11 (dot11), logical-link control, IPv4, UDP, and DNS.
To my understanding, the UDP packets being transmitted hold this raw data; however, due to some recent quirks, maybe the raw is in Radiotap/raw.
Using scapy, I'm iterating through the packets, and when a packet with the Raw layer is found, I am using the .show() function of scapy to view it.
As such, I can see that there is a raw load available
###[ Raw ]###
\load \
|###[ Raw ]###
| load = '#\x00\x00\x00\xff\xff\xff\xff\xff\xff\x10h?'
So, I suppose my question is: how can I capture this payload to recover whatever it may be? To my knowledge the load is supposed to be an image file; however, I have trouble believing that, so I assume I have misstepped somewhere.
Here is the code I'm using to achieve the above result
from scapy.all import *
from scapy.utils import *
pack = rdpcap('/home/username/Downloads/new.pcap')
for packet in pack:
    if packet.getlayer(Raw):
        print '[+] Found Raw' + '\n'
        l = packet.getlayer(Raw)
        rawr = Raw(l)
        rawr.show()
Any help, or insight for further reading would be appreciated, I am new to scapy and no expert in packet dissection.
*Side note: previously I had tried (using separate code and a server) to replay the packets and send them to myself, to no avail. However, I feel that's due to my lack of knowledge about receiving UDP packets.
UPDATES - I have now tested my pcap file with a scapy reassembler, and I've confirmed I have no fragmented packets, or anything of the sort, so I assume all should go smoothly...
Upon opening my pcap in Wireshark, I can see that there are retransmissions, but I'm not sure how much that will affect my goals since no fragmentation occurred.
Also, I have tried getlayer(Raw).load; if I print it I get some gibberish on the screen. I'm assuming it's the data of my would-be image; however, I now need to get it into a usable format.
You can do:
data = packet[Raw].load
You should be able to access the field in this way:
l = packet.getlayer(Raw).load
Using Scapy’s interactive shell I was successful doing this:
pcap = rdpcap('sniffed_packets.pcap')
s = pcap.sessions()
for key, value in s.iteritems():
    # Looking for telnet sessions
    if ':23' in key:
        for v in value:
            try:
                v.getlayer(Raw).load
            except AttributeError:
                pass
If you are trying to get the load part of the packet only, you can try :
def handle_pkt(pkt):
    if TCP in pkt and pkt[TCP].dport == 5201:
        #print("got a packet")
        print(pkt[IP])
        load_part = pkt[IP].load
        print("Load#",load_part)
        pkt.show2()
        sys.stdout.flush()

Read specific bytes using urlopen()

I want to read specific bytes from a remote file using a Python module. I am using urllib2. By specific bytes I mean a range given as (Offset, Size). I know we can read the first X bytes from a remote file using urlopen(link).read(X). Is there any way to read Size bytes of data starting at Offset?
def readSpecificBytes(link,Offset,size):
    # code to be written
This will work with many servers (Apache, etc.), but doesn't always work, esp. not with dynamic content like CGI (*.php, *.cgi, etc.):
import urllib2
def get_part_of_url(link, start_byte, end_byte):
    req = urllib2.Request(link)
    # the Range header asks the server for bytes start_byte..end_byte (inclusive)
    req.add_header('Range', 'bytes=' + str(start_byte) + '-' + str(end_byte))
    resp = urllib2.urlopen(req)
    content = resp.read()
    return content
Note that with this approach the server never has to send, and you never download, the data you don't need, which could save tons of bandwidth if you only want a small amount of data from a large file.
When it doesn't work (the server ignores the Range header and sends the whole file), just read and discard the bytes before the part you want.
See Wikipedia Article on HTTP headers for more details.
Unfortunately the file-like object returned by urllib2.urlopen() doesn't actually have a seek() method. You will need to work around this by doing something like this:
def readSpecificBytes(link,Offset,size):
    f = urllib2.urlopen(link)
    if Offset > 0:
        f.read(Offset)
    return f.read(size)
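The two answers can be combined. The following is a rough Python 3 sketch using urllib.request (the successor of urllib2), not a tested drop-in: build_range_request and read_specific_bytes are made-up names, and it assumes that a server honouring the Range header replies with status 206 (Partial Content), while one that ignores it replies with the whole resource.

```python
import urllib.request


def build_range_request(link, offset, size):
    """Build a request for `size` bytes starting at `offset` (the Range end is inclusive)."""
    req = urllib.request.Request(link)
    req.add_header("Range", "bytes=%d-%d" % (offset, offset + size - 1))
    return req


def read_specific_bytes(link, offset, size):
    """Return `size` bytes starting at `offset`, skipping manually if Range is ignored."""
    with urllib.request.urlopen(build_range_request(link, offset, size)) as resp:
        if resp.status == 206:
            # Partial Content: the server honoured the Range header.
            return resp.read(size)
        # The server sent the whole resource: discard the first `offset` bytes.
        resp.read(offset)
        return resp.read(size)
```

Note that an (Offset, Size) pair maps to the byte range Offset to Offset + Size - 1, because both ends of an HTTP byte range are inclusive.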
