How to use python-trio with google protocol buffer? - python

I am trying to read some data streams using protobuf in Python, and I want to use Trio to write the client that reads the streams. The protobuf-generated classes have some method calls, and I find they do not work when I use Trio streams.
Python client on a linux machine.
import struct
import trio
import DTCProtocol_pb2 as Dtc

async def parent(addr, encoding, heartbeat_interval):
    print(f"parent: connecting to 127.0.0.1:{addr[1]}")
    client_stream = await trio.open_tcp_stream(addr[0], addr[1])
    # encoding request
    print("parent: spawning encoding request ...")
    enc_req = create_enc_req(encoding)  # construct encoding request
    await send_message(enc_req, Dtc.ENCODING_REQUEST, client_stream, 'encoding request')  # send encoding request
    log.debug('get_response: started')
    response = await client_stream.receive_some(1024)
    m_size = struct.unpack_from('<H', response[:2])  # the size of message
    m_type = struct.unpack_from('<H', response[2:4])  # the type of the message
    m_body = response[4:]
    m_resp = Dtc.EncodingResponse()
m_body would be some bytes data, which I don't know how to decode. Dtc.EncodingResponse() is the protobuf call that should give a Dtc object containing the response in a readable format (Dtc is the module generated from the protobuf file). But I get nothing here. When I did this script without Trio, Dtc.EncodingResponse() would give the full response in readable format.
I am guessing the problem is that "client_stream" is a Trio stream object that only reads bytes, so I probably need to use a ReceiveChannel object instead. But if this is true, I don't know how to do this.
UPDATE:
The answer below by Nathaniel J. Smith solves my problem.
m_resp = Dtc.EncodingResponse()
m_resp.ParseFromString(m_body)
I feel so silly, but I did not ParseFromString the data previously, and that was all it took. Extremely grateful to all who gave replies. Hope this helps someone out there.

Like @shmee said in the comment, I think your code got mangled some by the edits... you should double-check.
When I did this script without trio, Dtc.EncodingResponse() would give the full response in readable format
I think you might have dropped a line when switching to Trio? Dtc.EncodingResponse() just creates a new empty EncodingResponse object. If you want to parse the data from m_body into your new object, you have to do that explicitly, with something like:
m_resp = Dtc.EncodingResponse()
m_resp.ParseFromString(m_body)
However, there's another problem... the reason it's called receive_some is that it receives some bytes, but might not receive all the bytes you asked for. Your code is assuming that a single call to receive_some will fetch all the bytes in the response, and that might be true when you're doing a simple test, but in general it's not guaranteed. If you don't get enough data on the first call to receive_some, you might need to keep calling it repeatedly until you get all the data.
This is actually very standard... sockets work the same way. That's why the first thing your server sends is an m_size field at the beginning: it's so you can tell whether you've gotten all the data or not!
Unfortunately, as of June 2019, Trio doesn't provide a helper to do this loop for you. You can track progress on that in this issue. In the meantime, it's possible to write your own. I think something like this should work:
async def receive_exactly(stream, count):
    buf = bytearray()
    while len(buf) < count:
        # only ask for the bytes that are still missing
        new_data = await stream.receive_some(count - len(buf))
        if not new_data:
            raise RuntimeError("other side closed the connection unexpectedly")
        buf += new_data
    return buf

async def receive_encoding_response(stream):
    header = await receive_exactly(stream, 4)
    (m_size, m_type) = struct.unpack('<HH', header)
    m_body = await receive_exactly(stream, m_size)
    m_resp = Dtc.EncodingResponse()
    # parse the body bytes (not the size) into the message object
    m_resp.ParseFromString(bytes(m_body))
    return m_resp
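To make the framing idea concrete, here is a minimal sketch of the 4-byte header that the code above unpacks. It assumes, like the answer's code, that the size field counts only the body bytes; if your protocol counts the header as well, you would subtract 4 before reading the body. pack_frame and split_frame are hypothetical helper names for illustration, not part of Trio or protobuf.

```python
import struct

HEADER = struct.Struct('<HH')  # little-endian: uint16 size, uint16 type


def pack_frame(m_type, body):
    """Prefix body with a (size, type) header; size counts body bytes only (assumption)."""
    return HEADER.pack(len(body), m_type) + body


def split_frame(data):
    """Return (m_type, body, rest), or None if data does not yet hold a full frame."""
    if len(data) < HEADER.size:
        return None  # header itself is incomplete
    m_size, m_type = HEADER.unpack_from(data)
    end = HEADER.size + m_size
    if len(data) < end:
        return None  # body is incomplete, keep receiving
    return m_type, data[HEADER.size:end], data[end:]
```

split_frame returning None is the "not enough data yet" case: you would receive more bytes, append them, and try again.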

Related

TCP socket reads out of turn

I am using TCP with Python sockets, transferring data from one computer to another. However, the recv command reads more than it should on the server side, and I could not find the issue.
client.py
while rval:
    image_string = frame.tostring()
    sock.sendall(image_string)
    rval, frame = vc.read()
server.py
while True:
    image_string = ""
    while len(image_string) < message_size:
        data = conn.recv(message_size)
        image_string += data
The length of each message is 921600 (message_size), and it is sent with sendall. However, when I print the lengths of the messages as they are received, they are sometimes wrong and sometimes correct.
921600
921600
921923 # wrong
922601 # wrong
921682 # wrong
921600
921600
921780 # wrong
As you see, the wrong arrivals have no pattern. As I use TCP, I expected more consistency, but it seems the buffers are mixed up and I am somehow receiving part of the next message, which produces a longer message. What is the issue here?
I tried to add just the relevant part of the code, and I can add more if you wish. The code performs well on localhost but fails between two computers, so there should be no errors besides the transmitting part.
Edit1: I inspected this question a bit. It mentions that all send commands in the client may not be received by a single recv in the server, but I could not understand how to apply this in practice.
TCP is a stream protocol. There is ABSOLUTELY NO CONNECTION between the sizes of the chunks of data you send, and the chunks of data you receive. If you want to receive data of a known size, it's entirely up to you to only request that much data: you're currently requesting the total length of the data each time, which is going to try to read too much except in the unlikely event of the entire data being retrieved by the first .recv() call. Basically, you need to do something like data = conn.recv(message_size - len(image_string)) to reflect the fact that the amount of remaining data is decreasing.
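As a rough sketch of that "remaining bytes" idea (recv_exact is a name made up for this example, and socket.socketpair() just stands in for the real client/server connection):

```python
import socket


def recv_exact(conn, count):
    """Read exactly count bytes from conn, or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining > 0:
        data = conn.recv(remaining)  # never ask for more than is still missing
        if not data:
            raise ConnectionError("socket closed with %d bytes still missing" % remaining)
        chunks.append(data)
        remaining -= len(data)
    return b''.join(chunks)


# Two messages sent back to back still come out as two separate messages.
a, b = socket.socketpair()
a.sendall(b'x' * 10 + b'y' * 10)
first = recv_exact(b, 10)
second = recv_exact(b, 10)
a.close()
b.close()
```

Because each recv call is capped at the bytes still missing, a read can never run into the start of the next message.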
Think of TCP as a raw stream of bytes. It is your responsibility to track where you are in the stream and interpret it correctly. Buffer what you read and only extract what you currently need.
Here's an (untested) class to illustrate:
class Buffer:
    def __init__(self,socket):
        self.socket = socket
        self.buffer = b''
    def recv_exactly(self,count):
        # Could return less if socket closes early...
        while len(self.buffer) < count:
            data = self.socket.recv(4096)
            if not data: break
            self.buffer += data
        ret,self.buffer = self.buffer[:count],self.buffer[count:]
        return ret
The recv always requests the same amount of data and queues it in a buffer. recv_exactly only returns the number of bytes requested and leaves any extra in the buffer.

Read specific bytes using urlopen()

I want to read specific bytes from a remote file using a Python module. I am using urllib2. By specific bytes I mean a range given as (Offset, Size). I know we can read the first X bytes of a remote file using urlopen(link).read(X). Is there any way to read Size bytes of data starting at Offset?
def readSpecificBytes(link,Offset,size):
    # code to be written
This will work with many servers (Apache, etc.), but doesn't always work, esp. not with dynamic content like CGI (*.php, *.cgi, etc.):
import urllib2
def get_part_of_url(link, start_byte, end_byte):
    req = urllib2.Request(link)
    # Range is inclusive at both ends: bytes=start-end
    req.add_header('Range', 'bytes=' + str(start_byte) + '-' + str(end_byte))
    resp = urllib2.urlopen(req)
    content = resp.read()
    return content
Note that this approach means that the server never has to send and you never download the data you don't need/want, which could save tons of bandwidth if you only want a small amount of data from a large file.
When it doesn't work, just read the first set of bytes before the rest.
See Wikipedia Article on HTTP headers for more details.
Unfortunately the file-like object returned by urllib2.urlopen() doesn't actually have a seek() method. You will need to work around this by doing something like this:
def readSpecificBytes(link,Offset,size):
    f = urllib2.urlopen(link)
    if Offset > 0:
        f.read(Offset)
    return f.read(size)
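The two answers can be combined: ask for the range, and fall back to skipping bytes when the server ignores it. Below is a rough Python 3 sketch (urllib2 became urllib.request there). build_range_request and read_specific_bytes are names invented for this example, and the fallback assumes a server that ignores Range replies with a plain 200 and the whole body.

```python
import urllib.request


def build_range_request(link, offset, size):
    """Build a request for size bytes starting at offset (Range ends are inclusive)."""
    if size <= 0:
        raise ValueError("size must be positive")
    req = urllib.request.Request(link)
    req.add_header('Range', 'bytes=%d-%d' % (offset, offset + size - 1))
    return req


def read_specific_bytes(link, offset, size):
    """Fetch the range; if the server ignores Range, skip the prefix by hand."""
    with urllib.request.urlopen(build_range_request(link, offset, size)) as resp:
        if resp.status == 206:   # Partial Content: the server honoured the range
            return resp.read()
        resp.read(offset)        # 200: full body was sent, so discard the prefix
        return resp.read(size)
```

Note the off-by-one: an (Offset, Size) pair of (100, 50) becomes bytes=100-149, because the end position in a Range header is inclusive.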

Sending a Dictionary using Sockets in Python?

My problem: Ok, I made a little chat program thing where I am basically using sockets in order to send messages over a network.
It works great, but when I decided to take it a step further, I ran into a problem.
I decided to add some encryption to the strings I was sending over the network, and so I went ahead and wrote the script that did that.
The problem is that apparently you can't just send a dictionary through sockets as you might with a string.
I did some research first, and I found this stuff about Pickles. Unfortunately, I couldn't find out exactly how I could use them to convert strings, aside from having it exporting the dictionary to a file, but I can't do that without changing my program.
Can anyone help explain how I am to do this? I've looked around everywhere but I can't seem to find out how.
I've uploaded what I've got so far here, if that comes of any interest to anybody.
print("\n\t\t Fill out the following fields:")
HOST = input("\nNet Send Server Public IP: ")
PORT = int(input("\nNet Send Server Port: "))
#------------------------------------------------
#Assessing Validity of Connection
#------------------------------------------------
try:
    s = socket(AF_INET,SOCK_STREAM)
    s.connect((HOST,PORT))
    print("Connected to server:",HOST,)
except IOError:
    print("\n\n\a\t\tUndefined Connection Error Encountered")
    input("Press Enter to exit, then restart the script")
    sys.exit()
#-------------------------------------------------
#Now Sending and receiving messages
#-------------------------------------------------
i = True
while i is True:
    try:
        User_input = input("\n Enter your message: ")
        Lower_Case_Conversion = User_input.lower()
        #Tdirectory just stores the translated letters
        Tdirectory = []
        # x is zero so that it translates the first letter first, evidently
        x = 0
        COUNTLIMIT = len(Lower_Case_Conversion)
        while x < COUNTLIMIT:
            for letter in Lower_Case_Conversion[x]:
                if letter in TRvalues:
                    Tdirectory += [TRvalues[Lower_Case_Conversion[x]]]
            x = x + 1
        message = input('Send: ')
        s.send(message.encode())
        print("\n\t\tAwaiting reply from: ",HOST,)
        reply = s.recv(1024)
        print(HOST,"\n : ",reply)
    except IOError:
        print("\n\t\aIOError Detected, connection most likely lost.")
        input("\n\nPress Enter to exit, then restart the script")
Oh, and if you're wondering what TRvalues is: it's the dictionary that contains the 'translations' for encrypting simple messages.
try:
    TRvalues = {}
    with open(r"C:\Users\Owatch\Documents\Python\FunStuff\nsed.txt", newline="") as f:
        reader = csv.reader(f, delimiter=" ")
        TRvalues = dict(reader)
(The translations are held in a .txt it imports)
You have to serialize your data. There are many ways to do it, but json and pickle are the likely choices since both are in the standard library.
For json:
import json
data_string = json.dumps(data) #data serialized
data_loaded = json.loads(data_string) #data loaded
For pickle (on Python 2, its faster sibling cPickle; Python 3's pickle already uses the C implementation when available):
import pickle
data_string = pickle.dumps(data, -1)
#data serialized. -1, an optional argument, selects the highest pickle protocol available
data_loaded = pickle.loads(data_string) #data loaded.
Also, please don't write
i = True
while i is True:
    #do_something
because a simple while True: would suffice.
You need to serialize your data first. There are several ways to do this, the most common probably JSON, XML and (python specific) pickles. Or your own custom serialization.
The basic idea is: Serialize your data, send it, receive it, deserialize it again.
If you want to use pickle you can use the loads and dumps functions.
import pickle
a_dict = { x:str(x) for x in range(5) }
serialized_dict = pickle.dumps(a_dict)
# Send it through the socket and on the receiving end:
a_dict = pickle.loads(the_received_string)
You can also use JSON in a similar fashion. I like JSON because it is human readable and isn't python specific.
import json
a_dict = { x:str(x) for x in range(5) }
serialized_dict = json.dumps(a_dict)
# Send it through the socket and on the receiving end:
a_dict = json.loads(the_received_string)
You can use pickle and Python Remote Objects (Pyro) to send full objects and data over networks (Internet included). For instance, if you want to send an object (dict, list, class instance, etc.), use Pyro for it.
It is very useful for what you want to do.
There is more information at this link: http://pythonhosted.org/Pyro4/
This starter manual can also be useful for understanding what you send or execute on networked PCs: http://pythonhosted.org/Pyro4/intro.html#simple-example
I hope it helps you.
Using JSON to serialize your data is the way I prefer to do it. I actually made a library that does just that for you: jsonsocket library. It will do the serialization/deserialization automatically for you. It also handles big amounts of data efficiently.
You can also use zmqObjectExchanger (https://github.com/ZdenekM/zmq_object_exchanger). It wraps pickle and zmq to transfer python objects over network.
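Putting the "serialize, send, receive, deserialize" idea together, here is a small sketch using only the standard library. It sends the dictionary as JSON behind a 4-byte length prefix so the receiver knows how many bytes to wait for. The 4-byte big-endian prefix and the names send_dict and recv_dict are choices made for this example, not an established protocol. JSON is used rather than pickle because unpickling data from an untrusted peer can execute arbitrary code.

```python
import json
import socket
import struct


def send_dict(sock, obj):
    """Serialize obj as UTF-8 JSON and send it behind a 4-byte big-endian length."""
    payload = json.dumps(obj).encode('utf-8')
    sock.sendall(struct.pack('!I', len(payload)) + payload)


def _recv_exact(sock, count):
    """Keep calling recv until exactly count bytes have arrived."""
    buf = b''
    while len(buf) < count:
        data = sock.recv(count - len(buf))
        if not data:
            raise ConnectionError("socket closed before the full message arrived")
        buf += data
    return buf


def recv_dict(sock):
    """Read one length-prefixed JSON message and return the decoded object."""
    (length,) = struct.unpack('!I', _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode('utf-8'))


# socketpair() stands in for the chat client and server connection.
client, server = socket.socketpair()
send_dict(client, {'a': 'q', 'b': 'w'})
received = recv_dict(server)
client.close()
server.close()
```

Keep in mind that JSON object keys are always strings, so a dictionary with integer keys comes back with string keys.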

How to read Python socket recv

I'm attempting to send an HTTP Request to a website and read the data it returns. The first website I tried worked successfully. It returned about 4 packets of data and then returned a 0 packet which the script caught and terminated.
However, attempting to load http://www.google.com/ does not work this way. Instead, it returns about 10 packets of the same length, a final smaller packet, and then proceeds to time out. Is it normal for this to happen? Does it all just depend on what server the host is using?
If anyone could recommend an alternative way to reading with socket.recv() that would take into account that a final null packet is not always sent, it would be greatly appreciated. Thanks.
try:
    data = s.recv(4096)
    while True:
        more = s.recv(4096)
        print len(more)
        if not more:
            break
        else:
            data += more
except socket.timeout:
    errMsg = "Connection timed-out while connecting to %s. Request headers were as follows: %s", (parsedUrl.netloc, rHeader.headerContent)
    self.logger.exception(errMsg)
    raise Exception
For HTTP, use requests rather than writing your own.
> ipython
In [1]: import requests
In [2]: r = requests.get('http://www.google.com')
In [3]: r.status_code
Out[3]: 200
In [4]: r.text[:80]
Out[4]: u'<!doctype html><html itemscope="itemscope" itemtype="http://schema.org/WebPage">'
In [5]: len(r.text)
Out[5]: 10969
TCP does not give you "packets", but sequential bytes sent from the other side. It is a stream. recv() gives you chunks of that stream that are currently available. You stitch them back together and parse the stream content.
HTTP is rather involved protocol to work out by hand, so you probably want to start with some existing library like httplib instead.
It could be that Google uses Keep-Alive to keep the socket open in order to serve a further request. This would require parsing of the header and reading the exact number of bytes.
Depending on which version of HTTP you use, you have to add Connection: Keep-Alive to your headers or not. (This might be the simplest solution: just use HTTP/1.0 instead of 1.1.)
If you use that feature nevertheless, you would have to receive your first chunk of data and then:
check whether there is a '\r\nContent-Length: ' inside, and if so, take the bytes between that and the next '\r\n' and convert them to a number. That is your size.
check whether you have a '\r\n\r\n' in your data. If so, that is the end of your header. From here, you must read the exact number of bytes mentioned above.
Example:
import socket
s = socket.create_connection(('www.google.com', 80))
s.send("GET / HTTP/1.1\r\n\r\n")
x = s.recv(10000)
poscl = x.lower().find('\r\ncontent-length: ')
poseoh = x.find('\r\n\r\n')
if poscl < poseoh and poscl >= 0 and poseoh >= 0:
    # found CL header
    poseocl = x.find('\r\n',poscl+17)
    cl = int(x[poscl+17:poseocl])
    realdata = x[poseoh+4:]
Now you have the content length in cl and the (start of the) payload data in realdata. The number of bytes still missing from this response is missing = cl - len(realdata). If it is 0, you've got everything; if not, do s.recv(missing), append what you get, and recalculate missing until it is 0.
The code above is a simple start on the job to be done; there are some places where you might need to recv() further before you can proceed.
This is quite complicated. By far easier ways would be
to use HTTP 1.1's Connection: close header in the request,
to use HTTP 1.0,
to use one of the libraries crafted for this task and not to reinvent the wheel.
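For anyone who still wants to do the header parsing by hand, here is a slightly more defensive Python 3 sketch that works on bytes. parse_content_length is a name made up for this example, and it only covers the plain Content-Length case, not chunked transfer encoding.

```python
def parse_content_length(raw):
    """Return (content_length, body_so_far) once the header block is complete.

    Returns None if the blank line ending the headers has not arrived yet.
    content_length is None when there is no Content-Length header.
    """
    head, sep, body = raw.partition(b'\r\n\r\n')
    if not sep:
        return None  # headers incomplete, recv() more first
    length = None
    for line in head.split(b'\r\n')[1:]:  # skip the status line
        name, _, value = line.partition(b':')
        if name.strip().lower() == b'content-length':
            length = int(value.strip().decode('ascii'))
    return length, body


raw = b'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 11\r\n\r\nhello'
length, body = parse_content_length(raw)
missing = length - len(body)  # bytes that still have to be received
```

With that canned response, length is 11 and body is b'hello', so 6 bytes are still missing and need further recv() calls.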

Facing issue while writing data into a file in python script

I have to write 7231 bytes into a file using a Python script. In a client-server program, my Python script acts as the client and receives 7231 bytes from the server. If I check in tcpdump, it shows the complete data. But when I try to write it into a file, I am missing content.
My script:
def SendOnce(self, req='/gpsData=1',method="GET"):
    conn = httplib.HTTPConnection(self.proxy)
    self.Logresponse("\nConnection Open\n<br />")
    conn.request(method,req)
    Log="\nRequest Send: %s\n<br \>\n" %req
    self.Logresponse(Log)
    response = conn.getresponse()
    Log = "\nResponse Code: %s\n<br \>\n" %response.status
    self.Logresponse(Log)
    Log = "\nSarav -- Get Header: %s \n version= %s <br \>\n" %(response.msg,response.version)
    self.Logresponse(Log)
    if (response.status==200):
        Log = response.read()
        self.Logresponse(Log)
    conn.close()
    self.Logresponse("\nConnection Close\n<br \>")
    return response
The self.Logresponse(Log) call is what writes into the file. If I receive 1023 bytes, it writes the full content. Please help me work out how to write the complete data.
Note: I am writing hex-format data.
First of all, 7231 bytes is not exactly huge...
With the limited info you gave, I would guess that you might have forgotten to take the OS's write buffer into account. You probably try to read the file before all the content was written to it.
Python generally uses the system's standard buffer (you can change that). You can decrease that buffer, or force a flush yourself.
I'm just guessing, it might be that the .read() function doesn't return all data in one chunk; can you try to modify the inner part like this:
if (response.status==200):
    while 1:
        Log = response.read()
        if not Log:
            break
        self.Logresponse(Log)
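Both guesses (a partial read and an unflushed write buffer) can be covered in one loop. This is only a sketch: copy_response_to_file is a hypothetical helper, and io.BytesIO stands in for the HTTP response, since anything with a read(n) method behaves the same way here.

```python
import io
import os
import tempfile


def copy_response_to_file(response, path, chunk_size=1024):
    """Read response in chunks until EOF and append every chunk to path."""
    total = 0
    with open(path, 'ab') as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the body is finished
                break
            out.write(chunk)
            total += len(chunk)
        out.flush()  # push Python's buffer to the OS before the file is read back
    return total


# Stand-in for the 7231-byte server response from the question.
fake_response = io.BytesIO(b'\xab' * 7231)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'gps.log')
    written = copy_response_to_file(fake_response, path)
    size_on_disk = os.path.getsize(path)
print(written, size_on_disk)  # prints: 7231 7231
```

Opening the file in binary mode matters for this kind of data, and the with block closes (and therefore flushes) the file even if a read fails halfway.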
