Zlib won't decompress python 2.7 - python

I am trying to move data from a server to a client via socket.
I try to compress data from the server and send it to the socket,
but when I try to decompress in the client I get this error:
zlib.error: Error -5 while decompressing data: incomplete or truncated stream
I think I know what is wrong, but I don't understand why it happens.
Maybe it's because I am trying to decompress data that was never compressed:
when the client receives data, it doesn't know whether the data is compressed or not, and trying to decompress it causes the error.
Maybe I'm completely wrong, but I don't know how to fix it and I need your help.
Client: gets the data (which is a string representing an image)
def room_client(port,ip):
    roomC = socket.socket()
    roomC.connect((ip, port))
    while True:
        print 'in while of client server'
        #recv pict
        #display
        #send ack
        img = ""
        size = roomC.recv(1024)
        roomC.sendall(size)
        while len(img) < int(size):
            data = roomC.recv(1024)
            img += data
        roomC.send("ACK")
        to_pic = img.split('#')[0]
        print to_pic
        scrn = open("monitor_serv.png", "wb")
        scrn.write(zlib.decompress(to_pic))
        scrn.close()
Server: sending the image(screenshot)
def room_server(port):
    #sends pictures
    print 'in room_server '
    roomS = socket.socket()
    roomS.bind(('0.0.0.0',port))
    roomS.listen(1)
    client, addr = roomS.accept()
    while True:
        print 'in while of room server'
        # take picture
        # send picture
        # recv ack
        flag = True
        img1 = ImageGrab.grab()
        send = zlib.compress(img1.tobytes())
        size = img1.size
        send = send + "#" + str(size[0]) + "#"+ str(size[1]) + "#0#0"
        client.sendall(str(len(send)))
        print "0"
        f = client.recv(1024)
        print "A ", f
        client.sendall(send)
        g = client.recv(1024)
        print "C ", g
        while True:
            if flag:
                flag = False
                img2 = ImageGrab.grab()
                coordinates = equal(img1, img2)
                cropped_image = img2.crop(coordinates)
            else:
                flag = True
                img1 = ImageGrab.grab()
                coordinates = equal(img1, img2)
                cropped_image = img1.crop(coordinates)
            if coordinates is not None:
                size = cropped_image.size
                send = zlib.compress(cropped_image.tobytes())
                try:
                    send = send + "#" + str(size[0]) + "#" + \
                        str(size[1]) + "#" + str(coordinates[0]) + "#" + str(coordinates[1])
                    client.sendall(str(len(send)))
                    client.recv(1024)
                    client.sendall(send)
                    client.recv(1024)
                except:
                    break

On the server, you're sending this:
send = zlib.compress(img1.tobytes())
size = img1.size
send = send + "#" + str(size[0]) + "#"+ str(size[1]) + "#0#0"
On the client, you're parsing it like this:
to_pic = img.split('#')[0]
print to_pic
scrn = open("monitor_serv.png", "wb")
scrn.write(zlib.decompress(to_pic))
There's going to be a # byte in almost all arbitrary compressed files. So your to_pic is going to be truncated at the first one. Which means zlib will almost always give you an error saying you've given it a truncated stream.
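The truncation described above can be reproduced without any sockets. The following is a minimal sketch written for Python 3 (the question uses Python 2.7); the random payload and the 640x480 size fields are made up for illustration.

```python
import random
import zlib

# Made-up "image" bytes; random data stands in for a screenshot.
rng = random.Random(0)
raw = bytes(rng.getrandbits(8) for _ in range(4096))
compressed = zlib.compress(raw)

# Make sure the compressed stream really contains a '#' byte
# (with a few kilobytes of data it practically always does).
while b"#" not in compressed:
    raw += bytes(rng.getrandbits(8) for _ in range(4096))
    compressed = zlib.compress(raw)

# Frame the message the way the server in the question does.
message = compressed + b"#640#480#0#0"

# Parse it the way the client does: everything before the first '#'.
to_pic = message.split(b"#")[0]

try:
    zlib.decompress(to_pic)
    decompressed_ok = True
except zlib.error as exc:
    decompressed_ok = False
    print("zlib refused the truncated data:", exc)
```

Because `to_pic` is only a prefix of the compressed stream, zlib never sees the end of the stream and reports an error.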
You need to come up with some other way to frame the data. Some options:
Instead of sending data#width#height#0#0 prefixed by the byte length of that string, you could send just data prefixed by its byte length, width, and height.
If you want to use # as a delimiter, you could escape any # bytes inside the actual image data, and then unescape on the other side. For example, you could replace('#', '##'), and then re.split on the first single # sign, and then replace('##', '#').
If you rsplit to pull off the last four #s instead of split to pull off the first one… it's a bit hacky, but it would work here, because none of the other fields could ever have an # in them, just the compressed image data field.
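As a rough illustration of the first option (a fixed-size header carrying the byte length, width and height, followed by the compressed data), here is a minimal Python 3 sketch. The header layout and the names `HEADER`, `pack_frame`, `recv_exactly` and `recv_frame` are assumptions made for this example, not part of the original code.

```python
import socket
import struct
import zlib

# Hypothetical header layout: payload length, width, height, x, y
# as five unsigned 32-bit integers in network byte order.
HEADER = struct.Struct("!IIIII")

def pack_frame(pixels, width, height, x=0, y=0):
    """Compress the pixel bytes and prefix them with a fixed-size header."""
    payload = zlib.compress(pixels)
    return HEADER.pack(len(payload), width, height, x, y) + payload

def recv_exactly(sock, count):
    """Keep calling recv() until exactly `count` bytes have arrived."""
    parts = []
    while count > 0:
        part = sock.recv(count)
        if not part:
            raise ConnectionError("socket closed in the middle of a frame")
        parts.append(part)
        count -= len(part)
    return b"".join(parts)

def recv_frame(sock):
    """Read one header, then exactly as many payload bytes as it announces."""
    length, width, height, x, y = HEADER.unpack(recv_exactly(sock, HEADER.size))
    pixels = zlib.decompress(recv_exactly(sock, length))
    return pixels, (width, height), (x, y)

# Small in-process demonstration; the payload deliberately contains '#'.
left, right = socket.socketpair()
sample = b"#\x00#" * 100
left.sendall(pack_frame(sample, 10, 10, 3, 4))
pixels, size, origin = recv_frame(right)
left.close()
right.close()
```

Since the receiver knows the payload length up front, it never has to search the compressed bytes for a delimiter, so `#` bytes inside the data are harmless.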
There are other issues with the protocol framing that you need to rethink, but when you're sending smallish files over localhost sockets they will only come up occasionally; this is the only one that's bound to come up almost every time. Unfortunately, that doesn't mean you don't need to fix the other ones; it just means they'll be harder to debug.
Meanwhile, there's another flaw in your design:
What you get back from ImageGrab.grab() (even before cropping) isn't a PNG image, it's raw PIL/Pillow Image data. You compress that on the server, uncompress it on the client, and save those bytes as a PNG file. You can't do that.
One option is to use Pillow on the client as well: create an Image object from the decompressed bytes, then tell it to save itself to a PNG file.
Another option is to have the server export to bytes in PNG format instead of giving you the raw bytes. There are two big advantages to this version: No need for PIL installed on the client side, and PNG data is already compressed so you can scrap all your zlib stuff and write much simpler code.

Related

Trying to send string variable via Python socket

I'm in a CTF competition and I'm stuck on a challenge where I have to retrieve a string from a socket, reverse it and get it back. The string changes too fast to do it manually. I'm able to get the string and reverse it but am failing at sending it back. I'm pretty sure I'm either trying to do something that's not possible or am just too inexperienced at Python/sockets/etc. to kung fu my way through.
Here's my code:
import socket
aliensocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
aliensocket.connect(('localhost', 10000))
aliensocket.send('GET_KEY'.encode())
key = aliensocket.recv(1024)
truncKey = str(key)[2:16]
revKey = truncKey[::-1]
print(truncKey)
print(revKey)
aliensocket.send(bytes(revKey.encode('UTF-8')))
print(aliensocket.recv(1024))
aliensocket.close()
And here is the output:
F9SIJINIK4DF7M
M7FD4KINIJIS9F
b'Server expects key to unlock or GET_KEY to retrieve the reversed key'
key is received as a byte string. The b'' wrapped around it when printed just indicates it is a byte string. It is not part of the string. .encode() turns a Unicode string into a byte string, but you can just mark a string as a byte string by prefixing with b.
Just do:
aliensocket.send(b'GET_KEY')
key = aliensocket.recv(1024)
truncKey = key[:14] # the first 14 bytes, same characters as str(key)[2:16] in the question
revKey = truncKey[::-1]
print(truncKey) # or do truncKey.decode() if you don't want to see b''
print(revKey)
aliensocket.send(revKey)
data = b''
while True:
    chunk = aliensocket.recv(1)
    data += chunk
    if not chunk:
        rev = data[::-1]
        aliensocket.sendall(rev)
        break
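To illustrate the point about byte strings without a server, here is a small Python 3 sketch. The reply value is made up, modelled on the key printed in the question.

```python
# Hypothetical reply from the server, as returned by recv().
key = b"F9SIJINIK4DF7M\n"

# str() on a bytes object gives its printable representation, b'' included;
# that is why the question had to slice [2:16] to get rid of the prefix.
as_text = str(key)
trunc_from_text = as_text[2:16]

# Slicing the bytes object directly needs no such workaround.
trunc_key = key[:14]
rev_key = trunc_key[::-1]

# rev_key is already bytes, so it can be passed to socket.send() as is.
print(trunc_key.decode(), rev_key.decode())
```

Both approaches select the same 14 characters, but only the second keeps the value as bytes, ready to be sent back.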

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 398: invalid start byte || book python for everyone

Hey, I am trying to pull an image from a web server using socket programming in Python while going through the Python for Everyone book. There is an example in the networked programming chapter; I copied the code from the example urljpeg.py:
import socket
import time
#HOST = 'data.pr4e.org'
#PORT = 80
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
mysock.sendall(b'GET http://data.pr4e.org/cover3.jpg HTTP/1.0\r\n\r\n')
count = 0
picture = b""
while True:
    data = mysock.recv(5120)
    if len(data) < 1: break
    # time.sleep(0.25)
    count = count + len(data)
    print( len(data),count)
    picture = picture + data
mysock.close()
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())
# skip past the header and save the picture data
picture = picture[pos+4:]
fhand = open("stuff.jpg","wb")
fhand.write(picture)
fhand.close()
The error message indicates that you are trying to decode data which is not utf-8. So why is this happening? Let's take a step back and look at what the code is doing:
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
print("Header length ", pos)
print(picture[:pos].decode())
We're trying to find a sequence of \r\n\r\n, i.e. CR LF CR LF in the data. This would be the empty line that separates the HTTP header (which should be in ASCII, which is a subset of UTF-8) from the actual image data. Then we try to decode everything up to that point as a string. So why does it fail? The program conveniently prints the header length, and in the bit you posted earlier we could see that this was -1, which means that the picture.find call did not find anything! Why not? Well, look carefully at what the code actually does:
# look for the end of the header (2crlf)
pos = picture.find(b"r\n\r\n")
It should be looking for \r\n\r\n, but it is actually looking for r\n\r\n!
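The effect of the missing backslash can be seen on a made-up response (Python 3). The header lines and the fake image bytes below are invented for illustration.

```python
# A made-up HTTP response: ASCII header, blank line, then binary image data.
response = (b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: image/jpeg\r\n"
            b"\r\n"
            b"\xff\xd8\xff\xe0fake-jpeg-bytes")

# The buggy search: a literal letter 'r' followed by LF CR LF is never found.
bad_pos = response.find(b"r\n\r\n")

# With -1, response[:bad_pos] is everything but the last byte, image included,
# so decoding it hits the 0xff byte and fails.
try:
    response[:bad_pos].decode()
    buggy_decode_failed = False
except UnicodeDecodeError:
    buggy_decode_failed = True

# The corrected search: CR LF CR LF marks the end of the header.
pos = response.find(b"\r\n\r\n")
header = response[:pos].decode()
picture = response[pos + 4:]
print(header)
```

With the corrected search string, only the header is decoded and the binary image bytes are written out untouched.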

Weird Python output when printing

I'm very new to Python and I'm trying to read pressure data from a Honeywell differential pressure sensor using the LibMPSSE library. I'm reading it from an Adafruit FT232H chip and I'm using Python 2.7.6 on Ubuntu Linux.
#!/usr/bin/env python
from mpsse import *
SIZE = 2 # 2 bytes MSB first
WCMD = "\x50" # Write start address command
RCMD = "\x51" # Read command
FOUT = "eeprom.txt" # Output file
try:
    eeprom = MPSSE(I2C)
    # print "%s initialized at %dHz (I2C)" % (eeprom.GetDescription(), eeprom.GetClock())
    eeprom.Start()
    eeprom.Write(WCMD)
    # Send Start condition to i2c-slave (Pressure Sensor)
    if eeprom.GetAck() == ACK:
        # ACK received, resend START condition and set R/W bit to 1 for read
        eeprom.Start()
        eeprom.Write(RCMD)
        if eeprom.GetAck() == ACK:
            # ACK received, continue supplying the clock to slave
            data = eeprom.Read(SIZE)
            eeprom.SendNacks()
            eeprom.Read(1)
        else:
            raise Exception("Received read command NACK2!")
    else:
        raise Exception("Received write command NACK1!")
    eeprom.Stop()
    print(data)
    eeprom.Close()
except Exception, e:
    print "MPSSE failure:", e
According to the library, Read returns a string of size bytes and whenever I want to print the data, the only output I see is �����2��T_`ʋ�Q}�*/�eE�. I've tried encoding with utf-8 and still no luck.
Python may print "weird" output when bytes/characters in the string (data in your case) contain non-printable characters. To "view" the content of the individual bytes (as integers or hex), do the following:
print(','.join(['{:d}'.format(x) for x in map(ord, data)])) # decimal
or
print(','.join(['{:02X}'.format(x) for x in map(ord, data)])) # hex
Since the length of your data buffer is set by SIZE=2, to extract each byte from this buffer as an integer you can do the following:
hi, lo = map(ord, data)
# or:
hi = ord(data[0])
lo = ord(data[1])
To read more about ord and what it does - see https://docs.python.org/2/library/functions.html#ord
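For readers on Python 3, where indexing a bytes object already yields integers and ord is not needed, the same idea might look like the sketch below. The two sample bytes are made up, and combining them MSB first follows the comment on SIZE in the question; how the sensor actually encodes pressure in those bits is not covered here.

```python
import struct

# Made-up 2-byte reading, MSB first.
data = b"\x1f\xa0"

# Iterating over bytes in Python 3 gives integers directly.
decimal_view = ",".join("{:d}".format(b) for b in data)
hex_view = ",".join("{:02X}".format(b) for b in data)

# Combine the two bytes into one 16-bit number, most significant byte first.
hi, lo = data
value = (hi << 8) | lo

# struct does the same in one call: '>' is big-endian, 'H' is unsigned 16-bit.
(value_from_struct,) = struct.unpack(">H", data)

print(decimal_view, hex_view, value)
```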

Python pySerial read data from arduino breaks when sending "(char)0"

I send some data from an arduino using pySerial.
My Data looks like
bytearray(DST, SRC, STATUS, TYPE, CHANNEL, DATA..., SIZEOFDATA)
where sizeofData is a test that all bytes are received.
The problem is, every time when a byte is zero, my python program just stops reading there:
serial_port = serial.Serial("/dev/ttyUSB0")
while serial_port.isOpen():
    response_header_str = serial_port.readline()
    format = '>'
    format += ('B'*len(response_header_str))
    response_header = struct.unpack(format, response_header_str)
    pprint(response_header)
serial_port.close()
For example, when I send bytearray(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) everything is fine. But when I send something like bytearray(1,2,3,4,0,1,2,3,4) I don't see everything beginning with the zero.
The problem is that I cannot avoid sending zeros as I am just sending the "memory dump" e.g. when I send a float value, there might be zero bytes.
How can I tell pySerial not to ignore zero bytes?
I've looked through the source of PySerial and the problem is in PySerial's implementation of FileLike.readline (in http://svn.code.sf.net/p/pyserial/code/trunk/pyserial/serial/serialutil.py). The offending function is:
def readline(self, size=None, eol=LF):
    """\
    Read a line which is terminated with end-of-line (eol) character
    ('\n' by default) or until timeout.
    """
    leneol = len(eol)
    line = bytearray()
    while True:
        c = self.read(1)
        if c:
            line += c
            if line[-leneol:] == eol:
                break
            if size is not None and len(line) >= size:
                break
        else:
            break
    return bytes(line)
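The suspect is the if c: line: whenever self.read(1) returns something falsy, the routine breaks out of the read loop. Note that in standard Python a one-byte string such as b'\x00' is non-empty and therefore truthy, so this branch should only trigger on an empty read (a timeout); it is worth checking what read(1) actually returns on your setup. Either way, the easiest thing to do would be to reimplement this yourself as something like: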
def readline(port, size=None, eol="\n"):
    """\
    Read a line which is terminated with end-of-line (eol) character
    ('\n' by default) or until timeout.
    """
    leneol = len(eol)
    line = bytearray()
    while True:
        line += port.read(1)
        if line[-leneol:] == eol:
            break
        if size is not None and len(line) >= size:
            break
    return bytes(line)
To clarify from your comments, this is a replacement for the Serial.readline method that will consume null-bytes and add them to the returned string until it hits the eol character, which we define here as "\n".
An example of using the new method, with a file object substituted for the serial port:
>>> # Create some example data terminated by a newline containing nulls.
>>> handle = open("test.dat", "wb")
>>> handle.write(b"hell\x00o, w\x00rld\n")
>>> handle.close()
>>>
>>> # Use our readline method to read it back in.
>>> handle = open("test.dat", "rb")
>>> readline(handle)
'hell\x00o, w\x00rld\n'
Hopefully this makes a little more sense.
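A Python 3 variant of the same idea is sketched below. It tests the length of what read(1) returned rather than its truthiness, so a zero byte is kept and only an empty read (timeout or end of data) stops the loop. The name `readline_binary` is made up for this example, and an in-memory `io.BytesIO` stands in for the serial port; any object with a compatible read(n) method should behave the same way.

```python
import io

def readline_binary(port, eol=b"\n", size=None):
    """Read bytes from `port` until `eol`, `size` bytes, or an empty read."""
    line = bytearray()
    while True:
        c = port.read(1)
        if len(c) == 0:
            # Nothing arrived: timeout on a serial port, EOF on a file.
            break
        line += c
        if line.endswith(eol):
            break
        if size is not None and len(line) >= size:
            break
    return bytes(line)

# Stand-in for the serial port, with null bytes inside the line.
fake_port = io.BytesIO(b"hell\x00o, w\x00rld\nnext line")
first = readline_binary(fake_port)
rest = readline_binary(fake_port)
print(first, rest)
```

Keep in mind that a raw memory dump can also contain the byte 10 (the newline), so for purely binary messages a length-based frame is usually more robust than any line terminator.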

Content-Encoding: gzip + Transfer-Encoding: chunked with gzip/zlib gives incorrect header check

How do you manage chunked data with gzip encoding?
I have a server which sends data in the following manner:
HTTP/1.1 200 OK\r\n
...
Transfer-Encoding: chunked\r\n
Content-Encoding: gzip\r\n
\r\n
1f50\r\n\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xec}\xebr\xdb\xb8\xd2\xe0\xef\xb8\xea\xbc\x03\xa2\xcc\x17\xd9\xc7\xba\xfa\x1e\xc9r*\x93\xcbL\xf6\xcc\x9c\xcc7\xf1\x9c\xf9\xb6r\xb2.H ... L\x9aFs\xe7d\xe3\xff\x01\x00\x00\xff\xff\x03\x00H\x9c\xf6\xe93\x00\x01\x00\r\n0\r\n\r\n
I've had a few different approaches to this but there's something I'm forgetting here.
data = b''
depleted = False
while not depleted:
    depleted = True
    for fd, event in poller.poll(2.0):
        depleted = False
        if event == select.EPOLLIN:
            tmp = sock.recv(8192)
            data += zlib.decompress(tmp, 15 + 32)
Gives (also tried decoding only data after \r\n\r\n obv):
zlib.error: Error -3 while decompressing data: incorrect header check
So I figured the data should be decompressed once it has been received in its entirety:
...
        if event == select.EPOLLIN:
            data += sock.recv(8192)
data = zlib.decompress(data.split(b'\r\n\r\n',1)[1], 15 + 32)
Same error. Also tried decompressing data[:-7] because of the chunk ID at the very end of the data and with data[2:-7] and other various combinations, but with the same error.
I've also tried the gzip module via:
with gzip.GzipFile(fileobj=BytesIO(data), mode='rb') as fh:
    fh.read()
But that gives me "Not a gzipped file".
Even after recording the data as received from the server (headers + data) into a file, and then creating a server socket on port 80 serving the data (again, as is) to the browser, it renders perfectly, so the data is intact.
I took this data, stripped off the headers (and nothing else) and tried gzip on the file:
Thanks to @mark-adler I produced the following code to un-chunk the chunked data:
unchunked = b''
pos = 0
while pos <= len(data):
    chunkLen = int(binascii.hexlify(data[pos:pos+2]), 16)
    unchunked += data[pos+2:pos+2+chunkLen]
    pos += 2+len('\r\n')+chunkLen
with gzip.GzipFile(fileobj=BytesIO(data[:-7])) as fh:
    data = fh.read()
This produces OSError: CRC check failed 0x70a18ee9 != 0x5666e236 which is one step closer. In short I clip the data according to these four parts:
<chunk length o' X bytes> \r\n <chunk> \r\n
I'm probably getting there, but not close enough.
Footnote: Yes, the socket handling is far from optimal, but it looks this way because I thought I wasn't getting all the data from the socket, so I implemented a huge timeout and an attempt at a fail-safe with depleted :)
You can't split on \r\n since the compressed data may contain, and if long enough, certainly will contain that sequence. You need to dechunk first using the length provided (e.g. the first length 1f50) and feed the resulting chunks to decompress. The compressed data starts with the \x1f\x8b.
The chunking is hex number, crlf, chunk with that many bytes, crlf, hex number, crlf, chunk, crlf, ..., last chunk (of zero length), [possibly some headers], crlf.
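The procedure described above (read the hex length, take exactly that many bytes, skip the CRLF, repeat until the zero-length chunk, and feed the pieces to the decompressor) could be sketched in Python 3 as follows. The function names `iter_chunks` and `gunzip_chunked` are made up for this example, and the test body is generated locally rather than taken from the server in the question. `zlib.decompressobj(16 + zlib.MAX_WBITS)` asks zlib to expect a gzip header and trailer.

```python
import gzip
import zlib

def iter_chunks(body):
    """Yield the payload of each chunk of a chunked-encoded HTTP body."""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size is ASCII hex text; drop any optional ";extension" part.
        size = int(body[pos:line_end].split(b";", 1)[0], 16)
        if size == 0:
            return  # last chunk; trailing headers (if any) follow
        start = line_end + 2
        yield body[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk data

def gunzip_chunked(body):
    """Dechunk `body` and decompress the gzip stream piece by piece."""
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    out = [decomp.decompress(chunk) for chunk in iter_chunks(body)]
    out.append(decomp.flush())
    return b"".join(out)

# Build a two-chunk test body from locally compressed data.
original = b"hello chunked world\r\n" * 50
payload = gzip.compress(original)
half = len(payload) // 2
body = b""
for piece in (payload[:half], payload[half:]):
    body += format(len(piece), "x").encode("ascii") + b"\r\n" + piece + b"\r\n"
body += b"0\r\n\r\n"

result = gunzip_chunked(body)
```

Because one decompressor object is kept across chunks, the chunk boundaries do not have to line up with anything in the gzip stream.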
@mark-adler gave me some good pointers on how chunked mode in the HTTP protocol works; besides this, I fiddled around with different ways of unzipping the data.
You're supposed to stitch the chunks into one big heap
The data is in gzip format, so I used the gzip module rather than a plain zlib.decompress call
You can only unzip the stitched chunks as one stream; decompressing each part on its own will not work
Here's the solution for all three of the above problems:
import gzip
from io import BytesIO

def unchunk_and_gunzip(data):
    # data: the raw response body (everything after the HTTP headers)
    unchunked = b''
    pos = 0
    while pos <= len(data):
        chunkNumLen = data.find(b'\r\n', pos)-pos
        # print('Chunk length found between:',(pos, pos+chunkNumLen))
        chunkLen=int(data[pos:pos+chunkNumLen], 16)
        # print('This is the chunk length:', chunkLen)
        if chunkLen == 0:
            # print('The length was 0, we have reached the end of all chunks')
            break
        chunk = data[pos+chunkNumLen+len('\r\n'):pos+chunkNumLen+len('\r\n')+chunkLen]
        # print('This is the chunk (Skipping',pos+chunkNumLen+len('\r\n'),', grabing',len(chunk),'bytes):', [data[pos+chunkNumLen+len('\r\n'):pos+chunkNumLen+len('\r\n')+chunkLen]],'...',[data[pos+chunkNumLen+len('\r\n')+chunkLen:pos+chunkNumLen+len('\r\n')+chunkLen+4]])
        unchunked += chunk
        pos += chunkNumLen+len('\r\n')+chunkLen+len('\r\n')
    with gzip.GzipFile(fileobj=BytesIO(unchunked)) as fh:
        unzipped = fh.read()
    return unzipped
I left the debug output in there, commented out, for a reason.
It was extremely useful, even though it looks like a mess, for seeing what data I was actually trying to decompress, which parts were fetched from where, and which values each calculation brings forth.
This code will walk through the chunked data with the following format:
<chunk length o' X bytes> \r\n <chunk> \r\n
I had to be careful when first extracting the X bytes, as they came in as 1f50. At first I used binascii.hexlify(data[0:4]) on them before passing them to int(). I'm not sure why I don't need that anymore: I needed it to get a length of ~8000 before, but then it suddenly gave me a REALLY big number, which wasn't logical even though I didn't really give it any other data. Anyway.
After that it was just a matter of making sure the numbers were correct, then combining all the chunks into one huge pile of gzip data and feeding that into .GzipFile(...).
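A likely explanation for the confusion around hexlify, shown as a small Python 3 check: the chunk length arrives as ASCII text that is already hexadecimal, so int(..., 16) can parse it directly, whereas hexlify hex-encodes those ASCII characters a second time and produces a much larger number.

```python
import binascii

# The size line of the first chunk, exactly as it appears on the wire.
size_field = b"1f50"

# Parsing the ASCII hex digits directly gives the real chunk length.
chunk_len = int(size_field, 16)

# hexlify encodes the characters '1', 'f', '5', '0' themselves as hex ...
double_encoded = binascii.hexlify(size_field)

# ... so parsing that yields an unrelated, very large number.
wrong_len = int(double_encoded, 16)

print(chunk_len, double_encoded, wrong_len)
```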
Edit 3 years later:
I'm aware that this was a client-side problem at first, but here's a server-side function to send a somewhat functional test:
import gzip

def http_gzip(data):
    compressed = gzip.compress(data)
    # format(49, 'x') returns `31` which is `\x31` but without the `\x` notation.
    # basically the same as `hex(49)` but meant for these kinds of things.
    return bytes(format(len(compressed), 'x'), 'UTF-8') + b'\r\n' + compressed + b'\r\n0\r\n\r\n'
