How to gzip a bytearray in Python? - python

I have binary data inside a bytearray that I would like to gzip first and then post via requests. I found out how to gzip a file but couldn't find it out for a bytearray. So, how can I gzip a bytearray via Python?

Have a look at the zlib-module of Python.
Python 3: zlib-module
A short example:
import zlib
compressed_data = zlib.compress(my_bytearray)
You can decompress the data again by:
decompressed_byte_data = zlib.decompress(compressed_data)
Python 2: zlib-module
A short example:
import zlib
compressed_data = zlib.compress(my_string)
You can decompress the data again by:
decompressed_string = zlib.decompress(compressed_data)
As you can see, Python 3 works on bytes-like objects (bytes, bytearray), while Python 2 works on strings (str).

If the bytearray, called b here, is small enough to be held in memory more than once, you can simply do this (Python 2 only, since str.encode('zlib') does not exist in Python 3):
b_gz = str(b).encode('zlib')
If you need to do decoding first, have a look at the decode() method of bytearray.

The zlib module of the Python Standard Library should meet your requirements:
>>> import zlib
>>> a = b'abcdefghijklmn' * 10
>>> ca = zlib.compress(a)
>>> len(a)
140
>>> len(ca)
25
>>> b = zlib.decompress(ca)
>>> b == a
True
>>> b
b'abcdefghijklmnabcdefghijklmnabcdefghijklmnabcdefghijklmnabcdefghijklmnabcdefghijklmnabcdefghijklmnabcdefghijklmnabcdefghijklmnabcdefghijklmn'
This is the output under Python 3.4, but it works the same way under Python 2.7.

# Python 2 code: buffer() and the print statement are not available in Python 3.
import zlib
import binascii

def compress_packet(packet):
    # Level 1 is the fastest compression level.
    return zlib.compress(buffer(packet), 1)

def decompress_packet(compressed_packet):
    return zlib.decompress(compressed_packet)

def demo_zlib():
    packet1 = bytearray()
    packet1.append(0x41)
    packet1.append(0x42)
    packet1.append(0x43)
    packet1.append(0x44)
    print "before compression: packet:{0}".format(binascii.hexlify(packet1))
    cpacket1 = compress_packet(packet1)
    print "after compression: packet:{0}".format(binascii.hexlify(cpacket1))
    print "before decompression: packet:{0}".format(binascii.hexlify(cpacket1))
    dpacket1 = decompress_packet(buffer(cpacket1))
    print "after decompression: packet:{0}".format(binascii.hexlify(dpacket1))

def main():
    demo_zlib()

if __name__ == '__main__':
    main()
This should do. In Python 2, zlib needs access to the bytearray's content; use buffer() for that.
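Note that zlib.compress produces a zlib stream rather than a gzip one, so if the receiving server expects real gzip data, the gzip module may be the better fit. A minimal Python 3 sketch, assuming the payload below stands in for your bytearray and that the commented-out URL is only a placeholder:

```python
import gzip

# Hypothetical payload standing in for the question's bytearray.
payload = bytearray(b"abcdefghijklmn" * 10)

# gzip.compress accepts a bytes-like object and returns bytes in gzip format.
body = gzip.compress(payload)

# Round trip to check that nothing was lost.
restored = gzip.decompress(body)
print(len(payload), len(body), restored == bytes(payload))

# Posting it with requests might then look like this (placeholder URL,
# and it assumes the server understands a gzip-encoded request body):
# requests.post("https://example.invalid/upload", data=body,
#               headers={"Content-Encoding": "gzip"})
```

Whether the server wants a Content-Encoding header or simply a gzip file as the body depends on the API you are posting to.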

Related

Python IP to integer conversion not working as expected

I am using the answer provided here to convert string IP to integer. But I am not getting expected output. Specifically, the method is
>>> ipstr = '1.2.3.4'
>>> parts = ipstr.split('.')
>>> (int(parts[0]) << 24) + (int(parts[1]) << 16) + \
(int(parts[2]) << 8) + int(parts[3])
but when I provide it the value 172.31.22.98 I get 2887718498 back. However, I expect to see value -1407248798 as provided by Google's Guava library https://google.github.io/guava/releases/20.0/api/docs/com/google/common/net/InetAddresses.html#fromInteger-int-
Also, I've verified that this service provides expected output but all of the answers provided by the aforementioned StackOverflow answer return 2887718498
Note that I cannot use any third-party library, so I am pretty much limited to hand-written code (no imports).
A better way is to use the library method
>>> from socket import inet_aton
>>> int.from_bytes(inet_aton('172.31.22.98'), byteorder="big")
2887718498
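This is still the same result you had above.
Here is one way to view it as a signed 32-bit int. Use c_int32 rather than c_long, because c_long is 64 bits wide on most 64-bit Unix systems and would leave the value unchanged there:
>>> from ctypes import c_int32
>>> c_int32(2887718498).value
-1407248798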
To do it without imports (but why? The above is all first party CPython)
>>> x = 2887718498
>>> if x >= (1<<31): x-= (1<<32)
>>> x
-1407248798
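Putting the two steps together (shift-and-add, then wrapping into the signed 32-bit range), a minimal import-free sketch could look like this. The function name ip_to_signed_int32 is made up for illustration, and the validation is only a basic sanity check:

```python
def ip_to_signed_int32(ip):
    """Convert dotted-quad IPv4 text to a signed 32-bit integer (Java int style)."""
    parts = [int(p) for p in ip.split('.')]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("not a dotted-quad IPv4 address: %r" % (ip,))
    # Unsigned 32-bit value, most significant octet first.
    unsigned = (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]
    # Values with the top bit set wrap around to negative numbers.
    if unsigned >= (1 << 31):
        return unsigned - (1 << 32)
    return unsigned

print(ip_to_signed_int32('172.31.22.98'))  # -1407248798
print(ip_to_signed_int32('1.2.3.4'))       # 16909060
```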
Found this post whilst trying to do the same as OP. Developed the below for my simple mind to understand and for someone to benefit from.
baseIP = '192.168.1.0'
baseIPFirstOctet = (int((baseIP).split('.')[0]))
baseIPSecondOctet = (int((baseIP).split('.')[1]))
baseIPThirdOctet = (int((baseIP).split('.')[2]))
baseIPFourthOctet = (int((baseIP).split('.')[3]))

Human readable output in bits

I have looked at the modules humanize and humanfriendly, and neither can convert a large bit value to human readable bit output (e.g. Mbits, Gbits, Tbits, ..etc). Has anyone come across such a module? Example:
mbits = 1000000
gbits = 1000000000
Then
print(human.bits(mbits)) # would output "1 Mbit"
print(human.bits(gbits)) # would output "1 Gbit"
...etc, up to exabit.
You can try hurry.filesize
>>> from hurry.filesize import size
>>> size(11000)
'10K'
There is another library bitmath
>>> from bitmath import *
>>> small_number = MiB(10000)
>>> print small_number.best_prefix()
9.765625 GiB
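If pulling in a library is not an option, the formatting the question asks for is small enough to write by hand. This is only a sketch: the name human_bits and the choice of decimal (1000-based) prefixes with a singular unit label are assumptions, not the API of any of the modules mentioned:

```python
def human_bits(n_bits):
    """Format a bit count with decimal (SI) prefixes, up to exabit."""
    units = ["bit", "Kbit", "Mbit", "Gbit", "Tbit", "Pbit", "Ebit"]
    value = float(n_bits)
    idx = 0
    # Step up one prefix for every factor of 1000, stopping at the largest unit.
    while abs(value) >= 1000 and idx < len(units) - 1:
        value /= 1000
        idx += 1
    # {:g} drops trailing zeros, so 1.0 prints as "1".
    return "{:g} {}".format(value, units[idx])

print(human_bits(1000000))     # 1 Mbit
print(human_bits(1000000000))  # 1 Gbit
print(human_bits(1500))        # 1.5 Kbit
```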

BYTE to python C-Type structure Conversion Issue

I am facing a little problem deserializing some bytes that have been received from a POSIX queue.
We are trying to develop a module where a Python application posts data to a POSIX queue for a C application, and the C application re-posts the data to a Python queue.
All data is ctypes Structure based.
Structure definition:
msgStruct.py
import ctypes

MAX_MSG_SIZE = 5120

class MsgStruct(ctypes.Structure):
    _fields_ = [
        ("msgType", ctypes.c_int),
        ("msgSize", ctypes.c_int),
        ("setState", ctypes.c_int),
        ("msgBuf", ctypes.c_char * MAX_MSG_SIZE)
    ]
conversions.py
import ctypes

class conversions():
    @staticmethod
    def serialize(ctypesObj):
        """
        FAQ: How do I copy bytes to Python from a ctypes.Structure?
        """
        return buffer(ctypesObj)[:]

    @staticmethod
    def deserialize(ctypesObj, inputBytes):
        """
        FAQ: How do I copy bytes to a ctypes.Structure from Python?
        """
        # Copy no more bytes than the structure can hold.
        fit = min(len(inputBytes), ctypes.sizeof(ctypesObj))
        ctypes.memmove(ctypes.addressof(ctypesObj), inputBytes, fit)
        return ctypesObj
test.py
from msgStruct import *
from conversions import conversions
wrapper=conversions()
data="\x01\x00\x00\x00\x70\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x1e\x00\x00\x00\x25\x42\x35\x32\x33\x39\x35\x31\x32\x35\x32\x34\x38\x39\x35\x30\x30\x36\x5e\x56\x45\x4e\x4b\x41\x54\x20\x52\x41\x47\x41\x56\x41\x4e\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x5e\x31\x36\x30\x34\x31\x30\x31\x31\x36\x35\x35\x36\x30\x30\x31\x34\x31\x30\x30\x30\x30\x30\x30\x3f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x3b\x35\x32\x33\x39\x35\x31\x32\x35\x32\x34\x38\x39\x35\x30\x30\x36\x3d\x31\x36\x30\x34\x31\x30\x31\x31\x34\x31\x30\x3f\x00\x00...\x00"
"""
data is the queue data that is received by Python
"""
baseStruct=MsgStruct()
rxData=wrapper.deserialize(baseStruct,data)
print rxData.setState # Prints as expected
print rxData.msgType # Prints as expected
print rxData.msgSize
print rxData.msgBuf.encode('hex') # here is the problem: I don't see any data in this buffer
Please guide me on solving this issue. I am very much surprised that the buffer(rxData.msgSize) is always empty and would like to know why.
ctypes is trying to be helpful with c_char buffers by converting them into a Python string. The conversion stops at the first null byte. Observe what happens when I change the first couple of bytes of the msgBuf portion of your data to \x01\x02:
0
1
368
b'\x01\x02'
Change the type of msgBuf to c_ubyte instead so ctypes won't try to be "helpful" and then look at the data character-by-character with:
>>> print repr(''.join(chr(x) for x in rxData.msgBuf))
'\x00\x00\x00\x00\x02\x00\x00\x00\x1e\x00\x00\x00%B5239512524 ...
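To make the difference concrete, here is a small Python 3 sketch that loads the same bytes into a c_char-based and a c_ubyte-based structure. The class names CharMsg and UbyteMsg, the shortened 16-byte buffer, and the sample bytes are made up for illustration:

```python
import ctypes
import struct

BUF_SIZE = 16  # shortened from 5120 to keep the example readable

class CharMsg(ctypes.Structure):
    _fields_ = [("msgType", ctypes.c_int),
                ("msgSize", ctypes.c_int),
                ("setState", ctypes.c_int),
                ("msgBuf", ctypes.c_char * BUF_SIZE)]

class UbyteMsg(ctypes.Structure):
    _fields_ = [("msgType", ctypes.c_int),
                ("msgSize", ctypes.c_int),
                ("setState", ctypes.c_int),
                ("msgBuf", ctypes.c_ubyte * BUF_SIZE)]

# Three native ints followed by a payload that starts with NUL bytes.
payload = b"\x00\x00AB" + b"\x00" * 12
raw = struct.pack("iii", 1, 368, 0) + payload

char_msg = CharMsg.from_buffer_copy(raw)
ubyte_msg = UbyteMsg.from_buffer_copy(raw)

# The c_char array is returned as bytes cut off at the first NUL byte.
print(repr(char_msg.msgBuf))
# The c_ubyte array keeps every byte; bytes() copies it out unchanged.
print(repr(bytes(ubyte_msg.msgBuf)))
```

With the c_char field the payload looks empty even though the bytes are in memory, which matches the symptom described in the question.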
But there is no reason to use ctypes at all:
import struct
data=b"\x01\x00\x00\x00\x70\x01\x00\x00\x00\x00\x00\x00\x01\x02\x00\x00\x02\x00\x00\x00\x1e\x00\x00\x00\x25\x42\x35\x32\x33\x39\x35\x31\x32\x35\x32\x34\x38\x39\x35\x30\x30\x36\x5e\x56\x45\x4e\x4b\x41\x54\x20\x52\x41\x47\x41\x56\x41\x4e\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x5e\x31\x36\x30\x34\x31\x30\x31\x31\x36\x35\x35\x36\x30\x30\x31\x34\x31\x30\x30\x30\x30\x30\x30\x3f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x3b\x35\x32\x33\x39\x35\x31\x32\x35\x32\x34\x38\x39\x35\x30\x30\x36\x3d\x31\x36\x30\x34\x31\x30\x31\x31\x34\x31\x30\x3f\x00\x00...\x00"
msg_offset = struct.calcsize('iii')
print struct.unpack_from('iii',data)
print repr(data[msg_offset:])
Output:
(1, 368, 0)
'\x01\x02\x00\x00\x02\x00\x00\x00\x1e\x00\x00\x00%B5239512524895006^VENKAT RAGAVAN ^16041011655600141000000?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00;5239512524895006=16041011410?\x00\x00...\x00'
You can use struct to unpack the data for you:
import struct
import ctypes
class MsgStruct(ctypes.Structure):
_fields_ = [
("msgType", ctypes.c_int),
("msgSize",ctypes.c_int),
("setState",ctypes.c_int),
("msgBuf",ctypes.c_char * 5120)
]
def deserialize(data):
sz = len(data)-struct.calcsize('iii')
return MsgStruct(*struct.unpack('iii{}s'.format(sz), data))
Testing it with your data:
In [18]: data
Out[18]: '\x01\x00\x00\x00p\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x1e\x00\x00\x00%B5239512524895006^VENKAT RAGAVAN ^16041011655600141000000?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00;5239512524895006=16041011410?\x00\x00...\x00'
In [19]: s = deserialize(data)
In [20]: s.
s.msgBuf s.msgSize s.msgType s.setState
In [20]: s.msgType
Out[20]: 1
In [21]: s.msgSize
Out[21]: 368
In [22]: s.setState
Out[22]: 0
Edit: The MsgStruct assignment doesn't work for the msgBuf field. See the answers to this question for the reason. Unpacking the struct works fine:
In [13]: sz=12
In [14]: struct.unpack('iii{}s'.format(len(data)-sz), data)
Out[14]: (1, 368, 0, '\x00\x00\x00\x00\x02\x00\x00\x00\x1e\x00\x00\x00%B5239512524895006^VENKAT RAGAVAN ^16041011655600141000000?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00;5239512524895006=16041011410?\x00\x00...\x00')

Way to get value of this hex number

import binascii
f = open('file.ext', 'rb')
print binascii.hexlify(f.read(4))
f.close()
This prints:
84010100
I know that I must retrieve the hex number 184 out of this data.
How can it be done in python? I've used the struct module before, but I don't know if its little endian, big..whatever.. how can I get 184 from this number using struct?
>>> x = b'\x84\x01\x01\x00'
>>> import struct
>>> struct.unpack_from('<h', x)
(388,)
>>> map(hex, struct.unpack_from('<h', x))
['0x184']
< means little endian, h means read a signed 16-bit integer ("short"). Details are in the struct module documentation.
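In Python 3 the same value can also be read without struct, using int.from_bytes. This sketch assumes, as the answer above does, that the number you want is stored in the first two bytes in little-endian order:

```python
data = b"\x84\x01\x01\x00"

# Interpret the first two bytes with the least significant byte first.
little = int.from_bytes(data[:2], byteorder="little")
print(little, hex(little))  # 388 0x184

# For comparison, the same two bytes read as big endian.
big = int.from_bytes(data[:2], byteorder="big")
print(big, hex(big))        # 33793 0x8401
```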

How to find number of bytes taken by python variable

Is there any way I can find out how many bytes a particular variable takes in Python? E.g., let's say I have
int = 12
print (type(int))
it will print
<class 'int'>
But I want to know how many bytes it takes in memory. Is that possible?
You can find the functionality you are looking for here (in sys.getsizeof - Python 2.6 and up).
Also: don't shadow the int builtin!
import sys
myint = 12
print(sys.getsizeof(myint))
If you want to know the size of a C int (not of a Python int object), you can use struct:
>>> import struct
>>> struct.calcsize("i")
4
Otherwise, as others have already pointed out, use sys.getsizeof (Python 2.6 and up). There is also a recipe you can try.
In Python >= 2.6 you can use sys.getsizeof.
Numpy offers infrastructure to control data size. Here are examples (py3):
import numpy as np
x = np.float32(0)
print(x.nbytes) # 4
a = np.zeros((15, 15), np.int64)
print(a.nbytes) # 15 * 15 * 8 = 1800
This is super helpful when trying to submit data to the graphics card with pyopengl, for example.
You could also take a look at Pympler, especially its asizeof module, which unlike sys.getsizeof works with Python >=2.2.
At the Python prompt, you can use ctypes.sizeof to get the size of a C type:
>>> import ctypes
>>> ctypes.sizeof(ctypes.c_int)
You can read more about it at https://docs.python.org/2/library/ctypes.html
In Python 3 you can use sys.getsizeof().
import sys
myint = 12
print(sys.getsizeof(myint))
The best library for that is guppy:
import guppy
import inspect
def get_object_size(obj):
    h = guppy.hpy()
    callers_local_vars = inspect.currentframe().f_back.f_locals.items()
    vname = "Constant"
    # Look for a variable in the caller's frame that holds this object.
    for var_name, var_val in callers_local_vars:
        if var_val == obj:
            vname = str(var_name)
    # Dividing by 1024 * 1024 converts bytes to MB, so label the result MB.
    size = "{0:.2f} MB".format(float(h.iso(obj).domisize) / (1024 * 1024))
    return "{}: {}".format(vname, size)
The accepted answer sys.getsizeof is correct.
But looking at your comment about the accepted answer you might want the number of bits a number is occupying in binary. You can use bit_length
>>> (16).bit_length()  # '10000' in binary
5
>>> (4).bit_length()  # '100' in binary
3
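The answers above measure three different things, which is easy to mix up. A short side-by-side sketch for Python 3 (the exact sys.getsizeof result depends on the interpreter and platform, so no value is shown for it):

```python
import struct
import sys

n = 12

# Memory used by the Python int object itself, including object overhead.
object_size = sys.getsizeof(n)

# Size in bytes of a C int on this platform, unrelated to Python objects.
c_int_size = struct.calcsize("i")

# Number of bits needed to represent the value: 12 is '1100' in binary.
bits_needed = n.bit_length()

print(object_size, c_int_size, bits_needed)
```

Keep in mind that sys.getsizeof is shallow: for a container such as a list it does not include the sizes of the elements it refers to.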
