How to uncompress gzipped data in a byte array? - python

I have a byte array containing data that is compressed by gzip.
Now I need to uncompress this data. How can this be achieved?

zlib.decompress(data, 15 + 32) should autodetect whether you have gzip data or zlib data.
zlib.decompress(data, 15 + 16) should work if the data is gzip and barf if it is zlib.
Here it is with Python 2.7.1, creating a little gz file, reading it back, and decompressing it:
>>> import gzip, zlib
>>> f = gzip.open('foo.gz', 'wb')
>>> f.write(b"hello world")
11
>>> f.close()
>>> c = open('foo.gz', 'rb').read()
>>> c
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> ba = bytearray(c)
>>> ba
bytearray(b'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00')
>>> zlib.decompress(ba, 15+32)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: must be string or read-only buffer, not bytearray
>>> zlib.decompress(bytes(ba), 15+32)
'hello world'
>>>
Python 3.x usage would be very similar.
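For reference, a sketch of the same idea on Python 3, where zlib.decompress accepts a bytearray directly and gzip.decompress (added in 3.2) hides the wbits detail entirely:

```python
import gzip
import zlib

# Build a gzip-compressed payload in memory.
payload = gzip.compress(b"hello world")

# wbits=15+32 autodetects gzip vs. zlib framing.
print(zlib.decompress(payload, 15 + 32))             # b'hello world'

# Unlike Python 2, a bytearray works without conversion.
print(zlib.decompress(bytearray(payload), 15 + 32))  # b'hello world'

# Or skip zlib and use gzip.decompress directly.
print(gzip.decompress(payload))                      # b'hello world'
```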
Update based on comment that you are running Python 2.2.1.
Sigh. That's not even the last release of Python 2.2. Anyway, continuing with the foo.gz file created as above:
Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> strobj = open('foo.gz', 'rb').read()
>>> strobj
'\x1f\x8b\x08\x08\x14\xf4\xdcM\x02\xfffoo\x00\xcbH\xcd\xc9\xc9W(\xcf/\xcaI\x01\x00\x85\x11J\r\x0b\x00\x00\x00'
>>> import zlib
>>> zlib.decompress(strobj, 15+32)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data
>>> zlib.decompress(strobj, 15+16)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
zlib.error: Error -2 while preparing to decompress data
# OK, we can't use the back door method. Plan B: use the
# documented approach i.e. gzip.GzipFile with a file-like object.
>>> import gzip, cStringIO
>>> fileobj = cStringIO.StringIO(strobj)
>>> gzf = gzip.GzipFile('dummy-name', 'rb', 9, fileobj)
>>> gzf.read()
'hello world'
# Success. Now let's assume you have an array.array object-- which requires
# premeditation; they aren't created accidentally!
# The following code assumes typecode 'B' but should work for any typecode.
>>> import array, sys
>>> aaB = array.array('B')
>>> aaB.fromfile(open('foo.gz', 'rb'), sys.maxint)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
EOFError: not enough items in file
#### Don't panic, just read the fine manual
>>> aaB
array('B', [31, 139, 8, 8, 20, 244, 220, 77, 2, 255, 102, 111, 111, 0, 203, 72, 205, 201, 201, 87, 40, 207, 47, 202, 73, 1, 0, 133, 17, 74, 13, 11, 0, 0, 0])
>>> strobj2 = aaB.tostring()
>>> strobj2 == strobj
1 #### means True
# You can make a str object and use that as above.
# ... or you can plug it directly into StringIO:
>>> gzip.GzipFile('dummy-name', 'rb', 9, cStringIO.StringIO(aaB)).read()
'hello world'
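On Python 3, the same file-like-object dance uses io.BytesIO instead of cStringIO; a minimal sketch (the in-memory payload stands in for the foo.gz contents):

```python
import gzip
import io

# Stand-in for the bytes read from foo.gz.
compressed = bytearray(gzip.compress(b"hello world"))

# BytesIO accepts any bytes-like object, including bytearray.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gzf:
    text = gzf.read()

print(text)   # b'hello world'
```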

Apparently you can do this
import zlib
# ...
ungziped_str = zlib.decompressobj().decompress('x\x9c' + gziped_str)
Or this:
zlib.decompress( data ) # equivalent to gzdecompress()
For more info, look here: Python docs
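A variant of the decompressobj trick that I believe is cleaner, assuming a Python where decompressobj accepts a wbits argument: tell the object up front to expect a gzip header instead of splicing a zlib header onto the data:

```python
import gzip
import zlib

data = gzip.compress(b"hello world")

# wbits=15+16 means "expect gzip framing"; 15+32 would autodetect.
d = zlib.decompressobj(15 + 16)
result = d.decompress(data) + d.flush()
print(result)   # b'hello world'
```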

Related

unpickle a python 2 object in python 3 raises ValueError

In python 2.7.6:
# the data i'm trying to pickle
>>> x[0:5]
[494.12804680901604, 641.9374923706055, 778.293918918919, 470.2265625, 237.21332017010934]
>>> y[0:5]
[236.99996948242188, 381.6793310733242, 685.0, 409.0909090909091, 658.0]
>>> z[0:5]
[23, 20, 98, 24, 78]
>>> holder = [x,y,z]
How I'm pickling:
with open('holderData.obj','wb') as f:
    pickle.dump(holder, f)
    f.close()
In python 3.6.2
with open('holderData.obj','rb') as f:
    d = pickle.load(f, encoding='bytes')
Yet, this returns:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ValueError: could not convert string to float
The only question/answer I could find related to this issue tells me to add the encoding='bytes' bit, which doesn't work in this instance.
The pickle itself print(repr(pickle.dumps(holder))):
'(lp0\n(lp1\nF494.12804680901604\naF641.9374923706055\naF778.293918918919\naF470.2265625\naF237.21332017010934\naF372.76081123737373\naF396.15337968952133\naF615.2265625\naF470.2265625\naF581.2155330882352\naF488.40675200803213\naF475.47189597315435\naF92.0511279585
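Not from this thread, but a hedged sketch of one common fix: dump with an explicit binary protocol that both interpreters understand (protocol 2), which round-trips plain floats and ints without any encoding argument on load:

```python
import pickle

# Toy stand-ins for the x, y, z lists above.
x = [494.12804680901604, 641.9374923706055]
y = [236.99996948242188, 381.6793310733242]
z = [23, 20]
holder = [x, y, z]

# Protocol 2 is binary and readable by Python 2.3+ and every Python 3.
blob = pickle.dumps(holder, protocol=2)
restored = pickle.loads(blob)
print(restored == holder)   # True
```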

Weird TypeError from json.dumps

In Python 3.4.0, using json.dumps() throws me a TypeError in one case but works like a charm in another case (which I think is equivalent to the first one).
I have a dict where keys are strings and values are numbers and other dicts (i.e. something like {'x': 1.234, 'y': -5.678, 'z': {'a': 4, 'b': 0, 'c': -6}}).
This fails (the stacktrace is not from this particular code snippet but from my larger script, which I won't paste here; it is essentially the same):
>>> x = dict(foo()) # obtain the data and make a new dict of it to really be sure
>>> import json
>>> json.dumps(x)
Traceback (most recent call last):
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/pydevd.py", line 1733, in <module>
debugger.run(setup['file'], None, None)
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/pydevd.py", line 1226, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/mnt/data/gandalv/progs/pycharm-3.4/helpers/pydev/_pydev_execfile.py", line 38, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc) #execute the script
File "/mnt/data/gandalv/School/PhD/Other work/Krachy/code/recalculate.py", line 54, in <module>
ls[1] = json.dumps(f)
File "/usr/lib/python3.4/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/lib/python3.4/json/encoder.py", line 192, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.4/json/encoder.py", line 250, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python3.4/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 306 is not JSON serializable
The 306 is one of the values in one of the inner dicts in x. It is not always the same number; sometimes it is a different number contained in the dict, apparently because of the unorderedness of a dict.
However, this works like a charm:
>>> x = foo() # obtain the data and make a new dict of it to really be sure
>>> import ast
>>> import json
>>> x2 = ast.literal_eval(repr(x))
>>> x == x2
True
>>> json.dumps(x2)
"{...}" # the json representation of dict as it should be
Could anyone please tell me why this happens or what the cause could be? The most confusing part is that those two dicts (the original one and the one obtained through evaluation of the representation of the original one) are equal, but the dumps() function behaves differently for each of them.
The cause was that the numbers inside the dict were not ordinary Python ints but numpy.int64s, which are apparently not supported by the json encoder.
As you have seen, numpy int64 data types are not serializable into json directly:
>>> import numpy as np
>>> import json
>>> a=np.zeros(3, dtype=np.int64)
>>> a[0]=-9223372036854775808
>>> a[2]=9223372036854775807
>>> jstr=json.dumps(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/__init__.py", line 230, in dumps
return _default_encoder.encode(obj)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 192, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 250, in iterencode
return _iterencode(o, 0)
File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/json/encoder.py", line 173, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([-9223372036854775808, 0, 9223372036854775807]) is not JSON serializable
However, Python integers -- including longer integers -- can be serialized and deserialized:
>>> json.loads(json.dumps(2**123))==2**123
True
So with numpy, you can convert directly to Python data structures then serialize:
>>> jstr=json.dumps(a.tolist())
>>> b=np.array(json.loads(jstr))
>>> np.array_equal(a,b)
True
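Another option (my suggestion, not from the answer above) is a default hook: json.dumps only calls it for objects it cannot serialize itself, and every numpy scalar exposes an .item() method returning the matching native Python type:

```python
import json
import numpy as np

d = {'x': 1.234, 'z': {'a': np.int64(4), 'c': np.int64(-6)}}

# default= is consulted only for objects json cannot handle on its own.
s = json.dumps(d, default=lambda o: o.item())
print(s)
```

On builds where np.int64 happens to subclass int, the hook simply never fires, which is harmless.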

Python recognizes the function count as a name

I am viewing the Python tutorials from the Pascal institute that the BDFL says are the best to start with, and I have a very basic question.
The tutorial says:
How many of each base does this sequence contain?
>>> count(seq, 'a')
35
>>> count(seq, 'c')
21
>>> count(seq, 'g')
44
>>> count(seq, 't')
12
When I try to do it, it does not work:
>>> count(seq, 'a')
Traceback (most recent call last):
File "<pyshell#140>", line 1, in <module>
count(seq, 'a')
NameError: name 'count' is not defined
Why is this happening?
I've searched Stack resources, BTW, and I didn't find anything.
COMMENT
Take a look at the start of section 1.1.3. You have to type from string import * first.
>>> from string import*
>>> nb_a = count(seq, 'a')
Traceback (most recent call last):
File "<pyshell#73>", line 1, in <module>
nb_a = count(seq, 'a')
NameError: name 'count' is not defined
>>> from string import *
>>> nb_a = count(seq, 'a')
Traceback (most recent call last):
File "<pyshell#75>", line 1, in <module>
nb_a = count(seq, 'a')
NameError: name 'count' is not defined
I did.
ANSWER
>>> from string import *
>>> from string import count
Traceback (most recent call last):
File "<pyshell#93>", line 1, in <module>
from string import count
ImportError: cannot import name count
>>> from string import count
Traceback (most recent call last):
File "<pyshell#94>", line 1, in <module>
from string import count
ImportError: cannot import name count
I did. Didn't work.
The tutorial you linked to is very old:
Python 2.4.2 (#1, Dec 20 2005, 16:25:40)
You're probably using a more modern Python (>= 3) in which case there are no longer string functions like count in the string module. We used to have
Python 2.7.5+ (default, Feb 27 2014, 19:39:55)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from string import count
>>> count("abcc", "c")
2
but today:
Python 3.3.2+ (default, Feb 28 2014, 00:53:38)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from string import count
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name count
>>> import string
>>> dir(string)
['ChainMap', 'Formatter', 'Template', '_TemplateMetaclass', '__builtins__',
'__cached__', '__doc__', '__file__', '__initializing__', '__loader__', '__name__',
'__package__', '_re', '_string', 'ascii_letters', 'ascii_lowercase',
'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable',
'punctuation', 'whitespace']
These days we use the string methods instead, the ones that live in str itself:
>>> 'abcc'.count('c')
2
or even
>>> str.count('abcc','c')
2
While the other answers are correct, current Python releases offer another way to call count: as a method, usable on str but also on any type of sequence, as advised in the documentation:
>>> seq.count('a')
35
As seq is a string object, it also has the count method.
This method count() is defined in the string module (in Python 2). To use this method in your code, you need to import the definition.
Adding the following import statement before using the method will solve your problem:
from string import count
>>> seq='acdaacc'
>>> count(seq,'a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'count' is not defined
>>> from string import count
>>> count(seq,'a')
3
count is a function in the string module (in Python 2), meaning that at the top of your file (before you use the function) you need to import it so that your interpreter knows what you're talking about. Add the line from string import count as the first line of your file and it should work.
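To answer the original "how many of each base" question on a modern Python, a small sketch using the str.count method, plus collections.Counter for a one-pass alternative (the sequence here is a toy stand-in for the tutorial's seq):

```python
from collections import Counter

seq = "acgtaacg"   # toy sequence

# One count per base via the string method.
counts = {base: seq.count(base) for base in "acgt"}
print(counts)   # {'a': 3, 'c': 2, 'g': 2, 't': 1}

# Or tally everything in a single pass.
print(Counter(seq)["a"])   # 3
```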

Is there a way to implement **kwargs behavior when calling a Python script from the command line

Say I have a function as follows:
def foo(**kwargs):
    print kwargs
When I then call the function like this, I get this handy little dict of all the kwargs.
>>> foo(a = 5, b = 7)
{'a': 5, 'b': 7}
I want to do this directly to scripts I call from command line. So entering this:
python script.py a = 5 b = 7
Would create a similar dict to the example above. Can this be done?
Here's what I have so far:
import sys
kwargs_raw = sys.argv[1:]
kwargs = {key: val for key, val in zip(kwargs_raw[::3], kwargs_raw[2::3])}
print kwargs
And here's what this produces:
Y:\...\Python>python test.py a = 5 b = 7
{'a': '5', 'b': '7'}
So you may be wondering why this isn't good enough:
It's very structured, and thus won't work if a or b are anything other than strings, ints, or floats.
I have no way of determining whether the user intended 5 to be an int, string, or float.
I've seen ast.literal_eval() around here before, but I couldn't figure out how to get that to work. Both my attempts failed:
>>> ast.literal_eval("a = 5")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "Y:\admin\Anaconda\lib\ast.py", line 49, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "Y:\admin\Anaconda\lib\ast.py", line 37, in parse
return compile(source, filename, mode, PyCF_ONLY_AST)
File "<unknown>", line 1
a = 5
    ^
SyntaxError: invalid syntax
and
>>> ast.literal_eval("{a:5,b:7}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "Y:\admin\Anaconda\lib\ast.py", line 80, in literal_eval
return _convert(node_or_string)
File "Y:\admin\Anaconda\lib\ast.py", line 63, in _convert
in zip(node.keys, node.values))
File "Y:\admin\Anaconda\lib\ast.py", line 62, in <genexpr>
return dict((_convert(k), _convert(v)) for k, v
File "Y:\admin\Anaconda\lib\ast.py", line 79, in _convert
raise ValueError('malformed string')
ValueError: malformed string
If it matters, I'm using Python 2.7.6 32-bit on Windows 7 64-bit. Thanks in advance
It seems what you're really looking for is a way to parse command-line arguments. Take a look at the argparse module: http://docs.python.org/2/library/argparse.html#module-argparse
Alternately, if you really want to give your arguments in dictionary-ish form, just use the json module:
import json, sys
# Run your program as:
# python my_prog.py "{\"foo\": 1, \"bar\": 2}"
# (the quotes matter: JSON itself requires double quotes, so escape them in the shell)
data = json.loads(sys.argv[1])
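A middle ground I would suggest (names here are mine, not from the question): keep the key=value style but run each value through ast.literal_eval, falling back to the raw string when the value is not a Python literal:

```python
import ast
import sys

def parse_kwargs(argv):
    """Parse 'key=value' tokens into a dict, guessing value types."""
    kwargs = {}
    for token in argv:
        key, _, raw = token.partition('=')
        try:
            # literal_eval handles ints, floats, strings, tuples, lists, ...
            kwargs[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Not a Python literal: keep it as a plain string.
            kwargs[key] = raw
    return kwargs

# e.g. python script.py a=5 b=7.5 c=hello
print(parse_kwargs(['a=5', 'b=7.5', 'c=hello']))
# {'a': 5, 'b': 7.5, 'c': 'hello'}
```

In a real script you would pass sys.argv[1:] instead of the literal list.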

Python: Problems with a list comprehension using module laspy

Recently I came to understand the great advantage of using list comprehensions. I am working with several million points (x, y, z) stored in a special format, a *.las file. In Python there are two ways to work with this format:
the liblas module, http://www.liblas.org/tutorial/python.html (C++/Python)
the laspy module, http://laspy.readthedocs.org/en/latest/tut_part_1.html (pure Python)
I had several problems with liblas and I wish to test laspy.
In liblas I can use a list comprehension as:
from liblas import file as lasfile
f = lasfile.File(inFile,None,'r') # open LAS
points = [(p.x,p.y) for p in f] # read in list comprehension
In laspy I cannot figure out how to do the same:
from laspy.file import File
f = File(inFile, mode='r')
f
<laspy.file.File object at 0x0000000013939080>
(f[0].X,f[0].Y)
(30839973, 696447860)
I tried several combinations, such as:
points = [(p.X,p.Y) for p in f]
but I get this message:
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: Point instance has no attribute 'x'
I tried uppercase and lowercase because Python is case-sensitive:
>>> [(p.x,p.y) for p in f]
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: Point instance has no attribute 'x'
>>> [(p.X,p.Y) for p in f]
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: Point instance has no attribute 'X'
this is in interactive prompt:
C:\Python27>python.exe
Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win
32
Type "help", "copyright", "credits" or "license" for more information.
>>> from laspy.file import File
>>> inFile="C:\\04-las_clip_inside_area\\Ku_018_class.las"
>>> f = File(inFile, None, 'r')
>>> f
<laspy.file.File object at 0x00000000024D5E10>
>>> points = [(p.X,p.Y) for p in f]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: Point instance has no attribute 'X'
>>>
Printing dir(p) after the list comprehension gives:
print dir(p)
['__doc__', '__init__', '__module__', 'make_nice', 'pack', 'packer', 'reader', 'unpacked']
In a loop I always get the same error:
>>> for p in f:
... print dir(p)
... print p.X,p.Y
...
['__doc__', '__init__', '__module__', 'make_nice', 'pack', 'packer', 'reader', 'unpacked']
Traceback (most recent call last):
File "<interactive input>", line 3, in <module>
AttributeError: Point instance has no attribute 'X'
Using this code suggested by nneonneo:
import numpy as np
for p in f:
... points = np.array([f.X, f.Y]).T
I can store the points in an array:
points
array([[ 30839973, 696447860],
[ 30839937, 696447890],
[ 30839842, 696447832],
...,
[ 30943795, 695999984],
[ 30943695, 695999922],
[ 30943960, 695999995]])
but I am missing the way to create a list comprehension:
points = [np.array(p.X,p.Y).T for p in f]
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: Point instance has no attribute 'X'
Thanks in advance for the help.
Gianni
Python is case-sensitive. To me it looks like you ask for the attribute x, but it should be an uppercase X.
Try
import numpy as np
...
points = np.array([f.X, f.Y]).T
It looks like Point has a make_nice() method that makes more attributes show up.
for p in f: p.make_nice()
Now your list comp should work (with uppercase X and Y--see comments below).
[(p.X,p.Y) for p in f]
note: This answer is not tested. It is based on reading the source of laspy.util.Point.
Relevant source:
def make_nice(self):
    '''Turn a point instance with the bare essentials (an unpacked list of data)
    into a fully populated point. Add all the named attributes it possesses,
    including binary fields.
    '''
    i = 0
    for dim in self.reader.point_format.specs:
        self.__dict__[dim.name] = self.unpacked[i]
        i += 1
    # rest of method snipped
