I have a function that's reading a content object into a pandas dataframe.
import pandas as pd
from cStringIO import StringIO, InputType
def create_df(content):
    assert content, "No content was provided, can't create dataframe"
    if not isinstance(content, InputType):
        content = StringIO(content)
    content.seek(0)
    return pd.read_csv(content)
However, I keep getting the error TypeError: StringIO() argument 1 must be string or buffer, not cStringIO.StringIO
I checked the incoming type of the content prior to the StringIO() conversion inside the function and it's of type str. Without the conversion I get an error that the str object does not have a seek function. Any idea what's wrong here?
You only tested for InputType, the cStringIO.StringIO() instance type that supports reading. You appear to have the other type, OutputType, the instance type created for writing to:
>>> import cStringIO
>>> finput = cStringIO.StringIO('Hello world!') # the input type, it has data ready to read
>>> finput
<cStringIO.StringI object at 0x1034397a0>
>>> isinstance(finput, cStringIO.InputType)
True
>>> foutput = cStringIO.StringIO() # the output type, it is ready to receive data
>>> foutput
<cStringIO.StringO object at 0x102fb99d0>
>>> isinstance(foutput, cStringIO.OutputType)
True
You'd need to test for both types; just use a tuple of the two types as the second argument to isinstance():
from cStringIO import StringIO, InputType, OutputType
if not isinstance(content, (InputType, OutputType)):
    content = StringIO(content)
or, and this is the better option, test for read and seek attributes, so you can also support regular files:
if not (hasattr(content, 'read') and hasattr(content, 'seek')):
    # if not a file object, assume it is a string and wrap it in an in-memory file.
    content = StringIO(content)
or you could just test for strings and [buffers](https://docs.python.org/2/library/functions.html#buffer), since those are the only two types that StringIO() can support:
if isinstance(content, (str, buffer)):
    # wrap strings into an in-memory file
    content = StringIO(content)
This has the added bonus that any other file object in the Python library, including compressed files, tempfile.SpooledTemporaryFile(), and io.BytesIO(), will also be accepted and work.
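For completeness, a minimal sketch of the whole function using that duck-typing check (assuming the same create_df() signature as in the question, Python 2 with cStringIO):

import pandas as pd
from cStringIO import StringIO

def create_df(content):
    assert content, "No content was provided, can't create dataframe"
    # Duck-type check: anything with read() and seek() is treated as a
    # file-like object; everything else is assumed to be a string and
    # wrapped in an in-memory file.
    if not (hasattr(content, 'read') and hasattr(content, 'seek')):
        content = StringIO(content)
    content.seek(0)
    return pd.read_csv(content)

With this version a plain string, a cStringIO object of either type, an open file, or an io.BytesIO buffer can all be passed straight through to pd.read_csv().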
Related
I have a gzip file and I am trying to read it via Python as below:
import zlib
do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)
it throws this error:
zlib.error: Error -3 while decompressing: incorrect header check
How can I overcome it?
You have this error:
zlib.error: Error -3 while decompressing: incorrect header check
This is most likely because you are trying to check headers that are not there, e.g. your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).
choosing windowBits
But zlib can decompress all those formats:
to (de-)compress deflate format, use wbits = -zlib.MAX_WBITS
to (de-)compress zlib format, use wbits = zlib.MAX_WBITS
to (de-)compress gzip format, use wbits = zlib.MAX_WBITS | 16
See documentation in http://www.zlib.net/manual.html#Advanced (section inflateInit2)
examples
test data:
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>>
>>> text = '''test'''
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>
obvious test for zlib:
>>> zlib.decompress(zlib_data)
'test'
test for deflate:
>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
'test'
test for gzip:
>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
'test'
the data is also compatible with the gzip module:
>>> import gzip
>>> import StringIO
>>> fio = StringIO.StringIO(gzip_data) # io.BytesIO for Python 3
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
'test'
>>> f.close()
automatic header detection (zlib or gzip)
adding 32 to windowBits will trigger header detection
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
'test'
using gzip instead
or you can ignore zlib and use the gzip module directly; but please remember that under the hood, gzip uses zlib.
fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
Update: dnozay's answer explains the problem and should be the accepted answer.
Try the gzip module; the code below is straight from the Python docs.
import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()
I just solved the "incorrect header check" problem when uncompressing gzipped data.
You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the 2 version)
Yes, this can be very frustrating. A typically shallow reading of the documentation presents Zlib as an API to Gzip compression, but by default (not using the gz* methods) it does not create or uncompress the Gzip format. You have to send this not-very-prominently documented flag.
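For readers doing this from Python rather than C, a small sketch of the same idea using the zlib module's wbits argument (it mirrors the test data in dnozay's answer above and is not specific to any particular file):

import zlib

# Compressor side: MAX_WBITS | 16 adds the gzip header and trailer.
gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
gzip_data = gzip_compress.compress(b'test') + gzip_compress.flush()

# Decompressor side: MAX_WBITS | 16 is the Python spelling of the
# WANT_GZIP / "add 16 to windowBits" option of inflateInit2.
decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
print(decompressor.decompress(gzip_data) + decompressor.flush())  # b'test'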
This does not answer the original question, but it may help someone else that ends up here.
The zlib.error: Error -3 while decompressing: incorrect header check also occurs in the example below:
b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes) # this is the cause
zlib.decompress(base64.b64decode(encoded_bytes_representation))
The example is a minimal reproduction of something I encountered in some legacy Django code, where Base64 encoded bytes (from an HTTP POST) were being stored in a Django CharField (instead of a BinaryField).
When reading a CharField value from the database, str() is called on the value, without an explicit encoding, as can be seen in the Django source.
The str() documentation says:
If neither encoding nor errors is given, str(object) returns object.__str__(), which is the “informal” or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).
So, in the example, we are inadvertently base64-decoding
"b'eJxLTEpOSQUABcgB8A=='"
instead of
b'eJxLTEpOSQUABcgB8A=='.
The zlib decompression in the example would succeed if an explicit encoding were used, e.g. str(b64_encoded_bytes, 'utf-8').
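For reference, a minimal sketch of the corrected round trip (Python 3, same data as the example above):

import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))

# Decode the bytes explicitly instead of calling str() on them, so we get
# the Base64 text itself rather than its repr() ("b'...'").
encoded_str = str(b64_encoded_bytes, 'utf-8')

assert zlib.decompress(base64.b64decode(encoded_str)) == b'abcde'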
NOTE specific to Django:
What's especially tricky: this issue only arises when retrieving a value from the database. See for example the test below, which passes (in Django 3.0.3):
class MyModelTests(TestCase):
    def test_bytes(self):
        my_model = MyModel.objects.create(data=b'abcde')
        self.assertIsInstance(my_model.data, bytes)  # issue does not arise
        my_model.refresh_from_db()
        self.assertIsInstance(my_model.data, str)  # issue does arise
where MyModel is
class MyModel(models.Model):
    data = models.CharField(max_length=100)
To decompress incomplete gzipped bytes that are in memory, the answer by dnozay is useful but it misses the zlib.decompressobj call which I found to be necessary:
incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)
Note that zlib.MAX_WBITS | 16 is 15 | 16 which is 31. For some background about wbits, see zlib.decompress.
Credit: answer by Yann Vernier, which notes the zlib.decompressobj call.
Funnily enough, I had that error when trying to work with the Stack Overflow API using Python.
I managed to get it working with the GzipFile object from the gzip module, roughly like this:
import gzip
gzip_file = gzip.GzipFile(fileobj=open('abc.gz', 'rb'))
file_contents = gzip_file.read()
My case was to decompress email messages that are stored in a Bullhorn database. The snippet is the following:
import pyodbc
import zlib
cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')
for msg in cursor.fetchall():
    # magic in the second parameter, use negative value for deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)
Just add the header 'Accept-Encoding': 'identity':
import requests
requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})
https://github.com/requests/requests/issues/3849
I have a JSON file hosted locally in my Django directory. It is fetched from that file to a view in views.py, where it is read in like so:
def Stops(request):
    json_data = open(finders.find('JSON/myjson.json'))
    data1 = json.load(json_data)  # deserialises it
    data2 = json.dumps(data1)  # json formatted string
    json_data.close()
    return JsonResponse(data2, safe=False)
Using JsonResponse without (safe=False) returns the following error:
TypeError: In order to allow non-dict objects to be serialized set the safe parameter to False.
Similarly, using json.loads(json_data.read()) instead of json.load gives this error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This is confusing to me - I have validated the JSON using an online validator. When the JSON is sent to the frontend with safe=False, the resulting object that arrives is a string, even after calling .json() on it in javascript like so:
fetch("/json").then(response => {
return response.json();
}).then(data => {
console.log("data ", data); <---- This logs a string to console
...
However, going another step and calling JSON.parse() on the string converts it to a JSON object that I can use as intended:
data = JSON.parse(data);
console.log("jsonData", data); // <---- This logs a JSON object to console
But this solution doesn't strike me as a complete one.
At this point I believe the most likely explanation is that there is something wrong with the source JSON (the file's character encoding?). Either that, or json.dumps() is not doing what I think it should, or I am misunderstanding Django's JsonResponse in some way...
I've reached the limit of my knowledge on this subject. If you have any wisdom to impart, I would really appreciate it.
EDIT: As in the answer below by Abdul, I was reformatting the JSON into a string with the json.dumps(data1) line
Working code looks like:
def Stops(request):
    json_data = open(finders.find('JSON/myjson.json'))
    data = json.load(json_data)  # deserialises it
    json_data.close()
    return JsonResponse(data, safe=False)  # pass the python object here
Let's see the following lines of your code:
json_data = open(finders.find('JSON/myjson.json'))
data1 = json.load(json_data) # deserialises it
data2 = json.dumps(data1) # json formatted string
You open a file and get a file handle in json_data, parse its content to get a Python object in data1, and then turn it back into a JSON string and store it in data2. Somewhat redundant, right? Next you pass this JSON string to JsonResponse, which will try to serialize it into JSON again, meaning you end up with a string inside a string in JSON.
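To see the double encoding concretely, here is a small illustrative sketch (the dictionary is just a stand-in for the file's contents):

import json

data1 = {"stops": [1, 2, 3]}
data2 = json.dumps(data1)

# Serialising the dict yields a JSON object; serialising the string again
# yields a JSON string literal containing escaped JSON, which is what the
# frontend receives and why response.json() returns a string.
print(json.dumps(data1))  # {"stops": [1, 2, 3]}
print(json.dumps(data2))  # "{\"stops\": [1, 2, 3]}"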
Try the following code instead:
def Stops(request):
    json_data = open(finders.find('JSON/myjson.json'))
    data = json.load(json_data)  # deserialises it
    json_data.close()
    return JsonResponse(data, safe=False)  # pass the python object here
Note: function names in Python should ideally be in snake_case, not PascalCase; hence instead of Stops you should use stops. See PEP 8 -- Style Guide for Python Code.
In Python I'm getting an error:
Exception: (<type 'exceptions.AttributeError'>,
AttributeError("'str' object has no attribute 'read'",), <traceback object at 0x1543ab8>)
Given this Python code:
def getEntries(self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub
    request = urllib2.Request(url + '.json', None,
        {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen(request)
    jsonStr = response.read()
    return json.load(jsonStr)['data']['children']
What does this error mean and what did I do to cause it?
The problem is that for json.load you should pass a file-like object with a read function defined. So either use json.load(response) or json.loads(response.read()).
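A minimal sketch of the two options side by side, reusing the question's URL (Python 2, since the question uses urllib2; network access is assumed):

import json
import urllib2

# Option 1: pass the file-like response straight to json.load,
# which calls .read() internally.
response = urllib2.urlopen('http://www.reddit.com/.json')
data = json.load(response)

# Option 2: read the body yourself and parse the string with json.loads.
response = urllib2.urlopen('http://www.reddit.com/.json')
data = json.loads(response.read())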
OK, this is an old thread, but...
I had the same issue; my problem was that I had used json.load instead of json.loads.
This way, json has no problem with loading any kind of dictionary.
Official documentation
json.load - Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.
json.loads - Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.
You need to open the file first. This doesn't work:
json_file = json.load('test.json')
But this works:
f = open('test.json')
json_file = json.load(f)
If you get a python error like this:
AttributeError: 'str' object has no attribute 'some_method'
You probably poisoned your object accidentally by overwriting it with a string.
How to reproduce this error in python with a few lines of code:
#!/usr/bin/env python
import json

def foobar(json):
    msg = json.loads(json)

foobar('{"batman": "yes"}')
Run it, which prints:
AttributeError: 'str' object has no attribute 'loads'
But change the name of the variable, and it works fine:
#!/usr/bin/env python
import json

def foobar(jsonstring):
    msg = json.loads(jsonstring)

foobar('{"batman": "yes"}')
This error is raised when you try to call a method on a string that strings do not define. Strings have a few methods, but not the one you are invoking. So stop trying to invoke a method which str does not define, and start looking for where you poisoned your object.
AttributeError("'str' object has no attribute 'read'",)
This means exactly what it says: something tried to find a .read attribute on the object that you gave it, and you gave it an object of type str (i.e., you gave it a string).
The error occurred here:
json.load(jsonStr)['data']['children']
Well, you aren't looking for read anywhere, so it must happen in the json.load function that you called (as indicated by the full traceback). That is because json.load is trying to .read the thing that you gave it, but you gave it jsonStr, which currently names a string (which you created by calling .read on the response).
Solution: don't call .read yourself; the function will do this, and is expecting you to give it the response directly so that it can do so.
You could also have figured this out by reading the built-in Python documentation for the function (try help(json.load)), or for the entire module (try help(json)), or by checking the documentation for those functions on http://docs.python.org.
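For completeness, here is the question's method with that one change applied (a sketch; everything except the last line is copied from the question):

import json
import urllib2

def getEntries(self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub
    request = urllib2.Request(url + '.json', None,
        {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen(request)
    # Pass the file-like response directly; json.load calls .read() itself.
    return json.load(response)['data']['children']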
Instead of json.load() use json.loads() and it will work:
Example:
import json
strinjJson = '{"event_type": "affected_element_added"}'
data = json.loads(strinjJson)
print(data)
So, don't use json.load(data.read()); use json.loads(data.read()):
def findMailOfDev(fileName):
    file = open(fileName, 'r')
    data = file.read()
    data = json.loads(data)
    return data['mail']
Use the json.loads() function (note the s at the end). It was just a small mistake, by the way; I only realized it after searching for the error.
def getEntries(self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub
    request = urllib2.Request(url + '.json', None,
        {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen(request)
    jsonStr = response.read()
    return json.loads(jsonStr)['data']['children']
Try this.
Open the file as a text file first:
json_data = open("data.json", "r")
Now load it into a dict:
dict_data = json.load(json_data)
If you need to convert a string to JSON, use the loads() method instead of load(). The load() function is used to load data from a file, so use loads() to convert a string to a JSON object.
j_obj = json.loads('{"label": "data"}')
Is it possible to modify the tags of a downloaded MP3, without writing it to the disk?
I am using
import requests

def downloadTrack(url):
    track_data = requests.get(url)
    audiofile = Mp3AudioInherited(track_data.content)
    audiofile.initTag()
with the class Mp3AudioInherited inheriting from core.AudioFile, much like mp3.Mp3AudioFile. The only significant difference:
class Mp3AudioInherited(core.AudioFile):
    ...
    def _read(self):
        with io.BytesIO(self.data) as file_obj:
            self._tag = id3.Tag()
            tag_found = self._tag.parse(file_obj, self._tag_version)
        ...
Unfortunately _tag.parse() throws a ValueError: Invalid type: <type '_io.BytesIO'>. Isn't BytesIO a file-like object?
Thanks and regards!
No, io.BytesIO objects are not file-like in Python 2 (i.e., they are not interchangeable with file objects). Try using StringIO.StringIO to get a memory-backed file-like object instead.
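A minimal sketch of that change inside the question's _read() method (untested against the tag parser's internals; note that StringIO.StringIO objects in Python 2 do not support the with statement, hence the try/finally):

import StringIO  # Python 2 in-memory file-like object

class Mp3AudioInherited(core.AudioFile):
    ...
    def _read(self):
        # Wrap the raw MP3 bytes in StringIO instead of io.BytesIO.
        file_obj = StringIO.StringIO(self.data)
        try:
            self._tag = id3.Tag()
            tag_found = self._tag.parse(file_obj, self._tag_version)
        finally:
            file_obj.close()
        ...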