I want to write a file. Based on the name of the file this may or may not be compressed with the gzip module. Here is my code:
import gzip
filename = 'output.gz'
opener = gzip.open if filename.endswith('.gz') else open
with opener(filename, 'wb') as fd:
    print('blah blah blah'.encode(), file=fd)
I'm opening the writable file in binary mode and encoding my string to be written. However I get the following error:
File "/usr/lib/python3.5/gzip.py", line 258, in write
data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Why is my object not a bytes? I get the same error if I open the file with 'w' and skip the encoding step. I also get the same error if I remove the '.gz' from the filename.
I'm using Python3.5 on Ubuntu 16.04
For me, changing the gzip flag to 'wt' did the job. I could write the original string, without "byting" it.
(tested on python 3.5, 3.7 on ubuntu 16).
From python 3 gzip doc - quoting: "... The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode..."
import gzip
filename = 'output.gz'
opener = gzip.open if filename.endswith('.gz') else open
with opener(filename, 'wt') as fd:
    print('blah blah blah', file=fd)
!zcat output.gz
> blah blah blah
You can convert it to bytes like this.
import gzip
filename = 'output.gz'
with gzip.open(filename, 'wb') as fd:
    fd.write('blah blah blah'.encode('utf-8'))
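print is a relatively complex function. It writes a str to the file, but not the object you pass: it writes the str that results from rendering its arguments. Here the encoded bytes are rendered with str() first, so fd.write() still receives a str, which a file opened in binary mode rejects.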
If you have bytes already, you can use fd.write(bytes) directly and take care of adding a newline if you need it.
If you don't have bytes, make sure fd is opened to receive text.
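To see what print actually hands to the file, the sketch below captures its output in an in-memory text buffer and then writes the same line through a text-mode opener. The helper name write_text_line, the explicit utf-8 encoding, and the temporary file names are assumptions made for illustration; they are not part of the original question.

```python
import gzip
import io
import os
import tempfile


def write_text_line(filename, text):
    """Write one line of text, gzip-compressed if the name ends in '.gz'.

    Hypothetical helper: it opens the file in text mode ('wt') so that
    print() can hand it the str it produces.
    """
    opener = gzip.open if filename.endswith('.gz') else open
    with opener(filename, 'wt', encoding='utf-8') as fd:
        print(text, file=fd)


# print() renders its arguments with str() before writing, so a bytes
# argument ends up as its repr, not as raw bytes.
buf = io.StringIO()
print('blah'.encode(), file=buf)
rendered = buf.getvalue()

# Round-trip through both a compressed and a plain file.
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, 'output.gz')
txt_path = os.path.join(tmpdir, 'output.txt')
write_text_line(gz_path, 'blah blah blah')
write_text_line(txt_path, 'blah blah blah')

with gzip.open(gz_path, 'rt', encoding='utf-8') as fd:
    from_gz = fd.read()
with open(txt_path, 'rt', encoding='utf-8') as fd:
    from_txt = fd.read()
```

Because both branches open the file in text mode, the same print call works whether or not the name ends in '.gz'.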
You can serialize it using pickle.
First serialize the object to be written with pickle, then write the result with gzip.
To save the object:
import gzip, pickle
filename = 'non-serialize_object.gz'
obj = {'example': 'data'}  # placeholder: any picklable object
# serialize the object
serialized_obj = pickle.dumps(obj)
# write the gzip-compressed file
with gzip.open(filename, 'wb') as f:
    f.write(serialized_obj)
To load the object:
import gzip, pickle
filename = 'non-serialize_object.gz'
with gzip.open(filename, 'rb') as f:
    serialized_obj = f.read()
# de-serialize the object
obj = pickle.loads(serialized_obj)
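The intermediate bytes object is not strictly needed: pickle can also write to and read from the gzip file object directly. The following is a minimal sketch of that variant; the sample dictionary and the temporary file name are placeholders chosen for illustration.

```python
import gzip
import os
import pickle
import tempfile

# Example payload and file name are placeholders for illustration.
data = {'name': 'example', 'values': [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), 'data.pkl.gz')

# pickle.dump writes bytes, so the gzip file must be opened in binary mode.
with gzip.open(path, 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# pickle.load reads from the decompressed byte stream.
with gzip.open(path, 'rb') as f:
    restored = pickle.load(f)
```

As with any pickle data, this should only be loaded from files you trust.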
I am working with PDF content in Python, and my input from a service response is of the type _io.BufferedRandom. I need to save this file as a PDF within my service for further use.
response = open('test_file.pdf', 'rb+')
This is the input to my service, and it is of the type _io.BufferedRandom.
with open('output.pdf', 'wb+') as f:
    f.write(response)
Doing this, I get the error: TypeError: a bytes-like object is required, not '_io.BufferedRandom'
Any help is appreciated, thank you.
open returns a file object, which you then use to read from, write to, or append to the file, for example:
open(filename, mode)
f = open('workfile', 'w')
In your case, you are trying to write the file object itself to another file, not its content:
f.write(response)
So you need to call the read function:
f.read(size) - reads the file and returns a string (in text mode) or a bytes object (in binary mode).
So the final procedure is:
with open('output.pdf', 'wb+') as f:
    f.write(response.read())
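For large files, reading everything into memory with read() may be undesirable. One alternative is shutil.copyfileobj, which copies from one file object to another in chunks. In this sketch an io.BytesIO with placeholder bytes stands in for the service response, and the output goes to a temporary directory; both are assumptions for illustration.

```python
import io
import os
import shutil
import tempfile

# Stand-in for the service response: any binary file-like object works.
response = io.BytesIO(b'%PDF-1.4\nplaceholder content\n')
out_path = os.path.join(tempfile.mkdtemp(), 'output.pdf')

# Copy the stream to disk chunk by chunk instead of calling read() once.
with open(out_path, 'wb') as f:
    shutil.copyfileobj(response, f)

# Read the result back to confirm the bytes were written unchanged.
with open(out_path, 'rb') as f:
    saved = f.read()
```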
I have read the documentation and a few additional posts on SO and other various places, but I can't quite figure out this concept:
When you call csvFilename = gzip.open(filename, 'rb') then reader = csv.reader(open(csvFilename)), is that reader not a valid csv file?
I am trying to solve the problem outlined below, and am getting a coercing to Unicode: need string or buffer, GzipFile found error on line 41 and 7 (highlighted below), leading me to believe that the gzip.open and csv.reader do not work as I had previously thought.
Problem I am trying to solve
I am trying to take a results.csv.gz and convert it to a results.csv so that I can turn the results.csv into a python dictionary and then combine it with another python dictionary.
File 1:
alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)
Calls File 2:
import gzip
import csv
def dataToDict(filename):
    csvFilename = gzip.open(filename, 'rb')
    reader = csv.reader(open(csvFilename)) # LINE 7
    alertData={}
    for row in reader:
        alertData[row[0]]=row[1:]
    return alertData
def mergeTwoDicts(dictA, dictB):
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC
*edit: also forgive my non-PEP style of naming in Python
gzip.open returns a file-like object (same as what plain open returns), not the name of the decompressed file. Simply pass the result directly to csv.reader and it will work (the csv.reader will receive the decompressed lines). csv does expect text though, so on Python 3 you need to open it to read as text (on Python 2 'rb' is fine, the module doesn't deal with encodings, but then, neither does the csv module). Simply change:
csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))
to:
# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile) # No reopening involved
# Python 3
csvFile = gzip.open(filename, 'rt', newline='') # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile) # No reopening involved
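Put together, a Python 3 version of the question's function might look like the sketch below. The name data_to_dict, the utf-8 encoding, and the small sample file are assumptions for illustration; the real results.csv.gz may use a different encoding or layout.

```python
import csv
import gzip
import os
import tempfile


def data_to_dict(filename):
    """Map the first column of a gzipped CSV to the rest of each row."""
    alert_data = {}
    # Text mode with newline='' lets the csv module handle line endings.
    with gzip.open(filename, 'rt', encoding='utf-8', newline='') as csv_file:
        for row in csv.reader(csv_file):
            if row:  # skip blank lines
                alert_data[row[0]] = row[1:]
    return alert_data


# Build a small sample file so the sketch can run on its own.
sample_path = os.path.join(tempfile.mkdtemp(), 'results.csv.gz')
with gzip.open(sample_path, 'wt', encoding='utf-8', newline='') as f:
    f.write('host,count\r\nweb01,3\r\n')

result = data_to_dict(sample_path)
```

Using a with block also closes the gzip file once the rows have been read, which the original snippet never did.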
The following worked for me for python==3.7.9:
import gzip
my_filename = 'my_compressed_file.csv.gz'
with gzip.open(my_filename, 'rt') as gz_file:
    data = gz_file.read() # read decompressed data
with open(my_filename[:-3], 'wt') as out_file:
    out_file.write(data) # write decompressed data
my_filename[:-3] strips the trailing '.gz', so the output is written under the original file name rather than an arbitrary one.
I have multiple .gz files in subfolders that I want to unzip into one folder. This works fine, but there is a BOM signature at the beginning of each file that I would like to remove. I have checked other questions, such as Removing BOM from gzip'ed CSV in Python or Convert UTF-8 with BOM to UTF-8 with no BOM in Python, but they don't seem to work. I use Python 3.6 in PyCharm on Windows.
Here is my code first, without the attempt:
import gzip
import pickle
import glob
def save_object(obj, filename):
    with open(filename, 'wb') as output: # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)
output_path = 'path_out'
i = 1
for filename in glob.iglob(
        'path_in/**/*.gz', recursive=True):
    print(filename)
    with gzip.open(filename, 'rb') as f:
        file_content = f.read()
        new_file = output_path + "z" + str(i) + ".txt"
        save_object(file_content, new_file)
        f.close()
    i += 1
Now, with the logic defined in Removing BOM from gzip'ed CSV in Python (at least what I understand of it) if I replace file_content = f.read() by file_content = csv.reader(f.read().decode('utf-8-sig').encode('utf-8').splitlines()), I get:
TypeError: can't pickle _csv.reader objects
I checked for this error (e.g. "Can't pickle <type '_csv.reader'>" error when using multiprocessing on Windows) but I found no solution I could apply.
A minor adaptation of the very first question you link to trivially works.
tripleee$ cat bomgz.py
import gzip
from subprocess import run
with open('bom.txt', 'w') as handle:
    handle.write('\ufeffmoo!\n')
run(['gzip', 'bom.txt'])
with gzip.open('bom.txt.gz', 'rb') as f:
    file_content = f.read().decode('utf-8-sig')
with open('nobom.txt', 'w') as output:
    output.write(file_content)
tripleee$ python3 bomgz.py
tripleee$ gzip -dc bom.txt.gz | xxd
00000000: efbb bf6d 6f6f 210a ...moo!.
tripleee$ xxd nobom.txt
00000000: 6d6f 6f21 0a moo!.
The pickle parts didn't seem relevant here but might have been obscuring the goal of getting a block of decoded str out of an encoded blob of bytes.
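The same idea also works without the external gzip command and without a separate decode step, by opening the archive in text mode with the utf-8-sig codec. This is a sketch under the assumption that the files are UTF-8 encoded; the file names are placeholders in a temporary directory.

```python
import gzip
import os
import tempfile

tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, 'bom.txt.gz')

# Create a gzipped UTF-8 file that starts with a BOM (EF BB BF).
with gzip.open(gz_path, 'wb') as f:
    f.write(b'\xef\xbb\xbfmoo!\n')

# 'utf-8-sig' drops a leading BOM when decoding, if one is present.
with gzip.open(gz_path, 'rt', encoding='utf-8-sig') as f:
    text = f.read()

# Write plain UTF-8 (no BOM); newline='' avoids line-ending translation.
out_path = os.path.join(tmpdir, 'nobom.txt')
with open(out_path, 'w', encoding='utf-8', newline='') as out:
    out.write(text)

# Inspect the raw bytes of the output file.
with open(out_path, 'rb') as f:
    raw = f.read()
```

Writing the decoded text with a plain open call, rather than pickling it, is what produces an ordinary text file in the output folder.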
Sample fileobject data contains the following,
b'QmFyY29kZSxRdHkKQTIzMjMsMTAKQTIzMjQsMTUKNjUxMDA1OTUzMjkyNSwxMgpBMjMyNCwxCkEyMzI0LDEKQTIzMjMsMTAK'
And python file contains the following code
string_data = BytesIO(base64.decodestring(csv_rec))
read_file = csv.reader(string_data, quotechar='"', delimiter=',')
next(read_file)
When I run the above code in Python, I get the following exception:
_csv.Error: iterator should return strings, not int (did you open the file in text mode?)
How can I open bytes data in text mode?
You are almost there. Indeed, csv.reader expects an iterator that returns strings (not bytes). Such an iterator is provided by a sibling of BytesIO: io.StringIO.
import base64
import csv
from io import StringIO
csv_rec = b'QmFyY29kZSxRdHkKQTIzMjMsMTAKQTIzMjQsMTUKNjUxMDA1OTUzMjkyNSwxMgpBMjMyNCwxCkEyMzI0LDEKQTIzMjMsMTAK'
# base64.decodestring was removed in Python 3.9; b64decode decodes the same data
bytes_data = base64.b64decode(csv_rec)
# decode() converts the bytes to a str
string_data = StringIO(bytes_data.decode())
read_file = csv.reader(string_data, quotechar='"', delimiter=',')
next(read_file)
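Another way to "open bytes in text mode" is to keep the BytesIO and wrap it in io.TextIOWrapper, which decodes lazily as the reader consumes lines. The sketch below uses a short made-up payload rather than the data from the question, and assumes the content is UTF-8.

```python
import base64
import csv
import io

# Hypothetical sample payload, base64-encoded like the data in the question.
payload = base64.b64encode(b'Barcode,Qty\nA2323,10\n')

# Decode the base64 into a binary stream, then wrap it as a text stream.
binary_stream = io.BytesIO(base64.b64decode(payload))
text_stream = io.TextIOWrapper(binary_stream, encoding='utf-8', newline='')

# csv.reader now receives str lines, as it requires.
rows = list(csv.reader(text_stream, quotechar='"', delimiter=','))
```

This avoids building the whole decoded string up front, which can matter for larger inputs.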