The sample file object data contains the following:
b'QmFyY29kZSxRdHkKQTIzMjMsMTAKQTIzMjQsMTUKNjUxMDA1OTUzMjkyNSwxMgpBMjMyNCwxCkEyMzI0LDEKQTIzMjMsMTAK'
And the Python file contains the following code:
string_data = BytesIO(base64.decodestring(csv_rec))
read_file = csv.reader(string_data, quotechar='"', delimiter=',')
next(read_file)
When I run the above code in Python, I get the following exception:
_csv.Error: iterator should return strings, not int (did you open the file in text mode?)
How can I open bytes data in text mode?
You are almost there. csv.reader expects an iterator that returns strings (not bytes). Such an iterator is provided by a sibling of BytesIO: io.StringIO.
import base64
import csv
from io import StringIO

csv_rec = b'QmFyY29kZSxRdHkKQTIzMjMsMTAKQTIzMjQsMTUKNjUxMDA1OTUzMjkyNSwxMgpBMjMyNCwxCkEyMzI0LDEKQTIzMjMsMTAK'
# b64decode replaces base64.decodestring, which was removed in Python 3.9
bytes_data = base64.b64decode(csv_rec)
# decode() converts bytes to str (UTF-8 by default)
string_data = StringIO(bytes_data.decode())
read_file = csv.reader(string_data, quotechar='"', delimiter=',')
next(read_file)  # returns the header row
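The same idea can be wrapped in a small helper when the payload's encoding is known. This is only a minimal sketch, assuming the content is UTF-8; `rows_from_base64` is a hypothetical name used for illustration, not part of any library.

```python
import base64
import csv
from io import StringIO


def rows_from_base64(payload, encoding="utf-8"):
    """Decode a base64-encoded CSV payload into a list of dicts (hypothetical helper)."""
    text = base64.b64decode(payload).decode(encoding)
    # newline="" leaves line endings untouched so the csv module parses them itself
    return list(csv.DictReader(StringIO(text, newline="")))


csv_rec = b'QmFyY29kZSxRdHkKQTIzMjMsMTAKQTIzMjQsMTUKNjUxMDA1OTUzMjkyNSwxMgpBMjMyNCwxCkEyMzI0LDEKQTIzMjMsMTAK'
rows = rows_from_base64(csv_rec)
print(rows[0])
```

Using DictReader here means the header row becomes the dictionary keys, so there is no need to skip it with next().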
Related
Here is my scenario: I have a zip file that I am downloading with requests into memory rather than writing a file. I am unzipping the data into an object called myzipfile. Inside the zip file is a csv file. I would like to convert each row of the csv data into a dictionary. Here is what I have so far.
import csv
from io import BytesIO
import requests
# other imports etc.
r = requests.get(url=fileurl, headers=headers, stream=True)
filebytes = BytesIO(r.content)
myzipfile = zipfile.ZipFile(filebytes)
for name in myzipfile.namelist():
    mycsv = myzipfile.open(name).read()
    for row in csv.DictReader(mycsv): # it fails here.
        print(row)
Errors:
Traceback (most recent call last):
  File "/usr/lib64/python3.7/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not int (did you open the file in text mode?)
Looks like csv.DictReader(mycsv) expects a file object instead of raw data. How do I convert the rows in the mycsv object data (<class 'bytes'>) to a list of dictionaries? I'm trying to accomplish this without writing a file to disk and working directly from csv objects in memory.
The DictReader expects a file or file-like object: we can satisfy this expectation by loading the zipped file into an io.StringIO instance.
Note that StringIO expects its argument to be a str, but reading a file from the zipfile returns bytes, so the data must be decoded. This example assumes that the csv was encoded as UTF-8, which is what decode() uses when called without arguments. If that is not the case, the correct encoding must be passed to decode().
import io

for name in myzipfile.namelist():
    data = myzipfile.open(name).read().decode()
    mycsv = io.StringIO(data)
    reader = csv.DictReader(mycsv)
    for row in reader:
        print(row)
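The "not int" in the error message comes from iterating a bytes object directly, which yields integers rather than lines. As an alternative to decoding the whole member up front, the binary stream can be wrapped in io.TextIOWrapper so it is decoded as it is read. The sketch below is an illustration only: it builds a tiny zip archive in memory to stand in for the downloaded bytes, and the member name, contents, and UTF-8 encoding are assumptions.

```python
import csv
import io
import zipfile

# Iterating bytes yields ints, which is what csv complained about.
first_item = next(iter(b"a,b"))  # an int, not a str

# Build a tiny zip archive in memory to stand in for the downloaded bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("example.csv", "name,qty\napple,3\npear,5\n")
buffer.seek(0)

rows = []
with zipfile.ZipFile(buffer) as myzipfile:
    for name in myzipfile.namelist():
        with myzipfile.open(name) as raw:
            # TextIOWrapper decodes the binary stream as the reader consumes it
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows.extend(dict(row) for row in csv.DictReader(text))

print(rows)
```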
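dict_list = []  # one dictionary per CSV row
# On Python 3, open the file in text mode (not 'rb'); newline='' is what the csv docs recommend
with open('yourfile.csv', newline='') as f:
    reader = csv.DictReader(f)
    for line in reader:  # since we used DictReader, each line is a dictionary
        dict_list.append(line)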
I want to write a file. Based on the name of the file this may or may not be compressed with the gzip module. Here is my code:
import gzip
filename = 'output.gz'
opener = gzip.open if filename.endswith('.gz') else open
with opener(filename, 'wb') as fd:
    print('blah blah blah'.encode(), file=fd)
I'm opening the writable file in binary mode and encoding my string to be written. However I get the following error:
  File "/usr/lib/python3.5/gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Why is my object not bytes? I get the same error if I open the file with 'w' and skip the encoding step. I also get the same error if I remove the '.gz' from the filename.
I'm using Python 3.5 on Ubuntu 16.04.
For me, changing the gzip mode to 'wt' did the job. I could write the original string without "byting" it.
(Tested on Python 3.5 and 3.7 on Ubuntu 16.)
From python 3 gzip doc - quoting: "... The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode..."
import gzip
filename = 'output.gz'
opener = gzip.open if filename.endswith('.gz') else open
with opener(filename, 'wt') as fd:
    print('blah blah blah', file=fd)
!zcat output.gz
> blah blah blah
You can convert it to bytes like this and call write() directly:
import gzip
filename = 'output.gz'
with gzip.open(filename, 'wb') as fd:
    fd.write('blah blah blah'.encode('utf-8'))
print is a relatively complex function. It writes str to a file, but not the object that you pass: it writes the str that results from rendering its arguments. That is why passing bytes to print still ends up writing a str to a binary file.
If you have bytes already, you can use fd.write(bytes) directly and take care of adding a newline if you need it.
If you don't have bytes, make sure fd is opened to receive text.
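The points above can be checked with a short experiment. This is a sketch only: the temporary directory, file names, and the UTF-8 encoding are assumptions chosen for illustration.

```python
import gzip
import io
import os
import tempfile

# print() renders its arguments with str(), so bytes are written as their repr.
buf = io.StringIO()
print("blah".encode(), file=buf)
rendered = buf.getvalue()  # text that starts with b' rather than the raw bytes

workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "text.gz")
bytes_path = os.path.join(workdir, "bytes.gz")

# Text mode: print() (or write()) accepts str.
with gzip.open(text_path, "wt", encoding="utf-8") as fd:
    print("blah blah blah", file=fd)

# Binary mode: write() accepts bytes; add the newline yourself.
with gzip.open(bytes_path, "wb") as fd:
    fd.write("blah blah blah\n".encode("utf-8"))

with gzip.open(text_path, "rt", encoding="utf-8") as fd:
    from_text = fd.read()
with gzip.open(bytes_path, "rt", encoding="utf-8") as fd:
    from_bytes = fd.read()
```

Both files decompress to the same text, so the choice between 'wt' and 'wb' comes down to whether the data is already bytes.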
You can serialize it using pickle: first serialize the object with pickle, then write the result with gzip.
To save the object:
import gzip, pickle
filename = 'non-serialize_object.zip'  # note: despite the extension, this is a gzip file, not a zip archive
# serialize the object (obj is whatever you want to store)
serialized_obj = pickle.dumps(obj)
# write the gzip-compressed file
with gzip.open(filename, 'wb') as f:
    f.write(serialized_obj)
To load the object:
import gzip, pickle
filename = 'non-serialize_object.zip'
with gzip.open(filename, 'rb') as f:
    serialized_obj = f.read()
# de-serialize the object
obj = pickle.loads(serialized_obj)
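A slightly shorter variant passes the gzip file object straight to pickle.dump and pickle.load, which skips the intermediate bytes variable. The dictionary and file name below are made up for illustration.

```python
import gzip
import os
import pickle
import tempfile

data = {"barcode": "A2323", "qty": 10}  # example object, made up for illustration
path = os.path.join(tempfile.mkdtemp(), "data.pkl.gz")

with gzip.open(path, "wb") as f:
    pickle.dump(data, f)  # writes the pickle into the compressed file

with gzip.open(path, "rb") as f:
    restored = pickle.load(f)  # reads and de-serializes in one step
```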
I have read the documentation and a few additional posts on SO and other various places, but I can't quite figure out this concept:
When you call csvFilename = gzip.open(filename, 'rb') then reader = csv.reader(open(csvFilename)), is that reader not a valid csv file?
I am trying to solve the problem outlined below, and I am getting a "coercing to Unicode: need string or buffer, GzipFile found" error on lines 41 and 7 (marked below), which leads me to believe that gzip.open and csv.reader do not work as I had thought.
Problem I am trying to solve
I am trying to take a results.csv.gz and convert it to a results.csv so that I can turn the results.csv into a python dictionary and then combine it with another python dictionary.
File 1:
alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)
Calls File 2:
import gzip
import csv
def dataToDict(filename):
    csvFilename = gzip.open(filename, 'rb')
    reader = csv.reader(open(csvFilename)) # LINE 7
    alertData={}
    for row in reader:
        alertData[row[0]]=row[1:]
    return alertData

def mergeTwoDicts(dictA, dictB):
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC
*edit: also forgive my non-PEP style of naming in Python
gzip.open returns a file-like object (just as plain open does), not the name of the decompressed file. Pass the result directly to csv.reader and it will work: csv.reader receives the decompressed lines. csv does expect text, though, so on Python 3 you need to open the file in text mode. (On Python 2, 'rb' is fine: neither the gzip module nor the csv module deals with encodings there.) Simply change:
csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))
to:
# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile) # No reopening involved
# Python 3
csvFile = gzip.open(filename, 'rt', newline='') # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile) # No reopening involved
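Putting the Python 3 version together, the whole function might look like the sketch below. It writes a small gzipped CSV to a temporary directory first so that it can run on its own; the sample rows, the UTF-8 encoding, and the snake_case name `data_to_dict` are assumptions for illustration.

```python
import csv
import gzip
import os
import tempfile


def data_to_dict(filename):
    """Map the first column of each row to the remaining columns."""
    with gzip.open(filename, "rt", newline="", encoding="utf-8") as csv_file:
        return {row[0]: row[1:] for row in csv.reader(csv_file) if row}


# Write a small gzipped CSV to a temporary location for the demo.
path = os.path.join(tempfile.mkdtemp(), "results.csv.gz")
with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["host", "web01", "200"], ["source", "syslog", "5"]])

alert_data = data_to_dict(path)
print(alert_data)
```

The with block also closes the gzip file once the dictionary is built, which the original function never did.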
The following worked for me with Python 3.7.9:
import gzip
my_filename = 'my_compressed_file.csv.gz'
with gzip.open(my_filename, 'rt') as gz_file:
    data = gz_file.read() # read decompressed data
with open(my_filename[:-3], 'wt') as out_file:
    out_file.write(data) # write decompressed data
my_filename[:-3] strips the trailing '.gz', so the output is written under the original file name rather than a random one.
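For large files, reading everything into memory can be avoided by copying the stream in chunks with shutil.copyfileobj. A minimal sketch, assuming a made-up file name and working in binary mode so no encoding has to be chosen:

```python
import gzip
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "example.csv.gz")  # made-up file name
with gzip.open(gz_path, "wb") as f:
    f.write(b"a,b\n1,2\n")

out_path = gz_path[:-3]  # drop the ".gz" suffix
with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
    shutil.copyfileobj(src, dst)  # copies in chunks, bytes in and bytes out

with open(out_path, "rb") as f:
    decompressed = f.read()
```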
I have a csv file with a few patterns. I only want to selectively load lines into the csv reader class of python. Currently, csv only takes a file object. Is there a way to get around this?
In other words, what I need is:
with open('filename') as f:
    for line in f:
        if condition(line):
            record = csv.reader(line)
But, currently, csv class fails if it is given a line instead of a file object.
From the csv.reader docstring:
csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called
You can feed csv.reader with a generator iterator that yields only the selected rows.
with open('filename') as f:
    lines = (line for line in f if condition(line))
    for record in csv.reader(lines):
        do_something()
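Here is a version of the generator approach that runs on its own. An io.StringIO stands in for the open file, and both the sample contents and the `condition` function are made up for illustration.

```python
import csv
import io

# Stand-in for an open file; the contents and the condition are made up.
f = io.StringIO("keep,1,a\nskip,2,b\nkeep,3,c\n")


def condition(line):
    """Select only lines that start with the word 'keep'."""
    return line.startswith("keep")


# The generator is lazy: lines are filtered as csv.reader asks for them.
lines = (line for line in f if condition(line))
records = list(csv.reader(lines))
print(records)
```

Filtering on raw lines assumes each record fits on one line; a quoted field containing a newline would be split before csv.reader sees it.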
To read a file as a stream, you can use io.open:
io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)
import shlex
lex = shlex.shlex('"sreeraag","100,ABC,XYZ",112', posix=True)
lex.whitespace += ','  # treat commas as separators
lex.whitespace_split = True
print(list(lex))
yields
['sreeraag', '100,ABC,XYZ', '112']
Found a solution: csv.reader expects an object that supports __next__(), so I wrap each line in a StringIO object, which implements __next__() and returns one line at a time to the csv reader.
with open('filename') as f:
    for line in f:
        if condition(line):
            # Python 2 spelling; on Python 3 use io.StringIO(line)
            record = csv.reader(StringIO.StringIO(line))  # a reader: iterate it to get the parsed row
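Since csv.reader accepts any iterable of strings, a one-element list also works and avoids StringIO altogether. A small sketch on Python 3, with a made-up sample line:

```python
import csv

line = '"sreeraag","100,ABC,XYZ",112\n'  # sample line, made up for illustration

# A list containing a single string is an iterable of strings,
# so csv.reader can parse it; next() pulls out the one parsed row.
record = next(csv.reader([line]))
print(record)
```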
```
with open("xx.csv") as f:
    lines = f.readlines()  # avoid naming this csv, which would shadow the csv module
print(lines[0])
```
→_→ Life is short, you need pandas
pip install pandas
```
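import pandas as pd
df = pd.read_csv(filepath_or_url)  # accepts a local path or a URL
# .ix was removed in pandas 1.0; .loc is the label-based replacement
df.loc[0]
df.loc[1]
df.loc[1:3]  # label slices include both endpoints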
```