Can't interact with an object during a second pass - python

I've run into a strange problem while writing some code for my personal use. I'll let my code do the talking...
def getValues(self, reader):
    for row in reader:
        # does stuff
    return assetName, efficiencyRating

def handleSave(self, assetName, reader):
    outputFile = open(self.outFilename, 'w')
    for row in reader:
        # does other stuff
    outputFile.close()
    return

def handleCalc(self):
    reader = csv.reader(open(self.filename), delimiter=',', quotechar='"')
    assetName, efficiencyRating = self.getValues(reader)
    self.handleSave(assetName, reader)
This is just a portion of the code (obviously). The problem I'm having is in handleSave trying to loop through reader. It doesn't appear to ever enter the loop? I'm really not sure what is happening. The loop in getValues behaves as expected.
Can someone explain what is happening? What have I done wrong? What should I do to fix this?

Once you've iterated through an iterator once, you can't iterate through it again.
One way you can solve this is before you call handleSave, rewind the file and create a new reader:
f = open(self.filename)
reader = csv.reader(f, delimiter = ',', quotechar = '"')
assetName, efficiencyRating = self.getValues(reader)
f.seek(0) # rewind file
reader = csv.reader(f, delimiter = ',', quotechar = '"')
self.handleSave(assetName, reader)
Alternatively, you can read the data into a list:
rows = list(reader)
And then iterate through rows rather than reader.
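For instance, a minimal sketch with an in-memory file standing in for the CSV: `list(reader)` materialises the rows once, after which the list can be traversed as many times as you like, while the reader itself is exhausted.

```python
import csv
import io

f = io.StringIO('x,1\ny,2\n')
reader = csv.reader(f)
rows = list(reader)          # consumes the reader once

names = [row[0] for row in rows]    # first pass over rows
values = [row[1] for row in rows]   # second pass works fine
print(list(reader))                 # the reader itself is now empty: []
```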
As a side note, the convention in Python is for names to be lowercase, separated by underscores, rather than camel case. (e.g. get_values rather than getValues, handle_save rather than handleSave)

A csv.reader wraps an iterator, and since you have already iterated over it once in your getValues method, it is exhausted by the time handleSave runs. Unfortunately, I don't see any better option than building the reader again.
Move the csv.reader call into each of your methods and construct the reader there:
reader = csv.reader(open(self.filename), delimiter=',', quotechar='"')
Or create a new file object each time (or seek(0) to rewind the existing one) and pass that to csv.reader. That should help.

Replace newlines in the middle of .csv columns

I have a CSV file which can look like this:
Col1,"Col2","Col31
Col32
Col33"
That means if I use this code:
with open(inpPath, "r+") as csvFile:
    wb_obj = list(csv.reader(csvFile, delimiter=','))
    for row in wb_obj:
        print(row)
The output looks like this:
['Col1', 'Col2', 'Col31\nCol32\nCol33']
So I am trying to replace the \n characters with spaces so the CSV file would be rewritten like this: Col1,"Col2","Col31 Col32 Col33"
I have written this short function but it results in Error: Cannot open mapping csv, Exception thrown I/O operation on closed file.
def processCSV(fileName):
    with open(fileName, "rU") as csvFile:
        filtered = (line.replace('\n', ' ') for line in csvFile)
        wb_obj = csv.reader(filtered, delimiter=",")
        return wb_obj
How could I fix that? Thank you very much for any help
Your processCSV function returns an iterable based on the file object csvFile. By the time the caller consumes that iterable, csvFile has already been closed by the context manager, because the consumption happens outside the with statement; hence the error. The common pattern for such a function is to accept a file object as a parameter and let the caller open the file, so that the file stays open while the caller iterates.
You also should not replace all newlines with spaces to begin with, since you really only want to replace newlines if they are within double quotes, which would be parsed by csv.reader as part of a column value rather than row separators. Instead, you should let csv.reader do its parsing first, and then replace newline characters with spaces in all column values:
def processCSV(file):
    return ([col.replace('\n', ' ') for col in row] for row in csv.reader(file))

# the caller
with open(filename) as file:
    for row in processCSV(file):
        print(*row, sep=',')
This comes down to using a generator expression inside of the file context. See a longer explanation here: https://stackoverflow.com/a/39656712/15981783.
You can change the generator expression to a list comprehension and it will work:
import csv

def processCSV(fileName):
    with open(fileName, "r") as csvFile:
        # list comprehension used here instead
        filtered = [line.replace('\n', ' ') for line in csvFile]
        wb_obj = csv.reader(filtered, delimiter=",")
        return wb_obj

print(list(processCSV("tmp.csv")))
Returns:
[['Col1', 'Col2', 'Col31 Col32 Col33']]

how do I remove commas within columns from data retrieved from a CSV file

I have several CSV files that I need to process. Within the columns of each, there may be commas inside the fields, and strings may be wrapped in double quotes. I came up with something that works, but the files I'm working with are sometimes 200 - 400 MB, and with my current code an 11 MB file takes 4 minutes to process.
What can I do here to make it run faster, or perhaps to process the entire data all at once instead of running through the code field by field?
import csv

def rem_lrspaces(data):
    data = data.lstrip()
    data = data.rstrip()
    data = data.strip()
    return data

def strip_bs(data):
    data = data.replace(",", " ")
    return data

def rem_comma(tmp1, tmp2):
    with open(tmp2, "w") as f:
        f.write("")
    f.close()
    file = open(tmp1, "r")
    reader = csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
    for line in reader:
        for field in line:
            if "," in field:
                field = rem_lrspaces(strip_bs(field))
            with open(tmp2, "a") as myfile:
                myfile.write(field + ",")
        with open(tmp2, "a") as myfile:
            myfile.write("\n")

pdfsource = r"C:\automation\cutoff\test2"
csvsource = pdfsource
ofn = "T3296N17"
file_in = r"C:\automation\cutoff\test2" + chr(92) + ofn + ".CSV"
file_out = r"C:\automation\cutoff\test2" + chr(92) + ofn + ".TSV"
rem_comma(file_in, file_out)
A few low-hanging fruit:
strip_bs is too simple to justify the overhead of calling the function.
rem_lrspaces is redundantly stripping whitespace; one call to data.strip() is all you need, in which case it too is too simple to justify a separate function.
You are also spending a lot of time repeatedly opening the output file.
Also, it's better to pass already-open file handles to rem_comma, as it makes testing easier by allowing in-memory file-like objects to be passed as arguments.
This code simply builds a new list of fields from each line, then uses csv.writer to write the new fields back to the output file.
import csv

def rem_comma(f_in, f_out):
    reader = csv.reader(f_in, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
    writer = csv.writer(f_out)
    for line in reader:
        new_line = [field.replace(",", " ").strip() for field in line]
        writer.writerow(new_line)

ofn = "T3296N17"
file_in = r"C:\automation\cutoff\test2" + chr(92) + ofn + ".CSV"
file_out = r"C:\automation\cutoff\test2" + chr(92) + ofn + ".TSV"
with open(file_in) as f1, open(file_out, "w", newline="") as f2:
    rem_comma(f1, f2)
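To illustrate the point about in-memory file-like objects, the same function can be exercised with io.StringIO instead of real files. This is a minimal sketch: the sample input line is made up, and the expected result assumes csv.writer's default \r\n line terminator.

```python
import csv
import io

def rem_comma(f_in, f_out):
    reader = csv.reader(f_in, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
    writer = csv.writer(f_out)
    for line in reader:
        writer.writerow([field.replace(",", " ").strip() for field in line])

# in-memory "files" make the function trivial to test
src = io.StringIO('"a", "b,c", "d"\n')
dst = io.StringIO()
rem_comma(src, dst)
print(dst.getvalue())  # a,b c,d
```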

Read CSV with comma as linebreak

I have a file saved as .csv
"400":0.1,"401":0.2,"402":0.3
Ultimately I want to save the data in a proper format in a csv file for further processing. The problem is that there are no line breaks in the file.
pathname = r"C:\pathtofile\file.csv"
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    print(reader)

with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(reader)
The print reader output looks exactly how I want (or at least it's a format I can further process).
"400":0.1
"401":0.2
"402":0.3
And now I want to save that to a new csv file. However the output looks like
"""",4,0,0,"""",:,0,.,1,"
","""",4,0,1,"""",:,0,.,2,"
","""",4,0,2,"""",:,0,.,3
I'm sure it would be intelligent to convert the format to
400,0.1
401,0.2
402,0.3
at this stage instead of doing later with another script.
The main problem is that my current code
with open(pathname, newline='') as file:
    reader = file.read().replace(',', '\n')
    reader = csv.reader(reader, delimiter=':')
    x = []
    y = []
    print(reader)
    for row in reader:
        x.append(float(row[0]))
        y.append(float(row[1]))
print(x)
print(y)
works fine for the type of csv files I currently have, but doesn't work for these mentioned above:
y.append( float(row[1]) )
IndexError: list index out of range
So I'm trying to find a way to work with them too. I think I'm missing something obvious as I imagine that it can't be too hard to properly define the linebreak character and delimiter of a file.
with open(pathname, newline=',') as file:
yields
ValueError: illegal newline value: ,
The right way to do this with the csv module, without the manual replace or the float casts:
import csv
with open('file.csv', 'r') as f, open('filenew.csv', 'w', newline='') as out:
reader = csv.reader(f)
writer = csv.writer(out, quotechar=None)
for r in reader:
for i in r:
writer.writerow(i.split(':'))
The resulting filenew.csv contents (according to your "intelligent" condition):
400,0.1
401,0.2
402,0.3
Nuances:
csv.reader and csv.writer objects treat comma , as default delimiter (no need to file.read().replace(',', '\n'))
quotechar=None is specified for csv.writer object to eliminate double quotes around the values being saved
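The quotechar=None point can be seen side by side with the default writer. A sketch with in-memory files; the field '"400"' (quotes included) is made up to show the difference, and the expected lines assume the default \r\n line terminator:

```python
import csv
import io

default_out = io.StringIO()
csv.writer(default_out).writerow(['"400"', '0.1'])
print(default_out.getvalue())  # """400""",0.1 -- field re-quoted, inner quotes doubled

none_out = io.StringIO()
# quotechar=None disables quoting on the writer
csv.writer(none_out, quotechar=None).writerow(['"400"', '0.1'])
print(none_out.getvalue())  # "400",0.1 -- written verbatim
```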
You need to split the values to form a list to represent a row. Presently the code is splitting the string into individual characters to represent the row.
pathname = r"C:\pathtofile\file.csv"
with open(pathname) as old_file:
    with open(r"C:\pathtofile\filenew.csv", 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter=',')
        text_rows = old_file.read().split(",")
        for row in text_rows:
            items = row.split(":")
            # strip the surrounding quotes so int() can parse the key
            csv_writer.writerow([int(items[0].strip('"')), items[1]])
If you look at the documentation for writerow, it says:
Write the row parameter to the writer’s file
object, formatted according to the current dialect.
But, you are writing an entire string in your code
csv_writer.writerow(reader)
because reader is a string at this point.
Now, the format you want to use in your CSV file is not clearly mentioned in the question. But as you said, if you can do some preprocessing to create a list of lists and pass each sublist to writerow(), you should be able to produce the required file format.
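For instance, a minimal sketch of that preprocessing, using the sample data from the question and an in-memory output file: split on commas to get the pairs, split each pair on the colon, strip the quotes, and hand each [key, value] list to the writer.

```python
import csv
import io

raw = '"400":0.1,"401":0.2,"402":0.3'

# build a list of lists, one [key, value] pair per row
rows = []
for pair in raw.split(','):
    key, value = pair.split(':')
    rows.append([key.strip('"'), value])   # e.g. ['400', '0.1']

out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```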

python argument 1 must have a write method

Hi, I'm trying to save a modified CSV file that's been read. See the code:
import csv

with open("Bang.csv", 'rt') as f:
    data = f.read()

new_data = data.replace('"', '')
for row in csv.reader(new_data.splitlines(),
                      delimiter=' ',
                      skipinitialspace=True):
    pa = (','.join(row))
    wr = csv.writer("pa", delimiter=',')
    wr.writerow("pa")
I can print data and pa, but when I run it I get the above-mentioned error. What am I missing? Thanks
As mentioned in the manual, the first parameter of csv.writer must be a file-like object.
Suppose you want to write into the stdout (print on the screen), you can modify you code like this:
#pa = (','.join(row)) # you don't need to join row manually
wr = csv.writer(sys.stdout, delimiter=',')
wr.writerow(row)
I really don't know for sure, but I think the first argument passed to the csv.writer() function should be a file handle instead of a string variable.

How to read csv on python with newline separator #

I need to read a csv on Python, and the text file that I have has this structure:
"114555","CM13","0004","0","C/U"#"99172","CM13","0001","0","C/U"#"178672","CM13","0001","0","C/U"
delimiter: ,
newline: #
My code so far:
import csv

data = []
with open('stock.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', lineterminator='#')
    for row in reader:
        data.append({'MATERIAL': row[0], 'CENTRO': row[1], 'ALMACEN': row[2], 'STOCK_VALORIZADO': row[3], 'STOCK_UMB': row[4]})
print(data)  # this prints just one row
This code only prints one row, because it doesn't recognize # as a newline, and it prints the last field with quotes:
[{'MATERIAL': '114555', 'CENTRO': 'CM13', 'ALMACEN': '0004', 'STOCK_VALORIZADO': '0', 'STOCK_UMB': 'C/U#"99172"'}]
According to https://docs.python.org/2/library/csv.html :
"The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future." Hence for now, providing the argument lineterminator='#' will not work.
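That the reader ignores lineterminator is easy to check; in this sketch (with a made-up in-memory line) the '#' stays inside the parsed field instead of splitting the row:

```python
import csv
import io

# lineterminator='#' has no effect when reading: '#' does not split rows
rows = list(csv.reader(io.StringIO('a,b#c,d'), lineterminator='#'))
print(rows)  # a single row, with '#' left inside the middle field
```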
I think the best option is to read your entire file into a variable, and replace all '#' characters, you can do this as follows:
with open("stock.csv", "r") as myfile:
    data = myfile.read().replace('#', '\n')
Now you need to adjust your algorithm in such a way that you can pass the variable data to csv.reader (instead of the file stock.csv), according to the python doc:
"The "iterable" argument can be any object that returns a line
of input for each iteration, such as a file object or a list. [...]"
Hence you can pass data.splitlines() to csv.reader.
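Putting both steps together, a minimal sketch using the sample line from the question (held in a string here rather than read from stock.csv):

```python
import csv

# sample data from the question, with '#' as the row separator
raw = '"114555","CM13","0004","0","C/U"#"99172","CM13","0001","0","C/U"#"178672","CM13","0001","0","C/U"'

fields = ['MATERIAL', 'CENTRO', 'ALMACEN', 'STOCK_VALORIZADO', 'STOCK_UMB']
reader = csv.reader(raw.replace('#', '\n').splitlines(), delimiter=',')
data = [dict(zip(fields, row)) for row in reader]
print(data)  # three rows, quotes stripped by the reader
```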
I was struggling with CRLF ('\r\n') line endings using csv.reader. I was able to get it working using the newline parameter of open:
with open(local_file, 'r', newline='\r\n') as f:
    reader = csv.reader(f)
