I was refactoring some code for my program and I have a mistake somewhere in the process. I am reading and writing .csv files.
In the beginning of my program I iterate through a .csv file in order to find which data from the file I need.
with open(csvPath, mode='r') as inputFile:
    csvReader = csv.reader(inputFile)
    potentialVals = []
    paramVals = {}
    for row in csvReader:
        if row[3] == "Parameter":
            continue
        # Increment values in dict
        if row[3] not in paramVals:
            paramVals[row[3]] = 1
        else:
            paramVals[row[3]] += 1
This iterates and works fine; the for loop gets me every row in the .csv file. I then perform some calculations and later iterate through the same .csv file again to select data to write to a new .csv file. My problem is here: when I iterate through a second time, it only gives me the first row of the .csv file, and nothing else.
# Write all of the information to our new csv file
with open(outputPath, mode='w') as outputFile:
    csvWriter = csv.writer(outputFile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    inputFile.seek(0)
    rowNum = 0
    for row in csvReader:
        print(row)
Where the print statement is, it only prints the first line of the .csv file, and then exits the for loop. I'm not really sure what is causing this. I thought it might have been the
inputFile.seek(0)
But even if I opened a 2nd reader, the problem persisted. This for loop was working before I refactored it, all the other code is the same except the for loop I'm having trouble with, here is what it used to look like:
Edit: So I thought maybe it was a variable instance error, so I tried renaming my variables instead of reusing them, and the issue persisted. Going to try a new file instance now.
Edit 2: Okay so this is interesting, when I look at the line_num value for my reader object (when I open a new one instead of using .seek) it does output 1, so I am at the beginning of my file. And when I look at the len(list(csvReader)) it is 229703, which shows that the .csv is fully there, so still not sure why it won't do anything besides the first row of the .csv
Edit 3: Just as a hail mary attempt, I tried creating a deep copy of the .csv file and iterating through that, but same results. I also tried just doing an entire separate .csv file and I also got the same issue of only getting 1 row. I guess that eliminates that it's a file issue, the information is there but there is something preventing it from reading it.
Edit 4: Here is where I'm currently at with the same issue. I might just have to rewrite this method completely haha but I'm going to lunch so I won't be able to actively respond now. Thank you for the help so far though!
# TODO: BUG HERE
with open(csvPath, mode='r') as inputFile2:
    csvReader2 = csv.reader(inputFile2)
    ...
    for row2 in csvReader2:
        print("CSV Line Num: " + str(csvReader2.line_num))
        print("CSV Index: " + str(rowNum))
        print("CSV Length: " + str(len(list(csvReader2))))
        print("CSV Row: " + str(row2))
Also, in case it helps, here is csvPath:
nameOfInput = input("Please enter the file you'd like to convert: ")
csvPath = os.path.dirname(os.path.realpath(nameOfInput))
csvPath = os.path.join(csvPath, nameOfInput)
If you read the documentation carefully, it says csv reader is just a parser and all the heavy lifting is done by the underlying file object.
In your case, you are trying to read from a closed file in the second iteration and that is why it isn't working.
For the csv reader to work, it needs an underlying object that supports the iterator protocol and returns a string each time its __next__() method is called; file objects and list objects are both suitable.
Link to the documentation: https://docs.python.org/3/library/csv.html
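Two behaviours that can each cut a second pass short are easy to reproduce in isolation. The following is a minimal sketch, not the asker's program: the sample data and the names `sample`, `rows_seen`, `remaining`, and `closed_error` are made up for illustration. It shows that calling `list()` on a reader inside its own `for` loop (as the Edit 4 snippet does) consumes every remaining row, so the loop body runs only once, and that a reader whose underlying file has been closed raises `ValueError` when asked for a row.

```python
import csv
import io

# Hypothetical in-memory data standing in for the real .csv file.
sample = "Name,Unit,Id,Parameter\na,m,1,speed\nb,m,2,speed\nc,s,3,time\n"

# 1) list(reader) inside the loop drains the reader.
reader = csv.reader(io.StringIO(sample))
rows_seen = []
remaining = None
for row in reader:
    rows_seen.append(row)
    remaining = len(list(reader))  # consumes all rows after the current one
# The loop body ran once; `remaining` counted the rows that were swallowed.

# 2) A reader cannot pull rows from a closed file.
closed_file = io.StringIO(sample)
closed_reader = csv.reader(closed_file)
closed_file.close()
closed_error = None
try:
    next(closed_reader)
except ValueError as exc:
    closed_error = exc  # "I/O operation on closed file"
```

Reading the rows into a list once and looping over that list, or counting the rows before the loop starts, avoids the first problem; keeping all reads inside the `with` block avoids the second.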
So I have a program that reads a CSV file and performs some computations with the data before outputting it to a separate file. It was working fine yesterday, but when I came back to it, the program terminates without ever entering the for loop that iterates through each line of the CSV. No error is given. Does anyone know the reason for this?
Below is the function.
def my_map(my_input_stream, my_output_stream, my_mapper_input_parameters):
    #files = glob.glob(my_input_stream)
    #f = open(my_input_stream,"w")
    out = codecs.open(my_output_stream,"w")
    with open(my_input_stream) as csv_file:
        csv_reader = csv.reader(csv_file)
        for line in csv_reader:
            stop_station = line[7]
            start_station = line[3]
            print(start_station)
            out.write(f"{start_station}\t(1,0)\n")
            print("Writing to file..")
            out.write(f"{stop_station}\t(0,1)\n")
    out.close()
Everything works fine apart from the for loop.
One of the lines you commented out is
f = open(my_input_stream,"w")
If you ever ran this, then my_input_stream would have been opened in "write" mode. This may have overwritten the data in the file and left you with a blank file, which would explain why the for loop is "skipped" - there's no data to iterate over.
Can you verify that my_input_stream still has actual data in it?
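The truncation effect can be checked on a throwaway file. This is a small sketch under assumptions: the temporary file `trips.csv` and its contents are hypothetical stand-ins for `my_input_stream`, and nothing here touches the asker's data.

```python
import os
import tempfile

# Hypothetical throwaway file standing in for my_input_stream.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "trips.csv")
    with open(path, "w") as f:
        f.write("id,start\n1,Main St\n")
    size_before = os.path.getsize(path)

    # Opening in "w" mode truncates the file immediately,
    # even if nothing is ever written through the handle.
    f = open(path, "w")
    f.close()
    size_after = os.path.getsize(path)

    with open(path) as f:
        lines_after = list(f)  # nothing left to iterate over
```

After the second `open`, the file is zero bytes long, so a `for` loop over it finishes without running its body and without raising an error, which matches the symptom described.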
I have the following CSV file:
id;name;duration;predecessors;
10;A;7;;
20;B;10;10;
25;B2;3;10;
30;C;5;10;
40;D;5;20,30, 25;
That is, in the last row the fourth column holds three elements (20,30,25) separated by commas.
I have the following code:
csv_file = open(path_to_csv, 'r')
csv_file_reader = csv.reader(csv_file, delimiter=',')
first_row = True
for row in csv_file_reader :
    if not first_row:
        print(row)
    else :
        first_row = False
but I get a weird output:
['10;A;7;;']
['20;B;10;10;']
['25;B2;3;10;']
['30;C;5;10;']
['40;D;5;20', '30', ' 25;']
Any ideas?
Thanks in advance
You have specified CSV in your description, which stands for Comma Separated Values. However, your data uses semicolons.
Consider specifying the delimiter as ; for the CSV library:
with open(path_to_csv, 'r') as csv_file:
    csv_file_reader = csv.reader(csv_file, delimiter=';')
    ...
And while we're here, note the change to using the with statement to open the file. The with statement manages the file for you: no matter how the block exits (normally or through an exception), Python closes the file. You don't need to close it yourself; just leave the block (unindent). It's "Pythonic" and a good habit to get into.
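As a rough, self-contained illustration of the semicolon delimiter, the sketch below parses the question's data from an in-memory string instead of a file. The use of `io.StringIO` and the names `raw`, `rows`, and `predecessors` are assumptions made for the example.

```python
import csv
import io

# The question's data, held in memory so the sketch runs without a file.
raw = (
    "id;name;duration;predecessors;\n"
    "10;A;7;;\n"
    "20;B;10;10;\n"
    "25;B2;3;10;\n"
    "30;C;5;10;\n"
    "40;D;5;20,30, 25;\n"
)

reader = csv.reader(io.StringIO(raw), delimiter=";")
header = next(reader)  # skip the header row instead of tracking a flag
rows = list(reader)

# The trailing ';' on each line yields an empty fifth field.
last_row = rows[-1]

# Split the fourth column on commas and trim stray spaces.
predecessors = [p.strip() for p in last_row[3].split(",") if p.strip()]
```

With `delimiter=";"` the commas inside the fourth column are left alone, so that column can be split separately afterwards.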
✓ #Antonio, I appreciate the above answer. As we know CSV is a file with comma separated values and Python's csv module works based on this, by default.
✓ No problem, you can still read from it without using csv module.
✓ Based on your provided input in problem I have written another simple solution without using any Python module to read CSVs (it's ok for simple tasks).
Please read it, try it, and comment if you are not satisfied with the code or if it fails for some of your test cases. I will modify it to make it work.
» Data.csv
id;name;duration;predecessors;
10;A;7;;
20;B;10;10;
25;B2;3;10;
30;C;5;10;
40;D;5;20,30, 25;
Now, have a look at the code below (it finds and prints all the lines whose 4th column has more than one element):
with open("Data.csv") as csv_file:
    for line in csv_file.readlines()[1:]:
        arr = line.strip().split(";")
        if len(arr[3].split(",")) > 1:
            print(line) # 40;D;5;20,30, 25;
I am a beginner in Python. I am trying to figure out why the second 'for' loop doesn't work in the following script. I only get the result of the first 'for' loop, and nothing from the second one. I have copied and pasted my script and the data csv below.
It would be helpful if you could tell me why it behaves this way and how to make the second 'for' loop work as well.
My SCRIPT:
import csv
file = "data.csv"
fh = open(file, 'rb')
read = csv.DictReader(fh)
for e in read:
    print(e['a'])
for e in read:
    print(e['b'])
"data.csv":
a,b,c
tree,bough,trunk
animal,leg,trunk
fish,fin,body
The csv reader is an iterator over the file. Once you go through it once, you read to the end of the file, so there is no more to read. If you need to go through it again, you can seek to the beginning of the file:
fh.seek(0)
This will reset the file to the beginning so you can read it again. Depending on the code, it may also be necessary to skip the field name header:
next(fh)
This is necessary for your code, since the DictReader consumed that line the first time around to determine the field names, and it's not going to do that again. It may not be necessary for other uses of csv.
If the file isn't too big and you need to do several things with the data, you could also just read the whole thing into a list:
data = list(read)
Then you can do what you want with data.
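Both approaches can be tried on the question's data. This is a sketch under assumptions: `io.StringIO` stands in for the real `data.csv` (opened in text mode, as Python 3 requires for the csv module), and the result names are invented for the example.

```python
import csv
import io

raw = "a,b,c\ntree,bough,trunk\nanimal,leg,trunk\nfish,fin,body\n"

# Approach 1: rewind the file and skip the header before the second pass.
fh = io.StringIO(raw)  # stand-in for open("data.csv", newline="")
read = csv.DictReader(fh)
first = [e["a"] for e in read]
second_without_rewind = [e["b"] for e in read]  # reader is exhausted here

fh.seek(0)  # back to the start of the file
next(fh)    # skip the header line; DictReader already knows the field names
second = [e["b"] for e in read]

# Approach 2: read everything into a list once and reuse it freely.
data = list(csv.DictReader(io.StringIO(raw)))
col_a = [e["a"] for e in data]
col_b = [e["b"] for e in data]
```

The list approach is usually the simpler of the two when the file fits comfortably in memory, since the list can be looped over any number of times.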
I have created a small function which takes the path of a csv file, reads it, and returns a list of dicts in one go; you can then loop through the list very easily:
import csv

def read_csv_data(path):
    """
    Read the CSV at the given path and return a list of dicts
    mapping column names to row values.
    """
    with open(path) as csv_file:
        data = csv.reader(csv_file)
        # Read the column names from the first line of the file
        fields = next(data)
        data_lines = []
        for row in data:
            items = dict(zip(fields, row))
            data_lines.append(items)
    return data_lines
Regards
print("writing text to file")
prompt = '>'
data = [input(prompt) for i in range(3)]
with open('textfile.txt', 'w') as testfile:
    testfile.write("\n".join(data))
with open('textfile.txt', 'r') as testfile:
    print (testfile.read())
    data = [line.strip('\n') for line in testfile]
    data2 = testfile.readlines()
print(data)
print(data2)
After learning how to read and write from text files I have been trying to use
for line in textfile
But to no avail. In my above code both data and data2 print as empty arrays which makes me think I am doing something really wrong. Before I could get testfile.readlines() to work but I was never able to use a for loop. For some reason it wouldn't even enter the loop (even if I do a standard for loop outside of list comprehension).
Does anyone have any ideas what I am doing incorrectly? I could not find anyone else who has this problem.
When you called
print (testfile.read())
That put the file pointer to the end of the file. You need to bring it back to the beginning again by calling
testfile.seek(0)
After that, so that when the next file reading method is called it will be able to read the file from the beginning again. Likewise, after that list comprehension assignment to data you will need to do the same so that data2 can be populated.
The first thing you do is print(testfile.read()) which reads the entire contents of the file. After that any read is going to fail. You need to seek back to the beginning of the file:
testfile.seek(0)
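The effect of the file position can be seen in a short sketch. It assumes an in-memory `io.StringIO` object with made-up contents in place of `textfile.txt`, which behaves the same way as a text file for this purpose.

```python
import io

# In-memory stand-in for textfile.txt (hypothetical contents).
testfile = io.StringIO("one\ntwo\nthree")

contents = testfile.read()            # leaves the position at end of file
empty = [line for line in testfile]   # nothing left to read

testfile.seek(0)                      # rewind
data = [line.strip("\n") for line in testfile]

testfile.seek(0)                      # rewind again before the next read
data2 = testfile.readlines()
```

Without the two `seek(0)` calls, both `data` and `data2` would be empty lists, as in the question.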
This is similar or identical to csv writer not closing file but I'm not 100% sure why my behaviour is different.
def LoadCSV:
    with open('test.csv', 'r') as csvfile:
        targetReader = csv.reader(csvfile, delimiter=',')
        for row in targetReader:
            ...
then finally in the function
csvfile.close()
This opens the test.csv file in the same directory as the script. The desired behaviour is that once the script has finished processing the rows in the function, it renames the sheet to test.[timestamp] to archive it and watches the directory for a new sheet to arrive.
Later down the code;
os.rename('test.csv', "test." + time.strftime("%x") )
Gives an error that the file can't be renamed because a process is still using it. How do I close this file once I'm done? csvfile.close() doesn't raise an exception, and if I step through my code in interactive mode I can see that csvfile is a "closed file object." What even is that? Surely an open file is an object but a closed one isn't, how do I make my code forget this even exists so I can then do IO on the file?
NOT FOR POINTS.
The code is not valid anyway, since your function definition is wrong. If that was not intentional, better to edit it or produce a pseudo-replica of your code, rather than have us guess what the issue is.
To list the issues with your code:
1. def LoadCSV is not valid. def LoadCSV() is. Proof in following screenshot. Notice how the lack of () is showing syntax error markers.
2. Fixing (1) above, your next problem is using csvfile.close(). If the code is properly written, once the code is out of the scope of with, the file is closed automatically. Even if the renaming part of the code is inside the function, it shouldn't pose any problems.
3. Final word of warning: using the format string %x will produce date strings like 08/25/14, depending on locale. This is a problem, as / is invalid in filenames on Windows (try renaming a file manually with this). Better to be very explicit and just use %m%d%y instead.
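The difference between the two format strings can be seen with a fixed date. This is only a sketch: the date (25 August 2014) and the names `moment`, `explicit_name`, and `locale_name` are chosen for illustration, and the `%x` result is not shown because it depends on the locale.

```python
import time

# Fixed example date (25 August 2014) so the explicit result is reproducible.
moment = time.strptime("2014-08-25", "%Y-%m-%d")

# Explicit fields: digits only, safe to use in a filename.
explicit_name = "test." + time.strftime("%m%d%y", moment) + ".csv"

# Locale-dependent: may contain '/' (for example 08/25/14), which Windows rejects.
locale_name = "test." + time.strftime("%x", moment)
```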
Finally, see the running code on my end. If your code is not structured like this, then other errors we cannot guess might arise.
Result as follows after running:
Code for reference:
import csv
import os
import time

def LoadCSV():
    with open("test.csv", "r") as csvfile:
        targetReader = csv.reader(csvfile, delimiter=",")
        for row in targetReader:
            print(row)
    # The with block has ended, so the file is closed before renaming
    new_name = "test.%s.csv" % time.strftime("%m%d%y")
    print(new_name)
    os.rename("test.csv", new_name)

LoadCSV()
Note that on my end, nothing is watching my file. Antivirus is on, and obviously no multithreading is enabled. Check whether one of your other scripts is concurrently watching this file for changes. Rather than watching the file, it is better to pass the file to that other function as an argument after renaming, since the watcher might be the reason the file is "being used". Alternatively, and this is untested on my side, it may be better to copy the file under a new name rather than rename it.
Hope this helps.
When you are using a with block you do not need to close the file, it should be released outside the scope. If you want python to "forget" the entire filehandle you could delete it with del csvfile. But since you are using with you should not delete the variable inside the scope.
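What a "closed file object" means in practice can be checked with a short sketch. It makes several assumptions: it runs in a temporary directory, creates its own small `test.csv`, and uses a `%Y%m%d` timestamp, none of which come from the question.

```python
import csv
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "test.csv")
    with open(src, "w", newline="") as f:
        csv.writer(f).writerows([["a", "b"], ["1", "2"]])

    with open(src, "r", newline="") as csvfile:
        rows = list(csv.reader(csvfile))

    # The name csvfile still exists, but the underlying handle is released.
    closed_after_with = csvfile.closed

    # A timestamp without '/' so the new name is valid on Windows as well.
    dst = os.path.join(tmp_dir, "test." + time.strftime("%Y%m%d") + ".csv")
    os.rename(src, dst)
    renamed = os.path.exists(dst) and not os.path.exists(src)
```

If a rename like this still fails in the real script, the lock is more likely held by another open handle or another process than by the closed `csvfile` object.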
Try without the with scope instead:
csvfile = open('test.csv','r')
targetReader = csv.reader(csvfile, delimiter=',')
for row in targetReader:
    ....
csvfile.close()
del targetReader
os.rename('test.csv','test.'+time.strftime('%x'))
It might be that the csv reader still accesses the file when you are using a with block.