CSV module - 'for row in reader' - between Mac and Windows - python

I developed some code on my Mac, using Wing IDE. The code I developed, which makes use of the csv module, works and does what I want it to on my Mac. The problem, though, is that the person I wrote it for needs to use it on Windows. I wasn't concerned about the code, as I'm not using any escape characters.
The code looks like this:
csvfile = open('file.csv', 'r')
csvreader = csv.reader(csvfile, delimiter=',')
for row in csvreader:
    thisVariable = row[0]  # <<<--------
The arrow above marks the line where the error is raised on the Windows machine. Like I said, the code works fine on the Mac, and this line is actually pretty far down in the code I have written. There are other CSV files read from and written to above this statement, using similar indexing.
I would really appreciate any ideas anybody might have regarding this issue! Thanks!

In Python 2
You need to open the file as a binary file:
csvfile = open('file.csv', 'rb')
csvreader = csv.reader(csvfile, delimiter=',')
for row in csvreader:
    thisVariable = row[0]
http://docs.python.org/2/library/csv.html#csv.reader
In Python 3
You need to set newline='' in your open statement:
csvfile = open('file.csv', 'r', newline='')
csvreader = csv.reader(csvfile, delimiter=',')
for row in csvreader:
    thisVariable = row[0]
http://docs.python.org/3.3/library/csv.html#csv.reader
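If the same script has to run under both major versions, a small version check keeps the open() call portable. This is a minimal sketch (not from the original answers), reusing the file name from the question:

import csv
import sys

# Choose the open() mode that csv.reader expects on each Python version.
if sys.version_info[0] >= 3:
    csvfile = open('file.csv', 'r', newline='')  # Python 3: suppress newline translation
else:
    csvfile = open('file.csv', 'rb')  # Python 2: binary mode

csvreader = csv.reader(csvfile, delimiter=',')
for row in csvreader:
    thisVariable = row[0]
csvfile.close()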

I can see two potential issues. First, you should be opening the file in binary mode:
csvfile = open('file.csv', 'rb')
Second, you may be dealing with two different styles of line ending from the two operating systems. You can avoid this by adding U (Python 2's universal-newlines flag) after the mode:
csvfile = open('file.csv', 'rbU')
I also suggest protecting your users from bad data by testing the row. This makes the end result:
csvfile = open('file.csv', 'rbU')
csvreader = csv.reader(csvfile, delimiter=',')
for row in csvreader:
    if not row:
        continue
    thisVariable = row[0]

From the docs, csv.reader should be passed a file opened in binary mode.
I.e.:
csvfile = open('file.csv', 'rb')
Without seeing the input file that causes the problem I can't be sure this will fix it, but opening the file in text mode is likely to cause other bugs.

Related

Some CSV cells are wrapped in "quotes" while others are not

I am a newbie to Python and I am not able to debug the following code. Can someone please guide me on how to debug it?
import csv
import operator

with open(inputFile, mode='rt') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"')
    header = next(reader, None)
    rows = sorted(reader, key=operator.itemgetter(1))

with open(outputFile, 'w') as final:
    writer = csv.writer(final, delimiter=',')
    writer.writerow(header)
    for eachRow in rows:
        writer.writerow(eachRow)
In some cases the output is
"","xxx"
In other cases, I see
,xxx,
I tried adding a try/except block, but ran into indentation issues.
When you instantiate csv.writer you can tell it what quoting behavior you want. Pass in quoting=csv.QUOTE_ALL to tell it to meticulously quote everything.
writer = csv.writer(final, delimiter=',', quoting=csv.QUOTE_ALL)
However, this is typically not necessary; any reasonable CSV implementation will allow and expect most fields to be unquoted. The only fields which really need to be quoted are the ones which contain literal double quotes or commas (or more generally speaking, literal instances of the column separator or the quoting character; there are common CSV dialects like TSV etc which use a different delimiter).
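To see the difference concretely, here is a small self-contained sketch (Python 3, with made-up sample rows) comparing the two quoting modes:

import csv
import io

rows = [['', 'xxx'], ['a,b', 'plain']]

for quoting in (csv.QUOTE_MINIMAL, csv.QUOTE_ALL):
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator='\n').writerows(rows)
    print(buf.getvalue())

The first pass prints ,xxx and "a,b",plain (only the field containing a comma is quoted); the second prints "","xxx" and "a,b","plain", mirroring the two output styles you observed.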

prevent EOL character from being changed when reading/writing from/to a csv file

I'm using the csv module in Python 3.8 to read and modify data in .csv files on macOS.
Python seems to change all of the EOL characters in my original .csv files.
This behaviour is not desired, because it makes it impossible for me to keep track of data changes.
All of the lines have '^M' appended to them ('^M' is '\r', a.k.a. the carriage return character).
The result is that, in Git, all of the lines are marked as changed.
When reading the original .csv file in binary mode, Python tells me that the original EOL character is '\r\n'.
So I try to use this EOL character when writing to .csv:
def file_to_rows(path):
    rows = []
    with open(path) as csv_file:
        row_reader = csv.reader(
            csv_file,
            delimiter=';',
            quotechar='|',
            quoting=csv.QUOTE_MINIMAL)
        for row in row_reader:
            rows.append(row)
    return rows
def rows_to_file(rows, path):
    with open(path, 'w', newline='\r\n') as csvfile:
        rowswriter = csv.writer(
            csvfile,
            delimiter=';',
            quotechar='|',
            quoting=csv.QUOTE_MINIMAL)
        for row in rows:
            rowswriter.writerow(row)
# Running this function on a file should show NO changes in Git.
def csv_pass_through(path):
    rows = file_to_rows(path)
    rows_to_file(rows, path)
But git diff still shows me an '^M' has been added to all lines.
So it seems like Python is adding one carriage return character too many.
So, how does one read/write .csv data transparently (i.e. WITHOUT implicitly changing anything)?
martineau's comment was right. You can override the writer's default lineterminator of '\r\n' in the constructor, like so:
def rows_to_file(rows, path):
    with open(path, 'w') as csvfile:
        rowswriter = csv.writer(
            csvfile,
            delimiter=';',
            lineterminator='\n',
            quotechar='|',
            quoting=csv.QUOTE_MINIMAL)
        for row in rows:
            rowswriter.writerow(row)
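For full transparency, the csv docs also recommend opening the file with newline='' so that open() itself never translates line endings, leaving the writer in sole control. A sketch combining both:

import csv

def rows_to_file(rows, path):
    # newline='' disables Python's own newline translation on every platform;
    # lineterminator='\n' makes the csv writer emit plain LF row endings.
    with open(path, 'w', newline='') as csvfile:
        writer = csv.writer(
            csvfile,
            delimiter=';',
            lineterminator='\n',
            quotechar='|',
            quoting=csv.QUOTE_MINIMAL)
        writer.writerows(rows)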

Loop that will iterate a certain number of times through a CSV in Python

I have a large CSV file (~250000 rows) and before I work on fully parsing and sorting it I was trying to display only a part of it by writing it to a text file.
csvfile = open(file_path, "rb")
rows = csvfile.readlines()
text_file = open("output.txt", "w")

row_num = 0
while row_num < 20:
    text_file.write(", ".join(rows[row_num]))
    row_num += 1

text_file.close()
I want to iterate through the CSV file and write only a small section of it to a text file so I can look at how it does this and see if it would be of any use to me. Currently the text file ends up empty.
A way I thought might work would be to iterate through the file with a for loop that exits after a certain number of iterations, but I could be wrong and I'm not sure how to do this. Any ideas?
There's nothing specifically wrong with what you're doing, but it's not particularly Pythonic. In particular, reading the whole file into memory with readlines() at the start seems pointless if you're only using 20 lines.
Instead you could use a for loop with enumerate and break when necessary.
csvfile = open(file_path, "rb")
text_file = open("output.txt", "w")

for i, row in enumerate(csvfile):
    text_file.write(row)
    if i >= 20:
        break

text_file.close()
You could further improve this by using with blocks to open the files, rather than closing them explicitly. For example:
with open(file_path, "rb") as csvfile:
    # your code here involving csvfile
# now the csvfile is closed!
Also note that Python might not be the best tool for this - you could do it directly from Bash, for example, with just head -n20 csvfile.csv > output.txt.
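If you'd rather stay in Python but skip the manual counter, itertools.islice expresses "the first 20 lines" directly. A minimal sketch, assuming the same file names as above:

from itertools import islice

# islice(csvfile, 20) yields just the first 20 lines of the file.
with open(file_path, "rb") as csvfile, open("output.txt", "wb") as text_file:
    for row in islice(csvfile, 20):
        text_file.write(row)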
A simple solution would be to just do:
#!/usr/bin/python
# -*- encoding: utf-8 -*-

file_path = './test.csv'

with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
        for i, row in enumerate(csvfile):
            textfile.write(row)
            if i >= 20:
                break
Explanation:
with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
Instead of pairing open with an explicit close, it is recommended to use the with statement; just write the lines you want executed while the file is open at a new level of indentation.
'rb' and 'wb' are the modes for opening a file for reading and writing, respectively, in binary mode.
for i, row in enumerate(csvfile):
This line reads your CSV file line by line, and the tuple (i, row) gives you both the content of each row and its index. That's one of Python's awesome built-in functions: see the enumerate documentation for more about it.
Hope this helps!
EDIT: Note that Python has a csv module that can do this without enumerate:
# -*- encoding: utf-8 -*-
import csv

file_path = './test.csv'

with open(file_path, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        i = 0
        while i < 20:
            row = next(reader)
            writer.writerow(row)
            i += 1
All we need to use is its reader and writer. They have the functions next (which reads one line) and writerow (which writes one). Note that here the variable row is not a string but a list of strings, because the reader does the splitting job by itself. This might be faster than the previous solution.
Also, this has the major advantage of letting you look anywhere you want in the file, not necessarily starting from the beginning (just change the bounds for i), as shown in the sketch below.
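A minimal sketch of that idea, reusing the same hypothetical file names: copy only rows 100-119 by slicing the reader with itertools.islice instead of counting by hand.

import csv
from itertools import islice

with open('./test.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        # islice(reader, 100, 120) skips the first 100 rows, then yields 20.
        writer.writerows(islice(reader, 100, 120))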

Excel disregards decimal separators when working with Python generated CSV file

I am currently trying to write a CSV file in Python. The format is as follows:
1; 2.51; 12
123; 2.414; 142
EDIT: I already get the above format in my CSV, so the Python code seems OK. It appears to be an Excel issue, which is solved by changing the settings as #chucksmash mentioned.
However, when I try to open the generated CSV file with Excel, it doesn't recognize the decimal separators: 2.414 is treated as 2414 in Excel.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";")
writer.writerow(some_array_with_floats)
Did you check that the CSV file is generated correctly, as you want? Also, try to specify the delimiter character you're using for the CSV file when you import/open the file. In this case, it is a semicolon.
For Python 3, I think your above code will also run into a TypeError, which may be part of the problem: csv.writer expects a text-mode file there, not a binary one.
I just modified your open call to use 'w' instead of 'wb', since the array holds floats and not binary data. This seemed to generate the result you were looking for.
csvfile = open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'w')
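For completeness, a sketch of the full Python 3 version; adding newline='' (per the csv docs' recommendation) lets the writer control the line endings itself:

import csv

with open('C:/Users/SUUSER/JRITraffic/Data/data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=";")
    writer.writerow(some_array_with_floats)  # the list of floats from the question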
An ugly solution, if you really want to use ; as the separator:
import csv
import os
with open('a.csv', 'wb') as csvfile:
    csvfile.write('sep=;' + os.linesep)  # extra first line telling Excel the separator
    writer = csv.writer(csvfile, delimiter=";")
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
This will produce:
sep=;
1;2.51;12
123;2.414;142
which is recognized fine by Excel.
I personally would go with , as the separator, in which case you do not need the first line, so you can basically do:
import csv
with open('a.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)  # default delimiter is ','
    writer.writerow([1, 2.51, 12])
    writer.writerow([123, 2.414, 142])
And Excel will recognize what is going on.
A way to do this is to specify dialect=csv.excel in the writer. For example:
a = [[1, 2.51, 12],[123, 2.414, 142]]
csvfile = open('data.csv', 'wb')
writer = csv.writer(csvfile, delimiter=";", dialect=csv.excel)
writer.writerows(a)
csvfile.close()
Unless Excel is already configured to use semicolon as its default delimiter, it will be necessary to import data.csv using Data/FromText and specify semicolon as the delimiter in the Text Import Wizard step 2 screen.
Very little documentation is provided for the Dialect class at csv.Dialect. More information about it is at Dialects in the PyMOTW's "csv – Comma-separated value files" article on the Python csv module. More information about csv.writer() is available at https://docs.python.org/2/library/csv.html#csv.writer.
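Since the Dialect documentation is thin, one quick way to see what a dialect actually specifies is to inspect its attributes directly. A small sketch, using the built-in csv.excel dialect:

import csv

# Print the settings that the csv.excel dialect defines.
for name in ('delimiter', 'quotechar', 'lineterminator', 'quoting'):
    print(name, '=', repr(getattr(csv.excel, name)))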

Python csv: Optimizing header output

I am writing a huge (160k+ row) set of data from SQL to a CSV. My script functions exactly as intended, but I am sure there has to be a more efficient way of including a header in the output. I cobbled together the following from reading "writing header in csv python with DictWriter", but feel like it lacks elegance.
Here's my code:
f = open(outfile, 'w')
wf = csv.DictWriter(f, fieldnames, restval='OOPS')
wf.writer.writerow(wf.fieldnames)
f.close()

f = open(outfile, 'a')
wf = csv.writer(f)
wf.writerows(rows)
f.close()
fieldnames is defined explicitly (10 custom column names), rows contains the fetchall() from my query.
Untested, but I don't see why this shouldn't do the job:
import csv
with open(outfile, "wb") as outf:
    outcsv = csv.writer(outf)
    outcsv.writerow(fieldnames)
    outcsv.writerows(rows)
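If you wanted to keep the DictWriter, its writeheader() method (added in Python 2.7) removes the need for the wf.writer.writerow(wf.fieldnames) workaround. A sketch, assuming the rows were fetched as dicts (dict_rows is a hypothetical stand-in for such data):

import csv

with open(outfile, "wb") as outf:
    writer = csv.DictWriter(outf, fieldnames, restval='OOPS')
    writer.writeheader()
    writer.writerows(dict_rows)  # hypothetical: rows as dicts keyed by fieldnames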
