trouble with csv reader in python 2 - python

I am having trouble getting Python 2 to loop through a .csv file. The code below throws this error:
>>> import csv
>>> with open('test.csv', 'rb') as f:
...     reader = csv.reader(f)
...     for row in reader:
...         print row
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
The Python 3 version of this works fine, but I need this to run on Python 2. Any ideas what I am doing wrong?

You need to open the file in universal newlines mode, using open('test.csv', 'rU').
Relevant info from the docs on universal newlines:
A manner of interpreting text streams in which all of the following are recognized as ending a line: the Unix end-of-line convention '\n', the Windows convention '\r\n', and the old Macintosh convention '\r'. See PEP 278 and PEP 3116, as well as str.splitlines() for an additional use.
and on the mode argument of open():
In addition to the standard fopen() values mode may be 'U' or 'rU'. Python is usually built with universal newlines support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program. If Python is built without universal newlines support a mode with 'U' is the same as normal text mode. Note that file objects so opened also have an attribute called newlines which has a value of None (if no newlines have yet been seen), '\n', '\r', '\r\n', or a tuple containing all the newline types seen.
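As a minimal sketch of the fix, the helper below opens the file in universal newlines mode on Python 2 and with newline='' on Python 3. The name read_rows is hypothetical (it is not part of the csv module), and the sketch assumes a plain comma-separated file.

```python
import csv
import sys


def read_rows(path):
    # Return every row of a CSV file, whatever its line terminator is.
    if sys.version_info[0] >= 3:
        # Python 3: text mode; newline='' leaves line endings for the csv module to handle.
        f = open(path, newline='')
    else:
        # Python 2: 'rU' enables universal newlines, so '\r' and '\r\n' are read as '\n'.
        f = open(path, 'rU')
    with f:
        return list(csv.reader(f))
```

Calling read_rows('test.csv') on a file saved with old Macintosh '\r' line endings should then return one list per row instead of raising _csv.Error.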

Related

python 3 replaces line break \r with \n when writing to a file

I need to write some text to a file that contains a mix of line breaks \r and \n, and I want to keep both. However, in Python 3, when I write this text to the file, all instances of \r are replaced with \n. This behavior is different from Python 2, as you can see in the output below. What can I do to stop this replacement?
Here is the code:
import string
printable=string.printable
print([printable])
fopen=open("test.txt","w")
fopen.write(printable)
fopen.close()
fopen=open("test.txt","r")
content=fopen.read()
print([content])
fopen.close()
and here is the output, when I run the code on python 2 and python 3:
(base) Husseins-Air:Documents hmghaly$ python2.7 test_write_line_break.py
['0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c']
['0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c']
(base) Husseins-Air:Documents hmghaly$ python test_write_line_break.py
['0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c']
['0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\n\x0b\x0c']
The issue is a Python feature known as 'universal newlines'. In text mode it translates every kind of line break in the file ('\r', '\r\n', '\n') to '\n' when reading, and translates '\n' to the platform's line separator when writing. You can disable this behavior by passing newline='' to the open function:
fopen=open("test.txt", "w", newline='')
...
fopen=open("test.txt", "r", newline='')
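A small round trip makes the difference visible. This is only a sketch for Python 3; the sample string and the temporary file path are made up for illustration.

```python
import os
import tempfile

text = "one\rtwo\nthree\r\nfour"
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# newline='' on write: nothing is translated, whatever the platform.
with open(path, "w", newline="") as f:
    f.write(text)

# newline='' on read: line endings come back untouched.
with open(path, "r", newline="") as f:
    preserved = f.read()

# Default newline=None on read: '\r' and '\r\n' are both translated to '\n'.
with open(path, "r") as f:
    translated = f.read()

print([preserved])
print([translated])
```

Here preserved equals the original text, while translated contains only '\n' line breaks.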

Get csv file line terminator

In a python script, I need to detect the endline terminator of different csv files. These endline terminators could be: '\r' (mac), '\r\n' (windows), '\n' (unix).
I tried with:
dialecto = csv.Sniffer().sniff(csvfile.read(2048), delimiters=",;")
dialecto.lineterminator
But it doesn't work.
How I could do that?
EDIT:
Based on abarnert response:
def getLineterminator(file):
    with open(file, 'rU') as csvfile:
        csvfile.next()
        return csvfile.newlines
You can't use the csv module to auto-detect line terminators this way. The Sniffer that you're using is designed to guess between CSV dialects for use by csv.reader. But, as the docs say, csv.reader actually ignores lineterminator and handles line endings interchangeably, so Sniffer doesn't have any reason to set it.
But really, a CSV file with XXX line terminators is just a text file with XXX line terminators. The fact that it's CSV is irrelevant. Just open the file in text mode, read a line out of it, and check its newlines attribute:
next(file)
file.newlines
In Python 3, as long as you opened the file in text mode (don't use a 'b' in the mode), this will work. In Python 2.x, you may need to specify universal newlines mode (don't use a 'b', and also do use a 'U'). If you're writing code for both versions, you can use universal newlines mode, and it'll just be ignored in 3.x—but don't do that unless you need it, since it's deprecated as of 3.6 and may become an error one day.
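The idea can be wrapped in a small helper. This is a sketch, not the answerer's code: detect_line_terminators is a hypothetical name, io.open is used because it behaves like Python 3's open() on both versions, and the file is assumed to decode with the default encoding. Unlike the snippet above, it reads the whole file so that every terminator kind is recorded, not only those in the first buffered chunk.

```python
import io


def detect_line_terminators(path):
    # Returns None (no newline seen), a single string such as '\r\n',
    # or a tuple of strings when the file mixes several terminators.
    # The default newline=None enables universal newlines and records what was seen.
    with io.open(path, 'r') as f:
        f.read()  # read everything so every terminator in the file is recorded
        return f.newlines
```

A file written on Windows would typically give '\r\n', and a file with mixed endings gives a tuple.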

'\n' == 'posix' , '\r\n' == 'nt' (python) is that correct?

I'm writing a python(2.7) script that writes a file and has to run on linux, windows and maybe osx.
Unfortunately for compatibility problems I have to use carriage return and line feed in windows style.
Is that ok if I assume:
import os

str = someFunc.returnA_longText()
with open('file', 'w') as f:
    if os.name == 'posix':
        f.write(str.replace('\n', '\r\n'))
    elif os.name == 'nt':
        f.write(str)
Do I have to consider an else?
os.name has other alternatives ('posix', 'nt', 'os2', 'ce', 'java', 'riscos'). Should I use platform module instead?
Update 1:
1. The goal is to use '\r\n' in any OS.
2. I'm receiving the str from
str = etree.tostring(root, pretty_print=True,
                     xml_declaration=True, encoding='UTF-8')
I'm not reading a file.
3. My fault, I should probably check the os.linesep instead?
Python file objects can handle this for you. By default, writing to a text-mode file translates \n line endings to the platform's local convention, but you can override this behaviour.
See the newline option in the open() function documentation:
newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:
When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
(the above applies to Python 3, Python 2 has similar behaviour, with io.open() giving you the Python 3 I/O options if needed).
Set the newline option if you need to force what line-endings are written:
with open('file', 'w', newline='\r\n') as f:
    f.write(str)
In Python 2, you'd have to open the file in binary mode:
with open('file', 'wb') as f:
    # write `\r\n` line separators, no translation takes place
    f.write(str.replace('\n', '\r\n'))
or use io.open() and write Unicode text:
import io
with io.open('file', 'w', newline='\r\n', encoding='utf8') as f:
    f.write(str.decode('utf8'))
(but pick appropriate encodings; it is always a good idea to explicitly specify the codec even in Python 3).
You can always use the os.linesep constant if your program needs to know the appropriate line separator for the current platform.
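Put together, a version-independent sketch could look like the following. write_crlf is a hypothetical helper name, and the sketch assumes the text uses '\n' as its only line break. On Python 2 the text must be a unicode object.

```python
import io


def write_crlf(path, text):
    # newline='\r\n' makes the file object translate each '\n' written
    # into '\r\n', regardless of the operating system.
    with io.open(path, 'w', newline='\r\n', encoding='utf-8') as f:
        f.write(text)
```

Note that text which already contains '\r\n' would come out as '\r\r\n', because only the '\n' is translated, so normalise such input to '\n' first.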

disable the automatic change from \r\n to \n in python

I am working on Ubuntu on a Python 3.4 script that takes a file as a parameter (encoded in UTF-8, generated on Windows). I have to go through the file line by line (lines are separated by \r\n), knowing that the "lines" contain some '\n' that I want to keep.
My problem is that Python transforms the file's "\r\n" to "\n" when opening. I've tried to open with different modes ("r", "rt", "rU").
The only solution I found is to work in binary mode and not text mode, opening with the "rb" mode.
Is there a way to do it without working in binary mode or a proper way to do it?
Set the newline keyword argument to open() to '\r\n', or perhaps to the empty string:
with open(filename, 'r', encoding='utf-8', newline='\r\n') as f:
This tells Python to only split lines on the \r\n line terminator; \n is left untouched in the output. If you set it to '' instead, \n is also seen as a line terminator but \r\n is not translated to \n.
From the open() function documentation:
newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. [...] If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
Bold emphasis mine.
From Martijn Pieters the solution is:
with open(filename, "r", newline='\r\n') as f:
This answer was posted as an edit to the question disable the automatic change from \r\n to \n in python by the OP lu1her under CC BY-SA 3.0.
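To see how the two settings differ, here is a short Python 3 sketch. The sample bytes and the temporary file are invented for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "windows.txt")
with open(path, "wb") as f:
    # Two '\r\n'-terminated lines; the first one has an embedded '\n'.
    f.write(b"first\nstill first\r\nsecond\r\n")

# newline='\r\n': only '\r\n' ends a line, the embedded '\n' is kept inside it.
with open(path, "r", encoding="utf-8", newline="\r\n") as f:
    crlf_only = list(f)

# newline='': '\n' also ends a line, but nothing is translated.
with open(path, "r", encoding="utf-8", newline="") as f:
    any_terminator = list(f)

print(crlf_only)
print(any_terminator)
```

With newline='\r\n' the file yields two lines; with newline='' it yields three, each still carrying its original terminator.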

How to disable universal newlines in Python 2.7 when using open()

I have a csv file that contains two different newline terminators (\n and \r\n). I want my Python script to use \r\n as the newline terminator and NOT \n. But the problem is that Python's universal newlines feature keeps normalizing everything to be \n when I open the file using open().
The strange thing is that it never used to normalize my newlines when I wrote this script, that's why I used Python 2.7 and it worked fine. But all of a sudden today it started normalizing everything and my script no longer works as needed.
How can I disable universal newlines when opening a file using open() (without opening in binary mode)?
You need to open the file in binary mode, as stated in the module documentation:
with open(csvfilename, 'rb') as fileobj:
    reader = csv.reader(fileobj)
From the csv.reader() documentation:
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
In binary mode no line separator translations take place.
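Because binary mode leaves the bytes untouched, one possible way to treat only \r\n as the record separator is to split the raw content yourself. This is a sketch that goes beyond the answer above: read_crlf_records is a hypothetical helper, it returns raw byte strings, and it does not parse CSV fields or handle quoted fields that contain \r\n.

```python
def read_crlf_records(path):
    # Binary mode: bytes come back exactly as stored, with no newline translation.
    with open(path, 'rb') as f:
        data = f.read()
    # Split on '\r\n' only, so a lone '\n' stays inside its record.
    records = data.split(b'\r\n')
    # A file ending in '\r\n' leaves one empty trailing piece; drop it.
    if records and records[-1] == b'':
        records.pop()
    return records
```

Each returned record can then be parsed further, for example by splitting on the delimiter.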
