Disable the automatic change from \r\n to \n in Python

I am working under Ubuntu on a Python 3.4 script that takes a file as a parameter. The file is encoded in UTF-8 and was generated under Windows. I have to go through the file line by line (lines are separated by \r\n), knowing that the "lines" contain some '\n' characters that I want to keep.
My problem is that Python transforms the file's "\r\n" to "\n" when opening it. I've tried opening it with different modes ("r", "rt", "rU").
The only solution I found is to work in binary mode instead of text mode, opening the file with the "rb" mode.
Is there a way to do this without working in binary mode, or a proper way to do it?

Set the newline keyword argument to open() to '\r\n', or perhaps to the empty string:
with open(filename, 'r', encoding='utf-8', newline='\r\n') as f:
This tells Python to only split lines on the \r\n line terminator; \n is left untouched in the output. If you set it to '' instead, \n is also seen as a line terminator but \r\n is not translated to \n.
From the open() function documentation:
newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. [...] If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
Bold emphasis mine.
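To make the difference between the three settings concrete, here is a minimal sketch for Python 3. The sample bytes and the temporary file are made up for illustration, and read_lines is a hypothetical helper, not part of any library:

```python
import os
import tempfile

# Hypothetical sample: two records separated by \r\n; the first contains a bare \n.
data = b"first\nstill first\r\nsecond\r\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name


def read_lines(newline):
    """Return the lines of the sample file as seen with the given newline setting."""
    with open(path, "r", encoding="utf-8", newline=newline) as f:
        return list(f)


default_lines = read_lines(None)   # universal newlines, \r\n translated to \n
untranslated = read_lines("")      # splits on \n and \r\n, nothing translated
crlf_only = read_lines("\r\n")     # splits only on \r\n, bare \n is kept

os.remove(path)

print(default_lines)  # ['first\n', 'still first\n', 'second\n']
print(untranslated)   # ['first\n', 'still first\r\n', 'second\r\n']
print(crlf_only)      # ['first\nstill first\r\n', 'second\r\n']
```

Only newline='\r\n' keeps the embedded \n inside its record, which is what the question asks for.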

From Martijn Pieters the solution is:
with open(filename, "r", newline='\r\n') as f:
This answer was posted as an edit to the question disable the automatic change from \r\n to \n in python by the OP lu1her under CC BY-SA 3.0.

Related

'\n' == 'posix' , '\r\n' == 'nt' (python) is that correct?

I'm writing a python(2.7) script that writes a file and has to run on linux, windows and maybe osx.
Unfortunately for compatibility problems I have to use carriage return and line feed in windows style.
Is that ok if I assume:
import os

str = someFunc.returnA_longText()
with open('file', 'w') as f:
    if os.name == 'posix':
        f.write(str.replace('\n', '\r\n'))
    elif os.name == 'nt':
        f.write(str)
Do I have to consider an else?
os.name has other alternatives ('posix', 'nt', 'os2', 'ce', 'java', 'riscos'). Should I use platform module instead?
Update:
1. The goal is to use '\r\n' in any OS.
2. I'm receiving the str from
   str = etree.tostring(root, pretty_print=True,
                        xml_declaration=True, encoding='UTF-8')
   I'm not reading a file.
3. My fault, I should probably check os.linesep instead?
Python file objects can handle this for you. By default, writing to a text-mode file translates \n line endings to the platform-local line separator, but you can override this behaviour.
See the newline option in the open() function documentation:
newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:
When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
(the above applies to Python 3, Python 2 has similar behaviour, with io.open() giving you the Python 3 I/O options if needed).
Set the newline option if you need to force what line-endings are written:
with open('file', 'w', newline='\r\n') as f:
In Python 2, you'd have to open the file in binary mode:
with open('file', 'wb') as f:
    # write `\r\n` line separators, no translation takes place
or use io.open() and write Unicode text:
import io
with io.open('file', 'w', newline='\r\n', encoding='utf8') as f:
    f.write(str.decode('utf8'))
(but pick appropriate encodings; it is always a good idea to explicitly specify the codec even in Python 3).
You can always use the os.linesep constant if your program needs to know the appropriate line separator for the current platform.
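As a rough illustration of the Python 3 route, the sketch below writes a string with newline='\r\n' and then inspects the raw bytes. The helper name write_crlf and the temporary file are assumptions made for the example. The extra replace() guards against text that already contains \r\n, which would otherwise come out as \r\r\n:

```python
import os
import tempfile


def write_crlf(path, text):
    """Write text so that every line ends in CRLF, regardless of platform."""
    # Normalise first so existing CRLF pairs are not turned into CR CR LF.
    normalised = text.replace("\r\n", "\n")
    with open(path, "w", encoding="utf-8", newline="\r\n") as f:
        f.write(normalised)


# Hypothetical usage with a throwaway file.
fd, path = tempfile.mkstemp()
os.close(fd)
write_crlf(path, "line one\nline two\r\n")

with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

print(raw)  # b'line one\r\nline two\r\n'
```

No os.name check is needed here, because the newline argument overrides the platform default.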

How to convert CRLF to LF on a Windows machine in Python

I have a set of templates that all end in LF. I can fill in some terms with format and still get LF files by opening them with "wb".
Those templates are used in a deployment script on a Windows machine to deploy to a Unix server.
The problem is that a lot of people are going to edit those templates, and I'm 100% sure that some of them will put some CRLF inside.
How could I, using Python, convert all the CRLF to LF?
Convert line endings in-place (with Python 3)
Line endings:
Windows - \r\n, called CRLF
Linux/Unix/MacOS - \n, called LF
Windows to Linux/Unix/MacOS (CRLF ➡ LF)
Here is a short Python script for directly converting Windows line endings to Linux/Unix/MacOS line endings. The script works in-place, i.e., without creating an extra output file.
# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'
# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"
with open(file_path, 'rb') as open_file:
    content = open_file.read()
# Windows ➡ Unix
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
# Unix ➡ Windows
# content = content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)
with open(file_path, 'wb') as open_file:
    open_file.write(content)
Linux/Unix/MacOS to Windows (LF ➡ CRLF)
To convert from Linux/Unix/MacOS to Windows instead, simply comment the replacement for Unix ➡ Windows back in (remove the # in front of the line).
DO NOT comment out the command for the Windows ➡ Unix replacement, as it ensures a correct conversion. When converting from LF to CRLF, it is important that there are no CRLF line endings already present in the file. Otherwise, those lines would be converted to CRCRLF. Converting lines from CRLF to LF first and then doing the intended conversion from LF to CRLF avoids this issue (thanks @neuralmer for pointing that out).
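The CRCRLF problem is easy to reproduce on a small byte string. The following sketch compares the naive replacement with the two-step one; the function name to_crlf and the sample data are made up for illustration:

```python
def to_crlf(content: bytes) -> bytes:
    """Convert all line endings to CRLF without doubling existing CRs."""
    lf_only = content.replace(b"\r\n", b"\n")   # step 1: CRLF -> LF
    return lf_only.replace(b"\n", b"\r\n")      # step 2: LF -> CRLF


# Hypothetical input with mixed line endings.
mixed = b"a\r\nb\nc\r\n"

naive = mixed.replace(b"\n", b"\r\n")
safe = to_crlf(mixed)

print(naive)  # b'a\r\r\nb\r\nc\r\r\n'
print(safe)   # b'a\r\nb\r\nc\r\n'
```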
Code Explanation
Binary Mode
Important: We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.
When opening files in text mode (mode='r' or mode='w' without b), the platform's native line endings (\r\n on Windows and \r on old Mac OS versions) are automatically converted to Python's Unix-style line endings: \n. So the call to content.replace() couldn't find any \r\n line endings to replace.
In binary mode, no such conversion is done. Therefore the call to content.replace() (which is bytes.replace() here) can do its work.
Binary Strings
In Python 3, if not declared otherwise, string literals are str objects holding Unicode text. But we open our files in binary mode, which reads and writes bytes objects. Therefore we need to add b in front of our replacement strings to make them bytes objects, too.
Raw Strings
On Windows the path separator is a backslash \ which we would need to escape in a normal Python string with \\. By adding r in front of the string we create a so called "raw string" which doesn't need any escaping. So you can directly copy/paste the path from Windows Explorer into your script.
(Hint: Inside Windows Explorer press CTRL+L to automatically select the path from the address bar.)
Alternative solution
We open the file twice to avoid the need of repositioning the file pointer. We could also have opened the file once with mode='rb+' but then we would have needed to move the pointer back to start after reading its content (open_file.seek(0)) and truncate its original content before writing the new one (open_file.truncate(0)).
Simply opening the file again in write mode does that automatically for us.
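For completeness, the single-open variant described above could look roughly like this. It is only a sketch: the function name crlf_to_lf_in_place and the temporary file are assumptions made for the example.

```python
import os
import tempfile


def crlf_to_lf_in_place(file_path):
    """Convert CRLF to LF using one read/write file handle."""
    with open(file_path, "rb+") as open_file:
        content = open_file.read()
        open_file.seek(0)        # move the pointer back to the start
        open_file.truncate(0)    # drop the original content
        open_file.write(content.replace(b"\r\n", b"\n"))


# Hypothetical usage with a throwaway file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

crlf_to_lf_in_place(path)

with open(path, "rb") as f:
    result = f.read()
os.remove(path)

print(result)  # b'one\ntwo\n'
```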
Cheers and happy programming,
winklerrr
Python 3:
The default newline type for open is universal, in which case it doesn't mind which sort of newline each line has.
You can also request a specific form of newline with the newline argument for open.
Translating from one form to the other is thus rather simple in Python:
with open('filename.in', 'r') as infile, \
     open('filename.out', 'w', newline='\n') as outfile:
    outfile.writelines(infile.readlines())
Python 2:
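The open function supports universal newlines via the 'rU' mode. The built-in open() in Python 2 has no newline argument, so the output file is opened in binary mode to keep the \n endings untranslated.
Again, translating from one form to the other:
with open('filename.in', 'rU') as infile, \
     open('filename.out', 'wb') as outfile:
    outfile.writelines(infile.readlines())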
(In Python 3, mode U is actually deprecated; the equivalent form is newline=None, which is the default)
Why don't you try the following:
str.replace('\r\n', '\n')
CRLF => \r\n
LF => \n
It is possible to fix existing templates with messed-up ending with this code:
with open('file.tpl') as template:
    lines = [line.replace('\r\n', '\n') for line in template]
with open('file.tpl', 'w') as template:
    template.writelines(lines)
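Note that in Python 3 a plain text-mode open() already translates \r\n to \n on reading, and on Windows it translates \n back to \r\n on writing. A possible Python 3 variant that stays in text mode but turns both translations off is sketched below. The function name normalise_template, the UTF-8 encoding and the temporary file are assumptions for the example:

```python
import os
import tempfile


def normalise_template(path, encoding="utf-8"):
    """Rewrite a template so that all CRLF endings become LF (Python 3)."""
    # newline='' returns line endings untranslated when reading.
    with open(path, "r", encoding=encoding, newline="") as template:
        text = template.read()
    # newline='\n' writes \n as-is, even on Windows.
    with open(path, "w", encoding=encoding, newline="\n") as template:
        template.write(text.replace("\r\n", "\n"))


# Hypothetical usage with a throwaway template file.
fd, path = tempfile.mkstemp(suffix=".tpl")
os.close(fd)
with open(path, "wb") as f:
    f.write(b"host={host}\r\nport={port}\n")

normalise_template(path)

with open(path, "rb") as f:
    fixed = f.read()
os.remove(path)

print(fixed)  # b'host={host}\nport={port}\n'
```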

How to disable universal newlines in Python 2.7 when using open()

I have a csv file that contains two different newline terminators (\n and \r\n). I want my Python script to use \r\n as the newline terminator and NOT \n. But the problem is that Python's universal newlines feature keeps normalizing everything to be \n when I open the file using open().
The strange thing is that it never used to normalize my newlines when I wrote this script, that's why I used Python 2.7 and it worked fine. But all of a sudden today it started normalizing everything and my script no longer works as needed.
How can I disable universal newlines when opening a file using open() (without opening in binary mode)?
You need to open the file in binary mode, as stated in the module documentation:
with open(csvfilename, 'rb') as fileobj:
    reader = csv.reader(fileobj)
From the csv.reader() documentation:
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
In binary mode no line separator translations take place.
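The answer above is for Python 2. In Python 3 the csv module works on text, and the usual counterpart is to open the file with newline='' so that no translation happens before the reader sees the data. The sketch below uses an in-memory io.StringIO in place of a real file and assumes that the embedded \n characters sit inside quoted fields; the sample data is made up:

```python
import csv
import io

# Hypothetical CSV data: rows end in \r\n, one quoted field contains a bare \n.
sample = 'id,comment\r\n1,"first\nsecond"\r\n2,plain\r\n'

# newline='' keeps the line endings untranslated, as with open(..., newline='').
rows = list(csv.reader(io.StringIO(sample, newline="")))

print(rows)  # [['id', 'comment'], ['1', 'first\nsecond'], ['2', 'plain']]
```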

Why is 'rblabla' a valid csv file mode?

I have a csv file and a function to read it.
I can open it in many ways, most of these modes produce similar results.
import csv

def read(mode):
    with open("file.csv", mode) as inf:
        reader = csv.reader(inf)
        for row in reader:
            print row

read('r')        # prints \r\n characters
read('rb')       # prints \r\n characters
read('rU')       # prints \n characters but not \r characters
read('rblabla')  # WAT.
I am wondering why the last example is allowed. It produces the same results as normal read mode.
Is there any reason why it works this way?
The mode is not for the csv reader, but for the python default file handler. Python only enforces mode to begin with 'r', 'w' or 'a', after stripping U. This is documented here, and is for python 2.5 and later.
The mode is an attribute of the file handler, and may be used by other applications, hence it may contain more characters.
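The lenient behaviour described above is specific to Python 2. As a point of comparison, Python 3's open() validates the whole mode string and raises ValueError for unknown characters. A small sketch, with mode_is_accepted as a made-up helper and a throwaway file:

```python
import os
import tempfile


def mode_is_accepted(mode):
    """Return True if open() accepts the mode string, False on ValueError."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode):
            return True
    except ValueError:
        # Python 3 rejects mode strings that contain unknown characters.
        return False
    finally:
        os.remove(path)


print(mode_is_accepted("r"))        # True
print(mode_is_accepted("rb"))       # True
print(mode_is_accepted("rblabla"))  # False on Python 3
```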

Write a list to file containing text and hex values. How?

I need to write a list of values to a text file. Because of Windows, when I need to write a line feed character, Windows writes \r\n and other systems write \n.
It occurred to me that maybe I should write to the file in binary.
How do I create a list like the following example and write it to a file in binary?
output = ['my first line', hex_character_for_line_feed_here, 'my_second_line']
How come the following does not work?
output = ['my first line', '\x0a', 'my second line']
Don't. Open the file in text mode and just let Python handle the newlines for you.
When you use the open() function you can set how Python should handle newlines with the newline keyword parameter:
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
So the default method is to write the correct line separator for your platform:
with open(outputfilename, 'w') as outputfile:
    outputfile.write('\n'.join(output))
and does the right thing; on Windows \r\n characters are saved instead of \n.
If you specifically want to write \n only and not have Python translate these for you, use newline='':
with open(outputfilename, 'w', newline='') as outputfile:
    outputfile.write('\n'.join(output))
Note that '\x0a' is exactly the same character as \n; \r is \x0d:
>>> '\x0a'
'\n'
>>> '\x0d'
'\r'
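To check what actually lands on disk, you can write the list with different newline settings and read the raw bytes back. This is just a sketch for Python 3; written_bytes is a hypothetical helper and the file is a throwaway temporary file:

```python
import os
import tempfile

output = ['my first line', 'my second line']


def written_bytes(newline):
    """Write the joined list in text mode and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, 'w', newline=newline) as outputfile:
            outputfile.write('\n'.join(output))
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)


lf = written_bytes('')        # no translation: \n stays \n
crlf = written_bytes('\r\n')  # every \n is written as \r\n

print(lf)    # b'my first line\nmy second line'
print(crlf)  # b'my first line\r\nmy second line'
print('\x0a'.join(output) == '\n'.join(output))  # True
```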
Create a text file, "myTextFile" in the same directory as your Python script. Then write something like:
# wb opens the file in "Write Binary" mode
output = ['my first line', '369as3', 'my_second_line']
with open("myTextFile.txt", 'wb') as myTextFile:
    for member in output:
        # Binary mode expects bytes, so encode each line before writing;
        # no newline translation takes place.
        myTextFile.write((member + "\n").encode("utf-8"))  # Or whatever encoding you like =)
This outputs a binary text file that looks like:
my first line
369as3
my_second_line
Edit: Updated for Python 3
