Python's os module contains a value for a platform specific line separating string, but the docs explicitly say not to use it when writing to a file:
Do not use os.linesep as a line terminator when writing files opened in text mode (the default); use a single '\n' instead, on all platforms.
Docs
Previous questions have explored why you shouldn't use it in this context, but then what context is it useful for? When should you use the line separator, and for what?
the docs explicitly say not to use it when writing to a file
Not exactly. The doc says not to use it in text mode.
os.linesep matters when you iterate through the lines of a text file. In text mode the reader recognises the platform line separator (with the default newline=None it recognises '\n', '\r' and '\r\n' alike) and replaces it with a single \n.
For illustration, we write a binary file which contains 3 lines separated by \r\n (Windows delimiter):
import io
filename = "text.txt"
content = b'line1\r\nline2\r\nline3'
with io.open(filename, mode="wb") as fd:
    fd.write(content)
The content of the binary file is:
with io.open(filename, mode="rb") as fd:
    for line in fd:
        print(repr(line))
NB: I used the "rb" mode to read the file as a binary file.
I get:
b'line1\r\n'
b'line2\r\n'
b'line3'
If I read the content of the file using the text mode, like this:
with io.open(filename, mode="r", encoding="ascii") as fd:
    for line in fd:
        print(repr(line))
I get:
'line1\n'
'line2\n'
'line3'
The delimiter is replaced by \n.
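As a minimal sketch of the reading side (an illustration, not part of the original answer): the default text mode accepts every common line ending, not only the platform's own. The example wraps in-memory bytes with io.TextIOWrapper instead of opening a real file; the sample data and variable names are made up.

```python
import io

# Hypothetical sample data mixing Unix, Windows and old Mac line endings
data = b"unix\nwindows\r\nold-mac\rlast"

# newline=None (the default for open()) translates every ending to '\n'
translated = list(
    io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None)
)

# newline='' still splits on every ending but leaves it untranslated
untouched = list(
    io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline="")
)

print(translated)
print(untouched)
```

Passing newline='' is therefore the option to reach for when the original endings need to be inspected rather than normalised.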
The os.linesep is also used in write mode. Any \n character is converted to the system default line separator: \r\n on Windows, \n on POSIX, etc.
With the io.open function you can force the line separator to whatever you want.
Example: how to write a Windows text file:
with io.open(filename, mode="w", encoding="ascii", newline="\r\n") as fd:
    fd.write("one\ntwo\nthree\n")
If you read this file back in binary mode like this:
with io.open(filename, mode="rb") as fd:
    content = fd.read()
    print(repr(content))
You get:
b'one\r\ntwo\r\nthree\r\n'
As you know, reading and writing files in text mode in Python converts the platform-specific line separator to '\n' and vice versa. But if you read a file in binary mode, no conversion takes place. You can then convert the line endings explicitly with data.replace(os.linesep.encode(), b'\n'); the separator has to be encoded because binary mode yields bytes rather than str. This can be useful if a file (or stream or whatever) contains a combination of binary and text data.
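A small sketch of that explicit conversion might look as follows. The helper name to_lf and its linesep parameter are hypothetical; the parameter only exists so the behaviour can be tried on any platform, and it defaults to the encoded os.linesep.

```python
import os


def to_lf(data: bytes, linesep: bytes = os.linesep.encode("ascii")) -> bytes:
    """Replace the given line separator with a single LF byte in binary data."""
    if linesep == b"\n":
        # Nothing to do on platforms whose separator is already LF
        return data
    return data.replace(linesep, b"\n")


# Simulate data that was produced on Windows, regardless of where this runs
windows_payload = b"header\r\nbody\r\n"
print(to_lf(windows_payload, linesep=b"\r\n"))
```

This is one of the few places where os.linesep is genuinely the right tool: the data was never passed through Python's text layer, so nothing has translated it yet.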
Related
How can I write to files using Python (on Windows) and use the Unix end of line character?
e.g. When doing:
f = open('file.txt', 'w')
f.write('hello\n')
f.close()
Python automatically replaces \n with \r\n.
The modern way: use newline=''
Use the newline= keyword parameter to io.open() to use Unix-style LF end-of-line terminators:
import io
f = io.open('file.txt', 'w', newline='\n')
This works in Python 2.6+. In Python 3 you could also use the builtin open() function's newline= parameter instead of io.open().
The old way: binary mode
The old way to prevent newline conversion, which does not work in Python 3, is to open the file in binary mode to prevent the translation of end-of-line characters:
f = open('file.txt', 'wb') # note the 'b' meaning binary
but in Python 3, binary mode will read bytes and not characters so it won't do what you want. You'll probably get exceptions when you try to do string I/O on the stream. (e.g. "TypeError: 'str' does not support the buffer interface").
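To illustrate the Python 3 behaviour, here is a rough sketch that uses io.BytesIO as an in-memory stand-in for a file opened with 'wb' (an assumption made so the example needs no real file). The exact wording of the error message varies between Python versions, so only the exception type is checked.

```python
import io

stream = io.BytesIO()  # in-memory stand-in for open('file.txt', 'wb')

try:
    stream.write("hello\n")  # str, not bytes: rejected by a binary stream
    error = None
except TypeError as exc:
    error = exc

# Encoding the text first works, and no newline translation is applied
stream.write("hello\n".encode("utf-8"))
written = stream.getvalue()

print(type(error).__name__, written)
```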
For Python 2 & 3
See: The modern way: use newline='' answer on this very page.
For Python 2 only (original answer)
Open the file as binary to prevent the translation of end-of-line characters:
f = open('file.txt', 'wb')
Quoting the Python manual:
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
You'll need to use the binary pseudo-mode when opening the file.
f = open('file.txt', 'wb')
import os
import shutil

def dos2unix(inp_file, out_file=None):
    # Write to a temporary file when converting in place
    if out_file:
        out_file_tmp = out_file
    else:
        out_file_tmp = inp_file + '_tmp'
    if os.path.isfile(out_file_tmp):
        os.remove(out_file_tmp)
    # newline='\n' disables translation, so LF is written on every platform
    with open(out_file_tmp, "w", newline='\n') as fout:
        with open(inp_file, "r") as fin:
            lines = fin.readlines()
        # Strip only the line terminator so indentation and trailing spaces survive
        lines = [line.rstrip('\r\n') + '\n' for line in lines]
        fout.writelines(lines)
    if not out_file:
        shutil.move(out_file_tmp, inp_file)
        print(f'dos2unix() {inp_file} is overwritten with converted data !')
    else:
        print(f'dos2unix() {out_file} is created with converted data !')
Here is my code. It's really simple: I download a file (with the requests library) and save it to disk, but the size I get from the response differs from the size actually written to disk.
import os
import requests as r  # assumed alias: the question only says the requests library is used

mus_resp = r.get("http://audio.xmcdn.com/group7/M07/21/73/wKgDWlbmOa3TD0D_AArDQp_Mj5Y641.m4a", headers=headers, stream=True)
# print len(mus_resp.content)  # here is 705346 bytes
fd = open("file", 'w')
fd.write(mus_resp.content)
fd.flush()
fd.close()
print os.path.getsize('file')  # here is 708677 bytes
Your data is binary data, not text, and it likely contains \n characters semi-randomly (they don't mean newlines, it's just the same byte as ASCII newline). When you write them to a text mode file on Windows, it's seamlessly converting to \r\n (Windows standard line endings), bloating the final file. Open the file in binary mode and you'll disable line ending conversions:
fd = open("file", 'wb') # 'wb' means write binary mode
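The size difference can be reproduced on any platform with a small sketch. It assumes Python 3 and simulates Windows text mode by passing newline="\r\n" explicitly; the latin-1 round trip is only a trick to push arbitrary bytes through a text-mode file, and the payload is made-up stand-in data rather than the real download.

```python
import os
import tempfile

# Hypothetical binary payload; the byte 0x0A ('\n') occurs four times in it
payload = bytes(range(256)) * 4

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text_mode.bin")
    binary_path = os.path.join(tmp, "binary_mode.bin")

    # Text mode with Windows-style translation: every '\n' becomes '\r\n'
    with open(text_path, "w", encoding="latin-1", newline="\r\n") as fd:
        fd.write(payload.decode("latin-1"))

    # Binary mode: the bytes are written untouched
    with open(binary_path, "wb") as fd:
        fd.write(payload)

    text_size = os.path.getsize(text_path)
    binary_size = os.path.getsize(binary_path)

print(binary_size, text_size, payload.count(b"\n"))
```

The text-mode file grows by exactly one byte per 0x0A byte in the data, which matches the question: 708677 - 705346 = 3331 extra bytes.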
I have some templates that all use LF line endings. I can fill in some terms with format and still get LF files by opening the output with "wb".
These templates are used in a deployment script on a Windows machine to deploy to a Unix server.
The problem is that a lot of people are going to edit these templates, and I'm 100% sure that some of them will introduce CRLF line endings.
How can I convert all the CRLF to LF using Python?
Convert line endings in-place (with Python 3)
Line endings:
Windows - \r\n, called CRLF
Linux/Unix/MacOS - \n, called LF
Windows to Linux/Unix/MacOS (CRLF ➡ LF)
Here is a short Python script for directly converting Windows line endings to Linux/Unix/MacOS line endings. The script works in-place, i.e., without creating an extra output file.
# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'
# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"
with open(file_path, 'rb') as open_file:
    content = open_file.read()
# Windows ➡ Unix
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
# Unix ➡ Windows
# content = content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)
with open(file_path, 'wb') as open_file:
    open_file.write(content)
Linux/Unix/MacOS to Windows (LF ➡ CRLF)
To convert in the other direction, from Linux/Unix/MacOS to Windows, simply uncomment the Unix ➡ Windows replacement (remove the # in front of the line).
DO NOT comment out the Windows ➡ Unix replacement, as it ensures a correct conversion. When converting from LF to CRLF, it is important that there are no CRLF line endings already present in the file. Otherwise, those lines would be converted to CRCRLF. Converting lines from CRLF to LF first and then doing the desired conversion from LF to CRLF avoids this issue (thanks #neuralmer for pointing that out).
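The CRCRLF pitfall is easy to see on a short byte string. This is only an illustrative sketch with made-up sample data:

```python
# Hypothetical content in which CRLF and LF endings are already mixed
mixed = b"first\r\nsecond\nthird\r\n"

# Naive LF -> CRLF: the LF inside each existing CRLF gets a second CR
naive = mixed.replace(b"\n", b"\r\n")

# Normalise to LF first, then convert: every line ends in exactly one CRLF
safe = mixed.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

print(naive)
print(safe)
```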
Code Explanation
Binary Mode
Important: We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.
When opening files in text mode (mode='r' or mode='w' without b), the platform's native line endings (\r\n on Windows and \r on old Mac OS versions) are automatically converted to Python's Unix-style line endings: \n. So the call to content.replace() would not find any \r\n line endings to replace.
In binary mode, no such conversion is done. Therefore the call to bytes.replace() can do its work.
Binary Strings
In Python 3, string literals are Unicode text (str) unless declared otherwise. But we open our files in binary mode, which gives us bytes. Therefore we need to add b in front of our replacement strings to make them bytes literals, too.
Raw Strings
On Windows the path separator is a backslash \ which we would need to escape in a normal Python string with \\. By adding r in front of the string we create a so called "raw string" which doesn't need any escaping. So you can directly copy/paste the path from Windows Explorer into your script.
(Hint: Inside Windows Explorer press CTRL+L to automatically select the path from the address bar.)
Alternative solution
We open the file twice to avoid the need of repositioning the file pointer. We could also have opened the file once with mode='rb+' but then we would have needed to move the pointer back to start after reading its content (open_file.seek(0)) and truncate its original content before writing the new one (open_file.truncate(0)).
Simply opening the file again in write mode does that automatically for us.
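For completeness, the single-open variant could be sketched roughly like this. The function name is hypothetical, and the sketch calls truncate() after writing (cutting the file at the current position) rather than truncate(0) before writing; both leave the file with only the new content.

```python
import os
import tempfile


def crlf_to_lf_in_place(path):
    """Convert CRLF to LF using a single 'rb+' file handle."""
    with open(path, "rb+") as open_file:
        content = open_file.read()
        open_file.seek(0)  # move the pointer back to the start
        open_file.write(content.replace(b"\r\n", b"\n"))
        open_file.truncate()  # drop leftover bytes of the longer original


# Demonstration on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "file.txt")
    with open(demo_path, "wb") as f:
        f.write(b"one\r\ntwo\r\n")
    crlf_to_lf_in_place(demo_path)
    with open(demo_path, "rb") as f:
        result = f.read()

print(result)
```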
Cheers and happy programming,
winklerrr
Python 3:
The default newline type for open is universal, in which case it doesn't mind which sort of newline each line has.
You can also request a specific form of newline with the newline argument for open.
Translating from one form to the other is thus rather simple in Python:
with open('filename.in', 'r') as infile, \
        open('filename.out', 'w', newline='\n') as outfile:
    outfile.writelines(infile.readlines())
Python 2:
The open function supports universal newlines via the 'rU' mode.
Again, translating from one form to the other:
# Python 2's built-in open() has no newline argument;
# 'wb' writes the '\n' characters unchanged
with open('filename.in', 'rU') as infile, \
        open('filename.out', 'wb') as outfile:
    outfile.writelines(infile.readlines())
(In Python 3, mode U was deprecated and then removed in Python 3.11; the equivalent form is newline=None, which is the default.)
Why don't you try the following, where text is the string you read from the file?
text = text.replace('\r\n', '\n')
CRLF => \r\n
LF => \n
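It is possible to fix existing templates with messed-up line endings with this code (Python 3):
# Text mode already translates '\r\n' to '\n' while reading
with open('file.tpl') as template:
    lines = template.readlines()
# newline='\n' stops Python from writing '\r\n' again on Windows
with open('file.tpl', 'w', newline='\n') as template:
    template.writelines(lines)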
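Since many templates are involved, the same idea can be applied to a whole directory. The following is a sketch under assumptions: the function name, the "*.tpl" pattern and the sample file contents are all hypothetical, and it works on bytes so that no text-mode translation interferes.

```python
import tempfile
from pathlib import Path


def normalize_templates(directory, pattern="*.tpl"):
    """Rewrite every matching file with LF endings; return the paths changed."""
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        original = path.read_bytes()
        converted = original.replace(b"\r\n", b"\n")
        if converted != original:
            path.write_bytes(converted)
            changed.append(path)
    return changed


# Demonstration with two throwaway templates
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "clean.tpl").write_bytes(b"host={host}\n")
    (Path(tmp) / "messy.tpl").write_bytes(b"host={host}\r\nport={port}\r\n")
    changed_names = [p.name for p in normalize_templates(tmp)]
    messy_after = (Path(tmp) / "messy.tpl").read_bytes()

print(changed_names, messy_after)
```

Running something like this at the start of the deployment script means the templates are normalised no matter who edited them last.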
For example: Writing to csv file in Python
with open('StockPrice.csv', 'wb') as f:
Why would we need to open in binary for a csv file?
Is this just habit, or is there a use-case for when binary is necessary for a csv file?
In Python 2, it's necessary to use the mode "wb" when writing output using the csv module on Windows, because the csv module will write out line-ends as \r\n regardless of what platform you're running on.
If you're running on Windows, and you have a file open with mode "w", Python will add an extra carriage return every time you write a newline. So if you use a file with mode "w" to write output using the csv module, you will end up with \r\r\n line-endings, as both Python and the csv module have added carriage-return characters.
Here's a quick program that demonstrates the result. Note that we read the file in binary mode ("rb") to prevent Python from replacing \r\n with \n as it reads the file back in:
import csv
with open("output.csv", "w") as f:
    w = csv.writer(f)
    w.writerow([1,2,3,4])
    w.writerow([5,6,7,8])
with open("output.csv", "rb") as f:
    print repr(f.read())
When I run this on Windows, I get the following output:
'1,2,3,4\r\r\n5,6,7,8\r\r\n'
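In Python 3 the csv module works with text files, so "wb" is no longer an option; the csv documentation instead recommends opening the file with newline=''. A minimal sketch of the same demonstration, assuming Python 3 and a temporary directory in place of the working directory:

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")

    # newline='' turns off translation, so csv's own '\r\n' is written as-is
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow([1, 2, 3, 4])
        w.writerow([5, 6, 7, 8])

    # Read back in binary mode to see the exact bytes on disk
    with open(path, "rb") as f:
        raw = f.read()

print(repr(raw))
```

With newline='' each row ends in a single \r\n on every platform, without the doubled carriage return.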
I need to write a list of values to a text file. Because of Windows, when I need to write a line feed character, Windows writes \r\n and other systems write \n.
It occurred to me that maybe I should write to the file in binary.
How do I create a list like the following example and write it to a file in binary?
output = ['my first line', hex_character_for_line_feed_here, 'my_second_line']
How come the following does not work?
output = ['my first line', '\x0a', 'my second line']
Don't. Open the file in text mode and just let Python handle the newlines for you.
When you use the open() function you can set how Python should handle newlines with the newline keyword parameter:
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
So the default method is to write the correct line separator for your platform:
with open(outputfilename, 'w') as outputfile:
    outputfile.write('\n'.join(output))
and does the right thing; on Windows \r\n characters are saved instead of \n.
If you specifically want to write \n only and not have Python translate these for you, use newline='':
with open(outputfilename, 'w', newline='') as outputfile:
    outputfile.write('\n'.join(output))
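The effect of each newline value on writing can be checked with a short sketch. The helper name bytes_written is hypothetical; it writes the same text with a given newline setting and returns the raw bytes that ended up on disk.

```python
import os
import tempfile


def bytes_written(newline):
    """Write 'a\\nb\\n' in text mode with the given newline setting."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write("a\nb\n")
        with open(path, "rb") as f:
            return f.read()


for setting in (None, "", "\n", "\r\n"):
    print(repr(setting), bytes_written(setting))
```

Only newline=None gives a platform-dependent result (it uses os.linesep); the other settings produce the same bytes everywhere.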
Note that '\x0a' is exactly the same character as \n; \r is \x0d:
>>> '\x0a'
'\n'
>>> '\x0d'
'\r'
Create a text file, "myTextFile.txt", in the same directory as your Python script. Then write something like:
# 'wb' opens the file in "write binary" mode
with open("myTextFile.txt", 'wb') as myTextFile:
    output = ['my first line', '369as3', 'my_second_line']
    for member in output:
        # encode() returns bytes (use whatever encoding you like);
        # a binary file only accepts bytes, so the newline is b"\n"
        myTextFile.write(member.encode("utf-8") + b"\n")
This outputs a binary text file that looks like:
my first line
369as3
my_second_line
Edit: Updated for Python 3