I want to read a file and write it back out. Here's my code:
file = open( zipname , 'r' )
content = file.read()
file.close()
alt = open('x.zip', 'w')
alt.write(content )
alt.close()
This doesn't work. Why?
Edit:
The rewritten file is corrupt
(python 2.7.1 on windows)
Read and write in the binary mode, 'rb' and 'wb':
f = open(zipname , 'rb')
content = f.read()
f.close()
alt = open('x.zip', 'wb')
alt.write(content )
alt.close()
The reason text mode didn't work on Windows is that newline translation mangled the binary data in the zip file: on reading, every '\r\n' is converted to '\n', and on writing, every '\n' is converted to '\r\n'.
From this bit of the manual:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
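The corruption can be reproduced on any platform. This is only a sketch, not the asker's code: it assumes Python 2.6+ or 3 and uses io.open() with newline='\r\n' to imitate the '\n' to '\r\n' translation that Windows text mode applies on write. The payload bytes and file names are made up for illustration.

```python
import io
import os
import tempfile

# Made-up binary payload that happens to contain two 0x0A ('\n') bytes.
payload = b'PK\x03\x04\n\x00\x08\n'

tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, 'text_mode.bin')
bin_path = os.path.join(tmpdir, 'binary_mode.bin')

# Imitate Windows text mode: newline='\r\n' rewrites every '\n' as '\r\n'.
# latin-1 maps each byte to one character, so nothing else is changed.
with io.open(text_path, 'w', encoding='latin-1', newline='\r\n') as f:
    f.write(payload.decode('latin-1'))

# Binary mode writes the bytes untouched.
with io.open(bin_path, 'wb') as f:
    f.write(payload)

with io.open(text_path, 'rb') as f:
    mangled = f.read()
with io.open(bin_path, 'rb') as f:
    intact = f.read()

print(len(payload), len(mangled), len(intact))
```

The text-mode copy grows by one byte for every '\n' in the data, which is enough to break a zip archive.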
If I run this program on my OS X or Linux box, it works exactly as you would expect. The file x.zip has exactly the same checksum as the original zip file and is not corrupt. I believe that Windows is one of the platforms where you need to explicitly open files in binary mode; try:
file = open(zipname, 'rb')
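To check the result the same way on any platform, you can compare checksums after a binary-mode copy. This is a rough sketch; sha256_of and copy_binary are hypothetical helper names, and the source file is filled with stand-in bytes rather than a real archive.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()


def copy_binary(src, dst):
    """Copy src to dst byte for byte, with no newline translation."""
    with open(src, 'rb') as fin:
        content = fin.read()
    with open(dst, 'wb') as fout:
        fout.write(content)


tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'original.zip')
dst = os.path.join(tmpdir, 'x.zip')

# Stand-in bytes containing '\r' and '\n'; not a real zip archive.
with open(src, 'wb') as f:
    f.write(b'PK\x03\x04\r\n\x1a\n' * 100)

copy_binary(src, dst)
same = sha256_of(src) == sha256_of(dst)
print(same)
```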
Related
How can I write to files using Python (on Windows) and use the Unix end of line character?
e.g. When doing:
f = open('file.txt', 'w')
f.write('hello\n')
f.close()
Python automatically replaces \n with \r\n.
The modern way: use newline=''
Use the newline= keyword parameter to io.open() to use Unix-style LF end-of-line terminators:
import io
f = io.open('file.txt', 'w', newline='\n')
This works in Python 2.6+. In Python 3 you could also use the builtin open() function's newline= parameter instead of io.open().
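A quick way to confirm this is to write a couple of lines and read the file back in binary mode. This is a small sketch with a made-up temporary path; note the u'' prefixes, because in Python 2 a text-mode io.open() stream only accepts unicode strings.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'file.txt')

# newline='\n' disables translation, so '\n' is written as a single LF byte.
with io.open(path, 'w', newline='\n') as f:
    f.write(u'hello\n')
    f.write(u'world\n')

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

print(repr(raw))
```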
The old way: binary mode
The old way to prevent newline conversion, which does not work in Python 3, is to open the file in binary mode to prevent the translation of end-of-line characters:
f = open('file.txt', 'wb') # note the 'b' meaning binary
but in Python 3, binary mode will read bytes and not characters so it won't do what you want. You'll probably get exceptions when you try to do string I/O on the stream. (e.g. "TypeError: 'str' does not support the buffer interface").
For Python 2 & 3
See: The modern way: use newline='' answer on this very page.
For Python 2 only (original answer)
Open the file as binary to prevent the translation of end-of-line characters:
f = open('file.txt', 'wb')
Quoting the Python manual:
On Windows, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files. On Unix, it doesn’t hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
You'll need to use the binary pseudo-mode when opening the file.
f = open('file.txt', 'wb')
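import os
import shutil

def dos2unix(inp_file, out_file=None):
    # Write to out_file if given; otherwise write to a temporary file
    # and replace inp_file with it at the end.
    if out_file:
        out_file_tmp = out_file
    else:
        out_file_tmp = inp_file + '_tmp'
    if os.path.isfile(out_file_tmp):
        os.remove(out_file_tmp)
    # newline='\n' stops Python from translating '\n' to '\r\n' on write.
    with open(out_file_tmp, "w", newline='\n') as fout:
        with open(inp_file, "r") as fin:
            lines = fin.readlines()
            # Strip only the line terminator, so indentation and trailing spaces survive.
            lines = [line.rstrip('\r\n') + '\n' for line in lines]
            fout.writelines(lines)
    if not out_file:
        shutil.move(out_file_tmp, inp_file)
        print(f'dos2unix() {inp_file} is overwritten with converted data !')
    else:
        print(f'dos2unix() {out_file} is created with converted data !')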
I try to download images, but they become corrupted for some reason? For example: This is an image I want to get.
And the result is this
My test code is:
import urllib2
def download_web_image(url):
    request = urllib2.Request(url)
    img = urllib2.urlopen(request).read()
    with open ('test.jpg', 'w') as f: f.write(img)
download_web_image("http://upload.wikimedia.org/wikipedia/commons/8/8c/JPEG_example_JPG_RIP_025.jpg")
Why is this and how do I fix this?
You are opening 'test.jpg' file in the default (text) mode, which causes Python to use the "correct" newlines on Windows:
In text mode, the default when reading is to convert platform-specific
line endings (\n on Unix, \r\n on Windows) to just \n. When writing in
text mode, the default is to convert occurrences of \n back to
platform-specific line endings.
Of course, JPEG files are not text files, and 'fixing' the newlines will only corrupt the image. Instead, open the file in binary mode:
with open('test.jpg', 'wb') as f:
    f.write(img)
For more details, see the documentation.
The issue described here looked initially like it was solvable by just having the spreadsheet closed in Excel before running the program.
It transpires, however, that having Excel closed is a necessary, but not sufficient, condition. The issue still occurs, but not on every Windows machine, and not every time (sometimes it occurs after a single execution, sometimes two).
I've modified the program such that it now reads from one spreadsheet and writes to a different one, still the issue presents itself. I even go on to programmatically kill any lingering Python processes before running the program. Still no joy.
The openpyxl save() function instantiates ZipFile thus:
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
... with ZipFile then using that to attempt to open the file in mode 'wb' thus:
if isinstance(file, basestring):
    self._filePassed = 0
    self.filename = file
    modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
    try:
        self.fp = open(file, modeDict[mode])
    except IOError:
        if mode == 'a':
            mode = key = 'w'
            self.fp = open(file, modeDict[mode])
        else:
            raise
According to the docs:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
... which explains why mode 'wb' must be used.
Is there something in Python file opening that could possibly leave the file in some state of "openness"?
Windows: 8
Python: 2.7.10
openpyxl: latest
Two suggestions:
First, use a with statement so that the file is closed properly.
with open("some.xls", "wb") as excel_file:
    #Do something
At the end of the with block, the file is closed automatically (see this).
You can also make a copy of the file and work on the copied file.
import shutil
shutil.copyfile(src, dst)
https://docs.python.org/2/library/shutil.html#shutil.copyfile
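Both suggestions together might look like the sketch below. The file names and the placeholder bytes are made up, and this only shows that the handle is closed once the with block ends; it does not claim to explain the openpyxl problem.

```python
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'some.xls')
dst = os.path.join(tmpdir, 'some_copy.xls')
data = b'placeholder bytes, not a real workbook'

# The with statement closes the file even if the body raises.
with open(src, 'wb') as excel_file:
    excel_file.write(data)

closed_after_with = excel_file.closed  # True once the block has exited

# Work on a copy so the original is never held open by this program.
shutil.copyfile(src, dst)

with open(dst, 'rb') as f:
    copied = f.read()

print(closed_after_with, copied == data)
```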
I am trying to grab some text written in Arabic from YouTube, write it to a file, and read it back again.
The source file to grab the text has:
#!/usr/bin/python
#encoding: utf-8
at the beginning of the file.
Writing the text is done like this:
f.write(comment + '\n' )
The file contents are readable Arabic, so I assume the previous steps were correct.
But the problem appears when trying to read the contents from the file (and writing them for example into another file) like this:
infile = open('data_Pass1/EG', 'rb')
out.write(infile.read())
Which results in output file like this:
\xd8\xa7\xd9\x8a\xd9\x87
What is causing this?
In Python 3.x (in is a reserved word, so the input handle is named infile here):
infile = open('data_Pass1/EG', 'r', encoding='utf-8')
out = open('_file_name_', 'w', encoding='utf-8')
In Python 2.x:
import codecs
infile = codecs.open('data_Pass1/EG', 'r', encoding='utf-8')
out = codecs.open('_file_name_', 'w', encoding='utf-8')
You're opening the input file in binary ('rb') mode. Open it for reading as text ('r') instead. I mostly use Python 3, where source files are UTF-8 by default, so I don't know whether the encoding declaration at the top of a .py file has any effect on text I/O. If necessary, you may also want to pass encoding='utf8' to open() for all your file I/O. I'm not sure whether that works in 2.7, or what the best way to handle it there would be.
As Lee Daniel Crocker suggests, you'd probably be better off just opening both input and output files in binary mode ('rb' for the input file, 'wb' for the output) if you're passing the input directly to the output without doing any textual manipulation of it. (Though going by Andy's comment, in Python 2 it's better to open text files in binary mode and do explicit encoding/decoding anyway.)
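The escapes shown in the question are consistent with the raw UTF-8 bytes of three Arabic letters, which is what you see when undecoded bytes are displayed. Below is a sketch of a text-mode round trip using io.open() (available in Python 2.6+ and 3); the file names and the sample comment are assumptions chosen to match those bytes.

```python
import io
import os
import tempfile

# Three Arabic letters; their UTF-8 encoding is b'\xd8\xa7\xd9\x8a\xd9\x87'.
comment = u'\u0627\u064a\u0647'

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'EG')
dst = os.path.join(tmpdir, 'EG_copy')

# Write as text: characters are encoded to UTF-8 on the way out.
with io.open(src, 'w', encoding='utf-8') as f:
    f.write(comment + u'\n')

# Reading in binary mode returns the encoded bytes, not characters.
with io.open(src, 'rb') as f:
    raw = f.read()

# Reading as text with the same encoding gives the characters back.
with io.open(src, 'r', encoding='utf-8') as fin:
    with io.open(dst, 'w', encoding='utf-8') as fout:
        fout.write(fin.read())

with io.open(dst, 'r', encoding='utf-8') as f:
    round_tripped = f.read()

print(repr(raw))
```

Copying the bytes unchanged in binary mode would also preserve the file; the escapes only show up when the bytes are displayed without decoding.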
I am using pickle module in Python and trying different file IO modes:
# works on windows.. "rb"
with open(pickle_f, 'rb') as fhand:
    obj = pickle.load(fhand)
# works on linux.. "r"
with open(pickle_f, 'r') as fhand:
    obj = pickle.load(fhand)
# works on both "r+b"
with open(pickle_f, 'r+b') as fhand:
    obj = pickle.load(fhand)
I had never read about "r+b" mode anywhere, but I found it mentioned in the documentation.
I am getting EOFError on Linux if I use "rb" mode and on Windows if "r" is used. I just gave "r+b" mode a shot and it's working on both.
What's "r+b" mode? What's the difference between "rb" and "r+b"? Why does it work when the others don't?
r+ opens the file for both reading and writing. b is for binary.
So r+b opens the file in binary mode for both reading and writing.
You can read more here.
r opens for reading, whereas r+ opens for reading and writing. The b is for binary.
This is spelled out in the documentation:
The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position). If mode is omitted, it defaults to 'r'. The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.
Modes 'r+', 'w+' and 'a+' open the file for updating (note that 'w+' truncates the file). Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
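The difference is easy to see with a small experiment. This is only a sketch with a made-up temporary file: 'rb' rejects writes, while 'r+b' allows both reading and writing without truncating the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789')

# 'rb' is read-only: reading works, writing raises an error.
with open(path, 'rb') as f:
    first = f.read(4)
    try:
        f.write(b'XXXX')
        rb_write_failed = False
    except (IOError, ValueError):  # IOError on Python 2, UnsupportedOperation on 3
        rb_write_failed = True

# 'r+b' opens the existing file for updating, keeping its contents.
with open(path, 'r+b') as f:
    f.seek(4)
    f.write(b'AB')  # overwrite two bytes in place

with open(path, 'rb') as f:
    updated = f.read()

print(first, rb_write_failed, updated)
```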
My understanding is that r+ opens the file for both reading and writing (just like w+, although, as pointed out in the comment, w+ truncates the file). The b just opens it in binary mode, which is supposed to be less aware of things like line separators (at least in C++).
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
Source: Reading and Writing Files
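For the pickle case in the question, a sketch of a round trip that uses binary mode on both sides, in line with the advice quoted above for non-text data. The file name and the sample object are made up; 'wb' for writing and 'rb' for reading should behave the same on Windows and Linux.

```python
import os
import pickle
import tempfile

pickle_f = os.path.join(tempfile.mkdtemp(), 'obj.pkl')
obj = {'name': 'example', 'values': [1, 2, 3]}

# Write the pickle in binary mode so no newline translation touches the bytes.
with open(pickle_f, 'wb') as fhand:
    pickle.dump(obj, fhand, protocol=2)

# Read it back in binary mode as well.
with open(pickle_f, 'rb') as fhand:
    loaded = pickle.load(fhand)

print(loaded == obj)
```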