The issue described here looked initially like it was solvable by just having the spreadsheet closed in Excel before running the program.
It transpires, however, that having Excel closed is a necessary, but not sufficient, condition. The issue still occurs, but not on every Windows machine, and not every time (sometimes it occurs after a single execution, sometimes two).
I've modified the program so that it now reads from one spreadsheet and writes to a different one, yet the issue still presents itself. I even programmatically kill any lingering Python processes before running the program. Still no joy.
The openpyxl save() function instantiates ZipFile thus:
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
... with ZipFile then using that to attempt to open the file in mode 'wb' thus:
if isinstance(file, basestring):
    self._filePassed = 0
    self.filename = file
    modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
    try:
        self.fp = open(file, modeDict[mode])
    except IOError:
        if mode == 'a':
            mode = key = 'w'
            self.fp = open(file, modeDict[mode])
        else:
            raise
According to the docs:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
... which explains why mode 'wb' must be used.
Is there something in Python file opening that could possibly leave the file in some state of "openness"?
Windows: 8
Python: 2.7.10
openpyxl: latest
Two suggestions:
First, use with so that the file is closed correctly.
with open("some.xls", "wb") as excel_file:
    pass  # do something with excel_file here
When the with block ends, the file is closed automatically, even if an exception was raised inside it.
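A minimal sketch of that behaviour, assuming Python 3 and a throwaway file in a temporary directory (the path and the bytes written are placeholders, not anything openpyxl produces). It only shows that the handle is closed once the with block exits, without an explicit close() call:

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "some.bin")

with open(path, "wb") as excel_file:
    excel_file.write(b"placeholder bytes")
    inside = excel_file.closed   # still open inside the block

after = excel_file.closed        # closed once the block has exited
print(inside, after)
```

This does not by itself explain the intermittent failure described in the question; it only rules out a handle left open by the code that uses with.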
You can also make a copy of the file and work on the copied file.
import shutil
shutil.copyfile(src, dst)
https://docs.python.org/2/library/shutil.html#shutil.copyfile
I wonder whether with open(file_name, "rb") as binary_file: pass actually executes a file if it is an exe. I am asking because I am using Python to read some malicious files and viruses stored as ".exe" files.
No, it doesn't. The 'rb' mode in your open call stands for "read binary", so the file is only read and its contents are returned as a bytes-like object. Executing a file is not something open does, and on top of that the file is opened in read-only mode.
You can read about the open function in the documentation.
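As a small illustration, assuming Python 3, the sketch below writes a few made-up bytes to a file named sample.exe (a stand-in, not a real executable) and reads the first two back. The "MZ" prefix is used only because real Windows executables begin with those two bytes:

```python
import os
import tempfile

# Hypothetical stand-in for a suspicious file: just a few bytes on disk.
path = os.path.join(tempfile.mkdtemp(), "sample.exe")
with open(path, "wb") as f:
    f.write(b"MZ\x90\x00fake payload")

# 'rb' only reads; nothing in open() or read() runs the file.
with open(path, "rb") as binary_file:
    header = binary_file.read(2)

print(header)
```

The data comes back as a plain bytes object that can be inspected like any other value.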
I'm writing a simple parser. For now, it reads the whole current dir and opens all files that end with ".w" with 'r' and 'w' permissions. Here's the code for it:
import os
wc_dir = os.path.dirname(os.path.abspath(__file__))
files = [f for f in os.listdir(wc_dir) if os.path.isfile(os.path.join(wc_dir,f))]
comp_files_r = [open(f, 'r') for f in files if f.endswith(".w")]
comp_files_w = [open(f, 'w') for f in files if f.endswith(".w")]
As you can see, I have two lists with "open objects" with read and write permissions for all files in the current folder that end with ".w". For now, I have just one file. So, consider the following:
print comp_files_r
print comp_files_w
Output:
[<open file 'app.w', mode 'r' at 0x7effd48274b0>]
[<open file 'app.w', mode 'w' at 0x7effd4827540>]
It happens that, when I try to read the 'app.w' file:
def parse():
    for f in comp_files_r:
        with f as file:
            data = file.read()
        print repr(data)
parse()
I get an astonishing empty string for no reason. I've managed to discover that, all that I save in 'app.w' gets erased when I execute the code with the "w list comprehension". So why is that? I've learned from pain that trying to both read and write a file in "r+" mode can lead to weird results. That's not the situation. I've created different objects from the same file, and this is messing with the content of the file itself. Why?
It looks to me that your issue is that you're opening the file in 'w' mode. When you open a file in 'w' mode, the existing file is truncated to zero length immediately, before you write anything. 'r+' mode is for reading and editing.
I'd be willing to bet that if you read the contents of the files between the line where you open them for reading and the line where you open them for writing, you will see the contents of the files as you expect them to be.
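The effect can be reproduced in a few lines. This is a sketch assuming Python 3 and a scratch file (named app.w only to mirror the question): the file size drops to zero the moment the second open call runs, so the reader opened earlier has nothing left to read.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.w")
with open(path, "w") as f:
    f.write("some source text\n")

reader = open(path, "r")
before = os.path.getsize(path)   # content is still on disk
writer = open(path, "w")         # truncates immediately, before any write
after = os.path.getsize(path)
data = reader.read()             # nothing left to read
reader.close()
writer.close()
print(before, after, repr(data))
```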
There are a lot of files, for each of them I need to read the text content, do some processing of the text, then write the text back (replacing the old content).
I know I can first open the files as rt to read and process the content, and then close and reopen them as wt, but obviously this is not a good way. Can I just open a file once to read and write? How?
See: http://docs.python.org/2/library/functions.html#open
The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position). If mode is omitted, it defaults to 'r'. The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.
Modes 'r+', 'w+' and 'a+' open the file for updating (note that 'w+' truncates the file). Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
So, you can open a file in mode r+, read from it, truncate, then write to the same file object. But you shouldn't do that.
You should open the file in read mode, write to a temporary file, then os.rename the temporary file to overwrite the original file. This way, your actions are atomic; if something goes wrong during the write step (for example, it gets interrupted), you don't end up having lost the original file, and having only partially written out your replacement text.
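A minimal sketch of that approach, assuming Python 3.3 or later. The helper name rewrite_file and the demo file are hypothetical. It uses os.replace rather than os.rename because os.rename raises an error on Windows when the destination already exists, whereas os.replace overwrites it:

```python
import os
import tempfile

def rewrite_file(path, transform):
    """Replace the contents of path with transform(old_text)."""
    with open(path, "r") as src:
        new_text = transform(src.read())
    # Create the temp file in the same directory so the final replace
    # stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
        os.replace(tmp_path, path)   # overwrites an existing destination
    except BaseException:
        os.remove(tmp_path)          # do not leave a stray temp file behind
        raise

# Usage on a scratch file.
demo = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(demo, "w") as f:
    f.write("hello")
rewrite_file(demo, str.upper)
with open(demo) as f:
    result = f.read()
print(result)
```

If the write step fails, the original file is untouched because it is only replaced after the temporary file has been written and closed.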
Check out the fileinput module. It lets you do what others are advising: back up the input file, manipulate its contents, and then write the altered data to the same place.
Optional in-place filtering: if the keyword argument inplace=True is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place.
Here's an example. Say I have a text file like:
1
2
3
4
I can do (Python 3):
import fileinput
file_path = r"C:\temp\fileinput_test.txt"
with fileinput.FileInput(files=[file_path], inplace=True) as input_data:
    for line in input_data:
        # Double the number on each line
        s = str(int(line.strip()) * 2)
        print(s)
And my file becomes:
2
4
6
8
You can use the 'r+' file mode to open a file for reading and writing at the same time.
Example:
with open("file.txt", 'r+') as filehandle:
    pass  # can read and write to the file here
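A slightly fuller sketch, assuming Python 3 and a scratch file, of what reading and then rewriting through a single 'r+' handle involves. The seek(0) and truncate() calls matter: without them the new text would be written after the old text, or leftovers of the old text would remain after a shorter replacement.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("a fairly long original line\n")

with open(path, "r+") as filehandle:
    text = filehandle.read()
    filehandle.seek(0)                  # go back to the start before writing
    filehandle.write(text.upper()[:8])  # new content is shorter than the old
    filehandle.truncate()               # drop the leftover tail of the old text

with open(path) as f:
    result = f.read()
print(repr(result))
```

Unlike the temporary-file approach, an interruption between the write and the truncate leaves the file in a half-updated state.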
Well, you can choose the "r+" mode (not "r+w", which Python 3 rejects as an invalid mode), with which you need to open the file only once.
I am using pickle module in Python and trying different file IO modes:
# works on windows.. "rb"
with open(pickle_f, 'rb') as fhand:
    obj = pickle.load(fhand)
# works on linux.. "r"
with open(pickle_f, 'r') as fhand:
    obj = pickle.load(fhand)
# works on both "r+b"
with open(pickle_f, 'r+b') as fhand:
    obj = pickle.load(fhand)
I had never read about "r+b" mode anywhere, but found it mentioned in the documentation.
I am getting EOFError on Linux if I use "rb" mode and on Windows if "r" is used. I just gave "r+b" mode a shot and it's working on both.
What's "r+b" mode? What's the difference between "rb" and "r+b"? Why does it work when the others don't?
r+ opens the file for both reading and writing; b is for binary.
r+b mode therefore opens a binary file for both reading and writing.
You can read more here.
r opens for reading, whereas r+ opens for reading and writing. The b is for binary.
This is spelled out in the documentation:
The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending (which on some Unix systems means that all writes append to the end of the file regardless of the current seek position). If mode is omitted, it defaults to 'r'. The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.
Modes 'r+', 'w+' and 'a+' open the file for updating (note that 'w+' truncates the file). Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
My understanding is that r+ opens the file for both reading and writing (just like w+, except that w+, as pointed out in the comments, truncates the file). The b just opens it in binary mode, which is supposed to be less aware of things like line separators (at least in C++).
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
Source: Reading and Writing Files
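A minimal round-trip sketch, assuming Python 3 and a scratch file (the dictionary being pickled is made up). It writes the pickle in 'wb' and loads it back in both 'rb' and 'r+b'. Both loads succeed here, the only difference being that the 'r+b' handle would also allow writing. It does not reproduce the EOFError from the question, whose cause is not established in this thread.

```python
import os
import pickle
import tempfile

pickle_f = os.path.join(tempfile.mkdtemp(), "obj.pkl")
original = {"name": "example", "values": [1, 2, 3]}

with open(pickle_f, "wb") as fhand:     # write in binary mode
    pickle.dump(original, fhand)

with open(pickle_f, "rb") as fhand:     # read-only, binary
    obj_rb = pickle.load(fhand)

with open(pickle_f, "r+b") as fhand:    # read/write, binary; file must exist
    obj_rpb = pickle.load(fhand)

print(obj_rb == original, obj_rpb == original)
```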
I want to read a file and write it back out. Here's my code:
file = open( zipname , 'r' )
content = file.read()
file.close()
alt = open('x.zip', 'w')
alt.write(content )
alt.close()
This doesn't work, why?????
Edit:
The rewritten file is corrupt
(python 2.7.1 on windows)
Read and write in the binary mode, 'rb' and 'wb':
f = open(zipname , 'rb')
content = f.read()
f.close()
alt = open('x.zip', 'wb')
alt.write(content )
alt.close()
The reason text mode didn't work on Windows is that newline translation ('\r\n' becomes '\n' on reading, and '\n' becomes '\r\n' on writing) mangled the binary data in the zip file.
From this bit of the manual:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
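One way to check that a binary-mode copy is faithful is to compare checksums of the source and the copy. The sketch below assumes Python 3. The helper sha256_of and the sample bytes are made up for illustration (the file is not a real zip archive, it merely contains CR, LF and 0x1A bytes, which text mode on Windows may alter).

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Return the SHA-256 hex digest of a file read in binary mode."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

work_dir = tempfile.mkdtemp()
zipname = os.path.join(work_dir, "original.bin")
# Sample bytes including CR, LF and 0x1A.
with open(zipname, "wb") as f:
    f.write(b"PK\x03\x04\r\n\x1a\n\x00binary payload")

copy_name = os.path.join(work_dir, "x.zip")
with open(zipname, "rb") as src, open(copy_name, "wb") as dst:
    dst.write(src.read())

same = sha256_of(zipname) == sha256_of(copy_name)
print(same)
```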
If I run this program on my OS X or Linux box, it works exactly as you would expect. The file x.zip has exactly the same checksum as the original zip file and is not corrupt. I believe that Windows is one of the platforms where you need to explicitly open files in binary mode; try:
file = open(zipname, 'rb')