This is how the code is
with open(pickle_f, 'r') as fhand:
    obj = pickle.load(fhand)
This works fine on Linux systems but not on Windows, where it raises EOFError.
I have to use rb mode to make it work on Windows, but then it stops working on Linux.
Why is this happening, and how can I fix it?
Always use b mode when reading and writing pickles (open(f, 'wb') for writing, open(f, 'rb') for reading). To "fix" the file you already have, convert its newlines using dos2unix.
The issue described here looked initially like it was solvable by just having the spreadsheet closed in Excel before running the program.
It transpires, however, that having Excel closed is a necessary, but not sufficient, condition. The issue still occurs, but not on every Windows machine, and not every time (sometimes it occurs after a single execution, sometimes two).
I've modified the program such that it now reads from one spreadsheet and writes to a different one, still the issue presents itself. I even go on to programmatically kill any lingering Python processes before running the program. Still no joy.
The openpyxl save() function instantiates ZipFile thus:
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
... with ZipFile then using that to attempt to open the file in mode 'wb' thus:
if isinstance(file, basestring):
    self._filePassed = 0
    self.filename = file
    modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
    try:
        self.fp = open(file, modeDict[mode])
    except IOError:
        if mode == 'a':
            mode = key = 'w'
            self.fp = open(file, modeDict[mode])
        else:
            raise
According to the docs:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
... which explains why mode 'wb' must be used.
Is there something in Python file opening that could possibly leave the file in some state of "openness"?
Windows: 8
Python: 2.7.10
openpyxl: latest
Two suggestions:
First is to use with to close the file correctly.
with open("some.xls", "wb") as excel_file:
    # Do something
At the end of that the file will close on its own (see this).
You can also make a copy of the file and work on the copied file.
import shutil
shutil.copyfile(src, dst)
https://docs.python.org/2/library/shutil.html#shutil.copyfile
So I just spent a long time processing and writing out files for a project I'm working on. The files contain objects pickled with cPickle. Now I'm trying to load the pickled files and I'm running into the problem: "Can't import module ...". The thing is, I can import the module directly from the python prompt just fine.
I started noticing that my code had the problem reading the file (getting EOF error) and I noted that I was reading it with open('file','r'). Others noted that I need to specify that it's a binary file. I don't get the EOF error anymore, but now I'm getting this error.
It seems to me that I've screwed up the writing of my files initially by writing out with 'w' and not 'wb'.
The question I have is, is there a way to process the binary file and fix what 'w' changed? Possibly by searching for line returns and changing them (which is what I think the big difference is between 'w' and 'wb' on Windows).
Any help doing this would be amazing, as otherwise I will have lost weeks of work. Thanks.
I found the answer here. It describes a solution to the same problem I'm having, but not before outlining the traditional solution in Python 2 (to all those that do this, thank you).
The solution comes down to this:
# Read and write in binary mode so no further newline translation happens.
with open(filename, "rb") as f:
    data = f.read()
newdata = data.replace(b"\r\n", b"\n")
if newdata != data:
    with open(filename, "wb") as f:
        f.write(newdata)
Basically, just replace all instances of "\r\n" with "\n". It seems to have worked well, I can now open the file and unpickle it just fine.
I'm running into a problem that I haven't seen anyone on StackOverflow encounter, or been able to find on Google for that matter.
My main goal is to be able to replace occurrences of a string in the file with another string. Is there a way to access all of the lines in the file?
The problem is that when I try to read in a large text file (1-2 GB), Python only reads a subset of it.
For example, I'll do a really simple command such as:
newfile = open("newfile.txt","w")
f = open("filename.txt","r")
for line in f:
    replaced = line.replace("string1", "string2")
    newfile.write(replaced)
And it only writes the first 382 mb of the original file. Has anyone encountered this problem previously?
I tried a few different solutions such as using:
import fileinput
import sys
for i, line in enumerate(fileinput.input("filename.txt", inplace=1)):
    sys.stdout.write(line.replace("string1", "string2"))
But it has the same effect. Reading the file in chunks doesn't help either, for example using
f.read(10000)
I've narrowed it down to most likely being a reading problem and not a writing problem, because it also happens when simply printing out lines. I know that there are more lines. When I open the file in a full text editor such as Vim, I can see what the last line should be, and it is not the last line that Python prints.
Can anyone offer any advice or things to try?
I'm currently using a 32-bit version of Windows XP with 3.25 GB of RAM, and running Python 2.7.
Try:
f = open("filename.txt", "rb")
On Windows, rb means open file in binary mode. According to the docs, text mode vs. binary mode only has an impact on end-of-line characters. But (if I remember correctly) I believe opening files in text mode on Windows also does something with EOF (hex 1A).
You can also specify the mode when using fileinput:
fileinput.input("filename.txt", inplace=1, mode="rb")
Are you sure the problem is with reading and not with writing out?
Do you close the file that is written to, either explicitly newfile.close() or using the with construct?
Not closing the output file is often the source of such problems when buffering is going on somewhere. If that's the case in your setting too, closing should fix your initial solutions.
If you use the file like this:
with open("filename.txt") as f:
    for line in f:
        newfile.write(line.replace("string1", "string2"))
It should only read into memory one line at a time, unless you keep a reference to that line in memory.
After each line is read it will be up to Python's garbage collector to get rid of it. Give this a try and see if it works for you :)
Found the solution thanks to Gareth Latty. Using an iterator:
def read_in_chunks(file, chunk_size=1000):
    while True:
        data = file.read(chunk_size)
        if not data:
            break
        yield data
This answer was posted as an edit to the question Python Does Not Read Entire Text File by the OP user1297872 under CC BY-SA 3.0.
I'm learning Python, and have run into a bit of a problem. On my OSX install of Python 3.1, this happens in the console:
>>> filename = "test"
>>> reader = open(filename, 'r')
>>> writer = open(filename, 'w')
>>> reader.read()
''
>>> writer.write("hello world\n")
12
>>> reader.read()
''
And calling more test in BASH confirms that there is nothing in test. What's going on?
Thanks.
There are two potential reasons why you are seeing this behaviour.
When you open a file for writing (with the "w" open mode in Python), the OS truncates the existing file to zero length (a new file is created only if none exists yet). So as soon as the writer is opened, the original contents are gone, and the first reader.read() returns an empty string.
Second, the data you write is not necessarily visible through the other handle until you flush it:
>>> writer.flush()
>>> reader.read()
'hello world\n'
Flushing the file writes any data that might be in Python's file buffers to the OS, so that when you read from the file from the other handle, the OS will return the data. Note that Python itself doesn't know these two handles refer to the same file, but the OS does.
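As a small self-contained sketch of the flushing point (the helper names write_then_read and demo are made up, and a temporary file stands in for "test"):

```python
import os
import tempfile


def write_then_read(path):
    """Write through one handle, flush it, then read through a second one."""
    with open(path, "w") as writer:
        writer.write("hello world\n")
        writer.flush()  # hand Python's buffered data to the OS
        with open(path, "r") as reader:
            return reader.read()


def demo():
    # Use a throwaway file so nothing real gets truncated.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return write_then_read(path)
    finally:
        os.remove(path)
```

Without the flush() call, the second handle may well see an empty file, because the text is still sitting in the writer's buffer.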
You're probably trashing your file. It's not usually a good idea to open a file for reading and writing at the same time.
The other factor is buffering. If you really want to read and write to the same file, open one handle using "w+".
And with buffering, you will need to force the buffer to be emptied before reading. Closing the file is a good way to do this.
I want to read an image in binary mode so that I can save it into my database, like this:
img = open("Last_Dawn.jpg")
t = img.read()
save_to_db(t)
This works on Mac. But on Windows, what img.read() returns is incorrect. It's just a small part of the whole file.
So my first question is: why doesn't the code above work on Windows?
And the second is: is there any other way to do this?
Thanks a lot!
You need to open in binary mode:
img = open("Last_Dawn.jpg", 'rb')
You need to tell Python to open the file in binary mode:
img = open('whatever.whatever', 'rb')
See the documentation for the open function here: http://docs.python.org/library/functions.html#open
Can't say for sure, but I do know that POSIX systems don't distinguish between binary and non-binary modes when calling fopen (the 'b' flag that ISO C defines is ignored there), whereas Windows does.
It's likely that the Python code just uses fopen("Last_Dawn.jpg","r") under the covers (since it's written in C), so on Windows the file is opened in non-binary mode.
In that mode, reading will most likely convert line-end characters (CRLF -> LF) and treat a Ctrl-Z byte (hex 1A) as end of file, which would explain why only part of the image comes back.
If you yourself specify the mode as 'rb' on your open statement, that should fix it:
img = open("Last_Dawn.jpg", "rb")
open(filename, 'rb')