I'm having an issue with a pretty simple piece of code. I have a file of 6551 lines (ASCII), and all I've managed to do so far is read the file and print it.
a_file = open(myfile_path).readlines()
print a_file
Upon trying to print, the interpreter gets completely stuck for a few minutes.
I've tried this both in IDLE and in JetBrains PyCharm. I'm running Windows Server 2012 on my workstation and Windows 7 at home. The funny thing is that this worked perfectly on the weaker Windows 7 machine back home (Q9550 and 8 GB RAM), but neither I nor the IT guy can find a solution for it on my workstation (i7 on X99, 64 GB RAM, GTX 980).
Would appreciate any and all assistance.
It's not a good idea to read an entire file into memory (which is what readlines() does), precisely because you can run into the kind of problem you encountered.
If you want to print every line of the file, you can use the following construction:
with open(myfile_path) as input_file:
    for line in input_file:
        print line
For more complicated actions, you (or the IT guy) should consult the documentation for the open() function and for file object operations.
First, try this:
import sys

with open(myfile_path) as f:
    for line in f:
        print line
        sys.stdout.flush()
A number of things could be causing the script to hang, such as the file being open or locked elsewhere, or the output blocking. Flushing after every line makes everything that can print actually print.
Additionally, processing a line at a time is generally preferable unless you actually need all of the lines in a list at the same time. This shouldn't be the root cause here unless the line lengths are truly enormous, but it's good style not to allocate giant data structures.
Additionally, setting non-blocking mode on the file will resolve issues such as the file being written to and locked (this won't give you a stable solution, but it will stop the read from blocking). This is OS-dependent and will probably hurt more than it helps.
If the issue is that the file is being written to (I've come across this a lot on Windows), another option is to copy the file to a temp file and hand that copy to the script, as sketched below.
What you choose to do will depend greatly on whether you want to ensure you have all the data possible or whether you want to ensure the script runs immediately.
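The copy-first approach could look roughly like the sketch below. The path is just a placeholder, and shutil.copy2 can itself fail if the writer holds an exclusive lock on the file:
import shutil
import tempfile

def snapshot(path):
    # Copy the possibly-locked file to a temp file and return the copy's path.
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".txt")
    tmp.close()
    shutil.copy2(path, tmp.name)
    return tmp.name

copy_path = snapshot(r"C:\data\myfile.txt")  # placeholder path
with open(copy_path) as f:
    for line in f:
        print(line.rstrip())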
Related
I've found an inconsistent behavior of python.
Under Windows, if the file changes, the program will notice; under Linux it will not.
I am using python 3.6.8 and Ubuntu 18.04.
Is this a bug, or am I doing something wrong?
import time

if __name__ == '__main__':
    file = open('CurrentData.txt', 'r')
    while True:
        lines = file.readlines()
        print(lines)
        time.sleep(1)
        file.seek(0)
    file.close()
The only thing that's wrong with your Python program is that it's making unfounded assumptions.
There are two different ways to change a file's contents in UNIX:
You can modify the file in-place, changing the contents of the existing inode; seek()ing back to the front and rereading will see that, so if your file were edited with this method, your existing code would work.
You can create a whole new inode, write the contents, and only after the write is successful rename() it over the old one.
That's often considered the better practice, because it means programs that were in the middle of reading your old file keep the handle they had; they won't see surprising, inconsistent, or broken behavior because the contents changed out from under them. If you do it right (which might involve fsync() calls not just on the file but also on the directory it's in), a writer using this method can also ensure that in the event of a power loss the system will have one copy of the file or the other, but not the half-written intermediate state you can get if you truncate an existing inode and rewrite it from the beginning.
If you want to handle both cases, you can't hang onto your existing handle, but should actually re-open() the file when you want to see changes.
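A minimal sketch of that re-open-each-time approach, using the filename from the question (the 1-second poll interval is arbitrary):
import time

if __name__ == '__main__':
    while True:
        # Re-opening picks up both in-place edits and rename()-style replacements.
        with open('CurrentData.txt', 'r') as f:
            lines = f.readlines()
        print(lines)
        time.sleep(1)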
I'm having a bit of a conceptual problem. For writing to the file "to_file", this works:
out_file = open(to_file, 'w')
out_file.write(indata)
...but this doesn't:
(open(to_file, 'w')).write(indata)
In theory, shouldn't swapping out a variable's (out_file) definition for the variable itself produce the same result? I'm confused as to why the extra step of creating the variable is necessary.
As others have pointed out, your code will actually open and write to the file. However,...
In your second, single-line code, you now have no reference to the open file. Therefore you have no way to close it or do anything else with it.
Leaving a file open is a resource leak. If your program exits right away, Python will try to close the file just before ending, but it could possibly fail, for a variety of reasons. For example, a removable disk drive containing the file may be removed after you write to it but before your program ends; that could make the file unreadable on the removable drive, and I have seen this happen. And if your program does not exit right away, you have an extra resource hanging around, taking memory and other resources that need not be taken. If your program runs for a long time, the accumulating resources could slow down or stop the computer.
Even if your program will close right away, this is a bad habit to develop. You don't just want to write programs, you want to write code that will work well in a variety of situations. You may think "I will never use this code in a long-running program." Such declarations often turn out to be mistaken. Coding is difficult enough--don't make it harder for yourself. Avoid the "anti-pattern" of your second example.
There is a better pattern in Python for such things, using the with statement. Read that link and use that pattern rather than either of your two examples.
with open(to_file, 'w') as out_file:
    out_file.write(indata)
Those two lines opened the file, wrote the data to the file, then closed the file. If you want to do more with the file before it is closed, put that code in the indented section under the with statement.
In Python 2.7, both of your provided examples will work and write to the file.
Usually when I open files I never call the close() method, and nothing bad happens. But I've been told this is bad practice. Why is that?
For the most part, not closing files is a bad idea, for the following reasons:
It puts your program in the garbage collector's hands: though the file will in theory be closed automatically, it may not be. Python 3 and CPython generally do a pretty good job at garbage collecting, but not always, and other variants generally suck at it.
It can slow down your program. Too many things open, and thus more RAM in use, will impact performance.
For the most part, writes to a file in Python don't actually hit the disk until the file is flushed or closed, so if your script edits a file, leaves it open, and then reads it back, it may not see the edits.
You could, theoretically, run into limits on how many files you can have open (see the sketch after this list).
As #sai stated below, Windows treats open files as locked, so legit things like AV scanners or other python scripts can't read the file.
It is sloppy programming (then again, I'm not exactly the best at remembering to close files myself!)
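On Unix-like systems you can inspect that per-process limit yourself; a small sketch (the resource module is not available on Windows):
import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit: %d, hard limit: %d" % (soft, hard))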
Found some good answers:
(1) It is a matter of good programming practice. If you don't close
them yourself, Python will eventually close them for you. In some
versions of Python, that might be the instant they are no longer
being used; in others, it might not happen for a long time. Under
some circumstances, it might not happen at all.
(2) When writing to a file, the data may not be written to disk until
the file is closed. When you say "output.write(...)", the data is
often cached in memory and doesn't hit the hard drive until the file
is closed. The longer you keep the file open, the greater the
chance that you will lose data.
(3) Since your operating system has strict limits on how many file
handles can be kept open at any one instant, it is best to get into
the habit of closing them when they aren't needed and not wait for
"maid service" to clean up after you.
(4) Also, some operating systems (Windows, in particular) treat open
files as locked and private. While you have a file open, no other
program can also open it, even just to read the data. This spoils
backup programs, anti-virus scanners, etc.
http://python.6.x6.nabble.com/Tutor-Why-do-you-have-to-close-files-td4341928.html
https://docs.python.org/2/tutorial/inputoutput.html
Open files use resources and may be locked, preventing other programs from using them. Anyway, it is good practice to use with when reading files, as it takes care of closing the file for you.
with open('file', 'r') as f:
    read_data = f.read()
Here's an example of something "bad" that might happen if you leave a file open.
Open a file for writing in your Python interpreter, write a string to it, then open that file in a text editor. On my system, the file will be empty until I close the file handle.
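If you want to reproduce that, here's a small sketch (the filename is arbitrary, and the exact buffering behavior varies by platform and Python version):
f = open("demo.txt", "w")
f.write("hello, buffer\n")
# The data may still sit in Python's internal buffer at this point,
# so an external text editor will likely show an empty file.
f.close()
# After close() the buffer is flushed and the text is visible to other programs.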
The close() method of a file object flushes any unwritten information and closes the file object, after which no more writing can be done.
Python automatically closes a file when the reference to the file object is reassigned to another file. Still, it is good practice to close files explicitly with the close() method; see the documentation for close() for details. I hope this helps.
You only have to call close() when you're writing to a file.
Python automatically closes files most of the time, but sometimes it won't, so you want to call it manually just in case.
I had a problem with that recently:
I was writing some stuff to a file in a for loop, but when I interrupted the script with ^C, a lot of data that should already have been written to the file wasn't there; it looks like Python just stops writing at that point. I had opened the file before the for loop, so I changed the code so that Python opens and closes the file on every single pass of the loop.
Basically, if you're writing something just for yourself and you don't run into any issues, it's fine; if you're writing something for more people than just yourself, put a close() in the code, because someone could otherwise get a random error, and you should try to prevent that.
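For what it's worth, a with block gives the same protection without reopening the file on every pass. A sketch, where generate_lines() is a hypothetical stand-in for the real loop body:
def generate_lines():
    # Hypothetical data source standing in for the real loop body.
    for i in range(1000000):
        yield "line %d\n" % i

with open("output.log", "w") as log:
    for line in generate_lines():
        log.write(line)
# Even if the loop is interrupted with Ctrl-C, leaving the with block
# closes the file, which flushes whatever had been written so far.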
I am writing a program in which I would like to be able to view a log file before the program is complete. I have noticed that, in Python (2.7 and 3), file.write() does not save the data to the file; file.close() does. I don't want to create a million little log files with unique names, but I would like to be able to view the updated log file before the program is finished. How can I do this?
Now, to be clear I am scripting using Ansys Workbench (trying to batch some CFX runs). Here's a link to a tutorial that shows what I'm talking about. They appear to have wrapped python, and by running the script I can send commands to the various modules. When the script is running there is no console onscreen and it appears to be eating all of the print statements, so the only way I can report what's happening is via a file. Also, I don't want to bring a console window up because eventually I will just run the program in batch mode (no interface). But the simulations take a long time to run and I can't wait for the program to finish before checking on what's happening.
You would need this:
import os

file.flush()
# flush() alone is typically enough; os.fsync() ensures the data actually reaches the disk
os.fsync(file.fileno())
Check this: http://docs.python.org/2/library/stdtypes.html#file.flush
file.flush()
Flush the internal buffer, like stdio's fflush(). This may be a no-op on some file-like objects.
Note flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.
EDITED: See this question for a detailed explanation: what exactly is Python's file.flush() doing?
Does file.flush() after each write help?
Hannu
This will write the file to disk immediately:
file.flush()
os.fsync(file.fileno())
According to the documentation https://docs.python.org/2/library/os.html#os.fsync
Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function.
If you’re starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk.
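Putting that together for the log-file case, a rough sketch (the filename and loop body are placeholders for the real run):
import os
import time

log = open("run.log", "w")
for step in range(10):             # placeholder for the real simulation loop
    log.write("step %d done\n" % step)
    log.flush()                    # push Python's buffer to the OS
    os.fsync(log.fileno())         # ask the OS to commit it to disk
    time.sleep(1)
log.close()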
I'm working with a single txt file of about 4 MB, and the file needs frequent I/O, such as appending new lines, searching for lines that contain specific phrases, replacing one line with another, etc.
In order to process the file "at the same time", threading.RLock() is used to lock the resource while it is being operated on. As it's not a big file, I simply use readlines() to read it all into a list and do the search job there, and I also use read() to read the whole file into a string FileContent and use FileContent.replace("demo", "test") to replace certain phrases with whatever I want.
But the problem is that I occasionally get a "MemoryError", sometimes every 3 or 4 days, sometimes after a week or so. I've checked my code carefully and there's no unclosed file object when each thread ends. For the file operations, I simply use:
CurrentFile = open("TestFile.txt", "r")
FileContent = CurrentFile.read()
CurrentFile.close()
I think maybe Python is not deleting unused variables as fast as I expected, which eventually results in running out of memory, so I'm considering using a with statement, which might be quicker about garbage collection. I'm not experienced with that statement; does anybody know if it would help? Or is there a better solution for my problem?
Thanks a lot.
Added: My script does lots of replacements in a short period of time, so my guess is that maybe hundreds of threads each holding FileContent = CurrentFile.read() could run the process out of memory if FileContent is not deleted quickly enough? How do I debug such a problem?
Without seeing more of your code, it's impossible to know why you are running out of memory. The with statement is the preferred way to open files and close them when done though:
with open("TestFile.txt", "r") as current_file:
file_content = current_file.read()
(sorry, UpperCamelCase for variables just doesn't look right to me...)
Frankly, I doubt this will solve your problem if you are really closing files as you show in the question, but it's still good practice.
Sounds like you are leaking memory. Python will use all available system memory before raising MemoryError, and 4 MB does not sound like much. Where you leak memory depends on your code, which you didn't give in your question.
Have you watched the memory usage in the OS's task manager?
Here is a tool for debugging Python memory usage (it needs a Python debug compilation):
http://guppy-pe.sourceforge.net/#Heapy
Use it to analyze your code memory usage and see what objects you are creating which don't get freed.
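For reference, basic Heapy usage looks roughly like this; a hedged sketch rather than a recipe for this specific leak (the guppy package must be installed first):
from guppy import hpy

hp = hpy()
hp.setrelheap()          # measure allocations from this point on
# ... run the suspect file-processing code here ...
print(hp.heap())         # summary of live objects, largest consumers first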