Why a variable is required to open files - Python

I'm having a bit of a conceptual problem. For writing to the file "to_file", this works:
out_file = open(to_file, 'w')
out_file.write(indata)
...but this doesn't:
(open(to_file, 'w')).write(indata)
In theory, shouldn't swapping out a variable's (out_file) definition for the variable itself produce the same result? I'm confused as to why the extra step of creating the variable is necessary.

As others have pointed out, your code will actually open and write to the file. However,...
In your second, single-line code, you now have no reference to the open file. Therefore you have no way to close it or do anything else with it.
Leaving a file open is a resource leak. If your program exits right away, Python will try to close the file just before ending. But Python could possibly fail, for a variety of reasons. For example, the removable disk drive containing the file may be removed after you write to the file but before your program ends. That could make the file unreadable on the removable drive--and I have seen this happen. And if your program does not exit right away, you have this extra resource hanging around that takes memory and other resources that need not be taken. If your program continues for a long time, the accumulating open resources could slow down or stop the computer.
Even if your program will close right away, this is a bad habit to develop. You don't just want to write programs, you want to write code that will work well in a variety of situations. You may think "I will never use this code in a long-running program." Such declarations often turn out to be mistaken. Coding is difficult enough--don't make it harder for yourself. Avoid the "anti-pattern" of your second example.
There is a better pattern in Python for such things, using the with statement. Read that link and use that pattern rather than either of your two examples.
with open(to_file, 'w') as out_file:
    out_file.write(indata)
Those two lines opened the file, wrote the data to the file, then closed the file. If you want to do more with the file before it is closed, put that code in the indented section under the with statement.
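For example, here's a sketch that does a little more inside the block (the extra write and the closed checks are just for illustration, assuming to_file and indata are defined as in your question):

with open(to_file, 'w') as out_file:
    out_file.write(indata)      # write the data
    out_file.write('\n')        # do more work while the file is still open
    print(out_file.closed)      # False: the file is open inside the block
print(out_file.closed)          # True: closed automatically when the block ended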

In Python 2.7, both of your provided examples will work and write to the file.

Related

Why does seeking to the beginning of a file to reread it see updates on Windows but not Linux?

I've found what looks like inconsistent behavior in Python.
Under Windows, the program notices when the file changes. Under Linux, it does not.
I am using Python 3.6.8 and Ubuntu 18.04.
Is this a bug, or am I doing something wrong?
import time

if __name__ == '__main__':
    file = open('CurrentData.txt', 'r')
    while True:
        lines = file.readlines()
        print(lines)
        time.sleep(1)
        file.seek(0)
    file.close()
The only thing that's wrong with your Python program is that it's making unfounded assumptions.
There are two different ways to change a file's contents in UNIX:
You can modify the file in-place, changing the contents of the existing inode; seek()ing back to the front and rereading will see that, so if your file were edited with this method, your existing code would work.
You can create a whole new inode, write the contents, and only after the write is successful rename() it over the old one.
That's often considered the better practice, because it means programs that were in the middle of reading your old file will retain the handle they had; they won't see surprising/inconsistent/broken behavior because the contents changed out from under them. If you do it right (which might involve fsync() calls on not just the file but also the directory it's in), a writer using this method can also ensure that after a power loss the system will have one copy of the file or the other, but not the half-written intermediate state you can get if you truncate an existing inode and rewrite it from the beginning.
If you want to handle both cases, you can't hang onto your existing handle, but should actually re-open() the file when you want to see changes.
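A minimal sketch of that approach, reworking the loop from the question to re-open the file on every pass:

import time

if __name__ == '__main__':
    while True:
        # Re-opening picks up both in-place edits and rename()-style replacements.
        with open('CurrentData.txt', 'r') as file:
            lines = file.readlines()
        print(lines)
        time.sleep(1)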

Is there a need to close files that have no reference to them?

As a complete beginner to programming, I am trying to understand the basic concepts of opening and closing files. One exercise I am doing is creating a script that allows me to copy the contents from one file to another.
in_file = open(from_file)
indata = in_file.read()
out_file = open(to_file, 'w')
out_file.write(indata)
out_file.close()
in_file.close()
I have tried to shorten this code and came up with this:
indata = open(from_file).read()
open(to_file, 'w').write(indata)
This works and looks a bit more efficient to me. However, this is also where I get confused. I think I left out the references to the opened files; there was no need for the in_file and out_file variables. However, does this leave me with two files that are open, but have nothing referring to them? How do I close these, or is there no need to?
Any help that sheds some light on this topic is much appreciated.
The pythonic way to deal with this is to use the with context manager:
with open(from_file) as in_file, open(to_file, 'w') as out_file:
    indata = in_file.read()
    out_file.write(indata)
Used with files like this, with will ensure all the necessary cleanup is done for you, even if read() or write() throw errors.
The default Python interpreter, CPython, uses reference counting. This means that once there are no references to an object, it gets garbage collected, i.e. cleaned up.
In your case, doing
open(to_file, 'w').write(indata)
will create a file object for to_file, but not assign it to a name - this means there is no reference to it. You cannot possibly manipulate the object after this line.
CPython will detect this, and clean up the object after it has been used. In the case of a file, this means closing it automatically. In principle, this is fine, and your program won't leak memory.
The "problem" is this mechanism is an implementation detail of the CPython interpreter. The language standard explicitly gives no guarantee for it! If you are using an alternate interpreter such as pypy, automatic closing of files may be delayed indefinitely. This includes other implicit actions such as flushing writes on close.
This problem also applies to other resources, e.g. network sockets. It is good practice to always explicitly handle such external resources. Since Python 2.6, the with statement makes this elegant:
with open(to_file, 'w') as out_file:
    out_file.write(indata)
TLDR: It works, but please don't do it.
You asked about the "basic concepts", so let's take it from the top: When you open a file, your program gains access to a system resource, that is, to something outside the program's own memory space. This is basically a bit of magic provided by the operating system (a system call, in Unix terminology). Hidden inside the file object is a reference to a "file descriptor", the actual OS resource associated with the open file. Closing the file tells the system to release this resource.
As an OS resource, the number of files a process can keep open is limited: Long ago the per-process limit was about 20 on Unix. Right now my OS X box imposes a limit of 256 open files (though this is an imposed limit, and can be raised). Other systems might set limits of a few thousand, or in the tens of thousands (per user, not per process in this case). When your program ends, all resources are automatically released. So if your program opens a few files, does something with them and exits, you can be sloppy and you'll never know the difference. But if your program will be opening thousands of files, you'll do well to release open files to avoid exceeding OS limits.
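On Unix-like systems you can ask the OS for this limit yourself; here's a quick sketch using the standard resource module (which is not available on Windows):

import resource  # Unix-only standard-library module

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)  # e.g. 256 and 10240 on a default OS X setup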
There's another benefit to closing files before your process exits: if you opened a file for writing, closing it will first "flush its output buffer". This means that I/O libraries optimize disk use by collecting ("buffering") what you write out, and saving it to disk in batches. If you write text to a file and immediately try to reopen and read it without first closing the output handle, you'll find that not everything has been written out. Also, if your program is stopped too abruptly (by a signal, or occasionally even through normal exit), the output might never be flushed.
There's already plenty of other answers on how to release files, so here's just a brief list of the approaches:
Explicitly with close(). (Note for python newbies: Don't forget the parens! My students like to write in_file.close, which does nothing.)
Recommended: Implicitly, by opening files with the with statement. The close() method will be called when the end of the with block is reached, even in the event of abnormal termination (from an exception).
with open("data.txt") as in_file:
data = in_file.read()
Implicitly, by the reference counter or garbage collector, if your Python engine implements it. This is not recommended, since it is not entirely portable; see the other answers for details. That's why the with statement was added to Python.
Implicitly, when your program ends. If a file is open for output, this may run a risk of the program exiting before everything has been flushed to disk.
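For comparison, here is roughly what the with statement saves you from writing by hand: a plain try/finally around the same read.

in_file = open("data.txt")
try:
    data = in_file.read()
finally:
    in_file.close()  # runs even if read() raises an exception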
It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. It is also much shorter than writing equivalent try-finally blocks:
>>> with open('workfile', 'r') as f:
...     read_data = f.read()
>>> f.closed
True
The answers so far are absolutely correct when working in python. You should use the with open() context manager. It's a great built-in feature, and helps shortcut a common programming task (opening and closing a file).
However, since you are a beginner and won't have access to context managers and automatic reference counting in every language you work with over your career, I'll address the question from a general programming stance.
The first version of your code is perfectly fine. You open a file, save the reference, read from the file, then close it. This is how a lot of code is written when the language doesn't provide a shortcut for the task. The only thing I would improve is to move in_file.close() up next to where you open and read the file: once you have read the file, you have its contents in memory and no longer need it to be open.
in_file = open(from_file)
indata = in_file.read()
in_file.close()

out_file = open(to_file, 'w')
out_file.write(indata)
out_file.close()
A safe way to open files without having to worry that you didn't close them is like this:
with open(from_file, 'r') as in_file:
    in_data = in_file.read()

with open(to_file, 'w') as out_file:
    out_file.write(in_data)

Why should I close files in Python? [duplicate]

Usually when I open files I never call the close() method, and nothing bad happens. But I've been told this is bad practice. Why is that?
For the most part, not closing files is a bad idea, for the following reasons:
It puts your program in the garbage collector's hands - though the file in theory will be auto-closed, it may not be closed promptly, or at all. Python 3 and CPython generally do a pretty good job at garbage collecting, but not always, and other variants generally suck at it.
It can slow down your program. Too many open files means more used space in RAM, which will impact performance.
For the most part, many changes to files in Python do not take effect until after the file is closed, so if your script edits a file, leaves it open, and then reads it back, it won't see the edits.
You could, theoretically, run in to limits of how many files you can have open.
As #sai stated below, Windows treats open files as locked, so legit things like AV scanners or other python scripts can't read the file.
It is sloppy programming (then again, I'm not exactly the best at remembering to close files myself!)
Found some good answers:
(1) It is a matter of good programming practice. If you don't close them yourself, Python will eventually close them for you. In some versions of Python, that might be the instant they are no longer being used; in others, it might not happen for a long time. Under some circumstances, it might not happen at all.
(2) When writing to a file, the data may not be written to disk until the file is closed. When you say "output.write(...)", the data is often cached in memory and doesn't hit the hard drive until the file is closed. The longer you keep the file open, the greater the chance that you will lose data.
(3) Since your operating system has strict limits on how many file handles can be kept open at any one instant, it is best to get into the habit of closing them when they aren't needed and not wait for "maid service" to clean up after you.
(4) Also, some operating systems (Windows, in particular) treat open files as locked and private. While you have a file open, no other program can also open it, even just to read the data. This spoils backup programs, anti-virus scanners, etc.
http://python.6.x6.nabble.com/Tutor-Why-do-you-have-to-close-files-td4341928.html
https://docs.python.org/2/tutorial/inputoutput.html
Open files use resources and may be locked, preventing other programs from using them. Anyway, it is good practice to use with when reading files, as it takes care of closing the file for you.
with open('file', 'r') as f:
    read_data = f.read()
Here's an example of something "bad" that might happen if you leave a file open.
Open a file for writing in your python interpreter, write a string to it, then open that file in a text editor. On my system, the file will be empty until I close the file handle.
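A sketch of that experiment in the interactive interpreter (exact behavior depends on your OS and buffer sizes; in Python 3, write() echoes the number of characters written):

>>> f = open('test.txt', 'w')
>>> f.write('hello')
5
>>> # open test.txt in a text editor now: it may still be empty
>>> f.close()
>>> # after close(), the editor sees 'hello'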
The close() method of a file object flushes any unwritten information and closes the file object, after which no more writing can be done.
Python automatically closes a file when the file object is reassigned to another file, but it is good practice to use the close() method explicitly. I hope this helps.
You only have to call close() when you're writing to a file.
Python automatically closes files most of the time, but sometimes it won't, so you want to call it manually just in case.
I had a problem with that recently:
I was writing some stuff to a file in a for-loop, but if I interrupted the script with ^C, a lot of data that should have been written to the file wasn't there. It looks like Python stops writing there for no reason. I had opened the file before the for-loop. Then I changed the code so that Python opens and closes the file on every single pass of the loop.
Basically, if you write stuff just for yourself and you don't have any issues, it's fine; but if you write stuff for more people than just yourself, put a close() inside the code, because someone could randomly get an error message and you should try to prevent that.
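A minimal sketch of that per-pass pattern (the filename and loop body are made up for illustration):

for i in range(1000):
    with open('results.txt', 'a') as f:  # re-open in append mode each pass
        f.write('entry %d\n' % i)
    # the file is closed (and flushed) here, so an interrupt loses at most one entry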

What does it mean to flush file contents in Python?

I am trying to teach myself Python by reading documentation. I am trying to understand what it means to flush a file buffer. According to documentation, "file.flush" does the following.
Flush the internal buffer, like stdio's fflush().
This may be a no-op on some file-like objects.
I don't know what "internal buffer" and "no-op" mean, but I think it says that flush writes data from some buffer to a file.
Hence, I ran this script, toggling the comment (pound sign) on the line in the middle.
with open("myFile.txt", "w+") as file:
file.write("foo")
file.write("bar")
# file.flush()
file.write("baz")
file.write("quux")
However, I seem to get the same myFile.txt with and without the call to file.flush(). What effect does file.flush() have?
Python buffers writes to files. That is, file.write returns before the data is actually written to your hard drive. The main motivation of this is that a few large writes are much faster than many tiny writes, so by saving up the output of file.write until a bit has accumulated, Python can maintain good writing speeds.
file.flush forces the data to be written out at that moment. This is handy when you know that it might be a while before you have more data to write out, but you want other processes to be able to view the data you've already written. Imagine a log file that grows slowly. You don't want to have to wait ages before enough entries have built up to cause the data to be written out in one big chunk.
In either case, file.close causes the remaining data to be flushed, so "quux" in your code will be written out as soon as file (which is a really bad name as it shadows the builtin file constructor) falls out of scope of the with context manager and gets closed.
Note: your OS does some buffering of its own, but I believe every OS where Python is implemented will honor file.flush's request to write data out to the drive. Someone please correct me if I'm wrong.
By the way, "no-op" means "no operation", as in it won't actually do anything. For example, StringIO objects manipulate strings in memory, not files on your hard drive. StringIO.flush probably just immediately returns because there's not really anything for it to do.
Buffer content might be cached to improve performance. Flush makes sure that the content is written to disk completely, avoiding data loss. It is also useful when, for example, you want the line asking for user input printed completely on-screen before the next file operation takes place.

Why truncate when we open a file in 'w' mode in Python

I am going through Zed Shaw's Python Book. I am currently working on the opening and reading files chapters. I am wondering why we need to do a truncate, when we are already opening the file in a 'w' mode?
print "Opening the file..."
target = open(filename, 'w')
print "Truncating the file. Goodbye!"
target.truncate()
It's redundant since, as you noticed, opening in write mode will overwrite the file. More information at Input and Output section of Python documentation.
So Zed Shaw calls truncate() on a file that is already truncated. OK, that's pretty pointless. Why does he do that? Who knows!? Ask him!
Maybe he does it to show that the method exists? Could be, but that would be pretty daft, since I've never needed to truncate a file in my 15 years as a programmer so it has no place in a newbie book.
Maybe he does it because he thinks he has to truncate the file, and he simply isn't aware that it's pointless?
Maybe he does it intentionally to confuse newbies? That would fit with his general modus operandi, which seems to be to intentionally piss people off for absolutely no reason.
Update: The reason he does this is now clear. In later editions he lists this question as a "common question" in the chapter, and tells you to go read the docs. It's hence there to:
Teach you to read the documentation.
Make you understand every part of the code you copy-paste from somewhere before you use it.
You can debate if this is good teaching style or not, I wouldn't know.
The number of "Help I don't understand Zed Shaws book"-questions on SO had dwindled, so I can't say that it's any worse than any other book out there, which probably means it's better than many. :-)
If you would READ the exercise before asking, he answers this for you:
Extra Credit: " If you feel you do not understand this, go back
through and use the comment trick to get it squared away in your mind.
One simple English comment above each line will help you understand,
or at least let you know what you need to research more.
Write a script similar to the last exercise that uses read and argv to
read the file you just created.
There's too much repetition in this file. Use strings, formats, and
escapes to print out line1, line2, and line3 with just one
target.write() command instead of 6.
Find out why we had to pass a 'w' as an extra parameter to open. Hint:
open tries to be safe by making you explicitly say you want to write a
file.
If you open the file with 'w' mode, then do you really need the
target.truncate()?
Go read the docs for Python's open function and see if that's true." -
Zed Shaw.
He explicitly wants you to find these things out for yourself, this is why his extra credit is important.
He also EXPLICITLY states that he wants you to PAY ATTENTION TO DETAIL. Every little thing matters.
While it's not useful to truncate when opening in 'w' mode, it is useful in 'r+'. Though that's not the OP's question, I'm going to leave this here for anyone who gets led here by Google as I did.
Let's say you open (with mode 'r+'; remember, there is no 'rw' mode) a 5-line indented JSON file and modify the json.load-ed object so that it serializes to only 3 lines. If you just target.seek(0) before writing the data back to the file, you will end up with 2 lines of trailing garbage; if you also target.truncate(), you will not.
I know this seems obvious, but I'm here because I am fixing a bug that occurred after an object that stayed the exact same size for years... shrank because of a signing algorithm change. (What is not obvious is the unit tests I had to add to prevent this in the future. I wrote my longest docstring ever explaining why I'm testing signing with 2 ridiculously contrived algorithms.)
Hope this helps someone.
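For the record, a stripped-down sketch of that shrinking-JSON situation (the filename and key are hypothetical):

import json

with open('config.json', 'r+') as target:
    data = json.load(target)
    del data['obsolete_key']           # the object shrinks (hypothetical key)
    target.seek(0)                     # rewind before rewriting
    json.dump(data, target, indent=2)
    target.truncate()                  # cut off the leftover tail of the old text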
With truncate(), you can declare how much of the file you want to keep, either by passing a byte size or based on where you currently are in the file. Called without an argument, truncate() cuts the file off at the current position, whereas 'w' always just wipes the whole file clean at open time. So the two can act identically (truncating at position 0 empties the file), but they don't necessarily.
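A sketch of the difference, starting from a file that contains 'hello world' (the filename is illustrative):

with open('demo.txt', 'w') as f:    # start with known contents
    f.write('hello world')

with open('demo.txt', 'rb+') as f:
    f.truncate(5)                   # keep only the first 5 bytes: b'hello'

with open('demo.txt', 'rb+') as f:
    f.seek(2)
    f.truncate()                    # no argument: cut at the current position, leaving b'he'

with open('demo.txt', 'w'):
    pass                            # mode 'w' wipes the whole file at open time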
That's just a reflection of the standard POSIX semantics; see man fopen(3). Python just wraps that.
When you open a file in write mode, you truncate the original (everything that was there before is deleted), and whatever you write is then added from the beginning of the file. If you instead want to keep the existing contents and add new data at the end, open the file in append mode (with 'a' or 'a+' as the argument).
Recently came across a scenario where I needed to create big files for test purposes. One quick way to do this is to use truncate:
with open('filename.bin', 'wb') as f:
    f.truncate(1024 * 1024 * 1024)  # 1GB
The file has no content, but reports to the OS the size you want and works in many testing scenarios.
Scenario:
I was making a ransomware proof-of-concept and needed to encrypt files. My aim was not to encrypt the complete file, but only enough of it to corrupt it, because I wanted it to be fast, saving the time of encrypting everything, so I decided to edit only some of the text.
Now, if I used write, my purpose would be defeated, because I would have to rewrite the file from a to z. So what could I do?
Well, here truncate can be put to use.
Below is my code, which just chops off the last 16 characters of a file:
with open('saver.txt', 'rb+') as f:
    text_len = len(f.read())     # size of the whole file in bytes
    f.truncate(text_len - 16)    # chop off the last 16 bytes
    # no explicit close() needed: the with block closes the file
I open the file.
I truncate away only the last 16 characters of the file, which I will replace later.
Notice I am opening it in 'rb+' (read-and-update) mode; if I opened it in write mode instead, the file would be truncated completely before my truncate() call ever ran, and there would be nothing left to read.
Answering this question after 8.4 years. :)
