Why truncate when we open a file in 'w' mode in python

I am going through Zed Shaw's Python Book. I am currently working on the opening and reading files chapters. I am wondering why we need to do a truncate, when we are already opening the file in a 'w' mode?
print "Opening the file..."
target = open(filename, 'w')
print "Truncating the file. Goodbye!"
target.truncate()

It's redundant since, as you noticed, opening in write mode will overwrite the file. More information at Input and Output section of Python documentation.
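If you want to see this for yourself, here is a minimal sketch (Python 3 syntax; the file name demo.txt and the temporary directory are made up for the demonstration). It checks the file size right after open() and again after truncate():

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Put some content in the file first.
with open(path, "w") as target:
    target.write("old content\n")

# Re-opening in 'w' mode empties the file before anything is written.
target = open(path, "w")
size_after_open = os.path.getsize(path)      # the old content is already gone
target.truncate()                            # nothing left to remove
size_after_truncate = os.path.getsize(path)
target.close()

print(size_after_open, size_after_truncate)
```

Both sizes come out as 0, so the truncate() call changes nothing here.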

So Zed Shaw calls truncate() on a file that is already truncated. OK, that's pretty pointless. Why does he do that? Who knows!? Ask him!
Maybe he does it to show that the method exists? Could be, but that would be pretty daft, since I've never needed to truncate a file in my 15 years as a programmer so it has no place in a newbie book.
Maybe he does it because he thinks he has to truncate the file, and he simply isn't aware that it's pointless?
Maybe he does it intentionally to confuse newbies? That would fit with his general modus operandi, which seems to be to intentionally piss people off for absolutely no reason.
Update: The reason he does this is now clear. In later editions he lists this question as a "common question" in the chapter, and tells you to go read the docs. It's hence there to:
Teach you to read the documentation.
Understand every part of code you copy paste from somewhere before you copy-paste it.
You can debate if this is good teaching style or not, I wouldn't know.
The number of "Help I don't understand Zed Shaw's book" questions on SO has dwindled, so I can't say that it's any worse than any other book out there, which probably means it's better than many. :-)

If you READ the Extra Credit questions before asking, you'll see he answers it for you:
Extra Credit: "If you feel you do not understand this, go back through and use the comment trick to get it squared away in your mind. One simple English comment above each line will help you understand, or at least let you know what you need to research more.
Write a script similar to the last exercise that uses read and argv to read the file you just created.
There's too much repetition in this file. Use strings, formats, and escapes to print out line1, line2, and line3 with just one target.write() command instead of 6.
Find out why we had to pass a 'w' as an extra parameter to open. Hint: open tries to be safe by making you explicitly say you want to write a file.
If you open the file with 'w' mode, then do you really need the target.truncate()? Go read the docs for Python's open function and see if that's true." - Zed Shaw.
He explicitly wants you to find these things out for yourself, this is why his extra credit is important.
He also EXPLICITLY states that he wants you to PAY ATTENTION TO DETAIL. Every little thing matters.

While it's not useful to truncate when opening in 'w' mode, it is useful in 'r+'. Though that's not the OP's question, I'm going to leave this here for anyone who gets led here by Google as I did.
Let's say you open (with mode 'r+', remember there is no 'rw' mode) a 5 line indented json file and modify the json.load-ed object to be only 3 lines. If you target.seek(0) and write the data back to the file without truncating, you will end up with 2 lines of trailing garbage. If you call target.truncate() after writing, you will not.
I know this seems obvious, but I'm here because I am fixing a bug that occurred after an object that stayed the exact same size for years... shrank because of a signing algorithm change. (What is not obvious is the unit tests I had to add to prevent this in the future. I wrote my longest docstring ever explaining why I'm testing signing with 2 ridiculously contrived algorithms.)
Hope this helps someone.
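For anyone who wants to try it, here is a minimal sketch of that read-modify-write pattern (Python 3; the file name data.json and its contents are invented for the example, they are not from my real code):

```python
import json
import os
import tempfile

# Hypothetical file and data, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(path, "w") as f:
    json.dump({"a": 1, "b": 2, "c": 3, "d": 4}, f, indent=2)

with open(path, "r+") as target:
    data = json.load(target)
    del data["c"], data["d"]            # the object shrinks
    target.seek(0)                      # go back to the start of the file
    json.dump(data, target, indent=2)   # overwrite the beginning
    target.truncate()                   # cut off the stale tail of the old data

with open(path) as f:
    result = json.load(f)               # parses cleanly, no trailing garbage
print(result)
```

If you comment out the target.truncate() line, the final json.load fails, because the leftover lines of the old, longer document are still sitting at the end of the file.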

With truncate(), you can declare how much of the file you want to keep. truncate(size) resizes the file to size bytes; without parameters, truncate() cuts the file off at the current position. While the position is still 0 (for example right after opening in 'r+' mode), truncate() therefore acts like 'w', whereas 'w' always just wipes the whole file clean. So, these two can act identically, but they don't necessarily.
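A small sketch of the difference between cutting at the current position and cutting at an explicit size (Python 3; the file name letters.txt and the 10-byte content are just assumptions for the demo):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "letters.txt")  # hypothetical file
with open(path, "wb") as f:
    f.write(b"abcdefghij")        # 10 bytes

with open(path, "rb+") as f:
    f.seek(4)
    f.truncate()                  # no argument: cut at the current position
with open(path, "rb") as f:
    after_position_cut = f.read()

with open(path, "rb+") as f:
    f.truncate(2)                 # explicit size: keep only the first 2 bytes
with open(path, "rb") as f:
    after_size_cut = f.read()

print(after_position_cut, after_size_cut)
```

The first read gives b'abcd' and the second b'ab'.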

That's just a reflection of the standard POSIX semantics; see man fopen(3). Python 2's open just wraps that, and Python 3 keeps the same mode semantics.

When you open a file in write mode, you truncate the original (everything that was there before is deleted). Then whatever you write is added to the file, starting from the beginning. If you want to keep what is already there and add new data at the end instead, you want to use append (open the file with the 'a' or 'a+' argument).
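To make the difference between the two modes concrete, here is a short sketch (Python 3; log.txt is a made-up file name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")  # hypothetical file

with open(path, "w") as f:      # creates the file
    f.write("first\n")

with open(path, "a") as f:      # append: existing content is kept
    f.write("second\n")
with open(path) as f:
    appended = f.read()

with open(path, "w") as f:      # write: existing content is thrown away
    f.write("third\n")
with open(path) as f:
    overwritten = f.read()

print(repr(appended), repr(overwritten))
```

After the append the file holds both lines; after the second 'w' open only "third" is left.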

Recently came across a scenario where I needed to create big files for test purposes. One quick way to do this is to use truncate:
with open('filename.bin', 'wb') as f:
    f.truncate(1024 * 1024 * 1024)  # 1GB
The file has no real content (on most systems the added area simply reads back as zero bytes), but it reports to the OS the size you want and works in many testing scenarios.
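A scaled-down sketch you can run to check this (Python 3; it uses 1 MiB instead of 1GB and a temporary directory, both my own choices for the demo). The zero-filled content is what most platforms do, but the Python docs say it is platform dependent:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "filename.bin")  # hypothetical location
size = 1024 * 1024  # 1 MiB, kept small for the demonstration

with open(path, "wb") as f:
    f.truncate(size)             # grow the empty file to the requested size

reported_size = os.path.getsize(path)
with open(path, "rb") as f:
    first_bytes = f.read(16)     # zero bytes on most platforms

print(reported_size, first_bytes)
```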

Scenario:
I was making a ransomware and needed to encrypt the file, My aim is not to encrypt the complete file but that much only to corrupt it because I want it to be fast in what it does and so saving time in encrypting it all, so I decided to edit some text only.
Now
If I use write then my purpose is destroyed here because I would have to write the file a to z. Then what can I do?
well here truncate can be put in use.
Below is my code, which just cuts the last 16 bytes off a file:
with open('saver.txt', 'rb+') as f:
    text_len = len(f.read())
    f.truncate(text_len - 16)
I open the file.
I truncate only the last 16 bytes of the file, which will be replaced by me later.
Notice I am using 'rb+' mode, which opens the file for reading and writing without truncating it. If I used 'wb' mode, the file would be truncated completely as soon as it is opened, and the f.read() call would throw an error because the file is not readable in that mode.
Answering this question after 8.4 years. :)

Related

Is it important to specify file mode('w', 'r', 'a') when opening a file? What happens when you don't?

Hi I'm a beginner programmer and I recently started writing some code. I found out that when opening files, you can specify the file mode or the operation you want to do with the file.
For example: with open('file.txt', 'r') as file:
I tried experimenting and found out that if I just do with open('file.txt') as file: I can still do the things I want to do with the file, which is read its contents into memory.
My questions then are:
Is it really important to specify the file mode when accessing/opening it?
Is it bad if I don't specify?
Does not specifying only work with python?
Thanks in advance

Hi @Mendrezzzzz, it's nice that you are starting programming.
No. If you just want to read the file, it is not needed, because 'r' is the default. It does make the code more readable, though: anyone who reads it later (including you) immediately knows in which mode the file is opened.
For readability, in my opinion, yes. But it doesn't break your code, so practically, no. Please consider readability for your future self or whoever reads your code.
This one is tricky. There are many languages where you have to give a mode for a file stream, or where you have to use completely different classes for the different modes (Java). But there surely are other languages that, like Python, make reading the default mode (I don't know every language).
I hope this helps you. I would also recommend learning coding conventions if you really want to learn programming.
Good Python tutorial https://automatetheboringstuff.com/2e/chapter1/
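If you want to convince yourself that leaving the mode out is the same as passing 'r', here is a small sketch (Python 3; the temporary file.txt is created only for the demo):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")  # hypothetical file
with open(path, "w") as f:
    f.write("hello\n")

with open(path) as implicit:           # no mode given
    implicit_mode = implicit.mode
    implicit_text = implicit.read()

with open(path, "r") as explicit:      # mode spelled out
    explicit_mode = explicit.mode
    explicit_text = explicit.read()

print(implicit_mode, explicit_mode, implicit_text == explicit_text)
```

Both file objects report 'r' as their mode and read the same text.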

Error! blahfile is not UTF-8 encoded. Saving disabled

So, I'm trying to write a gzip file, actually from the net, but to simplify I wrote some very basic test.
import gzip
LINES = [b'I am a test line' for _ in range(100_000)]
f = gzip.open('./test.text.gz', 'wb')
for line in LINES:
    f.write(line)
f.close()
It runs great, and I can see in Jupyter that it has created the test.text.gz file in the directory listing. So I click on it, expecting a whole host of garbage characters indicative of a binary file, like you would see in Notepad.
However, instead I get this ...
Error! test.text.gz is not UTF-8 encoded.
Saving disabled.
See console for more details
Which makes me think: oh my god, a coding error, something is wrong with my encoding, my saving. Can I save bytes? Am I using the correct routines? And then I spend 5 hours trying all combinations of code and modules.

The very simple answer to this is none of the above. This is a very misleading error message, especially when the code you've written was designed to save a binary file with a weird extension.
What this actually means is ...
I HAVE NO IDEA HOW TO DISPLAY THIS DATA ! - Yours Jupyter
So, go to your File Explorer or Finder, navigate to the just-saved file and open it. Voila!
Everything worked exactly as planned, there is no error.
Hope this saves other people many hours of debugging, and please Jupyter, change your error message.
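If you would rather check from code than from the file manager, a minimal sketch like this one reads the file back with gzip and compares it to what was written (Python 3; it writes to a temporary directory and uses fewer lines than the question, both my own choices):

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.text.gz")  # hypothetical location
LINES = [b"I am a test line" for _ in range(1000)]

with gzip.open(path, "wb") as f:
    for line in LINES:
        f.write(line)

# Decompress again: this is the real test that the file was saved correctly.
with gzip.open(path, "rb") as f:
    data = f.read()

# The raw file on disk is binary and starts with the gzip magic bytes,
# which is why a text viewer cannot decode it as UTF-8.
with open(path, "rb") as raw:
    magic = raw.read(2)

print(data == b"".join(LINES), magic)
```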

It is also possible to select the file and, instead of double-clicking to open it, go to 'view', which interprets it correctly (or well, mostly, depending on special characters; mine is in Spanish and apparently it doesn't support accents).
This way we avoid hunting for the directory that holds the file, and we don't have to leave Jupyter :)

Why a variable is required to open files

I'm having a bit of a conceptual problem. For writing to the file "to_file", this works:
out_file = open(to_file, 'w')
out_file.write(indata)
...but this doesn't:
(open(to_file, 'w')).write(indata)
In theory, shouldn't swapping out a variable's (out_file) definition for the variable itself produce the same result? I'm confused as to why the extra step of creating the variable is necessary.

As others have pointed out, your code will actually open and write to the file. However,...
In your second, single-line code, you now have no reference to the open file. Therefore you have no way to close it or do anything else with it.
Leaving a file open is a resource leak. If your program closes right away, Python will try to close the file just before ending. But Python could possibly fail, for a variety of reasons. For example, the removable disk drive containing the file may be removed after you write to the file but before your program ends. That could make the file unreadable on the removable drive, and I have seen this happen. And if your program does not close right away, you have this extra resource hanging around that takes memory and other resources that need not be taken. If your program continues for a long time, the growing resources could slow down or stop the computer.
Even if your program will close right away, this is a bad habit to develop. You don't just want to write programs, you want to write code that will work well in a variety of situations. You may think "I will never use this code in a long-running program." Such declarations often turn out to be mistaken. Coding is difficult enough--don't make it harder for yourself. Avoid the "anti-pattern" of your second example.
There is a better pattern in Python for such things, using the with statement. Use that pattern rather than either of your two examples.
with open(to_file, 'w') as out_file:
    out_file.write(indata)
Those two lines opened the file, wrote the data to the file, then closed the file. If you want to do more with the file before it is closed, put that code in the indented section under the with statement.
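You can check that the with statement really closes the file by looking at the file object's closed attribute. A minimal sketch (Python 3; the values of to_file and indata are placeholders I made up):

```python
import os
import tempfile

to_file = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical path
indata = "some data\n"                                 # hypothetical content

with open(to_file, "w") as out_file:
    out_file.write(indata)
    closed_inside = out_file.closed    # still open inside the block

closed_after = out_file.closed         # closed as soon as the block ends

print(closed_inside, closed_after)
```

This prints False True. With the one-liner from the question there is no name left to call close() on or to inspect like this.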

In Python 2.7, both of your provided examples will work and write to the file.

What does it mean to flush file contents in Python?

I am trying to teach myself Python by reading documentation. I am trying to understand what it means to flush a file buffer. According to documentation, "file.flush" does the following.
Flush the internal buffer, like stdio's fflush().
This may be a no-op on some file-like objects.
I don't know what "internal buffer" and "no-op" mean, but I think it says that flush writes data from some buffer to a file.
Hence, I ran this file toggling the pound sign in the line in the middle.
with open("myFile.txt", "w+") as file:
    file.write("foo")
    file.write("bar")
    # file.flush()
    file.write("baz")
    file.write("quux")
However, I seem to get the same myFile.txt with and without the call to file.flush(). What effect does file.flush() have?

Python buffers writes to files. That is, file.write returns before the data is actually written to your hard drive. The main motivation of this is that a few large writes are much faster than many tiny writes, so by saving up the output of file.write until a bit has accumulated, Python can maintain good writing speeds.
file.flush forces the data to be written out at that moment. This is handy when you know that it might be a while before you have more data to write out, but you want other processes to be able to view the data you've already written. Imagine a log file that grows slowly. You don't want to have to wait ages before enough entries have built up to cause the data to be written out in one big chunk.
In either case, file.close causes the remaining data to be flushed, so "quux" in your code will be written out as soon as the with block ends and file gets closed (file is a really bad name in Python 2, by the way, as it shadows the builtin file constructor).
Note: your OS does some buffering of its own. file.flush only hands the data from Python's buffer over to the OS, which is enough for other processes reading the file to see it; it does not guarantee that the data has reached the physical drive. For that, the documentation says to call file.flush followed by os.fsync(file.fileno()).
By the way, "no-op" means "no operation", as in it won't actually do anything. For example, StringIO objects manipulate strings in memory, not files on your hard drive. StringIO.flush probably just immediately returns because there's not really anything for it to do.
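Here is a small sketch that makes the effect visible by reading the file through a second handle while the first one is still open (Python 3; read_back is a helper I made up for the demo, and the file lives in a temporary directory):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myFile.txt")  # hypothetical location

def read_back():
    """Read the file through a separate handle, as another process would."""
    with open(path) as reader:
        return reader.read()

with open(path, "w") as f:
    f.write("foo")
    f.write("bar")
    # On CPython this is typically still empty: "foobar" sits in Python's buffer.
    before_flush = read_back()
    f.flush()                  # hand the buffered data to the OS
    os.fsync(f.fileno())       # optional: ask the OS to push it to the device
    after_flush = read_back()
    f.write("baz")

after_close = read_back()      # closing flushed the rest

print(repr(before_flush), repr(after_flush), repr(after_close))
```

In the question's script the final file is the same either way, because close() flushes everything at the end. The difference only shows up if something looks at the file in between.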

Buffer content might be cached to improve performance. Flush makes sure that the content is written to disk completely, avoiding data loss. It is also useful when, for example, you want the line asking for user input printed completely on-screen before the next file operation takes place.

does write mode create a new file if not existing?

I'm trying to write to a file that does not already exist using a file context manager.
a = open('C:/c.txt', 'w')
The above does not succeed. How would I create a file for writing if it does not already exist?

Yes, 'w' is specified as creating a new file -- as the docs put it,
'w' for writing (truncating the file
if it already exists),
(clearly implying it's allowed to not already exist). Please show the exact traceback, not just your own summary of it, as details matter -- e.g. if the actual path you're using is different, what's missing might be the drive, or some intermediate directory; or there might be permission problems.

[Edited to reflect that the problem is likely not forward vs. back slash]
If I understood correctly, you want the file to be automatically created for you, right?
open in write mode does create the file for you. It would be more clear if you told us the exact error you're getting. It might be something like you not having permission to write in C:.
I had previously suggested that it might be because of the forward slash, and indicated that the OP could try:
a = open(r'C:\c.txt', 'w')
Note the r before the file path, indicating a raw string (that is, the backslash won't be interpreted as special).
However, as Brian Neal pointed out (as well as others, commenting elsewhere), that's likely not the reason for the error. I'm keeping it here simply for historical purposes.

You most probably are trying to write to a directory that doesn't exist or one that you don't have permission writing to.
If you want to write to C:\foo\bar\foobar.txt then make sure that you've got a C:\foo\bar\ that exists (and in case permissions work on Windows, make sure you've got the permission to write there).
Now when you open the file in write mode, a file should be created.
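A minimal sketch of the missing-directory case (Python 3; it uses a temporary directory with foo/bar under it instead of C:\foo\bar so it can run anywhere, which is my own substitution):

```python
import os
import tempfile

base = tempfile.mkdtemp()                        # stand-in for C:\
target_dir = os.path.join(base, "foo", "bar")    # does not exist yet
target_path = os.path.join(target_dir, "foobar.txt")

# 'w' creates the file, but not the directories leading to it.
try:
    with open(target_path, "w") as a:
        a.write("never written\n")
    failed_without_dir = False
except OSError:                                  # FileNotFoundError on Python 3
    failed_without_dir = True

os.makedirs(target_dir, exist_ok=True)           # create foo/bar first
with open(target_path, "w") as a:
    a.write("created\n")

created = os.path.exists(target_path)
print(failed_without_dir, created)
```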

If you're asking how to be warned when the file doesn't exist, then you need to explicitly check for that.
See here
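One way such a check could look, as a rough sketch (Python 3; c.txt in a temporary directory is a stand-in for the real path). It shows an explicit os.path.exists test, and also that 'r+' mode, unlike 'w', refuses to create a missing file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "c.txt")  # hypothetical path

# Option 1: check explicitly before opening.
existed_before = os.path.exists(path)
if not existed_before:
    print("warning: file does not exist yet, 'w' would create it")

# Option 2: open with 'r+', which raises instead of creating the file.
try:
    with open(path, "r+") as a:
        pass
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

print(existed_before, missing_raises)
```

Keep in mind that with the explicit check another program could create or delete the file between the check and the open.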
