Having trouble understanding Python's 'truncate' and 'write'

I'm in the Python shell and I'm trying to understand the basics. Here is what I have typed:
doc = open('test.txt', 'w') # this opens the file or creates it if it doesn't exist
doc.write('blah blah')
doc.truncate()
I understand the first line. However, isn't the second line supposed to write 'blah blah' to the file? It doesn't do that. But when I call truncate() on the file, 'blah blah' suddenly shows up. Can someone explain to me how this logic works?
I thought truncate was supposed to erase the contents of the file? Why does it make the previous write line show up?

From the manual:
file.truncate([size])
[...] The size defaults to the current position [...]
So, since you have opened and written to the file, the current position is the end of the file: you are truncating at the end of the file. Hence the call has no effect other than flushing the buffers and getting the text written to disk (truncate flushes before truncating).
Try with truncate(0); that will empty the file.
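To see the buffering effect described above, here is a minimal sketch, assuming CPython 3 with default buffering. It uses a throwaway file in a temporary directory instead of test.txt, and demo_truncate() is a hypothetical helper. It checks the size on disk after write(), after truncate(), and after truncate(0).

```python
import os
import tempfile

def demo_truncate():
    # Use a temporary directory so the example does not touch a real test.txt.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.txt")
        doc = open(path, "w")               # creates the file (or empties it)
        doc.write("blah blah")              # held in Python's write buffer for now
        after_write = os.path.getsize(path)
        doc.truncate()                      # flushes, then truncates at the current position (the end)
        after_truncate = os.path.getsize(path)
        doc.truncate(0)                     # cuts the file down to 0 bytes
        after_truncate0 = os.path.getsize(path)
        doc.close()
        return after_write, after_truncate, after_truncate0

print(demo_truncate())  # (0, 9, 0) on CPython 3 with default buffering
```

The 0 after write() is the "nothing shows up yet" the question describes; truncate() makes the 9 bytes appear only because it flushes first.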

Here is the same thing with a context manager instead:
with open('test.txt', 'w') as doc:
    doc.write('blah blah')
    # doc.truncate()
The above will truncate to the current position, which is at the end of the file, meaning it doesn't truncate anything.
Do this instead; it truncates the file at the 0th byte, effectively clearing it:
doc.truncate(0)
I see from your comments that you may still be having trouble, trouble that may be solved by using the context manager:
>>> def foofile():
...     with open('/temp/foodeleteme', 'w') as foo:
...         foo.write('blah blah blah')
...         foo.truncate(0)
...     with open('/temp/foodeleteme') as foo:
...         print(foo.read())
...
>>> foofile()
This prints nothing (just an empty line).

If you don't specify the size parameter, the function uses the current position.
If you want to erase the contents of the file:
doc = open('test.txt', 'w') # this opens the file or creates it if it doesn't exist
doc.write('blah blah')
doc.truncate(0)
or better:
with open('test.txt', 'w') as doc:
    doc.write('blah blah')
    doc.truncate(0)
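One more thing to watch for in Python 3: the io documentation states that truncate() does not change the stream position. So after truncate(0), a following write() still lands at the old position, and the gap is typically filled with NUL bytes. A minimal sketch comparing truncate(0) alone with seek(0) followed by truncate(); the rewrite() helper and its rewind flag are hypothetical, for illustration only.

```python
import os
import tempfile

def rewrite(path, new_text, rewind=True):
    # Hypothetical helper: write "blah blah", then try to replace it with new_text.
    with open(path, "w") as doc:
        doc.write("blah blah")
        if rewind:
            doc.seek(0)       # move the position back to the start...
            doc.truncate()    # ...then truncate at that position (0)
        else:
            doc.truncate(0)   # file is emptied, but the position stays at 9
        doc.write(new_text)
    # Read the raw bytes back so any NUL padding is visible.
    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    p = os.path.join(tmp, "test.txt")
    print(rewrite(p, "x"))                # b'x'
    print(rewrite(p, "x", rewind=False))  # typically nine NUL bytes followed by b'x'
```

If you only want an empty file, truncate(0) is enough; if you plan to keep writing, seek(0) first.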

Writing into a file then reading it on Python 3.6.2

target=open("test.txt",'w+')
target.write('ffff')
print(target.read())
When running the script above (test.txt is an empty file), it prints an empty string.
However, when reopening the file, it can read it just fine:
target=open("test.txt",'w+')
target.write('ffff')
target=open("test.txt",'r')
print(target.read())
This prints out 'ffff' as needed.
Why is this happening? Is 'target' still recognized as having no content, even though I updated it in line 2, and I have to reassign test.txt to it?
A file has a read/write position. Writing to the file puts that position at the end of the written text; reading starts from the same position.
Put that position back to the start with the seek method:
with open("test.txt",'w+') as target:
    target.write('ffff')
    target.seek(0)  # to the start again
    print(target.read())
Demo:
>>> with open("test.txt",'w+') as target:
...     target.write('ffff')
...     target.seek(0)  # to the start again
...     print(target.read())
...
4
0
ffff
The numbers are the return values of target.write() and target.seek(); they are the number of characters written, and the new position.
No need to close and re-open it. You just need to seek back to the file's starting point before reading it:
with open("test.txt",'w+') as f:
    f.write('ffff')
    f.seek(0)
    print(f.read())
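To make the position visible, this sketch (Python 3, using a temporary file; write_then_read() is a hypothetical helper) records tell() right after the write, then reads with and without seeking back first:

```python
import os
import tempfile

def write_then_read(path, rewind):
    # Hypothetical helper mirroring the question's w+ example.
    with open(path, "w+") as target:
        target.write("ffff")
        pos = target.tell()      # position right after the write: the end of "ffff"
        if rewind:
            target.seek(0)       # go back to the start before reading
        return pos, target.read()

with tempfile.TemporaryDirectory() as tmp:
    p = os.path.join(tmp, "test.txt")
    print(write_then_read(p, rewind=False))  # (4, '')
    print(write_then_read(p, rewind=True))   # (4, 'ffff')
```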
Try flushing, then seeking the beginning of the file:
f = open(path, 'w+')
f.write('foo')
f.write('bar')
f.flush()
f.seek(0)
print(f.read())
One option is to close() the file and reopen it before reading. Reading and writing through the same file object without a flush() or seek() in between can give inconsistent results.

Python is reading past the end of the file. Is this a security risk? [duplicate]

Started Python a week ago and I have some questions to ask about reading and writing to the same files. I've gone through some tutorials online but I am still confused about it. I can understand simple read and write files.
openFile = open("filepath", "r")
readFile = openFile.read()
print readFile
openFile = open("filepath", "a")
appendFile = openFile.write("\nTest 123")
openFile.close()
But, if I try the following I get a bunch of unknown text in the text file I am writing to. Can anyone explain why I am getting such errors and why I cannot use the same openFile object the way shown below.
# I get an error when I use the code below:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
readFile = openFile.read()
print readFile
openFile.close()
I will try to clarify my problem. In the example above, openFile is the object used to open the file. I have no problem writing to it the first time. But if I want to use the same openFile to read the file or append something to it, nothing happens or an error is raised. I have to create another file object for the same file before I can perform another read/write action on it.
#I have no problems if I do this:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
openFile2 = open("filepath", "r+")
readFile = openFile2.read()
print readFile
openFile.close()
I will be grateful if anyone can tell me what I did wrong here, or whether it is just a Python thing. I am using Python 2.7. Thanks!
Updated Response:
This seems like a bug specific to Windows - http://bugs.python.org/issue1521491.
Quoting from the workaround explained at http://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html
the effect of mixing reads with writes on a file open for update is entirely undefined unless a file-positioning operation occurs between them (for example, a seek()). I can't guess what you expect to happen, but seems most likely that what you intend could be obtained reliably by inserting fp.seek(fp.tell()) between read() and your write().
My original response demonstrates how reading/writing on the same file opened for appending works. It is apparently not true if you are using Windows.
Original Response:
In 'r+' mode, the write method writes the string to the file at the current pointer position. In your case the pointer is at the start of the file, so "Test abc" is written at the start of the file, overwriting the characters that were there. See an example below:
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\n'
>>> f.write("foooooooooooooo")
>>> f.close()
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\nfoooooooooooooo'
The string "foooooooooooooo" got appended at the end of the file since the pointer was already at the end of the file.
Are you on a system that differentiates between binary and text files? You might want to use 'rb+' as a mode in that case.
Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
http://docs.python.org/2/library/functions.html#open
Every open file has an implicit pointer which indicates where data will be read and written. Normally this defaults to the start of the file, but if you use a mode of a (append) then it defaults to the end of the file. It's also worth noting that the w mode will truncate your file (i.e. delete all the contents) even if you add + to the mode.
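A small sketch of those two mode rules, assuming Python 3 and a temporary file (mode_demo() is a hypothetical helper): writes in 'a' mode go to the end of the file, and 'w+' empties the file on open even though it also permits reading.

```python
import os
import tempfile

def mode_demo(path):
    # Hypothetical helper showing the 'a' and 'w+' rules.
    with open(path, "w") as f:
        f.write("abc")
    with open(path, "a") as f:
        f.write("def")          # 'a' starts at the end, so this is appended
    with open(path) as f:
        after_append = f.read()
    with open(path, "w+") as f:
        emptied = f.read()      # 'w+' truncated the file when it was opened
    return after_append, emptied

with tempfile.TemporaryDirectory() as tmp:
    print(mode_demo(os.path.join(tmp, "testfile.txt")))  # ('abcdef', '')
```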
Whenever you read or write N characters, the read/write pointer will move forward that amount within the file. I find it helps to think of this like an old cassette tape, if you remember those. So, if you executed the following code:
fd = open("testfile.txt", "w+")
fd.write("This is a test file.\n")
fd.close()
fd = open("testfile.txt", "r+")
print fd.read(4)
fd.write(" IS")
fd.close()
... It should end up printing This and then leave the file content as "This IS a test file." This is because the initial read(4) returns the first 4 characters of the file, since the pointer starts at the beginning of the file. It leaves the pointer at the space character just after This, so the following write(" IS") overwrites the next three characters with a space (the same as is already there) followed by IS, replacing the existing is.
You can use the seek() method of the file to jump to a specific point. After the example above, if you executed the following:
fd = open("testfile.txt", "r+")
fd.seek(10)
fd.write("TEST")
fd.close()
... Then you'll find that the file now contains "This IS a TEST file."
All this applies on Unix systems, and you can test those examples to make sure. However, I've had problems mixing read() and write() on Windows systems. For example, when I execute that first example on my Windows machine then it correctly prints This, but when I check the file afterwards the write() has been completely ignored. However, the second example (using seek()) seems to work fine on Windows.
In summary, if you want to read/write from the middle of a file in Windows I'd suggest always using an explicit seek() instead of relying on the position of the read/write pointer. If you're doing only reads or only writes then it's pretty safe.
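Here is a sketch of the two examples above rewritten for Python 3 with an explicit seek() between the read and the write, as the quoted workaround suggests. It uses binary mode ('wb'/'rb+') so that offsets are plain byte counts and Windows newline translation cannot shift them; overwrite_demo() and the temporary path are illustrative assumptions.

```python
import os
import tempfile

def overwrite_demo(path):
    # Create the file (binary mode avoids newline translation on Windows).
    with open(path, "wb") as fd:
        fd.write(b"This is a test file.\n")

    with open(path, "rb+") as fd:
        first = fd.read(4)       # b"This"; the position is now 4
        fd.seek(fd.tell())       # explicit positioning between read and write
        fd.write(b" IS")         # overwrites bytes 4..6 (" is")

    with open(path, "rb+") as fd:
        fd.seek(10)              # jump straight to offset 10
        fd.write(b"TEST")        # overwrites "test"

    with open(path, "rb") as fd:
        return first, fd.read()

with tempfile.TemporaryDirectory() as tmp:
    print(overwrite_demo(os.path.join(tmp, "testfile.txt")))
    # (b'This', b'This IS a TEST file.\n')
```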
One final point - if you're specifying paths on Windows as literal strings, remember to escape your backslashes:
fd = open("C:\\Users\\johndoe\\Desktop\\testfile.txt", "r+")
Or you can use raw strings by putting an r at the start:
fd = open(r"C:\Users\johndoe\Desktop\testfile.txt", "r+")
Or the most portable option is to use os.path.join():
fd = open(os.path.join("C:\\", "Users", "johndoe", "Desktop", "testfile.txt"), "r+")
You can find more information about file IO in the official Python docs.
Reading and writing happen at the current file pointer, which advances with each read/write.
In your particular case, writing to openFile moves the file pointer to the end of what was just written (the end of the file, if the file was empty or shorter). Trying to read from the end results in EOF, so read() returns an empty string.
You need to reset the file pointer to the beginning of the file with seek(0) before reading from it.
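A minimal sketch of that behaviour on a file opened with 'rb+' (binary mode, so positions are byte offsets; the helper name and the file contents are made up for illustration): the write overwrites the start of the file, read() continues from where the write stopped, and seek(0) rewinds.

```python
import os
import tempfile

def overwrite_then_read(path):
    # Hypothetical helper: start from a known file content.
    with open(path, "wb") as f:
        f.write(b"original line\n")
    with open(path, "rb+") as f:
        f.write(b"Test abc")   # overwrites the first 8 bytes; position is now 8
        rest = f.read()        # continues from position 8 to the end
        f.seek(0)              # rewind to the beginning
        whole = f.read()
    return rest, whole

with tempfile.TemporaryDirectory() as tmp:
    print(overwrite_then_read(os.path.join(tmp, "filepath.txt")))
    # (b' line\n', b'Test abc line\n')
```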
You can read, modify and save the same file in Python, but you actually have to replace the whole content of the file, and call the following before writing the updated content:
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
I needed a function to go through all subdirectories of a folder and edit the content of the files based on some criteria; here it is, if it helps:
import io
import os

def edit_files(folder_path, condition):
    # walk every subdirectory of folder_path
    for directories, subdirectories, files in os.walk(folder_path):
        for file_name in files:
            file_path = os.path.join(directories, file_name)
            # start with empty new content for each file
            new_file_content = ""
            # open file for reading and writing
            with io.open(file_path, "r+", encoding="utf-8") as edit_file:
                for current_line in edit_file:
                    if condition in current_line:
                        # update current line
                        current_line = current_line.replace('john', 'jack')
                    new_file_content += current_line
                # set the pointer to the beginning of the file in order to rewrite the content
                edit_file.seek(0)
                # delete actual file content
                edit_file.truncate()
                # rewrite updated file content
                edit_file.write(new_file_content)
            # the with block closes the file, so no explicit close() is needed

f.read coming up empty

I'm doing all this in the interpreter:
loc1 = '/council/council1'
file1 = open(loc1, 'r')
At this point I can call file1.read() and it prints the file's contents as a string to standard output.
But if I add this:
string1 = file1.read()
string1 comes back empty. I have no idea what I could be doing wrong; this seems like the most basic thing!
If I go on to type file1.read() again, the output to standard output is just an empty string. So, somehow I am losing my file when I try to create a string with file1.read().
You can only read a file once. After that, the current read-position is at the end of the file.
If you add file1.seek(0) before you re-read it, you should be able to read the contents again. A better approach, however, is to read into a string the first time and then keep it in memory:
loc1 = '/council/council1'
file1 = open(loc1, 'r')
string1 = file1.read()
print(string1)
You don't lose it; you just move the offset pointer to the end of the file and then try to read more data. Since you are at the end of the file, no more data is available and you get an empty string. Try reopening the file or seeking back to position zero:
f.read()
f.seek(0)
f.read()
Using with is the best approach because it closes the file after you're done with it (available since Python 2.5):
with open('/council/council1', 'r') as input_file:
    text = input_file.read()
    print(text)
To quote the official documentation on read():
To read a file’s contents, call f.read(size)
When size is omitted or negative, the entire contents of the file will be read and returned;
And the most relevant part:
If the end of the file has been reached, f.read() will return an empty string ('').
This means that if you call read() twice in a row, you should expect an empty string the second time. Either store the result the first time or use f.seek(0) to go back to the start. Together, read() and seek() provide a lower-level API that gives you greater control.
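A sketch of that behaviour in Python 3 (read_twice() and the temporary file standing in for /council/council1 are illustrative assumptions):

```python
import os
import tempfile

def read_twice(path):
    with open(path) as f:
        first = f.read()    # reads everything; the position is now at EOF
        second = f.read()   # nothing left -> ''
        f.seek(0)           # rewind to the start
        third = f.read()    # the full contents again
    return first, second, third

with tempfile.TemporaryDirectory() as tmp:
    p = os.path.join(tmp, "council1")
    with open(p, "w") as f:
        f.write("2.3.4\n")
    print(read_twice(p))  # ('2.3.4\n', '', '2.3.4\n')
```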
Besides using a context manager to automatically open and close the file, there's another way to read a whole text file: pathlib. For example:
#!/usr/bin/env python3
from pathlib import Path
txt_file = Path("myfile.txt")
try:
    content = txt_file.read_text()
except FileNotFoundError:
    print("Could not find file")
else:
    print(f"The content is: {content}")
    print(f"I can also read again: {txt_file.read_text()}")
As you can see, you can call read_text() several times and you'll get the full content every time, with no surprises. Of course, you wouldn't want to do that in production code: read_text() opens and closes the file each time, so it's still best to store the result. I highly recommend pathlib when dealing with files and file paths.
It's outside the scope, but it may be worth noting a difference when reading line by line. Unlike the file object obtained by open(), PosixPath returned by Path() is not iterable. The equivalent of:
with open('file.txt') as f:
    for line in f:
        print(line)
Would be something like:
for line in Path('file.txt').read_text().splitlines():
    print(line)
One advantage of the first approach, with open, is that the entire file is not read into memory at once.
Make sure your location is correct. Do you actually have a directory called /council under your root directory (/)? Also, use os.path.join() to build your path:
loc1 = os.path.join("/path","dir1","dir2")

Why the second time I run "readlines" on the same file nothing is returned?

>>> f = open('/tmp/version.txt', 'r')
>>> f
<open file '/tmp/version.txt', mode 'r' at 0xb788e2e0>
>>> f.readlines()
['2.3.4\n']
>>> f.readlines()
[]
>>>
I've tried this in Python's interpreter. Why does this happen?
You need to seek to the beginning of the file. Use f.seek(0) to return to the beginning:
>>> f = open('/tmp/version.txt', 'r')
>>> f
<open file '/tmp/version.txt', mode 'r' at 0xb788e2e0>
>>> f.readlines()
['2.3.4\n']
>>> f.seek(0)
>>> f.readlines()
['2.3.4\n']
>>>
Python keeps track of where you are in the file. When you're at the end, it doesn't automatically roll back over. Try f.seek(0).
The important part to understand, which some of the other posters don't state explicitly, is that files are read with a cursor that marks the current position in the file. So on the first readlines() call the cursor is at the beginning of your file and is advanced all the way to the end of the file, since all of the file's data was returned. On the second readlines() call the cursor is already at the end of the file, so when it reads to the end of the file it doesn't move at all, and no data is returned. For educational purposes, you could write a quick bit of code that opens a file, reads a few bytes or lines, and then calls readlines(); you will see that the output of the readlines() call begins where your previous reads left off and continues until the end of the file.
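The experiment suggested above might look like this in Python 3 (partial_then_rest() and the three-line sample file are hypothetical):

```python
import os
import tempfile

def partial_then_rest(path):
    with open(path) as f:
        first = f.readline()   # reads one line; the cursor moves past it
        rest = f.readlines()   # picks up where readline() left off
        again = f.readlines()  # cursor is already at EOF -> []
    return first, rest, again

with tempfile.TemporaryDirectory() as tmp:
    p = os.path.join(tmp, "version.txt")
    with open(p, "w") as f:
        f.write("a\nb\nc\n")
    print(partial_then_rest(p))  # ('a\n', ['b\n', 'c\n'], [])
```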
The seek(0) call mentioned by others lets you reset the cursor to the beginning of the file so you can start reading over.
In addition to seeking to the beginning of the file, you can also store the value as something that you can reuse later if you just need them in memory. Something like this:
with open('/tmp/version.txt', 'r') as f:
    lines = f.readlines()
The with statement is always available from Python 2.6 on; in Python 2.5 you need to enable it with from __future__ import with_statement.
