Read the file while python is still rewriting the file - python

I am trying to access the content of a file while that file is still being updated. Following is my code that does the writing to file job:
for i in range(100000):
    fp = open("text.txt", "w")
    fp.write(str(i))
    fp.close()
    #time.sleep(1)
My problem is that whenever I open the file while the for loop is still running, I see an empty file in the text editor (I expect to see an "updated" number).
I am wondering is there a way that allows me to view the content of file before the for loop ends?
Thanks in advance for any help: )

Do not open the file inside the for loop. It is bad practice.
Each call to open("text.txt", "w") creates a new file object and truncates the file to zero length. That is why you get an empty file.
fp = open("text.txt", "r+")
for i in range(100000):
    fp.seek(0)
    fp.write(str(i))
    fp.truncate()
fp.close()

Python and modern operating systems buffer file writes to improve performance: several consecutive writes are lumped together before they are actually written to the disk. If you want other programs to see the writes immediately, use the flush() method, but remember that it drastically reduces application performance:
with open("text.txt", "w") as fp:
    for i in range(100000):
        fp.write(str(i))
        fp.flush()

When you write to a file, the data usually does not actually get written to disk until the buffer fills up or the file is closed. If you want it to be written immediately, you need to add a flush to each iteration, hence:
fp = open("text.txt", "w")
for i in range(100000):
    fp.write(str(i))
    fp.write("\n")
    fp.flush()
fp.close()
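To see the effect of flush() from the reader's side, here is a minimal sketch. The helper name write_and_peek and the scratch file in a temporary directory are made up for illustration. It opens a second handle on the file while the writer is still open and compares what is visible before and after flush().

```python
import os
import tempfile

def write_and_peek(path, text):
    """Write text, then read the file through a second handle
    before and after flush(). Returns (before, after)."""
    with open(path, "w") as writer:
        writer.write(text)
        with open(path) as reader:
            before = reader.read()   # a small write usually still sits in Python's buffer
        writer.flush()               # hand the buffered data to the operating system
        with open(path) as reader:
            after = reader.read()
    return before, after

# Hypothetical scratch location, so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    before, after = write_and_peek(os.path.join(tmp, "text.txt"), "42")
    print(repr(before), repr(after))
```

With default buffering, the first read normally comes back empty and the second one contains the text, which matches the empty file seen in the editor.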

Related

Python saving data inside Memory? (ram)

I am new to Python, and I had not noticed this until now.
I have a basic program with a for loop that requests data from a site and saves it to a text file.
But when I checked in Task Manager, I saw that the memory usage only increases. This might be a problem for me when running this for a long time.
Is it standard for Python to do this, or can you change it?
Here is what the program basically is:
savefile = open("file.txt", "r+")
for i in savefile:
    #My code goes here
    savefile.write(i)
    #end of loop
savefile.close()
Python does not write to file until you call .close() or .flush() or until it hits a specified buffer size. This question might help you: How often does python flush to a file?
As @Almog said, Python does not write to the file immediately. Because of this, the lines you write are held in an internal buffer in RAM until the buffer fills up or you call savefile.close(), which flushes the buffer and writes everything to the file. This could contribute to the extra memory usage.
Try changing the loop to this:
savefile = open('file.txt', 'r+')
for i in savefile:
    savefile.write(i)
    savefile.flush() #flushes buffer, saving RAM
savefile.close()
There is a better, more Pythonic solution to this:
with open("your_file.txt", "write_mode") as file_variable_name:
    for line in file_variable_name:
        file_variable_name.write(line)
        file_variable_name.flush()
This code flushes the file for each line, and after its execution it closes the file thanks to the with statement.
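If calling flush() by hand on every line feels noisy, the buffer size mentioned above can also be set when the file is opened. The following is a minimal sketch, assuming a text file in a temporary directory (the path is made up for illustration). Passing buffering=1 to open() selects line buffering in text mode, so each completed line is pushed out without an explicit flush().

```python
import os
import tempfile

# Hypothetical scratch location, so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    # buffering=1 selects line buffering (text mode only)
    with open(path, "w", buffering=1) as savefile:
        savefile.write("first line\n")
        size_while_open = os.path.getsize(path)  # measured before close()
    print(size_while_open)
```

The size is already nonzero while the file is still open, because the newline triggered a flush.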

Can you write to a text file and then read from that same text file in the same program?

Basically I want to be able to calculate a parameter, store it in a text file, then read it back in later in the program.
myFile = 'example.txt'
Using with will automatically close the file when you leave that structure
# perform your writing
with open(myFile, 'w') as f:
    f.write('some stuff')
# doing other work
# more code
# perform your reading
with open(myFile, 'r') as f:
    data = f.read()
    # do stuff with data
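Applied to the parameter from the question, the same write-then-read pattern could look like the sketch below. The helper names save_parameter and load_parameter, the sample value, and the temporary directory are assumptions made for illustration, and the parameter is assumed to be a single float.

```python
import os
import tempfile

def save_parameter(path, value):
    with open(path, "w") as f:
        f.write(repr(value))     # repr() keeps the full precision of a float

def load_parameter(path):
    with open(path) as f:
        return float(f.read())

# Hypothetical scratch location and sample value.
with tempfile.TemporaryDirectory() as tmp:
    param_file = os.path.join(tmp, "example.txt")
    save_parameter(param_file, 0.1 + 0.2)
    # ... other work ...
    restored = load_parameter(param_file)
    print(restored)
```

Because each with block closes the file on exit, the value is fully written before the second block reads it back.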
You need to use close() before changing mode (read / write):
def MyWrite(myfile):
    file = open(myfile, "w")
    file.write("hello world in the new file\n")
    file.close()
def MyRead(myfile):
    file = open(myfile, "r")
    data = file.read()
    file.close()
    return data  # hand the contents back to the caller
Also, you could open a file for reading AND writing, using:
fd = open(myfile, "r+")
However, you must be very careful, since every operation, either read or write, changes the pointer position, so you may need to use fd.seek to make sure you're placed in the right position where you want to read or write.
Also, keep in mind that your file becomes a sort of memory mapped string(*) that sometimes syncs with the disk. If you want to save changes at a specific point, you must use fd.flush and os.fsync(fd) to effectively commit the changes to disk without closing the file.
All in all, I'd say it's better to stick to one mode of operation, then close the file and open it again, unless there's a very good reason to have read/write available without switching modes.
* There's also a module for memory mapped files, but I think that's way beyond what you were asking.
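The flush-then-fsync step described above can be sketched as follows. The helper name write_durably and the temporary path are made up for illustration, and the file is opened with "w+" rather than "r+" only because the scratch file does not exist yet.

```python
import os
import tempfile

def write_durably(fd, text):
    fd.write(text)
    fd.flush()                 # Python's buffer -> operating system
    os.fsync(fd.fileno())      # operating system cache -> storage device

# Hypothetical scratch location, so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w+") as fd:
        write_durably(fd, "hello")
        fd.seek(0)             # move the pointer back before reading
        echoed = fd.read()
    print(echoed)
```

Note the seek(0) before reading: after the write, the pointer sits at the end of the file, which is exactly the position issue the answer warns about.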

Can I consistently open a file in python?

I find that each time I open a file and read its content, it automatically closes. So the next time, I have to open the file again to read it. I know this may be a good idea to reduce memory usage, but I need to keep the file open because of my current task.
How should I do it?
This is my current way of reading a file
f = open('some_file', 'rU')
f.read()
After you do f.read(), the file doesn't close. In fact, it doesn't close unless you explicitly call f.close() or you use it in a with block like this:
with open('some_file') as f:
    ...
In which case, the file will be closed for you when the with block ends. It will also tend to be closed in any case when the file object has no more variables associated with it (ie, when f falls out of scope or gets reassigned), but this isn't guaranteed behaviour. If none of these things happen, the file is kept open.
The problem you are most likely seeing is that calling read again will get you an empty string. This doesn't happen because the file is closed: reading from a closed file gives you an error. Rather, files keep track of where you have read up to, so that if you only read part of a file, you can then request the next part and it will start at the right place. To set it back to read from the start of the file again, you can use the seek method:
with open('some_file') as f:
    contents1 = f.read()
    f.seek(0)
    contents2 = f.read()
will give you contents1 and contents2 both containing the full contents of the file, rather than contents2 being empty. However, you probably don't want to do this unless the file could have changed in the meantime.
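The difference between a closed file and a file whose position is at the end can be checked directly. This is a minimal sketch, assuming a small scratch file in a temporary directory (the path and contents are made up for illustration), that uses the closed attribute of the file object.

```python
import os
import tempfile

# Hypothetical scratch location, so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "some_file")
    with open(path, "w") as f:
        f.write("hello\n")

    with open(path) as f:
        first = f.read()
        second = f.read()            # position is at the end, so nothing is left
        still_open = not f.closed    # read() did not close the file
        f.seek(0)
        third = f.read()             # full contents again after seek(0)
    closed_after_with = f.closed     # the with block closed it

print(repr(first), repr(second), repr(third), still_open, closed_after_with)
```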
To make sure your file can be read consistently, you need to either close and reopen the file each time or seek back to 0.
while True:
    f = open(...)
    x = f.read()
    print(x)
    f.close()
or
f = open(...)
while True:
    x = f.read()
    print(x)
    f.seek(0)
...unless you are going to write a one-liner, which will close automatically.
print(open('some_file', 'rU').read())
This avoids any limit on the number of open files.
Additional thought: You can also use for line in open(...): pass, again as long as you remember to get to the beginning of the file one way or another.

How to confirm that a file object is empty? [Python]

In a py module, I write:
outFile = open(fileName, mode='w')
if A:
    outFile.write(...)
if B:
    outFile.write(...)
In these lines, I didn't use the flush or close method.
After these lines, I want to check whether this "outFile" object is empty or not. How can I do that?
There are a few problems with your code.
You can't .write to a file that you opened with 'r'. You need to open(fileName, 'w').
If A or B then you've certainly written to the file, so it's not empty!
Barring those, you can get the size of a file with
os.fstat(outFile.fileno()).st_size
EDIT: I'll explain what flush does. Python is often used to do quite large amounts of file reads and writes, which can be slow. It is thus tweaked to make them as fast as possible. One way that it does so is to "buffer" such writes and then do them all in one big block: when you write a small string, Python will remember it but won't actually write it to the file until it thinks it should.
This means that if you want to tell whether you have written data to the file by inspecting the file, you have to tell Python to write all the data it's remembering first, or else you might not see it. flush is the command to write all the buffered data.
Of course, if you ask Python whether it's written anything to the file, say by inspecting the position in the file (.tell()), then it will know about the buffering.
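The gap between what Python knows it has written and what is visible in the file can be observed directly. This is a minimal sketch, assuming a scratch file in a temporary directory (the path is made up for illustration). It measures the on-disk size before and after flush() and compares it with .tell().

```python
import os
import tempfile

# Hypothetical scratch location, so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    with open(path, "w") as outFile:
        outFile.write("foo")
        # Size seen by the operating system; usually 0 for a small, unflushed write.
        size_before_flush = os.fstat(outFile.fileno()).st_size
        position = outFile.tell()    # accounts for the buffered data
        outFile.flush()
        size_after_flush = os.fstat(outFile.fileno()).st_size

print(size_before_flush, position, size_after_flush)
```

So for the question "did I write anything?", .tell() answers without a flush, while a size check on the file itself is only reliable after flush().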
If you've already written to the file, you can use .tell() to check if the current file position is nonzero:
>>> handle = open('/tmp/file.txt', 'w')
>>> handle.write('foo')
>>> handle.tell()
3
This won't work if you .seek() back to the beginning of the file.
You can use os.stat to get file info:
import os
fileSize = os.stat(fileName).st_size
with open("filename.txt", "r+") as f:
    if f.read():
        # file isn't empty
        f.write("something")
        # uncomment this line if you want to delete everything else in the file
        # f.truncate()
    else:
        # file is empty
        f.write("somethingelse")
"r+" mode allows you to read & write.
"with" will automatically close the file.

Python Overwriting files after parsing

I'm new to Python, and I need to do a parsing exercise. I have a file, and I need to parse it (just the headers), but after the process I need to keep the file in the same format, with the same extension, and at the same place on disk, with only the new headers being different.
I tried this code...
for line in open('/home/name/db/str/dir/numbers/str.phy'):
    if line.startswith('ENS'):
        linepars = re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        print(linepars)
...and it does the job, but I don't know how to "overwrite" the file with the new parsing.
The easiest way, but not the most efficient (by far, and especially for long files) would be to rewrite the complete file.
You could do this by opening a second file handle and rewriting each line, except in the case of the header, you'd write the parsed header. For example,
import re
fr = open('/home/name/db/str/dir/numbers/str.phy')
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
for line in fr:
    if line.startswith('ENS'):
        linepars = re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
        fw.write(linepars)
    else:
        fw.write(line)
fw.close()
fr.close()
EDIT: Note that this does not use readlines(), so it's more memory efficient. It also does not store every output line, but only one at a time, writing it to the file immediately.
Just as a cool trick, you could use the with statement on the input file to avoid having to close it (Python 2.5+):
fw = open('/home/name/db/str/dir/numbers/str.phy.parsed', 'w') # Name this whatever makes sense
with open('/home/name/db/str/dir/numbers/str.phy') as fr:
    for line in fr:
        if line.startswith('ENS'):
            linepars = re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
            fw.write(linepars)
        else:
            fw.write(line)
fw.close()
P.S. Welcome :-)
As others are saying here, you want to open a file and use that file object's .write() method.
The best approach would be to open an additional file for writing:
import os
current_cfg = open(...)
parsed_cfg = open(..., 'w')
for line in current_cfg:
    new_line = parse(line)
    print(new_line)
    parsed_cfg.write(new_line + '\n')
current_cfg.close()
parsed_cfg.close()
os.rename(....) # Rename old file to backup name
os.rename(....) # Rename new file into place
Additionally I'd suggest looking at the tempfile module and using one of its methods for either naming your new file or opening/creating it. Personally I'd favor putting the new file in the same directory as the existing file to ensure that os.rename will work atomically (the configuration file name is guaranteed to point at either the old file or the new file; in no case would it point at a partially written/copied file).
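The tempfile suggestion can be sketched as follows. The helper name rewrite_atomically, the demo file, and the use of str.upper as the transformation are assumptions made for illustration. The sketch uses os.replace (Python 3.3+) instead of two os.rename calls, so unlike the outline above it does not keep a backup of the old file.

```python
import os
import tempfile

def rewrite_atomically(path, transform):
    """Write transform(line) for each line to a temp file next to path,
    then rename the temp file over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as dst:
        for line in src:
            dst.write(transform(line))
        tmp_name = dst.name
    # Same directory means same filesystem, so the rename is atomic on POSIX.
    os.replace(tmp_name, path)

# Hypothetical demo file in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    cfg = os.path.join(tmp, "demo.cfg")
    with open(cfg, "w") as f:
        f.write("a\nb\n")
    rewrite_atomically(cfg, str.upper)
    with open(cfg) as f:
        result = f.read()
print(result)
```

Creating the temporary file in the target's own directory is the design choice that matters here: a rename across filesystems would fall back to a copy and lose the atomicity.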
The following code DOES the job.
I mean it DOES overwrite the file ONTO ITSELF; that's what the OP asked for. That's possible because the transformations only remove characters, so the file pointer of fo, which writes, is always BEHIND the file pointer of fi, which reads.
import re
regx = re.compile(br'\AENS([A-Z]+)0+([0-9]{6})')
with open('bomo.phy','rb+') as fi, open('bomo.phy','rb+') as fo:
    fo.writelines(regx.sub(br'\1\2',line) for line in fi)
    fo.truncate() # cut off the leftover tail, since the new content is shorter
I think that the writing isn't performed by the operating system one line at a time but through a buffer. So several lines are read before a pool of transformed lines are written. That's what I think.
newlines = []
for line in open('/home/name/db/str/dir/numbers/str.phy').readlines():
    if line.startswith('ENS'):
        line = re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
    newlines.append(line) # keep non-header lines unchanged
# each line already ends with a newline, so join without a separator
open('/home/name/db/str/dir/numbers/str.phy', 'w').write(''.join(newlines))
(sidenote: Of course if you are working with large files, you should be aware that the level of optimization required may depend on your situation. Python by nature is very non-lazily-evaluated. The following solution is not a good choice if you are parsing large files, such as database dumps or logs, but a few tweaks such as nesting the with clauses and using lazy generators or a line-by-line algorithm can allow O(1)-memory behavior.)
import re
targetFile = '/home/name/db/str/dir/numbers/str.phy'
def replaceIfHeader(line):
    if line.startswith('ENS'):
        return re.sub('ENS([A-Z]+)0+([0-9]{6})','\\1\\2',line)
    else:
        return line
with open(targetFile, 'r') as f:
    # each line already ends with a newline, so join without a separator
    newText = ''.join(replaceIfHeader(line) for line in f)
try:
    # make backup of targetFile
    with open(targetFile, 'w') as f:
        f.write(newText)
except:
    # error encountered, do something to inform user where backup of targetFile is
    raise
edit: thanks to Jeff for suggestion
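For the line-by-line, O(1)-memory variant mentioned in the sidenote, one option is the standard fileinput module, which is a different technique from the answers above. This is a minimal sketch: the helper name rewrite_headers, the sample file contents, and the '.bak' backup suffix are assumptions made for illustration. With inplace=True, anything printed inside the loop is written back to the file being read.

```python
import fileinput
import os
import re
import tempfile

HEADER = re.compile(r'ENS([A-Z]+)0+([0-9]{6})')

def rewrite_headers(path):
    # inplace=True redirects print() output into the file;
    # backup='.bak' keeps a copy of the original next to it.
    with fileinput.input(path, inplace=True, backup='.bak') as lines:
        for line in lines:
            if line.startswith('ENS'):
                line = HEADER.sub(r'\1\2', line)
            print(line, end='')      # the line already ends with a newline

# Hypothetical sample file in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    phy = os.path.join(tmp, "str.phy")
    with open(phy, "w") as f:
        f.write("ENSABC00000123456 x\nACGT\n")
    rewrite_headers(phy)
    with open(phy) as f:
        rewritten = f.read()
print(rewritten)
```

Only one line is held in memory at a time, and the backup copy covers the "make backup of targetFile" step.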
