Can someone tell me why, in Python 3.4.2, when I try
import codecs
f = codecs.open('/home/filename', 'w', 'utf-8')
print('something', file=f)
it gives me an empty file?
It was working well previously, but then it suddenly stopped printing to the file.
File writing is buffered to avoid the performance drain of hitting the disk for every write. The buffer is flushed when it reaches a size threshold, when you flush it explicitly, or when you close the file.
You have not closed the file, have not flushed the buffer, and have not written enough data to trigger an automatic flush.
Do one of the following:
Flush the buffer:
f.flush()
This can be done with the flush argument to print() as well:
print('something', file=f, flush=True)
but the argument requires Python 3.3 or newer.
Close the file:
f.close()
or use the file as a context manager (using the with statement):
with open('/home/filename', 'w', encoding='utf-8') as f:
    print('something', file=f)
and the file will be closed automatically when the block is exited (on completion, or an exception).
Write more data to the file; how much depends on the buffering configuration.
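For the third option, a minimal sketch (the path and the io.DEFAULT_BUFFER_SIZE threshold are illustrative; the exact point at which the buffer auto-flushes depends on your platform and the open() arguments):
import io

f = open('/tmp/demo.txt', 'w', encoding='utf-8')
f.write('x' * (io.DEFAULT_BUFFER_SIZE + 1))  # one write larger than the buffer
# the file on disk already contains data, even though f is still open
f.close()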
Related
I am sniffing network packets using Tshark (command-line Wireshark) and writing them to a file as I receive them. My code block is similar to the following:
import json
from queue import PriorityQueue  # "from Queue import PriorityQueue" on Python 2

documents = PriorityQueue(maxsize=0)
writing_enabled = True

with open("output.txt", 'w') as opened_file:
    while writing_enabled:
        try:
            data = documents.get(timeout=1)
        except Exception:
            # no document pushed by the producer thread
            continue
        opened_file.write(json.dumps(data) + "\n")
A Tshark thread receives packets and puts them into the queue; the thread above then writes them to a file. However, after the file reaches 600+ MB the process slows down and its status changes to Not Responding. After some research I think this is caused by the default buffering mechanism of the open method. Is it reasonable to change with open("output.txt", 'w') as opened_file: into with open("output.txt", 'w', 1000) as opened_file: to use a 1000-byte buffer for writing? Or is there another way to overcome this?
To push the internal buffer out to the file you can use the file's flush() method. Generally, though, buffering is handled for you: Python's file objects (and the operating system underneath) use a sensible default buffer size. If you want to specify your own buffer size when opening the file:
bufsize = 4096  # buffer size in bytes (pick a value that suits your workload)
f = open('file.txt', 'w', buffering=bufsize)
Please also see the following question: How often does Python flush to file
Alternatively to flushing the buffer, you could use rolling files: open a new file once the size of the currently opened file exceeds a certain threshold. This is generally good practice if you intend to write a lot of data.
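A rough sketch of that idea, adapted to the queue-draining loop from the question (MAX_BYTES and the output_NNN.txt naming are made up for illustration; on Python 2 the import would be from Queue import PriorityQueue, Empty):
import json
from queue import Empty, PriorityQueue

MAX_BYTES = 100 * 1024 * 1024  # roll over to a new file at roughly 100 MB

documents = PriorityQueue(maxsize=0)
writing_enabled = True
index, written = 0, 0
opened_file = open("output_%03d.txt" % index, "w")
try:
    while writing_enabled:
        try:
            data = documents.get(timeout=1)
        except Empty:
            continue  # no document pushed by the producer thread
        line = json.dumps(data) + "\n"
        opened_file.write(line)
        written += len(line)
        if written >= MAX_BYTES:  # current chunk is big enough: start a new file
            opened_file.close()
            index, written = index + 1, 0
            opened_file = open("output_%03d.txt" % index, "w")
finally:
    opened_file.close()
If what you are writing is log-like text, the standard library's logging.handlers.RotatingFileHandler already implements this rollover pattern for you.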
I am trying to access the content of a file while that file is still being updated. Here is the code that does the writing:
for i in range(100000):
    fp = open("text.txt", "w")
    fp.write(str(i))
    fp.close()
    #time.sleep(1)
My problem is that whenever I open the file in a text editor while the for loop is still running, I get an empty file (I expect to see an "updated" number).
Is there a way to view the content of the file before the for loop ends?
Thanks in advance for any help :)
Do not open the file inside the for loop; that is bad practice and bad code.
Each iteration creates a new file object and truncates the file, which is why you get an empty file. Open the file once, outside the loop:
fp = open("text.txt", "r+")
for i in range(100000):
fp.seek(0)
fp.write(str(i))
fp.truncate()
fp.close()
Modern operating systems (and Python's I/O layer) buffer file writes to improve performance: several consecutive writes are lumped together before they are actually written to the disk. If you want the writes to propagate to the disk immediately, use the flush() method, but remember that it can drastically reduce application performance:
with open("text.txt", "w") as fp:
for i in range(100000):
fp.write(str(i))
fp.flush()
When you write to a file, the data usually does not actually reach the disk until the file is closed. If you want it written immediately, add a flush() to each iteration:
fp = open("text.txt", "w")
for i in range(100000):
fp.write(str(i))
fp.write("\n")
fp.flush()
fp.close()
I am attempting to output a new txt file, but it comes up blank. I am doing this:
my_file = open("something.txt","w")
#and then
my_file.write("hello")
Right after this line it just prints 5, and then no text appears in the file.
What am I doing wrong?
The write is not flushed until you close the file. If I open an interpreter and then enter:
my_file = open('something.txt', 'w')
my_file.write('hello')
and then open the file in a text program, there is no text.
If I then issue:
my_file.close()
Voila! Text!
If you just want to flush once and keep writing, you can do that too:
my_file.flush()
my_file.write('\nhello again') # file still says 'hello'
my_file.flush() # now it says 'hello again' on the next line
By the way, if you read the beautiful, wonderful documentation for file.write, which is only two lines long, you will find your answer (emphasis mine):
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
If you don't want to worry about closing the file, use with:
with open("something.txt","w") as f:
f.write('hello')
Python will then take care of closing the file for you automatically.
As Two-Bit Alchemist pointed out, the file has to be closed. Python's file writer uses a buffer (a BufferedIOBase subclass, I think), meaning it collects a certain number of bytes before writing them to disk in bulk. This saves overhead when a lot of write operations are performed on a single file.
Also: when working with files, use a with statement to make sure your file is closed after you are done writing/reading:
with open("somefile.txt", "w") as myfile:
    myfile.write("42")
# when you reach this point, i.e. leave the with block,
# the file is closed automatically.
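To see the buffer layering described above, here is a quick sketch (CPython 3; .buffer and .raw are the real attribute names exposed by the io module):
import io

with open("somefile.txt", "w") as myfile:
    print(type(myfile))         # <class '_io.TextIOWrapper'>  -- the text layer
    print(type(myfile.buffer))  # <class '_io.BufferedWriter'> -- the byte buffer
    print(isinstance(myfile.buffer, io.BufferedIOBase))  # True
    print(type(myfile.buffer.raw))  # <class '_io.FileIO'>  -- the raw OS file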
The python file writer uses a buffer (BufferedIOBase I think), meaning it collects a certain number of bytes before writing them to disk in bulk. This is done to save overhead when a lot of write operations are performed on a single file. (Quoting @m00am's answer above.)
Your code is otherwise okay. Just add a statement to close the file, and it will work correctly:
my_file = open("fin.txt","w")
#and then
my_file.write("hello")
my_file.close()
I'm trying to call a process on a file after part of it has been read. For example:
import subprocess

with open('in.txt', 'r') as a, open('out.txt', 'w') as b:
    header = a.readline()
    subprocess.call(['sort'], stdin=a, stdout=b)
This works fine if I don't read anything from a before doing the subprocess.call, but if I read anything from it, the subprocess doesn't see any input. This is Python 2.7.3. I can't find anything in the documentation that explains this behaviour, and a (very) brief glance at the subprocess source didn't reveal a cause.
If you open the file unbuffered then it works:
import subprocess

with open('in.txt', 'rb', 0) as a, open('out.txt', 'w') as b:
    header = a.readline()
    rc = subprocess.call(['sort'], stdin=a, stdout=b)
The subprocess module works at the file-descriptor level (the low-level, unbuffered I/O of the operating system). It may work with os.pipe(), socket.socket(), pty.openpty(), anything with a valid .fileno() method, if the OS supports it.
It is not recommended to mix buffered and unbuffered I/O on the same file.
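A tiny sketch of that file-descriptor view (it assumes a Unix-like system with cat available): subprocess happily reads from the raw read end of an os.pipe():
import os
import subprocess

r, w = os.pipe()                   # raw OS file descriptors, no Python buffering
os.write(w, b"hello from the pipe\n")
os.close(w)                        # close the write end so the child sees EOF
subprocess.call(['cat'], stdin=r)  # prints: hello from the pipe
os.close(r)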
On Python 2, file.flush() causes the output to appear e.g.:
import subprocess
# 2nd
with open(__file__) as file:
    header = file.readline()
    file.seek(file.tell())  # synchronize (for io.open and Python 3)
    file.flush()            # synchronize (for C stdio-based file on Python 2)
    rc = subprocess.call(['cat'], stdin=file)
The issue can be reproduced without subprocess module with os.read():
#!/usr/bin/env python
# 2nd
import os

with open(__file__) as file:  #XXX fully buffered text file EATS INPUT
    file.readline()  # ignore header line
    os.write(1, os.read(file.fileno(), 1 << 20))
If the buffer size is small then the rest of the file is printed:
#!/usr/bin/env python
# 2nd
import os

bufsize = 2  #XXX MAY EAT INPUT
with open(__file__, 'rb', bufsize) as file:
    file.readline()  # ignore header line
    os.write(2, os.read(file.fileno(), 1 << 20))
It eats more input if the first line size is not evenly divisible by bufsize.
The default bufsize and bufsize=1 (line-buffered) behave similarly on my machine: the beginning of the file vanishes -- around 4KB.
For all buffer sizes, file.tell() reports the position at the beginning of the 2nd line. Using next(file) instead of file.readline() leads to file.tell() around 5K on my machine on Python 2 due to the read-ahead buffer bug (io.open() gives the expected 2nd-line position).
Trying file.seek(file.tell()) before the subprocess call doesn't help on Python 2 with default stdio-based file objects. It works with open() functions from io, _pyio modules on Python 2 and with the default open (also io-based) on Python 3.
Trying io, _pyio modules on Python 2 and Python 3 with and without file.flush() produces various results. It confirms that mixing buffered and unbuffered I/O on the same file descriptor is not a good idea.
It happens because the subprocess module extracts the file handle from the file object:
http://hg.python.org/releasing/2.7.6/file/ba31940588b6/Lib/subprocess.py
See line 1126, reached from line 701.
The file object uses buffers and has already read a lot from the file handle by the time the subprocess extracts it.
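A sketch that makes the mismatch observable (in.txt stands for any sufficiently large input file; the exact descriptor offset depends on the read-ahead buffer size):
import os

with open('in.txt', 'rb') as f:
    f.readline()  # one buffered read...
    print('file object position:', f.tell())
    # ...but the descriptor has already been read ahead, typically by a full buffer:
    print('OS descriptor position:', os.lseek(f.fileno(), 0, os.SEEK_CUR))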
As mentioned by @jfs:
When using Popen, the file descriptor is passed to the child process. Meanwhile, Python has been reading ahead in chunks (e.g. 4096 bytes), so the position at the fd level is different from what you would expect.
I solved it in Python 2.7 by aligning the file descriptor position:
import codecs
import os
import subprocess

_file = open(some_path)
_file.read(len(codecs.BOM_UTF8))  # skip the UTF-8 BOM (read() needs a size, not the BOM itself)
os.lseek(_file.fileno(), _file.tell(), os.SEEK_SET)  # align the fd with the file object's position
truncate_null_cmd = ['tr', '-d', '\\000']
subprocess.Popen(truncate_null_cmd, stdin=_file, stdout=subprocess.PIPE)
I'm writing a cross-platform program for Windows (not cygwin!) and Mac.
I'm writing to a file, and then immediately trying to get the file's new length, without closing the file first.
Do I need to flush the file after I write, in order to be guaranteed to get an accurate length?
with open("myfile.bin", "r+b") as f:
f.seek(100)
f.truncate()
f.write("hello world")
# Do I need to f.flush here?
f.seek(0, 2) # seeks to end of file
fileSize = f.tell()
# Is fileSize guaranteed to be correct?
No, you don't need to call flush(). flush() only forces the bytes in the buffer to be written out to the underlying stream; the file object tracks its position regardless, so tell() is accurate either way.
In addition, f.seek(0, 2) is not required either, because the file position is already at the end of the file after the truncate() and write() calls.
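A sketch of the question's code with both the flush and the extra seek removed, per this answer (the setup write just ensures "r+b", which requires an existing file, has something to open; the bytes literals assume Python 3):
with open("myfile.bin", "wb") as f:
    f.write(b"\0" * 200)  # create a 200-byte scratch file

with open("myfile.bin", "r+b") as f:
    f.seek(100)
    f.truncate()             # the file is now exactly 100 bytes
    f.write(b"hello world")  # 11 bytes written at offset 100
    fileSize = f.tell()      # 111 -- already at the end of the file, no flush needed
    print(fileSize)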