Writing to File Using Buffer In Python

I am sniffing network packets using Tshark (command-line Wireshark) and writing them to a file as I receive them. My code is similar to the following:
import json
from queue import PriorityQueue, Empty

documents = PriorityQueue(maxsize=0)
writing_enabled = True

with open("output.txt", 'w') as opened_file:
    while writing_enabled:
        try:
            data = documents.get(timeout=1)
        except Empty:
            # no document pushed by the producer thread within the timeout
            continue
        opened_file.write(json.dumps(data) + "\n")
When I receive packets from the Tshark thread, I put them into the queue, and another thread writes them to a file using the code above. However, after the file reaches 600+ MB the process slows down and its status changes to Not Responding. After some research, I think this is caused by the default buffering mechanism of the open method. Is it reasonable to change with open("output.txt", 'w') as opened_file: into with open("output.txt", 'w', 1000) as opened_file: to use a 1000-byte buffer in writing mode? Or is there another way to overcome this?

To write the internal buffer out to the file you can use the file's flush method. However, this should generally be handled by your operating system, which has a default buffer size. You can open your file like this if you want to specify your own buffer size:
bufsize = 1024  # e.g. a 1 KiB buffer; any positive chunk size works
f = open('file.txt', 'w', buffering=bufsize)
Please also see the following question: How often does Python flush to file
As an alternative to flushing the buffer, you could also use rolling files, i.e. open a new file once the size of the currently opened file exceeds a certain threshold. This is generally good practice if you intend to write a lot of data.
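For illustration, here is a minimal sketch of the rolling-file idea applied to the JSON-lines writer from the question; the ROLL_SIZE threshold and the file naming scheme are assumptions, not part of the original code:

import json

ROLL_SIZE = 100 * 1024 * 1024  # assumed threshold: start a new file after ~100 MB

def write_with_rollover(records, basename="output"):
    """Write JSON lines, opening a fresh file whenever the current one grows too large."""
    index = 0
    written = 0
    opened_file = open(f"{basename}_{index}.txt", "w")
    try:
        for record in records:
            line = json.dumps(record) + "\n"
            opened_file.write(line)
            written += len(line)
            if written >= ROLL_SIZE:  # roll over to the next file
                opened_file.close()
                index += 1
                written = 0
                opened_file = open(f"{basename}_{index}.txt", "w")
    finally:
        opened_file.close()

If your data can go through the logging machinery, the standard library's logging.handlers.RotatingFileHandler implements the same pattern for you.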

Related

Reading a file without locking it in Python

I want to read a file, but without any lock on it.
with open(source, "rb") as infile:
    data = infile.read()
Can the code above lock the source file?
This source file can be updated at any time with new rows (during my script running for example).
I think not, because it is opened in read-only mode ("rb"). But I found that the Windows API can be used to read it without a lock. I did not find a simple answer to my question.
My script runs locally, but the source file and the script/software that appends changes to it are not local (they are on a network drive).
Opening a file does not put a lock on it. In fact, if you needed to ensure that separate processes did not access a file simultaneously, all of these processes would have to cooperatively take special steps to ensure that only a single process accessed the file at one time (see Locking a file in Python). This can also be demonstrated by the following small program, which purposely takes its time reading a file to give another process (namely me with a text editor) a chance to append some data to the end of the file while the program is running. It reads and outputs the file one byte at a time, pausing 0.1 seconds between each read. While the program was running I added some text to the end of the file, and the program printed the additional text:
import time

with open('test.txt', "rb") as infile:
    while True:
        data = infile.read(1)
        if data == b'':
            break
        time.sleep(0.1)
        print(data.decode('ascii'), end='', flush=True)
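For completeness, here is a minimal sketch of what the cooperative locking mentioned above could look like on Unix, using the standard fcntl module (Windows would need msvcrt.locking instead); both the reader and the writer would have to opt in for this to have any effect:

import fcntl

with open('test.txt', 'rb') as infile:
    # shared (read) lock; blocks while another cooperating process holds an exclusive lock
    fcntl.flock(infile.fileno(), fcntl.LOCK_SH)
    try:
        data = infile.read()
    finally:
        fcntl.flock(infile.fileno(), fcntl.LOCK_UN)  # release the lock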
You can read your file in pieces and then join these pieces together if you need one single byte string. But this will not be as memory efficient as reading the file with a single read:
BLOCKSIZE = 64 * 1024  # or some other value depending on the file size

with open(source, "rb") as infile:
    blocks = []
    while True:
        data = infile.read(BLOCKSIZE)
        if data == b'':
            break
        blocks.append(data)

# if you need the data in one piece (otherwise the pieces are in blocks):
data = b''.join(blocks)
One alternative is to make a copy of the file temporarily and read the copy.
You can use the shutil package for such a task:
import os
import time
from shutil import copyfile

def read_file_non_blocking(filename):
    temp_file = f"{filename}-{time.time()}"  # stores the copy in the local directory
    copyfile(filename, temp_file)
    with open(temp_file, 'r') as my_file:
        pass  # do something cool with my_file here
    os.remove(temp_file)
Windows is weird in how it handles files if you, like me, are used to POSIX-style file handling. I have run into this issue numerous times and have been lucky enough to avoid solving it. However, in this case, if I had to solve it, I would look at the flags that can be passed to os.open and see if any of those can disable the locking.
https://docs.python.org/3/library/os.html#os.open
I would do a little testing but I don't have a non-production critical Windows workstation to test on.
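As an untested sketch of that approach: open the file through os.open and wrap the raw descriptor in an ordinary file object. Whether any flag combination actually changes the sharing behaviour on Windows is exactly what would need testing:

import os

# O_BINARY only exists on Windows; fall back to 0 elsewhere
flags = os.O_RDONLY | getattr(os, 'O_BINARY', 0)
fd = os.open(source, flags)          # source is the path from the question
with os.fdopen(fd, 'rb') as infile:  # fdopen takes ownership of the descriptor
    data = infile.read()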

Python stops printing to output file [duplicate]

I'm running a test and found that the file doesn't actually get written until I hit Ctrl-C to abort the program. Can anyone explain why that happens?
I expected it to write at the same time, so I could read the file in the middle of the process.
import os
from time import sleep

f = open("log.txt", "a+")
i = 0
while True:
    f.write(str(i))
    f.write("\n")
    i += 1
    sleep(0.1)
Writing to disk is slow, so many programs store up writes into large chunks which they write all-at-once. This is called buffering, and Python does it automatically when you open a file.
When you write to the file, you're actually writing to a "buffer" in memory. When it fills up, Python will automatically write it to disk. You can tell it "write everything in the buffer to disk now" with
f.flush()
This isn't quite the whole story, because the operating system will probably buffer writes as well. You can tell it to write the buffer of the file with
os.fsync(f.fileno())
Finally, you can tell Python not to buffer a particular file by opening it with open("log.txt", "wb", 0) (in Python 3, unbuffered files are only allowed in binary mode, and the first argument is a path, not a file object), or to keep only a one-line buffer with open("log.txt", "w", 1). Naturally, this will slow down all operations on that file, because writes are slow.
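Concretely, the two buffering modes look like this:

f_raw = open("log.bin", "wb", buffering=0)   # unbuffered: every write goes straight to the OS
f_line = open("log.txt", "w", buffering=1)   # line-buffered: flushed at every newline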
You need to f.close() to flush the internal write buffer out to the file. Or, in your case, you might just want to do f.flush(); os.fsync(f.fileno()) so you can keep looping with the opened file handle.
Don't forget to import os.
You have to force the write, so I use the following lines to make sure a file is written:
# Two commands together force the OS to store the file buffer to disc
f.flush()
os.fsync(f.fileno())
You will want to check out file.flush() - although take note that this might not write the data to disk, to quote:
Note:
flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.
Closing the file (file.close()) will also ensure that the data is written - using with will do this implicitly, and is generally a better choice for more readability and clarity - not to mention solving other potential problems.
This is a Windows-ism. If you add an explicit .close() when you're done with the file, it'll appear in Explorer at that time. Even just flushing it might be enough (I don't have a Windows box handy to test). But basically f.write does not actually write, it just appends to the write buffer - until the buffer gets flushed you won't see the data.
On Unix the file will typically show up as a 0-byte file in this situation.
The file handle needs to be flushed:
f.flush()
The file does not get written, as the output buffer is not flushed until garbage collection takes effect and flushes the I/O buffer (most likely by implicitly calling f.close()).
Alternately, in your loop, you can call f.flush() followed by os.fsync(f.fileno()), as documented here.
f.flush()
os.fsync(f.fileno())
All that being said, if you ever plan on sharing the data in that file with other portions of your code, I would highly recommend using a StringIO object.
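A minimal sketch of that idea: collect the output in an in-memory io.StringIO buffer, then write it to disk in one shot when you are done:

import io
import os

buf = io.StringIO()
for i in range(100):
    buf.write(str(i) + "\n")       # cheap in-memory writes, nothing touches the disk yet

with open("log.txt", "a+") as f:   # one real write at the end
    f.write(buf.getvalue())
    f.flush()
    os.fsync(f.fileno())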

Is my file kept in RAM while opened in append mode?

I have written some code that keeps appending to a file. Here it is:
writel = open('able.csv', 'a', encoding='utf-8', errors='ignore')
count = 0
with open('test', 'r', encoding='utf-8', errors='ignore') as file:
    for i in file:
        data = functionforprocess(i)
        if data != "":
            writel.write(data)
        count += 1
        if count % 10000 == 0:
            log = open('log', 'w')
            log.write(str(count))
            log.close()
writel.close()
My question is: is the file that I have opened in append mode kept in RAM? Is that file acting like a buffer, i.e. is storing the data in a variable and then writing the variable to the file equivalent to opening the file in append mode and writing directly?
Kindly get me out of this confusion.
Appending is a basic function of file I/O and is carried out by the operating system. For instance, fopen with mode a or a+ is part of the POSIX standard. With file I/O, the OS will also tend to buffer reads and writes; for instance, for most purposes it's not necessary to make sure that the data that you've passed to write is actually on the disk all the time. Sometimes it sits in a buffer somewhere in the OS; sometimes the OS dumps these buffers out to disk. You can force writes using fsync if it's important to you; this is also a really good reason to make sure that you always call close on your open file objects when you're done with them (or use a context manager); if you forget, you might get weird behaviour because of those buffers hanging around in the OS.
So, to answer your question. The file that you opened is most likely in RAM at any given moment. However, as far as I know, it's not available to you. You can interact with the data in the file using file I/O methods, but it's not like there's a buffer that you can get the memory address of, and read back what you just wrote. As to if append-mode writing is equivalent to storing something in a buffer and then writing to disk, I guess I would say no. Any kind of file I/O write will probably be buffered the same way by the OS, and the reason this is efficient is that the OS gets to make the decision on when to flush the buffers. If you store things in a variable and then write them out atomically to disk, you get to decide when the writes take place.
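A minimal sketch of the two buffer layers described above, reusing the file name from the question:

import os

f = open('able.csv', 'a', encoding='utf-8')
f.write('one row\n')   # lands in Python's internal buffer, not yet in the file
f.flush()              # hands the buffered bytes to the operating system
os.fsync(f.fileno())   # asks the OS to push its own caches out to the disk
f.close()              # closing also flushes whatever is left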
The signature of the open function is:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
If you open in "a" (append) mode, it means: open for writing, appending to the end of the file if it exists. There is nothing about buffering.
Buffering can be customized with the buffering parameter. Quoting the doc:
buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:
Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on io.DEFAULT_BUFFER_SIZE. On many systems, the buffer will typically be 4096 or 8192 bytes long.
“Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.
In your example, your file is opened for append in text mode.
So only a chunk of your data is kept in RAM during writing. If you write "big" data, it is divided into several chunks.
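You can inspect these defaults yourself; for example (the exact numbers depend on your platform):

import io

print(io.DEFAULT_BUFFER_SIZE)   # the fallback chunk size, typically 8192 bytes

f = open('able.csv', 'a', encoding='utf-8')
print(f.line_buffering)         # False: a regular text file is chunk-buffered, not line-buffered
print(f.buffer)                 # the underlying BufferedWriter that holds those chunks
f.close()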

Python saving data inside memory? (RAM)

I am new to Python and didn't know about this behaviour until now.
I have a basic program inside a for loop that requests data from a site and saves it to a text file.
But when I checked in my task manager, I saw that the memory usage only increases. This might be a problem for me when running it for a long time.
Is it standard for Python to do this or can you change it?
Here is basically what the program does:
savefile = open("file.txt", "r+")
for i in savefile:
#My code goes here
savefile.write(i)
#end of loop
savefile.close()
Python does not write to the file until you call .close() or .flush(), or until the internal buffer reaches its size limit. This question might help you: How often does python flush to a file?
As @Almog said, Python does not write to the file immediately. Because of this, every line you write gets stored in an internal buffer in RAM until it is flushed, e.g. by savefile.close(), which flushes the buffer and writes everything to the file. This would explain the extra memory usage.
Try changing the loop to this:
savefile = open('file.txt', 'r+')
for i in savefile:
    savefile.write(i)
    savefile.flush()  # flushes the buffer, saving RAM
savefile.close()
There is a better, more Pythonic solution to this:
with open("your_file.txt", "r+") as file_variable_name:
    for line in file_variable_name:
        file_variable_name.write(line)
        file_variable_name.flush()
This code flushes the file for each line, and after its execution the with statement closes the file automatically.

writing output for python not functioning

I am attempting to output a new .txt file, but it comes up blank. I am doing this:
my_file = open("something.txt","w")
#and then
my_file.write("hello")
Right after this line the interpreter just prints 5 (the return value of write, i.e. the number of characters written), and no text shows up in the file.
What am I doing wrong?
You must close the file before the write is flushed. If I open an interpreter and then enter:
my_file = open('something.txt', 'w')
my_file.write('hello')
and then open the file in a text program, there is no text.
If I then issue:
my_file.close()
Voila! Text!
If you just want to flush once and keep writing, you can do that too:
my_file.flush()
my_file.write('\nhello again') # file still says 'hello'
my_file.flush() # now it says 'hello again' on the next line
By the way, if you happen to read the beautiful, wonderful documentation for file.write, which is only 2 lines long, you would have your answer (emphasis mine):
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
If you don't want to worry about closing the file, use with:
with open("something.txt", "w") as f:
    f.write('hello')
Then Python will take care of closing the file for you automatically.
As Two-Bit Alchemist pointed out, the file has to be closed. The Python file writer uses a buffer (BufferedIOBase, I think), meaning it collects a certain number of bytes before writing them to disk in bulk. This is done to save overhead when many write operations are performed on a single file.
Also: When working with files, try using a with-environment to make sure your file is closed after you are done writing/reading:
with open("somefile.txt", "w") as myfile:
myfile.write("42")
# when you reach this point, i.e. leave the with-environment,
# the file is closed automatically.
Your code is also OK. Just add a statement to close the file, and it will work correctly:
my_file = open("fin.txt","w")
#and then
my_file.write("hello")
my_file.close()
