I want to read a file, but without any lock on it.
with open(source, "rb") as infile:
    data = infile.read()
Can the code above lock the source file?
This source file can be updated at any time with new rows (while my script is running, for example).
I think not, because it is only opened in reading mode ("rb"). But I found that the Windows API can be used to read it without a lock, and I did not find a simple answer to my question.
My script runs locally, but the source file and the script/software that appends changes to it are on a network drive.
Opening a file does not put a lock on it. In fact, if you needed to ensure that separate processes did not access a file simultaneously, all of those processes would have to cooperatively take special steps to ensure that only a single process accessed the file at one time (see Locking a file in Python). This can also be demonstrated by the following small program, which purposely takes its time reading a file to give another process (namely me, with a text editor) a chance to append some data to the end of the file while the program is running. The program reads and outputs the file one byte at a time, pausing .1 seconds between reads. While it was running I added some additional text to the end of the file, and the program printed the additional text:
import time

with open('test.txt', "rb") as infile:
    while True:
        data = infile.read(1)
        if data == b'':
            break
        time.sleep(.1)
        print(data.decode('ascii'), end='', flush=True)
You can read your file in pieces and then join the pieces together if you need a single byte string. But this will not be as memory-efficient as reading the file with a single read:
BLOCKSIZE = 64 * 1024  # or some other value depending on the file size

with open(source, "rb") as infile:
    blocks = []
    while True:
        data = infile.read(BLOCKSIZE)
        if data == b'':
            break
        blocks.append(data)

# if you need the data in one piece (otherwise the pieces are in blocks):
data = b''.join(blocks)
One alternative is to make a copy of the file temporarily and read the copy.
You can use the shutil package for such a task:
import os
import time
from shutil import copyfile

def read_file_non_blocking(file):
    temp_file = f"{file}-{time.time()}"  # stores the copy in the local directory
    copyfile(file, temp_file)
    with open(temp_file, 'r') as my_file:
        pass  # do something cool with my_file
    os.remove(temp_file)
Windows is weird in how it handles files if you, like myself, are used to POSIX-style file handling. I have run into this issue numerous times and have been lucky enough to avoid solving it. However, in this case, if I had to solve it, I would look at the flags that can be passed to os.open and see whether any of those can disable the locking.
https://docs.python.org/3/library/os.html#os.open
I would do a little testing, but I don't have a non-production-critical Windows workstation to test on.
Related
I have a file in my Python folder called data.txt, and another file read.py that tries to read text from data.txt. But when I change something in data.txt, my read doesn't show anything new that I put in.
Something else I tried wasn't working, and I found something that did read, but when I changed the file to something actually meaningful it didn't print the new text.
Can someone explain why it doesn't refresh, or what I need to do to fix it?
with open("data.txt") as f:
    file_content = f.read().rstrip("\n")
    print(file_content)
First and foremost, strings are immutable in Python - once you use file.read(), that returned object cannot change.
That being said, you must re-read the file whenever its contents may have changed.
For example
read.py
def get_contents(filepath):
    with open(filepath) as f:
        return f.read().rstrip("\n")
main.py
from read import get_contents
import time
print(get_contents("data.txt"))
time.sleep(30)
# .. change file somehow
print(get_contents("data.txt"))
Now, you could set up an infinite loop that watches the file's last-modification timestamp from the OS and always picks up the latest changes, but that seems like a waste of resources unless you have a specific need for it (e.g. tailing a log file), and there are arguably better tools for that.
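For completeness, here is a rough sketch of that polling approach. The `watch_file` name and the interval are my own choices, not from the question; treat it as an illustration, not a recommendation over proper file-watching tools:

```python
import os
import time

def watch_file(filepath, poll_interval=1.0):
    """Yield the file's contents each time its modification time changes."""
    last_mtime = None
    while True:
        mtime = os.path.getmtime(filepath)
        if mtime != last_mtime:
            last_mtime = mtime
            with open(filepath) as f:
                yield f.read().rstrip("\n")
        time.sleep(poll_interval)
```

Each `next()` on the generator blocks until the OS reports a new modification time, then returns the fresh contents.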
It was unclear from your question whether you do the read once or multiple times. Either way, here are the steps to take:
Make sure you call the read function repeatedly with a certain interval
Check if you actually save file after modification
Make sure there are no file usage conflicts
So here is a description of each step:
When you read a file the way you shared, it gets closed, meaning it is read only once. You need to read it multiple times if you want to see changes, so do it at some interval in another thread, with async, or whatever suits your application best.
This step is obvious: remember to hit Ctrl+S to save the file after modifying it.
It may happen that a single file is accessed by multiple processes, for example your editor and the script. To prevent errors, try the following code:
def read_file(file_name: str):
    while True:
        try:
            with open(file_name) as f:
                return f.read().rstrip("\n")
        except IOError:
            pass
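Putting the steps together, a minimal polling sketch might look like this. The interval and iteration count are arbitrary choices of mine, and `read_file` is repeated so the snippet is self-contained:

```python
import time

def read_file(file_name: str):
    # Same retry loop as above: wait out transient sharing errors.
    while True:
        try:
            with open(file_name) as f:
                return f.read().rstrip("\n")
        except IOError:
            pass

def poll_file(file_name, interval=1.0, iterations=None):
    """Re-read the file at a fixed interval, printing contents when they change."""
    last = None
    count = 0
    while iterations is None or count < iterations:
        contents = read_file(file_name)
        if contents != last:
            print(contents)
            last = contents
        count += 1
        time.sleep(interval)
    return last
```

In a real application you would run `poll_file` in a background thread so it does not block the rest of the program.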
I'm running a test, and found that the file doesn't actually get written until I control-C to abort the program. Can anyone explain why that would happen?
I expected it to write at the same time, so I could read the file in the middle of the process.
import os
from time import sleep

f = open("log.txt", "a+")
i = 0
while True:
    f.write(str(i))
    f.write("\n")
    i += 1
    sleep(0.1)
Writing to disk is slow, so many programs store up writes into large chunks which they write all-at-once. This is called buffering, and Python does it automatically when you open a file.
When you write to the file, you're actually writing to a "buffer" in memory. When it fills up, Python will automatically write it to disk. You can tell it "write everything in the buffer to disk now" with
f.flush()
This isn't quite the whole story, because the operating system will probably buffer writes as well. You can tell it to write the buffer of the file with
os.fsync(f.fileno())
Finally, you can tell Python not to buffer a particular file with open(filename, "wb", 0) (unbuffered mode requires binary mode in Python 3), or to keep only a one-line buffer with open(filename, "w", 1). Naturally, this will slow down all operations on that file, because writes are slow.
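A short demonstration of the line-buffered case (the file name is made up for the demo): with buffering=1, every newline pushes the data out to the operating system, so a second reader sees it immediately.

```python
import os

path = "buffer_demo.txt"  # hypothetical file name for this demo

# buffering=1 keeps only a one-line buffer in text mode:
# each newline flushes the buffered data to the OS.
with open(path, "w", buffering=1) as f:
    f.write("first line\n")
    # The line is already visible to a second reader at this point,
    # even though f has not been closed yet.
    with open(path) as reader:
        print(reader.read())

os.remove(path)
```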
You need to f.close() to flush the file write buffer out to the file. Or, in your case, you might just want to do f.flush() followed by os.fsync(f.fileno()), so you can keep looping with the opened file handle.
Don't forget to import os.
You have to force the write, so I use the following lines to make sure a file is written:
# Two commands together force the OS to store the file buffer to disc
f.flush()
os.fsync(f.fileno())
You will want to check out file.flush() - although take note that this might not write the data to disk, to quote:
Note:
flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.
Closing the file (file.close()) will also ensure that the data is written - using with will do this implicitly, and is generally a better choice for more readability and clarity - not to mention solving other potential problems.
This is a Windows-ism. If you add an explicit .close() when you're done with the file, it'll appear in Explorer at that time. Even just flushing it might be enough (I don't have a Windows box handy to test). But basically f.write does not actually write; it just appends to the write buffer, and until the buffer gets flushed you won't see the data.
On unix the files will typically show up as a 0-byte file in this situation.
The file handle needs to be flushed:
f.flush()
The file does not get written because the output buffer is not flushed until garbage collection takes effect and flushes the I/O buffer (more than likely by calling f.close()).
Alternately, in your loop, you can call f.flush() followed by os.fsync(f.fileno()), as documented here.
f.flush()
os.fsync(f.fileno())
All that being said, if you ever plan on sharing the data in that file with other portions of your code, I would highly recommend using a StringIO object.
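If the data only needs to be shared within the same process, an in-memory io.StringIO buffer sidesteps disk buffering entirely. A minimal sketch:

```python
import io

# An in-memory text buffer: no disk writes, no OS-level buffering to worry about.
buffer = io.StringIO()
buffer.write("0\n")
buffer.write("1\n")

# Rewind and read everything back.
buffer.seek(0)
lines = buffer.read().splitlines()
print(lines)  # ['0', '1']
```

Other parts of the code can hold a reference to the same buffer and read from it at any time.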
I am sniffing network packets using Tshark (Command-Line wireshark) and writing them to a file just as I receive them. My code block is similar to following:
import json
from queue import PriorityQueue

documents = PriorityQueue(maxsize=0)
writing_enabled = True

with open("output.txt", 'w') as opened_file:
    while writing_enabled:
        try:
            data = documents.get(timeout=1)
        except Exception as e:
            # no document pushed by producer thread
            continue
        opened_file.write(json.dumps(data) + "\n")
If I receive files from the Tshark thread, I put them into a queue, then another thread writes them to a file using the code above. However, after the file reaches 600+ MB, the process slows down and its status changes to Not Responding. After some research, I think this is because of the default buffering mechanism of the open method. Is it reasonable to change with open("output.txt", 'w') as opened_file:
into with open("output.txt", 'w', 1000) as opened_file: to use a 1000-byte buffer when writing? Or is there another way to overcome this?
For writing the internal buffer to the file you can use the file's flush function. However, this is generally handled by your operating system, which has a default buffer size. You can open your file like this if you want to specify your own buffer size:
f = open('file.txt', 'w', buffering=bufsize)
Please also see the following question: How often does Python flush to file
As an alternative to flushing the buffer, you could also use rolling files, i.e. open a new file whenever the size of the currently open file exceeds a certain limit. This is generally good practice if you intend to write a lot of data.
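A bare-bones sketch of the rolling-file idea (the class name, size limit, and numbering scheme are my own invention; real projects would typically reach for logging.handlers.RotatingFileHandler instead):

```python
class RollingFileWriter:
    """Write lines to numbered files, starting a new file past max_bytes."""

    def __init__(self, basename, max_bytes=100 * 1024 * 1024):
        self.basename = basename
        self.max_bytes = max_bytes
        self.index = 0
        self._file = open(self._path(), "w")

    def _path(self):
        return f"{self.basename}.{self.index}"

    def write(self, line):
        # Roll over to a fresh file once the current one grows too large.
        if self._file.tell() > self.max_bytes:
            self._file.close()
            self.index += 1
            self._file = open(self._path(), "w")
        self._file.write(line)

    def close(self):
        self._file.close()
```

Each output file stays small enough to open and process comfortably, which also limits how much unflushed data can be lost at once.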
I have been searching for a solution for this and haven't been able to find one. I have a directory of folders which contain multiple, very-large csv files. I'm looping through each csv in each folder in the directory to replace values of certain headers. I need the headers to be consistent (from file to file) in order to run a different script to process all the data properly.
I found this solution that I thought would work: change first line of a file in python.
However this is not working as expected. My code:
import re
import shutil

from_file = open(filepath)
data = from_file.readline()

print('DBG: replacing in file', filepath)
for i in range(len(search_pattern)):
    data = re.sub(search_pattern[i], replacement[i], data)

to_file = open(filepath, mode="w")
to_file.write(data)
shutil.copyfileobj(from_file, to_file)
I want to replace the header values in search_pattern with the values in replacement without saving or writing to a different file - I want to modify the file in place. I have also tried
shutil.copyfileobj(from_file, to_file, -1)
As I understand it that should copy the whole file rather than breaking it up in chunks, but it doesn't seem to have an effect on my output. Is it possible that the csv is just too big?
I haven't been able to determine a different way to do this or make this way work. Any help would be greatly appreciated!
The answer from change first line of a file in python that you copied doesn't work on Windows.
On Linux, you can open a file for reading and writing at the same time. The system ensures that there's no conflict, but behind the scenes two different file objects are being handled. This method is also very unsafe: if the program crashes while reading/writing (power off, disk full), the file has a great chance of ending up truncated or corrupt.
On Windows, in any case, you cannot open a file for reading and writing at the same time using two handles. It just destroys the contents of the file.
So there are 2 options, which are portable and safe:
create a file in the same directory, once copied, delete first file, and rename the new one
Like this:
import os
import shutil

filepath = "test.txt"

with open(filepath) as from_file, open(filepath + ".new", "w") as to_file:
    data = from_file.readline()
    to_file.write("something else\n")
    shutil.copyfileobj(from_file, to_file)

os.remove(filepath)
os.rename(filepath + ".new", filepath)
This doesn't take much longer, because the rename operation is instantaneous. Besides, if the program/computer crashes at any point, one of the files (old or new) is still valid, so it's safe.
if patterns have the same length, use read/write mode
like this:
filepath = "test.txt"

with open(filepath, "r+") as rw_file:
    data = rw_file.readline()
    data = "h" * (len(data) - 1) + "\n"
    rw_file.seek(0)
    rw_file.write(data)
Here we read the first line, replace it with the same number of h characters, rewind the file, and write the first line back, overwriting the previous contents while keeping the rest of the lines. This is also safe, and even if the file is huge it's very fast. The only constraint is that the replacement must be exactly the same size (otherwise you would have remainders of the previous data, or you would overwrite the next line(s), since no data is shifted).
I would like to store data into csv file. But the data are incrementing with time. I wrote a simple example to show the problem :
import csv
import time
i = 0
with open('testfile.csv','wb') as csvfile:
    writer = csv.writer(csvfile,delimiter=';',quoting=csv.QUOTE_NONE)
    while True:
        i = i+1
        print i
        writer.writerow([i])
        time.sleep(2)
While the while loop is running, the csv file is not written; only when I stop the program is the data stored in the file.
Is there a way to keep the program running and 'force' the writing to the csv file?
Writing in Python is buffered. You can force the output to be written (flush the buffer) with:
csvfile.flush()
In your code, I suggest you add this line right after writer.writerow([i]).
You could also pass a buffering argument to the open() function, but I suggest you do not: switching buffering off comes with a performance penalty.
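A Python 3 variant of the loop with the flush in place might look like this (note that 'wb' becomes 'w' with newline='' for the csv module in Python 3; the loop is bounded here only so the sketch terminates):

```python
import csv
import time

i = 0
with open("testfile.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile, delimiter=";", quoting=csv.QUOTE_NONE)
    while i < 3:  # bounded so the example ends; the question used while True
        i = i + 1
        print(i)
        writer.writerow([i])
        csvfile.flush()  # push each row out of Python's buffer immediately
        time.sleep(0.1)
```

With the flush after every row, the file on disk stays current even while the loop keeps running.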