File Lock in Python [duplicate]

I need to lock a file for writing in Python. It will be accessed from multiple Python processes at once. I have found some solutions online, but most fail for my purposes as they are often only Unix based or Windows based.

Alright, so I ended up going with the code I wrote here, on my website (the link is dead; view it on archive.org, or see it on GitHub). I can use it in the following fashion:
from filelock import FileLock

with FileLock("myfile.txt"):
    # work with the file as it is now locked
    print("Lock acquired.")

The other solutions rely on a lot of external code. If you would prefer to do it yourself, here is some code for a cross-platform solution that uses the respective file locking tools on POSIX / Windows systems.
try:
    # POSIX-based file locking (Linux, Ubuntu, MacOS, etc.)
    #   Only allows locking on writable files, might cause
    #   strange results for reading.
    import fcntl, os
    def lock_file(f):
        if f.writable(): fcntl.lockf(f, fcntl.LOCK_EX)
    def unlock_file(f):
        if f.writable(): fcntl.lockf(f, fcntl.LOCK_UN)
except ModuleNotFoundError:
    # Windows file locking
    import msvcrt, os
    def file_size(f):
        return os.path.getsize(os.path.realpath(f.name))
    def lock_file(f):
        msvcrt.locking(f.fileno(), msvcrt.LK_RLCK, file_size(f))
    def unlock_file(f):
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, file_size(f))


# Class for ensuring that all file operations are atomic; treat
# initialization like a standard call to 'open' that happens to be atomic.
# This file opener *must* be used in a "with" block.
class AtomicOpen:
    # Open the file with arguments provided by user. Then acquire
    # a lock on that file object (WARNING: advisory locking).
    def __init__(self, path, *args, **kwargs):
        # Open the file and acquire a lock on the file before operating.
        self.file = open(path, *args, **kwargs)
        # Lock the opened file.
        lock_file(self.file)

    # Return the opened file object (knowing a lock has been obtained).
    def __enter__(self, *args, **kwargs):
        return self.file

    # Unlock the file and close the file object.
    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        # Flush to make sure all buffered contents are written to file.
        self.file.flush()
        os.fsync(self.file.fileno())
        # Release the lock on the file.
        unlock_file(self.file)
        self.file.close()
        # Handle exceptions that may have come up during execution; by
        # default any exceptions are raised to the user.
        if exc_type is not None:
            return False
        else:
            return True
Now, AtomicOpen can be used in a with block where one would normally use an open statement.
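A minimal usage sketch of the class above (the file name is only an example):
with AtomicOpen("myfile.txt", "a") as f:
    # Only one cooperating process at a time gets past lock_file().
    f.write("appended under an exclusive advisory lock\n")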
WARNINGS:
If running on Windows and Python crashes before exit is called, I'm not sure what the lock behavior would be.
The locking provided here is advisory, not absolute. All potentially competing processes must use the "AtomicOpen" class.
As of Nov 9th, 2020, this code only locks writable files on POSIX systems. At some point after the posting and before that date, it became illegal to use fcntl.lockf on read-only files.

There is a cross-platform file locking module here: Portalocker
Although as Kevin says, writing to a file from multiple processes at once is something you want to avoid if at all possible.
If you can shoehorn your problem into a database, you could use SQLite. It supports concurrent access and handles its own locking.
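If the data fits a table, a minimal sketch of the SQLite route might look like the following (the database path, table, and values are made up for illustration):
import sqlite3

# Each process opens its own connection; SQLite serializes writers itself.
conn = sqlite3.connect("shared.db", timeout=10)  # wait up to 10 s for a busy database
with conn:  # the connection context manager commits or rolls back
    conn.execute("CREATE TABLE IF NOT EXISTS log (ts TEXT, msg TEXT)")
    conn.execute("INSERT INTO log VALUES (datetime('now'), ?)", ("hello",))
conn.close()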

I have been looking at several solutions to do that and my choice has been
oslo.concurrency
It's powerful and relatively well documented. It's based on fasteners.
Other solutions:
Portalocker: requires pywin32, which is an exe installation, so not possible via pip
fasteners: poorly documented
lockfile: deprecated
flufl.lock: NFS-safe file locking for POSIX systems.
simpleflock : Last update 2013-07
zc.lockfile : Last update 2016-06 (as of 2017-03)
lock_file : Last update in 2007-10

I prefer lockfile — Platform-independent file locking

Locking is platform and device specific, but generally, you have a few options:
Use flock(), or equivalent (if your OS supports it). This is advisory locking: unless you check for the lock, it's ignored.
Use a lock-copy-move-unlock methodology, where you copy the file, write the new data, then move it (move, not copy - move is an atomic operation in Linux -- check your OS), and you check for the existence of the lock file.
Use a directory as a "lock". This is necessary if you're writing to NFS, since NFS doesn't support flock().
There's also the possibility of using shared memory between the processes, but I've never tried that; it's very OS-specific.
For all these methods, you'll have to use a spin-lock (retry-after-failure) technique for acquiring and testing the lock. This does leave a small window for mis-synchronization, but it's generally small enough not to be a major issue.
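As a rough sketch of the directory-as-lock plus spin-lock approach (names and the timeout policy are made up for illustration; mkdir either succeeds or fails atomically):
import os
import time

def acquire_dir_lock(lock_dir, timeout=10.0, poll=0.05):
    # Spin until we manage to create lock_dir; only one process can succeed.
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.mkdir(lock_dir)
            return
        except FileExistsError:  # someone else holds the lock
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire " + lock_dir)
            time.sleep(poll)

def release_dir_lock(lock_dir):
    os.rmdir(lock_dir)

acquire_dir_lock("mydata.lock")
try:
    with open("mydata.txt", "a") as f:
        f.write("one writer at a time\n")
finally:
    release_dir_lock("mydata.lock")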
If you're looking for a solution that is cross platform, then you're better off logging to another system via some other mechanism (the next best thing is the NFS technique above).
Note that sqlite is subject to the same constraints over NFS that normal files are, so you can't write to an sqlite database on a network share and get synchronization for free.

Coordinating access to a single file at the OS level is fraught with all kinds of issues that you probably don't want to solve.
Your best bet is to have a separate process that coordinates read/write access to that file.
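If all the writers are started from one parent, a hedged sketch of that idea is to let a multiprocessing manager process own the lock (file and function names are illustrative):
import multiprocessing as mp

def writer(lock, path, text):
    with lock:                       # serialized through the manager process
        with open(path, "a") as f:
            f.write(text + "\n")

if __name__ == "__main__":
    with mp.Manager() as manager:    # the manager is the coordinating process
        lock = manager.Lock()
        procs = [mp.Process(target=writer, args=(lock, "shared.txt", "line %d" % i))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()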

Here's an example of how to use the filelock library, which is similar to Evan Fossmark's implementation:
from filelock import FileLock

lockfile = r"c:\scr.txt"
lock = FileLock(lockfile + ".lock")
with lock:
    file = open(lockfile, "w")
    file.write("123")
    file.close()
Any code within the with lock: block is protected by the lock, meaning that it will finish before another process using the same lock file gets access to the file.

Locking a file is usually a platform-specific operation, so you may need to allow for the possibility of running on different operating systems. For example:
import os

def my_lock(f):
    if os.name == "posix":
        # Unix or OS X specific locking here
        ...
    elif os.name == "nt":
        # Windows specific locking here
        ...
    else:
        print("Unknown operating system, lock unavailable")

I have been working on a situation like this, where I run multiple copies of the same program from within the same directory/folder and they all log errors to the same file. My approach was to write a "lock file" to disk before opening the log file. The program checks for the presence of the "lock file" before proceeding, and waits for its turn if the "lock file" exists.
Here is the code:
from datetime import datetime
from os import remove, stat
from os.path import exists
from time import time

def errlogger(error):
    while True:
        if not exists('errloglock'):
            lock = open('errloglock', 'w')
            if exists('errorlog'):
                log = open('errorlog', 'a')
            else:
                log = open('errorlog', 'w')
            log.write(str(datetime.utcnow())[0:-7] + ' ' + error + '\n')
            log.close()
            remove('errloglock')
            return
        else:
            check = stat('errloglock')
            if time() - check.st_ctime > 0.01:
                remove('errloglock')
            print('waiting my turn')
EDIT---
After thinking over some of the comments about stale locks above, I edited the code to add a check for staleness of the "lock file." Timing several thousand iterations of this function on my system gave an average of 0.002066... seconds from just before:
lock = open('errloglock', 'w')
to just after:
remove('errloglock')
so I figured I would start with 5 times that amount to indicate staleness and monitor the situation for problems.
Also, as I was working with the timing, I realized that I had a bit of code that was not really necessary:
lock.close()
which I had immediately following the open statement, so I have removed it in this edit.
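One note: the exists() check followed by open() above leaves a small race window between the check and the creation. A hedged alternative sketch lets the OS do the check atomically with os.O_EXCL (the file name is the same placeholder as above):
import os
import time

def acquire_lockfile(path='errloglock', poll=0.002):
    while True:
        try:
            # O_CREAT | O_EXCL fails atomically if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            time.sleep(poll)

def release_lockfile(path='errloglock'):
    os.remove(path)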

This worked for me: don't keep everything in one large file; spread the data across several small ones.
You create a temp file, delete file A, and then rename the temp file to A.
import os
import json
import time

# File_Temp, File_A and DATA are assumed to be defined elsewhere.

def Server():
    i = 0
    while i == 0:
        try:
            with open(File_Temp, "w") as file:
                json.dump(DATA, file, indent=2)
            if os.path.exists(File_A):
                os.remove(File_A)
            os.rename(File_Temp, File_A)
            i = 1
        except OSError as e:
            print("file locked:", str(e))
            time.sleep(1)


def Clients():
    i = 0
    while i == 0:
        try:
            if os.path.exists(File_A):
                with open(File_A, "r") as file:
                    DATA_Temp = file.read()
                DATA = json.loads(DATA_Temp)
                i = 1
        except OSError as e:
            print(str(e))
            time.sleep(1)
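A small note on the delete-then-rename step: on Windows, os.rename() fails if the destination exists, which is why the code removes File_A first, but that leaves a brief moment with no file at all. os.replace() overwrites atomically on both POSIX and Windows; a hedged sketch:
import json
import os

def write_atomically(path, data):
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(data, f, indent=2)
        f.flush()
        os.fsync(f.fileno())   # make sure the bytes are on disk before the swap
    os.replace(tmp, path)      # atomic overwrite; readers see either the old or the new file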

The scenario is as follows:
The user requests a file to do something. Then, if the user sends the same request again, it informs the user that the second request will not be done until the first request finishes. That's why I use a lock mechanism to handle this issue.
Here is my working code:
from lockfile import LockFile

lock = LockFile(lock_file_path)
status = ""
if not lock.is_locked():
    lock.acquire()
    status = lock.path + ' is locked.'
    print(status)
else:
    status = lock.path + " is already locked."
    print(status)
return status  # this snippet lives inside a request handler, hence the return

I found a simple implementation that works(!) in grizzled-python.
Simply using os.open(..., O_EXCL) + os.close() didn't work on Windows.

You may find pylocker very useful. It can be used to lock a file or for locking mechanisms in general and can be accessed from multiple Python processes at once.
If you simply want to lock a file here's how it works:
import uuid
from pylocker import Locker

# create a unique lock pass. This can be any string.
lpass = str(uuid.uuid1())

# create locker instance.
FL = Locker(filePath='myfile.txt', lockPass=lpass, mode='w')

# acquire the lock
with FL as r:
    # get the result
    acquired, code, fd = r

    # check if acquired.
    if fd is not None:
        print(fd)
        fd.write("I have successfully acquired the lock!")

# no need to release anything or to close the file descriptor,
# the with statement takes care of that. Let's print fd and verify that.
print(fd)

If you just need Mac/POSIX this should work without external packages.
import sys
import stat
import os

filePath = "<PATH TO FILE>"
if sys.platform == 'darwin':
    flags = os.stat(filePath).st_flags
    if not flags & stat.UF_IMMUTABLE:
        os.chflags(filePath, flags | stat.UF_IMMUTABLE)
and if you want to unlock the file, just change the last two lines to:
if flags & stat.UF_IMMUTABLE:
    os.chflags(filePath, flags & ~stat.UF_IMMUTABLE)

Related

Changing text file permission on Python [duplicate]


Python read YAML: where does it go wrong: with open() or yaml.load()?

I have a script which loads a YAML file as an object. The related part is very simple:
def run_test_spec(self, file_path):
    try:
        with open(file_path, 'r') as f:
            test_spec = yaml.load(f)
            if test_spec:
                do_test(test_spec)
            else:
                print("empty test_spec")
    except BaseException as err:
        print("error in loading yaml file:", file_path)
The file_path was passed in after finishing some comparisons on file entries with for entries in os.scandir(some_directory) (there is no break statement within the for loop).
It had been running fine until recently. Now test_spec gets the value None after the first run. I debugged it with PyCharm. If the breakpoint is set at the line if test_spec:, test_spec is None, but if the breakpoint is set either at the with open(...) line or at yaml.load(), test_spec gets loaded properly. In the end, I added a time.sleep(0.2) statement before with open(...), and then it works all the time.
What was the likely cause of it? Is it a problem with with open(...) or yaml.load()? How do I get it right without the sleep?
Edited on June 27, 2018:
I did further debugging, and found the line in the code which makes the difference. In file /usr/local/lib/python3.5/dist-packages/yaml/reader.py on my machine:
def update_raw(self, size=4096):
    data = self.stream.read(size)
    if self.raw_buffer is None:
        self.raw_buffer = data
    else:
        self.raw_buffer += data
    self.stream_pointer += len(data)
    if not data:
        self.eof = True
If the breakpoint is set to the first line (data = ...), data is read fine with the content of the file; however, if the breakpoint is set to the second line (if self.raw_buffer is None:), data is read in as an empty string, which causes a StreamEndEvent and thus the empty return from yaml.load().
I could not step in self.stream.read(size), which only got me to some code in /usr/lib/python3.5/codecs.py.
I don't think the Python library caused this problem. It probably has something to do with my code. I noticed this happens after I run a test which involves spawning two child processes running as a pipe and killing the second process with terminate(). I checked the program with psutil: there is only one thread, no child processes, and no open files after the run. It looks clean. But then the new request files could not be read, unless I added a sleep or set a breakpoint before the stream read. If the second process, also in a pipe, terminates by itself, the issue does not occur.
If no breakpoint is set but just print the f.tell() before calling yaml.load(f), it is always 0, whether the yaml.load(f) returns None or not.
PyYAML got a new release yesterday (2018-06-26). There was no announcement that this was an API break, but as can be expected from the major version number change, there was one.
The (unsafe) load() that you use has been renamed danger_load() by the merge of this PR.
You can pin your PyYAML install on 3.x (pip install "pyyaml<4") or change your code to use danger_load(). The best solution would probably be to write explicit representers for the objects that are now dumped using !!python/path_to_your_type, so you can use safe_load().
I could not find any announcement of possible breakage in the documentation.
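If the YAML in question contains only plain mappings, lists, and scalars, a minimal sketch of the safe_load() route, which also avoids the unsafe-load rename entirely, could look like this (do_test is replaced by a print for illustration):
import yaml

def run_test_spec(file_path):
    with open(file_path, 'r') as f:
        # safe_load constructs only basic types (dict, list, str, int, ...),
        # so it is not affected by the unsafe load()/danger_load() rename.
        test_spec = yaml.safe_load(f)
    if test_spec:
        print("loaded:", test_spec)
    else:
        print("empty test_spec")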

Python (Watchdog) - Waiting for file to be created correctly

I'm new to Python and I'm trying to implement good "file creation" detection. If I do not put in a time.sleep(x), my files are processed incorrectly since they are still being "created" in the folder (the write buffer is not empty yet).
How can I circumvent this thing without waiting x seconds every time a file is created?
This is my code:
Main:
while 1:
    if len(parser()) > 0:  # arguments are valid
        if len(parser()) == 3:
            log_path = parser()['log_path']
        else:
            log_path = os.getcwd()
        paths = parser()
        if paths:
            handler = Event_Handler()
            observer = Observer()
            observer.schedule(handler, paths['src_fld'], True)
            observer.start()
            try:
                while True:
                    time.sleep(1)
            except KeyboardInterrupt:
                observer.stop()
            observer.join()
    else:
        exit(1)
Event_Handler class:
class Event_Handler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            time.sleep(1)
As I said, without that time.sleep(1) if I try to process a big file I'll fail since it's still not completely written.
For the sake of any future readers who stumble upon this question, as I have, the answer appears to be that you cannot. Watchdog does not and will not support any feature to tell if a file is "complete" as Windows doesn't allow for it and Watchdog is meant to be system-agnostic.
If you're on Linux or some distro of it, inotify is probably a safe bet. Otherwise, on Windows, the best solutions I've found are:
Upload a big file, bigfile, and then another file, bigfile-complete. When you find a file name-complete, you go back and upload/transfer/react to the original file name. In this case, your files would all be added to the monitored directory in a queue going file, file-complete, file2, file2-complete, . . .
Poll on the size of the file until it has remained fixed for a suitable length of time. When it hasn't changed in long enough that you can be reasonably certain it is finished, react to it as normal.
Similarly, when a file is being uploaded to your directory in bits and pieces, it will generate a constant stream of file-modified Watchdog events. You can poll these instead of file size, waiting until they've stopped for a reasonable length of time, and then assume the file is complete and proceed.
None of these solutions are perfect, but this seems to be an inherent issue to Watchdog on Windows. Unfortunately the "perfect" solution seems to be "swap to Linux and use inotify".
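As a rough sketch of the "poll until the size stops changing" idea (the poll interval and stability window are arbitrary choices):
import os
import time

def wait_until_stable(path, interval=1.0, stable_checks=3):
    # Return once the file size has stayed the same for `stable_checks` polls in a row.
    last_size, same = -1, 0
    while same < stable_checks:
        size = os.path.getsize(path)
        same = same + 1 if size == last_size else 0
        last_size = size
        time.sleep(interval)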
Try reading the file in a while loop:
def on_created(event):
    ...
    # WAITING FOR FILE TRANSFER
    file = None
    while file is None:
        try:
            file = open(event.src_path)
        except OSError:
            file = None
            print("WAITING FOR FILE TRANSFER....")
            time.sleep(3)
            continue
Instead of using elapsed time as an indicator, the cleanest solution would be to monitor only certain types of files, using the patterns variable of a PatternMatchingEventHandler.
Simply append '.temp' to every file you are uploading/writing, and rename them to their real name when they're finished.
Set the patterns to look for '*.temp' files, and monitor their renaming to whatever type of file you desire using the FileSystemMovedEvent event (and its associated Handler.on_moved() method) and its dest_path value, which will include the new name of the file, now completely written.
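A hedged sketch of that pattern (the watched directory, the .temp suffix, and the handler body are illustrative):
import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class TempFileHandler(PatternMatchingEventHandler):
    def __init__(self):
        # Only react to files that are still being written under a .temp name.
        super().__init__(patterns=["*.temp"])

    def on_moved(self, event):
        # Fires when the writer renames foo.ext.temp -> foo.ext;
        # event.dest_path is the finished file.
        print("finished file:", event.dest_path)

observer = Observer()
observer.schedule(TempFileHandler(), path=".", recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()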

Python: Reread contents of a file

I have a file that an application updates every few seconds, and I want to extract a single number field in that file, and record it into a list for use later. So, I'd like to make an infinite loop where the script reads a source file, and any time it notices a change in a particular figure, it writes that figure to an output file.
I'm not sure why I can't get Python to notice that the source file is changing:
#!/usr/bin/python
import re
from time import gmtime, strftime, sleep

def write_data(new_datapoint):
    output_path = '/media/USBHDD/PythonStudy/torrent_data_collection/data_one.csv'
    outfile = open(output_path, 'a')
    outfile.write(new_datapoint)
    outfile.close()

forever = 0
previous_data = "0"

while forever < 1:
    input_path = '/var/lib/transmission-daemon/info/stats.json'
    infile = open(input_path, "r")
    infile.seek(0)
    contents = infile.read()
    uploaded_bytes = re.search('"uploaded-bytes":\s(\d+)', contents)
    if uploaded_bytes:
        current_time = strftime("%Y-%m-%d %X", gmtime())
        current_data = uploaded_bytes.group(1)
        if current_data != previous_data:
            write_data("," + current_time + "$" + uploaded_bytes.group(1))
        previous_data = uploaded_bytes.group(1)
        infile.close()
        sleep(5)
    else:
        print "couldn't write" + strftime("%Y-%m-%d %X", gmtime())
        infile.close()
        sleep(60)
As it is now, the (messy) script writes once correctly, and then I can see that although my source file (stats.json) is changing, my script never picks up on any changes. It keeps on running, but my output file doesn't grow.
I thought that an open() and a close() would do the trick, and then tried throwing in a .seek(0).
What file method am I missing to ensure that Python re-opens and re-reads my source file (stats.json)?
Unless you are implementing some synchronization mechanism or can somehow guarantee atomic reads and writes, I think you are asking for race conditions and subtle bugs here.
Imagine the "reader" accessing the file while the "writer" hasn't completed its write cycle. There is a risk of reading incomplete/inconsistent data. On "modern" systems, you could also hit the cache and not see file modifications "live" as they happen.
I can think of two possible solutions:
You forgot the parentheses on the close in the else of the infinite loop.
infile.close --> infile.close()
The program that is changing the JSON file is not closing the file, and therefore it is not actually changing.
Two problems I see:
Are you sure your file is really updated on the filesystem? I do not know which operating system you are running your code on, but caching may kick your a$$ in this case if the file is not flushed by the producer.
Your problem may be worth solving with a pipe instead of a file; however, I cannot guarantee what transmission will do if it gets stuck writing to a pipe when your consumer is dead.
Answering your problems, consider using one of the following:
pyinotify
watchdog
watcher
These modules are intended to monitor changes on the filesystem and then call the proper actions. The method in your example is primitive, has a big performance penalty, and has a couple of other problems already mentioned in other answers.
Ilya, would it help to check (os.path.getmtime) whether stats.json changed before you process the file?
Moreover, I'd suggest taking advantage of the fact that it's a JSON file:
import json
import os

dir_name = '/home/klaus/.config/transmission/'
# stats.json of daemon might be elsewhere
file_name = 'stats.json'
full_path = os.path.join(dir_name, file_name)

with open(full_path) as fp:
    data = json.load(fp)

print data['uploaded-bytes']
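A hedged sketch of the getmtime idea mentioned above (the poll interval and generator shape are my own choices):
import os
import time

def changed_files(path, interval=5):
    # Yield the path each time its modification time changes.
    last_mtime = 0
    while True:
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            last_mtime = mtime
            yield path
        time.sleep(interval)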
Thanks for all the answers; unfortunately my error was in the shell, and not in the Python script.
The cause of the problem turned out to be the way I was putting the script in the background. I was doing Ctrl+Z, which I thought would put the task in the background. But it does not: Ctrl+Z only suspends the task and returns you to the shell; a subsequent bg command is necessary for the script to run in an infinite loop in the background.

What is the python "with" statement designed for?

I came across the Python with statement for the first time today. I've been using Python lightly for several months and didn't even know of its existence! Given its somewhat obscure status, I thought it would be worth asking:
What is the Python with statement designed to be used for?
What do you use it for?
Are there any gotchas I need to be aware of, or common anti-patterns associated with its use? Any cases where it is better to use try..finally than with?
Why isn't it used more widely?
Which standard library classes are compatible with it?
I believe this has already been answered by other users before me, so I only add it for the sake of completeness: the with statement simplifies exception handling by encapsulating common preparation and cleanup tasks in so-called context managers. More details can be found in PEP 343. For instance, the open statement is a context manager in itself, which lets you open a file, keep it open as long as the execution is in the context of the with statement where you used it, and close it as soon as you leave the context, no matter whether you have left it because of an exception or during regular control flow. The with statement can thus be used in ways similar to the RAII pattern in C++: some resource is acquired by the with statement and released when you leave the with context.
Some examples are: opening files using with open(filename) as fp:, acquiring locks using with lock: (where lock is an instance of threading.Lock). You can also construct your own context managers using the contextmanager decorator from contextlib. For instance, I often use this when I have to change the current directory temporarily and then return to where I was:
from contextlib import contextmanager
import os

@contextmanager
def working_directory(path):
    current_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(current_dir)

with working_directory("data/stuff"):
    # do something within data/stuff
    ...
# here I am back again in the original working directory
Here's another example that temporarily redirects sys.stdin, sys.stdout and sys.stderr to some other file handle and restores them later:
from contextlib import contextmanager
import sys

@contextmanager
def redirected(**kwds):
    stream_names = ["stdin", "stdout", "stderr"]
    old_streams = {}
    try:
        for sname in stream_names:
            stream = kwds.get(sname, None)
            if stream is not None and stream != getattr(sys, sname):
                old_streams[sname] = getattr(sys, sname)
                setattr(sys, sname, stream)
        yield
    finally:
        for sname, stream in old_streams.iteritems():
            setattr(sys, sname, stream)

with redirected(stdout=open("/tmp/log.txt", "w")):
    # these print statements will go to /tmp/log.txt
    print "Test entry 1"
    print "Test entry 2"

# back to the normal stdout
print "Back to normal stdout again"
And finally, another example that creates a temporary folder and cleans it up when leaving the context:
from tempfile import mkdtemp
from shutil import rmtree

@contextmanager
def temporary_dir(*args, **kwds):
    name = mkdtemp(*args, **kwds)
    try:
        yield name
    finally:
        rmtree(name)

with temporary_dir() as dirname:
    # do whatever you want
    ...
I would suggest two interesting reads:
PEP 343 The "with" Statement
Effbot: Understanding Python's "with" statement
1.
The with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.
2.
You could do something like:
with open("foo.txt") as foo_file:
data = foo_file.read()
OR
from contextlib import nested
with nested(A(), B(), C()) as (X, Y, Z):
do_something()
OR (Python 3.1)
with open('data') as input_file, open('result', 'w') as output_file:
for line in input_file:
output_file.write(parse(line))
OR
lock = threading.Lock()
with lock:
# Critical section of code
3.
I don't see any Antipattern here.
Quoting Dive into Python:
try..finally is good. with is better.
4.
I guess it's related to programmers' habit of using try..catch..finally statements from other languages.
The Python with statement is built-in language support of the Resource Acquisition Is Initialization idiom commonly used in C++. It is intended to allow safe acquisition and release of operating system resources.
The with statement creates resources within a scope/block. You write your code using the resources within the block. When the block exits the resources are cleanly released regardless of the outcome of the code in the block (that is whether the block exits normally or because of an exception).
Many resources in the Python library obey the protocol required by the with statement and so can be used with it out-of-the-box. However, anyone can make resources that can be used in a with statement by implementing the well-documented protocol: PEP 0343
Use it whenever you acquire resources in your application that must be explicitly relinquished such as files, network connections, locks and the like.
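As a small illustration of that protocol, here is a hedged sketch of a hand-rolled context manager for a made-up resource:
class ManagedResource:
    # Minimal context manager: acquire in __enter__, release in __exit__.
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print("acquiring", self.name)   # e.g. open a file, take a lock, connect
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        print("releasing", self.name)   # runs on normal exit and on exceptions
        return False                    # do not suppress exceptions

with ManagedResource("demo") as res:
    print("using", res.name)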
Again for completeness I'll add my most useful use-case for with statements.
I do a lot of scientific computing, and for some activities I need the Decimal library for arbitrary precision calculations. In some parts of my code I need high precision, and for most other parts I need less precision.
I set my default precision to a low number and then use with to get a more precise answer for some sections:
from decimal import localcontext

with localcontext() as ctx:
    ctx.prec = 42   # Perform a high precision calculation
    s = calculate_something()
s = +s  # Round the final result back to the default precision
I use this a lot with the hypergeometric test, which requires the division of large numbers resulting from factorials. When you do genomic-scale calculations you have to be careful of round-off and overflow errors.
An example of an antipattern might be to use with inside a loop when it would be more efficient to have the with outside the loop.
For example:
for row in lines:
    with open("outfile", "a") as f:
        f.write(row)
vs
with open("outfile", "a") as f:
    for row in lines:
        f.write(row)
The first way opens and closes the file for each row, which may cause performance problems compared to the second way, which opens and closes the file just once.
See PEP 343 - The 'with' statement; there is an example section at the end.
... new statement "with" to the Python
language to make
it possible to factor out standard uses of try/finally statements.
points 1, 2, and 3 being reasonably well covered:
4: it is relatively new, only available in python2.6+ (or python2.5 using from __future__ import with_statement)
The with statement works with so-called context managers:
http://docs.python.org/release/2.5.2/lib/typecontextmanager.html
The idea is to simplify exception handling by doing the necessary cleanup after leaving the 'with' block. Some of the python built-ins already work as context managers.
Another example for out-of-the-box support, and one that might be a bit baffling at first when you are used to the way built-in open() behaves, are connection objects of popular database modules such as:
sqlite3
psycopg2
cx_oracle
The connection objects are context managers and as such can be used out-of-the-box in a with-statement, however when using the above note that:
When the with-block is finished, either with an exception or without, the connection is not closed. In case the with-block finishes with an exception, the transaction is rolled back; otherwise the transaction is committed.
This means that the programmer has to take care to close the connection themselves, but it allows one to acquire a connection and use it in multiple with-statements, as shown in the psycopg2 docs:
conn = psycopg2.connect(DSN)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL1)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL2)

conn.close()
In the example above, you'll note that the cursor objects of psycopg2 also are context managers. From the relevant documentation on the behavior:
When a cursor exits the with-block it is closed, releasing any resource eventually associated with it. The state of the transaction is not affected.
In Python, the with statement is generally used to open a file, process the data present in the file, and also close the file, without calling a close() method. The with statement makes exception handling simpler by providing cleanup activities.
General form of with:
with open("file name", "mode") as file_var:
    # processing statements
Note: there is no need to close the file by calling file_var.close().
The answers here are great, but just to add a simple one that helped me:
with open("foo.txt") as file:
data = file.read()
open returns a file
Since 2.6, Python added the methods __enter__ and __exit__ to file objects.
with is like a for loop that calls __enter__, runs the block once, and then calls __exit__.
with works with any instance that has __enter__ and __exit__.
A file is locked and not re-usable by other processes until it's closed; __exit__ closes it.
source: http://web.archive.org/web/20180310054708/http://effbot.org/zone/python-with-statement.htm
