How to get a file close event in python


I am using Python 2.7 on a Windows 7 64-bit machine.
How can I get a file close event in these two cases:
when the file is opened in a new process of the file opener (like Notepad or WordPad, which open every file in a new process)
when the file is opened in a tab of the file opener (like Notepad++, which opens every file in a new tab while only a single Notepad++ process is running)
How do I get the file close event in the cases above? Can both cases be handled by common code? I am dealing with different file types.

Getting a file close event has proven to be easy on *nix systems, but on Windows it is not a simple task. Below is a summary of common methods, grouped by OS.
For Linux
On Linux, filesystem changes can be monitored easily and in great detail. The best tool for this is the kernel feature called inotify, and there is a Python binding that uses it, called Pyinotify.
Pyinotify
Pyinotify is a Python module for monitoring filesystems changes. Pyinotify relies on a Linux Kernel feature (merged in kernel 2.6.13) called inotify, which is an event-driven notifier. Its notifications are exported from kernel space to user space through three system calls. Pyinotify binds these system calls and provides an implementation on top of them offering a generic and abstract way to manipulate those functionalities.
The Pyinotify documentation lists all the events that can be monitored.
Example usage:
import pyinotify

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_NOWRITE(self, event):
        print("File was closed without writing: " + event.pathname)

    def process_IN_CLOSE_WRITE(self, event):
        print("File was closed with writing: " + event.pathname)

def watch(filename):
    wm = pyinotify.WatchManager()
    mask = pyinotify.IN_CLOSE_NOWRITE | pyinotify.IN_CLOSE_WRITE
    wm.add_watch(filename, mask)
    eh = EventHandler()
    notifier = pyinotify.Notifier(wm, eh)
    notifier.loop()  # blocks and dispatches events to the handler

if __name__ == '__main__':
    watch('/path/to/file')
For Windows
The situation on Windows is considerably more complex than on Linux. Most libraries rely on the ReadDirectoryChangesW API, which is limited and can't detect finer details such as a file close event. There are, however, other methods for detecting such events, described below.
Watcher
Note: Watcher was last updated in February 2011, so it's probably safe to skip this one.
Watcher is a low-level C extension for receiving file system updates using the ReadDirectoryChangesW API on Windows systems. The package also includes a high-level interface to emulate most of the .NET FileSystemWatcher API.
The closest one can get to detecting file close events with Watcher is to monitor the FILE_NOTIFY_CHANGE_LAST_WRITE and/or FILE_NOTIFY_CHANGE_LAST_ACCESS events.
Example usage:
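import watcher
# path_to_watch and callback are placeholders: the directory to monitor and
# the function Watcher calls for each change notification.
w = watcher.Watcher(path_to_watch, callback)
w.flags = watcher.FILE_NOTIFY_CHANGE_LAST_WRITE
w.start()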
Watchdog
A Python API and shell utilities to monitor file system events. Install it with $ pip install watchdog; see its documentation for more info.
On Windows, Watchdog relies on the ReadDirectoryChangesW API, so it has the same caveats as Watcher and the other libraries built on that API.
Pywatch
A Python near-clone of the Linux watch command. The pywatch.watcher.Watcher class can be told to watch a set of files and given a set of commands to run whenever any of those files change. It can only detect that a file changed, since it works by polling the st_mtime value returned by stat.
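pywatch's own code is not reproduced here; the following is a minimal sketch of the st_mtime polling idea it is described as using, built around a hypothetical poll_mtimes() helper. It only tells you that a file's modification time changed between two polls, which is why this approach cannot see a close event.

```python
import os


def poll_mtimes(paths, previous):
    """Compare current st_mtime values against the previous poll.

    `previous` maps path -> mtime from the last call (empty on the first call).
    Returns (changed_paths, current_state). Missing files are skipped.
    """
    current = {}
    for path in paths:
        try:
            current[path] = os.stat(path).st_mtime
        except OSError:
            continue  # file missing or inaccessible: no data for this poll
    # A path counts as changed only if it was seen before and its mtime differs.
    changed = [p for p, mtime in current.items()
               if p in previous and previous[p] != mtime]
    return changed, current
```

Call it in a loop, e.g. `changed, state = poll_mtimes(paths, state)` followed by `time.sleep(1)`; the first call only records a baseline.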
Bonus for Windows with NTFS:
NTFS USN Journal
The NTFS USN (Update Sequence Number) Journal is a feature of NTFS that maintains a record of changes made to the volume. It is listed as a bonus because, unlike the other entries, it is not a library but a feature of the NTFS file system. So if you are using another Windows file system (like FAT or ReFS), this does not apply.
The system records all changes made to the volume in the USN Journal file, and each volume has its own instance. Each record in the change journal contains the USN, the name of the file, and information about what the change was.
This method is interesting for this question because, unlike most of the other methods, it provides a way to detect a file close event, defined as USN_REASON_CLOSE. The MSDN documentation has the complete list of reasons, as well as full documentation of USN journaling.
There are multiple ways to access the USN Journal from Python, but the only mature option seems to be the ntfsjournal module.
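Reading the journal itself is left to ntfsjournal (or to DeviceIoControl with FSCTL_READ_USN_JOURNAL); the sketch below only shows how the Reason field of a USN record, which is a bitmask, can be tested for USN_REASON_CLOSE. The constants are a subset of the USN_REASON_* values from winioctl.h, and the helper names are hypothetical.

```python
# Subset of the USN_REASON_* flags from winioctl.h.
USN_REASONS = {
    0x00000001: "USN_REASON_DATA_OVERWRITE",
    0x00000002: "USN_REASON_DATA_EXTEND",
    0x00000004: "USN_REASON_DATA_TRUNCATION",
    0x00000100: "USN_REASON_FILE_CREATE",
    0x00000200: "USN_REASON_FILE_DELETE",
    0x80000000: "USN_REASON_CLOSE",
}
USN_REASON_CLOSE = 0x80000000


def decode_reason(mask):
    """Return the names of the known flags set in a record's Reason mask."""
    return [name for bit, name in sorted(USN_REASONS.items()) if mask & bit]


def is_close_record(mask):
    """True if this USN record was written when the file was closed."""
    return bool(mask & USN_REASON_CLOSE)
```

A close record's Reason typically also carries the other reasons accumulated while the file was open, so a mask like 0x80000002 decodes to a data extension plus the close.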
The "proper" way for Windows:
File system filter driver
As described on the MSDN page:
A file system filter driver is an optional driver that adds value to or modifies the behavior of a file system. A file system filter driver is a kernel-mode component that runs as part of the Windows executive. A file system filter driver can filter I/O operations for one or more file systems or file system volumes. Depending on the nature of the driver, filter can mean log, observe, modify, or even prevent. Typical applications for file system filter drivers include antivirus utilities, encryption programs, and hierarchical storage management systems.
It is not an easy task to implement a file system filter driver, but for someone who would like to give it a try, there is a good introduction tutorial on CodeProject.
P.S. See ixe013's answer for additional information about this method.
Multiplatform
Qt's QFileSystemWatcher
The QFileSystemWatcher class provides an interface for monitoring files and directories for modifications. This class was introduced in Qt 4.2.
Unfortunately, its functionality is fairly limited, as it can only detect when a file has been modified, renamed or deleted, and when a new file was added to a directory.
Example usage:
import sys
from PyQt4 import QtCore

def directory_changed(path):
    print('Directory Changed: %s' % path)

def file_changed(path):
    print('File Changed: %s' % path)

app = QtCore.QCoreApplication(sys.argv)
paths = ['/path/to/file']
fs_watcher = QtCore.QFileSystemWatcher(paths)
fs_watcher.directoryChanged.connect(directory_changed)
fs_watcher.fileChanged.connect(file_changed)
app.exec_()  # run the Qt event loop so the signals are delivered

The problem you are facing is not with Python but with Windows. It can be done, but you will have to write some non-trivial C/C++ code for it.
Windows has no user-mode notification for a file being opened or closed. That's why the libraries suggested by others do not offer a file close notification. On Windows, the user-mode API for detecting changes is ReadDirectoryChangesW. It will alert you with one of the following notifications:
FILE_ACTION_ADDED if a file was added to the directory.
FILE_ACTION_REMOVED if a file was removed from the directory.
FILE_ACTION_MODIFIED if a file was modified. This can be a change in the time stamp or attributes.
FILE_ACTION_RENAMED_OLD_NAME if a file was renamed and this is the old name.
FILE_ACTION_RENAMED_NEW_NAME if a file was renamed and this is the new name.
No amount of Python can change what Windows provides you with.
To get a file close notification, tools like Process Monitor install a Minifilter that lives in the kernel, near the top of other filters like EFS.
To achieve what you want, you would need to:
Install a Minifilter that has the code to send events back to userland. Use Microsoft's Minispy sample; it is stable and fast.
Convert the code from the user program to make it a Python extension (minispy.pyd) that exposes a generator that produces the events. This is the hard part; I will get back to that.
You will have to filter out events; you won't believe how much I/O happens on an idle Windows box!
Your Python program can then import your extension and do its thing.
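The whole thing looks something like this (diagram not reproduced): your minifilter sits near the top of the filter stack, above other filters such as EFS, which in turn sit above the file system.
Of course you can have EFS over NTFS; this is just to show that your minifilter would be above all that.
The hard parts:
Your minifilter will have to be digitally signed by an authority Microsoft trusts. VeriSign comes to mind, but there are others.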
Debugging requires a separate (virtual) machine, but you can make your interface easy to mock.
You will need to install the minifilter with an account that has administrator rights. Any user will be able to read events.
You will have to deal with multiple users yourself. There is only one minifilter for many users.
You will have to convert user program from the MiniSpy sample to a DLL, which you will wrap with a Python extension.
The last two are the hardest.

You can use pyfanotify or butter.
I think you'll find this article very useful: Linux file system events with C, Python and Ruby
It has an example, using pyinotify, that does exactly what you want. This is the code:
import pyinotify

DIR_TO_WATCH = "/tmp/notify-dir"
FILE_TO_WATCH = "/tmp/notify-dir/notify-file.txt"  # both paths must already exist

wm = pyinotify.WatchManager()
dir_events = pyinotify.IN_DELETE | pyinotify.IN_CREATE
file_events = pyinotify.IN_OPEN | pyinotify.IN_CLOSE_WRITE | pyinotify.IN_CLOSE_NOWRITE

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_DELETE(self, event):
        print("File %s was deleted" % event.pathname)  # python 3 style print function

    def process_IN_CREATE(self, event):
        print("File %s was created" % event.pathname)

    def process_IN_OPEN(self, event):
        print("File %s was opened" % event.pathname)

    def process_IN_CLOSE_WRITE(self, event):
        print("File %s was closed after writing" % event.pathname)

    def process_IN_CLOSE_NOWRITE(self, event):
        print("File %s was closed after reading" % event.pathname)

event_handler = EventHandler()
notifier = pyinotify.Notifier(wm, event_handler)
wm.add_watch(DIR_TO_WATCH, dir_events)
wm.add_watch(FILE_TO_WATCH, file_events)
notifier.loop()

I have not found a package that captures open and close events on Windows. As others have mentioned, pyinotify is an excellent option for Linux-based operating systems.
Since I wasn't able to watch for the close event, I settled for the modified event. It's very much an 'after the fact' kind of solution (i.e. I can't pause until I see that a file is closed), but it has worked surprisingly well.
I've used the watchdog package. The code below is from their sample implementation and watches the current directory if you don't pass a path on the command line, otherwise it watches the path you pass.
Example call: python test.py or python test.py C:\Users\Administrator\Desktop
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
This code shows you when files are created, modified, deleted or renamed/moved. To react only to modifications, handle the on_modified event.
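A minimal sketch of that filtering, assuming a recent watchdog release: subclass FileSystemEventHandler and override on_modified. The class name ModifiedOnlyHandler is hypothetical; you would pass an instance to observer.schedule() in place of LoggingEventHandler.

```python
from watchdog.events import FileSystemEventHandler


class ModifiedOnlyHandler(FileSystemEventHandler):
    """Hypothetical handler that only records file modifications."""

    def __init__(self):
        super().__init__()
        self.modified = []

    def on_modified(self, event):
        # Directories also get "modified" events when their contents change;
        # skip them so only file writes are recorded.
        if not event.is_directory:
            self.modified.append(event.src_path)
            print("Modified: %s" % event.src_path)
```

Because watchdog only reports the write (not the close), an editor that saves several times will trigger several calls.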

Related

In a Flask website, when it deletes a file (os.remove("abc.txt")), the file is removed but the space is not reclaimed

The program is a standard Flask program, and it does some cleanup as part of its initialization. In the cleanup() procedure, which calls os.remove("abc.txt"), I noticed that the file is removed but its space is not reclaimed by the OS.
I run the application with both "python website.py" and "gunicorn website:app", and both have the same problem on Linux. On macOS I can't reproduce it.
After the file is removed with os.remove, it is no longer listed by the ls command, but when I run
lsof | grep deleted
I can still see the file listed as deleted but still opened by the Python application.
Because the file has already been removed with os.remove, ls does not list it and du does not count it.
But if the file is big enough, the df command shows that its space is still occupied and has not been reclaimed, because the file is still "open in the Flask application", as lsof reports.
As soon as I stop the Flask application, lsof no longer lists the file and the space is reclaimed.
Usually, when the file is small, or when the application stops or restarts frequently, you won't notice the occupied space. But holding on to the space like this is not reasonable; I expect the website to run for years.
When searching the internet for "open but deleted files", most suggestions are "find the application and kill it". Is there a way to keep the Flask application running without restarting it? My application doesn't actually "open" this file; it simply calls os.remove on it.
Any suggestions on how to delete the file and reclaim the space immediately?

Either the Flask application needs the large file to keep running, or it does not release resources it no longer needs.
If the app needs the large file, that's it. Otherwise, the app is buggy and needs to be fixed.
In both cases, the "open" status of the large file (which, at least on Linux, keeps the file's data on disk) cannot be controlled by your script.

os.remove() only delegates the removal of the file to the operating system. If the file is still referenced (open) somewhere in your code, lsof will of course still show it. Without seeing your code it is hard to tell where the unwanted behavior comes from, but I can at least give you some insight into the referencing behavior.
Here is a small script that simply shows that a file can still be considered open if it is referenced.
import os
import psutil

PATH = "abc.txt"

def write_file(filepath):
    """Simulate an existing file, closing it correctly at the end"""
    with open(filepath, "x") as file:  # "x" (exclusive create) requires Python 3
        file.write("Hello, world!")

def remove_file(filepath):
    """Let the operating system handle the file removal"""
    os.remove(filepath)

def lsof():
    """Simulate the lsof command (requires e.g. `pip install psutil`)"""
    open_files = psutil.Process().open_files()
    if open_files:
        return "\n".join(os.path.basename(f.path) for f in open_files)
    return "No open files found."

if __name__ == "__main__":
    print("\n----- EXAMPLE 1 -----\n")
    write_file(PATH)
    print(lsof())
    remove_file(PATH)
    print(lsof())
    print("\n----- EXAMPLE 2 -----\n")
    write_file(PATH)
    file = open(PATH)  # referenced!
    print(lsof())
    remove_file(PATH)
    print(lsof())
    file.close()  # only now is the descriptor released
The output of example 2 shows that while the file is referenced, it is also visible to the lsof() check:
----- EXAMPLE 1 -----
No open files found.
No open files found.
----- EXAMPLE 2 -----
abc.txt
No open files found.
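The second "No open files found." in example 2 does not mean the descriptor was released: the file object is still open after the removal, which is exactly the lsof | grep deleted situation from the question. psutil's open_files() apparently just stops reporting it once the path no longer exists. The descriptor, and with it the disk space, is only released when file.close() is called.
You could try debugging your code, e.g. with psutil.Process.open_files() as in my example, to find where a file you expect to be closed is still held open.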
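On Linux you can also look for exactly the symptom from the question, deleted files that are still open, without psutil. This is a Linux-only sketch with a hypothetical helper deleted_open_files(); it relies on the kernel appending " (deleted)" to the /proc/<pid>/fd link target once the file's directory entry has been removed. Once you find the culprit, the fix is to close() that file object (or open it in a with block) so the kernel can free the space.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid="self"):
    """List paths that a process still holds open after they were deleted.

    Linux-only: reads the /proc/<pid>/fd symlinks.
    """
    fd_dir = "/proc/%s/fd" % pid
    result = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor closed while scanning (e.g. listdir's own fd)
        if target.endswith(DELETED_SUFFIX):
            result.append(target[:-len(DELETED_SUFFIX)])
    return result
```

Calling it from the Flask app (for example right after the cleanup step) and logging the result shows which removed files the process is still holding.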

Get file location and names from Windows Camera

I was running into trouble using QCamera with focusing and other things, so I thought I could use the camera software that ships with Windows 10. Based on the thread about opening the Windows Camera, I tried a few things to acquire the taken images and use them in my program. In the documentation and its API I didn't find snippets I could use, so I created the hack below. It assumes that the images are in the target folder 'C:\\Users\\*username*\\Pictures\\Camera Roll', which is mentioned in the registry (see below), but I don't know whether this is reliable or how to get the proper key name.
I don't think this is the only or the cleanest solution. So my question is: how do I get the taken images and open/close the Camera properly?
Currently the function waits until 'WindowsCamera.exe' has left the process list and then returns the newly added images / videos in the target folder.
In the registry I found:
Entry: Computer\HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders with key name {3B193882-D3AD-4eab-965A-69829D1FB59F}for the target folder. I don't think that this key is usable.
Working example of my hack:
import subprocess
import pathlib
import time
import psutil

def check_for_files(path, pattern):
    print(" check_for_files:", (path, pattern))
    files = []
    for filename in pathlib.Path(path).rglob(pattern):
        files.append(filename)
    return files

def get_Windows_Picture(picpath):
    prefiles = check_for_files(picpath, '*.jpg')
    # "start" returns immediately, so the camera's process is polled instead
    subprocess.call('start microsoft.windows.camera:', shell=True)
    processlist = [proc.info['name'] for proc in psutil.process_iter(['name'])]
    while 'WindowsCamera.exe' in processlist:
        time.sleep(0.5)  # avoid a busy loop while the camera is open
        processlist = [proc.info['name'] for proc in psutil.process_iter(['name'])]
    postfiles = check_for_files(picpath, '*.jpg')
    newfiles = []
    for file in postfiles:
        if file not in prefiles:
            newfiles.append(str(file))
    return newfiles

if __name__ == "__main__":
    picpath = str(pathlib.Path("C:/Users/*user*/Pictures/Camera Roll"))
    images = get_Windows_Picture(picpath)
    print("Images:", images)

The Camera Roll is a "known Windows folder" which means some APIs can retrieve the exact path (even if it's non-default) for you:
SHGetKnownFolderPath
SHGetKnownFolderIDList
SHSetKnownFolderPath
The knownfolderid documentation will give you the constant name of the required folder (in your case FOLDERID_CameraRoll). As you can see in the linked page, the default is %USERPROFILE%\Pictures\Camera Roll (It's the default, so this doesn't mean it's the same for everyone).
The problem in Python is that you'll need to use ctypes, which can sometimes be cumbersome (especially in your case, where you'll have to deal with GUIDs and with freeing the memory returned by the API).
This gist gives a good example on how to call SHGetKnownFolderPath from Python with ctypes. In your case you'll only need the CameraRoll member in the FOLDERID class so you can greatly simplify the code.
Side note: Don't poll for the process end, just use the wait() function on the Popen object.
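The gist itself is not reproduced here; this is a minimal standalone sketch of the same approach. It assumes Windows and FOLDERID_CameraRoll = {AB5FB87B-7CE2-4F83-915D-550846C9537B} as defined in KnownFolders.h; the names GUID, guid_from_string() and get_known_folder_path() are hypothetical. The string returned by SHGetKnownFolderPath is allocated by the shell, so it is freed with CoTaskMemFree.

```python
import ctypes
import sys
import uuid

# FOLDERID_CameraRoll from KnownFolders.h
FOLDERID_CAMERA_ROLL = "{AB5FB87B-7CE2-4F83-915D-550846C9537B}"


class GUID(ctypes.Structure):
    # Same memory layout as the Windows GUID struct (16 bytes).
    _fields_ = [
        ("Data1", ctypes.c_uint32),
        ("Data2", ctypes.c_uint16),
        ("Data3", ctypes.c_uint16),
        ("Data4", ctypes.c_ubyte * 8),
    ]


def guid_from_string(text):
    # uuid.UUID.bytes_le stores the first three fields little-endian,
    # which matches the in-memory GUID layout on Windows.
    return GUID.from_buffer_copy(uuid.UUID(text).bytes_le)


def get_known_folder_path(folder_id):
    """Return the path of a known folder (Windows only)."""
    guid = guid_from_string(folder_id)
    path_ptr = ctypes.c_wchar_p()
    hr = ctypes.windll.shell32.SHGetKnownFolderPath(
        ctypes.byref(guid), 0, None, ctypes.byref(path_ptr))
    try:
        if hr != 0:  # anything other than S_OK
            raise ctypes.WinError(hr)
        return path_ptr.value
    finally:
        # The caller owns the returned buffer and must free it.
        ctypes.windll.ole32.CoTaskMemFree(path_ptr)


if __name__ == "__main__" and sys.platform == "win32":
    print(get_known_folder_path(FOLDERID_CAMERA_ROLL))
```

You could then pass the returned path to check_for_files() instead of the hard-coded "C:/Users/*user*/Pictures/Camera Roll".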

Lock file for access on windows

Using portalocker we can lock a file for access through the following way:
f = open("M99", "r+")
portalocker.lock(f, portalocker.LOCK_EX)
The lock on the file can be removed using
f.close()  # or
portalocker.unlock(f)  # needs f, i.e. a reference to the file object that was locked
Can the same thing be done some other way in Python, so that we can:
Lock the file for access.
Restart Python (so we no longer have the original Python file object or file number).
Unlock the file for access in the new process.
I cannot save f or the file object, so I can't use pickle or anything similar. Is there a way to do this with the Python standard library or some win32api call?
Any Windows utility or command-line tool would also do.

It appears you want to lock access to resources where the lock persists between program invocations. You need a different strategy for that.
Create a lock file using exclusive create mode; in Python 2 this requires using the os.open() call (followed by os.fdopen() to produce a Python file object), in Python 3 you can use the 'x' mode when using the built-in open().
In Python 2:
import errno
import os

LOCKFILE = r'some\path\to\lockfile'

class AlreadyLocked(Exception):
    pass

def lock():
    try:
        fd = os.open(LOCKFILE, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError as e:
        # os.open() raises OSError (not IOError); EEXIST means the file already exists
        if e.errno != errno.EEXIST:
            raise
        raise AlreadyLocked()
    with os.fdopen(fd, 'w') as lockfile:
        # write the PID of the current process so you can debug
        # later if a lockfile can be deleted after a program crash
        lockfile.write(str(os.getpid()))

def unlock():
    os.remove(LOCKFILE)
In Python 3 the lock() function would be:
def lock():
    try:
        with open(LOCKFILE, 'x') as lockfile:
            # write the PID of the current process so you can debug
            # later if a lockfile can be deleted after a program crash
            lockfile.write(str(os.getpid()))
    except FileExistsError:
        # file already exists
        raise AlreadyLocked()
You need to use exclusive create mode to avoid race conditions; in exclusive create mode the file can only be created if it doesn't yet exist, a condition checked by the Operating System, rather than by a separate step in Python which would open a window for another program to create the lock as well.
Now you can lock and unlock without tracking the file descriptor. The lockfile is now a signal file; if it is present something has claimed a lock, and deleting the file means something is unlocked.
This does mean that access to the files or directories you are trying to protect is only protected because all your code honours this lock system, not because the OS is enforcing locks on those files or directories.
This all means that this only works if all access to the shared resource is handled by processes that cooperate in this strategy. It cannot be used if another process doesn't honour this scheme. In that case your only option is to use OS level locking and you have to keep your process running for the full duration of the lock.
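A Python 3 sketch of the same scheme, with the lock path passed in explicitly and a hypothetical lock_owner() helper that reads back the recorded PID. A freshly restarted process can use it to see who holds the lock and then release it, without any file handle from the process that created it.

```python
import os


class AlreadyLocked(Exception):
    pass


def lock(path):
    """Claim the lock by exclusively creating `path`."""
    try:
        # 'x' mode: the OS atomically fails if the file already exists.
        with open(path, "x") as f:
            f.write(str(os.getpid()))  # record the owner for debugging stale locks
    except FileExistsError:
        raise AlreadyLocked(path)


def lock_owner(path):
    """Return the PID stored in the lock file, or None if it is not locked."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return None
    # The file can briefly be empty between creation and the PID write.
    return int(text) if text else None


def unlock(path):
    """Release the lock; works from any process since no handle is needed."""
    os.remove(path)
```

As with the functions above, this only protects resources whose users all call lock() and unlock(); nothing stops an unrelated program from opening the files.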

There is a method in win32api to set file attributes; have a read of the following:
python SetFileAttributes
MSDN file attributes
These give you the Python method to set file attributes:
win32api.SetFileAttributes(file, win32con.FILE_ATTRIBUTE_NORMAL)
where file is the name/path of the file, not a file object,
and the second argument is an attribute mask. If you want to set several attributes at once, combine them with bitwise OR:
win32con.FILE_ATTRIBUTE_HIDDEN | win32con.FILE_ATTRIBUTE_READONLY
The MSDN page lists more constants.
EDIT:
For file locking you can also look at the win32file.LockFileEx method.
I haven't used it before, so it may take some playing around, but it appears to need a file handle (not a path) and then certain constants to set the access permissions; more info on the constants can be found on MSDN. Note that such a lock is released when the handle is closed or the process exits, so it will not survive a Python restart.

You could use subprocess to open the file in Notepad or Excel:
import subprocess, time
subprocess.call(r'start excel.exe "\lockThisFile.txt"', shell=True)
time.sleep(10)  # if you need the file locked before executing the next commands, you may need to wait a few seconds
or
subprocess.call('notepad > lockThisFile.txt', shell=True)  # note: the > redirection truncates lockThisFile.txt
As written, you need shell=True, because start and the > redirection are handled by cmd.exe.
(subprocess.Popen() works as well, and unlike subprocess.call() it does not wait for the program to exit.)
You can then close the process later using:
subprocess.call('taskkill /f /im notepad.exe')  # or excel.exe; this kills every running instance
Other options include:
- writing some C++ code and calling it from Python (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx)
- calling third-party programs with subprocess.call():
FileLocker http://www.jensscheffler.de/filelocker (https://superuser.com/questions/294826/how-to-purposefully-exclusively-lock-a-file)
Easy File Locker http://www.xoslab.com/efl.html and Dispatch (from win32com.client import Dispatch), although the last choice is the most complex

How can I directly open a custom file with python on a double click?

I am programming on a Windows machine and I have an app that reads a file selected by the user. Is it possible to let users open the file directly by double-clicking it? This needs to work when the program is "compiled" into an .exe with cx_Freeze.
What I am really asking is this:
Is there a way to allow the user to double click on a custom file (.lpd) and when they do windows starts the program (a compiled cxfreeze .exe) and passes it the file path as an argument.

The only way Windows associates files with a particular program is by their extension, so this only works if your files have a unique extension (which it looks like maybe they do). So your user would need to setup the association on their machine, which varies depending on the version of Windows. For instance, in Windows 7 it would probably be through Control Panel\All Control Panel Items\Default Programs\Set Associations.
It is possible for you to set up this association automatically on their system (probably by editing the Windows registry), but that would generally be done during installation, and you should ask for the user's permission first.
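If you do set the association from your installer or first-run code, a per-user association can be written under HKEY_CURRENT_USER\Software\Classes without administrator rights. This is a sketch using the standard-library winreg module; the function names, ProgID and executable path are hypothetical, and a choice the user made earlier through "Open with" may still take precedence.

```python
def build_open_command(exe_path):
    # Windows replaces %1 with the double-clicked file's path; quoting both
    # parts keeps paths containing spaces intact.
    return '"%s" "%%1"' % exe_path


def register_extension(ext, prog_id, exe_path):
    """Associate `ext` (e.g. ".lpd") with `exe_path` for the current user."""
    import winreg  # Windows-only standard library module

    base = r"Software\Classes"
    # HKCU\Software\Classes\.lpd -> (default) = prog_id
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, base + "\\" + ext) as key:
        winreg.SetValueEx(key, "", 0, winreg.REG_SZ, prog_id)
    # HKCU\Software\Classes\<prog_id>\shell\open\command -> (default) = command line
    command_key = base + "\\" + prog_id + r"\shell\open\command"
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, command_key) as key:
        winreg.SetValueEx(key, "", 0, winreg.REG_SZ, build_open_command(exe_path))
```

For example, `register_extension(".lpd", "MyApp.LpdFile", r"C:\Program Files\MyApp\myapp.exe")`; Windows then launches the .exe with the file path as its first argument, which the program reads from sys.argv[1].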

I used PyInstaller for exe-generation.
Here is a small example:
import sys

class Test():
    def __init__(self, path=None):
        super().__init__()
        self.path = path

    def start(self):
        if self.path is not None:
            print(self.path)

if __name__ == '__main__':
    # When Windows launches the .exe for an associated file, its path is sys.argv[1]
    if len(sys.argv) > 1:
        mytest = Test(sys.argv[1])
    else:
        mytest = Test()
    mytest.start()

PyGTK/GIO: monitor directory for changes recursively

Take the following demo code (from the GIO answer to this question), which uses a GIO FileMonitor to monitor a directory for changes:
import gio

def directory_changed(monitor, file1, file2, evt_type):
    print "Changed:", file1, file2, evt_type

gfile = gio.File(".")
monitor = gfile.monitor_directory(gio.FILE_MONITOR_NONE, None)
monitor.connect("changed", directory_changed)

import glib
ml = glib.MainLoop()
ml.run()
After running this code, I can then create and modify child nodes and be notified of the changes. However, this only works for immediate children (I am aware that the docs don't say otherwise). The last of the following shell commands will not result in a notification:
touch one
mkdir two
touch two/three
Is there an easy way to make it recursive? I'd rather not manually code something that looks for directory creation and adds a monitor, removing them on deletion, etc.
The intended use is a VCS file browser extension, which should cache the statuses of files in a working copy and update them individually when they change. So there might be anywhere from tens to thousands (or more) of directories to monitor. I'd like to just find the root of the working copy and add the file monitor there.
I know about pyinotify, but I'm avoiding it so that this works under non-Linux kernels such as FreeBSD or... others. As far as I'm aware, the GIO FileMonitor uses inotify underneath where available, and I can understand not emphasising the implementation to maintain some degree of abstraction, but it suggested to me that it should be possible.
(In case it matters, I originally posted this on the PyGTK mailing list.)

"Is there an easy way to make it recursive?"
I'm not aware of any "easy way" to achieve this. The underlying systems, such as inotify on Linux or kqueue on BSDs, don't provide facilities to add recursive watches automatically. I'm also not aware of any library layering what you want on top of GIO.
So you'll most likely have to build this yourself. As this can be a bit tricky in some corner cases (e.g. mkdir -p foo/bar/baz), I would suggest looking at how pyinotify implements its auto_add functionality (grep through the pyinotify source) and porting that over to GIO.

I'm not sure if GIO allows you to have more than one monitor at once, but if it does there's no* reason you can't do something like this:
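import gio
import os

monitors = []  # keep references so the monitors aren't garbage-collected (and cancelled)

def directory_changed(monitor, file1, file2, evt_type):
    # file1 is the file that changed; file2 is only set for some events (e.g. moves)
    path = file1.get_path()
    if evt_type == gio.FILE_MONITOR_EVENT_CREATED and path and os.path.isdir(path):
        add_monitor(path)
    print "Changed:", file1, file2, evt_type

def add_monitor(dir):
    gfile = gio.File(dir)
    monitor = gfile.monitor_directory(gio.FILE_MONITOR_NONE, None)
    monitor.connect("changed", directory_changed)
    monitors.append(monitor)

add_monitor('.')
import glib
ml = glib.MainLoop()
ml.run()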
*When I say "no reason": this could possibly become a resource hog, though with nearly zero knowledge of GIO I couldn't really say. It's also entirely possible to roll your own in Python with a few calls (os.listdir among others). It might look something like this:
import os
import time

class Watcher(object):
    def __init__(self):
        self.dirs = []
        self.snapshots = {}

    def add_dir(self, dir):
        if dir not in self.dirs:
            self.dirs.append(dir)

    def check_for_changes(self, dir):
        snapshot = self.snapshots.get(dir)
        try:
            curstate = os.listdir(dir)
        except OSError:  # the directory was removed; stop watching it
            self.dirs.remove(dir)
            self.snapshots.pop(dir, None)
            return
        if snapshot is None:
            self.snapshots[dir] = curstate
        elif snapshot != curstate:
            changes = set(curstate).symmetric_difference(snapshot)
            for change in changes:
                path = os.path.join(dir, change)  # listdir() names are relative to dir
                if os.path.isdir(path):
                    self.add_dir(path)  # start watching new subdirectories
            print('Changes: ' + ' '.join(sorted(changes)))
            self.snapshots[dir] = curstate

    def mainloop(self):
        if not self.dirs:
            print("ERROR: Please add a directory with add_dir()")
            return
        while True:
            for dir in list(self.dirs):  # copy: check_for_changes may modify self.dirs
                self.check_for_changes(dir)
            time.sleep(4)  # Don't want to be a resource hog

w = Watcher()
w.add_dir('.')
w.mainloop()
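Since the question's failing case was a new subdirectory (mkdir two; touch two/three), here is a sketch of a recursive polling variant that rescans the whole tree with os.walk and diffs two snapshots. The function names are hypothetical, and the cost of each scan grows with the size of the tree, so for thousands of directories it may be too slow to run often.

```python
import os


def snapshot(root):
    """Map every file and directory under `root` to its mtime."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                pass  # removed between os.walk() listing it and os.stat()
    return state


def diff(old, new):
    """Return (created, deleted, modified) path sets between two snapshots."""
    created = set(new) - set(old)
    deleted = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return created, deleted, modified
```

Adding a file also updates its parent directory's mtime, so parents of new entries will show up in `modified` as well.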
