Run script when a folder or file is created - python

I have a Perl script that sorts files from one incoming directory into other directories on an Ubuntu server.
As it is now, I'm running it as a cron job every few minutes, but this can cause problems if the script starts while a file is still being written to the incoming directory.
A better solution would be to start it when a file is written to the incoming directory or any of its subdirectories.
I'm thinking I could run another script as a service that calls my sorting script whenever a directory change occurs, but I have no idea how to go about doing it.

On Linux you can use the pyinotify library: https://github.com/seb-m/pyinotify
To watch subdirectories as well, pass rec=True to the add_watch() invocation. Here is a complete example that monitors the /tmp directory and its subdirectories for file creation:
import pyinotify

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
        # Processing of the created file goes here.
        print "Created:", event.pathname

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, EventHandler())
wm.add_watch('/tmp', pyinotify.IN_CREATE, rec=True)
notifier.loop()
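If you want the handler to kick off your existing sorting script, a minimal sketch could call it from the event handler via subprocess; the script path and the incoming directory below are placeholders for your own paths. Using IN_CLOSE_WRITE instead of IN_CREATE also sidesteps the half-written-file problem, since that event only fires once the writer has closed the file:
import subprocess
import pyinotify

SORT_SCRIPT = '/usr/local/bin/sort_incoming.pl'  # hypothetical path to your Perl sorter

class SortOnClose(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Fires only after the writer closes the file, so it is safe to sort it now.
        subprocess.call([SORT_SCRIPT, event.pathname])

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, SortOnClose())
# auto_add=True also watches subdirectories created after startup.
wm.add_watch('/path/to/incoming', pyinotify.IN_CLOSE_WRITE, rec=True, auto_add=True)
notifier.loop()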


Running all Python scripts with the same name across many directories

I have a file structure that looks something like this:
Master/
    First/
        train.py
        other1.py
    Second/
        train.py
        other2.py
    Third/
        train.py
        other3.py
I want to be able to have one Python script that lives in the Master directory that will do the following when executed:
Loop through all the subdirectories (and their subdirectories if they exist)
Run every Python script named train.py in each of them, in whatever order necessary
I know how to execute a given python script from another file (given its name), but I want to create a script that will execute whatever train.py scripts it encounters. Because the train.py scripts are subject to being moved around and being duplicated/deleted, I want to create an adaptable script that will run all those that it finds.
How can I do this?
You can use os.walk to recursively collect all train.py scripts and then run them in parallel using ProcessPoolExecutor and the subprocess module.
import os
import subprocess
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def list_python_scripts(root):
    """Recursively find all 'train.py' scripts under the given directory."""
    scripts = []
    for dirpath, _, filenames in os.walk(root):
        scripts.extend([
            os.path.join(dirpath, filename) for filename in filenames
            if filename == 'train.py'
        ])
    return scripts


def main():
    # Change the argument here to the directory you want to scan.
    scripts = list_python_scripts('master')
    if not scripts:
        return
    # Capture each script's output so the CompletedProcess results are printable.
    run = partial(subprocess.run, capture_output=True, text=True)
    with ProcessPoolExecutor(max_workers=len(scripts)) as pool:
        # Run each script in parallel and accumulate CompletedProcess results.
        results = pool.map(run, [['python', script] for script in scripts])
        for result in results:
            print(result.returncode, result.stdout)


if __name__ == '__main__':
    main()
Which OS are you using?
If you're on Ubuntu/CentOS, try this combination:
import os

# Run this from the master directory: find every train.py in the
# subdirectories, then execute each one.
stream = os.popen("find . -type f -name train.py")
for script in stream.read().splitlines():
    os.system("python " + script)
If you are using Windows, you could try running them from a PowerShell script. You can run two Python scripts with just this:
python Test1.py
python Folder/Test1.py
Then add a loop and/or a function that searches for the files. Because it's Windows PowerShell, you have a lot of power when it comes to the filesystem and controlling Windows in general.

Easy way to automate running same python script across several EC2 instances?

I have 4 linux EC2 instances created from the same AMI that I use to process files in S3.
I run the same python script on each instance. It takes a directory of files in S3 to process and a number telling it which files it is supposed to process.
Say mydir contains myfile1 ... myfile8.
On instance 0 I call:
python process.py mydir 0
This causes it to process myfile1 and myfile5.
On instance 1 I call:
python process.py mydir 1
This causes it to process myfile2 and myfile6.
And so on.
Inside the script I do:
keys = keys[pid::4] where pid is the argument from the command line.
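For example, with the eight files above the slice works out like this:
keys = ['myfile1', 'myfile2', 'myfile3', 'myfile4',
        'myfile5', 'myfile6', 'myfile7', 'myfile8']
keys[0::4]  # instance 0 -> ['myfile1', 'myfile5']
keys[1::4]  # instance 1 -> ['myfile2', 'myfile6']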
I redistribute changes to my python script by syncing from S3.
Is there a simple way to automate this further?
I would like to press one button, say dir=yourdir, and have it sync the code from S3 and run on each instance.
You can try using Fabric.
Example taken from Fabric documentation:
from fabric import Connection
result = Connection('web1.example.com').run('uname -s', hide=True)
msg = "Ran {0.command!r} on {0.connection.host}, got stdout:\n{0.stdout}"
print(msg.format(result))
# Output:
# Ran 'uname -s' on web1.example.com, got stdout:
# Linux
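Applied to your workflow, a rough sketch along these lines would connect to each instance in turn, sync the code from S3 and launch process.py with a different instance number. The hostnames and bucket path are placeholders, and it assumes the AWS CLI is available on the instances:
from fabric import Connection

hosts = ['ec2-instance-0', 'ec2-instance-1', 'ec2-instance-2', 'ec2-instance-3']  # placeholder hostnames
s3_dir = 'mydir'  # the S3 directory you want processed

for pid, host in enumerate(hosts):
    conn = Connection(host)
    # Pull the latest code from S3 (placeholder bucket/path), then kick off processing.
    conn.run('aws s3 sync s3://my-code-bucket/code/ ~/code/', hide=True)
    conn.run('cd ~/code && python process.py %s %d' % (s3_dir, pid))
As written this runs the instances one after another; a thread per connection (or Fabric's ThreadingGroup, if every host gets the same command) would start them in parallel.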

Cron Job File Creation - Created File Permissions

I'm running an hourly cron job for testing. This job runs a Python file called "rotateLogs". Cron can't use extensions, so the first line of the file is #!/usr/bin/python. This Python file (fileA) then calls another Python file (fileB) elsewhere on the computer. fileB logs to a log file with a timestamp, etc. However, when fileB is run through fileA as a cron job, it creates its log files as rw-r--r-- files.
The problem is that if I then try to log to those files from fileB, it can't write to them unless it is run with sudo permissions. So I am looking for some way to deal with this. Ideally, it would be nice to simply create the files as rw-rw-r-- files, but I don't know how to do that with cron. Thank you for any help.
EDIT: rotateLogs(intentionally not .py):
#!/usr/bin/python
#rotateLogs
#Calls the rotateLog function in the Communote scripts folder
#Designed to be run as a daily log rotation cron job
import sys,logging
sys.path.append('/home/graeme/Communote/scripts')
import localLogging
localLogging.localLog("Hourly log",logging.error)
print "Hello"
There is no entry for it in crontab, but it runs properly from the hourly cron (at 17 minutes past the hour).
FileB's relevant function:
def localLog(strToLog, severityLevel):
    #Allows other scripts to log easily
    #Takes their log request and appends it to the log file
    logging.basicConfig(filename=logDirPath+getFileName(currDate), format="%(asctime)s %(message)s")
    #Logs strToLog, such as logging.warning(strToLog)
    severityLevel(strToLog)
    return
I'm not sure how to find the user/group of the cron job, but the script simply lives in /etc/cron.hourly, which I think runs as root?
It turns out that cron does not source any shell profiles (/etc/profile, ~/.bashrc), so the umask has to be set in the script that is being called by cron.
When using user-level crontabs (crontab -e), the umask can be simply set as follows:
0 * * * * umask 002; /path/to/script
This works even for a Python script, because the process inherits its umask from the shell that cron spawns.
However, when placing a Python script directly in /etc/cron.hourly etc., there is no way to set the umask except in the Python script itself:
import os
os.umask(002)
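Applied to the localLog function from the question, the call just needs to happen before the log file is first created, e.g. (a sketch only, reusing the names from the question):
import os, logging

def localLog(strToLog, severityLevel):
    # Relax the umask before basicConfig creates the log file,
    # so it ends up rw-rw-r-- instead of rw-r--r--.
    os.umask(002)  # 0o002 on Python 3
    logging.basicConfig(filename=logDirPath+getFileName(currDate), format="%(asctime)s %(message)s")
    severityLevel(strToLog)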

How to get a file close event in python

Using Python 2.7 on a Windows 7 64-bit machine.
How to get a file close event:
when the file is opened in a new process of the file opener (like Notepad or WordPad, which open the file in a new process every time)
when the file is opened in a tab of the file opener (like Notepad++, which opens every file in a new tab while only a single Notepad++ process is running)
So, how do I get a file close event in the above cases? Is it possible to handle both cases with common code? I am dealing with different file types.
This has proven to be a very easy task for *nix systems, but on Windows, getting a file close event is not a simple task. Read below for a summary of common methods, grouped by OS.
For Linux
On Linux, filesystem changes can be monitored easily and in great detail. The best tool for this is the kernel feature called inotify, and there is a Python implementation that uses it, called Pyinotify.
Pyinotify
Pyinotify is a Python module for monitoring filesystems changes. Pyinotify relies on a Linux Kernel feature (merged in kernel 2.6.13) called inotify, which is an event-driven notifier. Its notifications are exported from kernel space to user space through three system calls. Pyinotify binds these system calls and provides an implementation on top of them offering a generic and abstract way to manipulate those functionalities.
Here you can find the list of the events that can be monitored with Pyinotify.
Example usage:
import pyinotify

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_NOWRITE(self, event):
        print "File was closed without writing: " + event.pathname

    def process_IN_CLOSE_WRITE(self, event):
        print "File was closed with writing: " + event.pathname

def watch(filename):
    wm = pyinotify.WatchManager()
    mask = pyinotify.IN_CLOSE_NOWRITE | pyinotify.IN_CLOSE_WRITE
    wm.add_watch(filename, mask)
    eh = EventHandler()
    notifier = pyinotify.Notifier(wm, eh)
    notifier.loop()

if __name__ == '__main__':
    watch('/path/to/file')
For Windows
The situation for Windows is quite a bit more complex than for Linux. Most libraries rely on the ReadDirectoryChangesW API, which is limited and can't detect finer details like a file close event. There are, however, other methods for detecting such events, so read on to find out more.
Watcher
Note: Watcher was last updated in February 2011, so it's probably safe to skip this one.
Watcher is a low-level C extension for receiving file system updates using the ReadDirectoryChangesW API on Windows systems. The package also includes a high-level interface to emulate most of the .NET FileSystemWatcher API.
The closest one can get to detecting file close events with Watcher is to monitor the FILE_NOTIFY_CHANGE_LAST_WRITE and/or FILE_NOTIFY_CHANGE_LAST_ACCESS events.
Example usage:
import watcher
w = watcher.Watcher(dir, callback)
w.flags = watcher.FILE_NOTIFY_CHANGE_LAST_WRITE
w.start()
Watchdog
Python API and shell utilities to monitor file system events. Easy install: $ pip install watchdog. For more info visit the documentation.
Watchdog on Windows relies on the ReadDirectoryChangesW API, which brings its caveats as with Watcher and other libraries relying on the same API.
Pywatch
A Python near-clone of the Linux watch command. The pywatch.watcher.Watcher class can be told to watch a set of files, and given a set of commands to run whenever any of those files change. It can only monitor the "file changed" event, since it relies on polling the file's st_mtime.
Bonus for Windows with NTFS:
NTFS USN Journal
The NTFS USN (Update Sequence Number) Journal is a feature of NTFS which maintains a record of changes made to the volume. The reason it is listed as a bonus is that, unlike the other entries, it is not a specific library, but rather a feature of the NTFS filesystem. So if you are using other Windows filesystems (like FAT, ReFS, etc.) this does not apply.
The way it works is that the system records all changes made to the volume in the USN Journal file, with each volume having its own instance. Each record in the Change Journal contains the USN, the name of the file, and information about what the change was.
The main reason this method is interesting for this question is that, unlike most of the other methods, this one provides a way to detect a file close event, defined as USN_REASON_CLOSE. More information with a complete list of events can be found in this MSDN article. For a complete documentation about USN Journaling, visit this MSDN page.
There are multiple ways to access the USN Journal from Python, but the only mature option seems to be the ntfsjournal module.
The "proper" way for Windows:
File system filter driver
As described on the MSDN page:
A file system filter driver is an optional driver that adds value to
or modifies the behavior of a file system. A file system filter driver
is a kernel-mode component that runs as part of the Windows executive.
A file system filter driver can filter I/O operations for one or more
file systems or file system volumes. Depending on the nature of the
driver, filter can mean log, observe, modify, or even prevent. Typical
applications for file system filter drivers include antivirus
utilities, encryption programs, and hierarchical storage management
systems.
It is not an easy task to implement a file system filter driver, but for someone who would like to give it a try, there is a good introduction tutorial on CodeProject.
P.S. Check #ixe013's answer for some additional info about this method.
Multiplatform
Qt's QFileSystemWatcher
The QFileSystemWatcher class provides an interface for monitoring files and directories for modifications. This class was introduced in Qt 4.2.
Unfortunately, its functionality is fairly limited, as it can only detect when a file has been modified, renamed or deleted, and when a new file was added to a directory.
Example usage:
import sys
from PyQt4 import QtCore

def directory_changed(path):
    print('Directory Changed: %s' % path)

def file_changed(path):
    print('File Changed: %s' % path)

app = QtCore.QCoreApplication(sys.argv)
paths = ['/path/to/file']
fs_watcher = QtCore.QFileSystemWatcher(paths)
fs_watcher.directoryChanged.connect(directory_changed)
fs_watcher.fileChanged.connect(file_changed)
app.exec_()
The problem you are facing is not with Python, but with Windows. It can be done, but you will have to write some non-trivial C/C++ code for it.
A file open or file close notification does not exist in userland on Windows. That's why the libraries suggested by others do not offer a file close notification. In Windows, the API for detecting changes in userland is ReadDirectoryChangesW. It will alert you with one of the following notifications:
FILE_ACTION_ADDED if a file was added to the directory.
FILE_ACTION_REMOVED if a file was removed from the directory.
FILE_ACTION_MODIFIED if a file was modified. This can be a change in the time stamp or attributes.
FILE_ACTION_RENAMED_OLD_NAME if a file was renamed and this is the old name.
FILE_ACTION_RENAMED_NEW_NAME if a file was renamed and this is the new name.
No amount of Python can change what Windows provides you with.
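For illustration, a minimal polling sketch using those notifications through pywin32 (assuming pywin32 is installed; the watched path is a placeholder) could look like the following. Note that, as explained above, none of these actions is a close event:
import win32file
import win32con

FILE_LIST_DIRECTORY = 0x0001
ACTIONS = {1: 'Added', 2: 'Removed', 3: 'Modified', 4: 'Renamed from', 5: 'Renamed to'}

handle = win32file.CreateFile(
    'C:\\path\\to\\watch', FILE_LIST_DIRECTORY,
    win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
    None, win32con.OPEN_EXISTING, win32con.FILE_FLAG_BACKUP_SEMANTICS, None)

while True:
    # Blocks until something changes under the watched directory (subtree included).
    results = win32file.ReadDirectoryChangesW(
        handle, 8192, True,
        win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
        win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
        None, None)
    for action, filename in results:
        print('%s: %s' % (ACTIONS.get(action, 'Unknown'), filename))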
To get a file close notification, tools like Process Monitor install a Minifilter that lives in the kernel, near the top of other filters like EFS.
To achieve what you want, you would need to:
Install a Minifilter that has the code to send events back to userland. Use Microsoft's Minispy sample, it is stable and fast.
Convert the code from the user program to make it a Python extension (minispy.pyd) that exposes a generator that produces the events. This is the hard part, I will get back to that.
You will have to filter out events; you won't believe the amount of I/O that goes on on an idle Windows box!
Your Python program can then import your extension and do its thing.
The whole thing is a driver stack with your minifilter sitting near the top; of course you can have EFS over NTFS, this is just to show that your minifilter would sit above all that.
The hard parts:
Your minifilter will have to be digitally signed by an authority Microsoft trusts. Verisign comes to mind, but there are others.
Debugging requires a separate (virtual) machine, but you can make your interface easy to mock.
You will need to install the minifilter with an account that has administrator rights. Any user will be able to read events.
You will have to deal with multiple users yourself. There is only one minifilter for many users.
You will have to convert the user program from the MiniSpy sample to a DLL, which you will wrap with a Python extension.
The last two are the hardest.
You can use pyfanotify or butter.
I think you'll find this link very useful: Linux file system events with C, Python and Ruby
There you will find an example of doing exactly what you want (using pyinotify). This is the code:
import pyinotify

DIR_TO_WATCH = "/tmp/notify-dir"
FILE_TO_WATCH = "/tmp/notify-dir/notify-file.txt"

wm = pyinotify.WatchManager()
dir_events = pyinotify.IN_DELETE | pyinotify.IN_CREATE
file_events = pyinotify.IN_OPEN | pyinotify.IN_CLOSE_WRITE | pyinotify.IN_CLOSE_NOWRITE

class EventHandler(pyinotify.ProcessEvent):
    def process_IN_DELETE(self, event):
        print("File %s was deleted" % event.pathname)  # python 3 style print function

    def process_IN_CREATE(self, event):
        print("File %s was created" % event.pathname)

    def process_IN_OPEN(self, event):
        print("File %s was opened" % event.pathname)

    def process_IN_CLOSE_WRITE(self, event):
        print("File %s was closed after writing" % event.pathname)

    def process_IN_CLOSE_NOWRITE(self, event):
        print("File %s was closed after reading" % event.pathname)

event_handler = EventHandler()
notifier = pyinotify.Notifier(wm, event_handler)
wm.add_watch(DIR_TO_WATCH, dir_events)
wm.add_watch(FILE_TO_WATCH, file_events)
notifier.loop()
I have not found a package that captures open and close events on Windows. As others have mentioned, pyinotify is an excellent option for Linux-based operating systems.
Since I wasn't able to watch for the close event, I settled for the modified event. It's very much an 'after the fact' type of solution (i.e. I can't pause until I see that a file has been closed). But this has worked surprisingly well.
I've used the watchdog package. The code below is from their sample implementation and watches the current directory if you don't pass a path on the command line, otherwise it watches the path you pass.
Example call: python test.py or python test.py C:\Users\Administrator\Desktop
import sys
import time
import logging
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    event_handler = LoggingEventHandler()
    observer = Observer()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
This code will show you when files are created, modified, deleted or renamed/moved. You can filter down to just modifications by handling only the on_modified event, as sketched below.
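For instance, a minimal sketch (again assuming the watchdog package is installed) that reacts only to modifications might look like this:
import sys
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class ModifiedHandler(FileSystemEventHandler):
    def on_modified(self, event):
        # Fires for files (and directories) whose contents changed.
        if not event.is_directory:
            print('Modified: %s' % event.src_path)

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    observer = Observer()
    observer.schedule(ModifiedHandler(), path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()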

Python script and class in the same file

I have a Python class that is supposed to perform some tasks in the background by submitting itself to a cluster environment, e.g.
class AwesomeTaskController(object):
    def run(self, bunch_of_tasks):
        for task in bunch_of_tasks:
            cmd = "%s %s" % (os.path.abspath(__file__), build_cli_paramters(task))
            # call the API to submit the cmd

if __name__ == "__main__":
    #blah blah do stuff with given parameters
All is well the first time this class is run: a .pyc file gets created, and that .pyc file isn't executable (permission-wise).
So the second time I use this class, the command uses the .pyc directly and complains that permission is denied. Perhaps I am approaching this from the wrong angle?
.pyc files aren't executable themselves; you always have to execute the .py file. The .pyc file is just a compiled version of the .py file that Python generates on the fly to save itself some time the next time you run the .py file.
In your case, all you should need to do is check to see if __file__ ends with ".pyc" and remove the trailing "c". You could do that by, say, replacing __file__ in your script with:
(__file__[:-1] if __file__.endswith(".pyc") else __file__)
and that should solve your problem.
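Dropped into the run method from the question, that would look something like this (a sketch only, reusing the question's build_cli_paramters helper):
import os

class AwesomeTaskController(object):
    def run(self, bunch_of_tasks):
        # Always point at the .py source, even when the module was loaded from a .pyc.
        script = __file__[:-1] if __file__.endswith(".pyc") else __file__
        for task in bunch_of_tasks:
            cmd = "%s %s" % (os.path.abspath(script), build_cli_paramters(task))
            # call the API to submit the cmd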
